Files
i2p.i2p/installer/resources/eepsite/docroot/robots.txt
zzz 92b9d0a996 First cut at migrating to Jetty 6 and prep for using an external
Jetty 6 package.

- Add several jars from the Jetty 6 distribution
- Update jetty.xml
- Add context XML files
- Update WorkingDir to migrate the context XML files
- Update RouterConsoleRunner and LocaleWebAppHandler
- Remove all old Jetty 5.1.15 local mods;
  this will break Seedless, which uses a custom Server() constructor
  (see the stock-Jetty sketch after these notes)
- Update I2PRequestLog to be a mod of NCSARequestLog from 6.1.26
- Put I2PRequestLog in its own jar
- Copy MultiPartRequest and other required classes from Jetty 5.1.15
  and add them to susimail, as the replacement MultiPartFilter in
  Jetty 6 is difficult to migrate to and does not support content-type
- Update i2psnark for Jetty 6
- Disable i2psnark RunStandalone; it is unused and instantiates Jetty 5
- Fix up all webapp build.xml to reference new jars

Not yet working: Plugin/webapp run detection and stopping, eepsite CGI
Not well tested: Plugins, classpaths, webapps
2011-12-23 00:56:48 +00:00
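With the local Jetty 5.1.15 mods gone, anything that embedded the router's Jetty
the way Seedless did (via the custom Server() constructor) has to move to the
stock Jetty 6 embedding API instead. Below is a minimal sketch of that API,
assuming the stock 6.1.x org.mortbay.jetty classes; the class name, port, and
paths are placeholders, not I2P's actual code or configuration.

import org.mortbay.jetty.Handler;
import org.mortbay.jetty.NCSARequestLog;
import org.mortbay.jetty.Server;
import org.mortbay.jetty.handler.HandlerCollection;
import org.mortbay.jetty.handler.RequestLogHandler;
import org.mortbay.jetty.webapp.WebAppContext;

public class EepsiteServerSketch {
    public static void main(String[] args) throws Exception {
        // Stock Jetty 6 server on a placeholder port.
        Server server = new Server(7658);

        // Serve a docroot as the root web application (placeholder path).
        WebAppContext webapp = new WebAppContext();
        webapp.setContextPath("/");
        webapp.setResourceBase("./eepsite/docroot");

        // NCSA-format request log; NCSARequestLog is the 6.1.26 class
        // that I2PRequestLog is now a mod of.
        RequestLogHandler logHandler = new RequestLogHandler();
        logHandler.setRequestLog(new NCSARequestLog("./logs/access.log"));

        HandlerCollection handlers = new HandlerCollection();
        handlers.setHandlers(new Handler[] { webapp, logHandler });
        server.setHandler(handlers);

        server.start();
        server.join();
    }
}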

#
# robots.txt for your eepsite
#
# You can use this file to control how web-crawling robots (for example, search
# engine robots like the Googlebot) index your site. Only well-behaved robots
# will abide by the rules you set in this file; thankfully, those are the
# majority. Robots that do not abide by the robots.txt rules will be able to
# index anything they find, so be aware that this file does not allow you to
# actually lock anyone out. It is voluntary.
# Keep this file in the root of your site, i.e. myeepsite.i2p/robots.txt
# Remove the # in front of the lines you want to use (uncomment them). By
# default, robots are allowed to index your whole site.
##### Syntax:
# User-agent: Botname
# Disallow: /directory/to/disallow/
# You can use a * in the User-agent field to select all robots:
# User-agent: *
# You cannot use * as a wildcard in the Disallow string.
# To allow indexing of your whole site, leave the Disallow field empty:
# Disallow:
##### Examples:
# At the time of writing there are only two active search engines in the
# I2P network: http://eepsites.i2p and http://yacysearch.i2p
# Because eepsites.i2p abides by robots.txt but ignores the User-agent field,
# the Yacybot is used in these examples.
# To control the eepsites.i2p robot, you can use the HTML <meta> tag instead.
# Example: <META name="ROBOTS" content="NOINDEX, NOFOLLOW">
# If the robot sees the above line, it will neither index that URL nor follow
# links on it to further pages.
# Options for the content attribute are INDEX or NOINDEX, and FOLLOW or NOFOLLOW.
# You can also use <meta name="robots" content="noarchive"> to disable caching.
# To allow Yacy to access anything but disallow all other robots:
# User-agent: yacybot
# Disallow:
# User-agent: *
# Disallow: /
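# (Note: if you uncomment a multi-record example like the one above, leave a
# blank line between the two records; the original robots.txt standard
# separates records with blank lines.)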
# To disallow Yacy from accessing anything:
# User-agent: yacybot
# Disallow: /
# To disallow Yacy from accessing the /stuff/ directory, e.g. me.i2p/stuff/:
# User-agent: yacybot
# Disallow: /stuff/
# If Google were crawling I2P and you did not want them to index your site:
# User-agent: Googlebot
# Disallow: /
# To disallow all well-behaved robots from accessing your /secret/ and
# /private/ directories:
# Keep in mind that this does NOT block anyone else. Use proper authentication
# if you want your private and secret things to stay private and secret. Also,
# everyone can read the robots.txt file and see what you want to hide.
# User-agent: *
# Disallow: /secret/
# Disallow: /private/
# To disallow robots from indexing a specific file:
# User-agent: *
# Disallow: /not/thisfile.html
# Allow everyone to access everything. This rule is active by default.
# To disable it, add a # at the start of the two lines below.
User-agent: *
Disallow: