Robots.txt
Meta Robots Tags, About Robots.txt and Search Indexing Robots
Entry | Meaning |
---|---|
User-agent: * Disallow: | Because nothing is disallowed, everything is allowed for every robot. |
User-agent: mybot Disallow: / | The mybot robot may not crawl anything, because the root path (/) is disallowed. |
User-agent: * Allow: / | Every robot may crawl everything, because the root path (/) is explicitly allowed. |
User-agent: BadBot Allow: /About/robot-policy.html Disallow: / | The BadBot robot can see the robot policy document, but nothing else. All other user agents are by default allowed to see everything. This only protects a site if "BadBot" follows the directives in robots.txt. |
User-agent: * Disallow: /cgi-bin/ Disallow: /tmp/ Disallow: /private | In this example, all robots can visit the whole site, with the exception of the two directories mentioned and any path that starts with "private" at the host root directory, including items in privatedir/mystuff and the file privateer.html. |
User-agent: BadBot Disallow: / User-agent: * Disallow: /*/private/* | In the file, a blank line separates the BadBot record from the * record; the blank line indicates a new "record", that is, a new user agent command. All other robots can see everything except any subdirectory named "private" (matched using the wildcard character *). |
User-agent: WeirdBot Disallow: /links/listing.html Disallow: /tmp/ Disallow: /private/ User-agent: * Allow: / Disallow: /temp* Allow: *temperature* Disallow: /private/ | This keeps WeirdBot from visiting the listing page in the links directory, the tmp directory and the private directory. All other robots can see everything except the temp directories or files, but should crawl files and directories named "temperature", and should not crawl private directories. Note that the robots will use the longest matching string, so temps and temporary will match the Disallow, while temperatures will match the Allow. |
Bad Examples - Common Wrong Entries | |
Use one of the robots.txt checkers to see if your file is malformed. | |
User-agent: googlebot Disallow / | NO! This entry is missing the colon after the disallow. |
User-agent: sidewiner Disallow: /tmp/ | NO! Robots will ignore misspelled User Agent names (it should be "sidewinder"). Check your server logs and the published listings of User Agent names for the exact spelling. |
User-agent: MSNbot Disallow: /PRIVATE | WARNING! Many robots and webservers are case-sensitive. So this path will not match any root-level folders named private or Private. |
User-agent: * Disallow: /tmp/ User-agent: Weirdbot Disallow: /links/listing.html Disallow: /tmp/ | Robots generally read from top to bottom and stop when they reach something that applies to them, so Weirdbot would probably stop at the first record, *, and never reach its own record. Also, if there's a record for a specific User Agent, robots don't check the * (all user agents) block, so any general directives should be repeated in the specific blocks. |
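
The longest-match rule described in the WeirdBot example above can be easier to follow in code. The following is a minimal Python sketch, not how any particular robot is implemented: the names `pattern_to_regex` and `is_allowed` are made up for illustration, the rules for one user-agent record are assumed to be already parsed into (directive, pattern) pairs, and details such as percent-encoding and the `$` end anchor are left out. It assumes that the longest matching pattern wins and that an Allow wins a tie.

```python
import re


def pattern_to_regex(pattern):
    """Turn a robots.txt path pattern into a regex anchored at the path start.

    '*' matches any run of characters; everything else is literal
    and case-sensitive. (Hypothetical helper, for illustration only.)
    """
    parts = pattern.split("*")
    return re.compile(".*".join(re.escape(part) for part in parts))


def is_allowed(rules, path):
    """Decide whether `path` may be crawled under one user-agent record.

    `rules` is a list of (directive, pattern) pairs such as
    ("Disallow", "/tmp/"). Assumption: the longest matching pattern
    wins, and Allow wins when two matching patterns have equal length.
    """
    best_len = -1
    allowed = True  # a path that matches no rule is allowed
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty "Disallow:" matches nothing
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            longer = len(pattern) > best_len
            tie_for_allow = len(pattern) == best_len and is_allow
            if longer or tie_for_allow:
                best_len = len(pattern)
                allowed = is_allow
    return allowed


# The "*" record from the WeirdBot example in the table.
group = [
    ("Allow", "/"),
    ("Disallow", "/temp*"),
    ("Allow", "*temperature*"),
    ("Disallow", "/private/"),
]

for path in ("/temps", "/temporary", "/temperatures", "/private/notes.html"):
    print(path, is_allowed(group, path))
```

Under these assumptions, /temps and /temporary come out as disallowed, because /temp* is the longest pattern that matches them, while /temperatures is allowed, because \*temperature\* is longer than /temp*. The same function also reproduces the prefix behaviour of Disallow: /private (it blocks /privateer.html) and the case-sensitivity warning (Disallow: /PRIVATE does not block /private/).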