robots.txt (old, probably outdated)

Warning: This page is pretty old and was imported from another website. Take any content here with a grain of salt.

I noticed there’s a bit of confusion about how to tweak a complex robots.txt file (aka anything longer than two lines :)). We have awesome documentation (of course :)), but let me pick out some of the parts that are commonly asked about:

  • Disallowing crawling doesn’t block indexing of the URLs. This is pretty widely known, but worth repeating: a URL that’s blocked from crawling can still end up indexed, for example when other pages link to it.

  • More-specific user-agent sections replace less-specific ones. If you have a section with “user-agent: *” and one with “user-agent: googlebot”, then Googlebot will only follow the Googlebot-specific section.

  • More-specific directives trump less-specific ones. We look at the length of the “path-part”: for example, “allow: /javascript.js” will trump “disallow: /java”, but “allow: *.js” won’t. (There’s a small sketch of these matching rules after this list.)

  • The paths / URLs in the robots.txt file are case-sensitive: “disallow: /fish” doesn’t apply to “/Fish”.
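
To make those matching rules concrete, here’s a minimal Python sketch of the logic described above. It’s not Google’s actual parser: it treats rule paths as plain prefixes (no “*” or “$” wildcards), matches user-agent names as simple substrings, and the groups in the example are made up for illustration.

    # Minimal sketch of the matching rules above; not Google's parser,
    # and wildcards ('*', '$') are ignored for brevity.

    def pick_group(groups, user_agent):
        # The most specific matching user-agent group wins; '*' is only
        # the fallback and is NOT merged with a more specific group.
        ua = user_agent.lower()
        candidates = [name for name in groups if name != "*" and name in ua]
        if candidates:
            return groups[max(candidates, key=len)]
        return groups.get("*", [])

    def is_allowed(rules, url_path):
        # Rule paths match case-sensitively, as plain prefixes here.
        matches = [(len(path), kind == "allow")
                   for kind, path in rules if url_path.startswith(path)]
        if not matches:
            return True  # no matching rule => crawling is allowed
        # Longest path wins; on a tie, True > False means 'allow' wins.
        return max(matches)[1]

    groups = {
        "*":         [("disallow", "/")],
        "googlebot": [("disallow", "/java"), ("allow", "/javascript.js")],
    }

    rules = pick_group(groups, "mozilla/5.0 (compatible; googlebot/2.1)")
    print(is_allowed(rules, "/javascript.js"))  # True: longer 'allow' wins
    print(is_allowed(rules, "/java/app.js"))    # False: '/java' disallows
    print(is_allowed(rules, "/Java/app.js"))    # True: paths are case-sensitive

Note how the “user-agent: *” group plays no part for Googlebot here, even though it disallows everything: the more specific group replaces it completely.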

For non-trivial files, tweaking can be a bit tricky. I strongly recommend using the robots.txt Tester in Search Console: it pinpoints the line that blocks any specific URL, lets you test changes directly, and is the fastest way to let Google know about a changed robots.txt file on your site. Find out more about the tool at https://support.google.com/webmasters/answer/6062598 (archive.org)
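
If you just want a quick local sanity check, Python’s standard library also ships a robots.txt parser. One caveat: urllib.robotparser applies rules in file order rather than by longest-match precedence, so its answers can differ from Google’s for cases like the allow/disallow example above; the Tester remains the authority. A small sketch, with example.com as a placeholder:

    # Quick local smoke test; urllib.robotparser checks rules in file
    # order, so results can differ from Google's longest-match behavior.
    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser()
    rp.set_url("https://example.com/robots.txt")  # placeholder site
    rp.read()  # fetch and parse the live file
    print(rp.can_fetch("Googlebot", "https://example.com/javascript.js"))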

Here’s the full documentation: https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt (archive.org)

Original at https://plus.google.com/+JohnMueller/posts/iLaRPQundux (archive.org)
