On how to tell the search engine spiders and crawlers which directories and files to include, and which to avoid.
Search engines find your web pages and files by sending out robots (also called bots, spiders or crawlers) that follow the links on your site, read the pages they find and store the content in their databases.
Dan Crow of Google puts it this way: “Usually when the Googlebot finds a page, it reads all the links on that page and then fetches those pages and indexes them. This is the basic process by which Googlebot ‘crawls’ the web.”
But you may have directories and files you would prefer the search engine robots not to index. You may, for instance, have different versions of the same text, and you would like to tell the search engines which is the authoritative one (see: How to avoid duplicate content in search engine promotion).
How do you stop the robots?
The robots.txt file
If you are serious about search engine optimization, you should make use of the Robots Exclusion Standard by adding a robots.txt file to the root of your domain.
By using the robots.txt file you can tell the search engines which directories and files they should spider and include in their search results, and which to avoid.
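As a minimal sketch, a robots.txt file might look like this (the directory names and the crawler name below are only examples):

User-agent: *
Disallow: /cgi-bin/
Disallow: /drafts/

User-agent: BadBot
Disallow: /

The first record tells every robot (User-agent: *) to stay out of the /cgi-bin/ and /drafts/ directories; the second blocks a single crawler from the entire site. Note that the file must be named robots.txt, in lower case, and placed at the root of the domain (for example http://www.example.com/robots.txt), or the robots will not find it.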