The robots.txt file has been around since 1994! Its purpose is to tell search engines such as Google which areas of a website may be crawled by web robots (also known as Web Wanderers, Crawlers, or Spiders) when they harvest (index) information to store for searching.

More information about the robots.txt file can be found on the robotstxt.org website.

In a nutshell, pages that you do not wish search engines to index are listed in this file. There are many reasons to keep specific files out of an index. These might include admin pages, web hosting control panels, shopping basket pages, or time-sensitive information.
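As a simple illustration, a robots.txt file placed in the site root might look like the following (the paths here are hypothetical examples, not taken from any particular site):

```
# Applies to all crawlers
User-agent: *
# Keep these areas out of search indexes
Disallow: /admin/
Disallow: /cpanel/
Disallow: /basket/
```

Each `Disallow` line asks compliant crawlers to skip any URL beginning with that path; an empty `Disallow:` would permit everything.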

More recently, we had to stop robots from following the 'add to cart' link on a website (www.rtecshop.com), because a row is added to the database each time that link is requested; we noticed the database filling up very quickly whenever robots visited the site.
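A rule along these lines (the exact path is an assumed example, not the one used on that site) would ask well-behaved crawlers to leave the cart URLs alone:

```
User-agent: *
# Stop crawlers triggering the add-to-cart action
Disallow: /cart/add
```

Bear in mind that robots.txt is purely advisory: reputable crawlers honour it, but nothing enforces it, so a rule like this reduces accidental traffic rather than blocking it outright.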

However, security researchers warn system administrators that robots.txt files can give attackers valuable information about potential targets, since the listed paths are clues to the directories their owners are trying to protect. We found this article on 'The Register' that is worth a read...