
The robots.txt file has been around since 1994! Its purpose is to tell web robots (also known as Web Wanderers, Crawlers, or Spiders), such as those used by search engines like Google, which areas of a website they may crawl when harvesting (or indexing) information to store for searching.

More information about the robots.txt file can be seen on the robotstxt.org website.

In a nutshell, pages that you do not wish search engines to index are listed in this file. There are many reasons to keep certain pages out of the index: admin pages, web hosting control panels, shopping basket pages, or time-sensitive information, as the example below shows.
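As a simple sketch (the directory names here are made up for illustration, not taken from any real site), a robots.txt file placed at the root of the website might look like this:

User-agent: *
Disallow: /admin/
Disallow: /cpanel/
Disallow: /basket/

The asterisk means the rules apply to all robots, and each Disallow line names a path that well-behaved crawlers will leave alone.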

More recently we had to stop robots from following the 'add to cart' links on a website (www.rtecshop.com), because a product is added to a database each time one of those links is requested, and we noticed that the database was filling up very quickly whenever robots visited the site.
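We won't reproduce the exact rule used on that site here, but assuming the add-to-cart links all sit under a path such as /cart/add (a hypothetical URL, not the real one from rtecshop.com), the robots.txt entry would look something like:

User-agent: *
Disallow: /cart/add

Any URL beginning with that path is then off limits to compliant crawlers, so automated visits no longer trigger new rows in the database.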

However, people are now warning system administrators that robots.txt files can give attackers valuable information about potential targets by providing clues about the directories their owners are trying to protect. We found an article on 'The Register' about this that is worth a read...