What is a 'robots.txt' file?
A robots.txt file is used to stop search engine bots from crawling the specific pages that are listed in it.
Why the need for robots.txt?
Sometimes there is a need to stop search engine bots from visiting a page so that it does not appear in search results.
An example of such a page is a secret login page from which only the admin can log in to the site. To keep it secret, the admin must stop search engine bots from crawling that URL. For this, the URL of that page should be placed in the robots.txt file.
Where to place the robots.txt file?
The robots.txt file is placed in the root of your web server. Search engine bots look for it only at the root of your domain; they do not search deeper paths for it.
Example: www.domain.com/robots.txt
How to create robots.txt?
A robots.txt file can be created with any text editor that can produce a plain text document. The editor itself does not matter as long as you do not get the file name wrong: the name is case-sensitive and must be robots.txt, nothing else.
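For reference, a complete robots.txt can be as small as the sketch below; the blocked path is only an example, lines starting with # are comments that bots ignore, and the directives themselves are explained in the cases that follow.
# Example only: block every bot from one private page, allow everything else
User-agent: *
Disallow: /adminLogin.htm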
What kind of code is placed inside it?
Here are a few sample rules that could be placed inside the robots.txt file. You can customize them according to your needs.
Case I - Blocking just one URL
User-agent: Googlebot
Disallow: /adminLogin.htm
User-agent names the bot the rule applies to. In Disallow, we pass the URL path that we want to block that bot (in this case Googlebot) from crawling. This rule blocks Googlebot from crawling adminLogin.htm.
Case II - Blocking all URLs
User-agent: *
Disallow: /
This blocks every bot from crawling every page of the site: the * matches all user-agents, and / matches every URL path.
Case III - Allowing everything
User-agent: mozilla/5
Disallow:
A blank Disallow value tells the matched bot that it may crawl all pages. Here, any crawler whose user-agent matches mozilla/5 is allowed to crawl every page of the website.
Case IV - Blocking a single folder
User-agent: NetAnts
Disallow: /~somefolder/
The ~ sign has no special meaning here; it is simply part of the folder's path. The Disallow value tells the NetAnts crawler not to crawl anything under the folder '/~somefolder/'. Every other page of the site remains open to NetAnts.
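If the goal is the opposite, blocking everything except that one folder, crawlers that understand the non-standard Allow directive (Googlebot, for example) can be addressed with a sketch like the one below. This is an extension, not part of the original robots.txt convention, and bots that ignore Allow will treat the whole site as blocked.
User-agent: Googlebot
Disallow: /
Allow: /~somefolder/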
Case V - Defining more than one rule
User-agent: *
Disallow: /adminLogin.htm
Disallow: /secret.php
You can define any number of rules according to your needs: disallow any number of URLs, and set different rules for different bots. Just make sure each directive goes on its own line.
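As a hypothetical illustration reusing the paths from the earlier cases, separate groups can give different bots different rules; each bot follows the most specific group that matches its name.
# Googlebot is only kept away from the login page
User-agent: Googlebot
Disallow: /adminLogin.htm

# Every other bot is kept away from both pages
User-agent: *
Disallow: /adminLogin.htm
Disallow: /secret.php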
Note: There is no guarantee that bots will be blocked after defining a robots.txt file; well-behaved crawlers honour it, but malicious bots can ignore it entirely, and a blocked URL may still show up in search results if other sites link to it. Even so, it is considered a useful tool.