Sunday, 16 October 2011

Robots.txt: The most powerful text file on the web

"Robots.txt": if you are a web developer, or you know a little about search engine optimization, you have probably heard of this text file. If you haven't, read on to learn about one of the most powerful text files on the web...

Robots.txt is a regular text file, as its extension says, which tells search engine robots and crawlers which parts of your website they may crawl.
It contains a set of rules for web crawlers, written in a simple, special format.

The most basic robots.txt file has the following code within it...

User-agent: *
Allow: /

This file allows all crawlers to crawl every page of your website.
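To see how a crawler might read this file, here is a minimal sketch using Python's standard `urllib.robotparser` module. The bot name `ExampleBot` and the `www.example.com` URL are made up for illustration, and real search engine crawlers may interpret some rules differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The most basic robots.txt from the article.
ROBOTS_TXT = """\
User-agent: *
Allow: /
"""

parser = RobotFileParser()
# parse() accepts the file content as a sequence of lines.
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" and the URL are hypothetical names used only for this check.
allowed = parser.can_fetch("ExampleBot", "https://www.example.com/index.html")
print(allowed)
```

Since the only record applies to every robot and allows everything under `/`, the check should report that the page may be crawled.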

There are many other ways to write a "robots.txt" file.
You can allow some bots and disallow others.
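As a rough illustration of allowing some bots and disallowing others, the sketch below blocks one robot completely and leaves the site open to the rest. The names `BadBot` and `GoodBot` are hypothetical, and the rules are checked with Python's standard `urllib.robotparser`, which may not match every real crawler's behaviour.

```python
from urllib.robotparser import RobotFileParser

# One record blocks a single (hypothetical) robot; the '*' record covers everyone else.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "https://www.example.com/articles/robots.html"  # made-up URL
bad_bot_allowed = parser.can_fetch("BadBot", page)
good_bot_allowed = parser.can_fetch("GoodBot", page)

print("BadBot allowed: ", bad_bot_allowed)
print("GoodBot allowed:", good_bot_allowed)
```

Keep in mind that robots.txt is only a request: well-behaved crawlers honour it, but nothing in the file itself can force a robot to obey.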

The Format of Robots.txt

The file consists of one or more records separated by one or more blank lines.

The record starts with one or more User-agent lines, followed by one or more Disallow lines,
as detailed below. Unrecognized headers are ignored.
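The record structure described above can be sketched with a small hand-written splitter. This is only an illustration, not a full parser: the function name `split_records`, the sample bot names, and the made-up header are all assumptions, and `Allow` is accepted alongside `User-agent` and `Disallow` because the basic example in this post uses it.

```python
# Fields this sketch recognizes; everything else is ignored.
KNOWN_FIELDS = ("user-agent", "disallow", "allow")

def split_records(text):
    """Split robots.txt text into records, each a list of (field, value) pairs."""
    records, current = [], []
    for raw in text.splitlines():
        if not raw.strip():
            # A blank line closes the current record.
            if current:
                records.append(current)
                current = []
            continue
        line = raw.split("#", 1)[0].strip()  # drop any trailing comment
        if not line:
            continue  # comment-only line, not a record separator
        field, sep, value = line.partition(":")
        if not sep:
            continue  # not a "field: value" line
        field = field.strip().lower()
        if field in KNOWN_FIELDS:
            current.append((field, value.strip()))
        # Unrecognized headers fall through and are ignored.
    if current:
        records.append(current)
    return records

SAMPLE = """\
User-agent: AlphaBot
User-agent: BetaBot
Disallow: /private/
Made-up-header: ignored

User-agent: *
Disallow:
"""

records = split_records(SAMPLE)
for record in records:
    print(record)
```

The sample yields two records: one shared by the two named robots, and one default record for everyone else.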


User-agent

The value of this field is the name of the robot the record is describing access policy for.
If more than one User-agent field is present, the record describes an identical access policy for more than one robot. At least one field needs to be present per record.

If the value is '*', the record describes the default access policy for any robot that has not matched any of the other records. It is not allowed to have multiple such records in the "/robots.txt" file.
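The sketch below tries out both ideas: a record shared by two robots, and a '*' record that acts as the default for every other robot. The bot names and paths are hypothetical, and the behaviour shown is that of Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: AlphaBot
User-agent: BetaBot
Disallow: /drafts/

User-agent: *
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Both named robots share the first record.
alpha_drafts = parser.can_fetch("AlphaBot", "/drafts/post.html")
beta_drafts = parser.can_fetch("BetaBot", "/drafts/post.html")

# A robot that matches no named record falls back to the '*' record.
other_drafts = parser.can_fetch("OtherBot", "/drafts/post.html")
other_tmp = parser.can_fetch("OtherBot", "/tmp/cache.html")

# A robot that matched its own record does not also inherit the '*' rules.
alpha_tmp = parser.can_fetch("AlphaBot", "/tmp/cache.html")

print(alpha_drafts, beta_drafts, other_drafts, other_tmp, alpha_tmp)
```

Note the last check: because AlphaBot has its own record, the `/tmp/` rule from the default record does not apply to it.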


Disallow

The value of this field specifies a partial URL that is not to be visited. This can be a full path, or a partial path; any URL that starts with this value will not be retrieved.
For example, Disallow: /help disallows both /help.html and /help/index.html, whereas Disallow: /help/ would disallow /help/index.html but allow /help.html.
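The difference between `/help` and `/help/` can be checked with a short sketch. The helper `blocked_paths` and the bot name `ExampleBot` are made up for this example; the matching itself is done by Python's standard `urllib.robotparser`, which compares each rule as a prefix of the requested path.

```python
from urllib.robotparser import RobotFileParser

def blocked_paths(disallow_value, paths):
    """Return the paths that a single Disallow rule would block for any robot."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_value}"])
    return [p for p in paths if not parser.can_fetch("ExampleBot", p)]

paths = ["/help.html", "/help/index.html"]

without_slash = blocked_paths("/help", paths)
with_slash = blocked_paths("/help/", paths)

print(without_slash)  # ['/help.html', '/help/index.html']
print(with_slash)     # ['/help/index.html']
```

The trailing slash matters: `/help` is a prefix of both paths, while `/help/` is a prefix only of the one inside the directory.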

An empty value indicates that all URLs can be retrieved.
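A quick way to see the effect of an empty value is to compare it with `Disallow: /`, which blocks everything. As before, this is a sketch using Python's standard `urllib.robotparser`, with a made-up bot name and path.

```python
from urllib.robotparser import RobotFileParser

def can_crawl(disallow_line, path):
    """Check one path against a robots.txt holding a single '*' record."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", disallow_line])
    return parser.can_fetch("ExampleBot", path)  # hypothetical bot name

empty_rule = can_crawl("Disallow:", "/any/page.html")    # empty value: nothing is blocked
block_all = can_crawl("Disallow: /", "/any/page.html")   # "/" is a prefix of every path

print(empty_rule, block_all)
```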

So if you want your web pages to be crawled and indexed by search engines the way you like, go ahead and create a robots.txt file for your website.



Gadget Statistics Copyright (c) Gizmo Corporation . All rights are reserved by Piyush Arora and "Gadgets Statistics"