
robots.txt


If you don't want search engines to crawl and index every page on your site, you need a "robots.txt" file.

The easiest way to create the file is with a plain-text editor such as Notepad on Windows or TextEdit on Mac (set to plain-text mode). The important thing is that the editor doesn't add any formatting or invisible control characters to your text. It's VITAL that you store the robots.txt file in the root directory of your site, so that it can be reached at an address like https://www.example.com/robots.txt. Spiders don't look for it anywhere else!
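Crawlers build the robots.txt address from the host name alone and throw away the path of the page they happen to be visiting. The minimal Python sketch below mimics that with the standard library's `urllib.parse.urljoin`. `robots_url` is a hypothetical helper name and www.example.com is a placeholder domain.

```python
from urllib.parse import urljoin

def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that hosts page_url.

    robots_url is a hypothetical helper for illustration only.
    """
    # Joining with an absolute path ("/robots.txt") discards the page's own
    # path, so the result always points at the root directory of the host.
    return urljoin(page_url, "/robots.txt")

print(robots_url("https://www.example.com/pages/demos/coolstuff.html"))
# https://www.example.com/robots.txt
```

This is why a file saved as /pages/robots.txt would simply never be requested.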

Each line has the format:

<field>":" <value>
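Here is a simplified Python sketch of how a spider might read one of these lines. The function name `parse_line` is hypothetical. It follows the same conventions as Python's own `urllib.robotparser`: the field name is case-insensitive, only the first colon separates field from value, and anything from "#" onwards is a comment.

```python
def parse_line(line: str):
    """Split one robots.txt line into a (field, value) pair, or return None.

    parse_line is a hypothetical, simplified helper for illustration.
    """
    line = line.split("#", 1)[0].strip()  # drop any comment and outer spaces
    if ":" not in line:
        return None                       # blank, comment-only or malformed line
    field, value = line.split(":", 1)     # split at the FIRST colon only
    return field.strip().lower(), value.strip()

for raw in ["User-agent: googlebot", "Disallow: /pages/demos/", "# just a comment"]:
    print(parse_line(raw))
# ('user-agent', 'googlebot')
# ('disallow', '/pages/demos/')
# None
```

Splitting at the first colon only matters because a value can itself contain a colon.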

The first part lists the names of the search engine spiders (the "user agents") that you want to target, like this:

User-agent: googlebot
User-agent: anotherbot

If you want to target all search engines, just use a wildcard, like this:

User-agent: *
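You can check how these groups are applied with Python's standard `urllib.robotparser` module. It is one real-world parser, but each search engine uses its own. In this sketch the folders /private/ and /drafts/ and the spider name "otherbot" are made up. A spider follows the group that names it and ignores the `*` group, so googlebot is not affected by the /drafts/ rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group just for googlebot, one for everybody else
rules = """\
User-agent: googlebot
Disallow: /private/

User-agent: *
Disallow: /drafts/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("googlebot", "/private/a.html"))  # False: googlebot's own group
print(rp.can_fetch("googlebot", "/drafts/a.html"))   # True: the * group is ignored
print(rp.can_fetch("otherbot", "/drafts/a.html"))    # False: falls back to *
print(rp.can_fetch("otherbot", "/private/a.html"))   # True
```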

The next part is the list of pages or folders that you want to "hide", one Disallow line for each. A page in the root folder is listed like this:

Disallow: /mypage.html
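The value of a Disallow line is compared with the start of the URL path, and URL paths always begin with "/". The sketch below uses Python's standard `urllib.robotparser` to show that a rule written without the leading slash never matches anything. `is_blocked` and the spider name "anybot" are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def is_blocked(rules: str, path: str) -> bool:
    """True if the given robots.txt text stops a generic spider fetching path."""
    rp = RobotFileParser()
    rp.parse(rules.splitlines())
    return not rp.can_fetch("anybot", path)  # "anybot" is a made-up spider name

without_slash = "User-agent: *\nDisallow: mypage.html"
with_slash = "User-agent: *\nDisallow: /mypage.html"

print(is_blocked(without_slash, "/mypage.html"))  # False: the rule matches nothing
print(is_blocked(with_slash, "/mypage.html"))     # True
```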

A directory is specified like this:

Disallow: /pages/demos/
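A Disallow value works as a prefix: every URL whose path starts with it is blocked, subfolders included. For that reason the trailing slash on a folder is worth keeping, as this sketch with Python's standard `urllib.robotparser` module shows. The helper `is_blocked` and the page names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def is_blocked(disallow_value: str, path: str) -> bool:
    """True if 'Disallow: <disallow_value>' for all spiders blocks path."""
    rp = RobotFileParser()
    rp.parse(["User-agent: *", "Disallow: " + disallow_value])
    return not rp.can_fetch("anybot", path)  # "anybot" is a made-up spider name

# With the trailing slash, only paths inside the folder match
print(is_blocked("/pages/demos/", "/pages/demos/sub/page.html"))  # True
print(is_blocked("/pages/demos/", "/pages/demos-old.html"))       # False
# Without it, any path that merely starts with "/pages/demos" matches
print(is_blocked("/pages/demos", "/pages/demos-old.html"))        # True
```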

You are allowed to use comments, but to be safe, put them on separate lines instead of appending them to directives. Comments start with a "#", like this:

# This is a comment!
Disallow: /pages/demos/
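To check a finished file against that advice, a few lines of Python can flag any directive that has a comment tacked on the end. `find_trailing_comments` is a hypothetical helper, not part of any standard tool. It assumes "#" never appears inside a path you use.

```python
def find_trailing_comments(text: str) -> list:
    """Return the 1-based numbers of lines where a '#' comment follows a directive."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        before, hash_sign, _ = line.partition("#")
        # A '#' with non-blank text in front of it is an appended comment
        if hash_sign and before.strip():
            flagged.append(number)
    return flagged

sample = "# This is a comment!\nDisallow: /pages/demos/  # hide the demos\n"
print(find_trailing_comments(sample))  # [2]
```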

Here is an example of what a robots.txt file may look like when we put all the parts together:

User-agent: *
Disallow: /pages/demos/coolstuff.html
Disallow: /pages/proofs/

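The example above tells all search engines to stay away from the page /pages/demos/coolstuff.html and from the entire /pages/proofs/ directory. Keep in mind that robots.txt is a request that well-behaved spiders follow. It doesn't lock anyone out of those pages.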
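Before uploading the file, you can check what it actually blocks by feeding it to Python's standard `urllib.robotparser` module. This is only a sketch, because real search engines use their own parsers. The extra page names tested here (other.html, p1.html, index.html) are made up.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /pages/demos/coolstuff.html
Disallow: /pages/proofs/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Hypothetical URLs to test; only the first and third should be blocked
for path in ["/pages/demos/coolstuff.html", "/pages/demos/other.html",
             "/pages/proofs/p1.html", "/index.html"]:
    status = "allowed" if rp.can_fetch("googlebot", path) else "blocked"
    print(path, status)
```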

To be on the safe side, use the same capitalization and spacing as in the examples above.
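Capitals matter more in some places than others. In Python's standard `urllib.robotparser`, for example, field names like User-agent and Disallow are read case-insensitively, but paths are matched exactly, capital letters included. Here is a quick sketch with a made-up /Pages/ folder:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Field names written in odd capitals, path written with a capital P
rp.parse(["user-AGENT: *", "DISALLOW: /Pages/"])

print(rp.can_fetch("anybot", "/Pages/a.html"))  # False: the rule was understood
print(rp.can_fetch("anybot", "/pages/a.html"))  # True: /pages/ is not /Pages/
```

Other spiders may be stricter about field names than this parser, so copying the examples' style remains the safe choice.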

Most of the time you want to target all search engines, but if you need to target just one or a few, you need to find out the names of those search engine spiders. As you saw in the examples above, the Google spider isn't called "Google"; it's called "googlebot". If you don't know the name of the spider(s), you might find useful information in your site log, or you can try this list. You might also want to visit robotstxt.org, which is a great resource!
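If you go the site-log route, a short Python script can pull the spider names out for you. This sketch makes some assumptions. It expects the common Apache/Nginx "combined" log format, where the user-agent string is the last quoted field on each line. It also treats any user agent containing "bot" as a spider, which will miss some crawlers. The two sample log lines are made up.

```python
import re
from collections import Counter

# Assumes the "combined" log format: the user agent is the last "..." field
LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')

def bot_user_agents(log_lines):
    """Count user-agent strings that look like spiders (contain 'bot')."""
    counts = Counter()
    for line in log_lines:
        match = LAST_QUOTED.search(line)
        if match and "bot" in match.group(1).lower():
            counts[match.group(1)] += 1
    return counts

sample = [
    '198.51.100.10 - - [10/Oct/2023:13:55:36 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2023:13:56:01 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]
for agent, n in bot_user_agents(sample).items():
    print(n, agent)
```

In robots.txt you would then use just the spider's name token from that string, here Googlebot.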