Excluding Your Web Site From Search Engines
People often write about increasing your search engine ranking: using noindex and nofollow meta tags to keep pages with duplicate content out of the index, and robots.txt files to steer the crawlers around your site. But what if you don't want your site indexed at all? That can be useful for private organizations, or for anyone who would rather keep a site out of the search results entirely.
Noindex is fairly self-explanatory. A noindex tag tells the search engines not to index the page it appears on.
Nofollow is a similar tag. It tells the search engines not to follow any of the links on the page.
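As a quick sketch of what these look like in practice, either directive can be used on its own in a standard robots meta tag:

```html
<!-- Keep this page out of the index, but still let crawlers follow its links -->
<meta name="robots" content="noindex" />

<!-- Allow this page to be indexed, but do not follow any links on it -->
<meta name="robots" content="nofollow" />
```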
That’s just the beginning…
By adding this code to the <head> section of all of your pages:

<meta name='robots' content='noindex,nofollow' />

you can tell the search engines to do neither of the above.
You can also create a plain text file called robots.txt containing this code:

User-agent: *
Disallow: /

Upload it to the root directory of your site. This tells all "user agents" (that is, search engine crawlers) not to crawl any page on the site, whether in the root directory or any directory within it.
So, what happens if you find this article too late and your site has already been indexed? How can you get it de-listed?
There are two answers…
The first is to upload the robots.txt file described above. The next time your site is crawled, it should drop out of the index.
If you want your site removed from Google, and you can't wait until the next time it's crawled, submit a Google Urgent Removal Request. Google says that this process will take 3 to 5 business days.
Is there more to the story? Of course there is!
You can also use this tag in the <head> section of your pages:

<meta name="robots" content="noodp,noydir,noarchive" />
noodp – Prevents all search engines from using the DMOZ description for your site in the search results.
noydir – Prevents Yahoo! from using the Yahoo! directory description for your site in the search results.
noarchive – Prevents search engines such as Google from keeping a cached copy of your pages and offering it in the search results.
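These values are comma-separated, so they can be combined with the earlier directives in a single tag. For example, a page that should be neither indexed, followed, nor cached might carry:

```html
<meta name="robots" content="noindex,nofollow,noarchive" />
```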
What about WordPress sites? You may have already noticed that you can click Settings > Privacy and choose the “I would like to block search engines, but allow normal visitors” option. This will add noindex and nofollow tags to all of your site’s pages EXCEPT the Login and Forgot Password pages. A nice way to add these tags to those pages is to use the Robots Meta plugin by Joost de Valk. This plugin actually does more than just that. Read the instructions for a better understanding of the plugin’s features.
So, now your site is not being indexed, but people can still link to it from other sites or manually type in the URL. What can you do about that?
WordPress users have the option of marking their pages and posts Private, which means that only those subscribers whose Role allows them to see those pages and posts can access them. There’s also the User Access Manager plugin that allows you to assign which subscribers can read selected pages and/or posts. You can even assign people who are not subscribed and/or logged in by IP address or IP range.
Static sites can be password protected by putting a .htaccess file, along with a .htpasswd file holding the usernames and passwords, in the folder you want to protect. If you don't know how to create these files, most commercial web hosts will set up password protection for you via Plesk or your hosting control panel.
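As a rough sketch (assuming an Apache server; the file path and realm name here are placeholders you would replace with your own), the .htaccess file might look like this:

```apache
# .htaccess in the folder you want to protect
AuthType Basic
AuthName "Private Area"
# Full server path to the password file (placeholder; use your own path)
AuthUserFile /home/example/.htpasswd
Require valid-user
```

The .htpasswd file itself can be generated with Apache's htpasswd utility, e.g. `htpasswd -c /home/example/.htpasswd username`.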
Who would have thought that there’s as much involved in keeping people out of your site as there is in getting them to find it and come in?
This article copyright © John Nasta 2009 – All Rights Reserved