Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don’t add much value for users
coming from search engines.
– from the Google Webmaster Guidelines
Block Google? Are you crazy?
It might seem a bit odd that the way to get your website to rank higher is to block Googlebot from some of your pages, but that is often the case.
Search result pages on your site are a clear example of this. Every time someone searches your website, they are served a search results page. If these pages are not blocked, Google may index them even though they are not really content pages, and that can undermine your good standing in Google.
As an oversimplified example let’s say you have a website that only has five pages. You also have a search function that helps users find what they need. When your users
perform a search of your website, a search result page is created.
If these pages are not blocked from Google, then your five-page website can become a thousand-page website as far as Google is concerned.
This occurs because every search result page can be indexed by Google if it is not blocked. The end result is that instead of a five-page website that is
laser-focused on your subject, Google now sees a thousand-page website that isn't focused on much of anything.
This makes Google rank you lower, because it no longer sees you as a valuable site, even though you very well may be.
Think about it this way…
It is much better to have five pages doing amazingly well in Google than it is to have ten thousand pages getting no traffic.
Every page of your site should be amazing. If it is not, do you really want Google to see it? Do you want your audience to see it?
Make every page count. If the page doesn’t cut it, then why is it there?
Who is the page useful for?
A search result page is a great page for that one person who did the search. It is not useful for the general public. Therefore search results should be blocked from Google.
A tool result is great for that one person who is using the tool. It is not useful for the general public. Therefore tool result pages should not be indexed.
Think about who the page is useful for. If it is useful to just one person, it is not a useful page for Google to index, and it will only dilute whatever strength your domain has.
How to determine which pages to block
Search result pages are not the only pages worth blocking on many websites. Often a website has pages that are auto-generated, or mostly auto-generated, and those need to be blocked as well.
Poor rankings in Google often occur because websites are not providing unique content, so it makes good sense to block the pages of your website that are not
unique, or are mostly not unique.
Another specific example of content that should be blocked is pages created primarily from affiliate databases, or any information that is not coming from you but
is being generated for you from other sources.
Google sees no value in indexing or ranking pages that are not original, useful content. Users hate landing on pages that have the exact same information as other
web pages, so Google hates it too.
On this website, I have online tools to help webmasters. Many of those tools produce a result page, and I
have taken measures via my robots.txt file and through robots directives to ensure that those result pages do not end up in the Google index.
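For reference, the "robots directives" mentioned above are typically applied with a robots meta tag (a rule Google documents for keeping crawled pages out of the index). A minimal example, placed in the head of each result page, looks like this:

```
<!-- In the <head> of a search or tool result page:
     tells crawlers not to add this page to their index -->
<meta name="robots" content="noindex">
```

Note that for the tag to be seen, the page must not also be blocked in robots.txt, since a blocked page is never fetched and the directive is never read.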
How to block pages via robots.txt
One of the Google webmaster guidelines tells us to "make use of the robots.txt file on your web server", and here is one place we can do that. Blocking pages via this file can be done in a couple of ways. If you are not familiar with the robots.txt file, you may want to read the linked article above first. If you already know how a robots.txt file works, here are two approaches:
- Blocking folders – The simplest way to block content is by folder. If you have content you wish to block, you can arrange it into one folder and block that
entire folder. Example: Disallow: /searchresults/
- Wildcards – If all of your search results, or other content you wish blocked, are generated dynamically, the results may have an indicator in the URL such as a
question mark "?". If your entire website is static, but you provide search results or tool results dynamically, an easy way to block this content is to use
a wildcard. Example: Disallow: /*?*
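To sanity-check how these two example rules behave, here is a small Python sketch of Google-style pattern matching. This is my own illustration, not Google's actual matcher, and note that Python's built-in urllib.robotparser does not understand the * wildcard extension, so the matching is hand-rolled here:

```python
import re

def is_blocked(disallow_pattern: str, url_path: str) -> bool:
    """Return True if url_path matches a Google-style Disallow pattern.

    '*' matches any run of characters; patterns match as prefixes,
    which is how Google applies Disallow rules.
    """
    regex = "".join(".*" if ch == "*" else re.escape(ch)
                    for ch in disallow_pattern)
    return re.match(regex, url_path) is not None

# Folder rule: Disallow: /searchresults/
print(is_blocked("/searchresults/", "/searchresults/widgets"))  # True
print(is_blocked("/searchresults/", "/about"))                  # False

# Wildcard rule: Disallow: /*?*  (blocks any URL containing "?")
print(is_blocked("/*?*", "/tools/result?query=foo"))  # True
print(is_blocked("/*?*", "/tools/static-page"))       # False
```

As the last two calls show, the /*?* rule blocks every dynamic URL containing a question mark while leaving static pages crawlable, which is exactly the behavior described above.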