Make sure that your web server correctly supports the If-Modified-Since HTTP header.
– from the Google webmaster guidelines 1
What is If-Modified-Since?
- The If-Modified-Since header is an HTTP request header that makes a request conditional: the client sends the date of its last copy and asks for the resource only if it has changed since then.
- If the content has not changed, the server responds with only the headers and a 304 (Not Modified) status code.
- If the content has changed, the server responds with a 200 status code and the entire requested document / resource.
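The server-side decision described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular web server implements it: the function name respond and its parameters are hypothetical, and the resource's last-change time is assumed to be available as a UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(if_modified_since, last_modified, body):
    """Decide between 200 and 304 for a GET request (hypothetical helper).

    if_modified_since: raw If-Modified-Since header value, or None if absent.
    last_modified: UTC datetime of the resource's last change.
    body: the full resource as bytes.
    """
    # HTTP dates have one-second resolution, so drop microseconds first.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    since = None
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: treat the request as unconditional
    if since is not None and since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # assume UTC if no zone given

    if since is not None and last_modified <= since:
        return 304, headers, b""  # not modified: headers only, no body
    return 200, headers, body  # changed or unconditional: send everything
```

A request without the header, or with a date the server cannot parse, simply falls through to the normal 200 response in this sketch.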
Tip: To see if your site is using If-Modified-Since, use the Google guideline tool.
Googlebot and If-Modified-Since
The server’s response to an If-Modified-Since request essentially tells Googlebot one of two things about a webpage…
- This webpage has not changed, so there is no need to download it again.
- This webpage has changed, so download it again because there is new information.
One way to describe If-Modified-Since is to think of that little flag on the mailbox outside your house. If you put
the flag up, the mail carrier knows you have mail in there and will come and get it. If the flag on your mailbox is
down, the mail carrier knows there is no mail to pick up.
If Googlebot were the mail carrier, and your webpage were the mailbox, Googlebot would look to see if that “flag” is up or
down before it accesses your page.
Why is that important?
Because Google crawls billions of pages, there is no real need to spend its resources or yours on re-downloading a webpage that
has not changed.
For people who have very large websites, crawling by search engine spiders can consume a lot
of bandwidth and result in extra cost.
Let’s say you have a website about pets that has pages about dogs, cats, and turtles.
You have just updated the turtle page with new photos.
Every once in a while Googlebot (the search engine crawler of Google) will visit your pages. It will check each page in your
website, and if none of the pages have changed, it will not load any of those pages.
But in our case, the turtle page has changed (and has a “flag” up). So when Googlebot comes, it will see the “flag” for the
turtle page and it will access the turtle page.
It knows that the turtle page has been updated because of how your server answers the If-Modified-Since header. Because none of the other web
pages have been updated, there is no reason to get a new copy of them: it already has a current copy in the index.
304 status code
The “flag” we have been speaking of is actually the HTTP status code of the requested document.
When Googlebot first visits your page it will see a 200 status code, which means the content loaded fine. Googlebot will take a note of when it accessed your page. The next time it comes to the same page, it sends that date in the If-Modified-Since header, and something new happens.
If the content has not changed since Googlebot last visited, it will receive a 304 status code and not download the body of the document again.
If it gets a 200 status code instead of a 304, Googlebot will receive the entire body of the response (the updated page/resource).
Status codes are listed in your log files, and you will typically see them in your statistics report.
If a search engine crawler sees a web page status code of 304 it knows that web page has not been updated and does not
need to be accessed again.
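Seen from the crawler’s side, the exchange described in this section might look roughly like the sketch below. It is not Googlebot’s actual code. The function conditional_get and its opener parameter are hypothetical names introduced for illustration (the opener exists only so the function can be tried without a network), and any URL passed in is a placeholder. One real detail worth knowing: Python’s urllib.request reports a 304 response by raising HTTPError, so the sketch catches it.

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def conditional_get(url, last_fetched=None, opener=urllib.request.urlopen):
    """Fetch url, sending If-Modified-Since when a previous fetch time is known.

    last_fetched: POSIX timestamp of the previous successful fetch, or None.
    Returns (status, body); body is None when the server answers 304.
    """
    request = urllib.request.Request(url)
    if last_fetched is not None:
        # HTTP dates are always expressed in GMT.
        request.add_header("If-Modified-Since", formatdate(last_fetched, usegmt=True))
    try:
        with opener(request) as response:
            return response.status, response.read()  # 200: full body received
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return 304, None  # not modified: keep using the stored copy
        raise  # any other error status is passed on to the caller
```

On the first visit, last_fetched is None and the request is unconditional. On later visits the stored time is sent, and a 304 result means the copy already held is still current.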
How to find out if your web server supports the If-Modified-Since header?
- Use the Google guidelines tool.
- Check your logs for 304 status codes
If you have access to your statistics and/or log files, you can just look for the status code 304. If you see 304 as a status code in any of your logs, then your web server supports the If-Modified-Since header.
Using a web server that supports the If-Modified-Since header is recommended, and will result in less bandwidth being
used by search engine crawlers.
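The log check described above can be sketched as a small script. It assumes your access log uses the common or combined log format (host, identity, user, bracketed timestamp, quoted request line, status code, size); other formats would need a different pattern. The sample lines are made up purely for illustration.

```python
import re
from collections import Counter

# Matches: host ident user [timestamp] "request line" STATUS ...
LOG_LINE = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) ')


def count_statuses(lines):
    """Count how often each HTTP status code appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # lines that do not fit the assumed format are skipped
            counts[match.group("status")] += 1
    return counts


# Made-up sample entries, for illustration only.
sample = [
    '192.0.2.1 - - [10/Jan/2024:12:00:00 +0000] "GET /dogs.html HTTP/1.1" 304 - "-" "Googlebot/2.1"',
    '192.0.2.1 - - [10/Jan/2024:12:00:01 +0000] "GET /cats.html HTTP/1.1" 304 - "-" "Googlebot/2.1"',
    '192.0.2.1 - - [10/Jan/2024:12:00:02 +0000] "GET /turtles.html HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
]
counts = count_statuses(sample)
print("304 responses found:", counts["304"])
```

To run it against a real log, you could pass an open file object instead of the sample list; a non-zero count of 304 entries is the sign you are looking for.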