Posted by Dan Crow, Product Manager

This is the third and last in my series of blog posts about the Robots Exclusion Protocol (REP). In the first post, I introduced robots.txt and the robots META tags, giving an overview of when to use them. In the second post, I shared some examples of what you can do with the REP. Today, I'll introduce two new features that we have recently added to the protocol.
As a product manager, I'm always talking to content providers to learn about your needs for the REP. We are constantly looking for ways to improve the control you have over how your content is indexed. These new features give you flexible and convenient ways to fine-tune that control with Google.
Tell us if a page is going to expire

Sometimes you know in advance that a page is going to expire in the future. Maybe you have a temporary page that will be removed at the end of the month. Perhaps some pages are available free for a week, but after that you put them into an archive that users pay to access. In these cases, you want the page to show in Google search results until it expires, then have it removed: you don't want users getting frustrated when they find a page in the results but can't access it on your site.
We have introduced a new META tag that allows you to tell us when a page should be removed from the main Google web search results: the aptly named unavailable_after tag. It follows a similar syntax to the other REP META tags. For example, to specify that an HTML page should be removed from the search results after 3pm Eastern Standard Time on 25th August 2007, simply add the following tag to the <head> section of the page:

<META NAME="GOOGLEBOT" CONTENT="unavailable_after: 25-Aug-2007 15:00:00 EST">

The date and time are specified in the RFC 850 format.
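To make the placement concrete, here's a minimal sketch of a full page using the tag; the title and body text are just placeholders:

<html>
<head>
<title>Free preview: premium article</title>
<!-- Ask Googlebot to drop this page from web search results after the date below -->
<META NAME="GOOGLEBOT" CONTENT="unavailable_after: 25-Aug-2007 15:00:00 EST">
</head>
<body>
This article moves into our paid archive after 25th August 2007.
</body>
</html>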
This information is treated as a removal request: it will take about a day after the removal date passes for the page to disappear from the search results. We currently only support unavailable_after for Google web search results.
After the removal, the page stops showing in Google search results, but it is not removed from our system. If you need a page to be excised from our systems completely, including any internal copies we might have, you should use the existing URL removal tool, which you can read about on our Webmaster Central blog.
Meta tags everywhere

The REP META tags give you useful control over how each webpage on your site is indexed. But they only work for HTML pages. How can you control access to other kinds of documents, such as Adobe PDF files or video and audio files? Well, now the same flexibility for specifying per-URL tags is available for all other file types.

We've extended our support for META tags so they can now be associated with any file. Simply add any supported META tag to a new X-Robots-Tag directive in the HTTP header used to serve the file. Here are some illustrative examples:
- Don't display a cache link or snippet for this item in the Google search results:
X-Robots-Tag: noarchive, nosnippet
- Don't include this document in the Google search results:
X-Robots-Tag: noindex

- Tell us that a document will be unavailable after 7th July 2007, 4:30pm GMT:
X-Robots-Tag: unavailable_after: 7 Jul 2007 16:30:00 GMT

You can combine multiple directives in the same document. For example:

- Do not show a cached link for this document, and remove it from the index after 23rd July 2007, 3pm PST:
X-Robots-Tag: noarchive
X-Robots-Tag: unavailable_after: 23 Jul 2007 15:00:00 PST
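To show where these directives live, here's a sketch of the HTTP response a web server might send for a PDF file carrying the combined directives above; the file itself and the surrounding headers are hypothetical:

HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noarchive
X-Robots-Tag: unavailable_after: 23 Jul 2007 15:00:00 PST

...binary PDF data...

How you attach the header depends on your web server; Apache's mod_headers module, for example, can add it to responses for particular file types, and you can check what your server actually sends with a tool such as curl -I.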
Our goal for these features is to provide more flexibility for indexing and inclusion in Google's search results. We hope you enjoy using them.