Posted by Dan Crow, Product Manager

This is the third and last in my series of blog posts about the Robots Exclusion Protocol (REP). In the first post, I introduced robots.txt and the robots META tags, giving an overview of when to use them. In the second post, I shared some examples of what you can do with the REP. Today, I'll introduce two new features that we have recently added to the protocol.
As a product manager, I'm always talking to content providers to learn about your needs for the REP. We are constantly looking for ways to improve the control you have over how your content is indexed. These new features give you flexible and convenient ways to fine-tune that control with Google.
Tell us if a page is going to expire

Sometimes you know in advance that a page is going to expire in the future. Maybe you have a temporary page that will be removed at the end of the month. Perhaps some pages are available free for a week, but after that you put them into an archive that users pay to access. In these cases, you want the page to show in Google search results until it expires, then have it removed: you don't want users getting frustrated when they find a page in the results but can't access it on your site.
We have introduced a new META tag that allows you to tell us when a page should be removed from the main Google web search results: the aptly named unavailable_after tag. It follows a similar syntax to the other REP META tags. For example, to specify that an HTML page should be removed from the search results after 3pm Eastern Standard Time on 25th August 2007, simply add the following tag to the <head> section of the page:

<META NAME="GOOGLEBOT" CONTENT="unavailable_after: 25-Aug-2007 15:00:00 EST">

The date and time are specified in the RFC 850 format.
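To make the placement concrete, here's a minimal sketch of a full page using the tag; the title and body text are just placeholders:

<html>
<head>
<title>Free preview: premium article</title>
<!-- Ask Googlebot to drop this page from web search results after the date below -->
<META NAME="GOOGLEBOT" CONTENT="unavailable_after: 25-Aug-2007 15:00:00 EST">
</head>
<body>
This article moves into our paid archive after 25th August 2007.
</body>
</html>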
This information is treated as a removal request: it will take about a day after the removal date passes for the page to disappear from the search results. We currently only support unavailable_after for Google web search results.
After the removal, the page stops showing in Google search results, but it is not removed from our system. If you need a page to be excised from our systems completely, including any internal copies we might have, you should use the existing URL removal tool, which you can read about on our Webmaster Central blog.
Meta tags everywhere

The REP META tags give you useful control over how each webpage on your site is indexed. But they only work for HTML pages. How can you control access to other kinds of documents, such as Adobe PDF files or video and audio files? Well, now the same flexibility for specifying per-URL tags is available for all other file types.

We've extended our support for META tags so they can now be associated with any file. Simply add any supported META tag to a new X-Robots-Tag directive in the HTTP header used to serve the file. Here are some illustrative examples:
- Don't display a cache link or snippet for this item in the Google search results:
X-Robots-Tag: noarchive, nosnippet
- Don't include this document in the Google search results:
X-Robots-Tag: noindex

- Tell us that a document will be unavailable after 7th July 2007, 4:30pm GMT:
X-Robots-Tag: unavailable_after: 7 Jul 2007 16:30:00 GMT

You can combine multiple directives in the same document. For example:

- Do not show a cached link for this document, and remove it from the index after 23rd July 2007, 3pm PST:
X-Robots-Tag: noarchive
X-Robots-Tag: unavailable_after: 23 Jul 2007 15:00:00 PST
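To show where these directives live, here's a sketch of the HTTP response a web server might send for a PDF file carrying the combined directives above; the file itself and the surrounding headers are hypothetical:

HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noarchive
X-Robots-Tag: unavailable_after: 23 Jul 2007 15:00:00 PST

...binary PDF data...

How you attach the header depends on your web server; Apache's mod_headers module, for example, can add it to responses for particular file types, and you can check what your server actually sends with a tool such as curl -I.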
Our goal for these features is to provide more flexibility for indexing and inclusion in Google's search results. We hope you enjoy using them.