Made the homepage font larger | 13 Dec 2006

23:58 Open Clip Art Library :: openclipart.org :: Drawing Together. » del.icio.us/chedong

Goes well with Inkscape, the open-source vector drawing tool, which is a good replacement for Visio.

22:10 Enterprise search made easy and free from IBM and Yahoo! » Yahoo! Search blog

Earlier this year Yahoo! and IBM got together to ask a simple question --- how can we make enterprise search easier to install, use, and maintain. The result of that question is IBM OmniFind Yahoo! Edition which we believe will...

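18:45 The Rise of the Great Powers » DBA notes

The documentary "The Rise of the Great Powers" (《大国崛起》) has stirred up a lot of discussion since it aired. I downloaded it and watched a little each day; bit by bit, I have now nearly finished it.

It is called "The Rise of the Great Powers", but for many of these countries, such as Spain and Britain, it was really a process of growing from small to large and then becoming strong. While watching I noticed a few interesting points. One is that, in the course of their rise, those former great powers inevitably produced one or a few heroic figures who decided the fate of history: Louis XIV of France, Bismarck of Germany, Peter the Great of Russia. In the traditional education we received, we were always taught that "the masses create history", yet sometimes, at the crossroads of history, the role of heroic figures cannot be dismissed. Another is that nearly every rising power was driven by its economy; it would not be an exaggeration to call the whole documentary a history of economic development. And what safeguards the economy is a sound system, a sound system! That may well be the theme the production team was trying to express.

After watching "The Rise of the Great Powers" I found I had learned quite a bit more history. At home I have a set of "Modern World History" (《世界近代史》), which used to be so-called "internal reference" material. History really is a complicated thing: read from different angles and approached from different starting points, it yields different information.

The topic of "the rise of great powers" inevitably gets tangled up with ideology. I think "The Rise of the Great Powers" is a very good documentary. With an audience of hundreds of millions, it is impossible to make something that pleases everyone, and it cannot be free of flaws. It is simply an expression of some historians' new view of history; there is no need to elevate it so much or to measure it against very high standards.

Watching it and being prompted to think is enough.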

--EOF--

17:25 The airplane on the conveyor belt » 桑林志

17:04 "Panama" platform available to new advertisers in the U.S. » Yahoo! Search blog

Ahoy, Yahoo! Search Blog readers! An update from the search marketing side of the house. Today, the new Yahoo! Sponsored Search advertising platform, which has carried the code name "Panama," is available to all U.S. marketers who open new accounts with...

12:34 Mozilla Thunderbird 2.0 b1, Google Toolbar for Firefox 3 Beta » Blog on 27th Floor

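Mozilla Thunderbird 2.0 beta 1 was released yesterday. There is no web page announcing it, and the download can only be found on Mozilla's FTP server. I installed it straight into a beta 1 directory; it works without problems, and my existing mail from 1.5 is all fine. The development notes describe the new features. My first impressions:

>>It seems to add tag icons, plus custom tags, so you can tag/label messages freely, as in Gmail
>>Folder handling has been enhanced too, though I don't quite understand how
>>The filter crash on unencoded Chinese subject lines has been fixed, so it is now easy to forward all kinds of mail to Gmail automatically
>>Better handling of spam, phishing, and virus mail
>>The interface looks nicer, and the options dialog now follows the Firefox 2 style

Google has also released Toolbar for Firefox 3 beta. It adds bookmarks, so you can use Google Bookmarks directly, and a Send To feature that sends a web page, including its content, to other people via Gmail. There are more buttons too. If clicking the install button does nothing and shows no warning, you can also install directly from the xpi URL: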

http://dl.google.com/firefox/google-toolbar-ft3.xpi

12:29 Nifty Toolbar upgrades for Firefox » Official Google Blog

Posted by Annie Sullivan, Software Engineer

Ever since the latest version of the Google Toolbar for IE came out, Firefox users have been asking when they'll see the same new features in their favorite browser. Well, we've been hard at work on a new version of the Toolbar for Firefox -- Google Toolbar 3 Beta -- that lets you access your bookmarks from any computer, add custom buttons to your Toolbar, and share web pages via Blogger, Gmail, and SMS.

In addition to adding all the features from the Google Toolbar for IE, there's another one just for Firefox users. When I surf the web, I want to be able to look at all the files I come across right in my browser. But a lot of times, I have to download files and view them with a separate application instead. Since Google Docs & Spreadsheets launched, I've been able to look at those files right in my browser. Except the process is kind of clunky right now: I have to right-click on a link, download the file to my machine, and then upload it to Google Docs & Spreadsheets. I want to be able to just click on a link to a document or spreadsheet and have it show up in my browser. So I added a feature to the Google Toolbar for Firefox to do that.

Now you can surf the web a little bit faster. I hope you enjoy this feature as much as I do!

10:16 Weather Report: Yahoo! Search Index Update » Yahoo! Search blog

We are in the process of rolling out some changes to our search results. As usual, you may be seeing some changes in ranking as well as some shuffling of the pages that are included in the index throughout this...

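09:30 How to measure the size of a search engine's index (part 1) » 雅虎搜索日志

Author: Chen Zhaohui (陈朝晖), engineer, Yahoo! US

Background: the quality metrics of a search engine generally cover four aspects: relevance, freshness, comprehensiveness, and usability. The index size we discuss today falls under comprehensiveness.

First, note that for a search engine, the number of pages indexed and the number of pages crawled are different concepts. The number of pages crawled is generally far larger than the number indexed, because crawled pages include many low-quality ones such as duplicates and spam. The search engine has to use algorithms to keep the best of the crawl, discard the rest, and pick out the valuable pages to index. For users, then, it is the index size that matters more.

Second, growing the index without limit does not necessarily improve search quality. On one hand, within comprehensiveness, besides index size one must also consider the quality of the included pages and the distribution across different types of pages. On the other hand, the quality-metric system requires balanced progress on all four aspects; a breakthrough on a single metric is not enough. At present the mainstream Chinese search engines, including Yahoo! China, each index on the order of 2 billion pages, which basically meets users' everyday query needs.

However, because the absolute size of a search engine's index cannot be measured directly from outside, many search providers like to exaggerate the number of pages they include as a marketing gimmick. Starting in 1998, Krishna Bharat and Andrei Broder began studying how a third party could objectively compare the index sizes of different search engines. Eight years later, at the WWW2006 conference in May this year, Ziv Bar-Yossef and Maxim Gurevich from Israel won the conference's only best paper award for their outstanding results in this area. Their research worked out the relative index sizes of the mainstream English search engines: Yahoo! is 1.28 times the size of Google, and Google is 1.36 times the size of MSN. How did they arrive at these numbers? Below we introduce the algorithm for search engine enthusiasts and discuss how it can be applied to Chinese search engines.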

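Overview
A search engine's index size, also called its coverage, has a far-reaching effect on the relevance, freshness, and hit rate of its search results. For marketing reasons, the major web search engines publish the number of documents they index from time to time, but these figures are often inflated to varying degrees and their credibility is questionable. How to measure index size reasonably objectively and accurately through the search engine's public interface, that is, the familiar search box, has therefore become a problem of real interest.

Figure 1: Sampling a search engine's index

Every search engine's index covers a subset of all the documents on the Internet. If we treat the measurement as sampling from this set, the key question is how to implement an approximately uniform random sampler (uniform search engine url sampler); see Figure 1. Concretely, if a search engine S indexes |D| documents in total, we want the probability of sampling any particular document to be 1/|D|.

Once we can sample the index uniformly at random through the search box, we can estimate the relative sizes of search engine indexes with reasonable statistical confidence, as the following figure shows:

Figure 2: Comparing the relative sizes of search engine indexes

We first randomly sample N1 URLs from engine S1. Then, by querying for each URL, we learn that engine S2 has indexed N12 of them and has not indexed the other N10. In other words, N1 = N10 + N12. Likewise, if we randomly sample N2 URLs from engine S2 and find that N21 of them are included in S1 and N20 are not, then N2 = N20 + N21. We can then estimate the relative size of S1 and S2 as: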

|D1|/|D2|
≈ (N12 + N10) / (N12 + N12·N20/N21)
= (N1·N21) / (N2·N12)
= N21/N12 (if N1 = N2)
To be continued...

09:25 About Transferable Stock Options » Official Google Blog

Posted by Allan Brown, Director, Recognition & HR Systems

We work hard to attract and retain the world's best talent in a number of ways, and a part of that is offering competitive compensation packages. We offer standard things such as competitive salary, cash incentives, restricted stock units and stock options. But we also aim to be innovative. So today we're announcing a new compensation program called Transferable Stock Options (TSOs).

As with most employee stock option programs, Google's program to date has allowed employees to do two things with their options. Upon vesting they can (1) hold them or (2) exercise them and then hold or sell the stock. With the new TSO program, employees will have an additional alternative: they can transfer (sell) their options to a financial institution through a competitive bidding process. The ability to sell options is not a novel concept -- today people can buy and sell options to purchase GOOG stock and the stock of many other companies on the public markets. What is novel is that we are extending this ability to trade options to employee stock options.

Typically, employees get value from stock options by exercising them after vesting, and then selling the stock they get from the exercise at a higher price, provided the company's stock price has appreciated since the time of grant. With the TSO program, employees will also be able to sell vested options to the highest-bidding financial institution, which may be willing to pay a premium above the difference between the exercise price and the market price for Google stock (even when the exercise price is higher than the market price). The premium paid is for the time value of the options. More on that and how institutions would do this, and why, is here.

Employees will still have the choice of simply exercising and then holding or selling the stock too. But if they choose to sell the options, they can use a simple online tool that will show them the best price a participating financial institution is willing to pay for their vested options in real time. With that tool, they'll be able to sell their vested options to the highest bidder.

In addition to increasing the value of every option employees receive, the TSO program makes the value of their options much more tangible. In the past, employees typically valued Google stock options based simply on the difference between their option exercise price and the current market stock price (called the intrinsic value). Since Google grants options with exercise prices that are at, or above, the market price of Google stock, many employees do not value options on the day they are granted. By showing employees what financial institutions are willing to pay for their options, it is made clear that the value of their options is greater than just the intrinsic value.
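
To make the distinction concrete, here is a small Python sketch of intrinsic value versus the extra amount a bidder might pay for time value. Every number in it is a hypothetical illustration, not a figure from the TSO program, and the function names are made up for this example.

```python
def intrinsic_value(market_price: float, exercise_price: float) -> float:
    """Value from exercising now: market price minus exercise price, floored at zero."""
    return max(market_price - exercise_price, 0.0)


def time_value_premium(bid: float, market_price: float, exercise_price: float) -> float:
    """Portion of an institution's bid that exceeds the intrinsic value."""
    return bid - intrinsic_value(market_price, exercise_price)


# Hypothetical option granted at the money: exercise price equals market price.
market, strike = 500.0, 500.0
bid = 90.0  # hypothetical bid from a financial institution

print(intrinsic_value(market, strike))          # 0.0
print(time_value_premium(bid, market, strike))  # 90.0
```

In this made-up case the option has no intrinsic value on the grant date, yet the bid is still positive, and that difference is the point of the paragraph above.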

We aren't offering this program for everyone or for all stock options. Google's Executive Management Group (EMG) may not participate, and only employee stock options granted after our IPO are eligible. We should also note that we've discussed this program with the SEC and we'll ensure it complies with applicable securities laws.

We've chosen Morgan Stanley to manage the auction of these TSOs between our employees and the multiple bidders, and we are working with multiple financial institutions to participate as bidders in the auction. We expect to have this program up and running in the second quarter of 2007.

If you're wondering how this would work for employees, here is an example scenario. There's more about the related accounting here. And for answers to other questions, we've put together an extensive Q and A.

(You'll notice some legal language below, and at the bottom of all the related information we link to. We're including that because we will file a registration statement with the SEC as a requirement of offering this program, and we want to help you find all of the information related to this registration statement.)

Google may file a registration statement (including a prospectus) with the SEC for the offering to which this communication relates. Before you invest, you should read the prospectus in that registration statement and other documents Google has filed with the SEC for more complete information about Google and this offering. You may get these documents for free by visiting EDGAR on the SEC Web site at www.sec.gov. Alternatively, Google will arrange to send you the prospectus after filing if you request it by calling toll-free 1-866-468-4664 or sending an e-mail to investors@google.com.

08:39 Has IE7 arrived? » chedong's Photos

chedong posted a photo:

Judging from this month's statistics for my site, IE7 users already account for over 6%. Geeks upgrade fast...
Msie 7.0 6.6 %
Msie 6.0 59.7 %
Msie 5.5 0.6 %
Msie 5.01 0.2 %
Msie 5.0 0.2 %

Another figure is Firefox users:
Firefox 2.0 12.60%
Firefox 1.5.0.8 3.10%
Firefox 1.5.0.7 0.30%
Firefox 1.5.0.6 0.30%
Firefox 1.5.0.5 0.10%
Firefox 1.5.0.3 0.60%
It looks like most of them have upgraded to 2.0.

07:36 new nabaztag tag » information aesthetics

nabaztag just came out with a new rabbit version that has a 'belly button'. next to its normal features (e.g. ear position communication is quite impressive), the wireless bunny now is able to 'listen', allowing voice messages to be easily sent, podcasts & web radio to be played etc. it now can also RFID-wise 'sniff' physical objects & react to them (for instance, holding door keys in front of it will urge it to send "I'm home" messages to your friends).

as an 'old generation' nabaztag owner, I can only welcome its increased intelligence & the extended sound features. actually, my ambient display rabbit is quite funny & useful, if only it could refrain from loudly announcing the arrival of spam email messages during some meetings...

[link: nabaztag.com & nabaztag.com|via engadget.com]

07:16 slopestyle image jacket » information aesthetics

a snowboard jacket that allows people to transmit images to an embedded display. wearers can also receive location-specific information (e.g. ideal slopes, location of friends, weather information) & biometric body data (e.g. dehydration, altitude sickness) can be captured & monitored. currently, the prototype is limited to a WiFi PDA that displays new images as they arrive from camera phone emails.

[link: moondial.com]

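06:34 Manage all your passwords with "KeePass" » Ikias.com

Many people like to set all their mailbox passwords, login passwords, and so on to the same password so that it is easy to remember. That is very unsafe. Just as we carry many different keys, having a different password for each purpose is basic self-protection in online life. The free password manager "KeePass" v1.06 not only remembers your passwords for you but also generates hard-to-guess passwords automatically, keeping your secrets safer. Better still, you can extract "KeePass" directly onto a USB drive so that you can carry it with you, and it leaves no trace on whatever computer you open it on. Starting "KeePass" requires the master key password you set, so even if you lose the USB drive you need not worry about your passwords leaking. "KeePass" also offers integrated database management, fast and convenient search, and many other advantages that I won't go into here; download it, install it, and try it, and you will see. Download "KeePass" v1.06 here: KeePass-1.06-Setup.exe. Install the Chinese language plugin here: SimplifiedChinese.lng. Put the unpacked "SimplifiedChinese.lng" file in the same folder as KeePass.exe, then start the program and switch to the Chinese interface through the language menu. ...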

04:47 Mystic Statistics Heuristics » Burning Questions - The FeedBurner Weblog

Just in time for the holidays, here's another post about our statistics, and this time we'll describe how we deal with metrics issues, how we think we can improve the kinds of statistics we provide, and admit that despite all this number crunching, we still don't know how many dribs are in a drab (but we know that the answer involves Planck's constant).

With over 500,000 feeds now managed, we deal with statistics anomalies like spiked/tanked subscriber counts, podcast counts, and click counts on a weekly, if not daily basis. Some of these are larger issues than others, obviously. We're sure that the good people at ComScore, HitWise, and other CamelCase-named statistics companies would agree that there are always issues and anomalies popping up that have to be beaten back with gusto like so many zombies in Dawn (or Shawn) of the Dead.

The goal we always set for ourselves is to try to maintain apples-to-apples comparisons across all types of counting and aggregator/client treatment. In other words, we try to say that regardless of what bucket some metric goes in, it should always result in the ability to look at a couple different pieces of the data (feeds, aggregators, podcatchers, etc.) and say "these make sense relative to one another." You set up some heuristics and algorithms that you then try to apply as universally as possible and take your lumps. It's like the never-ending "uniques" debate that the web stats community has: you try to plant some stakes in the ground that get you to reasonable conclusions when you consider all the data, and then jump off the next bridge when you come to it.

Some of the metrics issues that we are continually addressing include:

Automated aggregator clicks: There are some niche aggregators and feed-reading clients that will occasionally auto-click every link in a feed, presumably for offline caching or in order to perform some contextual analysis. So you have to come up with mechanisms to discount those clicks in publisher dashboards as not counting toward subscriber click totals.
Bots as aggregators: Sometimes it is an obvious attempt to cloak a bot, and sometimes a hard-to-categorize service simply emerges that polls a feed from loads of desktops. Now that there are many thousands of feed clients, we sometimes don't see these bots or stats until they appear on a threshold report we create internally. Publishers can see a set of "subscribers" from something that ends up falling off the end of the world a month later. What happened? Sometimes the bot just goes away, sometimes it's a combination of bot behavior within an otherwise valid client, and other times, after we've learned more about the bot or its behavior, we've concluded that it's not really polling a feed for the purpose of notifying a subscriber or delivering content to a subscriber.
Default feeds: Some aggregators default subscribe users to a feed in some cases. For example, perhaps you create a new account with an aggregator and announce "I'm interested in technology feeds", and you're auto-subscribed to a list of feeds. What's the right thing to do here when those numbers are reported as subscribers? We've decided to count those as subscribers...the content is being updated for that subscriber and as long as the subscriber doesn't remove it, we generally say that it's not our place to say those aren't "subscribers". Now, we also provide Total Stats publishers with a metric called "Reach," and reach does a good job of helping pro stats customers understand "how much of my total subscriber base is actually opening my feed and looking at it on a day-to-day basis". This helps publishers with large aggregator counts to understand how many of those aggregator subscribers are "active" from day to day. There are a few aggregators that report subscriber counts purely based on "active" users, not cumulative over time, which obviously provides a more accurate running metric.
Lack of visibility: There are a number of aggregators that provide no insight in their user-agents into the number of subscribers on behalf of whom they are requesting the feed. Obviously, these end up representing some undercounted number of subscribers for any publisher distributed to that aggregator, and this makes it harder for a publisher to understand their true distribution. We work with as many of these aggregators as we can in order to provide publishers with more information.

…to say nothing of the partial podcast downloads and podcast download bots and other fun with podcast stats.

Across the board, we're seeing more and more distinct kinds of user-agents requesting feeds. Here's a quick chart of the growth in unique user-agents we've seen polling feeds just in the last six months.

UserAgentGrowth GIF image

Caveat Emptor: These chart numbers don't include user-agents with spammy identifiers that are obviously just long random strings, and hundreds of agents like "Shmucky-bot/1.0" and "Shmucky-bot/2.0" are only counted as one distinct user-agent. All of this data excludes the millions of requests a day we capture from clients with completely blank identifiers. Still, you can see the current count is well over 8,000 different kinds of feed reading entities. Everything from aggregators and search crawlers to thousands of mobile feed readers, hundreds of podcatchers, loads of language specific agents, specialty browser toolbars and more.
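
As a rough illustration of the counting rules in the caveat above, here is a Python sketch. It is not FeedBurner's actual pipeline: the regular expressions, the test for a "long random string," and every sample identifier other than the Shmucky-bot pair are assumptions made up for this example.

```python
import re

# Assumed heuristic: a single run of 24+ alphanumerics with no separators looks random.
_RANDOM_LIKE = re.compile(r"^[A-Za-z0-9]{24,}$")
# Assumed heuristic: strip a trailing "/<version>" so versions collapse together.
_VERSION_SUFFIX = re.compile(r"/\d+(\.\d+)*$")


def distinct_user_agents(identifiers):
    """Count distinct agents, skipping blank or random-looking identifiers."""
    seen = set()
    for raw in identifiers:
        ua = raw.strip()
        if not ua or _RANDOM_LIKE.match(ua):
            continue  # blank or spammy identifier: excluded from the count
        seen.add(_VERSION_SUFFIX.sub("", ua).lower())
    return len(seen)


sample = [
    "Shmucky-bot/1.0",
    "Shmucky-bot/2.0",                # same agent, different version
    "",                               # blank identifier
    "x9Qk2LmZ81TtRw0PpYv7AaBc3DdEe",  # long random-looking string
    "ExampleReader/3.1",              # hypothetical feed reader
]
print(distinct_user_agents(sample))  # 2
```

Real identifiers are far messier than this, so a production rule set would need many more cases than these two patterns.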

One of the questions we bounce around here is "what can we do to help people get more information about their statistics in order to better understand how their content is being distributed?" (although we don't speak to ourselves so eloquently). There are a few things we're always working on in this department:

Provide more statistics. When publishers have more dots to connect, they can draw a more distinct picture of their content distribution traffic. +2 points for the well-executed "dots - picture" metaphor. We'll be rolling out blog stats very very soon (remember the BlogBeat acquisition a ways back?), so publishers can get a nicer picture of their "total readership" across feed and site. This will also help publishers better make sense of anomalies in one area; e.g., did I see a traffic spike to my site the day before I saw a big spike in subscribers? Was I seeing more search traffic to my site during these periods of sustained subscriber growth? etc. We've got more surprises in this stats department as well.
Be transparent. We wrote up the peek inside TechCrunch's subscriber numbers a couple months ago as a way to help people understand what's behind the numbers. We can do more in this arena like describe the kinds of things we discuss with various publishers about traffic anomalies and other grey areas where the right metric isn't obvious.
Be Creative. When the market has trouble with some metric or approach or perceives a lack of information, there's an obvious opportunity to step in and provide that data.

04:29 How You're Using AskCity » The Ask.com Blog

It's been a whirlwind of activity over here with the launch of AskCity, and we're just now catching our breath and getting the lay of the land. We've seen hundreds of news items and blog headlines about the product, and...

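00:35 Mozilla to work with Linux distributions » Blog on 27th Floor

Until now Mozilla has not released browser builds for specific Linux distributions. It offered only a single generic tgz package, and every distribution had to modify it on its own, applying patches and adding features it liked or turning off ones it disliked (such as automatic updates). This work was basically outside Mozilla's field of view.

Now the organization has announced that it is willing to work with Linux distributions to do this work together. Mozilla will form a group with volunteers from the distributions to jointly manage patches and distribution-specific packages and to make policy decisions together; the download pages on its website will also link to these distribution-specific builds. It is also said to be considering the Mozilla trademark issue in Debian, though no solution is in sight yet. Debian, meanwhile, has already switched to Iceweasel, the GNU-branded browser.

In addition, Debian's next stable release, Etch, has now been officially frozen. From now on, adding a new package requires manual handling by the release team; what remains is finding bugs and applying patches, so the official release of the new version is not far off.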

00:34 Opening up the Google Web Toolkit » Official Google Blog

Posted by Dan Peterson, Product Manager

Google Web Toolkit (GWT) is all about making the web a better place by making it easier to create web apps like Gmail or Google Maps. So today, we're excited to tell you that we're releasing all the source code for GWT under an open source license. We've been working hard to build great tools for AJAX development, and now we're happy to begin working with the open source community towards the same goal. The folks who are passionate about AJAX can contribute to the project and make this toolkit even better.

If you're curious about how to add some AJAX goodness to your site, see if the Google Web Toolkit is right for you.

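00:14 Made the homepage font larger » 车东[Blog^2]

Today I got a message from a friend saying the text on my site was too small. I checked Analytics, where the web design parameters include a screen resolution metric. According to the Analytics statistics for my site, more than 95% of visitors use a resolution of 1024 or higher (as do I), so why keep using such a small font? I edited the style and changed every 12px font on the homepage to 14px (fixed pixel sizes should really be avoided where possible; relative sizes are better). I chose 14px because about 1/6 of my site's visitors use Firefox, and odd-numbered font sizes do not suit them.

If the homepage still shows the small font, press F5 to force a refresh.

If you're not satisfied, you can also cast a vote: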

Free polls from Go2poll.com