w-index | 06 Jun 2008 - Digest by CheDong.com

22:41 Scrapped computers that release toxins » chinadialogue: latest articles

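In West Africa's electronic wastelands, children are helping to clear up computers discarded by developed countries. While they earn money by extracting metals from them, they are also paying a high price with their health, Richard Wray reports.

Every day, thousands of computers discarded in Europe and the US arrive at West African ports, creating large amounts of toxic waste. There, children burn and dismantle the waste to extract metals for cash.

The dumping of electronic waste, or e-waste, by developed countries directly contravenes international legislation and is causing serious health problems for the inhabitants of shanty towns. These towns have sprung up around the burning dumps of Lagos in Nigeria and Accra in Ghana.

Campaigners believe that unscrupulous scrap merchants are illegally dumping millions of tonnes of hazardous waste on developing countries under the guise of exporting material for use in schools and hospitals. They are calling for tighter enforcement of the ban on exporting e-waste, which can release lead, mercury and other dangerous chemicals.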

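“Ghana is increasingly becoming a dumping ground for waste from Europe and the US,” said Mike Anane, president of the League of Environmental Journalists in Ghana. “The people who break up the monitors tell me they suffer from nausea, headaches and breathing problems.”

More than 500,000 computers arrive in Lagos every month, but only about a quarter of them still work. The rest are sold as scrap, then smashed and burned.

“Every year, millions of tonnes of e-waste disappear from the developed world and reappear in developing countries, in complete disregard of international bans,” said Luke Upchurch of Consumers International, an international organisation representing more than 220 consumer groups in 115 countries.

The illegal trade in e-waste is highly profitable. More gold may be extracted from a tonne of electronic circuitry than from a tonne of gold ore. But illegal dumping is putting at risk the charities and other organisations that donate second-hand equipment to developing countries.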

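Computers have become an everyday item since the introduction of the 1992 Basel ban, which made it illegal for developed countries to export hazardous waste to less developed countries. Consumers and businesses are replacing their equipment at an ever faster rate, producing a new mountain of waste.

Six years ago, the EU introduced the Waste Electrical and Electronic Equipment (WEEE) directive, which placed new restrictions on the movement of e-waste. The directive came into force in the UK in January 2007; it tightly regulates the movement of e-waste for recycling and bans its export for disposal. It also introduced a scheme under which the cost of properly disposing of electronic equipment put on the market after August 2005 must be covered by the producers of the waste, such as manufacturers, retailers, rebranders and importers.

However, DanWatch, a partner organisation of Consumers International, has evidence that computer equipment from British companies and local authorities is being dumped in West Africa.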

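“We made a film showing children as young as six searching the ground for scrap metal, strewn among toxic waste from the fragments of thousands of cathode ray tubes,” said Benjamin Holst, co-founder of DanWatch. “In fact, whole communities are living and working in this highly toxic environment, and the toxins are increasing every day.”

Exports of working computer equipment are exempt from the EU's WEEE rules. In fact, the rules encourage the refurbishment and reuse of computer equipment. But there is no system for checking that computer equipment really is reusable before it is shipped overseas.

In England and Wales, waste management is overseen by the Environment Agency. “Our view is that, generally, the genuine reuse of working equipment is a good thing,” explained Adrian Harding, a policy adviser at the agency.

The problem is the phrase “genuine reuse”. Harding admits that the agency simply does not have the resources to check every item destined for reuse in developing countries. Part of the problem is that the agency does not even have to be notified of movements of items for reuse, so it does not know which containers to target.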

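Computer Aid International has built a reputation as a legitimate supplier of second-hand computers to developing countries. Founded 10 years ago, it has sent more than 119,000 computers to countries such as Kenya and Chile.

The charity is registered with the Environment Agency as an official e-waste disposal company. All the machines it cannot use are sent to specialist recycling plants within the EU. Its founder, Tony Roberts, believes the problem with current e-waste legislation is that, outside the EU, it cannot make the producers of computer equipment pay for the proper disposal of that equipment.

If there is no money to be made, developing countries have no incentive to invest in proper recycling plants. As a result, the e-waste problem is likely to grow, not because of unscrupulous European exporters, but because more and more computers are being sold into developing countries.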

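“When you consider the whole life cycle of the product, 75% of the environmental damage is done before the computer is switched on for the first time,” Roberts points out. “Production, mining, the factories that make the associated equipment, and the use of toxic materials: that is where the environmental damage is done. So if we do not make producers responsible for dealing with these environmental problems, we will never get computers redesigned; we will never get computers produced in a greener way.”

Whenever equipment donated by Computer Aid reaches the end of its useful life, the organisation tries to limit the environmental damage caused by its disposal. In Kenya, for example, it is helping to build a recycling plant that will handle not only its own equipment but also scrap machines from across the country. The process is basic, but better than landfill, and the circuit boards will be re-exported to the UK.

Roberts said: “The problem is that producers are selling millions of PCs into developing-country markets without providing any funding, so we need to set up similar funds in all markets.”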

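This is a call made by Martin Hojsik, head of the toxics campaign at Greenpeace International. “We want producers to take responsibility for recycling their products,” he said.

The hope is that making producers pay for the disposal of their computer equipment, wherever in the world it is sold or used, will push the industry to produce “greener” machines.

If the sight of African children scavenging on toxic dumps is to be stopped quickly, European regulators and, more importantly, consumers and businesses must take responsibility for the disposal of their computer equipment.

Source: www.guardian.co.uk

Homepage image by ▌ÇP▐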

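17:13 Douban is hiring, no. 9: Operations Manager (local events) » Douban blog

Responsibilities:
-Market research for local events (同城活动)
-Define the external partnership model for local events, conduct partnership negotiations independently, and develop new partners
-Run user surveys to help improve the local-events product
-Day-to-day maintenance of local-events content

Requirements:
-Bachelor's degree or above in any major; a background in media, economics or sociology is a plus
-Work experience in market research, marketing or brand PR
-Some understanding of the market for internet products and of partnership models
-Excellent analytical skills, and good interpersonal communication and coordination skills

The position is based in Beijing. If you are interested or have questions, please email team(a)douban.com and mention “同城” in your message.

Feel free to repost. Thanks!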

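14:48 Gmail to test a host of new features » WebLeOn's Blog

Gmail is about to roll out a large number of experimental features for users to test. Here are some of the features on the way.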

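1. Custom markers (Superstar)
Lets you mark different messages with different icons, rather than just a star.

2. Quick Links
Saves links to features, categories or individual messages for quick access.

3. Take a break (Email Addict)
Recess time: it stops you sending and receiving mail for 15 minutes.

4. Snake game (Snakey)
Yes, it lets you pass the time while you wait for mail.

5. Random Signature
Signs your messages with a random item from a feed.

6. Custom Keyboard Shortcuts
Lets you operate Gmail with shortcuts that match your own habits.

7. Mouse gestures
Lets you operate Gmail with mouse gestures.

8. Muzzle
Hides your contacts' status.

9. Pictures in chat
Pictures can now be shown when chatting via Talk inside Gmail.

10. Smart reply signature
Automatically places your signature before the last quoted text.

11. Fixed-width font
Displays mail in a fixed-width font to keep formatting from breaking.

Screenshots and more detailed descriptions of these new features can be found on Lifehacker.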

11:54 How would you compress your MySQL Backup » MySQL Performance Blog

When backing up a MySQL database, most people compress the backup. Depending on the circumstances and the approach used, this can make good sense in terms of backup and recovery speed and the space needed, or it can be a serious bottleneck.

First, I should mention that this question mainly arises for medium and large databases. For databases below 100GB in size, compression performance is usually not the problem (though the backup's impact on server performance may well be).

We also assume here that the backup is done at the physical level (cold backup, slave backup, InnoDB hot backup or snapshot backup), as this is the only practical way at this point for databases of decent size.

The two important compression questions you need to decide for a backup are where to do the compression (on the source or the target server, if you back up over the network) and which compression software to use.

Compression on the source server is the most typical approach, and it works well, though it takes extra CPU resources on the source server in addition to IO resources, and these may not be available, especially for a CPU-bound MySQL load. The benefit in this case is a smaller space requirement if you’re keeping a local copy, as well as lower network bandwidth requirements if you’re backing up to network storage.

Compression on the destination server offloads the source server, though the destination may run out of CPU itself if it is the target for multiple backups, and more network bandwidth is needed to transfer the uncompressed backup.

What about the compression tool? The classic tool used for backup compression is gzip: it exists almost everywhere, and it is stable and relatively fast.

In many cases, however, it is not fast enough and becomes the bottleneck for the whole backup process.

Recently I did a little benchmark compressing a 1GB binlog file with gzip (compression was done from the OS cache and the output redirected to /dev/null, so we measure only compression speed). On the test box, with an Intel(R) Core(TM)2 Duo CPU E4500 @ 2.20GHz, gzip compressed this file in 48 seconds (with default options), resulting in a 260MB compressed file. This gives us a compression speed of about 21MB/sec, clearly much less than even a single SATA hard drive can read sequentially. The file then takes about 10 seconds to decompress, meaning the compressed file is read at 26MB/sec during decompression. This is again much less than hard drive sequential read performance, though the fact that it produces about 100MB/sec of uncompressed data to write is more of an issue.

Such performance also means that if your goal is faster transfer over a local network, default gzip compression will not speed things up on a standard point-to-point 1Gbit network connection.

If we try gzip -1 to get the fastest compression, the same file compresses to 320MB in 27 seconds. This gives us 37MB/sec, which is a lot better but still not quite enough. Also note the serious leap in compressed file size. In this example, though, we used a MySQL binary log file, which often contains plenty of similar events, and that could be the reason the size differs so much between compression levels. Decompression takes about the same 10 seconds, which gives about 32MB/sec of archive read speed and the same 100MB/sec of uncompressed data.
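A rough way to repeat this kind of measurement is sketched below in Python. It is not the benchmark from the post: it uses Python's `zlib` module (the same DEFLATE algorithm gzip uses) on data held in memory, and `make_sample` is a hypothetical stand-in that fabricates statement-like lines rather than reading a real binlog. Absolute numbers will differ from the gzip command line tool; the point is only to compare a fast level against the default on your own data.

```python
import time
import zlib


def time_compression(data: bytes, level: int) -> dict:
    """Compress `data` in memory at the given zlib level and report speed."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    megabytes = len(data) / (1024 * 1024)
    return {
        "level": level,
        "compressed_bytes": len(compressed),
        "seconds": elapsed,
        # Guard against a zero timer reading on very small inputs.
        "mb_per_sec": megabytes / elapsed if elapsed > 0 else float("inf"),
    }


def make_sample(n_events: int = 20000) -> bytes:
    """Hypothetical test data: many similar statement-like lines."""
    lines = (
        f"UPDATE accounts SET balance = balance + {i % 97} WHERE id = {i};\n"
        for i in range(n_events)
    )
    return "".join(lines).encode("ascii")


if __name__ == "__main__":
    sample = make_sample()
    for level in (1, 6):  # 1 = fastest, 6 = the usual default trade-off
        result = time_compression(sample, level)
        print(
            f"level {result['level']}: {result['compressed_bytes']} bytes, "
            f"{result['mb_per_sec']:.1f} MB/sec"
        )
```

Because the input never touches the disk and the output is discarded, this measures compression speed only, in the same spirit as reading from the OS cache and writing to /dev/null.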

Do we have any faster alternatives to gzip? There are actually quite a few, but I like LZO, which I have been playing with since the late 1990s and which is a rather active project. There is also a gzip-like command line compressor using the LZO library, called LZOP, which makes it an easy drop-in replacement.
I got an LZOP binary which was built against LZO 1.0; the more recent version 2.0 promises further performance improvements, especially on 64-bit systems.

With LZO default compression, the file compressed in 10.5 seconds and resulted in a 390MB compressed file. This gives us a compression speed of 97MB/sec, which is good enough to compress all the data you can read from a single drive. The file decompresses in 3.7 seconds, which gives 105MB/sec read speed from the archive media and 276MB/sec write speed to the hard drive. This means restoring from a backup compressed with LZO will often be as fast as, or faster than, restoring from an uncompressed one.

With LZO there is also a “-1” option for even faster compression, which had rather interesting results. The file compressed in 10.0 seconds (102MB/sec) and was 385MB in size, so this lower compression level actually compressed the file a bit better while being about 5% faster. The decompression speed was about the same. I’m sure the results may change based on the data being compressed, but it looks like LZO already uses relatively fast compression by default.
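The throughput figures quoted so far all follow from the file sizes and timings, and the arithmetic can be laid out as a short Python sketch. It assumes the 1GB file is 1024MB, and it assumes a decompression time of 3.7 seconds for the LZO “-1” run, since the post only says that speed was about the same; the function name `rates` is just an illustrative choice.

```python
FILE_MB = 1024  # the 1GB binlog file, taken here as 1024MB

# (label, compressed size in MB, compression seconds, decompression seconds)
RUNS = [
    ("gzip (default)", 260, 48.0, 10.0),
    ("gzip -1", 320, 27.0, 10.0),
    ("lzop (default)", 390, 10.5, 3.7),
    ("lzop -1", 385, 10.0, 3.7),  # decompression time assumed, see above
]


def rates(compressed_mb, compress_s, decompress_s, file_mb=FILE_MB):
    """Derive the three speeds discussed in the text, all in MB/sec."""
    return {
        # How fast uncompressed data is consumed while compressing.
        "compress_mb_s": file_mb / compress_s,
        # How fast the archive is read while decompressing.
        "archive_read_mb_s": compressed_mb / decompress_s,
        # How fast uncompressed data is produced while decompressing.
        "restore_write_mb_s": file_mb / decompress_s,
    }


if __name__ == "__main__":
    for label, size_mb, c_s, d_s in RUNS:
        r = rates(size_mb, c_s, d_s)
        print(
            f"{label:15s} compress {r['compress_mb_s']:6.1f}  "
            f"archive read {r['archive_read_mb_s']:6.1f}  "
            f"restore write {r['restore_write_mb_s']:6.1f}"
        )
```

The quoted figures appear to be these values with the fraction dropped, for example 1024 / 27 is a little under 38MB/sec and is reported as 37MB/sec.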

On a real server-grade CPU the performance should be even better, meaning you should get more than the roughly 100MB/sec you can pass through 1Gbit Ethernet. That means you can actually use LZO compression for faster data transfer between boxes (i.e. together with netcat).

Note that my benchmarks also include the overhead of reading (from the file cache) and piping to /dev/null, which is constant, so the true difference in compression speed is even larger. However, as most backup operations need to read and write anyway, they come with this static overhead naturally added.
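Whether compression speeds up a network copy can be estimated with a simple bottleneck model, sketched below in Python. This is a simplification and not something the post spells out: it assumes compression and sending overlap perfectly in a streaming pipeline, ignores disk, decompression and protocol overhead, and takes the roughly 100MB/sec Ethernet figure from the text. The function name and parameters are illustrative.

```python
def effective_transfer_mb_s(compress_mb_s, ratio, network_mb_s):
    """Rate at which ORIGINAL data moves through compress -> network.

    compress_mb_s: uncompressed MB/sec the compressor can consume
    ratio:         compressed size / original size (0 < ratio <= 1)
    network_mb_s:  MB/sec the link can carry
    """
    # The link carries compressed bytes, so in terms of original data
    # it moves network_mb_s / ratio. The slower stage sets the pace.
    return min(compress_mb_s, network_mb_s / ratio)


if __name__ == "__main__":
    link = 100.0  # approximate usable rate of 1Gbit Ethernet, per the text
    cases = {
        "gzip (default)": (1024 / 48.0, 260 / 1024),
        "gzip -1": (1024 / 27.0, 320 / 1024),
        "lzop (default)": (1024 / 10.5, 390 / 1024),
    }
    for label, (speed, ratio) in cases.items():
        rate = effective_transfer_mb_s(speed, ratio, link)
        verdict = "faster" if rate > link else "not faster"
        print(f"{label:15s} {rate:6.1f} MB/sec, {verdict} than sending raw")
```

Under these assumptions the compressor is the bottleneck in all three cases on the test box, with LZO landing just under the raw link rate. That is consistent with the expectation that a faster CPU is what pushes LZO past the uncompressed transfer speed.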

Entry posted by peter | 9 comments

05:22 w-index » 格致 (Gezhi) - The Joy of Science

After J. E. Hirsch of UCSD proposed the h-index as a way to evaluate researchers' achievements, many variants appeared. Qiang Wu of the University of Science and Technology of China recently posted a paper on arXiv, The w-index: A significant improvement of the h-index, proposing a new variant, the w-index.

The w-index is very similar to the h-index and is defined as follows: A researcher has index w if w of his/her papers have at least 10w citations each, and the other papers have fewer than 10(w+1) citations.

(Definition of the h-index: A scientist has index h if h of his/her N papers have at least h citations each, and the other (N − h) papers have no more than h citations each.)

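For example, Hawking's index is 24, meaning that 24 of his papers have each been cited at least 240 times, while he does not have 25 papers cited at least 250 times each. It is a bit of a mouthful.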
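The two definitions can be made concrete with a short Python sketch. It reads each index as the largest rank for which the condition still holds once the papers are sorted by citations in descending order, which is the usual way to compute such indices; the function names and the sample citation counts are made up for illustration and do not come from either paper.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # counts only fall and the threshold only rises
    return h


def w_index(citations, factor=10):
    """Largest w such that w papers have at least factor * w citations each."""
    ranked = sorted(citations, reverse=True)
    w = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= factor * rank:
            w = rank
        else:
            break
    return w


if __name__ == "__main__":
    sample = [120, 45, 33, 30, 12, 9, 3, 1]  # hypothetical citation counts
    print("h-index:", h_index(sample))
    print("w-index:", w_index(sample))
```

For the hypothetical list above, the third paper still clears 30 citations but the fourth falls short of 40, so the w-index stops at 3, while the h-index keeps counting further down the list. This illustrates why w rewards only the most heavily cited papers.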

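The difference between w and h is that w puts more weight on a scientist's very top papers. Rankings by the two indices look quite different. For example, the top five physicists by w-index are Witten (41), Anderson (26), Hawking (24), Marvin Cohen (23) and Frank Wilczek (23); the top five by h-index are Witten (110), Marvin Cohen (94), Anderson (91), Steven Weinberg (88) and Michael Fisher (88).

Qiang Wu's paper imitates J. E. Hirsch's at every turn. For example, J. E. Hirsch says that after 20 years of work, a “successful scientist” should have reached an index of 20, an “outstanding scientist” an index of 40, and a “true genius” an index of 60. He goes on to suggest that a researcher should be promoted to associate professor when his/her h-index reaches about 12, and to full professor when it reaches 18. His own h-index is 49.

Qiang Wu's paper follows suit: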

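i) A w-index of 1 or 2 indicates that the researcher has learned the basics of a subject.
ii) A w-index of 3 or 4 indicates that the researcher has mastered the art of scientific activity.
iii) A w-index of 5 indicates a successful researcher.
iv) A w-index of 10 indicates an outstanding scientist.
v) A w-index above 15 after 20 years of work, or above 20 after 30 years, marks a top scientist.

Please take these papers half seriously and half in jest.

Related links: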
What's your Wu index?
Number theory