18:49 Radar Theme: Synthetic Biology » O'Reilly Radar Chinese Site

Nat Torkington 2008-08-05

Drew Endy taught undergraduate students how to make E. coli bacteria that smelled like wintergreen, using his BioBricks. This shows us a future for biology where "useful biological tasks" can be "automated" using "components". The quotes indicate where research and development are going: building components, and figuring out how biological amateurs can assemble them and to what end. The overlap with open source and the low barrier to entry, reminiscent of the web, are particularly interesting to us.

Watch list: Drew Endy, George Church, Christina Smolke, Open Wetware, Ginkgo Bioworks.

16:54 Using data to fight webspam » Google Chinese Webmaster Blog


Original: Using data to fight webspam
Posted: Wednesday, June 27, 2008, 4:51 PM

This post is the latest in a series describing how we use the data we collect to improve our products and services.

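As the head of Google's webspam team, my job is to make sure the search results you get are as relevant and informative as possible. You may not have heard of webspam: it is the junk in search results, pages that either cheat their way to a high ranking or violate search engine quality guidelines. If you have never seen webspam, here is a good example: if you clicked on such a spam link in the search results, you might see the following (click to view a larger image).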



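As you can see, this page has no value at all. It contains almost no original content and is crammed with irrelevant links and information of little use to users. We work hard to make sure you do not see search results like this. Imagine how unpleasant it would be to click a link in Google's search results and end up on a page like this.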

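Today, searchers do not often see such blatant, outright webspam in their results. But webspam was already a big problem long before Google became popular and before we found effective ways to fight it. In general, webspam is truly annoying, for example when you search for your own name and the results link to pornographic pages. And for many searches where getting relevant information really matters, webspam is a serious problem. For example, if a search about prostate cancer returns results full of webspam instead of links to relevant information, that greatly undermines the value of a search engine as a useful tool.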

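Data from search logs is one of the tools we use to fight webspam and to return cleaner, more relevant search results. Log data such as IP addresses and cookie information makes it possible to build and use metrics that measure different aspects of our search quality (such as index size and coverage, the "freshness" of results, and the amount of spam).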

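Whenever we create a new metric, it is important to be able to go through our log data and use past queries or results to compute the new webspam metric. We use search logs to go "back in time" and see how much Google has improved on users' queries over several months. When we build a new metric that measures a new type of webspam more accurately, we can not only track our future progress against that type of spam, but also use log data to analyze how well we handled the same type of spam months or even years ago.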

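IP addresses and cookie information are important because they help us apply this method only to "legitimate" user searches, rather than to machine-generated and other bogus searches. For example, if a bot sends the same query to Google over and over, those queries should be discarded before we measure how much webspam users see. All of this (log data, IP addresses, and cookie information) makes the search results you get cleaner and more relevant.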
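The filtering step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `LogEntry` fields, the `saw_spam` label, and the `max_repeats` threshold are all hypothetical, and the post does not say how Google actually detects automated queries.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """One search log record (hypothetical schema, not Google's)."""
    ip: str          # client IP address
    cookie: str      # browser cookie ID
    query: str       # search query text
    saw_spam: bool   # assumed label: did the results shown include spam?


def filter_automated(entries, max_repeats=3):
    """Drop every entry whose (ip, cookie, query) triple occurs more than
    max_repeats times. This is a crude stand-in for bot detection."""
    counts = Counter((e.ip, e.cookie, e.query) for e in entries)
    return [e for e in entries if counts[(e.ip, e.cookie, e.query)] <= max_repeats]


def spam_exposure(entries):
    """Fraction of searches in which the results included spam."""
    if not entries:
        return 0.0
    return sum(e.saw_spam for e in entries) / len(entries)


# A bot repeating one query 50 times, plus four ordinary searches.
logs = [LogEntry("10.0.0.1", "bot", "cheap pills", True)] * 50 + [
    LogEntry("10.0.0.2", "u1", "prostate cancer", False),
    LogEntry("10.0.0.3", "u2", "my own name", True),
    LogEntry("10.0.0.4", "u3", "space center", False),
    LogEntry("10.0.0.5", "u4", "tiramisu recipe", False),
]

raw = spam_exposure(logs)
clean = spam_exposure(filter_automated(logs))
print(f"raw: {raw:.3f}, after filtering: {clean:.3f}")
```

In this made-up data the repeated bot query dominates the raw figure, which is why the post says such queries should be removed before the metric is computed.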

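If you think webspam is no longer a problem, think again. Last year, Google's index came under a massive attack of webspam from the .cn top-level domain. Some spammers bought large numbers of cheap .cn domains and stuffed the sites with deliberately misspelled words and pornographic terms. Savvy users may remember reading a few blog posts about it, but the vast majority of ordinary users probably never even noticed. Ordinary searchers did not notice these abnormal results because Google identified the .cn webspam in time and handled the attack well through a fast-track project. Without log data to help us identify the speed and scope of the problem, many more Google users might have been affected by this attack.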

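Ideally, the vast majority of users would not even need to know that Google has a webspam team. If we do our job well, you may occasionally see low-quality search results, but you will not have to face malicious JavaScript redirects, offensive pornography, pages full of gibberish, or other kinds of webspam. Our log data helps ensure that we track new webspam trends and take action before they affect your search experience.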

13:28 Weekend Musings: Community Illusions; Space Center; Delamisu » 大学小容 > Make good use of the web to help you grow!

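Last time I mentioned that slideshare.net was running an activity called Presentation Design Tennis, in which each person submits one slide and the slides are finally combined into a 14-slide deck. Xiaorong took part in the whole process; the first slide submitted was declared invalid, but Xiaorong's entry was chosen as the final slide. Still, the activity does not seem to have been as popular as expected, with very few participants each day. This suggests that Slideshare.net's community atmosphere is not yet very mature.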

1.

The finished deck is below. Its theme, What is community, is quite timely: SNS communities have recently been a very hot topic in mainland China's internet circles.

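In Xiaorong's view, this collaborative slide activity itself answers the question it poses:
• Community means conversation: trust built on mutual understanding
• Community means participation: the more you take part, the more you gain
• Community means rules: a shared contract brings order
• Community means collaboration: every individual trusts the others
• Community means creation: collective intelligence solves problems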
••••••

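Comparing the "SNS communities" so hotly debated in mainland internet circles with the ideal described above, Xiaorong feels that, amid all the noise, community has taken on the look of a confusing illusion. Many of the discussions lack agreed definitions of terms, and their contexts and backgrounds vary widely.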

2.

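Xiaorong actually lives very close to the Space Center and often jogged there in the early morning during winter and spring. Friends often said the Space Center was well worth a visit, but Xiaorong was lazy, and many weekends passed without exploring what was right on the doorstep. Long, long ago Xiaorong was a science fiction fan who read many popular books on spaceflight in junior high school. Perhaps the thinking was that, living so close, there would always be a chance to visit. But time flies, and in the blink of an eye two seasons have passed.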

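Last week my girlfriend took two weeks off and came over from New York, so last weekend we went to the Space Center together. Xiaorong signed up for a membership card this time, so frequent visits will be possible from now on. The Space Center is across the road from NASA. It is essentially a large exhibition hall and amusement venue themed on space science, with real aerospace hardware, science photo exhibits, TV documentaries, a movie theater, space shuttle simulator games, and live performances simulating life in space. There is also an adventure playground that keeps children thoroughly entertained. And of course there are restaurants and a souvenir shop :)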

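In the morning we went through the Space Center's attractions one by one, and in the afternoon we took the tour tram from the Space Center through a short tunnel under the road for a tour of NASA. The trip is really one big loop around NASA with two stops: one at the rocket launch control room for a talk, and one at Rocket Park, which has enormous real rockets. This time Xiaorong took video as well as photos; for the NASA tour itself there is only video, no photos. (I found someone else's travelogue online; see here)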

Here are a few photos to start with; more are on Flickr, see here. The video will take some time.


Space shuttle engine


A corner of the Space Center


A corner of the Space Center


The Star Wars animated film exhibit at the Space Center

3.

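One day after the Space Center visit, I noticed that Google's home page had changed to a space-themed design; it turned out to be marking NASA's 50th anniversary.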

Xiaorong really likes the NASA 50th anniversary logo. Here is NASA's 50th anniversary website.

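NASA has built many websites and contributed a great deal of valuable material to the internet. Given its huge budget, NASA's public outreach and open sharing of information seem only natural.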

This is the image site NASA has set up; anyone interested can search for images here and learn about the secrets of outer space.

4.

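When Xiaorong last went to New York, my girlfriend said she wanted to make tiramisu. This time, after she arrived, we bought ingredients and equipment and, after persistent effort, finally made a tiramisu that looked quite respectable.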

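Xiaorong renamed this tiramisu "Delamisu". Tiramisu is said to mean "take me away"; readers who have got this far, please put your minds to work and help Xiaorong figure out what Delamisu means.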

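Previous posts:

Weekend Musings: Slide Relay; Blog to Book; Star-Chasing Reading

Weekend Musings: Summer Streets; Takashi Murakami Exhibition; Commemorative Cola

Weekend Musings: Tag Systems; Polar New York; Puerto Rico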

12:15 128GB of RAM finally got cheap » MySQL Performance Blog

I do not usually look at “Elite” servers on the Dell web site, but while looking at a customer's system today I went to check out the Dell PowerEdge R900. This monster takes up to 4 quad-core CPUs and has 32 memory slots, which lets you reach 128GB of memory with 4GB memory modules. Upgrading the default configuration to 128GB of memory will cost you just $9600 (list price). I was able to configure on the web a system with 8 2.5″ hard drives in RAID and 2 CPUs (just as we usually configure the PowerEdge 2950) with 128GB of RAM for about $16000. This means that after talking to a Dell sales rep it can probably be purchased within $15000. This may sound like a lot, but if you’re memory constrained it is cheaper per GB than buying a 32GB box for $6000.
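The per-GB comparison can be checked with a few lines of arithmetic. This sketch uses only the prices quoted above; the `cost_per_gb` helper is purely illustrative, and the upgrade figure is spread over the full 128GB as a simplification because the size of the default configuration is not given.

```python
def cost_per_gb(price_usd, memory_gb):
    """Price divided by memory capacity, in USD per GB."""
    return price_usd / memory_gb


slots = 32                     # memory slots in the PowerEdge R900
module_gb = 4                  # GB per memory module
total_gb = slots * module_gb   # 32 * 4 = 128 GB

big_box = cost_per_gb(15000, total_gb)      # 128GB box at the negotiated price
small_box = cost_per_gb(6000, 32)           # 32GB box
upgrade_only = cost_per_gb(9600, total_gb)  # memory upgrade alone, over all 128GB

print(f"128GB box:    ${big_box:.2f}/GB")
print(f"32GB box:     ${small_box:.2f}/GB")
print(f"upgrade only: ${upgrade_only:.2f}/GB")
```

On these numbers the 128GB box works out to roughly $117 per GB against $187.50 per GB for the 32GB box.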

So am I a scale-up advocate? No. But quite frequently systems are designed so that the working set fits in memory in order to perform well, and a box like this can allow a good consolidation factor for such applications, or let them delay sharding.

This box also has 4 CPU sockets, which means 16 fast cores and 128GB of memory are becoming commodity. Quite a challenge for MySQL to take on :)

I have not had a chance to play with such a box myself, apart from a couple of customer production installations, but it looks pretty sweet.


Entry posted by peter | 6 comments


The Ultimate Shell: Zsh (LinuxTOY) » Items shared by Che Dong in Google Reader

[Written by Kardinal]

As the saying goes, zsh: The last shell you’ll ever need! Z is the last letter of the alphabet, so it is the ultimate shell.

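I once came across an article comparing various shells. It rated Zsh's interactivity A+, far above the other shells, and rated Zsh about an A for programming, also the highest. For some reason, though, Zsh is seriously underrated.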


07:49 GeoCommons + Mapufacture: Consolidation in the Where 2.0 Space » O'Reilly Radar Chinese Site

Brady Forrest 2008-08-04


Today FortiusOne announced its acquisition of Mapufacture, the web's original geo-feed aggregator. FortiusOne is the creator of GeoCommons, a geo-data repository with analysis tools (Radar post). This acquisition brings FortiusOne both talent and the technology to handle third-party feeds.

Mapufacture was created by Andrew Turner and Mikel Maron, two well-known geohackers. In addition to collaborating on Mapufacture, they have worked on the Mapstraction JavaScript library (Radar post) and GeoPress, a mapping plugin for WordPress blogs (Radar post). Mikel is also known as the creator and champion of GeoRSS. Andrew Turner will be joining the company as CTO. Mikel will become an advisor.


At its core, Mapufacture is a map-making tool. Users can create maps by combining feeds or their own geodata sets. If you're not sure where to find a geofeed, check out Platial, Flickr, Yahoo! Pipes or any hosted KML file. The maps can be embedded on third-party sites. Mapufacture also provides timeline tools so that you can see how your data changes over time.

GeoCommons first launched at Where 2.0 in 2007. Later that year it went offline for a complete redesign. The company has slowly been releasing pieces of it back out. The first major component out the door is Finder!, which lets you search for datasets, bookmark them to your account and download them as KML, Shapefile or a spreadsheet. You can find datasets of almost any type, including some about employment, universities, or the internet.

Mapufacture will eventually be merged into GeoCommons; it will become Maker, a portion of GeoCommons that will let you make and share maps. This will complete the relaunch of GeoCommons' major features.

The importance of geodata often goes unnoticed by people who don't work with it every day. Until mashups and Google Earth, I doubt many people ever gave geodata a second thought. I hope that GeoCommons continues to bridge that gap. GeoCommons is going to provide us with easy-to-use, web-based tools that allow for powerful analysis, and a repository of geodata in open formats. Hopefully their tools and data will not go unnoticed.

05:49 ETech 2009 CFP: Living, Reinvented » O'Reilly Radar Chinese Site

Brady Forrest 2008-08-04


ETech's CFP has launched. The theme this year is Living, Reinvented: The Technology of Abundance and Constraints. To that end I spent time with MIT's Scratch Team (changing computer education) and the RoboScooter team (changing transportation). We're going to explore the following themes. Make sure that you get your submissions in by September 17th.

  • City Tech: Our cities are growing, getting bigger faster than ever before. People are rushing to them in search of economic and social opportunity—jobs, urban living, and access to culture. How can technology help us create livable, prosperous, sustainable cities? What should mass transit look like? How can we infuse urban infrastructure with sustainability? How are cities using citizens’ data to become smarter? What can economics tell us about the way urban populations will change and behave?

  • Materials & Mechanics: Mechanics and materials develop hand-in-hand. The creation of a new, lighter metal enables iPhones and Mars Explorers. We’ll examine the latest in mechanics and the materials that enable new developments. What mechanisms will be possible? How will the coming age of materials change our clothes, our products, and our everyday lives? Can they be made the cradle2cradle way or will we simply be clogging our landfills with ingenious, meticulously crafted waste?

  • Personalized Healthcare: Medical technology is something that almost everyone comes to rely on, whether it’s hopeful, preventive care in the form of resveratrol, or a new limb. In no other area does the industrialized world have more of an advantage. What legal framework for personal genomics balances innovation and appropriate medical caution? How is medicine changing? How is healthcare changing across the world? Many resources are focused on anti-aging technology and drugs. Is this the right direction?

  • Mobile & The Web: The next billion people will come to the Web via connected mobile devices. Currently, many of these devices are humble dumb clients, but the iPhone, Google, and Nokia are bringing smarter clients to the masses with open platforms. How will these mini-computers change our lives? How will these jumbo-sized sensors benefit us? Will we be able to use the third screen to view an augmented world? What data will be collected and who will have access to it? Is the Web ready for the Next Billion? What will their web apps look like?

  • Geek Family: Digital native mothers and fathers are starting their own families. How is that changing home technology? Education technology? What does the future geek home look like and how does it function?

  • Synthetic Biology: We can’t cover the reinvention of living without looking at the new definition of life. Synthetic biology, first pioneered in the 1970s, is becoming a factor in the development of new materials, medicines, environmental cleansing, and energy. How will this technology impact our lives? How can we be a part of it? What will bring it into the hands of the wider public?


But ETech isn’t just about “haves” and “have-nots.” Some people choose to live with constraints within the abundant world. What trends and innovations are emerging?


  • Nomadism & Shedworking: As cities and their suburbs rapidly increase their footprint, there are some who reject the crowded living conditions, but take advantage of the connectedness. They adopt a high-tech lifestyle within the constraints of a smaller space or take their possessions and their bits with them on the road, to the farthest reaches of the globe. How do they do this and what can we learn from them?

  • Sustainable Life: The American lifestyle is unsustainable. How do we move to a one-Earth economy? What are Europeans doing? Will Dubai be the trendsetter with its newest sustainable city? How will a renewed interest in environmental design affect us? Last year’s keynoter Alex Steffen posited that it would be technology driving the change, not a restriction of habits or an energy diet. Right now the abundant world is being changed by rising oil and medical costs, forcing change. What technology will break through?

  • Life Hacking & Information Overload: We are bombarded with too much information, but at least some of it is relevant. What are the tools that we can use to process it? How can we identify the subset we actually care about? How do we identify the necessary bits of information that make us more productive? Can we use cognitive science to help us deal with modern-day living? What does neuroscience tell us about our brains and how we should handle learning and processing? Will ubicomp be able to help us stave off the overload or will it hasten our doom?


I'll be helped by my committee. This year it includes Mike Walsh, Annalee Newitz, Natalie Jeremijenko, Matt Webb, Nat Torkington, Matt Jones, David Pescovitz, Timo Hannay and Kati London. Thanks!
