23:46 Scaling your cookie recipes » High Scalability - Building bigger, faster, more reliable websites.

This article on scaling cookie-baking recipes showed up in one of my keyword alerts. Lots of weird things show up in alerts, but I really like cookies and the parallels were just so delicious. Scaling in the cookie-baking world is the process of multiplying your recipe many times over to produce much more dough for many more cookies. It’s the difference between making enough dough in one batch for two dozen cookies, or for 2,000.

Hey, pretty close to the website notion. Yet, as any good cook knows, any scaled-up recipe must be tweaked a little, because things change at scale. Let's see what else we're supposed to do (quoted from the article):

  • Be Patient - When making large batches of cookies, the most important thing that you have to remember is not to rush.
  • Use Fresh Ingredients - This is always an important thing to keep in mind.
  • Don’t use as much leavening - When you’re making a large batch of cookie dough, remember to scale down the amount of baking powder that you use.
  • Watch the spice - As a general rule, do not be heavy handed with spice or salt when scaling up a recipe.
  • Have Fun - Like any other corny list of to-do’s I have to end on a corny note.

    With a little creativity you can make all sorts of interesting parallels between scaling websites and scaling cookies. I'll leave that to your ample imagination, as mine has been crushed by a virtual sugar buzz. But my afternoon-snack-sized thought for the day is: Relax. Eat more cookies.

    23:18 Event: CloudCamp Silicon Valley Unconference on 30th September » High Scalability - Building bigger, faster, more reliable websites.

    CloudCamp is an interesting unconference where early adopters of cloud computing technologies exchange ideas. With the rapid change occurring in the industry, we need a place we can meet to share our experiences, challenges and solutions. At CloudCamp, you are encouraged to share your thoughts in several open discussions, as we strive for the advancement of cloud computing. End users, IT professionals and vendors are all encouraged to participate.

    CloudCamp Silicon Valley 08 is scheduled for Tuesday, September 30, 2008, from 06:00 PM to 10:00 PM at Sun Microsystems' EBC Briefing Center
    15 Network Circle
    Menlo Park, CA 94025

    CloudCamp follows an interactive, unscripted unconference format. You can propose your own session or you can attend a session proposed by someone else. Either way, you are encouraged to engage in the discussion and “Vote with your feet”, which means … “find another session if you don’t find the session helpful”. Pick and choose from the conversations; rant and rave, or sit back and watch.

    At CloudCamp, we tend to discuss the following topics:

    * Infrastructure as a service (Joyent, Amazon EC2, Nirvanix, etc)
    * Platform as a service (BungeeLabs, AppEngine, etc)
    * Software as a service (salesforce.com)
    * Application / Data / Storage (development in the cloud)

    17:09 The food question (part two) » Fresh from chinadialogue

    The seeds of change are being sown, but what model should Africa's "green revolution" pursue? Alan Beattie argues that the continent's poverty and diversity call for a whole range of solutions.

    Seeking change: on the world's poorest continent, one of the most complex of problems is set for rapid transformation amid the food crisis.

    The first "green revolution" transformed agriculture in Asia and Latin America, where new crop varieties and plentiful fertiliser lifted farmers out of the subsistence trap. Thirty years on, Africa is trying to follow suit.

    Throughout the 1980s and 1990s, African agricultural productivity failed to keep pace with population growth. What growth there was came from planting more land, not from raising yields. Farmers, agronomists and development experts say that new technology alone will not bring a fundamental transformation, particularly in the short term. Improving markets and transport would help spread existing, under-used technologies and deliver gains more quickly.

    Opinions differ, however, on whether Africa should pursue an agribusiness model based on large commercial farms or concentrate on improving conditions for millions of smallholders. And the difficulties of transforming African agriculture are many-sided. Some are matters of geography: Africa's soils and climates vary enormously, from the Mediterranean climate of the Maghreb to the tropical and temperate zones of South Africa. Crops grown and techniques used in one place often cannot simply be transplanted to another.

    So what are the prospects for improvement, and will it help feed nearly a billion people? The cause moved forward in 2006 with the founding of the Alliance for a Green Revolution in Africa (AGRA), a coalition of farmers, agribusinesses, scientists and research institutions, backed by US$150 million from the Rockefeller and Gates foundations. The Rockefeller Foundation also played a key role in funding the first green revolution.

    Namanga Ngongi, president of the Nairobi-based AGRA, says that whereas Asian farming systems were dominated by similar varieties of wheat and rice, African crops span a far wider range, including cassava, sorghum, millet and maize. "One size will not fit all," Ngongi says.

    Mpoko Bokanga, executive director of the African Agricultural Technology Foundation (AATF), a public-private research body also based in Nairobi, points out that contrasts can be huge even within a single country. "In western Kenya, north of the Rift Valley, there are very fertile areas where farm productivity is high and the commercial potential is well developed," Bokanga says. "Fifty kilometres away are forgotten areas where farm yields are a third or a quarter of that."

    Despite some large river systems and plentiful rainfall in places, most of the continent farms at the mercy of the weather: less than 5% of Africa's arable land is irrigated, compared with 40% in south Asia.

    Developing new technology will take time. In recent decades Africa's agricultural research capacity has received almost as little attention as its soils: government budgets are stretched thin and funding for basic science has been cut sharply. And because Africa's agronomic conditions differ from those elsewhere, it is hard to borrow scientific breakthroughs aimed at other markets.

    The AATF, for example, has a project to develop "water-efficient maize" that can withstand longer droughts, a trait that will become ever more important if climate change makes rainfall more erratic, as it appears set to do. The foundation will receive the basic research free of charge from Monsanto, the American agribusiness group. The International Maize and Wheat Improvement Center (CIMMYT) in Mexico, a non-profit research institute that played a major role in the first green revolution, will then cross it into high-yielding maize varieties that thrive in tropical conditions. Those varieties will be passed on to African seed companies without royalty payments. Even so, Bokanga says, it will be five or six years before the varieties are ready for field trials.

    The green revolution will then have to confront one of the most contentious issues in global agriculture: genetically modified (GM) crops. African countries have been slow to accept GM crops. South Africa is the only one to have approved a GM variety, although Burkina Faso may be about to approve a cotton variety already widely grown in India, and Egypt is considering GM maize.

    Some of the antipathy to GM among African governments and activists is heartfelt. In 2002 Zambia, in the middle of a food crisis, refused GM grain offered as emergency aid for fear of contaminating local agriculture. The government even turned down help from the European Union, which itself has deep reservations about GM, to mill the grain before distribution so that it could not enter the farming system.

    Bokanga argues, however, that the opposition is exaggerated and that farmers are poorly informed rather than firmly opposed. "Not all African governments are against accepting GM," he says. "There are many opponents of biotechnology who make a lot of noise and capture the local media, and then the media outside conclude that farmers are opposed. Most farmers know nothing about GM."

    For those worried about environmental impact, he points out that herbicide-resistant GM maize allows soil-friendly "no-till" farming, with no need to turn the soil to control weeds. But given the testing and safety protocols that must first be drawn up, a serious test of some African states' capacity, widespread acceptance of GM, especially for food crops, looks to be at least a decade away.

    In the meantime, much can be done to extend the use of existing technology. In many African countries, particularly the poorer ones, the problem is not that fertiliser or better hybrid seed does not exist, but that a combination of poverty, weak private enterprise and poor marketing services keeps it from reaching farmers.

    In the 1970s, African governments supported agriculture through state purchasing and marketing boards, which bought up produce, subsidised all manner of fertiliser and seed, held strategic grain reserves against food crises and set target prices through official intervention. Seen as wasteful, corruption-breeding or downright harmful, most of these bodies were dismantled, often at the urging of the World Bank and other aid donors. (Similar institutions live on in European and American agriculture, however.) But the vacuum left by the state's withdrawal from the wholesale business has frequently not been filled by private enterprise, leaving farmers cut off from domestic and international markets.

    AGRA, for example, will spend US$40.5 million to build a network of 10,000 dealers selling fertiliser and other inputs in rural areas. Some countries, such as Malawi in southern Africa, are experimenting with "market-smart" subsidies designed to complement and stimulate private enterprise rather than replace it.

    The impact of a green revolution, and of access to food, on poverty is about more than raising yields, however. The form that growth takes, and the best way of sharing its benefits, are matters of some debate.

    Jon Maguire, a British investment fund manager, set up an "invest in Africa" fund after visiting Malawi, where he found villagers unable to bring in a harvest for lack of rain even though they lived on the shores of Lake Malawi. "I asked them why they did not invest in irrigation, and they told me no money had circulated in the villages for three years," he says. Knowing nothing about farming himself, he raised US$16 million, hired local farm managers and bought US$3.5 million worth of sprinklers and other irrigation equipment.

    He now runs a 2,500-acre farm, with another 9,000 "outgrower" families supplying it under contract. They sell paprika and bird's-eye chillies to markets around the world, including Spain. "The Spanish are astonished at the quality of the paprika," he says. Next year his farm plans to buy what he says will be Malawi's first combine harvester.

    Maguire's solution is large, export-oriented farms with heavy investment in irrigation. "The whole basis of agricultural development used to be: how do we help the smallholder?" he says. "In Africa you will never solve the problem that way. You need small and medium-sized enterprises orbiting large farms that pull them into the global economy. Our outgrowers now share in global prices and are actually benefiting from the food-price crisis." Many agronomists disagree. Glenn Denning, director of the Millennium Development Goals Centre in Kenya, says that in every poor country like Malawi the first step is to raise yields and improve conditions for smallholders, who need to secure their own food supply before they can diversify. "Smallholders have shown they can compete if they get the right inputs," he says. "That was the Asian green revolution." Once reserves are built up, farmers can move into cash crops. Raising smallholders' yields of basic grains also benefits the landless and the urban poor, by increasing supply and moderating food prices.

    In practice, the way new and existing technologies interact with Africa's economies and societies will be crucial, not only to whether a green revolution is technically feasible but to whether it brings broad benefits to Africa's poor. Andrew Dorward, an academic at the School of Oriental and African Studies (SOAS) in London, notes, for example, that adopting herbicide-resistant GM crops would be disastrous for many poor households: such crops need no hand-weeding, and for many people weeding is a major source of income.

    Left-wing critics of the green-revolution idea do not doubt that Africa can raise productivity through new varieties and inputs, but they argue that the benefits will flow to big companies and wealthy farmers. Raj Patel, a researcher at the left-leaning Institute for Food and Development Policy in the United States, recently told a congressional committee that programmes such as AGRA, "though perhaps well-intentioned, typify irresponsible and unsustainable investment in technology". He called instead for "greater adoption of, and research into, locally appropriate and democratically controlled agro-ecological methods".

    In the green-revolution debate, ask five different people and you will get seven different answers. Africa needs private input dealers; Africa needs water; Africa needs roads; Africa needs GM crops; Africa needs big farms; Africa needs small farms. The reality, on so diverse a continent, seems to be that Africa may need all of the above, and then some.

    Source: www.ft.com

    Copyright The Financial Times Limited, 2008

    Homepage image by World Resources Institute staff

    12:16 Fighting MySQL Replication Lag » MySQL Performance Blog

    The problem of MySQL replication being unable to catch up is quite common in the MySQL world, and in fact I have already written about it. There are many aspects to managing MySQL replication lag, such as using proper hardware and configuring it correctly. In this post I will look at a couple of query design mistakes that are the low-hanging fruit when troubleshooting MySQL replication lag.

    The first fact you absolutely need to remember is that MySQL replication is single threaded: if you have a long-running write query, it clogs the replication stream, and the small, fast updates that come after it in the MySQL binary log cannot proceed. It is about more than just individual queries. If you use explicit transactions, all updates from a transaction are buffered together and dumped to the binary log as one big chunk that cannot be interleaved with any other query execution. So splitting one large update into a transaction containing millions of simple updates will not help MySQL replication lag.

    This brings us to rule number one: if you care about replication latency, you must not have long-running updates, or transactions containing multiple update queries that add up to a long time. I would keep the maximum query length at about one fifth of the maximum replication lag you are willing to tolerate. So if you want your replica to be no more than one minute behind, keep the longest update query to roughly 10 seconds. This is of course a rule of thumb; depending on differences in master/slave configuration, their load and concurrency, you may need a larger margin or be able to allow somewhat longer queries.

    What should you do if you need to update a lot of rows? Use query chopping: run the UPDATE/DELETE with a LIMIT in a loop, cap the number of values per batch in multi-row INSERT statements, or fetch the rows you plan to update/delete first and issue several smaller queries to modify them (see the examples below).
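    As a concrete illustration of the LIMIT-in-a-loop form of query chopping, here is a minimal Python sketch using the MySQLdb driver; the connection settings, the posts table and the batch size are assumptions for the example, not from the original post.

import time
import MySQLdb

# Hypothetical connection settings; adjust for your environment.
conn = MySQLdb.connect(host="master", user="app", passwd="secret", db="blog")
cur = conn.cursor()

BATCH = 10000  # keep each write small so it replicates quickly

while True:
    # LIMIT turns one huge delete into many short statements the slave can keep up with.
    cur.execute("DELETE FROM posts WHERE spam = 1 LIMIT %d" % BATCH)
    conn.commit()
    if cur.rowcount < BATCH:   # nothing (or almost nothing) left to delete
        break
    time.sleep(0.5)            # give the replication thread some breathing room

cur.close()
conn.close()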

    This brings us to yet another rule for smart replication: do not make the slave do more work than it needs to. It is already crippled by having to apply everything in a single thread, so do not make it even harder. If considerable effort is needed to select the rows for modification, spread the work out: let the application run the SELECT and send the slave only the resulting UPDATE.
    Example:

    SQL:
    UPDATE posts SET spam=1 WHERE body LIKE "%cheap rolex%";

    This query will perform a full table scan in MySQL 5.0 (even if there are no spam posts), which will load the slave significantly. You can replace it with:

    SQL:
    SELECT id FROM posts WHERE body LIKE "%cheap rolex%";

    UPDATE posts SET spam=1 WHERE id IN (list of ids);

    If many ids could be matched in the first place, you should also use query chopping and run the update in chunks, if the application allows it.
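    For instance, a minimal Python/MySQLdb sketch of chopping the UPDATE over a long id list might look like the following; the posts table and the chunk size of 1,000 are assumptions for illustration.

# Chop an UPDATE over a long list of ids into bounded IN () batches.
# Assumes `conn` is an open MySQLdb connection; chunk size is arbitrary.
def mark_spam_in_chunks(conn, ids, chunk=1000):
    cur = conn.cursor()
    for start in range(0, len(ids), chunk):
        batch = ids[start:start + chunk]
        placeholders = ",".join(["%s"] * len(batch))
        # Each statement touches at most `chunk` rows, so it replicates quickly.
        cur.execute("UPDATE posts SET spam = 1 WHERE id IN (%s)" % placeholders,
                    tuple(batch))
        conn.commit()
    cur.close()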

    In MySQL 5.1 with row-based replication the selection process will not run on the slave, but row-based replication will not do the chopping for you.

    In general this trick works well not only for full-table-scan updates but for any case where far more rows are examined than modified.

    The next common mistake is using INSERT ... SELECT, which is similar to what I just described but can be much worse, as the SELECT may end up being an extremely complicated query. It is best to avoid sending INSERT ... SELECT through replication in 5.0 for many reasons (locking, long query time, wasted execution on the slave). Piping the data through the application is the best solution in many cases, and it is quite easy: it is trivial to write a function that takes a SELECT query and the table in which to store its result set, and to use it in your application wherever you need this functionality.
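    A sketch of such a helper function in Python with MySQLdb is shown below; the function name, the target table and the batch size are made up for the example, and the column list of the SELECT is assumed to match the target table.

# Pipe a SELECT's result set into another table through the application,
# instead of running INSERT ... SELECT on the master.
# Assumes `conn` is an open MySQLdb connection; names and batch size are examples.
def copy_select_into(conn, select_sql, target_table, batch=500):
    read_cur = conn.cursor()
    write_cur = conn.cursor()
    read_cur.execute(select_sql)
    while True:
        rows = read_cur.fetchmany(batch)
        if not rows:
            break
        placeholders = "(" + ",".join(["%s"] * len(rows[0])) + ")"
        # Each small multi-row INSERT goes through replication on its own.
        write_cur.executemany(
            "INSERT INTO %s VALUES %s" % (target_table, placeholders), rows)
        conn.commit()
    read_cur.close()
    write_cur.close()

# Usage: copy_select_into(conn, "SELECT id, body FROM posts WHERE spam = 1", "spam_archive")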

    Finally, do not overload your replication. Quite typically I see replication lagging while batch jobs are running. These can load the master significantly during their run time and make it impossible for the slave to push the same workload through a single thread. The solution in many cases is simply to space the work out and slow down your batch job (for example, by adding sleep calls) to ensure there is enough breathing room for the replication thread.

    You can also use controlled execution of batch jobs: the job checks slave lag every so often and pauses if the lag becomes too large. This is a slightly more complicated approach, but it saves you from constantly tuning your sleep intervals to keep progress fast enough while keeping replication from lagging.
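    A minimal sketch of such a lag check in Python with MySQLdb follows; the 30-second threshold and 5-second poll interval are arbitrary examples, and the function assumes a separate connection to the slave.

import time
import MySQLdb
import MySQLdb.cursors

# Block until the slave's Seconds_Behind_Master drops below max_lag.
# Call this from the batch job between chunks of work.
def wait_for_slave(slave_conn, max_lag=30, poll=5):
    cur = slave_conn.cursor(MySQLdb.cursors.DictCursor)
    try:
        while True:
            cur.execute("SHOW SLAVE STATUS")
            status = cur.fetchone()
            lag = status.get("Seconds_Behind_Master") if status else None
            if lag is not None and lag < max_lag:
                return
            time.sleep(poll)   # pause the batch job and let replication catch up
    finally:
        cur.close()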

    In many of the bad replication lags I have seen, simply following these simple rules would have avoided a lot of problems, and often would have saved massive hardware purchases or development efforts undertaken on the assumption that MySQL replication could not possibly keep up any more.


    Entry posted by peter | No comment


    11:43 Cloud computing, grid computing, utility computing - list of top providers » High Scalability - Building bigger, faster, more reliable websites.

    You want to have a scalable website. You want a website that can handle traffic spikes (think what happens when you hit the front page of Digg, Slashdot, Reddit, TechCrunch or another very popular website).

    Regular hosting companies (especially shared hosting) can offer only so much. The servers usually get crushed under the load in a short time.

    But there is hope. A new breed of hosting companies has emerged recently, one that can offer you the scalability you need at a fraction of the cost.

    Welcome to the world of “cloud computing!” (or “grid computing” or “utility computing”, which are terms for the same thing).

    Here's a website which compiled a list of cloud computing hosting companies (with short descriptions, prices and customer lists for each of them).


    Read the entire article about the cloud computing, grid computing and utility computing provider list at MyTestBox.com - web software reviews, news, tips & tricks.

    11:34 The Impact of User Feedback, Part 2 » Google Chinese Webmaster Blog
    The Impact of User Feedback, Part 2 (and more Popular Picks!)
    Posted: Tuesday, August 26, 2008, 4:41 PM

    We previously published "The Impact of User Feedback, Part 1", describing how user reports of spammy results and paid links help Google improve the quality of search results for everyone. As a follow-up, we would like to highlight one of the most important parts of Google Webmaster Central: the Webmaster Help Forum. Besides English, the forum now supports 15 other languages, including Chinese, and the English forum alone already has more than 37,000 registered users. If you have questions about crawling, indexing or Webmaster Tools, the forum is an ideal place to get them answered. We especially want to thank the power users who generously volunteer their time and energy in the forum; they are what makes it an increasingly popular place. Googlers, myself included, also join the discussions when appropriate to clarify issues. One point worth making: we do our best to read the great majority of the posts in the forum, and although we cannot reply to every one, your feedback, suggestions and comments are always a driving force behind improvements to our products. Here are some concrete examples:

    Sitemap details
    Submitting a Sitemap through Webmaster Tools is an effective way to let Google know which pages exist on your site. Users in the Webmaster Help Forum were quick to point out that even after submitting a Sitemap containing all of their pages, a site: search showed only some of those pages indexed. In response, the Webmaster Tools team built a Sitemap details feature so that you can better understand how your submitted Sitemap is being processed. You can read Jonathan's blog post to learn more about Sitemap details.

    Context-sensitive help
    We had long been receiving suggestions asking for better explanations and documentation of the data shown in Webmaster Tools. We noticed that many questions in the forum concerned the "meta description" and "title tag" issues reported by the content analysis tool, which prompted us to strengthen the documentation on that page and link it directly to the Webmaster Help Center. Similarly, we found that users urgently wanted us to clarify the difference between "top search queries" and "top clicked queries", and how such data should be used. So we added an expandable section titled "How do I use this data?" and added context-sensitive help throughout Webmaster Tools to better explain what each feature is for and where to learn more about it.

    Blog posts
    The Webmaster Help Forum is also an important way for us to understand and keep track of the issues webmasters care about, so that we can explain and clarify them in blog posts. Whether the topic is requesting reconsideration through Webmaster Tools, best practices for handling duplicate content and site moves, or how to make information more accessible and crawlable, we keep listening to what you tell us. Which brings us to...

    Now it's time for another round of Popular Picks!
    Last year we spent two weeks collecting and answering the five questions webmasters cared about most. Those Popular Picks covered the following topics:

    Since the last round was so well received, I'm happy to announce that we are holding another one (translator's note: this round will be conducted in English). You can browse and reply to this thread to raise the webmaster-related questions you care about most. See you there!

    00:31 Paper: On Delivering Embarrassingly Distributed Cloud Services » High Scalability - Building bigger, faster, more reliable websites.

    How do we scale datacenters? Should we build a few mammoth million-machine datacenters or many smaller micro datacenters? Intuitively we usually go with a bigger-is-better, economies-of-scale type of argument, but it may not be so. What works for Walmart may not work for White Box World. Mega datacenters may actually exhibit diseconomies of scale. It may be better to run applications over many distributed micro datacenters instead of one large one.

    This paper by Ken Church, Albert Greenberg, and James Hamilton, all from Microsoft, takes a look at the different issues and concludes:

    Putting it all together, the micro model offers a design point with attractive performance, reliability, scale and cost. Given how much the industry is currently investing in the mega model, the industry would do well to consider the micro alternative.

    read more

