23:08 KISS KISS KISS » MySQL Performance Blog

When I visit customers, they quite often tell me about the creative techniques they heard about at conferences or read about on blogs, forums, and in Internet articles, and they ask me whether they should use them. My advice is frequently: do not. It is fun to be creative, but creative solutions are also unproven, and people who had to get creative with their systems usually did so because they had no choice. Then they went to a bunch of conferences and told their story, which resonated across the Internet and stuck in people's minds as a good practice.

There are two things you should ask yourself. First, is the scale comparable? The recipes from Facebook, YouTube, and Yahoo are not good for something like 99.9% of applications, because those applications are not even remotely close in size, and so in capacity requirements. Second, was this "smart thing" a truly thought-out architecture choice from the beginning, or was it a choice made within code-base constraints they had, constraints you might not have?

Let me look into a couple of the most typical obsessions.

Sharding - This is perhaps the technology people get obsessed with most regularly. Sometimes it looks as if a homepage backed by a 100K database and visited by 100 people a month is being sharded. Remember, as commodity hardware advances, the size at which an application really needs to shard moves further and further out. I remember LiveJournal sharding five years ago with 4GB of memory per box... well, now you can get a box with 128GB of RAM for under $15K. Keeping the working set in memory is not the only reason for sharding, but it is one of the most frequent ones. The examples I like to use are YouTube, which did not shard until after Google bought them (though they were in pain), and 37signals.

When doing a Performance Audit, we tend to look at the required capacity and data size within the current planning horizon. In many cases, even with super-optimistic assumptions, the application will do just fine with a single "cluster" on current hardware for several years.
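To make the "fine for several years" check concrete, here is a minimal back-of-the-envelope sketch. All the inputs (current working set, growth rate) are made-up assumptions you would replace with your own measurements; the 128GB box is the one mentioned above.

```python
# Back-of-the-envelope check: does the working set still fit in RAM
# over the planning horizon? All inputs below are illustrative assumptions.

def working_set_after(years, current_gb, annual_growth):
    """Projected working-set size after `years` of compound growth."""
    return current_gb * (1.0 + annual_growth) ** years

ram_gb = 128        # the commodity box quoted in the post
current_gb = 20.0   # today's hot data (assumption)
growth = 0.8        # 80% growth per year, deliberately super-optimistic

for year in range(1, 4):
    ws = working_set_after(year, current_gb, growth)
    print(f"year {year}: {ws:.1f} GB, fits in RAM: {ws <= ram_gb}")
```

Even at 80% annual growth, the working set (36, 64.8, then 116.6 GB) stays inside a single 128GB box for three years, which is the kind of result that argues against sharding prematurely.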

Replication Optimization - People often get scared by the fact that replication is single-threaded and often becomes a bottleneck, so they use various optimizations, including the tricky prefetch approaches suggested by YouTube. Interestingly enough, this often happens even when the system is far from reaching its replication capacity.

I would suggest measuring and monitoring your replication capacity (how long will it take a slave to catch up a one-hour lag of peak traffic?) and acting appropriately. Also focus on simple optimizations first; if you need to resort to prefetch, you are quite likely beyond reasonable use of a single master and should have done sharding, functional partitioning, or something else already.
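The catch-up question above is simple arithmetic once you have measured two rates. This is a sketch with hypothetical numbers; in practice you would derive the rates from your own monitoring, for example by watching how `Seconds_Behind_Master` changes over time.

```python
# Estimate how long a slave needs to catch up a backlog of peak traffic.
# Rates are hypothetical; measure them on your own system.

def catchup_time(lag_seconds, master_write_rate, slave_replay_rate):
    """Seconds for the slave to catch up, assuming constant rates.

    lag_seconds: how far behind the slave is (3600 for a 1-hour lag)
    master_write_rate: writes/second the master keeps producing
    slave_replay_rate: writes/second the single replication thread can apply
    """
    if slave_replay_rate <= master_write_rate:
        return float("inf")  # the slave never catches up
    backlog = lag_seconds * master_write_rate
    return backlog / (slave_replay_rate - master_write_rate)

# Example: 1 hour behind, master writes 500/s at peak, slave replays 800/s.
print(catchup_time(3600, 500, 800))  # 6000.0 seconds, i.e. about 100 minutes
```

If the replay rate barely exceeds the peak write rate, catch-up time explodes; that, rather than any clever prefetch trick, is the number to watch.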

Complex Replication - There are impressive numbers out there about how many slaves people run and how complex the replication topologies people use are, with multiple tiers, filtering, and writes to intermediary slaves. For me, simple is best. Complex architectures are more error-prone, harder to maintain (upgrades, etc.), and harder to troubleshoot. Remember, for every single "role" in such a setup you need to understand what to do with it if any other "role" in the system fails, which escalates complexity. You may need something more advanced than a master and one slave, but any complication needs to be justified. I should also note that slaves are not overly efficient beasts: not only do they store a copy of the data on disk, wasting resources, but their caches are also highly redundant, defeating the fact that you may have a lot of total memory in the slave farm.

Reading from the Slaves - The story typically heard is: web applications often have significantly prevailing reads, so to scale we had better have many slaves which we can use to handle most of our read traffic, right? Sure. Unless you're using memcached or another caching option. Successful memcached implementations often report a 90% cache hit ratio, meaning a 10-to-1 read-to-write ratio drops back to one-to-one. This means you may not need a lot of slaves if your application allows the use of efficient caching.
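The 10-to-1 claim above is worth writing down as a formula, since it is the whole argument against a big slave farm. A trivial sketch (the ratios are the ones quoted in the post):

```python
# How many reads per write still reach the database after caching.

def db_read_write_ratio(read_write_ratio, cache_hit_ratio):
    """Reads per write that miss the cache and hit the database."""
    return read_write_ratio * (1.0 - cache_hit_ratio)

# A 10:1 read/write workload with a 90% memcached hit rate:
print(db_read_write_ratio(10, 0.90))  # about 1.0 reads per write
```

With the database seeing roughly one read per write, the read traffic no longer justifies a fleet of read slaves; one replica kept for failover can absorb it.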

Now let's look at a simplified case: you have a pair of servers replicating master-master, which you typically want for high availability and online schema changes. How far do you want to go in making your application capable of reading from the slave? Remember, since you're doing this for high availability and online schema changes, you're planning to operate without a slave every so often, meaning one server should be able to handle all the traffic from a capacity-planning standpoint anyway. At the same time, the slave can be a perfect place for things that do not impact production, such as analytics.

High Availability - The trick with high availability is that the more complex the architectures and processes you use to prevent downtime, the more likely they are to fail themselves. Unless you're at Google scale, with failures happening daily, you can't really be sure you handle "wild" failures, not just test ones, well. Furthermore, you always have to look at failures caused by other things: wrong code pushed to production, a hacker break-in, a data-center power failure, and so on.

The Google guys tell us a single MySQL server on good hardware has an MTBF somewhere between 1,000 and 2,000 days. That is a lot of time, which means for most applications having a pair of slaves (even though the second slave is available for failover only 99% of the time) is more than enough.
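To see why the quoted MTBF makes a simple pair "more than enough", here is a steady-state availability sketch. The MTBF and the 99% failover readiness come from the post; the repair time is an assumption I am adding for illustration.

```python
# Rough availability estimate for a master with one failover slave.
# MTBF range and 99% slave readiness are from the post; MTTR is assumed.

def availability(mtbf_days, mttr_days):
    """Steady-state fraction of time a single server is up."""
    return mtbf_days / (mtbf_days + mttr_days)

mtbf = 1500.0   # days; midpoint of the 1000-2000 day range quoted above
mttr = 0.25     # days (~6 hours to fail over or repair); an assumption

single = availability(mtbf, mttr)
# The pair is down only when the master is down AND the slave happens
# to be among the 1% of time it is not ready for failover:
pair = 1.0 - (1.0 - single) * (1.0 - 0.99)
print(f"single server: {single:.6f}")
print(f"with failover: {pair:.8f}")
```

Even with these crude numbers, a lone server already sits near "three nines", and the 99%-ready slave pushes the pair toward five nines, which is why piling on further redundancy rarely pays for its own complexity.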

I would say more. In my experience, the availability of the application is related to MySQL redundancy only for very high-quality, high-scale applications. I've seen applications run for years with no downtime on a single MySQL server (which just does not crash), as well as complex no-single-point-of-failure database backends with the application constantly going down because of bad code or something unpredicted.

Summary: So am I denying all MySQL industry practices (which we also covered in great depth in our book)? Not really. I'm just suggesting that you do not simply grab advice from the Internet or a friend's tip, and that you do not complicate beyond the need. You may start with a couple of replicated nodes for high availability and maintenance if you're running a serious business (or just one server and good point-in-time backups if you're on a budget) and then assess the need for any complications. It may be boring, but boring systems often have the highest uptime :)


Entry posted by peter | One comment


17:51 The SEChina Mailing List: From One Small Tree to a Forest » 大学小容 > Use the network well, and help each other grow!

This is a post promoting the SEChina mailing list. The address of this Google Group is:

http://groups.google.com/group/SEChina

On the 26th, the group's name was changed to Social Entrepreneur Club (社会企业家俱乐部); its previous name was Social Enterprise Translation Project (社会企业英文翻译志愿项目).

The group's description is currently a single sentence: "From one small tree to a forest, let us join hands to build a community of Chinese social entrepreneurs." The group's website link is: http://wiki.1kg.org/index.php?title=SE

- A small tree planted by 多背一公斤 ("Carry One More Kilogram")

The mailing list was originally created by the 多背一公斤 team in October 2006 so that friends participating in the "Social Enterprise Translation Project" could communicate with one another. The idea behind the project was as follows:

Social enterprise is still a very new topic in today's China, yet Chinese-language materials on the subject are extremely scarce, which makes learning and practice difficult. We have therefore collected some English materials, focusing on the topics we care most about at the moment: social-enterprise business plans, profitability, and strategic thinking. By translating these materials, we hope to give friends who care about social enterprise more inspiration. You are welcome to join our translation project; the bar is not high, and the translation does not need to be precise, as long as the main ideas come through clearly.

In April 2007, Oliver sent them the Webridge translation-blog survey questionnaire to learn more about the translation project. The project runs on a wiki and uses a volunteer-translation model, the same approach as a translation blog. Project member Yuming Xie wrote up the answers and sent them to Oliver on April 18, 2007. You can see the completed questionnaire at the address below.

Webridge Translation Blog Survey (Social Enterprise Translation Project: completed questionnaire)
http://docs.google.com/Doc?id=ahnkcmfbz8b_236fdp68ghb

From "First Book" to "Twinbooks" (双子书)

The first deliverable of the "Social Enterprise Translation Project" was a translation of the FIRST BOOK MARKETPLACE BUSINESS PLAN (translation page here), with source material from the Yale School of Management and The Goldman Sachs Foundation. This 32-page document describes in detail the operation of First Book Marketplace, a social-enterprise project.

多背一公斤 conceived the "Twinbooks" (双子书) project in 2006, planning to provide free books to rural children in a social-enterprise way: for every book sold in the city, one book would be donated to the countryside. The "Twinbooks 2007 Milestones" page shows how the project was created.

In "January Ideas: China Clone 2.0", Oliver mentioned "translating and introducing foreign public-interest communication and social public-affairs activities". Translated case studies are enormously helpful to friends who aspire to work in social enterprise, from the initial project idea to day-to-day operations; there is much to borrow from others' experience. Presumably this FIRST BOOK MARKETPLACE BUSINESS PLAN was a very valuable reference during the creation of the "Twinbooks" project.

- How does a small tree grow into a forest?

One tree does not make a forest; only a sustained, collective tree-planting movement over many years can create one. Yet the power of a single small tree in the desert should not be underestimated: its tenacious vitality demonstrates a possibility, touches and inspires people, and calls on them to join the planting.

02:52 The Worse the Crisis, the Stronger We Become ... Starting Over » 刘润

- What does not kill us makes us stronger

"Four days, one hundred and twenty kilometers, with only a few bottles of water and some cucumbers, tomatoes, and flatbread each day. We walked on like that across the desolate Gobi, tasting the solitude of a lone traveler like Xuanzang. Every night everyone had to lance the blisters on their feet and bind them with bandages; otherwise the next day would bring needle-sharp pain. After four days I lost three toenails. One girl got lost, and when she was found she cried her heart out. A dozen vehicles followed us, yet incredibly, most people walked the entire route step by step. It was an absolute test of the limits of our minds, willpower, and courage."

Early morning, the Sheraton breakfast room. Through the floor-to-ceiling glass the city looks peaceful. Peaceful to the point of haze, even a little lifeless. Wang Zhi is telling, with great feeling, the story of representing Tsinghua's business school in the 2006 "Business School Gobi Trekking Challenge". I was stunned, truly; I, who sometimes walk fewer than fifty steps in a day, was shaken just like that. As he spoke, the giant Buddha of Dunhuang grew clearer and clearer in my mind, calling again and again. Suddenly I felt an unstoppable, mad longing to stand on the soil of the Gobi, right now, irresistible!

Half an hour earlier I had still been in my warm bed. No surprises: life on a business trip means talking all day and writing and answering mail all night. Last night ran past midnight again, and today I had to get up early again. Half asleep, I picked up the phone, thanked the sweet-voiced operator for the punctual wake-up call, slowly sat up from the comfortable "Sweet Sleeper" bed, picked up a vitamin-C effervescent tablet, and dropped it into the crystal glass of mineral water waiting on the desk. Like a dementia patient I watched the bubbles fizz out, sighed without meaning to, and then drank it down in one go...

... Another day had begun.

I can no longer say whether I look forward to the day or long for the night. A hotel; which city it is in does not matter, they are all the same. Ten years of busyness, day in and day out: if it does not drive you mad, it numbs you. Yes, this is my tenth year at Microsoft. I act more and more aggressive and professional, perhaps precisely because there is a little less of the very first dream, and that tiny bit of passion? People have kept teaching me this for ten years, but slow learner that I am, I only understood it last of all: the work is never finished. I often sit blank in a thinking pose, using a stretch of emptiness to wash away the wreckage of business, business, business left behind by the petty grudges of big shots A, B, C, and D and the small schemes of projects one, two, three, and four. Stephen Covey's voice rings out like a well-timed blow to the head: BE PROACTIVE. So I shave clean with my five-blade power razor, unplug all my gear from the chargers, face the future, pick up the calm, energy-filled face, put it on, and begin a new day.

And the Gobi, from that moment on, has never stopped calling. Excited, I tell everyone around me. And so I know: in 2009, the year hit hardest by the financial crisis, I will certainly make a trip to the Gobi. The worse the crisis, the stronger we must become, and the more we must forge an unbreakable will, to taste the happiness of a sip of water, the pride of lancing feet full of blisters, the purity of washing off a body covered in mud, the glory of dying once and being reborn...

... A nirvana of one hundred and twenty kilometers in four days ...

... And then, starting over.

 

Note: if any kindred spirits would like to go together in May, please contact me.


^==Back Home: www.chedong.com

^==Back Digest Home: www.chedong.com/digest/
