Twitter Systems Operations Experience - Tim [Backend Technology] | 08 Feb 2010

09:39 Upcoming Google Analytics Workshop » Google Analytics Blog

Feras Alhlou, president and principal consultant of E-Nor, a Google Analytics Authorized Consultant, will be conducting a Google Analytics workshop at SMX West on March 5, called "Using Google Analytics to Improve Your Online Marketing & Business." Register here and get 10% off with this discount code: GA@SMX

Many organizations, large and small, struggle to go beyond basic web metrics such as visitor counts and pageview volumes. In this workshop, senior consultant Feras Alhlou will walk you through what you need in order to drive your online marketing strategy ahead of your competition. I've had the pleasure of sitting in on Feras' workshops in the past, and they were one of the main reasons we were so proud when E-Nor in the Bay Area joined our Google Analytics Authorized Consultant network. His clear, engaging and energetic teaching style will make the day fly by as you learn. It's time well spent.

Workshop Agenda

Morning Session – Marketer/Business Focus – Strategy & Planning

Web Analytics Strategy – approach, opportunities and limitations
How It Works – overview, accuracy and privacy implications, integrating with other data
Practical – understanding the user interface
Advanced Features Overview – clever stuff you can do with Google Analytics

Afternoon Session – Webmaster/Technical Focus – Implementation

Accounts & Profiles, Filters & Goals – structure your data properly
External Campaign Tracking – measure performance of search, email, banner campaigns
Reporting – dashboards & insights
Advanced Segmentation & Custom Reports – powerful ways to find insights

The Google Analytics team will also be at SMX West, and we'll blog about that in the near future. Hope to see you there.

Posted by Jeff Gillis, Google Analytics Team

Twitter Systems Operations Experience - Tim [Backend Technology] » 车东's shared items in Google Reader

Shared by Peter.Liu

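I recently watched another video about Twitter's technology [Slides] [Video (GFWed)]. It is a talk given by Twitter's John Adams at Velocity 2009, mainly covering Twitter's experience with systems operations. Most of the points collected in this post were already posted on Twitter (@xmpp); here they are gathered in one place and filled out.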

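Twitter does not own its hardware. All of it is provided by NTTA, which also handles hardware-related services such as networking, bandwidth, and load balancing. The Twitter operations team focuses only on core concerns, including performance, availability, capacity planning, and configuration management. This probably differs from the typical Internet company in China.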

1. Operations experience

* Metrics

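Twitter's monitoring console consists almost entirely of graphs (critical metrics), much like the tachometer and speedometer in a cockpit, so operators can grasp the current state of the system at a glance. By comparison, the monitoring consoles we have built hold plenty of data, but viewers often still have to do a second round of analysis and judgment themselves. We have not gone far enough with this full-screen-of-graphs approach, and there is something to learn here. According to John, the graphs reveal the system's bottleneck, that is, its weakest link (web, mq, cache, db?).
With the graphs, capacity planning can be done on a data-driven basis instead of firefighting after the fact. [Figure: Twitter operation dashboard]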

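* Configuration management

Every system needs an automated configuration management system, and the earlier the better. This point drew a lot of responses as soon as I posted it on Twitter.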

* Darkmode

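The configuration interface can enable or disable features that are computationally expensive or I/O-heavy. This amounts to graceful degradation: when the system is under too much load, non-core but resource-hungry features are switched off.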
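
The toggle idea can be illustrated with a minimal sketch. Everything named here (`FeatureFlags`, `render_timeline`, the `related_search` flag) is hypothetical and only shows the general pattern; the talk does not describe how Twitter implements it.

```python
class FeatureFlags:
    """In-memory switchboard for expensive, non-core features (hypothetical)."""

    def __init__(self, flags):
        self._flags = dict(flags)

    def enable(self, name):
        self._flags[name] = True

    def disable(self, name):
        self._flags[name] = False

    def is_enabled(self, name):
        # Unknown features default to off, the safe choice under load.
        return self._flags.get(name, False)


def expensive_related_search(tweets):
    # Stand-in for a CPU- or I/O-heavy computation.
    return sorted(set(tweets))


def render_timeline(flags, tweets):
    """Always serve the core content; add the costly extra only if enabled."""
    page = {"tweets": list(tweets)}
    if flags.is_enabled("related_search"):
        page["related"] = expensive_related_search(tweets)
    return page
```

An operator would call `flags.disable("related_search")` from the configuration interface when load is high; the core timeline keeps rendering without the expensive extra.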

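* Process management

Twitter wrote a "Seppaku" patch, which makes a daemon kill itself after it has served n requests so that it stays in a healthy low-memory state. As far as I know, quite a few companies in China do the same.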
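
A minimal sketch of the exit-after-n-requests idea, assuming an external supervisor restarts the process once it exits. The function name and parameters are hypothetical; this is not Twitter's patch.

```python
def serve_until_limit(requests, handle, max_requests):
    """Handle requests one by one and stop after max_requests of them.

    Returns the number of requests served. A real daemon would exit the
    process at this point (for example with sys.exit(0)) and rely on a
    supervisor to start a fresh, low-memory replacement.
    """
    served = 0
    for request in requests:
        handle(request)
        served += 1
        if served >= max_requests:
            break  # voluntary shutdown: memory is reclaimed when the process exits
    return served
```

Restarting on a request count rather than on measured memory keeps the rule simple and predictable, at the cost of occasionally recycling a worker that was still healthy.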

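* Hardware

After switching its CPUs from AMD to Xeon, Twitter saw a 30% performance gain. After moving from dual-core/quad-core to 8-core CPUs, it saw a 40% reduction in CPU. John also noted, however, that this kind of upgrade is not suitable for companies that buy their own hardware.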

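2. Code collaboration experience

* Review policy

Twitter has more than a hundred modules. Without a good process, code changes easily conflict and the resulting problems reach end users. Twitter therefore enforces a mandatory source code review policy: if the svn comment of a commit does not contain "reviewed by xxx", the pre-commit script makes the commit fail. Once reviewed code is committed, the automated configuration management system rolls it out to more than a hundred servers. @xiaomics immediately asked on Twitter whether the time cost is acceptable, and what happens when a feature is urgent. In my view, for an urgent change it is not hard to have two people present, one making the change and one reviewing it.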
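
The commit-message check can be sketched as follows. This is an illustration only: the exact pattern and the function names are assumptions, and Twitter's actual script is not shown in the talk. In a Subversion pre-commit hook, the log message of the pending transaction can be read with `svnlook log -t TXN REPOS`, and a non-zero exit code rejects the commit.

```python
import re

# Assumed convention: the message must contain "reviewed by <name>".
REVIEW_TAG = re.compile(r"reviewed by\s+\S+", re.IGNORECASE)


def has_review_tag(message):
    """Return True if the commit message names a reviewer."""
    return REVIEW_TAG.search(message) is not None


def hook_exit_code(message):
    """Exit code for a pre-commit hook: 0 accepts the commit, 1 rejects it."""
    return 0 if has_review_tag(message) else 1
```

A hook script would pass the output of `svnlook log` to `hook_exit_code` and exit with the result.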

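* Deployment management

The deployment graphs show how CPU and latency change with each release. If the latency graph jumps upward noticeably for a new release, that release has a problem. In addition, the monitoring home page lists when each module was last deployed, which gives a clear view of the current state of the codebase.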
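
The "upward jump" that an operator looks for on the graph can be approximated in code. The sketch below compares mean latency before and after a deploy; the function name and the 20% threshold are assumptions chosen for illustration, not something described in the talk.

```python
from statistics import mean


def latency_jumped(before_ms, after_ms, threshold=1.2):
    """Return True if mean latency after a deploy exceeds the mean before
    it by more than the given factor (1.2 means a 20% increase)."""
    return mean(after_ms) > mean(before_ms) * threshold
```

For example, `latency_jumped([100, 110, 105], [150, 160, 155])` flags the release, while small fluctuations around the earlier mean do not.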

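* Team communication

The team uses Campfire to work together. Campfire is somewhat like a group chat, but better suited to collaborative work. I will not describe Campfire further here; see the official Campfire documentation.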

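3. Cache

Memcache key hash: Twitter uses FNV hash instead of MD5 hash, because FNV is faster.
Twitter developed the Cache Money plugin (Ruby), which gives applications a read-through, write-through cache. It acts like a hook on database access: reads and writes to the database update the cache automatically, which avoids tedious cache-update code.
"Evictions make the cache unreliable for important configuration data." One lesson from Twitter's use of memcache is that different types of data should be kept in different memcached instances to avoid evictions. This matches some of the findings in the author's earlier post, "Analysis of memcached data being evicted (evictions>0)".
Memcached SEGVs: a memcached crash (the cold cache problem) is said to be disastrous for a Web 2.0 system that depends this heavily on its cache. I do not know exactly how Twitter deals with it.
At the web tier, Twitter uses Varnish as a reverse proxy and speaks highly of it.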
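
Two of the cache points above can be sketched in a few lines. The source does not say which FNV variant Twitter uses, so the example assumes 32-bit FNV-1a. The `pick_server` helper and the `ReadWriteThroughStore` class are hypothetical illustrations of key hashing and of the read-through/write-through pattern; they are not Cache Money's API, and plain dictionaries stand in for the database and memcached.

```python
FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193


def fnv1a_32(data):
    """32-bit FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF
    return h


def pick_server(key, servers):
    """Map a cache key to one server by hashing it (simple modulo scheme)."""
    return servers[fnv1a_32(key.encode("utf-8")) % len(servers)]


class ReadWriteThroughStore:
    """Wraps a database so that reads and writes keep the cache in sync."""

    def __init__(self, db, cache):
        self.db = db        # dict standing in for the database
        self.cache = cache  # dict standing in for memcached

    def read(self, key):
        # Read-through: serve from cache, fall back to the db and fill the cache.
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key, value):
        # Write-through: update the db and the cache in the same operation.
        self.db[key] = value
        self.cache[key] = value
```

FNV needs only one XOR and one multiplication per byte, which is why it is cheaper than a cryptographic digest such as MD5 when the hash is used only to distribute keys.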
