23:27 Latency is Everywhere and it Costs You Sales - How to Crush it » High Scalability - Building bigger, faster, more reliable websites.

Update: Efficient data transfer through zero copy. Copying data kills. This excellent article explains the path data takes through the OS and how to reduce the number of copies to the big zero.
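To make the zero-copy idea concrete, here is a minimal Python sketch. It is an illustration and not code from the linked article. It assumes Python 3.5+ and uses `socket.socket.sendfile()`, which hands the transfer to `os.sendfile()` where the platform supports it, so the file's bytes reach the socket without being copied through a user-space buffer. On other platforms it falls back to ordinary `send()` calls. The function name `send_file_zero_copy` is hypothetical, and a local socket pair stands in for a real network connection.

```python
import os
import socket
import tempfile
import threading


def send_file_zero_copy(path: str) -> bytes:
    """Send a file over a connected socket pair using socket.sendfile().

    Where os.sendfile() is available, the kernel moves the file's bytes to the
    socket without copying them into this process; otherwise Python silently
    falls back to send(). Returns what the receiving end read.
    """
    sender, receiver = socket.socketpair()  # connected stream sockets
    received = bytearray()

    def drain() -> None:
        # Read on a separate thread so the sender never blocks on a full buffer.
        while True:
            chunk = receiver.recv(65536)
            if not chunk:
                break
            received.extend(chunk)

    reader = threading.Thread(target=drain)
    reader.start()
    with open(path, "rb") as f:
        sender.sendfile(f)  # zero-copy path when the OS supports it
    sender.shutdown(socket.SHUT_WR)  # signal EOF to the reader
    reader.join()
    sender.close()
    receiver.close()
    return bytes(received)


if __name__ == "__main__":
    payload = os.urandom(1 << 20)  # 1 MiB of test data
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    try:
        assert send_file_zero_copy(tmp.name) == payload
        print("transferred", len(payload), "bytes")
    finally:
        os.unlink(tmp.name)
```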

Latency matters. Amazon found every 100ms of latency cost them 1% in sales. Google found an extra .5 seconds in search page generation time dropped traffic by 20%. A broker could lose $4 million in revenues per millisecond if their electronic trading platform is 5 milliseconds behind the competition.
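As a back-of-the-envelope illustration, the Amazon figure can be turned into a tiny calculator. This rests on a loud assumption: that the 1%-per-100 ms relationship scales linearly to other sites and other amounts of latency, which the numbers above do not establish. `estimated_sales_loss` is a hypothetical helper.

```python
def estimated_sales_loss(annual_sales: float, added_latency_ms: float,
                         loss_per_100ms: float = 0.01) -> float:
    """Rough revenue lost to extra latency under a linear model.

    Assumption: the "1% of sales per 100 ms" figure reported for Amazon scales
    linearly to other sites and latency amounts, which is a simplification.
    """
    if added_latency_ms < 0:
        raise ValueError("added latency must be non-negative")
    return annual_sales * loss_per_100ms * (added_latency_ms / 100.0)


if __name__ == "__main__":
    # Hypothetical site doing $10M a year that adds 250 ms of latency.
    print(estimated_sales_loss(10_000_000, 250))  # → 250000.0
```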

The Amazon results were reported by Greg Linden in his presentation Make Data Useful. In one of Greg's slides Google VP Marissa Mayer, in reference to the Google results, is quoted as saying "Users really respond to speed." And everyone wants responsive users. Ka-ching! People hate waiting and they're repulsed by seemingly small delays.

The less interactive a site becomes, the more likely users are to click away and do something else. Latency is the mother of interactivity. Though various UI techniques can make pages subjectively feel faster, slow sites generally lead to higher customer defection rates, which lead to lower conversion rates, which result in lower sales. Yet for some reason latency isn't talked about much for web apps. We talk a lot about building high-capacity sites, but very little about how to build low-latency sites. We apparently do so at the expense of our immortal bottom line.

I wondered whether sales would be infinite if latency went to zero. But alas, as Dan Pritchett says: Latency Exists, Cope! So we can't hide the "latency problem" by appointing a Latency Czar to conduct a nice little war on latency. Instead, we need to learn how to minimize and manage latency. It turns out a lot of problems are better solved that way.

How do we recover that which is most meaningful--sales--and build low-latency systems?


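21:26 Seligman's TED talk: the pleasant life, the life of engagement, and the meaningful life » 大学小容 (Xiaorong) > Make good use of the web and grow!

I just came across a translated article on positive psychology at Songshuhui (科学松鼠会). It is actually a translation of a talk given at TED by Seligman, the founder of positive psychology.

Xiaorong has been reading books on positive psychology for a while, so Seligman feels like an old friend. Even so, watching this talk today was thrilling. In it he describes three kinds of happiness:

· The first is the pleasant life, in which you have as many positive emotions as possible;

· The second is the life of engagement: when you work, raise children, love, or relax, time seems to stop for you. This is the kind of life Aristotle was talking about (Xiaorong: this refers to the experience of flow, rendered in Chinese as 福乐体验 or 涌流体验);

· The third is the meaningful life.

At the end of the talk he also shares his views on technology, entertainment, and design:

In my view, the problems psychology faces are similar to those faced by technology, entertainment, and design. We all know that technology, entertainment, and design can be used for destructive purposes. We also know they can be used to relieve suffering. Incidentally, the relationship between relieving suffering and building happiness is crucial. Thirty years ago, when I was starting out as a therapist, I thought that if I could make my patients no longer depressed, no longer anxious, and no longer angry, I would make them happy. That turned out not to be true: the best result was bringing patients to zero, leaving them empty inside. I later found that the skills of a happy life, that is, the skills of pleasure, engagement, and meaning, are different from the skills of relieving suffering. I believe the same holds for technology, entertainment, and design. These three drivers of human society can be used to increase happiness and positive emotion, and people are indeed using them that way. But if you analyze happiness the way I do, looking not only at pleasure but also at flow and meaning, then design, technology, and entertainment can also be used to enhance these other two aspects of life.

In short, the eleventh reason we should be optimistic is that I believe design, technology, and entertainment can increase happiness for people around the world. If technology can, over the next ten to twenty years, help people achieve the pleasant life, the engaged life, and the meaningful life, then it is good enough. If entertainment can do these three things, it is good enough. If design can do these three things, it is good enough. Thank you.

These three levels are very useful for thinking about web product design. Many people believe the mainland Chinese web is entertainment-driven, and that a website can only succeed by catering to its audience's tastes and being as entertaining as possible. From Seligman's perspective, if entertainment cannot help people achieve the pleasant, engaged, and meaningful life, then it is not good enough. Xiaorong believes many entertainment-oriented mainland websites do very little to increase people's happiness: they are far from good enough, and still a long way from true entertainment.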

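Related links:

English video of this talk: Martin Seligman: What positive psychology can help you become

Nearly 24 minutes long; a great resource.

TED website: http://www.ted.com, where you should be able to find many more excellent talk videos.

Chinese translation of the talk transcript on Songshuhui:
(Translation) Positive psychology, positive life

The TED Talks transcription group on Yeeyan (译言)

TED stands for Technology, Entertainment, Design, and the conference's aim is "to change the world with the power of ideas". It was founded in 1984 by Richard Wurman and Harry Marks; since 1990 it has been held every year in Monterey, California, and it is now also held every six months in other cities around the world. TED brings remarkable people together to exchange ideas, creating immeasurable value. The talks cover a wide range of subjects, including science, art, politics, global issues, architecture, and music. Speakers so far have included influential figures from all walks of life, such as former US President Clinton, Wikipedia founder Jimmy Wales, and the founders of Google.

The goal of the TED transcription group is to transcribe the English talk videos on www.TED.com and translate them into Chinese, so that valuable talks can reach more Chinese readers.

The group now has 14 translations. Thanks to TONY for translating; Xiaorong is sure his happiness level must be quite high right now :)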

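A belated new book: Waiting for Your Cat to Bark?

In this post Xiaorong mentioned the professional services company founded by Seligman: Reflective Happiness, LLC.

Martin E.P. Seligman, the founder of positive psychology, set up a company called Reflective Happiness, LLC to promote all kinds of positive psychology products and services, such as books, training, assessments, and online tests.


"Mind Economy": A Path to Entrepreneurship Based on Psychological Knowledge (PPT)

This is the talk Xiaorong gave in 2007 at the annual conference of psychology business organizations in Guangzhou. Although it was aimed at people in psychology-related fields, the methodology it describes applies to every industry.

View it on Slideshare or embed it in your own blog.

Download the original PPT: http://www.swordi.com/download/psybusiness2007.ppt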

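14:02 Douban API: if you still browse Douban with your eyes and fingers » Douban blog

Right now I have one screen's worth of space to introduce the Douban API, so we can take it slowly. API stands for Application Programming Interface; as the name suggests, it is the programming interface Douban provides to third-party developers... If you're willing to keep listening, I could fill three screens explaining everything about APIs. But even if you wouldn't get bored, I'm afraid I would. So let's leave it to the geeks who care about APIs to browse the lengthy documentation themselves. As for you, let me recommend some good stuff, because I know you're one of the dwindling few who still use Douban with their eyes and fingers.

As a member of the finger army, maybe you, like many of your kind, compulsively refresh your Douban broadcast feed. Whenever you're in front of a screen, your hand involuntarily clicks the refresh button every 3-5 minutes. If so, I recommend trying the Douban broadcast client Douhua (豆花). With this client you receive Douban broadcast updates as they arrive, sparing your fingers the strain. The client is still rather rough, but you can drop by the author's blog now and then to check for updates.

Also, don't you find it terribly dull to see your contacts lined up in a jumbled grid on a flat screen? If so, try the Douban friends map (豆友地图): wouldn't arranging your contacts by geographic location be more fun? You can also join this event to show off where your contacts are. (Who's that Turtle Hermit on the right who lives in the sea... = _=|||)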

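Of course, even though you still use your eyes and fingers, maybe you're already sick of reality. Geographic distribution? How boring; who on the web cares about that? Fine, fine. Then maybe you'll like this: the contact relationship browser, a pure visualization of your social network. Anyone who has used wallop should find it familiar, but this one's springy motion has a much better feel to it (boing, boing, boing~).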

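What? You're old-school and can't get used to keyboards and mice? Then how about touch devices? You can try iDouban on the iPhone; there is also an iDouban group here. You know, I really like the iPhone's original gravity-sensing design, even though it wasn't until the fifth generation that you could roll it up and wear it on your wrist.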

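Finally, there are plenty of other quirky things. For example, if you can't keep count of the replies in a thread, try this one; if you're a keyboard fanatic with no love for the mouse, try this one; if you find album thumbnails and diary text too small, try this one; if you use Google Reader, this one might be a good thing. It seems I'm about to hit my space limit, so head over here, to the Douban API group, and here, to the Douban plugins group, where there are many more things that might suit a mouse-and-keyboard person like you. Go find them yourself.

Ah, someone is reminding me that this space was supposed to be about the Douban API. Well, in fact everything above was built by Douban users with the API that Douban provides. Douban may not be able to fulfill every wish and need of every user, but the API gives users a possibility: to use Douban's data and features the way they want and to create their own applications. OK, if you now understand what an API is, why not read the documentation mentioned above and build something yourself? If you really don't get it, well... just cheer from the sidelines and enjoy the fruits of others' labor.

Finally, let's discuss something serious. You know, in this day and age, users like you who still browse Douban with fingers and eyes are getting rarer and rarer. Really, all you need is a cable: close your eyes, relax your body, and jack your consciousness into Cyberspace. You know the way to Douban, right? You don't? No problem: pull your view back to take in the whole Cyber, and you'll easily spot the giant silver-white pyramid with the blue G logo. Ask there. Then, however your route goes, you'll eventually reach that pale green dome; to its left, pass through a series of translucent floating fuzzy things, five consecutive routing hops with no branches. There you go: Douban welcomes you. (Thanks here to fellow student Su27 for the weak large-prime password on his relay in Chiba, which let me grab a screenshot of his mental projection on the wire ^_^)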

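Huh? What's that? You don't know how to use the cable? Plug it into the jack behind your left ear. What? No jack?... Come on, you're not a Tachikoma think-tank, are you... Go get cyberized, friend; what era do you think this is...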

12:42 MapReduce framework Disco » High Scalability - Building bigger, faster, more reliable websites.

Disco is an open-source implementation of the MapReduce framework for distributed computing. It was started at Nokia Research Center as a lightweight framework for rapid scripting of distributed data processing tasks. The Disco core is written in Erlang. The MapReduce jobs in Disco are natively described as Python programs, which makes it possible to express complex algorithmic and data processing tasks often only in tens of lines of code.
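Disco's own job API is not shown in the post, so the following is only a single-process Python sketch of the MapReduce model it implements. A map function emits key/value pairs, the framework groups them by key (the shuffle), and a reduce function combines each group. The names `word_map`, `sum_reduce`, and `run_mapreduce` are hypothetical and are not part of Disco; a real Disco job would run the map and reduce steps across a cluster.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

KV = Tuple[str, int]


def word_map(line: str) -> Iterator[KV]:
    # Map phase: emit (word, 1) for every word in an input line.
    for word in line.lower().split():
        yield word, 1


def sum_reduce(key: str, values: Iterable[int]) -> KV:
    # Reduce phase: combine all counts emitted for one key.
    return key, sum(values)


def run_mapreduce(lines: Iterable[str],
                  mapper: Callable[[str], Iterator[KV]],
                  reducer: Callable[[str, Iterable[int]], KV]) -> Dict[str, int]:
    """Run map, shuffle (group by key), and reduce in a single process."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:                      # map
        for key, value in mapper(line):
            groups[key].append(value)       # shuffle: group values by key
    # reduce: one call per distinct key, in key order
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


if __name__ == "__main__":
    text = ["the quick brown fox", "the lazy dog", "The end"]
    print(run_mapreduce(text, word_map, sum_reduce))
```

Swapping in a different `mapper`/`reducer` pair is all it takes to express another job, which is the property that lets Disco describe jobs in a few dozen lines of Python.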

08:42 What CDN would you recommend? » High Scalability - Building bigger, faster, more reliable websites.

Update 5: When It Comes To Content Delivery Networks, What Is The "Edge"?. Dan Rayburn is on edge about the misuse of the term edge: the location closest to the user does not guarantee quality, content is often not delivered from the closest location, and not all content is replicated at every "edge" location. Lots of other essential information.
Update 4: David Cancel runs a great test to see if you should be Using Amazon S3 as a CDN?. Conclusion: "CacheFly performed the best but only slightly better than EdgeCast. The S3 option was the worst with the Nginx/DIY option performing just over 100 ms faster." Also take a look at Part 2 - Cacheability?
Update 3: Mr. Rayburn takes A Detailed Look At Akamai's Application Delivery Product. They create a "bi-nodal overlay network" where users and servers are always within 5 to 10 milliseconds of each other. Your data-center-hosted app can't compete. The problem is that people (that is, me) can understand the data center model; I don't yet understand how applications as a CDN will work.
Update 2: Dan Rayburn starts an interesting series of articles on Highlights Of My Day In Cambridge With Akamai. Akamai is moving strongly into the application distribution business. That would make an interesting cloud alternative.
Update: Streamingmedia links to new CDN DF Splash that specializes in instant-on TV-quality video streaming.

A question was raised on the forum asking for a CDN recommendation. As usual there are no definitive answers, but here are three useful articles that may help your deliberations.

  • First, Tony Chang shows how to drive down response times using edge acceleration strategies.
  • Then Pingdom gives a nice overview and introduction to CDNs.
  • And last but not least, Dan Rayburn from StreamingMedia.com gives a master class in how much you should pay for your CDN, what you should be getting for your money, and how to find the right provider for your needs.

Lots and lots of good stuff to learn, even if you didn't roll out of bed this morning pondering the deeper mysteries of content delivery networks and the Canadian dollar.


07:04 SMACKDOWN :: Who are the Open Source Content Management System (CMS) market leaders in 2008? » High Scalability - Building bigger, faster, more reliable websites.

I came across an interesting study on who leads the open source content management system market in 2008.

The study was just released to the public; it was conducted by Ric Shreves of the web development company Water & Stone.

At 50 pages, the study contains a significant amount of data that should be useful to developers or to anyone looking to commit to a web publishing system (also known as a Content Management System).

Read the entire article about who the open source content management systems market leader is for 2008 at MyTestBox.com - web software reviews, news, tips & tricks.

05:09 ANALYZE: MyISAM vs Innodb » MySQL Performance Blog

Following up on my previous post, I decided to do a little test to see how accurate the index stats created by ANALYZE TABLE are for MyISAM and Innodb.

But before we go into that, I want to say something about running ANALYZE TABLE in production, as some people seem to think I advise running it... a lot. In fact, I see more systems where ANALYZE is abused (run too frequently without much need) than systems where it is not run frequently enough.

First, it is worth noting that MySQL saves only very basic cardinality information for index prefixes as index stats, and these rarely change. There are no histograms or any other skew metrics. The MySQL optimizer also uses the number of rows in the table for many decisions, but this is computed live (maintained for MyISAM and estimated during query execution for Innodb). Because the stored information is so basic, it does not change quickly enough to affect optimizer plans much.

If you look at stats accuracy alone, running ANALYZE TABLE after the initial table population and after significant data changes makes sense. For Innodb, index stats are computed the first time a table is accessed after a restart, so the answer to "how often should I run ANALYZE?" is often "never", because MySQL servers are restarted frequently enough: even once every 3 months is enough for many workloads. Add to this that Innodb stats are less accurate by nature, which means you can allow more data change while your index stats remain as good as new.

Looking at stats accuracy, however, is the wrong way to look at the problem. Your index stats are a bit off, so what? What really matters is not how accurate the stats are but how good the plans for your queries are. If you're getting plans as good as with perfect stats, why bother updating them?
Also note that many simple queries (using constants for index accesses) will not use index cardinality data at all, but will estimate the number of rows during query execution.

I typically consider adding ANALYZE TABLE for a table only if I see that running it helps to get good plans. If query plans are good or bad regardless of whether it has been run, there is no need to bother; for bad plans, use FORCE INDEX or change the query, and report a MySQL Optimizer bug :)

But now let's look at the difference in behavior of ANALYZE TABLE for MyISAM vs Innodb.

I used the following simple table for the tests:

SQL:
    CREATE TABLE `antest` (
      `i` int(10) UNSIGNED NOT NULL,
      `c` char(80) DEFAULT NULL,
      KEY `i` (`i`),
      KEY `c` (`c`,`i`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1;

I populated it with data that has the following true cardinality:

SQL:
    mysql> SELECT count(DISTINCT c) FROM antest;
    +-------------------+
    | count(DISTINCT c) |
    +-------------------+
    |               101 |
    +-------------------+
    1 row in set (0.36 sec)

    mysql> SELECT count(DISTINCT i) FROM antest;
    +-------------------+
    | count(DISTINCT i) |
    +-------------------+
    |               101 |
    +-------------------+
    1 row in set (0.20 sec)

    mysql> SELECT count(DISTINCT i,c) FROM antest;
    +---------------------+
    | count(DISTINCT i,c) |
    +---------------------+
    |               10201 |
    +---------------------+
    1 row in set (0.43 sec)

Let's see how the stats look for MyISAM:

SQL:
    mysql> SHOW INDEX FROM antest;
    +--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
    +--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | antest |          1 | i        |            1 | i           | A         |        NULL |     NULL | NULL   |      | BTREE      |         |
    | antest |          1 | c        |            1 | c           | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         |
    | antest |          1 | c        |            2 | i           | A         |        NULL |     NULL | NULL   |      | BTREE      |         |
    +--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    3 rows in set (0.00 sec)

Aha: as you can see, no cardinality is stored with the table because ANALYZE has not run yet. After running ANALYZE TABLE on antest, the stats look like this:

SQL:
    mysql> SHOW INDEX FROM antest;
    +--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
    +--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | antest |          1 | i        |            1 | i           | A         |         101 |     NULL | NULL   |      | BTREE      |         |
    | antest |          1 | c        |            1 | c           | A         |         101 |     NULL | NULL   | YES  | BTREE      |         |
    | antest |          1 | c        |            2 | i           | A         |       10240 |     NULL | NULL   |      | BTREE      |         |
    +--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    3 rows in set (0.01 sec)

As you can see, after running ANALYZE we have exact cardinality for the i and c columns. The cardinality for the pair (c,i) looks a bit off (10240 vs. 10201), but it is within 0.5% of the correct value, so we can count on MyISAM values being almost exact.

ANALYZE TABLE took a little time to run (even for this very small table) because it does index scans to find the exact number of distinct values in the table.

Now let's populate the antest_innodb table, which is identical but uses the Innodb storage engine:

SQL:
    mysql> INSERT INTO antest_innodb SELECT * FROM antest;
    Query OK, 245760 rows affected (54.29 sec)
    Records: 245760  Duplicates: 0  Warnings: 0

    mysql> SHOW INDEX FROM antest_innodb;
    +---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | Table         | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
    +---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | antest_innodb |          1 | i        |            1 | i           | A         |      245900 |     NULL | NULL   |      | BTREE      |         |
    | antest_innodb |          1 | c        |            1 | c           | A         |      245900 |     NULL | NULL   | YES  | BTREE      |         |
    | antest_innodb |          1 | c        |            2 | i           | A         |      245900 |     NULL | NULL   |      | BTREE      |         |
    +---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    3 rows in set (0.00 sec)

A very interesting result: after loading the data into the Innodb table with INSERT, we do not get NULL cardinality as with MyISAM. Instead we get badly wrong cardinality, which says every index prefix is unique (245900 is the estimate of the row count in the table).
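Why does a bogus cardinality matter? A simplified way to see it treats an index lookup on one key value as matching about rows / cardinality rows. This is an assumption about how the optimizer uses these numbers and ignores many details. The hypothetical helper below applies it to the figures above. With the true cardinality of i, one lookup matches about 2433 rows; with the 245900 estimate, the index looks almost unique.

```python
def estimated_rows_per_lookup(table_rows: int, cardinality: int) -> float:
    """Rows one index-value lookup is expected to match.

    Simplified model (assumption): rows are spread evenly across the distinct
    key values that the index statistics report.
    """
    if cardinality <= 0:
        raise ValueError("cardinality must be positive")
    return table_rows / cardinality


if __name__ == "__main__":
    rows = 245760
    # True cardinality of `i` (101) vs the Innodb estimate right after INSERT.
    print(round(estimated_rows_per_lookup(rows, 101)))        # → 2433
    print(round(estimated_rows_per_lookup(rows, 245900), 2))  # → 1.0
```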

It is worth noting that if you run ALTER TABLE, Innodb, like MyISAM, internally runs ANALYZE as soon as the table is rebuilt, and the values become more sensible:

SQL:
    mysql> ALTER TABLE antest_innodb type=innodb;
    Query OK, 245760 rows affected, 1 warning (51.87 sec)
    Records: 245760  Duplicates: 0  Warnings: 0

    mysql> SHOW INDEX FROM antest_innodb;
    +---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | Table         | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
    +---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | antest_innodb |          1 | i        |            1 | i           | A         |         332 |     NULL | NULL   |      | BTREE      |         |
    | antest_innodb |          1 | c        |            1 | c           | A         |          18 |     NULL | NULL   | YES  | BTREE      |         |
    | antest_innodb |          1 | c        |            2 | i           | A         |       20491 |     NULL | NULL   |      | BTREE      |         |
    +---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    3 rows in set (0.00 sec)

Note, however, how far these values are from reality. The "i" key cardinality is overestimated about 3 times, the "c" key prefix cardinality is underestimated more than 5 times, and the combined (c,i) key cardinality is overestimated 2 times. So Innodb stats are very inexact. Fortunately, for most queries that use these stats, order-of-magnitude accuracy is enough. Sometimes it is not, and you end up wondering why the hell it picked this strange plan.
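The error factors quoted above can be checked directly against the true cardinalities measured earlier (101, 101, and 10201). This is plain arithmetic on the numbers shown; `error_factor` is just a hypothetical helper name.

```python
TRUE_CARDINALITY = {"i": 101, "c": 101, "(c,i)": 10201}
INNODB_AFTER_ALTER = {"i": 332, "c": 18, "(c,i)": 20491}


def error_factor(estimate: int, actual: int) -> float:
    """Ratio of estimate to truth: >1 is an overestimate, <1 an underestimate."""
    return estimate / actual


if __name__ == "__main__":
    for key, actual in TRUE_CARDINALITY.items():
        f = error_factor(INNODB_AFTER_ALTER[key], actual)
        direction = "over" if f > 1 else "under"
        times = f if f > 1 else 1 / f  # express both directions as "N times"
        print(f"{key}: {direction}estimated {times:.1f}x")
```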

Let's run ANALYZE TABLE on the Innodb table a couple more times to see how the values change:

SQL:
    mysql> ANALYZE TABLE antest_innodb;
    +--------------------+---------+----------+----------+
    | Table              | Op      | Msg_type | Msg_text |
    +--------------------+---------+----------+----------+
    | test.antest_innodb | analyze | status   | OK       |
    +--------------------+---------+----------+----------+
    1 row in set (0.00 sec)

    mysql> SHOW INDEX FROM antest_innodb;
    +---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | Table         | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
    +---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | antest_innodb |          1 | i        |            1 | i           | A         |         338 |     NULL | NULL   |      | BTREE      |         |
    | antest_innodb |          1 | c        |            1 | c           | A         |          18 |     NULL | NULL   | YES  | BTREE      |         |
    | antest_innodb |          1 | c        |            2 | i           | A         |       20491 |     NULL | NULL   |      | BTREE      |         |
    +---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    3 rows in set (0.00 sec)

    mysql> ANALYZE TABLE antest_innodb;
    +--------------------+---------+----------+----------+
    | Table              | Op      | Msg_type | Msg_text |
    +--------------------+---------+----------+----------+
    | test.antest_innodb | analyze | status   | OK       |
    +--------------------+---------+----------+----------+
    1 row in set (0.00 sec)

    mysql> SHOW INDEX FROM antest_innodb;
    +---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | Table         | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
    +---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | antest_innodb |          1 | i        |            1 | i           | A         |          92 |     NULL | NULL   |      | BTREE      |         |
    | antest_innodb |          1 | c        |            1 | c           | A         |         384 |     NULL | NULL   | YES  | BTREE      |         |
    | antest_innodb |          1 | c        |            2 | i           | A         |       20491 |     NULL | NULL   |      | BTREE      |         |
    +---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    3 rows in set (0.00 sec)

As we see, subsequent runs change the stats dramatically: the value for the c prefix jumped from 18 to 384, more than 20 times larger, while the value for i dropped from 338 to 92. So Innodb stats are both inexact and unstable. Restarting a server with Innodb tables may therefore change stats dramatically and affect some query plans, and you may also get different plans on different slaves with the same data.
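Why would repeated ANALYZE runs disagree so much? As I understand it, Innodb does not scan the whole index but estimates cardinality from a small random sample of index pages. The sketch below is a deliberately simplified model of that idea, not Innodb's actual algorithm. It counts key-value changes on a few randomly chosen "pages" of a sorted key list and scales the count up to the whole index. The page size (100 keys), the sample size (8 pages), and the function names are assumptions chosen for illustration.

```python
import random
from typing import List, Sequence


def split_into_pages(sorted_keys: Sequence[int], page_size: int) -> List[range]:
    """Positions covered by each leaf 'page' of a sorted index."""
    return [range(s, min(s + page_size, len(sorted_keys)))
            for s in range(0, len(sorted_keys), page_size)]


def sampled_cardinality(sorted_keys: Sequence[int], page_size: int = 100,
                        n_sample_pages: int = 8, seed: int = 0) -> int:
    """Estimate distinct keys by sampling a few pages (simplified model).

    On each sampled page, count records whose key differs from the record just
    before it in index order, then scale that count up to all pages.
    """
    pages = split_into_pages(sorted_keys, page_size)
    rng = random.Random(seed)
    sample = rng.sample(pages, min(n_sample_pages, len(pages)))
    changes = 0
    for page in sample:
        for pos in page:
            if pos > 0 and sorted_keys[pos] != sorted_keys[pos - 1]:
                changes += 1
    # One starting value plus the estimated number of value changes overall.
    return 1 + round(changes * len(pages) / len(sample))


if __name__ == "__main__":
    # Roughly like column `c` in antest: 101 distinct values over 245760 rows.
    keys = sorted(n % 101 for n in range(245760))
    print("true:", len(set(keys)))
    for seed in range(5):
        print("estimate:", sampled_cardinality(keys, seed=seed))
```

With 101 values spread over about 2,458 pages, a sample of 8 pages usually contains zero or one value change, so this model typically reports 1 or roughly 300 rather than anything near 101. Sampling every page would recover the exact count.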

Another difference in how statistics are handled comes from NULL handling.
MyISAM has a special variable that controls whether NULLs should be considered equal when computing stats:

SQL:
    mysql> SHOW VARIABLES LIKE "myisam_stats_method";
    +---------------------+---------------+
    | Variable_name       | Value         |
    +---------------------+---------------+
    | myisam_stats_method | nulls_unequal |
    +---------------------+---------------+
    1 row in set (0.00 sec)

To see the difference, let me set column "c" to NULL in both tables and see how the values change:

SQL:
    mysql> UPDATE antest SET c=NULL;
    Query OK, 245760 rows affected (11.48 sec)
    Rows matched: 245760  Changed: 245760  Warnings: 0

    mysql> UPDATE antest_innodb SET c=NULL;
    Query OK, 245760 rows affected (1 min 20.19 sec)
    Rows matched: 245760  Changed: 245760  Warnings: 0

    mysql> SHOW INDEX FROM antest;
    +--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
    +--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | antest |          1 | i        |            1 | i           | A         |         101 |     NULL | NULL   |      | BTREE      |         |
    | antest |          1 | c        |            1 | c           | A         |      245760 |     NULL | NULL   | YES  | BTREE      |         |
    | antest |          1 | c        |            2 | i           | A         |      245760 |     NULL | NULL   |      | BTREE      |         |
    +--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    3 rows in set (0.00 sec)

    mysql> ANALYZE TABLE antest_innodb;
    +--------------------+---------+----------+----------+
    | Table              | Op      | Msg_type | Msg_text |
    +--------------------+---------+----------+----------+
    | test.antest_innodb | analyze | status   | OK       |
    +--------------------+---------+----------+----------+
    1 row in set (0.01 sec)

    mysql> SHOW INDEX FROM antest_innodb;
    +---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | Table         | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
    +---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | antest_innodb |          1 | i        |            1 | i           | A         |         418 |     NULL | NULL   |      | BTREE      |         |
    | antest_innodb |          1 | c        |            1 | c           | A         |           8 |     NULL | NULL   | YES  | BTREE      |         |
    | antest_innodb |          1 | c        |            2 | i           | A         |         196 |     NULL | NULL   |      | BTREE      |         |
    +---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    3 rows in set (0.00 sec)

As you can see, MyISAM set the cardinality for prefix (c) and key (c,i) to the number of rows in the table, treating all NULLs as different values. Innodb, on the contrary, treats all NULL values as the same, so the cardinality for (c) and (c,i) dropped significantly.
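The two NULL-handling methods can be modeled in a few lines of Python. This is an illustrative sketch, not MySQL code: `prefix_cardinality` is a hypothetical helper, and `None` stands in for SQL NULL. With nulls_equal=False, every prefix that contains a NULL counts as a new value, which reproduces the MyISAM numbers above (245760 for both (c) and (c,i)). With nulls_equal=True, all NULLs collapse into one value.

```python
from typing import Optional, Sequence, Tuple

Row = Tuple[Optional[str], int]  # (c, i) as in the antest table


def prefix_cardinality(rows: Sequence[Row], prefix_len: int,
                       nulls_equal: bool) -> int:
    """Count distinct key prefixes, treating NULL (None) per the chosen method.

    nulls_equal=False mimics MyISAM's default 'nulls_unequal': any prefix that
    contains a NULL counts as its own distinct value.
    nulls_equal=True treats all NULLs as one value.
    """
    seen = set()
    unique_null_groups = 0
    for row in rows:
        prefix = row[:prefix_len]
        if not nulls_equal and None in prefix:
            unique_null_groups += 1  # every NULL-containing prefix is distinct
        else:
            seen.add(prefix)
    return len(seen) + unique_null_groups


if __name__ == "__main__":
    # Like antest after `UPDATE ... SET c=NULL`: c is NULL, i has 101 values.
    rows = [(None, n % 101) for n in range(245760)]
    for method in (False, True):
        print("nulls_equal" if method else "nulls_unequal",
              prefix_cardinality(rows, 1, method),
              prefix_cardinality(rows, 2, method))
```

Running it with nulls_equal=True gives 1 and 101, the same figures MyISAM reports once myisam_stats_method is switched to nulls_equal.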

This means Innodb and MyISAM use different stats computation methods by default.

Let's check how the stats change for MyISAM if we change the stats computation method:

SQL:
    mysql> SET myisam_stats_method='nulls_equal';
    Query OK, 0 rows affected (0.00 sec)

    mysql> ANALYZE TABLE antest;
    +-------------+---------+----------+-----------------------------+
    | Table       | Op      | Msg_type | Msg_text                    |
    +-------------+---------+----------+-----------------------------+
    | test.antest | analyze | status   | Table is already up to date |
    +-------------+---------+----------+-----------------------------+
    1 row in set (0.00 sec)

Oops, a little gotcha: MySQL considers the table up to date even though the stored stats were computed with a different method. If your table is actively written to, you should not have this problem; here I just did a couple of updates to refresh the table's update time before analyzing it again.

SQL:
    mysql> SHOW INDEX FROM antest;
    +--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
    +--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | antest |          1 | i        |            1 | i           | A         |         101 |     NULL | NULL   |      | BTREE      |         |
    | antest |          1 | c        |            1 | c           | A         |           1 |     NULL | NULL   | YES  | BTREE      |         |
    | antest |          1 | c        |            2 | i           | A         |         101 |     NULL | NULL   |      | BTREE      |         |
    +--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    3 rows in set (0.00 sec)

So with the nulls_equal method we see a very different picture. Only one distinct value is assumed for "c", and there are 101 distinct values for (c,i), which is the same as the number of distinct values in the i column. These stats look much closer to what we get for the Innodb table with the same data, though we can see the Innodb stats are a bit off from reality too.

MySQL version note: these results are from MySQL 5.0.62; other versions may show different behavior.


Entry posted by peter


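00:36 "Energy grass" is no straw to clutch at » chinadialogue (中外对话): fresh off the press

Developing "energy grass" as a biomass energy source can neither solve the climate problem nor help ease the energy crisis. Jiang Gaoming calls for an end to exaggerated promotion of "energy grass" and for treating the issue scientifically.

The energy crisis poses a serious threat to the sustainable development of human society, and the greenhouse effect caused by excessive burning of fossil fuels has drawn close attention from scientists and governments around the world. So people have begun turning to biomass energy, hoping it can help solve the energy crisis. Amid the boom in biomass development, a plant called "energy grass" has recently become a star, appearing again and again in the press and treated almost as a life-saving remedy for the energy crisis, but in reality there is a lot of hype involved. "Energy grass" cannot be as miraculous as the stories claim, nor is it all benefit and no harm.

To see why, we have to start with the nature of biomass energy. Biomass energy is one of the earliest energy sources humans used. It is really solar energy fixed by plants through photosynthesis, initially stored by green plants in the form of carbohydrates. In the broad sense, biomass energy includes the biochemical energy contained in plants, animals, and microorganisms; from this perspective, coal, oil, and natural gas are the biomass energy of past geological eras. What we mean by biomass energy today, however, is generally carbohydrates fixed by plants within the last year or two, that is, biochemical products that can be burned directly, or plant and animal residues, such as oil crops, straw, sawdust, bark, branches, algae, and human and animal manure. Unless otherwise specified, biomass energy refers mainly to plant material.

As long as the sun shines on the earth, green plants can photosynthesize and fix solar energy; this is the biomass energy people hope for, and all of the roughly 500,000 or more plant species on Earth have this basic ability. From a utilization standpoint, however, only plants with high photosynthetic rates, a high leaf area index, large enough biomass, and ease of harvest and transport are ideal energy plants. This is probably where the idea of "energy grass" originally came from.

Plants photosynthesize via three pathways, C3, C4, and CAM, of which C4 plants, including sugarcane, maize, and sorghum, have the highest photosynthetic efficiency. In terms of biomass, the most productive natural community is tropical rainforest, at 35 tonnes per hectare per year, i.e., 2.33 tonnes of dry matter per mu per year. Under artificial conditions (heavy fertilizer, ample water, high density), plant productivity can be raised further. The highest record, set by Shandong Agricultural University, is an annual aboveground biomass of 4.4 tonnes per mu (66 tonnes per hectare) for maize and wheat. So even if a plant like "energy grass" exists, it must meet several basic conditions: the high-efficiency C4 pathway, heavy fertilizer and water inputs during cultivation, and a fairly high planting density and intensity.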
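The figures in this paragraph rely on 1 hectare = 15 mu. A quick check, sketched here in Python with hypothetical helper names, confirms the conversions. It also puts the 7 tonnes per mu reported for "energy grass" in the next paragraph at 105 tonnes per hectare, about 1.6 times the 66 t/ha record.

```python
MU_PER_HECTARE = 15  # 1 hectare = 15 mu (Chinese land-area unit)


def t_per_mu_to_t_per_ha(t_per_mu: float) -> float:
    # Yield per mu times mu per hectare gives yield per hectare.
    return t_per_mu * MU_PER_HECTARE


def t_per_ha_to_t_per_mu(t_per_ha: float) -> float:
    # Yield per hectare spread over 15 mu gives yield per mu.
    return t_per_ha / MU_PER_HECTARE


if __name__ == "__main__":
    print(round(t_per_ha_to_t_per_mu(35), 2))     # tropical rainforest → 2.33
    print(round(t_per_mu_to_t_per_ha(4.4), 1))    # record maize/wheat → 66.0
    print(t_per_mu_to_t_per_ha(7))                # claimed energy grass → 105
    print(round(7 / 4.4, 2))                      # claim vs. record → 1.59
```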

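Let's measure "energy grass" against these standards. According to media reports, the "energy grass" bred by Fujian Agriculture and Forestry University is a grass-family plant (the reports wrote 禾科; the correct name is 禾本科, the grass family) used for growing fungi. It is said to be a kind of "juncao" (菌草, "fungus grass") called "giant juncao" (巨菌草). Each mu of "energy grass" reportedly yields about 7 tonnes, generating electricity equivalent to 3-4 tonnes of standard coal.

There is no plant called "juncao" in plant taxonomy. Judging from the reports, "energy grass" appears to be a high-efficiency C4 plant inoculated with some fungus that promotes nutrient uptake by its roots. In nature, organisms relate to one another in many ways, with competition and symbiosis being the most common. The root nodules of legumes are a classic example of symbiosis between nitrogen-fixing bacteria and leguminous plants. Natural symbiosis between grasses and other fungi is less common, but it can be achieved through artificial inoculation. Mycorrhizal inoculation improves a plant's nutrient uptake, but it cannot change the plant's photosynthetic pathway. Given the current highest biomass record (4.4 tonnes/mu), a yield of 7 tonnes of biomass per mu for "energy grass" is questionable. Perhaps it could barely be reached in a tropical environment with three consecutive crops a year and large inputs of chemical fertilizer and machinery. But achieving that would require a very large input of chemical energy and a great deal of labor, which defeats the original purpose of replacing coal with "energy grass".

Some argue that, compared with crops, "energy grass" has the advantages of being renewable, growing quickly, and being harvestable repeatedly, and, more importantly, that it can grow on wasteland without taking up farmland. In fact, "energy grass" faces the same problems as crop straw: it is dispersed, low in density, and costly to harvest. Without basic conditions such as roads, electricity, water, fertilizer, and machinery, growing "energy grass" on "wasteland" will hardly achieve the advertised yield of 7 tonnes. Besides, so-called "wasteland" is land that supports natural vegetation and is an important habitat for biodiversity. China's natural ecosystems currently face the danger of comprehensive degradation, and we must not sacrifice more of the environment for immediate economic gain.

Compared with the exaggerated promotion above, the list of "energy grasses" offered by experts at China Agricultural University is somewhat more scientific. The "energy grasses" they have in mind are annual, biennial, or perennial herbs or subshrubs, including tall plants such as sweet sorghum, switchgrass, and Miscanthus. These tolerate drought, saline-alkali conditions, and poor soil, adapt well, and can be grown in arid and semi-arid areas, low-lying waterlogged and saline-alkali areas, and barren mountainous and hilly areas. Encouraged by these experts, Beijing has begun planting on abandoned sandy wasteland in suburban districts such as Daxing and Changping to produce biofuels such as ethanol. Even so, as analyzed above, "wasteland" is not really waste, and planting in arid and semi-arid areas is constrained by environmental conditions, so the actual results may not be as good as the experts expect.

Although "energy grass" can ease the energy crisis to some extent, its shortcomings deserve close attention. Large-scale commercial production of energy grass may still encroach on farmland, and clearing new land destroys biodiversity and degrades natural ecosystems. Moreover, if growing "energy grass" turns out to be more profitable than growing low-return crops, farmers will inevitably rush to plant it, pitting energy plants against grain for land and threatening national food security.

What is more, if "energy grass" can be used, why not directly use the roughly 700 million tonnes of crop straw the country produces each year? This straw is burned by farmers right in the fields, causing enormous energy waste and environmental pollution. Abandoning readily available straw in order to plant grass is clearly putting the cart before the horse. Believing that "wasteland" can be reclaimed at will is also a misconception, and unrealistic promotion should be stopped. We therefore call on the relevant government departments to be cautious about "energy grass", avoid exaggerated promotion, and treat "energy grass" scientifically.

Jiang Gaoming is chief researcher and doctoral supervisor at the Institute of Botany, Chinese Academy of Sciences, deputy secretary-general of the Ecological Society of China, and a council member of the China Environmental Culture Promotion Association. His concept of "urban vegetation" and his view that China's degraded ecosystems should be restored through natural forces have been widely recognized.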

     

     

     

