16:19 Using Multiple Key Caches for MyISAM Scalability » MySQL Performance Blog

I have written before about whether MyISAM scales. It can actually do quite well; the two main things stopping you are table locks and the global mutex on the key cache.

Table locks are not an issue for read-only workloads, and write-intensive workloads can be dealt with by using many tables, but the key cache mutex will still haunt you. If you are aware of MySQL history, you may think key cache scalability was fixed with the new key cache in MySQL 4.1, and indeed the old one did not scale even to one CPU, because the global lock was held for the duration of the IO. In MySQL 4.1 the lock is held only while a key block (1KB to 4KB) is being copied from the key cache to a thread-local buffer, but this is still a terrible contention spot, in particular on systems with many CPU cores.

Happily there is a solution, or at least half of one.

If you have chosen to use multiple tables to solve the table locks problem, you can also use multiple key caches to reduce or virtually eliminate key cache contention. Too bad you can only map a single table to a single key cache; it would be so much more helpful if you could use multiple key caches for the same table, for example caching even/odd key blocks or something similar, or simply keep a hash of locks instead of a single one.

When you decide to use multiple key caches, the questions are how many to use, what sizes to allocate, and how to map tables to them. One simple solution I use is to create a separate key cache for each actively accessed table (assuming there are only a few of them), allocating each key cache proportionally to the table's size and load, but no larger than the index size (assuming table sizes are relatively static).
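This allocation policy is easy to state in code. Here is a minimal Python sketch of the heuristic just described (the function name and the 50/50 weighting of size share versus usage share are my own illustration, matching the formula used later in the post, not anything shipped with MySQL):

```python
# Sketch of the key cache allocation heuristic: each table gets a share of
# the total key buffer budget proportional to its index size and its row
# activity (weighted 50/50), capped at the table's actual index size.

def allocate_key_caches(tables, budget):
    """tables: dict mapping table name -> (index_length, rows_read + rows_changed).
    Returns dict mapping table name -> key_buffer_size in bytes."""
    total_size = sum(size for size, _ in tables.values())
    total_use = sum(use for _, use in tables.values())
    result = {}
    for name, (size, use) in tables.items():
        # 50/50 weighting of index-size share and usage share
        share = (size / total_size + use / total_use) / 2
        # never allocate more than the index could use
        result[name] = min(size, round(share * budget))
    return result

# Numbers taken from the outputs shown below: index lengths from
# information_schema.TABLES, activity as rows_read + rows_changed.
sizes = allocate_key_caches(
    {"a": (39514112, 2949111 + 589824), "b": (28390400, 589824 + 589824)},
    4000000000)
```

With a 4GB budget and these two small tables, both caches end up capped at their index sizes (39514112 and 28390400 bytes), which is exactly the behavior you want when the budget exceeds the total index size.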

To get accurate information about table usage, I will use the Percona patches:

SQL:
mysql> SELECT * FROM information_schema.table_statistics WHERE table_schema='test' AND table_name IN ('a','b');
+--------------+------------+-----------+--------------+------------------------+
| TABLE_SCHEMA | TABLE_NAME | ROWS_READ | ROWS_CHANGED | ROWS_CHANGED_X_INDEXES |
+--------------+------------+-----------+--------------+------------------------+
| test         | b          |    589824 |       589824 |                 589824 |
| test         | a          |   2949111 |       589824 |                 589824 |
+--------------+------------+-----------+--------------+------------------------+
2 rows in set (0.00 sec)

For table sizes we can use the traditional TABLES table:

SQL:
mysql> SELECT table_schema,table_name,index_length FROM information_schema.TABLES WHERE table_schema='test' AND table_name IN ('a','b');
+--------------+------------+--------------+
| table_schema | table_name | index_length |
+--------------+------------+--------------+
| test         | a          |     39514112 |
| test         | b          |     28390400 |
+--------------+------------+--------------+
2 rows in set (0.01 sec)

Now with a bit of INFORMATION_SCHEMA magic, and a bit of waiting on "efficient" Information Schema query execution (as you may guess, we just need to join the two previous result sets), we can get the relative table index sizes and their relative usage. I simply summed rows read and rows changed, but you can surely use a different formula if you like.

SQL:
mysql> SELECT t.table_name,
           index_length/(SELECT sum(index_length) FROM information_schema.TABLES WHERE table_schema='test' AND t.table_name IN ('a','b')) s_ratio,
           (rows_read+rows_changed)/(SELECT sum(rows_read+rows_changed) FROM information_schema.table_statistics WHERE table_schema='test' AND t.table_name IN ('a','b')) u_ratio
       FROM information_schema.table_statistics ts
       JOIN information_schema.TABLES t ON t.table_name=ts.table_name AND t.table_schema=ts.table_schema
       WHERE t.table_schema='test' AND t.table_name IN ('a','b');
+------------+---------+---------+
| table_name | s_ratio | u_ratio |
+------------+---------+---------+
| a          |  0.5818 |  0.7488 |
| b          |  0.4180 |  0.2496 |
+------------+---------+---------+
2 rows in set (3 min 23.67 sec)

A bit more query hacking gives us a query which returns statements to initialize key buffers according to table sizes and activity (in this case weighted 50-50, though you may use a different formula), while capping each buffer at the actual index size and keeping the total within the key buffer budget (4000000000 bytes in this case):

SQL:
SELECT concat("SET GLOBAL ", t.table_schema,"_",t.table_name,".key_buffer_size=",
           round(least(index_length,
               (index_length/(SELECT sum(index_length) FROM information_schema.TABLES WHERE table_schema="test" AND t.table_name IN ("a","b"))
                + (rows_read+rows_changed)/(SELECT sum(rows_read+rows_changed) FROM information_schema.table_statistics WHERE table_schema="test" AND t.table_name IN ("a","b")))
               /2*4000000000)),";") cmd
FROM information_schema.table_statistics ts
JOIN information_schema.TABLES t ON t.table_name=ts.table_name AND t.table_schema=ts.table_schema
WHERE t.table_schema="test" AND t.table_name IN ("a","b");
+---------------------------------------------+
| cmd                                         |
+---------------------------------------------+
| SET GLOBAL test_a.key_buffer_size=39514112; |
| SET GLOBAL test_b.key_buffer_size=28390400; |
+---------------------------------------------+
2 rows in set (2 min 31.68 sec)

Pass this via SELECT INTO OUTFILE, or pipe it to mysql directly as explained here, to create the key caches.

Now you can use a much simpler query to generate the statements assigning tables to the key caches:

SQL:
mysql> SELECT concat("CACHE INDEX ",table_schema,".",table_name," IN ",table_schema,"_",table_name,";") FROM information_schema.TABLES WHERE table_schema='test' AND table_name IN ("a","b");
+-------------------------------------------------------------------------------------------+
| concat("CACHE INDEX ",table_schema,".",table_name," IN ",table_schema,"_",table_name,";") |
+-------------------------------------------------------------------------------------------+
| CACHE INDEX test.a IN test_a;                                                             |
| CACHE INDEX test.b IN test_b;                                                             |
+-------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

Having gone through this multiple key cache creation exercise, complex or not, you probably wonder how much performance gain you should expect. In fact it can be very significant.

For a CPU-bound workload on 16 cores inserting data into about 20 tables, I have seen performance gains of as much as 10x compared to using a single shared key cache of the same size.


Entry posted by peter


09:33 Linking out: Often it's just applying common sense » Google Chinese Webmaster Blog
Linking out: Often it's just applying common sense
Posted: Wednesday, October 8, 2008, 12:20 AM

The third post in our series on links is about creating outbound links, that is, how to "link out." For most webmasters, outbound links are a natural thing that needs no special attention. However, if you are interested in this simple yet fundamental topic for the web, below we describe what makes good and bad outbound links, and answer some typical questions raised by fellow webmasters. First, let's start with good outbound links...

Highly relevant outbound links help your site's visitors.
  • They can give readers more in-depth information on related topics
  • They give readers your unique commentary on existing resources
Sensible outbound links help build your credibility.
  • They show that you have done some research and have expertise in the subject
  • They make visitors glad to return to your site for more analysis of future topics
  • They build relationships with experts in other fields (for example, sending users to other sites can introduce you to other successful bloggers and lead to business relationships)
When it comes to cases of undesirable outbound links, no one should be surprised, but to avoid any confusion, we will state our views openly.

Bad links: unmanaged (especially user-generated) links and covert paid advertising links lower your site's credibility.
  • Your reputation suffers too: experienced visitors tend to find such links off-putting, and your authority with search engines declines. If you accept paid outbound links, it is best to use the rel="nofollow" attribute or some other method on those links to make sure they do not pass PageRank (as far as search engines are concerned). (You will probably understand that, as users, we usually stay loyal only to sites that fully respect our right to know.)
  • They give meaningless comment spam a place to hide. Moreover, from a search engine's perspective, comment spam associates your site with bad sites. As webmasters, we recommend adding the nofollow attribute (rel="nofollow") to user-generated links, such as blog guestbooks that may be targeted by web spam; otherwise, make sure comments are published only after manual review and approval.
See Jason Morrison's recent post, "Keeping comment spam off your site and away from your users," on stopping spam before it starts.

Answers to questions about outbound links:

Does linking out mean I send my visitors to other sites for good?
Your site's visitors will initially leave your site to look at the related information. But do you remember what you do when you come across a good article on a site that links to other sites? Personally, if a site offers valuable commentary and additional resources, I always come back to it. Sometimes I stay on the original site and open the interesting links in different browser tabs. So a site with outbound links is quite likely to win repeat visitors, and you will not lose those users forever.
The previous post mentioned that descriptive anchor text helps with internal links; is it also important for outbound links?
Descriptive anchor text (the visible text of a hyperlink) helps the web interconnect accurately. It lets both users and Googlebot better understand what they will see when they follow a link to another page. So if it is not too much trouble, please try to write descriptive anchor text.
Should I be careful about which sites I choose to link to? What if their PageRank is lower than mine?

If you believe the content you link to will interest your users, don't worry about that site's PageRank. The things webmasters should watch out for with outbound links were covered above, such as linking to spammy sites lowering your credibility. Beyond that, there is no need to agonize over outbound links; just treat them as a natural way to offer your users more value.

01:33 WordPress 2.6.5 » WordPress Development Blog

WordPress 2.6.5 is immediately available and fixes one security problem and three bugs. We recommend everyone upgrade to this release.

The security issue is an XSS exploit discovered by Jeremias Reith that fortunately only affects IP-based virtual servers running on Apache 2.x. If you are interested only in the security fix, copy wp-includes/feed.php and wp-includes/version.php from the 2.6.5 release package.

2.6.5 contains three other small fixes in addition to the XSS fix. The first prevents accidentally saving post meta information to a revision. The second prevents XML-RPC from fetching incorrect post types. The third adds some user ID sanitization during bulk delete requests. For a list of changed files, consult the full changeset between 2.6.3 and 2.6.5.

Note that we are skipping version 2.6.4 and jumping from 2.6.3 to 2.6.5 to avoid confusion with a fake 2.6.4 release that made the rounds. There is not and never will be a version 2.6.4.

Get WordPress 2.6.5.

