Linux C编程一站式学习 | 16 Mar 2009

17:54 Are Cloud Based Memory Architectures the Next Big Thing? » High Scalability - Building bigger, faster, more reliable websites.

We are on the edge of two potent technological changes: Clouds and Memory Based Architectures. This evolution will rip open a chasm where new players can enter and prosper. Google is the master of disk. You can't beat them at a game they perfected. Disk based databases like SimpleDB and BigTable are complicated beasts, typical last gasp products of any aging technology before a change. The next era is the age of Memory and Cloud which will allow for new players to succeed. The tipping point is soon.

Let's take a short trip down web architecture lane:

It's 1993: Yahoo runs on FreeBSD, Apache, Perl scripts and a SQL database

It's 1995: Scale-up the database.

It's 1998: LAMP

It's 1999: Stateless + Load Balanced + Database + SAN

It's 2001: In-memory data-grid.

It's 2003: Add a caching layer.

It's 2004: Add scale-out and partitioning.

It's 2005: Add asynchronous job scheduling and maybe a distributed file system.

It's 2007: Move it all into the cloud.

It's 2008: Cloud + web scalable database.

It's 20??: Cloud + Memory Based Architectures

You may disagree with the timing of various innovations and you would be correct. I couldn't find a history of the evolution of website architectures so I just made stuff up. If you have any better information please let me know.

Why might cloud based memory architectures be the next big thing? For now we'll just address the memory based architecture part of the question, the cloud component is covered a little later.

Behold the power of keeping data in memory:

Google query results are now served in under an astonishingly fast 200ms, down from 1000ms in the olden days. The vast majority of this great performance improvement is due to holding indexes completely in memory. Thousands of machines process each query in order to make search results appear nearly instantaneously.

This text was adapted from notes on Google Fellow Jeff Dean keynote speech at WSDM 2009.

Google isn't the only one getting a performance bang from moving data into memory. Both LinkedIn and Digg keep the graph of their network social network in memory. Facebook has northwards of 800 memcached servers creating a reservoir of 28 terabytes of memory enabling a 99% cache hit rate. Even little guys can handle100s of millions of events per day by using memory instead of disk.

15:34 Compression for InnoDB backup » MySQL Performance Blog

Playing with last version of xtrabackup and compress it I noticed that gzip is unacceptable slow for both compression and decompression operations. Actually Peter wrote about it some time ago, but I wanted to review that data having some new information. In current multi-core word the compression utility should utilize several CPU to speedup operation, and another my requirement was the ability to work with stdin / stdout, so I could do scripting something like: innobackupex –stream | compressor | network_copy.

My research gave me next list: pigz (parallel gzip), pbzip2 (parallel bzip2), qpress ( command line utility for QuickLZ) and I wanted to try LZO (as lzop 1.03 command line + LZO 2 libraries). Actually lzop does not support parallel operations, but it is know to have good decompression speed even with 1 thread.

For compression test I took ~12GB of InnoDB data files generated by tpcc benchmark with 100 warehouses.

I tested 1, 2, 4 parallel threads for tools that support it and different level of compression ( 1,2,3 for qpress; -1 and -5 for other tools)

The raw results are available here http://spreadsheets.google.com/ccc?key=pOIo5aX59b6biPZ0QTVMXHg&hl=en, and I copy table in place in case if Google stops to work.


	threads	level	compressed size	compress ratio	comression time, sec	compr speed, MB/s	decomp time, sec	decomp speed, MB/s
qpress	1	1	6,058.93	0.52	109	55.59	92	65.86
	1	2	5,892.62	0.51	201	29.32	123	47.91
	1	3	5,885.01	0.51	473	12.44	84	70.06
	2	1	6,058.93	0.52	65	93.21	66	91.80
	2	2	5,892.62	0.51	110	53.57	112	52.61
	2	3	5,885.01	0.51	245	24.02	84	70.06
	4	1	6,058.93	0.52	48	126.23	66	91.80
	4	2	5,892.62	0.51	64	92.07	68	86.66
	4	3	5,885.01	0.51	130	45.27	65	90.54
pigz	1	1	4,839.97	0.42	438	11.05	129	37.52
	1	5	3,460.31	0.30	763	4.54	121	28.60
	2	1	4,839.97	0.42	213	22.72	109	44.40
	2	5	3,460.31	0.30	379	9.13	104	33.27
	4	1	4,839.97	0.42	107	45.23	112	43.21
	4	5	3,460.31	0.30	190	18.21	103	33.60
LZOP	1	1	5,831.25	0.50	184	31.69	83	70.26
	1	5	5,850.16	0.50	179	32.68	87	67.24
pbzip2	1	1	4,154.41	0.36	1594	2.61	597	6.96
	1	5	4,007.07	0.34	1702	2.35	644	6.22
	2	1	4,154.41	0.36	800	5.19	605	6.87
	2	5	4,007.07	0.34	844	4.75	648	6.18
	4	1	4,154.41	0.36	399	10.41	602	6.90
	4	5	4,007.07	0.34	421	9.52	645	6.21

To summarize results:

pbzip2 obviously show good compression, but the speed of processing is too slow. What is interesting on Level 5 the compression is worse than in pigz Level 5
pigz is good for compression and faster than pbzip2 but still not so fast; however multi-threaded processing may be OK, especially if you need to keep compatibility, e.g. copy result on boxes where only standard gzip available
qpress is not so good in compression ration, but speed is impressive, and maybe we will ship xtrabackup with this compression
LZO is even faster in decompression than qpress, but I would like to see parallel version. There is the patch for it, but it did not apply clean to lzop 1.02, so I skipped it
In my opinion in all cases Level 1 of compression shows better tradeoff between size of archive and compression/decompression time

There is no obvious winner, it depends on what is more important for you - size or time, but having this data we can make decision.

Entry posted by Vadim | No comment

Add to: | | | |

06:06 Linux C编程一站式学习 » Delicious/chedong

本书包括三大部分： * C语言入门。介绍基本的C语法，帮助没有任何编程经验的读者理解什么是程序，怎么写程序，培养程序员的思维习惯，找到编程的感觉。前半部分改编自[ThinkCpp]。 * C语言本质。结合计算机和操作系统的原理讲解C程序是怎么编译、链接、运行的，同时全面介绍C的语法。位运算的章节改编自亚嵌教育林小竹老师的讲义。汇编语言的章节改编自[GroudUp]，在这本书的最后一章提到，学习编程有两种Approach，一种是Bottom Up，一种是Top Down，各有优缺点，需要两者结合起来。所以我编这本书的思路是，第一部分Top Down，第二部分Bottom Up，第三部分可以算填了中间的空隙，三部分全都围绕C语言展开。 * Linux系统编程。介绍各种Linux系统函数和内核的工作原理。Socket编程的章节改编自亚嵌教育卫剑钒老师的讲义。

	三月 2009
一	二	三	四	五	六	日
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31