17:54 Are Cloud Based Memory Architectures the Next Big Thing? » High Scalability - Building bigger, faster, more reliable websites.

We are on the edge of two potent technological changes: Clouds and Memory Based Architectures. This evolution will rip open a chasm where new players can enter and prosper. Google is the master of disk. You can't beat them at a game they perfected. Disk-based databases like SimpleDB and BigTable are complicated beasts, typical last-gasp products of an aging technology before a change. The next era is the age of Memory and Cloud, which will allow new players to succeed. The tipping point is soon.

Let's take a short trip down web architecture lane:

  • It's 1993: Yahoo runs on FreeBSD, Apache, Perl scripts and a SQL database
  • It's 1995: Scale-up the database.
  • It's 1998: LAMP
  • It's 1999: Stateless + Load Balanced + Database + SAN
  • It's 2001: In-memory data-grid.
  • It's 2003: Add a caching layer.
  • It's 2004: Add scale-out and partitioning.
  • It's 2005: Add asynchronous job scheduling and maybe a distributed file system.
  • It's 2007: Move it all into the cloud.
  • It's 2008: Cloud + web scalable database.
  • It's 20??: Cloud + Memory Based Architectures

    You may disagree with the timing of various innovations and you would be correct. I couldn't find a history of the evolution of website architectures so I just made stuff up. If you have any better information please let me know.

    Why might cloud based memory architectures be the next big thing? For now we'll just address the memory based architecture part of the question; the cloud component is covered a little later.

    Behold the power of keeping data in memory:


    Google query results are now served in under an astonishingly fast 200ms, down from 1000ms in the olden days. The vast majority of this great performance improvement is due to holding indexes completely in memory. Thousands of machines process each query in order to make search results appear nearly instantaneously.

    This text was adapted from notes on Google Fellow Jeff Dean's keynote speech at WSDM 2009.

    Google isn't the only one getting a performance bang from moving data into memory. Both LinkedIn and Digg keep their social network graph in memory. Facebook has north of 800 memcached servers creating a reservoir of 28 terabytes of memory, enabling a 99% cache hit rate. Even the little guys can handle hundreds of millions of events per day by using memory instead of disk.
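
    To make the pattern concrete, here is a minimal read-through cache sketch in Python. This is my illustration, not code from any of the sites mentioned; the loader function, key, and TTL are assumptions. The idea is the same whether the store is a local dict or a memcached farm: answer from memory, and fall back to disk only on a miss.

        import time

        class ReadThroughCache:
            """Serve hot objects from memory; touch the slow store only on a miss."""

            def __init__(self, loader, ttl_seconds=300):
                self.loader = loader      # fetches from the slow store (disk / database)
                self.ttl = ttl_seconds
                self.store = {}           # key -> (expires_at, value)

            def get(self, key):
                entry = self.store.get(key)
                if entry is not None and entry[0] > time.time():
                    return entry[1]                       # hit: answered from memory
                value = self.loader(key)                  # miss: go to disk / database
                self.store[key] = (time.time() + self.ttl, value)
                return value

        # Hypothetical usage: load_profile_from_db stands in for any disk-backed lookup.
        def load_profile_from_db(user_id):
            return {"id": user_id, "name": "user-%d" % user_id}

        cache = ReadThroughCache(load_profile_from_db)
        profile = cache.get(42)   # first call hits "disk"; repeats are served from memory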


  • 15:34 Compression for InnoDB backup » MySQL Performance Blog

    Playing with the latest version of xtrabackup and compressing backups with it, I noticed that gzip is unacceptably slow for both compression and decompression. Peter actually wrote about this some time ago, but I wanted to revisit that data with some new information. In the current multi-core world a compression utility should use several CPUs to speed up the operation, and my other requirement was the ability to work with stdin / stdout, so I could script something like: innobackupex --stream | compressor | network_copy.
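
    As a rough sketch of that kind of streaming pipeline wired up from a script (my illustration, not from the post; the --stream=tar arguments, thread count, compression level, and the backup-host/path are assumptions), in Python it could look like:

        import subprocess

        # backup -> parallel compressor -> network copy, all streaming through pipes
        backup = subprocess.Popen(
            ["innobackupex", "--stream=tar", "./"],
            stdout=subprocess.PIPE)
        compress = subprocess.Popen(
            ["pigz", "-p", "4", "-1"],          # parallel gzip: 4 threads, level 1
            stdin=backup.stdout, stdout=subprocess.PIPE)
        copy = subprocess.Popen(
            ["ssh", "backup-host", "cat > /backups/db.tar.gz"],
            stdin=compress.stdout)

        # close our copies of the pipe ends so SIGPIPE propagates if a stage dies
        backup.stdout.close()
        compress.stdout.close()
        copy.wait()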

    My research gave me the following list: pigz (parallel gzip), pbzip2 (parallel bzip2), qpress (a command line utility for QuickLZ), and I also wanted to try LZO (lzop 1.03 command line + LZO 2 libraries). lzop does not support parallel operation, but it is known to have good decompression speed even with a single thread.

    For the compression test I took ~12GB of InnoDB data files generated by the tpcc benchmark with 100 warehouses.

    I tested 1, 2, and 4 parallel threads for the tools that support it, and different compression levels (1, 2, 3 for qpress; -1 and -5 for the other tools).
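
    Each measurement boils down to timing the compressor over the data set and computing size, ratio, and throughput. Here is a rough sketch of such a loop for pigz; the file name and the choice to compute MB/s from the compressed size are my assumptions (the latter made to match the numbers in the table below), not the author's script.

        import os, subprocess, time

        DATA = "tpcc-100w-innodb.tar"    # hypothetical name for the ~12GB of InnoDB files
        orig_mb = os.path.getsize(DATA) / (1024.0 * 1024.0)

        def bench(cmd, out_file):
            """Compress DATA with cmd (stdin -> stdout); report size, ratio, time, speed."""
            start = time.time()
            with open(DATA, "rb") as src, open(out_file, "wb") as dst:
                subprocess.check_call(cmd, stdin=src, stdout=dst)
            elapsed = time.time() - start
            out_mb = os.path.getsize(out_file) / (1024.0 * 1024.0)
            # throughput here is computed over the compressed size, which appears
            # to be the convention used in the table below
            return out_mb, out_mb / orig_mb, elapsed, out_mb / elapsed

        for threads in (1, 2, 4):
            for level in ("-1", "-5"):
                size, ratio, secs, speed = bench(
                    ["pigz", "-p", str(threads), level], "out.gz")
                print("pigz threads=%d level=%s size=%.2f MB ratio=%.2f "
                      "time=%.0f s speed=%.2f MB/s"
                      % (threads, level, size, ratio, secs, speed))

    Decompression is timed the same way in the other direction.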

    The raw results are available here: http://spreadsheets.google.com/ccc?key=pOIo5aX59b6biPZ0QTVMXHg&hl=en, and I copy the table in place in case Google stops working.

    tool    threads  level   size, MB  ratio  compr time, s  compr speed, MB/s  decompr time, s  decompr speed, MB/s
    qpress        1      1   6,058.93   0.52            109              55.59               92                65.86
    qpress        1      2   5,892.62   0.51            201              29.32              123                47.91
    qpress        1      3   5,885.01   0.51            473              12.44               84                70.06
    qpress        2      1   6,058.93   0.52             65              93.21               66                91.80
    qpress        2      2   5,892.62   0.51            110              53.57              112                52.61
    qpress        2      3   5,885.01   0.51            245              24.02               84                70.06
    qpress        4      1   6,058.93   0.52             48             126.23               66                91.80
    qpress        4      2   5,892.62   0.51             64              92.07               68                86.66
    qpress        4      3   5,885.01   0.51            130              45.27               65                90.54
    pigz          1      1   4,839.97   0.42            438              11.05              129                37.52
    pigz          1      5   3,460.31   0.30            763               4.54              121                28.60
    pigz          2      1   4,839.97   0.42            213              22.72              109                44.40
    pigz          2      5   3,460.31   0.30            379               9.13              104                33.27
    pigz          4      1   4,839.97   0.42            107              45.23              112                43.21
    pigz          4      5   3,460.31   0.30            190              18.21              103                33.60
    LZOP          1      1   5,831.25   0.50            184              31.69               83                70.26
    LZOP          1      5   5,850.16   0.50            179              32.68               87                67.24
    pbzip2        1      1   4,154.41   0.36           1594               2.61              597                 6.96
    pbzip2        1      5   4,007.07   0.34           1702               2.35              644                 6.22
    pbzip2        2      1   4,154.41   0.36            800               5.19              605                 6.87
    pbzip2        2      5   4,007.07   0.34            844               4.75              648                 6.18
    pbzip2        4      1   4,154.41   0.36            399              10.41              602                 6.90
    pbzip2        4      5   4,007.07   0.34            421               9.52              645                 6.21

    To summarize the results:

    There is no obvious winner; it depends on what is more important for you - size or time - but with this data we can make a decision.


    Entry posted by Vadim


  • 06:06 Linux C编程一站式学习 (Linux C Programming: One-Stop Learning) » Delicious/chedong
    The book has three major parts:

      • Introduction to C. Covers basic C syntax and helps readers with no programming experience understand what a program is and how to write one, building a programmer's habits of thought and a feel for coding. The first half is adapted from [ThinkCpp].
      • The essence of C. Explains how C programs are compiled, linked, and run, in terms of how the computer and the operating system work, while covering C syntax in full. The chapter on bit operations is adapted from lecture notes by 林小竹 of 亚嵌教育. The chapter on assembly language is adapted from [GroudUp], whose last chapter notes that there are two approaches to learning programming, Bottom Up and Top Down, each with its pros and cons, so the two need to be combined. The plan of this book is therefore: part one is Top Down, part two is Bottom Up, and part three fills the gap in between, with all three parts built around the C language.
      • Linux system programming. Introduces the various Linux system functions and how the kernel works. The chapter on socket programming is adapted from lecture notes by 卫剑钒 of 亚嵌教育.
