(Note: Review was done as part of our consulting practice, but is totally independent and fully reflects our opinion)
I had a chance to take look TokuDB (the name of the Tokutek storage engine), and run some benchmarks. Tuning of TokuDB is much easier than InnoDB, there only few parameters to change, and actually out-of-box things running pretty well.
There are some rumors circulating that TokuDB is ”.. only an in memory or read-only engine, and that’s why inserts are so fast”. This is not actually the case, as TokuDB is a disk-based, read-write transactional storage engine that is based on special “fractal tree indexes”. Fractal Trees are a drop-in-replacement for a B-tree (based on current research in data structures by professors at Stony Brook, Rutgers, and MIT). I can’t say exactly how it is improved, because the engine itself is closed source.
Along with its “fractal tree indexes”, TokuDB also uses compression, which significantly ( Graph 1) reduces dataset and decreases the amount of IO operations. The benefit of small size is also that TokuDB can keep in memory much more records then InnoDB / MyISAM. Actually in internal cache records are stored in uncompressed form, but OS Cache can keep compressed pages in memory. For the data set we tested, TokuDB used 6.2x less disk space than InnoDB, and 5.5x less disk space than MyISAM.
For tests I used Dell PowerEdge R900, with RAID 10 on 8 disks (2.5″ SAS disks, 15K RPMS) and 32GB of physical RAM, but restricted on kernel level to 4GB to emulate case B-Tree does not fit into memory.
As benchmark software I tried iiBench, which you can take there https://launchpad.net/mysql-patch/mytools
What makes fractal indexes so interesting is the amount of IO operations to update index tree is significantly less than for usual B-Tree index. It’s as if Fractal Trees turn random IO into sequential IO. This is why you see the results that you do in iiBench test ( Graph 2), and the number of inserts/sec is almost linear even when table size bigger than available memory. For the last 10M rows inserted, InnoDB averaged 1,555 rows/sec while TokuDB averaged 16,437 rows/sec - about 10.6x faster. One consequence of having such fast indexes, is that you can maintain a richer set of indexes at a given incoming data rate, enabling much higher query performance.
Beside iiBench we run benchmarks of SELECT queries again one of our click analyzing schema, in two modes - 1. data size is much more then memory and 2. data fits into memory.
As you see in IO-bound case TokuDB outperforms InnoDB 1.4-2.5x times, but CPU-bound is not so good. I think there we meet one of current restrictions of TokuDB - SERIALIZABLE isolation level for transactions.
Speaking about restrictions, the current problems I see are:
- Transactions only support the SERIALIZABLE isolation level. Beside it TokuDB does not scale well on multi-cores even in only SELECT queries. What this practically means it that you can’t get benefit of multi-core boxes running concurrent threads. Tokutek plans to fix this in one of the next releases.
- We did not tested wide range of queries, but by design expect there may be not good results for some kind of queries, i.e. point select queries, as in this case TokuDB has to read and decompress big portion of data.
- Despite Inserts and Deletes are fast, updates are not expected to show the same performance gain, as to update we need to read data, and in this case - read previous comment.
- The version we tested did not yet support recovery logs. The code for it is ready, and will be available in a release soon.
- The ways to do a backup is mysqldump/mysqlhotcopy. It is not fully transparent backup, as it applied TABLE LOCK on copying table. When recovery logs are supported, I guess it will be possible to run LVM backup. Actually I would say backup is only partially the problem of storage engine. The biggest problem is that MySQL does not yet provide an interface for that. This is going to be fixed in MySQL 6.0, but I can’t yet say how it will work with mix of storage engines.
- The Tokutek engine I tested comes in binary form and mysqld binary does not contain InnoDB. Tokutek tells me that InnoDB will be included in a future release.
With all the given advantages and drawbacks, I see a good practical usage of TokuDB for log analyzing and log reporting queries. By log analyzing I mean any kind of log producing application, it can be from simple apache logs put into mysql, application performance logs to more complex log like clicks, user movements and actions on site, visits tracking etc. While it may sound like an easy and trivial task, it is not at all. The more logs there are, the more space they take, and we have had setups where logs are 80% of total database size. Also there is the problem of being able to run custom reporting queries on logs. To do this, you often need many, often complex indexes which gives us the problem of random IO, waste of RAM memory and slow inserts. This is where I think Tokutek appears to be positioned to do quite good at. There are operation issues which make things more complex and, probably, I would not put yet TokuDB on customer production boxes, but it may good fit to non-critical slave where you can run analyzing queries.
Entry posted by Vadim | No comment
Update 4:: Introduction to Pig. Pig allows you to skip programming Hadoop at the low map-reduce level. You don't have to know Java. Using the Pig Latin language, which is a scripting data flow language, you can think about your problem as a data flow program. 10 lines of Pig Latin = 200 lines of Java.
Update 3: Scaling Hadoop to 4000 nodes at Yahoo!. 30,000 cores with nearly 16PB of raw disk; sorted 6TB of data completed in 37 minutes; 14,000 map tasks writes (reads) 360 MB (about 3 blocks) of data into a single file with a total of 5.04 TB for the whole job.
Update 2: Hadoop Summit and Data-Intensive Computing Symposium Videos and Slides. Topics include: Pig, JAQL, Hbase, Hive, Data-Intensive Scalable Computing, Clouds and ManyCore: The Revolution, Simplicity and Complexity in Data Systems at Scale, Handling Large Datasets at Google: Current Systems and Future Directions, Mining the Web Graph. and Sherpa: Hosted Data Serving.
Update: Kevin Burton points out Hadoop now has a blog and an introductory video staring Beyonce. Well, the Beyonce part isn't quite true.
Hadoop is a framework for running applications on large clusters of commodity hardware using a computational paradigm named map/reduce, where the application is divided into many small fragments of work, each of which may be executed on any node in the cluster. It replicates much of Google's stack, but it's for the rest of us. Jeremy Zawodny has a wonderful overview of why Hadoop is important for large website builders:
Today we glad to announce release 1.0.3-5 of our XtraDB storage engine.
Here is a list of enhancements in this release:
Percona XtraDB 1.0.3-5 available in source and several binary packages.
XtraDB is compatible with existing InnoDB tables (unless you used innodb_extra_undoslots) and we are going to keep compatibility in further releases. We are open for features requests for new engine and ready to accept community patches. You can monitor Percona’s current tasks and further plans on the Percona XtraDB Launchpad project. You can also request features and report bugs there. Also we have setup two maillists for General discussions and for Development related questions.
Entry posted by Evgeniy Stepchenko | 4 comments
Comps for the header/nav design tweaks are in, and the results are mixed. Some people just moved a few things around, while others proposed a new style altogether. We won’t make any major changes to style in 2.8, but if the vote leans toward a submission that proposes it, we’ll do some user testing and make a decision for early 2.9 (which, now that we think of it, is probably the right thing to do anyway. )
Below are the links to the screenshots that were submitted. Please review each one (I’d open them all in tabs so I could look back and forth while they are all large size, because the voting poll just uses thumbnails), then choose the one you think looks the best/is the most usable.
This poll was supposed to close at 8pm NY time on Tuesday (today), but we’re going to leave it open for an extra day. The voting poll will now be closed at 8pm NY time on Wednesday (that’s 2am Thursday, UTC). If you want to discuss the entries’ pros/cons, this thread would be a good place.
Current: The existing interface, for reference
KM: Current nav, header elements moved
AN: Current nav, file folder style header
KD: Current nav, modified header style
JJ: Swap blog title and favorites menu
IK: Nav layered over dark background
GB: Modified nav/header intersection
Results will be posted the day after the polls close.
四月 2009 | ||||||
一 | 二 | 三 | 四 | 五 | 六 | 日 |
1 | 2 | 3 | 4 | 5 | ||
6 | 7 | 8 | 9 | 10 | 11 | 12 |
13 | 14 | 15 | 16 | 17 | 18 | 19 |
20 | 21 | 22 | 23 | 24 | 25 | 26 |
27 | 28 | 29 | 30 |