18:33 Family businesses and the environment » chinadialogue

Family ownership can give a company a sense of mission and values that its competitors lack. For China's emerging entrepreneurs, writes John Elkington, it is time to consider best practice in this much-neglected area.

Recently, at São Paulo airport, women kept stopping me to talk, a rare experience. At the end of an intense week I was returning home from Brazil with luggage packed to bursting, including a large transparent bag holding 12 tiny, brightly coloured flip-flops. These miniature shoes caught women's eyes, and several stopped me to ask where I had got them. I explained they were a gift from a local company, along with a box containing the material I had requested on the firm's views on corporate responsibility and sustainability.

My questioners seemed captivated when I told them which company had given me the bag, but rather less interested when I mentioned its contents. My own reaction was exactly the opposite. For decades I had largely overlooked the important role family businesses play in all economies, and particularly in emerging markets and developing countries. Yes, I knew that companies such as SC Johnson have been able to take an early lead on the environment precisely because they are family firms. And through SustainAbility's nearly ten years of work with Ford, I remain struck by how chief executive Bill Ford's family ties helped him push a sustainable-mobility agenda internally, an agenda that is only now gaining traction as soaring fuel prices undermine the company's sport-utility-vehicle business.

For entirely understandable reasons, activists and the corporate social responsibility (CSR) "industry" tend to focus on publicly listed companies, partly because such firms have the scale to create globally significant problems, but partly also because activists have found that corporate brands and reputations give them leverage over the owners. In the past, few of the brands vulnerable to activist and media attack were controlled by large family businesses, with exceptions such as the confectioner Mars and the packaging producer Tetra Pak. Now that globalisation is drawing in ever more participants from emerging economies, and being driven forward by them, we must work out how to engage these companies.

Family businesses include the kind of company we are seeing more and more of in countries such as Brazil and India. In the case of my eye-catching luggage, the producer was the flip-flop brand Havaianas, ultimately owned by Camargo Corrêa, a diversified group whose operations run from cement and construction to environmental engineering to footwear. Here, much of the push towards engagement with the wider corporate-responsibility and sustainability agenda comes from family shareholders whose parents founded the company. In India, the Tata family, now in its fifth generation, and the Birla family run vast family conglomerates with a strong sense of values and ethics in how they do business.

Given the dominance of state-owned or formerly state-owned enterprises, China may have no large family firms on the scale of Brazil's Camargo or India's Tata or Birla. Even so, smaller Chinese family businesses, at home and abroad, are a force to be reckoned with. By one estimate, private Chinese companies together constitute the world's fourth-largest economic force after North America, Japan and Europe, and many of them are family businesses, many based outside mainland China.

Whether in China or elsewhere in the world, Chinese family businesses clearly present advocates of corporate responsibility and sustainable development with the same opportunities and challenges as family firms in other countries. The opportunity is that owners can infuse the business with a sense of mission and values that goes beyond profit, something large public companies with dispersed ownership find hard to do, however finely crafted their mission statements.

Two passionate Chinese entrepreneurs are doing just that: Zhong Kaimin of the egg producer Deqingyuan, and Zhang Yue of Broad Air Conditioning. Broad, founded in 1988, makes non-electric absorption air conditioners. Zhang believes environmental considerations should be at the core of a company's R&D and sales activities, and this technology is the key to realising that conviction.

In the late 1990s, however, after a decade of commercial success, that conviction was tested. Changes in energy supply and government policy brought electricity prices down sharply, and crisis followed: Broad's products lost market share to cheaper, easier-to-use electric air conditioners. The simplest solution would have been for Broad to switch to making electric units. Instead, Zhang held firm and redesigned the non-electric technology to be cheaper, even more energy-efficient and easier to maintain. The result has been continued success in China and in international markets.

Deqingyuan's Zhong Kaimin faced his challenge from the outset. The company planned to produce and sell premium eggs in the Chinese market. Given the country's string of food-safety and health problems, a fragmented industry and price pressure from a multitude of small producers, it was a demanding business environment indeed. But Zhong bet, correctly as it turned out, that his high-quality eggs would meet the needs of increasingly health-conscious Chinese consumers, and that if Deqingyuan could build a brand around quality and health, consumers would be willing to pay a premium.

For all the success and environmental credentials of Broad and Deqingyuan, real challenges remain. Systematic governance, transparency and corporate responsibility are still little known among family firms, even as shareholders' expectations of listed international companies in these areas keep rising. Deqingyuan, though, has already taken clear steps in this direction. One of SustainAbility's future tasks will be to look at best (and worst) practice in family businesses, to better understand how to help such firms address their sustainability challenges.

John Elkington is founder and non-executive director of SustainAbility (www.sustainability.com) and a founding partner of Volans Ventures (www.volans.com). The author thanks Jodie Thorpe, manager of SustainAbility's Emerging Economies Programme (www.sustainability.com/emerging-economies), for her help with this article.

Homepage image by zinlee

16:27 Recovering Innodb table Corruption » MySQL Performance Blog

Assume you're running MySQL with Innodb tables and you've got crappy hardware, a driver bug, a kernel bug, an unlucky power failure or some rare MySQL bug, and some pages in the Innodb tablespace got corrupted. In such cases Innodb will typically print something like this:

InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 7.
InnoDB: You may have to recover from a backup.
080703 23:46:16 InnoDB: Page dump in ascii and hex (16384 bytes):
... A LOT OF HEX AND BINARY DATA...
080703 23:46:16 InnoDB: Page checksum 587461377, prior-to-4.0.14-form checksum 772331632
InnoDB: stored checksum 2287785129, prior-to-4.0.14-form stored checksum 772331632
InnoDB: Page lsn 24 1487506025, low 4 bytes of lsn at page end 1487506025
InnoDB: Page number (if stored to page already) 7,
InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 6353
InnoDB: Page may be an index page where index id is 0 25556
InnoDB: (index "PRIMARY" of table "test"."test")
InnoDB: Database page corruption on disk or a failed

and crash with an assertion failure.
So what can you do to recover such a table?

There are multiple things that can get corrupted, and in this article I will look in detail at the simplest one - a corrupted page in the clustered key index. It is worse than corruption in a secondary index, in which case a simple OPTIMIZE TABLE can be enough to rebuild it, but it is much better than table dictionary corruption, from which it may be much harder to recover the table.

In this example I actually went ahead and manually edited the test.ibd file, replacing a few bytes, so the corruption is mild.

First I should note that CHECK TABLE in Innodb is pretty useless. For my manually corrupted table I get:

SQL:
  mysql> CHECK TABLE test;
  ERROR 2013 (HY000): Lost connection to MySQL server during query

  mysql> CHECK TABLE test;
  +-----------+-------+----------+----------+
  | Table     | Op    | Msg_type | Msg_text |
  +-----------+-------+----------+----------+
  | test.test | check | status   | OK       |
  +-----------+-------+----------+----------+
  1 row in set (0.69 sec)

The first run is CHECK TABLE in normal operation mode, in which case Innodb simply crashes if there is a checksum error (even though we're only running a CHECK operation). In the second case I'm running with innodb_force_recovery=1, and as you can see, even though I get the message in the log file about the checksum failing, CHECK TABLE says the table is OK. This means you can't trust CHECK TABLE in Innodb to be sure your tables are good.

In this case the corruption was only in the data portion of pages, so once you have started Innodb with innodb_force_recovery=1 you can do the following:

SQL:
  mysql> CREATE TABLE `test2` (
      ->   `c` char(255) DEFAULT NULL,
      ->   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      ->   PRIMARY KEY (`id`)
      -> ) ENGINE=MyISAM;
  Query OK, 0 rows affected (0.03 sec)

  mysql> INSERT INTO test2 SELECT * FROM test;
  Query OK, 229376 rows affected (0.91 sec)
  Records: 229376  Duplicates: 0  Warnings: 0

Now you have all your data in a MyISAM table, so all you have to do is drop the old table and convert the new table back to Innodb after restarting without the innodb_force_recovery option. You can also rename the old table in case you need to look into it more later. Another alternative is to dump the table with mysqldump and load it back. It is all pretty much the same stuff. I'm using a MyISAM table for a reason you'll see later.

You may wonder why you can't simply rebuild the table with OPTIMIZE TABLE. The reason is that when running in innodb_force_recovery mode, Innodb becomes read-only for data operations, so you can't insert or delete any data (though you can create or drop Innodb tables):

SQL:
  mysql> OPTIMIZE TABLE test;
  +-----------+----------+----------+----------------------------------+
  | Table     | Op       | Msg_type | Msg_text                         |
  +-----------+----------+----------+----------------------------------+
  | test.test | optimize | error    | Got error -1 from storage engine |
  | test.test | optimize | status   | Operation failed                 |
  +-----------+----------+----------+----------------------------------+
  2 rows in set, 2 warnings (0.09 sec)

That was easy, right ?

I also thought so, so I went ahead and edited test.ibd a little more, wiping one of the page headers completely. Now CHECK TABLE crashes even with innodb_force_recovery=1:

080704 0:22:53 InnoDB: Assertion failure in thread 1158060352 in file btr/btr0btr.c line 3235
InnoDB: Failing assertion: page_get_n_recs(page) > 0 || (level == 0 && page_get_page_no(page) == dict_index_get_page(index))
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even

If you get such assertion failures, higher innodb_force_recovery values will most likely not help you - they are helpful when there is corruption in various system areas, but they can't really change the way Innodb processes page data.

Next comes the trial-and-error approach:

SQL:
  mysql> INSERT INTO test2 SELECT * FROM test;
  ERROR 2013 (HY000): Lost connection to MySQL server during query

You might expect MySQL to scan the table up to the first corrupted row and leave the result in the MyISAM table. Unfortunately, test2 ended up empty after the run, even though I saw that some data could be selected. The problem is that some buffering takes place, and as MySQL crashes it does not store all the data it could have recovered into the MyISAM table.

Using a series of queries with LIMIT can be handy if you're recovering manually:

SQL:
  mysql> INSERT IGNORE INTO test2 SELECT * FROM test LIMIT 10;
  Query OK, 10 rows affected (0.00 sec)
  Records: 10  Duplicates: 0  Warnings: 0

  mysql> INSERT IGNORE INTO test2 SELECT * FROM test LIMIT 20;
  Query OK, 10 rows affected (0.00 sec)
  Records: 20  Duplicates: 10  Warnings: 0

  mysql> INSERT IGNORE INTO test2 SELECT * FROM test LIMIT 100;
  Query OK, 80 rows affected (0.00 sec)
  Records: 100  Duplicates: 20  Warnings: 0

  mysql> INSERT IGNORE INTO test2 SELECT * FROM test LIMIT 200;
  Query OK, 100 rows affected (1.47 sec)
  Records: 200  Duplicates: 100  Warnings: 0

  mysql> INSERT IGNORE INTO test2 SELECT * FROM test LIMIT 300;
  ERROR 2013 (HY000): Lost connection to MySQL server during query

As you can see, I can copy rows from the table into the new one until we finally touch the row that crashes MySQL. In this case we can tell the bad row is somewhere between 200 and 300, and we can run a bunch of similar statements to find the exact number by doing a binary search.
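That binary search is mechanical enough to script. Here is a Python sketch of the idea; `insert_ok` is a hypothetical stand-in (the name is mine, not from the post) for issuing `INSERT IGNORE INTO test2 SELECT * FROM test LIMIT n` and reporting whether the server survived:

```python
def find_last_good_limit(lo, hi, insert_ok):
    """Binary-search the largest LIMIT value n for which
    INSERT ... SELECT ... LIMIT n completes without crashing the server.
    lo is a LIMIT known to succeed, hi an upper bound on the search;
    insert_ok(n) stands in for running the query and reporting whether
    the server survived."""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if insert_ok(mid):
            lo = mid        # mid rows copied fine; the bad row is further on
        else:
            hi = mid - 1    # crash: the bad row is at or before row mid
    return lo

# Simulated run: rows up to 230 are readable, row 231 sits on the bad page.
print(find_last_good_limit(200, 300, lambda n: n <= 230))  # prints 230
```

Thanks to INSERT IGNORE, rows already copied by earlier probes are simply skipped as duplicates. Note that every failed probe crashes mysqld, so a real script would also have to restart the server between attempts.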

Note: even if you do not use a MyISAM table but fetch the data into a script instead, make sure to use LIMIT or PK ranges - when MySQL crashes, you will not get all the data in the network packet you potentially could get, due to buffering.

So now we know there is corrupted data in the table and we need to somehow skip over it. To do that, we find the maximum PK that could be recovered and try some higher values:

SQL:
  mysql> SELECT max(id) FROM test2;
  +---------+
  | max(id) |
  +---------+
  |     220 |
  +---------+
  1 row in set (0.00 sec)

  mysql> INSERT IGNORE INTO test2 SELECT * FROM test WHERE id>250;
  ERROR 2013 (HY000): Lost connection to MySQL server during query

  mysql> INSERT IGNORE INTO test2 SELECT * FROM test WHERE id>300;
  Query OK, 573140 rows affected (7.79 sec)
  Records: 573140  Duplicates: 0  Warnings: 0

So skipping 30 rows was too little, while skipping 80 rows was OK. Again, using binary search you can find out exactly how many rows you need to skip to recover as much data as possible. The row size can be a good guide here: in this case we have about 280 bytes per row, so we get about 50 rows per page, and it is no big surprise that 30 rows was not enough - typically, if the page directory is corrupted, you need to skip at least a whole page. If a page is corrupted at a higher level in the BTREE, you may need to skip a lot of pages (a whole subtree) to use this recovery method.
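That rows-per-page estimate is simple integer arithmetic (16KB is Innodb's default page size; the quotient is an upper bound, since real pages also hold headers and free space, which is why the working figure above is "about 50"):

```python
PAGE_SIZE = 16 * 1024   # Innodb's default page size in bytes
ROW_SIZE = 280          # approximate row length in this example

# Upper bound on rows per page; skipping fewer than this many rows
# cannot be guaranteed to get you past one corrupted page.
rows_per_page = PAGE_SIZE // ROW_SIZE
print(rows_per_page)    # prints 58
```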

It is also quite possible that you will need to skip over a few bad pages rather than just one, as in this example.

Another hint: you may want to CHECK the MyISAM table you use for recovery after each MySQL crash, to make sure its indexes are not corrupted.

So we have looked at how to get your data back after simple Innodb table corruption. In more complex cases you may need to use higher innodb_force_recovery modes to block purging activity, insert buffer merges, or recovery from transactional logs altogether. But the lower the recovery mode you can run the recovery process with, the better the data you're likely to get.

In some cases, such as when the data dictionary or the "root page" of the clustered index is corrupted, this method will not work well - then you may wish to use the Innodb Recovery Toolkit, which is also helpful in cases where you want to recover deleted rows or a dropped table.

I should also mention that at Percona we offer assistance with MySQL recovery, including recovery from Innodb corruption and deleted data.


Entry posted by peter | One comment


12:17 Further clarification on the discontinuation of the referrals program » Inside AdSense (Chinese)


Since we published the notice that the referrals program would be discontinued, many publishers have misunderstood it, thinking we were shutting down the entire AdSense program. So today we would like to clarify this point:

The AdSense program is not being discontinued. We are only discontinuing one ad product within the AdSense program - referrals (including Firefox, AdSense and AdWords referrals and referral ads).

All other ad products in the AdSense program - AdSense for content, AdSense for search, mobile ads and the search network - will continue to operate as normal. Google AdSense will keep working to improve our products and services and help publishers earn more!
03:23 How to load large files safely into InnoDB with LOAD DATA INFILE » MySQL Performance Blog

Recently I had a customer ask me about loading two huge files into InnoDB with LOAD DATA INFILE. The goal was to load this data on many servers without putting it into the binary log. While this is generally a fast way to load data (especially if you disable unique key checks and foreign key checks), I recommended against this. There are several problems with the very large transaction caused by the single statement. We didn't want to split the file into pieces for the load for various reasons. However, I found a way to load the single file in chunks as though it were many small files, which avoided splitting the file and let us load with many transactions instead of one huge transaction.

The smaller file is 4.1GB and has 260M lines in it; each row is just two bigints. The bigger file was about 20GB and had wider rows with textual data and about 60M lines (as I recall).

When InnoDB loads the file, it creates one big transaction with a lot of undo log entries. This has a lot of costs. To name a few:

Most seriously, if something should happen and the load needs to roll back, it will take a Very Long Time to do -- I hate to think how long. I'm sure it would be faster to just shut everything down and re-clone the machine from another, which takes about 10 or 12 hours. InnoDB is not optimized for rollbacks, it's optimized for transactions that succeed and commit. Rollback can take an order of magnitude longer to do.

For that reason, we decided to load the file in chunks of a million rows each. (InnoDB internally does operations such as ALTER TABLE in 10k row chunks, by the way; I chose 1M because the rows were small). But how to do this without splitting the file? The answer lies in the Unix fifo. I created a script that reads lines out of the huge file and prints them to a fifo. Then we could use LOAD DATA INFILE on the fifo. Every million lines, the script prints an EOF character to the fifo, closes it and removes it, then re-creates it and keeps printing more lines. If you 'cat' the fifo file, you get a million lines at a time from it. The code is pretty simple and I've included it in Maatkit just for fun. (It's unreleased as of yet, but you can get it with the following command: "wget http://www.maatkit.org/trunk/fifo").
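The chunking idea is easy to sketch. The following is not the actual Maatkit script but a minimal Python equivalent of what it does, assuming the reader side is a LOAD DATA INFILE loop on the fifo path: each chunk gets its own create/write/close/remove cycle, so the reader sees end-of-file once per chunk.

```python
import os
from itertools import islice

def chunk_lines(src, n):
    """Yield lists of at most n lines from an open file object."""
    while True:
        chunk = list(islice(src, n))
        if not chunk:
            return
        yield chunk

def serve_chunks(path, fifo, lines_per_chunk=1000000):
    """For each chunk of the input file: create the fifo, write the
    lines into it, then close and remove it so the reader running
    LOAD DATA INFILE on the fifo sees EOF and commits that chunk."""
    with open(path) as src:
        for chunk in chunk_lines(src, lines_per_chunk):
            os.mkfifo(fifo)   # open() for writing blocks until a reader opens the fifo
            try:
                with open(fifo, "w") as out:
                    out.writelines(chunk)
            finally:
                os.remove(fifo)
```

Removing the fifo once the input is exhausted is also what lets a consumer loop of the form `while [ -e /tmp/my-fifo ]` terminate.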

So how did it work? Did it speed up the load?

Not appreciably. There actually was a tiny speedup, but it's statistically insignificant IMO. I tested this first on an otherwise idle machine with the same hardware as the production machines. First, I did it in one big 4.1GB transaction, then I did it 1 million rows at a time. Here's the CREATE TABLE:

SQL:
  CREATE TABLE load_test (
  col1 bigint(20) NOT NULL,
  col2 bigint(20) DEFAULT NULL,
  KEY(col1),
  KEY(col2)
  ) ENGINE=InnoDB DEFAULT CHARSET=utf8

Here's the result of loading the entire 4GB file in one chunk:

CODE:
  time mysql -e "set foreign_key_checks=0; set sql_log_bin=0; set unique_checks=0; load data local infile 'infile.txt' into table load_test fields terminated by '\t' lines terminated by '\n' (col1, col2);"

  real 234m53.228s
  user 0m1.098s
  sys 0m5.959s

While this ran, I captured vmstat output every 5 seconds and logged it to a file; I also captured the output of "mysqladmin ext -ri5 | grep Handler_write" and logged that to a file.
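A quick script can later reduce such a log to per-second rates. Here is a Python sketch (not the Perl script actually used for this post; it assumes mysqladmin's usual `| Handler_write | NNN |` table rows, with the divisor matching the 5-second `-ri5` interval):

```python
import re

def handler_write_rates(path, interval=5):
    """Parse a log produced by `mysqladmin ext -ri5 | grep Handler_write`
    and return rows-written-per-second for each sample."""
    rates = []
    with open(path) as f:
        for line in f:
            m = re.search(r"Handler_write\s*\|\s*(\d+)", line)
            if m:
                rates.append(int(m.group(1)) / interval)
    # Note: with -r the first sample is cumulative rather than relative,
    # so you may want to drop rates[0] before graphing.
    return rates
```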

To load the file in chunks, I split my screen session in two and then ran (approximately -- edited for clarity) the following in one terminal:

CODE:
  perl mk-fifo-split infile.txt --fifo /tmp/my-fifo --lines 1000000

And this in the other terminal:

CODE:
  while [ -e /tmp/my-fifo ]; do
     mysql -e "..... same as above.... ";
     sleep 1;
  done

After I was done, I ran a quick Perl script on the vmstat and mysqladmin log files to grab out the disk activity and rows-per-second to see what the progress was. Here are some graphs. This one is the rows per second from mysqladmin, and the blocks written out per second from vmstat.

Rows per second and blocks written out per second

And this one is the bytes/sec from Cacti running against this machine. This is only the bytes out per second; for some reason Cacti didn't seem to be capturing the bytes in per second.

Cacti graph while loading file

You can see how the curves are roughly logarithmic, which is what you should expect for B-Tree indexes. The two curves on the Cacti graph actually show both files being loaded. It might seem counter-intuitive, but the second (smaller) curve is actually the larger file. It has fewer rows and that's why it causes less I/O overall.

I also used 'time' to run the Perl fifo script, and it used a few minutes of CPU time during the loads. So not very much at all.

Some interesting things to note: the load was probably mostly CPU-bound. vmstat showed from 1% to 3% I/O wait during this time. (I didn't think to use iostat to see how much the device was actually used, so this isn't a scientific measurement of how much the load was really waiting for I/O). The single-file load showed about 1 or 2 percent higher I/O wait, and you can see the single-file load uses more blocks per row; I can only speculate that this is the undo log entries being written to disk. (Peter arrived at the same guess independently.)

Unfortunately I didn't think to log the "cool-down period" after the load ended. It would be fun to see that. Cacti seemed to show no cool-down period -- as soon as the load was done it looked like things went back to normal. I suspect that's not completely true, since the buffer pool must have been overly full with this table's data.

Next time I do something like this I'll try smaller chunks, such as 10k rows; and I'll try to collect more stats. It would also be interesting to try this on an I/O-bound server and see what the performance impact is, especially on other transactions running at the same time.


Entry posted by Baron Schwartz | 3 comments


