19:21 Beware of MySQL Data Truncation » MySQL Performance Blog

Here is nice gotcha which I've seen many times and which can cause just a minefield for many reasons.
Lets say you had a system storing articles and you use article_id as unsigned int. As the time goes and you see you may get over 4 billions of articles you change the type for article_id to bigint unsigned but forget linked tables.

SQL:
  1. mysql> CREATE TABLE article_comment(article_id int UNSIGNED NOT NULL, comment_id int UNSIGNED NOT NULL, KEY(article_id));
  2. Query OK, 0 rows affected (0.11 sec)
  3.  
  4. mysql> INSERT INTO article_comment VALUES(4300000000,1);
  5. Query OK, 1 row affected, 1 warning (0.00 sec)
  6.  
  7. mysql> INSERT INTO article_comment VALUES(4300000001,1);
  8. Query OK, 1 row affected, 1 warning (0.00 sec)
  9.  
  10. mysql> INSERT INTO article_comment VALUES(4300000002,1);
  11. Query OK, 1 row affected, 1 warning (0.00 sec)
  12.  
  13. mysql> SELECT * FROM article_comment;
  14. +------------+------------+
  15. | article_id | comment_id |
  16. +------------+------------+
  17. | 4294967295 |          1 |
  18. | 4294967295 |          1 |
  19. | 4294967295 |          1 |
  20. +------------+------------+
  21. 3 rows IN SET (0.00 sec)

For table "article" itself you would notice the issue quickly if you would forget to alter it as inserts will fail because of primary key collisions. But for linked tables like above inserts work you just get a warning:

SQL:
  1. mysql> SHOW warnings;
  2. +---------+------+--------------------------------------------------------------+
  3. | Level   | Code | Message                                                      |
  4. +---------+------+--------------------------------------------------------------+
  5. | Warning | 1264 | Out of range value adjusted FOR COLUMN 'article_id' at row 1 |
  6. +---------+------+--------------------------------------------------------------+
  7. 1 row IN SET (0.06 sec)

And in my experience very few people mind to check the warnings promptly.

So two things happen, one is nastier than the other. First the comments end up associated to the wrong article (4294967295) which can be quite a problem in particular if these are private comments. The other thing which tends to happen - you get A LOT of comments associated with this article and few systems are designed with YouTube capacity to handle millions of comments per item.

Note the question in this case is not like there is just one article which has insane number of comments but a as a all articles outside of allowed range have it:

SQL:
  1. mysql> EXPLAIN SELECT count(*) FROM article_comment WHERE article_id=4300000000;
  2. +----+-------------+-----------------+------+---------------+------------+---------+-------+------+--------------------------+
  3. | id | select_type | TABLE           | type | possible_keys | KEY        | key_len | ref   | rows | Extra                    |
  4. +----+-------------+-----------------+------+---------------+------------+---------+-------+------+--------------------------+
  5. 1 | SIMPLE      | article_comment | ref  | article_id    | article_id | 4       | const |    3 | USING WHERE; USING INDEX |
  6. +----+-------------+-----------------+------+---------------+------------+---------+-------+------+--------------------------+
  7. 1 row IN SET, 1 warning (0.00 sec)
  8.  
  9. mysql> SHOW warnings;
  10. +---------+------+--------------------------------------------------------------+
  11. | Level   | Code | Message                                                      |
  12. +---------+------+--------------------------------------------------------------+
  13. | Warning | 1264 | Out of range value adjusted FOR COLUMN 'article_id' at row 1 |
  14. +---------+------+--------------------------------------------------------------+
  15. 1 row IN SET (0.00 sec)

MySQL optimizer behaves rather strange in such case (at least in MySQL 5.0) - instead of instantly saying there is no such value (because value is out of range what could be stored in the column) it truncates the value, performs index lookup, scanning all rows (just 3 in my trivial example) and when filtering them out because value really does not match the where clause.

This makes such error even harder to catch - you will not see wrong comments for a lot articles, you will see 0 while queries will just run insanely slow.

So how to avoid this problem ?

It is actually quite easy - you just need to enable strict mode:

SQL:
  1. mysql> SET sql_mode=STRICT_ALL_TABLES;
  2. Query OK, 0 rows affected (0.00 sec)
  3.  
  4. mysql> INSERT INTO article_comment VALUES(4400000000,1);
  5. ERROR 1264 (22003): Out of range value adjusted FOR COLUMN 'article_id' at row 1

Note however strict mode should not be taken lightly - it is good for reliability but many applications will need to be fixed before they start working properly with this mode.


Entry posted by peter | One comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks


^==Back Home: www.chedong.com

^==Back Digest Home: www.chedong.com/digest/

<== 2009-02-06
  二月 2009  
            1
2 3 4 5 6 7 8
9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28  
==> 2009-02-08