The web is going the way of utf8. Drizzle has chosen it as the default character set, most website back-ends use it to store text data, and those who are still using latin1 have begun to migrate their databases to utf8. Googling for "mysql convert charset to utf8" turns up a plethora of sites, each with a slightly different approach, and each broken in some respect. I'll outline those approaches here, show why they don't work, and then present a script that can be used generically to convert a database (or set of tables) to a target character set and collation.
Approach #1:
Take the following table as an example of why this approach will not work:
Notice the implicit conversion of c1 from text to mediumtext. This approach can result in modified data types and silent data truncation, which makes it unacceptable for our purposes.
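To see where the promotion comes from, here is a rough Python model of the sizing rule, assuming Approach #1 is the `CONVERT TO CHARACTER SET` form of `ALTER TABLE`. MySQL documents that this form widens a TEXT column when the target character set may need more bytes per character. The function name `promoted_text_type` and the two-entry character set table are my own illustration, not part of any tool.

```python
# Maximum byte capacity of each MySQL TEXT type, smallest first.
TEXT_TYPES = [
    ("TINYTEXT", 2**8 - 1),
    ("TEXT", 2**16 - 1),
    ("MEDIUMTEXT", 2**24 - 1),
    ("LONGTEXT", 2**32 - 1),
]

# Worst-case bytes per character; only the two character sets discussed here.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3}


def promoted_text_type(col_type, source_charset, target_charset):
    """Return the smallest TEXT type able to hold, in target_charset, as many
    characters as col_type can hold in source_charset."""
    capacity = dict(TEXT_TYPES)[col_type.upper()]
    max_chars = capacity // MAX_BYTES_PER_CHAR[source_charset]
    needed_bytes = max_chars * MAX_BYTES_PER_CHAR[target_charset]
    for name, size in TEXT_TYPES:
        if size >= needed_bytes:
            return name
    return "LONGTEXT"  # nothing larger exists


print(promoted_text_type("text", "latin1", "utf8"))  # MEDIUMTEXT
```

A latin1 TEXT column can hold 65,535 characters; in utf8 those may need up to 3 x 65,535 = 196,605 bytes, which no longer fits in TEXT, hence MEDIUMTEXT.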
Approach #2 (outlined here):
This approach avoids the issue of implicit conversions by changing each data type to its binary counterpart before conversion. Due to implementation limitations, however, it also converts any pre-existing binary columns to their text counterparts. Additionally, this approach will fail because a binary column cannot be part of a FULLTEXT index. Even if these limitations are overcome, this process is inherently unsuitable for large databases because it requires multiple ALTER statements to be run on each table:
1) Drop FULLTEXT indexes
2) Convert target columns to their binary counterparts
3) Convert the table to the target character set
4) Convert target columns to their original data types
5) Add FULLTEXT indexes back
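The five steps above can be sketched as a statement generator. This is only an illustration of how many ALTERs pile up per table; the helper names, the table `t1`, the column `c1`, and the index `ft_c1` are all hypothetical, and the exact statements used by the approach being described may differ.

```python
# Text types and their binary counterparts in MySQL.
TEXT_TO_BINARY = {
    "CHAR": "BINARY",
    "VARCHAR": "VARBINARY",
    "TINYTEXT": "TINYBLOB",
    "TEXT": "BLOB",
    "MEDIUMTEXT": "MEDIUMBLOB",
    "LONGTEXT": "LONGBLOB",
}


def binary_counterpart(col_type):
    """Map e.g. 'VARCHAR(255)' to 'VARBINARY(255)', keeping any length."""
    base, paren, rest = col_type.partition("(")
    return TEXT_TO_BINARY[base.strip().upper()] + paren + rest


def binary_roundtrip_statements(table, columns, fulltext_indexes, charset="utf8"):
    """columns: {name: type}; fulltext_indexes: {index_name: [column, ...]}."""
    stmts = []
    # 1) Drop FULLTEXT indexes (binary columns cannot be part of one).
    for index in fulltext_indexes:
        stmts.append(f"ALTER TABLE `{table}` DROP INDEX `{index}`")
    # 2) Convert target columns to their binary counterparts.
    to_binary = ", ".join(
        f"MODIFY `{name}` {binary_counterpart(ctype)}" for name, ctype in columns.items()
    )
    stmts.append(f"ALTER TABLE `{table}` {to_binary}")
    # 3) Convert the table to the target character set.
    stmts.append(f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset}")
    # 4) Convert target columns back to their original data types.
    to_text = ", ".join(
        f"MODIFY `{name}` {ctype} CHARACTER SET {charset}" for name, ctype in columns.items()
    )
    stmts.append(f"ALTER TABLE `{table}` {to_text}")
    # 5) Add FULLTEXT indexes back.
    for index, cols in fulltext_indexes.items():
        col_list = ", ".join(f"`{c}`" for c in cols)
        stmts.append(f"ALTER TABLE `{table}` ADD FULLTEXT INDEX `{index}` ({col_list})")
    return stmts


for stmt in binary_roundtrip_statements("t1", {"c1": "TEXT"}, {"ft_c1": ["c1"]}):
    print(stmt + ";")
```

Even for this one-column table, the sketch emits five separate ALTER statements, each of which may have to rebuild the table.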
For those of us routinely waiting hours, if not days, for a single ALTER statement to finish, this is unacceptable.
Approach #3:
Dumping the entire database and re-importing it with the appropriate server & client character sets.
This is a three-step process: first dump only the schema and edit it by hand to have the appropriate character sets, then dump the data separately, and finally re-create the schema and import the data. If you're using replication, this usually isn't even an option, because you'll generate a ridiculous amount of binary logs and force a reload of data on every server in the replication chain (very time/bandwidth/disk space consuming).
Except for Approach #1, these approaches are much more difficult than they need to be. Consider the following ALTER statement against the table in Approach #1:
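As a minimal sketch of what a statement of this shape could look like, the Python helper below builds one ALTER per table that sets the table default and re-declares each text column with an explicit character set and collation. The helper name `single_alter`, the table `t1`, the column `c1 TEXT`, and the `utf8_general_ci` collation are assumptions for illustration and are not taken from the script mentioned in this post.

```python
def single_alter(table, text_columns, charset="utf8", collation="utf8_general_ci"):
    """Build one ALTER TABLE statement for a table.

    text_columns: list of (name, data_type, attributes) tuples, where
    attributes is the rest of the column definition, e.g. "NOT NULL".
    """
    target = f"CHARACTER SET {charset} COLLATE {collation}"
    # Change the table default so new columns pick up the target charset.
    clauses = [f"DEFAULT {target}"]
    for name, data_type, attributes in text_columns:
        # The charset clause belongs directly after the data type;
        # remaining attributes follow it.
        parts = [f"MODIFY `{name}`", data_type, target, attributes]
        clauses.append(" ".join(part for part in parts if part))
    return f"ALTER TABLE `{table}` " + ", ".join(clauses)


print(single_alter("t1", [("c1", "TEXT", "")]) + ";")
# ALTER TABLE `t1` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci, MODIFY `c1` TEXT CHARACTER SET utf8 COLLATE utf8_general_ci;
```

One caveat with any MODIFY-based statement: MODIFY replaces the whole column definition, so attributes such as NOT NULL or DEFAULT have to be restated or they are lost.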
This approach changes the default character set for both the table and the target column, while leaving any FULLTEXT indexes in place. It also requires only a single ALTER statement for a given table. A Perl script has been put together to parallelize the ALTER statements and is available at:
It will be added to Percona Tools on Launchpad (or perhaps maatkit, if it proves useful enough) once it is feature complete. Outstanding issues include:
- Proper handling of string foreign keys (currently fails, but you probably shouldn't be using strings as foreign keys anyway ...)
- Allow throttling of the number of threads created (currently creates one per table)
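For the throttling item, one possible shape is a bounded worker pool instead of one thread per table. The sketch below is in Python rather than Perl and is not the script's code; `run_throttled` and the `execute` callback (whatever actually sends a statement to the server) are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_throttled(statements, execute, max_workers=4):
    """Call execute(statement) for every statement, with at most max_workers
    running at once. Results come back in the same order as the input."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(execute, statements))


if __name__ == "__main__":
    # Stand-in for a real database call: just report what would run.
    def dry_run(statement):
        return f"would run: {statement}"

    tables = ["t1", "t2", "t3"]
    alters = [f"ALTER TABLE `{t}` DEFAULT CHARACTER SET utf8" for t in tables]
    for line in run_throttled(alters, dry_run, max_workers=2):
        print(line)
```

With `max_workers` set to the number of tables this behaves like the current one-thread-per-table design; lowering it caps how many ALTERs hit the server concurrently.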
Entry posted by Ryan Lowe | No comment
I am excited about the upcoming release of two books on Web 2.0 and Cloud Application Architectures by O'Reilly.
Web 2.0 Architectures (estimated release in May 2009)
What entrepreneurs and information architects need to know
Using several high-profile Web 2.0 companies as examples, authors Duane Nickull, Dion Hinchcliffe, and James Governor have distilled the core patterns of Web 2.0 coupled with an abstract model and reference architecture. The result is a base of knowledge that developers, business people, futurists, and entrepreneurs can understand and use as a source of ideas and inspiration. Featured architectures include Google, Flickr, BitTorrent, MySpace, Facebook, and Wikipedia.
Cloud Application Architectures (estimated release in April 2009)
Building Applications and Infrastructure in the Cloud
This book by George Reese offers tested techniques for creating web applications on cloud computing infrastructures and for migrating existing systems to these environments. Specifically, you'll learn about the programming and system administration necessary for supporting transactional web applications in the cloud -- mission-critical activities that include orders and payments to support customers.
The second book is available online at O'Reilly as a Rough Cuts version, so you may already have had a chance to check it out. If so, do you like it?
Shared by 车东
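Finish reading one book every two weeks
Joe Marasco is a retired business unit manager and senior vice president of Rational Software. He distilled the essence of his many years of software development and management experience into the book The Software Development Edge: Essays on Managing Successful Projects, published in Chinese as 《软件开发的边界——管理成功的项目》 (personally, I don't think that title is well translated).
This is truly one of the finest recent books on software development projects. In "Chapter 4: Management", Joe offers ten pieces of advice on managing software project teams.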