We recently examined a customer’s system to try to speed up an ETL (Extraction, Transformation and Loading) process that loads a large data set into a data mart or data warehouse. What we typically do is ask the customer to run the process in question and then examine what’s happening. In this case, the (very large, powerful) database server was almost completely idle, with virtually no I/O activity or CPU usage. So we looked at the server where the ETL process was running. It was at 25% CPU usage and writing some files to disk, but not waiting on I/O.
What’s going on here? Where’s the bottleneck? The process is slow, and neither machine is really doing much work. Why?
Maybe you guessed the network. Nope, not the network either. There was plenty of spare network capacity.
If I told you the ETL machine was using exactly 25% of its CPU capacity, would you guess that it had 4 CPU cores and one of them was running at 100% usage? This is what was happening. The ETL app was single-threaded and CPU-bound.
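This is exactly the trap an aggregate CPU figure sets: one saturated core on a four-core box averages out to 25%. As a quick illustration (not the tooling we used on site), here is a minimal Python sketch, assuming the psutil package is installed, that contrasts the aggregate number with the per-core numbers:

```python
import psutil

# Aggregate usage over a 1-second sample: one busy core out of four reads as ~25%.
print("aggregate:", psutil.cpu_percent(interval=1), "%")

# Per-core usage over the same kind of sample makes a saturated core obvious.
for core, pct in enumerate(psutil.cpu_percent(interval=1, percpu=True)):
    print(f"core {core}: {pct}%")
```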
Of course, we measured the entire process, so we could say authoritatively what was going on. But this problem actually took only a few minutes to diagnose. The point here is to look beyond the database server for what seems to be a database problem. This is why we call ourselves “full-stack performance tuning experts.” We try not to have tunnel vision. This reminds me of another problem I helped debug a few weeks ago: a really slow website that turned out to be caused by a curl call hidden in the code, which was timing out because of DNS issues.
One of the best things you can do to improve your performance is build profiling into your application, or run the application under a profiler (Google for “profiling <language>”) and find the places where it consumes the most time. It’s especially valuable to profile “external resource calls” such as calls to the database, web services, and so on. These approaches can make it much easier to find the slow parts.
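To make the idea concrete, here is a minimal sketch of lightweight call timing in Python; the block label and the simulated work are purely illustrative, and the same pattern applies to wrapping your real database or web-service calls:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock time per label: label -> [total_seconds, call_count]
timings = defaultdict(lambda: [0.0, 0])

@contextmanager
def timed(label):
    """Time the wrapped block and accumulate the result under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[label][0] += elapsed
        timings[label][1] += 1

# Wrap each external resource call (database query, web service, etc.):
with timed("orders_query"):
    time.sleep(0.05)  # stands in for the real database call

# Print the hot spots, slowest first.
for label, (total, count) in sorted(timings.items(), key=lambda kv: kv[1][0], reverse=True):
    print(f"{label}: {count} call(s), {total:.3f}s total")
```

Sorting the accumulated totals like this is usually enough to show where the time really goes.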
We devoted part of a chapter to profiling in our book. We explain a lot of useful techniques to help you build “light-weight” profiling into the application from the start (a very smart thing to do).
Sometimes our clients believe they already know the source of the problem, but they can’t prove it. They ask us to either prove them right or show them the real problem. The way we do this is to prefer measurements to guesses.
Of course, finding the problem is only part of the battle. Fixing it is another matter. But “how to fix every performance problem” doesn’t fit into a single blog post!
Entry posted by Baron Schwartz | One comment
Recently, we added information schema support to Google’s userstats patch.
The patch adds three information schema tables: user_statistics, table_statistics, and index_statistics. You can now run select * from information_schema.user_statistics in addition to the existing show user_statistics command.
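For example, here is a minimal Python sketch that reads the per-user statistics from the new table; it assumes the MySQLdb driver and a server built with the patch, and the connection details are placeholders:

```python
import MySQLdb

# Connection details are placeholders; point them at a server built with the patch.
conn = MySQLdb.connect(host="localhost", user="root", passwd="")
cur = conn.cursor()

# The new table can be queried like any other INFORMATION_SCHEMA table.
cur.execute("SELECT * FROM information_schema.user_statistics")
columns = [d[0] for d in cur.description]
for row in cur.fetchall():
    print(dict(zip(columns, row)))

cur.close()
conn.close()
```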
Links:
Patch for 5.0.62
Patch for 5.1.26
Entry posted by Evgeniy | 4 comments