14:21 Computing 95 percentile in MySQL » MySQL Performance Blog

When doing performance analyzes you often would want to see 95 percentile, 99 percentile and similar values. The "average" is the evil of performance optimization and often as helpful as "average patient temperature in the hospital".

Lets set you have 10000 page views or queries and have average response time of 1 second. What does it mean ? Really nothing - may be one page view was 10000 seconds and the rest was in low milliseconds or may be you had every single page view taking 1 second, which are completely different.

You also do not really care about average performance - the goal of good user experience is majority of users to have good experience and average is not a good fit here. Defining your response time goal in 95 or 99 percentile is much better. Say you say 99 percentile response time should be one second, this means only 1 percent of queries/page views are allowed to take more than that. For larger systems defining (increasing) response times for 99.9 or even 99.99 percentile numbers often make sense.

It also often makes sense to define response time goals separately for different transactions - the AJAX widget response time requirements may be very different from the slow search page.

So you have defined your response time in terms of 95/99 percentile and get your logs in the table, so how to get the data if MySQL only provides you the avg:

SQL:
  1. mysql> SELECT count(*),avg(wtime) FROM performance_log_081128 WHERE page_type='search';
  2. +----------+-----------------+
  3. | count(*) | avg(wtime)      |
  4. +----------+-----------------+
  5. |   106859 | 1.4469140766532 |
  6. +----------+-----------------+
  7. 1 row IN SET (2.08 sec)

The average response time here is for example; the real data what we need is number of rows which matches for given query type.

Dividing the count by 100 we get our 1% of values and dividing by 20 5% of values, now we can get the response time we concerned about simply by running following order-by queries:

SQL:
  1. mysql> SELECT wtime FROM performance_log_081128 WHERE page_type='search' ORDER BY wtime DESC LIMIT 1068,1;
  2. +---------+
  3. | wtime   |
  4. +---------+
  5. | 10.1007 |
  6. +---------+
  7. 1 row IN SET (2.06 sec)
  8.  
  9. mysql> SELECT wtime FROM performance_log_081128 WHERE page_type='search' ORDER BY wtime DESC LIMIT 5342,1;
  10. +---------+
  11. | wtime   |
  12. +---------+
  13. | 5.09297 |
  14. +---------+
  15. 1 row IN SET (2.06 sec)

So for this system the 95 percentile is just over 5 sec (some 3 times more than the average) and 99% percentile is just a bit over 10 seconds (6 times more than average). The both numbers are horrible and system surely needs to be fixed.

These numbers are to illustrate - the percentile numbers can be quite different from average numbers (it is not rare to see 99 percentile to be order of magnitude different from the average) and this is what you really need to focus on.

Looking at the numbers from the business standpoint try to understand what these really are. In some cases I see rather bad percentile on the backend which are not really the problem for the business because there is a cache up front anyway. If 99% of requests are coming from the cache and you observe certain 99 percentile response time on the backend it is often 99.99 percentile response time which is a lot different - you often can afford 1/10000 requests to stall for few seconds, because things outside of your control (like packet loss at client side) would be responsible for larger amount of delays.

Be careful though - the "random" delays, for example if system was busy and delayed servicing request is one thing, "systematic" delays, when response time is always bad in given conditions can be much worse problems. You do not want your best client to suffer for example, even if he is the only one.


Entry posted by peter | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks

06:24 MySQL for Hosting Providers - how do they manage ? » MySQL Performance Blog

Working with number of hosting providers I always wonder how do they manage to keep things up given MySQL gives you so little ways to really restrict how much resources single user can consume. I have written over a year ago about 10+ ways to crash or overload MySQL and since that people have come to me and suggested more ways to do the same.

This is huge hole in MySQL design, thinking little about users isolations and resource quotas and interesting enough I have not seen significant changes in fresh our MySQL 5.1 GA or even something major on the roadmap for future MySQL versions. May be Drizzle will give it a thought ? This surely would help adoption by (especially low end) Hosting Providers and remember this exactly where a lot of kids start to develop their first sites and play with web technologies.

So how do the hosting providers manage to host hundreds of users on single server with single MySQL server ? Well people just seems to be nice and not looking to crash MySQL/DOS server on purpose but rather cause most of the issues unintentionally by running bad queries or installing bad software.

It is good people are nice but it may not be comforting to know you stay up just because nobody wants bring you down rather than because your systems are solid and designed to prevent such abuse.

The systems which I see people implementing are typically focused on the load eliminating - using Google UserStats patches (included in Percona releases) or Log analyzes as well as PROCESSLIST monitoring. This allows you to crack down on users which cause a lot of load which causes a lot of unintentional abusers, but to get MySQL in trouble you do not need a lot of load. You can do this by very light queries which would not show up in the PROCESSLIST or will not take too much combined time in the logs to attract your attention;

For example:

SQL:
  1. SET @a1:=repeat("a",1000000); SELECT sleep(1);
  2. SET @a2:=repeat("a",1000000); SELECT sleep(1);
  3. SET @a3:=repeat("a",1000000); SELECT sleep(1);
  4. SET @a4:=repeat("a",1000000); SELECT sleep(1);
  5. SET @a5:=repeat("a",1000000); SELECT sleep(1);
  6. SET @a6:=repeat("a",1000000); SELECT sleep(1);
  7. SET @a7:=repeat("a",1000000); SELECT sleep(1);
  8. SET @a8:=repeat("a",1000000); SELECT sleep(1);
  9. SET @a9:=repeat("a",1000000); SELECT sleep(1);
  10. SET @a10:=repeat("a",1000000); SELECT sleep(1);
  11. SET @a11:=repeat("a",1000000); SELECT sleep(1);
  12. SET @a12:=repeat("a",1000000); SELECT sleep(1);
  13. SET @a13:=repeat("a",1000000); SELECT sleep(1);
  14. ...

Causes MySQL to "leak" 1MB of memory per second with no apparent good reason - no slow queries or queries in the PROCESS LIST and you can't really track how much memory was allocated for given session (or restrict this number)

Now remove sleep(1) and you will get mysqld quickly running out of memory and being killed by OOM killer or being unusable for any queries. In my test I got OOM though it kept the box stalled for few minutes before that:

Out of memory: Killed process 1081 (mysqld).
automount invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0

Out of memory: Killed process 1081 (mysqld).
automount invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0

Call Trace:
[] out_of_memory+0x8e/0x2f5
[] __alloc_pages+0x245/0x2ce
[] __do_page_cache_readahead+0x95/0x1d9
[] :dm_mod:dm_any_congested+0x38/0x3f
[] filemap_nopage+0x148/0x322
[] __handle_mm_fault+0x1f8/0xe23
[] do_page_fault+0x4cb/0x830
[] error_exit+0x0/0x84

Do I need any particular privileges for this to happen ? Not really - you do not even need to be able to select from the table. Any user with permission to connect to MySQL Server can crash it.


Entry posted by peter | 5 comments

Add to: delicious | digg | reddit | netscape | Google Bookmarks


^==Back Home: www.chedong.com

^==Back Digest Home: www.chedong.com/digest/

<== 2008-11-28
  十一月 2008  
          1 2
3 4 5 6 7 8 9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30
==> 2008-11-30