A paginated display is one of the top optimization scenarios we see in the real world. Search results pages, leaderboards, and most-popular lists are good examples. You know the design pattern: display 20 results in some most-relevant order. Show a "next" and "previous" link. And usually, show how many items are in the whole list and how many pages of results there are.
Rendering such a display can consume more resources than the entire rest of the site!
As an example, I'm looking at slow log analysis results (with our microslow patches, set to log all queries) for one client; the slow log contains 6300 seconds' worth of queries, and the two main queries for the paginated display consumed 2850 and 380 seconds, respectively.
Why is it so expensive? I typically see queries like this:
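The client's actual query isn't reproduced here, but the shape is always the same: a sorted SELECT with a large OFFSET. The table and column names below are invented for illustration.

```sql
-- Hypothetical example of the pattern (schema names are made up):
SELECT *
FROM message
WHERE category = 'general'
ORDER BY created_at DESC
LIMIT 20 OFFSET 980;  -- page 50: sort the whole result, discard 980 rows
```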
If the ORDER BY can't use an index (commonly the case), it uses a filesort. Suppose a million rows match the WHERE conditions. That means a million rows are read, stored, and filesorted, and then most of them are discarded: only 20 are actually sent to the user. If the user clicks the "next" link, the whole process repeats, only this time a different 20 rows survive. And to show the list of pages and the total count, you either a) use SQL_CALC_FOUND_ROWS (see our post on this) or b) execute a separate SELECT to count the rows.
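To make that concrete, here is what the two counting approaches look like against the same hypothetical table:

```sql
-- a) piggyback the count on the main query:
SELECT SQL_CALC_FOUND_ROWS *
FROM message
WHERE category = 'general'
ORDER BY created_at DESC
LIMIT 20 OFFSET 980;
SELECT FOUND_ROWS();  -- total rows matched before the LIMIT was applied

-- b) or run a separate count:
SELECT COUNT(*) FROM message WHERE category = 'general';
```

Either way, the server does far more work than the 20 displayed rows justify.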
There are ways to optimize so you don't have to do quite so much offsetting and limiting. I wrote about this on O'Reilly's website in an article on optimizing ranked data. But frankly it's not that easy to do in the real world; you can usually optimize for one access method to the data at some significant cost in complexity and maintenance (which might be worth it) but not for many different ways of accessing the same data, which is more typical in websites we work on.
Beyond indexing, re-organizing data, or query optimizations, there are two big things you can do. One is caching aggressively to prevent these queries from running. The other is to rethink the paradigm. Just because everyone lays out such pages in the same way doesn't mean you need to. Think about how you use such pages. Do you really go clicking directly to the Nth page of results, or the last page? "Hmm, it found 13928 results. Let me look at the least relevant search results for my query." Generally not -- you usually look at the most helpful stuff, which is supposed to be first in the list.
With that in mind, here are four suggestions for optimizing paginated displays that can give significantly better performance.
These suggestions can take a lot of work off the database server without impacting the user's experience at all.
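As one example of a technique in this spirit (a sketch of my own for illustration, not necessarily one of the suggestions above): fetch one row more than the page size, so you can render the "next" link without counting anything.

```sql
-- Fetch 21 rows for a 20-row page; the 21st row is never displayed,
-- it only tells us whether a next page exists.
SELECT *
FROM message
WHERE category = 'general'
ORDER BY created_at DESC
LIMIT 21;
```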
Entry posted by Baron Schwartz
By George Palmer of 3dogsbark.com. Covers:
* How you start out: shared hosting, web server and DB on the same machine. Move to two machines. Minimal code changes.
* Scaling the database. Add read slaves on their own machines (see the replication sketch after this list). Then move to a master-master setup. Still minimal code changes.
* Scaling the web server. Load balance across multiple application servers. Application servers scale, but the database doesn't.
* User clusters. Partition and allocate users to their own dedicated cluster. Requires substantial code changes.
* Caching. A large percentage of hits are read-only. Use a reverse proxy, memcached, and language-specific caches.
* Elastic architectures. Based on Amazon EC2. Start and stop instances on demand. For global applications, keep a cache on each continent, assign users to clusters by location, maintain app servers on each continent, and use transaction replication software if you must replicate your site globally.
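For reference, adding a read slave in MySQL is mostly a matter of pointing it at the master. A minimal sketch, with placeholder hostname, credentials, and binlog coordinates:

```sql
CHANGE MASTER TO
  MASTER_HOST = 'db-master.example.com',  -- placeholder hostname
  MASTER_USER = 'repl',                   -- placeholder credentials
  MASTER_PASSWORD = 'secret',
  MASTER_LOG_FILE = 'mysql-bin.000042',   -- placeholder binlog coordinates
  MASTER_LOG_POS  = 4;
START SLAVE;
```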
By John Engates, CTO of Rackspace. A good presentation of the stages a typical successful website goes through:
* Stage 1 - The Beginning: Simple architecture, low complexity, no redundancy. Firewall, load balancer, a pair of web servers, a database server, and internal storage.
* Stage 2 - More of the same, just bigger.
* Stage 3 - The Pain Begins: publicity hits. Use reverse proxy, cache static content, load balancers, more databases, re-coding.
* Stage 4 - The Pain Intensifies: caching with memcached, writes overload and replication takes too long, start database partitioning, shared storage makes sense for content, significant re-architecting for DB.
* Stage 5 - This Really Hurts!: rethink the entire application, partition on geography, user ID, etc., create user clusters, and use a hashing scheme to locate which cluster a user belongs to (see the sketch after this list).
* Stage 6 - Getting a little less painful: scalable application and database architecture, acceptable performance, starting to add new features again, optimizing some code, still growing but manageable.
* Stage 7 - Entering the unknown: where are the remaining bottlenecks (power, space, bandwidth, CDN, firewall, load balancer, storage, people, process, database)? All eggs in one basket (single datacenter, single instance of data).
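A minimal sketch of the kind of hashing scheme stage 5 describes, in MySQL; the cluster count of 8 and the CRC32 hash are my assumptions, not details from the talk:

```sql
-- Deterministic user -> cluster mapping (8 clusters and CRC32 are assumed):
SELECT CRC32(CONCAT('user:', 1234567)) % 8 AS cluster_id;

-- The common alternative is a directory table, which allows rebalancing:
CREATE TABLE user_cluster (
  user_id    BIGINT UNSIGNED NOT NULL PRIMARY KEY,
  cluster_id TINYINT UNSIGNED NOT NULL
);
```

The hash approach needs no lookup infrastructure; the directory table costs a lookup per request but lets you move individual users between clusters.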
Update: 9 Sources of Cloud Computing News You May Not Know About by James Urquhart. I folded in these recommendations.
Can't get enough cloud computing? Then you must really be a glutton for punishment! But just in case, here are some cloud computing resources, collected from various sources, that will help you transform into a Tesla silently flying solo down the diamond lane.
Many more below the fold.