22:34 Four ways to optimize paginated displays » MySQL Performance Blog

A paginated display is one of the top optimization scenarios we see in the real world. Search results pages, leaderboards, and most-popular lists are good examples. You know the design pattern: display 20 results in some most-relevant order. Show a "next" and "previous" link. And usually, show how many items are in the whole list and how many pages of results there are.

Rendering such a display can consume more resources than the entire rest of the site!

As an example, I'm looking at slow log analysis results (with our microslow patches, set to log all queries) for one client; the slow log contains 6300 seconds' worth of queries, and the two main queries for the paginated display consumed 2850 and 380 seconds, respectively.

Why is it so expensive? I typically see queries like this:

SQL:
  SELECT .... FROM ... ORDER BY .... LIMIT X, 20

If the ORDER BY can't use an index (commonly the case), it uses a filesort. Suppose a million rows match the WHERE conditions. Then a million rows are retrieved, stored, and filesorted, and most of them are discarded with only 20 returned. When the user clicks "next", the same work happens all over again, just to return a different 20 rows. And to show the list of pages and the total count, you either a) use SQL_CALC_FOUND_ROWS (see our post on this) or b) execute a separate SELECT to count the rows.
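The two-query pattern above can be sketched with Python's sqlite3 standing in for MySQL (SQLite has no SQL_CALC_FOUND_ROWS, so the separate-COUNT variant is shown; the table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, score REAL)")
conn.executemany("INSERT INTO items (score) VALUES (?)",
                 [(i * 0.1,) for i in range(1000)])

page, per_page = 5, 20
offset = page * per_page

# Query 1: the page itself. The server still sorts past the first
# `offset` rows before it can return the 20 we asked for.
rows = conn.execute(
    "SELECT id, score FROM items ORDER BY score DESC LIMIT ? OFFSET ?",
    (per_page, offset)).fetchall()

# Query 2: the total, needed only for the "page N of M" display.
(total,) = conn.execute("SELECT COUNT(*) FROM items").fetchone()
pages = -(-total // per_page)  # ceiling division

print(len(rows), total, pages)  # 20 1000 50
```

Every page view pays for both queries, and the OFFSET cost grows as the user pages deeper into the list.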

There are ways to optimize so you don't have to do quite so much offsetting and limiting. I wrote about this on O'Reilly's website in an article on optimizing ranked data. But frankly it's not that easy to do in the real world; you can usually optimize for one access method to the data at some significant cost in complexity and maintenance (which might be worth it) but not for many different ways of accessing the same data, which is more typical in websites we work on.

Beyond indexing, re-organizing data, or query optimizations, there are two big things you can do. One is caching aggressively to prevent these queries from running. The other is to rethink the paradigm. Just because everyone lays out such pages in the same way doesn't mean you need to. Think about how you use such pages. Do you really go clicking directly to the Nth page of results, or the last page? "Hmm, it found 13928 results. Let me look at the least relevant search results for my query." Generally not -- you usually look at the most helpful stuff, which is supposed to be first in the list.

With that in mind, here are four suggestions for optimizing paginated displays that can give significantly better performance.

  1. On the first query, fetch and cache all the results. Now it's easy to know how many results there are, and fetching subsequent pages is no extra work for the database. In this model, you get to keep your "found X rows, showing page N of M" display that many people cherish.
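A minimal sketch of the first suggestion, using an in-process dict as the cache (memcached or similar would play this role in production; the query and cache key are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, score REAL)")
conn.executemany("INSERT INTO items (score) VALUES (?)",
                 [(i * 0.1,) for i in range(1000)])

cache = {}  # stand-in for memcached: cache key -> full result list

def get_page(query_key, page, per_page=20):
    if query_key not in cache:
        # First request only: run the expensive query once, cache all ids.
        cache[query_key] = conn.execute(
            "SELECT id FROM items ORDER BY score DESC").fetchall()
    results = cache[query_key]
    total = len(results)
    start = page * per_page
    return results[start:start + per_page], total

page1, total = get_page("top-items", 0)
page2, _ = get_page("top-items", 1)  # served from cache, no database work
print(len(page1), len(page2), total)  # 20 20 1000
```

The total count falls out for free as the length of the cached list, so the "found X rows, showing page N of M" display costs nothing extra.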
  2. Don't show all results. Not even Google lets you see the millionth result. You get to see N results and after that, you're done. Limit the results to 100 or 500 or something. Remember, the further you go into the list, the more rows you are scanning and discarding with that LIMIT. This technique works great in conjunction with the first one. If you want to show 500 results, maybe you can fetch 501 and if the 501st row exists, display "more than 500 results found."
  3. Don't show the total count or the intermediate links to other pages. Show only the "next" link. (If people want to see the "previous" results, they can use their browser's back button.) You can do this by fetching one more result than you want to display -- for example, fetch 21 rows and display only 20. If there's a 21st row, render the "next" link; if not, you're at the end of the list. This way you don't have to calculate how many results there are, and if caching is difficult then this is a simple way to avoid some of the costs.
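The fetch-one-extra trick from suggestion 3 in miniature (again with SQLite standing in for MySQL; only the LIMIT arithmetic matters):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)",
                 [(i,) for i in range(45)])  # 45 rows -> 3 pages of 20

def fetch_page(page, per_page=20):
    # Ask for one extra row; its presence tells us a next page exists.
    rows = conn.execute(
        "SELECT id FROM items ORDER BY id LIMIT ? OFFSET ?",
        (per_page + 1, page * per_page)).fetchall()
    has_next = len(rows) > per_page
    return rows[:per_page], has_next

_, more0 = fetch_page(0)  # 21 rows come back -> render the "next" link
_, more2 = fetch_page(2)  # only 5 rows come back -> last page
print(more0, more2)  # True False
```

No COUNT(*) and no SQL_CALC_FOUND_ROWS: one query per page decides both the page contents and whether to draw the "next" link.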
  4. Estimate how many results there are. Again, Google does this and nobody complains. Use EXPLAIN and look at the "rows" column -- that's a fine estimate for some scenarios. (This tip doesn't work in as many scenarios as others, but it's still acceptable in many.)

These suggestions can take a lot of work off the database server without impacting the user's experience at all.


Entry posted by Baron Schwartz

10:06 AdSense Small and Medium Partner Meetup a Complete Success » Inside AdSense-中文


On September 19, the AdSense team held a meetup for small and medium partners at Google's offices, inviting more than thirty Beijing-area publishers to attend.

The meetup was intended to strengthen communication and exchange with small and medium partners, and to serve them better.

At the event we launched the "AdSense Growth Program", dedicated to small and medium partners. The program comprises the "Newcomer Boot Camp", the "Pearl Program", and the "VIP Club", and aims to help newly joined publishers and small and medium publishers develop and grow quickly. We will introduce the "AdSense Growth Program" in detail in a later post.

The same day also marked the third anniversary of AdSense in Chinese, and all attending publishers celebrated this special day together with the entire AdSense team.

Publishers praised and endorsed the meetup and hoped we would hold more communication events like it. We will keep working to create more opportunities to talk with publishers face to face.


Photo: A warm-up game to help everyone get acquainted

Photo: We won!

Photo: The talks begin; everyone listens attentively

Photo: Interactive prize Q&A -- guess who was the first member of the AdSense Chinese team

Photo: 周文彪, AdSense Director for Asia-Pacific, and 王莹, AdSense Manager for Greater China, cut a birthday cake together with a publisher who has the same birthday

Photo: Desserts carefully prepared by the Google chefs

Photo: Relaxed conversation

03:32 How to Scale with Ruby on Rails » High Scalability - Building bigger, faster, more reliable websites.

By George Palmer of 3dogsbark.com. Covers:
* How you start out: shared hosting, with the web server and DB on the same machine. Move to two machines. Minimal code changes.
* Scaling the database. Add read slaves on their own machines. Then master-master setup. Still minimal code changes.
* Scaling the web server. Load balance against multiple application servers. Application servers scale but the database doesn't.
* User clusters. Partition and allocate users to their own dedicated cluster. Requires substantial code changes.
* Caching. A large percentage of hits are read only. Use reverse proxy, memcached, and language specific cache.
* Elastic architectures. Based on Amazon EC2. Start and stop instances on demand. For global applications keep a cache on each continent, assign users to clusters by location, maintain app servers on each continent, use transaction replication software if you must replicate your site globally.
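The user-cluster step above hinges on a deterministic user-to-cluster mapping. A minimal sketch (the cluster names and the simple modulo scheme are illustrative; production setups often prefer consistent hashing so that adding a cluster doesn't remap every existing user):

```python
import hashlib

CLUSTERS = ["db-cluster-a", "db-cluster-b", "db-cluster-c"]  # hypothetical names

def cluster_for(user_id: str) -> str:
    # Stable hash, so the same user always lands on the same cluster,
    # on every app server, with no lookup table to maintain.
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return CLUSTERS[int(digest, 16) % len(CLUSTERS)]

print(cluster_for("alice") == cluster_for("alice"))  # True
```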

02:22 The 7 Stages of Scaling Web Apps » High Scalability - Building bigger, faster, more reliable websites.

By John Engates, CTO of Rackspace. A good presentation of the stages a typical successful website goes through:
* Stage 1 - The Beginning: simple architecture, low complexity, no redundancy. Firewall, load balancer, a pair of web servers, a database server, and internal storage.
* Stage 2 - More of the same, just bigger.
* Stage 3 - The Pain Begins: publicity hits. Use reverse proxy, cache static content, load balancers, more databases, re-coding.
* Stage 4 - The Pain Intensifies: caching with memcached, writes overload and replication takes too long, start database partitioning, shared storage makes sense for content, significant re-architecting for DB.
* Stage 5 - This Really Hurts!: rethink the entire application; partition on geography, user ID, etc.; create user clusters; use a hashing scheme to locate which cluster a user belongs to.
* Stage 6 - Getting a little less painful: scalable application and database architecture, acceptable performance, starting to add new features again, optimizing some code, still growing but manageable.
* Stage 7 - Entering the unknown: where are the remaining bottlenecks (power, space, bandwidth, CDN, firewall, load balancer, storage, people, process, database)? All eggs in one basket (single datacenter, single instance of the data).

01:17 Useful Cloud Computing Blogs » High Scalability - Building bigger, faster, more reliable websites.

Update: 9 Sources of Cloud Computing News You May Not Know About by James Urquhart. I folded in these recommendations.

Can't get enough cloud computing? Then you must really be a glutton for punishment! But just in case, here are some cloud computing resources, collected from various sources, that will help you transform into a Tesla silently flying solo down the diamond lane.

Meta Sources

  • Cloud Computing Email List: An often lively email list discussing cloud computing.
  • Cloud Computing Blogs & Resources. An excellent and big list of cloud resources.
  • Cloud Computing Portal: A community edited database for making the vendor selection process easier.
  • List of Cloud Platforms, Providers, and Enablers.
  • datacenterknowledge.com's Recap: More than 70 Industry Blogs: A nice set of blogs covering Data Center, Web Hosting, Content Delivery Network (CDN), and Cloud Computing.
  • Cloud Computing Wiki: A cloud computing wiki started by participants of the cloud email list.

Specific Blogs

  • James Urquhart's The Wisdom of Clouds: Cloud Computing and Utility Computing for the Enterprise and the Individual. James writes great articles and has a regular can't-miss links-style post summarizing much of what you need to know in the cloud world.

Many more below the fold.


