15:55 Google Wave vs. Twitter? » Delicious/chedong
It looks like Gmail will get built-in Twitter-like functionality.
13:34 Analytics Blog Now in Spanish » Google Analytics Blog
We're very happy to announce the recent launch of Central de Conversiones, a new Spanish-language blog covering Google's measurement tools including Google Analytics, Website Optimizer, Insights for Search, AdPlanner and others.

Both Googlers and GAACs from Spanish-speaking countries such as Argentina, Mexico and Spain will be sharing basic tips and advanced web analytics techniques to help decision makers integrate data from these tools into their business strategies.

Central de Conversiones will translate important posts from this blog but will also create content and share case studies specific to the Spanish-speaking markets. We're sure that you will find it a very useful tool.

¡Esperamos lo disfruten! (We hope you enjoy it!)

12:15 Google Wave Architecture » High Scalability - Building bigger, faster, more reliable websites.

Update: Good Vibrations by Radovan Semančík. Lots of interesting questions about how Wave works, scalability, security, RESTyness, and so on.

Google Wave is a new communication and collaboration platform based on hosted XML documents (called waves) supporting concurrent modifications and low-latency updates. This platform enables people to communicate and work together in new, convenient and effective ways. We will offer these benefits to users of Google Wave and we also want to share them with everyone else by making waves an open platform that everybody can share. We welcome others to run wave servers and become wave providers, for themselves or as services for their users, and to "federate" waves, that is, to share waves with each other and with Google Wave. In this way users from different wave providers can communicate and collaborate using shared waves. We are introducing the Google Wave Federation Protocol for federating waves between wave providers on the Internet.

Here are the initial white papers that are available to complement the Google Wave Federation Protocol:

The Google Wave APIs are documented here.

10:08 HotPads Shows the True Cost of Hosting on Amazon » High Scalability - Building bigger, faster, more reliable websites.

Mather Corgan, president of HotPads, gave a great talk on how HotPads uses AWS to run their real estate search engine. I loved the presentation for a few reasons:

  • It gives real costs on their servers, how many servers they have, what they are used for, and exactly how they use S3, EBS, CloudFront and other AWS services. This is great information for anybody trying to architect a system and wondering where to run it.
  • HotPads is a "real" application. It's a small company and at 4.5 million page-views/month it's large but not super large. It has custom server side components like indexing engines, image processing, and background database update engines for syncing new real estate data. And it also stores a lot of images and has low latency requirements.

    This is a really good example of the mix where many companies are, or would like to be, with their applications.

    Their total costs are about $11K/month, which is about what they were paying at their previous provider. I found this a little surprising, as I thought the cloud would be more expensive, but they only pay for what they need instead of having to over-provision for transient uses such as testing. Some servers aren't necessary anymore: EBS handles backups, so database slave servers aren't needed.

    There are lots more lessons like this that I've abstracted down below.


    05:30 A rule of thumb for choosing column order in indexes » MySQL Performance Blog

    I wanted to share a little rule of thumb I sometimes use to decide which columns should come first in an index. This is not specific to MySQL; it applies generally to any database server with B-tree indexes. There are a bunch of subtleties, but I will ignore those for the sake of simplicity.

    Let's start with this query, which returns zero rows but does a full table scan. EXPLAIN says there are no possible_keys.

    SQL:
    SELECT * FROM tbl WHERE status='waiting' AND source='twitter'
      AND no_send_before <= '2009-05-28 03:17:50' AND tries <= 20
      ORDER BY date ASC LIMIT 1;

    Don't try to figure out the meaning of the query, because that'll add complexity to the example ;-) In the simplest case, we want to put the most selective column first in the index, so that the number of possible matching rows is the smallest, i.e. we find the rows as quickly as possible. Assuming that all the columns have an even distribution of values, we can just count the number of matching rows for each criterion.

    SQL:
    SELECT sum(status='waiting'), sum(source='twitter'),
      sum(no_send_before <= '2009-05-28 03:17:50'), sum(tries <= 20), count(*)
      FROM tbl\G
    *************************** 1. row ***************************
                           sum(status='waiting'): 550
                           sum(source='twitter'): 37271
    sum(no_send_before <= '2009-05-28 03:17:50'): 36975
                                sum(tries <= 20): 36569
                                        count(*): 37271

    This is pretty simple -- all I did was wrap each clause in a SUM() function, which in MySQL is equivalent to COUNT(number_of_times_this_is_true). It looks like the most selective criterion is "status=waiting". Let's put that column first in the index. Now, pull it out of the SELECT list and put it into the WHERE clause, and run the query again to get numbers within the subset of rows that match:

    SQL:
    SELECT sum(source='twitter'),
      sum(no_send_before <= '2009-05-28 03:17:50'), sum(tries <= 20), count(*)
      FROM tbl WHERE status='waiting'\G
    *************************** 1. row ***************************
                           sum(source='twitter'): 549
    sum(no_send_before <= '2009-05-28 03:17:50'): 255
                                sum(tries <= 20): 294
                                        count(*): 549

    So we're down to a reasonable number of rows (the count() is changing because I'm running this on live data, by the way). It looks like the 'source' is no more selective, that is, it won't filter out any more rows within this set. So adding it to the index would not be useful. We can filter this set further by either the 'no_send_before' or the 'tries' column. Doing so on either will reduce the count of matches for the other to zero:

    SQL:
    SELECT sum(source='twitter'),
      sum(no_send_before <= '2009-05-28 03:17:50'), sum(tries <= 20), count(*)
      FROM tbl WHERE status='waiting'
      AND no_send_before <= '2009-05-28 03:17:50'\G
    *************************** 1. row ***************************
                           sum(source='twitter'): 255
    sum(no_send_before <= '2009-05-28 03:17:50'): 255
                                sum(tries <= 20): 0
                                        count(*): 255

    SELECT sum(source='twitter'),
      sum(no_send_before <= '2009-05-28 03:17:50'), sum(tries <= 20), count(*)
      FROM tbl WHERE status='waiting' AND tries <= 20\G
    *************************** 1. row ***************************
                           sum(source='twitter'): 294
    sum(no_send_before <= '2009-05-28 03:17:50'): 0
                                sum(tries <= 20): 294
                                        count(*): 294

    That means we can add an index on either of (status,tries) or (status,no_send_before) and we will find the zero rows pretty efficiently. Which is better depends on what this table is really used for, which is a question I'm avoiding.
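The procedure above can also be scripted. What follows is only a rough sketch under stated assumptions, not the setup from the post: SQLite (through Python's built-in sqlite3 module) stands in for MySQL, the table contents are invented toy data, and the names PREDICATES, match_counts and idx_status_tries are hypothetical. It counts matches per criterion, narrows by the most selective one, repeats within that subset, and then checks that the resulting two-column index is picked up by the query planner.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows below are invented toy data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (status TEXT, source TEXT, no_send_before TEXT, tries INTEGER)"
)
rows = (
    [("sent", "twitter", "2009-05-01 00:00:00", 1)] * 90
    + [("waiting", "twitter", "2009-05-01 00:00:00", 25)] * 6  # due, but too many tries
    + [("waiting", "twitter", "2009-06-30 00:00:00", 3)] * 4   # few tries, not due yet
)
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", rows)

# The four criteria from the WHERE clause, keyed by the column they filter on.
PREDICATES = {
    "status": "status = 'waiting'",
    "source": "source = 'twitter'",
    "no_send_before": "no_send_before <= '2009-05-28 03:17:50'",
    "tries": "tries <= 20",
}


def match_counts(conn, predicates, where="1"):
    """Count rows matching each predicate, within the rows that satisfy `where`."""
    # A comparison evaluates to 0 or 1, so SUM() counts the rows where it is true.
    # COALESCE covers the empty-subset case, where SUM() would return NULL.
    sums = ", ".join(f"COALESCE(SUM({expr}), 0)" for expr in predicates.values())
    row = conn.execute(f"SELECT {sums}, COUNT(*) FROM tbl WHERE {where}").fetchone()
    return dict(zip(predicates, row[:-1])), row[-1]


# Step 1: over the whole table, the criterion with the fewest matches goes first.
counts, total = match_counts(conn, PREDICATES)
first = min(counts, key=counts.get)

# Step 2: repeat within the subset selected by the first column.
remaining = {col: expr for col, expr in PREDICATES.items() if col != first}
sub_counts, sub_total = match_counts(conn, remaining, where=PREDICATES[first])
second = min(sub_counts, key=sub_counts.get)

# Step 3: build the index in that column order and ask the planner about it.
conn.execute(f"CREATE INDEX idx_status_tries ON tbl ({first}, {second})")
query = "SELECT * FROM tbl WHERE " + " AND ".join(PREDICATES.values())
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
uses_index = any("idx_status_tries" in step[3] for step in plan)

print("column order:", first, second)
print("planner uses the index:", uses_index)
```

On this toy data the two remaining range columns are close in selectivity, just as in the real table, so the choice between them still comes down to how the table is actually used.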


    Entry posted by Baron Schwartz

    03:31 New Book: Even Faster Web Sites: Performance Best Practices for Web Developers » High Scalability - Building bigger, faster, more reliable websites.

    Performance is critical to the success of any web site, and yet today's web applications push browsers to their limits with increasing amounts of rich content and heavy use of Ajax. In his new book Even Faster Web Sites: Performance Best Practices for Web Developers, Steve Souders, web performance evangelist at Google and former Chief Performance Yahoo!, provides valuable techniques to help you optimize your site's performance.

    Souders' previous book, the bestselling High Performance Web Sites, shocked the web development world by revealing that 80% of the time it takes for a web page to load is on the client side. In Even Faster Web Sites, Souders and eight expert contributors provide best practices and pragmatic advice for improving your site's performance in three critical categories:

    Speed is essential for today's rich media web sites and Web 2.0 applications. With this book, you'll learn how to shave precious seconds off your sites' load times and make them respond even faster.

    About the Author

    Steve Souders works at Google on web performance and open source initiatives. His book High Performance Web Sites explains his best practices for performance along with the research and real-world results behind them. Steve is the creator of YSlow, the performance analysis extension to Firebug. He is also co-chair of Velocity 2008, the first web performance conference sponsored by O'Reilly. He frequently speaks at such conferences as OSCON, Rich Web Experience, Web 2.0 Expo, and The Ajax Experience.

    Steve previously worked at Yahoo! as the Chief Performance Yahoo!, where he blogged about web performance on Yahoo! Developer Network. He was named a Yahoo! Superstar. Steve worked on many of the platforms and products within the company, including running the development team for My Yahoo!.

