16:49 Scaling Django Web Apps by Mike Malone » High Scalability - Building bigger, faster, more reliable websites.

Film buffs will recognize Django as a classic 1966 spaghetti western that spawned hundreds of imitators. Web heads will certainly first think of Django as the classic Python-based web framework that has also spawned hundreds of imitators and has become the gold standard framework for the web.

Mike Malone, who worked on Pownce, a blogging tool now owned by Six Apart, explains in this very informative EuroDjangoCon presentation how Pownce scaled using Django in the real world.

I was surprised to learn how large Pownce was: hundreds of requests/sec, thousands of DB operations/sec, millions of user relationships, millions of notes, and terabytes of static data. Django gives you a lot of functionality out of the box to help you scale, but if you want to scale really large it turns out Django has some limitations; Mike tells you what these are and also provides some code to get around them.

Mike's talk, although Django specific, will really help anyone creating applications on the web. There's a lot of useful Django-specific advice and a lot of general good design ideas as well.

The topics covered in the talk are:

read more

13:27 What time 18446744073709550.000 means » MySQL Performance Blog

Sometimes when you do profiling you can see a number like this in timestamps. Periodically our patches produced Query_time: 18446744073709550.000 in the slow.log file (this was fixed recently, but the number still appears in other places).

I faced this problem several years ago, when the first 2-core AMD Opteron systems appeared, and I noticed sysbench going crazy, showing query execution times like 18446744073709550.000.

Obviously this is an unsigned integer that results from subtracting a bigger number from a smaller one.
But how can that be if we use:

/* clock_gettime() returns a status code; the timestamp itself is
   written into the struct timespec argument. */
struct timespec start_tp, end_tp;

clock_gettime(CLOCK_THREAD_CPUTIME_ID, &start_tp);

/* ... query execution ... */

clock_gettime(CLOCK_THREAD_CPUTIME_ID, &end_tp);

total_time = (end_tp.tv_sec - start_tp.tv_sec) * 1000000000LL
           + (end_tp.tv_nsec - start_tp.tv_nsec);

How can we get a query executed in negative time? It would be too good to be real :).
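The magic number itself gives the game away: 2^64 is 18446744073709551616, and the logged value is suspiciously close to that divided by 1000, which suggests a small negative difference wrapped around in 64-bit unsigned arithmetic before being formatted. Here is a minimal sketch (mine, not MySQL's actual code) of the wrap itself:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t start = 2000;         /* timestamp taken on one CPU */
    uint64_t end   = 1000;         /* taken on another CPU whose clock lags */
    uint64_t total = end - start;  /* wraps around: 2^64 - 1000 */

    printf("%llu\n", (unsigned long long)total);  /* 18446744073709550616 */
    return 0;
}

Any small negative difference lands just below 2^64, which is exactly the neighborhood of the number in the log.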

After some research I found the reason, and it is interesting enough that I want to share it even a few years later. The reason is that different CPUs on the same motherboard use different frequency generators (each CPU has its own, though the exact implementation depends on the CPU vendor), and there is no way to get these generators synchronized. That is, if your server has 4 CPUs, each of them lives on its own time. Usually the difference between generators is very small (we are talking microseconds), but it turns out this difference is enough to produce negative time.
It happens when your query starts on one CPU and is switched to another CPU during execution (one whose generator lags behind the first CPU's), and the interval between the two measurements is small enough to expose the difference between the generators. The result is negative time, but it is actually more serious than that: it means we cannot get the exact time between two points in an application.
So be aware of this when designing profiling in your application. I wonder how real-time sensor systems, where microseconds matter, deal with this.
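One defensive pattern, a sketch of my own rather than anything from the MySQL source: compute the difference in a signed 64-bit type and clamp negative results to zero, so a backwards step between CPUs cannot wrap around to a value near 2^64.

#include <stdint.h>
#include <time.h>

/* Elapsed nanoseconds between two timestamps, computed in a signed
   type and clamped at zero so a backwards clock step between CPUs
   cannot wrap around to a huge unsigned value. */
static int64_t elapsed_ns(const struct timespec *start,
                          const struct timespec *end)
{
    int64_t ns = (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL
               + (end->tv_nsec - start->tv_nsec);
    return ns < 0 ? 0 : ns;
}

Clamping hides the symptom rather than fixing the cause; where the platform offers it, a clock the kernel keeps non-decreasing across cores (such as CLOCK_MONOTONIC on Linux) is the better tool for interval measurement.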


Entry posted by Vadim | 3 comments


11:10 Product: Hadoop » High Scalability - Building bigger, faster, more reliable websites.

Update 5: Hadoop Sorts a Petabyte in 16.25 Hours and a Terabyte in 62 Seconds and has its green cred questioned because it took 40 times the number of machines Greenplum used to do the same work.
Update 4: Introduction to Pig. Pig allows you to skip programming Hadoop at the low map-reduce level. You don't have to know Java. Using the Pig Latin language, which is a scripting data flow language, you can think about your problem as a data flow program. 10 lines of Pig Latin = 200 lines of Java.
Update 3: Scaling Hadoop to 4000 nodes at Yahoo!: 30,000 cores with nearly 16PB of raw disk; a 6TB sort completed in 37 minutes; each of 14,000 map tasks writes (reads) 360 MB (about 3 blocks) of data into a single file, for a total of 5.04 TB across the whole job.
Update 2: Hadoop Summit and Data-Intensive Computing Symposium Videos and Slides. Topics include: Pig, JAQL, Hbase, Hive, Data-Intensive Scalable Computing, Clouds and ManyCore: The Revolution, Simplicity and Complexity in Data Systems at Scale, Handling Large Datasets at Google: Current Systems and Future Directions, Mining the Web Graph, and Sherpa: Hosted Data Serving.
Update: Kevin Burton points out Hadoop now has a blog and an introductory video starring Beyonce. Well, the Beyonce part isn't quite true.

Hadoop is a framework for running applications on large clusters of commodity hardware using a computational paradigm named map/reduce, where the application is divided into many small fragments of work, each of which may be executed on any node in the cluster. It replicates much of Google's stack, but it's for the rest of us. Jeremy Zawodny has a wonderful overview of why Hadoop is important for large website builders:
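To make the map/reduce idea concrete, here is a toy single-process sketch (my illustration, not Hadoop's API): the input is divided into fragments, a map step processes each fragment independently (in Hadoop, on any node in the cluster), and a reduce step combines the partial results.

#include <stdio.h>

/* Map step: count the words in one input fragment. In Hadoop each
   such fragment would be processed on some node in the cluster. */
static int map_count_words(const char *fragment)
{
    int count = 0, in_word = 0;
    for (; *fragment; fragment++) {
        if (*fragment == ' ' || *fragment == '\n') {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            count++;
        }
    }
    return count;
}

int main(void)
{
    const char *fragments[] = {          /* the divided input */
        "hadoop runs applications",
        "on large clusters of commodity hardware",
    };
    int total = 0;

    /* Reduce step: combine the per-fragment results. */
    for (int i = 0; i < 2; i++)
        total += map_count_words(fragments[i]);

    printf("total words: %d\n", total);  /* prints: total words: 9 */
    return 0;
}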

read more

08:00 Gobi Challenge - Day 2 - Silent March » 刘润

With a mind free of hindrance; because there is no hindrance, there is no fear. Far removed from inverted dreams and delusions, one ultimately reaches nirvana.

- Xuanzang's translation of the Heart Sutra (《般若心经》)

Today we marched nearly 40 kilometers in silence. There is nothing to say; see the photos.

Note: Utterly exhausted and slightly sprained; quietly waiting for tomorrow. This post was published on my behalf by SHE.

