Strategy: In Cloud Computing Systematically Drive Load to the CPU

10:09 Cloud Programming Directly Feeds Cost Allocation Back into Software Design » High Scalability - Building bigger, faster, more reliable websites.

Update 3: An interesting simple example of this idea showed up on the Google AppEngine list. With one paging algorithm and one use of AJAX the yearly cost of the site was $1000. By changing those algorithms the site went under quota and became free again. This will make life a lot more interesting for developers.
Update 2: Business Model Influencing Software Architecture by Brandon Watson. The profitability of your project could disappear overnight on account of code behaving badly.
Update: Amazon adds Elastic Block Store at $0.10 per 1 million I/O requests. Now I need some cost minimization storage algorithms!

At the GAE Meetup yesterday a very interesting design rule came up: Design By Explicit Cost Model. A clumsy name, I know, but it is explained like this:

If you are going to be charged for an operation, GAE wants you to explicitly ask for it. This is why some automatic navigation between objects isn't provided: leaving it out forces you to write an explicit query. Writing an explicit query is a sort of EULA for being charged. Click OK in the form of a query and you've indicated that you are prepared to pay for a database operation.

Usually in programming the costs we talk about are time, space, latency, bandwidth, storage, person hours, etc. Listening to the Google folks talk about how one of their explicit design goals was to require programmers to be mindful of operations that cost money made me realize that in cloud programming, monetary cost will be another aspect of design we'll have to factor in.

In addition to asking for the Big O complexity of an algorithm, we'll also have to ask for the Big $ (or Big Euro) notation so we can judge an algorithm by its cost against a particular cloud profile. Maybe something like $(CPU=1.3, DISK=3, IN-BANDWIDTH=2, OUT-BANDWIDTH=3, DB=10). You could look at the Big $ notation for an algorithm and shake your head, saying that approach will never work for GAE, but it could work for Amazon. Can we find a cheaper Big $? ...
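One way to make the Big $ idea concrete is a minimal sketch like the one below. It assumes the numbers in the notation are an algorithm's resource usage in arbitrary units, and that a "cloud profile" is a table of per-unit prices. The function name `big_dollar` and both price profiles are hypothetical, made up purely for illustration; they are not real GAE or Amazon prices.

```python
# Resource names taken from the Big $ notation in the text.
RESOURCES = ("CPU", "DISK", "IN-BANDWIDTH", "OUT-BANDWIDTH", "DB")


def big_dollar(usage, prices):
    """Return the total cost: usage times unit price, summed over all resources."""
    return sum(usage[r] * prices[r] for r in RESOURCES)


# Usage figures from the example notation (assumed to be arbitrary units).
algorithm = {"CPU": 1.3, "DISK": 3, "IN-BANDWIDTH": 2, "OUT-BANDWIDTH": 3, "DB": 10}

# Hypothetical price profiles, NOT real provider pricing.
profile_a = {"CPU": 1.0, "DISK": 1.0, "IN-BANDWIDTH": 1.0, "OUT-BANDWIDTH": 1.0, "DB": 5.0}
profile_b = {"CPU": 2.0, "DISK": 1.0, "IN-BANDWIDTH": 1.0, "OUT-BANDWIDTH": 1.0, "DB": 0.5}

cost_a = big_dollar(algorithm, profile_a)
cost_b = big_dollar(algorithm, profile_b)
print(f"profile A: {cost_a:.2f}, profile B: {cost_b:.2f}")
```

With these made-up prices the same database-heavy algorithm is far more expensive on the profile that charges more per database operation, which is the kind of comparison the notation is meant to support.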

Lightcloud is a distributed and persistent key-value database from Plurk.com. Performance is said to be comparable to memcached. It differs from memcachedb because it scales out horizontally by adding new nodes. It differs from memcached because it persists to disk; it's not just a cache. Now you have one more option in the never-ending quest to ditch the RDBMS.

Their website does a nice job explaining the system:

Built on Tokyo Tyrant. One of the fastest key-value databases [benchmark]. Tokyo Tyrant has been in development for many years and is used in production by Plurk.com, mixi.jp and scribd.com (to name a few)...

Great performance (comparable to memcached!)

Can store millions of keys on very few servers - tested in production

Scale out by just adding nodes

Nodes are replicated via master-master replication. Automatic failover and load balancing are supported from the start

Ability to script and extend using Lua. Included extensions are incr and a fixed list

Hot backups and restore: Take backups and restore servers without shutting them down

LightCloud manager can control nodes, take backups and give you a status on how your nodes are doing

Very small footprint (lightcloud client is around 500 lines and manager about 400)

Python only, but LightCloud should be easy to port to other languages

07:46 Product: Amazon Simple Storage Service » High Scalability - Building bigger, faster, more reliable websites.

Update: HostedFTP.com - Amazon S3 Performance Report. How fast is S3? Based on their own study HostedFTP.com has found: 10 to 12 MB/second when storing and receiving files and 140 ms per file stored as a fixed overhead cost.
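The two figures in that report suggest a simple back-of-the-envelope model: time per file is roughly the fixed overhead plus size divided by throughput. The sketch below assumes sequential transfers and a linear model, and takes the low end of the reported throughput range; the function name `estimate_transfer_seconds` is hypothetical.

```python
FIXED_OVERHEAD_S = 0.140   # 140 ms per file, from the report
THROUGHPUT_MB_S = 10.0     # low end of the reported 10 to 12 MB/second


def estimate_transfer_seconds(file_sizes_mb,
                              throughput_mb_s=THROUGHPUT_MB_S,
                              overhead_s=FIXED_OVERHEAD_S):
    """Estimate total time to transfer files one after another (assumed model)."""
    return sum(overhead_s + size / throughput_mb_s for size in file_sizes_mb)


one_large = estimate_transfer_seconds([10.0])          # a single 10 MB file
many_small = estimate_transfer_seconds([0.01] * 1000)  # 1000 files of 10 KB, 10 MB total
print(f"one 10 MB file: {one_large:.2f} s, 1000 x 10 KB files: {many_small:.2f} s")
```

Under these assumptions the same 10 MB of data takes far longer as many small files, because the per-file overhead dominates.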
