21:00 The new Delicious Firefox extension: Add-on for Firefox on Delicious » Delicious/chedong
The bookmarklet scripts are at http://delicious.com/help/bookmarklets; just drag them onto the bookmarks toolbar.
20:49 Guessing gender from browser history » O'Reilly Radar Chinese site

Nat Torkington 2008-08-01

I just found a clever trick for guessing gender from browser history. I tried it and then realized that I'm a crappy test for the system: yes, likelihood of my being male is 99%. But if I read a hardcore geek tech blog, then that's probably the case anyway. I could emulate that behaviour with a simple return(G_MALE) in the code.

I pushed the link to a few women for some more strenuous testing. Penny Leach was told she's 52% likely to be female, and Laurel at O'Reilly was told she's 50% likely to be female. Perhaps on the internet, everyone surfs like a MALE with probability 50%. How'd the test work for you? Let me know in the comments ....
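
The post doesn't describe how the tool actually works or what data it uses. As a rough, hypothetical illustration of the general idea, here is a minimal Python sketch that combines made-up per-site audience gender ratios for sites found in a browser history into an overall likelihood; the site list, the ratios, and the naive-Bayes-style combination are all my assumptions, not the tool's method.

    # Hypothetical sketch only: the gender-guessing tool's real data and
    # method are not described in the post.  This combines made-up per-site
    # audience gender ratios for visited sites, naive-Bayes style.

    SITE_MALE_RATIO = {      # fraction of each site's audience that is male (illustrative)
        "slashdot.org": 0.87,
        "espn.com": 0.70,
        "ravelry.com": 0.20,
    }

    def male_likelihood(visited_sites):
        p_male, p_female = 1.0, 1.0
        for site in visited_sites:
            ratio = SITE_MALE_RATIO.get(site)
            if ratio is None:
                continue                  # no demographic data: contributes nothing
            p_male *= ratio
            p_female *= 1.0 - ratio
        return p_male / (p_male + p_female)   # 0.5 if nothing in the history matched

    print(male_likelihood(["slashdot.org", "espn.com"]))   # ~0.94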

18:49 Random OSCON Tidbits » O'Reilly Radar Chinese site

Nat Torkington 2008-08-01

Some things I learned about at the Django/Python meetup in downtown Portland during OSCON:

  • JS Bridge: a Python to Javascript bridge for all Mozilla applications, still under very active development (i.e., changing daily).
  • 960.gs: a CSS grid framework (replacing Blueprint CSS) with a naming scheme that makes prototyping designs a lot less painful.
  • Dojo has Django Templates: I take my eye off Dojo for a year and it suddenly grows the ability to have full Django templates in the browser. Holy CRAP.

Google Reader Adds the Blogs You Follow in Blogger - Google Operating System » Che Dong's shared items in Google Reader
Google Reader tests a new feature that automatically subscribes you to the "blogs you are following" in Blogger. "The blogs you follow in Blogger have been added as subscriptions in Google Reader. Subscriptions can be managed in Reader without affecting your following list in Blogger."

It's not very clear whether "blogs you are following" is a new feature or a synonym for the blogroll, since Google Reader links to a non-existent page that is supposed to reveal more information. A thread in the Google Reader Group shows that the new feature was accidentally added and then removed.

"Google Reader automatically added a "Blogs I'm Following" folder on my Reader. I've already got my Reader set up the way I want it and this folder is superfluous and annoying," says Vanessa. "It would be nice if they gave us the option of using it before they just took it over that way! There is no mention of it in any of their help files either, this is just ridiculous," mentions Jackie.

A screenshot courtesy of "The Other Drummer" shows the new folder automatically added by Google Reader: http://www.google.com/reader/view/user/-/state/com.blogger/following

In other Google Reader news, the iPhone version started to reformat the linked web pages for mobile browsers, but this can be changed in the settings. "For users with Nokia and other AppleWebKit-enabled phones, soon your phones won't automatically choose the iPhone version of Google Reader," says a Google employee.

{ Thanks, hlpPy. }
11:00 Analysis of the DNS vulnerability - delphij's Chaos - Specifically, the attack has two parts: first, getting the caching DNS server to issue new DNS queries (there are many ways to do this); second, sending a flood of forged packets and hoping one hits, i.e., delivering a forged UDP response whose port and TXID both match to the caching DNS server before the real authoritative DNS response arrives. » Delicious/chedong
For DNS clients that generate TXIDs by simple linear increment, an attacker can set up a DNS server of their own and lure the target into issuing DNS queries, which greatly improves the attack's efficiency. For DNS clients that use a fixed source port, an attacker can use similar means to learn the information they need. The attacker can trigger the queries in various ways, including phishing emails or simply issuing queries directly.
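
To see why predictable TXIDs and a fixed source port matter, here is a rough sketch (my own, not from the original article) of the attacker's odds: each forged response has to match the 16-bit TXID and, if the resolver also randomizes its source port, the port as well. The packet counts below are illustrative assumptions.

    # Rough sketch of the forged-response race described above.  Each fake UDP
    # answer must match the 16-bit TXID and, if the resolver randomizes its
    # source port, the port too.  Packet counts are illustrative.

    TXID_SPACE = 2 ** 16              # 65,536 possible transaction IDs
    PORT_SPACE = 2 ** 16 - 1024       # roughly the usable ephemeral port range

    def hit_probability(forged_packets, randomized_port):
        """Chance that at least one forged response matches before the real
        authoritative answer arrives, assuming independent uniform guesses."""
        space = TXID_SPACE * (PORT_SPACE if randomized_port else 1)
        return 1 - (1 - 1 / space) ** forged_packets

    print(hit_probability(10_000, randomized_port=False))  # fixed port: ~0.14
    print(hit_probability(10_000, randomized_port=True))   # randomized port: ~0.0000024
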
08:19 Percona RPMS for RedHat 5 / CentOS 5 x86_64 » MySQL Performance Blog

We have prepared RPMs of our release for the RedHat 5 / CentOS 5 x86_64 platform.
http://www.mysqlperformanceblog.com/mysql/RPM/RHEL5/5.0.62/

There was a question about what the patchset includes and whether there are manuals.
We have:


Entry posted by Vadim | 2 comments


03:49 Book Review: Pragmatic Thinking and Learning » O'Reilly Radar Chinese site

Nat Torkington 2008-07-31

Andy Hunt's Pragmatic Thinking and Learning: Refactor Your Wetware (Pragmatic Press; 2008) teaches programmers how to master a subject, strategies for using your brain to its fullest, systems for learning, and the best ways to practice. The result is a grab-bag of pop-psych systems, practical strategies, and good old-fashioned inspiration that will give most programmers more footholds as they climb the tree of knowledge. I had expected the book to be about thinking more than learning (it's not) and to be more rigorous in separating scientifically tested theories from speculation, but despite these disappointments I still recommend the book to any programmer who wants to become better. The book is still in beta, so the final version may yet address my concerns.

The book begins with an explanation of the Dreyfus Model of Skills Acquisition, which identifies and describes five stages we pass through as we master a skill. We begin as novices, who lack a conceptual model of the problem space and also lack the basic skills. Consequently, novices need recipes or rules that specify "if X then Y" in great detail. We then move to "advanced beginners", who can try tasks on their own but still need help troubleshooting. They're starting to formulate principles, and can reuse techniques in similar contexts (if only just), but still don't have the Big Picture. Then we become "competent", when we can troubleshoot problems and solve new ones. After that comes "proficient", when we seek out the Big Picture and, most significantly, are able to correct our previously poor performance—we can look back at what we did and revise our approach. Finally, we become "expert", with the world view and skills so ingrained that we have phenomenal instincts that let us zen in on the answer immediately without higher-level analysis.

The Dreyfus model isn't a "scientific truth"; it's a philosopher's construction that has some traction in the AI world and a lot of traction in the nursing world. Indeed, this is one of the places where I feel the book presents something that's useful without acknowledging that it's not necessarily an absolute truth. However, the general point is that once we have a map, however grainy, we can find where we are, and the map will show us what we must do to make progress.

The rest of the book is concepts and skills for moving towards mastery. First Andy tells us the left brain-right brain story (left brain linear and rational, right brain random access and kludgy) and gives advice for engaging the correct side of the brain when you need it. This is where things start to get a bit weird. Andy talks about "free form journaling", "morning pages", and advice to park on the other side of the road. It's here that I feel he's furthest from useful programming-related ground, although he does attempt to intersperse this rather woo-woo advice with thoughts on pair programming and typesetting code.

Then comes a quick romp through the ways in which we fail to think accurately, a chapter cleverly titled "Debug Your Mind". He covers the standard fare of behavioural economics: fundamental attribution error, anchoring, the imbalance between gaining and losing, our nonlinear sense of risk vs reward. This is the stuff I like (and have read way too many books on lately) but it's coupled with a discussion of demographics and archetypes of which I was, frankly, very suspicious. Sentences like "I am among the eldest of Generation X, on the cusp of the Boomers, I tend to identify with the Gen-X characteristics, especially survivalism, pragmatism, and realism" sound awfully close to a cold reading. And don't even get me started on the Myers-Briggs test that comes next ....

In Chapter 6 we're back onto more solid ground, learning how to learn. Some of this was familiar to me from my high school days, when I read everything I could on how to study (amusingly, this was a procrastination technique so I wouldn't have to study). Andy talks about setting SMART objectives (Specific, Measurable, Achievable, Relevant, Time-boxed), and treating your education like an investment plan by making regular active contributions to a diverse portfolio of skills. Then we're into specific learning techniques like SQ3R, spaced retrieval, mind maps, and (my favourite) learning by teaching. I've been working hard lately on purposeful reading, as I habitually skim, and I found this whole chapter useful.

The final chapter in the draft I read was "Gain Experience". This is about purposeful exercise of skills. Andy wants us to choose projects that keep our interest but also productively extend our skill set. This means dealing with failure, and Andy has a great line here: "it's not important to get it right the first time, it's important to get it right the last time". He talks about unit testing and version control in the context of purposeful play: we should have the freedom to experiment but without losing the ability to backtrack to a stable state, and the ability to demonstrate progress. He's better here at bringing the pedagogy back to the real world of a programmer, moving seamlessly from the Inner Game of offline practising to a discussion of scaffolding and the benefits of C++ and Ruby programmers learning each others' languages.

Andy's thesis is that we can master a subject faster if we know how to get better at it rather than taking random walks through the manual and problem space. He does a good job of laying out useful techniques and concepts—perhaps too good. While he industriously attempts to connect the abstract theories to the concrete life of a programmer, this reviewer was left with the feeling of too much theory and too many techniques to choose from. Despite this, the book remains a rewarding read and the only book that tackles these subjects from a programmer's point of view. Recommended.

02:49 Open Source and Cloud Computing » O'Reilly Radar Chinese site

Tim O'Reilly 2008-07-31

I've been worried for some years that the open source movement might fall prey to the problem that Kim Stanley Robinson so incisively captured in Green Mars: "History is a wave that moves through time slightly faster than we do." Innovators are left behind, as the world they've changed picks up on their ideas, runs with them, and takes them in unexpected directions.

In essays like The Open Source Paradigm Shift and What is Web 2.0?, I argued that the success of the internet as a non-proprietary platform built largely on commodity open source software could lead to a new kind of proprietary lock-in in the cloud. What good are free and open source licenses, all based on the act of software distribution, when software is no longer distributed but merely performed on the global network stage? How can we preserve freedom to innovate when the competitive advantage of online players comes from massive databases created via user contribution, which literally get better the more people use them, raising seemingly insuperable barriers to new competition?

I was heartened by the program at this year's Open Source Convention. Over the past couple of years, open source programs aimed at the Web 2.0 and cloud computing problem space have been proliferating, and I'm seeing clear signs that the values of open source are being reframed for the network era. Sessions like Beyond REST? Building Data Services with XMPP PubSub, Cloud Computing with BigData, Hypertable: An Open Source, High Performance, Scalable Database, Supporting the Open Web, and Processing Large Data with Hadoop and EC2 were all full. (Due to enforcement of fire regulations at the Portland Convention Center, many of them had people turned away, as SRO was not allowed.)

But just "paying attention" to cloud computing isn't the point. The point is to rediscover what makes open source tick, but in the new context. It's important to recognize that open source has several key dimensions that contribute to its success:

  1. Licenses that permit and encourage redistribution, modification, and even forking;
  2. An architecture that enables programs to be used as components wherever possible, and extended rather than replaced to provide new functionality;
  3. Low barriers for new users to try the software;
  4. Low barriers for developers to build new applications and share them with the world.

This is far from a complete list, but it gives food for thought. As outlined above, I don't believe we've figured out what kinds of licenses will allow forking of Web 2.0 and cloud applications, especially because the lock-in provided by many of these applications is given by their data rather than their code. However, there are hopeful signs like Yahoo! Boss that companies are beginning to understand that in the era of the cloud, open source without open data is only half the application.

But even open data is fundamentally challenged by the idea of utility computing in the cloud. Jesse Vincent, the guy who's brought out some of the best hacker t-shirts ever (as well as RT) put it succinctly: "Web 2.0 is digital sharecropping." (Googling, I discover that Nick Carr seems to have coined this meme back in 2006!) If this is true of many Web 2.0 success stories, it's even more true of cloud computing as infrastructure. I'm ever mindful of Microsoft Windows Live VP Debra Chrapaty's dictum that "In the future, being a developer on someone's platform will mean being hosted on their infrastructure." The New York Times dubbed bandwidth providers OPEC 2.0. How much more will that become true of cloud computing platforms?

That's why I'm interested in peer-to-peer approaches to delivering internet applications. Jesse Vincent's talk, Prophet: Your Path Out of the Cloud, describes a system for federated sync; Evan Prodromou's Open Source Microblogging describes identi.ca, a federated open source approach to lifestreaming applications.

We can talk all we like about open data and open services, but frankly, it's important to realize just how much of what is possible is dictated by the architecture of the systems we use. Ask yourself, for example, why the PC wound up with an ecosystem of binary freeware, while Unix wound up with an ecosystem of open source software? It wasn't just ideology; it was that the fragmented hardware architecture of Unix required source so users could compile the applications for their machine. Why did the WWW end up with hundreds of millions of independent information providers while centralized sites like AOL and MSN faltered?

Take note: All of the platform as a service plays, from Amazon's S3 and EC2 and Google's AppEngine to Salesforce's force.com -- not to mention Facebook's social networking platform -- have a lot more in common with AOL than they do with internet services as we've known them over the past decade and a half. Will we have to spend a decade backtracking from centralized approaches? The interoperable internet should be the platform, not any one vendor's private preserve. (Neil McAllister provides a look at just how one-sided most platform as a service contracts are.)

So here's my first piece of advice: if you care about open source for the cloud, build on services that are designed to be federated rather than centralized. Architecture trumps licensing any time.

But peer-to-peer architectures aren't as important as open standards and protocols. If services are required to interoperate, competition is preserved. Despite all Microsoft and Netscape's efforts to "own" the web during the browser wars, they failed because Apache held the line on open standards. This is why the Open Web Foundation, announced last week at OSCON, is putting an important stake in the ground. It's not just open source software for the web that we need, but open standards that will ensure that dominant players still have to play nice.

The "internet operating system" that I'm hoping to see evolve over the next few years will require developers to move away from thinking of their applications as endpoints, and more as re-usable components. For example, why does every application have to try to recreate its own social network? Shouldn't social networking be a system service?

This isn't just a "moral" appeal, but strategic advice. The first provider to build a reasonably open, re-usable system service in any particular area is going to get the biggest uptake. Right now, there's a lot of focus on low level platform subsystems like storage and computation, but I continue to believe that many of the key subsystems in this evolving OS will be data subsystems, like identity, location, payment, product catalogs, music, etc. And eventually, these subsystems will need to be reasonably open and interoperable, so that a developer can build a data-intensive application without having to own all the data his application requires. This is what John Musser calls the programmable web.

Note that I said "reasonably open." Google Maps isn't open source by any means, but it was open enough (considerably more so than any preceding web mapping service) and so it became a key component of a whole generation of new applications that no longer needed to do their own mapping. A quick look at programmableweb.com shows google maps with about 90% share of mapping mashups. Google Maps is proprietary, but it is reusable. A key test of whether an API is open is whether it is used to enable services that are not hosted by the API provider, and are distributed across the web. Facebook's APIs enable applications on Facebook; Google Maps is a true programmable web subsystem.

That being said, even though the cloud platforms themselves are mostly proprietary, the software stacks running on them are not. Thorsten von Eicken of RightScale pointed out in his talk, Scale Into the Cloud, that almost all of the software stacks running on cloud computing platforms are open source, for the simple reason that proprietary software licenses have no provisions for cloud deployment. Even though open source licenses don't prevent lock-in by cloud providers, they do at least allow developers to deploy their work on the cloud.

In that context, it's important to recognize that even proprietary cloud computing provides one of the key benefits of open source: low barriers to entry. Derek Gottfried's Processing Large Data with Hadoop and EC2 talk was especially sweet in demonstrating this point. Derek described how, armed with a credit card, a sliver of permission, and his hacking skills, he was able to put the NY Times historical archive online for free access, ramping up from 4 instances to nearly 1,000. Open source is about enabling innovation and re-use, and at their best, Web 2.0 and cloud computing can be bent to serve those same aims.

Yet another benefit of open source - try-before-you-buy viral marketing - is also possible for cloud application vendors. During one venture pitch, I was asking the company how they'd avoid the high sales costs typically associated with enterprise software. Open source has solved this problem by letting companies build a huge pipeline of free users, whom they can then upsell with follow-on services. The cloud answer isn't quite as good, but at least there's an answer: some number of application instances are free, and you charge after that. While this business model loses some virality and transfers some costs from the end user to the application provider, it has a benefit that open source now lacks: a much stronger upgrade path to paid services. Only time will tell whether open source or cloud deployment is a better distribution vector, but it's clear that both are miles ahead of traditional proprietary software in this regard.

In short, we're a long way from having all the answers, but we're getting there. Despite all the possibilities for lock-in that we see with Web 2.0 and cloud computing, I believe that the benefits of openness and interoperability will eventually prevail, and we'll see a system made up of cooperating programs that aren't all owned by the same company: an internet platform that, like Linux on the commodity PC architecture, is assembled from the work of thousands. Those who are skeptical of the idea of the internet operating system argue that we're missing the kinds of control layers that characterize a true operating system. I like to remind them that much of the software that is today assembled into a Linux system already existed before Linus wrote the kernel. Like LA, 72 suburbs in search of a city, today's web is 72 subsystems in search of an operating system kernel. When we finally get that kernel, it had better be open source.

00:49 Energy Savings, Strange Attractors, ... » O'Reilly Radar Chinese site

Jim Stogdill 2008-07-31

... the Intrinsic Cost of State Change, Orbiting Alien Voyeurs, and 200 Square Kilometers of Solar Panels Somewhere in Texas

The Silicon Valley Leadership Group and Berkeley National Labs recently published the results of their first Data Center Demonstration Project (pdf). (Disclosure: My colleague Teresa Tung of Accenture R+D labs was the report's principal author). The study follows up on last year's publication of the EPA's report to Congress (pdf) on data center energy consumption. That report, among other things, estimated the range of savings that data center operators could achieve with varying degrees of technology and practice improvement. This more recent report is based on real world studies and was intended to validate the estimates in the EPA report.

Both reports are good reads if you are interested in reducing the megawatts being consumed in your organization's silicon (though the EPA report has been criticized as being a bit toothless). However, I should warn you that they are fairly long and detailed so the bedside table might not be the best home for them if you want to get through them, at least until the manga versions are released.

The EPA study estimated that "state of the art" technology and processes in the data center might cut energy usage by 55%, while the more readily achievable "best practices" come in at 45% savings. State of the art covers a range of approaches: better server utilization through virtualization, better cooling techniques, improved power distribution, sensor networks, etc.

[Figure: electricity usage graph]

The more recent study, testing those techniques in working data centers, validates the EPA's estimates but also offers the initially surprising conclusion that legacy data centers can be retrofitted to achieve efficiencies close to those of new builds. That conclusion follows from the less surprising finding that the most bang for the buck comes from improvements on the "IT" side of the energy draw (energy-efficient servers, virtualization, etc.) rather than from the harder-to-retrofit "site" side (cooling systems, etc.). The dog wags the tail after all, and if you can reduce the direct power consumption of the IT equipment, you will simultaneously reduce the associated cooling costs, whether in an old building with relatively inefficient HVAC or a shiny new one.
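
A quick way to see why IT-side savings dominate (my own illustration with made-up numbers, not a calculation from the report): if facility overhead such as cooling scales roughly with IT load, which is what a fixed Power Usage Effectiveness (PUE) assumes, then every watt saved at the server is saved again in the supporting systems.

    # Illustrative numbers only (not from the report): why IT-side savings
    # dominate.  Assume facility overhead (cooling, power distribution) scales
    # with IT load, expressed as a PUE of total facility power / IT power.

    def total_power_kw(it_load_kw, pue):
        return it_load_kw * pue

    baseline      = total_power_kw(it_load_kw=1000, pue=2.0)   # 2000 kW
    it_retrofit   = total_power_kw(it_load_kw=700,  pue=2.0)   # 1400 kW: 30% IT cut, old HVAC
    site_retrofit = total_power_kw(it_load_kw=1000, pue=1.7)   # 1700 kW: better cooling only

    print(baseline, it_retrofit, site_retrofit)
    # The IT-side cut saves 600 kW even in an inefficient building;
    # the cooling retrofit alone saves 300 kW.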

The last finding that I'll mention here is that it doesn't look like the time is right yet for widespread adoption of more advanced load management techniques outside of niche applications. The demonstration project had facilities that experimented with them, but the risk aversion that stems from high reliability requirements in production data centers has these experiments mostly restricted to centers that serve R+D rather than production functions.

Maybe one of the most interesting things about the report is what it doesn't (can't) say.

Because the experiments were run in working data centers where things like cost structure are considered competition-sensitive information, the project was not able to collect actual costs. ROIs are inferred to be in the 18-month range, but that's mostly guesswork based on reasoning like "they must be around 18 months, or the companies wouldn't have made the investment."

So, the graph from the report that lays out possible rates of practice and technology uptake can't predict which curve will actually be followed. That will depend on where the dynamical system made up of the actual and expected cost of energy, the costs and benefits of many possible practice and technology improvement choices, equipment depreciation periods, the vagaries of human decision cycles, and other factors stabilizes. Barring the enactment of aggressive public policy constraints or hard limits on power availability for data center operators, the curve might ultimately follow a sort of strange attractor that leans toward the state-of-the-art curve. The open question is, how rapidly will it converge? Also, if I'm an operator, in which order should I tackle practices and technologies so that I can 80/20-rule my way up the efficiency ladder while paying my way with relatively near-term dollars?

[Figure: electricity usage graph with the strange attractor overlaid]

If you are that data center operator trying to decide which steps to take and in which order, each ROI calculation you tackle is sensitive to the scale of your operation: over how many nodes are you going to distribute that fixed cost? I think this implies two things. To the degree that the data centers measured in the study might be larger than average, our strange attractor will get pushed back up, as though the cost of the technology or practices had increased, because for smaller footprints the ROI calculations associated with capital-intensive practices won't be as attractive.

Viewed another way, that same conclusion suggests an attraction to more concentrated larger data centers. The economics of increasing energy cost and efficiencies of scale might push smaller data center operators more rapidly into the cloud. Those operators with smaller footprints simply won't be able to achieve the same low cost per unit of work because of the naturally occurring economies of scale that are inherent in everything from virtualized server pooling to power distribution and cooling systems at scale.

If you don't mind me going off on a bit of a tangent in an already long post... given that a data center is really just a vast state machine, it would be really cool if its efficiency were tied to some kind of intrinsic cost of state transition rather than to trillions of leaky circuits. After all, cars burn a lot of gas, but the energy they use is at least in the ballpark (an order of magnitude or so) of the intrinsic cost of moving their mass against friction and pushing air out of the way. But for data centers the real intrinsic cost is probably damn near zero; we're ultimately only processing information, after all. So all those megawatts are tied instead to the massive current leakage associated with the fact that we choose to maintain state in silicon instead of something more elegant (but currently impossible). Viewed as a physical system, data centers are about as efficient as a well-cooled warehouse full of burning light bulbs (now there's an idea: a central lighting plant full of giant fluorescent bulbs connected to your house by fiber optic cable).

In fact, if you were an analog alien floating around in some kind of off-the-grid Galactica, you might look down at one of our data centers, see 4MW going in and a mere few hundred watts coming out through an OC-48 fiber trunk, and wonder "what the hell?" Watching it spew entropic HVAC waste heat, those bemused aliens could be forgiven for concluding that these buildings with no obvious use must be massive sacrificial altars where silly humans offer up electricity and make their wishes or say their prayers (well, perhaps we do).

Since it looks like we aren't going to replace silicon anytime soon and we are on a path of incremental rather than disruptive improvements in data center energy consumption, maybe it wouldn't be such a big deal if we could just power them with renewable sources. That's Google's plan, right? At one point in my life I studied mechanical engineering at the University of Texas, and I've always loved back-of-the-envelope calculations, so if you'll indulge me just one last paragraph...

The report indicates that U.S. data centers consumed 61 billion kWh in 2006. That works out to about 7,000 megawatts of continuous electric load. For perspective, in Austin, Texas at noon on a sunny day the incident energy from the Sun is approximately 900 watts per square meter. That sounds pretty good, but accounting for the motion of the sun, weather, and the efficiency of our best collectors, the usable incident energy is much lower. From my trusty old Solar-Thermal Energy Systems (Howell, Bannerot, and Vliet) I can look up the "average daily extraterrestrial insolation on a horizontal surface in the northern hemisphere" at 30° latitude and see that it ranges from a low in December of 19.7 to a high of 40.7 MJ/(m^2-day) incident on a horizontal surface. Since it's not always summer and we still need to power these things in December, we'd better use the 19.7 value. We also need to knock it down a bit for weather (i.e., we need to make the extraterrestrial value terrestrial where the collectors will be). I don't have data for Texas, but let's assume it's pretty sunny most days and just round that value down to 15. At an average solar cell efficiency of about 20%, only about 3 MJ/(m^2-day) are left to do anything useful. Pencil going crazy on envelope... So, to power all of our 2006 data centers is going to require about 200 square kilometers of Texas covered in solar collectors (ignoring transmission and overnight storage losses). We'll double our computing needs again by 2011, so let's hope we achieve that state-of-the-art 55% savings, and some more after that; otherwise we're going to have to cover another 200 km^2 of Texas every five years or so.
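
The arithmetic above can be replayed directly; a small script using the same figures from the paragraph (61 billion kWh per year, 15 MJ/(m^2-day) of weather-derated December insolation, and 20% cell efficiency) reproduces the roughly 7,000 MW and 200 km^2 numbers.

    # Back-of-the-envelope check of the numbers in the paragraph above.

    ANNUAL_KWH     = 61e9        # US data center consumption, 2006
    HOURS_PER_YEAR = 8760
    MJ_PER_KWH     = 3.6

    continuous_load_mw = ANNUAL_KWH / HOURS_PER_YEAR / 1000
    print(f"continuous load: {continuous_load_mw:,.0f} MW")          # ~7,000 MW

    daily_demand_mj     = ANNUAL_KWH / 365 * MJ_PER_KWH              # MJ needed per day
    usable_mj_per_m2day = 15 * 0.20                                  # derated insolation x 20% cells

    area_km2 = daily_demand_mj / usable_mj_per_m2day / 1e6
    print(f"collector area: {area_km2:,.0f} km^2")                   # ~200 km^2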

