Programming Language
I looked at inbound traffic for a recent post and was surprised to see programming.reddit.com at the top of the list. I knew about Reddit before but not this sub-reddit. I checked it out and the articles were geeky-cool (for a programmer). But after a few days of reading I started to get an uneasy feeling about the place.
Cluster/Grid <--- you are here
What was all this fretting about why nobody uses Lisp or functional languages? Haskell, ML, yikes. I felt like I'd been teleported back in time to my college days. Maybe this was an east-coast vs. west-coast thing? Reddit is in that Boston/MIT corridor and Paul Graham talks about Lisp all the time, but are they really still worried about this stuff?
Language? Bah. The action is in the frontier after the OS.
Don't get me wrong, I love programming languages, and I have a soft spot for language design. I tried (and failed) to design a new language early in my career. I even have a collection of books about historical programming language design. I've seen huge productivity wins with better programming abstractions, and sure, making unconventional choices can often give you a leg up over the competition.
Picking a language isn't just a personal choice, though. It has to be tempered by the realities of how mature the platform is, whether you can hire people who will want to work in your language, how appealing your tech platform will appear to partners, investors, acquirers... Yahoo Store isn't written in Lisp anymore; they rewrote it. Of course.
But the productivity and development problems that I see building search and web apps just aren't happening at the language statement level.
Language statements generally live inside a single program process. Coordinating all the pieces of communicating software across a modest 500-node application like Topix, though, is a bitch.
I want a fast scratchpad for my 50 front ends to be able to share, kind of like System V shared memory, but networked. I want get, put, append, tail, queue, dequeue, infinitely scalable across some RAID-ish cluster. Billions of keys, petabytes of data. If I get something a zillion times a second from all the front ends, it should adapt so it can serve that fast, but migrate stuff I never get to slower storage. Everything should be redundant, fault-tolerant, self-repairing and administratively scalable.
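To make that wish list concrete, here is a rough single-process sketch of the interface I have in mind. Everything in it is hypothetical: the `Scratchpad` class, the `fast`/`slow` tiers, and the `demote_cold` sweep are names made up for illustration, not anything Topix or anyone else actually runs. It leaves out the hard parts entirely (networking, sharding, replication, self-repair) and only shows the shape of the operations plus the "serve hot stuff fast, push cold stuff to slower storage" idea.

```python
from collections import defaultdict, deque


class Scratchpad:
    """Toy, in-memory stand-in for a shared cluster scratchpad (illustrative only)."""

    def __init__(self):
        self.fast = {}                    # hot tier (think RAM)
        self.slow = {}                    # cold tier (think disk)
        self.reads = defaultdict(int)     # reads per key since the last sweep
        self.queues = defaultdict(deque)  # named FIFO queues

    def _load(self, key):
        # Touching cold data promotes it back to the fast tier.
        if key in self.slow:
            self.fast[key] = self.slow.pop(key)
        return self.fast.get(key)

    def put(self, key, value):
        self.slow.pop(key, None)
        self.fast[key] = value

    def get(self, key):
        self.reads[key] += 1
        return self._load(key)

    def append(self, key, item):
        # Treats the value under `key` as a list-like log.
        log = self._load(key)
        if log is None:
            log = self.fast[key] = []
        log.append(item)

    def tail(self, key, n=1):
        # Last n items of the log under `key` (empty list if there is none).
        self.reads[key] += 1
        log = self._load(key) or []
        return list(log[-n:]) if n > 0 else []

    def enqueue(self, name, item):
        self.queues[name].append(item)

    def dequeue(self, name):
        q = self.queues[name]
        return q.popleft() if q else None

    def demote_cold(self, min_reads=1):
        # Move keys read fewer than min_reads times since the last sweep
        # to the slow tier, then start counting afresh.
        for key in list(self.fast):
            if self.reads[key] < min_reads:
                self.slow[key] = self.fast.pop(key)
        self.reads.clear()


pad = Scratchpad()
pad.put("user:42", "alice")
pad.put("old-report", "stale")
for click in ("a", "b", "c"):
    pad.append("clicks", click)
pad.enqueue("jobs", "crawl")

pad.get("user:42")       # counts as a read, so this key stays hot
pad.tail("clicks", 2)    # the two most recent clicks
pad.demote_cold()        # "old-report" was never read, so it moves to the slow tier
```

The real thing would have to do that sweep continuously and across machines, and keep several copies of everything, which is exactly the part nobody wants to rewrite for every new app.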
You end up building some version of this every time you make an eBay, Second Life, Hotmail, Bloglines, AIM, Google, Inktomi, WebFountain, Facebook, Flickr, PayPal, YouTube.
A zillion machines, a zillion concurrent connections, a big mess of data, never lose any of it, never go down, oh, and the SLA is "never take longer than 50ms to do anything." And it should be simple and fun to program on top of, so the programmers can work on the actual app instead of spending all their time firefighting the cluster support layer.
We all keep cobbling together solutions for whatever app we happen to be writing out of ad-hoc clustered RDBMSs, ReiserFS, Berkeley DBs, piles of coordination code and scripted admin.
Language innovations like Ruby are great, especially when they get some traction and acceptance so that you actually could use them if you wanted to. But all of the recent languages that get use have come out of individual eccentrics. They're incremental aesthetic exercises. They're also all more alike than different. Language innovation is basically done, and mostly has been for a long time.
Machine-level OS research died too, probably sometime in the 1990s. Rob Pike, a longtime member of the Unix team at Bell Labs and a co-creator of Plan 9, put out a paper in 2000 called "Systems Software Research is Irrelevant."
"Systems software research has become a sideline to the excitement in the computing industry..."
"Ironically, at a time when computing is almost the definition of innovation, research in both software and hardware at universities and much of industry is becoming insular, ossified and irrelevant..."
"What is Systems Research these days? Web caches, web servers, file systems, network packet delays, all that stuff. Performance, peripherals, and applications, but not kernels or even user-level applications."
Now, after Pike wrote that, he left Bell Labs and went to work at Google.
Of course. Google is doing more cluster OS research than anyone right now. You could argue that Google's technology success owes more to the block-and-tackle work of managing 500,000 servers than to the little algorithms that power search and ad targeting. Think GFS, MapReduce, Bigtable.
A smart researcher can write an ad targeting algorithm or some PageRank variant in a weekend. It's relatively easy to think up new algorithms; implementing them and getting them to run, especially for web-scale problems, is the hard part. Without the platform to develop and deploy against, it's like you're writing code on paper waiting for the computer to be invented so you can run your program.
It's too bad there isn't a standard platform for all this stuff, so we wouldn't all have to stop and write a new custom version every time we want to code something that will need more than a single machine to run on.
Peculiar distribution and economic dynamics (giving the source to Unix away to universities) led to the entire industry eventually standardizing on the C/Unix/POSIX syscall OS model. GNU and Linux helped vastly here by obliterating the stranglehold that AT&T held over the technology, which was holding adoption back. New languages get scale by being free, so they can reach critical adoption mass, bake their platform to maturity, and become viable and socially acceptable to pragmatic users.
But we don't need a clone of System V or a free C compiler or a dynamic language with socially acceptable syntax now. We need an industrial-strength, hyper-scalable cluster OS.
The problem is that the kind of eccentrics who gave us Unix, GNU, Linux, Perl, and Ruby aren't likely to be able to deliver here. Who has 500 machines in their garage and a million pageviews/day as a personal thorn in their side? Only companies have these problems, and when companies build a platform to solve the problem, the platform isn't general, and it's not given away.