
Database gods bitch about mapreduce

This is what disruption sounds like.

This rant by major database guys against mapreduce is pretty telling.

(You can read a good rebuttal here, and discussion on ycomb.)

The thing that disrupts you is always uglier and worse in some way. Fewer features, less developed. But if there's a 10X price win in there somewhere, the cheap rickety thing wins in the end.

Think Linux vs. AT&T Unix, or MySQL vs. Oracle.

I'll also take exception to the claim that schemas won out over unstructured data in the '60s. Unix ultimately trounced Multics and its ilk, not simply because of quasi-open source and economics, but also because the programming model was superior. "A file is just a stream of bytes" was a radical departure from the record- and key-oriented approaches that were dominant at the time. Some folks haven't stopped fighting the war, though. Oracle's multi-decade messaging effort deserves more credit for the acceptance of databases as industry-standard tech than any idea that warring academics came to realize some deep truth about the way data "should" be stored.

Is it the case that mapreduce on top of something like HDFS + Hypertable is a competitor to old-style monolithic databases running on big iron? You bet it is.

Linear perf, linear cost scale, and the programming flexibility of unstructured Unix-like I/O in GFS or fluid schemas in Bigtable. All good.

And I wouldn't be surprised if the adoption curve, even for conservative Fortune-500 companies, was quicker than we've seen in the past. Bolt a map/reduce cluster onto the side of your data warehouse and mine those CRM records for business insights. Sounds like a startup idea we'll be seeing soon enough. ;-)

Comments (5)


I think you're exaggerating mightily. Data warehouse DBMSs have ALREADY been disrupted. Vertica et al. are participating in that disruption. SAS is going MPP. Conventional data mining on MapReduce seems quite the nonstarter. And by the way, Google Analytics, running over MapReduce, sucks when it comes to reliability.

On the other hand, if one wants to do variable-schema analytics, that might be a whole other matter.

I spelled this all out further at http://www.dbms2.com/2008/01/19/mapreduce-variable-schema-analytics/



Unix trounced Multics for many reasons, but I don't think pipes had much to do with it. Multics only ran on Honeywell 6180/6880 hardware, for example -- right there, it could never compete with Unix. Honeywell was never very interested in developing it or selling it, either.

Multics certainly had files as a stream of bytes, as did ITS (the MIT AI Lab's operating system, which only ran on DEC 10/20's). Multics was not particularly built on a record/key architecture. The primary storage concept was the "segment", which is simply a persistent big array of bytes, much like a Unix file that could be accessed directly by hardware memory-mapping.

The main problem with the original posting is simply that MapReduce was never intended to be a DBMS, so criticizing it for not being a good DBMS is beside the point. Schemas are a good thing if your goals are the goals of the people who adopted relational DBMSs. But RDBMSs are just one of the many tools in the toolbox, suitable for some tasks and less suitable for others.

Anyway, I don't see what would be hard about writing a layer on top of MapReduce that would provide schemas, if you want to get some basic integrity checking. No big deal. Useful for those who want it, optional for those who don't.
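To make that concrete, here's a minimal sketch, assuming a toy single-process MapReduce (not Hadoop's or Google's API) and a hypothetical schema format that maps field names to Python types. The names `conforms`, `with_schema`, and `map_reduce` are illustrative only. The wrapper sets aside records that fail the check instead of mapping them, so the schema stays optional: leave the wrapper off and the job runs schema-free.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical schema format: field name -> required Python type.
Schema = Dict[str, type]
Record = Dict[str, Any]
Mapper = Callable[[Record], Iterable[Tuple[Any, Any]]]


def conforms(record: Record, schema: Schema) -> bool:
    """True if every schema field is present and has the expected type."""
    return all(
        name in record and isinstance(record[name], ftype)
        for name, ftype in schema.items()
    )


def with_schema(schema: Schema, mapper: Mapper, rejects: List[Record]) -> Mapper:
    """Wrap a mapper so non-conforming records are collected, not mapped."""
    def checked(record: Record) -> Iterator[Tuple[Any, Any]]:
        if conforms(record, schema):
            yield from mapper(record)
        else:
            rejects.append(record)
    return checked


def map_reduce(records: Iterable[Record], mapper: Mapper, reducer) -> Dict[Any, Any]:
    """Toy MapReduce: map each record, group values by key, reduce each group."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Usage: total spend per customer, with basic integrity checking bolted on.
schema: Schema = {"customer": str, "amount": int}
rejects: List[Record] = []
records = [
    {"customer": "acme", "amount": 100},
    {"customer": "acme", "amount": 50},
    {"customer": "globex", "amount": 75},
    {"customer": "globex", "amount": "n/a"},  # wrong type: rejected
    {"amount": 20},                           # missing field: rejected
]


def spend_mapper(r: Record) -> Iterator[Tuple[str, int]]:
    yield r["customer"], r["amount"]


def sum_reducer(key: str, values: List[int]) -> int:
    return sum(values)


totals = map_reduce(records, with_schema(schema, spend_mapper, rejects), sum_reducer)
print(totals)        # {'acme': 150, 'globex': 75}
print(len(rejects))  # 2
```

The check lives entirely in the map step, so the underlying framework doesn't need to know schemas exist.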

It's really absurd. Relational databases are good for some things, but they are used for a whole lot of things that they are totally inappropriate for, because that's the only model of computing that many people know how to use (I wouldn't say "understand"; most of these people don't actually understand what the RDBMS is doing). So here we have a very different model of computing, better for some things and not as good for other things. And the database guys don't want to allow it to be taught in schools?? This is exactly how databases came to be so oversold and overused, they were advertised and taught as the Universal Computing Paradigm. And that miseducation has imposed a big cost on a lot of companies, the ones who happened to be trying to do things that don't map well to databases. It was good for us at Google, though.

Yeah. What's funny about this article is that it was so poorly argued that I didn't even feel like spending the time to hand them a clue stick.

Their argument that schemas and transactions have won is just not supported by the evidence.

In fact, the largest MySQL installs have all been federated DBs which don't use transactions, don't use FKs, and use denormalized data.

That, and flat-file databases are serving up a LOT of content that a traditional RDBMS could never serve without 10x more hardware.

Seems the Architect Astronauts might have a decaying orbit.

Here's an example of why it's pretty ridiculous to call Stonebraker hidebound. At age 60-whatever, he's actually one of the biggest revolutionaries around:





This page contains a single entry from the blog posted on January 18, 2008 11:41 AM.

