I've never worked anywhere where the logs could be tallied
well. Netscape, AOL, they had giant systems that slurped
up the logs from the front ends and stuffed them into
web-enabled databases. Every query took 90 seconds to
run, and half of them timed out. Forget ad-hoc queries
or tossing a custom regex in. Sometimes the logs would
break and
it'd be weeks or months or never before they worked again.
Sometimes there was just too much traffic to be able to
count it all. More log events came in every 24 hours than
could be processed in a 24-hour log run.
Google Analytics doesn't seem to fare much better.
Granted, we probably put more data into it at Topix than
the average site. But I could never get unique IP counts
out of that thing. It would just spin and spin until my
browser gave up.
I've repeatedly seen senior engineers fail to make
headway on the log problem. Logs should be easy, right?
What could be more straightforward than collecting a set
of files each day and tallying the lines?
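On a single box, the naive version really is just a few
lines. A sketch (the directory layout and .log naming are
made-up for illustration):

```python
# Hypothetical single-box log tally; paths and the
# *.log naming convention are illustrative assumptions.
from pathlib import Path

def tally_lines(log_dir):
    """Count total lines across every .log file in a directory."""
    total = 0
    for path in Path(log_dir).glob("*.log"):
        with open(path) as f:
            total += sum(1 for _ in f)
    return total
```

That's the intuition the rest of this post is pushing
against: the moment the files live on N machines, this
stops being a ten-line job.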
It turns out that anything involving lots of data spread
over a cluster of machines is hard. Correct that: even
little bits of data spread over a cluster are hard.
i=n++ in a distributed environment is a hard problem.
We take the simplicity of i=n++ or counting lines for
granted. It all begins with a single CPU and we know
that model. In fact, we know that model so deeply that
we think in it, in the same way that language shapes what
we can think about. The von Neumann architecture defines
our perception of what is easy and what is hard.
But it doesn't map at all to distributed systems.
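As a toy illustration of why i=n++ stops being simple, here
is the classic lost update. This is a deterministic
simulation with a hard-coded interleaving, not real
networking; the "nodes" are just labeled reads and writes:

```python
# Toy simulation of i=n++ across two nodes. The interleaving
# is fixed to show the classic lost-update anomaly.

store = {"n": 0}  # pretend this is a shared counter

def read(key):
    return store[key]

def write(key, value):
    store[key] = value

# Node A and node B both want to do n++.
a = read("n")      # A reads 0
b = read("n")      # B reads 0, before A writes back
write("n", a + 1)  # A writes 1
write("n", b + 1)  # B also writes 1 -- A's increment is lost

print(store["n"])  # 1, not the 2 you'd get on a single CPU
```

On one CPU the increment is atomic enough that we never
think about it; with two independent readers, the
read-modify-write gap is where the anomaly lives.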
The approach of the industry has been to try to impose von
Neumann semantics on the distributed system. Recently some
have started to question whether that's the right approach.
The underlying assumption ... is that any system that is
scalable, fault-tolerant, and upgradable is composed of
N nodes, where N>1.
The problem with current data storage systems, with rare
exception, is that they are all "one box native"
applications, i.e. from a world where N=1. From Berkeley
DB to MySQL, they were all designed initially to sit on
one box. Even after several years of dealing with
MegaData you still see painful stories like what the
YouTube guys went through as they scaled up. All of this
stems from an N=1 mentality.
-- Joe Gregorio
Distributed systems upend our intuition of what should
be hard and what should be easy. So we try to devise
protocols and systems to carry forward what was easy in
our N=1 single CPU world.
But these algorithms are seriously messed up. "Let's Paxos
for lunch" is a joke because Paxos is such a ridiculously
complicated protocol. Yes I understand its inner beauty
and all that but c'mon. Sometimes you get the feeling
the universe is on your side when you use a technique.
Like exponential backoff. You've been using that since
you were a kid learning about social interactions and how
to manage frustration. It feels right. But if you come
to a point in your design where something like Paxos needs
to be brought out, maybe the universe is telling you that
you're doing it wrong.
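Exponential backoff really does fit in a few lines, which
is part of why it feels right. A sketch with full jitter
(the base delay and cap here are made-up parameters, not
recommendations):

```python
import random

def backoff_delays(attempts, base=0.1, cap=30.0):
    """Exponential backoff with full jitter: each retry waits
    a random amount up to base * 2**attempt, capped at `cap`
    seconds. Parameters are illustrative assumptions."""
    return [random.uniform(0, min(cap, base * 2 ** i))
            for i in range(attempts)]

# The upper bound doubles each attempt:
# 0.1, 0.2, 0.4, 0.8, ... until it hits the 30s cap.
```

Compare that to the page count of a Paxos paper and the
point makes itself.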
It may be a bit unusual, but my way of thinking of
"distributed systems" was the 30+ year (and still
continuing) effort to make many systems look like
one. Distributed transactions, quorum algorithms,
RPC, synchronous request-response, tightly-coupled
schema, and similar efforts all try to mask the
existence of independence from the application
developer and from the user. In other words, make
it look to the application like many systems are
one system. While I have invested a significant
portion of my career working in this effort,
I have repented and believe that we are evolving
away from this approach.
-- Pat Helland
This stuff isn't just for egghead protocol designers and
comp sci academics. Basically any project that is sooner
or later going to run on more than a single box encounters
these problems. Your coders have modules to finish.
But they have no tools in their arsenal to deal
with this stuff. The SQL or Posix APIs leave programmers
woefully unprepared for even a trivial foray outside
the single-machine world. Humility in the face of
complexity makes programmers better.
Logs sucker-punch good programmers because their
assumptions about what should be hard and what should be
easy are upended by N>1. Once you get two machines in the
mix, if your requirements include reliability, consistency,
fault-tolerance, and high performance, you are at the bleeding
edge of distributed systems research.
This is not what we want to be worrying about.
We're making huge social media systems to change the
world. Head-spinning semantic analysis algorithms.
Creepy targeted monetization networks. The future is
indeed bright. But we take for granted the implicit
requirements that the application will be able to scale,
that it will stay up, that it will work.
So why does Technorati go down so much... why is
Twitter having problems scaling... why did Friendster
lose? All those places benefited from top-notch
programmers and lots of resources. How can it be,
we ask, that the top software designers in the world,
with potentially millions of dollars personally at stake,
create systems that let everyone down?
Of course programmers make systems that don't
satisfy all of the (implicit) requirements. Nobody knows
how to yet. We're still figuring this stuff out.
There are no off-the-shelf toolkits.
Without a standardized approach or toolset, programmers
do what they can and get the job done anyway. So you
have cron jobs ssh'ing files around, ad-hoc DB replication
schemes, de-normalized data sharded across god-knows-where.
And the maintenance load for the cluster starts to climb.
"We're fine here," some readers will say. "We have a
great system to count our logs." But below the
visible surface of FAIL is the hidden realm of
productivity opportunity cost. Getting the application to
work, to scale, to be resilient to failures is just the start.
Making it a joy to program is the differentiator.
* * *
There is a place where they can count their logs.
They had to make this funny distributed hash-of-hashes
data structure. It's got some unusual features for
a database - explicit application management of disk
seeks, a notion of time-based versioning, and a severely
limited transactional model. It relies on an odd cast of
supporting software. Paxos is even in there under
the hood. That wasn't enough so they hired one of the
original guys who invented Unix a million years ago,
and the first thing he did was to invent an entirely new
programming language to use it.
But now they can count their logs.
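The data structure described above sounds a lot like a
Bigtable-style store. As a toy sketch of just the data
model — a hash-of-hashes with time-based versioning — with
all class and method names being my own illustrative
inventions, and none of the hard parts (distribution, disk
layout, the limited transactional model) present:

```python
import bisect

class ToyTable:
    """Toy row -> column -> timestamped-versions map.
    Sketches only the data model; distribution, disk seeks,
    and transactions are deliberately absent."""

    def __init__(self):
        self.rows = {}  # row key -> {column -> [(ts, value), ...]}

    def put(self, row, col, ts, value):
        versions = self.rows.setdefault(row, {}).setdefault(col, [])
        ts_list = [t for t, _ in versions]
        # Keep versions sorted by timestamp.
        versions.insert(bisect.bisect_right(ts_list, ts), (ts, value))

    def get(self, row, col, ts=None):
        """Latest value at or before ts (latest overall if ts is None)."""
        versions = self.rows.get(row, {}).get(col, [])
        if not versions:
            return None
        if ts is None:
            return versions[-1][1]
        ts_list = [t for t, _ in versions]
        i = bisect.bisect_right(ts_list, ts)
        return versions[i - 1][1] if i else None
```

A single-process dict-of-dicts like this is trivial; the
post's point is that making the same shape work across a
cluster took a custom database, Paxos under the hood, and
a new programming language on top.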
See also — Kevin Burton: Distributed System Design is
Difficult. We're seeing distributed systems effects even
on single machines now, thanks to multiple cores.