
Scaling Facebook, Hi5 with memcached

From a discussion board thread pointed to by programming.reddit, a nifty discussion of how high-volume sites such as Facebook and Hi5 are using memcached as a critical scaling tool:

From: Steve Grimm <... facebook.com>
Subject: Re: Largest production memcached install?

No clue if we're the largest installation, but Facebook has roughly 200 dedicated memcached servers in its production environment, plus a small number of others for development and so on. A few of those 200 are hot spares. They are all 16GB 4-core AMD64 boxes, just because that's where the price/performance sweet spot is for us right now (though it looks like 32GB boxes are getting more economical lately, so I suspect we'll roll out some of those this year.)

We have a home-built management and monitoring system that keeps track of all our servers, both memcached and other custom backend stuff. Some of our other backend services are written memcached-style with fully interchangeable instances; for such services, the monitoring system knows how to take a hot spare and swap it into place when a live server has a failure. When one of our memcached servers dies, a replacement is always up and running in under a minute.

All our services use a unified database-backed configuration scheme which has a Web front-end we use for manual operations like adding servers to handle increased load. Unfortunately that management and configuration system is highly tailored to our particular environment, but I expect you could accomplish something similar on the monitoring side using Nagios or another such app.

...

At peak times we see about 35-40% utilization (that's across all 4 CPUs.) But as you say, that number will vary dramatically depending on how you use it. The biggest single user of CPU time isn't actually memcached per se; it's interrupt handling for all the incoming packets.

 

From: Paul Lindner <... inuus.com>

Don't forget about latency. At Hi5 we cache entire user profiles that are composed of data from up to a dozen databases. Each page might need access to many profiles. Getting these from cache is about the only way you can achieve sub 500ms response times, even with the best DBs.
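
For readers who haven't seen the pattern, here is a minimal sketch of that kind of profile caching in Python using the pymemcache client. The key scheme, the ten-minute TTL, and the load_profile_from_databases helper are illustrative assumptions, not Hi5's actual code.

    import json
    from pymemcache.client.base import Client

    mc = Client(("127.0.0.1", 11211))   # one memcached node; assumed address
    PROFILE_TTL = 600                   # seconds; arbitrary choice for the sketch

    def load_profile_from_databases(user_id):
        # Placeholder for the expensive part: queries against up to a dozen
        # databases, merged into one profile dict.
        raise NotImplementedError

    def get_profile(user_id):
        key = "profile:%d" % user_id
        cached = mc.get(key)
        if cached is not None:
            return json.loads(cached)                     # hit: no database work at all
        profile = load_profile_from_databases(user_id)    # miss: slow path
        mc.set(key, json.dumps(profile).encode("utf-8"), expire=PROFILE_TTL)
        return profile

On a hit the page never touches the databases, which is where the sub-500ms numbers come from; on a miss you pay the full composition cost once and then serve from memory until the entry expires or is invalidated.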

We're also using memcache as a write-back cache for transient data. Data is written to memcache, then queued to the DB where it's eventually written to long-term storage. The effect is dramatic -- heavy write spikes are greatly diminished and we get predictable response times.
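
As a rough illustration of the write-back idea, here is a sketch in the same vein. The in-process queue.Queue and the flush_to_database stub stand in for whatever durable queue and persistence layer a real deployment would use; this shows only the shape of the pattern, not Hi5's implementation.

    import json
    import queue
    import threading
    from pymemcache.client.base import Client

    mc = Client(("127.0.0.1", 11211))
    write_queue = queue.Queue()        # stand-in for a real durable queue

    def flush_to_database(key, value):
        # Placeholder for the eventual write to long-term storage.
        pass

    def write_back(key, value):
        # Readers see the new value immediately from memcache; the DB write
        # happens later, off the request path.
        mc.set(key, json.dumps(value).encode("utf-8"))
        write_queue.put((key, value))

    def writer_loop():
        # A background worker drains the queue at its own pace, which is
        # what smooths heavy write spikes into predictable DB load.
        while True:
            key, value = write_queue.get()
            flush_to_database(key, value)
            write_queue.task_done()

    threading.Thread(target=writer_loop, daemon=True).start()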

That said, there are situations where memcache didn't meet our requirements. Storing friend graph relations was one of them. That's taken care of by another in-memory proprietary system. At some point we might consider merging some of this functionality into memcached, including:

  • Multicast listener/broadcaster protocols
  • fixed size data structure storage
    (perhaps done via pluggable hashing algorithms??)
  • Loading the entire contents of one server from another.
    (while processing ongoing multicast updates to get in sync)

I'd be interested in working with others who want to add these types of features to memcache.

Greg Linden has commented on a talk about LiveJournal's use of memcached for scaling. See also previous posts on scaling at eBay and Mailinator.

Comments (6)

It's amusing how everyone seems to re-"discover" the technology that AOL implemented in house, like, 10+ years ago.

(In other words: how much does this all sound like a Bucky complex?)

Yeah, Bucky was definitely very cool...

Of course, whatever tool can roughly do the job, gets open sourced, and gets adopted is the one that wins and becomes the standard.

What's interesting to me is the trend as more and more sites face the high-volume issues that AOL experienced previously: the gradual realization that the DB is not the entire back end of the app, but a severely throughput-limited component stuck on a single box. First it gets partitioned/replicated/scaled across the cluster. But it's still too slow. Then it gets relegated to a backing store for the real DB, which lives in memory in front of it.

And then you have Bucky :)

>> It's amusing how everyone seems to re-"discover" the technology that AOL implemented in house, like, 10+ years ago.


It seems to me that this is neither amusing nor surprising, since the implementation you mention was "in house" at AOL and presumably not in the literature, published experience reports, etc. Now that high-scalability approaches are widely discussed, it should be much more straightforward for others to follow in those footsteps in the future, rather than reinventing.

Steve and Brian write more about open technologies at FB here: http://blog.facebook.com/blog.php?post=2356432130. I'd like to repeat their message that memcache covering the database tier is especially effective in combination with well-used APC covering the web tier.

During the period covered in my book "Inside Facebook" (http://www.fbbook.com), FB was also storing some system-specific variables directly in the APC variable cache, avoiding even a request to the memcache tier.
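
APC is PHP-specific, but the idea of a small per-process cache sitting in front of the memcache tier translates to other stacks. A loose Python analogue might look like the following sketch; the module-level dict stands in for APC's variable cache, and the "config:" key prefix is made up for the example.

    import json
    from pymemcache.client.base import Client

    mc = Client(("127.0.0.1", 11211))

    # Stands in for APC's variable cache: values that rarely change are kept
    # in local memory so most requests never touch memcached at all.
    _local_cache = {}

    def get_site_config(name):
        if name in _local_cache:
            return _local_cache[name]          # cheapest path: no network hop
        raw = mc.get("config:" + name)         # second tier: memcached
        if raw is None:
            return None                        # caller falls back to the DB
        value = json.loads(raw)
        _local_cache[name] = value
        return value

The trade-off is staleness: a per-process cache like this has no cross-machine invalidation, so it only suits values that change rarely, which is exactly the kind of system-wide variable described above.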

Heavy use of memcache is especially appropriate for stable areas of user data (e.g. more for the user profile than for the FB wall), where the components of the data object are collected with a complex join on tables that have concurrent writes. For an object taken from only one select/insert table, the CPU cost of creating the object may be too low to be worth optimizing. For a highly dynamic object (like my http://www.PTrades.com commodity trade account objects) the cache miss rate skyrockets, because the items are viewed only once or twice by other users before being purged. For objects that are not keyed to a user, like site stats or my commitments-of-traders flash graph config files, even a straight filesystem cache or APC pre-compiled code is quite speedy.

Karel

Amazing. Facebook- and Hi5-type applications always need high performance, and it is really nice to see how these projects ensure performance for their users.

I hope many tips and tricks will come out of these kinds of projects.

best wishes,

Memcached is really good and is a must for high-load sites that need real performance and speed. RAM is cheap nowadays - go for it and design web sites that are fast from the application point of view, instead of just ordering super-duper servers that cost 20,000+ each!

