Very few things in technology impress me more than web applications which are able to massively scale. The ability to serve up gagillions of requests to bajillions of users in milliseconds is nothing short of amazing. And, for whatever reason, I’ve always had the impression that the software it takes to run massive web sites is nothing short of magic.

All You Need Is Cache

Well, as it turns out, massive scale is not achieved with magic. Or, putting it another way, that magic is called caching.

Caching is an interesting beast. If you don’t need to worry about scaling, you rarely think about it. However, once you do need to worry about it, it seems that you rarely think of anything else. I’ve been talking to a lot of MySpace devs recently and caching is certainly top of mind for most of them. It comes up repeatedly in conversation, no matter the topic:

Disclaimer: the conversation transcribed above is a rough approximation of real conversations I’ve had, exaggerated for comedic effect. MySpace devs that I’ve had the pleasure of meeting are great devs and wonderful people who do impressive work.

Facebook

A great example of what caching can do is Facebook.

The scale that Facebook deals with is staggering: 500 million users and counting, 690 billion page views each month, countless transactions being generated constantly, and on and on. To be sure, they have massive computing power at their disposal to deal with all this traffic: probably well north of 60,000 servers across at least 9 data centers.

However, what really surprised me about Facebook is the software they run *. The site is built on PHP, which is compiled into C++ using a homegrown PHP compiler called HipHop (get yours on GitHub). There is a massive cache layer using open source memcached along with a homegrown customization of it they call “son of memcached” (SOM). And there is a massive MySQL backend.
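
To make the "cache layer in front of a database" idea a bit more concrete, here is a minimal sketch of the read path usually called cache-aside: check the cache first, and only go to the database on a miss. To be clear, this is my own toy illustration, not Facebook's code. The `TTLCache` class is an in-process stand-in for a cache server like memcached, `load_profile_from_db` is a hypothetical placeholder for a MySQL query, and the 60-second expiry is an arbitrary assumption.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Toy in-process stand-in for a cache server such as memcached."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (self._clock() + self._ttl, value)


# Counts how often the (hypothetical) database is actually hit.
db_calls = {"count": 0}


def load_profile_from_db(user_id: int) -> str:
    """Hypothetical slow query; a real site would talk to MySQL here."""
    db_calls["count"] += 1
    return f"profile for user {user_id}"


def get_profile(cache: TTLCache, user_id: int) -> str:
    """Cache-aside read: try the cache, fall back to the database on a miss."""
    key = f"profile:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_profile_from_db(user_id)
    cache.set(key, value)  # populate the cache for the next reader
    return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=60)
    get_profile(cache, 7)
    get_profile(cache, 7)
    print("database calls:", db_calls["count"])
```

The second call never touches the database, and that is the whole trick. The price is that readers may see data up to 60 seconds old, so the expiry time is really a knob for trading freshness against load.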

Yes, there is certainly some very impressive engineering that goes into their stack (I mean, they wrote their own compiler). Yet the thing that makes it work at the scale they have is caching. Don’t take my word for it: just listen to Mark Zuckerberg, who gave a talk on the topic. (By the way, I wonder what the record is for using the phrase “at the scale we operate at” in a single presentation.)

So, in conclusion: caching is great and you should use it if you’re building massive web sites. Cache you later!

 

* This presentation from Jason Sobel, an engineering manager at Facebook, is exceedingly informative about exactly how they do what they do.


This post got 2 comments so far. Care to add yours?

  1. Jeff Wilson says:

    I couldn’t agree more! A recent project of mine involved building a new web site for a client with a large base of registered users (over 30,000, of which over 1,000 could be online at a given time). My initial proposal to use .NET 4 and MVC 3 met with some concern from the client’s technical staff that MVC would introduce an unacceptable amount of overhead when satisfying page requests from users.
    I responded by pointing out that some large sites are currently using MVC (like MySpace), so we got the go-ahead. I used output caching for each of the major pages on the site, with the cache set to 60 seconds.
    The site went live this month, and prior to that, we contracted with a firm to stress test the site. The results were VERY gratifying! Over 10,000 users with no significant degradation in response times.
    Needless to say – I’m 100% behind caching! It was stupefyingly easy to implement and had an enormous payback.