Very few things in technology impress me more than web applications that scale massively. The ability to serve up gagillions of requests to bajillions of users in milliseconds is nothing short of amazing. And, for whatever reason, I’ve always had the impression that the software it takes to run massive web sites is nothing short of magic.
All You Need Is Cache
Well, as it turns out, massive scale is not achieved with magic. Or, putting it another way, that magic is called caching.
Caching is an interesting beast. If you don’t need to worry about scaling, you rarely think about it. However, once you do need to worry about it, it seems that you rarely think of anything else. I’ve been talking to a lot of MySpace devs recently and caching is certainly top of mind for most of them. It comes up repeatedly in conversation, no matter the topic:
- Me: So, what is the most interesting thing you’ve done recently?
- MySpace dev: I’ve built a caching system for this content we needed to serve up.
- Me: That’s great. What was the last technical book you read?
- MySpace dev: A book about caching strategies.
- Me: Is that right? What was your degree in?
- MySpace dev: I have an MS in advanced caching techniques.
- Me: Really? Wow. Ok, what’s your favorite vacation spot?
- MySpace dev: Cachistan.
- Me: Favorite band?
- MySpace dev: The Caches.
- Me: Food?
- MySpace dev: Cashews.
- Me: Ok, that’s all I had, thanks so much for your time!
- MySpace dev: Cache you later!
Disclaimer: the conversation transcribed above is a rough approximation of real conversations I’ve had, exaggerated for comedic effect. MySpace devs that I’ve had the pleasure of meeting are great devs and wonderful people who do impressive work.
A great example of what caching can do is Facebook.
The scale that Facebook deals with is staggering: 500 million users and counting, 690 billion page views each month, countless transactions being generated constantly, and on and on. To be sure, they have massive computing power at their disposal to deal with all this traffic: probably well north of 60,000 servers across at least 9 data centers.
However, what really surprised me about Facebook is the software they run *. The site is built on PHP, which is compiled into C++ using a homegrown PHP compiler called HipHop (get yours on GitHub). There is a massive cache layer using open source memcached, along with a homegrown customization of it they call “son of memcached” (SOM). And there is a massive MySQL backend.
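To make the role of that cache layer concrete, here is a minimal sketch of the cache-aside pattern memcached is typically used for: check the cache first, and only fall through to the database on a miss. Everything here is illustrative, not Facebook’s actual code: the fake memcached client, the pretend database lookup, and the key names are all mine (a real deployment would use a memcached client library and a real MySQL query).

```python
import time

# Stand-in for a memcached client: a dict with per-key expiry times.
# (Purely illustrative; a real system would talk to actual memcached servers.)
class FakeMemcached:
    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            del self._store[key]  # expired entries count as misses
            return None
        return value

    def set(self, key, value, ttl=60):
        self._store[key] = (value, time.time() + ttl)

db_calls = 0  # track how often we hit the "database"

def load_profile_from_db(user_id):
    # Pretend this is an expensive MySQL query.
    global db_calls
    db_calls += 1
    return {"id": user_id, "name": f"user-{user_id}"}

cache = FakeMemcached()

def get_profile(user_id):
    """Cache-aside: try the cache first, fall back to the database on a miss."""
    key = f"profile:{user_id}"
    profile = cache.get(key)
    if profile is None:                      # cache miss
        profile = load_profile_from_db(user_id)
        cache.set(key, profile, ttl=30)      # populate for the next reader
    return profile

print(get_profile(42))  # miss: hits the "database"
print(get_profile(42))  # hit: served from the cache
print(f"database calls: {db_calls}")
```

The point of the pattern is visible in the last line: two requests, one database call. At Facebook’s traffic levels, that ratio is what keeps the MySQL backend alive.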
Yes, there is certainly some very impressive engineering that goes into their stack (I mean, they wrote their own compiler). Yet the thing that makes it work at the scale they have is caching. Don’t take my word for it: just listen to Mark Zuckerberg, who gave a talk on the topic. (By the way, I wonder what the record is for using the phrase “at the scale we operate at” in a single presentation.)
So, in conclusion: caching is great and you should use it if you’re building massive web sites. Cache you later!
* This presentation from Jason Sobel, an engineering manager at Facebook, is exceedingly informative about exactly how they do what they do.
Did you love / hate / feel unmoved by this post? Then show your support / disgust / indifference by following me on Twitter!