The 2008 Scaling Crisis: Caching at the Edge with Memcached
In mid-2008, the "Social Web" is exploding. If you're building a site that's getting millions of hits a day, your primary enemy isn't your code—it's your database's I/O wait. Even with the fastest SAS drives and massive RAID arrays, a relational database like MySQL or PostgreSQL will eventually choke under the weight of concurrent read requests.
The industry-standard solution is Memcached. Originally built for LiveJournal, it’s a high-performance, distributed memory object caching system. Its goal is simple: take the load off your database by storing pre-rendered objects in RAM.
The Caching Strategy
The most common pattern in 2008 is the 'Cache-Aside' strategy. When a request comes in, the application first checks Memcached. If the data is there (a 'hit'), it uses it. If not (a 'miss'), it queries the database and immediately populates the cache for next time.
<?php
// Initialize the client (using the pecl-memcache extension,
// which exposes the 'Memcache' class)
$memcache = new Memcache;
$memcache->connect('localhost', 11211) or die("Could not connect");

$user_id   = 12345;
$cache_key = "user_profile_" . $user_id;

// Attempt to get the profile from RAM
$profile = $memcache->get($cache_key);

if ($profile === false) {
    // CACHE MISS: hit the database
    // (assumes mysql_connect() has already been called elsewhere;
    // cast the id to int so nothing nasty ends up in the query)
    $db_result = mysql_query("SELECT * FROM users WHERE id = " . (int) $user_id);
    $profile   = mysql_fetch_assoc($db_result);

    // Populate the cache for 3600 seconds (1 hour);
    // the third argument (false) disables compression
    $memcache->set($cache_key, $profile, false, 3600);
}

// Proceed with the profile data
render_user_profile($profile);
?>
Distributed Memory: The 'Pool' Concept
The real power of Memcached is that it's distributed. You can have five servers with 2GB of RAM each, and the client will treat them as a single 10GB pool of memory. The client library hashes each key to pick a server, so the same key always lands on the same box. Note that the pecl-memcache default is a 'standard' strategy (a hash modulo the server count), which reshuffles almost every key when a server is added or removed; enabling consistent hashing (memcache.hash_strategy = consistent in php.ini) keeps most keys in place when the pool changes.
// Adding multiple servers to the pool
$memcache->addServer('mem1.internal', 11211);
$memcache->addServer('mem2.internal', 11211);
$memcache->addServer('mem3.internal', 11211);
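To see why the hashing strategy matters, here's a simplified sketch of the 'standard' approach—this is an illustration of the idea, not the extension's actual hash function, and the server names are just placeholders:

```php
<?php
// Naive key distribution: hash the key, then take it modulo the
// server count. Roughly what the 'standard' strategy does.
$servers = array('mem1.internal', 'mem2.internal', 'mem3.internal');

function pick_server($key, $servers) {
    $index = crc32($key) % count($servers);
    return $servers[$index];
}

// The same key always maps to the same server...
echo pick_server('user_profile_12345', $servers), "\n";

// ...but remove one server from the array and count($servers) changes,
// so nearly every key suddenly maps somewhere else. That mass cache-miss
// on a topology change is exactly what consistent hashing avoids.
?>
```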
Pitfalls: The 'Thundering Herd'
Caching isn't a silver bullet. If a highly popular key (like your homepage's main news feed) expires, you'll suddenly have 1,000 concurrent requests all missing the cache and hitting the database at the same time. This is the 'thundering herd' problem. We're experimenting with 'early re-warming' (refreshing the key before it expires) and mutex-locked updates, but both require careful coordination.
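One way to implement the mutex-locked update is with Memcache::add(), which only succeeds if the key doesn't already exist—effectively an atomic test-and-set. A rough sketch (assuming $memcache is connected as above; rebuild_feed() stands in for your expensive DB work):

```php
<?php
// Thundering-herd mitigation: only one process rebuilds the cache,
// everyone else backs off briefly and re-reads.
$cache_key = 'homepage_feed';
$lock_key  = $cache_key . '_lock';

$feed = $memcache->get($cache_key);

if ($feed === false) {
    // add() is atomic: exactly one process wins the lock.
    // Give the lock a 30-second TTL so a crashed worker can't wedge us.
    if ($memcache->add($lock_key, 1, false, 30)) {
        $feed = rebuild_feed(); // the expensive database queries
        $memcache->set($cache_key, $feed, false, 3600);
        $memcache->delete($lock_key);
    } else {
        // A rebuild is already in flight: wait 100ms and try again.
        usleep(100000);
        $feed = $memcache->get($cache_key);
    }
}
?>
```

The trade-off is that the losers either serve slightly stale data or block briefly; for a homepage feed, that's almost always preferable to a database stampede.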
Also, remember that Memcached is an LRU (Least Recently Used) cache. If your memory fills up, it will evict the least recently used items to make room for new ones. Monitor your eviction count using telnet localhost 11211 and the stats command. If evictions are high, it's time to add more RAM.
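If you'd rather check from code than telnet, the extension's getStats() method returns the same counters as an associative array (assuming $memcache is connected as in the examples above):

```php
<?php
// Programmatic equivalent of the telnet 'stats' command.
$stats = $memcache->getStats();

echo "Evictions: " . $stats['evictions'] . "\n";

// A hit rate well below ~90% (or a climbing eviction count) usually
// means the pool needs more RAM. max(1, ...) guards against a fresh
// daemon that hasn't served any gets yet.
echo "Hit rate:  " . round($stats['get_hits'] / max(1, $stats['cmd_get']) * 100, 1) . "%\n";
?>
```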
The philosophy of 2008 is 'RAM is cheap, Database I/O is expensive'. If a piece of data doesn't change on every request, it belongs in Memcached.