AboutBlogContact
DatabasesFebruary 14, 2009 2 min read 20

CouchDB: Scaling with MapReduce and Incremental Views (2009)

AunimedaAunimeda
📋 Table of Contents

CouchDB: Scaling with MapReduce and Incremental Views

In early 2009, the limitations of relational databases (SQL) are becoming a major headache for high-traffic web applications. We're tired of "Object-Relational Mapping" (ORM) impedance mismatch and rigid schemas. This has given birth to the NoSQL movement, and Apache CouchDB is one of its most fascinating entries.

CouchDB doesn't store data in tables; it stores JSON documents. But how do you query data across millions of documents without a SELECT statement? The answer is MapReduce.

The Map Function

In CouchDB, you define "Design Documents" that contain JavaScript functions. The map function is called for every document in the database. You "emit" the data you want to index.

// A Map function to index blog posts by category
function(doc) {
    if (doc.type === 'post' && doc.category) {
        emit(doc.category, doc.title);
    }
}

This creates an index where the key is the category and the value is the title.

The Reduce Function

If you want to perform aggregations (like counting items), you use a reduce function. CouchDB has built-in optimized reducers for common tasks like _count, _sum, and _stats.

// Map:
function(doc) {
    if (doc.type === 'order') {
        emit(doc.customer_id, doc.amount);
    }
}

// Reduce:
"_sum"

Why It Scales: Incremental Updates

The brilliance of CouchDB is that it doesn't re-run the MapReduce every time you query it. Instead, it builds a B-Tree index. When a new document is added, CouchDB only runs the functions for that document and updates the index incrementally.

This means that a query on a 10-million document database is just as fast as a query on a 10-document database.

The CAP Theorem

CouchDB is designed with the CAP Theorem in mind, prioritizing Availability and Partition Tolerance. It uses "Eventual Consistency" via its unique replication protocol. You can have two CouchDB instances offline, make changes to both, and they will intelligently merge the changes once they reconnect.

In 2009, we are realizing that the "one size fits all" era of the SQL database is over. CouchDB is proving that for many web use cases, a schema-less, document-oriented approach is not just easier to develop, but significantly easier to scale.

Read Also

Redis: RDB vs. AOF Persistence (2009)aunimeda
Databases

Redis: RDB vs. AOF Persistence (2009)

Redis is fast because it's in-memory, but what happens when the power goes out? Choosing between RDB and AOF is a classic trade-off.

MongoDB: When Your Data Doesn't Fit in a Tableaunimeda
Databases

MongoDB: When Your Data Doesn't Fit in a Table

The 10gen team has released MongoDB. It's 'humongous' (supposedly), it's NoSQL, and it uses JSON. Is the relational era over?

InfluxDB: TSM Engine and the Cardinality Trap (2014)aunimeda
Databases

InfluxDB: TSM Engine and the Cardinality Trap (2014)

Moving from LevelDB to TSM was a bold move. Let's see how InfluxDB handles millions of series and why high cardinality is your worst enemy.

Need IT development for your business?

We build websites, mobile apps and AI solutions. Free consultation.

Get Consultation All articles