Real-Time With Node.js and MongoDB: Building a Live Dashboard That Doesn't Melt at Scale
In early 2015 we had a business intelligence dashboard that refreshed every 5 seconds via AJAX polling. Each poll hit a PHP backend that queried MySQL. At 200 concurrent users, the dashboard generated 2,400 database queries per minute - most of which returned identical data.
The fix wasn't clever caching. It was reconsidering the model entirely: instead of clients asking "is there new data?", the server would tell clients when new data arrives. That's WebSockets.
Why Node.js for Real-Time
The traditional PHP/Apache model: one thread per request. A WebSocket connection is persistent - it stays open for the entire session. A 500-user dashboard means 500 simultaneous open connections, and thread-per-connection doesn't scale to that.
Node.js uses a single-threaded event loop. It handles thousands of concurrent connections not by spawning threads but by registering callbacks and waiting for events:
// The mental model of the Node.js event loop (pseudocode)
while (true) {
  var event = eventQueue.shift(); // FIFO: take the oldest pending event
  if (event) {
    event.callback(); // Execute the callback; it must return quickly
  }
  // No blocking - if a callback does I/O, it registers another
  // callback and the loop moves on
}
The constraint: callbacks must not block the event loop. CPU-heavy synchronous operations (video encoding, large JSON parsing, cryptographic operations) will freeze all connections. Node.js is excellent at I/O-bound work (waiting for DB, waiting for network); it's poor at CPU-bound work.
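The effect of blocking is easy to demonstrate in a few lines (timings here are illustrative, not from the original project):

```javascript
// Sketch: a synchronous loop starves the event loop. A timer that is
// due "now" cannot fire until the blocking work finishes.
var fired = false;
setTimeout(function () { fired = true; }, 0);

// Simulate CPU-bound work: block the event loop for ~100ms
var start = Date.now();
while (Date.now() - start < 100) {}

// Still inside the same turn of the event loop - the timer callback
// has not had a chance to run, even though it was due immediately.
console.log('timer fired yet?', fired); // false
```

On a real server, every connected client experiences that 100ms freeze at once, which is why CPU-heavy work belongs in a separate process.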
The Stack
- Node.js 0.12 (the current stable release at the time; the LTS program didn't begin until Node 4 later in 2015)
- Socket.io 1.3 - WebSocket abstraction with fallback to long-polling for IE9
- MongoDB 3.0 - document store with the new WiredTiger storage engine
- Redis - pub/sub for multi-instance coordination
Socket.io: Rooms and Namespaces
Socket.io's room concept let us efficiently broadcast to subsets of clients:
// server.js
var io = require('socket.io')(httpServer);

io.on('connection', function(socket) {
  console.log('client connected:', socket.id);

  // Client tells us which dashboard section they're viewing
  socket.on('subscribe', function(section) {
    // Leave previous rooms, join the new one
    socket.leaveAll();
    socket.join('dashboard:' + section);

    // Send current state immediately on subscribe
    getDashboardData(section, function(err, data) {
      if (err) return console.error(err);
      socket.emit('dashboard:update', data);
    });
  });

  socket.on('disconnect', function() {
    console.log('client disconnected:', socket.id);
    // Socket.io automatically removes the socket from all rooms
  });
});

// When new data arrives, broadcast to everyone in the relevant room
function broadcastUpdate(section, data) {
  io.to('dashboard:' + section).emit('dashboard:update', data);
}
On the client:
// client.js
var socket = io();

socket.on('connect', function() {
  socket.emit('subscribe', 'sales'); // Subscribe to the sales section
});

socket.on('dashboard:update', function(data) {
  updateCharts(data); // Re-render charts with new data
});

socket.on('disconnect', function() {
  showReconnectingIndicator();
});
Socket.io handles reconnection automatically. The disconnect + connect cycle happens transparently; the client just re-subscribes in the connect handler.
MongoDB: Tailing the Oplog
MongoDB's replication mechanism writes every write operation to the oplog - a special capped collection in the local database. We could tail this collection to react to database changes in real-time.
This predates MongoDB Change Streams (added in 3.6). In 2015, the approach was a tailable cursor on the oplog:
var MongoClient = require('mongodb').MongoClient;

// Oplog tailing requires the mongod to run as a replica set member -
// oplog.rs only exists in the 'local' database of a replica set.
MongoClient.connect('mongodb://localhost:27017/local', function(err, db) {
  if (err) throw err;
  var oplogCollection = db.collection('oplog.rs');

  // Get the current oplog position
  oplogCollection.find({}, { ts: 1 })
    .sort({ $natural: -1 })
    .limit(1)
    .toArray(function(err, docs) {
      if (err) throw err;
      var lastTimestamp = docs[0].ts;

      // Tailable cursor - stays open and returns new docs as they arrive
      var cursor = oplogCollection.find({
        ts: { $gt: lastTimestamp },
        ns: 'aunimeda.orders' // Watch 'orders' collection in 'aunimeda' DB
      }, {
        tailable: true,
        awaitdata: true,
        numberOfRetries: -1, // Retry forever
        tailableRetryInterval: 200
      });

      cursor.each(function(err, doc) {
        if (err) return console.error(err);
        if (!doc) return; // No new docs yet

        // doc.op: 'i' = insert, 'u' = update, 'd' = delete
        if (doc.op === 'i') {
          handleNewOrder(doc.o);
        } else if (doc.op === 'u') {
          handleOrderUpdate(doc.o2._id, doc.o.$set);
        }
      });
    });
});
Every new order insert immediately triggered handleNewOrder, which called broadcastUpdate('orders', ...), which pushed to all clients in the dashboard:orders room. Zero polling.
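The glue between the oplog handler and the broadcast is only a few lines. A sketch with the dependencies injected as parameters - the function names and re-query behavior are assumptions for illustration, not the original code:

```javascript
// Hypothetical glue: an oplog insert triggers a fresh query and a broadcast.
// getDashboardData and broadcastUpdate are passed in so the handler is testable.
function makeOrderHandler(getDashboardData, broadcastUpdate) {
  return function handleNewOrder(orderDoc) {
    // Re-query rather than trusting the oplog doc alone - one insert can
    // change aggregates (totals, counts) beyond the single document.
    getDashboardData('orders', function (err, data) {
      if (err) return console.error('dashboard query failed:', err);
      broadcastUpdate('orders', data);
    });
  };
}
```

Injecting the dependencies keeps the handler trivially unit-testable with stubs, which mattered once the process ran for days between deploys.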
Multi-Instance Coordination with Redis Pub/Sub
A single Node.js process can't use more than one CPU core, so we ran multiple instances with PM2:
// ecosystem.config.js (PM2)
module.exports = {
  apps: [{
    name: 'dashboard',
    script: 'server.js',
    instances: 4,         // One per CPU core
    exec_mode: 'cluster'  // Node.js cluster module under the hood
  }]
};
The problem: with 4 processes, a WebSocket connection from client A goes to process 1. A broadcast from process 2 won't reach client A.
Redis pub/sub solved this:
var redis = require('redis');
var redisSub = redis.createClient();
var redisPub = redis.createClient();

// Every process subscribes to the Redis channel
redisSub.subscribe('dashboard:broadcast');

redisSub.on('message', function(channel, message) {
  var payload = JSON.parse(message);
  // Emit to this process's locally connected Socket.io clients
  io.to(payload.room).emit('dashboard:update', payload.data);
});

// When data changes, any process publishes to Redis.
// Redis delivers to all processes; each emits to its local clients.
function broadcastUpdate(section, data) {
  redisPub.publish('dashboard:broadcast', JSON.stringify({
    room: 'dashboard:' + section,
    data: data
  }));
}
Socket.io 1.x had a built-in Redis adapter (socket.io-redis) that handled exactly this, but understanding the underlying pub/sub pattern was valuable.
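With the adapter, the wiring collapses to a couple of lines - this sketch follows the socket.io-redis 1.x-era README (host and port are illustrative):

```javascript
// Server wiring with the official Redis adapter. Once installed, every
// io.to(room).emit(...) fans out through Redis to all processes.
var io = require('socket.io')(httpServer);
io.adapter(require('socket.io-redis')({ host: 'localhost', port: 6379 }));
```

The adapter replaces the hand-rolled publish/subscribe above; the rest of the application code is unchanged.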
The Result
| Metric | Before (AJAX polling) | After (WebSockets) |
|---|---|---|
| DB queries/minute at 200 users | 2,400 | ~4 (only on actual data changes) |
| Dashboard update latency | 0–5 seconds | <100ms |
| Server memory at 200 users | 480MB (200 PHP-FPM workers) | 95MB (4 Node.js processes) |
| CPU at 200 users (idle data) | 40% (constant polling) | 2% |
The latency drop from "up to 5 seconds" to "under 100ms" changed how the product felt. Users stopped second-guessing whether the data was fresh.
What We Got Wrong
Memory leaks in long-running processes. PHP restarts after every request - memory leaks are irrelevant there. Node.js runs for days. We had a subtle leak in our MongoDB cursor handling that caused memory to grow ~2MB/hour. We caught it after three days, when the process hit 6GB and was OOM-killed.
Tools: process.memoryUsage() logged every minute, and heap snapshots in Chrome DevTools via node-inspector (the built-in node --inspect flag didn't arrive until Node 6.3, in 2016). The leak was a closure inside the oplog tailing function that held a reference to a growing array.
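The periodic log was nothing more elaborate than this - the one-minute interval matches what we ran, while the output format is an assumption:

```javascript
// Log resident set size and heap usage once a minute.
function logMemory() {
  var m = process.memoryUsage();
  console.log('rss=' + (m.rss / 1048576).toFixed(1) + 'MB',
              'heapUsed=' + (m.heapUsed / 1048576).toFixed(1) + 'MB');
  return m;
}

var timer = setInterval(logMemory, 60 * 1000);
timer.unref(); // don't let the logger keep a dying process alive
logMemory();   // log once at startup too
```

A steadily climbing heapUsed line in the logs is what turns "the process died again" into "we have a leak of roughly N MB/hour".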
Unhandled promise rejections. In 2015, an unhandled rejection disappeared silently - not even a warning, let alone a crash. We had several "why did the broadcast stop?" incidents traced to promise chains without .catch(). Node.js 15 (2020) finally made unhandled rejections crash the process - the right call.
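The retrofit was mechanical: every promise chain ends in a .catch that logs. A minimal sketch of the pattern (the helper name is hypothetical):

```javascript
// Run a promise-returning task; log failures instead of letting the
// rejection disappear. Resolves with null on failure so callers continue.
function runLogged(task) {
  return task().catch(function (err) {
    console.error('task failed:', err.message);
    return null;
  });
}
```

Swallowing after logging is a deliberate trade-off for a dashboard: a dropped broadcast is recoverable, a crashed process drops every connected client.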
The Mental Model Shift
The most valuable thing from this project wasn't the technology - it was the event-driven programming model. Before Node.js, we thought in terms of threads blocking on I/O. After, we thought in callbacks and event queues.
This mental model transfers: browser event handlers, React's useEffect, Go channels, Rust async/await - different answers to the same problem of concurrent I/O without thread-per-connection overhead, at the cost of harder-to-follow control flow. The answer to callback hell (deeply nested callbacks) came in 2017 with async/await in Node.js 7.6, but the event loop underneath is unchanged.
In 2024, real-time architectures have more options: Server-Sent Events for one-way push, WebRTC for peer-to-peer, MongoDB Change Streams replacing oplog tailing, and managed services (Pusher, Ably) for teams that don't want to run their own Socket.io infrastructure. The underlying pattern - event-driven, non-blocking - remains the foundation.