Building Real-Time Chat Before Socket.io Was Stable (2013)
The year was 2013. The client wanted real-time chat for a support widget. Simple enough requirement. The implementation was anything but.
WebSockets had been in Chrome since 2010, Firefox since 2011. But IE9 - still 25% of enterprise traffic - didn't support WebSockets. IE10 did, but it only reached Windows 7 in February 2013. Socket.io 0.9 was the standard solution, but it had documented memory leaks under sustained load that the maintainers had acknowledged and not yet fixed.
We went through three implementations in four months.
Iteration 1: Long-Polling
Long-polling was the 2010-era "fake real-time" solution. The client sends a request, the server holds it open until there's data (or a timeout), then the client immediately sends another request.
// Client: jQuery, 2013
function longPoll() {
  $.ajax({
    url: '/chat/poll',
    data: { last_id: lastMessageId, room: roomId },
    dataType: 'json', // always parse the body as JSON, whatever the response header says
    timeout: 30000, // client-side limit; the server holds the request for at most 28s
    success: function(messages) {
      if (messages.length) {
        appendMessages(messages);
        lastMessageId = messages[messages.length - 1].id;
      }
      longPoll(); // Immediately reconnect
    },
    error: function(xhr, status) {
      if (status !== 'abort') {
        setTimeout(longPoll, 2000); // Wait 2s on error, then retry
      }
    }
  });
}
longPoll();
// Server: PHP with sleep loop
// chat_poll.php
$since = (int)$_GET['last_id'];
$room = (int)$_GET['room'];
$maxWait = 28; // seconds (below 30s nginx proxy timeout)
$start = time();
// Set the content type once so the timeout path returns JSON too
header('Content-Type: application/json');
while (time() - $start < $maxWait) {
    $messages = Message::where('id', '>', $since)
        ->where('room_id', $room)
        ->orderBy('id')
        ->get();
    if ($messages->count()) {
        echo json_encode($messages);
        exit;
    }
    // Poll DB every 1 second.
    // This was expensive - 1 PHP worker held open per connected user
    sleep(1);
}
// Timeout - return empty, client will reconnect
echo json_encode([]);
Problem: Each connected user held one Apache worker hostage for up to 28 seconds. At 100 concurrent users: 100 Apache workers doing nothing except sleeping and querying MySQL once per second. We hit the Apache worker limit with 120 users. The application became unavailable.
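The arithmetic behind that failure is simple enough to sketch. The function below is an illustration, not code from the project: `longPollLoad` and its parameters are hypothetical names, and the worker limit of 120 is an assumption taken from the point where the application stopped responding.

```javascript
// Back-of-the-envelope load model for the long-polling design above.
// Assumption: every connected user pins one worker, and each held worker
// queries the database once per poll interval while it waits.
function longPollLoad(connectedUsers, pollIntervalSeconds, workerLimit) {
  var workersHeld = connectedUsers;
  return {
    workersHeld: workersHeld,
    dbQueriesPerSecond: connectedUsers / pollIntervalSeconds,
    freeWorkers: Math.max(workerLimit - workersHeld, 0),
    exhausted: workersHeld >= workerLimit
  };
}

// Hypothetical limit of 120 workers (assumed, see above).
console.log(longPollLoad(100, 1, 120));
console.log(longPollLoad(120, 1, 120));
```

Under these assumptions the cost scales linearly with idle users: nobody has to send a message for the server to run out of workers.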
Iteration 2: Server-Sent Events on Node.js
Server-Sent Events (SSE) - a one-way server-to-client push protocol, simpler than WebSockets, native browser API - became our second attempt. We moved the real-time endpoint to Node.js 0.10 (just released), keeping PHP for the main application.
// Node.js 0.10 SSE server
var http = require('http');
var url = require('url');
var clients = {}; // room_id → [response objects]
http.createServer(function(req, res) {
  var query = url.parse(req.url, true).query;
  var roomId = parseInt(query.room, 10); // explicit radix
  // SSE headers
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
    'Access-Control-Allow-Origin': 'https://yoursite.com'
  });
  // Register client
  if (!clients[roomId]) clients[roomId] = [];
  clients[roomId].push(res);
  // Send heartbeat every 25s to prevent proxy timeouts
  // (a line starting with ':' is an SSE comment and is ignored by the browser)
  var heartbeat = setInterval(function() {
    res.write(':heartbeat\n\n');
  }, 25000);
  // Cleanup on disconnect
  req.on('close', function() {
    clearInterval(heartbeat);
    clients[roomId] = clients[roomId].filter(function(r) { return r !== res; });
  });
}).listen(3001);
// PHP calls this internal endpoint to broadcast a message
// POST /broadcast { room_id, message }
// (request routing for the endpoint is omitted here)
function broadcast(roomId, message) {
  var data = 'data: ' + JSON.stringify(message) + '\n\n';
  (clients[roomId] || []).forEach(function(res) {
    try { res.write(data); } catch(e) { /* client disconnected */ }
  });
}
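The server code names a POST /broadcast endpoint but does not show how a request reaches `broadcast`. A minimal sketch of that step follows. `handleBroadcast` and its return shape are hypothetical names introduced for illustration, and the validation rules are assumptions rather than what ran in production.

```javascript
// Sketch of the routing step: turn a POST /broadcast body into SSE frames
// for every response object registered for that room.
// `clients` has the same shape as in the server: room id -> [res, ...].
function handleBroadcast(rawBody, clients) {
  var payload;
  try {
    payload = JSON.parse(rawBody);
  } catch (e) {
    return { status: 400, delivered: 0 }; // malformed JSON from the caller
  }
  if (!payload || typeof payload.room_id !== 'number' || payload.message === undefined) {
    return { status: 400, delivered: 0 }; // missing or mistyped fields
  }
  var frame = 'data: ' + JSON.stringify(payload.message) + '\n\n';
  var delivered = 0;
  (clients[payload.room_id] || []).forEach(function(res) {
    try {
      res.write(frame);
      delivered++;
    } catch (e) { /* client went away between register and write */ }
  });
  return { status: 200, delivered: delivered };
}

// Usage with stand-in response objects instead of real sockets.
var written = [];
var fakeClients = { 7: [{ write: function(chunk) { written.push(chunk); } }] };
console.log(handleBroadcast('{"room_id":7,"message":{"id":1,"text":"hi"}}', fakeClients));
console.log(written[0]);
```

Inside the HTTP server this would sit behind a check on `req.method` and the request path, with the body collected from the request's `data` and `end` events before the call. Since only PHP should reach it, the endpoint would also need to be bound to an internal interface or protected by a shared secret.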
// Browser client
var evtSource = new EventSource('/sse?room=' + roomId);
evtSource.onmessage = function(event) {
  var message = JSON.parse(event.data);
  appendMessage(message);
};
evtSource.onerror = function() {
  // Browser automatically reconnects SSE - built into the protocol
  console.log('SSE reconnecting...');
};
Result: SSE on Node.js held 1,000 simultaneous connections on a 512MB VPS with no issues. Node's event loop was designed exactly for this - many idle connections waiting for data.
Problem: SSE is one-way. Users could receive messages over the stream, but sending still went through a PHP AJAX call. That part was workable. The real blocker was Internet Explorer: no version of IE shipped a native EventSource, and the polyfill was unreliable. We still had 20% IE9 traffic.
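One way to live with that gap is to choose the receive transport by feature detection and send browsers without EventSource back to the long-polling loop from Iteration 1. This is a sketch of the idea, not what we shipped: `pickReceiveTransport` and its return labels are hypothetical, and `env` stands in for the browser's `window` object.

```javascript
// Pick a receive transport by feature detection rather than user-agent sniffing.
// `env` stands in for `window`; the return values are labels for this sketch,
// not part of any library.
function pickReceiveTransport(env) {
  if (env && typeof env.EventSource === 'function') {
    return 'sse';
  }
  return 'long-poll';
}

console.log(pickReceiveTransport({ EventSource: function() {} })); // sse
console.log(pickReceiveTransport({}));                             // long-poll
```

The trade-off is that every browser on the fallback path brings back the per-user worker cost of long-polling, so this only helps if that group stays small.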
Iteration 3: WebSockets with Socket.io 0.9
For the third iteration we moved to Socket.io 0.9 on Node.js, accepting the IE9 fallback behavior (Socket.io would use XHR long-polling for IE9 automatically).
// Server: Node.js + Socket.io 0.9
var io = require('socket.io').listen(3002);
// Critical: reduce logging - Socket.io 0.9 was verbose
io.set('log level', 1);
// Settings suggested in GitHub issues as an unofficial mitigation for the leak.
// The two 'browser client' options control how the client script is served;
// the transports list restricts fallbacks to XHR long-polling.
io.set('browser client etag', true);
io.set('browser client minification', true);
io.set('transports', ['websocket', 'xhr-polling']); // Limit transports
io.sockets.on('connection', function(socket) {
  var userId = null;
  var roomId = null;
  socket.on('join', function(data) {
    // Validate the JWT issued by the PHP session
    var payload = data && verifyToken(data.token);
    if (!payload) { socket.disconnect(); return; }
    userId = payload.user_id;
    roomId = data.room_id;
    socket.join('room:' + roomId);
  });
  socket.on('message', function(data) {
    // Ignore messages from sockets that have not joined, or with no payload
    if (!userId || !roomId || !data) return;
    // Save to DB via internal HTTP call to PHP
    saveMessage(userId, roomId, data.text, function(message) {
      // Broadcast to room
      io.sockets.in('room:' + roomId).emit('message', message);
    });
  });
  socket.on('disconnect', function() {
    // Nothing to clean up - Socket.io handles room membership
  });
});
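The server calls `verifyToken` without showing it. Below is a minimal sketch of what such a helper could look like for an HS256-signed JWT, using only Node's `crypto` module. It is an assumption, not the original helper: the real function took a single argument, while this version takes the shared secret and an optional clock value explicitly, and `mintToken` is a stand-in for whatever the PHP side used to issue tokens. It also relies on `Buffer.from` and `crypto.timingSafeEqual`, which current Node.js versions provide but Node 0.10 did not.

```javascript
var crypto = require('crypto');

// Base64url without padding, as used in JWT segments.
function base64url(buf) {
  return buf.toString('base64').replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}

function sign(input, secret) {
  return base64url(crypto.createHmac('sha256', secret).update(input).digest());
}

// Stand-in for the issuing side (hypothetical; PHP did this in our setup).
function mintToken(payload, secret) {
  var head = base64url(Buffer.from(JSON.stringify({ alg: 'HS256', typ: 'JWT' })));
  var body = base64url(Buffer.from(JSON.stringify(payload)));
  return head + '.' + body + '.' + sign(head + '.' + body, secret);
}

// Returns the decoded payload, or null if the token is malformed,
// signed with a different secret, or past its `exp` time.
function verifyToken(token, secret, nowSeconds) {
  if (typeof token !== 'string') return null;
  var parts = token.split('.');
  if (parts.length !== 3) return null;
  var header, payload;
  try {
    // Node's base64 decoder also accepts the URL-safe alphabet.
    header = JSON.parse(Buffer.from(parts[0], 'base64').toString('utf8'));
    payload = JSON.parse(Buffer.from(parts[1], 'base64').toString('utf8'));
  } catch (e) {
    return null;
  }
  if (!header || header.alg !== 'HS256') return null; // pin the algorithm
  if (!payload || typeof payload !== 'object') return null;
  var expected = Buffer.from(sign(parts[0] + '.' + parts[1], secret));
  var actual = Buffer.from(parts[2]);
  if (expected.length !== actual.length) return null;
  if (!crypto.timingSafeEqual(expected, actual)) return null;
  var now = nowSeconds === undefined ? Math.floor(Date.now() / 1000) : nowSeconds;
  if (typeof payload.exp === 'number' && payload.exp <= now) return null;
  return payload;
}

var demoToken = mintToken({ user_id: 7, exp: 2000 }, 's3cret');
console.log(verifyToken(demoToken, 's3cret', 1000)); // payload object
console.log(verifyToken(demoToken, 'wrong', 1000));  // null
```

Pinning `alg` to HS256 matters: a verifier that trusts whatever algorithm the token header names can be talked into skipping the signature check entirely.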
// Client
var socket = io.connect('https://chat.yoursite.com:3002', {
  'reconnection delay': 1000,
  'reconnection limit': 5000,
  'max reconnection attempts': Infinity
});
socket.on('connect', function() {
  socket.emit('join', { token: SESSION_TOKEN, room_id: ROOM_ID });
});
socket.on('message', function(msg) {
  appendMessage(msg);
});
socket.on('disconnect', function() {
  showBanner('Connection lost - reconnecting...');
});
The memory leak: Socket.io 0.9 leaked memory when sockets disconnected without a clean handshake (browser tab closed, mobile switching networks). We worked around it with a PM2 auto-restart rule - restart the process when RSS exceeded 512MB:
// ecosystem.json for PM2
{
  "apps": [{
    "name": "chat",
    "script": "chat-server.js",
    "max_memory_restart": "512M",
    "restart_delay": 100
  }]
}
In practice the restart was close to zero-downtime: the replacement process came up after the 100ms restart_delay, new connections went to it, and clients whose sockets dropped reconnected automatically. Users saw a brief reconnect event, not an outage.
What actually shipped
The final production system: Socket.io WebSockets for modern browsers, a long-polling fallback for IE9, a Node.js backend with PM2 restart management, and the PHP application calling an internal Node.js API to broadcast server-generated events.
This hybrid worked reliably for 18 months until Socket.io 1.0 arrived with a rewritten engine and the memory issues resolved.
The lesson: in 2013, real-time was solvable but required understanding the failure modes of each transport. Long-polling's Apache worker exhaustion. SSE's lack of bidirectional communication and IE9 incompatibility. WebSocket's early SDK instability.
Understanding why each approach failed made the combined solution obvious. That's still true for WebSockets, SSE, and their 2025 successors.
Aunimeda builds production-grade backend systems - APIs, microservices, real-time applications, and system integrations.