Raw Socket Mastery: High-Performance TCP Load Balancing in C

If you're still using fork() for every incoming connection, your server is going to crawl and die the moment you get Slashdotted. We're in 1998, and the "C10k" wall is real. Linux 2.0.36 is stable, but its scheduler isn't ready to juggle 5,000 processes. To scale a web cluster today, you need a single-process multiplexing engine.

The secret is select(). We register file descriptors as bits in an fd_set and let the kernel tell us which sockets have data ready to be drained. One process, no per-connection context switches, no memory bloat.
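To make the bitfield idea concrete, here is a minimal sketch that watches a single descriptor with a timeout. The helper name wait_readable and the millisecond timeout parameter are assumptions for illustration, not part of any library.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd becomes readable within timeout_ms,
 * 0 on timeout, -1 on error or if fd does not fit in an fd_set. */
int wait_readable(int fd, int timeout_ms) {
    fd_set rfds;
    struct timeval tv;
    int n;

    if (fd < 0 || fd >= FD_SETSIZE) return -1;

    FD_ZERO(&rfds);       /* clear every bit in the set */
    FD_SET(fd, &rfds);    /* set the bit for our descriptor */

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* The first argument is the highest descriptor plus one, not a count. */
    n = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (n < 0) return -1;

    /* select() rewrote rfds: only ready descriptors still have their bit set. */
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}
```

Because select() overwrites the set it is given, a long-running loop has to rebuild or copy the set before every call.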

The Non-Blocking Socket Setup

First, forget blocking I/O. The load balancer must never stall on one slow socket; the only place it is allowed to sleep is inside select(). Every socket (the listener, the accepted client connections, and the backend connections) must be set to O_NONBLOCK.

#include <fcntl.h>
#include <sys/socket.h>

int set_nonblocking(int fd) {
    int opts = fcntl(fd, F_GETFL);
    if (opts < 0) return -1;
    opts = (opts | O_NONBLOCK);
    if (fcntl(fd, F_SETFL, opts) < 0) return -1;
    return 0;
}
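As a usage sketch, this is roughly how a listening socket could be created and switched to non-blocking mode before it goes into the select() set. The name make_listener, the backlog of 128, and binding to INADDR_ANY are assumptions made for this example.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: create a non-blocking TCP listener.
 * Passing port 0 lets the kernel pick a free port. Returns the fd or -1. */
int make_listener(unsigned short port) {
    struct sockaddr_in addr;
    int yes = 1;
    int flags;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    /* Allow a quick restart while old connections sit in TIME_WAIT. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) < 0) goto fail;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto fail;
    if (listen(fd, 128) < 0) goto fail;

    /* Same read-modify-write on the file status flags as set_nonblocking(). */
    flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```

With the listener in non-blocking mode, an accept() that finds no pending connection returns -1 immediately instead of parking the whole process.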

The Multiplexing Loop

We maintain a master_fds set. When select() returns, we iterate through the active bits. If it's the listener, we accept(). If it's an established connection, we shovel bytes between the client and our backend farm.

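#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>

fd_set master_fds;         // every descriptor we are watching
fd_set read_fds;           // scratch copy, overwritten by select()
int fd_max;                // highest descriptor in master_fds
int listener;              // listening socket, already non-blocking
int peer_map[FD_SETSIZE];  // peer_map[fd] = other end of the pair, -1 if none

// ... initialization ...

for(;;) {
    read_fds = master_fds; // copy it
    if (select(fd_max+1, &read_fds, NULL, NULL, NULL) == -1) {
        if (errno == EINTR) continue; // interrupted by a signal, retry
        perror("select");
        exit(1);
    }

    for(int i = 0; i <= fd_max; i++) {
        if (!FD_ISSET(i, &read_fds)) continue;
        if (i == listener) {
            // handle new connection
            struct sockaddr_in remoteaddr;
            socklen_t addrlen = sizeof(remoteaddr);
            int newfd = accept(listener, (struct sockaddr *)&remoteaddr, &addrlen);
            if (newfd == -1) continue; // EAGAIN, or the client already gave up
            if (newfd >= FD_SETSIZE || set_nonblocking(newfd) < 0) {
                close(newfd); // cannot be tracked in an fd_set
                continue;
            }
            peer_map[newfd] = -1; // set once a backend is chosen and connected
            FD_SET(newfd, &master_fds);
            if (newfd > fd_max) fd_max = newfd;
        } else {
            // handle data from client or backend
            char buf[2048];
            int peer = peer_map[i];
            int nbytes = recv(i, buf, sizeof(buf), 0);
            if (nbytes < 0 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR))
                continue; // nothing to read after all, keep the connection
            if (nbytes <= 0) {
                // EOF or hard error: tear down both ends of the pair
                FD_CLR(i, &master_fds);
                close(i);
                peer_map[i] = -1;
                if (peer != -1) {
                    FD_CLR(peer, &master_fds);
                    FD_CLR(peer, &read_fds); // skip it later in this pass
                    close(peer);
                    peer_map[peer] = -1;
                }
            } else if (peer != -1) {
                // find the peer (backend or client) and send
                // real hackers use a lookup table here
                // a short or EAGAIN send() must be queued and retried; not shown
                send(peer, buf, nbytes, 0);
            }
        }
    }
}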
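One thing a bare send() on a non-blocking socket does not cover is the case where the kernel accepts only part of the data, or none of it. A minimal sketch of one way to handle that is a small per-descriptor output queue. The names outq, outq_flush, forward and the 4096-byte capacity are all hypothetical choices for this example.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

#define OUTQ_CAP 4096

/* Hypothetical per-descriptor queue for bytes send() could not take yet. */
struct outq {
    char   data[OUTQ_CAP];
    size_t len;
};

/* Push queued bytes to fd. Returns 0 if nothing went wrong (the queue may
 * still hold data if the socket buffer is full), -1 on a hard error. */
int outq_flush(int fd, struct outq *q) {
    while (q->len > 0) {
        ssize_t n = send(fd, q->data, q->len, 0);
        if (n < 0) {
            if (errno == EINTR) continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK) return 0; /* retry later */
            return -1;
        }
        if (n == 0) return 0;
        /* Drop the bytes that were sent, keep the remainder at the front. */
        memmove(q->data, q->data + n, q->len - (size_t)n);
        q->len -= (size_t)n;
    }
    return 0;
}

/* Append buf behind anything already pending, then try to flush.
 * Returns -1 if the queue would overflow or the socket failed. */
int forward(int fd, struct outq *q, const char *buf, size_t len) {
    if (len > OUTQ_CAP - q->len) return -1; /* peer is too slow */
    memcpy(q->data + q->len, buf, len);
    q->len += len;
    return outq_flush(fd, q);
}
```

While a queue is non-empty, the descriptor would also go into a write set passed as the third argument of select(), and outq_flush() would be called again when it is reported writable. It is also worth ignoring SIGPIPE at startup so that a send() to a peer that has gone away returns an error instead of killing the process.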

Memory Efficiency: The Real Bottleneck

In a production environment, you cannot afford malloc() for every read. We pre-allocate a static pool of buffers at startup, and each connection gets a pointer to one slot in that pool. If your load balancer is swapping to disk, you've already lost. We keep the state machine lean: just a struct per connection tracking the client FD, the backend FD, and a small byte-count offset into its buffer.
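A minimal sketch of such a pool, assuming 1024 slots of 2048 bytes each and a simple free-list stack. The names conn, pool_init, pool_get and pool_put, and the exact field layout, are illustrative assumptions rather than the layout of any particular implementation.

```c
#include <stddef.h>

#define POOL_SLOTS 1024
#define SLOT_BYTES 2048

/* Hypothetical per-connection state, following the fields named above. */
struct conn {
    char  *buf;         /* slot borrowed from the pool */
    size_t offset;      /* bytes of buf already forwarded */
    int    client_fd;
    int    backend_fd;
};

static char pool[POOL_SLOTS][SLOT_BYTES]; /* reserved once, never malloc'd */
static int  free_list[POOL_SLOTS];        /* stack of free slot indexes */
static int  free_top = -1;                /* -1 means no free slots */

/* Mark every slot as free. Call once at startup. */
void pool_init(void) {
    int i;
    for (i = 0; i < POOL_SLOTS; i++) free_list[i] = i;
    free_top = POOL_SLOTS - 1;
}

/* Hand out a free slot, or NULL when the pool is exhausted. */
char *pool_get(void) {
    if (free_top < 0) return NULL;
    return pool[free_list[free_top--]];
}

/* Return a slot obtained from pool_get(). No double-free checking. */
void pool_put(char *slot) {
    ptrdiff_t idx = (slot - &pool[0][0]) / SLOT_BYTES;
    free_list[++free_top] = (int)idx;
}
```

When pool_get() returns NULL the sensible reaction is to refuse the new connection, which bounds memory use at a figure known before the first packet arrives.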

Linux 2.2 is on the horizon, promising better threading, but for now, the single-threaded select() loop is the fastest path to high concurrency on one box. If you need more than 1024 descriptors (the default FD_SETSIZE, a compile-time constant), you'll have to rebuild with a larger limit, kernel included, or start looking at poll(), though support for it is still spotty across different Unices.
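For comparison, here is a rough sketch of the same readiness check written against poll(), which takes an array of descriptors instead of a fixed-size bitfield and so has no FD_SETSIZE ceiling. The helper name wait_any_readable is an assumption for this example.

```c
#include <poll.h>

/* Hypothetical helper: wait up to timeout_ms for any descriptor in fds to
 * become readable (or to hang up or fail). Returns how many entries are
 * ready, 0 on timeout, -1 on error. The caller fills in fds[i].fd. */
int wait_any_readable(struct pollfd *fds, int nfds, int timeout_ms) {
    int i;
    int ready = 0;

    for (i = 0; i < nfds; i++) {
        fds[i].events = POLLIN;   /* what we are interested in */
        fds[i].revents = 0;       /* filled in by the kernel */
    }

    if (poll(fds, (nfds_t)nfds, timeout_ms) < 0) return -1;

    for (i = 0; i < nfds; i++) {
        if (fds[i].revents & (POLLIN | POLLHUP | POLLERR)) ready++;
    }
    return ready;
}
```

Unlike select(), poll() leaves the requested events untouched and reports results in a separate revents field, so the array does not need to be copied before each call.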

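Data alignment matters. On SPARC, an unaligned access will SIGBUS your process, and on Alpha it traps into the kernel for an expensive fix-up. Even on x86, it's a performance hit. Don't pack your structs so tightly that fields land on odd boundaries: order members from widest to narrowest, let the compiler pad the rest, and keep your hot loops in the L1 cache.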
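A small sketch of both points, assuming a typical ABI where int is 4 bytes and is aligned to 4. The struct names and the load_uint helper are hypothetical. On such an ABI the first layout usually comes out at 16 bytes and the second at 12, but the exact sizes are up to the compiler.

```c
#include <stddef.h>
#include <string.h>

/* Poor ordering: the compiler inserts padding after each char so that the
 * following int starts on an aligned boundary. */
struct conn_padded {
    char state;
    int  client_fd;
    char flags;
    int  backend_fd;
};

/* Same fields, widest first: the two chars share one padded tail. */
struct conn_ordered {
    int  client_fd;
    int  backend_fd;
    char state;
    char flags;
};

/* Read an unsigned int from an arbitrary byte offset in a buffer.
 * memcpy() avoids the unaligned load that a pointer cast would perform. */
unsigned int load_uint(const char *p) {
    unsigned int v;
    memcpy(&v, p, sizeof(v));
    return v;
}
```

The tempting alternative, casting buf + 1 to unsigned int * and dereferencing it, is exactly the access that faults on strict-alignment hardware.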

Aunimeda provides DevOps engineering and infrastructure services - CI/CD pipelines, containerization, cloud deployments, and monitoring setups.
