The Multi-CPU Era: Real-World Multithreading with POSIX pthreads

#C #pthreads #Multithreading #Performance #SMP #Linux


With the rise of the Pentium III Xeon and dual-processor Sun Ultra workstations, building single-threaded applications is a waste of silicon. In 1999, if you're writing a server that doesn't use every available CPU, you're building a bottleneck. This is the year of pthreads (POSIX Threads).

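Moving from fork() to pthread_create() is not just about speed; it's about shared address spaces. A forked child works on its own copy of the parent's memory, while every thread in a process reads and writes the same memory. But with great power comes the absolute certainty of a deadlock if you don't respect your mutexes.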
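To make the address-space difference concrete, here is a minimal sketch. The names shared_value, value_after_fork() and value_after_thread() are made up for this illustration, and assert() stands in for real error handling.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_value = 0;

/* Thread entry point: writes straight into the process-wide global. */
static void *set_value(void *arg) {
    (void)arg;
    shared_value = 42;
    return NULL;
}

/* What the parent sees after a forked child writes 42. */
int value_after_fork(void) {
    pid_t pid;

    shared_value = 0;
    pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        shared_value = 42;   /* changes the child's private copy only */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return shared_value;     /* the parent's copy was never touched */
}

/* What the creating thread sees after a worker thread writes 42. */
int value_after_thread(void) {
    pthread_t tid;
    int rc;

    shared_value = 0;
    rc = pthread_create(&tid, NULL, set_value, NULL);
    assert(rc == 0);         /* sketch only: a real server would handle this */
    pthread_join(tid, NULL); /* join also makes the worker's write visible */
    return shared_value;
}
```

Called from main(), value_after_fork() returns 0 and value_after_thread() returns 42. The child's write never reaches the parent, while the thread's write lands in the one address space everybody shares.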

Creating the Thread

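A thread in C is created by pointing pthread_create() at a function. The trick is passing your data via a single void*. Whatever that pointer refers to must stay valid for as long as the thread reads it, so don't hand a thread the address of a variable that is about to go out of scope or be overwritten.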

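#include <pthread.h>
#include <stdio.h>

// Thread entry point: receives its ID through the void* argument
void* worker_thread(void* arg) {
    int thread_id = *(int*)arg;
    printf("Worker thread %d starting heavy computation...\n", thread_id);

    // Perform compute-heavy task here

    return NULL;
}

int main(void) {
    pthread_t threads[2];
    int thread_args[2] = {1, 2};  // One slot per thread; must outlive the threads

    for (int i = 0; i < 2; i++) {
        // pthread_create returns an error number on failure; it does not set errno
        if (pthread_create(&threads[i], NULL, worker_thread, &thread_args[i]) != 0) {
            fprintf(stderr, "pthread_create failed for thread %d\n", i);
            return 1;
        }
    }

    // Wait for both workers; returning from main early would end the whole process
    for (int i = 0; i < 2; i++) {
        pthread_join(threads[i], NULL);
    }

    return 0;
}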
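A single int rarely covers it in practice. One common pattern, sketched here with made-up names (struct range_job, sum_range(), parallel_sum()), is to pass a pointer to a struct that carries both the inputs and a slot for the result, and to hand the same pointer back through pthread_join().

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical job description: sum the integers in [first, last]. */
struct range_job {
    long first;
    long last;
    long sum;    /* filled in by the worker */
};

static void *sum_range(void *arg) {
    struct range_job *job = arg;
    long local = 0;   /* accumulate locally, store once at the end */
    long i;

    for (i = job->first; i <= job->last; i++)
        local += i;
    job->sum = local;
    return job;       /* returned to whoever calls pthread_join */
}

/* Splits 1..n across two threads and adds up their partial sums. */
long parallel_sum(long n) {
    struct range_job jobs[2];
    pthread_t threads[2];
    long total = 0;
    int i, rc;

    jobs[0].first = 1;         jobs[0].last = n / 2;
    jobs[1].first = n / 2 + 1; jobs[1].last = n;

    for (i = 0; i < 2; i++) {
        rc = pthread_create(&threads[i], NULL, sum_range, &jobs[i]);
        assert(rc == 0);   /* sketch only: a real server would handle this */
    }
    for (i = 0; i < 2; i++) {
        void *ret;
        pthread_join(threads[i], &ret);
        total += ((struct range_job *)ret)->sum;
    }
    return total;
}
```

The jobs array lives on the stack of parallel_sum(), which is safe only because both threads are joined before the function returns.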

The Mutex: Your Best Friend and Worst Enemy

In a shared address space, you cannot simply increment a global counter. global_counter++ is really three steps: read, add, write. When two CPUs interleave those steps on the same memory address, one update gets lost. That is the definition of a race condition. You MUST use pthread_mutex_t.

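#include <pthread.h>

// Statically initialized mutex guarding global_counter
pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
long global_counter = 0;

void increment_counter(void) {
    pthread_mutex_lock(&counter_mutex);
    global_counter++; // Safe now: only one thread at a time gets past the lock
    pthread_mutex_unlock(&counter_mutex);
}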
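To see the lock doing its job, here is a small sketch that hammers a counter from two threads. The names demo_counter, bump_counter() and run_counter_demo(), and the iteration count, are invented for the illustration.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 2
#define INCREMENTS_PER_THREAD 100000L

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static long demo_counter = 0;

/* Each worker takes the lock around every single increment. */
static void *bump_counter(void *arg) {
    long i;

    (void)arg;
    for (i = 0; i < INCREMENTS_PER_THREAD; i++) {
        pthread_mutex_lock(&demo_mutex);
        demo_counter++;
        pthread_mutex_unlock(&demo_mutex);
    }
    return NULL;
}

/* Runs the workers to completion and returns the final count. */
long run_counter_demo(void) {
    pthread_t threads[NUM_THREADS];
    int i, rc;

    demo_counter = 0;
    for (i = 0; i < NUM_THREADS; i++) {
        rc = pthread_create(&threads[i], NULL, bump_counter, NULL);
        assert(rc == 0);   /* sketch only: a real server would handle this */
    }
    for (i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    return demo_counter;
}
```

With the lock in place, run_counter_demo() always returns NUM_THREADS * INCREMENTS_PER_THREAD. Take out the lock/unlock pair and the total on an SMP box will usually come up short, by a different amount on each run.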

SMP Performance and False Sharing

If you're seeing poor scaling on a dual-CPU system, you might be hitting False Sharing. This happens when two different variables, used by different threads on different CPUs, end up on the same cache line. Every write by one CPU invalidates the other CPU's copy of that line, so the line bounces back and forth between the caches even though the threads never touch each other's data. The result is a performance nose-dive.

To fix this, align your thread-specific data to cache-line boundaries or use padding. Line sizes on late-90s hardware vary (the Pentium III uses 32-byte lines, other CPUs use 64), so padding to 64 bytes covers both:

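struct thread_data {
    long result;
    // Pad the struct to 64 bytes; sizeof(long) is 4 or 8 depending on the platform.
    // An array of these must itself start on a 64-byte boundary for this to work.
    char padding[64 - sizeof(long)];
};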
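Padding only helps if the first element of the array sits on a line boundary. The sketch below gets that with the GCC aligned attribute, which is a compiler extension rather than anything POSIX or ISO C promises. CACHE_LINE, struct padded_slot and run_padded_demo() are names made up for the example.

```c
#include <assert.h>
#include <pthread.h>

#define CACHE_LINE 64   /* assumed line size; check your CPU's manual */
#define NUM_THREADS 2
#define ITERATIONS 1000000L

struct padded_slot {
    long result;
    char padding[CACHE_LINE - sizeof(long)];   /* fill out the rest of the line */
};

/* GCC extension: start the array on a cache-line boundary. */
static struct padded_slot slots[NUM_THREADS] __attribute__((aligned(CACHE_LINE)));

/* Each worker writes only to its own slot, so no mutex is needed. */
static void *accumulate(void *arg) {
    struct padded_slot *slot = arg;
    long i;

    for (i = 0; i < ITERATIONS; i++)
        slot->result += 1;
    return NULL;
}

/* Runs one worker per slot and returns the combined total. */
long run_padded_demo(void) {
    pthread_t threads[NUM_THREADS];
    long total = 0;
    int i, rc;

    for (i = 0; i < NUM_THREADS; i++) {
        slots[i].result = 0;
        rc = pthread_create(&threads[i], NULL, accumulate, &slots[i]);
        assert(rc == 0);   /* sketch only: a real server would handle this */
    }
    for (i = 0; i < NUM_THREADS; i++) {
        pthread_join(threads[i], NULL);
        total += slots[i].result;
    }
    return total;
}
```

The total is the same with or without the padding; what changes is the running time. To measure the effect on your own hardware, time this version against one with the padding member removed.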

On Linux 2.2, pthreads come from the LinuxThreads library, which implements each thread as a 'lightweight process' via the clone() system call. This means threads are scheduled by the kernel, allowing them to actually run in parallel on SMP systems. Avoid 'user-space' threading libraries (Green Threads) if you want real performance on multi-CPU hardware: the kernel sees only one process there, so only one CPU ever runs it.
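Since kernel threads can each occupy a CPU, a reasonable starting point is one worker per processor. The sketch below asks for the count with sysconf(_SC_NPROCESSORS_ONLN), which glibc and Solaris provide but POSIX does not require. MAX_WORKERS, online_cpus() and spawn_per_cpu() are invented names.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

#define MAX_WORKERS 64   /* arbitrary cap for this sketch */

/* Number of CPUs currently online, falling back to 1 if unknown. */
long online_cpus(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);   /* non-standard but widely available */
    return n < 1 ? 1 : n;
}

/* Placeholder worker: just records that it ran. */
static void *mark_ran(void *arg) {
    int *ran = arg;
    *ran = 1;
    return NULL;
}

/* Starts one worker per online CPU; returns how many actually ran. */
int spawn_per_cpu(void) {
    pthread_t threads[MAX_WORKERS];
    int ran[MAX_WORKERS];
    long n = online_cpus();
    int i, rc, count = 0;

    if (n > MAX_WORKERS)
        n = MAX_WORKERS;
    for (i = 0; i < n; i++) {
        ran[i] = 0;
        rc = pthread_create(&threads[i], NULL, mark_ran, &ran[i]);
        assert(rc == 0);   /* sketch only: a real server would handle this */
    }
    for (i = 0; i < n; i++) {
        pthread_join(threads[i], NULL);
        count += ran[i];
    }
    return count;
}
```

On a dual-CPU machine this starts two workers. A compute-bound server rarely gains from more threads than CPUs, though workers that block on I/O can change that calculation.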
