DevOps & Infrastructure | April 10, 1999 | 3 min read | Updated: June 22, 2026

The Multi-CPU Era: Real-World Multithreading with POSIX pthreads (1999)

Aunimeda


With the rise of the Pentium III Xeon and dual-processor Sun Ultra workstations, building single-threaded applications is a waste of silicon. In 1999, if you're writing a server that doesn't utilize all available CPUs, you're building a bottleneck. This is the year of pthreads (POSIX Threads).

Moving from fork() to pthread_create() is not just about speed; it’s about shared address spaces. But with great power comes the absolute certainty of a deadlock if you don't respect your mutexes.

Creating the Thread

A thread in C is created by pointing pthread_create() at a function. The trick is passing your data via a single void*.

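#include <pthread.h>
#include <stdio.h>

// Thread entry point: arg points at this thread's ID.
void* worker_thread(void* arg) {
    int thread_id = *(int*)arg;
    printf("Worker thread %d starting heavy computation...\n", thread_id);
    
    // Perform compute-heavy task here
    
    return NULL;
}

int main(void) {
    pthread_t threads[2];
    int thread_args[2] = {1, 2}; // One slot per thread; must outlive the threads

    for (int i = 0; i < 2; i++) {
        // pthread_create() returns 0 on success and an error number on failure
        if (pthread_create(&threads[i], NULL, worker_thread, &thread_args[i]) != 0) {
            fprintf(stderr, "Failed to create thread %d\n", thread_args[i]);
            return 1;
        }
    }

    // Wait for both workers before main() returns and the process exits
    for (int i = 0; i < 2; i++) {
        pthread_join(threads[i], NULL);
    }
    return 0;
}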
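The single void* is less limiting than it looks: point it at a struct and you can pass as many inputs as you like, plus a slot for the result. The sketch below is one way to do that, not the only one. The names sum_task, sum_worker and parallel_sum are made up for this example, and the work is split across exactly two threads to match the dual-CPU setup above.

```c
#include <pthread.h>

/* Hypothetical per-thread argument block: inputs plus a slot for the result. */
struct sum_task {
    long first;  /* first value to add (inclusive) */
    long last;   /* last value to add (inclusive) */
    long total;  /* written once by the worker, read after pthread_join() */
};

static void *sum_worker(void *arg)
{
    struct sum_task *task = arg;
    long local = 0;  /* accumulate locally, store the result once at the end */
    long i;

    for (i = task->first; i <= task->last; i++)
        local += i;
    task->total = local;
    return NULL;
}

/* Sums 1..n on two threads. Returns -1 if a thread cannot be created. */
long parallel_sum(long n)
{
    pthread_t threads[2];
    struct sum_task tasks[2];
    int i;

    tasks[0].first = 1;
    tasks[0].last = n / 2;
    tasks[1].first = n / 2 + 1;
    tasks[1].last = n;

    for (i = 0; i < 2; i++) {
        if (pthread_create(&threads[i], NULL, sum_worker, &tasks[i]) != 0) {
            /* Wait for any thread that did start before giving up. */
            while (--i >= 0)
                pthread_join(threads[i], NULL);
            return -1;
        }
    }

    /* pthread_join() guarantees the worker's writes are visible here. */
    for (i = 0; i < 2; i++)
        pthread_join(threads[i], NULL);

    return tasks[0].total + tasks[1].total;
}
```

Each thread gets its own struct, so no locking is needed: the only shared step is reading the totals, and that happens after both joins.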

The Mutex: Your Best Friend and Worst Enemy

In a shared address space, you cannot simply increment a global counter. global_counter++ is a read-modify-write sequence, not a single atomic step, so two CPUs can both read the old value and one of the updates is lost. That is a race condition. You MUST protect the counter with a pthread_mutex_t.

pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
long global_counter = 0;

void increment_counter() {
    pthread_mutex_lock(&counter_mutex);
    global_counter++; // Safe now
    pthread_mutex_unlock(&counter_mutex);
}
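To see the mutex earn its keep, hammer a counter from two threads at once. The sketch below assumes two workers and uses made-up names (demo_counter, count_worker, count_with_mutex). Take the lock and unlock calls out and the final count will usually come up short on an SMP machine.

```c
#include <pthread.h>

#define NUM_THREADS 2

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static long demo_counter = 0;

static void *count_worker(void *arg)
{
    long iterations = *(long *)arg;
    long i;

    for (i = 0; i < iterations; i++) {
        pthread_mutex_lock(&demo_mutex);
        demo_counter++;  /* only one thread at a time gets here */
        pthread_mutex_unlock(&demo_mutex);
    }
    return NULL;
}

/* Runs NUM_THREADS workers, each adding `iterations` to the counter,
 * and returns the final count once every started thread has been joined. */
long count_with_mutex(long iterations)
{
    pthread_t threads[NUM_THREADS];
    int started = 0;
    int i;

    demo_counter = 0;
    for (i = 0; i < NUM_THREADS; i++) {
        if (pthread_create(&threads[i], NULL, count_worker, &iterations) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return demo_counter;
}
```

With the mutex in place, count_with_mutex(100000) comes back as exactly 200000 whenever both threads start. The price is that every increment now serializes on the lock, so keep the critical section as short as this one.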

SMP Performance and False Sharing

If you're seeing poor scaling on a dual-CPU system, you might be hitting False Sharing. This happens when two different variables, written by different threads on different CPUs, end up on the same cache line. Every write by one CPU invalidates that line in the other CPU's cache, so the line bounces back and forth between them and performance takes a nose-dive.

To fix this, give each thread's data its own cache line by aligning it to the line size or padding it out. Line sizes vary by CPU (32 bytes on the Pentium III, for example); 64 bytes is a conservative choice that also covers processors with larger lines:

struct thread_data {
    long result;
    // Pad to 64 bytes whether long is 4 or 8 bytes wide.
    // The array of structs must itself start on a 64-byte boundary.
    char padding[64 - sizeof(long)];
};
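Padding only helps if the first struct starts on a line boundary. Here is a minimal sketch of the whole arrangement, assuming a 64-byte line and a GCC-compatible compiler: __attribute__((aligned)) is a GCC extension, not part of POSIX or ISO C. The names padded_slot, slot_worker and run_padded_workers are invented for the example.

```c
#include <pthread.h>

#define CACHE_LINE 64   /* assumed line size; check your CPU's documentation */
#define NUM_WORKERS 2

/* One slot per thread, padded out to a full (assumed) cache line.
 * volatile is here only so the compiler keeps every store in this toy loop. */
struct padded_slot {
    volatile long result;
    char padding[CACHE_LINE - sizeof(long)];
};

/* GCC extension: start the array on a line boundary so that
 * slot i occupies exactly one line of its own. */
static struct padded_slot slots[NUM_WORKERS]
    __attribute__((aligned(CACHE_LINE)));

static void *slot_worker(void *arg)
{
    struct padded_slot *slot = arg;
    long i;

    /* No lock needed: this thread is the only writer of this slot. */
    for (i = 0; i < 1000000; i++)
        slot->result++;
    return NULL;
}

/* Runs one worker per slot and returns the sum of all slot results. */
long run_padded_workers(void)
{
    pthread_t threads[NUM_WORKERS];
    long total = 0;
    int started = 0;
    int i;

    for (i = 0; i < NUM_WORKERS; i++) {
        slots[i].result = 0;
        if (pthread_create(&threads[i], NULL, slot_worker, &slots[i]) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        total += slots[i].result;
    }
    return total;
}
```

To measure the effect on your own hardware, time this version against one where padded_slot contains only the long. The results are identical either way; only the running time should differ.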

On Linux 2.2, pthreads are implemented as 'lightweight processes' via the clone() system call. This means they are scheduled by the kernel, allowing them to actually run in parallel on SMP systems. Avoid 'user-space' threading libraries (Green Threads) if you want real performance on multi-CPU hardware.
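Since kernel-scheduled threads can occupy every processor, a sensible starting point is one worker per CPU. The sketch below asks the system how many processors are online. Note that _SC_NPROCESSORS_ONLN is a common extension (available on Linux and Solaris) rather than a POSIX requirement, and online_cpus, cpu_worker and run_one_thread_per_cpu are illustrative names.

```c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Number of processors currently online, or 1 if the query fails.
 * _SC_NPROCESSORS_ONLN is an extension, not required by POSIX. */
long online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n < 1 ? 1 : n;
}

static void *cpu_worker(void *arg)
{
    int *ran = arg;
    *ran = 1;  /* stand-in for real per-CPU work */
    return NULL;
}

/* Starts one thread per online CPU and returns how many of them ran,
 * or -1 if the bookkeeping arrays cannot be allocated. */
long run_one_thread_per_cpu(void)
{
    long n = online_cpus();
    long started = 0, finished = 0, i;
    pthread_t *threads = malloc(n * sizeof *threads);
    int *ran = calloc(n, sizeof *ran);

    if (threads == NULL || ran == NULL) {
        free(threads);
        free(ran);
        return -1;
    }
    for (i = 0; i < n; i++) {
        if (pthread_create(&threads[i], NULL, cpu_worker, &ran[i]) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        finished += ran[i];
    }
    free(threads);
    free(ran);
    return finished;
}
```

The kernel still decides where each thread actually runs; matching the thread count to the CPU count just gives the scheduler enough runnable work to keep every processor busy.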

