The Multi-CPU Era: Real-World Multithreading with POSIX pthreads
With the rise of the Pentium III Xeon and dual-processor Sun Ultra workstations, building single-threaded applications is a waste of silicon. In 1999, if you're writing a server that doesn't utilize all available CPUs, you're building a bottleneck. This is the year of pthreads (POSIX Threads).
Moving from fork() to pthread_create() is not just about speed; it’s about shared address spaces. But with great power comes the absolute certainty of a deadlock if you don't respect your mutexes.
Creating the Thread
A thread in C is created by pointing pthread_create() at a function. The trick is passing your data through a single void* argument, and making sure whatever it points to stays alive until the thread is done with it.
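#include <pthread.h>
#include <stdio.h>

void* worker_thread(void* arg) {
    int thread_id = *(int*)arg;  // Unpack the argument passed to pthread_create()
    printf("Worker thread %d starting heavy computation...\n", thread_id);
    // Perform compute-heavy task here
    return NULL;
}

int main() {
    pthread_t threads[2];
    int thread_args[2] = {1, 2};  // Must outlive the threads; main() joins them below

    for (int i = 0; i < 2; i++) {
        // pthread_create() returns 0 on success or an error number (it does not set errno)
        if (pthread_create(&threads[i], NULL, worker_thread, &thread_args[i]) != 0) {
            fprintf(stderr, "pthread_create failed for thread %d\n", i);
            return 1;
        }
    }
    for (int i = 0; i < 2; i++) {
        pthread_join(threads[i], NULL);  // Wait for each worker to finish
    }
    return 0;
}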
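When a worker needs more than one input, or has to hand back a result, the usual approach is to pack everything into a struct and pass its address through the void*. The following is a minimal sketch of that idea, not a definitive pattern: struct job, sum_range() and parallel_sum() are hypothetical names invented for illustration. It splits the sum 1..n across two threads and collects each partial result through the second argument of pthread_join().

```c
#include <pthread.h>

/* Hypothetical per-thread job description (not from the article). */
struct job {
    long first;  /* first value to add, inclusive */
    long last;   /* last value to add, inclusive */
    long sum;    /* filled in by the worker */
};

/* Thread entry point: unpack the struct from the void* and do the work. */
static void *sum_range(void *arg)
{
    struct job *j = arg;
    long i;

    j->sum = 0;
    for (i = j->first; i <= j->last; i++)
        j->sum += i;
    return j;  /* handed back to whoever calls pthread_join() */
}

/* Sum 1..n using two threads. Returns the total, or -1 if a thread
 * could not be created or joined. */
long parallel_sum(long n)
{
    pthread_t tid[2];
    struct job jobs[2];  /* lives on this stack frame until both joins finish */
    long total = 0;
    int i, started = 0, failed = 0;

    jobs[0].first = 1;          jobs[0].last = n / 2;
    jobs[1].first = n / 2 + 1;  jobs[1].last = n;

    for (i = 0; i < 2; i++) {
        if (pthread_create(&tid[i], NULL, sum_range, &jobs[i]) != 0) {
            failed = 1;
            break;
        }
        started++;
    }
    for (i = 0; i < started; i++) {
        void *ret;
        if (pthread_join(tid[i], &ret) != 0) {
            failed = 1;
            continue;
        }
        total += ((struct job *)ret)->sum;
    }
    return failed ? -1 : total;
}
```

Calling parallel_sum(100) from main() should give the same answer as a plain loop. The jobs array is safe to keep on the stack here only because parallel_sum() joins every thread it started before returning.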
The Mutex: Your Best Friend and Worst Enemy
In a shared address space, you cannot simply increment a global counter. The increment is a read-modify-write sequence, so two CPUs can both read the old value at the same time and one of the updates gets lost. That is a race condition. You MUST guard every access to the shared variable with a pthread_mutex_t.
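#include <pthread.h>

pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;  // Static initializer, no init call needed
long global_counter = 0;

void increment_counter(void) {
    pthread_mutex_lock(&counter_mutex);
    global_counter++; // Safe now: only one thread at a time gets past the lock
    pthread_mutex_unlock(&counter_mutex);
}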
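To see the mutex doing its job, it helps to have two threads hammer the counter at once. The sketch below is one way to set that up. The helper names bump() and run_counter_demo() are made up for this example, and the iteration count is left to the caller.

```c
#include <pthread.h>

static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
static long global_counter = 0;

/* Worker: increment the shared counter n times, one lock per increment. */
static void *bump(void *arg)
{
    long n = *(long *)arg;
    long i;

    for (i = 0; i < n; i++) {
        pthread_mutex_lock(&counter_mutex);
        global_counter++;
        pthread_mutex_unlock(&counter_mutex);
    }
    return NULL;
}

/* Start two workers that each increment n times.
 * Returns the final counter value, or -1 if a thread could not be created. */
long run_counter_demo(long n)
{
    pthread_t tid[2];
    long result;
    int i, started = 0;

    global_counter = 0;  /* no workers are running yet, so no lock is needed */
    for (i = 0; i < 2; i++) {
        if (pthread_create(&tid[i], NULL, bump, &n) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    result = global_counter;  /* every started worker has been joined */
    return started == 2 ? result : -1;
}
```

With the lock in place, run_counter_demo(n) should return exactly 2 * n. Removing the lock and unlock calls from bump() makes the increment a data race, and on an SMP machine the total will often come up short.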
SMP Performance and False Sharing
If you're seeing poor scaling on a dual-CPU system, you might be hitting false sharing. This happens when two different variables, used by different threads on different CPUs, end up on the same cache line. Every write by one CPU invalidates that line in the other CPU's cache, so the line bounces back and forth between the processors even though the threads never touch each other's data. The result is a performance nose-dive.
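To fix this, align your thread-specific data to cache-line boundaries or pad it out to a full line. The Pentium III uses 32-byte cache lines, so padding to 64 bytes is a conservative choice that also covers processors with longer lines: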
struct thread_data {
    long result;
    char padding[64 - sizeof(long)]; // Pad the struct to 64 bytes whether long is 4 or 8 bytes
};
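Here is a rough sketch of how the padded layout might be used with one slot per thread. The names CACHE_LINE, padded_slot, accumulate() and run_padded_demo() are hypothetical, and the 64-byte figure is an assumption to tune for the target CPU. Because each slot is exactly CACHE_LINE bytes long, the result fields of neighbouring slots sit CACHE_LINE bytes apart and can never land on the same line, even if the array itself is not aligned.

```c
#include <pthread.h>

#define CACHE_LINE 64  /* assumption: adjust for the target CPU */
#define NTHREADS   2

/* One slot per thread, padded out to a full cache line. */
struct padded_slot {
    long result;
    char padding[CACHE_LINE - sizeof(long)];
};

static struct padded_slot slots[NTHREADS];
static long iterations_per_thread;  /* written before any thread starts */

/* Worker: touches only its own slot, so no mutex is required. */
static void *accumulate(void *arg)
{
    struct padded_slot *slot = arg;
    long i;

    for (i = 0; i < iterations_per_thread; i++)
        slot->result += 1;
    return NULL;
}

/* Run NTHREADS workers and add up their private results.
 * Returns the combined total, or -1 if a thread could not be created. */
long run_padded_demo(long iterations)
{
    pthread_t tid[NTHREADS];
    long total = 0;
    int i, started = 0;

    iterations_per_thread = iterations;
    for (i = 0; i < NTHREADS; i++) {
        slots[i].result = 0;
        if (pthread_create(&tid[i], NULL, accumulate, &slots[i]) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        total += slots[i].result;
    }
    return started == NTHREADS ? total : -1;
}
```

This shows the layout rather than a benchmark: an optimizing compiler may keep the running total in a register, which hides the effect. To measure false sharing, compare timings with and without the padding member, using a loop the compiler cannot collapse.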
On Linux 2.2, pthreads are implemented as 'lightweight processes' via the clone() system call. This means they are scheduled by the kernel, allowing them to actually run in parallel on SMP systems. Avoid 'user-space' threading libraries (Green Threads) if you want real performance on multi-CPU hardware.