Multiprocessing

Difficulty: Hard · 40 min read

Why Multiprocessing?

The Problem: The GIL bottlenecks CPU-bound Python — one core does the work, the rest idle, no matter how many threads you spawn.

The Solution: multiprocessing launches separate processes, each with its own interpreter and GIL, so they run truly in parallel. ProcessPoolExecutor with a sensible chunksize is the canonical pattern.

Real Impact: For data pipelines, scientific computing, image/video processing, and any CPU-heavy workload, multiprocessing turns an 8-core machine into a real 8-core machine for Python.

Real-World Analogy

Think of processes as separate kitchens each cooking one dish:

  • Process = a self-contained kitchen with its own equipment and ingredients
  • Pool = the head chef assigning dishes from a queue to free kitchens
  • Queue / Pipe = the dumbwaiter passing finished dishes back to the dining room
  • Pickling = putting the dish in a takeaway box so it can travel between kitchens
  • Shared memory = a communal fridge — fast but everyone must agree on the rules

Multiprocessing spawns separate Python processes, each with its own interpreter and memory space. This bypasses the GIL and gives true parallelism on multiple cores — ideal for CPU-bound work.

                 Threading      Multiprocessing   asyncio
Parallelism      No (GIL)       Yes               No (single thread)
Best for         I/O-bound      CPU-bound         I/O-bound, high concurrency
Memory           Shared         Separate          Shared
Spawn cost       Cheap          Expensive         Very cheap
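The parallelism difference can be demonstrated directly by timing the same CPU-bound function under a thread pool and a process pool. A rough sketch, assuming a multi-core machine; the function name and workload sizes here are illustrative, and absolute timings vary by hardware:

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def busy(n):
    # Pure-Python CPU work: the GIL serializes this across threads
    return sum(i * i for i in range(n))

def timed(executor_cls, n_tasks, n):
    # Run n_tasks copies of busy(n) and return the wall-clock time
    start = time.perf_counter()
    with executor_cls(max_workers=n_tasks) as ex:
        list(ex.map(busy, [n] * n_tasks))
    return time.perf_counter() - start

if __name__ == "__main__":
    # Expect the process pool to finish noticeably faster on multiple cores
    print("threads:  ", timed(ThreadPoolExecutor, 4, 1_000_000))
    print("processes:", timed(ProcessPoolExecutor, 4, 1_000_000))
```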

Process and Pool

from multiprocessing import Process

def crunch(n):
    total = sum(i * i for i in range(n))
    print(f"{n}: {total}")

if __name__ == "__main__":
    procs = [Process(target=crunch, args=(10_000_000,)) for _ in range(4)]
    for p in procs: p.start()
    for p in procs: p.join()

⚠️ Always guard with if __name__ == "__main__"

On Windows and macOS (where spawn is the default start method), child processes re-import your module. Without the guard, each child tries to spawn more children and you fork bomb yourself.

ProcessPoolExecutor

from concurrent.futures import ProcessPoolExecutor

def cube(n): return n ** 3

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as ex:
        results = list(ex.map(cube, range(1_000_000), chunksize=10_000))

chunksize matters

With the default chunksize of 1, each item is pickled and sent to a worker individually, which is huge overhead for many small items. Batch them with a chunksize of roughly 1000 to 10000.
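One way to see that overhead is to time the same map with and without batching. A sketch under illustrative sizes (the function name and item counts are not from the original; exact timings depend on your machine):

```python
import time
from concurrent.futures import ProcessPoolExecutor

def square(n):
    return n * n

def timed_map(items, chunksize):
    # Time one parallel map over the items with the given chunksize
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=4) as ex:
        results = list(ex.map(square, items, chunksize=chunksize))
    return results, time.perf_counter() - start

if __name__ == "__main__":
    items = list(range(50_000))
    _, slow = timed_map(items, 1)        # one pickle round-trip per item
    _, fast = timed_map(items, 5_000)    # ten batches of 5000 items each
    print(f"chunksize=1: {slow:.2f}s  chunksize=5000: {fast:.2f}s")
```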

Inter-Process Communication

Queue and Pipe

from multiprocessing import Process, Queue

def handle(item):
    print(item * 2)                      # stand-in for the real work

def worker(q):
    while True:
        item = q.get()
        if item is None: break           # sentinel: no more work
        handle(item)

if __name__ == "__main__":
    q = Queue()
    workers = [Process(target=worker, args=(q,)) for _ in range(4)]
    for w in workers: w.start()

    for item in range(20): q.put(item)   # enqueue the jobs
    for _ in workers: q.put(None)        # one shutdown signal per worker
    for w in workers: w.join()
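Pipe is the lighter-weight option when exactly two endpoints talk to each other. A minimal sketch; the `compute` helper and the workload size are illustrative, not from the original:

```python
from multiprocessing import Process, Pipe

def compute(n):
    # CPU-bound stand-in: sum of squares below n
    return sum(i * i for i in range(n))

def worker(conn, n):
    # Child: do the work, send the result back, close our end
    conn.send(compute(n))
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn, 1_000))
    p.start()
    print(parent_conn.recv())            # 332833500
    p.join()
```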

Shared Memory

from multiprocessing import shared_memory
import numpy as np

# Parent — allocate a shared block (256 float64s = 2048 bytes)
shm = shared_memory.SharedMemory(create=True, size=2048)
arr = np.ndarray((256,), dtype=np.float64, buffer=shm.buf)
arr[0] = 3.14

# Child — attach by name (shm.name)
# existing = shared_memory.SharedMemory(name=...)

shm.close()          # detach this handle (every process does this)
shm.unlink()         # delete the block (creator only, once everyone is done)
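The attach-by-name step hinted at above can be shown end to end without NumPy, using only the stdlib struct module. A sketch: `write_value` is a hypothetical helper that, in real code, would run inside the child process rather than being called directly:

```python
from multiprocessing import shared_memory
import struct

def write_value(name, value):
    # Attach to an existing block by its name and write a float64 at offset 0
    existing = shared_memory.SharedMemory(name=name)
    struct.pack_into("d", existing.buf, 0, value)
    existing.close()                 # detach this handle; never unlink here

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=8)
    try:
        write_value(shm.name, 3.14)  # in real use this runs in a child process
        value, = struct.unpack_from("d", shm.buf, 0)
        print(value)                 # 3.14
    finally:
        shm.close()                  # every process closes its own handle
        shm.unlink()                 # the creator deletes the block once
```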

Pickling Constraints

Arguments and return values are pickled to cross process boundaries. Lambdas, locally defined functions, and lock instances cannot be pickled.

# BAD — lambda can't be pickled
ex.map(lambda x: x * 2, items)

# GOOD — module-level function
def double(x): return x * 2
ex.map(double, items)
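When the worker needs extra fixed arguments, functools.partial is the usual workaround: a partial pickles fine as long as the function it wraps is defined at module level. A sketch with an illustrative function name:

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial

def scale(factor, x):
    return x * factor

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as ex:
        # partial(scale, 3) binds factor=3; each item fills in x
        results = list(ex.map(partial(scale, 3), range(5)))
    print(results)   # [0, 3, 6, 9, 12]
```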

Start methods

multiprocessing.set_start_method("spawn") is the safe default; spawn is already the default on macOS and Windows, and the only option on Windows. "fork" is the fastest to start on Linux but is unsafe in programs that also use threads.
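Rather than setting the start method globally (set_start_method can only be called once per program), you can scope it with get_context. A minimal sketch:

```python
import multiprocessing as mp

def greet(q):
    q.put("hello from child")

if __name__ == "__main__":
    # get_context returns a context whose Process/Queue/Pool are bound
    # to one start method, without touching the global default
    ctx = mp.get_context("spawn")
    q = ctx.Queue()
    p = ctx.Process(target=greet, args=(q,))
    p.start()
    print(q.get())   # hello from child
    p.join()
```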

🎯 Practice Exercises

Exercise 1: Parallel sum

Use ProcessPoolExecutor to compute sum of squares 1..10M across 4 workers. Compare to single-threaded baseline.

Exercise 2: Image processor

Apply a CPU-bound transform (resize, filter) to a list of files in parallel. Use Pool.map with appropriate chunksize.

Exercise 3: Pipeline

Build a producer process + 4 worker processes + a results collector — all communicating via Queue.

Exercise 4: Shared counter

Use multiprocessing.Value with a lock to safely count totals across processes.
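As a starting point for this exercise, one possible shape; the process count and increment total are illustrative:

```python
from multiprocessing import Process, Value

def bump(counter, times):
    for _ in range(times):
        # get_lock() guards the read-modify-write; += alone is not atomic
        with counter.get_lock():
            counter.value += 1

if __name__ == "__main__":
    counter = Value("i", 0)          # shared 32-bit signed int, starts at 0
    procs = [Process(target=bump, args=(counter, 10_000)) for _ in range(4)]
    for p in procs: p.start()
    for p in procs: p.join()
    print(counter.value)             # 40000
```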