Why Multiprocessing?
The Problem: The GIL bottlenecks CPU-bound Python — one core does the work, the rest idle, no matter how many threads you spawn.
The Solution: multiprocessing runs the work in separate processes, each with its own interpreter and GIL, so they execute truly in parallel. ProcessPoolExecutor + chunksize is the canonical pattern.
Real Impact: For data pipelines, scientific computing, image/video processing, and any CPU-heavy workload, multiprocessing turns an 8-core machine into a real 8-core machine for Python.
Real-World Analogy
Think of processes as separate kitchens, each cooking one dish:
- Process = a self-contained kitchen with its own equipment and ingredients
- Pool = the head chef assigning dishes from a queue to free kitchens
- Queue / Pipe = the dumb waiter passing finished dishes back to the dining room
- Pickling = putting the dish in a takeaway box so it can travel between kitchens
- Shared memory = a communal fridge — fast but everyone must agree on the rules
Multiprocessing spawns separate Python processes, each with its own interpreter and memory space. This bypasses the GIL and gives true parallelism on multiple cores — ideal for CPU-bound work.
| | Threading | Multiprocessing | asyncio |
|---|---|---|---|
| Parallelism | No (GIL) | Yes | No (single thread) |
| Best for | I/O-bound | CPU-bound | I/O-bound, high concurrency |
| Memory | Shared | Separate | Shared |
| Spawn cost | Cheap | Expensive | Very cheap |
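A minimal timing sketch of the table's first row (the names `busy` and `timed` are illustrative, and the speedup assumes a machine with at least 4 cores): the same CPU-bound work barely speeds up under threads but scales with processes.

```python
# Sketch: time the same CPU-bound task with threads, then with processes.
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def busy(n):
    # Pure-Python CPU work: threads cannot run this in parallel under the GIL.
    return sum(i * i for i in range(n))

def timed(executor_cls, label):
    start = time.perf_counter()
    with executor_cls(max_workers=4) as ex:
        list(ex.map(busy, [5_000_000] * 4))
    print(f"{label}: {time.perf_counter() - start:.2f}s")

if __name__ == "__main__":
    timed(ThreadPoolExecutor, "threads")      # roughly serial time
    timed(ProcessPoolExecutor, "processes")   # roughly serial time / 4
```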
Process and Pool
```python
from multiprocessing import Process

def crunch(n):
    total = sum(i * i for i in range(n))
    print(f"{n}: {total}")

if __name__ == "__main__":
    procs = [Process(target=crunch, args=(10_000_000,)) for _ in range(4)]
    for p in procs: p.start()
    for p in procs: p.join()
```
⚠️ Always guard with if __name__ == "__main__"
On Windows and macOS (where the spawn start method is the default), child processes re-import your module. Without the guard, the process-spawning code runs again in every child and you'll fork-bomb yourself.
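The heading above also promises Pool; here is a minimal multiprocessing.Pool sketch of the same workload, returning results instead of printing them. ProcessPoolExecutor (next) is the more modern interface for the same idea.

```python
# Sketch: the classic multiprocessing.Pool equivalent of the loop above.
from multiprocessing import Pool

def crunch(n):
    return sum(i * i for i in range(n))   # return instead of print

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(crunch, [10_000_000] * 4)
    print(results)
```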
ProcessPoolExecutor
```python
from concurrent.futures import ProcessPoolExecutor

def cube(n):
    return n ** 3

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as ex:
        results = list(ex.map(cube, range(1_000_000), chunksize=10_000))
```
chunksize matters
Without chunksize (it defaults to 1), each item is pickled and sent to a worker individually, which is huge overhead. For many small items, batch them with a chunksize of 1,000-10,000.
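A rough sketch of the difference, assuming a multi-core machine; exact timings vary, but the batched call should be dramatically faster because far fewer pickle round-trips cross the process boundary.

```python
# Sketch: per-item dispatch vs. batched dispatch (timings are machine-dependent).
import time
from concurrent.futures import ProcessPoolExecutor

def square(n):
    return n * n

if __name__ == "__main__":
    items = range(100_000)
    with ProcessPoolExecutor(max_workers=4) as ex:
        t0 = time.perf_counter()
        list(ex.map(square, items))                    # chunksize=1: one round-trip per item
        t1 = time.perf_counter()
        list(ex.map(square, items, chunksize=10_000))  # batched: far fewer round-trips
        t2 = time.perf_counter()
    print(f"chunksize=1: {t1 - t0:.1f}s, chunksize=10_000: {t2 - t1:.1f}s")
```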
Inter-Process Communication
Queue and Pipe
```python
from multiprocessing import Process, Queue

def handle(item):
    print(item * item)            # stand-in for real per-item work

def worker(q):
    while True:
        item = q.get()
        if item is None:          # sentinel: no more work
            break
        handle(item)

if __name__ == "__main__":
    jobs = range(100)             # whatever work items you have
    q = Queue()
    workers = [Process(target=worker, args=(q,)) for _ in range(4)]
    for w in workers: w.start()
    for item in jobs: q.put(item)
    for _ in workers: q.put(None)  # one shutdown signal per worker
    for w in workers: w.join()
```
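The heading mentions Pipe as well; here is a minimal sketch of a two-endpoint Pipe, which suits one-to-one communication and avoids the locking overhead of a Queue.

```python
# Sketch: a Pipe has exactly two connection endpoints, one per process.
from multiprocessing import Process, Pipe

def child(conn):
    msg = conn.recv()          # blocks until the parent sends
    conn.send(msg.upper())     # reply through the same connection
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=child, args=(child_conn,))
    p.start()
    parent_conn.send("hello")
    print(parent_conn.recv())  # "HELLO"
    p.join()
```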
Shared Memory
```python
from multiprocessing import shared_memory
import numpy as np

# Parent — allocate a shared block (256 float64 values need 2048 bytes)
shm = shared_memory.SharedMemory(create=True, size=2048)
arr = np.ndarray((256,), dtype=np.float64, buffer=shm.buf)
arr[0] = 3.14

# Child — attach by name (pass shm.name to it)
# existing = shared_memory.SharedMemory(name=shm.name)

shm.close()     # detach this process's view
shm.unlink()    # delete the block (call once, in the owning process)
```
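A fuller sketch of the attach-by-name step hinted at in the comments above: the parent passes shm.name to a worker, which maps the same buffer as a NumPy array and modifies it in place (function and variable names here are illustrative).

```python
# Sketch: parent creates a shared NumPy-backed block, child attaches by name.
from multiprocessing import Process, shared_memory
import numpy as np

def worker(name, length):
    existing = shared_memory.SharedMemory(name=name)       # attach, don't create
    view = np.ndarray((length,), dtype=np.float64, buffer=existing.buf)
    view *= 2                                              # modify in place, no copying
    existing.close()                                       # detach (parent still owns it)

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=256 * 8)
    arr = np.ndarray((256,), dtype=np.float64, buffer=shm.buf)
    arr[:] = np.arange(256)
    p = Process(target=worker, args=(shm.name, 256))
    p.start(); p.join()
    print(arr[:3])            # [0. 2. 4.]: the child's writes are visible here
    shm.close()
    shm.unlink()              # free the block exactly once, in the owner
```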
Pickling Constraints
Arguments and return values are pickled to cross process boundaries. Lambdas, locally defined functions, and lock instances cannot be pickled.
```python
# BAD — lambda can't be pickled
ex.map(lambda x: x * 2, items)

# GOOD — module-level function
def double(x): return x * 2
ex.map(double, items)
```
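When the worker needs extra fixed arguments, functools.partial over a module-level function remains picklable, unlike a lambda; a small sketch:

```python
# Sketch: functools.partial of a module-level function pickles fine.
from functools import partial
from concurrent.futures import ProcessPoolExecutor

def scale(factor, x):
    return x * factor

if __name__ == "__main__":
    with ProcessPoolExecutor() as ex:
        results = list(ex.map(partial(scale, 3), range(10)))
    print(results)   # [0, 3, 6, ..., 27]
```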
Start methods
multiprocessing.set_start_method("spawn") is the safe choice and already the default on macOS and Windows (on Windows it's the only option). "fork" is fastest on Linux but unsafe in programs that also use threads.
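A minimal sketch of setting the start method explicitly; get_context gives you a local context without the process-wide side effect of set_start_method.

```python
# Sketch: choose the start method explicitly, once, in the main module.
import multiprocessing as mp

def work(n):
    return n * n

if __name__ == "__main__":
    # Option 1: process-wide (call at most once, before creating any pools).
    mp.set_start_method("spawn")

    # Option 2: a local context that doesn't affect the rest of the program.
    ctx = mp.get_context("spawn")
    with ctx.Pool(4) as pool:
        print(pool.map(work, range(8)))
```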
🎯 Practice Exercises
Exercise 1: Parallel sum
Use ProcessPoolExecutor to compute sum of squares 1..10M across 4 workers. Compare to single-threaded baseline.
Exercise 2: Image processor
Apply a CPU-bound transform (resize, filter) to a list of files in parallel. Use Pool.map with appropriate chunksize.
Exercise 3: Pipeline
Build a producer process + 4 worker processes + a results collector — all communicating via Queue.
Exercise 4: Shared counter
Use multiprocessing.Value with a lock to safely count totals across processes.