Threading and the GIL

Hard · 40 min read

Threads in Python

Why Threading Matters

The Problem: Sequential code sits idle while a network call returns — wasting capacity. Doing 100 HTTP calls one at a time takes roughly 100 times longer than doing them concurrently.

The Solution: Threads let one process handle many I/O-bound waits concurrently. The GIL prevents true parallel CPU execution, but for I/O — where most apps spend their time — threading is the simplest concurrency model.

Real Impact: Use ThreadPoolExecutor for I/O-bound work, multiprocessing for CPU-bound. Knowing the GIL distinction prevents the most common Python performance mistake.

Real-World Analogy

Think of threads as call-center agents sharing one phone book:

  • Thread = an agent who handles one customer at a time but can switch when the customer is on hold
  • GIL = only one agent may read the phone book at a time — fine if everyone's on hold
  • Lock = a sticky note reserving a phone book page for one agent
  • ThreadPoolExecutor = the supervisor who hands out customers from a queue
  • Daemon thread = a background agent that gets sent home when the office closes

The threading module provides OS-level threads. Threads share memory and are great for I/O-bound workloads (network, disk). Due to the GIL (covered below), they don't speed up CPU-bound Python code.

import threading
import time

def worker(n: int):
    print(f"worker {n} starting")
    time.sleep(1)
    print(f"worker {n} done")

threads = [threading.Thread(target=worker, args=(i,)) for i in range(5)]
for t in threads: t.start()
for t in threads: t.join()                   # wait for completion

Daemon Threads

t = threading.Thread(target=poll_loop, daemon=True)   # poll_loop: any long-running function
t.start()
# Daemon threads are killed when the main program exits — useful for background
# loops, but they get no chance to clean up

The Global Interpreter Lock (GIL)

CPython's GIL ensures only one thread executes Python bytecode at a time. This means:

import requests

# CPU-bound — threading provides ~0 speedup (the GIL serializes bytecode execution)
def crunch():
    return sum(i * i for i in range(10_000_000))

# I/O-bound — threading provides a big speedup (the GIL is released while waiting on the socket)
def fetch(url):
    return requests.get(url).text
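
A quick timing check makes this concrete — a minimal sketch reusing crunch from above (the thread count of 4 is arbitrary):

import time, threading

start = time.perf_counter()
workers = [threading.Thread(target=crunch) for _ in range(4)]
for t in workers: t.start()
for t in workers: t.join()
print(f"{time.perf_counter() - start:.1f}s")   # roughly 4 sequential crunch() calls — no speedup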

Python 3.13+ free-threaded build

Python 3.13 ships an experimental free-threaded (PEP 703) build that removes the GIL. Once mature, this will allow truly parallel Python threads. For now, the GIL is the rule.
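
On 3.13 you can ask the interpreter at runtime whether the GIL is active — a small check (sys._is_gil_enabled is private and only exists on 3.13+):

import sys

if hasattr(sys, "_is_gil_enabled"):
    print("GIL enabled:", sys._is_gil_enabled())   # False on a free-threaded build
else:
    print("pre-3.13: the GIL is always on")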

concurrent.futures — High-Level API

The recommended way to use threads in modern Python. Submit functions to an executor, get futures back.

from concurrent.futures import ThreadPoolExecutor
import requests

urls = ["https://example.com"] * 20

with ThreadPoolExecutor(max_workers=10) as ex:
    # map returns results in input order
    for result in ex.map(lambda u: requests.get(u).status_code, urls):
        print(result)

    # Or submit individually for finer control
    futures = [ex.submit(requests.get, u) for u in urls]
    for f in futures:
        print(f.result().status_code)
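
When you care about completion order rather than input order, as_completed yields each future as it finishes — a sketch reusing the same urls list:

from concurrent.futures import ThreadPoolExecutor, as_completed

with ThreadPoolExecutor(max_workers=10) as ex:
    futures = {ex.submit(requests.get, u): u for u in urls}
    for f in as_completed(futures):              # yields in completion order
        print(futures[f], f.result().status_code)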

Synchronization Primitives

Even with the GIL, compound operations are not atomic: counter += 1 compiles to several bytecodes (load, add, store), and a thread switch can land between them. Use locks for shared mutable state.

import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:           # automatic acquire/release
            counter += 1
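
To see why the lock matters, drive increment from several threads — a minimal check (the thread and iteration counts are arbitrary):

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)               # 1_000_000 with the lock; typically less without it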

Primitives

Primitive        Use
Lock             Mutex — one holder at a time
RLock            Reentrant — same thread can acquire multiple times
Semaphore(n)     Up to n holders simultaneously
Event            One thread signals, others wait
Condition        Lock + signal — wait until a predicate becomes true
Barrier(n)       All n threads wait until everyone arrives
queue.Queue      Thread-safe FIFO queue — preferred for producer/consumer
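
As one example from the table, Semaphore(n) throttles concurrency — a sketch capping HTTP fetches at three at a time (throttled_fetch is an illustrative name):

import threading
import requests

sem = threading.Semaphore(3)              # at most 3 holders at once

def throttled_fetch(url):
    with sem:                             # acquire on entry, release on exit
        return requests.get(url).text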

Producer/Consumer with Queue

import queue, threading

q = queue.Queue(maxsize=100)

def process(item):
    ...                       # stand-in for real work

def producer():
    for i in range(1000):
        q.put(i)              # blocks if the queue is full
    q.put(None)               # sentinel signals the consumer to stop

def consumer():
    while True:
        item = q.get()
        if item is None:
            break
        process(item)
        q.task_done()
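
A minimal driver for the pair above — with several consumers you would enqueue one sentinel per consumer:

p = threading.Thread(target=producer)
c = threading.Thread(target=consumer)
p.start(); c.start()
p.join(); c.join()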

Thread-Local Storage

Each thread gets its own attributes — useful for request-scoped data without explicit plumbing.

import threading

local_data = threading.local()

def handle_request(req):
    local_data.user = req.user
    process()

def process():
    print(local_data.user)         # whatever was set in THIS thread
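
To see the isolation, run handle_request from two threads — SimpleNamespace stands in for a real request object:

from types import SimpleNamespace

for name in ("alice", "bob"):
    threading.Thread(target=handle_request,
                     args=(SimpleNamespace(user=name),)).start()
# each thread prints its own user — the attribute never leaks between threads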

🎯 Practice Exercises

Exercise 1: URL fetcher

Fetch 50 URLs with ThreadPoolExecutor. Compare wall-clock time against a sequential version.

Exercise 2: Counter race

Spawn 10 threads incrementing a counter 100k times. Show the count is less than 1M without a lock. Fix with Lock.

Exercise 3: Producer-consumer

Build a thread pool where producers fill a Queue and consumers drain it. Use a sentinel value to shut down cleanly.

Exercise 4: CPU vs I/O

Run a CPU-bound function with threads and time it. Repeat with an I/O-bound function. Observe why one scales and the other doesn't.