How Producers Work
A Kafka producer is any application that sends messages to Kafka. When you call producer.send(), the message doesn't go directly to a broker. It passes through several stages: serialization, partitioning, batching, and finally transmission. Understanding this pipeline is key to configuring producers for your throughput and durability needs.
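The four stages can be sketched in plain Python. This is a toy model, not how any real client is implemented: the JSON serializer, the CRC32 hash, the partition count of 3, and every function name below are assumptions chosen for illustration.

```python
import json
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # assumed partition count for the illustration


def serialize(obj) -> bytes:
    """Stage 1: turn a Python object into bytes (JSON is an arbitrary choice)."""
    return json.dumps(obj, sort_keys=True).encode("utf-8")


def choose_partition(key_bytes: bytes) -> int:
    """Stage 2: map the key bytes to a partition with a stable hash."""
    return zlib.crc32(key_bytes) % NUM_PARTITIONS


def send(batches, key, value) -> int:
    """Stages 1-3: serialize, pick a partition, append to that partition's batch."""
    key_bytes = serialize(key)
    value_bytes = serialize(value)
    partition = choose_partition(key_bytes)
    batches[partition].append((key_bytes, value_bytes))
    return partition


def transmit(batches) -> dict:
    """Stage 4: hand each non-empty batch to the network (here we only count records)."""
    sent = {p: len(records) for p, records in batches.items() if records}
    batches.clear()
    return sent


batches = defaultdict(list)
p1 = send(batches, "user_42", {"action": "login"})
p2 = send(batches, "user_42", {"action": "logout"})
sent = transmit(batches)
print(p1 == p2, sent)
```

Both records share a key, so they land in the same per-partition batch and leave in a single transmission.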
The Producer Record
Every message you send is a ProducerRecord with these fields:
| Field | Required | Purpose |
|---|---|---|
| topic | Yes | Which topic to send to |
| value | Yes | The message payload (bytes) |
| key | No | Used for partitioning. Same key = same partition = ordering |
| partition | No | Override automatic partitioning |
| headers | No | Key-value metadata (like HTTP headers) |
| timestamp | No | Event time (defaults to current time) |
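To make the required and optional fields concrete, here is a minimal sketch of the record as a Python dataclass. `RecordSketch` is a hypothetical stand-in written for this example, not a class from any Kafka client library, and the header name and values are made up.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RecordSketch:
    """Hypothetical stand-in for a producer record (not a real client class)."""
    topic: str                                    # required
    value: bytes                                  # required payload
    key: Optional[bytes] = None                   # optional; drives partitioning
    partition: Optional[int] = None               # optional explicit override
    headers: list = field(default_factory=list)   # list of (str, bytes) pairs
    timestamp_ms: Optional[int] = None            # None means "use the current time"

    def __post_init__(self):
        if not self.topic:
            raise ValueError("topic is required")
        if self.partition is not None and self.partition < 0:
            raise ValueError("partition must be non-negative")


# Only the required fields.
minimal = RecordSketch(topic="page-views", value=b'{"url": "/home"}')

# Every field except the partition override.
full = RecordSketch(
    topic="user-events",
    value=b'{"action": "login"}',
    key=b"user_42",
    headers=[("trace-id", b"abc123")],
    timestamp_ms=1705314600000,  # 2024-01-15T10:30:00Z in epoch milliseconds
)
```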
Partitioning Strategies
The partitioner decides which partition a message goes to. This is critical because ordering is only guaranteed within a single partition. Messages with the same key go to the same partition as long as the topic's partition count does not change, which ensures ordered processing per entity (e.g., all events for user 42 go to partition 3).
# Python producer example (confluent-kafka)
from confluent_kafka import Producer

producer = Producer({'bootstrap.servers': 'localhost:9092'})

# Key-based partitioning: all events for user_42 go to the same partition
producer.produce(
    topic='user-events',
    key='user_42',  # Determines partition (hash of key % num_partitions)
    value='{"action": "login", "timestamp": "2024-01-15T10:30:00Z"}',
)

# No key: the partitioner spreads messages across partitions (no ordering guarantee)
producer.produce(
    topic='page-views',
    value='{"url": "/home", "duration": 30}',
)

producer.flush()  # Block until all queued messages are delivered or have failed
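The "hash of key % num_partitions" rule can be tried out without a broker. The sketch below uses `zlib.crc32` as the hash purely for illustration; real clients use their own hash functions, so the partition numbers here will not match what a real producer picks. `partition_for` is a hypothetical helper.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Stable hash of the key bytes, modulo the partition count."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = ["user_42", "user_7", "user_42", "user_99", "user_42"]
assignments = [partition_for(k, 6) for k in keys]

# Every occurrence of user_42 lands on the same partition.
user_42_partitions = {p for k, p in zip(keys, assignments) if k == "user_42"}

# Count how many of 1000 keys would move if the topic grew from 6 to 8 partitions.
moved = sum(
    1
    for i in range(1000)
    if partition_for(f"user_{i}", 6) != partition_for(f"user_{i}", 8)
)
print(user_42_partitions, moved)
```

The second count shows why adding partitions to a keyed topic deserves care: many keys map to a different partition afterwards, so per-key ordering only holds within each partition count. Python's built-in `hash()` is avoided here because string hashes are salted per process and would not be stable across runs.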
Acknowledgment Modes (Durability vs Speed)
The acks setting controls how many brokers must confirm receipt before the producer considers a message "sent." This is the fundamental tradeoff between durability and speed:
| Setting | Behavior | Durability | Latency | Use When |
|---|---|---|---|---|
| acks=0 | Don't wait for any confirmation | May lose data | Fastest | Metrics, logs you can afford to lose |
| acks=1 | Leader confirms write | Lose data if leader dies before replication | Fast | Most applications |
| acks=all | All in-sync replicas confirm | No loss of acknowledged writes (with min.insync.replicas of at least 2) | Slower | Financial data, orders, critical events |
⚠️ Common Mistake: Using acks=0 or acks=1 for critical data
Wrong: Sending payment events with acks=1.
Why: If the leader broker dies after acknowledging but before replicating, the message is lost. For a payment, that means money moved but you have no record of it.
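Instead: Use acks=all on the producer together with min.insync.replicas=2 (a topic or broker setting, typically on a topic with replication factor 3) for any data you can't afford to lose.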
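The failure described above can be modeled in a few lines. This is a deliberately simplified sketch with one leader and one follower; `write` is a hypothetical function, and real brokers handle replication and leader election in far more detail.

```python
def write(acks, leader_fails_before_replication):
    """Model one write to a leader plus one follower; return (acked, survives)."""
    follower_has_copy = not leader_fails_before_replication
    if acks == "all":
        # The ack is only sent once the follower also has the message.
        acked = follower_has_copy
    else:
        # acks=0 never waits; acks=1 waits only for the leader's local write.
        acked = True
    # If the leader fails, the follower's copy is all that remains.
    survives = follower_has_copy
    return acked, survives


results = {}
for acks in (0, 1, "all"):
    acked, survives = write(acks, leader_fails_before_replication=True)
    # "Silent loss" means the producer believes the write succeeded but it is gone.
    results[acks] = {"acked": acked, "survives": survives,
                     "silent_loss": acked and not survives}

for acks, outcome in results.items():
    print(acks, outcome)
```

In this model the message is gone in all three modes when the leader fails early. The difference is that with acks=all the producer never receives an acknowledgment, so it knows to retry, whereas with acks=0 or acks=1 it has already moved on.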
Batching and Compression
Producers don't send messages one at a time; they group them into batches for efficiency. Three key settings control this:
batch.size: Maximum batch size in bytes (default: 16 KB in the Java client; other clients differ). Larger batches mean fewer network round trips and higher throughput.
linger.ms: How long to wait for more messages before sending (default: 0 ms in the Java client historically; other clients and newer versions may differ). Setting it to 5-10 ms can substantially improve throughput by filling batches.
compression.type: Compress batches before sending. Options: none, gzip, snappy, lz4, zstd. Recommended: lz4 or zstd for a good balance of speed and compression ratio.
# High-throughput producer configuration
producer = Producer({
    'bootstrap.servers': 'kafka1:9092,kafka2:9092,kafka3:9092',
    'acks': 'all',
    'compression.type': 'lz4',
    'batch.size': 65536,          # 64KB batches
    'linger.ms': 10,              # Wait 10ms to fill batches
    'retries': 3,
    'enable.idempotence': True,   # Prevent duplicates on retry
})
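To see why a small linger.ms matters, the following sketch counts how many send requests a simplified batcher would make. It assumes a batch is flushed either when the next message would overflow batch.size or when linger.ms has elapsed since the batch's first message; `count_requests` is a hypothetical helper, and real clients also batch messages that queue up while a request is in flight.

```python
def count_requests(arrivals_ms, sizes, batch_size, linger_ms):
    """Count send requests for messages with given arrival times (ms) and sizes (bytes)."""
    requests = 0
    batch_bytes = 0
    batch_start = None  # arrival time of the first message in the open batch
    for t, size in zip(arrivals_ms, sizes):
        if batch_start is not None and (
            t - batch_start > linger_ms or batch_bytes + size > batch_size
        ):
            requests += 1  # flush the open batch
            batch_bytes, batch_start = 0, None
        if batch_start is None:
            batch_start = t
        batch_bytes += size
    if batch_start is not None:
        requests += 1  # flush whatever is left at the end
    return requests


# 1000 messages of 200 bytes each, one arriving every millisecond.
arrivals = list(range(1000))
sizes = [200] * 1000

no_linger = count_requests(arrivals, sizes, batch_size=16384, linger_ms=0)
with_linger = count_requests(arrivals, sizes, batch_size=16384, linger_ms=10)
print(no_linger, with_linger)
```

Under these assumptions, lingering for 10 ms takes the producer from one request per message to roughly one request per eleven messages, at the cost of up to 10 ms of added latency per message.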
A good starting point for production: acks=all, enable.idempotence=true, compression.type=lz4, and linger.ms=5. This gives you durability, deduplication, compression, and good throughput out of the box.
🔍 Deep Dive: Idempotent Producers
Network failures can cause a producer to retry sending a message that was actually received by the broker, creating duplicates. enable.idempotence=true prevents this by assigning each producer a unique ID and each message a sequence number. The broker detects and discards duplicate deliveries. This is transparent — your code doesn't change. Always enable it unless you have a specific reason not to.
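The deduplication idea can be sketched with a toy broker that remembers the last sequence number it accepted from each producer ID. `BrokerSketch` is hypothetical and heavily simplified; real brokers track this state per partition and handle details such as producer restarts.

```python
class BrokerSketch:
    """Toy broker that tracks the highest sequence number seen per producer ID."""

    def __init__(self):
        self.log = []
        self.last_seq = {}  # producer_id -> last accepted sequence number

    def append(self, producer_id, seq, value):
        if seq <= self.last_seq.get(producer_id, -1):
            # Already written: acknowledge again without appending a second copy.
            return "duplicate"
        self.last_seq[producer_id] = seq
        self.log.append(value)
        return "appended"


broker = BrokerSketch()
first = broker.append(producer_id=7, seq=0, value="order-1001")
# Suppose the ack is lost in transit, so the producer retries with the same sequence number.
retry = broker.append(producer_id=7, seq=0, value="order-1001")
second = broker.append(producer_id=7, seq=1, value="order-1002")
print(first, retry, second, broker.log)
```

The retry is recognized and dropped, so the log holds each order exactly once even though the producer sent the first one twice.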
Practice Exercises
Easy: Hello World Variant
Modify the example to accept user input and print a personalized greeting.
Easy: Code Reading
Read through the code examples above and predict the output before running them.
Medium: Extend the Example
Take one code example and add error handling, input validation, or a new feature.