Why Message Queues Matter
The Problem: Synchronous communication between services creates tight coupling. If Service B is down, Service A fails too. If Service B is slow, Service A blocks waiting.
The Solution: Message queues decouple producers from consumers. The producer sends a message and moves on. The consumer processes it when ready.
Real Impact: LinkedIn processes over 7 trillion messages per day through Apache Kafka, enabling real-time analytics, notifications, and data pipelines.
Real-World Analogy
Think of a message queue like a post office:
- Producer = Person dropping off a letter (sends and forgets)
- Queue = Post office mailbox (stores messages safely)
- Consumer = Recipient (picks up mail on their schedule)
- Topic = Address routing (messages go to the right destination)
- Acknowledgment = Delivery confirmation (consumer confirms receipt)
Benefits of Asynchronous Messaging
Decoupling
Producers and consumers can be developed, deployed, and scaled independently. Neither needs to know about the other's implementation.
Resilience
If a consumer crashes, messages persist in the queue. When it recovers, processing resumes from where it left off. No data loss.
Load Leveling
Queues absorb traffic spikes. Even if 10,000 requests arrive in a second, consumers process them at their own sustainable rate.
Fan-Out
A single event can trigger multiple independent consumers: send an email, update analytics, notify a partner system -- all from one message.
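The fan-out idea can be sketched in a few lines of plain Python: one published event is delivered independently to every subscribed handler. This is a toy in-memory bus, not a real broker, and the handler names (email, analytics, partner) are illustrative:

```python
from collections import defaultdict

class FanOutBus:
    """Toy pub/sub bus: every subscriber gets its own copy of each event."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Each handler runs independently; one slow or failing
        # consumer does not block the others in a real broker.
        for handler in self.subscribers[topic]:
            handler(event)

bus = FanOutBus()
log = []
bus.subscribe('order-placed', lambda e: log.append(('email', e['order_id'])))
bus.subscribe('order-placed', lambda e: log.append(('analytics', e['order_id'])))
bus.subscribe('order-placed', lambda e: log.append(('partner', e['order_id'])))

bus.publish('order-placed', {'order_id': 42})
# All three consumers saw the single event
```

In Kafka, the same effect comes for free: each consumer group gets its own copy of the topic's messages.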
Point-to-Point vs Pub/Sub
| Aspect | Point-to-Point | Pub/Sub |
|---|---|---|
| Delivery | One message to one consumer | One message to many subscribers |
| Use Case | Task distribution, work queues | Event broadcasting, notifications |
| Coupling | Producer aware of queue | Producer unaware of subscribers |
| Example | Job processing queue | Order placed triggers email + analytics + inventory |
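The two delivery models in the table can be contrasted with a toy in-memory sketch (not a real broker): point-to-point hands each message to exactly one consumer, while pub/sub gives every subscriber group a full copy of the stream:

```python
import itertools

def point_to_point(messages, consumers):
    """Each message is handled by exactly one consumer (round-robin here)."""
    assignments = {c: [] for c in consumers}
    for msg, consumer in zip(messages, itertools.cycle(consumers)):
        assignments[consumer].append(msg)
    return assignments

def pub_sub(messages, subscriber_groups):
    """Every subscriber group receives a copy of every message."""
    return {group: list(messages) for group in subscriber_groups}

jobs = point_to_point(['m1', 'm2', 'm3', 'm4'], ['worker-a', 'worker-b'])
# worker-a gets ['m1', 'm3'], worker-b gets ['m2', 'm4']

events = pub_sub(['order-placed'], ['email', 'analytics', 'inventory'])
# each subscriber group gets ['order-placed']
```

Kafka expresses both with one mechanism: consumers sharing a group_id split the partitions (point-to-point), while distinct group_ids each read the whole topic (pub/sub).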
Apache Kafka Deep Dive
Apache Kafka is a distributed event streaming platform capable of handling trillions of events per day. Originally developed at LinkedIn, it is now used by over 80% of Fortune 100 companies for real-time data pipelines and streaming applications.
Key Kafka Concepts
Kafka Architecture Components
- Broker: A single Kafka server that stores data and serves clients. A cluster has multiple brokers.
- Topic: A category or feed name to which records are published. Analogous to a database table.
- Partition: A topic is split into partitions for parallelism. Each partition is an ordered, immutable sequence of records.
- Offset: A sequential ID uniquely identifying each record within a partition.
- Consumer Group: A set of consumers that cooperatively consume a topic. Each partition is assigned to exactly one consumer in the group.
- Replication Factor: Number of copies of each partition across brokers. A factor of 3 tolerates 2 broker failures.
```python
from kafka import KafkaProducer, KafkaConsumer
import json

# ─── Producer: Send order events ───
producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
    key_serializer=lambda k: k.encode('utf-8'),
    acks='all',      # Wait for all in-sync replicas
    retries=3,       # Retry on transient failures
    linger_ms=10,    # Batch messages for efficiency
)

def publish_order_event(order):
    """Publish an order event, keyed by user_id for partition ordering."""
    future = producer.send(
        topic='order-events',
        key=str(order['user_id']),
        value={
            'event_type': 'order_placed',
            'order_id': order['id'],
            'user_id': order['user_id'],
            'total': order['total'],
            'items': order['items'],
        },
    )
    # Block until sent (or timeout)
    record_metadata = future.get(timeout=10)
    print(f"Sent to partition {record_metadata.partition} offset {record_metadata.offset}")

# ─── Consumer: Process order events ───
consumer = KafkaConsumer(
    'order-events',
    bootstrap_servers=['localhost:9092'],
    group_id='email-notifications',
    value_deserializer=lambda m: json.loads(m.decode('utf-8')),
    auto_offset_reset='earliest',   # Start from beginning if no committed offset
    enable_auto_commit=False,       # Manual commit for reliability
)

for message in consumer:
    event = message.value
    print(f"Processing: {event['event_type']} for order {event['order_id']}")
    # Process the event (send_order_confirmation_email is defined elsewhere)
    send_order_confirmation_email(event)
    # Commit offset only after successful processing
    consumer.commit()
```
RabbitMQ vs Kafka
| Feature | RabbitMQ | Apache Kafka |
|---|---|---|
| Model | Message broker (push) | Distributed log (pull) |
| Message Retention | Deleted after acknowledgment | Retained for configurable duration |
| Ordering | Per-queue FIFO | Per-partition ordering |
| Throughput | ~50K messages/sec | ~1M+ messages/sec |
| Replay | Not supported (message consumed once) | Consumers can rewind to any offset |
| Best For | Task queues, routing, complex patterns | Event streaming, data pipelines, logs |
| Protocol | AMQP, MQTT, STOMP | Custom binary over TCP |
Event-Driven Architecture Patterns
Event Notification
Services emit events when something happens. Consumers decide what to do. Minimal coupling, but consumers may need to call back for full details.
Event-Carried State Transfer
Events contain the full state needed by consumers. No callbacks required. Higher bandwidth but complete decoupling.
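The difference between the two patterns is visible in the payloads themselves. A notification event carries only an identifier, so the consumer must call back to the producing service for details; a state-transfer event embeds everything the consumer needs. A minimal sketch (field names are illustrative):

```python
# Event notification: thin payload, consumer must fetch details
notification_event = {
    'event_type': 'order_placed',
    'order_id': 'ord-123',  # consumer calls the order service for the rest
}

# Event-carried state transfer: fat payload, no callback needed
state_transfer_event = {
    'event_type': 'order_placed',
    'order_id': 'ord-123',
    'user_id': 'usr-9',
    'total': 59.90,
    'items': [{'sku': 'A-1', 'qty': 2}],
}

def needs_callback(event):
    """A consumer that needs item details must call back unless they are embedded."""
    return 'items' not in event
```

The trade-off from the text in code form: the thin event is cheaper on the wire but re-couples the consumer to the producer's API at read time.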
Event Sourcing
Store all changes as a sequence of events rather than current state. Enables full audit trail, time-travel, and rebuilding state from events.
CQRS (Command Query Responsibility Segregation)
Separate write (command) and read (query) models. Write to an event store, project into read-optimized views. Scales reads and writes independently.
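A toy in-memory sketch of the split (all names are illustrative): commands append events to the write-side store, a projector folds each event into a read-optimized view, and queries touch only that view:

```python
event_store = []   # write model: append-only list of events
read_view = {}     # read model: order_id -> current status

def handle_command(command):
    """Commands are recorded as events, never as in-place updates."""
    event = {'order_id': command['order_id'], 'status': command['new_status']}
    event_store.append(event)
    project(event)

def project(event):
    """Projection keeps the read view in sync with the event stream."""
    read_view[event['order_id']] = event['status']

def query_status(order_id):
    """Queries never touch the event store, only the projection."""
    return read_view.get(order_id)

handle_command({'order_id': 'ord-1', 'new_status': 'placed'})
handle_command({'order_id': 'ord-1', 'new_status': 'shipped'})
# query_status('ord-1') now returns 'shipped'; event_store keeps both events
```

In a real system the projection usually runs asynchronously off the event stream, which is what lets reads and writes scale independently.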
Common Pitfall: Distributed Transactions
Problem: You need to update a database AND publish an event atomically. If either fails, the system becomes inconsistent.
Solution: Use the Outbox Pattern: write the event to an outbox table in the same database transaction, then a separate process reads the outbox and publishes to the message broker. This guarantees at-least-once delivery without distributed transactions.
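A minimal sketch of the outbox pattern, using SQLite to stand in for the service's database (table and function names are illustrative, and a real relay would poll continuously and publish to the broker instead of a callback):

```python
import json
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total REAL)")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT, published INTEGER DEFAULT 0)")

def place_order(order_id, total):
    """Business write and event write share one local transaction."""
    with conn:  # commits both inserts atomically, or neither
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({'event_type': 'order_placed', 'order_id': order_id}),),
        )

def relay_outbox(publish):
    """Separate process: read unpublished rows, publish, then mark as done."""
    rows = conn.execute("SELECT id, payload FROM outbox WHERE published = 0").fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))  # crash here -> row republished: at-least-once
        conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    conn.commit()

place_order('ord-1', 99.5)
delivered = []
relay_outbox(delivered.append)
```

The key property: if the transaction in place_order fails, neither the order nor the event exists, so the database and the broker can never disagree about what happened.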
Exactly-Once Delivery
Message delivery guarantees are one of the hardest problems in distributed systems. Understanding the trade-offs is essential.
| Guarantee | Description | Trade-off |
|---|---|---|
| At-Most-Once | Message may be lost, never duplicated | Fire-and-forget. Fastest but messages can be lost. |
| At-Least-Once | Message never lost, may be duplicated | Retries ensure delivery. Consumer must be idempotent. |
| Exactly-Once | Message delivered exactly once | Highest overhead. Kafka achieves this with idempotent producers + transactions spanning the consume-process-produce cycle. |
Pro Tip: Idempotency is Key
In practice, most systems use at-least-once delivery with idempotent consumers. This means processing the same message twice produces the same result. Use techniques like:
- Deduplication table: Store processed message IDs and skip duplicates
- Idempotency keys: Use a unique request ID to ensure operations are applied only once
- Database upserts: Use INSERT ... ON CONFLICT DO UPDATE (or your database's equivalent upsert) instead of a plain INSERT
Practice Problems
Medium Order Processing Pipeline
Design a message queue system for an e-commerce order pipeline:
- Order placed triggers: payment processing, inventory update, email notification
- Payment success triggers: shipping label creation
- Handle failures at each stage without losing orders
Use a topic per event type (order-placed, payment-completed, etc.). Each downstream service subscribes to the relevant topic. Use dead-letter queues for failed messages.
```python
import time

# Topics:
#  1. order-events (order placed, updated, cancelled)
#  2. payment-events (payment success, failure)
#  3. shipping-events (label created, shipped)
#  4. notification-events (email, SMS triggers)

# Consumer Groups:
#  - payment-service subscribes to order-events
#  - inventory-service subscribes to order-events
#  - email-service subscribes to order-events + payment-events
#  - shipping-service subscribes to payment-events

# Dead Letter Queue (DLQ) pattern:
def process_with_retry(message, max_retries=3):
    for attempt in range(max_retries):
        try:
            process_order(message)   # business logic, defined elsewhere
            return True
        except TransientError:       # illustrative exception type
            time.sleep(2 ** attempt)  # Exponential backoff
    # All retries failed -- send to DLQ for inspection and replay
    producer.send('order-events-dlq', value=message)
    return False
```
Medium Kafka Partition Strategy
You have a Kafka topic receiving 100K events/sec for a ride-sharing app:
- Choose an appropriate partition key to ensure ride events for the same ride are ordered
- Determine the number of partitions needed if each consumer handles 5K events/sec
- Handle the hot-partition problem if some cities have 10x more rides
Use ride_id as the partition key for ordering. You need at least 100K/5K = 20 partitions. For hot partitions, consider a compound key or random suffix for high-volume cities.
```python
# 1. Partition key: ride_id (ensures all events for one ride go to the same partition)
producer.send('ride-events', key=ride_id, value=event)

# 2. Partitions needed: ceil(100K / 5K) = 20 minimum
#    Add headroom: 30-40 partitions for growth

# 3. Hot-partition mitigation: compound key for high-volume cities
def partition_key(ride_id, city):
    if city in HIGH_VOLUME_CITIES:
        # Deterministic shard suffix: spreads a hot city across 10 keys
        # while still mapping all events for one ride to one key
        shard = hash(ride_id) % 10
        return f"{city}-{shard}"
    return ride_id  # Normal cities use ride_id directly
```
Hard Event Sourcing Design
Design an event-sourced banking system:
- Account balance is derived from a stream of debit/credit events
- Support querying balance at any point in time
- Handle concurrent transactions on the same account
Store events in an append-only log with sequence numbers. Use snapshots every N events for fast state recovery. Use optimistic concurrency control (expected version) to handle conflicts.
```python
class ConcurrencyError(Exception):
    pass

class InsufficientFunds(Exception):
    pass

class EventSourcedAccount:
    def __init__(self, account_id):
        self.account_id = account_id
        self.balance = 0
        self.version = 0
        self.events = []

    def apply_event(self, event):
        if event['type'] == 'credited':
            self.balance += event['amount']
        elif event['type'] == 'debited':
            self.balance -= event['amount']
        self.version += 1

    def debit(self, amount, expected_version):
        # Optimistic concurrency control
        if self.version != expected_version:
            raise ConcurrencyError("Account modified concurrently")
        if self.balance < amount:
            raise InsufficientFunds()
        event = {'type': 'debited', 'amount': amount}
        self.apply_event(event)
        self.events.append(event)

    def balance_at(self, target_version):
        """Replay events up to a point in time (version number)."""
        balance = 0
        for event in self.events[:target_version]:
            if event['type'] == 'credited':
                balance += event['amount']
            elif event['type'] == 'debited':
                balance -= event['amount']
        return balance
```
Quick Reference
Message Queue Decision Guide
| Requirement | Recommended Tool | Reason |
|---|---|---|
| Task distribution | RabbitMQ / SQS | Push-based, per-message acknowledgment |
| Event streaming | Apache Kafka | Log-based, replay, high throughput |
| Serverless fan-out | AWS SNS + SQS | Managed, no infrastructure to run |
| Real-time analytics | Kafka + Flink/Spark | Stream processing on top of event log |
| Simple pub/sub | Redis Pub/Sub | Lightweight, in-memory, no persistence |
Key Takeaways
- Message queues decouple services and enable asynchronous processing
- Kafka is a distributed log; RabbitMQ is a traditional message broker
- Choose partition keys carefully to maintain ordering and avoid hot spots
- Design consumers to be idempotent for at-least-once delivery
- Use the outbox pattern for atomic database + event operations
- Dead-letter queues catch messages that fail processing repeatedly