⏱️ 50 min read 🎯 Beginner Friendly 🔧 Hands-On Examples 📚 Complete Guide

Introduction to Service Communication

In a microservices architecture, services must communicate to deliver business functionality. Choosing the right communication pattern is critical for performance, reliability, and scalability.

Why Communication Matters

Poor communication patterns can lead to:

  • High latency and slow response times
  • Tight coupling between services
  • Cascading failures during outages
  • Difficult debugging and troubleshooting

Synchronous vs Asynchronous Communication

Aspect        | Synchronous                             | Asynchronous
Analogy       | Phone call - wait for response          | Text message - continue without waiting
Examples      | REST, gRPC, GraphQL                     | Message Queues, Event Streams, Pub/Sub
Coupling      | Tight - both services must be available | Loose - services don't need to be online simultaneously
Response Time | Immediate - blocking                    | Eventual - non-blocking
Use Cases     | User-facing APIs, real-time queries     | Background jobs, event notifications, data processing
Complexity    | Simple - request/response               | Complex - requires message broker

Communication Pattern Categories

  1. Request/Response (Synchronous): Client sends request, waits for response
    • REST APIs (HTTP/JSON)
    • gRPC (HTTP/2 + Protocol Buffers)
    • GraphQL (Query language over HTTP)
  2. Event-Driven (Asynchronous): Services emit events, others react
    • Message Queues (RabbitMQ, AWS SQS)
    • Event Streams (Apache Kafka, AWS Kinesis)
    • Pub/Sub (Redis Pub/Sub, Google Pub/Sub)
  3. Hybrid: Combination of sync and async
    • API Gateway + Message Queue
    • CQRS (Command Query Responsibility Segregation)

When to Use Synchronous
  • Need immediate response (e.g., user login, product search)
  • Data must be consistent right away
  • Simple request/response flow
  • Client needs to know if operation succeeded immediately

When to Use Asynchronous
  • Long-running operations (video processing, report generation)
  • Fire-and-forget scenarios (sending email, logging)
  • Need to decouple services for scalability
  • Handling traffic spikes with queue buffering
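The phone-call/text-message contrast can be sketched with the standard library: a synchronous call blocks until its result returns, while an asynchronous send just enqueues work and moves on (`handle_order` is a made-up stand-in for a downstream service):

```python
import queue
import threading
import time

def handle_order(order_id):
    """Stand-in for a downstream service call."""
    time.sleep(0.1)  # simulated work
    return f"processed {order_id}"

# Synchronous: the caller blocks ~0.1s before the next line runs
result = handle_order("ORD-1")

# Asynchronous: the caller enqueues the work and moves on immediately
work_queue = queue.Queue()

def worker():
    while True:
        order_id = work_queue.get()
        if order_id is None:          # sentinel to stop the worker
            work_queue.task_done()
            break
        handle_order(order_id)
        work_queue.task_done()

threading.Thread(target=worker, daemon=True).start()
work_queue.put("ORD-2")               # returns instantly; worker processes later
work_queue.put(None)
work_queue.join()                     # (only so the demo waits before exiting)
```

The producer never waits on `handle_order` in the async path - exactly the decoupling the table above describes.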

Synchronous Protocols Deep Dive

Understanding REST, gRPC, and GraphQL - when to use each and how they compare.

REST APIs - The Universal Standard

REST (Representational State Transfer) is the most common style of service communication, built on HTTP methods and JSON.

Why REST Dominates

  • Universal - works everywhere (browsers, mobile, IoT)
  • Human-readable JSON format
  • Well-understood by developers
  • Extensive tooling and libraries

REST Principles:

  • Stateless: Each request contains all information needed
  • Resource-based: Everything is a resource with a URI
  • HTTP Methods: GET (read), POST (create), PUT (update), DELETE (delete)
  • Standard Status Codes: 200 OK, 404 Not Found, 500 Internal Server Error
Python - Flask REST API
from flask import Flask, jsonify, request

app = Flask(__name__)

# In-memory database
products = {
    1: {"id": 1, "name": "Laptop", "price": 999.99, "stock": 50},
    2: {"id": 2, "name": "Mouse", "price": 29.99, "stock": 200}
}

@app.route('/products', methods=['GET'])
def get_products():
    return jsonify(list(products.values()))

@app.route('/products/<int:product_id>', methods=['GET'])
def get_product(product_id):
    product = products.get(product_id)
    if not product:
        return jsonify({"error": "Product not found"}), 404
    return jsonify(product)

@app.route('/products', methods=['POST'])
def create_product():
    data = request.json
    product_id = max(products.keys(), default=0) + 1
    products[product_id] = {
        "id": product_id,
        "name": data['name'],
        "price": data['price'],
        "stock": data['stock']
    }
    return jsonify(products[product_id]), 201

@app.route('/products/<int:product_id>', methods=['PUT'])
def update_product(product_id):
    if product_id not in products:
        return jsonify({"error": "Product not found"}), 404
    data = request.json
    products[product_id].update(data)
    return jsonify(products[product_id])

@app.route('/products/<int:product_id>', methods=['DELETE'])
def delete_product(product_id):
    if product_id not in products:
        return jsonify({"error": "Product not found"}), 404
    del products[product_id]
    return jsonify({"message": "Product deleted"}), 200

gRPC - High Performance RPC

gRPC uses HTTP/2 and Protocol Buffers for fast, efficient communication between services.

Feature         | REST                             | gRPC
Protocol        | HTTP/1.1                         | HTTP/2
Data Format     | JSON (text)                      | Protocol Buffers (binary)
Performance     | Good                             | Excellent (often 7-10x faster)
Streaming       | Limited - request/response only  | Client, server, and bi-directional streaming
Browser Support | Native                           | Requires gRPC-Web
Code Generation | Manual                           | Automatic from .proto files
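To see why the binary format is so much smaller, serialize the same product record both ways — `struct` below is only a stand-in for Protocol Buffers (which uses tag/varint encoding rather than this fixed layout):

```python
import json
import struct

product = {"id": 1, "name": "Laptop", "price": 999.99, "stock": 50}

# JSON: field names and punctuation travel with every message
as_json = json.dumps(product).encode("utf-8")

# Binary: the schema is agreed on up front, so only values are sent.
# Layout: int32 id, uint8 name length + bytes, float64 price, int32 stock
name = product["name"].encode("utf-8")
as_binary = struct.pack(f"<iB{len(name)}sdi",
                        product["id"], len(name), name,
                        product["price"], product["stock"])

print(len(as_json), len(as_binary))  # 57 vs 23 bytes
```

The gap grows with repeated messages, which is one reason gRPC benchmarks so well for chatty internal traffic.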
Protocol Buffers - product.proto
syntax = "proto3";

package ecommerce;

service ProductService {
    rpc GetProduct(ProductRequest) returns (ProductResponse);
    rpc ListProducts(Empty) returns (stream ProductResponse);
    rpc CreateProduct(CreateProductRequest) returns (ProductResponse);
}

message ProductRequest {
    int32 product_id = 1;
}

message CreateProductRequest {
    string name = 1;
    double price = 2;
    int32 stock = 3;
}

message ProductResponse {
    int32 id = 1;
    string name = 2;
    double price = 3;
    int32 stock = 4;
}

message Empty {}
Python - gRPC Server
import grpc
from concurrent import futures
import product_pb2
import product_pb2_grpc

class ProductServicer(product_pb2_grpc.ProductServiceServicer):
    def __init__(self):
        self.products = {
            1: {"id": 1, "name": "Laptop", "price": 999.99, "stock": 50},
            2: {"id": 2, "name": "Mouse", "price": 29.99, "stock": 200}
        }

    def GetProduct(self, request, context):
        product = self.products.get(request.product_id)
        if not product:
            context.set_code(grpc.StatusCode.NOT_FOUND)
            context.set_details('Product not found')
            return product_pb2.ProductResponse()

        return product_pb2.ProductResponse(**product)

    def ListProducts(self, request, context):
        for product in self.products.values():
            yield product_pb2.ProductResponse(**product)

    def CreateProduct(self, request, context):
        product_id = max(self.products.keys()) + 1
        product = {
            "id": product_id,
            "name": request.name,
            "price": request.price,
            "stock": request.stock
        }
        self.products[product_id] = product
        return product_pb2.ProductResponse(**product)

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    product_pb2_grpc.add_ProductServiceServicer_to_server(
        ProductServicer(), server
    )
    server.add_insecure_port('[::]:50051')
    server.start()
    server.wait_for_termination()

if __name__ == '__main__':
    serve()

GraphQL - Flexible Query Language

GraphQL allows clients to request exactly the data they need, avoiding over-fetching and under-fetching.

GraphQL Benefits
  • Single endpoint for all queries
  • Client specifies exact fields needed
  • Reduces network requests (aggregate data from multiple sources)
  • Strong typing and introspection
GraphQL Schema
type Product {
    id: ID!
    name: String!
    price: Float!
    stock: Int!
    reviews: [Review!]!
}

type Review {
    id: ID!
    rating: Int!
    comment: String
    user: User!
}

type User {
    id: ID!
    name: String!
    email: String!
}

type Query {
    product(id: ID!): Product
    products: [Product!]!
}

type Mutation {
    createProduct(name: String!, price: Float!, stock: Int!): Product!
    updateProduct(id: ID!, name: String, price: Float, stock: Int): Product!
}
GraphQL Query Example
# Client query - request only needed fields
query {
    product(id: "1") {
        name
        price
        reviews {
            rating
            comment
            user {
                name
            }
        }
    }
}

# Response - exactly what was requested
{
    "data": {
        "product": {
            "name": "Laptop",
            "price": 999.99,
            "reviews": [
                {
                    "rating": 5,
                    "comment": "Great laptop!",
                    "user": {"name": "John Doe"}
                }
            ]
        }
    }
}
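The field-selection idea can be mimicked in a few lines of plain Python — a toy illustration of how a selector prunes a full record, not a real GraphQL executor (resolvers, validation, and schema typing are what a library like Graphene adds):

```python
def select_fields(obj, selection):
    """Return only the requested fields; recurse into nested selections."""
    result = {}
    for field, sub in selection.items():
        value = obj[field]
        if sub is None:                      # leaf field: copy as-is
            result[field] = value
        elif isinstance(value, list):        # list of nested objects
            result[field] = [select_fields(v, sub) for v in value]
        else:                                # single nested object
            result[field] = select_fields(value, sub)
    return result

product = {
    "id": "1", "name": "Laptop", "price": 999.99, "stock": 50,
    "reviews": [{"rating": 5, "comment": "Great laptop!",
                 "user": {"name": "John Doe", "email": "john@example.com"}}],
}

# Equivalent of: { product { name price reviews { rating user { name } } } }
query = {"name": None, "price": None,
         "reviews": {"rating": None, "user": {"name": None}}}

shaped = select_fields(product, query)
print(shaped)
# {'name': 'Laptop', 'price': 999.99,
#  'reviews': [{'rating': 5, 'user': {'name': 'John Doe'}}]}
```

Note how `stock`, `comment`, and the user's `email` never leave the server - that is the over-fetching GraphQL eliminates.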

Asynchronous Messaging Patterns

Message queues, event streams, and pub/sub for decoupled, scalable communication.

Message Queues - RabbitMQ

Message Queues enable asynchronous communication where services send messages to a queue, and other services consume them when ready.

Why Message Queues

  • Decouple producers and consumers
  • Handle traffic spikes with buffering
  • Retry failed messages automatically
  • Load balancing across multiple consumers

Queue Patterns:

  • Work Queue: Multiple workers process tasks from a shared queue
  • Publish/Subscribe: Message broadcast to all subscribers
  • Routing: Messages routed based on keys
  • Topics: Pattern-based message routing
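Before bringing in a broker, the Work Queue pattern — one shared queue, several competing workers — can be sketched with the standard library (names are illustrative):

```python
import queue
import threading

task_queue = queue.Queue()
processed_by = {}  # task -> worker name, to observe the load balancing

def worker(name):
    while True:
        task = task_queue.get()
        if task is None:              # sentinel: shut this worker down
            task_queue.task_done()
            break
        processed_by[task] = name     # "process" the task
        task_queue.task_done()

workers = [threading.Thread(target=worker, args=(f"worker-{i}",))
           for i in range(3)]
for w in workers:
    w.start()

for task_id in range(9):              # producer enqueues a burst of tasks
    task_queue.put(task_id)
for _ in workers:                     # one sentinel per worker
    task_queue.put(None)
for w in workers:
    w.join()

print(len(processed_by))  # 9 - every task handled exactly once
```

RabbitMQ adds what this sketch lacks: persistence, acknowledgments, and delivery across process and machine boundaries.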
Python - RabbitMQ Producer
import pika
import json

class OrderPublisher:
    def __init__(self):
        self.connection = pika.BlockingConnection(
            pika.ConnectionParameters('localhost')
        )
        self.channel = self.connection.channel()

        # Declare exchange and queue
        self.channel.exchange_declare(
            exchange='orders',
            exchange_type='direct',
            durable=True
        )
        self.channel.queue_declare(queue='order_processing', durable=True)
        self.channel.queue_bind(
            exchange='orders',
            queue='order_processing',
            routing_key='new_order'
        )

    def publish_order(self, order_data):
        message = json.dumps(order_data)
        self.channel.basic_publish(
            exchange='orders',
            routing_key='new_order',
            body=message,
            properties=pika.BasicProperties(
                delivery_mode=2,  # Make message persistent
                content_type='application/json'
            )
        )
        print(f"Published order: {order_data['order_id']}")

    def close(self):
        self.connection.close()

# Usage
publisher = OrderPublisher()
publisher.publish_order({
    "order_id": "ORD-12345",
    "user_id": "USER-789",
    "items": [{"product_id": 1, "quantity": 2}],
    "total": 1999.98
})
Python - RabbitMQ Consumer
import pika
import json

class OrderProcessor:
    def __init__(self):
        self.connection = pika.BlockingConnection(
            pika.ConnectionParameters('localhost')
        )
        self.channel = self.connection.channel()
        self.channel.queue_declare(queue='order_processing', durable=True)

        # Fair dispatch - don't give more than 1 message to a worker at a time
        self.channel.basic_qos(prefetch_count=1)

    def process_order(self, order_data):
        print(f"Processing order: {order_data['order_id']}")
        # Business logic here
        # - Validate inventory
        # - Process payment
        # - Schedule shipping
        return True

    def callback(self, ch, method, properties, body):
        order_data = json.loads(body)
        try:
            success = self.process_order(order_data)
            if success:
                ch.basic_ack(delivery_tag=method.delivery_tag)
        except Exception as e:
            print(f"Error processing order: {e}")
            # Negative acknowledgment - requeue the message
            ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)

    def start_consuming(self):
        self.channel.basic_consume(
            queue='order_processing',
            on_message_callback=self.callback
        )
        print("Waiting for orders...")
        self.channel.start_consuming()

# Usage
processor = OrderProcessor()
processor.start_consuming()

Event Streaming - Apache Kafka

Kafka is a distributed event streaming platform for high-throughput, fault-tolerant message processing.

Feature           | RabbitMQ                  | Kafka
Model             | Message Queue             | Event Log
Throughput        | ~20K msgs/sec             | 1M+ msgs/sec
Message Retention | Deleted after consumption | Retained for configured time
Ordering          | Queue level               | Partition level
Use Case          | Task queues, RPC          | Event sourcing, log aggregation, stream processing
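The "partition level" ordering above follows from key-based partitioning: every message key hashes to one partition, and order is preserved within that partition. A simplified sketch (Kafka's default partitioner uses murmur2; CRC32 here is only for illustration):

```python
import zlib

NUM_PARTITIONS = 6

def partition_for(key: str) -> int:
    """Deterministically map a message key to a partition."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

# All events for the same order always land on the same partition...
assert partition_for("ORD-12345") == partition_for("ORD-12345")

# ...so per-order ordering holds, while different keys spread the load
for key in ["ORD-12345", "ORD-12346", "USER-789"]:
    print(key, "-> partition", partition_for(key))
```

This is why the producer examples below pass the order ID as the message key.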
Python - Kafka Producer
from kafka import KafkaProducer
import json
from datetime import datetime

class EventPublisher:
    def __init__(self):
        self.producer = KafkaProducer(
            bootstrap_servers=['localhost:9092'],
            value_serializer=lambda v: json.dumps(v).encode('utf-8'),
            key_serializer=lambda k: k.encode('utf-8') if k else None
        )

    def publish_event(self, topic, key, event_data):
        # Add metadata
        event_data['timestamp'] = datetime.utcnow().isoformat()

        future = self.producer.send(topic, key=key, value=event_data)

        # Block for synchronous send
        record_metadata = future.get(timeout=10)
        print(f"Event sent to {record_metadata.topic} partition {record_metadata.partition} offset {record_metadata.offset}")

    def close(self):
        self.producer.flush()
        self.producer.close()

# Usage
publisher = EventPublisher()

# Publish order created event
publisher.publish_event(
    topic='order-events',
    key='ORD-12345',  # Key determines partition
    event_data={
        "event_type": "OrderCreated",
        "order_id": "ORD-12345",
        "user_id": "USER-789",
        "total": 1999.98
    }
)

# Publish payment processed event
publisher.publish_event(
    topic='payment-events',
    key='ORD-12345',
    event_data={
        "event_type": "PaymentProcessed",
        "order_id": "ORD-12345",
        "amount": 1999.98,
        "payment_method": "credit_card"
    }
)
Python - Kafka Consumer
from kafka import KafkaConsumer
import json

class EventConsumer:
    def __init__(self, topic, group_id):
        self.consumer = KafkaConsumer(
            topic,
            bootstrap_servers=['localhost:9092'],
            group_id=group_id,
            value_deserializer=lambda m: json.loads(m.decode('utf-8')),
            auto_offset_reset='earliest',  # Start from beginning if no offset
            enable_auto_commit=True
        )

    def process_event(self, event_data):
        event_type = event_data.get('event_type')

        if event_type == 'OrderCreated':
            print(f"New order: {event_data['order_id']}")
            # Trigger inventory reservation
            # Send confirmation email

        elif event_type == 'PaymentProcessed':
            print(f"Payment confirmed for: {event_data['order_id']}")
            # Update order status
            # Trigger shipping

    def start_consuming(self):
        print("Waiting for events...")
        for message in self.consumer:
            print(f"Received from partition {message.partition} offset {message.offset}")
            self.process_event(message.value)

# Usage - multiple consumers in same group for load balancing
consumer = EventConsumer('order-events', group_id='order-processor-group')
consumer.start_consuming()

Pub/Sub Pattern

The Publish/Subscribe pattern allows multiple subscribers to receive the same message.

Netflix Event-Driven Architecture

  • Use Case: User watches a movie → trigger multiple actions
  • Publisher: Video Streaming Service publishes "VideoWatched" event
  • Subscribers:
    • Recommendation Service → Update user preferences
    • Analytics Service → Track viewing metrics
    • Billing Service → Update watch time for subscription
    • Continue Watching Service → Update progress
  • Benefits: Services are decoupled, can add new subscribers without changing publisher
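The Netflix flow can be sketched as an in-process event bus — subscribers register callbacks and the publisher never learns who is listening (an illustration of the pattern, not a replacement for a real broker):

```python
from collections import defaultdict

class EventBus:
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> [callbacks]

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, event):
        for callback in self.subscribers[topic]:  # every subscriber gets a copy
            callback(event)

bus = EventBus()
received = []

# Each downstream service subscribes independently
bus.subscribe("VideoWatched", lambda e: received.append(("recommendations", e["video_id"])))
bus.subscribe("VideoWatched", lambda e: received.append(("analytics", e["video_id"])))
bus.subscribe("VideoWatched", lambda e: received.append(("continue-watching", e["video_id"])))

# The streaming service publishes once; all three subscribers react
bus.publish("VideoWatched", {"user_id": "USER-789", "video_id": "VID-42"})
print(received)  # three entries, one per subscriber
```

Adding a fourth subscriber requires no change to the publisher - the decoupling benefit listed above.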

Hands-On: Building Communication Patterns

Complete, production-ready code examples you can run and modify.

Example 1: REST API with JWT Authentication

Python - Authenticated REST API
from flask import Flask, jsonify, request
from flask_jwt_extended import JWTManager, create_access_token, jwt_required, get_jwt_identity

app = Flask(__name__)
app.config['JWT_SECRET_KEY'] = 'your-secret-key'  # Change in production
jwt = JWTManager(app)

# Mock database (plaintext passwords for demo only - store hashes in production)
users = {
    "user1": {"password": "pass123", "role": "admin"},
    "user2": {"password": "pass456", "role": "user"}
}

products = [
    {"id": 1, "name": "Laptop", "price": 999.99},
    {"id": 2, "name": "Mouse", "price": 29.99}
]

@app.route('/login', methods=['POST'])
def login():
    username = request.json.get('username')
    password = request.json.get('password')

    user = users.get(username)
    if not user or user['password'] != password:
        return jsonify({"error": "Invalid credentials"}), 401

    access_token = create_access_token(
        identity=username,
        additional_claims={"role": user['role']}
    )
    return jsonify({"access_token": access_token})

@app.route('/products', methods=['GET'])
@jwt_required()
def get_products():
    current_user = get_jwt_identity()
    return jsonify({
        "user": current_user,
        "products": products
    })

@app.route('/products', methods=['POST'])
@jwt_required()
def create_product():
    # Admin-only endpoint
    from flask_jwt_extended import get_jwt
    claims = get_jwt()
    if claims.get('role') != 'admin':
        return jsonify({"error": "Admin access required"}), 403

    data = request.json
    new_product = {
        "id": len(products) + 1,
        "name": data['name'],
        "price": data['price']
    }
    products.append(new_product)
    return jsonify(new_product), 201

if __name__ == '__main__':
    app.run(debug=True, port=5000)

# Usage Example:
# 1. Login: POST http://localhost:5000/login
#    Body: {"username": "user1", "password": "pass123"}
#    Response: {"access_token": "eyJ0eXAi..."}
#
# 2. Get Products: GET http://localhost:5000/products
#    Headers: Authorization: Bearer eyJ0eXAi...
#
# 3. Create Product: POST http://localhost:5000/products
#    Headers: Authorization: Bearer eyJ0eXAi...
#    Body: {"name": "Keyboard", "price": 79.99}
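To demystify the tokens flask-jwt-extended issues, here is the HS256 structure underneath — two base64url-encoded JSON segments plus an HMAC signature (a minimal sketch for illustration only; use the library, which also handles expiry, claims validation, and key management):

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: str) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signature = b64url(hmac.new(secret.encode(), f"{header}.{body}".encode(),
                                hashlib.sha256).digest())
    return f"{header}.{body}.{signature}"

def verify_jwt(token: str, secret: str) -> bool:
    header, body, signature = token.split(".")
    expected = b64url(hmac.new(secret.encode(), f"{header}.{body}".encode(),
                               hashlib.sha256).digest())
    return hmac.compare_digest(signature, expected)  # constant-time comparison

token = sign_jwt({"sub": "user1", "role": "admin"}, "your-secret-key")
print(verify_jwt(token, "your-secret-key"))   # True
print(verify_jwt(token, "wrong-secret"))      # False
```

The payload is only encoded, not encrypted - never put secrets in JWT claims.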

Example 2: gRPC Bi-directional Streaming

chat.proto - Streaming Chat Service
syntax = "proto3";

package chat;

service ChatService {
    // Bi-directional streaming for real-time chat
    rpc Chat(stream ChatMessage) returns (stream ChatMessage);

    // Server streaming for chat history
    rpc GetHistory(HistoryRequest) returns (stream ChatMessage);
}

message ChatMessage {
    string user_id = 1;
    string message = 2;
    int64 timestamp = 3;
    string room_id = 4;
}

message HistoryRequest {
    string room_id = 1;
    int32 limit = 2;
}
Python - gRPC Streaming Server
import grpc
from concurrent import futures
import chat_pb2
import chat_pb2_grpc
from collections import defaultdict

class ChatServicer(chat_pb2_grpc.ChatServiceServicer):
    def __init__(self):
        self.rooms = defaultdict(list)  # room_id -> [messages]
        self.active_streams = defaultdict(list)  # room_id -> [response_streams]

    def Chat(self, request_iterator, context):
        """Bi-directional streaming for real-time chat"""
        # NOTE: plain lists are used for brevity; a production server would
        # use thread-safe queues and a lock around the shared room state
        room_id = None
        response_queue = []
        registered = False

        try:
            for message in request_iterator:
                room_id = message.room_id

                # Register this client's stream once, before broadcasting,
                # so it also receives its own first message
                if not registered:
                    self.active_streams[room_id].append(response_queue)
                    registered = True

                # Store message
                self.rooms[room_id].append(message)

                # Broadcast to all clients in this room
                for stream_queue in self.active_streams[room_id]:
                    stream_queue.append(message)

                # Yield pending messages for this client
                while response_queue:
                    yield response_queue.pop(0)
        finally:
            # Unregister this client's stream (by identity) on disconnect
            if room_id is not None:
                self.active_streams[room_id] = [
                    q for q in self.active_streams[room_id] if q is not response_queue
                ]

    def GetHistory(self, request, context):
        """Server streaming for chat history"""
        room_id = request.room_id
        limit = request.limit or 100

        messages = self.rooms[room_id][-limit:]
        for message in messages:
            yield message

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    chat_pb2_grpc.add_ChatServiceServicer_to_server(ChatServicer(), server)
    server.add_insecure_port('[::]:50051')
    print("gRPC Chat Server started on port 50051")
    server.start()
    server.wait_for_termination()

if __name__ == '__main__':
    serve()

Example 3: GraphQL Server with Mutations

Python - GraphQL Server (Graphene)
from flask import Flask
from flask_graphql import GraphQLView
import graphene
from datetime import datetime

# Mock database
products_db = [
    {"id": "1", "name": "Laptop", "price": 999.99, "stock": 50},
    {"id": "2", "name": "Mouse", "price": 29.99, "stock": 200}
]

class Product(graphene.ObjectType):
    id = graphene.ID()
    name = graphene.String()
    price = graphene.Float()
    stock = graphene.Int()

class Query(graphene.ObjectType):
    product = graphene.Field(Product, id=graphene.ID(required=True))
    products = graphene.List(Product)
    search_products = graphene.List(Product, name=graphene.String())

    def resolve_product(self, info, id):
        return next((p for p in products_db if p['id'] == id), None)

    def resolve_products(self, info):
        return products_db

    def resolve_search_products(self, info, name):
        return [p for p in products_db if name.lower() in p['name'].lower()]

class CreateProduct(graphene.Mutation):
    class Arguments:
        name = graphene.String(required=True)
        price = graphene.Float(required=True)
        stock = graphene.Int(required=True)

    product = graphene.Field(Product)

    def mutate(self, info, name, price, stock):
        new_id = str(len(products_db) + 1)
        product = {"id": new_id, "name": name, "price": price, "stock": stock}
        products_db.append(product)
        return CreateProduct(product=product)

class UpdateProduct(graphene.Mutation):
    class Arguments:
        id = graphene.ID(required=True)
        name = graphene.String()
        price = graphene.Float()
        stock = graphene.Int()

    product = graphene.Field(Product)

    def mutate(self, info, id, name=None, price=None, stock=None):
        product = next((p for p in products_db if p['id'] == id), None)
        if not product:
            raise Exception("Product not found")

        # Use 'is not None' so falsy values like price=0 are still applied
        if name is not None:
            product['name'] = name
        if price is not None:
            product['price'] = price
        if stock is not None:
            product['stock'] = stock

        return UpdateProduct(product=product)

class Mutation(graphene.ObjectType):
    create_product = CreateProduct.Field()
    update_product = UpdateProduct.Field()

schema = graphene.Schema(query=Query, mutation=Mutation)

app = Flask(__name__)
app.add_url_rule(
    '/graphql',
    view_func=GraphQLView.as_view('graphql', schema=schema, graphiql=True)
)

if __name__ == '__main__':
    app.run(debug=True, port=5000)
    # Access GraphiQL at http://localhost:5000/graphql

# Example Queries:
# query {
#   products {
#     id name price stock
#   }
# }
#
# mutation {
#   createProduct(name: "Keyboard", price: 79.99, stock: 100) {
#     product { id name price }
#   }
# }

Example 4: Production RabbitMQ with Retry Logic

Python - RabbitMQ with Dead Letter Queue
import pika
import json
import time

class RobustRabbitMQConsumer:
    def __init__(self):
        self.connection = None
        self.channel = None
        self.setup_connection()
        self.setup_queues()

    def setup_connection(self):
        """Setup connection with retry logic"""
        max_retries = 5
        for i in range(max_retries):
            try:
                self.connection = pika.BlockingConnection(
                    pika.ConnectionParameters(
                        'localhost',
                        heartbeat=600,
                        blocked_connection_timeout=300
                    )
                )
                self.channel = self.connection.channel()
                print("Connected to RabbitMQ")
                return
            except pika.exceptions.AMQPConnectionError:
                if i < max_retries - 1:
                    time.sleep(2 ** i)  # Exponential backoff
                else:
                    raise

    def setup_queues(self):
        """Setup main queue, retry queue, and dead letter queue"""
        # Dead Letter Exchange for failed messages
        self.channel.exchange_declare(
            exchange='dlx',
            exchange_type='direct',
            durable=True
        )

        # Dead Letter Queue - messages that failed all retries
        self.channel.queue_declare(
            queue='failed_orders',
            durable=True
        )
        self.channel.queue_bind(
            exchange='dlx',
            queue='failed_orders',
            routing_key='failed'
        )

        # Main queue with DLX
        self.channel.queue_declare(
            queue='orders',
            durable=True,
            arguments={
                'x-dead-letter-exchange': 'dlx',
                'x-dead-letter-routing-key': 'failed'
            }
        )

        # Fair dispatch
        self.channel.basic_qos(prefetch_count=1)

    def process_order(self, order_data):
        """Business logic - may fail"""
        print(f"Processing order: {order_data['order_id']}")

        # Simulate processing
        if order_data.get('should_fail'):
            raise Exception("Payment gateway timeout")

        # Success
        print(f"Order {order_data['order_id']} processed successfully")
        return True

    def callback(self, ch, method, properties, body):
        """Message handler with retry logic"""
        order_data = json.loads(body)

        # Track retry count
        headers = properties.headers or {}
        retry_count = headers.get('x-retry-count', 0)
        max_retries = 3

        try:
            self.process_order(order_data)
            ch.basic_ack(delivery_tag=method.delivery_tag)

        except Exception as e:
            print(f"Error processing order: {e}")

            if retry_count < max_retries:
                # Retry with exponential backoff
                retry_count += 1
                print(f"Retrying... (attempt {retry_count}/{max_retries})")

                # Republish with updated retry count
                ch.basic_publish(
                    exchange='',
                    routing_key='orders',
                    body=body,
                    properties=pika.BasicProperties(
                        delivery_mode=2,
                        headers={'x-retry-count': retry_count}
                    )
                )
                ch.basic_ack(delivery_tag=method.delivery_tag)

            else:
                # Max retries exceeded - send to DLQ
                print(f"Max retries exceeded for order {order_data['order_id']}")
                ch.basic_nack(delivery_tag=method.delivery_tag, requeue=False)

    def start(self):
        """Start consuming messages"""
        self.channel.basic_consume(
            queue='orders',
            on_message_callback=self.callback
        )
        print("Waiting for orders. Press CTRL+C to exit")
        try:
            self.channel.start_consuming()
        except KeyboardInterrupt:
            self.channel.stop_consuming()
        finally:
            if self.connection:
                self.connection.close()

if __name__ == '__main__':
    consumer = RobustRabbitMQConsumer()
    consumer.start()

Try It Yourself
  1. Install dependencies: pip install flask flask-jwt-extended grpcio grpcio-tools graphene flask-graphql pika kafka-python
  2. Run RabbitMQ: docker run -p 5672:5672 -p 15672:15672 rabbitmq:management
  3. Run Kafka: docker-compose up -d kafka zookeeper
  4. Test each example and modify parameters to see how they behave

Practice Exercises

Build these projects to master service communication patterns.

Exercise 1: Build a Multi-Protocol API Gateway

Goal: Create an API Gateway that routes requests to backend services using different protocols.

Requirements:

  1. Accept HTTP REST requests from clients
  2. Route product queries to a gRPC service
  3. Route user auth to a REST service
  4. Route order submissions to a RabbitMQ queue
  5. Implement rate limiting (100 requests/minute per client)
  6. Add request correlation IDs for tracing

Tech Stack: Node.js/Express for gateway, Python gRPC service, RabbitMQ

Bonus: Add circuit breaker for backend service failures

Exercise 2: Real-Time Notification System

Goal: Build a notification system using event-driven architecture.

Requirements:

  1. Kafka producer publishes events (OrderPlaced, PaymentProcessed, ShippingStarted)
  2. Multiple consumers:
    • Email Service → Sends email notifications
    • SMS Service → Sends SMS for critical events
    • Push Notification Service → Mobile push notifications
    • Analytics Service → Tracks event metrics
  3. Ensure message ordering by user ID
  4. Handle consumer failures gracefully with retry
  5. Implement dead letter queue for failed messages

Tech Stack: Kafka, Python/Java consumers

Bonus: Add event versioning for backward compatibility

Exercise 3: GraphQL Federation for Microservices

Goal: Create a federated GraphQL schema across multiple microservices.

Requirements:

  1. Service 1 (Users): Manages user data
  2. Service 2 (Products): Manages product catalog
  3. Service 3 (Orders): Manages orders, references users and products
  4. Apollo Federation Gateway aggregates schemas
  5. Client can query across services in one request:
    query {
      order(id: "123") {
        id
        user { name email }
        items {
          product { name price }
          quantity
        }
      }
    }

Tech Stack: Apollo Federation, Node.js

Bonus: Add authentication and authorization per service

Solution Hints:

  • Use @key directive to mark entities that can be extended
  • Use @extends and @external for cross-service references
  • Implement __resolveReference for entity resolution
  • Gateway handles query planning and execution across services

Protocol Comparison & Decision Guide

Quick reference for choosing the right communication pattern.

Complete Protocol Comparison

Protocol  | Performance         | Browser Support   | Use Case                          | Pros                               | Cons
REST      | Good                | Native            | Public APIs, CRUD operations      | Simple, universal, human-readable  | Over-fetching, multiple round-trips
gRPC      | Excellent           | Requires gRPC-Web | Internal microservices, real-time | Fast, streaming, code generation   | Complex setup, binary format
GraphQL   | Good                | Native            | Mobile apps, complex data needs   | Flexible queries, no over-fetching | Complexity, caching challenges
RabbitMQ  | Good (~20K/sec)     | N/A               | Task queues, RPC                  | Reliable, routing flexibility      | Message size limits, persistence overhead
Kafka     | Excellent (1M+/sec) | N/A               | Event streaming, log aggregation  | High throughput, replay events     | Complex setup, higher latency
WebSocket | Good                | Native            | Real-time apps, chat, gaming      | Bi-directional, low latency        | Stateful, scaling challenges

Decision Tree: Choosing the Right Pattern

Decision Framework

  1. Need real-time bi-directional communication?
    • YES → Use WebSocket or gRPC streaming
    • NO → Continue...
  2. Internal service-to-service communication?
    • YES → Use gRPC (high performance) or REST (simplicity)
    • NO (external/public) → Continue...
  3. Need flexible data fetching for mobile/web?
    • YES → Use GraphQL
    • NO → Continue...
  4. Asynchronous processing needed?
    • YES → Continue to messaging decision...
    • NO → Use REST
  5. High throughput event streaming?
    • YES (100K+ events/sec) → Use Kafka
    • NO (task queues, <20K/sec) → Use RabbitMQ or AWS SQS
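The framework above can be encoded as a small helper for design reviews — a sketch that mirrors the tree (the function name and keyword arguments are ours):

```python
def recommend_protocol(*, bidirectional=False, internal=False,
                       flexible_fetching=False, asynchronous=False,
                       events_per_sec=0):
    """Walk the decision tree above and return a suggested pattern."""
    if bidirectional:
        return "WebSocket or gRPC streaming"
    if internal:
        return "gRPC (performance) or REST (simplicity)"
    if flexible_fetching:
        return "GraphQL"
    if not asynchronous:
        return "REST"
    if events_per_sec >= 100_000:
        return "Kafka"
    return "RabbitMQ or AWS SQS"

print(recommend_protocol(asynchronous=True, events_per_sec=500_000))  # Kafka
print(recommend_protocol(flexible_fetching=True))                     # GraphQL
print(recommend_protocol())                                           # REST
```

Real systems mix patterns, so treat the answer as a starting point, not a verdict.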

Best Practices

Idempotency

Same request = same result. Critical for retries and message reprocessing.

Implementation: Use idempotency keys in request headers

@app.route('/payments', methods=['POST'])
def process_payment():
    # 'cache' is a shared store such as Redis; clients send a unique
    # Idempotency-Key header per logical operation
    idempotency_key = request.headers.get('Idempotency-Key')
    cached = cache.get(idempotency_key)
    if cached:
        return cached  # Replay the stored response instead of charging again

    result = charge_payment(request.json)
    cache.set(idempotency_key, result, ttl=86400)  # Cache for 24 hours
    return result
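A self-contained version of the same idea, with a plain dict standing in for the shared cache (in production the cache lives in Redis or similar and entries expire):

```python
charge_count = 0
idempotency_cache = {}  # idempotency key -> cached response

def charge_payment(amount):
    global charge_count
    charge_count += 1                      # side effect we must not repeat
    return {"status": "charged", "amount": amount}

def process_payment(idempotency_key, amount):
    if idempotency_key in idempotency_cache:
        return idempotency_cache[idempotency_key]  # replay, don't re-charge
    result = charge_payment(amount)
    idempotency_cache[idempotency_key] = result
    return result

first = process_payment("key-abc", 49.99)
retry = process_payment("key-abc", 49.99)  # client retry after a timeout
print(first == retry, charge_count)        # True 1 - charged exactly once
```

The retry returns an identical response while the charge happens only once - exactly the property retries and message redelivery require.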

Correlation IDs

Track requests across multiple services for debugging and monitoring.

Implementation: Pass correlation ID in headers across all service calls

import logging
import uuid

import requests
from flask import Flask, g, request

app = Flask(__name__)
logger = logging.getLogger(__name__)

@app.before_request
def add_correlation_id():
    correlation_id = request.headers.get('X-Correlation-ID') or str(uuid.uuid4())
    g.correlation_id = correlation_id

@app.route('/products/<int:id>')
def get_product(id):
    # Log with correlation ID
    logger.info(f"[{g.correlation_id}] Fetching product {id}")

    # Pass the same ID to downstream services
    response = requests.get(
        'http://inventory-service/check',
        headers={'X-Correlation-ID': g.correlation_id}
    )
    return response.json()

Timeout Configuration

Always set timeouts to prevent hanging requests.

Service Type             | Recommended Timeout
Database queries         | 5-10 seconds
Microservice calls       | 2-5 seconds
External APIs            | 10-30 seconds
Message queue processing | 30-60 seconds
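The rule in stdlib form — every outbound call gets a deadline and an explicit failure path (the address below is deliberately non-routable for the demo):

```python
import urllib.request

def fetch_with_timeout(url, timeout_s=5.0):
    """Fetch a URL, returning None instead of hanging on a slow dependency."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.read()
    except OSError:  # covers socket.timeout, refused connections, DNS failures
        return None

# Non-routable address: the call fails within the deadline instead of hanging
print(fetch_with_timeout("http://10.255.255.1/health", timeout_s=0.5))  # None
```

With `requests`, the equivalent is the `timeout=(connect, read)` argument plus handling `requests.exceptions.Timeout`.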

Real-World Architecture Examples

Netflix Communication Architecture

  • Client to API: REST + GraphQL (mobile uses GraphQL for flexible queries)
  • Service-to-Service: gRPC for low-latency internal calls
  • Event Processing: Kafka for user activity streams (billions of events/day)
  • Caching: EVCache (memcached) to reduce backend calls
  • Scale: 1000+ microservices, 100B+ API calls/day

Shopify Communication Patterns

  • Public API: GraphQL for merchant apps and integrations
  • Internal Services: REST + gRPC hybrid
  • Background Jobs: Kafka + Sidekiq (Ruby)
  • Real-time: WebSocket for live order notifications
  • Scale: 1M+ merchants, peak 80K requests/second

Common Anti-Patterns
  • Synchronous Cascade: Service A → B → C → D in sync. Use async for non-critical paths.
  • No Timeout: Requests hang forever. Always set timeouts.
  • Ignoring Backpressure: Producer overwhelms consumer. Use rate limiting and queue depth monitoring.
  • Large Payload Sync Calls: Sending MBs via REST. Use streaming or async file transfer.
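The backpressure item has a simple first defense: a bounded queue, which forces the producer to make an explicit decision (shed, block, or slow down) instead of silently overrunning the consumer:

```python
import queue

# maxsize bounds how far the producer can get ahead of the consumer
buffer = queue.Queue(maxsize=3)

accepted, shed = 0, 0
for event_id in range(10):       # a burst of 10 events, consumer not keeping up
    try:
        buffer.put_nowait(event_id)
        accepted += 1
    except queue.Full:           # the backpressure signal
        shed += 1

print(accepted, shed)  # 3 7 - the bound forced an explicit decision
```

Brokers expose the same signal differently - queue depth alarms in RabbitMQ, consumer lag metrics in Kafka - but the principle is identical: monitor the bound and react before it overflows.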