API Design & REST/gRPC


Why API Design Matters


The Problem: Modern applications are built from dozens or hundreds of services that must communicate reliably. Poorly designed APIs lead to tight coupling, breaking changes, and frustrated developers.

The Solution: Well-designed APIs provide a stable contract between services, enabling teams to build, deploy, and scale independently.

Real Impact: Stripe's API is often cited as the gold standard -- its consistent, intuitive design helped the company grow to process billions in payments.

Real-World Analogy

Think of an API like a restaurant menu:

  • Menu items = API endpoints (what you can request)
  • Order format = Request schema (how you ask for it)
  • Kitchen = Server (processes your request)
  • Waiter = HTTP protocol (carries requests and responses)
  • Receipt = Response with status code (confirmation of what happened)

Core Qualities of Good APIs

Consistency

Endpoints follow predictable naming conventions, error formats, and pagination patterns across the entire API surface.

Discoverability

Developers can explore the API intuitively. Resource names, query parameters, and relationships are self-documenting.

Backward Compatibility

New features can be added without breaking existing clients. Versioning strategies protect consumers from disruptive changes.

Performance

Efficient serialization, proper caching headers, and minimal over-fetching keep latency low and throughput high.

REST API Principles

REST (Representational State Transfer) is an architectural style that uses standard HTTP methods to operate on resources identified by URLs. Roy Fielding defined it in his 2000 doctoral dissertation, and it remains the most widely used API paradigm for web services.

The Six REST Constraints

Client-Server

Separate the user interface from the data storage. The client and server evolve independently as long as the interface stays the same.

Stateless

Each request contains all information needed to process it. The server stores no client context between requests.

Cacheable

Responses must define themselves as cacheable or not. Proper caching eliminates redundant interactions and improves scalability.
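In HTTP terms, cacheability is declared through response headers. A minimal Flask sketch (the endpoint and payload here are illustrative, not part of the sample API above):

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/api/v1/plans')
def list_plans():
    """Slow-changing reference data is a natural caching candidate."""
    resp = jsonify({'data': ['free', 'pro', 'enterprise']})
    # Declare the response cacheable by any cache for 5 minutes
    resp.headers['Cache-Control'] = 'public, max-age=300'
    # An ETag lets clients revalidate cheaply with If-None-Match
    resp.add_etag()
    return resp
```

A response without an explicit Cache-Control directive leaves caching behavior ambiguous, which is exactly what this constraint forbids.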

Uniform Interface

Resources are identified by URIs and manipulated through representations; messages are self-descriptive, and hypermedia (HATEOAS) links tell clients what actions are available next.

Layered System

A client cannot tell whether it is connected to the end server or an intermediary. Layers enable load balancers, caches, and gateways.

Code on Demand (Optional)

Servers can extend client functionality by transferring executable code, such as JavaScript delivered to a browser.

REST Endpoint Pattern
(Diagram: a client browser or app sends HTTP requests such as GET, POST, PUT, and DELETE on /api/v1/users and /api/v1/users/:id to the REST API, which routes them to the User, Auth, and Order services, each backed by its own database, and returns a JSON response.)
rest_api.py
from flask import Flask, jsonify, request, abort

app = Flask(__name__)

# In-memory store for demonstration
users = {}
next_id = 1


@app.route('/api/v1/users', methods=['GET'])
def list_users():
    """List all users with pagination."""
    page = request.args.get('page', 1, type=int)
    per_page = request.args.get('per_page', 20, type=int)

    all_users = list(users.values())
    start = (page - 1) * per_page
    end = start + per_page

    return jsonify({
        'data': all_users[start:end],
        'meta': {
            'page': page,
            'per_page': per_page,
            'total': len(all_users)
        }
    }), 200


@app.route('/api/v1/users', methods=['POST'])
def create_user():
    """Create a new user."""
    global next_id
    data = request.get_json(silent=True)  # None instead of an error on bad/missing JSON

    if not data or 'name' not in data:
        abort(400, description='Name is required')

    user = {
        'id': next_id,
        'name': data['name'],
        'email': data.get('email', '')
    }
    users[next_id] = user
    next_id += 1

    return jsonify({'data': user}), 201


@app.route('/api/v1/users/<int:user_id>', methods=['GET'])
def get_user(user_id):
    """Retrieve a single user by ID."""
    user = users.get(user_id)
    if not user:
        abort(404, description='User not found')
    return jsonify({'data': user}), 200


@app.route('/api/v1/users/<int:user_id>', methods=['PUT'])
def update_user(user_id):
    """Update an existing user."""
    if user_id not in users:
        abort(404, description='User not found')

    data = request.get_json(silent=True)
    if not data:
        abort(400, description='Request body must be JSON')

    users[user_id].update({
        'name': data.get('name', users[user_id]['name']),
        'email': data.get('email', users[user_id]['email'])
    })
    return jsonify({'data': users[user_id]}), 200


@app.route('/api/v1/users/<int:user_id>', methods=['DELETE'])
def delete_user(user_id):
    """Delete a user."""
    if user_id not in users:
        abort(404, description='User not found')
    del users[user_id]
    return '', 204

HTTP Methods & Status Codes

HTTP Methods (Verbs)

Method   Purpose                      Idempotent   Safe   Example
GET      Retrieve a resource          Yes          Yes    GET /users/42
POST     Create a new resource        No           No     POST /users
PUT      Replace a resource entirely  Yes          No     PUT /users/42
PATCH    Partially update a resource  No           No     PATCH /users/42
DELETE   Remove a resource            Yes          No     DELETE /users/42

Status Code Families

Range   Category       Common Codes
2xx     Success        200 OK, 201 Created, 204 No Content
3xx     Redirection    301 Moved Permanently, 304 Not Modified
4xx     Client Error   400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 429 Too Many Requests
5xx     Server Error   500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable

Common Pitfall

Problem: Using 200 OK for everything, even errors.

Solution: Use the correct status code for each response. A 404 tells the client the resource does not exist, a 400 tells them their request was malformed, and a 500 signals an internal server failure. This distinction is critical for client-side error handling and debugging.
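In Flask, this maps naturally onto error handlers that return the right status code with a consistent body. A minimal sketch (the error envelope and the widgets endpoint are illustrative):

```python
from flask import Flask, abort, jsonify

app = Flask(__name__)

def error_body(code, message, status):
    """One consistent error envelope for every failure mode."""
    return jsonify({'error': {'code': code,
                              'message': message,
                              'status': status}}), status

@app.errorhandler(400)
def bad_request(err):
    # The request itself was malformed
    return error_body('BAD_REQUEST', str(err.description), 400)

@app.errorhandler(404)
def not_found(err):
    # The resource does not exist
    return error_body('NOT_FOUND', str(err.description), 404)

@app.route('/api/v1/widgets/<int:widget_id>')
def get_widget(widget_id):
    # Hypothetical endpoint that always misses, to exercise the 404 path
    abort(404, description=f'Widget {widget_id} does not exist')
```

With this in place, a client can branch on the status code and the machine-readable `code` field instead of parsing free-form messages.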

gRPC and Protocol Buffers

gRPC is a high-performance, open-source RPC (Remote Procedure Call) framework originally developed by Google. It uses HTTP/2 for transport, Protocol Buffers for serialization, and provides features such as bidirectional streaming, flow control, and pluggable authentication.

When to Use gRPC

  • Microservice-to-microservice: Low-latency internal communication where browser support is not needed
  • Streaming data: Real-time feeds, chat, or sensor data where bidirectional streaming shines
  • Polyglot systems: Auto-generated client libraries for 10+ languages from a single .proto file
  • Performance-critical paths: Binary serialization typically produces payloads several times smaller than JSON and is faster to encode and decode
user_service.proto
syntax = "proto3";

package userservice;

// The User service definition
service UserService {
    // Retrieve a single user
    rpc GetUser (GetUserRequest) returns (User);

    // List users with pagination
    rpc ListUsers (ListUsersRequest) returns (ListUsersResponse);

    // Create a new user
    rpc CreateUser (CreateUserRequest) returns (User);

    // Stream user activity events
    rpc StreamActivity (ActivityRequest) returns (stream ActivityEvent);
}

message GetUserRequest {
    int32 id = 1;
}

message User {
    int32 id = 1;
    string name = 2;
    string email = 3;
    int64 created_at = 4;
}

message ListUsersRequest {
    int32 page = 1;
    int32 per_page = 2;
}

message ListUsersResponse {
    repeated User users = 1;
    int32 total = 2;
}

message CreateUserRequest {
    string name = 1;
    string email = 2;
}

message ActivityRequest {
    int32 user_id = 1;
}

message ActivityEvent {
    string event_type = 1;
    string description = 2;
    int64 timestamp = 3;
}
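The size gap between JSON and a binary encoding, mentioned above, can be sketched in pure Python by packing the same User record with a crude fixed layout (a simplification for illustration; real Protobuf uses tagged varint fields, not this layout):

```python
import json
import struct

# The same User record the proto message above describes
user = {'id': 42, 'name': 'Ada', 'email': 'ada@example.com',
        'created_at': 1700000000}

json_bytes = json.dumps(user).encode()

# Crude fixed-layout packing: int32 id, int64 created_at, then
# length-prefixed strings; field names never appear on the wire
name, email = user['name'].encode(), user['email'].encode()
binary = struct.pack(f'<iqB{len(name)}sB{len(email)}s',
                     user['id'], user['created_at'],
                     len(name), name, len(email), email)

print(len(json_bytes), len(binary))  # the binary form is far smaller
```

Most of the savings comes from dropping repeated field names and textual numbers, which is the same effect Protobuf's wire format exploits.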

REST vs gRPC Comparison

            REST                        gRPC
Transport   HTTP/1.1 or HTTP/2          HTTP/2 only
Format      JSON (text-based)           Protobuf (binary)
Contract    OpenAPI / Swagger           .proto files (strict)
Streaming   Limited (SSE, WebSocket)    Bidirectional native
Browser     Native support              Via gRPC-Web proxy
Debugging   Easy (curl, browser)        Harder (binary)
Code Gen    Optional                    Required (protoc)
Best For    Public APIs, web apps       Internal microservices

Which Should You Choose?

Use REST when you need broad client compatibility (browsers, mobile, third-party developers) and human-readable payloads for debugging. Use gRPC when you need high performance between internal services, real-time streaming, or strict contract enforcement across polyglot teams. Many systems use both: REST for external APIs and gRPC for internal communication.

API Versioning Strategies

APIs evolve over time. Versioning ensures that existing clients continue to work while new features are introduced. There is no single "correct" strategy; each has trade-offs.

Strategy             Example                                  Pros                            Cons
URI Path             /api/v1/users                            Simple, visible, easy to route  URL changes can break caches; least "RESTful"
Query Parameter      /api/users?version=1                     Easy to default, optional       Can be missed, harder to route
Custom Header        X-API-Version: 1                         Clean URLs, flexible            Hidden, harder to test in a browser
Content Negotiation  Accept: application/vnd.company.v2+json  Most RESTful approach           Complex, rarely used in practice

Industry Practice

URI path versioning (/api/v1/) is the most common approach, used by companies such as Stripe and Twilio. It is simple, explicit, and works well with API gateways and documentation tools.
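With Flask, URI path versioning falls out of blueprint registration: each version is a blueprint mounted under its own prefix. A sketch (the endpoints and response shapes are illustrative):

```python
from flask import Flask, Blueprint, jsonify

v1 = Blueprint('v1', __name__)
v2 = Blueprint('v2', __name__)

@v1.route('/users')
def list_users_v1():
    # Original response shape: a bare list
    return jsonify([{'id': 1, 'name': 'Ada'}])

@v2.route('/users')
def list_users_v2():
    # New enveloped shape, shipped without breaking v1 clients
    return jsonify({'data': [{'id': 1, 'name': 'Ada'}],
                    'meta': {'total': 1}})

app = Flask(__name__)
app.register_blueprint(v1, url_prefix='/api/v1')
app.register_blueprint(v2, url_prefix='/api/v2')
```

Both versions serve simultaneously, so existing clients keep working while new ones adopt /api/v2.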

Rate Limiting APIs

Rate limiting protects your API from abuse and ensures fair usage across all clients. Without it, a single misbehaving client can overwhelm your servers and degrade the experience for everyone else.

Common Algorithms

Token Bucket

Tokens are added at a fixed rate. Each request consumes a token. Allows bursts up to the bucket size, then throttles. Used by AWS and Stripe.

Sliding Window

Counts requests in a rolling time window. Smoother than fixed windows. Avoids boundary spikes but uses more memory to track timestamps.

Fixed Window

Counts requests in fixed time intervals (e.g., per minute). Simple to implement but can allow 2x the limit at window boundaries.
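The token bucket described above comes down to a few lines of arithmetic. A single-process sketch (a distributed deployment would keep this state in shared storage such as Redis):

```python
import time

class TokenBucket:
    """Refill `rate` tokens per second up to `capacity`; one token per request."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at the bucket size
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)  # steady 5 req/s, bursts up to 10
results = [bucket.allow() for _ in range(12)]
print(results.count(True))  # the burst drains the 10-token bucket, then throttles
```

The capacity parameter is what distinguishes this from a fixed window: bursts up to the bucket size are absorbed, and the steady-state rate is enforced afterwards.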

Rate Limit Headers

Good APIs communicate rate limit status via response headers:

  • X-RateLimit-Limit: Maximum requests allowed in the window
  • X-RateLimit-Remaining: Requests remaining in the current window
  • X-RateLimit-Reset: Timestamp when the window resets
  • Retry-After: Seconds to wait before retrying (sent with 429 responses)
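These headers can be attached in one place with a Flask after_request hook. A sketch with hard-coded values standing in for a real limiter's counters (the ping endpoint is illustrative):

```python
import time
from flask import Flask, jsonify

app = Flask(__name__)
LIMIT = 100  # illustrative: 100 requests per 60-second window

@app.route('/api/v1/ping')
def ping():
    return jsonify({'ok': True})

@app.after_request
def add_rate_limit_headers(resp):
    # A real implementation would read these values from the limiter's state
    resp.headers['X-RateLimit-Limit'] = str(LIMIT)
    resp.headers['X-RateLimit-Remaining'] = str(LIMIT - 1)
    resp.headers['X-RateLimit-Reset'] = str(int(time.time()) + 60)
    return resp
```

Centralizing the headers in one hook keeps every endpoint's rate-limit reporting consistent.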

Practice Problems

Design a RESTful API (Medium)

Design a RESTful API for an e-commerce product catalog:

  1. Define endpoints for products (CRUD), categories, and reviews
  2. Include pagination, filtering by category, and sorting by price
  3. Design proper error responses with consistent format

Use nouns for resources (/products, /categories), nest related resources (/products/:id/reviews), and keep query parameters for filtering and sorting.

# Products
GET    /api/v1/products                    # List (paginated)
GET    /api/v1/products?category=electronics&sort=price_asc
POST   /api/v1/products                    # Create
GET    /api/v1/products/:id                # Read
PUT    /api/v1/products/:id                # Update
DELETE /api/v1/products/:id                # Delete

# Categories
GET    /api/v1/categories
GET    /api/v1/categories/:id/products     # Products in category

# Reviews (nested under products)
GET    /api/v1/products/:id/reviews
POST   /api/v1/products/:id/reviews

# Consistent error format
{
  "error": {
    "code": "PRODUCT_NOT_FOUND",
    "message": "Product with ID 42 does not exist",
    "status": 404
  }
}

REST to gRPC Migration (Medium)

You have a REST API handling 10,000 RPS between two internal microservices. The JSON payloads are large and latency is becoming a bottleneck.

  1. Write a .proto file that replaces the existing REST endpoints
  2. Identify which endpoints benefit most from streaming
  3. Plan a migration strategy that avoids downtime

Map each REST verb+path to an RPC method. Use server-side streaming for list endpoints that return large datasets. Run both REST and gRPC in parallel during migration.

# 1. The migration plan:
# Phase 1: Define .proto, generate clients
# Phase 2: Add gRPC server alongside REST
# Phase 3: Migrate consumers one by one
# Phase 4: Deprecate REST after full migration

# 2. Streaming candidates:
# - ListOrders -> server-side stream for large results
# - RealTimeUpdates -> bidirectional stream

# 3. Dual-serve pattern (Python):
import grpc
from concurrent import futures

# OrderServicer and add_OrderServiceServicer_to_server come from the
# code protoc generates for the service defined in the .proto file

def serve():
    # gRPC server on port 50051 (start() does not block)
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    add_OrderServiceServicer_to_server(OrderServicer(), server)
    server.add_insecure_port('[::]:50051')
    server.start()

    # REST server on port 8080; app.run() blocks, keeping both
    # servers alive for the duration of the migration
    app.run(port=8080)

Rate Limiter Design (Hard)

Design a distributed rate limiter for a multi-region API:

  1. Support per-user and per-API-key limits
  2. Handle 100K+ requests per second across 3 regions
  3. Decide between strict consistency and eventual consistency

Consider using Redis with a sliding window algorithm. For multi-region, you can use local counters with periodic synchronization (eventual consistency) or a centralized Redis cluster (strict consistency with higher latency).

import redis
import time

class SlidingWindowRateLimiter:
    def __init__(self, redis_client, limit, window_seconds):
        self.redis = redis_client
        self.limit = limit
        self.window = window_seconds

    def is_allowed(self, key):
        now = time.time()
        window_start = now - self.window
        pipe = self.redis.pipeline()

        # Remove old entries
        pipe.zremrangebyscore(key, 0, window_start)
        # Count current entries
        pipe.zcard(key)
        # Add current request
        pipe.zadd(key, {str(now): now})
        # Set TTL
        pipe.expire(key, self.window)

        results = pipe.execute()
        current_count = results[1]  # count after pruning, before this request

        # Note: denied requests are still recorded, so a client that keeps
        # hammering the API keeps extending its own throttle window
        return current_count < self.limit

# Usage: 100 requests per minute per user
limiter = SlidingWindowRateLimiter(
    redis.Redis(), limit=100, window_seconds=60
)

def handle_request(user_id):
    if limiter.is_allowed(f"rate:user:{user_id}"):
        return process_request()
    return 429, "Too Many Requests"

Quick Reference

API Design Checklist

Aspect           Best Practice                         Example
Resource Naming  Use plural nouns, lowercase           /api/v1/users
Nesting          Max 2 levels deep                     /users/:id/orders
Pagination       Cursor-based for large datasets       ?cursor=abc&limit=20
Filtering        Query parameters                      ?status=active&role=admin
Versioning       URI path versioning                   /api/v1/
Error Format     Consistent JSON with code + message   {"error": {"code": "NOT_FOUND"}}
Auth             Bearer token in Authorization header  Authorization: Bearer <token>
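The cursor-based pagination recommended in the checklist can be sketched with an opaque base64 cursor that encodes the last-seen ID (the data set and field names are illustrative):

```python
import base64
import json

USERS = [{'id': i, 'name': f'user-{i}'} for i in range(1, 101)]

def encode_cursor(last_id):
    # Opaque to clients: they pass it back verbatim on the next request
    payload = json.dumps({'last_id': last_id}).encode()
    return base64.urlsafe_b64encode(payload).decode()

def decode_cursor(cursor):
    if not cursor:
        return 0
    return json.loads(base64.urlsafe_b64decode(cursor))['last_id']

def list_users(cursor=None, limit=20):
    last_id = decode_cursor(cursor)
    # Keyed on the last-seen id, so pages stay stable even as rows are
    # inserted or deleted (unlike page/offset pagination)
    page = [u for u in USERS if u['id'] > last_id][:limit]
    next_cursor = encode_cursor(page[-1]['id']) if page else None
    return {'data': page, 'next_cursor': next_cursor}

first = list_users(limit=20)
second = list_users(cursor=first['next_cursor'], limit=20)
print(second['data'][0]['id'])  # 21
```

A production version would sign or encrypt the cursor so clients cannot tamper with it, but the request/response shape stays the same.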

gRPC Quick Reference

Key gRPC Concepts

  • Unary RPC: Single request, single response (like a normal function call)
  • Server Streaming: Single request, stream of responses (e.g., downloading a large result set)
  • Client Streaming: Stream of requests, single response (e.g., uploading chunks)
  • Bidirectional Streaming: Both sides stream simultaneously (e.g., real-time chat)
  • Deadlines: Client sets maximum time for a call; gRPC enforces it across the call chain
  • Interceptors: Middleware for logging, auth, and metrics (like HTTP middleware)