Why API Design Matters
Why This Matters
The Problem: Modern applications are built from dozens or hundreds of services that must communicate reliably. Poorly designed APIs lead to tight coupling, breaking changes, and frustrated developers.
The Solution: Well-designed APIs provide a stable contract between services, enabling teams to build, deploy, and scale independently.
Real Impact: Stripe's API is often cited as the gold standard: its consistent, intuitive design helped the company grow to process billions in payments.
Real-World Analogy
Think of an API like a restaurant menu:
- Menu items = API endpoints (what you can request)
- Order format = Request schema (how you ask for it)
- Kitchen = Server (processes your request)
- Waiter = HTTP protocol (carries requests and responses)
- Receipt = Response with status code (confirmation of what happened)
Core Qualities of Good APIs
Consistency
Endpoints follow predictable naming conventions, error formats, and pagination patterns across the entire API surface.
Discoverability
Developers can explore the API intuitively. Resource names, query parameters, and relationships are self-documenting.
Backward Compatibility
New features can be added without breaking existing clients. Versioning strategies protect consumers from disruptive changes.
Performance
Efficient serialization, proper caching headers, and minimal over-fetching keep latency low and throughput high.
REST API Principles
REST (Representational State Transfer) is an architectural style that uses standard HTTP methods to operate on resources identified by URLs. Roy Fielding defined it in his 2000 doctoral dissertation, and it remains the most widely used API paradigm for web services.
The Six REST Constraints
Client-Server
Separate the user interface from the data storage. The client and server evolve independently as long as the interface stays the same.
Stateless
Each request contains all information needed to process it. The server stores no client context between requests.
Cacheable
Responses must define themselves as cacheable or not. Proper caching eliminates redundant interactions and improves scalability.
Uniform Interface
Resources are identified by URIs and manipulated through representations; messages are self-descriptive, and hypermedia links (HATEOAS) guide clients through available actions.
Layered System
A client cannot tell whether it is connected to the end server or an intermediary. Layers enable load balancers, caches, and gateways.
Code on Demand (Optional)
Servers can extend client functionality by transferring executable code, such as JavaScript.
```python
from flask import Flask, jsonify, request, abort

app = Flask(__name__)

# In-memory store for demonstration
users = {}
next_id = 1

@app.route('/api/v1/users', methods=['GET'])
def list_users():
    """List all users with pagination."""
    page = request.args.get('page', 1, type=int)
    per_page = request.args.get('per_page', 20, type=int)
    all_users = list(users.values())
    start = (page - 1) * per_page
    end = start + per_page
    return jsonify({
        'data': all_users[start:end],
        'meta': {
            'page': page,
            'per_page': per_page,
            'total': len(all_users)
        }
    }), 200

@app.route('/api/v1/users', methods=['POST'])
def create_user():
    """Create a new user."""
    global next_id
    data = request.get_json()
    if not data or 'name' not in data:
        abort(400, description='Name is required')
    user = {
        'id': next_id,
        'name': data['name'],
        'email': data.get('email', '')
    }
    users[next_id] = user
    next_id += 1
    return jsonify({'data': user}), 201

@app.route('/api/v1/users/<int:user_id>', methods=['GET'])
def get_user(user_id):
    """Retrieve a single user by ID."""
    user = users.get(user_id)
    if not user:
        abort(404, description='User not found')
    return jsonify({'data': user}), 200

@app.route('/api/v1/users/<int:user_id>', methods=['PUT'])
def update_user(user_id):
    """Update an existing user."""
    if user_id not in users:
        abort(404, description='User not found')
    data = request.get_json() or {}
    # Note: falling back to existing values gives PATCH-style merge
    # semantics; a strict PUT would replace the resource entirely.
    users[user_id].update({
        'name': data.get('name', users[user_id]['name']),
        'email': data.get('email', users[user_id]['email'])
    })
    return jsonify({'data': users[user_id]}), 200

@app.route('/api/v1/users/<int:user_id>', methods=['DELETE'])
def delete_user(user_id):
    """Delete a user."""
    if user_id not in users:
        abort(404, description='User not found')
    del users[user_id]
    return '', 204
```
HTTP Methods & Status Codes
HTTP Methods (Verbs)
| Method | Purpose | Idempotent | Safe | Example |
|---|---|---|---|---|
| GET | Retrieve a resource | Yes | Yes | GET /users/42 |
| POST | Create a new resource | No | No | POST /users |
| PUT | Replace a resource entirely | Yes | No | PUT /users/42 |
| PATCH | Partially update a resource | No | No | PATCH /users/42 |
| DELETE | Remove a resource | Yes | No | DELETE /users/42 |
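Idempotency is easiest to see in a small sketch: repeating a PUT to a known URL leaves the store unchanged, while repeating a POST creates a new resource each time. This is an illustrative in-memory model, not tied to any framework.

```python
# Minimal in-memory model of PUT vs POST semantics (illustrative only)
store = {}
next_id = [1]

def put_user(user_id, body):
    # PUT: the client names the resource; repeating the call changes nothing
    store[user_id] = body
    return store[user_id]

def post_user(body):
    # POST: the server assigns the ID, so each call creates a new resource
    user_id = next_id[0]
    next_id[0] += 1
    store[user_id] = body
    return user_id

put_user(42, {'name': 'Ada'})
put_user(42, {'name': 'Ada'})          # idempotent: still one resource
first = post_user({'name': 'Grace'})
second = post_user({'name': 'Grace'})  # not idempotent: two resources
```

This is why retrying a failed PUT is safe but blindly retrying a POST risks duplicates.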
Status Code Families
| Range | Category | Common Codes |
|---|---|---|
| 2xx | Success | 200 OK, 201 Created, 204 No Content |
| 3xx | Redirection | 301 Moved Permanently, 304 Not Modified |
| 4xx | Client Error | 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 429 Too Many Requests |
| 5xx | Server Error | 500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable |
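One practical consequence of the table above: a client can decide whether a retry makes sense from the status code family alone. A hypothetical retry policy (the function and its rules are illustrative, not from any particular library):

```python
def should_retry(status_code, method):
    """Decide whether a failed request is worth retrying (illustrative policy)."""
    # 429 explicitly asks the client to back off and try again later
    if status_code == 429:
        return True
    # Other 4xx codes mean the request itself is wrong; retrying won't help
    if 400 <= status_code < 500:
        return False
    # 5xx errors are often transient; retry only idempotent methods
    if status_code >= 500:
        return method in {'GET', 'PUT', 'DELETE', 'HEAD'}
    return False
```

This policy only works if the server uses status codes correctly, which is the point of the pitfall below.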
Common Pitfall
Problem: Using 200 OK for everything, even errors.
Solution: Use the correct status code for each response. A 404 tells the client the resource does not exist, a 400 tells them their request was malformed, and a 500 signals an internal server failure. This distinction is critical for client-side error handling and debugging.
gRPC and Protocol Buffers
gRPC is a high-performance, open-source RPC (Remote Procedure Call) framework originally developed by Google. It uses HTTP/2 for transport, Protocol Buffers for serialization, and provides features such as bidirectional streaming, flow control, and pluggable authentication.
When to Use gRPC
- Microservice-to-microservice: Low-latency internal communication where browser support is not needed
- Streaming data: Real-time feeds, chat, or sensor data where bidirectional streaming shines
- Polyglot systems: Auto-generated client libraries for 10+ languages from a single .proto file
- Performance-critical paths: Binary Protobuf payloads are typically several times smaller than equivalent JSON and faster to encode and decode
```protobuf
syntax = "proto3";

package userservice;

// The User service definition
service UserService {
  // Retrieve a single user
  rpc GetUser (GetUserRequest) returns (User);
  // List users with pagination
  rpc ListUsers (ListUsersRequest) returns (ListUsersResponse);
  // Create a new user
  rpc CreateUser (CreateUserRequest) returns (User);
  // Stream user activity events
  rpc StreamActivity (ActivityRequest) returns (stream ActivityEvent);
}

message GetUserRequest {
  int32 id = 1;
}

message User {
  int32 id = 1;
  string name = 2;
  string email = 3;
  int64 created_at = 4;
}

message ListUsersRequest {
  int32 page = 1;
  int32 per_page = 2;
}

message ListUsersResponse {
  repeated User users = 1;
  int32 total = 2;
}

message CreateUserRequest {
  string name = 1;
  string email = 2;
}

message ActivityRequest {
  int32 user_id = 1;
}

message ActivityEvent {
  string event_type = 1;
  string description = 2;
  int64 timestamp = 3;
}
```
REST vs gRPC Comparison
Which Should You Choose?
Use REST when you need broad client compatibility (browsers, mobile, third-party developers) and human-readable payloads for debugging. Use gRPC when you need high performance between internal services, real-time streaming, or strict contract enforcement across polyglot teams. Many systems use both: REST for external APIs and gRPC for internal communication.
API Versioning Strategies
APIs evolve over time. Versioning ensures that existing clients continue to work while new features are introduced. There is no single "correct" strategy; each has trade-offs.
| Strategy | Example | Pros | Cons |
|---|---|---|---|
| URI Path | /api/v1/users | Simple, visible, easy to route | URL changes break caches; disliked by REST purists |
| Query Parameter | /api/users?version=1 | Easy to default, optional | Can be missed, harder to route |
| Header | Accept: application/vnd.api+json;v=1 | Clean URLs, flexible | Hidden, harder to test in a browser |
| Content Negotiation | Accept: application/vnd.company.v2+json | Most RESTful approach | Complex, rarely used in practice |
Industry Practice
URI path versioning (/api/v1/) is the most common approach used by companies like Stripe, Twilio, and GitHub. It is simple, explicit, and works well with API gateways and documentation tools.
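For the header-based strategies, the server has to extract the version from the Accept header. A sketch of one way to parse the media-type format shown in the table (the helper name and the fallback-to-default behavior are assumptions for illustration):

```python
import re

def api_version(accept_header, default=1):
    """Extract the version from an Accept header such as
    'application/vnd.api+json;v=2' (illustrative parser)."""
    # Look for a ';v=N' or ',v=N' parameter in the header
    match = re.search(r'[;,]\s*v=(\d+)', accept_header or '')
    return int(match.group(1)) if match else default
```

Falling back to a default version when the parameter is absent is what makes header versioning easy to adopt incrementally.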
Rate Limiting APIs
Rate limiting protects your API from abuse and ensures fair usage across all clients. Without it, a single misbehaving client can overwhelm your servers and degrade the experience for everyone else.
Common Algorithms
Token Bucket
Tokens are added at a fixed rate. Each request consumes a token. Allows bursts up to the bucket size, then throttles. Used by AWS and Stripe.
Sliding Window
Counts requests in a rolling time window. Smoother than fixed windows. Avoids boundary spikes but uses more memory to track timestamps.
Fixed Window
Counts requests in fixed time intervals (e.g., per minute). Simple to implement but can allow 2x the limit at window boundaries.
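To make the token bucket concrete, here is a single-process, in-memory sketch (a distributed version would keep the bucket state in shared storage such as Redis; the class and parameter names are illustrative):

```python
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens for the elapsed time, capped at the bucket capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Bucket of 10 allows a burst of 10, then throttles to 1 request/second
bucket = TokenBucket(rate=1, capacity=10)
results = [bucket.allow() for _ in range(12)]
```

The burst-then-throttle behavior is exactly what distinguishes the token bucket from a fixed window.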
Rate Limit Headers
Good APIs communicate rate limit status via response headers:
- X-RateLimit-Limit: Maximum requests allowed in the window
- X-RateLimit-Remaining: Requests remaining in the current window
- X-RateLimit-Reset: Timestamp when the window resets
- Retry-After: Seconds to wait before retrying (sent with 429 responses)
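A server might populate these headers from its limiter state roughly as follows (the helper function is illustrative; only the header names come from the list above):

```python
def rate_limit_headers(limit, remaining, reset_ts, retry_after=None):
    """Build rate-limit response headers (illustrative helper)."""
    headers = {
        'X-RateLimit-Limit': str(limit),
        'X-RateLimit-Remaining': str(max(0, remaining)),
        'X-RateLimit-Reset': str(reset_ts),
    }
    # Retry-After is only sent alongside a 429 response
    if retry_after is not None:
        headers['Retry-After'] = str(retry_after)
    return headers

ok = rate_limit_headers(100, 73, 1700000060)
throttled = rate_limit_headers(100, 0, 1700000060, retry_after=30)
```

Well-behaved clients read X-RateLimit-Remaining proactively instead of waiting to hit a 429.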
Practice Problems
Medium Design a RESTful API
Design a RESTful API for an e-commerce product catalog:
- Define endpoints for products (CRUD), categories, and reviews
- Include pagination, filtering by category, and sorting by price
- Design proper error responses with consistent format
Use nouns for resources (/products, /categories), nest related resources (/products/:id/reviews), and keep query parameters for filtering and sorting.
```
# Products
GET    /api/v1/products                  # List (paginated)
GET    /api/v1/products?category=electronics&sort=price_asc
POST   /api/v1/products                  # Create
GET    /api/v1/products/:id              # Read
PUT    /api/v1/products/:id              # Update
DELETE /api/v1/products/:id              # Delete

# Categories
GET    /api/v1/categories
GET    /api/v1/categories/:id/products   # Products in a category

# Reviews (nested under products)
GET    /api/v1/products/:id/reviews
POST   /api/v1/products/:id/reviews

# Consistent error format
{
  "error": {
    "code": "PRODUCT_NOT_FOUND",
    "message": "Product with ID 42 does not exist",
    "status": 404
  }
}
```
Medium REST to gRPC Migration
You have a REST API handling 10,000 RPS between two internal microservices. The JSON payloads are large and latency is becoming a bottleneck.
- Write a .proto file that replaces the existing REST endpoints
- Identify which endpoints benefit most from streaming
- Plan a migration strategy that avoids downtime
Map each REST verb+path to an RPC method. Use server-side streaming for list endpoints that return large datasets. Run both REST and gRPC in parallel during migration.
```python
# 1. Migration plan:
#    Phase 1: Define .proto, generate clients
#    Phase 2: Add gRPC server alongside REST
#    Phase 3: Migrate consumers one by one
#    Phase 4: Deprecate REST after full migration
#
# 2. Streaming candidates:
#    - ListOrders -> server-side stream for large results
#    - RealTimeUpdates -> bidirectional stream
#
# 3. Dual-serve pattern (Python):
import grpc
from concurrent import futures

# OrderServicer and add_OrderServiceServicer_to_server come from the
# module generated by the gRPC compiler for the service's .proto file
def serve():
    # gRPC server on port 50051
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    add_OrderServiceServicer_to_server(OrderServicer(), server)
    server.add_insecure_port('[::]:50051')
    server.start()
    # REST server on port 8080 (keep running)
    app.run(port=8080)
    server.wait_for_termination()
```
Hard Rate Limiter Design
Design a distributed rate limiter for a multi-region API:
- Support per-user and per-API-key limits
- Handle 100K+ requests per second across 3 regions
- Decide between strict consistency and eventual consistency
Consider using Redis with a sliding window algorithm. For multi-region, you can use local counters with periodic synchronization (eventual consistency) or a centralized Redis cluster (strict consistency with higher latency).
```python
import time

import redis

class SlidingWindowRateLimiter:
    def __init__(self, redis_client, limit, window_seconds):
        self.redis = redis_client
        self.limit = limit
        self.window = window_seconds

    def is_allowed(self, key):
        now = time.time()
        window_start = now - self.window
        pipe = self.redis.pipeline()
        # Remove entries that have aged out of the window
        pipe.zremrangebyscore(key, 0, window_start)
        # Count entries still inside the window
        pipe.zcard(key)
        # Record the current request
        pipe.zadd(key, {str(now): now})
        # Expire the key so idle users don't leak memory
        pipe.expire(key, self.window)
        results = pipe.execute()
        current_count = results[1]
        return current_count < self.limit

# Usage: 100 requests per minute per user
limiter = SlidingWindowRateLimiter(
    redis.Redis(), limit=100, window_seconds=60
)

def handle_request(user_id):
    if limiter.is_allowed(f"rate:user:{user_id}"):
        return process_request()
    return 429, "Too Many Requests"
```
Quick Reference
API Design Checklist
| Aspect | Best Practice | Example |
|---|---|---|
| Resource Naming | Use plural nouns, lowercase | /api/v1/users |
| Nesting | Max 2 levels deep | /users/:id/orders |
| Pagination | Cursor-based for large datasets | ?cursor=abc&limit=20 |
| Filtering | Query parameters | ?status=active&role=admin |
| Versioning | URI path versioning | /api/v1/ |
| Error Format | Consistent JSON with code, message | {"error": {"code": "NOT_FOUND"}} |
| Auth | Bearer token in Authorization header | Authorization: Bearer <token> |
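The cursor-based pagination entry above can be sketched as follows; here the cursor is just a base64-encoded last-seen ID, an assumption for illustration (production cursors are usually opaque, and often signed, tokens):

```python
import base64

def encode_cursor(last_id):
    # Encode the last-seen ID as an opaque-looking token
    return base64.urlsafe_b64encode(str(last_id).encode()).decode()

def decode_cursor(cursor):
    return int(base64.urlsafe_b64decode(cursor.encode()).decode())

def paginate(items, cursor=None, limit=20):
    """Return one page of id-sorted items plus the cursor for the next page."""
    last_id = decode_cursor(cursor) if cursor else 0
    page = [it for it in items if it['id'] > last_id][:limit]
    next_cursor = encode_cursor(page[-1]['id']) if len(page) == limit else None
    return page, next_cursor

items = [{'id': i} for i in range(1, 51)]
page1, cur = paginate(items, limit=20)          # ids 1..20
page2, _ = paginate(items, cursor=cur, limit=20)  # ids 21..40
```

Unlike offset pagination, a cursor stays stable when rows are inserted or deleted between page fetches, which is why the checklist recommends it for large datasets.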
gRPC Quick Reference
Key gRPC Concepts
- Unary RPC: Single request, single response (like a normal function call)
- Server Streaming: Single request, stream of responses (e.g., downloading a large result set)
- Client Streaming: Stream of requests, single response (e.g., uploading chunks)
- Bidirectional Streaming: Both sides stream simultaneously (e.g., real-time chat)
- Deadlines: Client sets maximum time for a call; gRPC enforces it across the call chain
- Interceptors: Middleware for logging, auth, and metrics (like HTTP middleware)