Problem Statement & Requirements
Why Video Streaming Is the Ultimate Scale Problem
The Challenge: Video is the largest category of internet traffic, accounting for over 80% of consumer internet traffic by some estimates. Serving video at scale requires solving upload, processing, storage, and delivery simultaneously.
Real Scale: Viewers watch over 1 billion hours of YouTube video per day, and YouTube receives more than 500 hours of new uploads every minute. Netflix alone has accounted for roughly 15% of global downstream internet traffic.
Real-World Analogy
Think of a video streaming platform like a global TV broadcasting network:
- Upload = A filmmaker submitting their raw footage to the studio
- Transcoding = The studio converting the film to different formats (4K, HD, SD)
- CDN = Regional broadcast towers distributing the signal locally
- Adaptive bitrate = The TV automatically switching to a lower quality when signal is weak
- Recommendations = The TV guide suggesting shows based on what you have watched
Functional Requirements
Video Upload
Creators upload videos up to 10GB. Support resumable uploads for reliability on poor connections.
Video Playback
Viewers stream videos with adaptive quality. Support seeking, pause/resume, and playback speed control.
Search & Discovery
Full-text search on titles, descriptions, and tags. Personalized recommendations and trending content.
Social Features
Like/dislike, comments, subscriptions, and share. View count tracking in real time.
Video Upload Pipeline
```python
import boto3

class VideoUploadService:
    """
    Handles chunked, resumable video uploads.

    Flow:
      1. Client requests upload URL (pre-signed S3 multipart)
      2. Client uploads in 5MB chunks
      3. On complete: trigger transcoding pipeline
    """

    def __init__(self):
        self.s3 = boto3.client('s3')
        self.BUCKET = 'raw-videos'
        self.CHUNK_SIZE = 5 * 1024 * 1024  # 5MB (S3's minimum part size)

    def _key(self, video_id: str) -> str:
        # One canonical key per video, so initiate / chunk / complete
        # all refer to the same S3 object
        return f"uploads/{video_id}/video.mp4"

    def initiate_upload(self, video_id: str, filename: str) -> dict:
        """Start a multipart upload and return upload_id."""
        response = self.s3.create_multipart_upload(
            Bucket=self.BUCKET,
            Key=self._key(video_id),
            ContentType='video/mp4',
            Metadata={'original-filename': filename},
        )
        return {
            "upload_id": response["UploadId"],
            "video_id": video_id,
            "chunk_size": self.CHUNK_SIZE,
        }

    def get_chunk_url(self, video_id: str, upload_id: str,
                      part_number: int) -> str:
        """Generate a pre-signed URL for uploading one chunk."""
        return self.s3.generate_presigned_url(
            'upload_part',
            Params={
                'Bucket': self.BUCKET,
                'Key': self._key(video_id),
                'UploadId': upload_id,
                'PartNumber': part_number,
            },
            ExpiresIn=3600,  # 1 hour
        )

    def complete_upload(self, video_id: str, upload_id: str,
                        parts: list):
        """Finalize upload and trigger transcoding.

        `parts` is [{"PartNumber": n, "ETag": "..."}], collected by
        the client from each chunk-upload response.
        """
        # Complete multipart upload in S3
        self.s3.complete_multipart_upload(
            Bucket=self.BUCKET,
            Key=self._key(video_id),
            UploadId=upload_id,
            MultipartUpload={'Parts': parts},
        )
        # Publish job to transcoding queue
        # (`queue` is an application-level message-queue client, not shown)
        queue.publish("video-transcode", {
            "video_id": video_id,
            "source": f"s3://{self.BUCKET}/uploads/{video_id}/",
        })
```
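To make the resumable flow concrete, here is a sketch of the client side, under the assumption that the client can read its file in byte ranges. The `plan_parts` helper is hypothetical (not part of any library); the actual HTTP PUTs to the pre-signed URLs are shown only as comments.

```python
def plan_parts(file_size: int, chunk_size: int) -> list:
    """Split a file into (part_number, start, end) byte ranges.

    S3 part numbers are 1-based; every part except the last must be
    at least 5 MB, which the service's 5 MB chunk size satisfies.
    """
    parts = []
    start = 0
    part_number = 1
    while start < file_size:
        end = min(start + chunk_size, file_size)
        parts.append((part_number, start, end))
        start = end
        part_number += 1
    return parts

# Client flow (network calls elided):
#   session = service.initiate_upload(video_id, "movie.mp4")
#   for n, start, end in plan_parts(size, session["chunk_size"]):
#       url = service.get_chunk_url(video_id, session["upload_id"], n)
#       # PUT bytes [start:end) to url; record the ETag from each response
#   # Finally, ask the service to complete the multipart upload,
#   # passing the collected (PartNumber, ETag) pairs.
```

Resumability falls out naturally: if the connection drops, the client re-requests pre-signed URLs only for the parts whose ETags it has not yet recorded.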
Video Transcoding & Encoding
Why Transcoding Is Essential
A single uploaded video must be converted into multiple formats and resolutions to serve viewers on different devices and network conditions. A 10-minute 4K video at 60fps might arrive as a 2GB file, yet it must also be available as 360p (for slow mobile), 480p, 720p, 1080p, and 4K renditions.
| Resolution | Bitrate | 10-min File Size | Target Device |
|---|---|---|---|
| 360p | 800 Kbps | ~60 MB | Mobile on 3G |
| 480p | 1.5 Mbps | ~110 MB | Mobile on LTE |
| 720p | 3 Mbps | ~225 MB | Tablet / laptop |
| 1080p | 6 Mbps | ~450 MB | Desktop / Smart TV |
| 4K | 15 Mbps | ~1.1 GB | 4K TV / high-bandwidth |
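The file sizes in the table follow directly from bitrate times duration; a quick sketch of the arithmetic (using decimal megabytes and ignoring audio and container overhead):

```python
def stream_size_mb(bitrate_kbps: int, duration_seconds: int) -> float:
    """Approximate file size: bitrate (bits/s) * duration, over 8 bits/byte."""
    total_bits = bitrate_kbps * 1000 * duration_seconds
    return total_bits / 8 / 1_000_000  # decimal MB

# 10 minutes (600 s) at 800 Kbps -> 60.0 MB, matching the 360p row;
# at 6000 Kbps -> 450.0 MB, matching the 1080p row.
```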
```python
import os
import subprocess

class TranscodeWorker:
    """
    Video transcoding worker using FFmpeg.
    Produces multiple resolutions + HLS segments.
    """

    PROFILES = [
        {"name": "360p",  "width": 640,  "height": 360,  "bitrate": "800k"},
        {"name": "480p",  "width": 854,  "height": 480,  "bitrate": "1500k"},
        {"name": "720p",  "width": 1280, "height": 720,  "bitrate": "3000k"},
        {"name": "1080p", "width": 1920, "height": 1080, "bitrate": "6000k"},
    ]

    def transcode(self, source_path: str,
                  output_dir: str, video_id: str):
        """Transcode video into multiple HLS streams."""
        for profile in self.PROFILES:
            output = f"{output_dir}/{profile['name']}"
            os.makedirs(output, exist_ok=True)  # FFmpeg won't create dirs
            cmd = [
                "ffmpeg", "-i", source_path,
                "-vf", f"scale={profile['width']}:{profile['height']}",
                "-b:v", profile["bitrate"],
                "-codec:v", "h264",
                "-codec:a", "aac",
                "-hls_time", "6",       # 6-second segments
                "-hls_list_size", "0",  # keep all segments in the playlist
                "-f", "hls",
                f"{output}/playlist.m3u8",
            ]
            subprocess.run(cmd, check=True)
        # Generate master playlist referencing every variant
        self._create_master_playlist(output_dir, video_id)
```
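The `_create_master_playlist` step above is not spelled out; a minimal sketch of what it might generate, assuming BANDWIDTH is derived from the video bitrate alone (real implementations add audio and container overhead):

```python
def create_master_playlist(profiles: list) -> str:
    """Build an HLS master playlist string from the transcode profiles.

    Each entry advertises one variant stream; the player reads this
    file first and picks a variant to fetch.
    """
    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    for p in profiles:
        bandwidth = int(p["bitrate"].rstrip("k")) * 1000  # "800k" -> 800000
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},"
            f"RESOLUTION={p['width']}x{p['height']}"
        )
        lines.append(f"{p['name']}/playlist.m3u8")
    return "\n".join(lines) + "\n"
```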
Adaptive Bitrate Streaming (HLS/DASH)
```
#EXTM3U
#EXT-X-VERSION:3

# 360p variant
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/playlist.m3u8

# 480p variant
#EXT-X-STREAM-INF:BANDWIDTH=1500000,RESOLUTION=854x480
480p/playlist.m3u8

# 720p variant
#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720
720p/playlist.m3u8

# 1080p variant
#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
1080p/playlist.m3u8

# The player downloads this master playlist first,
# then selects the appropriate quality level and
# fetches 6-second .ts segments from that stream.
```
HLS vs DASH
- HLS (HTTP Live Streaming): Developed by Apple. Uses .m3u8 playlists and .ts segments. Native iOS/Safari support. Industry standard.
- DASH (Dynamic Adaptive Streaming over HTTP): Open standard (ISO). Uses .mpd manifests and .m4s segments. Better codec flexibility.
- In practice: Most platforms support both. YouTube uses DASH. Netflix uses both. HLS has wider device support.
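The variant-selection logic the player runs can be sketched simply. Real players combine throughput estimates with buffer occupancy; this assumes throughput alone, with a hypothetical 0.8 safety factor:

```python
def select_variant(variants: list, measured_bps: float,
                   safety: float = 0.8) -> dict:
    """Pick the highest-bandwidth variant that fits measured throughput.

    `variants` mirrors the master playlist entries, e.g.
    [{"name": "360p", "bandwidth": 800_000}, ...]. The safety factor
    leaves headroom for throughput fluctuations; if nothing fits,
    fall back to the lowest-bandwidth variant.
    """
    affordable = [v for v in variants
                  if v["bandwidth"] <= measured_bps * safety]
    if not affordable:
        return min(variants, key=lambda v: v["bandwidth"])
    return max(affordable, key=lambda v: v["bandwidth"])
```

The player re-evaluates this after every segment download, which is what makes the stream "adaptive": a drop in measured throughput shifts the next segment request to a lower-bitrate variant.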
CDN Delivery Architecture
CDN Caching Strategy for Video
Not all video segments are created equal. The first few segments of a video are accessed far more often than later segments (many viewers drop off early). A tiered caching strategy optimizes this:
- First 30 seconds: Cached at all edge PoPs (highest priority)
- Rest of popular videos: Cached at regional PoPs
- Long tail videos: Served from origin, cached on demand
- Live content: Pushed to edge proactively before viewers request it
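The tiering rules above can be expressed as a small decision function. This is a sketch; the thresholds (30-second head, 1,000 views/day for "popular") are illustrative, not production-tuned:

```python
def cache_tier(segment_index: int, daily_views: int, is_live: bool,
               segment_seconds: int = 6) -> str:
    """Decide where a video segment should be cached.

    Returns "edge" (all edge PoPs), "regional" (regional PoPs),
    or "origin" (cached on demand).
    """
    if is_live:
        return "edge"      # pushed proactively before viewers request it
    if segment_index * segment_seconds < 30:
        return "edge"      # first ~30 seconds, where most playback starts
    if daily_views >= 1000:
        return "regional"  # popular video, later segments
    return "origin"        # long tail, cached only on demand
```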
Video Metadata & Search
Metadata Storage Design
- Video metadata (MySQL): title, description, upload_date, duration, view_count, channel_id
- Search index (Elasticsearch): Full-text on title + description + tags. Autocomplete via edge-ngram tokenizer.
- View counts (Redis): Atomic increments for real-time counts. Periodically flushed to MySQL.
- Comments (Cassandra): Partitioned by video_id, sorted by timestamp. High write throughput.
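The Redis-plus-MySQL view-count pattern can be sketched as a counter with a periodic flush. Here a dict stands in for Redis (`incr` maps to Redis `INCR`, and `flush` to reading and resetting the keys) and `persist` stands in for a batched MySQL update; both substitutions are assumptions for the sake of a runnable sketch:

```python
from collections import defaultdict

class ViewCounter:
    """Real-time view counting with periodic flush to durable storage."""

    def __init__(self, persist):
        self.counts = defaultdict(int)  # stand-in for Redis counters
        self.persist = persist          # e.g. batched MySQL UPDATE

    def record_view(self, video_id: str) -> int:
        # Redis equivalent: INCR view:{video_id} (atomic)
        self.counts[video_id] += 1
        return self.counts[video_id]

    def flush(self) -> dict:
        """Periodic job: push deltas to MySQL, reset the counters."""
        deltas, self.counts = dict(self.counts), defaultdict(int)
        for video_id, delta in deltas.items():
            self.persist(video_id, delta)
        return deltas
```

Reads for display hit the fast counter; the flush keeps MySQL eventually consistent without a write per view, which is what makes billions of daily view events affordable.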
Recommendation Engine Overview
How YouTube Recommendations Work
YouTube's recommendation system uses a two-stage architecture:
- Candidate Generation: From millions of videos, narrow to ~1,000 candidates based on user history, subscriptions, and collaborative filtering.
- Ranking: Score the 1,000 candidates using a deep neural network that considers watch time prediction, freshness, diversity, and creator engagement signals.
- Re-ranking: Apply business rules (content policies, freshness boost, diversity) to produce the final recommendation list.
```python
class RecommendationPipeline:
    """
    Simplified recommendation pipeline.
    In production, each stage is a separate microservice.
    """

    def get_recommendations(self, user_id: str, count: int = 20) -> list:
        # Stage 1: Candidate generation (fast, broad)
        candidates = []
        candidates += self._from_watch_history(user_id, 200)
        candidates += self._from_subscriptions(user_id, 200)
        candidates += self._collaborative_filter(user_id, 200)
        candidates += self._trending_videos(100)

        # Deduplicate and filter already-watched
        candidates = self._deduplicate(candidates)
        candidates = self._filter_watched(user_id, candidates)

        # Stage 2: Ranking (slower, precise)
        scored = [(self._predict_watch_time(user_id, video), video)
                  for video in candidates]
        # Sort by score only -- video objects aren't comparable
        scored.sort(key=lambda pair: pair[0], reverse=True)

        # Stage 3: Re-ranking (business rules)
        final = self._apply_diversity(scored[:count * 2])
        final = self._apply_content_policy(final)
        return [video for _score, video in final[:count]]

    def _predict_watch_time(self, user_id, video) -> float:
        """ML model predicts expected watch time (in seconds)."""
        features = {
            "user_history": get_user_embedding(user_id),
            "video_features": get_video_embedding(video.id),
            "freshness": video.age_hours,
            "channel_affinity": get_affinity(user_id, video.channel_id),
        }
        return self.model.predict(features)
```
Practice Problems
Live Streaming (Medium)
Extend the video platform to support live streaming. A creator broadcasts in real-time to thousands or millions of concurrent viewers.
- How does the video pipeline change for live content?
- What is the acceptable latency for live streams?
- How do you scale to millions of concurrent viewers?
Live streaming replaces the upload pipeline with an ingest pipeline (RTMP). Transcoding happens in real-time. CDN is critical -- popular streams should be pushed to edge before viewers request them.
```
# Live Streaming Architecture
# Ingest: Creator -> RTMP -> Ingest Server
# Transcode: Real-time FFmpeg (GPU-accelerated)
# Segment: 2-second chunks (vs 6s for VOD) for lower latency
# CDN: Push segments to edge proactively

# Latency tiers:
# - Standard: 10-30s (HLS, wide compatibility)
# - Low latency: 2-5s (LL-HLS, CMAF)
# - Ultra-low: < 1s (WebRTC, limited scale)

# Scale: 1M viewers on a single stream
# - Origin serves to ~50 CDN edge PoPs
# - Each PoP serves ~20K viewers
# - Total origin bandwidth: 50 * 6Mbps = 300 Mbps
# - CDN edge bandwidth: 1M * 3Mbps = 3 Tbps (handled by CDN)
```
Video Content Moderation (Hard)
Design an automated content moderation system that scans uploaded videos for policy violations (violence, nudity, copyright) before making them publicly available.
- At what point in the pipeline do you run moderation?
- How do you handle the compute cost of analyzing video frames?
- How do you balance speed vs accuracy?
Run moderation after transcoding but before publishing. Sample frames (1 per second) instead of analyzing every frame. Use a multi-stage pipeline: fast classifiers for obvious cases, deeper analysis for borderline content.
```
# Video Moderation Pipeline
# Runs after transcoding, before CDN publish

# Stage 1: Frame sampling (1 frame/sec)
# Stage 2: Fast classifier on each frame
#   - Nudity detection (CV model)
#   - Violence detection (CV model)
#   - Text overlay extraction (OCR)
# Stage 3: Audio analysis
#   - Speech-to-text for hate speech detection
#   - Audio fingerprint for copyright (like Shazam)
# Stage 4: Decision
#   - All pass -> publish immediately
#   - Any flag with confidence > 0.9 -> reject
#   - Borderline -> queue for human review

# SLA: 95% of videos processed within 30 minutes
# Human review SLA: 4 hours for borderline content
```
Cost Optimization (Hard)
Video storage and CDN bandwidth are the largest cost drivers. Design a strategy to reduce costs by 40% without impacting user experience.
- What percentage of videos are never watched after upload?
- How can you reduce transcoding costs?
- How do you optimize CDN costs for long-tail content?
The Pareto principle applies heavily: ~10% of videos generate ~90% of views. Lazy transcoding (only transcode when first requested) can save enormous compute. Storage tiering (S3 -> Glacier) for old, unpopular content.
```
# Cost Optimization Strategy

# 1. Lazy transcoding
#    - On upload: only create 480p + 720p
#    - 360p and 1080p: transcode on first request
#    - 4K: only for videos with > 1000 views
#    Savings: ~50% transcoding compute

# 2. Storage tiering
#    - Hot (< 30 days, > 10 views): S3 Standard
#    - Warm (30-90 days): S3 IA (40% cheaper)
#    - Cold (> 90 days, < 1 view/month): S3 Glacier
#    - Delete raw source after 7 days
#    Savings: ~35% storage costs

# 3. CDN optimization
#    - Only cache popular videos at edge
#    - Long-tail: serve from regional origin
#    - Negotiate committed-use discounts with CDN
#    Savings: ~25% CDN costs
```
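The age-based part of the storage tiering maps directly onto S3 lifecycle rules. A sketch of the configuration as a plain dict (the prefixes `transcoded/` and `uploads/` are assumptions about the bucket layout); it would be applied with boto3's `put_bucket_lifecycle_configuration`:

```python
def lifecycle_rules() -> dict:
    """S3 lifecycle rules for the age-based tiering above.

    Note: lifecycle rules only see object age, not view counts, so
    the "> 10 views" / "< 1 view/month" conditions would need an
    application-level job that re-tags or moves objects.
    """
    return {
        "Rules": [
            {
                "ID": "tier-transcoded-video",
                "Status": "Enabled",
                "Filter": {"Prefix": "transcoded/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
            },
            {
                "ID": "expire-raw-source",
                "Status": "Enabled",
                "Filter": {"Prefix": "uploads/"},
                "Expiration": {"Days": 7},  # delete raw source after 7 days
            },
        ]
    }

# Applying it (network call elided):
#   s3 = boto3.client('s3')
#   s3.put_bucket_lifecycle_configuration(
#       Bucket='videos', LifecycleConfiguration=lifecycle_rules())
```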
Quick Reference
Video Streaming Design Summary
| Component | Technology | Rationale |
|---|---|---|
| Upload | S3 Multipart Upload | Resumable, handles large files |
| Transcoding | FFmpeg on EC2/K8s | Industry standard, GPU-accelerated |
| Streaming | HLS + DASH | Adaptive bitrate, wide compatibility |
| CDN | CloudFront / Akamai | Global edge delivery, high cache hit |
| Metadata | MySQL + Elasticsearch | Structured data + full-text search |
| View Counts | Redis + Kafka | Real-time counts, async aggregation |
| Recommendations | TensorFlow + Feature Store | Two-stage: candidate gen + ranking |
Key Takeaways
Interview Tips
- Separate the upload path from the viewing path -- they have very different requirements
- Video transcoding is CPU-intensive and must be handled asynchronously with a job queue
- Adaptive bitrate streaming (HLS/DASH) is essential -- explain how it adapts to network conditions
- CDN is the hero of video delivery: 95%+ of requests should be served from edge
- Storage costs dominate: discuss tiering, lazy transcoding, and lifecycle policies
- Mention the recommendation engine even briefly -- it drives 70%+ of views on YouTube