Design a Video Streaming Platform

Hard 35 min read

Problem Statement & Requirements

Why Video Streaming Is the Ultimate Scale Problem

The Challenge: Video is the largest category of internet traffic, accounting for over 80% of all consumer bandwidth. Serving video at scale requires solving upload, processing, storage, and delivery simultaneously.

Real Scale: YouTube serves over 1 billion hours of video per day and receives more than 500 hours of new uploads every minute. Netflix alone has accounted for roughly 15% of global downstream internet traffic.

Real-World Analogy

Think of a video streaming platform like a global TV broadcasting network:

  • Upload = A filmmaker submitting their raw footage to the studio
  • Transcoding = The studio converting the film to different formats (4K, HD, SD)
  • CDN = Regional broadcast towers distributing the signal locally
  • Adaptive bitrate = The TV automatically switching to a lower quality when signal is weak
  • Recommendations = The TV guide suggesting shows based on what you have watched

Functional Requirements

Video Upload

Creators upload videos up to 10GB. Support resumable uploads for reliability on poor connections.

Video Playback

Viewers stream videos with adaptive quality. Support seeking, pause/resume, and playback speed control.

Search & Discovery

Full-text search on titles, descriptions, and tags. Personalized recommendations and trending content.

Social Features

Like/dislike, comments, subscriptions, and sharing. View count tracking in real time.

Video Upload Pipeline

Video Upload and Processing Pipeline
Creator -> Upload Service (chunked, resumable) -> Raw Storage (S3 / GCS) -> Job Queue (Kafka / SQS) -> Transcoding Farm (FFmpeg workers: multiple resolutions, HLS/DASH segments) -> Processed Storage (S3 + manifests) -> CDN (CloudFront / Akamai). A side path generates thumbnails and extracts metadata into the Metadata DB.
video_upload.py
import boto3
import hashlib

class VideoUploadService:
    """
    Handles chunked, resumable video uploads.

    Flow:
    1. Client requests upload URL (pre-signed S3 multipart)
    2. Client uploads in 5MB chunks
    3. On complete: trigger transcoding pipeline
    """

    def __init__(self):
        self.s3 = boto3.client('s3')
        self.BUCKET = 'raw-videos'
        self.CHUNK_SIZE = 5 * 1024 * 1024  # 5MB

    def initiate_upload(self, video_id: str,
                         filename: str) -> dict:
        """Start a multipart upload and return upload_id."""
        key = f"uploads/{video_id}/video.mp4"  # fixed key, matches get_chunk_url
        response = self.s3.create_multipart_upload(
            Bucket=self.BUCKET, Key=key,
            ContentType='video/mp4'
        )
        return {
            "upload_id": response["UploadId"],
            "video_id": video_id,
            "chunk_size": self.CHUNK_SIZE
        }

    def get_chunk_url(self, video_id: str,
                      upload_id: str,
                      part_number: int) -> str:
        """Generate a pre-signed URL for uploading one chunk."""
        return self.s3.generate_presigned_url(
            'upload_part',
            Params={
                'Bucket': self.BUCKET,
                'Key': f"uploads/{video_id}/video.mp4",
                'UploadId': upload_id,
                'PartNumber': part_number
            },
            ExpiresIn=3600  # 1 hour
        )

    def complete_upload(self, video_id: str, upload_id: str,
                        parts: list):
        """Finalize upload and trigger transcoding.

        `parts` is the list of {"ETag", "PartNumber"} dicts the
        client collected from each chunk upload response.
        """
        # Complete multipart upload in S3
        self.s3.complete_multipart_upload(
            Bucket=self.BUCKET,
            Key=f"uploads/{video_id}/video.mp4",
            UploadId=upload_id,
            MultipartUpload={"Parts": parts}
        )
        # Publish job to transcoding queue
        # (queue = injected Kafka/SQS client)
        queue.publish("video-transcode", {
            "video_id": video_id,
            "source": f"s3://raw-videos/uploads/{video_id}/"
        })

Video Transcoding & Encoding

Why Transcoding Is Essential

A single uploaded video must be converted into multiple formats and resolutions to serve viewers on different devices and network conditions. A 10-minute 4K video at 60fps might be 2GB raw but needs to be available as 360p (for slow mobile), 720p, 1080p, and 4K versions.

Resolution | Bitrate  | 10-min File Size | Target Device
360p       | 800 Kbps | ~60 MB           | Mobile on 3G
480p       | 1.5 Mbps | ~110 MB          | Mobile on LTE
720p       | 3 Mbps   | ~225 MB          | Tablet / laptop
1080p      | 6 Mbps   | ~450 MB          | Desktop / Smart TV
4K         | 15 Mbps  | ~1.1 GB          | 4K TV / high-bandwidth
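The file sizes in the table follow directly from bitrate multiplied by duration. A quick sanity check:

```python
def estimate_file_size_mb(bitrate_kbps: float, duration_s: int) -> float:
    """Approximate file size: bitrate (kilobits/s) x duration, in MB."""
    total_kilobits = bitrate_kbps * duration_s
    return total_kilobits / 8 / 1000  # kilobits -> kilobytes -> megabytes

print(estimate_file_size_mb(800, 600))   # 60.0 (360p, 10 minutes)
print(estimate_file_size_mb(6000, 600))  # 450.0 (1080p, 10 minutes)
```

Real encodes deviate a little because audio tracks and container overhead add a few percent, but the approximation is close enough for capacity planning.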
transcode_worker.py
import os
import subprocess

class TranscodeWorker:
    """
    Video transcoding worker using FFmpeg.
    Produces multiple resolutions + HLS segments.
    """

    PROFILES = [
        {"name": "360p",  "width": 640,  "height": 360,  "bitrate": "800k"},
        {"name": "480p",  "width": 854,  "height": 480,  "bitrate": "1500k"},
        {"name": "720p",  "width": 1280, "height": 720,  "bitrate": "3000k"},
        {"name": "1080p", "width": 1920, "height": 1080, "bitrate": "6000k"},
    ]

    def transcode(self, source_path: str,
                  output_dir: str, video_id: str):
        """Transcode video into multiple HLS streams."""
        for profile in self.PROFILES:
            output = f"{output_dir}/{profile['name']}"
            os.makedirs(output, exist_ok=True)
            cmd = [
                "ffmpeg", "-i", source_path,
                "-vf", f"scale={profile['width']}:{profile['height']}",
                "-b:v", profile["bitrate"],
                "-codec:v", "libx264",  # explicit x264 encoder
                "-codec:a", "aac",
                "-hls_time", "6",       # 6-second segments
                "-hls_list_size", "0",  # Keep all segments in the playlist
                "-f", "hls",
                f"{output}/playlist.m3u8"
            ]
            subprocess.run(cmd, check=True)

        # Generate master playlist
        self._create_master_playlist(output_dir, video_id)

Adaptive Bitrate Streaming (HLS/DASH)

Adaptive Bitrate Streaming Flow
The video player monitors bandwidth and picks among the available renditions (1080p / 6 Mbps, 720p / 3 Mbps, 480p / 1.5 Mbps, 360p / 800 Kbps), fetching segments from the nearest CDN edge PoP. The ABR algorithm:

  1. Measure the download speed of recent segments
  2. Check the buffer level
  3. Select the highest quality that won't cause a rebuffer
  4. Request the next segment at that quality
  5. Repeat for each segment

Goal: maximize quality with zero rebuffering.
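The quality-selection step can be sketched as a simple throughput-plus-buffer rule. This is an illustrative simplification; real players such as hls.js or dash.js use more sophisticated hybrid heuristics, and the safety factor and buffer threshold here are assumed values:

```python
def select_quality(measured_bandwidth_kbps: float,
                   buffer_seconds: float,
                   safety_factor: float = 0.8) -> str:
    """Pick the highest rendition that fits within a safety margin of
    measured bandwidth; drop to the lowest rung when the buffer is low."""
    # Renditions from the master playlist, highest bitrate first
    ladder = [("1080p", 6000), ("720p", 3000), ("480p", 1500), ("360p", 800)]

    if buffer_seconds < 5:          # buffer nearly empty: avoid a rebuffer
        return ladder[-1][0]

    budget = measured_bandwidth_kbps * safety_factor
    for name, bitrate_kbps in ladder:
        if bitrate_kbps <= budget:
            return name
    return ladder[-1][0]

print(select_quality(8000, 20))  # 1080p: plenty of bandwidth and buffer
print(select_quality(2500, 20))  # 480p: 2500 * 0.8 = 2000 < 3000
print(select_quality(8000, 2))   # 360p: buffer critical, play it safe
```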
master_playlist.m3u8
#EXTM3U
#EXT-X-VERSION:3

# 360p variant
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/playlist.m3u8

# 480p variant
#EXT-X-STREAM-INF:BANDWIDTH=1500000,RESOLUTION=854x480
480p/playlist.m3u8

# 720p variant
#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720
720p/playlist.m3u8

# 1080p variant
#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
1080p/playlist.m3u8

# The player downloads this master playlist first,
# then selects the appropriate quality level and
# fetches 6-second .ts segments from that stream.

HLS vs DASH

  • HLS (HTTP Live Streaming): Developed by Apple. Uses .m3u8 playlists and .ts segments. Native iOS/Safari support. Industry standard.
  • DASH (Dynamic Adaptive Streaming over HTTP): Open standard (ISO). Uses .mpd manifests and .m4s segments. Better codec flexibility.
  • In practice: Most platforms support both. YouTube uses DASH. Netflix uses both. HLS has wider device support.

CDN Delivery Architecture

Global CDN Delivery Architecture
All video segments live at the origin (S3). Edge PoPs serve viewers locally and fetch from the origin only on a cache miss: US-East (95% cache hit rate), EU-West (92%), AP-South (88%), SA (85%).

CDN Caching Strategy for Video

Not all video segments are created equal. The first few segments of a video are accessed far more often than later segments (many viewers drop off early). A tiered caching strategy optimizes this:

  • First 30 seconds: Cached at all edge PoPs (highest priority)
  • Rest of popular videos: Cached at regional PoPs
  • Long tail videos: Served from origin, cached on demand
  • Live content: Pushed to edge proactively before viewers request it
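The tiering rules above can be expressed as a small decision function. The tier names and the 1,000-views-per-day popularity threshold are illustrative assumptions, not a real CDN configuration:

```python
def cache_tier(segment_index: int, video_views_per_day: int,
               is_live: bool, segment_duration_s: int = 6) -> str:
    """Decide where a video segment should be cached."""
    if is_live:
        return "edge-push"              # pre-positioned before requests arrive
    position_s = segment_index * segment_duration_s
    if position_s < 30:
        return "all-edge-pops"          # first 30 seconds: highest priority
    if video_views_per_day >= 1000:
        return "regional-pops"          # rest of popular videos
    return "origin-on-demand"           # long tail: cached only when fetched

print(cache_tier(2, 50, False))      # all-edge-pops (segment starts at 12s)
print(cache_tier(100, 5000, False))  # regional-pops (popular, later segment)
print(cache_tier(100, 3, False))     # origin-on-demand (long tail)
```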

Metadata Storage Design

  • Video metadata (MySQL): title, description, upload_date, duration, view_count, channel_id
  • Search index (Elasticsearch): Full-text on title + description + tags. Autocomplete via edge-ngram tokenizer.
  • View counts (Redis): Atomic increments for real-time counts. Periodically flushed to MySQL.
  • Comments (Cassandra): Partitioned by video_id, sorted by timestamp. High write throughput.
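The Redis view-count pattern deserves a closer look because it decouples hot writes from the durable store. The sketch below uses an in-memory dict so it runs standalone; in production `record_view` would be a Redis `INCR`, and the periodic flush would atomically read-and-reset each key (e.g. `GETDEL`, Redis 6.2+) before adding the delta to the MySQL counter:

```python
from collections import defaultdict

class ViewCounter:
    """In-memory stand-in for the Redis + MySQL counting pattern."""

    def __init__(self):
        self._counts = defaultdict(int)   # stands in for Redis keys

    def record_view(self, video_id: str) -> int:
        # Production: redis.incr(f"views:{video_id}") -- atomic, O(1)
        self._counts[f"views:{video_id}"] += 1
        return self._counts[f"views:{video_id}"]

    def flush(self) -> dict:
        """Return accumulated deltas and reset, as the periodic job would
        before issuing UPDATE videos SET view_count = view_count + delta."""
        deltas = dict(self._counts)
        self._counts.clear()
        return deltas

counter = ViewCounter()
for _ in range(3):
    counter.record_view("v42")
print(counter.flush())  # {'views:v42': 3}
```

The key property is that reads of the "live" count hit Redis only, while MySQL stays the source of truth for durable totals.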

Recommendation Engine Overview

How YouTube Recommendations Work

YouTube's recommendation system uses a two-stage architecture:

  • Candidate Generation: From millions of videos, narrow to ~1,000 candidates based on user history, subscriptions, and collaborative filtering.
  • Ranking: Score the 1,000 candidates using a deep neural network that considers watch time prediction, freshness, diversity, and creator engagement signals.
  • Re-ranking: Apply business rules (content policies, freshness boost, diversity) to produce the final recommendation list.
recommendation_overview.py
class RecommendationPipeline:
    """
    Simplified recommendation pipeline.
    In production, each stage is a separate microservice.
    """

    def get_recommendations(self, user_id: str,
                             count: int = 20) -> list:
        # Stage 1: Candidate generation (fast, broad)
        candidates = []
        candidates += self._from_watch_history(user_id, 200)
        candidates += self._from_subscriptions(user_id, 200)
        candidates += self._collaborative_filter(user_id, 200)
        candidates += self._trending_videos(100)

        # Deduplicate and filter already-watched
        candidates = self._deduplicate(candidates)
        candidates = self._filter_watched(user_id, candidates)

        # Stage 2: Ranking (slower, precise)
        scored = []
        for video in candidates:
            score = self._predict_watch_time(user_id, video)
            scored.append((score, video))
        # Sort on score only; video objects are not comparable
        scored.sort(key=lambda sv: sv[0], reverse=True)

        # Stage 3: Re-ranking (business rules)
        top_videos = [video for _, video in scored[:count * 2]]
        final = self._apply_diversity(top_videos)
        final = self._apply_content_policy(final)
        return final[:count]

    def _predict_watch_time(self, user_id, video) -> float:
        """ML model predicts expected watch time (in seconds)."""
        features = {
            "user_history": get_user_embedding(user_id),
            "video_features": get_video_embedding(video.id),
            "freshness": video.age_hours,
            "channel_affinity": get_affinity(user_id, video.channel_id),
        }
        return self.model.predict(features)

Practice Problems

Medium Live Streaming

Extend the video platform to support live streaming. A creator broadcasts in real-time to thousands or millions of concurrent viewers.

  1. How does the video pipeline change for live content?
  2. What is the acceptable latency for live streams?
  3. How do you scale to millions of concurrent viewers?

Live streaming replaces the upload pipeline with an ingest pipeline (RTMP). Transcoding happens in real-time. CDN is critical -- popular streams should be pushed to edge before viewers request them.

# Live Streaming Architecture

# Ingest: Creator -> RTMP -> Ingest Server
# Transcode: Real-time FFmpeg (GPU-accelerated)
# Segment: 2-second chunks (vs 6s for VOD) for lower latency
# CDN: Push segments to edge proactively

# Latency tiers:
# - Standard: 10-30s (HLS, wide compatibility)
# - Low latency: 2-5s (LL-HLS, CMAF)
# - Ultra-low: < 1s (WebRTC, limited scale)

# Scale: 1M viewers on a single stream
# - Origin serves to ~50 CDN edge PoPs
# - Each PoP serves ~20K viewers
# - Total origin bandwidth: 50 * 6Mbps = 300 Mbps
# - CDN edge bandwidth: 1M * 3Mbps = 3 Tbps (handled by CDN)

Hard Video Content Moderation

Design an automated content moderation system that scans uploaded videos for policy violations (violence, nudity, copyright) before making them publicly available.

  1. At what point in the pipeline do you run moderation?
  2. How do you handle the compute cost of analyzing video frames?
  3. How do you balance speed vs accuracy?

Run moderation after transcoding but before publishing. Sample frames (1 per second) instead of analyzing every frame. Use a multi-stage pipeline: fast classifiers for obvious cases, deeper analysis for borderline content.

# Video Moderation Pipeline
# Runs after transcoding, before CDN publish

# Stage 1: Frame sampling (1 frame/sec)
# Stage 2: Fast classifier on each frame
#   - Nudity detection (CV model)
#   - Violence detection (CV model)
#   - Text overlay extraction (OCR)
# Stage 3: Audio analysis
#   - Speech-to-text for hate speech detection
#   - Audio fingerprint for copyright (like Shazam)
# Stage 4: Decision
#   - All pass -> publish immediately
#   - Any flag with confidence > 0.9 -> reject
#   - Borderline -> queue for human review

# SLA: 95% of videos processed within 30 minutes
# Human review SLA: 4 hours for borderline content
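Stage 4 of the pipeline above reduces to a threshold decision. The 0.9 reject threshold comes from the sketch; the 0.5 human-review cutoff and the treatment of audio flags as hard rejects are assumptions for illustration:

```python
def moderation_decision(frame_scores: dict, audio_flags: list,
                        reject_threshold: float = 0.9,
                        review_threshold: float = 0.5) -> str:
    """Combine per-category frame classifier confidences and audio flags.

    frame_scores: max classifier confidence per category across all
    sampled frames, e.g. {"nudity": 0.05, "violence": 0.1}.
    audio_flags: non-empty if copyright/hate-speech checks matched.
    """
    worst = max(list(frame_scores.values()) + [0.0])
    if audio_flags or worst > reject_threshold:
        return "reject"
    if worst > review_threshold:
        return "human_review"           # borderline: queue for a person
    return "publish"

print(moderation_decision({"nudity": 0.05, "violence": 0.1}, []))  # publish
print(moderation_decision({"nudity": 0.95}, []))                   # reject
print(moderation_decision({"nudity": 0.7}, []))                    # human_review
```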

Hard Cost Optimization

Video storage and CDN bandwidth are the largest cost drivers. Design a strategy to reduce costs by 40% without impacting user experience.

  1. What percentage of videos are never watched after upload?
  2. How can you reduce transcoding costs?
  3. How do you optimize CDN costs for long-tail content?

The Pareto principle applies heavily: ~10% of videos generate ~90% of views. Lazy transcoding (only transcode when first requested) can save enormous compute. Storage tiering (S3 -> Glacier) for old, unpopular content.

# Cost Optimization Strategy

# 1. Lazy transcoding
# - On upload: only create 480p + 720p
# - 360p and 1080p: transcode on first request
# - 4K: only for videos with > 1000 views
# Savings: ~50% transcoding compute

# 2. Storage tiering
# - Hot (< 30 days, > 10 views): S3 Standard
# - Warm (30-90 days): S3 IA (40% cheaper)
# - Cold (> 90 days, < 1 view/month): S3 Glacier
# - Delete raw source after 7 days
# Savings: ~35% storage costs

# 3. CDN optimization
# - Only cache popular videos at edge
# - Long-tail: serve from regional origin
# - Negotiate committed-use discounts with CDN
# Savings: ~25% CDN costs
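The storage-tiering rules can be encoded as a lifecycle decision function. The thresholds mirror the sketch above and are illustrative, not a production policy; in practice S3 lifecycle rules would do this transition automatically by object age and access tier:

```python
from datetime import date, timedelta

def storage_tier(upload_date: date, views_last_30_days: int,
                 views_per_month: float, today: date) -> str:
    """Map a video's age and popularity to a storage class."""
    age_days = (today - upload_date).days
    if age_days < 30 and views_last_30_days > 10:
        return "s3-standard"              # hot
    if age_days <= 90:
        return "s3-infrequent-access"     # warm
    if views_per_month < 1:
        return "s3-glacier"               # cold
    return "s3-infrequent-access"         # old but still watched

today = date(2024, 6, 1)
print(storage_tier(today - timedelta(days=10), 500, 50, today))   # s3-standard
print(storage_tier(today - timedelta(days=60), 2, 1, today))      # s3-infrequent-access
print(storage_tier(today - timedelta(days=200), 0, 0.1, today))   # s3-glacier
```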

Quick Reference

Video Streaming Design Summary

Component       | Technology                 | Rationale
Upload          | S3 Multipart Upload        | Resumable, handles large files
Transcoding     | FFmpeg on EC2/K8s          | Industry standard, GPU-accelerated
Streaming       | HLS + DASH                 | Adaptive bitrate, wide compatibility
CDN             | CloudFront / Akamai        | Global edge delivery, high cache hit
Metadata        | MySQL + Elasticsearch      | Structured data + full-text search
View Counts     | Redis + Kafka              | Real-time counts, async aggregation
Recommendations | TensorFlow + Feature Store | Two-stage: candidate gen + ranking

Key Takeaways

Interview Tips

  • Separate the upload path from the viewing path -- they have very different requirements
  • Video transcoding is CPU-intensive and must be handled asynchronously with a job queue
  • Adaptive bitrate streaming (HLS/DASH) is essential -- explain how it adapts to network conditions
  • CDN is the hero of video delivery: 95%+ of requests should be served from edge
  • Storage costs dominate: discuss tiering, lazy transcoding, and lifecycle policies
  • Mention the recommendation engine even briefly -- it drives 70%+ of views on YouTube