Problem Statement & Requirements
Why Video Streaming Is the Ultimate Scale Problem
The Challenge: Video is the largest category of internet traffic, accounting for over 80% of consumer internet traffic by some estimates. Serving video at scale requires solving upload, processing, storage, and delivery simultaneously.
Real Scale: Viewers watch over 1 billion hours of YouTube video per day, and YouTube receives more than 500 hours of new uploads every minute. Netflix alone has accounted for roughly 15% of global downstream internet traffic.
Real-World Analogy
Think of a video streaming platform like a global TV broadcasting network:
- Upload = A filmmaker submitting their raw footage to the studio
- Transcoding = The studio converting the film to different formats (4K, HD, SD)
- CDN = Regional broadcast towers distributing the signal locally
- Adaptive bitrate = The TV automatically switching to a lower quality when signal is weak
- Recommendations = The TV guide suggesting shows based on what you have watched
Functional Requirements
Video Upload
Creators upload videos up to 10GB. Support resumable uploads for reliability on poor connections.
Video Playback
Viewers stream videos with adaptive quality. Support seeking, pause/resume, and playback speed control.
Search & Discovery
Full-text search on titles, descriptions, and tags. Personalized recommendations and trending content.
Social Features
Like/dislike, comments, subscriptions, and share. View count tracking in real time.
Video Upload Pipeline
```python
import boto3

class VideoUploadService:
    """
    Handles chunked, resumable video uploads.

    Flow:
      1. Client requests upload URL (pre-signed S3 multipart)
      2. Client uploads in 5MB chunks
      3. On complete: trigger transcoding pipeline
    """

    def __init__(self):
        self.s3 = boto3.client('s3')
        self.BUCKET = 'raw-videos'
        self.CHUNK_SIZE = 5 * 1024 * 1024  # 5MB (S3's minimum part size)

    def _key(self, video_id: str) -> str:
        # One canonical key per video, so initiate / chunk / complete
        # all refer to the same S3 object
        return f"uploads/{video_id}/video.mp4"

    def initiate_upload(self, video_id: str, filename: str) -> dict:
        """Start a multipart upload and return upload_id."""
        response = self.s3.create_multipart_upload(
            Bucket=self.BUCKET,
            Key=self._key(video_id),
            ContentType='video/mp4',
            Metadata={'original-filename': filename},
        )
        return {
            "upload_id": response["UploadId"],
            "video_id": video_id,
            "chunk_size": self.CHUNK_SIZE,
        }

    def get_chunk_url(self, video_id: str, upload_id: str,
                      part_number: int) -> str:
        """Generate a pre-signed URL for uploading one chunk."""
        return self.s3.generate_presigned_url(
            'upload_part',
            Params={
                'Bucket': self.BUCKET,
                'Key': self._key(video_id),
                'UploadId': upload_id,
                'PartNumber': part_number,
            },
            ExpiresIn=3600,  # 1 hour
        )

    def complete_upload(self, video_id: str, upload_id: str,
                        parts: list):
        """Finalize upload and trigger transcoding.

        `parts` is [{"PartNumber": n, "ETag": "..."}], collected by
        the client from each chunk-upload response.
        """
        # Complete multipart upload in S3
        self.s3.complete_multipart_upload(
            Bucket=self.BUCKET,
            Key=self._key(video_id),
            UploadId=upload_id,
            MultipartUpload={'Parts': parts},
        )
        # Publish job to transcoding queue
        # (`queue` is an application-level message-queue client, not shown)
        queue.publish("video-transcode", {
            "video_id": video_id,
            "source": f"s3://{self.BUCKET}/uploads/{video_id}/",
        })
```
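To make the resumable flow concrete, here is a sketch of the client side, under the assumption that the client can read its file in byte ranges. The `plan_parts` helper is hypothetical (not part of any library); the actual HTTP PUTs to the pre-signed URLs are shown only as comments.

```python
def plan_parts(file_size: int, chunk_size: int) -> list:
    """Split a file into (part_number, start, end) byte ranges.

    S3 part numbers are 1-based; every part except the last must be
    at least 5 MB, which the service's 5 MB chunk size satisfies.
    """
    parts = []
    start = 0
    part_number = 1
    while start < file_size:
        end = min(start + chunk_size, file_size)
        parts.append((part_number, start, end))
        start = end
        part_number += 1
    return parts

# Client flow (network calls elided):
#   session = service.initiate_upload(video_id, "movie.mp4")
#   for n, start, end in plan_parts(size, session["chunk_size"]):
#       url = service.get_chunk_url(video_id, session["upload_id"], n)
#       # PUT bytes [start:end) to url; record the ETag from each response
#   # Finally, ask the service to complete the multipart upload,
#   # passing the collected (PartNumber, ETag) pairs.
```

Resumability falls out naturally: if the connection drops, the client re-requests pre-signed URLs only for the parts whose ETags it has not yet recorded.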
Video Transcoding & Encoding
Why Transcoding Is Essential
A single uploaded video must be converted into multiple formats and resolutions to serve viewers on different devices and network conditions. A 10-minute 4K video at 60fps might arrive as a 2GB file, yet it must also be available as 360p (for slow mobile), 480p, 720p, 1080p, and 4K renditions.
| Resolution | Bitrate | 10-min File Size | Target Device |
|---|---|---|---|
| 360p | 800 Kbps | ~60 MB | Mobile on 3G |
| 480p | 1.5 Mbps | ~110 MB | Mobile on LTE |
| 720p | 3 Mbps | ~225 MB | Tablet / laptop |
| 1080p | 6 Mbps | ~450 MB | Desktop / Smart TV |
| 4K | 15 Mbps | ~1.1 GB | 4K TV / high-bandwidth |
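The file sizes in the table follow directly from bitrate times duration; a quick sketch of the arithmetic (using decimal megabytes and ignoring audio and container overhead):

```python
def stream_size_mb(bitrate_kbps: int, duration_seconds: int) -> float:
    """Approximate file size: bitrate (bits/s) * duration, over 8 bits/byte."""
    total_bits = bitrate_kbps * 1000 * duration_seconds
    return total_bits / 8 / 1_000_000  # decimal MB

# 10 minutes (600 s) at 800 Kbps -> 60.0 MB, matching the 360p row;
# at 6000 Kbps -> 450.0 MB, matching the 1080p row.
```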
```python
import os
import subprocess

class TranscodeWorker:
    """
    Video transcoding worker using FFmpeg.
    Produces multiple resolutions + HLS segments.
    """

    PROFILES = [
        {"name": "360p",  "width": 640,  "height": 360,  "bitrate": "800k"},
        {"name": "480p",  "width": 854,  "height": 480,  "bitrate": "1500k"},
        {"name": "720p",  "width": 1280, "height": 720,  "bitrate": "3000k"},
        {"name": "1080p", "width": 1920, "height": 1080, "bitrate": "6000k"},
    ]

    def transcode(self, source_path: str,
                  output_dir: str, video_id: str):
        """Transcode video into multiple HLS streams."""
        for profile in self.PROFILES:
            output = f"{output_dir}/{profile['name']}"
            os.makedirs(output, exist_ok=True)  # FFmpeg won't create dirs
            cmd = [
                "ffmpeg", "-i", source_path,
                "-vf", f"scale={profile['width']}:{profile['height']}",
                "-b:v", profile["bitrate"],
                "-codec:v", "h264",
                "-codec:a", "aac",
                "-hls_time", "6",       # 6-second segments
                "-hls_list_size", "0",  # keep all segments in the playlist
                "-f", "hls",
                f"{output}/playlist.m3u8",
            ]
            subprocess.run(cmd, check=True)
        # Generate master playlist referencing every variant
        self._create_master_playlist(output_dir, video_id)
```
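The `_create_master_playlist` step above is not spelled out; a minimal sketch of what it might generate, assuming BANDWIDTH is derived from the video bitrate alone (real implementations add audio and container overhead):

```python
def create_master_playlist(profiles: list) -> str:
    """Build an HLS master playlist string from the transcode profiles.

    Each entry advertises one variant stream; the player reads this
    file first and picks a variant to fetch.
    """
    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    for p in profiles:
        bandwidth = int(p["bitrate"].rstrip("k")) * 1000  # "800k" -> 800000
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},"
            f"RESOLUTION={p['width']}x{p['height']}"
        )
        lines.append(f"{p['name']}/playlist.m3u8")
    return "\n".join(lines) + "\n"
```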
Adaptive Bitrate Streaming (HLS/DASH)
```
#EXTM3U
#EXT-X-VERSION:3

# 360p variant
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/playlist.m3u8

# 480p variant
#EXT-X-STREAM-INF:BANDWIDTH=1500000,RESOLUTION=854x480
480p/playlist.m3u8

# 720p variant
#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720
720p/playlist.m3u8

# 1080p variant
#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
1080p/playlist.m3u8

# The player downloads this master playlist first,
# then selects the appropriate quality level and
# fetches 6-second .ts segments from that stream.
```
HLS vs DASH
- HLS (HTTP Live Streaming): Developed by Apple. Uses .m3u8 playlists and .ts segments. Native iOS/Safari support. Industry standard.
- DASH (Dynamic Adaptive Streaming over HTTP): Open standard (ISO). Uses .mpd manifests and .m4s segments. Better codec flexibility.
- In practice: Most platforms support both. YouTube uses DASH. Netflix uses both. HLS has wider device support.
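The variant-selection logic the player runs can be sketched simply. Real players combine throughput estimates with buffer occupancy; this assumes throughput alone, with a hypothetical 0.8 safety factor:

```python
def select_variant(variants: list, measured_bps: float,
                   safety: float = 0.8) -> dict:
    """Pick the highest-bandwidth variant that fits measured throughput.

    `variants` mirrors the master playlist entries, e.g.
    [{"name": "360p", "bandwidth": 800_000}, ...]. The safety factor
    leaves headroom for throughput fluctuations; if nothing fits,
    fall back to the lowest-bandwidth variant.
    """
    affordable = [v for v in variants
                  if v["bandwidth"] <= measured_bps * safety]
    if not affordable:
        return min(variants, key=lambda v: v["bandwidth"])
    return max(affordable, key=lambda v: v["bandwidth"])
```

The player re-evaluates this after every segment download, which is what makes the stream "adaptive": a drop in measured throughput shifts the next segment request to a lower-bitrate variant.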
CDN Delivery Architecture
CDN Caching Strategy for Video
Not all video segments are created equal. The first few segments of a video are accessed far more often than later segments (many viewers drop off early). A tiered caching strategy optimizes this:
- First 30 seconds: Cached at all edge PoPs (highest priority)
- Rest of popular videos: Cached at regional PoPs
- Long tail videos: Served from origin, cached on demand
- Live content: Pushed to edge proactively before viewers request it
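The tiering rules above can be expressed as a small decision function. This is a sketch; the thresholds (30-second head, 1,000 views/day for "popular") are illustrative, not production-tuned:

```python
def cache_tier(segment_index: int, daily_views: int, is_live: bool,
               segment_seconds: int = 6) -> str:
    """Decide where a video segment should be cached.

    Returns "edge" (all edge PoPs), "regional" (regional PoPs),
    or "origin" (cached on demand).
    """
    if is_live:
        return "edge"      # pushed proactively before viewers request it
    if segment_index * segment_seconds < 30:
        return "edge"      # first ~30 seconds, where most playback starts
    if daily_views >= 1000:
        return "regional"  # popular video, later segments
    return "origin"        # long tail, cached only on demand
```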
Video Metadata & Search
Metadata Storage Design
- Video metadata (MySQL): title, description, upload_date, duration, view_count, channel_id
- Search index (Elasticsearch): Full-text on title + description + tags. Autocomplete via edge-ngram tokenizer.
- View counts (Redis): Atomic increments for real-time counts. Periodically flushed to MySQL.
- Comments (Cassandra): Partitioned by video_id, sorted by timestamp. High write throughput.
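The Redis-plus-MySQL view-count pattern can be sketched as a counter with a periodic flush. Here a dict stands in for Redis (`incr` maps to Redis `INCR`, and `flush` to reading and resetting the keys) and `persist` stands in for a batched MySQL update; both substitutions are assumptions for the sake of a runnable sketch:

```python
from collections import defaultdict

class ViewCounter:
    """Real-time view counting with periodic flush to durable storage."""

    def __init__(self, persist):
        self.counts = defaultdict(int)  # stand-in for Redis counters
        self.persist = persist          # e.g. batched MySQL UPDATE

    def record_view(self, video_id: str) -> int:
        # Redis equivalent: INCR view:{video_id} (atomic)
        self.counts[video_id] += 1
        return self.counts[video_id]

    def flush(self) -> dict:
        """Periodic job: push deltas to MySQL, reset the counters."""
        deltas, self.counts = dict(self.counts), defaultdict(int)
        for video_id, delta in deltas.items():
            self.persist(video_id, delta)
        return deltas
```

Reads for display hit the fast counter; the flush keeps MySQL eventually consistent without a write per view, which is what makes billions of daily view events affordable.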
Recommendation Engine Overview
How YouTube Recommendations Work
YouTube's recommendation system uses a two-stage architecture:
- Candidate Generation: From millions of videos, narrow to ~1,000 candidates based on user history, subscriptions, and collaborative filtering.
- Ranking: Score the 1,000 candidates using a deep neural network that considers watch time prediction, freshness, diversity, and creator engagement signals.
- Re-ranking: Apply business rules (content policies, freshness boost, diversity) to produce the final recommendation list.
```python
class RecommendationPipeline:
    """
    Simplified recommendation pipeline.
    In production, each stage is a separate microservice.
    """

    def get_recommendations(self, user_id: str, count: int = 20) -> list:
        # Stage 1: Candidate generation (fast, broad)
        candidates = []
        candidates += self._from_watch_history(user_id, 200)
        candidates += self._from_subscriptions(user_id, 200)
        candidates += self._collaborative_filter(user_id, 200)
        candidates += self._trending_videos(100)

        # Deduplicate and filter already-watched
        candidates = self._deduplicate(candidates)
        candidates = self._filter_watched(user_id, candidates)

        # Stage 2: Ranking (slower, precise)
        scored = [(self._predict_watch_time(user_id, video), video)
                  for video in candidates]
        # Sort by score only -- video objects aren't comparable
        scored.sort(key=lambda pair: pair[0], reverse=True)

        # Stage 3: Re-ranking (business rules)
        final = self._apply_diversity(scored[:count * 2])
        final = self._apply_content_policy(final)
        return [video for _score, video in final[:count]]

    def _predict_watch_time(self, user_id, video) -> float:
        """ML model predicts expected watch time (in seconds)."""
        features = {
            "user_history": get_user_embedding(user_id),
            "video_features": get_video_embedding(video.id),
            "freshness": video.age_hours,
            "channel_affinity": get_affinity(user_id, video.channel_id),
        }
        return self.model.predict(features)
```
Practice Problems
Live Streaming (Medium)
Extend the video platform to support live streaming. A creator broadcasts in real-time to thousands or millions of concurrent viewers.
- How does the video pipeline change for live content?
- What is the acceptable latency for live streams?
- How do you scale to millions of concurrent viewers?
Live streaming replaces the upload pipeline with an ingest pipeline (RTMP). Transcoding happens in real-time. CDN is critical -- popular streams should be pushed to edge before viewers request them.
```
# Live Streaming Architecture
# Ingest: Creator -> RTMP -> Ingest Server
# Transcode: Real-time FFmpeg (GPU-accelerated)
# Segment: 2-second chunks (vs 6s for VOD) for lower latency
# CDN: Push segments to edge proactively

# Latency tiers:
# - Standard: 10-30s (HLS, wide compatibility)
# - Low latency: 2-5s (LL-HLS, CMAF)
# - Ultra-low: < 1s (WebRTC, limited scale)

# Scale: 1M viewers on a single stream
# - Origin serves to ~50 CDN edge PoPs
# - Each PoP serves ~20K viewers
# - Total origin bandwidth: 50 * 6Mbps = 300 Mbps
# - CDN edge bandwidth: 1M * 3Mbps = 3 Tbps (handled by CDN)
```
Video Content Moderation (Hard)
Design an automated content moderation system that scans uploaded videos for policy violations (violence, nudity, copyright) before making them publicly available.
- At what point in the pipeline do you run moderation?
- How do you handle the compute cost of analyzing video frames?
- How do you balance speed vs accuracy?
Run moderation after transcoding but before publishing. Sample frames (1 per second) instead of analyzing every frame. Use a multi-stage pipeline: fast classifiers for obvious cases, deeper analysis for borderline content.
```
# Video Moderation Pipeline
# Runs after transcoding, before CDN publish

# Stage 1: Frame sampling (1 frame/sec)
# Stage 2: Fast classifier on each frame
#   - Nudity detection (CV model)
#   - Violence detection (CV model)
#   - Text overlay extraction (OCR)
# Stage 3: Audio analysis
#   - Speech-to-text for hate speech detection
#   - Audio fingerprint for copyright (like Shazam)
# Stage 4: Decision
#   - All pass -> publish immediately
#   - Any flag with confidence > 0.9 -> reject
#   - Borderline -> queue for human review

# SLA: 95% of videos processed within 30 minutes
# Human review SLA: 4 hours for borderline content
```
Cost Optimization (Hard)
Video storage and CDN bandwidth are the largest cost drivers. Design a strategy to reduce costs by 40% without impacting user experience.
- What percentage of videos are never watched after upload?
- How can you reduce transcoding costs?
- How do you optimize CDN costs for long-tail content?
The Pareto principle applies heavily: ~10% of videos generate ~90% of views. Lazy transcoding (only transcode when first requested) can save enormous compute. Storage tiering (S3 -> Glacier) for old, unpopular content.
```
# Cost Optimization Strategy

# 1. Lazy transcoding
#    - On upload: only create 480p + 720p
#    - 360p and 1080p: transcode on first request
#    - 4K: only for videos with > 1000 views
#    Savings: ~50% transcoding compute

# 2. Storage tiering
#    - Hot (< 30 days, > 10 views): S3 Standard
#    - Warm (30-90 days): S3 IA (40% cheaper)
#    - Cold (> 90 days, < 1 view/month): S3 Glacier
#    - Delete raw source after 7 days
#    Savings: ~35% storage costs

# 3. CDN optimization
#    - Only cache popular videos at edge
#    - Long-tail: serve from regional origin
#    - Negotiate committed-use discounts with CDN
#    Savings: ~25% CDN costs
```
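The age-based part of the storage tiering maps directly onto S3 lifecycle rules. A sketch of the configuration as a plain dict (the prefixes `transcoded/` and `uploads/` are assumptions about the bucket layout); it would be applied with boto3's `put_bucket_lifecycle_configuration`:

```python
def lifecycle_rules() -> dict:
    """S3 lifecycle rules for the age-based tiering above.

    Note: lifecycle rules only see object age, not view counts, so
    the "> 10 views" / "< 1 view/month" conditions would need an
    application-level job that re-tags or moves objects.
    """
    return {
        "Rules": [
            {
                "ID": "tier-transcoded-video",
                "Status": "Enabled",
                "Filter": {"Prefix": "transcoded/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
            },
            {
                "ID": "expire-raw-source",
                "Status": "Enabled",
                "Filter": {"Prefix": "uploads/"},
                "Expiration": {"Days": 7},  # delete raw source after 7 days
            },
        ]
    }

# Applying it (network call elided):
#   s3 = boto3.client('s3')
#   s3.put_bucket_lifecycle_configuration(
#       Bucket='videos', LifecycleConfiguration=lifecycle_rules())
```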
Quick Reference
Video Streaming Design Summary
| Component | Technology | Rationale |
|---|---|---|
| Upload | S3 Multipart Upload | Resumable, handles large files |
| Transcoding | FFmpeg on EC2/K8s | Industry standard, GPU-accelerated |
| Streaming | HLS + DASH | Adaptive bitrate, wide compatibility |
| CDN | CloudFront / Akamai | Global edge delivery, high cache hit |
| Metadata | MySQL + Elasticsearch | Structured data + full-text search |
| View Counts | Redis + Kafka | Real-time counts, async aggregation |
| Recommendations | TensorFlow + Feature Store | Two-stage: candidate gen + ranking |
Key Takeaways
Interview Tips
- Separate the upload path from the viewing path -- they have very different requirements
- Video transcoding is CPU-intensive and must be handled asynchronously with a job queue
- Adaptive bitrate streaming (HLS/DASH) is essential -- explain how it adapts to network conditions
- CDN is the hero of video delivery: 95%+ of requests should be served from edge
- Storage costs dominate: discuss tiering, lazy transcoding, and lifecycle policies
- Mention the recommendation engine even briefly -- it drives 70%+ of views on YouTube