Design Video Streaming (YouTube-lite)
Two systems in a trench coat — a write-heavy pipeline and a read-heavy edge, joined by a CDN.
You're asked: "Design YouTube." Don't panic — nobody expects YouTube. The interviewer wants to see whether you can spot the question's single biggest structural insight and build around it.
That insight: this is two almost-unrelated systems. The upload path (creators pushing massive files through a processing pipeline — write-heavy, slow, asynchronous, throughput-obsessed) and the watch path (viewers pulling video — read-heavy, latency-obsessed, ~100–1000× the traffic). They share little besides a metadata database and an object store. Say that split in your first two minutes and design each side on its own terms.
The second insight, delivered as a punchline at the right moment: at video scale, your servers barely touch video bytes. The CDN serves the video; your backend serves metadata — small JSON about big files. Most of this design is arranging for that to be true.
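To make "small JSON about big files" concrete, here is a minimal sketch of what a watch-path response might look like. Everything in it is hypothetical: the `VideoMeta` and `watch_response` names, the `cdn.example.com` host, and the HLS-style `master.m3u8` manifest path are illustrative assumptions, not a prescribed API. The point it shows is that the backend answers with a few hundred bytes of metadata plus a CDN URL, and the player then pulls the actual video from the CDN.

```python
import json
from dataclasses import dataclass

# Hypothetical CDN hostname; a real deployment would use its CDN provider's domain.
CDN_BASE = "https://cdn.example.com"


@dataclass
class VideoMeta:
    """Hypothetical metadata row, as it might live in the metadata database."""
    video_id: str
    title: str
    channel: str
    duration_s: int
    view_count: int


def watch_response(meta: VideoMeta) -> str:
    """Build the watch-path response: metadata plus a CDN URL, never video bytes."""
    body = {
        "id": meta.video_id,
        "title": meta.title,
        "channel": meta.channel,
        "duration_s": meta.duration_s,
        "views": meta.view_count,
        # Assumed HLS-style manifest path; the player fetches it and the
        # video segments directly from the CDN, not from this backend.
        "manifest_url": f"{CDN_BASE}/v/{meta.video_id}/master.m3u8",
    }
    return json.dumps(body)


meta = VideoMeta("abc123", "Intro to CDNs", "SysDesign", 600, 42)
resp = watch_response(meta)
# Rough size of the video itself at ~5 Mbps for 10 minutes (decimal units).
video_bytes = meta.duration_s * 5_000_000 // 8
print(len(resp.encode()), "bytes of JSON vs", video_bytes // 1_000_000, "MB of video")
```

The ratio is the design: the origin's job shrinks to answering lookups like this one, which is cacheable and cheap, while the 375 MB of video never passes through it.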
01 Requirements and the numbers that shape everything
Functional — keep it tight: upload a video, watch a video (with seeking), basic metadata (title, channel), view counts. Explicitly defer search, comments, and recommendations — "each is its own interview." Interviewers respect scoping; flailing at all of YouTube is the classic failure mode here.
Non-functional
- Watch path: startup under ~1–2 s, minimal rebuffering, works on flaky mobile networks.
- Upload path: reliable beats fast — a creator tolerates minutes of processing, never a failed 2-hour upload at 95%.
- Availability over consistency everywhere: a video appearing 30 s late is fine; the site being down is not.
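"Never a failed 2-hour upload at 95%" is usually met with chunked, resumable uploads: the server records which chunks it has durably stored, and a client that loses its connection asks what is missing and resends only that. The sketch below assumes that pattern; `UploadSession`, its methods, and the 8 MiB `CHUNK_SIZE` are hypothetical choices for illustration, not any specific provider's protocol, and the object-storage write is left as a comment.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 8 * 1024 * 1024  # hypothetical 8 MiB chunk size


@dataclass
class UploadSession:
    """Server-side record of a resumable upload (hypothetical design)."""
    upload_id: str
    total_size: int
    received: set[int] = field(default_factory=set)  # indexes of stored chunks

    @property
    def num_chunks(self) -> int:
        return (self.total_size + CHUNK_SIZE - 1) // CHUNK_SIZE  # ceiling division

    def put_chunk(self, index: int, data: bytes) -> None:
        # The last chunk may be shorter than CHUNK_SIZE.
        expected = min(CHUNK_SIZE, self.total_size - index * CHUNK_SIZE)
        if not 0 <= index < self.num_chunks or len(data) != expected:
            raise ValueError("chunk index or length does not match the session")
        # A real server would write `data` to object storage before recording it.
        self.received.add(index)  # idempotent: re-sending a chunk is harmless

    def missing(self) -> list[int]:
        return [i for i in range(self.num_chunks) if i not in self.received]

    def complete(self) -> bool:
        return not self.missing()


total = 20 * 1024 * 1024  # 20 MiB file -> 3 chunks (8 + 8 + 4 MiB)
s = UploadSession("up-1", total)
s.put_chunk(0, bytes(CHUNK_SIZE))
s.put_chunk(1, bytes(CHUNK_SIZE))
# Connection drops here; on reconnect the client asks what is still missing.
print(s.missing())  # [2]
s.put_chunk(2, bytes(total - 2 * CHUNK_SIZE))
print(s.complete())  # True
```

Because chunks are independent and idempotent, a network blip costs one chunk of re-upload rather than the whole file, which is exactly the "reliable beats fast" trade.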
Estimation — these numbers drive the whole design. Suppose: 1M video uploads/day, 100M watches/day (a healthy 100:1 read:write ratio).
Storage (the write side):
- An hour of raw 1080p upload ≈ a few GB; after compression to a streaming ladder (we'll build it in stage 3), call it ~1 GB per content-hour across all renditions.
- 1M uploads/day × avg 10 min = ~170K content-hours/day → ~170 TB/day, ~60 PB/year. Verdict you must say out loud: "petabytes per year — no database holds this; video bytes live in object storage, only metadata goes in a database."
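The storage arithmetic is easy to fumble under pressure, so here it is as a tiny calculator using the section's own assumptions (1M uploads/day, 10 min average, ~1 GB per content-hour across all renditions). The function name `storage_estimate` is just for illustration, and decimal units (1 TB = 1,000 GB) are assumed.

```python
def storage_estimate(uploads_per_day: int, avg_minutes: float, gb_per_content_hour: float):
    """Return (content-hours/day, TB/day, PB/year) using decimal units."""
    content_hours_per_day = uploads_per_day * avg_minutes / 60
    tb_per_day = content_hours_per_day * gb_per_content_hour / 1_000  # GB -> TB
    pb_per_year = tb_per_day * 365 / 1_000  # TB -> PB
    return content_hours_per_day, tb_per_day, pb_per_year


hours, tb_day, pb_year = storage_estimate(1_000_000, 10, 1.0)
print(f"{hours:,.0f} content-hours/day, {tb_day:,.0f} TB/day, {pb_year:,.0f} PB/year")
# prints: 166,667 content-hours/day, 167 TB/day, 61 PB/year
```

The text rounds these to ~170K content-hours, ~170 TB/day, and ~60 PB/year; at interview precision the exact rounding does not matter, only the "petabytes per year" order of magnitude.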
Bandwidth (the read side — the scarier number):
- 100M watches/day × avg 5 min watched × ~5 Mbps ≈ 1.5×10¹⁷ bits/day (~1.9×10¹⁶ bytes, ~19 PB/day) ≈ ~1.7 Tbps average egress, with peaks 2–3× higher.
- Serving multi-Tbps from your own servers is absurd in cost and in physics (distance = latency = buffering). Verdict: "egress is the dominant problem, and a CDN is the only realistic answer — I'll design the watch path around it."
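A common slip in this estimate is mixing bits and bytes: the bitrate is in megabits per second, so the daily product comes out in bits, and dividing by 86,400 seconds gives bits per second directly. The sketch below, using the section's assumptions (100M watches/day, 5 min each, ~5 Mbps) and a hypothetical `egress_estimate` helper, keeps the units explicit.

```python
SECONDS_PER_DAY = 86_400


def egress_estimate(watches_per_day: int, avg_minutes_watched: float, bitrate_mbps: float):
    """Return (bits/day, bytes/day, average egress in Tbps), decimal units."""
    seconds_watched = watches_per_day * avg_minutes_watched * 60
    bits_per_day = seconds_watched * bitrate_mbps * 1e6  # Mbps -> bits per second
    bytes_per_day = bits_per_day / 8
    avg_tbps = bits_per_day / SECONDS_PER_DAY / 1e12  # bits/s -> Tbps
    return bits_per_day, bytes_per_day, avg_tbps


bits, bytes_, tbps = egress_estimate(100_000_000, 5, 5)
print(f"{bits:.2e} bits/day, {bytes_ / 1e15:.0f} PB/day, {tbps:.2f} Tbps avg")
print(f"peak at 2-3x: {2 * tbps:.1f}-{3 * tbps:.1f} Tbps")
# prints: 1.50e+17 bits/day, 19 PB/day, 1.74 Tbps avg
#         peak at 2-3x: 3.5-5.2 Tbps
```

Roughly 19 PB leaving the system every day, against ~170 TB arriving, is the ~100× asymmetry that justifies designing the two paths separately.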
Two sentences of math just dictated the architecture: object storage for the write side, CDN for the read side. Draw the skeleton with the two paths visibly separated and proceed.
Video systems split into a write-heavy ingestion pipeline (slow, async, throughput-oriented) and a read-heavy serving edge (fast, latency-oriented, 100–1000× the traffic). Designing them separately — different storage, scaling, and failure stories — is the structural move of this question.
At streaming scale, outbound bandwidth (multi-Tbps) dwarfs storage and compute as the cost and feasibility constraint. The conclusion is always the same: push the bytes to a CDN at the edge; never serve video from origin at scale.
Checkpoint
The estimation gives ~170 TB/day ingest and ~1.7 Tbps egress. Which architectural conclusions follow?
Why is availability-over-consistency the right call for a video platform?