Distributed Key-Value Store
Dynamo in five moves — the most conceptual case you'll ever be asked.
You're asked: "Design a distributed key-value store — something like DynamoDB, Cassandra, or Redis Cluster." Just two operations: put(key, value) and get(key). Sounds trivial. It is the deepest question in the interview canon.
Why interviewers love it: there's no product to hide behind. No URLs, no feeds, no images — only the raw physics of distributed systems. Partitioning, replication, consistency, failure. Every senior-level concept shows up in its purest form.
How to play it: this round is won with vocabulary plus one worked example each. You don't need to have built Cassandra. You need to say "consistent hashing", draw the ring, say "quorum", and walk one concrete N=3, W=2, R=2 scenario without stumbling. This case gives you exactly that ammunition.
Two morals to carry out of this case — say them out loud in the room:
- Consistency is a dial, not a switch.
- W + R > N is one inequality worth memorizing.
01Requirements: two functions, brutal promises
Start by pinning down the API — it's tiny, so do it fast and spend your time on the promises behind it.
Functional
put(key, value)— store a value (values are opaque blobs, say up to 1 MB)get(key)— return the latest value- (Clarify) no range scans, no transactions, no secondary indexes — pure key-value. Say this out loud: every feature you exclude is complexity you don't have to design.
Non-functional — this is where the design lives
- Low latency: p99 reads/writes under ~10 ms. This rules out anything that touches disk synchronously on the read path more than necessary.
- Durability: an acknowledged
putmust survive a machine dying one second later. "It was in RAM" is not an answer. - Scale: say 100 TB of data, 1M ops/sec. Numbers first, boxes second.
- High availability: the store keeps taking reads and writes even while machines are failing — Dynamo-family systems famously choose "always writable".
Why one box fails — say both reasons:
- Capacity: 100 TB doesn't fit on one machine, and 1M ops/sec doesn't fit on one NIC or one CPU. Vertical scaling has a ceiling and a brutal price curve.
- Single point of failure: one machine = one power supply away from 100% data unavailability. Durability and availability both demand copies on multiple machines.
So before drawing anything, announce the two problems you must solve: "I need to split the data across machines (partitioning) and copy it across machines (replication). Let me take those one at a time." That sentence is your roadmap for the next 30 minutes — interviewers relax when they hear it.
Once a write is acknowledged, it survives crashes — which means it reached persistent storage (or multiple machines) *before* the ack, not after. The ack is a promise; durability is what backs it.
Any component whose failure takes the whole system down. The fix is always redundancy — but redundancy of *data* (replication) immediately raises the question of keeping copies in sync, which is where the real design begins.
Checkpoint
Why can't one very large machine serve this workload, even a $500K one?