Design a Rate Limiter
Small surface, deep trade-offs — the classic 'do you actually understand distributed state?' question.
You're asked: "Design a rate limiter for our API." Clients get N requests per time window; anything over the limit is rejected.
This question is beloved by interviewers because it looks like a single function — allow(request) → yes/no — but the path from one box to a fleet forces you through algorithms, atomicity, and the consistency-vs-availability trade-off in miniature. It's the rare question where you can demonstrate distributed-systems judgment in 35 minutes.
The arc to walk: clarify what we're limiting and where the limiter sits → compare the classic algorithms and pick one with reasons → build it on one box → distribute it (here's where the real marks are) → handle scale and failure. Don't skip ahead; each stage earns the next.
01Requirements: what, where, and how we say no
Before any algorithm talk, pin down three things out loud.
1. What are we limiting on — the "key"?
- Per user ID: fair, but only works for authenticated traffic.
- Per API key: the standard for public APIs (Stripe, GitHub) — each customer gets a quota.
- Per IP: the fallback for anonymous traffic — but beware, a university NAT or mobile carrier can put thousands of users behind one IP.
Real systems layer these: per-API-key limits for customers plus a coarse per-IP limit to blunt anonymous abuse. Say this out loud: "The limit key is a product decision — I'll assume per-API-key with a per-IP fallback, and design so the key is pluggable."
2. Where does the limiter sit?
- Client-side: trivially bypassed — never the enforcement point (fine as a courtesy to reduce wasted calls).
- In each API server: works, but couples limiting logic to every service and dies if you have many services.
- At the API gateway: the standard answer. One choke point all traffic already flows through; services behind it stay clean.
3. What happens on rejection? Return HTTP 429 Too Many Requests with a Retry-After header (seconds until the client may retry) and ideally X-RateLimit-Limit / -Remaining / -Reset headers on every response. Well-behaved clients use these to back off; without Retry-After they hammer you in a tight retry loop and make the overload worse.
Non-functional bar: the limiter is on the critical path of every request, so its own latency budget is tiny — think < 1–2 ms — and it must not become the new single point of failure.
The contract of a polite rate limiter: reject with HTTP 429, tell the client exactly when to come back via Retry-After, and expose X-RateLimit-Remaining so clients can self-throttle before hitting the wall. Naming the headers signals you've used real APIs.
The identity a counter is attached to: user ID, API key, or IP. Per-key is fair but needs auth; per-IP catches anonymous abuse but punishes users behind shared NATs. Production systems layer multiple keys.
Checkpoint
Why is the API gateway the standard home for a rate limiter?
A client gets a 429 with no Retry-After header. What typically happens?