system design · week 4

Design a Rate Limiter

Small surface, deep trade-offs — the classic 'do you actually understand distributed state?' question.

You're asked: "Design a rate limiter for our API." Clients get N requests per time window; anything over the limit is rejected.

This question is beloved by interviewers because it looks like a single function — allow(request) → yes/no — but the path from one box to a fleet forces you through algorithms, atomicity, and the consistency-vs-availability trade-off in miniature. It's the rare question where you can demonstrate distributed-systems judgment in 35 minutes.

The arc to walk: clarify what we're limiting and where the limiter sits → compare the classic algorithms and pick one with reasons → build it on one box → distribute it (here's where the real marks are) → handle scale and failure. Don't skip ahead; each stage earns the next.

01Requirements: what, where, and how we say no

Before any algorithm talk, pin down three things out loud.

1. What are we limiting on — the "key"?

Per user ID: fair, but only works for authenticated traffic.
Per API key: the standard for public APIs (Stripe, GitHub) — each customer gets a quota.
Per IP: the fallback for anonymous traffic — but beware, a university NAT or mobile carrier can put thousands of users behind one IP.

Real systems layer these: per-API-key limits for customers plus a coarse per-IP limit to blunt anonymous abuse. Say this out loud: "The limit key is a product decision — I'll assume per-API-key with a per-IP fallback, and design so the key is pluggable."

2. Where does the limiter sit?

Client-side: trivially bypassed — never the enforcement point (fine as a courtesy to reduce wasted calls).
In each API server: works, but couples limiting logic to every service and dies if you have many services.
At the API gateway: the standard answer. One choke point all traffic already flows through; services behind it stay clean.

3. What happens on rejection? Return HTTP 429 Too Many Requests with a Retry-After header (seconds until the client may retry) and ideally X-RateLimit-Limit / -Remaining / -Reset headers on every response. Well-behaved clients use these to back off; without Retry-After they hammer you in a tight retry loop and make the overload worse.

Non-functional bar: the limiter is on the critical path of every request, so its own latency budget is tiny — think < 1–2 ms — and it must not become the new single point of failure.

concept · 429 + Retry-After

The contract of a polite rate limiter: reject with HTTP 429, tell the client exactly when to come back via Retry-After, and expose X-RateLimit-Remaining so clients can self-throttle before hitting the wall. Naming the headers signals you've used real APIs.

concept · Limit key

The identity a counter is attached to: user ID, API key, or IP. Per-key is fair but needs auth; per-IP catches anonymous abuse but punishes users behind shared NATs. Production systems layer multiple keys.

Checkpoint

Why is the API gateway the standard home for a rate limiter?

A client gets a 429 with no Retry-After header. What typically happens?