Heaps & Priority Queues
Always know the extreme — without sorting everything.
A heap answers one question obsessively well: what's the smallest (or largest) thing right now? It gives you the extreme in O(1) and lets you push or pop in O(log n) — without ever fully sorting. That single capability powers top-k queries, schedulers, Dijkstra, and streaming medians. Learn what the heap promises, what it deliberately doesn't, and the two patterns (size-k heap, two heaps) that show up in interview after interview.
01The contract: one guarantee, ruthlessly kept
A min-heap is a complete binary tree with one rule: every parent ≤ its children. That's it. From that single invariant:
- the minimum is always at the root — peek in O(1)
- insert and remove-min are O(log n) (bubble the violation up or down one level at a time)
- the tree is complete, so it packs into a plain array: children of index
ilive at2i+1and2i+2. No pointers, no nodes.
Just as important is what a heap does not promise:
- it is not sorted — siblings are in arbitrary order, and
heap[1]is not the second-smallest - it cannot find an arbitrary element faster than O(n)
- it cannot delete an arbitrary element efficiently (only the root)
The interview smell test: if the problem keeps asking for "the current min/max" while items come and go, that's a heap. If it asks "is x present?", that's a hash set. Mixing those up is a classic flunk.
A min-heap is just this array: [2, 5, 8, 9, 7]. Index i's parent lives at (i-1)//2 — no pointers, the shape is implied.
def push(x): heap.append(x); i = len(heap)-1while i > 0 and heap[i] < heap[(i-1)//2]:swap(i, (i-1)//2); i = (i-1)//2def pop(): top = heap[0]heap[0] = heap.pop() # last leaf to rootwhile a child is smaller than heap[i]:swap(i, smaller_child); i = smaller_childreturn top
02Python heapq: min-heap only — negate for max
Python's heapq module operates on a plain list and is min-heap only:
import heapq
h = []
heapq.heappush(h, 5)
heapq.heappush(h, 1)
heapq.heappush(h, 3)
h[0] # 1 — peek the min, O(1)
heapq.heappop(h) # 1 — remove the min, O(log n)
nums = [5, 1, 3]
heapq.heapify(nums) # in-place, and only O(n) — cheaper than n pushes
Need a max-heap? Store negated values and flip the sign on the way out:
h = [-x for x in nums]
heapq.heapify(h)
largest = -heapq.heappop(h)
For compound items, push tuples — Python compares them element by element, so (priority, payload) sorts by priority first:
heapq.heappush(h, (dist, x, y)) # nearest point pops first
One trap: if two priorities tie, Python falls through to comparing payloads, which crashes on uncomparable objects. Insert a tiebreaker counter: (priority, count, obj).
03Top-k: a heap of size k beats sorting
"Find the k largest of n items." Sorting costs O(n log n) and orders everything — wasted work when you only care about k of them.
Instead, keep a min-heap of size k holding the best k candidates seen so far. Its root is the weakest of the winners — the bouncer at the door:
import heapq
def k_largest(nums, k):
h = []
for x in nums:
heapq.heappush(h, x)
if len(h) > k:
heapq.heappop(h) # evict the weakest
return h # the k largest, O(n log k)
Note the inversion that trips people up: you want the largest items, so you keep a min-heap — because the eviction decision is always about the smallest thing you're currently keeping.
Why it matters at scale: O(n log k) with O(k) memory means you can stream a billion log lines through it and never hold more than k in RAM. heapq.nlargest(k, nums) does exactly this — fine to use once you can write the loop yourself.
04Two heaps: a running median
One heap tracks one extreme. To track the middle of a stream, run two heaps facing each other:
low— a max-heap holding the smaller half (root = its largest)high— a min-heap holding the larger half (root = its smallest)
The median lives at the boundary, in the roots. Keep two invariants:
- Order: everything in
low≤ everything inhigh - Balance: their sizes differ by at most 1
import heapq
low, high = [], [] # low is negated (max-heap)
def add(x):
heapq.heappush(low, -x) # 1. into low
heapq.heappush(high, -heapq.heappop(low)) # 2. fix order
if len(high) > len(low): # 3. fix balance
heapq.heappush(low, -heapq.heappop(high))
def median():
if len(low) > len(high):
return float(-low[0])
return (-low[0] + high[0]) / 2
Each insert is O(log n); each median read is O(1). The transferable move — split the data at the point you care about and guard the boundary with two heaps — also solves sliding-window medians and IQR-style percentile tracking in monitoring systems.
✦Prove it — checkpoint quiz
A min-heap of 1,000,000 items. What's the cost of reading the minimum vs finding an arbitrary value?
You want the 5 largest of 10 million numbers using heapq. Which heap do you maintain?
Why does pushing -x into heapq simulate a max-heap?
In the two-heaps median structure, where does the median come from after 9 insertions?
⚒Now forge it
Reading is scaffolding. These drills are the building. Easiest first: