Notice Period
week 4 · lesson

Heaps & Priority Queues

Always know the extreme — without sorting everything.

A heap answers one question obsessively well: what's the smallest (or largest) thing right now? It gives you the extreme in O(1) and lets you push or pop in O(log n) — without ever fully sorting. That single capability powers top-k queries, schedulers, Dijkstra, and streaming medians. Learn what the heap promises, what it deliberately doesn't, and the two patterns (size-k heap, two heaps) that show up in interview after interview.

01The contract: one guarantee, ruthlessly kept

A min-heap is a complete binary tree with one rule: every parent ≤ its children. That's it. From that single invariant:

  • the minimum is always at the root — peek in O(1)
  • insert and remove-min are O(log n) (bubble the violation up or down one level at a time)
  • the tree is complete, so it packs into a plain array: children of index i live at 2i+1 and 2i+2. No pointers, no nodes.

Just as important is what a heap does not promise:

  • it is not sorted — siblings are in arbitrary order, and heap[1] is not the second-smallest
  • it cannot find an arbitrary element faster than O(n)
  • it cannot delete an arbitrary element efficiently (only the root)

The interview smell test: if the problem keeps asking for "the current min/max" while items come and go, that's a heap. If it asks "is x present?", that's a hash set. Mixing those up is a classic flunk.

min-heap = [2, 5, 8, 9, 7] · push(3), push(1), then pop()step 1/17
0215283947
implicit tree (parent = (i-1)//2)
i=02 (root)i=15, parent i=0i=28, parent i=0i=39, parent i=1i=47, parent i=1

A min-heap is just this array: [2, 5, 8, 9, 7]. Index i's parent lives at (i-1)//2 — no pointers, the shape is implied.

pseudocode
def push(x): heap.append(x); i = len(heap)-1
while i > 0 and heap[i] < heap[(i-1)//2]:
swap(i, (i-1)//2); i = (i-1)//2
def pop(): top = heap[0]
heap[0] = heap.pop() # last leaf to root
while a child is smaller than heap[i]:
swap(i, smaller_child); i = smaller_child
return top

02Python heapq: min-heap only — negate for max

Python's heapq module operates on a plain list and is min-heap only:

import heapq

h = []
heapq.heappush(h, 5)
heapq.heappush(h, 1)
heapq.heappush(h, 3)
h[0]              # 1 — peek the min, O(1)
heapq.heappop(h)  # 1 — remove the min, O(log n)

nums = [5, 1, 3]
heapq.heapify(nums)   # in-place, and only O(n) — cheaper than n pushes

Need a max-heap? Store negated values and flip the sign on the way out:

h = [-x for x in nums]
heapq.heapify(h)
largest = -heapq.heappop(h)

For compound items, push tuples — Python compares them element by element, so (priority, payload) sorts by priority first:

heapq.heappush(h, (dist, x, y))   # nearest point pops first

One trap: if two priorities tie, Python falls through to comparing payloads, which crashes on uncomparable objects. Insert a tiebreaker counter: (priority, count, obj).

03Top-k: a heap of size k beats sorting

"Find the k largest of n items." Sorting costs O(n log n) and orders everything — wasted work when you only care about k of them.

Instead, keep a min-heap of size k holding the best k candidates seen so far. Its root is the weakest of the winners — the bouncer at the door:

import heapq

def k_largest(nums, k):
    h = []
    for x in nums:
        heapq.heappush(h, x)
        if len(h) > k:
            heapq.heappop(h)   # evict the weakest
    return h                   # the k largest, O(n log k)

Note the inversion that trips people up: you want the largest items, so you keep a min-heap — because the eviction decision is always about the smallest thing you're currently keeping.

Why it matters at scale: O(n log k) with O(k) memory means you can stream a billion log lines through it and never hold more than k in RAM. heapq.nlargest(k, nums) does exactly this — fine to use once you can write the loop yourself.

04Two heaps: a running median

One heap tracks one extreme. To track the middle of a stream, run two heaps facing each other:

  • low — a max-heap holding the smaller half (root = its largest)
  • high — a min-heap holding the larger half (root = its smallest)

The median lives at the boundary, in the roots. Keep two invariants:

  1. Order: everything in low ≤ everything in high
  2. Balance: their sizes differ by at most 1
import heapq

low, high = [], []          # low is negated (max-heap)

def add(x):
    heapq.heappush(low, -x)                      # 1. into low
    heapq.heappush(high, -heapq.heappop(low))    # 2. fix order
    if len(high) > len(low):                     # 3. fix balance
        heapq.heappush(low, -heapq.heappop(high))

def median():
    if len(low) > len(high):
        return float(-low[0])
    return (-low[0] + high[0]) / 2

Each insert is O(log n); each median read is O(1). The transferable move — split the data at the point you care about and guard the boundary with two heaps — also solves sliding-window medians and IQR-style percentile tracking in monitoring systems.

Prove it — checkpoint quiz

1

A min-heap of 1,000,000 items. What's the cost of reading the minimum vs finding an arbitrary value?

2

You want the 5 largest of 10 million numbers using heapq. Which heap do you maintain?

3

Why does pushing -x into heapq simulate a max-heap?

4

In the two-heaps median structure, where does the median come from after 9 insertions?

Now forge it

Reading is scaffolding. These drills are the building. Easiest first: