Graphs
Everything is a node; everything else is an edge.
Dependency chains, social networks, server clusters, city maps — once you see nodes and edges, half of real-world engineering is a graph problem. Interviews test four moves: represent the graph (adjacency list, or a grid pretending it isn't one), traverse it without going in circles (BFS/DFS plus visited discipline), order it (topological sort), and group it (union-find). This lesson gives you all four.
01Representation: adjacency lists, and grids in disguise
Forget adjacency matrices for interviews — they cost O(V²) space and you almost never need them. The workhorse is the adjacency list: a dict mapping each node to its neighbors.
from collections import defaultdict
adj = defaultdict(list)
for u, v in edges:
adj[u].append(v)
adj[v].append(u) # drop this line for a directed graph
Build it in O(V + E); iterate a node's neighbors in O(degree). That defaultdict(list) four-liner should leave your fingers without thought.
The sneakiest graph is the grid. Half of "graph" interview questions hand you a 2-D matrix and never say the word graph: each cell is a node, and its edges go to the 4 orthogonal neighbors. You never build the adjacency list — you generate neighbors on the fly:
for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
nr, nc = r + dr, c + dc
if 0 <= nr < rows and 0 <= nc < cols:
... # (nr, nc) is a neighbor
Islands, flood fill, word search, rotting oranges — all the same grid-as-graph move with different paint.
02BFS vs DFS — and the visited discipline that keeps you alive
Both traversals visit every reachable node in O(V + E). The difference is the order, and the order is the feature:
- DFS (recursion or an explicit stack) plunges down one path before retreating. Use it for: explore a whole connected region (islands), detect cycles, anything where reaching matters but distance doesn't.
- BFS (a
deque) expands in rings — all nodes at distance 1, then distance 2, … Use it whenever the question contains "shortest" or "minimum steps" on an unweighted graph. BFS finds shortest paths because of the ring order; DFS does not.
from collections import deque
def bfs(start, adj):
visited = {start} # mark BEFORE enqueueing
queue = deque([start])
while queue:
node = queue.popleft()
for nxt in adj[node]:
if nxt not in visited:
visited.add(nxt) # at enqueue time, not dequeue time
queue.append(nxt)
Visited discipline — graphs have cycles, and traversal without a visited set is an infinite loop. Two rules:
- Mark a node visited when you enqueue/recurse into it, not when you process it — otherwise the same node enters the queue many times and BFS distances break.
- Never un-mark (that's backtracking's move, not traversal's). In grids, mutating the cell in place (
grid[r][c] = 0) is a legal, allocation-free visited set — say you're doing it.
Start: A enters the queue at distance 0. BFS will reach every node in order of distance from A.
queue = deque([A]); dist = {A: 0}parent = {}while queue:node = queue.popleft()for nb in adj[node]:if nb in dist: continue # reached earlier = reached shorterdist[nb] = dist[node] + 1; parent[nb] = nodeif nb == H: return walk_back(parent, H)queue.append(nb)
03Topological sort: Kahn's algorithm
Directed graph, edges mean "must come before" — build targets, course prerequisites, task schedulers. A topological order lists the nodes so every edge points forward. It exists iff the graph has no cycle.
Kahn's algorithm is BFS with a bookkeeping twist: repeatedly peel off nodes that have no remaining prerequisites.
from collections import deque
def topo_order(n, edges): # edges: u -> v (u before v)
adj = [[] for _ in range(n)]
indegree = [0] * n
for u, v in edges:
adj[u].append(v)
indegree[v] += 1
queue = deque(i for i in range(n) if indegree[i] == 0)
order = []
while queue:
u = queue.popleft()
order.append(u)
for v in adj[u]:
indegree[v] -= 1 # u is done; v has one less blocker
if indegree[v] == 0:
queue.append(v)
return order if len(order) == n else [] # leftovers ⇒ cycle
The elegance is the last line: if a cycle exists, its members never reach indegree 0, never enter the queue, and the output comes up short. One algorithm, two answers: the valid order, and cycle detection for free. "Can these courses be finished?" and "give me a build order" are the same code.
04Union-find: 'are we in the same group yet?'
Some problems don't need paths at all — only connectivity: are these two nodes in the same component? Which edge first creates a cycle? How many clusters remain? For these, union-find (a.k.a. disjoint set union) beats traversal.
The idea: every component elects a representative. Each node stores a parent pointer; follow parents until a node is its own parent — that's the representative. Two nodes are connected iff they share one.
parent = list(range(n))
def find(x): # who represents x's group?
while parent[x] != x:
parent[x] = parent[parent[x]] # path compression: point at
x = parent[x] # grandparent, flattening as we go
return x
def union(a, b): # merge the two groups; False if already one
ra, rb = find(a), find(b)
if ra == rb:
return False # same group ⇒ this edge closes a cycle
parent[ra] = rb
return True
Path compression, simply: every time find walks up the chain, it re-points nodes closer to the root. The chain you walk today is shorter tomorrow. With it (plus union by rank, optional in interviews), operations are effectively O(1) amortized — the official bound is the inverse Ackermann function, a constant below 5 for any input that fits in this universe.
When to reach for it: edges arrive as a list and the question is about groups or cycles — "redundant connection," "number of provinces," "accounts merge." If you'd only build an adjacency list in order to ask "same component?", skip the list and union-find instead.
05Choosing your weapon
A fast routing table for the interview moment:
| The question says… | Reach for |
|---|---|
| "shortest path", "fewest moves" (unweighted) | BFS |
| "count regions/islands", "explore the area" | DFS or BFS (either; flood fill) |
| "prerequisites", "order of tasks", "can it finish?" | Topological sort (Kahn's) |
| "are these connected?", "which edge makes a cycle?" | Union-find |
| 2-D matrix of cells with neighbor relationships | grid-as-graph + one of the above |
Two habits that read as senior:
- Name your complexity unprompted: traversal is O(V + E) time, O(V) space; for grids that's O(rows · cols).
- State your visited strategy before coding: separate set, in-place mutation, or a parallel boolean grid. Choosing it consciously prevents the bug class that sinks most graph interviews — revisits and infinite loops.
✦Prove it — checkpoint quiz
Why is BFS — and not DFS — the right tool for 'minimum number of moves' on an unweighted graph?
In BFS, when should a node be marked visited?
Kahn's algorithm finishes and the output order contains fewer nodes than the graph has. What does that mean?
What does path compression do in union-find?
Edges arrive as a list and the only question is 'which edge first connects two already-connected nodes?' Best tool?
⚒Now forge it
Reading is scaffolding. These drills are the building. Easiest first: