Notice Period
week 5 · lesson

Graphs

Everything is a node; everything else is an edge.

Dependency chains, social networks, server clusters, city maps — once you see nodes and edges, half of real-world engineering is a graph problem. Interviews test four moves: represent the graph (adjacency list, or a grid pretending it isn't one), traverse it without going in circles (BFS/DFS plus visited discipline), order it (topological sort), and group it (union-find). This lesson gives you all four.

01Representation: adjacency lists, and grids in disguise

Forget adjacency matrices for interviews — they cost O(V²) space and you almost never need them. The workhorse is the adjacency list: a dict mapping each node to its neighbors.

from collections import defaultdict

adj = defaultdict(list)
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)      # drop this line for a directed graph

Build it in O(V + E); iterate a node's neighbors in O(degree). That defaultdict(list) four-liner should leave your fingers without thought.

The sneakiest graph is the grid. Half of "graph" interview questions hand you a 2-D matrix and never say the word graph: each cell is a node, and its edges go to the 4 orthogonal neighbors. You never build the adjacency list — you generate neighbors on the fly:

for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
    nr, nc = r + dr, c + dc
    if 0 <= nr < rows and 0 <= nc < cols:
        ...  # (nr, nc) is a neighbor

Islands, flood fill, word search, rotting oranges — all the same grid-as-graph move with different paint.

02BFS vs DFS — and the visited discipline that keeps you alive

Both traversals visit every reachable node in O(V + E). The difference is the order, and the order is the feature:

  • DFS (recursion or an explicit stack) plunges down one path before retreating. Use it for: explore a whole connected region (islands), detect cycles, anything where reaching matters but distance doesn't.
  • BFS (a deque) expands in rings — all nodes at distance 1, then distance 2, … Use it whenever the question contains "shortest" or "minimum steps" on an unweighted graph. BFS finds shortest paths because of the ring order; DFS does not.
from collections import deque

def bfs(start, adj):
    visited = {start}            # mark BEFORE enqueueing
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in visited:
                visited.add(nxt)     # at enqueue time, not dequeue time
                queue.append(nxt)

Visited discipline — graphs have cycles, and traversal without a visited set is an infinite loop. Two rules:

  1. Mark a node visited when you enqueue/recurse into it, not when you process it — otherwise the same node enters the queue many times and BFS distances break.
  2. Never un-mark (that's backtracking's move, not traversal's). In grids, mutating the cell in place (grid[r][c] = 0) is a legal, allocation-free visited set — say you're doing it.
8-node graph · source A · target H · cross edges C–E, E–G, G–H form cyclesstep 1/18
ABCDEFGH
queue (front → back) · node @ distance
Ad=0

Start: A enters the queue at distance 0. BFS will reach every node in order of distance from A.

pseudocode
queue = deque([A]); dist = {A: 0}
parent = {}
while queue:
node = queue.popleft()
for nb in adj[node]:
if nb in dist: continue # reached earlier = reached shorter
dist[nb] = dist[node] + 1; parent[nb] = node
if nb == H: return walk_back(parent, H)
queue.append(nb)

03Topological sort: Kahn's algorithm

Directed graph, edges mean "must come before" — build targets, course prerequisites, task schedulers. A topological order lists the nodes so every edge points forward. It exists iff the graph has no cycle.

Kahn's algorithm is BFS with a bookkeeping twist: repeatedly peel off nodes that have no remaining prerequisites.

from collections import deque

def topo_order(n, edges):           # edges: u -> v  (u before v)
    adj = [[] for _ in range(n)]
    indegree = [0] * n
    for u, v in edges:
        adj[u].append(v)
        indegree[v] += 1

    queue = deque(i for i in range(n) if indegree[i] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            indegree[v] -= 1        # u is done; v has one less blocker
            if indegree[v] == 0:
                queue.append(v)
    return order if len(order) == n else []   # leftovers ⇒ cycle

The elegance is the last line: if a cycle exists, its members never reach indegree 0, never enter the queue, and the output comes up short. One algorithm, two answers: the valid order, and cycle detection for free. "Can these courses be finished?" and "give me a build order" are the same code.

04Union-find: 'are we in the same group yet?'

Some problems don't need paths at all — only connectivity: are these two nodes in the same component? Which edge first creates a cycle? How many clusters remain? For these, union-find (a.k.a. disjoint set union) beats traversal.

The idea: every component elects a representative. Each node stores a parent pointer; follow parents until a node is its own parent — that's the representative. Two nodes are connected iff they share one.

parent = list(range(n))

def find(x):                       # who represents x's group?
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path compression: point at
        x = parent[x]                   # grandparent, flattening as we go
    return x

def union(a, b):                   # merge the two groups; False if already one
    ra, rb = find(a), find(b)
    if ra == rb:
        return False               # same group ⇒ this edge closes a cycle
    parent[ra] = rb
    return True

Path compression, simply: every time find walks up the chain, it re-points nodes closer to the root. The chain you walk today is shorter tomorrow. With it (plus union by rank, optional in interviews), operations are effectively O(1) amortized — the official bound is the inverse Ackermann function, a constant below 5 for any input that fits in this universe.

When to reach for it: edges arrive as a list and the question is about groups or cycles — "redundant connection," "number of provinces," "accounts merge." If you'd only build an adjacency list in order to ask "same component?", skip the list and union-find instead.

05Choosing your weapon

A fast routing table for the interview moment:

The question says…Reach for
"shortest path", "fewest moves" (unweighted)BFS
"count regions/islands", "explore the area"DFS or BFS (either; flood fill)
"prerequisites", "order of tasks", "can it finish?"Topological sort (Kahn's)
"are these connected?", "which edge makes a cycle?"Union-find
2-D matrix of cells with neighbor relationshipsgrid-as-graph + one of the above

Two habits that read as senior:

  • Name your complexity unprompted: traversal is O(V + E) time, O(V) space; for grids that's O(rows · cols).
  • State your visited strategy before coding: separate set, in-place mutation, or a parallel boolean grid. Choosing it consciously prevents the bug class that sinks most graph interviews — revisits and infinite loops.

Prove it — checkpoint quiz

1

Why is BFS — and not DFS — the right tool for 'minimum number of moves' on an unweighted graph?

2

In BFS, when should a node be marked visited?

3

Kahn's algorithm finishes and the output order contains fewer nodes than the graph has. What does that mean?

4

What does path compression do in union-find?

5

Edges arrive as a list and the only question is 'which edge first connects two already-connected nodes?' Best tool?

Now forge it

Reading is scaffolding. These drills are the building. Easiest first: