week 5 · lesson

Tries (Prefix Trees)

Pay per character, search by prefix.

Type three letters into a search box and ten suggestions appear before your next keystroke. The structure behind that is the trie (prefix tree): a tree where each edge is a character and each root-to-node path spells a prefix. Hash maps treat strings as opaque blobs; tries exploit their internal structure, making every prefix question — autocomplete, spell-check, shortest-root replacement, wildcard search — cost only as much as the prefix is long.

01The shape: paths spell prefixes

Insert "cat", "car", and "dog" into a trie:

(root)
 ├─ c ─ a ─ t●
 │        └─ r●
 └─ d ─ o ─ g●

"cat" and "car" share the path c → a and split only at the last letter; ● marks "a word ends here." Two consequences fall out of the picture:

Shared prefixes are stored once. A million words starting with "un" store that "un" exactly one time.
Every node is a prefix. Walking k characters lands you at the node representing that k-character prefix — and the subtree below it contains exactly the words that start with it.

The end-of-word marker is not optional decoration: without it, inserting "cattle" would make "cat" look like a word (or not — you couldn't tell). Node existence means prefix; the flag means word.

02Node-as-dict: the 10-line implementation

Skip the TrieNode class ceremony — in Python, a node is just a dict mapping character → child node, with a sentinel key marking word ends:

END = "$"          # sentinel — can't collide with a real character

def insert(root, word):
    node = root
    for ch in word:
        node = node.setdefault(ch, {})   # walk, creating as needed
    node[END] = True

def search(root, word):                  # exact word?
    node = root
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return END in node

def starts_with(root, prefix):           # any word with this prefix?
    node = root
    for ch in prefix:
        if ch not in node:
            return False
        node = node[ch]
    return True

search and starts_with are the same walk with different finishes — one checks the flag, the other only needs to arrive. Every operation is O(L) in the length of the string, independent of how many words the trie holds. setdefault does insert's create-or-descend in one call; it's the trie author's best friend.

03Why tries beat hashing for prefix queries

A hash set already gives O(L) exact lookup — hashing must read all L characters anyway. So when does a trie earn its keep?

Query	Hash set	Trie
"is `word` present?"	O(L) ✅	O(L)
"does any word start with `p`?"	O(n · L) scan	O(len(p))
"list all words starting with `p`"	O(n · L) scan	O(len(p) + size of answer)
"match `c.t` with a wildcard"	no help at all	branch only at the dot

The structural reason: a hash function scatters "car" and "cat" to unrelated buckets — by design, since uniform scattering is what makes hashing fast. The trie does the opposite: it clusters strings by shared prefix, so one short walk reaches the entire family. Neither wins everywhere; the trie wins exactly when the question is about prefixes, not whole strings.

It also unlocks moves hashing can't express: walk a trie while DFS-ing a letter grid and you prune entire search branches the moment a path stops being a prefix of any word — the trick behind multi-word boggle/word-search problems.

04Autocomplete: the trie as a product feature

Frame the search-box feature as operations you now own:

Walk the prefix — O(len(prefix)) to reach the node for what the user typed. Dead end ⇒ zero suggestions, instantly.
Enumerate the subtree — DFS below that node yields every completion. Unbounded, so…
Rank and cut — keep popularity counts and return the top k. Either heap-select over the subtree, or precompute "top k per node" at insert time to make reads O(len(prefix) + k).

That third step is a real engineering trade-off — precompute at write time vs compute at read time — and it's exactly the conversation a "design typeahead / search suggestions" system-design interview (week 6) wants. There, the trie gets sharded across machines by prefix, hot prefixes get cached, and rankings refresh offline — but the data structure at the core is the one you just built from dicts.

When to reach for a trie in a coding round: many strings + repeated prefix-shaped questions (autocomplete, longest shared prefix, shortest root, wildcards). One-shot exact membership? Stay with the hash set — simpler and cache-friendlier.

✦Prove it — checkpoint quiz

In a trie, what distinguishes a *word* from a mere *prefix*?

A trie holds 1,000,000 words. What does it cost to check whether any word starts with 'pre'?

Why is a hash set structurally *unable* to answer prefix queries efficiently?

Autocomplete must return the top-5 most popular completions in milliseconds. Which design moves work toward read time, not against it?

⚒Now forge it

Reading is scaffolding. These drills are the building. Easiest first:

·Shared SKU Prefixeasy ·Autocomplete Word Storemedium ·Redacted Name Lookupmedium ·Shorthand Rewritermedium ·Letter Grid Harvesthard