Hash Maps & Sets

Trading space for time — find what you need in O(1) by remembering what you've seen.

🌱 NoviceFundamentals

1. Overview

Hash maps and hash sets are the fastest way out of O(n²). Any time you find yourself writing "for each element, scan the rest of the array to check…", that inner scan is the problem — and a hash map eliminates it. You've already worked with arrays from Arrays & Strings; a hash map is how you replace searching with remembering.

The core trade is always the same: spend O(n) space to build a lookup structure, earn O(1) per query forever after. When queries are repeated (or when an inner loop's worth of lookups are needed), that trade always wins.

This guide covers three tools in progression: the hash map for counting and mapping (Level 1), the hash set for O(1) membership testing (Level 2), and prefix sums stored in a hash map for answering subarray range questions in O(1) (Level 3).

2. Core Concept & Mental Model

The Library Card Catalog

A large library holds tens of thousands of books. Without organization, finding any book means walking every aisle until you spot it — O(n) per question. Three tools solve this:

  • The card catalog is the hash map. Every book has a card filed by title: title → shelf location. Any lookup costs one step regardless of library size.
  • The checked-out logbook is the hash set. Not where the book is — just whether it's borrowed. Membership only, O(1), no value stored.
  • The daily borrow tally is the prefix sum stored in a map. Every hour, the librarian records the running total of borrowed books. "Books borrowed between 2pm and 4pm?" = tally at 4pm minus tally at 2pm. No re-counting.

These three tools — catalog, logbook, tally — handle nearly every hash-based interview problem.

Understanding the Analogy

The Setup

You're the head librarian. Patrons ask questions constantly: "Is this book checked out? Which shelf is it on? How many books were borrowed this afternoon?" Without tools, every answer costs a full library walk. You need a system that answers each question in one step, regardless of how large the library grows.

The Catalog and Logbook

When a new book arrives, you make a card — title on the front, shelf location on the back — and file it alphabetically. Filing is O(1): go to the right letter, insert. Looking up is O(1): go to the right letter, pull. The catalog works because the act of filing is the act of making the answer retrievable. You don't search for books anymore; you retrieve their cards.

The logbook is simpler. When a book is borrowed, you stamp its title on today's page. "Is Moby Dick checked out?" — one scan of the logbook, yes or no. The logbook is the catalog with the shelf location stripped out: existence only, no value.

What would go wrong without them? Every question becomes a full-library walk. Ten thousand books, one thousand daily questions: ten million steps. The catalog and logbook collapse this to around twenty thousand steps total.

The Tally Sheet

Every hour, a volunteer records the running total of borrowed books since opening. At 2pm the count is 47; at 4pm it's 112. "How many books were borrowed between 2pm and 4pm?" = 112 − 47 = 65. No re-scanning the logbook page by page.

In code, the tally sheet is a running sum stored in a hash map. As you walk an array, you keep a running prefix sum. You store each prefix sum in the map. When you want to know whether any subarray ending at the current position sums to a target k, you ask: "has the running sum ever been currentSum − k?" If yes, the elements between that earlier checkpoint and now sum to exactly k.

Why These Approaches

Every hash-based technique trades space for time. You spend O(n) space on the catalog, logbook, or tally sheet, and you earn O(1) per lookup instead of O(n) per scan. When you need to look something up for every element — that's a potential O(n²) loop — the hash structure buys you O(n) total instead.

The alternative, scanning, is quadratic when repeated. A single lookup over n elements is O(n). Do it for every element and you have O(n²). The catalog replaces each of those inner scans with O(1). One pass to build, O(1) per query after.

A Simple Example

A patron hands you a pile of book return slips and asks how many times each title appears. Without a catalog: for every slip, re-read the whole pile to count that title — O(n²) total. With a tally card per title: pick up each slip once, add one mark to that title's card, set it down. One pass through the pile, O(1) per slip. Done.

Now you understand the tools. Let's build them step by step.

How I Think Through This

When I see a problem involving "find the element that pairs with X" or "check if this appeared before" or "how often does Y appear," I immediately ask: what would I need to remember to avoid scanning? If the answer is a value I need to retrieve later (a count, an index, a group key) — that's a hash map. If it's just yes/no — that's a hash set. If the problem involves subarrays and their sums, I add a running prefix sum alongside the map. The recognition is almost always the same: I see a potential inner loop and I ask what the catalog equivalent would be.

Take [2, 7, 11, 15] with target 9. For each element I ask: does the complement (9 − element) already appear in what I've walked past? I keep a map of value → index as I go. When I reach 7: is 9 − 7 = 2 in the map? I inserted 2 one step ago, so yes. Return [0, 1] immediately. No inner loop, one pass. ✓

3. Building Blocks — Progressive Learning

Level 1: The Catalog — Counting and Mapping

Why this level matters Most hash map usage in interviews reduces to two operations: counting how often something appears, or remembering a value (an index, a group key) keyed to something you've seen. Once you can build a frequency table and do a complement lookup in your sleep, you've got the core of the majority of hash map problems.

How to think about it A hash map is the card catalog in code: map.set(key, value) files a card, map.get(key) retrieves it. The most common pattern — frequency counting — is just incrementing a counter for each key you encounter. As you walk, for each element: map.set(x, (map.get(x) ?? 0) + 1). That ?? 0 is the critical detail: if the key has never been seen, map.get returns undefined, not 0.

The second pattern — value → index — powers complement lookups. As you walk, store map.set(value, index). For each new element, check if target − element already exists in the map. If it does, you have your pair without scanning backwards.

Walking through it

Build a frequency table for ['cat', 'dog', 'cat', 'bird', 'dog', 'cat']:

i current
1 / 8
cat
0
dog
1
cat
2
bird
3
dog
4
cat
5
map
empty
Open an empty catalog — no cards filed yet. freq = {}

One pass, six steps. "How often does 'cat' appear?" — one lookup, answer: 3.

The one thing to get right

map.get(key) returns undefined when the key is absent, not 0. Writing map.get(key) + 1 produces NaN, which silently corrupts every count downstream. Always use (map.get(key) ?? 0) + 1. This is the single most common hash map bug in interviews.

Loading editor...

Mental anchor: "Walk once, catalog as you go. (freq[key] ?? 0) + 1 — the ?? 0 is the card that was never filed."

→ Bridge to Level 2: A frequency map stores values. But sometimes you only need to know whether a key exists — you don't care what it maps to. A hash set is a map with no values: pure membership in O(1), with less overhead and clearer intent.


Level 2: The Logbook — Membership in O(1)

Why this level matters Deduplication, intersection, "have I seen this before?" — all of these reduce to membership testing. A hash set answers membership in O(1). It is a focused tool: add an entry, check it, no values needed.

How to think about it set.add(x) stamps an entry in the logbook. set.has(x) checks whether that stamp exists. Nothing more. Use a set when the question is yes/no and you don't need to retrieve an associated value later.

The classic pattern: walk the array, check set.has(x) before calling set.add(x). If the check passes, you've found a duplicate. For intersection, load array A into a set, then walk array B and collect every element where set.has(element) is true.

Walking through it

"Does [3, 1, 4, 1, 5] contain a duplicate?"

i current
1 / 6
3
0
1
1
4
2
1
3
5
4
map
empty
Open empty logbook — nothing stamped yet. seen = {}

Found the duplicate at index 3 after four steps. Never had to scan backwards.

The one thing to get right

A set tracks existence, not count. If you need to know how often something appeared — for example, to find the first element that appears exactly once — a set is wrong: it cannot distinguish "appeared once" from "appeared five times." Use a frequency map for count-based questions; use a set only for pure existence checks.

Loading editor...

Mental anchor: "Set = the logbook. has() before add(). Set for existence, map for values."

→ Bridge to Level 3: Sets and maps answer per-element questions. But some problems ask about contiguous slices of an array: "does any subarray sum to k?" Checking every subarray is O(n²). The prefix sum stored in a map collapses this to O(n) by reframing the question as a single O(1) lookup per element.


Level 3: The Tally Sheet — Prefix Sum + Hash Map

Why this level matters "Count subarrays with sum equal to k" looks like it needs a nested loop. The prefix sum + hash map pattern reduces this to a single pass. Once you see how the tally sheet works, you'll recognize this trick across dozens of problems involving sums, differences, and ranges.

How to think about it Maintain a running prefixSum as you walk the array. You want to know: does any subarray ending here have sum exactly k? If the running sum at position j is S, and there was a running sum of S − k at some earlier position i, then the slice from i+1 to j sums to exactly k. Store all seen prefix sums in a map (prefixSum → count of how many times that sum appeared) and check map.has(prefixSum − k) at each step.

Initialize the map with {0: 1} before you start. This represents the empty prefix before any elements. Without it, when prefixSum === k (a subarray starting at index 0), you'd look for prefixSum − k = 0 and find nothing — silently missing valid answers.

Walking through it

Count subarrays with sum = 2 in [1, 1, 1]:

i current
1 / 5
1
0
1
1
1
2
prefixSum = 0result = 0
map
01
Initialize
prefixSum = 0
result = 0. Seed map with {0: 1} — the empty prefix before any elements.

Result: 2. The two subarrays are [1,1] at indices 0–1 and [1,1] at indices 1–2. ✓

The one thing to get right

Seed the map with {0: 1} before processing any element. Without the seed, subarrays that start at the very beginning (index 0) are invisible: when prefixSum === k, you look for prefixSum − k = 0 in the map, but zero was never recorded. The code runs silently and returns a count that is simply off by some amount with no error to catch it.

Loading editor...

Mental anchor: "prefixSum[j] − prefixSum[i] = subarray sum i+1..j. Store seen sums in map. Seed with {0:1} or you miss index-0 subarrays."

4. Key Patterns

Pattern: Complement Lookup

When to use: "Find two elements that sum to target", "find a pair with difference k", "find an element's partner." You need to check whether a specific value appeared somewhere earlier in the array.

How to think about it: For each element at position j, compute the "complement" — the value that would satisfy the condition when paired with element[j]. Store all previously seen values in a map (value → index). Then map.has(complement) answers the question in O(1). Always check before storing so an element cannot match itself.

Complexity: Time O(n), Space O(n)

Pattern: Frequency Map + Canonical Key

When to use: "Are these two strings anagrams?", "group words that are anagrams of each other", "do these collections have the same distribution?" You're checking whether two collections share the same multiset of elements, or bucketing items that share a property.

How to think about it: Build a frequency map for one collection, walk the second and decrement; if all counts reach zero, the distributions match. For grouping, use a canonical key: sort each word's characters to get a stable key like "aet" for "eat", "tea", and "ate". Words with the same canonical key land in the same bucket automatically.

Complexity: Time O(n · m log m) where m = average string length, Space O(n · m)

5. Decision Framework

Concept Map

Loading chart...

Complexity

OperationHash MapHash SetArray scan
InsertO(1) avgO(1) avgO(1) append
LookupO(1) avgO(1) avgO(n)
DeleteO(1) avgO(1) avgO(n)
SpaceO(n)O(n)O(1) extra

Decision Tree

Loading chart...

Recognition Signals

Problem keywordsTechnique
"contains duplicate", "any repeats", "seen before"Hash set, one pass
"two sum", "pair that adds to", "complement"Complement lookup (check before store)
"frequency", "most common", "count occurrences"Frequency map
"anagram", "same letters", "valid permutation"Frequency map or sorted canonical key
"subarray sum equals k", "subarray with sum"Prefix sum + map, seed {0:1}
"group by", "cluster", "bucket into"Map of arrays with canonical key
"longest consecutive sequence"Load into set, scan for sequence starts

When NOT to use

  • Data is already sorted → binary search is O(log n); a hash scan is O(n)
  • Key space is small and bounded (e.g., 26 lowercase letters) → a plain fixed-size array has the same O(1) access with less overhead
  • You need elements in sorted order → maps have no ordering guarantee; sort at the end or use a sorted structure
  • Spatial or positional reasoning → two pointers or sliding window; hash maps don't help with index arithmetic

6. Common Gotchas & Edge Cases

Typical Mistakes

  1. map.get(key) + 1 when the key is absentundefined + 1 produces NaN. Every subsequent count from that key is also NaN, corrupting results silently. Always write (map.get(key) ?? 0) + 1.

  2. Forgetting to seed the prefix map with {0: 1} — without the seed, subarrays starting at index 0 are invisible. When prefixSum === k, you look for 0 in the map; it's not there; you silently miss valid subarrays with no error.

  3. Storing before checking in complement lookups — if you call map.set(nums[i], i) before checking map.has(target − nums[i]), an element can match itself. Always check first, then store.

  4. Using a set when you need a count — sets cannot tell you "appeared exactly once" vs. "appeared three times." For first-unique-character style problems, you need a frequency map, not a set.

  5. for...in on a Mapfor...in iterates prototype properties, not map entries. Use for (const [k, v] of map) or map.forEach to iterate map contents.

Edge Cases to Always Test

  • Empty input → return 0, [], or false as appropriate; no crash
  • Single element → can't form a pair; subarray is just that element
  • All elements identical → frequency count = n; set size = 1
  • Target sum of 0 in prefix problems → the seed {0: 1} handles this correctly
  • Negative numbers in prefix problems → the algorithm is correct; no special-casing needed

Debugging Tips

  • Print the map after building: console.log([...map.entries()]) — check keys and counts look right
  • For prefix sum, log prefixSum and prefixSum − k at each step; the lookup should hit at the positions you expect
  • If a count is off by one, verify you seeded the prefix map and that you're checking before storing
  • For complement lookups, trace the map state at the moment the lookup fires — the complement should already be present