A new school has 100 lockers, numbered 0 to 99. The school assigns lockers using a formula based on student ID numbers. Most students get a unique locker. But one day, two students — ID 204 and ID 304 — both compute to locker number 4. They arrive, and both try to open the same locker.

The school needs a plan. How do you handle two things that need the same spot?

This is a collision in a hash table, and every real hash table must have a strategy to deal with it.

What Is a Collision?

Collision: A situation where two different keys produce the same hash value (and therefore map to the same array index).

Even with a perfect hash function, collisions are mathematically unavoidable if you have more possible keys than array slots. The birthday paradox shows that with just 23 people, there is a 50% chance two share a birthday — hash collisions happen far more frequently than intuition suggests.

# Simple demonstration of a collision with a small table
def simple_hash(key, size=10):
    return hash(key) % size

print(simple_hash("listen", 10))    # Might output: 5
print(simple_hash("silent", 10))    # Also might output: 5 — collision!

When two keys land at the same index, the table must have a plan. There are several well-established strategies.

Method 1: Chaining

Chaining stores multiple items at the same index using a linked list (or in Python's case, a list). Every slot holds not just one item but a chain of items that all hashed to that index.

Index 4: → [("student_204", "Alice")] → [("student_304", "Bob")] → None

class ChainingHashTable:
    def __init__(self, size=10):
        self.size = size
        self.table = [[] for _ in range(self.size)]   # Each slot holds a list

    def _hash(self, key):
        return hash(key) % self.size

    def set(self, key, value):
        index = self._hash(key)
        chain = self.table[index]
        for i, (k, v) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)   # Update existing key
                return
        chain.append((key, value))        # No existing key — add new pair

    def get(self, key):
        index = self._hash(key)
        for k, v in self.table[index]:    # Search the chain at this index
            if k == key:
                return v
        return None

Chaining pros: Simple to implement, handles high load factors gracefully.
Chaining cons: Extra memory for pointers; cache-unfriendly (linked list nodes scattered in memory).

Method 2: Open Addressing

Instead of a linked list at each slot, open addressing stores only one element per slot. When a collision occurs, the algorithm probes for the next available empty slot.

Linear Probing

Try index + 1, then index + 2, and so on until an empty slot is found:

class OpenAddressingHashTable:
    def __init__(self, size=10):
        self.size = size
        self.table = [None] * self.size
        self.DELETED = "<DELETED>"        # Tombstone marker for deleted slots

    def _hash(self, key):
        return hash(key) % self.size

    def set(self, key, value):
        index = self._hash(key)
        while self.table[index] is not None and self.table[index] != self.DELETED:
            if self.table[index][0] == key:   # Found existing key — update
                self.table[index] = (key, value)
                return
            index = (index + 1) % self.size   # Probe next slot (wrap around)
        self.table[index] = (key, value)

    def get(self, key):
        index = self._hash(key)
        while self.table[index] is not None:
            if self.table[index] != self.DELETED and self.table[index][0] == key:
                return self.table[index][1]
            index = (index + 1) % self.size
        return None

Open addressing pros: Better cache performance (all data in one array), lower memory overhead.
Open addressing cons: Clustering — many collisions bunch up nearby, slowing future lookups. Deletions are tricky (need tombstone markers).

Method 3: Robin Hood Hashing (Brief)

A smarter variant of open addressing. When inserting and a collision occurs, the new element "steals" the slot from whichever element is furthest from its ideal position (the "rich" one), and the displaced element continues probing. This balances probe lengths across all elements, keeping performance more consistent. Many modern high-performance hash tables (including Rust's HashMap) use this strategy.

How CPython Dicts Actually Work

Python's dict uses open addressing with pseudo-random probing — not simple linear probing. The probing sequence mixes in bits of the hash value to avoid clustering:

next_index = (5 * current_index + 1 + perturb) % table_size

Key facts about CPython dicts:

Compact and ordered: Since Python 3.7, dicts maintain insertion order as a language guarantee.
Resize threshold: CPython resizes (doubles) when the dict is about 2/3 full (load factor ~0.67).
Two-array layout: Since Python 3.6+, CPython separates the hash indices from the actual key-value storage, making iteration faster.

Python Dicts Maintain Insertion Order (Python 3.7+)

d = {}
d["banana"] = 2
d["apple"] = 5
d["cherry"] = 1

for key, value in d.items():
    print(key, value)

Output:

banana 2
apple 5
cherry 1

The order is insertion order — not sorted, not random. This is guaranteed behavior in Python 3.7 and later.

When Dicts Hit O(n): Hash Flood Attacks

A malicious attacker could craft inputs where all keys hash to the same index, turning O(1) lookups into O(n) — effectively a denial of service. Python defends against this with hash randomization (PYTHONHASHSEED), which adds a random seed to string hashes at startup, making collisions unpredictable for attackers.

Sets: Hash Tables Without Values

A Python set is simply a hash table where each slot stores only a key — no associated value. This makes membership testing (in) run in O(1):

names = {"Alice", "Bob", "Charlie"}
print("Alice" in names)    # O(1) — True
print("Diana" in names)    # O(1) — False

Sets are perfect for deduplication and fast membership checks.

Practical Python Patterns Using Dicts

Frequency Counting

words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
freq = {}
for word in words:
    freq[word] = freq.get(word, 0) + 1   # Increment count, default 0
print(freq)
# Output: {'apple': 3, 'banana': 2, 'cherry': 1}

Grouping Items

students = [("Alice", "Math"), ("Bob", "Science"), ("Carol", "Math"), ("Dave", "Science")]
groups = {}
for name, subject in students:
    groups.setdefault(subject, []).append(name)
print(groups)
# Output: {'Math': ['Alice', 'Carol'], 'Science': ['Bob', 'Dave']}

Caching / Memoization

cache = {}

def fibonacci(n):
    if n in cache:
        return cache[n]             # O(1) lookup — already computed
    if n <= 1:
        return n
    result = fibonacci(n-1) + fibonacci(n-2)
    cache[n] = result               # Store result for future calls
    return result

print(fibonacci(50))    # 12586269025 — fast because of caching

Dict Comprehensions

# Build a dict from a list in one line
squares = {x: x**2 for x in range(6)}
print(squares)
# Output: {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

# Filter while building
even_squares = {x: x**2 for x in range(10) if x % 2 == 0}
print(even_squares)
# Output: {0: 0, 2: 4, 4: 16, 6: 36, 8: 64}

Key Takeaway

Collisions are inevitable in hash tables. Chaining handles them with linked lists per slot; open addressing probes for the next free slot. Python uses a sophisticated open addressing scheme with random probing, resizes before getting too full, and has maintained insertion order since Python 3.7. Understanding these internals explains why Python dicts are so fast, why you should use them for frequency counting and caching, and why you should never use a mutable object (like a list) as a dict key.

💬 DiscussionPowered by GitHub Discussions

📱

Get this course's notes on Telegram!

Free cheat sheets, summaries & practice exercises

Get Notes Free →

25 minLesson 12 of 17

Course Contents(17 lessons)

▾

Chapter 1: Why Data Structures Matter

What Are Data Structures and Why Do They Exist?20 min

Big O Notation: Measuring Code Efficiency30 min

Chapter 2: Arrays and Strings

Arrays: The Foundation of Everything25 min

Dynamic Arrays and How Python Lists Really Work25 min

2D Arrays and Matrices22 min

Chapter 3: Linked Lists

Singly Linked Lists: Nodes and Chains30 min

Doubly Linked Lists: Going Both Ways28 min

Chapter 4: Stacks and Queues

Stacks: Last In, First Out (LIFO)28 min

Queues: First In, First Out (FIFO)25 min

Deques and Priority Queues22 min

Chapter 5: Hash Tables

Hash Functions: Turning Keys Into Addresses30 min

Collision Handling and Python Dict Internals25 min

Chapter 6: Trees

Binary Trees: Hierarchical Data32 min

Binary Search Trees: Ordered Lookup35 min

Tree Traversal: In-Order, Pre-Order, Post-Order28 min

Chapter 7: Graphs

Graph Representations: Adjacency List vs Matrix30 min

BFS and DFS: Exploring Graphs Systematically35 min