Imagine a library with one million books. Normally, finding a book means walking to the right section, scanning through alphabetical order, and hunting shelf by shelf. That might take minutes.

Now imagine a different library. You tell the librarian the title of the book you want. She runs a quick mental calculation — a formula based on the letters in the title — and says: "Aisle 7, shelf 4, slot 23." You walk directly there. The book is waiting. No searching, no scanning, no guessing. That formula is a hash function.

This is exactly how Python dictionaries, database indexes, and cache lookups work. A hash function turns any key into an exact storage location, making retrieval blindingly fast.

What Is a Hash Function?

Hash Function: A function that takes an input (called a key) of arbitrary size and returns a fixed-size integer (called a hash value or hash code). The hash value is used as an index into an array.

The critical idea: instead of searching for where something is stored, you calculate its location directly from its value.

key → hash_function(key) → index → array[index] → value

Properties of a Good Hash Function

Property	Meaning
Deterministic	The same input always produces the same output
Uniform distribution	Outputs should spread evenly across available indices
Fast to compute	The function itself must run in O(1)
Minimizes collisions	Two different keys should rarely produce the same index

Python's `hash()` Function

Python has a built-in hash() function you can call on any hashable object:

print(hash("apple"))        # Some large integer, e.g. 3713082716806266542
print(hash("apple"))        # Same value — deterministic
print(hash("apple2"))       # Very different value — small change, big difference
print(hash(42))             # 42 (integers hash to themselves)
print(hash(3.14))           # Some integer representation
print(hash((1, 2, 3)))      # Tuples are hashable
# print(hash([1, 2, 3]))    # ERROR — lists are NOT hashable (they are mutable)

Note: Python randomizes hash seeds per interpreter session for security (hash randomization). The exact numbers will differ each run, but the same key always produces the same value within one session.

To find the actual array index, you apply the modulo operator:

key = "apple"
array_size = 10
index = hash(key) % array_size   # Maps the hash to a valid array slot
print(index)                      # Some number 0-9

The Hash Table: Array + Hash Function

A hash table (Python's dict is one) is simply an array where the index for each element is determined by a hash function.

Array slots:  [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]

hash("name") % 10  = 3  → store "Alice" at index 3
hash("age")  % 10  = 7  → store 30 at index 7
hash("city") % 10  = 1  → store "NY" at index 1

To retrieve "name": compute hash("name") % 10 = 3, go directly to slot 3. Done. O(1).

Simple Hash Table from Scratch

class HashTable:
    def __init__(self, size=10):
        self.size = size
        self.table = [None] * self.size   # Array of empty slots

    def _hash(self, key):
        return hash(key) % self.size      # Convert key to valid index

    def set(self, key, value):
        index = self._hash(key)
        self.table[index] = (key, value)  # Store key-value pair at computed index

    def get(self, key):
        index = self._hash(key)
        if self.table[index] is not None:
            return self.table[index][1]   # Return just the value
        return None                       # Key not found

    def display(self):
        for i, slot in enumerate(self.table):
            if slot:
                print(f"  [{i}] {slot[0]} → {slot[1]}")

ht = HashTable()
ht.set("name", "Alice")
ht.set("age", 30)
ht.set("city", "New York")
ht.display()

print(ht.get("name"))   # Alice
print(ht.get("age"))    # 30

Output:

  [1] city → New York
  [3] name → Alice
  [7] age → 30
Alice
30

Load Factor: When to Resize

The load factor is the ratio of stored elements to total slots:

load_factor = number_of_elements / table_size

As the load factor grows, the chance of collisions (two keys mapping to the same slot) increases, and performance degrades. Most hash table implementations resize (rehash) when the load factor exceeds a threshold — typically 0.7.

Python's dict resizes when it is about 2/3 full, doubling its capacity each time.

Operations: Average vs Worst Case

Operation	Average Case	Worst Case (many collisions)
Insert	O(1)	O(n)
Lookup	O(1)	O(n)
Delete	O(1)	O(n)

The average case is what you get with a good hash function and a reasonable load factor. The worst case (O(n)) happens when many keys collide into the same slot — covered in the next lesson.

Proof: Dict Lookup vs List Search

The speed difference between hash-based lookup and linear search is dramatic. Here is a concrete demonstration:

import time

data = list(range(1_000_000))               # A list of 1 million integers
data_dict = {x: True for x in data}        # The same data in a dict (hash table)
target = 999_999                            # Look for the very last element

# Linear search through the list — O(n)
start = time.time()
result = target in data
list_time = (time.time() - start) * 1000
print(f"List search:  {list_time:.2f}ms")

# Hash table lookup in the dict — O(1)
start = time.time()
result = target in data_dict
dict_time = (time.time() - start) * 1000
print(f"Dict lookup:  {dict_time:.4f}ms")

print(f"Speedup: ~{list_time / dict_time:.0f}x faster")

Typical Output:

List search:  8.43ms
Dict lookup:  0.0012ms
Speedup: ~7025x faster

The dict does not scan. It computes the location and jumps straight there.

Real-World Applications

Python Dictionaries — Every Python dict is a hash table. Every key lookup, assignment, and deletion uses hashing internally.

Database Indexes — Hash indexes allow O(1) point lookups in databases. Searching for a user by ID, for example, can be nearly instantaneous with a hash index.

Caching (Memoization) — Storing previously computed results in a dict. The function arguments become the key; the result becomes the value.

Password Hashing — Passwords are never stored in plain text. They are hashed with secure functions like bcrypt or Argon2. On login, the entered password is hashed and compared to the stored hash. The original password is never recoverable (note: this is a different kind of hash function — cryptographic, not just indexing).

Symbol Tables in Compilers — Variable names and function names are stored in hash tables so the compiler can look up their types and locations in O(1).

Key Takeaway

A hash function transforms a key into an array index, and a hash table combines that function with an array to achieve O(1) average-case lookups. This is one of the most powerful ideas in computer science — the difference between searching and computing. Python's dict is a world-class hash table implementation, and understanding how it works explains why Python developers reach for dicts so naturally and so often.

💬 DiscussionPowered by GitHub Discussions

📱

Get this course's notes on Telegram!

Free cheat sheets, summaries & practice exercises

Get Notes Free →

30 minLesson 11 of 17

Course Contents(17 lessons)

▾

Chapter 1: Why Data Structures Matter

What Are Data Structures and Why Do They Exist?20 min

Big O Notation: Measuring Code Efficiency30 min

Chapter 2: Arrays and Strings

Arrays: The Foundation of Everything25 min

Dynamic Arrays and How Python Lists Really Work25 min

2D Arrays and Matrices22 min

Chapter 3: Linked Lists

Singly Linked Lists: Nodes and Chains30 min

Doubly Linked Lists: Going Both Ways28 min

Chapter 4: Stacks and Queues

Stacks: Last In, First Out (LIFO)28 min

Queues: First In, First Out (FIFO)25 min

Deques and Priority Queues22 min

Chapter 5: Hash Tables

Hash Functions: Turning Keys Into Addresses30 min

Collision Handling and Python Dict Internals25 min

Chapter 6: Trees

Binary Trees: Hierarchical Data32 min

Binary Search Trees: Ordered Lookup35 min

Tree Traversal: In-Order, Pre-Order, Post-Order28 min

Chapter 7: Graphs

Graph Representations: Adjacency List vs Matrix30 min

BFS and DFS: Exploring Graphs Systematically35 min

Chapter 5: Hash Tables

Hash Functions: Turning Keys Into Addresses

Hash Functions and Hashing

The Magic Library

Imagine a library with one million books. Normally, finding a book means walking to the right section, scanning through alphabetical order, and hunting shelf by shelf. That might take minutes.

This is exactly how Python dictionaries, database indexes, and cache lookups work. A hash function turns any key into an exact storage location, making retrieval blindingly fast.

What Is a Hash Function?

Hash Function: A function that takes an input (called a key) of arbitrary size and returns a fixed-size integer (called a hash value or hash code). The hash value is used as an index into an array.

The critical idea: instead of searching for where something is stored, you calculate its location directly from its value.

key → hash_function(key) → index → array[index] → value

Properties of a Good Hash Function

Property	Meaning
Deterministic	The same input always produces the same output
Uniform distribution	Outputs should spread evenly across available indices
Fast to compute	The function itself must run in O(1)
Minimizes collisions	Two different keys should rarely produce the same index

Python's `hash()` Function

Python has a built-in hash() function you can call on any hashable object:

print(hash("apple"))        # Some large integer, e.g. 3713082716806266542
print(hash("apple"))        # Same value — deterministic
print(hash("apple2"))       # Very different value — small change, big difference
print(hash(42))             # 42 (integers hash to themselves)
print(hash(3.14))           # Some integer representation
print(hash((1, 2, 3)))      # Tuples are hashable
# print(hash([1, 2, 3]))    # ERROR — lists are NOT hashable (they are mutable)

Note: Python randomizes hash seeds per interpreter session for security (hash randomization). The exact numbers will differ each run, but the same key always produces the same value within one session.

To find the actual array index, you apply the modulo operator:

key = "apple"
array_size = 10
index = hash(key) % array_size   # Maps the hash to a valid array slot
print(index)                      # Some number 0-9

The Hash Table: Array + Hash Function

A hash table (Python's dict is one) is simply an array where the index for each element is determined by a hash function.

Array slots:  [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]

hash("name") % 10  = 3  → store "Alice" at index 3
hash("age")  % 10  = 7  → store 30 at index 7
hash("city") % 10  = 1  → store "NY" at index 1

To retrieve "name": compute hash("name") % 10 = 3, go directly to slot 3. Done. O(1).

Simple Hash Table from Scratch

class HashTable:
    def __init__(self, size=10):
        self.size = size
        self.table = [None] * self.size   # Array of empty slots

    def _hash(self, key):
        return hash(key) % self.size      # Convert key to valid index

    def set(self, key, value):
        index = self._hash(key)
        self.table[index] = (key, value)  # Store key-value pair at computed index

    def get(self, key):
        index = self._hash(key)
        if self.table[index] is not None:
            return self.table[index][1]   # Return just the value
        return None                       # Key not found

    def display(self):
        for i, slot in enumerate(self.table):
            if slot:
                print(f"  [{i}] {slot[0]} → {slot[1]}")

ht = HashTable()
ht.set("name", "Alice")
ht.set("age", 30)
ht.set("city", "New York")
ht.display()

print(ht.get("name"))   # Alice
print(ht.get("age"))    # 30

Output:

  [1] city → New York
  [3] name → Alice
  [7] age → 30
Alice
30

Load Factor: When to Resize

The load factor is the ratio of stored elements to total slots:

load_factor = number_of_elements / table_size

Python's dict resizes when it is about 2/3 full, doubling its capacity each time.

Operations: Average vs Worst Case

Operation	Average Case	Worst Case (many collisions)
Insert	O(1)	O(n)
Lookup	O(1)	O(n)
Delete	O(1)	O(n)

The average case is what you get with a good hash function and a reasonable load factor. The worst case (O(n)) happens when many keys collide into the same slot — covered in the next lesson.

Proof: Dict Lookup vs List Search

The speed difference between hash-based lookup and linear search is dramatic. Here is a concrete demonstration:

import time

data = list(range(1_000_000))               # A list of 1 million integers
data_dict = {x: True for x in data}        # The same data in a dict (hash table)
target = 999_999                            # Look for the very last element

# Linear search through the list — O(n)
start = time.time()
result = target in data
list_time = (time.time() - start) * 1000
print(f"List search:  {list_time:.2f}ms")

# Hash table lookup in the dict — O(1)
start = time.time()
result = target in data_dict
dict_time = (time.time() - start) * 1000
print(f"Dict lookup:  {dict_time:.4f}ms")

print(f"Speedup: ~{list_time / dict_time:.0f}x faster")

Typical Output:

List search:  8.43ms
Dict lookup:  0.0012ms
Speedup: ~7025x faster

The dict does not scan. It computes the location and jumps straight there.

Real-World Applications

Python Dictionaries — Every Python dict is a hash table. Every key lookup, assignment, and deletion uses hashing internally.

Database Indexes — Hash indexes allow O(1) point lookups in databases. Searching for a user by ID, for example, can be nearly instantaneous with a hash index.

Caching (Memoization) — Storing previously computed results in a dict. The function arguments become the key; the result becomes the value.

Symbol Tables in Compilers — Variable names and function names are stored in hash tables so the compiler can look up their types and locations in O(1).

Key Takeaway

💬 DiscussionPowered by GitHub Discussions

📱

Get this course's notes on Telegram!

Free cheat sheets, summaries & practice exercises

Get Notes Free →

Hash Functions: Turning Keys Into Addresses

Hash Functions and Hashing

The Magic Library

What Is a Hash Function?

Properties of a Good Hash Function

Python's hash() Function

The Hash Table: Array + Hash Function

Simple Hash Table from Scratch

Load Factor: When to Resize

Operations: Average vs Worst Case

Proof: Dict Lookup vs List Search

Real-World Applications

Key Takeaway

💬 DiscussionPowered by GitHub Discussions

Hash Functions: Turning Keys Into Addresses

Hash Functions and Hashing

The Magic Library

What Is a Hash Function?

Properties of a Good Hash Function

Python's hash() Function

The Hash Table: Array + Hash Function

Simple Hash Table from Scratch

Load Factor: When to Resize

Operations: Average vs Worst Case

Proof: Dict Lookup vs List Search

Real-World Applications

Key Takeaway

💬 DiscussionPowered by GitHub Discussions

Python's `hash()` Function

Python's `hash()` Function