AiTechWorlds
AiTechWorlds
Imagine a library with one million books. Normally, finding a book means walking to the right section, scanning through alphabetical order, and hunting shelf by shelf. That might take minutes.
Now imagine a different library. You tell the librarian the title of the book you want. She runs a quick mental calculation — a formula based on the letters in the title — and says: "Aisle 7, shelf 4, slot 23." You walk directly there. The book is waiting. No searching, no scanning, no guessing. That formula is a hash function.
This is exactly how Python dictionaries, database indexes, and cache lookups work. A hash function turns any key into an exact storage location, making retrieval blindingly fast.
Hash Function: A function that takes an input (called a key) of arbitrary size and returns a fixed-size integer (called a hash value or hash code). The hash value is used as an index into an array.
The critical idea: instead of searching for where something is stored, you calculate its location directly from its value.
key → hash_function(key) → index → array[index] → value
| Property | Meaning |
|---|---|
| Deterministic | The same input always produces the same output |
| Uniform distribution | Outputs should spread evenly across available indices |
| Fast to compute | The function itself must run in O(1) |
| Minimizes collisions | Two different keys should rarely produce the same index |
hash() FunctionPython has a built-in hash() function you can call on any hashable object:
print(hash("apple")) # Some large integer, e.g. 3713082716806266542
print(hash("apple")) # Same value — deterministic
print(hash("apple2")) # Very different value — small change, big difference
print(hash(42)) # 42 (integers hash to themselves)
print(hash(3.14)) # Some integer representation
print(hash((1, 2, 3))) # Tuples are hashable
# print(hash([1, 2, 3])) # ERROR — lists are NOT hashable (they are mutable)
Note: Python randomizes hash seeds per interpreter session for security (hash randomization). The exact numbers will differ each run, but the same key always produces the same value within one session.
To find the actual array index, you apply the modulo operator:
key = "apple"
array_size = 10
index = hash(key) % array_size # Maps the hash to a valid array slot
print(index) # Some number 0-9
A hash table (Python's dict is one) is simply an array where the index for each element is determined by a hash function.
Array slots: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
hash("name") % 10 = 3 → store "Alice" at index 3
hash("age") % 10 = 7 → store 30 at index 7
hash("city") % 10 = 1 → store "NY" at index 1
To retrieve "name": compute hash("name") % 10 = 3, go directly to slot 3. Done. O(1).
class HashTable:
def __init__(self, size=10):
self.size = size
self.table = [None] * self.size # Array of empty slots
def _hash(self, key):
return hash(key) % self.size # Convert key to valid index
def set(self, key, value):
index = self._hash(key)
self.table[index] = (key, value) # Store key-value pair at computed index
def get(self, key):
index = self._hash(key)
if self.table[index] is not None:
return self.table[index][1] # Return just the value
return None # Key not found
def display(self):
for i, slot in enumerate(self.table):
if slot:
print(f" [{i}] {slot[0]} → {slot[1]}")
ht = HashTable()
ht.set("name", "Alice")
ht.set("age", 30)
ht.set("city", "New York")
ht.display()
print(ht.get("name")) # Alice
print(ht.get("age")) # 30
Output:
[1] city → New York
[3] name → Alice
[7] age → 30
Alice
30
The load factor is the ratio of stored elements to total slots:
load_factor = number_of_elements / table_size
As the load factor grows, the chance of collisions (two keys mapping to the same slot) increases, and performance degrades. Most hash table implementations resize (rehash) when the load factor exceeds a threshold — typically 0.7.
Python's dict resizes when it is about 2/3 full, doubling its capacity each time.
| Operation | Average Case | Worst Case (many collisions) |
|---|---|---|
| Insert | O(1) | O(n) |
| Lookup | O(1) | O(n) |
| Delete | O(1) | O(n) |
The average case is what you get with a good hash function and a reasonable load factor. The worst case (O(n)) happens when many keys collide into the same slot — covered in the next lesson.
The speed difference between hash-based lookup and linear search is dramatic. Here is a concrete demonstration:
import time
data = list(range(1_000_000)) # A list of 1 million integers
data_dict = {x: True for x in data} # The same data in a dict (hash table)
target = 999_999 # Look for the very last element
# Linear search through the list — O(n)
start = time.time()
result = target in data
list_time = (time.time() - start) * 1000
print(f"List search: {list_time:.2f}ms")
# Hash table lookup in the dict — O(1)
start = time.time()
result = target in data_dict
dict_time = (time.time() - start) * 1000
print(f"Dict lookup: {dict_time:.4f}ms")
print(f"Speedup: ~{list_time / dict_time:.0f}x faster")
Typical Output:
List search: 8.43ms
Dict lookup: 0.0012ms
Speedup: ~7025x faster
The dict does not scan. It computes the location and jumps straight there.
Python Dictionaries — Every Python dict is a hash table. Every key lookup, assignment, and deletion uses hashing internally.
Database Indexes — Hash indexes allow O(1) point lookups in databases. Searching for a user by ID, for example, can be nearly instantaneous with a hash index.
Caching (Memoization) — Storing previously computed results in a dict. The function arguments become the key; the result becomes the value.
Password Hashing — Passwords are never stored in plain text. They are hashed with secure functions like bcrypt or Argon2. On login, the entered password is hashed and compared to the stored hash. The original password is never recoverable (note: this is a different kind of hash function — cryptographic, not just indexing).
Symbol Tables in Compilers — Variable names and function names are stored in hash tables so the compiler can look up their types and locations in O(1).
A hash function transforms a key into an array index, and a hash table combines that function with an array to achieve O(1) average-case lookups. This is one of the most powerful ideas in computer science — the difference between searching and computing. Python's dict is a world-class hash table implementation, and understanding how it works explains why Python developers reach for dicts so naturally and so often.
Get this course's notes on Telegram!
Free cheat sheets, summaries & practice exercises