AiTechWorlds
A boutique hotel has 100 rooms. It's fully booked every night. But the hotel's marketing team sells to 500 guests simultaneously — because they know that on any given night, not all 500 guests will actually show up.
The hotel has perfected the art of allocation on demand. A room exists for you the moment you check in, and returns to the pool the moment you check out. If more guests arrive than rooms are available, the hotel makes arrangements — putting some guests in nearby partner hotels temporarily.
This is virtual memory. Every process running on your computer believes it has exclusive access to a massive, private address space — 128 TB on a modern x86-64 system. But your laptop may only have 16 GB of physical RAM. The operating system, working with the CPU's memory management hardware, creates this illusion through a mechanism called paging.
Understanding virtual memory is essential for understanding how modern operating systems achieve isolation, multitasking, and security.
When your program accesses memory at address 0x00007F3A40000000, that is a virtual address — a fiction that exists only within your process's private view of memory. The CPU's Memory Management Unit (MMU) translates it to a physical address — the actual location in RAM chips.
Key insight: Two different processes can have data at virtual address 0x1000. They don't conflict, because they map to different physical addresses.
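A minimal sketch of that insight, assuming each process's page table is modelled as a plain dictionary from virtual page number to physical frame number. The helper name `translate` and the frame numbers 7 and 42 are made up for illustration:

```python
PAGE_SIZE = 4096  # 4 KB pages

def translate(page_table, vaddr):
    """Translate a virtual address using a {virtual page: physical frame} dict."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    frame = page_table[vpn]  # a KeyError here would model an unmapped page
    return frame * PAGE_SIZE + offset

# Hypothetical mappings: both processes map virtual page 1 (address 0x1000),
# but each one points at a different physical frame.
process_a = {1: 7}
process_b = {1: 42}

print(hex(translate(process_a, 0x1000)))  # 0x7000
print(hex(translate(process_b, 0x1000)))  # 0x2a000
```

The same virtual address resolves to two different physical addresses because the lookup always goes through the page table of the process that issued it.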
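Virtual memory is managed in fixed-size chunks: pages in the virtual address space and frames in physical RAM, typically 4 KB each.
Every page is either resident in a physical frame, swapped out to disk, or not mapped at all.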
The page table is the data structure that maps virtual pages to physical frames. It lives in physical RAM and is managed by the OS.
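Each page table entry (PTE) contains the physical frame number plus flag bits, including the Present bit that says whether the page is currently in RAM.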
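For a 32-bit address space (4 GB) with 4 KB pages, there are 2^20 (about one million) pages. At 4 bytes per entry, a flat page table takes 4 MB per process.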
With 100 processes: 400 MB just for page tables. And for the 128 TB user address space of a 64-bit process, a flat page table with 8-byte entries would require 256 GB of RAM per process, which is obviously impossible.
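The arithmetic behind flat page table sizes can be checked with a short sketch. The function name is hypothetical, and the entry sizes are assumptions: 4 bytes for the 32-bit case (which is what the 400 MB figure implies) and 8 bytes for x86-64:

```python
def flat_page_table_bytes(va_bits, page_bits=12, pte_bytes=4):
    """Size of a single-level page table covering a 2**va_bits address space."""
    entries = 2 ** (va_bits - page_bits)  # one entry per virtual page
    return entries * pte_bytes

MiB, GiB = 2**20, 2**30

# 32-bit space, 4 KB pages, 4-byte entries (assumed)
print(flat_page_table_bytes(32) // MiB)        # 4   (MiB per process)
print(100 * flat_page_table_bytes(32) // MiB)  # 400 (MiB for 100 processes)

# 47-bit (128 TB) user space, 4 KB pages, 8-byte entries
print(flat_page_table_bytes(47, pte_bytes=8) // GiB)  # 256 (GiB per process)
```

A 128 TB space holds 2^35 pages of 4 KB, so at 8 bytes per entry a flat table needs 2^38 bytes, or 256 GiB, for every process.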
The solution: use a tree of page tables. Only allocate the parts of the page table that are actually mapped. Most of a process's huge virtual address space is unmapped — no need to allocate page table entries for it.
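To illustrate why the tree saves memory, here is a toy sparse page table that allocates a lower-level table only when a mapping first needs it. It assumes four levels of 512 entries each and 4 KB pages; the class and method names are hypothetical, and nested dictionaries stand in for the real in-memory tables:

```python
LEVELS = 4          # assumed: four levels
BITS_PER_LEVEL = 9  # assumed: 512 entries per table
PAGE_BITS = 12      # 4 KB pages

class SparsePageTable:
    def __init__(self):
        self.root = {}
        self.tables_allocated = 1  # the root table always exists

    def _indices(self, vaddr):
        """Return one table index per level, most significant first."""
        vpn = vaddr >> PAGE_BITS
        mask = (1 << BITS_PER_LEVEL) - 1
        return [(vpn >> (BITS_PER_LEVEL * lvl)) & mask
                for lvl in reversed(range(LEVELS))]

    def map(self, vaddr, frame):
        *upper, leaf = self._indices(vaddr)
        table = self.root
        for idx in upper:
            if idx not in table:  # allocate a lower-level table on demand
                table[idx] = {}
                self.tables_allocated += 1
            table = table[idx]
        table[leaf] = frame

    def translate(self, vaddr):
        node = self.root
        for idx in self._indices(vaddr):
            if idx not in node:
                return None  # unmapped: no table or entry was ever allocated
            node = node[idx]
        # After the last level, node is the physical frame number.
        return (node << PAGE_BITS) | (vaddr & ((1 << PAGE_BITS) - 1))

pt = SparsePageTable()
pt.map(0x1000, frame=5)
pt.map(0x2000, frame=6)
print(pt.tables_allocated)         # 4: root plus one table at each lower level
pt.map(0x00007F3A40000000, frame=9)
print(pt.tables_allocated)         # 7: a distant address needs three new tables
print(hex(pt.translate(0x1234)))   # 0x5234
```

Three mappings spread across the address space cost seven small tables here, while every unmapped region costs nothing.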
48-bit virtual address breakdown on x86-64:
| Bits | Width | Purpose |
|---|---|---|
| [47:39] | 9 bits | PGD index (512 entries) |
| [38:30] | 9 bits | PUD index (512 entries) |
| [29:21] | 9 bits | PMD index (512 entries) |
| [20:12] | 9 bits | PTE index (512 entries) |
| [11:0] | 12 bits | Page offset within 4 KB frame |
Total: 9+9+9+9+12 = 48 bits → 256 TB addressable space (128 TB user + 128 TB kernel)
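The breakdown above can be expressed as a few shifts and masks. This is only a sketch of the bit arithmetic, not how the hardware is implemented, and the function name `split_va` is hypothetical:

```python
def split_va(vaddr):
    """Split a 48-bit virtual address into (PGD, PUD, PMD, PTE, offset)."""
    offset = vaddr & 0xFFF       # bits [11:0]
    pte = (vaddr >> 12) & 0x1FF  # bits [20:12]
    pmd = (vaddr >> 21) & 0x1FF  # bits [29:21]
    pud = (vaddr >> 30) & 0x1FF  # bits [38:30]
    pgd = (vaddr >> 39) & 0x1FF  # bits [47:39]
    return pgd, pud, pmd, pte, offset

print(split_va(0x00007F3A40000000))  # (254, 233, 0, 0, 0)
```

Each 9-bit index selects one of 512 entries in the table at its level, and the 12-bit offset selects a byte within the final 4 KB frame.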
Why 48 bits and not 64? Current x86-64 chips implement only 48-bit or 57-bit virtual addresses (5-level paging, first specified by Intel, extends the address to 57 bits). A full 64-bit virtual space would require six page table levels and much larger TLBs, which is unnecessary given current memory capacities.
A 4-level page table walk requires 4 separate memory accesses to resolve one virtual address. At 100ns per access, that's 400ns of overhead — for every single memory operation. This would make virtual memory catastrophically slow.
The solution is the Translation Lookaside Buffer (TLB): a small, highly associative cache of recent virtual-to-physical translations (fully associative in some designs).
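A rough way to see how much the TLB helps is to compute the average cost of one memory access for a given hit rate. The sketch below uses the 100 ns memory latency and 4-level walk from this article, and assumes a 1 ns TLB lookup. It deliberately ignores data caches and page-walk caches, so real numbers will differ:

```python
def effective_access_ns(hit_rate, mem_ns=100.0, walk_levels=4, tlb_ns=1.0):
    """Average cost of one memory access under a simple TLB model."""
    walk_ns = walk_levels * mem_ns           # page table walk on a TLB miss
    hit_cost = tlb_ns + mem_ns               # translation cached: one access
    miss_cost = tlb_ns + walk_ns + mem_ns    # walk first, then the access
    return hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost

for rate in (0.0, 0.90, 0.99):
    print(f"hit rate {rate:.0%}: {effective_access_ns(rate):.0f} ns")
```

Under these assumptions the average falls from about 501 ns with no TLB hits to about 105 ns at a 99% hit rate, close to the cost of the memory access alone.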
| Concept | x86-64 (Intel) | ARM64 | Performance Impact |
|---|---|---|---|
| Page size | 4 KB (standard), 2 MB, 1 GB | 4 KB, 16 KB, 64 KB | Larger pages = fewer TLB entries needed |
| Page table levels | 4 (PML4) or 5 (PML5) | 4 (48-bit VA) or 5 (52-bit VA with LPA2) | More levels = deeper walk on miss |
| L1 ITLB size | 128 entries (Sapphire Rapids) | 48 entries (Cortex-X4) | Larger TLB = fewer misses |
| L1 DTLB size | 96 entries (Sapphire Rapids) | 48 entries (Cortex-X4) | — |
| L2 TLB size | 2,048 entries (Sapphire Rapids) | 1,536 entries (Cortex-X4) | Second-level hit ~10 cycles |
| TLB shootdown | Inter-processor interrupt required | Broadcast invalidation | Expensive in multicore |
TLB shootdown: When the OS changes a page mapping, it must invalidate the TLB entries on every CPU core that might have cached that mapping. On x86-64 this requires an inter-processor interrupt (IPI), while ARM64 can broadcast the invalidation in hardware. Either way the operation is expensive, and it is a significant cost of memory remapping on multicore systems.
When the CPU tries to access a virtual address and the PTE's Present bit is 0, it triggers a page fault exception. The OS's page fault handler takes control.
Minor (soft) page fault: The page exists in memory but isn't mapped in the page table yet.
Example: copy-on-write after fork(), where parent and child share physical pages until one writes.
Major (hard) page fault: The page is on disk (in the swap file/partition) and must be loaded.
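On Unix-like systems, a process can watch its own fault counters. The sketch below uses Python's standard `resource.getrusage`, whose `ru_minflt` and `ru_majflt` fields report minor and major faults, and touches freshly mapped anonymous memory to provoke minor faults. The helper names are hypothetical, the `resource` module is not available on Windows, and the exact counts depend on the kernel and environment:

```python
import mmap
import resource

PAGE = mmap.PAGESIZE

def fault_counts():
    """Return (minor, major) page fault counts for this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt

def touch_fresh_pages(n_pages):
    """Map n_pages of anonymous memory, write to each page, return fault deltas."""
    minor0, major0 = fault_counts()
    buf = mmap.mmap(-1, n_pages * PAGE)  # anonymous mapping, nothing resident yet
    for i in range(n_pages):
        buf[i * PAGE] = 1                # first touch of each page
    minor1, major1 = fault_counts()
    buf.close()
    return minor1 - minor0, major1 - major0

minor, major = touch_fresh_pages(1024)
print(f"minor faults: {minor}, major faults: {major}")
```

On a typical Linux machine the minor count rises by roughly one per touched page while the major count stays at zero, since nothing has to be read from disk.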
The working set of a process is the set of pages it actively uses during a given time window. If the working set fits in RAM, the process runs fast. If it doesn't:
Thrashing occurs when the system spends more time swapping pages to and from disk than actually executing code. Symptoms: 100% disk activity, frozen system, terrible performance.
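The cliff between "working set fits" and "working set does not fit" shows up even in a toy simulation. The sketch below assumes LRU replacement, which this article does not specify and real kernels only approximate. It counts faults for a process that loops over its pages, and the function names are hypothetical:

```python
from collections import OrderedDict

def count_faults(reference_string, n_frames):
    """Count page faults for a page reference string under LRU replacement."""
    frames = OrderedDict()  # keys are resident pages, oldest use first
    faults = 0
    for page in reference_string:
        if page in frames:
            frames.move_to_end(page)        # mark as most recently used
        else:
            faults += 1
            if len(frames) >= n_frames:
                frames.popitem(last=False)  # evict the least recently used page
            frames[page] = True
    return faults

def cyclic(n_pages, rounds):
    """Reference string that loops over pages 0..n_pages-1, `rounds` times."""
    return [p for _ in range(rounds) for p in range(n_pages)]

print(count_faults(cyclic(4, 10), n_frames=4))  # 4: only the initial cold faults
print(count_faults(cyclic(5, 10), n_frames=4))  # 50: every single access faults
```

Growing the working set by a single page beyond the available frames turns 4 faults into 50. That sudden jump is what thrashing looks like in miniature.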
Prevention strategies:
Virtual memory enables a critical security feature: Address Space Layout Randomization (ASLR). ASLR randomizes the base addresses of the stack, heap, and library mappings on each execution. An attacker who knows a buffer overflow exists in your program can't reliably redirect execution to a specific address, because that address changes on every run.
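One informal way to observe ASLR is to start the same program twice and compare where an allocation lands. The sketch below launches two child Python interpreters that each print the address of a small `ctypes` buffer. The helper name is hypothetical, and the result depends on the operating system's settings: with ASLR disabled, the addresses may match.

```python
import subprocess
import sys

# Code run in each child process: allocate a buffer and print its address.
CHILD = (
    "import ctypes; "
    "buf = ctypes.create_string_buffer(64); "
    "print(ctypes.addressof(buf))"
)

def sample_addresses(runs=2):
    """Run the child snippet `runs` times and collect the printed addresses."""
    addrs = []
    for _ in range(runs):
        out = subprocess.run(
            [sys.executable, "-c", CHILD],
            capture_output=True, text=True, check=True,
        )
        addrs.append(int(out.stdout.strip()))
    return addrs

addrs = sample_addresses()
print([hex(a) for a in addrs])
print("differ" if len(set(addrs)) > 1 else "identical (ASLR may be disabled)")
```

This shows only that the layout varies between runs. It does not measure how much entropy the randomization provides.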
Virtual memory transforms physical RAM into a flexible, multi-process resource through paging: fixed-size mappings between virtual pages and physical frames. The multi-level page table (4 levels on x86-64) makes this practical without consuming terabytes of memory for page tables. The TLB makes translation fast — reducing the ~400ns 4-level walk to a ~1ns cached lookup for the common case. Page faults allow the OS to transparently use disk as backing store, creating the illusion of more RAM than physically exists.
The same mechanism that enables memory isolation between processes, lazy allocation, copy-on-write, and swap also enables ASLR — making virtual memory one of the most security-critical components of modern operating system design.