A boutique hotel has 100 rooms. It's fully booked every night. But the hotel's marketing team sells to 500 guests simultaneously — because they know that on any given night, not all 500 guests will actually show up.

The hotel has perfected the art of allocation on demand. A room exists for you the moment you check in, and returns to the pool the moment you check out. If more guests arrive than rooms are available, the hotel makes arrangements — putting some guests in nearby partner hotels temporarily.

This is virtual memory. Every process running on your computer believes it has exclusive access to a massive, private address space — 128 TB on a modern x86-64 system. But your laptop may only have 16 GB of physical RAM. The operating system, working with the CPU's memory management hardware, creates this illusion through a mechanism called paging.

Understanding virtual memory is essential for understanding how modern operating systems achieve isolation, multitasking, and security.

Virtual vs Physical Address Space

When your program accesses memory at address 0x00007F3A40000000, that is a virtual address — a fiction that exists only within your process's private view of memory. The CPU's Memory Management Unit (MMU) translates it to a physical address — the actual location in RAM chips.

Key insight: Two different processes can have data at virtual address 0x1000. They don't conflict, because they map to different physical addresses.

Pages and Frames: The Unit of Virtual Memory

Virtual memory is managed in fixed-size chunks:

Page: a fixed-size block of virtual address space
Frame: a fixed-size block of physical memory (same size as a page)
Standard page size: 4 KB (4,096 bytes) on x86-64; ARM64 commonly uses 4 KB as well but also supports 16 KB and 64 KB pages
Huge pages: 2 MB or 1 GB — for large allocations with fewer TLB entries needed
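
As a rough illustration of the page/frame split, the sketch below breaks a virtual address into a page number and an offset, assuming 4 KB pages, and translates it through a toy per-process mapping. The dictionaries, the process names, and the frame numbers are all made up for the example; a real page table is a hardware-defined structure, not a Python dict.

```python
PAGE_SIZE = 4096          # 4 KB pages
OFFSET_BITS = 12          # log2(4096)

def split(vaddr):
    """Return (virtual page number, offset within the page)."""
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)

def translate(page_table, vaddr):
    """Translate a virtual address using a toy {page: frame} mapping."""
    vpn, offset = split(vaddr)
    frame = page_table[vpn]            # a KeyError stands in for a page fault
    return (frame << OFFSET_BITS) | offset

# Hypothetical mappings: both processes use virtual page 1 (address 0x1000),
# but the OS has backed it with different physical frames.
process_a = {1: 0x2A}
process_b = {1: 0x7C}

print(hex(translate(process_a, 0x1234)))   # 0x2a234
print(hex(translate(process_b, 0x1234)))   # 0x7c234
```

The offset (0x234) passes through unchanged; only the page number is replaced by a frame number, which is why the same virtual address can land in two different places in RAM.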

Every page is either:

Resident: mapped to a physical frame in RAM
Non-resident: mapped to disk (swap file/partition), or not yet allocated

The Page Table: Memory's Translation Dictionary

The page table is the data structure that maps virtual pages to physical frames. It lives in physical RAM and is managed by the OS.

Each page table entry (PTE) contains:

Physical Frame Number (PFN): where in RAM this page lives
Valid/Present bit: is this page currently in RAM?
Dirty bit: has this page been written to?
Accessed bit: has this page been read or written recently?
Permission bits: readable? writable? executable? (NX/XD bit)
User/Supervisor bit: can user-mode code access this page?
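
To make the fields concrete, here is a small sketch that packs and unpacks a 64-bit entry. The bit positions follow the x86-64 format for a 4 KB page (present in bit 0, writable in bit 1, user in bit 2, accessed in bit 5, dirty in bit 6, no-execute in bit 63, frame number in bits 12 to 51). The helper names `make_pte` and `decode_pte` and the sample frame number are invented for illustration.

```python
# Flag positions as in an x86-64 PTE for a 4 KB page (illustrative subset).
PRESENT  = 1 << 0
WRITABLE = 1 << 1
USER     = 1 << 2
ACCESSED = 1 << 5
DIRTY    = 1 << 6
NO_EXEC  = 1 << 63

PFN_SHIFT = 12
PFN_MASK = ((1 << 40) - 1) << PFN_SHIFT   # bits 12..51 hold the frame number

def make_pte(pfn, flags):
    """Pack a physical frame number and flag bits into one 64-bit entry."""
    return ((pfn << PFN_SHIFT) & PFN_MASK) | flags

def decode_pte(pte):
    """Unpack an entry into named fields."""
    return {
        "pfn": (pte & PFN_MASK) >> PFN_SHIFT,
        "present": bool(pte & PRESENT),
        "writable": bool(pte & WRITABLE),
        "user": bool(pte & USER),
        "accessed": bool(pte & ACCESSED),
        "dirty": bool(pte & DIRTY),
        "no_exec": bool(pte & NO_EXEC),
    }

# A hypothetical user-mode data page: readable, writable, not executable.
pte = make_pte(0x1F3A4, PRESENT | WRITABLE | USER | NO_EXEC)
info = decode_pte(pte)
print(hex(pte))
print(info)
```

In hardware, the accessed and dirty bits are set by the CPU as the page is used; the OS reads and clears them to decide which pages to evict or write back.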

Page Table Size Problem

For a 32-bit address space (4 GB) with 4 KB pages:

Number of pages = 4 GB / 4 KB = 1,048,576 entries
At 4 bytes per entry = 4 MB per process

With 100 processes: 400 MB just for page tables. The 64-bit case is far worse: a flat table covering the 48-bit address space (256 TB) with 8-byte entries would need 2^36 entries × 8 bytes = 512 GB per process, which is obviously impossible.
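
The arithmetic can be checked with a short sketch. The 8-byte entry size and the choice to cover the full 48-bit (256 TB) space in the 64-bit case are assumptions of this example; the function name is made up.

```python
KB, MB, GB, TB = 2**10, 2**20, 2**30, 2**40

def flat_table_bytes(address_space_bytes, page_bytes, entry_bytes):
    """Size of a single-level page table with one entry per virtual page."""
    return (address_space_bytes // page_bytes) * entry_bytes

size_32 = flat_table_bytes(4 * GB, 4 * KB, entry_bytes=4)
size_48 = flat_table_bytes(256 * TB, 4 * KB, entry_bytes=8)

print(size_32 // MB, "MB per process (32-bit)")          # 4
print(100 * size_32 // MB, "MB for 100 processes")       # 400
print(size_48 // GB, "GB per process (48-bit, assumed)")  # 512
```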

Multi-Level Page Tables: The Hierarchical Solution

The solution: use a tree of page tables. Only allocate the parts of the page table that are actually mapped. Most of a process's huge virtual address space is unmapped — no need to allocate page table entries for it.
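
A minimal sketch of the idea, using a two-level table for a 32-bit address with an assumed 10/10/12 bit split. Nested dicts stand in for the fixed-size tables a real system would allocate, and the class name, addresses, and frame numbers are hypothetical.

```python
ENTRIES = 1024                 # 10 index bits per level (assumed split)
ENTRY_BYTES = 4

class TwoLevelPageTable:
    def __init__(self):
        self.directory = {}    # top-level index -> second-level table

    @staticmethod
    def _indices(vaddr):
        return (vaddr >> 22) & 0x3FF, (vaddr >> 12) & 0x3FF, vaddr & 0xFFF

    def map(self, vaddr, frame):
        top, second, _ = self._indices(vaddr)
        # A second-level table is created only when first needed.
        self.directory.setdefault(top, {})[second] = frame

    def translate(self, vaddr):
        top, second, offset = self._indices(vaddr)
        table = self.directory.get(top)
        if table is None or second not in table:
            return None                      # unmapped: would be a page fault
        return (table[second] << 12) | offset

    def tables_allocated(self):
        return 1 + len(self.directory)       # the directory plus its children

pt = TwoLevelPageTable()
pt.map(0x00400000, frame=0x10)   # a code page
pt.map(0x00401000, frame=0x11)   # next page, same second-level table
pt.map(0xBFFFF000, frame=0x99)   # a stack page near the top of the space

print(pt.tables_allocated())                              # 3
print(pt.tables_allocated() * ENTRIES * ENTRY_BYTES)      # 12288
print(hex(pt.translate(0x00401ABC)))                      # 0x11abc
print(pt.translate(0x12345000))                           # None
```

Three tables of 4 KB each (12 KB) cover this process, versus 4 MB for a flat table over the same 32-bit space.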

Linux on x86-64: 4-Level Page Table

48-bit virtual address breakdown on x86-64:

Bits	Width	Purpose
[47:39]	9 bits	PGD index (512 entries)
[38:30]	9 bits	PUD index (512 entries)
[29:21]	9 bits	PMD index (512 entries)
[20:12]	9 bits	PTE index (512 entries)
[11:0]	12 bits	Page offset within 4 KB frame

Total: 9+9+9+9+12 = 48 bits → 256 TB addressable space (128 TB user + 128 TB kernel)
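
The breakdown can be sketched as a few shifts and masks. The function name `split_va48` is made up; the bit positions are the ones in the table above.

```python
def split_va48(vaddr):
    """Split a 48-bit x86-64 virtual address into its four indices and offset."""
    return {
        "pgd": (vaddr >> 39) & 0x1FF,   # bits 47:39
        "pud": (vaddr >> 30) & 0x1FF,   # bits 38:30
        "pmd": (vaddr >> 21) & 0x1FF,   # bits 29:21
        "pte": (vaddr >> 12) & 0x1FF,   # bits 20:12
        "offset": vaddr & 0xFFF,        # bits 11:0
    }

parts = split_va48(0x00007F3A40000000)
print(parts)   # {'pgd': 254, 'pud': 233, 'pmd': 0, 'pte': 0, 'offset': 0}

other = split_va48(0x00007F3A40201ABC)
print(other)   # {'pgd': 254, 'pud': 233, 'pmd': 1, 'pte': 1, 'offset': 2748}
```

Each 9-bit index selects one of 512 entries in the table at that level, and the entry found there points to the table for the next level.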

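Why 48 bits and not 64? Current chips implement only 48 or 57 bits of virtual address (5-level paging, introduced by Intel and since adopted by AMD, extends the range to 57 bits). With this layout (12 offset bits plus 9 bits per level), a full 64-bit virtual space would need 6 page table levels and a deeper walk on every TLB miss, which is unnecessary given current memory capacities.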

The TLB: Caching Page Table Lookups

A 4-level page table walk requires 4 separate memory accesses to resolve one virtual address. At 100ns per access, that's 400ns of overhead — for every single memory operation. This would make virtual memory catastrophically slow.

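The solution is the Translation Lookaside Buffer (TLB): a small, highly associative cache of recent virtual-to-physical translations (first-level TLBs are often fully associative; larger second-level TLBs are usually set-associative).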

TLB hit: translation found in TLB → ~1 extra cycle
TLB miss: hardware page table walker performs the 4-access walk → tens of cycles when the page table entries are already in the data caches, hundreds if they must come from DRAM
TLB size: L1 TLB typically 32–64 entries (data), 64–128 entries (instruction); L2 TLB 1,024–1,536 entries
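
To see why a small TLB is enough, here is a toy simulation, offered as a sketch rather than a model of any real CPU. It assumes a 64-entry TLB with LRU replacement, a made-up page table, and round figures of about 1 ns for a hit and 400 ns for a full walk.

```python
from collections import OrderedDict

class ToyTLB:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # vpn -> frame, least recently used first
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn, page_table):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)      # refresh LRU position
            self.hits += 1
            return self.entries[vpn]
        self.misses += 1
        frame = page_table[vpn]                # stands in for the page table walk
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict least recently used entry
        self.entries[vpn] = frame
        return frame

page_table = {vpn: vpn + 100 for vpn in range(1024)}   # hypothetical mapping

# Case 1: the program keeps touching the same 8 pages.
tlb = ToyTLB(capacity=64)
for i in range(8000):
    tlb.lookup(i % 8, page_table)
hit_rate = tlb.hits / (tlb.hits + tlb.misses)
avg_ns = hit_rate * 1 + (1 - hit_rate) * 400   # assumed hit and walk costs
print(tlb.hits, tlb.misses, round(avg_ns, 3))  # 7992 8 1.399

# Case 2: cycling over 65 pages defeats a 64-entry LRU TLB completely.
tlb2 = ToyTLB(capacity=64)
for i in range(6500):
    tlb2.lookup(i % 65, page_table)
print(tlb2.hits, tlb2.misses)                  # 0 6500
```

The second case is the motivation for huge pages: one 2 MB entry covers the same range as 512 entries for 4 KB pages, so far fewer entries are needed for the same data.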

Concept	x86-64 (Intel)	ARM64	Performance Impact
Page size	4 KB (standard), 2 MB, 1 GB	4 KB, 16 KB, 64 KB	Larger pages = fewer TLB entries needed
Page table levels	4 (PML4) or 5 (PML5)	4 (48-bit VA) or 5 (52-bit VA, 4 KB pages)	More levels = deeper walk on miss
L1 ITLB size	128 entries (Sapphire Rapids)	48 entries (Cortex-X4)	Larger TLB = fewer misses
L1 DTLB size	96 entries (Sapphire Rapids)	48 entries (Cortex-X4)	—
L2 TLB size	2,048 entries (Sapphire Rapids)	1,536 entries (Cortex-X4)	Second-level hit ~10 cycles
TLB shootdown	Inter-processor interrupt required	Broadcast invalidation	Expensive in multicore

TLB shootdown: When the OS changes or removes a page mapping, it must invalidate the stale TLB entries on every CPU core that might have cached that mapping. On x86-64 this requires an inter-processor interrupt (IPI) to each affected core, while ARM64 can broadcast the invalidation in hardware. Either way it is expensive, and it is a significant cost of unmapping and remapping memory on multicore systems.

Page Faults: When the Page Isn't in RAM

When the CPU tries to access a virtual address and the PTE's Present bit is 0, it triggers a page fault exception. The OS's page fault handler takes control.

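Minor (soft) page fault: The fault can be resolved without reading from disk: the data is already in RAM, or a fresh frame is all that is needed, but the process's page table does not yet map the page with the required permissions.

Example: Copy-on-write (COW) after fork(): parent and child share physical pages read-only until one of them writes; the write faults and the OS gives the writer its own private copy
Resolution: OS fixes up the mapping (allocating or copying a frame if needed) → return to user code. No disk I/O. Takes ~microseconds.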

Major (hard) page fault: The page's contents are on disk (in the swap file/partition, or in a memory-mapped file) and must be loaded.

Resolution: OS suspends the process, reads the page from disk into a free frame, updates the PTE, resumes the process. Disk I/O involved. Takes milliseconds on a hard disk (somewhat less on an SSD), which is on the order of a million times slower than a TLB hit.
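
The two fault types can be sketched as a toy handler. This is a simplified model, not how any real kernel is structured: the three page states, the `ToyProcess` class, and the `SegmentationFault` exception are invented for the example.

```python
from enum import Enum

class State(Enum):
    RESIDENT = "resident"     # present bit set, frame in RAM
    UNTOUCHED = "untouched"   # allocated lazily, no frame assigned yet
    SWAPPED = "swapped"       # contents currently live on disk

class SegmentationFault(Exception):
    """Access to an address the process never mapped."""

class ToyProcess:
    def __init__(self, pages):
        self.pages = dict(pages)   # vpn -> State
        self.minor_faults = 0
        self.major_faults = 0

    def access(self, vpn):
        state = self.pages.get(vpn)
        if state is None:
            raise SegmentationFault(f"page {vpn} is not mapped")
        if state is State.RESIDENT:
            return "hit"
        if state is State.UNTOUCHED:
            self.minor_faults += 1   # just assign a frame, no disk I/O
            result = "minor fault"
        else:
            self.major_faults += 1   # would block on a disk read here
            result = "major fault"
        self.pages[vpn] = State.RESIDENT   # the PTE is now present
        return result

proc = ToyProcess({0: State.RESIDENT, 1: State.UNTOUCHED, 2: State.SWAPPED})
results = [proc.access(vpn) for vpn in (0, 1, 2, 1, 2)]
print(results)   # ['hit', 'minor fault', 'major fault', 'hit', 'hit']
print(proc.minor_faults, proc.major_faults)   # 1 1
```

Each page faults at most once until it is evicted again; after the handler sets the present bit, the retried access goes through like any other.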

Working Set and Thrashing

The working set of a process is the set of pages it actively uses during a given time window. If the working set fits in RAM, the process runs fast. If it doesn't:

Thrashing occurs when the system spends more time swapping pages to and from disk than actually executing code. Symptoms: 100% disk activity, frozen system, terrible performance.

Prevention strategies:

Working Set Model: only run processes whose working sets fit in available RAM
Page replacement algorithms: LRU (Least Recently Used), CLOCK (approximation), WSClock
OOM killer: as a last resort, Linux's Out-of-Memory Killer terminates processes when RAM and swap are truly exhausted
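
The effect of a working set that does not fit can be shown with a small simulation of two of the replacement policies named above. This is a sketch under simple assumptions: frames start empty, CLOCK sets the reference bit when a page is loaded, and the reference strings are illustrative.

```python
from collections import OrderedDict

def lru_faults(refs, num_frames):
    """Count page faults under exact LRU replacement."""
    frames = OrderedDict()              # pages, least recently used first
    faults = 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)    # mark as most recently used
            continue
        faults += 1
        if len(frames) >= num_frames:
            frames.popitem(last=False)  # evict the least recently used page
        frames[page] = None
    return faults

def clock_faults(refs, num_frames):
    """Count page faults under CLOCK (second-chance) replacement."""
    frames = [None] * num_frames
    ref_bit = [False] * num_frames
    where = {}                          # page -> frame index
    hand = 0
    faults = 0
    for page in refs:
        if page in where:
            ref_bit[where[page]] = True
            continue
        faults += 1
        while ref_bit[hand]:            # give referenced pages a second chance
            ref_bit[hand] = False
            hand = (hand + 1) % num_frames
        if frames[hand] is not None:
            del where[frames[hand]]     # evict the victim
        frames[hand] = page
        where[page] = hand
        ref_bit[hand] = True
        hand = (hand + 1) % num_frames
    return faults

textbook = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
loop = [0, 1, 2, 3] * 5                 # a working set of 4 pages

print(lru_faults(textbook, 3))                       # 12
print(lru_faults(loop, 4), clock_faults(loop, 4))    # 4 4
print(lru_faults(loop, 3), clock_faults(loop, 3))    # 20 20
```

With 4 frames the loop faults only on first touch; with 3 frames every single access faults. Losing one frame below the working set size turns 4 faults into 20, which is thrashing in miniature.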

Address Space Layout Randomization (ASLR)

Virtual memory enables a critical security feature: ASLR randomizes the base addresses of the stack, heap, and library mappings on each execution. An attacker who knows a buffer overflow exists in your program can't reliably redirect execution to a specific address — because that address changes every time.

Enabled by default on mainstream systems since the mid-2000s: Linux (2005), Windows (Vista, 2007), macOS (10.5 in 2007, initially for libraries only)
Entropy: 28–40 bits of randomization on modern systems
Partially defeated by information leaks; complemented by non-executable memory (the NX/XD bit) in user space and, for kernel ASLR, by SMEP/SMAP (Supervisor Mode Execution/Access Prevention)
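
One rough way to observe ASLR is to start the same program several times and compare where an object ends up. The sketch below does this with child Python processes. It relies on `id()` returning a memory address, which is a CPython implementation detail, and whether the addresses differ depends on the OS and its ASLR settings, so no particular output is promised.

```python
import subprocess
import sys

# Each child is a fresh process that prints the address of a new object.
# Assumption: CPython, where id() is the object's memory address.
CHILD_CODE = "print(id(object()))"

def sample_addresses(runs=3):
    """Run the child program several times and collect the printed addresses."""
    addresses = []
    for _ in range(runs):
        out = subprocess.run(
            [sys.executable, "-c", CHILD_CODE],
            capture_output=True, text=True, check=True,
        ).stdout
        addresses.append(int(out.strip()))
    return addresses

addresses = sample_addresses()
for addr in addresses:
    print(hex(addr))
print("distinct addresses:", len(set(addresses)))
```

On a system with ASLR enabled, the runs will typically print different addresses; with ASLR disabled they tend to repeat.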

Summary

Virtual memory transforms physical RAM into a flexible, multi-process resource through paging: fixed-size mappings between virtual pages and physical frames. The multi-level page table (4 levels on x86-64) makes this practical without consuming terabytes of memory for page tables. The TLB makes translation fast — reducing the ~400ns 4-level walk to a ~1ns cached lookup for the common case. Page faults allow the OS to transparently use disk as backing store, creating the illusion of more RAM than physically exists.

The same mechanism that enables memory isolation between processes, lazy allocation, copy-on-write, and swap also enables ASLR — making virtual memory one of the most security-critical components of modern operating system design.
