The year is 2003. Brad Fitzpatrick, the 23-year-old founder of LiveJournal, is watching his database servers buckle under load. LiveJournal had grown to millions of users posting blog entries, commenting, and refreshing pages — and every single page load hit the database. The MySQL servers were maxing out their CPU, queries were timing out, and users were seeing blank pages.

Fitzpatrick did not buy bigger servers. Instead, he built Memcached — a simple in-memory key-value store that sat in front of the database and answered the same question repeatedly without touching the database at all. The result: database load dropped by 95% overnight.

That insight, serving the same data from memory instead of re-reading it from disk, is now fundamental to how virtually every high-traffic website operates. Understanding caching is understanding why systems can scale.

Why Caching Exists

Every database read can involve disk I/O, query parsing, index lookups, and a network round trip. For popular content read thousands of times per second, repeating that work to produce an identical answer is wasteful.

Caching solves three interconnected problems:

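Latency: Memory reads take ~100 nanoseconds. SSD reads take ~100 microseconds. Network database reads can take 1–20 milliseconds. Depending on where the cache lives, this cuts read latency by roughly 10× (a networked cache such as Redis, which still pays a sub-millisecond network round trip) to 10,000× or more (in-process memory).
Database load: A single cache server can absorb hundreds of thousands of reads per second that would otherwise hit the database. This protects the database during traffic spikes.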
Scalability: Stateless application servers can scale horizontally behind a shared cache, all reading from the same fast data layer.
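The first two effects are driven by the hit ratio: the share of reads the cache answers. A quick worked calculation, assuming illustrative figures of 0.5 ms for a networked cache read and 10 ms for a database query (assumptions, not benchmarks), and a read path where misses pay both costs:

```python
def effective_latency_ms(hit_ratio: float, cache_ms: float, db_ms: float) -> float:
    """Average read latency when every read checks the cache first
    and only misses go on to query the database."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return cache_ms + (1.0 - hit_ratio) * db_ms


# Illustrative assumptions, not measurements.
CACHE_MS = 0.5  # networked cache round trip
DB_MS = 10.0    # database query

for hit_ratio in (0.0, 0.5, 0.9, 0.99):
    avg = effective_latency_ms(hit_ratio, CACHE_MS, DB_MS)
    db_share = 1.0 - hit_ratio  # fraction of reads that still reach the database
    print(f"hit ratio {hit_ratio:.2f}: {avg:.2f} ms average, {db_share:.0%} of reads hit the DB")
```

With these numbers, a 90% hit ratio brings the average from 10.5 ms down to about 1.5 ms while the database sees only one read in ten. Note that the cache check adds its own latency to every miss.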

The principle is simple: if data is expensive to compute or fetch, and the same data is requested repeatedly, remember the answer.

Cache-Aside (Lazy Loading)

The most common caching pattern. The application manages the cache directly.

How it works:

Application checks the cache for data.
On a cache hit, return immediately — database untouched.
On a cache miss, fetch from the database, store in cache, then return.
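The three steps above can be sketched in Python. This is a minimal illustration, assuming a plain dict stands in for the cache server and a hypothetical load_user function stands in for the database query; a real client would also catch cache connection errors and fall back to the database so that a cache outage stays non-fatal.

```python
from typing import Callable, Dict, List, Optional


class CacheAside:
    """Cache-aside read path: the application checks the cache and loads from the database itself."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]) -> None:
        self._cache: Dict[str, str] = {}  # stand-in for Redis or Memcached
        self._load_from_db = load_from_db
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        # Step 1: check the cache.
        if key in self._cache:
            # Step 2: cache hit, so return without touching the database.
            self.hits += 1
            return self._cache[key]
        # Step 3: cache miss, so read the database, store the result, then return it.
        self.misses += 1
        value = self._load_from_db(key)
        if value is not None:  # missing rows are not cached, so they always reach the database
            self._cache[key] = value
        return value


# Hypothetical database table and query, for illustration only.
FAKE_DB = {"user:42": "Ada Lovelace"}
db_reads: List[str] = []


def load_user(key: str) -> Optional[str]:
    db_reads.append(key)  # record each trip to the "database"
    return FAKE_DB.get(key)


users = CacheAside(load_user)
print(users.get("user:42"))  # miss: reads the database
print(users.get("user:42"))  # hit: served from the cache
print(users.hits, users.misses, db_reads)
```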

Pros: Only data that is actually requested gets cached (no wasted memory). Cache failures are non-fatal — the app falls back to the database.

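Cons: Every miss pays the full database cost plus a cache write, so the first request for a key, and the first request after it expires or is evicted, is slow. After a cache restart, every request is a miss until the cache warms up again; this is called a cold start.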

Used by: Amazon product pages, Twitter timelines, most web applications.

Write-Through

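Every write goes to the cache and to the database synchronously: the write is acknowledged only after both have been updated. The cache therefore stays up to date for every key written through it.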
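A minimal write-through sketch, assuming two dicts stand in for the database and the cache. Writing the database first is one reasonable ordering (an assumption of this sketch, not a universal rule): if that write fails, the cache is never left holding a value the database rejected.

```python
from typing import Dict, Optional


class WriteThroughStore:
    """Write-through: put() returns only after both the database and the cache hold the new value."""

    def __init__(self) -> None:
        self.db: Dict[str, str] = {}     # stand-in for the system of record
        self.cache: Dict[str, str] = {}  # stand-in for Redis or Memcached

    def put(self, key: str, value: str) -> None:
        self.db[key] = value     # 1. persist; if this raised, the cache would stay untouched
        self.cache[key] = value  # 2. update the cache before acknowledging the write

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)  # keys written before the cache existed may still miss
        if value is not None:
            self.cache[key] = value
        return value


inventory = WriteThroughStore()
inventory.put("sku:1001", "12 in stock")
print(inventory.cache["sku:1001"] == inventory.db["sku:1001"])
```

Both copies change in the same call, which is why writes are slower but reads never see a stale value for keys written this way. A production version would also need a policy for the case where the database write succeeds and the cache update fails (for example, deleting the cache key).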

Pros: No stale data in cache, as long as every write goes through it. Reads of recently written data are always fast.

Cons: Every write is slower because it must complete two operations. Cache fills with data that may never be read.

Used by: Banking dashboards, inventory systems — anywhere stale reads are unacceptable.

Write-Back (Write-Behind)

Writes go to the cache immediately, but the database write is deferred and handled asynchronously in the background.
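A minimal write-back sketch for a leaderboard-style workload, assuming dicts for the cache and the database. In production a background worker or timer would call flush(); here it is called explicitly so the deferred write is visible. Repeated writes to the same key are coalesced into a single database write, which is part of why write-back suits high-write workloads.

```python
from typing import Dict, Optional, Set


class WriteBackCache:
    """Write-back: writes land in the cache at once; dirty keys reach the database on flush()."""

    def __init__(self, db: Dict[str, int]) -> None:
        self.db = db                     # stand-in for the database
        self.cache: Dict[str, int] = {}  # stand-in for Redis
        self._dirty: Set[str] = set()    # keys written since the last flush

    def put(self, key: str, value: int) -> None:
        self.cache[key] = value  # fast path: no database round trip
        self._dirty.add(key)     # remember to persist it later

    def get(self, key: str) -> Optional[int]:
        if key in self.cache:
            return self.cache[key]
        return self.db.get(key)

    def flush(self) -> int:
        """Persist every dirty key and return how many database writes were made."""
        dirty, self._dirty = self._dirty, set()
        for key in dirty:
            self.db[key] = self.cache[key]
        return len(dirty)


database: Dict[str, int] = {}
scores = WriteBackCache(database)
scores.put("player:7", 100)
scores.put("player:7", 150)  # overwrites the pending value; still one dirty key
print(database)              # nothing persisted yet: a crash now would lose both writes
print(scores.flush(), database)
```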

Pros: Extremely fast writes — the application does not wait for the database. Good for high-write workloads like gaming leaderboards.

Cons: If the cache crashes before the async write completes, data is lost. Not suitable for financial or transactional data.

Read-Through

The cache sits transparently in front of the database. The application only talks to the cache. On a miss, the cache fetches from the database and populates itself.

Pros: Simpler application code — one data source to manage.

Cons: First-request latency. Less control over what gets cached.

Redis: The Swiss Army Knife of Caching

Redis (Remote Dictionary Server) was created by Salvatore Sanfilippo in 2009 and is now one of the most widely deployed caches in the world. Unlike Memcached, whose values are opaque strings, Redis values can be rich data structures.

Data Structure	Example Use Case
String	Session tokens, counters, API responses
List	Activity feeds, job queues
Hash	User profile objects
Set	Unique visitors, tag collections
Sorted Set	Leaderboards, rate limiting windows
Stream	Real-time event logs

Redis operates entirely in memory and delivers 100,000+ read/write operations per second on a single node. Key features:

TTL (Time-to-Live): Any key can be set to expire automatically. For example, SET user:42 "data" EX 3600 stores the value and expires it after one hour (3,600 seconds).
Persistence options: RDB takes periodic snapshots of the dataset to disk. AOF (Append-Only File) logs every write command for full replay recovery.
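Replication: Primary-replica replication provides redundancy and read scaling; Redis Sentinel (or Cluster mode) adds automatic failover for high availability.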
Cluster mode: Automatically shards data across multiple nodes for horizontal scale.
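To make the TTL semantics concrete without a Redis server, here is a toy in-process imitation of SET key value EX seconds (the TTLStore class is hypothetical and is not a Redis client). It expires keys lazily, on the first read after the deadline; Redis itself combines this kind of lazy check with a periodic background sweep. The clock is injectable so expiry can be shown without waiting an hour.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLStore:
    """Toy imitation of SET key value EX seconds, with lazy expiry on read."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def set(self, key: str, value: str, ex: float) -> None:
        self._data[key] = (value, self._clock() + ex)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop it on this first access past the deadline
            return None
        return value


now = [0.0]  # fake clock in seconds, advanced by hand
store = TTLStore(clock=lambda: now[0])
store.set("user:42", "data", ex=3600)  # like: SET user:42 "data" EX 3600
now[0] = 3599.0
print(store.get("user:42"))  # still cached one second before expiry
now[0] = 3601.0
print(store.get("user:42"))  # None: the key has expired
```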

Memcached: The Original

Fitzpatrick's Memcached remains relevant for one specific use case: pure, high-throughput key-value caching that takes advantage of many CPU cores. It is multi-threaded (Redis was historically single-threaded, though Redis 6.0 added I/O threading), has no persistence, and supports no data structures beyond strings. If you need the absolute maximum throughput for simple caches and none of Redis's advanced features, Memcached is still competitive.

CDN: Caching at the Edge

A Content Delivery Network is a geographically distributed network of cache servers — called edge nodes — placed close to users worldwide.

When a user in Tokyo requests a video thumbnail, the edge node in Tokyo serves it directly from its local cache — not from a server in Virginia. Round-trip time drops from ~200ms to ~5ms.

Major CDN providers: CloudFront (AWS), Fastly, Cloudflare, Akamai.

CDNs cache static assets: images, CSS, JavaScript files, fonts, and videos. They can also cache entire HTML pages for anonymous users.
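Which responses an edge node may store, and for how long, is usually driven by the origin's HTTP Cache-Control response header. The helper below is a hypothetical origin-side policy: it assumes static assets have fingerprinted filenames (for example app.3f9a1c.js) so they can be cached for a year, and the five-minute edge lifetime for anonymous HTML is an illustrative choice, not a recommendation from any provider.

```python
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".svg", ".woff2", ".mp4")


def cache_control_for(path: str, logged_in: bool) -> str:
    """Choose a Cache-Control header for an origin response (illustrative policy)."""
    if path.endswith(STATIC_EXTENSIONS):
        # Fingerprinted static assets never change under the same URL: cache for a year.
        return "public, max-age=31536000, immutable"
    if logged_in:
        # Personalised pages must not be stored by shared caches such as CDN edges.
        return "private, no-store"
    # Anonymous HTML: browsers treat it as stale immediately (max-age=0),
    # while shared caches like the CDN may reuse it for 300 seconds (s-maxage).
    return "public, max-age=0, s-maxage=300"


for path, logged_in in [("/static/app.3f9a1c.js", False), ("/home", False), ("/account", True)]:
    print(path, "->", cache_control_for(path, logged_in))
```

Sending private, no-store for logged-in pages is what keeps the edge from serving one user's page to another.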

Netflix is the canonical CDN success story. Netflix built its own CDN, called Open Connect, and embedded its caching appliances directly inside ISP networks. Netflix has been estimated to account for roughly 15% of all global internet traffic, and most of its streams never leave the ISP's own network. A user in London watching Stranger Things is most likely streaming from an Open Connect server inside their ISP's network or at a nearby internet exchange, not from Netflix's cloud.

Cache Invalidation: The Hard Problem

Phil Karlton, a Netscape engineer, famously said: "There are only two hard things in Computer Science: cache invalidation and naming things."

When the underlying data changes, the cache must be updated or the application serves stale data. Two common approaches, plus one failure mode every design must guard against:

TTL-based expiration: Every cached item has a time-to-live. After expiry, the next request fetches fresh data. Simple, but data can be stale for the full TTL duration.
Event-driven invalidation: When data changes, explicitly delete or update the relevant cache key. Ensures freshness but requires careful coordination between services.
Cache stampede (Thundering Herd): When a popular cached item expires, thousands of requests simultaneously hit the database before the cache is repopulated. Solutions: mutex locking (only one request populates the cache), probabilistic early expiration (proactively refresh before TTL ends), or staggered TTLs.

Cache Eviction Policies

When the cache is full, something must be removed to make room for new data.

Policy	How It Works	Best For
LRU (Least Recently Used)	Evict the item not accessed for the longest time	General-purpose caching
LFU (Least Frequently Used)	Evict the item accessed fewest times overall	Long-lived caches with clear hot/cold data
FIFO (First In, First Out)	Evict the oldest item regardless of access	Simple queues, not general caching
Random	Evict a random item	Surprisingly effective, extremely fast

Redis supports LRU, LFU, random, and TTL-based eviction, selected per instance with the maxmemory-policy setting (for example, allkeys-lru or allkeys-lfu).
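Exact LRU is straightforward with an ordered dictionary: every access moves a key to the end, and eviction pops from the front. A minimal sketch using Python's collections.OrderedDict (the LRUCache class name is hypothetical); Redis, by contrast, approximates LRU by sampling a few keys rather than maintaining a full ordering, which saves memory.

```python
from collections import OrderedDict
from typing import Generic, Hashable, Optional, TypeVar

V = TypeVar("V")


class LRUCache(Generic[V]):
    """Fixed-capacity cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, V]" = OrderedDict()  # oldest first, newest last

    def get(self, key: Hashable) -> Optional[V]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # a read counts as a use
        return self._items[key]

    def put(self, key: Hashable, value: V) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


cache: LRUCache[str] = LRUCache(capacity=2)
cache.put("a", "1")
cache.put("b", "2")
cache.get("a")       # "a" becomes the most recently used
cache.put("c", "3")  # over capacity: evicts "b", the least recently used
print(cache.get("b"), cache.get("a"), cache.get("c"))
```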

Caching Strategy Comparison

Strategy	Read Performance	Write Performance	Complexity	Data Consistency	Use Case
Cache-Aside	Fast (after warm-up)	Unchanged	Low	Eventual	General web apps, APIs
Write-Through	Fast	Slower	Medium	Strong	Financial dashboards
Write-Back	Fast	Very fast	High	Weak (risk of loss)	Gaming scores, analytics
Read-Through	Fast (after warm-up)	Unchanged	Low	Eventual	ORM-level caching
CDN	Very fast (edge)	Not applicable	Low (managed)	Eventual	Static assets, media

Key Takeaways

Caching is one of the highest-leverage optimisations in system design. Brad Fitzpatrick's 2003 insight still holds: the fastest database query is the one you never make. Modern systems layer caches at every level — in-process memory, distributed cache (Redis), and edge CDN — each absorbing a different class of request.

The discipline is knowing what to cache, how long to cache it, and when to invalidate it. Those three decisions determine whether a cache is a performance multiplier or a source of subtle, hard-to-debug data corruption.
