On October 6, 2010, Instagram launched on the App Store. About 25,000 people signed up on the first day, and the app passed one million registered users in December 2010. By April 2012, when Facebook agreed to acquire Instagram for $1 billion, it had more than 30 million users, supported by a company of just 13 employees.

How? Instagram's co-founder Mike Krieger and its engineering team later described the decisions that made it possible: PostgreSQL for data, Redis and Memcached for fast in-memory storage and caching, and, critically, an architecture built to scale horizontally. They chose tools that could scale out (add more machines) rather than up (buy a bigger machine). When traffic spiked, they added servers, not supercomputers.

This is the central decision in scalability planning.

Vertical Scaling (Scale Up)

Vertical scaling means making a single machine more powerful: adding CPU cores, RAM, faster storage, or faster network cards.

Strengths:

No application changes required — the software runs on one machine, just a bigger one
No distributed systems complexity
Transactions remain simple — no cross-machine coordination

Limits and weaknesses:

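Even the biggest machines have limits. As of 2024, the largest Amazon EC2 instances (the High Memory "u" family) top out at several hundred vCPUs and a few tens of terabytes of RAM; a memory-optimised instance such as x2idn.32xlarge has 128 vCPUs and 2 TiB. That sounds enormous, but Facebook runs on hundreds of thousands of servers. No single machine, however powerful, can handle that load.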

Problem	Description
Hard ceiling	The largest machine in existence sets your maximum scale
Single point of failure	One machine down means total outage
Diminishing returns	High-end hardware carries a price premium: doubling the cost rarely doubles performance
Maintenance window	Upgrading hardware usually requires downtime

Vertical scaling is appropriate for small-to-medium scale, monolithic applications, or when you are buying time before a more fundamental redesign.

Horizontal Scaling (Scale Out)

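Horizontal scaling means adding more machines, usually identical ones, and spreading the load across them: 10 servers → 20 servers → 200 servers. In theory, capacity keeps growing as you add machines; in practice, it is limited by whatever remains shared, such as the database.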

Requirements for effective horizontal scaling:

Stateless services: any server must be able to handle any request — no server-local session state
Data partitioning: data must be distributed across databases, not stored on one giant server
Coordination: a load balancer must distribute traffic; services must discover each other

Netflix grew from mailing DVDs to streaming for more than 238 million subscribers (mid-2023) in over 190 countries. It runs on AWS with thousands of EC2 instances across multiple regions, a textbook example of horizontal scaling at extreme scale.

Load Balancers

A load balancer sits in front of a pool of servers and distributes incoming requests so no single server becomes a bottleneck.

Layer 4 vs Layer 7

Layer 4 (Transport Layer): Routes based on IP address and TCP/UDP port. Does not inspect the content of the request. Very fast, low overhead.

Use when: You need maximum throughput or the protocol is not HTTP (for example, database connections or custom TCP/UDP protocols)

Layer 7 (Application Layer): Reads the HTTP request — URL, headers, cookies, body content. Can route /api/videos to video servers and /api/users to user servers.

Use when: You need content-based routing, A/B testing, TLS (SSL) termination, or rate limiting
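To make content-based routing concrete, here is a minimal sketch of the decision a Layer 7 balancer makes for each request: inspect the URL path and pick a backend pool. The pool names and the routing table are hypothetical; real products such as AWS ALB or Nginx express these rules in their own configuration rather than in application code.

```python
# Hypothetical routing table: path prefix -> backend pool name.
ROUTES = {
    "/api/videos": "video-servers",
    "/api/users": "user-servers",
}
DEFAULT_POOL = "web-servers"


def choose_pool(path: str) -> str:
    """Pick a backend pool from the request path, longest matching prefix first."""
    path = path.split("?", 1)[0]  # ignore any query string
    for prefix in sorted(ROUTES, key=len, reverse=True):
        # Match whole path segments so "/api/videos2" does not hit "/api/videos".
        if path == prefix or path.startswith(prefix + "/"):
            return ROUTES[prefix]
    return DEFAULT_POOL


print(choose_pool("/api/videos/42?quality=hd"))  # video-servers
print(choose_pool("/api/users"))                # user-servers
print(choose_pool("/about"))                    # web-servers
```

A Layer 4 balancer could not make this choice, because it never parses the HTTP request line.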

Load Balancing Algorithms

Algorithm	How It Works	Best For
Round Robin	Requests cycle through servers in order	Equal-capacity servers, stateless services
Weighted Round Robin	Servers receive requests in proportion to an assigned weight (a weight-3 server gets 3 requests for every 1 sent to a weight-1 server)	Mixed server capacities
Least Connections	New request goes to server with fewest active connections	Requests with variable processing time
IP Hash	Hash of client IP maps each client to the same server while the server pool is unchanged	Session affinity (stateful apps)
Least Response Time	Routes to server with lowest average response time	Latency-sensitive services
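The first four algorithms in the table fit in a few lines each. The sketch below is illustrative only: the weighted version is the naive "repeat each server `weight` times" form, and production balancers such as Nginx interleave weighted picks more smoothly. Server names are placeholders.

```python
import hashlib
import itertools


class RoundRobin:
    """Cycle through servers in a fixed order."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class WeightedRoundRobin:
    """Naive weighted round robin: a server with weight w appears w times per cycle."""
    def __init__(self, weights):
        expanded = [server for server, w in weights.items() for _ in range(w)]
        self._cycle = itertools.cycle(expanded)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Send each new request to the server with the fewest in-flight requests."""
    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)  # ties: first in list order
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1  # call when the request finishes


def ip_hash(client_ip, servers):
    """Map a client IP to a server with a stable hash (Python's hash() is salted per process)."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


rr = RoundRobin(["a", "b", "c"])
print([rr.pick() for _ in range(4)])   # ['a', 'b', 'c', 'a']
wrr = WeightedRoundRobin({"big": 3, "small": 1})
print([wrr.pick() for _ in range(4)])  # ['big', 'big', 'big', 'small']
```

Note that `ip_hash` takes the modulus over the current server list, so adding or removing a server reassigns many clients; that is why IP-hash affinity only holds while the pool is stable.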

Health Checks

Load balancers continuously probe servers: "Are you alive?"

GET /health HTTP/1.1
Host: app-server-3.internal

If a server fails a configured number of consecutive checks (typically 3), either by timing out or by returning an error status, the load balancer removes it from rotation. Once the server passes health checks again, it re-enters rotation. The load balancer does not repair the crashed server, but it stops sending traffic to it automatically, with no manual intervention.
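The in/out-of-rotation logic described above is a small state machine per backend. This sketch assumes one probe result at a time (True meaning a healthy response within the timeout); the thresholds of 3 failures to remove and 2 passes to restore are illustrative defaults, not values taken from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class HealthTracker:
    """Decide whether one backend stays in rotation, based on consecutive probe results."""
    unhealthy_threshold: int = 3  # consecutive failures before removal (illustrative)
    healthy_threshold: int = 2    # consecutive passes before re-entry (illustrative)
    in_rotation: bool = True
    _failures: int = field(default=0, init=False)
    _passes: int = field(default=0, init=False)

    def record(self, probe_ok: bool) -> bool:
        """Record one probe (True = healthy response within the timeout); return rotation status."""
        if probe_ok:
            self._passes += 1
            self._failures = 0
            if not self.in_rotation and self._passes >= self.healthy_threshold:
                self.in_rotation = True
        else:
            self._failures += 1
            self._passes = 0
            if self.in_rotation and self._failures >= self.unhealthy_threshold:
                self.in_rotation = False
        return self.in_rotation


tracker = HealthTracker()
print([tracker.record(ok) for ok in [False, False, False, True, True]])
# [True, True, False, False, True]
```

Requiring consecutive results, rather than reacting to a single probe, keeps one slow response from flapping a healthy server in and out of rotation.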

Real products: AWS Application Load Balancer (Layer 7), AWS Network Load Balancer (Layer 4), Nginx (both), HAProxy, Google Cloud Load Balancing.

Stateless vs Stateful Services

The Problem With State

If a user's session data is stored in memory on Server 1 and their next request is routed to Server 2, Server 2 has no idea who they are. A stateful service can only scale horizontally if each user is pinned to the server that holds their session (sticky sessions), which unbalances load and loses those sessions when that server fails.

Stateless Services (Preferred)

A stateless service holds no user-specific data between requests. Every request carries all the information needed to process it.

JWT (JSON Web Token): At login, the server encodes the user's identity and permissions into a signed token and sends it to the client. Every subsequent request includes the token. Any server holding the verification key (the shared secret, for HS256) can check the signature and know who the user is, with no central session store needed.

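Header: { "alg": "HS256", "typ": "JWT" }
Payload: { "user_id": 12345, "role": "admin", "exp": 1735689600 }
Signature: HMACSHA256(base64url(header) + "." + base64url(payload), secret)
Token: base64url(header) + "." + base64url(payload) + "." + base64url(signature)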
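The header/payload/signature layout above can be reproduced with the Python standard library alone. This is a teaching sketch of HS256 signing and verification under the assumption of a single shared secret; the secret value is hypothetical, and production code should use a maintained library (for example PyJWT) that also validates other registered claims.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def b64url_encode(data: bytes) -> str:
    # JWT uses base64url with the trailing "=" padding removed.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url_encode(signature)


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    header_b64, payload_b64, sig_b64 = token.split(".")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many signature bytes matched.
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("invalid signature")
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    payload = json.loads(b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in payload and current >= payload["exp"]:
        raise ValueError("token expired")
    return payload


secret = b"demo-secret-do-not-use"  # hypothetical; load from a secret manager in practice
token = sign_hs256({"user_id": 12345, "role": "admin", "exp": 1735689600}, secret)
print(verify_hs256(token, secret, now=1700000000))
# {'user_id': 12345, 'role': 'admin', 'exp': 1735689600}
```

Because verification needs only the token and the secret, any server behind the load balancer can authenticate the request without a shared session lookup.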

Redis Session Store: When stateless tokens are not feasible, store sessions in a shared Redis instance. All servers look up sessions in the same place — the session is no longer tied to a specific server.

Auto-Scaling

Manual scaling (logging in and launching new servers) is too slow and too error-prone for modern traffic patterns. Auto-scaling monitors metrics and automatically adjusts capacity.

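AWS Auto Scaling Groups: Define minimum, desired, and maximum instance counts, then attach scaling policies. For example, a policy might launch a new instance when average CPU stays above 70% for 2 minutes and terminate one when it falls below 20%, always staying within the minimum and maximum bounds.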
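The 70%/20% rule in the paragraph above can be simulated with a small threshold scaler. This is a toy model, not the AWS API: it assumes one average CPU reading per minute, applies the same 2-minute window to scale-in, and clears its history after each action as a crude stand-in for the cooldowns and warm-up periods real auto-scalers use.

```python
from collections import deque


class ThresholdScaler:
    """Toy scale-out/scale-in rule using the example thresholds from the text."""

    def __init__(self, min_size=2, max_size=10, high=70.0, low=20.0, periods=2):
        self.min_size, self.max_size = min_size, max_size
        self.high, self.low = high, low
        self.recent = deque(maxlen=periods)  # last N one-minute average CPU readings

    def decide(self, current: int, cpu_percent: float) -> int:
        """Feed one per-minute CPU average; return the desired instance count."""
        self.recent.append(cpu_percent)
        if len(self.recent) < self.recent.maxlen:
            return current  # not enough history yet
        desired = current
        if all(c > self.high for c in self.recent):
            desired = min(current + 1, self.max_size)
        elif all(c < self.low for c in self.recent):
            desired = max(current - 1, self.min_size)
        if desired != current:
            self.recent.clear()  # crude cooldown: require fresh readings before acting again
        return desired


scaler, size = ThresholdScaler(), 2
history = []
for cpu in [80, 85, 90, 15, 10]:
    size = scaler.decide(size, cpu)
    history.append(size)
print(history)  # [2, 3, 3, 3, 2]
```

Requiring the condition to hold for the whole window keeps a single CPU spike from launching an instance.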

Kubernetes Horizontal Pod Autoscaler (HPA): Monitors CPU, memory, or custom metrics and scales the number of pods running your container image up or down. Many large companies, Airbnb among them, run their services on Kubernetes.
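At its core, the HPA computes desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue) and skips scaling when the ratio is within a tolerance (0.1 by default). The sketch below implements only that formula plus min/max clamping; the real controller also handles stabilization windows, not-yet-ready pods, and missing metrics.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_value: float, target_value: float,
                         min_replicas: int = 1, max_replicas: int = 10,
                         tolerance: float = 0.1) -> int:
    """Core HPA formula: ceil(current_replicas * current_value / target_value), clamped."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: do nothing
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(desired, max_replicas))


print(hpa_desired_replicas(4, current_value=90, target_value=60))  # 6
print(hpa_desired_replicas(4, current_value=63, target_value=60))  # 4 (within 10% tolerance)
print(hpa_desired_replicas(4, current_value=15, target_value=60))  # 1
```

Because the formula is proportional, a large overload is corrected in one step instead of one pod at a time.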

Database Scaling

Application servers are relatively easy to scale horizontally (add more, use load balancer). Databases are harder — they hold persistent state that must stay consistent.

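Read Replicas: The primary database handles writes; replica databases copy its changes (usually asynchronously, so they can lag slightly behind) and serve reads. In a read-heavy app such as a social feed, most queries go to replicas, and only writes such as creating a post hit the primary. In late 2011, Instagram's main PostgreSQL cluster paired 12 primary instances with 12 replicas in a different availability zone.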
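Applications often implement this read/write split in a thin routing layer in front of the database driver. A minimal sketch, assuming hypothetical connection names and a deliberately crude keyword check for writes; transactions stay on the primary so they can read their own writes.

```python
import itertools
from typing import List


class ReplicaRouter:
    """Send writes to the primary; spread plain reads across replicas round-robin."""

    WRITE_KEYWORDS = ("INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP")

    def __init__(self, primary: str, replicas: List[str]):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str, in_transaction: bool = False) -> str:
        statement = sql.lstrip().upper()
        is_write = statement.startswith(self.WRITE_KEYWORDS) or "FOR UPDATE" in statement
        # Writes and transactions must see the latest data, so they go to the primary.
        if is_write or in_transaction or self._replicas is None:
            return self.primary
        return next(self._replicas)


router = ReplicaRouter("db-primary", ["db-replica-1", "db-replica-2"])
print(router.route("SELECT * FROM posts WHERE user_id = 7"))  # db-replica-1
print(router.route("INSERT INTO posts (body) VALUES ('hi')"))  # db-primary
print(router.route("select count(*) from posts"))             # db-replica-2
```

A real router would rely on the driver or framework to mark statements as read-only rather than parsing SQL text.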

Sharding (Horizontal Partitioning): Split the data across multiple databases. Users with IDs 1 to 1,000,000 go to DB-shard-1; IDs 1,000,001 to 2,000,000 go to DB-shard-2. No single database holds all the data.

Range sharding: by ID range (easy, but creates hotspots if recent data is accessed most)
Hash sharding: hash the key to assign a shard (even distribution, harder to range-query)
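Both strategies reduce to a function from key to shard. This sketch uses the million-user ranges from the example above and a hypothetical shard count for hash sharding; SHA-256 is used only because it gives every server the same answer, unlike Python's per-process salted `hash()`.

```python
import bisect
import hashlib

# Range sharding: inclusive upper bound of each shard's user_id range.
RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]
RANGE_SHARDS = ["DB-shard-1", "DB-shard-2", "DB-shard-3"]


def range_shard(user_id: int) -> str:
    index = bisect.bisect_left(RANGE_UPPER_BOUNDS, user_id)  # first bound >= user_id
    if index == len(RANGE_SHARDS):
        raise ValueError(f"user_id {user_id} is past the last range; a new shard is needed")
    return RANGE_SHARDS[index]


def hash_shard(key: str, num_shards: int) -> int:
    # A stable hash (not Python's per-process salted hash()) so every server agrees.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


print(range_shard(1_000_000))  # DB-shard-1
print(range_shard(1_000_001))  # DB-shard-2
print(hash_shard("user:12345", 4))  # some shard in 0..3, the same on every machine
```

The trade-off in the list shows up directly: with range sharding all new sign-ups land on the newest shard (a hotspot), while with modulo hashing consecutive IDs scatter across shards and changing `num_shards` remaps most keys.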

CQRS (Command Query Responsibility Segregation): Maintain a separate write model (optimised for writes) and read model (optimised for reads). The read model may be a denormalised cache or a separate database.
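A minimal in-process sketch of the split: commands go through a write model that validates and stores normalised records, and a separate read model keeps a denormalised per-user summary that answers queries without scanning orders. The order domain and class names are hypothetical, and the read model is updated synchronously here for simplicity; real CQRS systems often update it asynchronously through events, so reads may briefly lag writes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaceOrder:
    """A command: a request to change state."""
    order_id: str
    user_id: str
    amount_cents: int


class OrderWriteModel:
    """Write side: validates commands and stores normalised order records."""

    def __init__(self, on_change):
        self.orders = {}
        self._on_change = on_change  # hook that keeps the read model in sync

    def handle(self, cmd: PlaceOrder) -> None:
        if cmd.amount_cents <= 0:
            raise ValueError("amount must be positive")
        if cmd.order_id in self.orders:
            raise ValueError("duplicate order id")
        self.orders[cmd.order_id] = cmd
        self._on_change(cmd)


class UserSpendReadModel:
    """Read side: denormalised per-user totals, precomputed so queries are O(1)."""

    def __init__(self):
        self._summaries = {}

    def apply(self, cmd: PlaceOrder) -> None:
        s = self._summaries.setdefault(cmd.user_id, {"orders": 0, "total_cents": 0})
        s["orders"] += 1
        s["total_cents"] += cmd.amount_cents

    def summary(self, user_id: str) -> dict:
        return dict(self._summaries.get(user_id, {"orders": 0, "total_cents": 0}))


read_model = UserSpendReadModel()
write_model = OrderWriteModel(on_change=read_model.apply)
write_model.handle(PlaceOrder("o-1", "u-7", 500))
write_model.handle(PlaceOrder("o-2", "u-7", 250))
print(read_model.summary("u-7"))  # {'orders': 2, 'total_cents': 750}
```

Because the two sides are separate objects, each could be moved to its own database and scaled independently.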

Comparison: Scaling Strategies

Strategy	Max Scale	Complexity	Cost Efficiency	When to Use	Real Example
Vertical scaling	Hardware ceiling (largest available machine)	Low	Poor at extremes	Early-stage, quick wins, databases	RDS db.r6g.16xlarge
Horizontal scaling (app)	Near-unlimited	Medium	Good	Stateless services, API servers	Netflix EC2 fleet
Read replicas	Read capacity grows with replica count; writes still limited by the primary	Low	Good	Read-heavy workloads	Instagram Postgres replicas
Sharding	Near-unlimited data volume	High	Good	Write-heavy, massive datasets	WhatsApp, Uber user data
Caching (Redis)	Reduces DB load 90%+	Low	Excellent	Repeated reads of same data	Twitter timeline cache
CDN	Global edge capacity	Very low	Excellent	Static assets, media files	Netflix video via Open Connect
Auto-scaling	Elastic within account limits	Medium	Excellent (pay-per-use)	Variable/spiky traffic	Black Friday e-commerce

Netflix's Scaling Journey

Netflix's architecture evolution is one of the most documented in the industry:

2008: Monolithic application backed by a single Oracle database. A major database corruption halted DVD shipments for three days.
2009: Began migrating to AWS, moving from monolith to microservices — hundreds of independent services, each horizontally scalable.
2011: Introduced Chaos Monkey — a tool that randomly kills production servers to ensure the system survives failures. Resilience by design.
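2016: Cloud migration completed in January, when the last data center components used by the streaming service were shut down. About 75 million subscribers. The streaming platform runs on horizontally scaled microservices.
2023: About 238 million subscribers (mid-year). Runs on AWS across multiple regions. Video is delivered through Open Connect, Netflix's own CDN, which replaced the third-party CDNs (such as Akamai) it used earlier. Deploys code hundreds of times per day thanks to horizontal scaling and CI/CD pipelines.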

The key lesson: Netflix's ability to scale did not come from buying bigger machines. It came from architectural decisions — stateless services, horizontal scaling, read replicas, CDN caching, and a culture of resilience engineering.

Key Takeaways

Vertical scaling is simple but has a hard ceiling and creates a single point of failure.
Horizontal scaling is theoretically unlimited but requires stateless services, data partitioning, and a load balancer.
Load balancers distribute traffic, perform health checks, and remove failed servers automatically — Layer 7 load balancers additionally route by URL and headers.
Stateless services are the prerequisite for horizontal scaling — use JWT tokens or a shared Redis session store.
Auto-scaling (AWS ASG, Kubernetes HPA) adjusts capacity automatically based on real-time metrics.
Database scaling options — read replicas, sharding, caching, CQRS — each address a different bottleneck.
Netflix's evolution from a monolithic DVD service to 238M subscribers is the canonical case study: every major scaling challenge has a documented architectural solution.
