AiTechWorlds
On October 6, 2010, Instagram launched on the App Store. It signed up roughly 25,000 users on the first day and passed one million registered users by December 2010. By April 2012, when Facebook agreed to acquire Instagram for $1 billion, it had about 30 million users and a team of just 13 people.
How? Instagram's co-founder Mike Krieger later wrote about the decisions that made it possible: Postgres for data, Redis for caching, Gzip for compression, and — critically — a deliberate architecture built around horizontal scaling from day one. They chose tools that could scale out (add more machines) rather than up (buy a bigger machine). When traffic spiked, they added servers, not supercomputers.
This is the central decision in scalability planning.
Vertical scaling means making a single machine more powerful: adding CPU cores, RAM, faster storage, or faster network cards.
Strengths:
Limits and weaknesses:
Even a very large Amazon EC2 instance such as the x2idn.32xlarge offers 128 vCPUs and 2 TB of RAM, and AWS's High Memory instances go up to 24 TB. That sounds enormous, but Facebook runs on tens of thousands of servers. No single machine, however powerful, can handle their load.
| Problem | Description |
|---|---|
| Hard ceiling | The largest machine in existence sets your maximum scale |
| Single point of failure | One machine down means total outage |
| Diminishing returns | Cost doubles but performance does not |
| Maintenance window | Upgrading hardware requires downtime |
Vertical scaling is appropriate for small-to-medium scale, monolithic applications, or when you are buying time before a more fundamental redesign.
Horizontal scaling means adding more machines of the same type: 10 servers, then 20, then 200. In theory, capacity grows without limit.
Requirements for effective horizontal scaling:
Netflix scaled from serving DVDs by mail to streaming to 238 million subscribers across 190 countries. They run on AWS with thousands of EC2 instances in multiple regions — the definition of horizontal scaling at extreme scale.
A load balancer sits in front of a pool of servers and distributes incoming requests so no single server becomes a bottleneck.
Layer 4 (Transport Layer): Routes based on IP address and TCP/UDP port. Does not inspect the content of the request. Very fast, low overhead.
Layer 7 (Application Layer): Reads the HTTP request — URL, headers, cookies, body content. Can route /api/videos to video servers and /api/users to user servers.
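The path-based routing described above can be sketched as a longest-prefix lookup. This is only an illustration: the pool names, the `ROUTES` table, and the `pick_pool` function are hypothetical, and real Layer 7 load balancers express the same idea through their own configuration rather than application code.

```python
# Hypothetical routing table: URL prefix -> pool of backend servers.
ROUTES = {
    "/api/videos": ["video-1.internal", "video-2.internal"],
    "/api/users": ["user-1.internal"],
}
DEFAULT_POOL = ["web-1.internal"]


def pick_pool(path):
    """Return the backend pool whose prefix is the longest match for path."""
    best = ""
    for prefix in ROUTES:
        # Match the prefix itself or anything below it ("/api/videos/42"),
        # but not a look-alike such as "/api/videosx".
        matches = path == prefix or path.startswith(prefix + "/")
        if matches and len(prefix) > len(best):
            best = prefix
    return ROUTES.get(best, DEFAULT_POOL)
```

A Layer 4 balancer could not make this decision, because it never sees the request path.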
| Algorithm | How It Works | Best For |
|---|---|---|
| Round Robin | Requests cycle through servers in order | Equal-capacity servers, stateless services |
| Weighted Round Robin | Servers with higher weights receive proportionally more requests | Mixed server capacities |
| Least Connections | New request goes to server with fewest active connections | Requests with variable processing time |
| IP Hash | Hash of client IP always maps to same server | Session affinity (stateful apps) |
| Least Response Time | Routes to server with lowest average response time | Latency-sensitive services |
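Four of the algorithms in the table can be sketched in a few lines each. The server names and function names are made up for illustration, and the weighted variant is the naive form (a server with weight 2 simply appears twice per cycle); production balancers usually interleave weighted picks more smoothly.

```python
import hashlib
import itertools


def round_robin(pool):
    """Yield servers in order, forever."""
    return itertools.cycle(pool)


def weighted_round_robin(weights):
    """weights maps server -> integer weight; a server appears `weight` times per cycle."""
    expanded = [server for server, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_connections(active):
    """active maps server -> number of open connections; pick the least busy."""
    return min(active, key=active.get)


def ip_hash(client_ip, pool):
    """Map a client IP to the same server every time (while the pool is unchanged)."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return pool[int.from_bytes(digest[:8], "big") % len(pool)]
```

Note that `ip_hash` only gives session affinity while the pool stays the same size: adding or removing a server changes the modulus and remaps most clients.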
Load balancers continuously probe servers: "Are you alive?"
GET /health HTTP/1.1
Host: app-server-3.internal
If a server fails a configured number of consecutive health checks (3 is a common setting), the load balancer removes it from rotation. Failed servers re-enter when health checks pass again. This provides automatic recovery from server crashes with zero manual intervention.
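The bookkeeping behind this behaviour can be sketched as a failure counter per server. The `HealthTracker` class is hypothetical, and it assumes a single passing check is enough to restore a server; real load balancers often require several consecutive passes before re-admitting one.

```python
class HealthTracker:
    """Tracks consecutive health-check failures for each server."""

    def __init__(self, servers, unhealthy_threshold=3):
        self.unhealthy_threshold = unhealthy_threshold
        self.failures = {server: 0 for server in servers}

    def record(self, server, ok):
        """Record one probe result; a success resets the failure streak."""
        if ok:
            self.failures[server] = 0
        else:
            self.failures[server] += 1

    def in_rotation(self):
        """Servers that have not yet reached the failure threshold."""
        return [
            server
            for server, count in self.failures.items()
            if count < self.unhealthy_threshold
        ]
```

The probe itself (the `GET /health` request) is left out; `record` would be called with the outcome of each probe.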
Real products: AWS Application Load Balancer (Layer 7), AWS Network Load Balancer (Layer 4), Nginx (both), HAProxy, Google Cloud Load Balancing.
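If a user's session data is stored in memory on Server 1 and their next request is routed to Server 2, Server 2 has no idea who they are. A stateful service can only be scaled horizontally if each user's requests are always routed to the same server (sticky sessions), which works against even load distribution and loses the session when that server fails.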
A stateless service holds no user-specific data between requests. Every request carries all the information needed to process it.
JWT (JSON Web Token): The server encodes the user's identity and permissions into a signed token, sent to the client at login. Every subsequent request includes the token. Any server can verify the signature and know who the user is — no central session store needed.
Header: { alg: HS256, typ: JWT }
Payload: { user_id: 12345, role: "admin", exp: 1735689600 }
Signature: HMACSHA256(base64url(header) + "." + base64url(payload), secret)
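To make the three parts concrete, here is a minimal HS256 sketch using only the Python standard library. The function names `issue_token` and `verify_token` are illustrative, and a production system would normally use a maintained JWT library rather than hand-rolled code. With HS256, every server that verifies tokens must hold the same secret.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data):
    """Base64url-encode bytes without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text):
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _signature(signing_input, secret):
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(payload, secret):
    """Build header.payload.signature for the given payload dict."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    ]
    signing_input = ".".join(parts)
    return signing_input + "." + b64url(_signature(signing_input, secret))


def verify_token(token, secret, now=None):
    """Return the payload if the token is valid and unexpired, else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        expected = _signature(header_b64 + "." + payload_b64, secret)
        # Constant-time comparison avoids leaking how many bytes matched.
        if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
            return None
        header = json.loads(b64url_decode(header_b64))
        payload = json.loads(b64url_decode(payload_b64))
    except ValueError:
        return None  # malformed token
    if header.get("alg") != "HS256":
        return None
    current = time.time() if now is None else now
    if payload.get("exp", 0) <= current:
        return None  # expired, or no expiry claim at all
    return payload
```

Any server holding the secret can call `verify_token` on an incoming request without consulting a session store, which is what makes the service stateless.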
Redis Session Store: When stateless tokens are not feasible, store sessions in a shared Redis instance. All servers look up sessions in the same place — the session is no longer tied to a specific server.
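The shared-store idea can be illustrated without a running Redis instance. In the sketch below, `SharedSessionStore` is an in-memory stand-in for Redis (an assumption made so the example is self-contained), and `AppServer` is a hypothetical request handler. In a real deployment the store would be a network service that all servers connect to.

```python
import secrets
import time


class SharedSessionStore:
    """In-memory stand-in for a shared store such as Redis, with expiry."""

    def __init__(self):
        self._data = {}

    def create(self, user_id, ttl_seconds=3600, now=None):
        now = time.time() if now is None else now
        session_id = secrets.token_urlsafe(32)  # unguessable session ID
        self._data[session_id] = (user_id, now + ttl_seconds)
        return session_id

    def lookup(self, session_id, now=None):
        now = time.time() if now is None else now
        entry = self._data.get(session_id)
        if entry is None:
            return None
        user_id, expires_at = entry
        if now >= expires_at:
            del self._data[session_id]  # expired sessions are dropped
            return None
        return user_id


class AppServer:
    """Hypothetical app server that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id):
        user_id = self.store.lookup(session_id)
        if user_id is None:
            return f"{self.name}: 401 not logged in"
        return f"{self.name}: hello user {user_id}"
```

Because both servers are constructed with the same store, a session created during a request to one server is visible to the other.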
Manual scaling (logging in and launching new servers) is too slow and too error-prone for modern traffic patterns. Auto-scaling monitors metrics and automatically adjusts capacity.
AWS Auto Scaling Groups: Define minimum, desired, and maximum instance counts. When CPU > 70% for 2 minutes, launch a new instance. When CPU < 20%, terminate an instance.
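The rule in this example can be written as a small decision function. This is a simplified sketch, not the AWS API: `desired_capacity` is a hypothetical name, CPU samples are assumed to arrive once per minute, and the defaults mirror the thresholds quoted above.

```python
def desired_capacity(current, cpu_samples, minimum=2, maximum=10,
                     high=70.0, low=20.0, sustain=2):
    """Return the instance count to aim for, given per-minute CPU percentages."""
    recent = cpu_samples[-sustain:]
    if len(recent) == sustain and all(cpu > high for cpu in recent):
        target = current + 1  # sustained high load: scale out by one
    elif cpu_samples and cpu_samples[-1] < low:
        target = current - 1  # idle: scale in by one
    else:
        target = current
    # Never leave the configured minimum/maximum bounds.
    return max(minimum, min(maximum, target))
```

Requiring the threshold to hold for `sustain` consecutive samples keeps a single short spike from launching an instance.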
Kubernetes Horizontal Pod Autoscaler (HPA): Monitors CPU, memory, or custom metrics. Scales the number of pods running your container image up or down. Netflix, Uber, and Airbnb run their services on Kubernetes.
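The Kubernetes documentation describes the core HPA calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below reproduces that arithmetic; the function name and the min/max defaults are illustrative, and the real controller adds further behaviour such as stabilisation windows and handling of pods that are not ready.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA replica calculation for a single metric."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: do nothing
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 3 pods averaging 200m CPU against a 100m target gives a ratio of 2.0, so the function asks for 6 pods.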
Application servers are relatively easy to scale horizontally (add more, use load balancer). Databases are harder — they hold persistent state that must stay consistent.
Read Replicas: The primary database handles writes; replica databases replicate data and handle reads. Twitter's timeline reads went to replicas — only tweet creation hit the primary. Instagram ran 12 PostgreSQL replicas at the time of acquisition.
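Application code usually sees read replicas through some form of routing layer. The `ReplicaRouter` class below is a hypothetical sketch: it returns connection names rather than real connections, and the `require_fresh` flag is an assumption added to show one way of handling replication lag, where a replica may briefly return stale data after a write.

```python
import itertools


class ReplicaRouter:
    """Sends writes to the primary and spreads reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def for_write(self):
        return self.primary

    def for_read(self, require_fresh=False):
        # Reads that must see the latest write go to the primary,
        # as do all reads when no replicas are configured.
        if require_fresh or self._replicas is None:
            return self.primary
        return next(self._replicas)
```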
Sharding (Horizontal Partitioning): Split the data across multiple databases. Users with IDs 1 to 1,000,000 go to DB-shard-1; IDs 1,000,001 to 2,000,000 go to DB-shard-2. No single database holds all the data.
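Range-based sharding of this kind reduces to integer arithmetic. The sketch assumes each shard owns a contiguous block of one million IDs starting at 1, so that ID 1,000,000 belongs to the first shard; the function name is illustrative.

```python
SHARD_SIZE = 1_000_000  # IDs per shard (assumption matching the example ranges)


def shard_for(user_id):
    """Return the shard name that owns the given user ID."""
    if user_id < 1:
        raise ValueError("user IDs start at 1")
    shard_number = (user_id - 1) // SHARD_SIZE + 1
    return f"DB-shard-{shard_number}"
```

Range sharding keeps neighbouring IDs together, which is simple to reason about, but the newest shard tends to receive most of the writes. Hashing the ID instead spreads writes more evenly at the cost of losing range locality.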
CQRS (Command Query Responsibility Segregation): Maintain a separate write model (optimised for writes) and read model (optimised for reads). The read model may be a denormalised cache or a separate database.
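A toy version of the pattern, with both models held in memory, might look like the sketch below. The class names and the order example are invented for illustration. Here the read model is updated synchronously on every write; real systems often update it asynchronously, so reads can lag slightly behind writes.

```python
class OrderWriteModel:
    """Command side: records every order and notifies subscribers."""

    def __init__(self):
        self.orders = []       # append-only list, cheap to write
        self.subscribers = []  # callbacks that keep read models current

    def place_order(self, user_id, amount_cents):
        order = {"user_id": user_id, "amount_cents": amount_cents}
        self.orders.append(order)
        for notify in self.subscribers:
            notify(order)


class SpendingReadModel:
    """Query side: a denormalised per-user total, cheap to read."""

    def __init__(self):
        self.totals = {}

    def apply(self, order):
        user_id = order["user_id"]
        self.totals[user_id] = self.totals.get(user_id, 0) + order["amount_cents"]

    def total_spent(self, user_id):
        return self.totals.get(user_id, 0)
```

Answering "how much has this user spent?" becomes a single dictionary lookup instead of a scan over every order.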
| Strategy | Max Scale | Complexity | Cost Efficiency | When to Use | Real Example |
|---|---|---|---|---|---|
| Vertical scaling | Hardware ceiling (largest single machine) | Low | Poor at extremes | Early-stage, quick wins, databases | RDS db.r6g.16xlarge |
| Horizontal scaling (app) | Near-unlimited | Medium | Good | Stateless services, API servers | Netflix EC2 fleet |
| Read replicas | 5–10× read throughput | Low | Good | Read-heavy workloads | Instagram Postgres replicas |
| Sharding | Near-unlimited data volume | High | Good | Write-heavy, massive datasets | WhatsApp, Uber user data |
| Caching (Redis) | Reduces DB load 90%+ | Low | Excellent | Repeated reads of same data | Twitter timeline cache |
| CDN | Global edge capacity | Very low | Excellent | Static assets, media files | Netflix video via Open Connect |
| Auto-scaling | Elastic within account limits | Medium | Excellent (pay-per-use) | Variable/spiky traffic | Black Friday e-commerce |
Netflix's architecture evolution is one of the most documented in the industry:
The key lesson: Netflix's ability to scale did not come from buying bigger machines. It came from architectural decisions — stateless services, horizontal scaling, read replicas, CDN caching, and a culture of resilience engineering.