A social network stores millions of users, and every user has a different combination of profile fields, privacy settings, and connected friends. A product catalog has phones with specs like "battery capacity" and shoes with specs like "heel height" — completely different attributes. A gaming leaderboard needs to update 50,000 scores per second.

Try to model all three in a relational schema. The phone-shoe problem means either dozens of NULL columns or a complex EAV (Entity-Attribute-Value) anti-pattern. The social graph means recursive self-joins that bring PostgreSQL to its knees. The leaderboard means locking and write contention that kills response times.

Not all data fits in tables. NoSQL databases were built for the shapes that relational models handle poorly.

Why NoSQL Emerged: RDBMS Limitations at Scale

Limitation	Relational Approach	The Problem
Horizontal scaling	Vertical scaling (bigger server)	One machine has physical limits
Schema flexibility	Rigid schema, ALTER TABLE needed	Product catalogs, user profiles vary by row
Write throughput	ACID transactions with locks	Social media, IoT need millions of writes/sec
Graph traversal	Recursive SQL joins	Each extra hop adds a join; 6 degrees of separation becomes impractically slow
Semi-structured data	Normalize into many tables	JSON APIs, documents need direct storage

Document Stores — MongoDB

Model: Collections of JSON-like documents. Each document can have its own schema.

// products collection
{
  "_id": "prod_001",
  "name": "iPhone 15 Pro",
  "category": "phone",
  "specs": {
    "battery_mah": 3274,
    "chip": "A17 Pro",
    "storage_options": [128, 256, 512, 1024]
  },
  "price": 999.00,
  "tags": ["apple", "smartphone", "5G"]
}

{
  "_id": "prod_002",
  "name": "Air Max 90",
  "category": "shoe",
  "specs": {
    "heel_height_mm": 32,
    "material": "leather/mesh",
    "sizes": [7, 8, 9, 10, 11, 12]
  },
  "price": 110.00
}

Query example:

// Find all phones under $800 that offer at least one storage option over 256 GB
db.products.find({
  category: "phone",
  price: { $lt: 800 },
  "specs.storage_options": { $gt: 256 }
})
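
The filter semantics can be checked without a database. The sketch below is plain Python, not MongoDB code: it applies the same three conditions to in-memory dictionaries. The third document (prod_003) is a hypothetical addition so that something matches, and the helper name matches is illustrative only.

```python
# Plain-Python sketch of the filter above; this is not MongoDB driver code.
products = [
    {"_id": "prod_001", "category": "phone", "price": 999.00,
     "specs": {"storage_options": [128, 256, 512, 1024]}},
    {"_id": "prod_002", "category": "shoe", "price": 110.00,
     "specs": {"sizes": [7, 8, 9, 10, 11, 12]}},
    # Hypothetical extra document, added so that the query has a match.
    {"_id": "prod_003", "category": "phone", "price": 699.00,
     "specs": {"storage_options": [128, 512]}},
]


def matches(doc):
    """Mimic the query: a phone, price < 800, any storage option > 256."""
    options = doc.get("specs", {}).get("storage_options", [])
    return (
        doc.get("category") == "phone"
        and doc.get("price", float("inf")) < 800
        # A comparison on an array field matches if ANY element satisfies it.
        and any(size > 256 for size in options)
    )


matching_ids = [doc["_id"] for doc in products if matches(doc)]
print(matching_ids)  # ['prod_003']
```

The iPhone document is excluded by its price, and the shoe document simply has no storage_options field, which the filter treats as "no match" rather than as an error.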

Best use cases: Product catalogs, content management systems (CMS), user profiles, event logging, mobile app backends.

Key-Value Stores — Redis

Model: A giant hash map. Every value is stored and retrieved by a unique key. Values can be strings, lists, sets, hashes, or sorted sets.

SET session:user_4821  '{"user_id":4821,"role":"admin","expires":1720000000}'
GET session:user_4821
→ '{"user_id":4821,"role":"admin","expires":1720000000}'

SETEX rate_limit:ip_192.168.1.1  60  "42"     -- expires in 60 seconds
INCR  rate_limit:ip_192.168.1.1               -- atomic increment

ZADD  leaderboard  98500 "player_alice"
ZADD  leaderboard  87200 "player_bob"
ZREVRANGE leaderboard 0 9 WITHSCORES          -- Top 10 players
→ 1) "player_alice"  2) "98500"
   3) "player_bob"   4) "87200"

Speed: Redis serves reads and writes from RAM (with optional persistence to disk), so typical operations complete in under 1 millisecond.

Best use cases: Session storage, caching (cache-aside pattern), rate limiting, real-time leaderboards, pub/sub messaging.
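
The rate-limiting commands above (a counter that expires, plus an atomic increment) amount to a fixed-window limiter. The following is a minimal in-memory sketch of that idea, assuming a limit of 3 requests per 60-second window. ExpiringCounters and allow_request are hypothetical names, and the class only imitates a key with a TTL; it is not a Redis client.

```python
import time


class ExpiringCounters:
    """In-memory stand-in for counter keys that carry a time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (count, expires_at)

    def incr(self, key, ttl_seconds):
        """Increment key; open a fresh window if it is missing or expired."""
        now = self._clock()
        entry = self._data.get(key)
        if entry is None or now >= entry[1]:
            entry = (0, now + ttl_seconds)
        count, expires_at = entry[0] + 1, entry[1]
        self._data[key] = (count, expires_at)
        return count


def allow_request(counters, client_ip, limit=3, window_seconds=60):
    """Return True while the client is within its quota for this window."""
    key = f"rate_limit:ip_{client_ip}"
    return counters.incr(key, window_seconds) <= limit


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
counters = ExpiringCounters(clock=lambda: fake_now[0])
results = [allow_request(counters, "192.168.1.1") for _ in range(4)]
print(results)  # [True, True, True, False]

fake_now[0] = 61.0  # the window has expired, so the counter starts again
print(allow_request(counters, "192.168.1.1"))  # True
```

With a real Redis server, the increment and the expiry are handled by the server itself, so several application instances can share one counter safely.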

Column Stores — Apache Cassandra

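Model: Cassandra is a wide-column store. Rows are grouped into partitions identified by a partition key and sorted within each partition by clustering columns, and each row may populate a different subset of columns. The layout is optimized for fast writes and for range reads over time-ordered data.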

-- Cassandra CQL (similar to SQL)
CREATE TABLE sensor_readings (
    device_id    UUID,
    recorded_at  TIMESTAMP,
    temperature  FLOAT,
    humidity     FLOAT,
    PRIMARY KEY (device_id, recorded_at)
) WITH CLUSTERING ORDER BY (recorded_at DESC);

-- Append one reading for a known device (the cluster as a whole can absorb very high write rates)
INSERT INTO sensor_readings (device_id, recorded_at, temperature, humidity)
VALUES (550e8400-e29b-41d4-a716-446655440000, toTimestamp(now()), 22.4, 61.2);

-- Fast range read for one device
SELECT * FROM sensor_readings
WHERE device_id = 550e8400-e29b-41d4-a716-446655440000
  AND recorded_at >= '2024-01-01'
  AND recorded_at <  '2024-02-01';

Cassandra distributes data across a ring of nodes. There is no single point of failure. Writes go to multiple nodes simultaneously for fault tolerance.
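
One way to picture the ring is a sorted list of node tokens: a partition key is hashed to a token, and the row is stored on the next few nodes clockwise from that position. The sketch below illustrates this under simplifying assumptions: one token per node, MD5 as a stand-in hash function, and hypothetical node names. Real clusters differ in the details.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is only a stand-in hash)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """A toy token ring with one token per node."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = min(replication_factor, len(nodes))
        self._ring = sorted((token(name), name) for name in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas_for(self, partition_key: str):
        """Return the nodes that hold this partition, walking clockwise."""
        start = bisect.bisect_right(self._tokens, token(partition_key))
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
owners = ring.replicas_for("550e8400-e29b-41d4-a716-446655440000")
print(owners)  # three distinct node names
```

Because every node can compute the same mapping, any node can route a request, and losing one replica still leaves the other copies reachable.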

Best use cases: IoT sensor data, time-series metrics, audit logs, messaging systems, analytics at massive scale (Netflix, Apple, Instagram use Cassandra).

Graph Databases — Neo4j

Model: Data is stored as nodes (entities) and edges (relationships), each with properties. Relationships are first-class citizens, not foreign keys.

Nodes:  (Alice:Person), (Bob:Person), (Python:Language), (DataCo:Company)
Edges:  Alice -[FRIENDS_WITH]-> Bob
        Alice -[KNOWS]-> Python
        Bob   -[WORKS_AT]-> DataCo
        DataCo-[USES]-> Python

Cypher Query — Find Alice's friends who work at companies using Python:

MATCH (alice:Person {name: "Alice"})
      -[:FRIENDS_WITH]->(friend:Person)
      -[:WORKS_AT]->(company:Company)
      -[:USES]->(lang:Language {name: "Python"})
RETURN friend.name, company.name

// Output:
// friend.name  | company.name
// -------------|-------------
// Bob          | DataCo

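The same query in SQL has to join several tables (people, friendships, employment, company technologies), and every extra hop adds another join. For deep or variable-length traversals, which need recursive queries, this can be orders of magnitude slower than native graph traversal.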
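
The traversal itself is just a walk along typed edges. As a rough illustration, the example graph can be held in a Python dictionary keyed by (node, relationship) and walked hop by hop. The function names are hypothetical, and this is a sketch of the idea, not of how Neo4j stores data.

```python
# Edges from the example graph, keyed by (source node, relationship type).
edges = {
    ("Alice", "FRIENDS_WITH"): ["Bob"],
    ("Alice", "KNOWS"): ["Python"],
    ("Bob", "WORKS_AT"): ["DataCo"],
    ("DataCo", "USES"): ["Python"],
}


def neighbours(node, relationship):
    """Follow one relationship type out of a node."""
    return edges.get((node, relationship), [])


def friends_at_companies_using(person, language):
    """Walk FRIENDS_WITH, then WORKS_AT, then check USES for the language."""
    results = []
    for friend in neighbours(person, "FRIENDS_WITH"):
        for company in neighbours(friend, "WORKS_AT"):
            if language in neighbours(company, "USES"):
                results.append((friend, company))
    return results


print(friends_at_companies_using("Alice", "Python"))  # [('Bob', 'DataCo')]
```

Each hop is a direct lookup from the current node, so the cost depends on how many edges are followed rather than on the total size of the data set.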

Best use cases: Social networks, recommendation engines, fraud detection, knowledge graphs, network topology, identity and access management.

CAP Theorem

Eric Brewer's CAP Theorem states that a distributed system can guarantee at most two of these three properties simultaneously:

           Consistency (C)
               /\
              /  \
             /    \
            /      \
           /        \
Availability(A) ---- Partition
                     Tolerance (P)

CA: Traditional RDBMS (single node)    — Not partition tolerant
CP: MongoDB, HBase, Zookeeper          — Sacrifices availability during partitions
AP: Cassandra, CouchDB, DynamoDB       — Sacrifices consistency (eventual)

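In a distributed system, network partitions will happen. You must choose between Consistency and Availability when they do. The labels above describe typical default behaviour; several of these systems (MongoDB, Cassandra, DynamoDB) let you tune the tradeoff per operation.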
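
The choice can be made concrete with a toy model. The sketch below assumes a cluster of two replicas and a partitioned flag that cuts the link between them; in "CP" mode a write during the partition is refused, and in "AP" mode it is accepted locally and the replicas diverge. Cluster and Replica are hypothetical classes written for this illustration only.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Two replicas; mode is 'CP' or 'AP', matching the labels above."""

    def __init__(self, mode):
        self.mode = mode
        self.replicas = [Replica("r1"), Replica("r2")]
        self.partitioned = False

    def write(self, replica_index, key, value):
        """Return True if the write was accepted."""
        if not self.partitioned:
            for replica in self.replicas:  # healthy network: update both copies
                replica.data[key] = value
            return True
        if self.mode == "CP":
            return False  # stay consistent by refusing the write
        self.replicas[replica_index].data[key] = value  # stay available
        return True

    def read(self, replica_index, key):
        return self.replicas[replica_index].data.get(key)


for mode in ("CP", "AP"):
    cluster = Cluster(mode)
    cluster.write(0, "likes", 10)
    cluster.partitioned = True
    accepted = cluster.write(0, "likes", 11)
    print(mode, accepted, cluster.read(0, "likes"), cluster.read(1, "likes"))
# CP False 10 10
# AP True 11 10
```

The CP cluster gives up availability (the write fails) but both replicas still agree; the AP cluster keeps accepting writes but the two replicas now return different answers.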

BASE vs ACID

Most NoSQL databases follow BASE semantics instead of ACID:

ACID	BASE
Atomic	Basically Available
Consistent	Soft state
Isolated	Eventually consistent
Durable

Eventual consistency means: if no new writes are made, all replicas will eventually converge to the same value. You might read stale data immediately after a write, but replicas typically catch up within milliseconds to seconds; the model itself does not promise a fixed upper bound.

For a social media "like" count, this is perfectly acceptable. For a bank balance, it is not.
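
A small simulation shows the stale-then-fresh behaviour. It assumes two replicas, integer version numbers, and a "newest version wins" rule applied during a background sync. All of these are simplifications chosen for the sketch, and the class and function names are hypothetical.

```python
class Replica:
    """Stores key -> (version, value) and keeps only the newest version."""

    def __init__(self):
        self.store = {}

    def put(self, key, version, value):
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def get(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange data in both directions; each side keeps the newer version."""
    for key, (version, value) in list(a.store.items()):
        b.put(key, version, value)
    for key, (version, value) in list(b.store.items()):
        a.put(key, version, value)


primary, secondary = Replica(), Replica()
primary.put("likes", 1, 10)
sync(primary, secondary)

primary.put("likes", 2, 11)       # new write lands on one replica only
stale = secondary.get("likes")    # read before the replicas have synced
sync(primary, secondary)
fresh = secondary.get("likes")    # read after convergence
print(stale, fresh)  # 10 11
```

A reader who hits the second replica between the write and the sync sees 10 instead of 11, which is harmless for a like counter and unacceptable for an account balance.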

Master Comparison Table

Property	RDBMS (PostgreSQL)	Document (MongoDB)	Key-Value (Redis)	Column (Cassandra)	Graph (Neo4j)
Data model	Tables & rows	JSON documents	Key → value	Wide rows/partitions	Nodes & edges
Schema	Rigid (defined)	Flexible	None	Flexible columns	Flexible
ACID support	Full	Single-document by default; multi-document transactions available	Limited	Limited (row-level, lightweight transactions)	Full
Scaling	Vertical (mainly)	Horizontal	Horizontal	Horizontal	Vertical (mainly)
Query language	SQL	MQL / aggregation	Commands	CQL (SQL-like)	Cypher
Best for	Transactions, reports	Content, catalogs	Cache, sessions	Time-series, IoT	Connected data
Consistency	Strong	Configurable	Strong (single node)	Tunable per query (often eventual)	Strong
Joins	Excellent	Limited ($lookup)	None	None	Native (traversal)

Key Takeaways

NoSQL is usually read as "Not Only SQL" rather than "no SQL": the term means non-relational, and many NoSQL databases have their own query languages.
Document stores (MongoDB) shine for flexible schemas and nested data.
Key-value stores (Redis) are unmatched for speed — use them for caching and sessions.
Column stores (Cassandra) handle massive write throughput and time-series data.
Graph databases (Neo4j) are purpose-built for relationships — social networks, recommendations, fraud detection.
CAP Theorem forces a tradeoff in distributed systems — understand whether your use case needs consistency or availability during a network partition.
BASE (eventual consistency) is acceptable for many use cases but never for financial transactions.

Choosing a database is choosing a tradeoff. The best database is the one whose tradeoffs align with your application's actual requirements — not the most popular one.
