Recommendation Systems Explained: How Netflix and Amazon Know What You Want
Recommendation systems explained — how collaborative filtering, content-based, and hybrid systems work, with Python code to build your own, and how Netflix and Amazon use them.
Netflix has 260 million subscribers and 15,000+ titles. If you had to manually browse to find something to watch, you'd spend 20 minutes deciding and possibly give up. Instead, 80% of what Netflix subscribers watch comes from recommendations.
The same pattern plays out across every major platform: 35% of Amazon purchases come from recommendations. 60% of YouTube watch time is recommended content. Spotify's Discover Weekly has 40 million weekly listeners despite no human curation.
Recommendation systems are among machine learning's most commercially successful applications: they make products easier to use and drive measurable revenue. Understanding how they work reveals both impressive engineering and careful handling of hard mathematical problems such as extreme data sparsity.
The Core Problem
A recommendation system estimates: "What is the probability that user u will like item i?"
We can frame this as predicting the rating a user would give an item they haven't interacted with, or as ranking all items by predicted preference and showing the top K.
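The ranking framing can be sketched in a few lines. This is a minimal illustration that assumes a model has already produced a score for every item; the `top_k_items` helper and the score values are hypothetical, not part of any library.

```python
import numpy as np

def top_k_items(scores, seen, k=3):
    """Return indices of the k highest-scoring items the user has not seen."""
    scores = np.asarray(scores, dtype=float).copy()
    scores[list(seen)] = -np.inf            # never recommend already-seen items
    k = min(k, int(np.isfinite(scores).sum()))
    order = np.argsort(-scores)             # descending by predicted preference
    return order[:k].tolist()

# Hypothetical predicted preferences for 6 items; items 0 and 3 already seen.
predicted = [4.8, 3.1, 4.2, 4.9, 2.0, 3.9]
print(top_k_items(predicted, seen={0, 3}, k=3))
```

Masking seen items before sorting keeps the highest-scoring but already-watched titles out of the list, which is usually what a "top K" view wants.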
The data:
User-Item Interaction Matrix:
| | Item1 | Item2 | Item3 | Item4 | Item5 |
|---|---|---|---|---|---|
| User1 | 5 | 4 | - | - | 2 |
| User2 | - | 3 | 5 | - | 4 |
| User3 | 4 | - | - | 3 | - |
| User4 | - | 5 | 4 | 5 | - |
- = not yet rated (most entries are missing)
The challenge: the matrix is almost entirely empty. Netflix has 260M users × 15K titles = 3.9 trillion possible ratings, but users watch and rate a tiny fraction.
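To make the sparsity concrete, here is a small sketch that counts observed entries in the 4×5 example matrix and repeats the arithmetic for 260 million users and 15,000 titles. The variable names are illustrative only.

```python
import numpy as np

# The 4x5 example matrix from above; NaN marks a missing rating.
R = np.array([
    [5, 4, np.nan, np.nan, 2],
    [np.nan, 3, 5, np.nan, 4],
    [4, np.nan, np.nan, 3, np.nan],
    [np.nan, 5, 4, 5, np.nan],
])

observed = int(np.count_nonzero(~np.isnan(R)))
sparsity = 1 - observed / R.size
print(f"{observed} of {R.size} entries observed, sparsity = {sparsity:.0%}")

# Same arithmetic for 260 million users and 15,000 titles.
possible = 260_000_000 * 15_000
print(f"Possible user-title pairs: {possible:,}")
```

Even this toy matrix is 45% empty; at production scale the observed fraction is typically far below 1%.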
Approach 1: Collaborative Filtering
Collaborative filtering finds users with similar taste and recommends what they liked.
User-based collaborative filtering:
"Find users similar to User1. What have they liked that User1 hasn't seen?"
User1 rated: Item1=5, Item2=4, Item5=2
User3 rated: Item1=4, Item4=3
Similarity(User1, User3) = high (both rated Item1 highly)
Recommendation candidate: Item4 (User3 rated it, and User1 hasn't seen it yet)
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Create user-item matrix (NaN = not rated)
ratings = pd.DataFrame({
    'Alice': [5, 4, np.nan, np.nan, 2],
    'Bob': [np.nan, 3, 5, np.nan, 4],
    'Carol': [4, np.nan, np.nan, 3, np.nan],
    'Dave': [np.nan, 5, 4, 5, np.nan],
}, index=['Item1', 'Item2', 'Item3', 'Item4', 'Item5']).T

def user_based_cf(ratings_df, target_user, n_similar=3, n_recommendations=3):
    """User-based collaborative filtering"""
    # Fill NaN with 0 for similarity computation
    ratings_filled = ratings_df.fillna(0)

    # Compute cosine similarity between all users
    user_similarity = pd.DataFrame(
        cosine_similarity(ratings_filled),
        index=ratings_df.index,
        columns=ratings_df.index
    )

    # Get most similar users (excluding self)
    similar_users = user_similarity[target_user].drop(target_user).nlargest(n_similar)

    # Find items target user hasn't rated
    unrated_items = ratings_df.columns[ratings_df.loc[target_user].isna()].tolist()

    # Score each unrated item based on similar users' ratings
    item_scores = {}
    for item in unrated_items:
        weighted_sum = 0
        sim_sum = 0
        for user, similarity in similar_users.items():
            if not np.isnan(ratings_df.loc[user, item]):
                weighted_sum += similarity * ratings_df.loc[user, item]
                sim_sum += abs(similarity)
        if sim_sum > 0:
            item_scores[item] = weighted_sum / sim_sum

    # Return top N recommendations
    recommendations = sorted(item_scores.items(), key=lambda x: x[1], reverse=True)
    return recommendations[:n_recommendations]

recs = user_based_cf(ratings, 'Alice')
print("Recommendations for Alice:")
for item, score in recs:
    print(f"  {item}: predicted rating {score:.2f}")
Approach 2: Matrix Factorization (SVD)
Instead of explicit user similarities, matrix factorization learns latent factors — hidden dimensions that explain rating patterns.
import pandas as pd
from surprise import SVD, Dataset, Reader, accuracy
from surprise.model_selection import train_test_split

# Toy ratings in (user, item, rating) form. Swap in a real benchmark
# such as MovieLens for meaningful results.
ratings_data = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4],
    'item_id': [1, 2, 5, 2, 3, 1, 4, 5, 2, 3],
    'rating': [5, 4, 2, 3, 5, 4, 3, 1, 5, 4]
})

# Create dataset in Surprise format
reader = Reader(rating_scale=(1, 5))
dataset = Dataset.load_from_df(ratings_data[['user_id', 'item_id', 'rating']], reader)

# Hold out 20% of the ratings, then train SVD (matrix factorization)
trainset, testset = train_test_split(dataset, test_size=0.2)
algo = SVD(n_factors=50, n_epochs=20, lr_all=0.005, reg_all=0.02)
algo.fit(trainset)

# Make predictions (accuracy.rmse prints by default, so silence it here)
predictions = algo.test(testset)
rmse = accuracy.rmse(predictions, verbose=False)
print(f"RMSE: {rmse:.4f}")

# Predict specific user-item pair
user_id = 1
item_id = 3
prediction = algo.predict(user_id, item_id)
print(f"\nPredicted rating for User {user_id}, Item {item_id}: {prediction.est:.2f}")
What Latent Factors Represent
For a movie recommendation system, latent factors might represent (though we never know exactly):
- How much a movie is action-oriented
- How "serious" vs. lighthearted it is
- Whether it's from the 1980s
- How visually impressive it is
Users who prefer action films have high factor values for "action-ness." Action movies have high factor values for the same dimension. The dot product of user and item factor vectors predicts affinity.
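The dot-product idea can be shown without a library. The following is a minimal sketch of matrix factorization trained by stochastic gradient descent on the toy ratings, assuming no bias terms and illustrative hyperparameters; the `factorize` function is hypothetical and much simpler than what Surprise does internally.

```python
import numpy as np

def factorize(triples, n_users, n_items, n_factors=2, lr=0.02, reg=0.02,
              n_epochs=500, seed=0):
    """Learn user factors P and item factors Q by SGD on observed ratings only."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, n_factors))
    Q = rng.normal(scale=0.1, size=(n_items, n_factors))
    for _ in range(n_epochs):
        for u, i, r in triples:
            err = r - P[u] @ Q[i]            # prediction is a dot product
            p_u = P[u].copy()                # keep the old value for Q's update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_u - reg * Q[i])
    return P, Q

# (user, item, rating) triples from the example matrix, 0-indexed.
triples = [(0, 0, 5), (0, 1, 4), (0, 4, 2), (1, 1, 3), (1, 2, 5), (1, 4, 4),
           (2, 0, 4), (2, 3, 3), (3, 1, 5), (3, 2, 4), (3, 3, 5)]
P, Q = factorize(triples, n_users=4, n_items=5)

train_rmse = np.sqrt(np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in triples]))
print(f"Training RMSE: {train_rmse:.3f}")
print(f"Predicted rating, user 0 on unseen item 2: {P[0] @ Q[2]:.2f}")
```

With only two factors the learned dimensions have no guaranteed meaning; labels like "action-ness" are interpretations applied after the fact.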
Approach 3: Content-Based Filtering
Recommend items similar to what the user has liked, based on item features:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd

# Movie metadata
movies = pd.DataFrame({
    'title': ['The Dark Knight', 'Inception', 'Interstellar', 'The Avengers',
              'Iron Man', 'Titanic', 'Avatar', 'Forrest Gump'],
    'description': [
        'Batman fights the Joker in Gotham City crime action superhero',
        'Dream heist thieves plant ideas into subconscious science fiction',
        'Astronauts travel wormhole space exploration science time',
        'Superheroes team up save world from alien invasion action',
        'Billionaire creates Iron Man suit tech action superhero',
        'Romance tragedy iceberg ship disaster historical',
        'Sci-fi alien world colonization action adventure CGI',
        'Simple man runs through history America drama heartwarming'
    ]
})

# Create TF-IDF matrix from descriptions
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(movies['description'])

# Compute cosine similarity between all movies
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

def get_content_recommendations(title, movies_df, sim_matrix, n=3):
    idx = movies_df[movies_df['title'] == title].index[0]
    sim_scores = list(enumerate(sim_matrix[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:n+1]  # Exclude the movie itself

    recommendations = []
    for movie_idx, score in sim_scores:
        recommendations.append({
            'title': movies_df.iloc[movie_idx]['title'],
            'similarity': score
        })
    return recommendations

recs = get_content_recommendations('The Dark Knight', movies, cosine_sim)
print("Because you watched The Dark Knight:")
for rec in recs:
    print(f"  {rec['title']} (similarity: {rec['similarity']:.3f})")
Approach 4: Neural Collaborative Filtering
Deep learning can capture non-linear user-item interactions that a plain dot product between factor vectors may miss:
import torch
import torch.nn as nn

class NeuralCF(nn.Module):
    def __init__(self, n_users, n_items, embedding_dim=64, hidden_layers=(128, 64, 32)):
        super().__init__()
        # Embedding layers: one learned vector per user and per item
        self.user_embedding = nn.Embedding(n_users, embedding_dim)
        self.item_embedding = nn.Embedding(n_items, embedding_dim)

        # MLP layers
        layers = []
        input_size = embedding_dim * 2
        for hidden_size in hidden_layers:
            layers.extend([
                nn.Linear(input_size, hidden_size),
                nn.ReLU(),
                nn.Dropout(0.2)
            ])
            input_size = hidden_size
        layers.append(nn.Linear(input_size, 1))
        # Sigmoid gives a score in (0, 1), suited to implicit feedback
        # (clicked / not clicked) rather than 1-5 star ratings
        layers.append(nn.Sigmoid())
        self.mlp = nn.Sequential(*layers)

    def forward(self, user_ids, item_ids):
        user_embed = self.user_embedding(user_ids)
        item_embed = self.item_embedding(item_ids)
        # Concatenate user and item embeddings
        combined = torch.cat([user_embed, item_embed], dim=1)
        # Pass through MLP
        rating_pred = self.mlp(combined)
        # Drop only the last dimension so a batch of one stays 1-D
        return rating_pred.squeeze(-1)

# Initialize
n_users, n_items = 1000, 5000
model = NeuralCF(n_users, n_items, embedding_dim=64)
print(f"Total parameters: {sum(p.numel() for p in model.parameters()):,}")
Building a Complete System: Architecture
A production recommendation system has more than just a model:
Data Collection Layer
├── User interactions (clicks, purchases, ratings, time spent)
├── User profile data (demographics, preferences)
├── Item metadata (content features)
└── Contextual data (time, device, location)
Feature Store
├── User features (precomputed, updated regularly)
├── Item features (precomputed at item creation)
└── Real-time features (current session behavior)
Model Layer
├── Candidate Generation (narrow from millions to thousands)
│ └── Collaborative filtering / ANN search
├── Ranking (score and order the candidates)
│ └── Neural ranking model
└── Re-ranking (business logic filters)
└── Remove recently purchased, age restrictions, etc.
Serving Layer
├── Low-latency API (<50ms target)
├── Caching (popular recommendations cached)
└── A/B testing framework
Monitoring
├── Recommendation quality metrics
├── Business metrics (CTR, conversion, revenue)
└── Drift detection (distribution shifts in user behavior)
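The model layer's three stages can be sketched end to end with NumPy. This is an illustration under loud assumptions: the embeddings are random rather than learned, brute-force dot products stand in for an approximate nearest neighbor (ANN) index, and the "ranker" reuses the same score where a real system would apply a richer model. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n_items, dim = 10_000, 32

# Hypothetical precomputed embeddings (random here, learned in practice).
item_vecs = rng.normal(size=(n_items, dim))
user_vec = rng.normal(size=dim)
purchased = {17, 256, 4096}                  # items the re-ranker must drop

def generate_candidates(user_vec, item_vecs, n_candidates=500):
    """Stage 1: cheap retrieval that narrows the catalog to a few hundred items."""
    scores = item_vecs @ user_vec
    return np.argpartition(-scores, n_candidates)[:n_candidates]

def rank(user_vec, item_vecs, candidates):
    """Stage 2: placeholder ranker that orders candidates by score."""
    scores = item_vecs[candidates] @ user_vec
    return candidates[np.argsort(-scores)]

def rerank(ranked, blocked, k=10):
    """Stage 3: business rules, here just removing already-purchased items."""
    return [int(i) for i in ranked if int(i) not in blocked][:k]

candidates = generate_candidates(user_vec, item_vecs)
top10 = rerank(rank(user_vec, item_vecs, candidates), purchased)
print(top10)
```

Splitting the work this way is what makes a tight latency budget plausible: the expensive model only ever sees the few hundred candidates that survive stage 1.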
Evaluation Metrics
import numpy as np

def precision_at_k(recommended, relevant, k):
    """Fraction of top-k recommendations that are relevant"""
    top_k = recommended[:k]
    relevant_set = set(relevant)
    hits = sum(1 for item in top_k if item in relevant_set)
    return hits / k

def recall_at_k(recommended, relevant, k):
    """Fraction of relevant items in top-k recommendations"""
    top_k = recommended[:k]
    relevant_set = set(relevant)
    hits = sum(1 for item in top_k if item in relevant_set)
    return hits / len(relevant_set) if relevant_set else 0

def ndcg_at_k(recommended, relevant, k):
    """Normalized Discounted Cumulative Gain"""
    top_k = recommended[:k]
    relevant_set = set(relevant)
    dcg = sum(1/np.log2(i+2) for i, item in enumerate(top_k) if item in relevant_set)
    ideal_dcg = sum(1/np.log2(i+2) for i in range(min(len(relevant_set), k)))
    return dcg / ideal_dcg if ideal_dcg > 0 else 0

# Example evaluation
recommended = ['movie1', 'movie5', 'movie3', 'movie7', 'movie2']
actually_watched = ['movie1', 'movie2', 'movie6', 'movie3']
print(f"Precision@3: {precision_at_k(recommended, actually_watched, 3):.3f}")
print(f"Recall@3: {recall_at_k(recommended, actually_watched, 3):.3f}")
print(f"NDCG@5: {ndcg_at_k(recommended, actually_watched, 5):.3f}")
Comparison Table
| Approach | Best For | Limitation | Complexity |
|---|---|---|---|
| User-based CF | Small datasets, interpretability | Scalability, cold start | Low |
| Item-based CF | Item catalog stability | Cold start for new items | Medium |
| Matrix Factorization (SVD) | Large rating datasets | Cold start | Medium |
| Content-based | New items, few users | Limited to content features | Medium |
| Neural CF | Large datasets, complex patterns | Needs lots of data, compute | High |
| Hybrid | Production systems | Implementation complexity | High |
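The table lists item-based CF, which has no code of its own in this article. As a rough sketch on the same toy ratings used for user-based CF: instead of comparing users, compare items by who rated them, then score each unrated item from the user's own ratings of similar items. The `item_based_cf` function is illustrative, and treating missing ratings as 0 is a simplification.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

ratings = pd.DataFrame(
    [[5, 4, np.nan, np.nan, 2],
     [np.nan, 3, 5, np.nan, 4],
     [4, np.nan, np.nan, 3, np.nan],
     [np.nan, 5, 4, 5, np.nan]],
    index=["Alice", "Bob", "Carol", "Dave"],
    columns=["Item1", "Item2", "Item3", "Item4", "Item5"],
)

def item_based_cf(ratings_df, target_user, n_recommendations=3):
    """Score unrated items by similarity to the items the user already rated."""
    # Item-item cosine similarity over the transposed matrix (NaN treated as 0).
    item_sim = pd.DataFrame(
        cosine_similarity(ratings_df.fillna(0).T),
        index=ratings_df.columns, columns=ratings_df.columns,
    )
    user_ratings = ratings_df.loc[target_user].dropna()
    scores = {}
    for item in ratings_df.columns.difference(user_ratings.index):
        sims = item_sim.loc[item, user_ratings.index]
        if sims.sum() > 0:
            # Similarity-weighted average of the user's own ratings.
            scores[item] = float((sims * user_ratings).sum() / sims.sum())
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)[:n_recommendations]

for item, score in item_based_cf(ratings, "Alice"):
    print(f"  {item}: predicted rating {score:.2f}")
```

Because item-item similarities change slowly when the catalog is stable, they can be precomputed offline, which is the usual argument for this variant over user-based CF.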
Conclusion
Recommendation systems sit at the intersection of practical impact and interesting ML challenges. They've driven billions in e-commerce revenue and billions of hours of engagement, while continuously pushing research in cold start, scalability, and beyond-accuracy goals like diversity and serendipity.
For most teams building their first recommendation system, start with matrix factorization (Surprise library or ALS in Spark) for the collaborative signal and add content-based features for cold start handling. Neural approaches deliver better results at scale but require significantly more data and infrastructure.
For the foundational ML skills, see our scikit-learn tutorial and neural networks explained guide.
Frequently Asked Questions
How does Netflix's recommendation system work?
A hybrid system combining collaborative filtering (users with similar histories), content-based filtering (similar content attributes), matrix factorization (latent taste dimensions), and deep neural networks. Netflix estimates recommendations are worth $1 billion annually in reduced churn.
What is the cold start problem?
Difficulty making good recommendations for new users (no interaction history) or new items (no ratings). Solutions: onboarding questions for new users, content-based features for new items, popularity-based fallbacks. Pure collaborative filtering cannot handle cold start — it needs a hybrid approach.
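A popularity-based fallback is the simplest of these to sketch. The example below assumes a hypothetical interaction log and a stand-in `personalized` function in place of a trained collaborative model; the `min_history` threshold is an arbitrary illustrative choice.

```python
from collections import Counter

def recommend_with_fallback(user_id, interactions, personalized_fn,
                            min_history=3, k=3):
    """Use the personalized model only when the user has enough history."""
    history = interactions.get(user_id, [])
    if len(history) >= min_history:
        return personalized_fn(user_id)[:k]
    # Cold start: fall back to globally popular items the user has not seen.
    popularity = Counter(item for items in interactions.values() for item in items)
    seen = set(history)
    return [item for item, _ in popularity.most_common() if item not in seen][:k]

# Hypothetical interaction log: user -> list of item IDs.
interactions = {
    "u1": ["a", "b", "c", "d"],
    "u2": ["a", "b", "e"],
    "u3": ["a", "c"],
    "new_user": [],
}

def personalized(user_id):
    """Stand-in for a trained collaborative model."""
    return ["z", "y", "x", "w"]

print(recommend_with_fallback("u1", interactions, personalized))        # personalized path
print(recommend_with_fallback("new_user", interactions, personalized))  # popularity path
```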
What is matrix factorization and how does it work?
Matrix factorization decomposes the user-item matrix into user latent factors and item latent factors. The dot product of a user's factor vector and an item's factor vector predicts the rating. Because these compact representations are learned from observed ratings, they also yield predictions for unobserved user-item pairs.
Should I use collaborative filtering or content-based filtering?
Use a hybrid. Collaborative filtering: great with dense interaction data, captures subjective quality. Content-based: great for new items, explainable recommendations. Together they solve each other's weaknesses. All major production systems (Netflix, Amazon, Spotify) use hybrid approaches.
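One simple way to combine the two signals is a weighted blend. The sketch below is one possible scheme, not how any particular company does it: it assumes both score vectors are already scaled to [0, 1] and shifts weight toward collaborative filtering as the user's history grows. The `hybrid_scores` function and its `saturation` parameter are hypothetical.

```python
import numpy as np

def hybrid_scores(cf_scores, content_scores, n_interactions, saturation=20):
    """Blend two score vectors; trust CF more as the user's history grows."""
    cf = np.asarray(cf_scores, dtype=float)
    content = np.asarray(content_scores, dtype=float)
    alpha = min(n_interactions / saturation, 1.0)   # weight on the CF signal
    return alpha * cf + (1 - alpha) * content

cf = [0.9, 0.2, 0.5]        # hypothetical collaborative scores in [0, 1]
content = [0.1, 0.8, 0.6]   # hypothetical content-similarity scores in [0, 1]

print(hybrid_scores(cf, content, n_interactions=0))    # all content-based
print(hybrid_scores(cf, content, n_interactions=10))   # even blend
print(hybrid_scores(cf, content, n_interactions=50))   # all collaborative
```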
How do I evaluate a recommendation system?
Offline: Precision@K, Recall@K, NDCG. Online (A/B test): click-through rate, conversion, long-term engagement. Important caveat: offline metrics often don't predict online performance. Validate with A/B tests before deploying changes.
AiTechWorlds Team
The AiTechWorlds team is passionate about AI, technology, and education. We create high-quality, research-backed content to help you learn, grow, and succeed in the modern digital world.