
Recommendation Systems Explained: How Netflix and Amazon Know What You Want

Recommendation systems explained — how collaborative filtering, content-based, and hybrid systems work, with Python code to build your own, and how Netflix and Amazon use them.

AiTechWorlds Team
May 27, 2026 9 min read

Netflix has 260 million subscribers and 15,000+ titles. If you had to manually browse to find something to watch, you'd spend 20 minutes deciding and possibly give up. Instead, 80% of what Netflix subscribers watch comes from recommendations.

The same pattern plays out across other major platforms: 35% of Amazon purchases come from recommendations, 60% of YouTube watch time is recommended content, and Spotify's Discover Weekly, assembled by algorithm with no human curation, has 40 million weekly listeners.

Recommendation systems are one of machine learning's most commercially successful applications: they make products genuinely better and drive measurable revenue. Understanding how they work means looking at both the engineering needed to serve millions of users and the core mathematical challenges: extremely sparse data, new users and items with no history, and scale.


The Core Problem

A recommendation system estimates: "What is the probability that user u will like item i?"

We can frame this as predicting the rating a user would give an item they haven't interacted with, or as ranking all items by predicted preference and showing the top K.

The data:

User-Item Interaction Matrix:

           Item1  Item2  Item3  Item4  Item5
User1:       5      4      -      -      2
User2:       -      3      5      -      4
User3:       4      -      -      3      -
User4:       -      5      4      5      -

- = not yet rated (most entries are missing)

The challenge: the matrix is almost entirely empty. Netflix has 260M users × 15K titles = 3.9 trillion possible user-title pairs, but each user watches and rates only a tiny fraction of the catalog.
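A quick way to see the sparsity problem is to measure it. The sketch below counts the missing entries in the toy matrix above (with NaN standing in for "-") and redoes the Netflix arithmetic; the subscriber and title counts are the round figures quoted in this article, not exact numbers.

```python
import numpy as np

# Toy user-item matrix from above; np.nan marks "not yet rated"
R = np.array([
    [5, 4, np.nan, np.nan, 2],
    [np.nan, 3, 5, np.nan, 4],
    [4, np.nan, np.nan, 3, np.nan],
    [np.nan, 5, 4, 5, np.nan],
])

observed = np.count_nonzero(~np.isnan(R))
sparsity = 1 - observed / R.size
print(f"Observed ratings: {observed} of {R.size} ({sparsity:.0%} missing)")

# Scale of the full Netflix matrix, using the article's round numbers
users, titles = 260_000_000, 15_000
cells = users * titles
print(f"Possible user-title pairs: {cells:,} (~{cells / 1e12:.1f} trillion)")
```

Even this hand-built example is 45% empty; a real interaction matrix is far sparser, which is why production systems store it in sparse formats rather than as a dense array.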


Approach 1: Collaborative Filtering

Collaborative filtering finds users with similar taste and recommends what they liked.

User-based collaborative filtering:

"Find users similar to User1. What have they liked that User1 hasn't seen?"

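User1 rated: Item1=5, Item2=4, Item5=2
User3 rated: Item1=4, Item4=3

Similarity(User1, User3) = high (both rated Item1 highly)
Recommendation: Show User1 Item4 (User3 has rated it, and their taste overlaps with User1's)

A minimal Python version, using cosine similarity between users' rating vectors (missing ratings treated as 0):
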
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Create user-item matrix (NaN = not rated)
ratings = pd.DataFrame({
    'Alice':  [5, 4, np.nan, np.nan, 2],
    'Bob':    [np.nan, 3, 5, np.nan, 4],
    'Carol':  [4, np.nan, np.nan, 3, np.nan],
    'Dave':   [np.nan, 5, 4, 5, np.nan],
}, index=['Item1', 'Item2', 'Item3', 'Item4', 'Item5']).T

def user_based_cf(ratings_df, target_user, n_similar=3, n_recommendations=3):
    """User-based collaborative filtering"""
    # Fill NaN with 0 for similarity computation
    ratings_filled = ratings_df.fillna(0)
    
    # Compute cosine similarity between all users
    user_similarity = pd.DataFrame(
        cosine_similarity(ratings_filled),
        index=ratings_df.index,
        columns=ratings_df.index
    )
    
    # Get most similar users (excluding self)
    similar_users = user_similarity[target_user].drop(target_user).nlargest(n_similar)
    
    # Find items target user hasn't rated
    unrated_items = ratings_df.columns[ratings_df.loc[target_user].isna()].tolist()
    
    # Score each unrated item based on similar users' ratings
    item_scores = {}
    for item in unrated_items:
        weighted_sum = 0
        sim_sum = 0
        for user, similarity in similar_users.items():
            if not np.isnan(ratings_df.loc[user, item]):
                weighted_sum += similarity * ratings_df.loc[user, item]
                sim_sum += abs(similarity)
        if sim_sum > 0:
            item_scores[item] = weighted_sum / sim_sum
    
    # Return top N recommendations
    recommendations = sorted(item_scores.items(), key=lambda x: x[1], reverse=True)
    return recommendations[:n_recommendations]

recs = user_based_cf(ratings, 'Alice')
print("Recommendations for Alice:")
for item, score in recs:
    print(f"  {item}: predicted rating {score:.2f}")
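The comparison table below also lists item-based CF, which flips the idea around: instead of finding similar users, it finds items that the same people rate similarly, then scores an unseen item by how the target user rated its neighbors. Here is a minimal standalone sketch on the same ratings; the function name `item_based_cf` is illustrative, and filling missing ratings with 0 is the same simplification used above.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

ratings = pd.DataFrame({
    'Alice':  [5, 4, np.nan, np.nan, 2],
    'Bob':    [np.nan, 3, 5, np.nan, 4],
    'Carol':  [4, np.nan, np.nan, 3, np.nan],
    'Dave':   [np.nan, 5, 4, 5, np.nan],
}, index=['Item1', 'Item2', 'Item3', 'Item4', 'Item5']).T

def item_based_cf(ratings_df, target_user, n_recommendations=3):
    """Score each unrated item by its similarity to the items the user has rated."""
    filled = ratings_df.fillna(0)
    # Item-item cosine similarity: compare columns (items) instead of rows (users)
    item_sim = pd.DataFrame(
        cosine_similarity(filled.T),
        index=ratings_df.columns,
        columns=ratings_df.columns,
    )
    user_ratings = ratings_df.loc[target_user]
    rated = user_ratings.dropna()
    scores = {}
    for item in user_ratings[user_ratings.isna()].index:
        sims = item_sim.loc[item, rated.index]
        if sims.abs().sum() > 0:
            # Similarity-weighted average of the user's own ratings
            scores[item] = float((sims * rated).sum() / sims.abs().sum())
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)[:n_recommendations]

for item, score in item_based_cf(ratings, 'Alice'):
    print(f"  {item}: predicted rating {score:.2f}")
```

On this tiny matrix the two methods disagree about Alice: the user-based version above puts Item3 first, while this item-based one favors Item4, a reminder that neighborhoods built from a handful of ratings are noisy. At scale, item-based CF is attractive because item-item similarities change slowly and can be precomputed.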

Approach 2: Matrix Factorization (SVD)

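Instead of computing explicit user-user similarities, matrix factorization learns latent factors: a short vector for every user and every item, chosen so that the dot product of a user's vector and an item's vector approximates the rating. These hidden dimensions explain the observed rating patterns and generalize to pairs nobody has rated yet.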

from surprise import SVD, Dataset, Reader, accuracy
from surprise.model_selection import train_test_split
import pandas as pd

# Toy ratings in (user, item, rating) form. For a real benchmark, Surprise ships
# MovieLens: dataset = Dataset.load_builtin('ml-100k')
ratings_data = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4],
    'item_id': [1, 2, 5, 2, 3, 1, 4, 5, 2, 3],
    'rating': [5, 4, 2, 3, 5, 4, 3, 1, 5, 4]
})

reader = Reader(rating_scale=(1, 5))
dataset = Dataset.load_from_df(ratings_data[['user_id', 'item_id', 'rating']], reader)

# Hold out 20% of ratings for testing (fixed seed so results are reproducible)
trainset, testset = train_test_split(dataset, test_size=0.2, random_state=42)

# Train Surprise's SVD: an SGD-trained factorization with user/item biases
algo = SVD(n_factors=50, n_epochs=20, lr_all=0.005, reg_all=0.02, random_state=42)
algo.fit(trainset)

# Evaluate on the held-out ratings (verbose=False avoids printing RMSE twice)
predictions = algo.test(testset)
print(f"RMSE: {accuracy.rmse(predictions, verbose=False):.4f}")

# Predict specific user-item pair
user_id = 1
item_id = 3
prediction = algo.predict(user_id, item_id)
print(f"\nPredicted rating for User {user_id}, Item {item_id}: {prediction.est:.2f}")
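Under the hood, Surprise's `SVD` is not a textbook singular value decomposition (which needs a fully filled-in matrix); it is the SGD-trained factorization popularized by Simon Funk during the Netflix Prize. The NumPy sketch below learns the same kind of model from the observed entries of the toy matrix only, leaving out the bias terms Surprise adds. The function name `factorize` and its hyperparameters are illustrative choices, not tuned values.

```python
import numpy as np

def factorize(ratings, n_factors=2, lr=0.01, reg=0.02, n_epochs=500, seed=0):
    """Learn user factors P and item factors Q so that P[u] @ Q[i] approximates rating(u, i).

    ratings: list of (user_index, item_index, rating) tuples (observed entries only).
    """
    rng = np.random.default_rng(seed)
    n_users = 1 + max(u for u, _, _ in ratings)
    n_items = 1 + max(i for _, i, _ in ratings)
    P = rng.normal(scale=0.1, size=(n_users, n_factors))
    Q = rng.normal(scale=0.1, size=(n_items, n_factors))
    for _ in range(n_epochs):
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            err = r - P[u] @ Q[i]
            # SGD step with L2 regularization; Q's update uses P[u] from before this step
            p_u = P[u].copy()
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_u - reg * Q[i])
    return P, Q

# Observed entries of the toy matrix (0-based user and item indices)
observed_ratings = [(0, 0, 5), (0, 1, 4), (0, 4, 2),
                    (1, 1, 3), (1, 2, 5), (1, 4, 4),
                    (2, 0, 4), (2, 3, 3),
                    (3, 1, 5), (3, 2, 4), (3, 3, 5)]

P, Q = factorize(observed_ratings)
train_rmse = np.sqrt(np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in observed_ratings]))
print(f"Training RMSE: {train_rmse:.3f}")
print(f"Predicted rating for User1, Item3: {P[0] @ Q[2]:.2f}")
```

The missing entries never enter the loss; they are filled in only at prediction time by the dot product. A low training RMSE on 11 ratings says nothing about generalization, which is what the held-out `testset` in the Surprise example measures.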

What Latent Factors Represent

For a movie recommendation system, latent factors might correspond to traits like these, although the model never labels them and they often don't map cleanly onto human concepts:

  • How much a movie is action-oriented
  • How "serious" vs. lighthearted it is
  • Whether it's from the 1980s
  • How visually impressive it is

Users who prefer action films have high factor values for "action-ness." Action movies have high factor values for the same dimension. The dot product of user and item factor vectors predicts affinity.
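A worked example with made-up numbers makes the dot product concrete. Suppose, purely for illustration, a model had learned two factors that happened to line up with "action" and "seriousness". Surprise's `SVD` also adds a global mean and per-user and per-item biases (it predicts mu + b_u + b_i + q_i · p_u); this sketch shows only the q_i · p_u part, and every vector in it is hypothetical.

```python
import numpy as np

# Hypothetical 2-factor vectors: [action-ness, seriousness]
user_factors = {
    'action_fan':  np.array([0.9, 0.2]),
    'drama_lover': np.array([0.1, 0.8]),
}
item_factors = {
    'action_movie': np.array([0.95, 0.3]),
    'period_drama': np.array([0.1, 0.95]),
}

for user, p_u in user_factors.items():
    for item, q_i in item_factors.items():
        # Affinity is the dot product of the user and item factor vectors
        print(f"{user:12s} x {item:13s}: {p_u @ q_i:.3f}")
```

The action fan scores the action movie at 0.915 but the period drama at only 0.28; the drama lover's scores run the other way. Training simply adjusts these vectors until such dot products match the observed ratings.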


Approach 3: Content-Based Filtering

Recommend items similar to what the user has liked, based on item features:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd

# Movie metadata
movies = pd.DataFrame({
    'title': ['The Dark Knight', 'Inception', 'Interstellar', 'The Avengers', 
              'Iron Man', 'Titanic', 'Avatar', 'Forrest Gump'],
    'description': [
        'Batman fights the Joker in Gotham City crime action superhero',
        'Dream heist thieves plant ideas into subconscious science fiction',
        'Astronauts travel wormhole space exploration science time',
        'Superheroes team up save world from alien invasion action',
        'Billionaire creates Iron Man suit tech action superhero',
        'Romance tragedy iceberg ship disaster historical',
        'Sci-fi alien world colonization action adventure CGI',
        'Simple man runs through history America drama heartwarming'
    ]
})

# Create TF-IDF matrix from descriptions
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(movies['description'])

# Compute cosine similarity between all movies
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

def get_content_recommendations(title, movies_df, sim_matrix, n=3):
    idx = movies_df[movies_df['title'] == title].index[0]
    sim_scores = list(enumerate(sim_matrix[idx]))
    # Exclude the query movie by index (safer than assuming it sorts first)
    sim_scores = [(i, s) for i, s in sim_scores if i != idx]
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)[:n]
    
    recommendations = []
    for movie_idx, score in sim_scores:
        recommendations.append({
            'title': movies_df.iloc[movie_idx]['title'],
            'similarity': score
        })
    return recommendations

recs = get_content_recommendations('The Dark Knight', movies, cosine_sim)
print("Because you watched The Dark Knight:")
for rec in recs:
    print(f"  {rec['title']} (similarity: {rec['similarity']:.3f})")
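The function above works from a single seed title. To recommend items similar to everything a user has liked, one common approach is to average the TF-IDF vectors of the liked titles into a user profile and rank the remaining titles by cosine similarity to that profile. A minimal sketch, assuming the user liked the two titles in `liked`; the helper name `profile_recommendations` is illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

movies = pd.DataFrame({
    'title': ['The Dark Knight', 'Inception', 'Interstellar', 'The Avengers',
              'Iron Man', 'Titanic', 'Avatar', 'Forrest Gump'],
    'description': [
        'Batman fights the Joker in Gotham City crime action superhero',
        'Dream heist thieves plant ideas into subconscious science fiction',
        'Astronauts travel wormhole space exploration science time',
        'Superheroes team up save world from alien invasion action',
        'Billionaire creates Iron Man suit tech action superhero',
        'Romance tragedy iceberg ship disaster historical',
        'Sci-fi alien world colonization action adventure CGI',
        'Simple man runs through history America drama heartwarming'
    ]
})
tfidf_matrix = TfidfVectorizer(stop_words='english').fit_transform(movies['description'])

def profile_recommendations(liked_titles, movies_df, item_vectors, n=3):
    """Rank unseen titles by cosine similarity to the mean vector of liked titles."""
    liked_mask = movies_df['title'].isin(liked_titles).to_numpy()
    # User profile = average TF-IDF vector of everything the user liked
    profile = np.asarray(item_vectors[np.flatnonzero(liked_mask)].mean(axis=0))
    scores = cosine_similarity(profile, item_vectors).ravel()
    scores[liked_mask] = -np.inf  # never re-recommend titles already liked
    top = np.argsort(scores)[::-1][:n]
    return [(movies_df['title'].iloc[i], float(scores[i])) for i in top]

liked = ['The Dark Knight', 'Interstellar']
print("Based on everything you liked:")
for title, score in profile_recommendations(liked, movies, tfidf_matrix):
    print(f"  {title} (similarity: {score:.3f})")
```

Averaging is the simplest profile; weighting each liked title by the user's rating or by recency are natural refinements.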

Approach 4: Neural Collaborative Filtering

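Deep learning can capture non-linear user-item interactions that the plain dot product of matrix factorization cannot. The model below is the MLP form of Neural Collaborative Filtering: user and item embeddings are concatenated and passed through a stack of fully connected layers: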

import torch
import torch.nn as nn

class NeuralCF(nn.Module):
    def __init__(self, n_users, n_items, embedding_dim=64, hidden_layers=(128, 64, 32)):
        super().__init__()
        
        # Embedding layers
        self.user_embedding = nn.Embedding(n_users, embedding_dim)
        self.item_embedding = nn.Embedding(n_items, embedding_dim)
        
        # MLP layers
        layers = []
        input_size = embedding_dim * 2
        for hidden_size in hidden_layers:
            layers.extend([
                nn.Linear(input_size, hidden_size),
                nn.ReLU(),
                nn.Dropout(0.2)
            ])
            input_size = hidden_size
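        layers.append(nn.Linear(input_size, 1))
        # Sigmoid outputs a score in (0, 1): suited to implicit feedback
        # (interacted / not); rescale if you train on explicit 1-5 ratings
        layers.append(nn.Sigmoid())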
        
        self.mlp = nn.Sequential(*layers)
    
    def forward(self, user_ids, item_ids):
        user_embed = self.user_embedding(user_ids)
        item_embed = self.item_embedding(item_ids)
        
        # Concatenate user and item embeddings
        combined = torch.cat([user_embed, item_embed], dim=1)
        
        # Pass through MLP
        rating_pred = self.mlp(combined)
        return rating_pred.squeeze(-1)  # shape (batch,), even when batch size is 1

# Initialize
n_users, n_items = 1000, 5000
model = NeuralCF(n_users, n_items, embedding_dim=64)
print(f"Total parameters: {sum(p.numel() for p in model.parameters()):,}")

Building a Complete System: Architecture

A production recommendation system has more than just a model:

Data Collection Layer
├── User interactions (clicks, purchases, ratings, time spent)
├── User profile data (demographics, preferences)
├── Item metadata (content features)
└── Contextual data (time, device, location)

Feature Store
├── User features (precomputed, updated regularly)
├── Item features (precomputed at item creation)
└── Real-time features (current session behavior)

Model Layer
├── Candidate Generation (narrow from millions to thousands)
│   └── Collaborative filtering / ANN search
├── Ranking (score and order the candidates)
│   └── Neural ranking model
└── Re-ranking (business logic filters)
    └── Remove recently purchased, age restrictions, etc.

Serving Layer
├── Low-latency API (<50ms target)
├── Caching (popular recommendations cached)
└── A/B testing framework

Monitoring
├── Recommendation quality metrics
├── Business metrics (CTR, conversion, revenue)
└── Drift detection (distribution shifts in user behavior)
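The Model Layer's candidate generation, ranking, and re-ranking flow can be sketched in a few lines. In this sketch random embeddings stand in for trained ones, an exact dot-product top-k stands in for an ANN index (libraries such as FAISS fill that role at scale), a toy linear score stands in for the neural ranking model, and re-ranking drops recently purchased items. All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, dim = 100_000, 32

# Hypothetical trained embeddings (random here, for illustration only)
item_emb = rng.normal(size=(n_items, dim)).astype(np.float32)
user_emb = rng.normal(size=dim).astype(np.float32)
item_popularity = rng.random(n_items)  # stand-in ranking feature
recently_purchased = {int(i) for i in rng.choice(n_items, 50, replace=False)}

def generate_candidates(user_vec, item_matrix, k=1000):
    """Stage 1: narrow all items to k by dot product (exact here; ANN in production)."""
    scores = item_matrix @ user_vec
    top = np.argpartition(-scores, k)[:k]  # unordered top-k in O(n)
    return top, scores[top]

def rank(candidates, retrieval_scores, popularity, weight=0.1):
    """Stage 2: rescore the candidates with extra features (toy linear ranker)."""
    final = retrieval_scores + weight * popularity[candidates]
    order = np.argsort(-final)
    return candidates[order], final[order]

def rerank(ranked_items, exclude, n=10):
    """Stage 3: apply business rules, here dropping recently purchased items."""
    return [int(i) for i in ranked_items if int(i) not in exclude][:n]

cands, cand_scores = generate_candidates(user_emb, item_emb)
ranked, _ = rank(cands, cand_scores, item_popularity)
print(rerank(ranked, recently_purchased))
```

The split exists because the expensive ranker only ever sees the 1,000 candidates, not the full catalog, which is what keeps the serving layer within a tight latency budget.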

Evaluation Metrics

import numpy as np

def precision_at_k(recommended, relevant, k):
    """Fraction of top-k recommendations that are relevant"""
    top_k = recommended[:k]
    relevant_set = set(relevant)
    hits = sum(1 for item in top_k if item in relevant_set)
    return hits / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k recommendations"""
    top_k = recommended[:k]
    relevant_set = set(relevant)
    hits = sum(1 for item in top_k if item in relevant_set)
    return hits / len(relevant_set) if relevant_set else 0

def ndcg_at_k(recommended, relevant, k):
    """Normalized Discounted Cumulative Gain"""
    top_k = recommended[:k]
    relevant_set = set(relevant)
    
    dcg = sum(1/np.log2(i+2) for i, item in enumerate(top_k) if item in relevant_set)
    ideal_dcg = sum(1/np.log2(i+2) for i in range(min(len(relevant_set), k)))
    
    return dcg / ideal_dcg if ideal_dcg > 0 else 0

# Example evaluation
recommended = ['movie1', 'movie5', 'movie3', 'movie7', 'movie2']
actually_watched = ['movie1', 'movie2', 'movie6', 'movie3']

print(f"Precision@3: {precision_at_k(recommended, actually_watched, 3):.3f}")
print(f"Recall@3:    {recall_at_k(recommended, actually_watched, 3):.3f}")
print(f"NDCG@5:      {ndcg_at_k(recommended, actually_watched, 5):.3f}")
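In practice these metrics are computed per user and then averaged over a held-out test set. A minimal sketch reusing the same formulas; the `recommendations` and `held_out` dictionaries are made-up stand-ins for a model's top-k lists and each user's actually watched items.

```python
import numpy as np

def recall_at_k(recommended, relevant, k):
    relevant_set = set(relevant)
    hits = sum(1 for item in recommended[:k] if item in relevant_set)
    return hits / len(relevant_set) if relevant_set else 0.0

def ndcg_at_k(recommended, relevant, k):
    relevant_set = set(relevant)
    dcg = sum(1 / np.log2(i + 2) for i, item in enumerate(recommended[:k]) if item in relevant_set)
    ideal = sum(1 / np.log2(i + 2) for i in range(min(len(relevant_set), k)))
    return dcg / ideal if ideal > 0 else 0.0

# Hypothetical model output and held-out ground truth per user
recommendations = {
    'user_a': ['movie1', 'movie5', 'movie3'],
    'user_b': ['movie2', 'movie4', 'movie6'],
    'user_c': ['movie7', 'movie8', 'movie9'],
}
held_out = {
    'user_a': ['movie1', 'movie3'],
    'user_b': ['movie6'],
    'user_c': ['movie1'],
}

k = 3
mean_recall = np.mean([recall_at_k(recommendations[u], held_out[u], k) for u in held_out])
mean_ndcg = np.mean([ndcg_at_k(recommendations[u], held_out[u], k) for u in held_out])
print(f"Mean Recall@{k}: {mean_recall:.3f}")
print(f"Mean NDCG@{k}:   {mean_ndcg:.3f}")
```

user_b's single hit lands in third place, so it counts fully toward recall but only half toward NDCG; that position sensitivity is the reason to report NDCG alongside recall.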

Comparison Table

| Approach | Best For | Limitation | Complexity |
|---|---|---|---|
| User-based CF | Small datasets, interpretability | Scalability, cold start | Low |
| Item-based CF | Stable item catalogs | Cold start for new items | Medium |
| Matrix Factorization (SVD) | Large rating datasets | Cold start | Medium |
| Content-based | New items, few users | Limited to content features | Medium |
| Neural CF | Large datasets, complex patterns | Needs lots of data and compute | High |
| Hybrid | Production systems | Implementation complexity | High |

Conclusion

Recommendation systems sit at the intersection of practical impact and interesting ML challenges. They've driven billions in e-commerce revenue and billions of hours of engagement, while continuously pushing research in cold start, scalability, and beyond-accuracy goals like diversity and serendipity.

For most teams building their first recommendation system, start with matrix factorization (the Surprise library, or ALS in Spark) for the collaborative signal and add content-based features to handle cold start. Neural approaches can deliver better results at scale but require significantly more data and infrastructure.

For the foundational ML skills, see our scikit-learn tutorial and neural networks explained guide.


Frequently Asked Questions

How does Netflix's recommendation system work?

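A hybrid system combining collaborative filtering (users with similar viewing histories), content-based filtering (similar genres, cast, and themes), matrix factorization (latent taste dimensions), contextual signals such as time of day and device type, and deep neural networks that merge these signals. Netflix estimates its recommendation system is worth $1 billion annually in reduced churn, because better recommendations keep subscribers from canceling.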

What is the cold start problem?

Difficulty making good recommendations for new users (no interaction history) or new items (no ratings). Solutions: onboarding questions for new users, content-based features for new items, popularity-based fallbacks. Pure collaborative filtering cannot handle cold start — it needs a hybrid approach.
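A popularity fallback is the simplest of these solutions to sketch: when a user has too little history, serve the most-interacted items they haven't seen, and switch to the personalized model once they cross a threshold. The interaction log, the `min_history` threshold, and the `personalized_model` placeholder below are hypothetical.

```python
from collections import Counter

# (user, item) interaction log; in practice this comes from the data collection layer
interactions = [
    ('u1', 'A'), ('u1', 'B'), ('u2', 'A'), ('u2', 'C'),
    ('u3', 'A'), ('u3', 'B'), ('u4', 'A'),
]

def personalized_model(user, n):
    """Hypothetical placeholder for a CF / matrix factorization recommender."""
    raise NotImplementedError("plug in a trained model here")

def recommend(user, interactions, model, n=2, min_history=3):
    """Fall back to global popularity for users with fewer than min_history items."""
    seen = {item for u, item in interactions if u == user}
    if len(seen) >= min_history:
        return model(user, n)
    popularity = Counter(item for _, item in interactions)
    return [item for item, _ in popularity.most_common() if item not in seen][:n]

print(recommend('new_user', interactions, personalized_model))  # no history at all
print(recommend('u4', interactions, personalized_model))        # one interaction
```

The brand-new user gets the two most popular items, A and B, while u4, who has already interacted with A, gets B and C instead.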

What is matrix factorization and how does it work?

It decomposes the user-item matrix into user latent factors and item latent factors. The dot product of a user's factor vector and an item's factor vector predicts that user's rating of the item. Because the factors are learned only from observed ratings, these compact representations can predict ratings for user-item pairs that were never observed.

Should I use collaborative filtering or content-based filtering?

Use a hybrid. Collaborative filtering works well with dense interaction data and captures subjective quality; content-based filtering handles new items and produces explainable recommendations. Together they compensate for each other's weaknesses, which is why all major production systems (Netflix, Amazon, Spotify) use hybrid approaches.
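One of the simplest ways to combine the two is a weighted hybrid: put both scores on a common 0-1 scale and blend them, leaning on content similarity for items with few ratings. The sketch assumes you already have a CF predicted rating and a content similarity per candidate item; the scores, the `blend` function, and the rule of ramping the CF weight with an item's rating count are illustrative choices, not any particular company's method.

```python
import numpy as np

def blend(cf_ratings, content_sims, n_ratings, ramp=20, rating_scale=(1, 5)):
    """Weighted hybrid: trust CF more as an item accumulates ratings.

    cf_ratings: predicted ratings on rating_scale; content_sims: cosine similarities in [0, 1].
    """
    lo, hi = rating_scale
    cf = (np.asarray(cf_ratings, dtype=float) - lo) / (hi - lo)  # rescale to [0, 1]
    content = np.asarray(content_sims, dtype=float)
    alpha = np.minimum(np.asarray(n_ratings, dtype=float) / ramp, 1.0)  # CF weight in [0, 1]
    return alpha * cf + (1 - alpha) * content

items = ['old_hit', 'steady_seller', 'brand_new']
cf_ratings = [4.6, 3.9, 3.0]       # hypothetical; 3.0 for brand_new is only a default
content_sims = [0.20, 0.35, 0.85]  # hypothetical similarities to the user's profile
n_ratings = [500, 10, 0]           # brand_new has no ratings yet (cold start)

scores = blend(cf_ratings, content_sims, n_ratings)
for item, score in sorted(zip(items, scores), key=lambda x: x[1], reverse=True):
    print(f"{item:14s} {score:.3f}")
```

With zero ratings, brand_new is scored purely on content and still reaches second place, while old_hit is scored purely by CF; the ramp length controls how quickly a new item hands over from one signal to the other.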

How do I evaluate a recommendation system?

Offline: Precision@K, Recall@K, NDCG. Online (A/B test): click-through rate, conversion, long-term engagement. Important caveat: offline metrics often don't predict online performance. Validate with A/B tests before deploying changes.
