AiTechWorlds
Supervised, unsupervised, and reinforcement learning explained with sklearn examples, evaluation metrics, and a decision framework.
| Paradigm | Has Labels? | Learns From | Goal |
|---|---|---|---|
| Supervised | Yes | (input, label) pairs | Predict label for new input |
| Unsupervised | No | Input data only | Find structure/patterns |
| Reinforcement | No (uses rewards) | Agent-environment interactions | Maximize cumulative reward |
Train a model on labeled examples (X, y) so it can predict y for unseen X.
| Type | Output | Example Algorithms | Example Problem |
|---|---|---|---|
| Classification | Discrete class | LogReg, SVM, RF, XGBoost | Spam detection, image classification |
| Regression | Continuous value | Linear Regression, SVR, Gradient Boosting | House price, stock prediction |
| Sequence labeling | Per-token label | CRF, BiLSTM, BERT fine-tuned | NER, POS tagging |
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# X, y: feature matrix and label vector, assumed to be loaded already

# Hold out 20% of the data for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Per-class precision, recall, and F1 on the held-out set
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
```

| Task | Metrics |
|---|---|
| Binary classification | Accuracy, Precision, Recall, F1, AUC-ROC |
| Multiclass | Macro/weighted F1, Confusion Matrix |
| Regression | MAE, MSE, RMSE, R² |
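
The regression metrics in the table can be computed with `sklearn.metrics`. A minimal sketch, assuming a synthetic dataset from `make_regression` and a plain `LinearRegression` model (both chosen only for illustration, not taken from the note):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration; substitute your own features and targets)
X_reg, y_reg = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    X_reg, y_reg, test_size=0.2, random_state=42
)

reg = LinearRegression().fit(Xr_train, yr_train)
yr_pred = reg.predict(Xr_test)

mae = mean_absolute_error(yr_test, yr_pred)
mse = mean_squared_error(yr_test, yr_pred)
rmse = float(np.sqrt(mse))  # RMSE is the square root of MSE
r2 = r2_score(yr_test, yr_pred)

print(f"MAE={mae:.2f}  MSE={mse:.2f}  RMSE={rmse:.2f}  R2={r2:.3f}")
```

MAE and RMSE are in the units of the target, which makes them easier to interpret than MSE; R² is unitless and at most 1.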
Find structure in unlabeled data; no ground-truth labels are provided.
| Type | Goal | Algorithms |
|---|---|---|
| Clustering | Group similar data points | K-Means, DBSCAN, Agglomerative |
| Dimensionality Reduction | Compress features | PCA, t-SNE, UMAP |
| Anomaly Detection | Identify outliers | Isolation Forest, One-Class SVM |
| Generative Models | Learn data distribution | VAE, GAN, Diffusion |
| Association Rules | Find co-occurrence patterns | Apriori, FP-Growth |
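
The dimensionality reduction row in the table above can be sketched with PCA. This example assumes the Iris dataset bundled with scikit-learn, used only as a convenient stand-in for unlabeled features; the variable names are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Iris features only; the labels are deliberately ignored (illustrative dataset)
X_demo = load_iris().data                        # shape (150, 4)
X_demo_scaled = StandardScaler().fit_transform(X_demo)

# Project the 4 standardized features onto the top 2 principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_demo_scaled)          # shape (150, 2)

# Fraction of total variance retained by each component
print(X_2d.shape)
print(pca.explained_variance_ratio_)
```

Scaling first matters because PCA is driven by variance, so an unscaled feature with a large range would dominate the components.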
```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# X: feature matrix, assumed to be loaded already
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

kmeans = KMeans(n_clusters=4, random_state=42, n_init='auto')
labels = kmeans.fit_predict(X_scaled)

# Evaluate cluster quality (no ground truth needed)
score = silhouette_score(X_scaled, labels)
print(f"Silhouette: {score:.3f}")  # range -1 to 1, higher is better
```

```python
# Elbow method
inertias = []
for k in range(2, 11):
    km = KMeans(n_clusters=k, random_state=42, n_init='auto')
    km.fit(X_scaled)
    inertias.append(km.inertia_)
# Plot inertias and choose k at the "elbow" (diminishing returns)
```

An agent takes actions in an environment, receives rewards, and learns a policy that maximizes cumulative reward over time.
| Term | Meaning |
|---|---|
| Agent | The learner/decision-maker |
| Environment | The world the agent interacts with |
| State (s) | Current situation |
| Action (a) | Possible moves the agent can take |
| Reward (r) | Feedback signal after action |
| Policy (π) | Strategy: state → action mapping |
| Value function (V) | Expected cumulative future reward from a state |
| Q-function | Expected cumulative future reward from taking an action in a state |
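
The cumulative reward that the value function and Q-function estimate is usually a discounted sum: G = r_0 + γ·r_1 + γ²·r_2 + ... A minimal sketch, assuming a hypothetical reward sequence and discount factor `gamma` (neither comes from the note):

```python
def discounted_return(rewards, gamma=0.9):
    """Return sum over k of gamma**k * rewards[k], accumulated in one backward pass."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Hypothetical episode: three steps with reward 1 each, so G = 1 + 0.9 + 0.81
episode_rewards = [1.0, 1.0, 1.0]
print(f"{discounted_return(episode_rewards, gamma=0.9):.2f}")  # prints 2.71
```

A `gamma` close to 0 makes the agent short-sighted, while a value close to 1 weights distant rewards almost as much as immediate ones.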
| Algorithm | Type | Use Case |
|---|---|---|
| Q-Learning | Model-free, off-policy | Discrete action spaces |
| DQN | Value-based, off-policy (deep Q-network) | Atari games, simple control |
| PPO | Policy gradient, on-policy | Continuous control, RLHF |
| SAC | Actor-critic, off-policy (maximum entropy) | Robotics, complex envs |
| A3C/A2C | Actor-critic | Parallel training |
| AlphaZero | MCTS + self-play | Board games |
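```python
import numpy as np

# Assumes `env` is an existing environment with discrete states and actions
# that follows the classic Gym API (gym < 0.26): reset() returns a state and
# step() returns (next_state, reward, done, info).
n_states = env.observation_space.n
n_actions = env.action_space.n

# Illustrative hyperparameters (not tuned)
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # learning rate, discount, exploration rate
max_steps = 100

# Q-table: states × actions
Q = np.zeros((n_states, n_actions))

# Training loop
for episode in range(1000):
    state = env.reset()
    for step in range(max_steps):
        # Epsilon-greedy action selection
        if np.random.rand() < epsilon:
            action = env.action_space.sample()  # explore
        else:
            action = np.argmax(Q[state])        # exploit

        next_state, reward, done, _ = env.step(action)

        # Bellman update
        Q[state, action] += alpha * (
            reward + gamma * np.max(Q[next_state]) - Q[state, action]
        )

        state = next_state
        if done:
            break
```

| Type | Description | Example |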
|---|---|---|
| Semi-supervised | Small labeled set + large unlabeled set | Label propagation, pseudo-labeling |
| Self-supervised | Create labels from data structure itself | BERT (masked token prediction), SimCLR |
| Transfer learning | Pre-train on one task, fine-tune on another | ImageNet → medical imaging |
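
Pseudo-labeling from the semi-supervised row can be sketched with scikit-learn's `SelfTrainingClassifier`. This assumes the Iris dataset with about 80% of its labels hidden; the dataset, base classifier, and threshold are illustrative choices, not part of the note. The base classifier is passed positionally because the name of that keyword argument differs between scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X_all, y_true = load_iris(return_X_y=True)

# Hide about 80% of the labels; scikit-learn marks unlabeled samples with -1
rng = np.random.default_rng(42)
y_partial = y_true.copy()
unlabeled = rng.random(len(y_true)) < 0.8
y_partial[unlabeled] = -1

# The base classifier must implement predict_proba; confident predictions
# (probability >= threshold) are added to the training set as pseudo-labels
self_training = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.9)
self_training.fit(X_all, y_partial)

# Accuracy on the samples whose labels were hidden during training
hidden_pred = self_training.predict(X_all[unlabeled])
acc = float((hidden_pred == y_true[unlabeled]).mean())
print(f"Labeled: {int((~unlabeled).sum())}, hidden: {int(unlabeled.sum())}, accuracy on hidden: {acc:.2f}")
```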
```text
Do you have labeled data?
├─ Yes → Supervised learning
│  ├─ Output is a category? → Classification
│  └─ Output is a number? → Regression
│
└─ No → What is your goal?
   ├─ Find groups → Clustering (K-Means, DBSCAN)
   ├─ Detect outliers → Anomaly detection
   ├─ Reduce features → PCA, UMAP
   ├─ Sequential decisions → Reinforcement learning
   └─ Generate new data → Generative models (GAN, VAE)
```
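
The decision framework can also be written as a small lookup function. This is only a sketch: the function name `suggest_paradigm`, its parameters, and the goal keys are hypothetical names introduced here for illustration.

```python
# Goals for the "no labels" branch, mapped to the approaches listed in the framework
GOALS = {
    "find_groups": "Clustering (K-Means, DBSCAN)",
    "detect_outliers": "Anomaly detection",
    "reduce_features": "PCA, UMAP",
    "sequential_decisions": "Reinforcement learning",
    "generate_data": "Generative models (GAN, VAE)",
}

def suggest_paradigm(has_labels, output_kind=None, goal=None):
    """Map the framework's questions to a suggested approach.

    output_kind: "category" or "number" (used when has_labels is True)
    goal: one of the keys in GOALS (used when has_labels is False)
    """
    if has_labels:
        if output_kind == "category":
            return "Classification"
        if output_kind == "number":
            return "Regression"
        return "Supervised learning"
    return GOALS.get(goal, "Unclear: define the goal first")

print(suggest_paradigm(True, output_kind="number"))
print(suggest_paradigm(False, goal="detect_outliers"))
```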