AiTechWorlds
Picture a student preparing for a national exam. Every night she drills the past papers from one specific school — same format, same question styles, same tricks. On exam day from that school, she scores 98%. Then she sits a paper from a different school. The format shifts, the phrasing changes, and she scores 61%.
She did not learn the subject. She memorized one version of it.
This is exactly what happens when you evaluate a machine learning model on the same data it was trained on, or even on a single fixed test split that you peeked at during development. The model learns the quirks of your particular split, not the underlying pattern. Cross-validation is the answer — it forces the model to prove itself on multiple unseen exam papers.
When you randomly split your data once (say 80/20), you introduce randomness into your evaluation. That specific 20% might be unusually easy or unusually hard. If you tune hyperparameters based on this single split, you are now leaking information about the test set into your development process.
The result: your reported accuracy is optimistic. The model has been — subtly or directly — fit to that test set. In the real world, performance drops.
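To see how much a single split can move the reported score, here is a minimal sketch that repeats an 80/20 split with ten different random seeds on the breast cancer dataset and prints the spread. The seed range, the pipeline, and the name `split_scores` are illustrative assumptions, not part of the original notes.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

split_scores = []
for seed in range(10):  # ten different random 80/20 splits (assumed count)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    # The scaler lives inside the pipeline, so it is fit on the training part only
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    clf.fit(X_tr, y_tr)
    split_scores.append(clf.score(X_te, y_te))

print(f"lowest: {min(split_scores):.4f} | highest: {max(split_scores):.4f}")
```

The same model and the same data give a different accuracy for each seed. Whichever single number you happened to report would depend on the luck of the split.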
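K-Fold solves this by rotating the test set. The algorithm:
1. Shuffle the data and split it into K folds of (nearly) equal size.
2. For each fold i: hold out fold i as the test set, train on the remaining K-1 folds, and record the score.
3. Average the K scores.

Every data point serves as a test sample exactly once. The standard choices are K=5 (fast, still reliable) or K=10 (more accurate estimate, slower). The mean score is a much more trustworthy performance estimate than any single split.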
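The rotation can be sketched in a few lines of plain Python. The helper `k_fold_indices` is a hypothetical name written for illustration. It uses contiguous folds without shuffling, so in practice you would shuffle first or use scikit-learn's `KFold`.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread the remainder so fold sizes differ by at most one sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Count how often each of 10 samples lands in a test fold when K=5
test_counts = [0] * 10
for train_idx, test_idx in k_fold_indices(n_samples=10, k=5):
    print("train:", train_idx, "| test:", test_idx)
    for i in test_idx:
        test_counts[i] += 1

print(test_counts)  # every entry is 1: each sample is tested exactly once
```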
Regular K-Fold assigns samples to folds without looking at the labels. With imbalanced data (e.g., 95% class 0, 5% class 1), a fold might contain zero or very few minority-class samples, which makes the score on that fold meaningless.
Stratified K-Fold preserves the class ratio in every fold. If the overall dataset is 95/5, each fold will also be approximately 95/5. Always use stratified K-Fold for classification problems.
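A simplified sketch of the idea, assuming a round-robin assignment within each class. The function `stratified_folds` is hypothetical and is not how scikit-learn implements `StratifiedKFold` internally. It only illustrates why the ratio is preserved.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k):
    """Assign each sample index to one of k folds, class by class (round-robin)."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Dealing each class out like cards keeps its share equal across folds
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


labels = [0] * 95 + [1] * 5  # the 95/5 imbalance from the text
folds = stratified_folds(labels, k=5)
for i, fold in enumerate(folds):
    print(f"fold {i}:", dict(Counter(labels[j] for j in fold)))
# Each fold holds 19 samples of class 0 and 1 sample of class 1
```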
This is one of the most important concepts in all of machine learning. Apart from irreducible noise in the data itself, a model's error on unseen data comes from two sources:
Bias (Underfitting): The model is too simple to capture the true pattern. A linear model trying to fit a curved relationship will always be wrong — not because of the data, but because of its own rigid assumptions. High bias means high training error.
Variance (Overfitting): The model is too complex and memorizes the training data — including its noise. It fits the training set perfectly but fails on anything new. High variance means low training error but high test error.
The tradeoff: as you increase model complexity, bias decreases (good) but variance increases (bad). The sweet spot is a model complex enough to capture real patterns, but not so complex that it memorizes noise.
| Model State | Training Error | Validation Error | Fix |
|---|---|---|---|
| Underfitting | High | High | More complex model, more features |
| Good fit | Low | Low | You're done |
| Overfitting | Very Low | High | Regularization, more data, simpler model |
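
The rows of this table can be reproduced on synthetic data. The sketch below fits polynomials of degree 1, 3, and 12 to 20 noisy samples of a sine wave and compares training and validation error. The data generator, noise level, sample sizes, and degrees are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_data(n):
    """Noisy samples of one period of a sine wave (assumed toy problem)."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
    return x, y


x_train, y_train = make_data(20)
x_val, y_val = make_data(200)

errors = {}
for degree in (1, 3, 12):  # too simple, reasonable, very flexible
    poly = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    train_mse = float(np.mean((poly(x_train) - y_train) ** 2))
    val_mse = float(np.mean((poly(x_val) - y_val) ** 2))
    errors[degree] = (train_mse, val_mse)
    print(f"degree={degree:2d} | train MSE={train_mse:.3f} | val MSE={val_mse:.3f}")
```

Training error falls as the degree rises. The straight line should show high error on both sets (underfitting), while the degree-12 fit typically shows a very low training error paired with a noticeably worse validation error (overfitting).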
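A learning curve plots training and validation error as a function of training set size. It is one of the most useful diagnostics for bias versus variance: two curves that converge at a high error point to bias, while a persistent gap between a low training error and a higher validation error points to variance.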
```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, StratifiedKFold, learning_curve
import matplotlib.pyplot as plt

# Load data
X, y = load_breast_cancer(return_X_y=True)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
model = LogisticRegression(max_iter=1000)

# --- 5-Fold CV ---
# Note: with an integer cv and a classifier, scikit-learn uses
# StratifiedKFold (without shuffling) rather than plain KFold.
cv_scores = cross_val_score(model, X_scaled, y, cv=5, scoring='accuracy')
print(f"5-Fold CV Accuracy: {cv_scores}")
print(f"Mean: {cv_scores.mean():.4f} | Std: {cv_scores.std():.4f}")
# Output:
# 5-Fold CV Accuracy: [0.9561 0.9737 0.9649 0.9561 0.9823]
# Mean: 0.9666 | Std: 0.0096

# --- Stratified K-Fold (shuffled, K=10) ---
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
strat_scores = cross_val_score(model, X_scaled, y, cv=skf, scoring='f1')
print(f"\n10-Fold Stratified F1: {strat_scores.mean():.4f} ± {strat_scores.std():.4f}")
# Output: 10-Fold Stratified F1: 0.9681 ± 0.0138

# --- Learning Curve ---
train_sizes, train_scores, val_scores = learning_curve(
    model, X_scaled, y, cv=skf,
    train_sizes=np.linspace(0.1, 1.0, 10),
    scoring='accuracy', n_jobs=-1
)

# Average over the CV folds (axis 1) for each training-set size
train_mean, train_std = train_scores.mean(axis=1), train_scores.std(axis=1)
val_mean, val_std = val_scores.mean(axis=1), val_scores.std(axis=1)

plt.figure(figsize=(8, 5))
plt.plot(train_sizes, train_mean, 'o-', label='Training Accuracy', color='steelblue')
plt.plot(train_sizes, val_mean, 'o-', label='Validation Accuracy', color='tomato')
# Shaded bands show one standard deviation across folds
plt.fill_between(train_sizes, train_mean - train_std, train_mean + train_std,
                 alpha=0.1, color='steelblue')
plt.fill_between(train_sizes, val_mean - val_std, val_mean + val_std,
                 alpha=0.1, color='tomato')
plt.xlabel('Training Set Size')
plt.ylabel('Accuracy')
plt.title('Learning Curve: Logistic Regression (Breast Cancer)')
plt.legend()
plt.tight_layout()
plt.savefig('learning_curve.png', dpi=150)
plt.show()
# The two curves converge → this model is well-fitted, not overfitting
```
When you tune hyperparameters, you need cross-validation during the tuning — not just after. GridSearchCV does both:
```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    'C': [0.01, 0.1, 1, 10, 100],
    'gamma': ['scale', 'auto'],
    'kernel': ['rbf', 'linear']
}

grid_search = GridSearchCV(
    SVC(), param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    scoring='f1', n_jobs=-1, verbose=1
)
grid_search.fit(X_scaled, y)

print(f"Best params: {grid_search.best_params_}")
print(f"Best CV F1: {grid_search.best_score_:.4f}")
# Output:
# Best params: {'C': 10, 'gamma': 'scale', 'kernel': 'rbf'}
# Best CV F1: 0.9789
```
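GridSearchCV tests every combination of parameters using K-Fold internally. The best parameters are those that perform best on the held-out folds, not just on the training data. Two caveats apply to the example above. First, the scaler was fit on the full dataset before any splitting, so a small amount of information still leaks across folds. Second, best_score_ is slightly optimistic because the same folds were used to pick the winner. For a final, unbiased estimate, keep a separate test set that the search never sees.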
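One way to keep preprocessing inside the folds is to wrap the scaler and the model in a `Pipeline` and to reserve a test set before tuning. This is a sketch under assumptions: the reduced grid, the 80/20 development/test split, and the step names `scaler` and `svc` are illustrative choices, not the course's prescribed setup.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Reserve a final test set that the search never touches
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The scaler is refit on the training folds of every CV split
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# Parameters of a pipeline step are addressed as <step name>__<parameter>
param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["rbf", "linear"]}

search = GridSearchCV(
    pipe, param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    scoring="f1",
)
search.fit(X_dev, y_dev)

test_f1 = search.score(X_test, y_test)  # F1 of the refit best pipeline
print(f"Best params: {search.best_params_}")
print(f"CV F1: {search.best_score_:.4f} | held-out test F1: {test_f1:.4f}")
```

The cross-validated score guides the choice of parameters. The held-out test score is the number to report.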