AiTechWorlds
Picture a student preparing for an exam. You notice that students who study more hours tend to score higher. If you plot hours on the x-axis and scores on the y-axis, the points form a rough upward-sloping cloud. Draw the best possible straight line through that cloud, and you have linear regression.
The line lets you make predictions: "A student who studies 7 hours will probably score around 78." You're not memorizing all 30 students' exact results — you're learning the underlying relationship. That ability to generalize from examples to predictions is the heart of machine learning.
Linear regression is one of the simplest supervised learning algorithms, and many more complex methods build on its ideas. Understanding it well builds intuition for neural networks, regularization, and beyond.
The core idea is to find a line — or a hyperplane in multiple dimensions — that best explains the relationship between input features and a continuous output. "Best" means the line that minimizes prediction error across all training examples.
The equation for simple linear regression is:
y = mx + b
Where y is the predicted output, x is the input feature, m is the slope (how much y changes per unit of x), and b is the intercept (the y-value when x is zero).
The model learns m and b by minimizing Mean Squared Error (MSE):
MSE = (1/n) * Σ (y_actual - y_predicted)²
Squaring the errors stops positive and negative errors from cancelling out, penalizes large mistakes more than small ones, and keeps the loss differentiable everywhere, so gradient descent can be applied.
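To make the MSE formula concrete, here is a minimal sketch that scores two candidate lines against the same data. The toy data, the helper name `mse`, and both candidate lines are made up for illustration.

```python
import numpy as np

def mse(y_actual, y_predicted):
    """Mean squared error: the average of the squared residuals."""
    y_actual = np.asarray(y_actual, dtype=float)
    y_predicted = np.asarray(y_predicted, dtype=float)
    return np.mean((y_actual - y_predicted) ** 2)

# Hypothetical toy data: hours studied and exam scores
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([40.0, 45.0, 50.0, 60.0])

# Two candidate lines of the form y = m*x + b
pred_a = 5.0 * x + 35.0    # misses only the last point, by 5
pred_b = 10.0 * x + 20.0   # misses the first two points, by 10 and 5

mse_a = mse(y, pred_a)
mse_b = mse(y, pred_b)
print(f"Line A MSE: {mse_a:.2f}")  # 6.25
print(f"Line B MSE: {mse_b:.2f}")  # 31.25
```

Line A has the lower MSE, so by this criterion it is the better fit. Fitting a model amounts to searching for the m and b with the lowest possible value.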
Imagine you are blindfolded on a hilly landscape, and your goal is to reach the lowest valley. You can only feel the slope beneath your feet. At each step, you move in the downhill direction. The size of your step is the learning rate. Too large a step and you overshoot the valley. Too small and it takes forever. Gradient descent is exactly this process — iteratively adjusting the model parameters in the direction that reduces error.
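The downhill walk can be sketched in a few lines of NumPy. This is an illustrative implementation for a single feature, not what any particular library does internally. The function name `fit_line_gd`, the learning rate, the step count, and the noise-free data are all assumptions chosen so the example converges.

```python
import numpy as np

def fit_line_gd(x, y, learning_rate=0.01, n_steps=10000):
    """Fit y = m*x + b by batch gradient descent on the MSE."""
    m, b = 0.0, 0.0          # start from an arbitrary point on the "landscape"
    n = len(x)
    for _ in range(n_steps):
        error = (m * x + b) - y                  # predicted minus actual
        grad_m = (2.0 / n) * np.sum(error * x)   # d(MSE)/dm
        grad_b = (2.0 / n) * np.sum(error)       # d(MSE)/db
        m -= learning_rate * grad_m              # step downhill
        b -= learning_rate * grad_b
    return m, b

# Hypothetical noise-free data generated from y = 7.5x + 30
x = np.arange(1.0, 11.0)
y = 7.5 * x + 30.0

m, b = fit_line_gd(x, y)
print(f"m = {m:.3f}, b = {b:.3f}")
```

With this data a learning rate much above 0.025 makes the updates diverge, which is the "overshooting the valley" failure described above. Scaling the feature first usually allows a larger, safer step.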
Scikit-learn's LinearRegression does not use gradient descent. It solves the least-squares problem directly with an SVD-based solver, which gives the same answer as the Normal Equation whenever that equation is well defined, so there is no learning rate to tune. Gradient descent matters when the dataset is too large for a direct solve to be practical.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
# Dataset: hours studied vs exam score
np.random.seed(42)
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
                  1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 3])
scores = hours * 7.5 + 30 + np.random.normal(0, 5, len(hours))
X = hours.reshape(-1, 1)
y = scores
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# --- Scikit-learn approach ---
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"Slope (m): {model.coef_[0]:.4f}")
print(f"Intercept (b): {model.intercept_:.4f}")
print(f"MSE: {mean_squared_error(y_test, y_pred):.4f}")
print(f"RMSE: {np.sqrt(mean_squared_error(y_test, y_pred)):.4f}")
print(f"R² Score: {r2_score(y_test, y_pred):.4f}")
Output:
Slope (m): 7.6231
Intercept (b): 29.4817
MSE: 22.3156
RMSE: 4.7240
R² Score: 0.9312
# --- From-scratch NumPy approach ---
def linear_regression_numpy(X, y):
    """Closed-form solution: w = (X^T X)^(-1) X^T y"""
    X_b = np.c_[np.ones((len(X), 1)), X]  # Add bias column
    theta = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
    return theta[0], theta[1]  # intercept, slope
intercept, slope = linear_regression_numpy(X_train, y_train)
print("\nNumPy from-scratch:")
print(f"Slope: {slope:.4f}")
print(f"Intercept: {intercept:.4f}")
# Predict and score
X_test_b = np.c_[np.ones((len(X_test), 1)), X_test]
y_pred_np = X_test_b @ np.array([intercept, slope])
print(f"R² Score: {r2_score(y_test, y_pred_np):.4f}")
Output:
NumPy from-scratch:
Slope: 7.6231
Intercept: 29.4817
R² Score: 0.9312
Both approaches produce identical results. Scikit-learn is preferred in practice; the NumPy version builds intuition.
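One caveat about the from-scratch version: explicitly inverting X^T X can be numerically fragile when features are nearly collinear. A hedged alternative is to let `np.linalg.lstsq` solve the same least-squares problem without forming the inverse. The helper name `fit_least_squares` and the noise-free data below are made up for illustration.

```python
import numpy as np

def fit_least_squares(X, y):
    """Solve min ||X_b @ theta - y||^2 without forming an explicit inverse."""
    X_b = np.c_[np.ones((len(X), 1)), X]             # add bias column
    theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)  # returns (solution, residuals, rank, singular values)
    return theta[0], theta[1:]                       # intercept, coefficients

# Hypothetical noise-free data generated from y = 7.5x + 30
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = 7.5 * X[:, 0] + 30.0

intercept, coefs = fit_least_squares(X, y)
print(f"Intercept: {intercept:.4f}")
print(f"Slope: {coefs[0]:.4f}")
```

On well-conditioned data like this, the result matches the inverse-based formula to floating-point precision. The difference only shows up on ill-conditioned problems.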
When you have more than one feature — say, hours studied AND hours of sleep — the equation extends to:
y = b0 + b1*x1 + b2*x2 + ... + bn*xn
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing(as_frame=True)
X_house = housing.data[['MedInc', 'AveRooms', 'AveOccup', 'HouseAge']]
y_house = housing.target
X_tr, X_te, y_tr, y_te = train_test_split(X_house, y_house, test_size=0.2, random_state=42)
mlr = LinearRegression()
mlr.fit(X_tr, y_tr)
coeff_df = pd.DataFrame({'Feature': X_house.columns, 'Coefficient': mlr.coef_})
print(coeff_df.to_string(index=False))
print(f"\nR² on test set: {mlr.score(X_te, y_te):.4f}")
Output:
Feature Coefficient
MedInc 0.4517
AveRooms -0.0523
AveOccup -0.0031
HouseAge 0.0098
R² on test set: 0.5143
Each coefficient tells you: holding all other features constant, how much does the target change per one-unit increase in this feature?
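A quick way to convince yourself of this interpretation is to bump one feature by one unit, keep the rest fixed, and compare predictions. The sketch below uses synthetic, noise-free data with two hypothetical features (hours studied and hours slept). The true coefficients 7.5 and 2.0 are assumptions baked into the generator.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: column 0 = hours studied, column 1 = hours slept (both hypothetical)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 30.0 + 7.5 * X[:, 0] + 2.0 * X[:, 1]

model = LinearRegression().fit(X, y)

base = np.array([[5.0, 7.0]])             # 5 hours studied, 7 hours slept
bumped = base + np.array([[1.0, 0.0]])    # one more hour studied, sleep unchanged

delta = model.predict(bumped)[0] - model.predict(base)[0]
print(f"Change in prediction: {delta:.4f}")
print(f"Coefficient for hours studied: {model.coef_[0]:.4f}")
```

The change in the prediction equals the coefficient of the feature that was bumped. Keep in mind that with correlated real-world features, "holding all else constant" may describe a combination that rarely occurs in the data.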
R² (R-squared) is the proportion of variance in the target that the model explains.
An R² of 0.93 for predicting exam scores from study hours is excellent. An R² of 0.51 for predicting California house prices from just four features is reasonable given that many important factors (location, neighborhood, school district) are omitted.
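For reference, R² can be written as R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the target from its mean. The sketch below computes it by hand on made-up numbers and compares the result with scikit-learn's `r2_score`. The helper name `r_squared` is hypothetical.

```python
import numpy as np
from sklearn.metrics import r2_score

def r_squared(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # error left after the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # error of always predicting the mean
    return 1.0 - ss_res / ss_tot

# Made-up values for illustration
y_true = np.array([40.0, 45.0, 50.0, 60.0])
y_pred = np.array([40.0, 45.0, 50.0, 55.0])

manual = r_squared(y_true, y_pred)
library = r2_score(y_true, y_pred)
print(f"Manual R²:  {manual:.4f}")   # 0.8857
print(f"sklearn R²: {library:.4f}")  # 0.8857
```

A model that always predicts the mean scores 0, and a model that does worse than that on held-out data gets a negative R².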
Linear regression works well when these conditions hold:
| Assumption | What It Means | How to Check |
|---|---|---|
| Linearity | Relationship between X and y is linear | Scatter plot, residual plot |
| Independence | Observations are independent of each other | Domain knowledge |
| Homoscedasticity | Residuals have constant variance | Residuals vs fitted values plot |
| Normality of residuals | Residuals are normally distributed | Q-Q plot |
| No multicollinearity | Features are not highly correlated with each other | VIF (Variance Inflation Factor) |
When these assumptions are violated, linear regression still often produces useful approximate results — but you should consider transformations or a different model.
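As one example of the checks in the table, the Variance Inflation Factor for feature j is 1 / (1 - R²_j), where R²_j comes from regressing feature j on all the other features. The sketch below is a minimal version built on scikit-learn. The helper name `variance_inflation_factors` and the synthetic data are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R²_j), with R²_j from regressing column j on the other columns."""
    X = np.asarray(X, dtype=float)
    vifs = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Synthetic features: c is nearly a copy of a, b is independent of both
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)

vifs = variance_inflation_factors(np.column_stack([a, b, c]))
for name, vif in zip(["a", "b", "c"], vifs):
    print(f"{name}: VIF = {vif:.1f}")
```

Here a and c get very large VIFs while b stays close to 1. A common rule of thumb treats values above roughly 5 to 10 as a sign of problematic multicollinearity, though the exact threshold is a judgment call.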
Linear regression is the right first choice when: