Logistic Regression & Classification

Logistic regression is the most important classification algorithm you'll learn, not because it's the most powerful, but because it builds the conceptual foundation for neural networks, probability outputs, and model interpretation.

Despite the name, logistic regression is a classification algorithm, not a regression algorithm. The name comes from the linear model it fits to the log-odds of the positive class.

What Logistic Regression Does

Linear regression predicts a continuous number. Logistic regression predicts a probability between 0 and 1, then classifies based on a threshold (default: 0.5).

The core idea: take the linear equation z = wx + b and pass it through the sigmoid function, which squashes any real number into the open interval (0, 1):

sigmoid(z) = 1 / (1 + e^(-z))

Output:
- Close to 0 → class 0 (negative)
- Close to 1 → class 1 (positive)
- 0.5 → decision boundary

import numpy as np
import matplotlib.pyplot as plt

# The sigmoid function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-10, 10, 100)
plt.plot(x, sigmoid(x))
plt.axhline(y=0.5, color='r', linestyle='--', label='Decision boundary')
plt.xlabel('Linear combination of features')
plt.ylabel('Probability')
plt.title('Sigmoid Function')
plt.legend()
plt.show()
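To make the linear-score-then-sigmoid pipeline concrete, here is a minimal sketch that scores a single example by hand. The weights `w`, bias `b`, and feature vector `x_example` are made-up values chosen for illustration, not parameters fitted to any data.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical weights, bias, and one feature vector (illustrative values only)
w = np.array([0.8, -0.4])
b = -0.2
x_example = np.array([1.5, 2.0])

z = np.dot(w, x_example) + b   # linear combination of the features
p = sigmoid(z)                 # estimated probability of class 1
label = int(p >= 0.5)          # apply the default 0.5 threshold

print(f"z={z:.2f}, p={p:.3f}, predicted class={label}")
```

A score of exactly z = 0 maps to p = 0.5, which is why the decision boundary in feature space is the set of points where wx + b = 0.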

Implementation with Scikit-Learn

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler
import numpy as np

# Example: Email spam classification
# Features: [word_count, exclamation_marks, contains_unsubscribe, uppercase_ratio]
np.random.seed(42)
n_samples = 1000

X = np.column_stack([
    np.random.randint(50, 500, n_samples),   # word count
    np.random.randint(0, 20, n_samples),      # exclamation marks
    np.random.randint(0, 2, n_samples),       # unsubscribe link
    np.random.uniform(0, 0.5, n_samples),     # uppercase ratio
])
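# Create labels: spam if there are many exclamation marks, OR if an
# unsubscribe link appears together with a high uppercase ratio.
# (& binds tighter than |, so the parentheses make the grouping explicit.)
y = ((X[:, 1] > 10) | ((X[:, 2] == 1) & (X[:, 3] > 0.3))).astype(int)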

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

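# Scale: fit the scaler on the training set only, then reuse it on the test set.
# Standardized features help the solver converge and keep the default L2
# penalty from treating large-scale features differently.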
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train
model = LogisticRegression(random_state=42, max_iter=1000)
model.fit(X_train_scaled, y_train)

# Evaluate
y_pred = model.predict(X_test_scaled)
y_proba = model.predict_proba(X_test_scaled)[:, 1]  # Probability of spam

print(classification_report(y_test, y_pred, target_names=['Not Spam', 'Spam']))

Understanding the Classification Report

              precision    recall  f1-score   support

    Not Spam       0.92      0.95      0.93       155
        Spam       0.88      0.82      0.85        45

    accuracy                           0.91       200

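- Precision: Of all emails we called spam, 88% actually were spam.
- Recall: Of all actual spam emails, we caught 82% of them.
- F1-score: Harmonic mean of precision and recall (useful when classes are imbalanced).

The figures shown are illustrative; the numbers you get depend on the data and the train/test split.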
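The three metrics can be worked through by hand from raw counts. The sketch below uses hypothetical counts of true positives, false positives, and false negatives (they are not taken from the model above) to show how each number is derived.

```python
# Hypothetical counts for the positive class (illustrative, not model output)
tp, fp, fn = 30, 10, 20

precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean

print(f"precision={precision:.2f}, recall={recall:.2f}, f1={f1:.2f}")
```

Because F1 is a harmonic mean, it sits closer to the lower of the two values than a plain average would, so a model cannot score well on F1 by excelling at only one of them.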

The Confusion Matrix

import seaborn as sns

cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Not Spam', 'Spam'],
            yticklabels=['Not Spam', 'Spam'])
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.title('Confusion Matrix')
plt.show()

# The four cells (rows = true label, columns = predicted label):
# cm[0, 0] True Negative (TN): Correctly classified as not spam
# cm[0, 1] False Positive (FP): Wrongly called spam (sent real email to junk)
# cm[1, 0] False Negative (FN): Missed spam (spam got through)
# cm[1, 1] True Positive (TP): Correctly caught spam
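For a binary problem, the four cells can also be unpacked directly into variables. This is a small sketch on toy labels (the `y_true` and `y_hat` lists are invented for illustration), relying on scikit-learn's row-major layout of true labels by predicted labels.

```python
from sklearn.metrics import confusion_matrix

# Toy labels for illustration (1 = spam, 0 = not spam)
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_hat = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# ravel() flattens the 2x2 matrix row by row: TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```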

The Precision-Recall Tradeoff

Adjusting the classification threshold shifts the tradeoff between precision and recall:

# Lower threshold → catch more spam (higher recall, lower precision)
# Higher threshold → only flag obvious spam (higher precision, lower recall)

thresholds = [0.3, 0.5, 0.7]
for t in thresholds:
    y_pred_t = (y_proba >= t).astype(int)
    cm = confusion_matrix(y_test, y_pred_t)
    tp = cm[1, 1]
    fp = cm[0, 1]
    fn = cm[1, 0]
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0
    print(f"Threshold {t}: Precision={precision:.2f}, Recall={recall:.2f}")

Which matters more depends on your problem:

- Medical diagnosis: maximize recall (don't miss diseases)
- Spam filter: balance precision and recall (annoying either way)
- Content moderation: usually maximize precision (avoid false positives)
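Rather than trying a handful of thresholds by hand, one option is to sweep all of them with scikit-learn's `precision_recall_curve` and pick the one that meets a requirement. The sketch below uses toy labels and scores, and the `target_recall` value is a hypothetical requirement chosen for illustration.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Toy labels and predicted probabilities (illustrative values only)
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 0])
scores = np.array([0.10, 0.35, 0.40, 0.45, 0.55, 0.60, 0.65, 0.80, 0.90, 0.20])

precision, recall, thresholds = precision_recall_curve(y_true, scores)

# precision and recall have one more entry than thresholds; drop the final
# (precision=1, recall=0) point so the three arrays line up.
target_recall = 0.75   # hypothetical requirement
meets_target = recall[:-1] >= target_recall

# Among thresholds that meet the recall target, take the most precise one.
best = np.argmax(np.where(meets_target, precision[:-1], -1.0))

print(f"threshold={thresholds[best]:.2f}, "
      f"precision={precision[best]:.2f}, recall={recall[best]:.2f}")
```

In practice the threshold should be chosen on a validation set rather than the test set, so the final test metrics stay an honest estimate.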

Feature Coefficients — Interpreting the Model

import pandas as pd

feature_names = ['word_count', 'exclamation_marks', 'unsubscribe', 'uppercase_ratio']
# model.coef_ has shape (1, n_features) for a binary problem
coefs = pd.Series(model.coef_[0], index=feature_names).sort_values()

coefs.plot(kind='barh', color=['red' if c < 0 else 'blue' for c in coefs])
plt.title('Feature Coefficients (positive = pushes toward Spam)')
plt.axvline(x=0, color='black', linestyle='-')
plt.show()

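A positive coefficient pushes the prediction toward class 1 (spam); a negative coefficient pushes it toward class 0 (not spam). Because the features were standardized before training, coefficient magnitudes are directly comparable: each one is the change in log-odds for a one-standard-deviation increase in that feature.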
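Coefficients live on the log-odds scale, so exponentiating them gives odds ratios, which are often easier to explain. The following sketch fits a model on a small synthetic dataset (the data, `X_demo`, `y_demo`, and `clf` are all invented for illustration) and also checks that, for a binary model, the predicted probability is the sigmoid of the linear score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative): feature 0 raises the chance of class 1,
# feature 1 lowers it slightly.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 2))
noise = rng.normal(scale=0.5, size=500)
y_demo = (2.0 * X_demo[:, 0] - 0.5 * X_demo[:, 1] + noise > 0).astype(int)

X_demo_scaled = StandardScaler().fit_transform(X_demo)
clf = LogisticRegression(max_iter=1000).fit(X_demo_scaled, y_demo)

# exp(coefficient) = multiplicative change in the odds of class 1
# for a one-standard-deviation increase in that feature.
odds_ratios = np.exp(clf.coef_[0])
print("Odds ratios:", odds_ratios.round(2))

# For a binary model, predict_proba is the sigmoid of the linear score.
z = X_demo_scaled @ clf.coef_[0] + clf.intercept_[0]
manual_proba = 1 / (1 + np.exp(-z))
print(np.allclose(manual_proba, clf.predict_proba(X_demo_scaled)[:, 1]))
```

An odds ratio above 1 means the feature increases the odds of class 1, and a value below 1 means it decreases them.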

Multi-Class Classification

from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

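# With three or more classes, the lbfgs solver fits a multinomial (softmax) model.
# The multi_class argument is deprecated in recent scikit-learn releases, so it is omitted.
model = LogisticRegression(solver='lbfgs', max_iter=1000)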
model.fit(X_train, y_train)

print(f"Accuracy: {model.score(X_test, y_test):.3f}")
print(classification_report(y_test, model.predict(X_test), target_names=iris.target_names))
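In the multi-class case the model returns one probability per class instead of a single number. Here is a short sketch on the same iris data (the variable names `clf` and `proba` are only for this example) showing the shape of the outputs and that each row of probabilities sums to 1.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
clf = LogisticRegression(max_iter=1000).fit(iris.data, iris.target)

proba = clf.predict_proba(iris.data[:5])   # one row per sample, one column per class
print(proba.round(3))
print(proba.sum(axis=1))                   # each row sums to 1
print(clf.coef_.shape)                     # one weight vector per class

# The predicted class is the column with the highest probability.
predicted = clf.classes_[np.argmax(proba, axis=1)]
print(predicted)
```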

When to Use Logistic Regression

Use it when:

- You need probability estimates (not just class predictions)
- Interpretability matters: you can explain why the model made a prediction
- Your classes are linearly separable (or close to it)
- You want a fast baseline before trying more complex models
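As a sketch of the baseline workflow, the snippet below compares logistic regression against a majority-class `DummyClassifier` using 5-fold cross-validation. The breast cancer dataset is an arbitrary choice for illustration and is not part of this lesson's earlier examples.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_bc, y_bc = load_breast_cancer(return_X_y=True)

# Floor: always predict the most frequent class.
dummy = DummyClassifier(strategy="most_frequent")
# Baseline: scaling + logistic regression in one pipeline, so the scaler
# is refit inside each cross-validation fold.
logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

dummy_acc = cross_val_score(dummy, X_bc, y_bc, cv=5).mean()
logreg_acc = cross_val_score(logreg, X_bc, y_bc, cv=5).mean()

print(f"Majority-class accuracy: {dummy_acc:.3f}")
print(f"Logistic regression accuracy: {logreg_acc:.3f}")
```

Any more complex model then has to beat the logistic regression score by enough to justify its extra cost and reduced interpretability.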

Consider alternatives when:

- The relationship between features and target is highly non-linear
- Accuracy is more important than interpretability
- You have high-dimensional sparse data (consider Naive Bayes or SVM)

Next lesson: Decision Trees — a powerful non-linear classifier that makes interpretable decisions.