Is feature engineering still important with deep learning?

Feature engineering remains important, though deep learning reduces the need for manual feature creation for certain data types. For unstructured data (images, text, audio), deep learning learns useful representations automatically, so it typically needs little manual feature engineering. For tabular/structured data, traditional feature engineering still matters significantly. Gradient Boosting and Random Forest (often the strongest models for tabular data) can only approximate interactions through sequences of single-feature splits, and they represent ratio features and temporal aggregates poorly unless you compute them explicitly. Even for deep learning on tabular data, thoughtful feature engineering typically improves results. AutoML tools automate some feature engineering, but domain-knowledge-driven features often outperform automatically generated ones.

How do I handle missing values in machine learning?

The right approach depends on why values are missing.

- Missing completely at random (MCAR): mean/median imputation is reasonable.
- Missing not at random (MNAR): the chance of a value being missing depends on the missing value itself, so the missingness is informative. Add a binary indicator column such as 'feature_was_missing' before imputing.
- Categorical features: add 'Unknown' as a category rather than imputing.
- Time series: forward-fill (carrying the last known value forward) is often most appropriate.

What not to do: drop rows with missing values when that discards a significant share of your data; impute using test-set statistics (data leakage); impute without understanding why values are missing. In scikit-learn, use SimpleImputer or IterativeImputer. IterativeImputer is more sophisticated because it models each feature from the other features, but it is still experimental and must first be enabled by importing enable_iterative_imputer from sklearn.experimental.
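Here is a minimal sketch of the strategies above on a made-up DataFrame. The column names and values are hypothetical, and the rows are assumed to be sorted by time so that forward-fill makes sense. In a real project you would fit the imputer on training data only.

```python
import numpy as np
import pandas as pd
# IterativeImputer is experimental in scikit-learn and must be enabled explicitly
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical toy data with gaps in numeric, categorical, and time-series columns
df = pd.DataFrame({
    "age": [25, np.nan, 35, 45, 52],
    "income": [40000, 52000, 61000, np.nan, 90000],
    "plan": ["basic", None, "pro", "basic", None],
    "daily_sales": [100.0, np.nan, np.nan, 130.0, 125.0],
})

# 1. Record missingness before imputing, in case it is informative
df["age_was_missing"] = df["age"].isna().astype(int)

# 2. Categorical: treat "missing" as its own category
df["plan"] = df["plan"].fillna("Unknown")

# 3. Time series (rows assumed to be in time order): carry the last known value forward
df["daily_sales"] = df["daily_sales"].ffill()

# 4. Numeric: model each column with gaps from the other numeric columns
imputer = IterativeImputer(random_state=0)
df[["age", "income"]] = imputer.fit_transform(df[["age", "income"]])
print(df)
```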

What is feature selection and why does it matter?

Feature selection removes irrelevant, redundant, or noisy features from your dataset before training. It matters for three reasons:

1. It reduces overfitting, because irrelevant features add noise that the model may fit instead of the real signal.
2. It speeds up training, because fewer features mean less computation.
3. It improves interpretability, because a model with 10 important features is easier to understand than one with 1000 features of mixed importance.

Common methods include correlation analysis (drop one feature from each highly correlated pair), feature importance from tree models, Recursive Feature Elimination (RFE), and SelectKBest (univariate statistical tests). Rule of thumb: before using a reduced feature set in production, confirm that it performs at least as well on validation data as the full set.

What is one-hot encoding and when should I use it?

One-hot encoding converts a categorical variable with N categories into N binary columns, each indicating membership in that category. Example: 'color' with values [red, green, blue] becomes three columns: 'is_red', 'is_green', 'is_blue'. Use one-hot encoding when: the categories have no natural order (nominal categories), you're using algorithms that can't handle categorical data natively (linear regression, SVM, neural networks), and the number of categories is manageable (<20-50). For high-cardinality categories (hundreds of values), prefer target encoding or embedding layers. For tree-based models (Random Forest, Gradient Boosting), ordinal encoding often works as well as one-hot and can be more efficient.


AiTechWorlds


Feature Engineering Guide: Turn Raw Data into Powerful ML Inputs

Q: What is feature engineering in machine learning?

Feature engineering is the process of using domain knowledge to create, transform, and select input variables (features) that help ML models learn patterns more effectively. Raw data rarely comes in a form that algorithms can directly use — feature engineering bridges that gap. Examples: from a datetime column, you might extract hour of day, day of week, and whether it's a holiday (three features that capture temporal patterns better than a raw timestamp). From text, you might extract length, sentiment score, and keyword presence. Good feature engineering can improve model accuracy more than switching from a simple algorithm to a complex one.
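The examples in that answer can be sketched in a few lines of pandas. This is illustrative only and rests on a few assumptions. The data is made up. "Holiday" is taken to mean a US federal holiday, using pandas' USFederalHolidayCalendar. Keyword presence is a simple substring check for a hypothetical keyword. Sentiment scoring is left out because it needs an external model.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# Hypothetical raw data: one timestamp and one free-text review per row
raw = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-07-04 18:45:00", "2024-07-05 09:10:00"]),
    "review": ["Great service, fast delivery!", "Late and broken."],
})

# Assumption: US federal holidays stand in for "is a holiday"
holidays = USFederalHolidayCalendar().holidays(start="2024-01-01", end="2024-12-31")

features = pd.DataFrame({
    "hour": raw["timestamp"].dt.hour,
    "day_of_week": raw["timestamp"].dt.dayofweek,  # 0=Monday, 6=Sunday
    "is_holiday": raw["timestamp"].dt.normalize().isin(holidays).astype(int),
    "review_length": raw["review"].str.len(),
    # Hypothetical keyword feature: does the review mention lateness?
    "mentions_late": raw["review"].str.lower().str.contains("late").astype(int),
})
print(features)
```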


Feature engineering guide for machine learning — practical techniques to create, transform, and select features that improve model accuracy, with Python code examples for every method.

AiTechWorlds Team May 27, 2026 9 min read


I've seen teams spend weeks tuning neural network architectures and hyperparameters for a 1-2% improvement. Then a thoughtful domain expert suggested three new features, and accuracy jumped 8%.

Feature engineering is consistently underestimated by people new to ML. Algorithms, even complex deep learning ones, can only learn patterns that are recoverable from the features you give them. Some key patterns depend on an interaction between two variables, a ratio of values, or a temporal aggregation. Unless you compute these explicitly, many models will either miss them or need far more data to approximate them.
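As an illustration, here is a small synthetic experiment. It assumes a made-up target that depends only on income divided by age, and it uses a plain linear regression as the model. A linear model can only form weighted sums of its inputs, so it can approximate the ratio from the raw columns but not represent it exactly. Handing it the ratio as a feature fits the same target perfectly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
income = rng.uniform(20_000, 200_000, size=500)
age = rng.uniform(20, 70, size=500)
y = income / age  # hypothetical target that depends only on a ratio

# Raw features: the model can only combine income and age linearly
X_raw = np.column_stack([income, age])
r2_raw = LinearRegression().fit(X_raw, y).score(X_raw, y)

# Engineered feature: present the ratio directly
X_ratio = (income / age).reshape(-1, 1)
r2_ratio = LinearRegression().fit(X_ratio, y).score(X_ratio, y)

print(f"R^2 with raw features:  {r2_raw:.3f}")
print(f"R^2 with ratio feature: {r2_ratio:.3f}")
```

Tree ensembles would do better than the linear model on the raw columns, but only by approximating the ratio with many piecewise-constant splits.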

This guide covers the major feature engineering techniques with Python code, explains when to apply each, and shows how to think about feature creation from a domain perspective.

Numerical Features

Scaling and Normalization

Many algorithms (SVM, neural networks, regularized logistic regression, KNN) are sensitive to feature magnitude. When features sit on very different scales, the large-magnitude ones dominate distance calculations, regularization penalties, and gradient updates:

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

# Generate example data
data = pd.DataFrame({
    'age': [25, 45, 35, 55, 28],
    'salary': [50000, 120000, 75000, 180000, 45000],
    'years_experience': [2, 20, 10, 30, 3]
})

# StandardScaler: mean=0, std=1 (a sensible default for most algorithms)
standard = StandardScaler()
data_standard = pd.DataFrame(
    standard.fit_transform(data),
    columns=data.columns
)

# MinMaxScaler: scales to [0, 1] — good for neural networks
minmax = MinMaxScaler()
data_minmax = pd.DataFrame(
    minmax.fit_transform(data),
    columns=data.columns
)

# RobustScaler: uses median and IQR — best when outliers are present
robust = RobustScaler()
data_robust = pd.DataFrame(
    robust.fit_transform(data),
    columns=data.columns
)

print("StandardScaler:\n", data_standard.round(3))

When to use each:

StandardScaler: default choice, works well when features are roughly normally distributed
MinMaxScaler: when you need values in a specific range (e.g., neural network inputs)
RobustScaler: when your data has significant outliers
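The scaler examples above call fit_transform on the whole dataset for brevity. In a real workflow the scaling statistics should come from the training split only, or test-set information leaks into training. Below is a minimal sketch using a scikit-learn Pipeline. The data and the salary-threshold target are synthetic assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: columns are age and salary; hypothetical target is "salary above 80k"
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(40, 10, 200), rng.normal(80000, 25000, 200)])
y = (X[:, 1] > 80000).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# The pipeline fits the scaler on X_train only, then reuses those statistics on X_test
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))

scaler = model.named_steps["standardscaler"]
print("Means learned from training data:", scaler.mean_.round(1))
```

The same Pipeline can be passed to cross_val_score, which then refits the scaler inside each fold.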

Transformations for Skewed Features

Right-skewed distributions (like income, price, count) often benefit from transformation:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Salary is right-skewed: most people earn moderate amounts, a few earn very high amounts
salary = np.array([30000, 35000, 40000, 45000, 50000, 60000, 75000, 100000, 250000, 500000])

# Log transformation (compresses large values)
salary_log = np.log1p(salary)  # log(1 + x), so zeros are handled safely

# Box-Cox transformation: estimates the power parameter lambda by maximum likelihood
# (requires strictly positive values; stats.yeojohnson also accepts zeros and negatives)
salary_boxcox, lambda_val = stats.boxcox(salary)
print(f"Optimal Box-Cox lambda: {lambda_val:.3f}")

# Visualize before/after
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
axes[0].hist(salary, bins=20)
axes[0].set_title('Original (right-skewed)')
axes[1].hist(salary_log, bins=20)
axes[1].set_title('Log transformed')
axes[2].hist(salary_boxcox, bins=20)
axes[2].set_title('Box-Cox transformed')
plt.tight_layout()
plt.show()

Binning Continuous Variables

Converting a continuous variable into bins can help models such as linear regression capture non-linear relationships, at the cost of discarding detail within each bin:

df = pd.DataFrame({'age': [22, 25, 31, 42, 48, 55, 63, 70]})

# Custom, domain-defined bin edges (pd.cut intervals are right-inclusive by default)
df['age_group'] = pd.cut(df['age'],
                          bins=[0, 30, 45, 60, 100],
                          labels=['Young Adult', 'Middle', 'Senior', 'Elder'])

# Quantile-based bins (equal-size groups)
df['age_quantile'] = pd.qcut(df['age'], q=4, labels=['Q1', 'Q2', 'Q3', 'Q4'])

print(df[['age', 'age_group', 'age_quantile']])
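pd.cut and pd.qcut compute bins on whatever data you pass them. If bin edges should be learned once from training data and reused on new rows, scikit-learn's KBinsDiscretizer is one option. The sketch below uses truly equal-width bins (strategy="uniform") on the same ages. The "new" ages are made up to show that values outside the training range are assigned to the first or last bin.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

ages_train = np.array([[22], [25], [31], [42], [48], [55], [63], [70]])
ages_new = np.array([[19], [40], [85]])  # hypothetical rows, two outside the training range

# Four equal-width bins between the training min and max, encoded as bin indices
binner = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="uniform")
binner.fit(ages_train)

print("Bin edges:", binner.bin_edges_[0])
print("Train bins:", binner.transform(ages_train).ravel())
print("New bins:", binner.transform(ages_new).ravel())
```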

Categorical Features

One-Hot Encoding

from sklearn.preprocessing import OneHotEncoder
import pandas as pd

df = pd.DataFrame({
    'city': ['New York', 'London', 'Tokyo', 'New York', 'Paris', 'London'],
    'category': ['A', 'B', 'A', 'C', 'B', 'A']
})

# pandas get_dummies (simple, good for exploration)
dummies = pd.get_dummies(df, columns=['city', 'category'], drop_first=True)

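# scikit-learn OneHotEncoder (better for pipelines; sparse_output needs scikit-learn >= 1.2)
# drop='first' is omitted here: with handle_unknown='ignore', an unseen category is encoded
# as all zeros, which would be indistinguishable from the dropped reference category
encoder = OneHotEncoder(sparse_output=False, handle_unknown='ignore')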
encoded = encoder.fit_transform(df[['city', 'category']])
print("Encoded shape:", encoded.shape)
print("Feature names:", encoder.get_feature_names_out())

Ordinal Encoding

For categories with natural ordering:

from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({'education': ['High School', 'Bachelor', 'Master', 'PhD', 'Bachelor']})

encoder = OrdinalEncoder(categories=[['High School', 'Bachelor', 'Master', 'PhD']])
df['education_encoded'] = encoder.fit_transform(df[['education']])

# High School=0, Bachelor=1, Master=2, PhD=3
print(df)

Target Encoding (for High-Cardinality)

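For features with hundreds of categories, one-hot encoding creates too many columns. Target encoding replaces each category with the mean target value for that category. Computing those means on the same rows the model trains on leaks the target. To avoid this, the training rows here are encoded out-of-fold, and categories missing from a fold fall back to the global mean: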

import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

def target_encode_cv(train_df, val_df, column, target, n_splits=5):
    """Out-of-fold target encoding for training rows; full-train encoding for validation rows."""
    # Train set: each row is encoded with means computed on the other folds only
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=42)
    train_encoded = np.full(len(train_df), np.nan)

    for fit_idx, holdout_idx in kf.split(train_df):
        mean_map = train_df.iloc[fit_idx].groupby(column)[target].mean()
        train_encoded[holdout_idx] = train_df.iloc[holdout_idx][column].map(mean_map).to_numpy()

    # Categories absent from a fitting fold get the global training mean
    global_mean = train_df[target].mean()
    train_encoded = np.where(np.isnan(train_encoded), global_mean, train_encoded)

    # Validation/test set: encode using all training data
    mean_map_full = train_df.groupby(column)[target].mean()
    val_encoded = val_df[column].map(mean_map_full).fillna(global_mean)

    return train_encoded, val_encoded.to_numpy()

# Example usage
train_df = pd.DataFrame({
    'city': ['NYC', 'LA', 'NYC', 'SF', 'LA', 'NYC', 'SF', 'LA'],
    'target': [1, 0, 1, 1, 0, 0, 1, 1]
})
val_df = pd.DataFrame({'city': ['NYC', 'SF', 'Boston']})  # 'Boston' never appears in training

train_enc, val_enc = target_encode_cv(train_df, val_df, column='city', target='target')
print("Train (out-of-fold):", train_enc.round(3))
print("Validation:", val_enc.round(3))  # NYC 0.667, SF 1.0, Boston falls back to 0.625

Datetime Features

Datetime columns are gold mines for features:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'transaction_time': pd.to_datetime([
        '2024-01-15 09:23:00', '2024-07-04 18:45:00',
        '2024-12-24 14:30:00', '2024-03-17 02:15:00'
    ])
})

# Extract all useful time components
df['hour'] = df['transaction_time'].dt.hour
df['day_of_week'] = df['transaction_time'].dt.dayofweek    # 0=Monday, 6=Sunday
df['day_of_month'] = df['transaction_time'].dt.day
df['month'] = df['transaction_time'].dt.month
df['quarter'] = df['transaction_time'].dt.quarter
df['year'] = df['transaction_time'].dt.year
df['is_weekend'] = df['day_of_week'].isin([5, 6]).astype(int)
df['is_business_hours'] = ((df['hour'] >= 9) & (df['hour'] < 17)).astype(int)

# Cyclical encoding (hour 23 is close to hour 0 — linear encoding misses this)
df['hour_sin'] = np.sin(2 * np.pi * df['hour'] / 24)
df['hour_cos'] = np.cos(2 * np.pi * df['hour'] / 24)
df['month_sin'] = np.sin(2 * np.pi * df['month'] / 12)
df['month_cos'] = np.cos(2 * np.pi * df['month'] / 12)

print(df[['transaction_time', 'hour', 'is_weekend', 'is_business_hours', 'hour_sin', 'hour_cos']])

Why cyclical encoding matters: Hours 23 and 0 are only 1 hour apart, but linear encoding gives them values 23 and 0 — making them appear far apart. Sine/cosine encoding correctly represents the circular nature.
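To make this concrete, the short check below maps hours onto the unit circle with the same sine/cosine formula and measures Euclidean distances. encode_hour is a hypothetical helper written for this illustration. Both coordinates are needed: sine alone maps 3:00 and 9:00 to the same value, and cosine tells them apart.

```python
import numpy as np

def encode_hour(hour):
    """Hypothetical helper: map an hour (0-23) to (sin, cos) on the unit circle."""
    angle = 2 * np.pi * hour / 24
    return np.array([np.sin(angle), np.cos(angle)])

# Linear encoding: 23:00 looks 23 units away from midnight
print("Linear distance 23 vs 0:", abs(23 - 0))

# Cyclical encoding: neighbouring hours are close, opposite hours are farthest apart
d_23_0 = np.linalg.norm(encode_hour(23) - encode_hour(0))
d_0_12 = np.linalg.norm(encode_hour(0) - encode_hour(12))
print(f"Cyclical distance 23 vs 0: {d_23_0:.3f}")  # 0.261
print(f"Cyclical distance 0 vs 12: {d_0_12:.3f}")  # 2.000
```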

Interaction Features

Combining features can capture relationships that individual features miss:

import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

df = pd.DataFrame({
    'age': [25, 45, 35, 55, 28],
    'income': [50000, 120000, 75000, 180000, 45000],
    'credit_score': [650, 780, 720, 800, 600]
})

# Ratio features (often more informative than raw values)
df['income_per_age'] = df['income'] / df['age']
df['credit_score_normalized'] = df['credit_score'] / 850  # 850 is the maximum FICO score

# Pairwise interaction terms: interaction_only=True yields products like age*income, no squares
# Products of raw values can be very large, so scale them before linear or distance-based models
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
poly_features = poly.fit_transform(df[['age', 'income', 'credit_score']])
poly_df = pd.DataFrame(poly_features, columns=poly.get_feature_names_out())
print("Polynomial feature names:", poly.get_feature_names_out())
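Ratio features need a guard for zero denominators. In pandas, dividing a non-zero number by zero yields inf, and many scikit-learn estimators reject infinite values. The sketch below uses a hypothetical safe_ratio helper and made-up debt/income columns. It returns NaN instead of inf, so the missing-value step can handle those rows explicitly.

```python
import numpy as np
import pandas as pd

def safe_ratio(numerator, denominator):
    """Hypothetical helper: divide two Series, returning NaN where the denominator is 0."""
    return numerator / denominator.replace(0, np.nan)

df = pd.DataFrame({
    "debt": [10000, 5000, 0, 2000],
    "income": [50000, 0, 40000, 25000],  # a zero income would otherwise produce inf
})
df["debt_to_income"] = safe_ratio(df["debt"], df["income"])
print(df)
```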

Aggregation Features

For datasets with one-to-many relationships (e.g., customers and transactions):

import pandas as pd

transactions = pd.DataFrame({
    'customer_id': [1, 1, 1, 2, 2, 3, 3, 3, 3],
    'amount': [50, 120, 75, 200, 45, 30, 80, 150, 25],
    'category': ['food', 'retail', 'food', 'retail', 'food', 'food', 'retail', 'retail', 'food'],
    'date': pd.to_datetime(['2024-01-01', '2024-01-05', '2024-01-10',
                             '2024-01-03', '2024-01-08', '2024-01-02',
                             '2024-01-04', '2024-01-09', '2024-01-12'])
})

# Aggregate to customer level
customer_features = transactions.groupby('customer_id').agg(
    total_transactions=('amount', 'count'),
    total_spend=('amount', 'sum'),
    avg_spend=('amount', 'mean'),
    max_spend=('amount', 'max'),
    min_spend=('amount', 'min'),
    spend_std=('amount', 'std'),  # NaN for customers with a single transaction
    unique_categories=('category', 'nunique'),
    days_active=('date', lambda x: (x.max() - x.min()).days)  # days between first and last transaction
).reset_index()

# Category-specific features
category_pivot = transactions.pivot_table(
    values='amount', index='customer_id', columns='category', aggfunc='sum', fill_value=0
)
category_pivot.columns = [f'spend_{c}' for c in category_pivot.columns]

customer_features = customer_features.merge(category_pivot, on='customer_id')
print(customer_features)
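When these aggregates feed a model that predicts something at a given moment, they should use only transactions that happened before that moment. Otherwise future activity leaks into the features. Below is a minimal point-in-time sketch on the same transactions. The snapshot date is a hypothetical prediction time, and a recency feature is added.

```python
import pandas as pd

transactions = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "amount": [50, 120, 75, 200, 45, 30, 80, 150, 25],
    "date": pd.to_datetime(["2024-01-01", "2024-01-05", "2024-01-10",
                            "2024-01-03", "2024-01-08", "2024-01-02",
                            "2024-01-04", "2024-01-09", "2024-01-12"]),
})

# Hypothetical prediction time: only earlier transactions may be used
snapshot = pd.Timestamp("2024-01-10")
history = transactions[transactions["date"] < snapshot]

point_in_time = history.groupby("customer_id").agg(
    n_transactions=("amount", "count"),
    total_spend=("amount", "sum"),
    last_date=("date", "max"),
)
# Recency: days between the last transaction and the snapshot
point_in_time["days_since_last"] = (snapshot - point_in_time["last_date"]).dt.days
print(point_in_time.drop(columns="last_date"))
```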

Handling Missing Values

from sklearn.impute import SimpleImputer, KNNImputer
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'age': [25, np.nan, 35, 45, np.nan],
    'income': [50000, 80000, np.nan, 120000, 45000],
    'education': ['Bachelor', 'Master', np.nan, 'PhD', 'Bachelor']
})

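# Strategy 1: Simple imputation (in a real project, fit on the training split only and reuse the fitted imputer on validation/test data)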
# Numerical — use median (robust to outliers)
num_imputer = SimpleImputer(strategy='median')
df[['age', 'income']] = num_imputer.fit_transform(df[['age', 'income']])

# Categorical — use most frequent
cat_imputer = SimpleImputer(strategy='most_frequent')
df[['education']] = cat_imputer.fit_transform(df[['education']])

# Strategy 2: Add missingness indicator (when missing is informative)
df_with_indicator = pd.DataFrame({
    'age': [25, np.nan, 35, 45, np.nan],
    'income': [50000, 80000, np.nan, 120000, 45000]
})

# First, create indicator columns
df_with_indicator['age_missing'] = df_with_indicator['age'].isnull().astype(int)
df_with_indicator['income_missing'] = df_with_indicator['income'].isnull().astype(int)
# Then impute
df_with_indicator[['age', 'income']] = SimpleImputer(strategy='median').fit_transform(
    df_with_indicator[['age', 'income']]
)

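# Strategy 3: KNN imputation (fills each gap from the k most similar rows)
# KNNImputer uses unscaled Euclidean distances, so income dominates age here; scale features first in practice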
knn_imputer = KNNImputer(n_neighbors=3)
df_knn = pd.DataFrame({
    'age': [25, np.nan, 35, 45, np.nan],
    'income': [50000, 80000, np.nan, 120000, 45000]
})
df_knn_imputed = pd.DataFrame(knn_imputer.fit_transform(df_knn), columns=df_knn.columns)

Feature Selection

After creating features, select the most informative ones:

import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic stand-in for your own X_train, y_train and feature_names
X_train, y_train = make_classification(n_samples=500, n_features=20, n_informative=5,
                                       n_redundant=5, random_state=42)
feature_names = [f'feature_{i}' for i in range(X_train.shape[1])]

# Method 1: Feature importance from tree models
# (impurity-based importances come from training data and can favor high-cardinality features)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

feature_importance = pd.Series(rf.feature_importances_, index=feature_names).sort_values(ascending=False)
print("Top 10 features by importance:")
print(feature_importance.head(10))

# Method 2: Statistical selection (SelectKBest with mutual information scores)
selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_selected = selector.fit_transform(X_train, y_train)
selected_features = [feature_names[i] for i in selector.get_support(indices=True)]
print("Selected features:", selected_features)

# Method 3: Remove highly correlated features (keeps the first feature of each correlated pair)
def remove_correlated_features(df, threshold=0.95):
    corr_matrix = df.corr().abs()
    # Look only at the upper triangle so each pair is checked once
    upper_triangle = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
    to_drop = [col for col in upper_triangle.columns if (upper_triangle[col] > threshold).any()]
    print(f"Removing {len(to_drop)} highly correlated features: {to_drop}")
    return df.drop(columns=to_drop)

X_train_uncorrelated = remove_correlated_features(pd.DataFrame(X_train, columns=feature_names))
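Recursive Feature Elimination, mentioned in the FAQ, works differently from the methods above. It repeatedly refits an estimator and drops the weakest feature or features each round until the requested number remain. Below is a minimal sketch on synthetic data, with logistic regression as the assumed estimator. Its coefficients are only comparable across features on similar scales, which make_classification roughly provides. Real data should be scaled first.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: 20 features, 5 of them informative
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_redundant=0, random_state=42)

# Drop one feature per round (step=1) until 5 remain
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=5, step=1)
rfe.fit(X, y)

print("Kept feature indices:", [i for i, keep in enumerate(rfe.support_) if keep])
print("Ranking (1 = kept):", rfe.ranking_)
```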

The Feature Engineering Workflow

1. Explore raw data
   - distributions, missing values, outliers
   - correlation with target
   - domain-specific patterns

2. Basic preprocessing
   - Handle missing values
   - Encode categoricals
   - Scale numerics

3. Create new features
   - Datetime decomposition
   - Interaction terms
   - Domain-specific aggregations
   - Ratio features

4. Select features
   - Remove highly correlated
   - Feature importance ranking
   - Validate: do added features improve CV score?

5. Iterate
   - Test each feature's contribution
   - Remove features that don't improve validation score
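Steps 4 and 5 amount to comparing cross-validated scores with and without a candidate feature. The sketch below assumes a synthetic target that depends on the product of two features (an XOR-like pattern) and uses logistic regression as the model. The cv_score helper is hypothetical. On real data the gains are usually smaller and noisier, so it is worth looking at the fold-to-fold spread before deciding.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_base = rng.normal(size=(400, 2))
# Hypothetical target driven by an interaction of the two raw features
y = (X_base[:, 0] * X_base[:, 1] > 0).astype(int)

def cv_score(X, y):
    """Hypothetical helper: mean 5-fold cross-validated accuracy of a logistic regression."""
    return cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()

# Candidate feature: the product of the two raw features
X_candidate = np.column_stack([X_base, X_base[:, 0] * X_base[:, 1]])

base = cv_score(X_base, y)
with_feature = cv_score(X_candidate, y)
print(f"Baseline CV accuracy:   {base:.3f}")
print(f"With candidate feature: {with_feature:.3f}")
print("Keep feature" if with_feature > base else "Drop feature")
```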

Conclusion

Feature engineering is where domain knowledge and ML skill intersect. The best features come from understanding the problem deeply: why do customers churn, what makes a transaction fraudulent, what time patterns drive sales. No automated feature generation tool replaces this understanding.

The practical rule: always test features on validation data before including them in production. A feature that looks informative can still add noise rather than signal — cross-validation scores tell you the truth.

For the modeling skills that use these features, see our scikit-learn tutorial and machine learning beginners guide.



Machine Learning

Feature Engineering Guide: Turn Raw Data into Powerful ML Inputs

⚡ Quick Answer

Feature engineering guide for machine learning — practical techniques to create, transform, and select features that improve model accuracy, with Python code examples for every method.

AiTechWorlds Team May 27, 2026 9 min read

#feature-engineering-guide #feature-selection-ml #ml-preprocessing #machine-learning

📚Part of the Machine Learning guide — explore all Machine Learning articles→

Share:Facebook Twitter/X LinkedIn Telegram WhatsApp

📱

Get more content like this on Telegram!

Daily AI tips, notes & resources — free

Join Free →

Feature Engineering Guide: Turn Raw Data into Powerful ML Inputs

I've seen teams spend weeks tuning neural network architectures and hyperparameters and gain 1-2% improvement. Then a thoughtful domain expert suggested three new features, and accuracy jumped 8%.

This guide covers every major feature engineering technique with Python code, when to apply each, and how to think about feature creation from a domain perspective.

Numerical Features

Scaling and Normalization

Many algorithms (SVM, neural networks, logistic regression, KNN) are sensitive to feature magnitude. Features on different scales cause some features to dominate:

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

# Generate example data
data = pd.DataFrame({
    'age': [25, 45, 35, 55, 28],
    'salary': [50000, 120000, 75000, 180000, 45000],
    'years_experience': [2, 20, 10, 30, 3]
})

# StandardScaler: mean=0, std=1 — best for most algorithms
standard = StandardScaler()
data_standard = pd.DataFrame(
    standard.fit_transform(data),
    columns=data.columns
)

# MinMaxScaler: scales to [0, 1] — good for neural networks
minmax = MinMaxScaler()
data_minmax = pd.DataFrame(
    minmax.fit_transform(data),
    columns=data.columns
)

# RobustScaler: uses median and IQR — best when outliers are present
robust = RobustScaler()
data_robust = pd.DataFrame(
    robust.fit_transform(data),
    columns=data.columns
)

print("StandardScaler:\n", data_standard.round(3))

When to use each:

StandardScaler: default choice, works well when features are roughly normally distributed
MinMaxScaler: when you need values in a specific range (e.g., neural network inputs)
RobustScaler: when your data has significant outliers

Transformations for Skewed Features

Right-skewed distributions (like income, price, count) often benefit from transformation:

import matplotlib.pyplot as plt
from scipy import stats

# Salary is right-skewed: most people earn moderate amounts, few earn very high
salary = np.array([30000, 35000, 40000, 45000, 50000, 60000, 75000, 100000, 250000, 500000])

# Log transformation (compress large values)
salary_log = np.log1p(salary)  # log(x+1) to handle zeros

# Box-Cox transformation (finds optimal transformation)
salary_boxcox, lambda_val = stats.boxcox(salary)
print(f"Optimal Box-Cox lambda: {lambda_val:.3f}")

# Visualize before/after
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
axes[0].hist(salary, bins=20)
axes[0].set_title('Original (right-skewed)')
axes[1].hist(salary_log, bins=20)
axes[1].set_title('Log transformed')
axes[2].hist(salary_boxcox, bins=20)
axes[2].set_title('Box-Cox transformed')
plt.tight_layout()
plt.show()

Binning Continuous Variables

Converting continuous to categorical can capture non-linear relationships:

df = pd.DataFrame({'age': [22, 25, 31, 42, 48, 55, 63, 70]})

# Equal-width bins
df['age_group'] = pd.cut(df['age'],
                          bins=[0, 30, 45, 60, 100],
                          labels=['Young Adult', 'Middle', 'Senior', 'Elder'])

# Quantile-based bins (equal-size groups)
df['age_quantile'] = pd.qcut(df['age'], q=4, labels=['Q1', 'Q2', 'Q3', 'Q4'])

print(df[['age', 'age_group', 'age_quantile']])

Categorical Features

One-Hot Encoding

from sklearn.preprocessing import OneHotEncoder
import pandas as pd

df = pd.DataFrame({
    'city': ['New York', 'London', 'Tokyo', 'New York', 'Paris', 'London'],
    'category': ['A', 'B', 'A', 'C', 'B', 'A']
})

# pandas get_dummies (simple, good for exploration)
dummies = pd.get_dummies(df, columns=['city', 'category'], drop_first=True)

# scikit-learn OneHotEncoder (better for pipelines)
encoder = OneHotEncoder(drop='first', sparse_output=False, handle_unknown='ignore')
encoded = encoder.fit_transform(df[['city', 'category']])
print("Encoded shape:", encoded.shape)
print("Feature names:", encoder.get_feature_names_out())

Ordinal Encoding

For categories with natural ordering:

from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({'education': ['High School', 'Bachelor', 'Master', 'PhD', 'Bachelor']})

encoder = OrdinalEncoder(categories=[['High School', 'Bachelor', 'Master', 'PhD']])
df['education_encoded'] = encoder.fit_transform(df[['education']])

# High School=0, Bachelor=1, Master=2, PhD=3
print(df)

Target Encoding (for High-Cardinality)

For features with hundreds of categories, one-hot creates too many columns. Target encoding replaces each category with the mean target value for that category:

import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

def target_encode_cv(train_df, val_df, column, target, n_splits=5):
    """Cross-validated target encoding (prevents data leakage)"""
    # Train set: encode using cross-validation to avoid leakage
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=42)
    train_encoded = np.zeros(len(train_df))
    
    for train_idx, val_idx in kf.split(train_df):
        mean_map = train_df.iloc[train_idx].groupby(column)[target].mean()
        train_encoded[val_idx] = train_df.iloc[val_idx][column].map(mean_map)
    
    # Fill unmapped with global mean
    global_mean = train_df[target].mean()
    train_encoded = np.where(np.isnan(train_encoded), global_mean, train_encoded)
    
    # Validation/test set: encode using all training data
    mean_map_full = train_df.groupby(column)[target].mean()
    val_encoded = val_df[column].map(mean_map_full).fillna(global_mean)
    
    return train_encoded, val_encoded.values

# Example usage
train_df = pd.DataFrame({
    'city': ['NYC', 'LA', 'NYC', 'SF', 'LA', 'NYC', 'SF', 'LA'],
    'target': [1, 0, 1, 1, 0, 0, 1, 1]
})

Datetime Features

Datetime columns are gold mines for features:

import pandas as pd

df = pd.DataFrame({
    'transaction_time': pd.to_datetime([
        '2024-01-15 09:23:00', '2024-07-04 18:45:00',
        '2024-12-24 14:30:00', '2024-03-17 02:15:00'
    ])
})

# Extract all useful time components
df['hour'] = df['transaction_time'].dt.hour
df['day_of_week'] = df['transaction_time'].dt.dayofweek    # 0=Monday, 6=Sunday
df['day_of_month'] = df['transaction_time'].dt.day
df['month'] = df['transaction_time'].dt.month
df['quarter'] = df['transaction_time'].dt.quarter
df['year'] = df['transaction_time'].dt.year
df['is_weekend'] = df['day_of_week'].isin([5, 6]).astype(int)
df['is_business_hours'] = ((df['hour'] >= 9) & (df['hour'] < 17)).astype(int)

# Cyclical encoding (hour 23 is close to hour 0 — linear encoding misses this)
df['hour_sin'] = np.sin(2 * np.pi * df['hour'] / 24)
df['hour_cos'] = np.cos(2 * np.pi * df['hour'] / 24)
df['month_sin'] = np.sin(2 * np.pi * df['month'] / 12)
df['month_cos'] = np.cos(2 * np.pi * df['month'] / 12)

print(df[['transaction_time', 'hour', 'is_weekend', 'is_business_hours', 'hour_sin', 'hour_cos']])

Interaction Features

Combining features can capture relationships that individual features miss:

df = pd.DataFrame({
    'age': [25, 45, 35, 55, 28],
    'income': [50000, 120000, 75000, 180000, 45000],
    'credit_score': [650, 780, 720, 800, 600]
})

# Ratio features (often more informative than raw values)
df['income_per_age'] = df['income'] / df['age']
df['credit_score_normalized'] = df['credit_score'] / 850  # Normalize to max possible

# Polynomial features (capture non-linear relationships)
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
poly_features = poly.fit_transform(df[['age', 'income', 'credit_score']])
poly_df = pd.DataFrame(poly_features, columns=poly.get_feature_names_out())
print("Polynomial feature names:", poly.get_feature_names_out())

Aggregation Features

For datasets with one-to-many relationships (e.g., customers and transactions):

transactions = pd.DataFrame({
    'customer_id': [1, 1, 1, 2, 2, 3, 3, 3, 3],
    'amount': [50, 120, 75, 200, 45, 30, 80, 150, 25],
    'category': ['food', 'retail', 'food', 'retail', 'food', 'food', 'retail', 'retail', 'food'],
    'date': pd.to_datetime(['2024-01-01', '2024-01-05', '2024-01-10',
                             '2024-01-03', '2024-01-08', '2024-01-02',
                             '2024-01-04', '2024-01-09', '2024-01-12'])
})

# Aggregate to customer level
customer_features = transactions.groupby('customer_id').agg(
    total_transactions=('amount', 'count'),
    total_spend=('amount', 'sum'),
    avg_spend=('amount', 'mean'),
    max_spend=('amount', 'max'),
    min_spend=('amount', 'min'),
    spend_std=('amount', 'std'),
    unique_categories=('category', 'nunique'),
    days_active=('date', lambda x: (x.max() - x.min()).days)
).reset_index()

# Category-specific features
category_pivot = transactions.pivot_table(
    values='amount', index='customer_id', columns='category', aggfunc='sum', fill_value=0
)
category_pivot.columns = [f'spend_{c}' for c in category_pivot.columns]

customer_features = customer_features.merge(category_pivot, on='customer_id')
print(customer_features)

Handling Missing Values

from sklearn.impute import SimpleImputer, KNNImputer
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'age': [25, np.nan, 35, 45, np.nan],
    'income': [50000, 80000, np.nan, 120000, 45000],
    'education': ['Bachelor', 'Master', np.nan, 'PhD', 'Bachelor']
})

# Strategy 1: Simple imputation
# Numerical — use median (robust to outliers)
num_imputer = SimpleImputer(strategy='median')
df[['age', 'income']] = num_imputer.fit_transform(df[['age', 'income']])

# Categorical — use most frequent
cat_imputer = SimpleImputer(strategy='most_frequent')
df[['education']] = cat_imputer.fit_transform(df[['education']])

# Strategy 2: Add missingness indicator (when missing is informative)
df_with_indicator = pd.DataFrame({
    'age': [25, np.nan, 35, 45, np.nan],
    'income': [50000, 80000, np.nan, 120000, 45000]
})

# First, create indicator columns
df_with_indicator['age_missing'] = df_with_indicator['age'].isnull().astype(int)
df_with_indicator['income_missing'] = df_with_indicator['income'].isnull().astype(int)
# Then impute
df_with_indicator[['age', 'income']] = SimpleImputer(strategy='median').fit_transform(
    df_with_indicator[['age', 'income']]
)

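# Strategy 3: KNN imputation (fills each gap using the k most similar rows)
# KNNImputer distances are scale-sensitive (income would dominate age here), so scale features first in practice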
knn_imputer = KNNImputer(n_neighbors=3)
df_knn = pd.DataFrame({
    'age': [25, np.nan, 35, 45, np.nan],
    'income': [50000, 80000, np.nan, 120000, 45000]
})
df_knn_imputed = pd.DataFrame(knn_imputer.fit_transform(df_knn), columns=df_knn.columns)
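
The snippets above fit each imputer on the same data they transform, which is fine for a demo but leaks statistics when that data includes validation or test rows. A minimal sketch of the leakage-safe pattern, assuming a small hypothetical train/test split: fit on the training rows only, then transform the test rows. It also uses `SimpleImputer`'s `add_indicator=True` option, which appends the 0/1 missingness flags from Strategy 2 automatically.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical train/test split (illustrative values only)
train = pd.DataFrame({'age': [25, np.nan, 35, 45],
                      'income': [50000, 80000, np.nan, 120000]})
test = pd.DataFrame({'age': [np.nan, 30],
                     'income': [60000, np.nan]})

# add_indicator=True appends a 0/1 "was missing" column for every feature
# that had missing values when the imputer was fit
imputer = SimpleImputer(strategy='median', add_indicator=True)
imputer.fit(train)  # medians are computed from the training rows only

# Columns: age, income, age_was_missing, income_was_missing
test_imputed = imputer.transform(test)
print(test_imputed)
```

Each test-set gap is filled with the training median (35 for age, 80000 for income), and the last two columns are the missingness indicators.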

Feature Selection

After creating features, select the most informative ones:

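from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.ensemble import RandomForestClassifier
import numpy as np
import pandas as pd

# Synthetic stand-in data so the snippet runs on its own;
# replace X_train, y_train and feature_names with your own training split
X_train, y_train = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=42)
feature_names = [f'feature_{i}' for i in range(X_train.shape[1])]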

# Method 1: Feature importance from tree models
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

feature_importance = pd.Series(rf.feature_importances_, index=feature_names).sort_values(ascending=False)
print("Top 10 features by importance:")
print(feature_importance.head(10))

# Method 2: Statistical selection (SelectKBest)
selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_selected = selector.fit_transform(X_train, y_train)
selected_features = [feature_names[i] for i in selector.get_support(indices=True)]
print("Selected features:", selected_features)

# Method 3: Remove highly correlated features
def remove_correlated_features(df, threshold=0.95):
    corr_matrix = df.corr().abs()
    upper_triangle = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
    to_drop = [col for col in upper_triangle.columns if any(upper_triangle[col] > threshold)]
    print(f"Removing {len(to_drop)} highly correlated features: {to_drop}")
    return df.drop(columns=to_drop)

X_train_uncorrelated = remove_correlated_features(pd.DataFrame(X_train, columns=feature_names))
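
Recursive Feature Elimination (RFE) is another common selection method: it repeatedly fits a model and discards the weakest feature until the requested number remain. A minimal sketch on synthetic `make_classification` data; using `LogisticRegression` as the estimator and keeping 3 features are assumptions for illustration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data: 10 features, 3 of which carry signal
X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

# Each round: fit, rank features by |coefficient|, drop the weakest one (step=1)
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=3, step=1)
rfe.fit(X, y)

kept = [i for i, keep in enumerate(rfe.support_) if keep]
print("Kept feature indices:", kept)
print("Ranking (1 = kept, higher = eliminated earlier):", rfe.ranking_)
```

Because RFE refits the estimator once per elimination round, it gets slow with many features; a larger `step` removes several features per round.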

The Feature Engineering Workflow

1. Explore raw data
   - Distributions, missing values, outliers
   - Correlation with target
   - Domain-specific patterns

2. Basic preprocessing
   - Handle missing values
   - Encode categoricals
   - Scale numerics

3. Create new features
   - Datetime decomposition
   - Interaction terms
   - Domain-specific aggregations
   - Ratio features

4. Select features
   - Remove highly correlated features
   - Feature importance ranking
   - Validate: do added features improve CV score?

5. Iterate
   - Test each feature's contribution
   - Remove features that don't improve validation score
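
Steps 4 and 5 can be checked with a drop-column test: compare the cross-validated score using all features against the score with one feature removed. A rough sketch, assuming synthetic data and a random forest; the feature names `f0` to `f7` are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data with 8 features
X, y = make_classification(n_samples=400, n_features=8, n_informative=4, random_state=1)
feature_names = [f'f{i}' for i in range(X.shape[1])]

def cv_score(X_subset):
    """Mean 5-fold CV accuracy of a fixed random forest on the given columns."""
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    return cross_val_score(model, X_subset, y, cv=5, scoring='accuracy').mean()

baseline = cv_score(X)

# Contribution = baseline score minus score without that feature
contributions = {}
for i, name in enumerate(feature_names):
    X_without = np.delete(X, i, axis=1)
    contributions[name] = baseline - cv_score(X_without)

print(f"Baseline CV accuracy: {baseline:.4f}")
for name, delta in sorted(contributions.items(), key=lambda kv: kv[1]):
    print(f"{name}: {delta:+.4f}")
```

A contribution near zero or below it suggests the feature adds little. CV scores are noisy, so treat small differences with caution and consider repeating the check with different seeds or folds.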

Conclusion

For the modeling skills that use these features, see our scikit-learn tutorial and machine learning beginners guide.


AiTechWorlds Team


The AiTechWorlds team is passionate about AI, technology, and education. We create high-quality, research-backed content to help you learn, grow, and succeed in the modern digital world.


Feature Engineering Guide: Turn Raw Data into Powerful ML Inputs
