What is Pandas used for?

Pandas is used for loading, cleaning, transforming, and analyzing tabular data (like spreadsheets). It's the foundation of almost all Python data science work.

Is Pandas better than Excel?

For large datasets (100K+ rows), complex transformations, reproducible analysis, and automation, Python Pandas is far superior. Excel is better for quick, one-off analysis by non-programmers.

AI Tips Prompting Python AI Tools Web Dev ChatGPT LLM Agent Dev Reviews Notes Free Books

AiTechWorlds

Python code editor with script on monitor — python pandas tutorial 2026 data analysis for

Python Development

Python Pandas Tutorial 2026 — Data Analysis for Beginners to Pro

⚡ Quick Answer

Master Python Pandas for data analysis. Learn to load, clean, transform, and visualize data with hands-on examples. From CSV files to actionable insights — complete 2026 guide.

AiTechWorlds Team May 3, 2026 7 min readUpdated May 15, 2026

#python #pandas #data-analysis #data-science #numpy #matplotlib #coding

📚Part of the Python Development guide — explore all Python Development articles→

Share:Facebook Twitter/X LinkedIn Telegram WhatsApp

📱

Get more content like this on Telegram!

Daily AI tips, notes & resources — free

Join Free →

Python Pandas Tutorial 2026 — Turn Raw Data Into Real Insights

Every data analyst has a turning point. For most, it happens the first time they open a messy 50,000-row CSV in Excel, watch it grind slowly, and realize there has to be a better way.

That better way is Pandas.

Pandas is the Python library that turns raw, messy data into clean, actionable insights. It is the backbone of data science, used by data analysts at every company from startups to Fortune 500s. And it is surprisingly enjoyable to learn.

This tutorial teaches you Pandas with real datasets and real analysis techniques, from loading your first CSV to building a complete data analysis project.

Why Pandas?

Before writing code, here is what makes Pandas worth learning:

Speed: Handles millions of rows without breaking a sweat
Flexibility: Works with CSV, Excel, JSON, SQL databases, APIs
Power: Filter, group, pivot, merge, reshape data with single lines
Ecosystem: Connects seamlessly with NumPy, Matplotlib, scikit-learn, and Plotly
Automation: Run the same analysis on new data every week automatically

Setup

pip install pandas numpy matplotlib openpyxl

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

print(pd.__version__)  # Should be 2.x+

The Two Core Data Structures

Series — A Single Column

# A Series is like a labeled array
sales = pd.Series([1200, 1800, 950, 2300, 1650],
                  index=["Mon", "Tue", "Wed", "Thu", "Fri"])

print(sales)
print(f"Total: {sales.sum()}")
print(f"Average: {sales.mean():.0f}")
print(f"Best day: {sales.idxmax()} (${sales.max()})")

DataFrame — A Table

# A DataFrame is a table with labeled rows and columns
data = {
    "name": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
    "department": ["Engineering", "Marketing", "Engineering", "HR", "Marketing"],
    "salary": [95000, 72000, 88000, 65000, 78000],
    "years": [5, 3, 7, 2, 4],
    "remote": [True, False, True, True, False],
}

df = pd.DataFrame(data)
print(df)
print(f"\nShape: {df.shape}")  # (5, 5)
print(f"Columns: {list(df.columns)}")

Loading Data

From CSV

# Load a CSV file
df = pd.read_csv("sales_data.csv")

# Load with options
df = pd.read_csv(
    "sales_data.csv",
    parse_dates=["order_date"],     # Parse date columns automatically
    dtype={"product_id": str},     # Force column types
    na_values=["N/A", "unknown"],  # Treat these as NaN
)

print(df.head())        # First 5 rows
print(df.tail(3))       # Last 3 rows
print(df.shape)         # (rows, columns)
print(df.dtypes)        # Data type of each column
print(df.info())        # Summary with memory usage
print(df.describe())    # Statistical summary for numeric columns

From Excel

df = pd.read_excel("report.xlsx", sheet_name="Sales Q1")

From a SQL Database

import sqlite3

conn = sqlite3.connect("company.db")
df = pd.read_sql("SELECT * FROM orders WHERE year = 2026", conn)
conn.close()

Selecting and Filtering Data

# Select columns
names = df["name"]               # Single column → Series
subset = df[["name", "salary"]]  # Multiple columns → DataFrame

# Select rows by position
first_row = df.iloc[0]
first_three = df.iloc[:3]
last_two = df.iloc[-2:]

# Select rows by label (index)
row = df.loc[0]

# Filter rows with conditions
engineers = df[df["department"] == "Engineering"]
high_earners = df[df["salary"] > 80000]
senior_engineers = df[(df["department"] == "Engineering") & (df["years"] > 4)]
remote_or_senior = df[(df["remote"] == True) | (df["years"] > 5)]

# Query method — cleaner syntax
result = df.query("salary > 80000 and department == 'Engineering'")

Data Cleaning

Real-world data is messy. Cleaning it is 80% of the job.

# Check for missing values
print(df.isnull().sum())
print(f"Missing %: {df.isnull().mean() * 100}")

# Drop rows with any missing value
df_clean = df.dropna()

# Fill missing values
df["age"].fillna(df["age"].median(), inplace=True)         # Fill with median
df["category"].fillna("Unknown", inplace=True)             # Fill with constant
df["sales"].fillna(method="ffill", inplace=True)          # Forward fill

# Remove duplicates
df = df.drop_duplicates()
df = df.drop_duplicates(subset=["email"])  # Based on specific column

# Fix data types
df["order_date"] = pd.to_datetime(df["order_date"])
df["price"] = df["price"].str.replace("$", "").astype(float)
df["zip_code"] = df["zip_code"].astype(str).str.zfill(5)  # Pad to 5 digits

# Rename columns
df.rename(columns={"qty": "quantity", "amt": "amount"}, inplace=True)

# Strip whitespace from strings
df["name"] = df["name"].str.strip()
df["email"] = df["email"].str.lower().str.strip()

Transforming Data

# Add new columns
df["annual_bonus"] = df["salary"] * 0.10
df["total_comp"] = df["salary"] + df["annual_bonus"]
df["experience_level"] = df["years"].apply(
    lambda y: "Junior" if y < 3 else ("Mid" if y < 7 else "Senior")
)

# Apply a function to each row
def categorize_salary(row):
    if row["salary"] > 90000:
        return "High"
    elif row["salary"] > 70000:
        return "Medium"
    return "Low"

df["salary_band"] = df.apply(categorize_salary, axis=1)

# Map values
dept_codes = {"Engineering": "ENG", "Marketing": "MKT", "HR": "HR"}
df["dept_code"] = df["department"].map(dept_codes)

# String operations
df["first_name"] = df["name"].str.split(" ").str[0]
df["email_domain"] = df["email"].str.split("@").str[1]

Grouping and Aggregation

This is where Pandas truly shines — answering business questions instantly.

# Group by one column
dept_summary = df.groupby("department")["salary"].agg(["mean", "min", "max", "count"])
print(dept_summary)

# Multiple aggregations
summary = df.groupby("department").agg(
    avg_salary=("salary", "mean"),
    total_employees=("name", "count"),
    avg_years=("years", "mean"),
    max_salary=("salary", "max"),
)

# Group by multiple columns
pivot = df.groupby(["department", "remote"])["salary"].mean().unstack()

# Value counts — great for categorical analysis
print(df["department"].value_counts())
print(df["experience_level"].value_counts(normalize=True))  # Percentages

Merging DataFrames

# Join two DataFrames (like SQL JOIN)
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [101, 102, 101, 103],
    "amount": [250, 180, 320, 95]
})

customers = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "name": ["Alice", "Bob", "Charlie"],
    "city": ["NYC", "LA", "Chicago"]
})

# Inner join (only matching rows)
merged = pd.merge(orders, customers, on="customer_id", how="inner")

# Left join (all orders, even without customer match)
merged_left = pd.merge(orders, customers, on="customer_id", how="left")

# Concatenate (stack DataFrames vertically)
q1 = pd.read_csv("q1_sales.csv")
q2 = pd.read_csv("q2_sales.csv")
full_year = pd.concat([q1, q2], ignore_index=True)

Working with Dates

df["order_date"] = pd.to_datetime(df["order_date"])

# Extract date components
df["year"] = df["order_date"].dt.year
df["month"] = df["order_date"].dt.month
df["day_of_week"] = df["order_date"].dt.day_name()
df["quarter"] = df["order_date"].dt.quarter

# Filter by date range
jan_orders = df[df["order_date"].dt.month == 1]
recent = df[df["order_date"] >= "2026-01-01"]

# Resample time series
monthly = df.set_index("order_date").resample("ME")["amount"].sum()
weekly = df.set_index("order_date").resample("W")["amount"].mean()

Visualization with Matplotlib

import matplotlib.pyplot as plt

# Bar chart
dept_salary = df.groupby("department")["salary"].mean()
dept_salary.plot(kind="bar", color="#4f46e5", figsize=(10, 6))
plt.title("Average Salary by Department")
plt.ylabel("Salary ($)")
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig("salary_chart.png", dpi=150)
plt.show()

# Line chart for time series
monthly_sales = df.set_index("order_date").resample("ME")["amount"].sum()
monthly_sales.plot(kind="line", marker="o", figsize=(12, 5))
plt.title("Monthly Sales 2026")
plt.ylabel("Revenue ($)")
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Histogram
df["salary"].plot(kind="hist", bins=20, color="#0ea5e9", edgecolor="white")
plt.title("Salary Distribution")
plt.xlabel("Salary ($)")
plt.show()

Complete Mini-Project: Sales Analysis

import pandas as pd
import matplotlib.pyplot as plt

# Load data
df = pd.read_csv("ecommerce_sales.csv", parse_dates=["order_date"])

# Clean
df = df.dropna(subset=["amount", "category"])
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
df = df[df["amount"] > 0]

# Analyze
print("=== Sales Summary ===")
print(f"Total Revenue: ${df['amount'].sum():,.2f}")
print(f"Total Orders: {len(df):,}")
print(f"Average Order: ${df['amount'].mean():.2f}")

print("\n=== Top Categories ===")
print(df.groupby("category")["amount"].sum().sort_values(ascending=False).head(5))

print("\n=== Monthly Trend ===")
monthly = df.set_index("order_date").resample("ME")["amount"].sum()
print(monthly)

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Bar chart
df.groupby("category")["amount"].sum().sort_values().plot(
    kind="barh", ax=axes[0], color="#4f46e5"
)
axes[0].set_title("Revenue by Category")

# Line chart
monthly.plot(kind="line", ax=axes[1], marker="o", color="#0ea5e9")
axes[1].set_title("Monthly Revenue")
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig("sales_analysis.png", dpi=150)
plt.show()

Where to Go Next

Once you can analyze data with Pandas, the natural next step is machine learning. Our Python for Machine Learning guide builds on everything here — you will use Pandas to prepare data for scikit-learn models.

For collecting your own data to analyze, check out the Python web scraping guide. And if you are still building your Python foundation, start with the Python beginners roadmap.

Pandas Cheat Sheet

Task	Code
Load CSV	`pd.read_csv("file.csv")`
First rows	`df.head()`
Shape	`df.shape`
Missing values	`df.isnull().sum()`
Filter rows	`df[df["col"] > 100]`
Group + aggregate	`df.groupby("col")["val"].mean()`
Sort	`df.sort_values("col", ascending=False)`
Add column	`df["new"] = df["a"] + df["b"]`
Merge DataFrames	`pd.merge(df1, df2, on="id")`
Save to CSV	`df.to_csv("output.csv", index=False)`

Pandas takes practice. The best approach: find a dataset you genuinely care about, load it, and start asking questions. Every analysis you do teaches you something new.

Get Pandas cheat sheets and practice datasets in the AiTechWorlds Telegram channel — free!

Frequently Asked Questions

Basic math helps (mean, median, percentage) but you don't need advanced math. Pandas handles the calculations — you just need to know what to ask for.

AiTechWorlds Team

✓ Verified Writer

The AiTechWorlds team is passionate about AI, technology, and education. We create high-quality, research-backed content to help you learn, grow, and succeed in the modern digital world.

📱 Follow on Telegram 🐦 Follow on X Learn More →

Python code editor with script on monitor — the python libraries every developer must know in best python libraries 2025

Programming & Web

The Python Libraries Every Developer Must Know in 2025

The essential Python libraries for 2025: from requests and pandas to FastAPI and LangChain — what each does, when to use it, and how to get started quickly.

May 27, 2026 7 min read

Python code editor with script on monitor — django vs flask in 2025

Programming & Web

Django vs Flask in 2025: Which Framework Should You Learn?

An honest Django vs Flask comparison for 2025 — which Python framework to learn first, when each excels, and why FastAPI has changed the equation.

May 27, 2026 7 min read

Python code editor with script on monitor — fastapi tutorial

Programming & Web

FastAPI Tutorial: Building Your First REST API in 30 Minutes

A hands-on FastAPI tutorial for beginners: build a fully functional REST API in 30 minutes with CRUD endpoints, request validation, and automatic docs.

May 27, 2026 7 min read

Python code editor with script on monitor — jupyter notebook guide jupyter notebook tutorial

Programming & Web

Jupyter Notebook Guide: The Data Scientist's Favorite Tool

A complete Jupyter Notebook guide for 2025: installation, essential shortcuts, best practices, and how data scientists use Jupyter for exploration, analysis, and sharing.

May 27, 2026 7 min read

Go deeper on this topic

InterviewPython BookPython Mastery 2026 NotesPython Syntax Quick Reference BookMachine Learning: A Visual Guide BookPython Cheat Sheet Pack BookMachine Learning Formulas & Concepts

10K+ Members Growing Daily

Get Free AI Notes Daily

Join AiTechWorlds on Telegram and get daily AI tips, prompt engineering templates, coding resources, and exclusive content — 100% free!

📚 Free Study Notes🤖 AI Tips Daily⚡ Prompt Templates💻 Coding Resources

Join Free Channel

No spam. Leave anytime.

Python Development

Python Pandas Tutorial 2026 — Data Analysis for Beginners to Pro

⚡ Quick Answer

Master Python Pandas for data analysis. Learn to load, clean, transform, and visualize data with hands-on examples. From CSV files to actionable insights — complete 2026 guide.

AiTechWorlds Team May 3, 2026 7 min readUpdated May 15, 2026

#python #pandas #data-analysis #data-science #numpy #matplotlib #coding

📚Part of the Python Development guide — explore all Python Development articles→

Share:Facebook Twitter/X LinkedIn Telegram WhatsApp

📱

Get more content like this on Telegram!

Daily AI tips, notes & resources — free

Join Free →

Python Pandas Tutorial 2026 — Turn Raw Data Into Real Insights

Every data analyst has a turning point. For most, it happens the first time they open a messy 50,000-row CSV in Excel, watch it grind slowly, and realize there has to be a better way.

That better way is Pandas.

This tutorial teaches you Pandas with real datasets and real analysis techniques, from loading your first CSV to building a complete data analysis project.

Why Pandas?

Before writing code, here is what makes Pandas worth learning:

Speed: Handles millions of rows without breaking a sweat
Flexibility: Works with CSV, Excel, JSON, SQL databases, APIs
Power: Filter, group, pivot, merge, reshape data with single lines
Ecosystem: Connects seamlessly with NumPy, Matplotlib, scikit-learn, and Plotly
Automation: Run the same analysis on new data every week automatically

Setup

pip install pandas numpy matplotlib openpyxl

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

print(pd.__version__)  # Should be 2.x+

The Two Core Data Structures

Series — A Single Column

# A Series is like a labeled array
sales = pd.Series([1200, 1800, 950, 2300, 1650],
                  index=["Mon", "Tue", "Wed", "Thu", "Fri"])

print(sales)
print(f"Total: {sales.sum()}")
print(f"Average: {sales.mean():.0f}")
print(f"Best day: {sales.idxmax()} (${sales.max()})")

DataFrame — A Table

# A DataFrame is a table with labeled rows and columns
data = {
    "name": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
    "department": ["Engineering", "Marketing", "Engineering", "HR", "Marketing"],
    "salary": [95000, 72000, 88000, 65000, 78000],
    "years": [5, 3, 7, 2, 4],
    "remote": [True, False, True, True, False],
}

df = pd.DataFrame(data)
print(df)
print(f"\nShape: {df.shape}")  # (5, 5)
print(f"Columns: {list(df.columns)}")

Loading Data

From CSV

# Load a CSV file
df = pd.read_csv("sales_data.csv")

# Load with options
df = pd.read_csv(
    "sales_data.csv",
    parse_dates=["order_date"],     # Parse date columns automatically
    dtype={"product_id": str},     # Force column types
    na_values=["N/A", "unknown"],  # Treat these as NaN
)

print(df.head())        # First 5 rows
print(df.tail(3))       # Last 3 rows
print(df.shape)         # (rows, columns)
print(df.dtypes)        # Data type of each column
print(df.info())        # Summary with memory usage
print(df.describe())    # Statistical summary for numeric columns

From Excel

df = pd.read_excel("report.xlsx", sheet_name="Sales Q1")

From a SQL Database

import sqlite3

conn = sqlite3.connect("company.db")
df = pd.read_sql("SELECT * FROM orders WHERE year = 2026", conn)
conn.close()

Selecting and Filtering Data

# Select columns
names = df["name"]               # Single column → Series
subset = df[["name", "salary"]]  # Multiple columns → DataFrame

# Select rows by position
first_row = df.iloc[0]
first_three = df.iloc[:3]
last_two = df.iloc[-2:]

# Select rows by label (index)
row = df.loc[0]

# Filter rows with conditions
engineers = df[df["department"] == "Engineering"]
high_earners = df[df["salary"] > 80000]
senior_engineers = df[(df["department"] == "Engineering") & (df["years"] > 4)]
remote_or_senior = df[(df["remote"] == True) | (df["years"] > 5)]

# Query method — cleaner syntax
result = df.query("salary > 80000 and department == 'Engineering'")

Data Cleaning

Real-world data is messy. Cleaning it is 80% of the job.

# Check for missing values
print(df.isnull().sum())
print(f"Missing %: {df.isnull().mean() * 100}")

# Drop rows with any missing value
df_clean = df.dropna()

# Fill missing values
df["age"].fillna(df["age"].median(), inplace=True)         # Fill with median
df["category"].fillna("Unknown", inplace=True)             # Fill with constant
df["sales"].fillna(method="ffill", inplace=True)          # Forward fill

# Remove duplicates
df = df.drop_duplicates()
df = df.drop_duplicates(subset=["email"])  # Based on specific column

# Fix data types
df["order_date"] = pd.to_datetime(df["order_date"])
df["price"] = df["price"].str.replace("$", "").astype(float)
df["zip_code"] = df["zip_code"].astype(str).str.zfill(5)  # Pad to 5 digits

# Rename columns
df.rename(columns={"qty": "quantity", "amt": "amount"}, inplace=True)

# Strip whitespace from strings
df["name"] = df["name"].str.strip()
df["email"] = df["email"].str.lower().str.strip()

Transforming Data

# Add new columns
df["annual_bonus"] = df["salary"] * 0.10
df["total_comp"] = df["salary"] + df["annual_bonus"]
df["experience_level"] = df["years"].apply(
    lambda y: "Junior" if y < 3 else ("Mid" if y < 7 else "Senior")
)

# Apply a function to each row
def categorize_salary(row):
    if row["salary"] > 90000:
        return "High"
    elif row["salary"] > 70000:
        return "Medium"
    return "Low"

df["salary_band"] = df.apply(categorize_salary, axis=1)

# Map values
dept_codes = {"Engineering": "ENG", "Marketing": "MKT", "HR": "HR"}
df["dept_code"] = df["department"].map(dept_codes)

# String operations
df["first_name"] = df["name"].str.split(" ").str[0]
df["email_domain"] = df["email"].str.split("@").str[1]

Grouping and Aggregation

This is where Pandas truly shines — answering business questions instantly.

# Group by one column
dept_summary = df.groupby("department")["salary"].agg(["mean", "min", "max", "count"])
print(dept_summary)

# Multiple aggregations
summary = df.groupby("department").agg(
    avg_salary=("salary", "mean"),
    total_employees=("name", "count"),
    avg_years=("years", "mean"),
    max_salary=("salary", "max"),
)

# Group by multiple columns
pivot = df.groupby(["department", "remote"])["salary"].mean().unstack()

# Value counts — great for categorical analysis
print(df["department"].value_counts())
print(df["experience_level"].value_counts(normalize=True))  # Percentages

Merging DataFrames

# Join two DataFrames (like SQL JOIN)
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [101, 102, 101, 103],
    "amount": [250, 180, 320, 95]
})

customers = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "name": ["Alice", "Bob", "Charlie"],
    "city": ["NYC", "LA", "Chicago"]
})

# Inner join (only matching rows)
merged = pd.merge(orders, customers, on="customer_id", how="inner")

# Left join (all orders, even without customer match)
merged_left = pd.merge(orders, customers, on="customer_id", how="left")

# Concatenate (stack DataFrames vertically)
q1 = pd.read_csv("q1_sales.csv")
q2 = pd.read_csv("q2_sales.csv")
full_year = pd.concat([q1, q2], ignore_index=True)

Working with Dates

df["order_date"] = pd.to_datetime(df["order_date"])

# Extract date components
df["year"] = df["order_date"].dt.year
df["month"] = df["order_date"].dt.month
df["day_of_week"] = df["order_date"].dt.day_name()
df["quarter"] = df["order_date"].dt.quarter

# Filter by date range
jan_orders = df[df["order_date"].dt.month == 1]
recent = df[df["order_date"] >= "2026-01-01"]

# Resample time series
monthly = df.set_index("order_date").resample("ME")["amount"].sum()
weekly = df.set_index("order_date").resample("W")["amount"].mean()

Visualization with Matplotlib

import matplotlib.pyplot as plt

# Bar chart
dept_salary = df.groupby("department")["salary"].mean()
dept_salary.plot(kind="bar", color="#4f46e5", figsize=(10, 6))
plt.title("Average Salary by Department")
plt.ylabel("Salary ($)")
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig("salary_chart.png", dpi=150)
plt.show()

# Line chart for time series
monthly_sales = df.set_index("order_date").resample("ME")["amount"].sum()
monthly_sales.plot(kind="line", marker="o", figsize=(12, 5))
plt.title("Monthly Sales 2026")
plt.ylabel("Revenue ($)")
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Histogram
df["salary"].plot(kind="hist", bins=20, color="#0ea5e9", edgecolor="white")
plt.title("Salary Distribution")
plt.xlabel("Salary ($)")
plt.show()

Complete Mini-Project: Sales Analysis

import pandas as pd
import matplotlib.pyplot as plt

# Load data
df = pd.read_csv("ecommerce_sales.csv", parse_dates=["order_date"])

# Clean
df = df.dropna(subset=["amount", "category"])
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
df = df[df["amount"] > 0]

# Analyze
print("=== Sales Summary ===")
print(f"Total Revenue: ${df['amount'].sum():,.2f}")
print(f"Total Orders: {len(df):,}")
print(f"Average Order: ${df['amount'].mean():.2f}")

print("\n=== Top Categories ===")
print(df.groupby("category")["amount"].sum().sort_values(ascending=False).head(5))

print("\n=== Monthly Trend ===")
monthly = df.set_index("order_date").resample("ME")["amount"].sum()
print(monthly)

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Bar chart
df.groupby("category")["amount"].sum().sort_values().plot(
    kind="barh", ax=axes[0], color="#4f46e5"
)
axes[0].set_title("Revenue by Category")

# Line chart
monthly.plot(kind="line", ax=axes[1], marker="o", color="#0ea5e9")
axes[1].set_title("Monthly Revenue")
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig("sales_analysis.png", dpi=150)
plt.show()

Where to Go Next

For collecting your own data to analyze, check out the Python web scraping guide. And if you are still building your Python foundation, start with the Python beginners roadmap.

Pandas Cheat Sheet

Task	Code
Load CSV	`pd.read_csv("file.csv")`
First rows	`df.head()`
Shape	`df.shape`
Missing values	`df.isnull().sum()`
Filter rows	`df[df["col"] > 100]`
Group + aggregate	`df.groupby("col")["val"].mean()`
Sort	`df.sort_values("col", ascending=False)`
Add column	`df["new"] = df["a"] + df["b"]`
Merge DataFrames	`pd.merge(df1, df2, on="id")`
Save to CSV	`df.to_csv("output.csv", index=False)`

Pandas takes practice. The best approach: find a dataset you genuinely care about, load it, and start asking questions. Every analysis you do teaches you something new.

Get Pandas cheat sheets and practice datasets in the AiTechWorlds Telegram channel — free!

Frequently Asked Questions

Basic math helps (mean, median, percentage) but you don't need advanced math. Pandas handles the calculations — you just need to know what to ask for.

AiTechWorlds Team

✓ Verified Writer

The AiTechWorlds team is passionate about AI, technology, and education. We create high-quality, research-backed content to help you learn, grow, and succeed in the modern digital world.

📱 Follow on Telegram 🐦 Follow on X Learn More →

Programming & Web

The Python Libraries Every Developer Must Know in 2025

The essential Python libraries for 2025: from requests and pandas to FastAPI and LangChain — what each does, when to use it, and how to get started quickly.

May 27, 2026 7 min read

Programming & Web

Django vs Flask in 2025: Which Framework Should You Learn?

An honest Django vs Flask comparison for 2025 — which Python framework to learn first, when each excels, and why FastAPI has changed the equation.

May 27, 2026 7 min read

Programming & Web

FastAPI Tutorial: Building Your First REST API in 30 Minutes

A hands-on FastAPI tutorial for beginners: build a fully functional REST API in 30 minutes with CRUD endpoints, request validation, and automatic docs.

May 27, 2026 7 min read

Programming & Web

Jupyter Notebook Guide: The Data Scientist's Favorite Tool

A complete Jupyter Notebook guide for 2025: installation, essential shortcuts, best practices, and how data scientists use Jupyter for exploration, analysis, and sharing.

May 27, 2026 7 min read

Go deeper on this topic

InterviewPython BookPython Mastery 2026 NotesPython Syntax Quick Reference BookMachine Learning: A Visual Guide BookPython Cheat Sheet Pack BookMachine Learning Formulas & Concepts

10K+ Members Growing Daily

Get Free AI Notes Daily

Join AiTechWorlds on Telegram and get daily AI tips, prompt engineering templates, coding resources, and exclusive content — 100% free!

📚 Free Study Notes🤖 AI Tips Daily⚡ Prompt Templates💻 Coding Resources

Join Free Channel

No spam. Leave anytime.

Python Pandas Tutorial 2026 — Data Analysis for Beginners to Pro

Python Pandas Tutorial 2026 — Turn Raw Data Into Real Insights

Why Pandas?

Setup

The Two Core Data Structures

Series — A Single Column

DataFrame — A Table

Loading Data

From CSV

From Excel

From a SQL Database

Selecting and Filtering Data

Data Cleaning

Transforming Data

Grouping and Aggregation

Merging DataFrames

Working with Dates

Visualization with Matplotlib

Complete Mini-Project: Sales Analysis

Where to Go Next

Pandas Cheat Sheet

Further Reading

💬 DiscussionPowered by GitHub Discussions

Frequently Asked Questions

AiTechWorlds Team

Related Articles

The Python Libraries Every Developer Must Know in 2025

Django vs Flask in 2025: Which Framework Should You Learn?

FastAPI Tutorial: Building Your First REST API in 30 Minutes

Jupyter Notebook Guide: The Data Scientist's Favorite Tool

Go deeper on this topic

Get Free AI Notes Daily

Python Pandas Tutorial 2026 — Data Analysis for Beginners to Pro

Python Pandas Tutorial 2026 — Turn Raw Data Into Real Insights

Why Pandas?

Setup

The Two Core Data Structures

Series — A Single Column

DataFrame — A Table

Loading Data

From CSV

From Excel

From a SQL Database

Selecting and Filtering Data

Data Cleaning

Transforming Data

Grouping and Aggregation

Merging DataFrames

Working with Dates

Visualization with Matplotlib

Complete Mini-Project: Sales Analysis

Where to Go Next

Pandas Cheat Sheet

Further Reading

💬 DiscussionPowered by GitHub Discussions

Frequently Asked Questions

AiTechWorlds Team

Related Articles

The Python Libraries Every Developer Must Know in 2025

Django vs Flask in 2025: Which Framework Should You Learn?

FastAPI Tutorial: Building Your First REST API in 30 Minutes

Jupyter Notebook Guide: The Data Scientist's Favorite Tool

Go deeper on this topic

Get Free AI Notes Daily