Setting Up Python & Libraries | Machine Learning Fundamentals | AiTechWorlds

Setting Up Python & Libraries for ML

The right environment setup saves hours of frustration. A broken Python installation where packages conflict, or a machine where import sklearn fails halfway through a tutorial, kills momentum fast. This lesson builds your ML environment properly from the start.

Choosing a Python Version

Use Python 3.10 or 3.11. Here's why:

Python 3.10+ has excellent typing support, which modern ML libraries leverage
Python 3.12 is newer but some libraries lag in support
Python 2 is dead — don't go near it

Check what you have:

python --version
# Python 3.11.4
# (on some Mac/Linux systems the command is python3 instead of python)

If you need to install Python, download from python.org and check "Add to PATH" during installation on Windows.
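
The same check can be done from inside Python, which is handy at the top of a script or notebook. This is a minimal sketch: the helper name check_python_version and the range constants are illustrative assumptions based on this lesson's recommendation, not part of any library.

```python
import sys

# Assumed range, taken from this lesson's recommendation (3.10 or 3.11).
RECOMMENDED_MIN = (3, 10)
RECOMMENDED_MAX = (3, 11)


def check_python_version(version=None):
    """Return True if (major, minor) falls inside the recommended range."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return RECOMMENDED_MIN <= major_minor <= RECOMMENDED_MAX


current = ".".join(str(part) for part in sys.version_info[:3])
if check_python_version():
    print(f"Python {current} is in the recommended range.")
else:
    print(f"Python {current} is outside the recommended 3.10-3.11 range.")
```

A version outside the range is not necessarily broken; the check only flags that it differs from what this lesson assumes.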

Creating a Virtual Environment

Always use a virtual environment for ML projects. Without one, you install packages globally and they start conflicting with each other within weeks. With one, each project has its own isolated set of dependencies.

Option A: venv (built-in, recommended for most cases)

# Create a new project directory
mkdir ml-project && cd ml-project

# Create virtual environment
python -m venv .venv

# Activate it
# On Windows:
.venv\Scripts\activate

# On Mac/Linux:
source .venv/bin/activate

# Your prompt now shows (.venv) — you're inside the environment
(.venv) $

Deactivate when done: just type deactivate.
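
If you are unsure whether the environment is really active, you can ask the interpreter. A small sketch using only the standard library; the function name in_virtual_environment is an illustrative choice.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Inside a venv: {in_virtual_environment()}")
```

This check covers venv-style environments. A conda environment is a different mechanism, so the check usually reports False there.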

Option B: conda (better for data science, handles non-Python dependencies)

# Install Miniconda from conda.io first, then:

# Create environment with a specific Python version
conda create -n ml-env python=3.11

# Activate
conda activate ml-env

# Your prompt shows (ml-env)

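Conda can be more reliable for packages like numpy on some systems because it ships precompiled binaries together with the non-Python libraries they depend on. pip also installs prebuilt wheels on most platforms, but it falls back to building from source when no wheel matches your system.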

Which to choose: Use venv if you want something lightweight. Use conda if you're on Windows and run into compilation issues with pip, or if you need scientific packages that depend on non-Python libraries.

Installing the Core ML Libraries

With your environment activated, install everything at once:

pip install numpy pandas scikit-learn matplotlib seaborn jupyter ipykernel

Or with conda:

conda install numpy pandas scikit-learn matplotlib seaborn jupyter -c conda-forge

What each library does:

Library	Purpose
`numpy`	Numerical arrays, linear algebra, random number generation
`pandas`	DataFrames, data loading, cleaning, manipulation
`scikit-learn`	ML algorithms, preprocessing, model evaluation
`matplotlib`	Core plotting library, low-level control
`seaborn`	Statistical visualization built on matplotlib, higher-level
`jupyter`	Interactive notebook environment
`ipykernel`	Lets you register your environment as a Jupyter kernel

For more advanced work, add these later:

pip install xgboost lightgbm plotly scipy statsmodels

Jupyter Notebook Setup

Jupyter lets you combine code, output, and written explanation in one document. It's the standard environment for ML exploration.

Start Jupyter:

jupyter notebook
# Opens browser at http://localhost:8888

Or use the newer JupyterLab (better interface):

pip install jupyterlab
jupyter lab

Essential Jupyter tips:

Shift + Enter: run current cell and move to next
Ctrl + Enter: run current cell and stay
B: insert cell below (command mode; press Esc first)
M: convert cell to Markdown (command mode)
D, D: delete current cell (command mode; press D twice)
Ctrl + Shift + -: split cell at cursor (edit mode)

Make sure your virtual environment is available as a kernel:

# Inside your activated environment:
python -m ipykernel install --user --name=ml-env --display-name "Python (ml-env)"

Now when you create a notebook, select "Python (ml-env)" as the kernel. This ensures the notebook uses your project's packages, not the system Python.
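
To confirm that a notebook really picked up that kernel, print the interpreter path in the first cell. The sketch below is one way to do it; interpreter_is_under is a hypothetical helper, and the environment directory you pass in is whatever path you created earlier.

```python
import os
import sys


def interpreter_is_under(env_dir, executable=None):
    """Return True if the interpreter path sits inside env_dir.

    Paths are made absolute but symlinks are not resolved, because a venv's
    python is often a symlink to the base installation.
    """
    exe = os.path.normcase(os.path.abspath(executable or sys.executable))
    env = os.path.normcase(os.path.abspath(env_dir))
    try:
        return os.path.commonpath([exe, env]) == env
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False


print(f"Kernel interpreter: {sys.executable}")
print(f"Environment prefix: {sys.prefix}")
```

If the printed interpreter path points at the system Python rather than your environment, switch the kernel from the notebook's kernel menu and run the cell again.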

Useful notebook magic commands:

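# Time a single statement (%timeit runs it many times and reports the average;
# use %time to run it once, or %%time on the first line of a cell to time the whole cell)
%timeit model.fit(X_train, y_train)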

# Show plots inline (usually automatic in modern Jupyter)
%matplotlib inline

# Reload external modules without restarting kernel
%load_ext autoreload
%autoreload 2
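
Magic commands only work inside Jupyter or IPython. In a plain .py script, a rough equivalent of %timeit can be sketched with the standard library's timeit module. Here build_squares is a hypothetical stand-in workload used in place of something like model.fit, and the run counts are arbitrary.

```python
import timeit


def build_squares(n=10_000):
    """Hypothetical stand-in workload (in place of model.fit)."""
    return [i * i for i in range(n)]


# Run the workload 20 times per measurement, repeat the measurement 5 times,
# and keep the fastest repeat, which is least affected by background noise.
runs_per_measurement = 20
timings = timeit.repeat(build_squares, number=runs_per_measurement, repeat=5)
best_per_call = min(timings) / runs_per_measurement

print(f"Best of 5: {best_per_call * 1e3:.3f} ms per call")
```

For slow operations such as training a large model, a single timed run is usually enough, so lower number and repeat accordingly.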

Verifying Your Installation

Run this in a new notebook or Python file. Every import should succeed without errors:

# Run this cell to verify your ML environment is complete

import sys
print(f"Python: {sys.version}")

import numpy as np
print(f"NumPy: {np.__version__}")

import pandas as pd
print(f"Pandas: {pd.__version__}")

import sklearn
print(f"Scikit-learn: {sklearn.__version__}")

import matplotlib
print(f"Matplotlib: {matplotlib.__version__}")

import seaborn as sns
print(f"Seaborn: {sns.__version__}")

print("\nAll libraries imported successfully.")

# Quick sanity check — train a model in 5 lines
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=10, random_state=42)
model.fit(X, y)
print(f"Model trained. Score: {model.score(X, y):.2%}")
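# Example output: Model trained. Score: 100.00%
# (scored on the training data itself, so expect a value at or near 100%; it is not a measure of real accuracy)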

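If any import fails, the fix is almost always pip install <library-name>, run with your environment activated. The pip name can differ from the import name: sklearn is installed with pip install scikit-learn.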
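
To see everything that is missing in one pass instead of stopping at the first failed import, you can look modules up without importing them. A minimal sketch using the standard library; the REQUIRED mapping and the find_missing helper are illustrative names, and the mapping covers only the core libraries from this lesson.

```python
import importlib.util

# Import name -> name to pass to pip. They differ for scikit-learn.
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "sklearn": "scikit-learn",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
}


def find_missing(required):
    """Return the pip names of top-level modules that cannot be found."""
    return [
        pip_name
        for import_name, pip_name in required.items()
        if importlib.util.find_spec(import_name) is None
    ]


missing = find_missing(REQUIRED)
if missing:
    print("Missing. Try: pip install " + " ".join(missing))
else:
    print("All core libraries are importable.")
```

This only checks that each module can be located. A package that is installed but broken would still need the full import test above to surface the error.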

VS Code Setup for ML

VS Code is an excellent editor for ML work — it handles both .py scripts and .ipynb notebooks.

Extensions to install:

Python (Microsoft) — essential, adds IntelliSense, linting, debugging
Jupyter (Microsoft) — run notebooks directly inside VS Code
Pylance — faster IntelliSense and type checking

Selecting the right interpreter:

Press Ctrl + Shift + P (Cmd + Shift + P on Mac) → type "Python: Select Interpreter" → choose your virtual environment (it will show the path to .venv/Scripts/python.exe on Windows, .venv/bin/python on Mac/Linux, or the conda env).

VS Code will now use that environment for running code, imports, and autocomplete.

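Useful VS Code settings for ML (add to settings.json). The interpreter path below is for Windows; on Mac/Linux use .venv/bin/python. The formatter setting assumes the Black Formatter extension (ms-python.black-formatter) is installed: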

{
    "python.defaultInterpreterPath": ".venv/Scripts/python",
    "jupyter.notebookFileRoot": "${workspaceFolder}",
    "editor.formatOnSave": true,
    "[python]": {
        "editor.defaultFormatter": "ms-python.black-formatter"
    }
}
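
Because the interpreter path differs between Windows and Mac/Linux, one option is to generate the settings rather than hand-edit them. The sketch below rebuilds the same settings with a platform-appropriate path and prints them; venv_python_path and build_settings are hypothetical helper names, and the script deliberately does not write any file.

```python
import json
import os


def venv_python_path(os_name=None, venv_dir=".venv"):
    """Relative interpreter path inside a venv for a given os.name value."""
    os_name = os_name or os.name
    if os_name == "nt":  # Windows
        return f"{venv_dir}/Scripts/python"
    return f"{venv_dir}/bin/python"  # macOS and Linux


def build_settings(os_name=None):
    """Build the settings shown above with a platform-specific interpreter."""
    return {
        "python.defaultInterpreterPath": venv_python_path(os_name),
        "jupyter.notebookFileRoot": "${workspaceFolder}",
        "editor.formatOnSave": True,
        "[python]": {"editor.defaultFormatter": "ms-python.black-formatter"},
    }


print(json.dumps(build_settings(), indent=4))
```

Paste the printed JSON into the workspace's .vscode/settings.json, merging it with any settings already there.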

Google Colab: Zero-Install Alternative

If you cannot install Python locally, or want to run code on a free GPU, use Google Colab at colab.research.google.com.

Colab is a Jupyter notebook environment that runs entirely in your browser, hosted by Google. It comes with numpy, pandas, scikit-learn, matplotlib, and seaborn pre-installed.

# Colab-specific: install extra packages when needed
!pip install lightgbm xgboost

# Mount Google Drive to access your data files
from google.colab import drive
drive.mount('/content/drive')

# Read a CSV from your Drive
import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/data/housing.csv')
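
If the same notebook should run both in Colab and locally, it helps to pick the data path based on where the code is running. The sketch below relies on a commonly used check, whether the google.colab module is already loaded in the session. The local fallback path and both helper names are assumptions for illustration.

```python
import sys
from pathlib import Path

# Assumed locations: the Drive path is the one used above; the local path
# is a hypothetical project-relative fallback.
DRIVE_CSV = Path("/content/drive/MyDrive/data/housing.csv")
LOCAL_CSV = Path("data/housing.csv")


def running_in_colab():
    """Return True when the google.colab module is loaded in this session."""
    return "google.colab" in sys.modules


def data_path(in_colab=None):
    """Pick the CSV location for the current environment."""
    if in_colab is None:
        in_colab = running_in_colab()
    return DRIVE_CSV if in_colab else LOCAL_CSV


print(f"Reading data from: {data_path()}")
```

In Colab the Drive still has to be mounted first, as shown above, before the Drive path can be read.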

Colab limitations to know:

Sessions disconnect after roughly 90 minutes of inactivity (the exact limit varies and is not guaranteed)
Free GPU access is limited and can be revoked if demand is high
Files saved to Colab's /content/ are lost when the session ends — always save to Drive

For learning and experimentation, Colab is excellent. For long training runs or production work, a local environment or cloud VM is more reliable.

Requirements File: Reproducibility

Once your environment is working, save it so others (or future you) can recreate it exactly:

pip freeze > requirements.txt

This creates a file with pinned versions like the following (a real freeze also lists every transitive dependency):

numpy==1.25.2
pandas==2.0.3
scikit-learn==1.3.0
matplotlib==3.7.2
seaborn==0.12.2

Anyone can then recreate your exact environment:

python -m venv .venv
.venv\Scripts\activate        # Windows
source .venv/bin/activate     # Mac/Linux
pip install -r requirements.txt
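
After installing, you may want to confirm that the installed versions really match the pins. A minimal sketch using the standard library's importlib.metadata; parse_pins and find_mismatches are hypothetical helpers, and the parser only understands simple name==version lines, not the full requirements syntax.

```python
from importlib import metadata


def parse_pins(text):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for raw in text.splitlines():
        # Drop trailing comments and environment markers (after ';').
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed_or_None)} for packages that differ."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


sample = """\
numpy==1.25.2
pandas==2.0.3
scikit-learn==1.3.0
"""
print(parse_pins(sample))
```

In practice you would read the text from requirements.txt and pass the parsed pins to find_mismatches; an empty result means every pinned package is installed at the pinned version.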

Key Takeaway

A clean, reproducible environment is not optional; it's part of professional ML practice. Spend 20 minutes setting this up properly and you'll avoid most "works on my machine" problems later.

Next lesson: We'll put this environment to work with NumPy and Pandas — the two libraries you'll use in every single ML project.