Setting Up Python & Libraries for ML
The right environment setup saves hours of frustration. A broken Python installation where packages conflict, or a machine where import sklearn fails halfway through a tutorial, kills momentum fast. This lesson builds your ML environment properly from the start.
Choosing a Python Version
Use Python 3.10 or 3.11. Here's why:
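- Python 3.10 and later support modern type-hint syntax (such as X | Y unions) that current ML libraries rely on
- Python 3.12 is newer, but some libraries may lag in support, so check that the packages you need publish builds for it before upgrading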
- Python 2 is dead — don't go near it
Check what you have (on Mac/Linux the command may be python3 rather than python):
python --version
# Python 3.11.4
If you need to install Python, download from python.org and check "Add to PATH" during installation on Windows.
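The same check can be made from inside Python, which is handy at the top of a notebook. The sketch below is only illustrative: the helper name `python_version_ok` and the (3, 10) floor are assumptions taken from the recommendation above, not a requirement of any library.

```python
import sys


def python_version_ok(version_info=sys.version_info, minimum=(3, 10)):
    """Return True if the interpreter is at least `minimum` (major, minor)."""
    # Tuple comparison handles major and minor together: (3, 9) < (3, 10).
    return tuple(version_info[:2]) >= tuple(minimum)


print(f"Running Python {sys.version_info.major}.{sys.version_info.minor}")
if not python_version_ok():
    print("Older than 3.10: consider installing a newer Python.")
```

Passing a tuple such as `(3, 9, 18)` as the first argument lets you try the check against a version other than the one you are running.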
Creating a Virtual Environment
Always use a virtual environment for ML projects. Without one, every package installs globally, and sooner or later two projects need incompatible versions of the same library. With one, each project has its own isolated set of dependencies.
Option A: venv (built-in, recommended for most cases)
# Create a new project directory
mkdir ml-project && cd ml-project
# Create virtual environment
python -m venv .venv
# Activate it
# On Windows:
.venv\Scripts\activate
# On Mac/Linux:
source .venv/bin/activate
# Your prompt now shows (.venv) — you're inside the environment
(.venv) $
Deactivate when done: just type deactivate.
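To confirm from Python that the environment is really active, one option is to compare `sys.prefix` with `sys.base_prefix`; inside a venv they differ. This is a minimal sketch, and the function name `in_virtual_env` is made up for illustration. Note that it detects venv-style environments, not conda ones.

```python
import sys


def in_virtual_env(prefix=None, base_prefix=None):
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the Python it was created from.
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


print("Inside a venv:", in_virtual_env())
print("Interpreter:", sys.executable)
```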
Option B: conda (better for data science, handles non-Python dependencies)
# Install Miniconda from conda.io first, then:
# Create environment with a specific Python version
conda create -n ml-env python=3.11
# Activate
conda activate ml-env
# Your prompt shows (ml-env)
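Conda handles packages like numpy more reliably on some systems because it ships precompiled binaries and also manages the non-Python libraries they depend on. pip installs prebuilt wheels on most common platforms too, but it may fall back to building from source when no wheel matches your system.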
Which to choose: Use venv if you want something lightweight. Use conda if you're on Windows and run into compilation issues with pip, or if you need packages from the scientific Python ecosystem beyond ML.
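If you go the conda route, the active environment can be checked from Python as well. The sketch below assumes conda's activation has set the `CONDA_DEFAULT_ENV` variable, which is how `conda activate` normally behaves; the helper name `active_conda_env` is hypothetical.

```python
import os


def active_conda_env(environ=None):
    """Return the active conda environment name, or None if there is none."""
    environ = os.environ if environ is None else environ
    # `conda activate` exports CONDA_DEFAULT_ENV with the environment name.
    name = environ.get("CONDA_DEFAULT_ENV")
    return name or None


env = active_conda_env()
print(f"Active conda environment: {env}" if env else "No conda environment active")
```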
Installing the Core ML Libraries
With your environment activated, install everything at once:
pip install numpy pandas scikit-learn matplotlib seaborn jupyter ipykernel
Or with conda:
conda install numpy pandas scikit-learn matplotlib seaborn jupyter -c conda-forge
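A common source of confusion is `pip` belonging to a different Python than the one you run. One way to rule that out, sketched here, is to invoke pip through the current interpreter with `python -m pip`. The helper `pip_install_command` is illustrative only, and the commented-out `subprocess.run` line is what would actually perform the install.

```python
import subprocess
import sys

CORE_PACKAGES = ["numpy", "pandas", "scikit-learn", "matplotlib",
                 "seaborn", "jupyter", "ipykernel"]


def pip_install_command(packages):
    """Build a pip command bound to the interpreter that is running now."""
    # sys.executable is the Python of the active environment, so the
    # packages land in that environment rather than a global one.
    return [sys.executable, "-m", "pip", "install", *packages]


command = pip_install_command(CORE_PACKAGES)
print(" ".join(command))
# Uncomment to run the install for real:
# subprocess.run(command, check=True)
```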
What each library does:
| Library | Purpose |
|---|---|
| numpy | Numerical arrays, linear algebra, random number generation |
| pandas | DataFrames, data loading, cleaning, manipulation |
| scikit-learn | ML algorithms, preprocessing, model evaluation |
| matplotlib | Core plotting library, low-level control |
| seaborn | Statistical visualization built on matplotlib, higher-level |
| jupyter | Interactive notebook environment |
For more advanced work, add these later:
pip install xgboost lightgbm plotly scipy statsmodels
Jupyter Notebook Setup
Jupyter lets you combine code, output, and written explanation in one document. It's the standard environment for ML exploration.
Start Jupyter:
jupyter notebook
# Opens browser at http://localhost:8888
Or use the newer JupyterLab (better interface):
pip install jupyterlab
jupyter lab
Essential Jupyter tips:
- Shift + Enter: run current cell and move to next
- Ctrl + Enter: run current cell and stay
- B: insert cell below (in command mode, press Esc first)
- M: convert cell to Markdown
- DD: delete current cell
- Ctrl + Shift + -: split cell at cursor
Make sure your virtual environment is available as a kernel:
# Inside your activated environment:
python -m ipykernel install --user --name=ml-env --display-name "Python (ml-env)"
Now when you create a notebook, select "Python (ml-env)" as the kernel. This ensures the notebook uses your project's packages, not the system Python.
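A quick way to check that a notebook picked up the right kernel is to print where its interpreter lives. This sketch assumes the project environment sits in a directory such as `.venv`; the helper `running_from` and its arguments are hypothetical names.

```python
import sys
from pathlib import Path


def running_from(env_dir, prefix=None):
    """Return True if the interpreter's prefix is the given environment dir."""
    # sys.prefix is the root directory of the active environment.
    prefix = sys.prefix if prefix is None else prefix
    return Path(prefix).absolute() == Path(env_dir).absolute()


print("Kernel interpreter:", sys.executable)
print("Environment root:  ", sys.prefix)
print("Using ./.venv:     ", running_from(".venv"))
```

If the printed paths point at the system Python instead of your project, switch the kernel from the notebook's kernel menu.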
Useful notebook magic commands:
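# Time a single statement (it is run repeatedly and the mean is reported)
%timeit model.fit(X_train, y_train)
# To time a whole cell, put %%timeit on the cell's first line instead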
# Show plots inline (usually automatic in modern Jupyter)
%matplotlib inline
# Reload external modules without restarting kernel
%load_ext autoreload
%autoreload 2
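Outside a notebook, the standard-library `timeit` module gives roughly what `%timeit` does: it runs a statement many times and lets you report the best timing. The sketch below times a stand-in workload (`sum_of_squares`, an invented function) rather than a real `model.fit` call, so it runs without any ML libraries.

```python
import timeit


def sum_of_squares(n=10_000):
    """Stand-in workload; replace with the call you want to measure."""
    return sum(i * i for i in range(n))


def best_time(func, number=20, repeat=3):
    """Return the best average seconds per call over `repeat` rounds."""
    # Each round calls func `number` times; taking the minimum round
    # reduces noise from other processes on the machine.
    rounds = timeit.repeat(func, number=number, repeat=repeat)
    return min(rounds) / number


print(f"Best of 3 rounds: {best_time(sum_of_squares) * 1e3:.3f} ms per call")
```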
Verifying Your Installation
Run this in a new notebook or Python file. Every import should succeed without errors:
# Run this cell to verify your ML environment is complete
import sys
print(f"Python: {sys.version}")
import numpy as np
print(f"NumPy: {np.__version__}")
import pandas as pd
print(f"Pandas: {pd.__version__}")
import sklearn
print(f"Scikit-learn: {sklearn.__version__}")
import matplotlib
print(f"Matplotlib: {matplotlib.__version__}")
import seaborn as sns
print(f"Seaborn: {sns.__version__}")
print("\nAll libraries imported successfully.")
# Quick sanity check — train a model in 5 lines
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=10, random_state=42)
model.fit(X, y)
print(f"Model trained. Score: {model.score(X, y):.2%}")
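# Example output: Model trained. Score: 100.00%
# (this is accuracy on the training data itself, so expect a value at or near 100%)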
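If any import fails, the fix is almost always to run pip install <library-name> inside the activated environment. Note that the pip name can differ from the import name: sklearn is installed as scikit-learn.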
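Because import names and pip names do not always match, a small lookup can turn a failed import into the right install command. This is a sketch using only the standard library; the `IMPORT_TO_PIP` mapping and the `missing_packages` helper are illustrative names, and the mapping covers only the libraries from this lesson.

```python
import importlib.util

# Import name -> name to pass to pip (they differ for scikit-learn).
IMPORT_TO_PIP = {
    "numpy": "numpy",
    "pandas": "pandas",
    "sklearn": "scikit-learn",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
}


def missing_packages(import_to_pip=IMPORT_TO_PIP):
    """Return pip names for every top-level module that cannot be found."""
    # find_spec returns None for a missing top-level module without importing it.
    return [pip_name for module, pip_name in import_to_pip.items()
            if importlib.util.find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All core libraries are installed.")
```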
VS Code Setup for ML
VS Code is an excellent editor for ML work — it handles both .py scripts and .ipynb notebooks.
Extensions to install:
- Python (Microsoft) — essential, adds IntelliSense, linting, debugging
- Jupyter (Microsoft) — run notebooks directly inside VS Code
- Pylance — faster IntelliSense and type checking
Selecting the right interpreter:
Press Ctrl + Shift + P (Cmd + Shift + P on Mac) → type "Python: Select Interpreter" → choose your virtual environment (it will show the path to .venv/Scripts/python.exe on Windows, .venv/bin/python on Mac/Linux, or the conda env).
VS Code will now use that environment for running code, imports, and autocomplete.
Useful VS Code settings for ML (add to settings.json):
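{
  // On Mac/Linux use ".venv/bin/python" instead
  "python.defaultInterpreterPath": ".venv/Scripts/python",
  "jupyter.notebookFileRoot": "${workspaceFolder}",
  "editor.formatOnSave": true,
  "[python]": {
    // Requires the Black Formatter extension (ms-python.black-formatter)
    "editor.defaultFormatter": "ms-python.black-formatter"
  }
}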
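The interpreter path inside a venv depends on the operating system: the executable sits under `Scripts` on Windows and under `bin` on Mac/Linux. As a rough sketch, the snippet below builds the matching settings fragment; `venv_python_path` is a hypothetical helper, and only the `python.defaultInterpreterPath` key from the settings above is generated.

```python
import json
import os


def venv_python_path(env_dir=".venv", os_name=None):
    """Return the interpreter path inside a venv for the given OS."""
    os_name = os.name if os_name is None else os_name
    # os.name is "nt" on Windows and "posix" on Mac/Linux.
    if os_name == "nt":
        return f"{env_dir}/Scripts/python.exe"
    return f"{env_dir}/bin/python"


settings = {"python.defaultInterpreterPath": venv_python_path()}
print(json.dumps(settings, indent=2))
```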
Google Colab: Zero-Install Alternative
If you cannot install Python locally, or want to run code on a free GPU, use Google Colab at colab.research.google.com.
Colab is a Jupyter notebook environment that runs entirely in your browser, hosted by Google. It comes with numpy, pandas, scikit-learn, matplotlib, and seaborn pre-installed.
# Colab-specific: install extra packages when needed
!pip install lightgbm xgboost
# Mount Google Drive to access your data files
from google.colab import drive
drive.mount('/content/drive')
# Read a CSV from your Drive
import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/data/housing.csv')
Colab limitations to know:
- Sessions disconnect after ~90 minutes of inactivity
- Free GPU access is limited and can be revoked if demand is high
- Files saved to Colab's /content/ directory are lost when the session ends, so always save to Drive
For learning and experimentation, Colab is excellent. For long training runs or production work, a local environment or cloud VM is more reliable.
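If the same notebook has to run both locally and on Colab, one approach is to detect Colab and pick the data directory accordingly. The sketch below assumes the `google.colab` package is importable only on Colab; the `data` folder names mirror the example path above, and `in_colab` and `data_dir` are invented helpers.

```python
from pathlib import Path


def in_colab():
    """Return True when the google.colab package can be imported."""
    try:
        import google.colab  # noqa: F401  (assumed to exist only on Colab)
    except ImportError:
        return False
    return True


def data_dir(on_colab):
    """Pick a data directory: mounted Drive on Colab, ./data locally."""
    if on_colab:
        # Assumes Drive has been mounted at /content/drive.
        return Path("/content/drive/MyDrive/data")
    return Path("data")


print("Running on Colab:", in_colab())
print("Housing CSV path:", data_dir(in_colab()) / "housing.csv")
```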
Requirements File: Reproducibility
Once your environment is working, save it so others (or future you) can recreate it exactly:
pip freeze > requirements.txt
This creates a file like:
numpy==1.25.2
pandas==2.0.3
scikit-learn==1.3.0
matplotlib==3.7.2
seaborn==0.12.2
Anyone can then recreate your exact environment:
python -m venv .venv
# Windows:
.venv\Scripts\activate
# Mac/Linux:
source .venv/bin/activate
pip install -r requirements.txt
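To check whether an existing environment still matches the file, the pinned versions can be compared against what is installed. This is a minimal sketch using the standard library's `importlib.metadata`; `parse_pins` and `find_mismatches` are hypothetical helpers that handle only simple `name==version` lines, not the full requirements syntax.

```python
from importlib import metadata


def parse_pins(text):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed)} where they differ; None if absent."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


example = "numpy==1.25.2\npandas==2.0.3  # data frames\n"
print(find_mismatches(parse_pins(example)))
```

For a real project, replace `example` with the contents of requirements.txt, for instance via `Path("requirements.txt").read_text()`. An empty result means every pinned package is installed at the pinned version.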
Key Takeaway
A clean, reproducible environment is not optional; it's part of professional ML practice. Spend 20 minutes setting this up properly and you'll avoid most "works on my machine" problems later.
Next lesson: We'll put this environment to work with NumPy and Pandas — the two libraries you'll use in every single ML project.