What is the difference between requests and Playwright for web scraping?

requests + BeautifulSoup scrapes static HTML: whatever is in the page source. It's fast, simple, and sufficient for most scraping tasks. Playwright (and Selenium) drive a real browser, so they can also scrape JavaScript-rendered content, meaning content that scripts generate after the initial HTML arrives. Start with requests/BeautifulSoup (simpler, faster). If the data you want isn't in the page source (right-click → View Page Source), switch to Playwright. JavaScript-heavy single-page apps (React, Vue) usually need it.

How do I scrape websites without getting blocked?

Tactics to avoid getting blocked: send a realistic User-Agent header (mimic a real browser), add delays between requests (1–3 seconds is a reasonable starting point), use rotating proxies for large-scale scraping, handle 429 (rate limit) responses with exponential backoff, maintain session cookies when sites expect them, and respect robots.txt. The simplest and most important: set a realistic User-Agent header and call time.sleep() between requests. Many beginner scrapers get blocked because they send the default python-requests User-Agent and fire requests as fast as the loop can run.

What Python library is best for web scraping?

For beginners: requests + BeautifulSoup. This combination handles most everyday scraping tasks, has excellent documentation, and is easy to learn. For JavaScript-rendered content: Playwright (recommended over Selenium for new projects: better async support, faster, easier API). For large-scale scraping: Scrapy (a full scraping framework with built-in scheduling, pipelines, and rate limiting). A common progression: start with requests + BeautifulSoup → add Playwright for JS sites → consider Scrapy when managing multiple scrapers.


Web Scraping with Python: A Gentle Introduction for Beginners

⚡ Quick Answer

A beginner-friendly Python web scraping guide using requests and BeautifulSoup: extract data from websites, handle pagination, and store results in 2025.

AiTechWorlds Team May 27, 2026 6 min read


I remember the moment web scraping became real for me: I needed a list of every Python package on PyPI with their download counts. The data existed on a webpage. The alternative was copying 4,000 rows manually.

I wrote a scraper in 45 minutes. It ran in 30 seconds. I had my data.

Web scraping is that kind of tool — once you know the fundamentals, it unlocks data that would otherwise require hours of manual copying. This guide covers the fundamentals: making requests, parsing HTML, handling the common challenges, and storing what you find.

What Web Scraping Actually Is

When you visit a website, your browser sends an HTTP request and receives HTML, CSS, and JavaScript in response. Your browser renders that into a visual page.

Web scraping does the same thing programmatically: send a request, receive the HTML, and extract the specific data you want from it.

The basic pipeline:

Use requests to download the HTML page
Use BeautifulSoup to parse the HTML
Find the HTML elements containing your data
Extract the data and clean it
Store it (CSV, database, JSON)
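To make the parse, find, and extract steps concrete without touching the network, here is a minimal sketch that runs them on an inline HTML string. It uses the standard library's html.parser as a stand-in (the rest of this guide uses BeautifulSoup), and SAMPLE_HTML, TitleCollector, and extract_titles are hypothetical names invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical sample page, shaped loosely like a product listing
SAMPLE_HTML = """
<html><body>
  <article class="product_pod"><h3><a title=" Book One " href="one.html">Book...</a></h3></article>
  <article class="product_pod"><h3><a title="Book Two" href="two.html">Book...</a></h3></article>
</body></html>
"""


class TitleCollector(HTMLParser):
    """Collects the title attribute of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attr_map = dict(attrs)
            if "title" in attr_map:
                # Extract and clean: strip surrounding whitespace
                self.titles.append(attr_map["title"].strip())


def extract_titles(html: str) -> list:
    parser = TitleCollector()
    parser.feed(html)  # Parse the HTML and fire handle_starttag per tag
    return parser.titles


print(extract_titles(SAMPLE_HTML))
```

With real pages the download step comes first and the storage step last; both are covered in the sections that follow.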

Setup

pip install requests beautifulsoup4

Step 1: Making Your First Request

import requests

url = "https://books.toscrape.com/"  # A practice website for scraping
response = requests.get(url)

print(response.status_code)  # 200 means success
print(response.text[:500])   # First 500 characters of HTML

Adding Headers (Important)

Some websites block requests without a proper User-Agent header. Always include one:

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
}
response = requests.get(url, headers=headers)

Step 2: Parsing HTML with BeautifulSoup

from bs4 import BeautifulSoup

soup = BeautifulSoup(response.text, "html.parser")

# Find elements
title = soup.find("title")
print(title.text)

# Find all elements of a type
all_links = soup.find_all("a")
print(f"Found {len(all_links)} links")

# Find by CSS class
highlighted_items = soup.find_all("div", class_="highlight")

# Find by ID
header = soup.find("div", id="header")

The Two Most Important Methods

# .find() — returns the first match
first_paragraph = soup.find("p")

# .find_all() — returns ALL matches as a list
all_paragraphs = soup.find_all("p")

# CSS selector approach (often more precise)
# select_one() returns first match, select() returns all
book_titles = soup.select("article.product_pod h3 a")

Step 3: Extracting Data

# Get element text
element = soup.find("h1")
text = element.text          # Raw text with whitespace
text = element.text.strip()  # Cleaned text

# Get element attributes
link = soup.find("a")
href = link["href"]          # Get attribute value
href = link.get("href", "#") # Safe get with default

# Get nested data
product = soup.find("article", class_="product_pod")
if product:
    name = product.find("h3").find("a")["title"]
    price = product.find("p", class_="price_color").text
    rating = product.find("p", class_="star-rating")["class"][1]
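The values extracted above are still strings: a price such as "£51.77" and a rating word such as "Three". A minimal cleaning sketch follows, assuming prices use a dot as the decimal separator and a comma as the thousands separator. RATING_WORDS, parse_price, and parse_rating are hypothetical helpers, not part of any library.

```python
import re

# Assumed mapping from the rating class word to a number
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def parse_price(raw: str) -> float:
    """Return the first decimal number found in a price string."""
    # Drop thousands separators, then grab digits with an optional fraction
    match = re.search(r"\d+(?:\.\d+)?", raw.replace(",", ""))
    if match is None:
        raise ValueError(f"No number found in {raw!r}")
    return float(match.group())


def parse_rating(word: str) -> int:
    """Convert a rating word such as 'Three' into an integer."""
    return RATING_WORDS[word]


print(parse_price("£51.77"), parse_rating("Three"))
```

Converting early keeps later steps simple, for example sorting by price or storing the value in a numeric database column.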

Full Example: Scraping a Book Catalog

Let's scrape book titles, prices, and ratings from books.toscrape.com — a practice website designed for scraping.

import csv
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def scrape_books(base_url: str = "https://books.toscrape.com/") -> list[dict]:
    books = []
    page_url = base_url

    while page_url:
        print(f"Scraping: {page_url}")
        response = requests.get(
            page_url,
            headers={"User-Agent": "Mozilla/5.0 (educational scraper)"},
            timeout=10,
        )
        response.raise_for_status()  # Stop on 4xx/5xx instead of parsing an error page
        # Pass raw bytes so BeautifulSoup detects the encoding itself
        soup = BeautifulSoup(response.content, "html.parser")

        # Extract books from this page
        for article in soup.select("article.product_pod"):
            books.append({
                "title": article.find("h3").find("a")["title"],
                "price": article.find("p", class_="price_color").text.strip(),
                "rating": article.find("p", class_="star-rating")["class"][1],
                "availability": article.find("p", class_="availability").text.strip(),
            })

        # Get the next page URL
        next_btn = soup.select_one("li.next a")
        if next_btn:
            # The href is relative to the current page, so resolve it against page_url
            page_url = urljoin(page_url, next_btn["href"])
        else:
            page_url = None

        time.sleep(1)  # Be polite: wait between requests

    return books

def save_to_csv(books: list[dict], filename: str):
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "price", "rating", "availability"])
        writer.writeheader()
        writer.writerows(books)
    print(f"Saved {len(books)} books to {filename}")

if __name__ == "__main__":
    books = scrape_books()
    save_to_csv(books, "books.csv")
    print(f"Total books scraped: {len(books)}")

Run this and you'll have a CSV of 1,000 books with prices and ratings.
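One pagination detail is worth a closer look: "next" links are often relative to the page they appear on, not to the site root, so gluing the href onto the base URL can produce a wrong address from the second page onward. A minimal sketch of resolving such links with the standard library's urljoin; next_page_url is a hypothetical helper and the hrefs are illustrative values.

```python
from urllib.parse import urljoin


def next_page_url(current_url: str, href: str) -> str:
    """Resolve a possibly relative href against the page it was found on."""
    return urljoin(current_url, href)


# From the home page, the link points into a subdirectory
first = next_page_url("https://books.toscrape.com/", "catalogue/page-2.html")
# From inside that subdirectory, the next link is relative to it
second = next_page_url(first, "page-3.html")

print(first)
print(second)
```

Resolving against the current page URL handles both cases, and it also leaves absolute hrefs unchanged.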

Step 4: Handling Common Challenges

Finding the Right CSS Selectors

Right-click on the data you want in Chrome/Firefox → "Inspect" → find the HTML element. Look for:

The element type (div, span, p, a, li)
The class attribute
The id attribute

The browser DevTools → Console → $$("your.selector") lets you test CSS selectors before writing code.

Handling Pages That Require Scrolling (JavaScript)

If the data isn't in the page source (right-click → View Page Source and search for your data), it's loaded by JavaScript. Use Playwright instead:

from playwright.sync_api import sync_playwright

def scrape_js_page(url: str) -> str:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        page.wait_for_load_state("networkidle")  # Wait for JS to finish
        content = page.content()
        browser.close()
    return content

Install: pip install playwright && playwright install chromium

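Handling Authentication (Login Required)

Sites that require a login usually track it with cookies. A requests.Session stores those cookies and sends them on every later request:

import requests

session = requests.Session()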

# Login
session.post("https://example.com/login", data={
    "username": "your_username",
    "password": "your_password"
})

# Now make authenticated requests
response = session.get("https://example.com/protected-page")

Rate Limiting and Error Handling

import time

import requests

def get_with_retry(url: str, max_retries: int = 3) -> requests.Response:
    for attempt in range(max_retries):
        response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)

        if response.status_code == 200:
            return response

        wait_time = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s
        if response.status_code == 429:  # Too Many Requests
            print(f"Rate limited. Waiting {wait_time}s...")
        else:
            print(f"Error {response.status_code}. Attempt {attempt + 1}/{max_retries}")
        time.sleep(wait_time)  # Wait before every retry, not only after a 429

    raise RuntimeError(f"Failed to fetch {url} after {max_retries} attempts")
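Servers that rate-limit sometimes say how long to wait in a Retry-After response header. The sketch below shows one way to prefer that value and fall back to exponential backoff with a little random jitter. backoff_delay and its base and cap parameters are hypothetical, and only the seconds form of Retry-After is handled; the HTTP-date form falls through to the backoff.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Return how many seconds to wait before retry number `attempt` (0-based)."""
    # Prefer the server's own hint when it is a plain number of seconds
    if retry_after is not None and retry_after.strip().isdigit():
        return min(float(retry_after.strip()), cap)

    # Otherwise double the wait on each attempt: base, 2*base, 4*base, ...
    delay = base * (2 ** attempt)
    # Add up to 10% jitter so parallel scrapers do not retry in lockstep
    jitter = random.uniform(0, delay * 0.1)
    return min(delay + jitter, cap)


print(backoff_delay(0, "5"))
print(backoff_delay(2))
```

With requests, the header can be read as response.headers.get("Retry-After") and passed in as the second argument.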

Storing Scraped Data

CSV (Simple, Universal)

import csv

# data is a list of dicts with the keys "name", "price", and "url"
with open("data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "price", "url"])
    writer.writeheader()
    writer.writerows(data)

SQLite (For Larger Datasets)

import sqlite3

conn = sqlite3.connect("scraped_data.db")
cursor = conn.cursor()

cursor.execute("""
    CREATE TABLE IF NOT EXISTS products (
        id INTEGER PRIMARY KEY,
        name TEXT,
        price REAL,
        url TEXT UNIQUE
    )
""")

for item in data:
    cursor.execute(
        "INSERT OR IGNORE INTO products (name, price, url) VALUES (?, ?, ?)",
        (item["name"], item["price"], item["url"])
    )

conn.commit()
conn.close()
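The pipeline at the start of this guide also lists JSON as a storage option. A minimal sketch using the standard library's json module follows; save_to_json, load_from_json, and the file name are hypothetical choices for illustration.

```python
import json


def save_to_json(records: list, filename: str) -> None:
    """Write a list of dicts to a UTF-8 JSON file."""
    with open(filename, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps characters like £ readable in the file
        json.dump(records, f, ensure_ascii=False, indent=2)


def load_from_json(filename: str) -> list:
    """Read the records back from the JSON file."""
    with open(filename, encoding="utf-8") as f:
        return json.load(f)


sample = [{"name": "Example Book", "price": "£51.77", "url": "https://example.com/1"}]
save_to_json(sample, "scraped_data.json")
print(load_from_json("scraped_data.json"))
```

JSON suits nested or irregular records; for flat rows, CSV or SQLite as shown above is usually easier to query.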

Scraping Ethics and Best Practices

Check robots.txt first: https://example.com/robots.txt — if it says don't scrape, don't
Don't overload servers: Add 1–3 second delays between requests
Identify yourself: A descriptive User-Agent is courteous
Use the API if it exists: Many sites have official APIs that are better than scraping
Scrape public data: Don't scrape data behind login unless you're scraping your own data
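The robots.txt check can be automated with the standard library's urllib.robotparser. The sketch below parses an inline rules file so it runs offline; SAMPLE_ROBOTS, build_parser, and the "my-scraper" agent name are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS)

# Ask whether a given user agent may fetch a given URL
print(parser.can_fetch("my-scraper", "https://example.com/catalogue/page-2.html"))
print(parser.can_fetch("my-scraper", "https://example.com/private/data.html"))

# Crawl-delay, if the site declares one, is a sensible minimum sleep
print(parser.crawl_delay("my-scraper"))
```

Against a live site, calling parser.set_url("https://example.com/robots.txt") followed by parser.read() downloads and parses the file instead of using inline text.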

Further Reading

For a portfolio project that applies these skills, see our guide on Python projects that get developer jobs.

Frequently Asked Questions

Is web scraping legal?

It depends on what you're scraping, how you do it, and the jurisdiction you're in. Generally lower-risk: scraping publicly available information (no login required), respecting robots.txt, and not collecting personal data. Potentially problematic: ignoring Terms of Service that prohibit scraping, commercial use of scraped data, and overwhelming servers with requests. The safest approach: check robots.txt before scraping, add delays between requests, don't scrape personal data, and check whether the site offers an API instead. Many sites (Twitter, Reddit, GitHub) provide official APIs that are the preferred way to access their data.


