How do I prevent a research agent from hallucinating citations?

Citation hallucination is the most critical failure mode. Prevention strategies: store every source URL with the content retrieved, only allow the agent to cite URLs it actually retrieved (not from training data), use structured output (Pydantic) to enforce URL fields in citations, and verify URLs are present in the search results before including them. Adding a 'grounding check' step that cross-references the final report's claims against the retrieved content catches most hallucinated citations. Never let the model generate URLs it didn't receive from tools.
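
The last rule can also be checked mechanically after generation. The following is a minimal sketch, assuming the report is plain text and the URLs returned by tools are available as a set; `find_unretrieved_urls` and `URL_PATTERN` are hypothetical names used for illustration, not part of the agent built in this article.

```python
import re

# Rough URL matcher: stops at whitespace and common closing delimiters
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")

def find_unretrieved_urls(report: str, retrieved_urls: set[str]) -> list[str]:
    """Return URLs that appear in the report but were never returned by a tool."""
    def norm(url: str) -> str:
        # Light normalization: drop trailing punctuation and trailing slashes
        return url.rstrip(".,;:").rstrip("/")

    allowed = {norm(u) for u in retrieved_urls}
    found = [norm(u) for u in URL_PATTERN.findall(report)]
    # dict.fromkeys keeps first-seen order while dropping repeats
    return [u for u in dict.fromkeys(found) if u not in allowed]
```

Any URL this returns is a candidate hallucination and can be stripped or sent back to the model for correction. The normalization is deliberately simple, so URLs that differ only in query parameters or fragments will still be flagged.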

What is the difference between a research agent and a RAG system?

A RAG system retrieves from a static, pre-indexed knowledge base. A research agent dynamically searches the web (or other sources) to gather information at query time. RAG is faster and cheaper for known domains; research agents are better for current events, novel topics, or when you don't know what to index in advance. Many production systems combine both: the agent searches the web for new information and stores results in a vector database that gets reused for follow-up queries, creating a hybrid that improves over time.

How many search iterations should a research agent do?

For most research tasks, 3-5 search iterations with 2-3 queries each gives good coverage without excessive cost. The agent in this article reaches a similar query count with fewer, wider rounds: 3-7 planned queries, then at most two follow-up rounds of 2-3 queries. A single broad search misses nuance; too many searches add noise and cost. A practical structure: one planning phase (generate diverse queries), one execution phase (run queries in parallel), one gap analysis (what's missing?), and one optional follow-up search for gaps. For a 10-page report, 10-15 total web pages retrieved and synthesized is usually sufficient. More than 20 sources rarely improves quality and significantly increases cost.

How much does running a research agent cost?

A typical research task (5-10 queries, 10-15 pages retrieved, one 2000-word report) costs approximately: Tavily API: $0.01-0.05 per search ($0.05-0.25 for five searches, up to $0.50 for ten); GPT-4o-mini for planning/analysis: ~$0.01-0.03; GPT-4o for final synthesis: ~$0.05-0.15. Total: roughly $0.10-0.45 per research task at five searches. Using gpt-4o-mini for all but the final synthesis step reduces costs by about 70%. Caching repeated searches and storing results reduces costs further for similar follow-up queries.
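
The arithmetic behind that range is easy to reproduce. The sketch below simply adds up the per-item figures quoted above; `estimate_cost` is a hypothetical helper, and the prices are the article's rough estimates rather than current vendor pricing.

```python
def estimate_cost(num_searches: int) -> tuple[float, float]:
    """Return a (low, high) USD estimate for one research task."""
    search_low, search_high = 0.01, 0.05        # per Tavily search
    planning_low, planning_high = 0.01, 0.03    # GPT-4o-mini, whole task
    synthesis_low, synthesis_high = 0.05, 0.15  # GPT-4o, whole task
    low = num_searches * search_low + planning_low + synthesis_low
    high = num_searches * search_high + planning_high + synthesis_high
    return round(low, 2), round(high, 2)

print(estimate_cost(5))   # (0.11, 0.43)
print(estimate_cost(10))  # (0.16, 0.68)
```

At five searches the result matches the quoted total; at ten searches the upper bound is noticeably higher, so search volume is the first thing to cap when controlling spend.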

Build a Research Agent: End-to-End Autonomous Research Tool in Python

⚡ Quick Answer

Build a complete AI research agent in Python — web search, source validation, synthesis, and report generation. Production patterns with LangGraph and real code.

AiTechWorlds Team May 27, 2026 10 min read

I needed to research "the current state of edge AI inference" for a technical report. My options: spend 4 hours reading papers and articles myself, or spend an afternoon building a research agent that could do it in 3 minutes.

I built the agent. Three months later, it's handled hundreds of research tasks, and the version I'll show you is cleaner than my first attempt — which hallucinated citations, looped endlessly on unclear queries, and produced reports that mixed facts from 2022 with facts from 2025 without distinguishing them.

This is the version that actually works: a research agent with real search, real source tracking, real synthesis, and no hallucinated citations.

Architecture Overview

Research Agent Architecture:

User Query
    ↓
[Planning Node]
  → Generate diverse search queries
  → Identify research dimensions
    ↓
[Search Execution Node]
  → Run the planned queries
  → Extract content from URLs
  → Deduplicate by URL and append to state
    ↓
[Gap Analysis Node]
  → What's covered?
  → What's missing?
  → Generate follow-up queries (if needed)
    ↓
[Synthesis Node]
  → Analyze all retrieved content
  → Extract key findings
  → Identify conflicts/contradictions
    ↓
[Report Generation Node]
  → Write structured report
  → Cite only retrieved sources
  → Format with sections
    ↓
Final Report with Citations

This five-node structure targets the main failure modes: unbounded search loops (the gap-analysis loop is capped by max_iterations), citation hallucination (the report prompt lists only retrieved URLs as citable), and shallow coverage (planning generates diverse queries upfront).

Setup and Dependencies

pip install langchain langchain-openai langgraph tavily-python
pip install beautifulsoup4 requests pydantic tiktoken

import os
from typing import TypedDict, Annotated, List, Optional
import operator

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage
from langgraph.graph import StateGraph, END
from pydantic import BaseModel, Field
from tavily import TavilyClient
import requests
from bs4 import BeautifulSoup
import tiktoken

# Models
llm_fast = ChatOpenAI(model="gpt-4o-mini", temperature=0)
llm_smart = ChatOpenAI(model="gpt-4o", temperature=0.1)

# Search client
tavily = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

# Token counter for context management
enc = tiktoken.encoding_for_model("gpt-4o")

State Definition

The state machine needs to track everything from initial query through final report:

class Source(BaseModel):
    url: str
    title: str
    content: str
    search_query: str
    relevance_score: float = 0.0

class ResearchState(TypedDict):
    # Input
    query: str
    research_depth: str  # "quick", "standard", "deep"
    
    # Planning
    search_queries: List[str]
    research_dimensions: List[str]
    
    # Execution
    sources: Annotated[List[Source], operator.add]
    search_iterations: int
    
    # Analysis
    key_findings: str
    coverage_gaps: List[str]
    needs_more_research: bool
    
    # Output
    report: str
    citations: List[str]
    
    # Control
    max_iterations: int
    error: Optional[str]
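
The `Annotated[List[Source], operator.add]` annotation on `sources` is what lets each search iteration add to the list instead of overwriting it: LangGraph treats the annotation's second argument as a reducer that combines the existing value with a node's update. The sketch below mimics that merge rule in plain Python so the behavior is easy to see. `DemoState` and `apply_update` are hypothetical names, and this is a simplified model, not LangGraph's implementation.

```python
import operator
from typing import Annotated, List, TypedDict, get_type_hints

class DemoState(TypedDict):
    sources: Annotated[List[str], operator.add]  # reducer: concatenate lists
    search_iterations: int                       # no reducer: last write wins

def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into state, honoring Annotated reducers."""
    hints = get_type_hints(DemoState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints[key], "__metadata__", ())
        reducer = metadata[0] if metadata else None
        merged[key] = reducer(state[key], value) if reducer else value
    return merged

state = {"sources": ["https://example.com/a"], "search_iterations": 1}
update = {"sources": ["https://example.com/b"], "search_iterations": 2}
print(apply_update(state, update))
# {'sources': ['https://example.com/a', 'https://example.com/b'], 'search_iterations': 2}
```

This is why a node should return only its new sources: returning the full list would append it to itself and duplicate every entry.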

Node 1: Research Planning

The planning node generates diverse, specific queries rather than one broad search:

class ResearchPlan(BaseModel):
    search_queries: List[str] = Field(
        description="3-7 specific search queries (the count depends on research depth) that cover different angles of the topic"
    )
    research_dimensions: List[str] = Field(
        description="Key dimensions to cover: technical, historical, practical, comparative, etc."
    )
    
structured_planner = llm_fast.with_structured_output(ResearchPlan)

def planning_node(state: ResearchState) -> ResearchState:
    depth_instructions = {
        "quick": "Generate 3 focused queries for a quick overview.",
        "standard": "Generate 5 queries covering technical details, examples, and comparisons.",
        "deep": "Generate 7 queries including edge cases, criticisms, and recent developments."
    }
    
    depth_note = depth_instructions.get(state["research_depth"], depth_instructions["standard"])
    
    plan = structured_planner.invoke([
        SystemMessage(content=f"""You are a research planning expert. 
        {depth_note}
        Each query should target a different aspect of the topic.
        Make queries specific enough to return useful results (not just the topic name).
        Include queries for: current state, comparisons, practical examples, limitations."""),
        HumanMessage(content=f"Create a research plan for: {state['query']}")
    ])
    
    return {
        "search_queries": plan.search_queries,
        "research_dimensions": plan.research_dimensions,
        "search_iterations": 0,
        "sources": [],
        "needs_more_research": False,
        "coverage_gaps": []
    }
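
Structured output guarantees that `search_queries` is a list of strings, but the requested count (3, 5, or 7) lives only in the prompt, so the model may return extra or near-identical queries. A small post-processing step can enforce the budget. This is a sketch under that assumption; `normalize_queries` is a hypothetical helper that the planning node above does not call.

```python
def normalize_queries(queries: list[str], limit: int) -> list[str]:
    """Trim, de-duplicate (case-insensitively), and cap a list of search queries."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for q in queries:
        text = " ".join(q.split())  # collapse runs of whitespace
        key = text.lower()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned[:limit]
```

It could be applied to `plan.search_queries` before returning from the planning node, with `limit` set from the chosen research depth.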

Node 2: Search and Content Extraction

This node executes searches and extracts actual content from pages — not just snippets:

def extract_page_content(url: str, max_tokens: int = 1500) -> str:
    """Extract and clean content from a URL."""
    try:
        response = requests.get(url, timeout=10, headers={
            "User-Agent": "Mozilla/5.0 (compatible; ResearchBot/1.0)"
        })
        # Treat 4xx/5xx responses as failures instead of parsing an error page
        response.raise_for_status()
        
        soup = BeautifulSoup(response.text, "html.parser")
        
        # Remove navigation, ads, scripts
        for tag in soup(["script", "style", "nav", "footer", "header", "aside"]):
            tag.decompose()
        
        # Extract main content
        main = soup.find("main") or soup.find("article") or soup.find("body")
        text = main.get_text(separator="\n", strip=True) if main else ""
        
        # Truncate to token limit
        tokens = enc.encode(text)
        if len(tokens) > max_tokens:
            text = enc.decode(tokens[:max_tokens])
        
        return text
        
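    except Exception:
        # Return an empty string so the caller's length check skips this page.
        # An error message here could be stored and later cited as page content.
        return ""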

def search_execution_node(state: ResearchState) -> ResearchState:
    new_sources = []
    
    for query in state["search_queries"]:
        try:
            # Tavily returns structured results with content
            results = tavily.search(
                query=query,
                max_results=3,
                search_depth="advanced",  # More thorough than basic
                include_raw_content=True
            )
            
            for result in results["results"]:
                # Use Tavily's extracted content or fall back to scraping
                content = result.get("raw_content") or extract_page_content(result["url"])
                
                if len(content) < 100:
                    continue
                
                source = Source(
                    url=result["url"],
                    title=result.get("title", ""),
                    content=content[:2000],  # Cap per-source content
                    search_query=query,
                    relevance_score=result.get("score", 0.0)
                )
                new_sources.append(source)
                
        except Exception as e:
            print(f"Search failed for '{query}': {e}")
    
    # Deduplicate by URL, against earlier iterations and within this batch
    seen_urls = {s.url for s in state["sources"]}
    unique_sources = []
    for s in new_sources:
        if s.url not in seen_urls:
            seen_urls.add(s.url)
            unique_sources.append(s)
    
    print(f"Found {len(unique_sources)} new sources (iteration {state['search_iterations'] + 1})")
    
    return {
        "sources": unique_sources,
        "search_iterations": state["search_iterations"] + 1
    }
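
The node above issues its searches one after another. Because each search is an independent network call, the queries can be run concurrently with a thread pool. The following is a minimal sketch, assuming the search function is safe to call from several threads and that your API plan tolerates the burst; `run_queries_parallel` and its `search_fn` parameter are hypothetical names, not part of the Tavily client.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable

def run_queries_parallel(
    queries: list[str],
    search_fn: Callable[[str], dict],
    max_workers: int = 4,
) -> dict[str, dict]:
    """Run search_fn for each query concurrently; failed queries are skipped."""
    results: dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(search_fn, q): q for q in queries}
        for future in as_completed(futures):
            query = futures[future]
            try:
                results[query] = future.result()
            except Exception as exc:  # one bad query should not abort the batch
                print(f"Search failed for '{query}': {exc}")
    # Return results in the original query order so downstream numbering is stable
    return {q: results[q] for q in queries if q in results}
```

Inside the search node, `search_fn` could be something like `lambda q: tavily.search(query=q, max_results=3)`. Keep `max_workers` small, since concurrency trades latency for a higher chance of hitting rate limits.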

Node 3: Gap Analysis

After searching, the agent evaluates coverage and decides whether to search more:

class CoverageAnalysis(BaseModel):
    covered_dimensions: List[str]
    missing_dimensions: List[str]
    follow_up_queries: List[str] = Field(
        description="2-3 specific queries to fill the most important gaps"
    )
    needs_more_research: bool

structured_gap_analyzer = llm_fast.with_structured_output(CoverageAnalysis)

def gap_analysis_node(state: ResearchState) -> ResearchState:
    # Don't search more than max_iterations
    if state["search_iterations"] >= state["max_iterations"]:
        return {
            "needs_more_research": False,
            "coverage_gaps": []
        }
    
    # Summarize what we have
    source_summary = "\n".join([
        f"- [{s.title}]({s.url}): {s.content[:200]}..."
        for s in state["sources"][:10]
    ])
    
    analysis = structured_gap_analyzer.invoke([
        SystemMessage(content="""Analyze research coverage. 
        Identify what dimensions are well-covered and what's missing.
        Only request more research if there are significant gaps that matter for the query.
        Be conservative — 10+ sources is usually sufficient."""),
        HumanMessage(content=f"""Query: {state['query']}
        
Intended dimensions: {', '.join(state['research_dimensions'])}
        
Sources retrieved ({len(state['sources'])} total):
{source_summary}

Is more research needed?""")
    ])
    
    # Update queries if follow-up needed
    if analysis.needs_more_research and analysis.follow_up_queries:
        return {
            "search_queries": analysis.follow_up_queries,
            "coverage_gaps": analysis.missing_dimensions,
            "needs_more_research": True
        }
    
    return {
        "needs_more_research": False,
        "coverage_gaps": analysis.missing_dimensions
    }

def should_search_more(state: ResearchState) -> str:
    if state["needs_more_research"] and state["search_iterations"] < state["max_iterations"]:
        return "search_more"
    return "synthesize"
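
The termination guarantee is worth checking in isolation: even if the analyzer asked for more research every time, the loop stops once `search_iterations` reaches `max_iterations`. The sketch below simulates the search and gap-analysis loop with a boolean standing in for the LLM's verdict; `count_search_rounds` is a hypothetical helper written only for this check.

```python
def count_search_rounds(max_iterations: int, analyzer_wants_more: bool = True) -> int:
    """Simulate the search -> gap_analysis loop and return how many searches ran."""
    state = {
        "search_iterations": 0,
        "max_iterations": max_iterations,
        "needs_more_research": False,
    }
    while True:
        # Search node: one more round
        state["search_iterations"] += 1
        # Gap analysis node: hard stop once the budget is spent
        if state["search_iterations"] >= state["max_iterations"]:
            state["needs_more_research"] = False
        else:
            state["needs_more_research"] = analyzer_wants_more
        # Router: same condition as should_search_more
        if not (state["needs_more_research"]
                and state["search_iterations"] < state["max_iterations"]):
            return state["search_iterations"]

print(count_search_rounds(3))         # 3
print(count_search_rounds(3, False))  # 1
```

The first search always runs, so the number of rounds is between 1 and `max_iterations` regardless of what the model returns.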

Node 4: Synthesis

This node analyzes all retrieved content to extract structured findings:

def synthesis_node(state: ResearchState) -> ResearchState:
    # Number sources once, in state order, so "Source N" here lines up with
    # citation [N] in the report node
    numbered = list(enumerate(state["sources"], start=1))

    def build_context(items):
        return "\n---\n".join(
            f"Source {n}: {s.title}\nURL: {s.url}\n\n{s.content}\n"
            for n, s in items
        )

    # Stay within a 50k-token budget (gpt-4o's context window is 128k tokens)
    full_context = build_context(numbered)
    if len(enc.encode(full_context)) > 50000:
        # Keep the 15 most relevant sources, but preserve their original numbers
        top = sorted(numbered, key=lambda item: item[1].relevance_score, reverse=True)[:15]
        full_context = build_context(sorted(top, key=lambda item: item[0]))
    
    analysis = llm_smart.invoke([
        SystemMessage(content="""You are an expert research analyst.
        Analyze the provided sources and extract key findings.
        Focus on: main themes, conflicting information, data points, expert opinions.
        Note any outdated information (check dates where visible).
        Be specific — include numbers, names, and facts from the sources."""),
        HumanMessage(content=f"""Research query: {state['query']}

Sources:
{full_context}

Provide a detailed analysis of key findings across all sources.""")
    ])
    
    return {"key_findings": analysis.content}

Node 5: Report Generation

The final node writes the report, strictly citing only retrieved sources:

def report_generation_node(state: ResearchState) -> ResearchState:
    # Build citation index
    citation_map = {
        i + 1: source
        for i, source in enumerate(state["sources"])
    }
    
    citation_list = "\n".join([
        f"[{i}] {source.title} — {source.url}"
        for i, source in citation_map.items()
    ])
    
    report = llm_smart.invoke([
        SystemMessage(content=f"""You are a research report writer.
Write a comprehensive, well-structured research report.

CRITICAL CITATION RULES:
1. Only cite sources from the provided citation list below
2. Use [N] format for inline citations
3. Never cite URLs not in this list
4. If information isn't from a source, don't add a citation

Report structure:
- Executive Summary (2-3 sentences)
- Key Findings (3-5 bullet points with citations)
- Detailed Analysis (3-4 H2 sections, ~200 words each, with citations)
- Limitations and Gaps
- Sources

Available citations:
{citation_list}"""),
        HumanMessage(content=f"""Query: {state['query']}

Research findings:
{state['key_findings']}

Write the full research report now.""")
    ])
    
    citations = [
        f"[{i}] {source.title} — {source.url}"
        for i, source in citation_map.items()
    ]
    
    return {
        "report": report.content,
        "citations": citations
    }
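
The citation rules above are enforced only by the prompt, so it is worth verifying the output before trusting it. A minimal sketch of such a check follows, assuming citations appear as single bracketed integers like `[3]`; `check_citations` is a hypothetical helper, and grouped markers such as `[1, 2]` would need a broader pattern.

```python
import re

def check_citations(report: str, num_sources: int) -> dict:
    """Classify [N] markers in a report as valid, out of range, or unused."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", report)}
    valid = sorted(n for n in cited if 1 <= n <= num_sources)
    invalid = sorted(n for n in cited if not 1 <= n <= num_sources)
    uncited = sorted(set(range(1, num_sources + 1)) - cited)
    return {"valid": valid, "invalid": invalid, "uncited": uncited}

sample = "Edge NPUs are improving [1][3]. Latency claims vary [7]."
print(check_citations(sample, num_sources=3))
# {'valid': [1, 3], 'invalid': [7], 'uncited': [2]}
```

A non-empty `invalid` list means the model cited a source number that was never provided, which is a reasonable trigger for regenerating the report. If the model reproduces the whole source list in its Sources section, every index will count as cited, so strip that section first when the `uncited` figure matters.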

Assembling the LangGraph Workflow

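def build_research_agent(max_iterations: int = 2):
    # Note: max_iterations is not used while building the graph. The limit that
    # takes effect is state["max_iterations"], which research() sets per run.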
    workflow = StateGraph(ResearchState)
    
    workflow.add_node("planning", planning_node)
    workflow.add_node("search", search_execution_node)
    workflow.add_node("gap_analysis", gap_analysis_node)
    workflow.add_node("synthesis", synthesis_node)
    workflow.add_node("report", report_generation_node)
    
    workflow.set_entry_point("planning")
    workflow.add_edge("planning", "search")
    workflow.add_edge("search", "gap_analysis")
    
    workflow.add_conditional_edges(
        "gap_analysis",
        should_search_more,
        {
            "search_more": "search",
            "synthesize": "synthesis"
        }
    )
    
    workflow.add_edge("synthesis", "report")
    workflow.add_edge("report", END)
    
    return workflow.compile()

agent = build_research_agent(max_iterations=2)

def research(query: str, depth: str = "standard") -> dict:
    """Run the research agent."""
    depth_to_iterations = {"quick": 1, "standard": 2, "deep": 3}
    
    initial_state = {
        "query": query,
        "research_depth": depth,
        "search_queries": [],
        "research_dimensions": [],
        "sources": [],
        "search_iterations": 0,
        "key_findings": "",
        "coverage_gaps": [],
        "needs_more_research": False,
        "report": "",
        "citations": [],
        "max_iterations": depth_to_iterations.get(depth, 2),
        "error": None
    }
    
    result = agent.invoke(initial_state)
    
    return {
        "report": result["report"],
        "sources_used": len(result["sources"]),
        "citations": result["citations"],
        "coverage_gaps": result["coverage_gaps"]
    }

# Run it
result = research(
    "What are the current limitations of AI coding agents in 2025?",
    depth="standard"
)

print(result["report"])
print(f"\n{result['sources_used']} sources used")

Adding Streaming Progress Updates

For production use, stream progress updates to the user:

def research_with_streaming(query: str, depth: str = "standard"):
    """Stream research progress."""
    
    initial_state = {
        "query": query,
        "research_depth": depth,
        "search_queries": [],
        "research_dimensions": [],
        "sources": [],
        "search_iterations": 0,
        "key_findings": "",
        "coverage_gaps": [],
        "needs_more_research": False,
        "report": "",
        "citations": [],
        # Same depth-to-iterations mapping as research(), instead of a fixed 2
        "max_iterations": {"quick": 1, "standard": 2, "deep": 3}.get(depth, 2),
        "error": None
    }
    
    for event in agent.stream(initial_state, stream_mode="updates"):
        for node, updates in event.items():
            if node == "planning":
                queries = updates.get("search_queries", [])
                print(f"Planning complete: {len(queries)} queries generated")
                for q in queries:
                    print(f"  → {q}")
            
            elif node == "search":
                new_sources = updates.get("sources", [])
                iteration = updates.get("search_iterations", 0)
                print(f"\nSearch iteration {iteration}: {len(new_sources)} new sources")
            
            elif node == "gap_analysis":
                gaps = updates.get("coverage_gaps", [])
                needs_more = updates.get("needs_more_research", False)
                if needs_more:
                    print(f"Gaps found: {gaps}. Searching more...")
                else:
                    print("Coverage sufficient. Moving to synthesis.")
            
            elif node == "synthesis":
                print("\nSynthesizing findings...")
            
            elif node == "report":
                report = updates.get("report", "")
                print(f"\nReport complete ({len(report)} characters)")
                print("\n" + "="*60)
                print(report[:500] + "...")  # Preview

Production Considerations

Rate Limiting and Cost Control

import time
from functools import wraps

def rate_limited(calls_per_minute: int):
    """Simple rate limiter for API calls."""
    min_interval = 60.0 / calls_per_minute
    last_called = [0.0]
    
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            elapsed = time.time() - last_called[0]
            if elapsed < min_interval:
                time.sleep(min_interval - elapsed)
            result = func(*args, **kwargs)
            last_called[0] = time.time()
            return result
        return wrapper
    return decorator

@rate_limited(calls_per_minute=30)
def safe_search(query: str) -> dict:
    return tavily.search(query=query, max_results=3)

Caching Repeated Research

import hashlib
import json
import time
from pathlib import Path

CACHE_DIR = Path("research_cache")
CACHE_DIR.mkdir(exist_ok=True)

def cached_research(query: str, depth: str = "standard", max_age_hours: int = 24) -> dict:
    """Cache research results to avoid duplicate API calls."""
    
    cache_key = hashlib.md5(f"{query}:{depth}".encode()).hexdigest()
    cache_file = CACHE_DIR / f"{cache_key}.json"
    
    # Check cache
    if cache_file.exists():
        cached = json.loads(cache_file.read_text())
        age_hours = (time.time() - cached["timestamp"]) / 3600
        if age_hours < max_age_hours:
            print(f"Cache hit (age: {age_hours:.1f}h)")
            return cached["result"]
    
    # Run research
    result = research(query, depth)
    
    # Save to cache
    cache_file.write_text(json.dumps({
        "query": query,
        "depth": depth,
        "timestamp": time.time(),
        "result": result
    }))
    
    return result
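
The cache key above hashes the raw query string, so trivial variations in case or spacing miss the cache and trigger a full research run. One possible refinement is to normalize the query before hashing. This is a sketch only: `make_cache_key` is a hypothetical helper, and it uses SHA-256 instead of the MD5 call above, which is a substitution rather than a requirement.

```python
import hashlib

def make_cache_key(query: str, depth: str = "standard") -> str:
    """Build a cache key that ignores case and whitespace differences in the query."""
    normalized = " ".join(query.lower().split())
    return hashlib.sha256(f"{normalized}:{depth}".encode("utf-8")).hexdigest()
```

Swapping this in changes every existing key, so cache files written with the old scheme would no longer be found.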

Conclusion

A working research agent requires five things that AutoGPT-style agents lacked: structured planning, bounded search iterations, content extraction (not just snippets), gap analysis with a termination condition, and citation tracking that prevents hallucination. LangGraph's explicit state machine makes each of these constraints enforceable rather than hoped-for.

The cost is roughly $0.10-$0.45 per research task with the configuration above — competitive with paying for a research assistant even at scale.

For the broader agent frameworks that power this pattern, see our LangGraph tutorial. For agent memory systems that let research agents learn across tasks, see our agent memory and planning guide.

Frequently Asked Questions

A practical research agent needs: a web search tool (Tavily, SerpAPI, or DuckDuckGo) for finding current information, a URL reader/scraper to extract content from pages, a vector store for deduplication and memory across searches, and an LLM for synthesis and report generation. Optional additions: citation tracker, source credibility scorer, and a structured output formatter. The most common failure point is poor search quality — using Tavily or Exa over generic Google scraping significantly improves result relevance.

Agent Development

Build a Research Agent: End-to-End Autonomous Research Tool in Python

⚡ Quick Answer

Build a complete AI research agent in Python — web search, source validation, synthesis, and report generation. Production patterns with LangGraph and real code.

AiTechWorlds Team May 27, 2026 10 min read

#ai-research-agent-build #research-agent-python #autonomous-research-agent #agent-development

📚Part of the Agent Development guide — explore all Agent Development articles→

Share:Facebook Twitter/X LinkedIn Telegram WhatsApp

📱

Get more content like this on Telegram!

Daily AI tips, notes & resources — free

Join Free →

Build a Research Agent: End-to-End Autonomous Research Tool in Python

This is the version that actually works: a research agent with real search, real source tracking, real synthesis, and no hallucinated citations.

Architecture Overview

Research Agent Architecture:

User Query
    ↓
[Planning Node]
  → Generate diverse search queries
  → Identify research dimensions
    ↓
[Search Execution Node]
  → Run queries in parallel
  → Extract content from URLs
  → Store in vector memory
    ↓
[Gap Analysis Node]
  → What's covered?
  → What's missing?
  → Generate follow-up queries (if needed)
    ↓
[Synthesis Node]
  → Analyze all retrieved content
  → Extract key findings
  → Identify conflicts/contradictions
    ↓
[Report Generation Node]
  → Write structured report
  → Cite only retrieved sources
  → Format with sections
    ↓
Final Report with Citations

Setup and Dependencies

pip install langchain langchain-openai langgraph tavily-python
pip install beautifulsoup4 requests pydantic tiktoken

import os
from typing import TypedDict, Annotated, List, Optional
import operator

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage
from langgraph.graph import StateGraph, END
from pydantic import BaseModel, Field
from tavily import TavilyClient
import requests
from bs4 import BeautifulSoup
import tiktoken

# Models
llm_fast = ChatOpenAI(model="gpt-4o-mini", temperature=0)
llm_smart = ChatOpenAI(model="gpt-4o", temperature=0.1)

# Search client
tavily = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

# Token counter for context management
enc = tiktoken.encoding_for_model("gpt-4o")

State Definition

The state machine needs to track everything from initial query through final report:

class Source(BaseModel):
    url: str
    title: str
    content: str
    search_query: str
    relevance_score: float = 0.0

class ResearchState(TypedDict):
    # Input
    query: str
    research_depth: str  # "quick", "standard", "deep"
    
    # Planning
    search_queries: List[str]
    research_dimensions: List[str]
    
    # Execution
    sources: Annotated[List[Source], operator.add]
    search_iterations: int
    
    # Analysis
    key_findings: str
    coverage_gaps: List[str]
    needs_more_research: bool
    
    # Output
    report: str
    citations: List[str]
    
    # Control
    max_iterations: int
    error: Optional[str]

Node 1: Research Planning

The planning node generates diverse, specific queries rather than one broad search:

class ResearchPlan(BaseModel):
    search_queries: List[str] = Field(
        description="5-7 specific search queries that cover different angles of the topic"
    )
    research_dimensions: List[str] = Field(
        description="Key dimensions to cover: technical, historical, practical, comparative, etc."
    )
    
structured_planner = llm_fast.with_structured_output(ResearchPlan)

def planning_node(state: ResearchState) -> ResearchState:
    depth_instructions = {
        "quick": "Generate 3 focused queries for a quick overview.",
        "standard": "Generate 5 queries covering technical details, examples, and comparisons.",
        "deep": "Generate 7 queries including edge cases, criticisms, and recent developments."
    }
    
    depth_note = depth_instructions.get(state["research_depth"], depth_instructions["standard"])
    
    plan = structured_planner.invoke([
        SystemMessage(content=f"""You are a research planning expert. 
        {depth_note}
        Each query should target a different aspect of the topic.
        Make queries specific enough to return useful results (not just the topic name).
        Include queries for: current state, comparisons, practical examples, limitations."""),
        HumanMessage(content=f"Create a research plan for: {state['query']}")
    ])
    
    return {
        "search_queries": plan.search_queries,
        "research_dimensions": plan.research_dimensions,
        "search_iterations": 0,
        "sources": [],
        "needs_more_research": False,
        "coverage_gaps": []
    }

Node 2: Search and Content Extraction

This node executes searches and extracts actual content from pages — not just snippets:

def extract_page_content(url: str, max_tokens: int = 1500) -> str:
    """Extract and clean content from a URL."""
    try:
        response = requests.get(url, timeout=10, headers={
            "User-Agent": "Mozilla/5.0 (compatible; ResearchBot/1.0)"
        })
        
        soup = BeautifulSoup(response.text, "html.parser")
        
        # Remove navigation, ads, scripts
        for tag in soup(["script", "style", "nav", "footer", "header", "aside"]):
            tag.decompose()
        
        # Extract main content
        main = soup.find("main") or soup.find("article") or soup.find("body")
        text = main.get_text(separator="\n", strip=True) if main else ""
        
        # Truncate to token limit
        tokens = enc.encode(text)
        if len(tokens) > max_tokens:
            text = enc.decode(tokens[:max_tokens])
        
        return text
        
    except Exception as e:
        return f"[Could not retrieve content: {e}]"

def search_execution_node(state: ResearchState) -> ResearchState:
    new_sources = []
    
    for query in state["search_queries"]:
        try:
            # Tavily returns structured results with content
            results = tavily.search(
                query=query,
                max_results=3,
                search_depth="advanced",  # More thorough than basic
                include_raw_content=True
            )
            
            for result in results["results"]:
                # Use Tavily's extracted content or fall back to scraping
                content = result.get("raw_content") or extract_page_content(result["url"])
                
                if len(content) < 100:
                    continue
                
                source = Source(
                    url=result["url"],
                    title=result.get("title", ""),
                    content=content[:2000],  # Cap per-source content
                    search_query=query,
                    relevance_score=result.get("score", 0.0)
                )
                new_sources.append(source)
                
        except Exception as e:
            print(f"Search failed for '{query}': {e}")
    
    # Deduplicate by URL, both against earlier iterations and within this batch
    # (different queries often return the same page)
    seen_urls = {s.url for s in state["sources"]}
    unique_sources = []
    for s in new_sources:
        if s.url not in seen_urls:
            seen_urls.add(s.url)
            unique_sources.append(s)
    
    print(f"Found {len(unique_sources)} new sources (iteration {state['search_iterations'] + 1})")
    
    return {
        "sources": unique_sources,
        "search_iterations": state["search_iterations"] + 1
    }
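
Exact string comparison still treats `https://example.com/post` and `https://example.com/post/?utm_source=x` as two different sources. The following is a minimal sketch of URL normalization before deduplication, assuming that tracking parameters, fragments, and trailing slashes never change page content. `normalize_url`, `dedupe_urls`, and `TRACKING_PARAMS` are hypothetical names, not part of the tutorial code:

```python
from typing import List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these query parameters never change what the page shows
TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "fbclid", "gclid",
}

def normalize_url(url: str) -> str:
    """Return a canonical form of url for duplicate detection."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()  # host names are case-insensitive
    # Drop the trailing slash, except for the root path
    path = parts.path.rstrip("/") or "/"
    # Remove tracking parameters and sort the rest for stable comparison
    query = urlencode(sorted(
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in TRACKING_PARAMS
    ))
    # Fragments point inside a page, so they are dropped
    return urlunsplit((scheme, netloc, path, query, ""))

def dedupe_urls(urls: List[str]) -> List[str]:
    """Keep the first occurrence of each normalized URL, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

In the search node, the same idea would mean comparing `normalize_url(s.url)` instead of `s.url` while still storing the original URL for citation.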

Node 3: Gap Analysis

After searching, the agent evaluates coverage and decides whether to search more:

class CoverageAnalysis(BaseModel):
    covered_dimensions: List[str]
    missing_dimensions: List[str]
    follow_up_queries: List[str] = Field(
        description="2-3 specific queries to fill the most important gaps"
    )
    needs_more_research: bool

structured_gap_analyzer = llm_fast.with_structured_output(CoverageAnalysis)

def gap_analysis_node(state: ResearchState) -> ResearchState:
    # Don't search more than max_iterations
    if state["search_iterations"] >= state["max_iterations"]:
        return {
            "needs_more_research": False,
            "coverage_gaps": []
        }
    
    # Summarize what we have
    source_summary = "\n".join([
        f"- [{s.title}]({s.url}): {s.content[:200]}..."
        for s in state["sources"][:10]
    ])
    
    analysis = structured_gap_analyzer.invoke([
        SystemMessage(content="""Analyze research coverage. 
        Identify what dimensions are well-covered and what's missing.
        Only request more research if there are significant gaps that matter for the query.
        Be conservative — 10+ sources is usually sufficient."""),
        HumanMessage(content=f"""Query: {state['query']}
        
Intended dimensions: {', '.join(state['research_dimensions'])}
        
Sources retrieved ({len(state['sources'])} total):
{source_summary}

Is more research needed?""")
    ])
    
    # Update queries if follow-up needed
    if analysis.needs_more_research and analysis.follow_up_queries:
        return {
            "search_queries": analysis.follow_up_queries,
            "coverage_gaps": analysis.missing_dimensions,
            "needs_more_research": True
        }
    
    return {
        "needs_more_research": False,
        "coverage_gaps": analysis.missing_dimensions
    }

def should_search_more(state: ResearchState) -> str:
    if state["needs_more_research"] and state["search_iterations"] < state["max_iterations"]:
        return "search_more"
    return "synthesize"
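
Because the router only reads three state keys, the loop bound can be checked with plain dictionaries before any graph is compiled. The sketch below copies `should_search_more` so it runs standalone and simulates a gap analyzer that always asks for more research; `simulate_loop` is a hypothetical helper used only for illustration:

```python
def should_search_more(state: dict) -> str:
    # Same logic as the router above, copied so the example is self-contained
    if state["needs_more_research"] and state["search_iterations"] < state["max_iterations"]:
        return "search_more"
    return "synthesize"

def simulate_loop(max_iterations: int, always_wants_more: bool = True) -> int:
    """Return how many search passes run before the router picks synthesis."""
    state = {
        "search_iterations": 0,
        "max_iterations": max_iterations,
        "needs_more_research": False,
    }
    while True:
        # Search node: one pass completed
        state["search_iterations"] += 1
        # Gap analysis node: stops asking once the cap is reached
        state["needs_more_research"] = (
            always_wants_more
            and state["search_iterations"] < state["max_iterations"]
        )
        if should_search_more(state) == "synthesize":
            return state["search_iterations"]
```

Even with an analyzer that never reports full coverage, the number of search passes equals `max_iterations`, so the graph cannot loop indefinitely.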

Node 4: Synthesis

This node analyzes all retrieved content and extracts the key findings as free-form text:

def synthesis_node(state: ResearchState) -> ResearchState:
    # Number sources once, using the same 1-based index as the citation list
    # in the report node, so "Source N" and "[N]" always refer to the same source
    numbered = list(enumerate(state["sources"], start=1))
    
    def render(pairs):
        return "\n---\n".join(
            f"Source {n}: {s.title}\nURL: {s.url}\n\n{s.content}\n"
            for n, s in pairs
        )
    
    # Keep the prompt under a 50k-token budget, well within gpt-4o's context window
    full_context = render(numbered)
    tokens = enc.encode(full_context)
    if len(tokens) > 50000:
        # Keep the 15 most relevant sources but preserve their original numbers
        top = sorted(numbered, key=lambda p: p[1].relevance_score, reverse=True)[:15]
        full_context = render(sorted(top, key=lambda p: p[0]))
    
    analysis = llm_smart.invoke([
        SystemMessage(content="""You are an expert research analyst.
        Analyze the provided sources and extract key findings.
        Focus on: main themes, conflicting information, data points, expert opinions.
        Note any outdated information (check dates where visible).
        Be specific — include numbers, names, and facts from the sources."""),
        HumanMessage(content=f"""Research query: {state['query']}

Sources:
{full_context}

Provide a detailed analysis of key findings across all sources.""")
    ])
    
    return {"key_findings": analysis.content}
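
A fixed cutoff of 15 sources can either waste part of the budget or still exceed it. One alternative is to pack sources greedily by relevance until the budget is used up. The sketch below uses character counts as a stand-in for tokens (an assumption; a real counter such as `lambda t: len(enc.encode(t))` could be passed as `count`), and `pack_sources` is a hypothetical helper:

```python
from typing import Callable, List, Tuple

def pack_sources(
    sources: List[Tuple[str, float]],   # (text, relevance_score) pairs
    budget: int,
    count: Callable[[str], int] = len,  # stand-in for a token counter
) -> List[int]:
    """Return the 0-based indices of the sources to keep, in original order."""
    # Visit the most relevant sources first
    order = sorted(range(len(sources)), key=lambda i: sources[i][1], reverse=True)
    kept, used = [], 0
    for i in order:
        cost = count(sources[i][0])
        # Skip anything that does not fit; a smaller source may still fit later
        if used + cost <= budget:
            kept.append(i)
            used += cost
    # Restore the original order so source numbers keep matching citations
    return sorted(kept)
```

Returning indices rather than re-numbered copies keeps each source tied to its position in `state["sources"]`, which is what the citation list in the report node is built from.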

Node 5: Report Generation

The final node writes the report and instructs the model to cite only retrieved sources:

def report_generation_node(state: ResearchState) -> ResearchState:
    # Build citation index
    citation_map = {
        i + 1: source
        for i, source in enumerate(state["sources"])
    }
    
    citation_list = "\n".join([
        f"[{i}] {source.title} — {source.url}"
        for i, source in citation_map.items()
    ])
    
    report = llm_smart.invoke([
        SystemMessage(content=f"""You are a research report writer.
Write a comprehensive, well-structured research report.

CRITICAL CITATION RULES:
1. Only cite sources from the provided citation list below
2. Use [N] format for inline citations
3. Never cite URLs not in this list
4. If information isn't from a source, don't add a citation

Report structure:
- Executive Summary (2-3 sentences)
- Key Findings (3-5 bullet points with citations)
- Detailed Analysis (3-4 H2 sections, ~200 words each, with citations)
- Limitations and Gaps
- Sources

Available citations:
{citation_list}"""),
        HumanMessage(content=f"""Query: {state['query']}

Research findings:
{state['key_findings']}

Write the full research report now.""")
    ])
    
    citations = [
        f"[{i}] {source.title} — {source.url}"
        for i, source in citation_map.items()
    ]
    
    return {
        "report": report.content,
        "citations": citations
    }
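
The prompt rules ask the model to cite only listed sources, but nothing in the node verifies that it did. A minimal post-check sketch, assuming inline citations appear as `[N]` with an integer N as the prompt requests; `check_citations` is a hypothetical helper, not part of the tutorial code:

```python
import re
from typing import Dict, List

def check_citations(report: str, num_sources: int) -> Dict[str, List[int]]:
    """Compare [N] markers in the report against the valid range 1..num_sources."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", report)}
    valid = set(range(1, num_sources + 1))
    return {
        "invalid": sorted(cited - valid),  # numbers with no matching source
        "unused": sorted(valid - cited),   # retrieved sources never cited
    }
```

A non-empty `invalid` list signals a citation the model invented, which could trigger a retry or removal of the marker. If the report ends with its own Sources section, that section would need to be stripped before the check, otherwise every listed source counts as cited.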

Assembling the LangGraph Workflow

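def build_research_agent(max_iterations: int = 2):
    # The graph itself does not use max_iterations: the loop cap is read from
    # state["max_iterations"], which research() sets from the depth argument.
    workflow = StateGraph(ResearchState)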
    
    workflow.add_node("planning", planning_node)
    workflow.add_node("search", search_execution_node)
    workflow.add_node("gap_analysis", gap_analysis_node)
    workflow.add_node("synthesis", synthesis_node)
    workflow.add_node("report", report_generation_node)
    
    workflow.set_entry_point("planning")
    workflow.add_edge("planning", "search")
    workflow.add_edge("search", "gap_analysis")
    
    workflow.add_conditional_edges(
        "gap_analysis",
        should_search_more,
        {
            "search_more": "search",
            "synthesize": "synthesis"
        }
    )
    
    workflow.add_edge("synthesis", "report")
    workflow.add_edge("report", END)
    
    return workflow.compile()

agent = build_research_agent(max_iterations=2)

def research(query: str, depth: str = "standard") -> dict:
    """Run the research agent."""
    depth_to_iterations = {"quick": 1, "standard": 2, "deep": 3}
    
    initial_state = {
        "query": query,
        "research_depth": depth,
        "search_queries": [],
        "research_dimensions": [],
        "sources": [],
        "search_iterations": 0,
        "key_findings": "",
        "coverage_gaps": [],
        "needs_more_research": False,
        "report": "",
        "citations": [],
        "max_iterations": depth_to_iterations.get(depth, 2),
        "error": None
    }
    
    result = agent.invoke(initial_state)
    
    return {
        "report": result["report"],
        "sources_used": len(result["sources"]),
        "citations": result["citations"],
        "coverage_gaps": result["coverage_gaps"]
    }

# Run it
result = research(
    "What are the current limitations of AI coding agents in 2025?",
    depth="standard"
)

print(result["report"])
print(f"\n{result['sources_used']} sources used")

Adding Streaming Progress Updates

For production use, stream progress updates to the user:

def research_with_streaming(query: str, depth: str = "standard"):
    """Stream research progress."""
    # Same depth-to-iterations mapping as research(), so depth is honored here too
    depth_to_iterations = {"quick": 1, "standard": 2, "deep": 3}
    
    initial_state = {
        "query": query,
        "research_depth": depth,
        "search_queries": [],
        "research_dimensions": [],
        "sources": [],
        "search_iterations": 0,
        "key_findings": "",
        "coverage_gaps": [],
        "needs_more_research": False,
        "report": "",
        "citations": [],
        "max_iterations": depth_to_iterations.get(depth, 2),
        "error": None
    }
    
    for event in agent.stream(initial_state, stream_mode="updates"):
        for node, updates in event.items():
            if node == "planning":
                queries = updates.get("search_queries", [])
                print(f"Planning complete: {len(queries)} queries generated")
                for q in queries:
                    print(f"  → {q}")
            
            elif node == "search":
                new_sources = updates.get("sources", [])
                iteration = updates.get("search_iterations", 0)
                print(f"\nSearch iteration {iteration}: {len(new_sources)} new sources")
            
            elif node == "gap_analysis":
                gaps = updates.get("coverage_gaps", [])
                needs_more = updates.get("needs_more_research", False)
                if needs_more:
                    print(f"Gaps found: {gaps}. Searching more...")
                else:
                    print("Coverage sufficient. Moving to synthesis.")
            
            elif node == "synthesis":
                print("\nSynthesizing findings...")
            
            elif node == "report":
                report = updates.get("report", "")
                print(f"\nReport complete ({len(report)} characters)")
                print("\n" + "="*60)
                print(report[:500] + "...")  # Preview

Production Considerations

Rate Limiting and Cost Control

import time
from functools import wraps

def rate_limited(calls_per_minute: int):
    """Simple rate limiter for API calls."""
    min_interval = 60.0 / calls_per_minute
    last_called = [0.0]
    
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            elapsed = time.time() - last_called[0]
            if elapsed < min_interval:
                time.sleep(min_interval - elapsed)
            result = func(*args, **kwargs)
            last_called[0] = time.time()
            return result
        return wrapper
    return decorator

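@rate_limited(calls_per_minute=30)
def safe_search(query: str, **kwargs) -> dict:
    """Rate-limited wrapper; use in place of tavily.search() in the search node."""
    kwargs.setdefault("max_results", 3)
    return tavily.search(query=query, **kwargs)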
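
Rate limiting caps request frequency but not spend. A minimal sketch of a per-task budget guard follows; `CostTracker` and `BudgetExceeded` are hypothetical names, and the per-call price is an assumption taken from the upper end of the per-search estimate in the FAQ, not a published rate:

```python
class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spend past the configured cap."""

class CostTracker:
    """Accumulate estimated spend and refuse charges beyond a cap."""

    def __init__(self, max_usd: float):
        self.max_usd = max_usd
        self.spent_usd = 0.0

    def charge(self, label: str, usd: float) -> None:
        # Check before spending so the cap is never exceeded
        if self.spent_usd + usd > self.max_usd:
            raise BudgetExceeded(
                f"{label} would exceed the ${self.max_usd:.2f} cap "
                f"(spent so far: ${self.spent_usd:.2f})"
            )
        self.spent_usd += usd

# Assumed estimate per search call; replace with your provider's actual pricing
SEARCH_COST_USD = 0.05
```

Calling `tracker.charge("search", SEARCH_COST_USD)` before each search lets the node stop early and synthesize with the sources it already has when the cap is hit.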

Caching Repeated Research

import hashlib
import json
import time
from pathlib import Path

CACHE_DIR = Path("research_cache")
CACHE_DIR.mkdir(exist_ok=True)

def cached_research(query: str, depth: str = "standard", max_age_hours: int = 24) -> dict:
    """Cache research results to avoid duplicate API calls."""
    
    cache_key = hashlib.md5(f"{query}:{depth}".encode()).hexdigest()
    cache_file = CACHE_DIR / f"{cache_key}.json"
    
    # Check cache
    if cache_file.exists():
        cached = json.loads(cache_file.read_text())
        age_hours = (time.time() - cached["timestamp"]) / 3600
        if age_hours < max_age_hours:
            print(f"Cache hit (age: {age_hours:.1f}h)")
            return cached["result"]
    
    # Run research
    result = research(query, depth)
    
    # Save to cache
    cache_file.write_text(json.dumps({
        "query": query,
        "depth": depth,
        "timestamp": time.time(),
        "result": result
    }))
    
    return result
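
Queries that differ only in letter case or spacing hash to different cache keys, so near-identical requests miss the cache. A small sketch of key normalization, assuming such differences do not change the intent of a query; `make_cache_key` is a hypothetical helper, and SHA-256 is swapped in for MD5 here only because it is available on every Python build:

```python
import hashlib

def make_cache_key(query: str, depth: str) -> str:
    """Build a cache key that ignores case and extra whitespace in the query."""
    # Lowercase and collapse runs of whitespace into single spaces
    normalized = " ".join(query.lower().split())
    return hashlib.sha256(f"{normalized}:{depth}".encode()).hexdigest()
```

Depth stays part of the key, so a "quick" result is never served for a "deep" request.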

Conclusion

With the configuration above, a research task costs roughly $0.10-$0.45 in search and model API fees, which keeps the agent competitive with paying a human research assistant even at scale.

For the broader agent frameworks that power this pattern, see our LangGraph tutorial. For agent memory systems that let research agents learn across tasks, see our agent memory and planning guide.

Frequently Asked Questions

AiTechWorlds Team
