
Build a Research Agent with AutoGPT (Web Search + Summarize)

⚡ Quick Answer

Build an autonomous research agent with AutoGPT that searches the web, extracts key information, and produces structured summaries with configurable output formats.

AiTechWorlds Team May 31, 2026 10 min read

#AutoGPT #research agent #web search #summarization


A research agent that actually works — not one that loops for 40 cycles and produces a garbled paragraph — requires careful goal design, the right search configuration, and a clear output format. This guide builds one end-to-end.

The target: an autonomous agent that takes a research topic, searches the web, reads relevant pages, extracts key findings, and writes a structured report. No human intervention after setup.

For context on what makes autonomous agents different from regular LLM calls, the "AI agents explained" guide is a good primer. To compare this AutoGPT approach with what you can build in LangChain, the "AI research agent build" guide covers the LangChain side.

What You Need Before Starting

AutoGPT installed (version 0.5+)
OpenAI API key (GPT-4 or GPT-4o recommended)
Search API key: SerpAPI, Google Custom Search, or Bing Search API
Python 3.10+
Optional: Docker (for safer execution)

# Install AutoGPT
git clone https://github.com/Significant-Gravitas/AutoGPT
cd AutoGPT/autogpts/autogpt

# Install dependencies
poetry install

# Copy and configure environment
cp .env.template .env

Configuring the Search Plugin

AutoGPT supports multiple search backends. SerpAPI is the most reliable for research tasks:

# .env file — search configuration
OPENAI_API_KEY=sk-your-openai-key

# Search backend — choose one
GOOGLE_API_KEY=your-google-api-key
CUSTOM_SEARCH_ENGINE_ID=your-search-engine-id

# OR use SerpAPI
SERPAPI_API_KEY=your-serpapi-key

# OR use DuckDuckGo (free but rate-limited)
# No key needed: leave the Google and SerpAPI keys unset and
# AutoGPT falls back to DuckDuckGo for web search

# Browser for reading web pages (driven by Selenium)
USE_WEB_BROWSER=chrome
HEADLESS_BROWSER=True

# Model configuration
SMART_LLM_MODEL=gpt-4o
FAST_LLM_MODEL=gpt-4o-mini

# Output and memory
MEMORY_BACKEND=local
WORKSPACE_BACKEND=local
RESTRICT_TO_WORKSPACE=True

# Cost control
CONTINUOUS_LIMIT=25  # max actions per run

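SerpAPI costs roughly $0.005 per search (the exact rate depends on your plan) and returns structured results. DuckDuckGo is free but rate-limited, which can stall long research runs. For serious research tasks, SerpAPI is worth the cost. Variable names also differ between AutoGPT releases, so check each one against the .env.template in your checkout.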
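If you drive search from your own code (as in the LangChain version later in this guide), one way to keep DuckDuckGo rate limits from stalling a run is to retry failed queries with exponential backoff. The sketch below is a generic, hypothetical helper (`with_backoff` is not part of AutoGPT or LangChain), and it assumes the search callable raises an exception when it is throttled, which you should confirm for your backend.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    fn: Callable[[str], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[str], T]:
    """Return a wrapper around fn that retries failed calls with exponential backoff.

    Before retry n (0-based) it waits base_delay * 2**n seconds plus up to
    base_delay seconds of random jitter. After max_attempts failed calls, the
    last exception is re-raised unchanged.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def wrapped(query: str) -> T:
        for attempt in range(max_attempts):
            try:
                return fn(query)
            except Exception:
                if attempt == max_attempts - 1:
                    raise  # out of attempts: surface the original error
                sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
        raise AssertionError("unreachable: the loop always returns or raises")

    return wrapped
```

For example, `Tool(name="web_search", func=with_backoff(DuckDuckGoSearchRun().run), description="Search the web. Input: a search query.")` would give the agent a search tool that retries up to three times before surfacing the error. The injectable `sleep` parameter exists only so tests can skip the real delays.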

Designing the Research Goal

The goal structure makes or breaks the agent. Here is the template:

# research_agent.yaml
ai_name: ResearchAgent
ai_role: >
  An autonomous research assistant that searches the web, reads sources,
  and produces structured reports with citations.

ai_goals:
  - >
    Search for "{TOPIC}" using web search. Find at least 5 credible sources
    published in {YEAR_RANGE}. Prefer: academic papers, official documentation,
    major tech publications (TechCrunch, Wired, MIT Technology Review).
  
  - >
    For each source found, browse the URL and extract:
    (1) Key claim or finding, (2) Supporting evidence or data,
    (3) Publication date, (4) Author or organization.
    
  - >
    Write a structured research report to research_report.md containing:
    (1) Executive summary (3-5 sentences),
    (2) Key findings as bullet points with citations,
    (3) Comparison table if multiple items are being compared,
    (4) Conclusion with 2-3 actionable insights.
    
  - >
    Verify that research_report.md exists and contains at least 500 words
    with citations for at least 3 sources.
    
  - >
    Task is COMPLETE when research_report.md is written and verified.
    Do not search for more sources after the file is written.

api_budget: 3.00

Notice the pattern: each goal is specific, the output format is defined, and there is an explicit termination condition. Together these guard against the most common failure mode: infinite refinement loops.
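The `{TOPIC}` and `{YEAR_RANGE}` placeholders have to be filled in before AutoGPT reads the file. A minimal sketch, assuming you keep the template as plain text and substitute values with a small hypothetical helper (`fill_goal_template` is illustrative, not an AutoGPT feature). It fails loudly if any uppercase placeholder has no value, so the agent never ends up searching for a literal `{TOPIC}`.

```python
import re

# Only {UPPERCASE_NAMES} count as placeholders; other braces are left untouched.
PLACEHOLDER = re.compile(r"\{([A-Z_]+)\}")


def fill_goal_template(template: str, values: dict[str, str]) -> str:
    """Substitute {NAME} placeholders and refuse to return a half-filled template."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"No value supplied for placeholder {{{name}}}")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)
```

You would read the template with `Path("research_agent.yaml").read_text()`, fill it with something like `{"TOPIC": "...", "YEAR_RANGE": "2024-2025"}`, and write the result to whichever settings file your AutoGPT version loads. Because substitution happens in a single pass, a value that itself contains braces is not re-expanded.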

The Research Agent in Action

Here is a concrete run using LangChain's AutoGPT implementation, which gives you more control than the CLI:

# pip install langchain-experimental langchain-openai langchain-community faiss-cpu duckduckgo-search beautifulsoup4
from langchain_experimental.autonomous_agents import AutoGPT
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.tools import Tool
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_community.tools.file_management import WriteFileTool, ReadFileTool
from langchain_community.document_loaders import WebBaseLoader
import json

# Initialize components
llm = ChatOpenAI(model="gpt-4o", temperature=0.2, max_tokens=4000)
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(["initial"], embeddings)

# Search tool
search = DuckDuckGoSearchRun()

# Web reader tool: fetches a page and returns its (truncated) text
def read_webpage(url: str) -> str:
    """Read and extract text content from a URL."""
    try:
        loader = WebBaseLoader(url)
        docs = loader.load()
        if docs:
            # Return first 3000 characters to avoid token overflow
            content = docs[0].page_content[:3000]
            return f"Content from {url}:\n{content}"
        return f"Could not load content from {url}"
    except Exception as e:
        return f"Error reading {url}: {str(e)}"

webpage_tool = Tool(
    name="read_webpage",
    func=read_webpage,
    description="""Read the full content of a webpage. 
    Input: a complete URL starting with http:// or https://
    Output: extracted text content from the page.""",
)

# Citation tracker
citations = []

def add_citation(citation_json: str) -> str:
    """Save a citation. Input: JSON with 'source', 'title', 'finding' keys."""
    try:
        citation = json.loads(citation_json)
    except json.JSONDecodeError:
        citation = None
    # Reject malformed JSON and anything that is not an object with a source
    if not isinstance(citation, dict) or "source" not in citation:
        return "Citation format error. Use JSON: {\"source\": \"url\", \"title\": \"title\", \"finding\": \"key finding\"}"
    citations.append(citation)
    return f"Citation saved. Total citations: {len(citations)}"

citation_tool = Tool(
    name="save_citation",
    func=add_citation,
    description="Save a citation from a source. Use after reading each relevant webpage.",
)

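tools = [
    search,
    webpage_tool,
    citation_tool,
    # Keep file reads and writes inside ./workspace (the folder AutoGen's work_dir uses later)
    WriteFileTool(root_dir="workspace"),
    ReadFileTool(root_dir="workspace"),
]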

# Create the AutoGPT agent
agent = AutoGPT.from_llm_and_tools(
    ai_name="ResearchBot",
    ai_role="""An autonomous research assistant that produces well-cited reports.
    Always search first, then read sources, then write the report.
    Never make up statistics or claims — only use information from sources you have read.""",
    tools=tools,
    llm=llm,
    memory=vectorstore.as_retriever(),
)

agent.chain.verbose = True

# Run the research agent
research_topic = "the impact of vector databases on enterprise AI applications in 2024-2025"

agent.run([
    f"Search for: {research_topic}",
    "Read the top 4-5 most relevant results. Use save_citation for each key finding.",
    "Write a structured research report to vector_db_report.md with: summary, key findings with citations, and conclusion.",
    "Stop after the report is written. Do not continue searching.",
])

print(f"\nCollected {len(citations)} citations")
for i, c in enumerate(citations, 1):
    print(f"{i}. {c.get('title', 'Unknown')} — {c.get('source', '')}")
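The agent often calls save_citation more than once for the same page, so the raw `citations` list can contain duplicates. A small post-processing sketch (both helpers are hypothetical, not part of LangChain) that merges entries sharing a source URL, keeps first-seen order, and renders a Sources section you could append to the report:

```python
def dedupe_citations(citations: list[dict]) -> list[dict]:
    """Merge citations that share a source URL, keeping first-seen order."""
    merged: dict[str, dict] = {}
    for c in citations:
        url = str(c.get("source", "")).strip()
        if not url:
            continue  # skip entries the agent saved without a URL
        entry = merged.setdefault(
            url, {"source": url, "title": c.get("title", "Unknown"), "findings": []}
        )
        finding = c.get("finding")
        if finding and finding not in entry["findings"]:
            entry["findings"].append(finding)
    return list(merged.values())


def render_sources(citations: list[dict]) -> str:
    """Format deduplicated citations as a markdown Sources section."""
    lines = ["## Sources"]
    for entry in dedupe_citations(citations):
        lines.append(f"- {entry['title']}: {entry['source']}")
    return "\n".join(lines)
```

URLs are compared as exact strings, so `https://example.com/a` and `https://example.com/a/` would still count as two sources; normalize them first if that matters for your reports.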

Adding a Summarization Chain

The raw content from web pages is often messy. Adding an explicit summarization step improves report quality:

from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

summarize_prompt = PromptTemplate(
    input_variables=["content", "topic"],
    template="""
You are a research assistant. Summarize the following content in the context of: {topic}

Content:
{content}

Provide a summary with:
1. Main claim or finding (1-2 sentences)
2. Key supporting data or evidence
3. Relevance to the topic

Summary:"""
)

# Prompt -> cheaper model -> plain string (LCEL replaces the deprecated LLMChain)
summarize_chain = (
    summarize_prompt
    | ChatOpenAI(model="gpt-4o-mini", temperature=0)  # cheaper model for summarization
    | StrOutputParser()
)

def summarize_webpage(url_and_topic: str) -> str:
    """
    Summarize a webpage in context of the research topic.
    Input format: 'URL|||RESEARCH_TOPIC'
    """
    parts = url_and_topic.split("|||")
    if len(parts) != 2:
        return "Format error. Use: URL|||research topic"

    url, topic = parts[0].strip(), parts[1].strip()

    # Fetch content (read_webpage returns an error string instead of raising)
    content = read_webpage(url)
    if content.startswith(("Error reading", "Could not load")):
        return content  # pass fetch errors back rather than summarizing them

    # Summarize
    return summarize_chain.invoke({"content": content, "topic": topic})

summarize_tool = Tool(
    name="summarize_webpage",
    func=summarize_webpage,
    description="""Fetch and summarize a webpage in context of your research topic.
    Input format: 'https://url.com|||your research topic'
    Better than read_webpage for extracting relevant information.""",
)

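This two-step pattern (fetch + summarize) uses more tokens overall but produces much cleaner source material for the final report. To let the agent use it, add summarize_tool to the tools list before calling AutoGPT.from_llm_and_tools.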
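read_webpage keeps only the first 3000 characters of a page, so findings further down a long article never reach the summarizer. One variation, which is an assumption rather than part of the setup above, is to split the full `page_content` into overlapping windows, summarize each window with the summarization chain, and join the partial summaries. The chunking step itself is plain Python (`chunk_text` is a hypothetical helper):

```python
def chunk_text(text: str, size: int = 3000, overlap: int = 200) -> list[str]:
    """Split text into windows of at most `size` characters.

    Consecutive windows share `overlap` characters so a sentence cut at one
    boundary still appears whole in the next window.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

Each extra chunk is another model call, so this trades cost for coverage; capping the number of chunks per page keeps a single long page from dominating the budget.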

Output Format Configuration

The structure of the final report is determined by what you put in the goal. Here are templates for different output formats:

Technical Brief Format:

report_format_goal = """
Write research_report.md with exactly this structure:

# [Topic] Research Brief
*Date: [today's date]*

## Executive Summary
[3-5 sentences summarizing the key findings]

## Key Findings

### [Finding 1 Title]
[2-3 sentences with evidence]
*Source: [URL]*

### [Finding 2 Title]
[2-3 sentences with evidence]
*Source: [URL]*

[Continue for all findings]

## Comparison Table

| Aspect | Option A | Option B | Option C |
|--------|----------|----------|----------|
| [row]  | [value]  | [value]  | [value]  |

## Recommendations
1. [Actionable recommendation]
2. [Actionable recommendation]
3. [Actionable recommendation]

## Sources
- [Title] — [URL] — [Accessed date]
"""
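The YAML goal asks the agent to verify the report itself, but an agent can declare success early, so an independent check after the run is cheap insurance. A sketch, assuming the Technical Brief structure above (the helper and its messages are hypothetical; the 500-word and 3-source defaults mirror the verification goal earlier):

```python
import re

# Headings from the Technical Brief template. The comparison table is left out
# because the goal only asks for one when several items are being compared.
REQUIRED_HEADINGS = ("## Executive Summary", "## Key Findings", "## Recommendations", "## Sources")
URL_PATTERN = re.compile(r"https?://[^\s)\]>*|]+")


def check_report(text: str, min_words: int = 500, min_sources: int = 3) -> list[str]:
    """Return human-readable problems with a generated report; [] means it passed."""
    problems = []
    stripped_lines = {line.strip() for line in text.splitlines()}
    for heading in REQUIRED_HEADINGS:
        if heading not in stripped_lines:
            problems.append(f"missing heading: {heading}")

    # Whitespace split is a rough count: markdown symbols such as '##' count as words.
    word_count = len(text.split())
    if word_count < min_words:
        problems.append(f"only {word_count} words (need {min_words})")

    # Distinct URLs, ignoring trailing sentence punctuation.
    urls = {url.rstrip(".,;:") for url in URL_PATTERN.findall(text)}
    if len(urls) < min_sources:
        problems.append(f"only {len(urls)} distinct source URLs (need {min_sources})")
    return problems
```

Run it as `check_report(Path("research_report.md").read_text())` and, if the list is non-empty, start a second run whose goals quote the problems back to the agent.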

JSON Format (for programmatic use):

json_format_goal = """
Save research results to research_results.json with this structure:
{
  "topic": "string",
  "date": "YYYY-MM-DD",
  "summary": "string",
  "findings": [
    {
      "title": "string",
      "content": "string",
      "source_url": "string",
      "relevance_score": 1-5
    }
  ],
  "recommendations": ["string"],
  "sources": [{"title": "string", "url": "string"}]
}
"""
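Because the agent writes this file itself, it is worth validating it before downstream code relies on it. A stdlib-only sketch (the function name and messages are hypothetical) that checks the fields requested above, including that `date` looks like YYYY-MM-DD and `relevance_score` is an integer from 1 to 5:

```python
import json
from datetime import datetime

TOP_LEVEL = {"topic": str, "date": str, "summary": str,
             "findings": list, "recommendations": list, "sources": list}
FINDING_FIELDS = ("title", "content", "source_url")


def validate_research_results(raw: str) -> list[str]:
    """Return a list of schema problems in a research_results.json payload."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]

    problems = []
    for key, expected in TOP_LEVEL.items():
        if not isinstance(data.get(key), expected):
            problems.append(f"'{key}' missing or not a {expected.__name__}")

    if isinstance(data.get("date"), str):
        try:
            datetime.strptime(data["date"], "%Y-%m-%d")
        except ValueError:
            problems.append("'date' is not YYYY-MM-DD")

    findings = data["findings"] if isinstance(data.get("findings"), list) else []
    for i, finding in enumerate(findings):
        if not isinstance(finding, dict):
            problems.append(f"findings[{i}] is not an object")
            continue
        for field in FINDING_FIELDS:
            if not isinstance(finding.get(field), str):
                problems.append(f"findings[{i}].{field} missing or not a string")
        score = finding.get("relevance_score")
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
            problems.append(f"findings[{i}].relevance_score must be an integer 1-5")
    return problems
```

The entries in `sources` are only checked for being in a list; tighten that if your pipeline depends on each source having both a title and a URL.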

Running a Multi-Topic Research Session

For research across multiple related topics, use sequential AutoGPT runs or a loop:

research_topics = [
    "Pinecone vector database pricing and performance 2025",
    "Weaviate open source deployment options 2025",
    "Qdrant vs Pinecone performance benchmarks 2025",
]

results = {}

for topic in research_topics:
    print(f"\nResearching: {topic}")
    
    # Fresh agent for each topic to avoid context contamination
    topic_agent = AutoGPT.from_llm_and_tools(
        ai_name="ResearchBot",
        ai_role="Precise research assistant. Only report verified facts from sources.",
        tools=tools,
        llm=llm,
        memory=FAISS.from_texts(["initial"], embeddings).as_retriever(),
    )
    
    # Lowercase and replace anything non-alphanumeric with "_", capped at 50 chars
    safe_filename = "".join(ch if ch.isalnum() else "_" for ch in topic.lower())[:50]

    topic_agent.run([
        f"Search for: {topic}",
        "Read the top 3 results. Extract key facts.",
        f"Save findings to {safe_filename}.json as a JSON object with 'topic', 'findings', 'sources' keys.",
        "Stop after saving the file.",
    ])
    
    results[topic] = safe_filename

print("\nAll topics researched. Files saved:")
for topic, filename in results.items():
    print(f"  {filename}.json — {topic}")
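Some runs will fail to write their file, or will write invalid JSON, so it helps to merge the per-topic files defensively before synthesis. A sketch with a hypothetical helper that loads each file, skips the ones it cannot use, tags every finding with its topic, and deduplicates sources:

```python
import json
from pathlib import Path


def _as_list(value) -> list:
    """Treat anything that is not a JSON array as empty."""
    return value if isinstance(value, list) else []


def merge_topic_files(paths: list[str]) -> tuple[dict, list[str]]:
    """Merge per-topic research files into one dict; also return paths that could not be used."""
    merged: dict = {"topics": [], "findings": [], "sources": []}
    seen_sources: set[str] = set()
    skipped: list[str] = []
    for path in paths:
        try:
            data = json.loads(Path(path).read_text(encoding="utf-8"))
        except (OSError, ValueError):
            skipped.append(path)  # never written, unreadable, or not valid JSON
            continue
        if not isinstance(data, dict):
            skipped.append(path)
            continue
        topic = data.get("topic", path)
        merged["topics"].append(topic)
        for finding in _as_list(data.get("findings")):
            merged["findings"].append({"topic": topic, "finding": finding})
        for source in _as_list(data.get("sources")):
            key = json.dumps(source, sort_keys=True)  # hashable identity for dicts or strings
            if key not in seen_sources:
                seen_sources.add(key)
                merged["sources"].append(source)
    return merged, skipped
```

For example, `merge_topic_files([f"{name}.json" for name in results.values()])`, with each name prefixed by the directory your file tools write to. A non-empty `skipped` list tells you which topics to re-run before synthesis.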

Combining with a Summarization Agent

Another pattern pairs AutoGPT's web research with separate AutoGen agents for synthesis and editing:

import os
import autogen

# AutoGPT handles raw research and saves to files
# AutoGen handles synthesis and final report generation

config_list = [{"model": "gpt-4o", "api_key": os.environ["OPENAI_API_KEY"]}]
llm_config = {"config_list": config_list}

synthesizer = autogen.AssistantAgent(
    name="Synthesizer",
    system_message="""You synthesize research from multiple files into a cohesive report.
    When given a list of JSON research files, read each one and combine the findings.
    Produce a final report that avoids duplication and highlights consensus findings.""",
    llm_config=llm_config,
)

editor = autogen.AssistantAgent(
    name="Editor",
    system_message="""You edit and polish research reports. 
    Check for: logical flow, repetition, unsupported claims, and readability.
    Return a polished version ready for publication.""",
    llm_config=llm_config,
)

user = autogen.UserProxyAgent(
    name="User",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=5,  # bound the automatic back-and-forth
    code_execution_config={"work_dir": "workspace"},
)

# After AutoGPT research phase: point the synthesizer at the files the loop actually wrote
research_files = ", ".join(f"{name}.json" for name in results.values())
user.initiate_chat(
    synthesizer,
    message=f"Read the research files: {research_files}. Synthesize a comprehensive comparison report.",
)

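This hybrid approach uses each tool for what it does best: AutoGPT for autonomous web research, AutoGen for structured synthesis. Two practical notes: the synthesizer reads the files by writing code that the UserProxyAgent executes in its work_dir, so the research files must live in that directory; and the Editor agent is not part of this two-agent chat, so run it as a second pass over the synthesizer's output or add it to a group chat.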

For more on AutoGen orchestration, the "AutoGen group chat patterns" guide shows how to coordinate multiple synthesis agents. For a LangChain-only take on the same research pattern, "Build AI agent with LangChain" is worth comparing.

Common Failure Modes and Fixes

Agent loops on search: Add to goals — "After finding 5 sources, stop searching and move to writing."

Off-topic research: Add to goals — "Only research [SPECIFIC TOPIC]. Do not follow links about unrelated subjects."

Empty or thin reports: Increase CONTINUOUS_LIMIT and add a goal — "The report must contain at least 600 words and 3 citations."

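Browser errors: Check that the browser named in USE_WEB_BROWSER, and the Selenium driver for it, is installed, or try a different browser. Depending on your AutoGPT version, switching from Selenium to Playwright may also help. On servers without a display, keep HEADLESS_BROWSER=True.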

Token cost overruns: Use FAST_LLM_MODEL=gpt-4o-mini for simple browsing and summarization steps, reserving SMART_LLM_MODEL=gpt-4o for planning and report writing.
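Goal text like "stop after 5 sources" is advisory: the model can ignore it. In the LangChain version you can also enforce a limit in code by wrapping the search function so that, after N queries, it returns an instruction instead of results. A sketch with a hypothetical `SearchBudget` class (not a LangChain feature):

```python
from typing import Callable


class SearchBudget:
    """Wrap a search function so it stops returning results after max_calls."""

    def __init__(self, search_fn: Callable[[str], str], max_calls: int = 5) -> None:
        self.search_fn = search_fn
        self.max_calls = max_calls
        self.calls = 0

    def __call__(self, query: str) -> str:
        if self.calls >= self.max_calls:
            # Returned as a normal tool observation so the agent reads it and moves on.
            return (
                f"Search limit of {self.max_calls} queries reached. "
                "Stop searching and write the report from the sources you already have."
            )
        self.calls += 1
        return self.search_fn(query)
```

You would register it as `Tool(name="web_search", func=SearchBudget(DuckDuckGoSearchRun().run, max_calls=5), description="Search the web. Input: a search query.")` in place of the bare search tool. Returning a message rather than raising keeps the agent loop alive and nudges it toward the writing step.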

Cost Estimation

A typical research run producing a 500-word report with 5 sources costs:

Component	Approx. Cost
5 search queries (SerpAPI)	$0.025
5 web page reads (tokens)	$0.08
Planning and reasoning steps	$0.15
Report generation	$0.06
Total	~$0.32

This scales roughly linearly with the number of sources. A deep research project covering 20 sources with a full 2000-word report typically runs $1.00–$1.50. Set api_budget accordingly.
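The table can be turned into a back-of-envelope estimator. This sketch assumes, as the paragraph above does, that costs scale linearly: search, page reads, and planning with the number of sources, and report generation with report length. The unit costs are back-calculated from the table, not taken from current price lists, and the 1.5 safety margin is an arbitrary choice.

```python
# Per-unit costs back-calculated from the 5-source, 500-word table above (USD).
SEARCH_PER_QUERY = 0.025 / 5
READ_PER_PAGE = 0.08 / 5
PLANNING_PER_SOURCE = 0.15 / 5
REPORT_PER_WORD = 0.06 / 500


def estimate_run_cost(sources: int, report_words: int, safety_margin: float = 1.5) -> dict[str, float]:
    """Linear cost model: one search, one page read, and some planning per source."""
    per_source = SEARCH_PER_QUERY + READ_PER_PAGE + PLANNING_PER_SOURCE
    estimate = sources * per_source + report_words * REPORT_PER_WORD
    return {"estimate": estimate, "suggested_api_budget": estimate * safety_margin}
```

`estimate_run_cost(20, 2000)` comes out at about $1.26, inside the $1.00 to $1.50 range quoted above, and the suggested budget adds headroom for the extra planning steps that real runs tend to take.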

For agents that need to work with existing document collections rather than live web search, the "Vector database guide" explains how to build retrieval systems that let your agent search pre-indexed content instead of browsing the web each time.

Frequently Asked Questions

How does AutoGPT search the web? AutoGPT uses search APIs (Google Custom Search, SerpAPI, or DuckDuckGo) to retrieve search results, then uses a browser command (Selenium or Playwright) to fetch and read the full content of relevant URLs. You configure the search provider in your .env file.

Can AutoGPT summarize PDFs and documents? AutoGPT can summarize web pages directly. For PDFs, it depends on whether the PDF is accessible via URL — it can fetch and extract text from web-hosted PDFs. For local PDFs, you need to combine AutoGPT with a document processing tool or use LangChain's document loaders instead.

How do I prevent the research agent from going off-topic? Use specific, bounded goals with explicit constraints. Specify the exact topics, time ranges, and number of sources. Include a termination condition (e.g., "stop after finding 5 sources"). You can also add a constraint in the goal: "Only research [TOPIC]. Do not follow unrelated links."
