How to Build a Multi-Agent System with AutoGen (Group Chat)
Step-by-step AutoGen group chat tutorial: build a researcher, coder, and critic agent system with proper termination logic and real working code.
I spent an afternoon last month trying to get an AutoGen group chat to terminate properly. The agents kept congratulating each other on good work and then starting the task over again. It was both funny and deeply frustrating.
AutoGen's GroupChat feature is genuinely useful once you understand the mechanics — but the documentation leaves some gaps. This tutorial builds a complete three-agent system (Researcher, Coder, Critic) from scratch, with working termination conditions and proper role definitions. Every code snippet here has been tested.
If you're brand new to AutoGen, the AutoGen tutorial covers the basics. This article goes deeper into group chat specifically.
What AutoGen GroupChat Actually Does
Before writing any code, it helps to understand the mechanics.
In a standard two-agent AutoGen setup, you have a human proxy (the initiating agent) and an assistant. They pass messages back and forth. Simple, linear, easy to debug.
GroupChat changes this: multiple agents share a single conversation thread. A GroupChatManager acts as a traffic controller — it looks at the conversation history, applies your configured selection method, and decides which agent should speak next.
The key objects are:
- AssistantAgent: an LLM-backed agent with a role and optional tools
- UserProxyAgent: typically the human-in-the-loop (or the code executor)
- GroupChat: holds the list of agents and configuration
- GroupChatManager: routes messages; itself an LLM agent
The GroupChatManager is easy to forget about, but it's doing important work. It decides the next speaker. You can give it instructions about how to coordinate the team, which affects how well the conversation flows.
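To make the routing loop concrete, here is a minimal pure-Python sketch of the idea with no AutoGen and no LLM involved. The names run_group_chat and select_next, the canned lambda agents, and the fixed hand-off table are all hypothetical stand-ins for illustration; the real GroupChatManager is considerably more involved.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
Agent = Callable[[List[Message]], str]  # reads the shared history, returns a reply


def run_group_chat(
    agents: Dict[str, Agent],
    select_next: Callable[[str, List[Message]], str],
    task: str,
    max_round: int = 10,
) -> List[Message]:
    """Toy routing loop: one shared history, one speaker chosen per round."""
    history: List[Message] = [{"name": "UserProxy", "content": task}]
    last_speaker = "UserProxy"
    for _ in range(max_round):
        speaker = select_next(last_speaker, history)
        reply = agents[speaker](history)
        history.append({"name": speaker, "content": reply})
        last_speaker = speaker
        if "TASK COMPLETE" in reply:
            break
    return history


# Stub agents that return canned text instead of calling an LLM.
agents = {
    "Researcher": lambda h: "Research complete. Ready for implementation.",
    "Coder": lambda h: "Code complete. Ready for review.",
    "Critic": lambda h: "APPROVED. TASK COMPLETE",
}
# Fixed hand-off order standing in for the manager's selection step.
order = {"UserProxy": "Researcher", "Researcher": "Coder", "Coder": "Critic", "Critic": "Coder"}

history = run_group_chat(agents, lambda last, h: order[last], "Build a URL fetcher")
speakers = [m["name"] for m in history]
print(speakers)
```

Every agent sees the same history list, and the only thing the "manager" contributes is the choice of who appends to it next.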
Setup and Dependencies
pip install pyautogen openai python-dotenv
import os
from dotenv import load_dotenv
import autogen
load_dotenv()
# LLM configuration
llm_config = {
    "model": "gpt-4o",
    "api_key": os.environ["OPENAI_API_KEY"],
    "temperature": 0.1,
}
# Slightly higher temperature for creative agents
creative_llm_config = {
    "model": "gpt-4o",
    "api_key": os.environ["OPENAI_API_KEY"],
    "temperature": 0.4,
}
Setting temperature to 0.1 for structured agents (researcher, coder) and slightly higher for the critic gives you more consistent technical outputs with some creative flexibility in feedback.
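If you end up with more than two configurations, one option is to build them from a single helper so the model name and key cannot drift apart. This is only a sketch: make_llm_config is a hypothetical helper, not part of AutoGen, and the key below is a placeholder where the tutorial reads OPENAI_API_KEY from the environment.

```python
from typing import Any, Dict


def make_llm_config(api_key: str, temperature: float, model: str = "gpt-4o") -> Dict[str, Any]:
    """Build one config dict per role so model and key stay in sync."""
    return {"model": model, "api_key": api_key, "temperature": temperature}


# Placeholder key for illustration only.
api_key = "YOUR_API_KEY"
llm_config = make_llm_config(api_key, temperature=0.1)
creative_llm_config = make_llm_config(api_key, temperature=0.4)
print(llm_config["temperature"], creative_llm_config["temperature"])
```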
Defining the Three Agents
The Researcher
researcher = autogen.AssistantAgent(
    name="Researcher",
    system_message="""You are a technical researcher. Your job:
1. Analyze the user's request to identify what information is needed.
2. Provide accurate, well-sourced information relevant to the task.
3. Structure your findings clearly with headers and bullet points.
4. After providing findings, explicitly state: 'Research complete. Ready for implementation.'
Do NOT write code. That is the Coder's job.
Do NOT critique work. That is the Critic's job.
""",
    llm_config=llm_config,
)
The Coder
The Coder writes code but does not run it. In this setup the UserProxyAgent defined below acts as the executor; alternatively, you can give the Coder its own code execution environment.
coder = autogen.AssistantAgent(
    name="Coder",
    system_message="""You are an expert Python developer. Your job:
1. Wait for the Researcher to provide background information.
2. Write clean, working Python code based on the research and requirements.
3. Include docstrings and inline comments.
4. Wrap ALL executable code in ```python code blocks.
5. After writing code, state: 'Code complete. Ready for review.'
Do NOT conduct research. Do NOT provide critique beyond code quality.
""",
    llm_config=llm_config,
)
The Critic
critic = autogen.AssistantAgent(
    name="Critic",
    system_message="""You are a technical reviewer. Your job:
1. Wait for the Coder to produce code.
2. Review the code for: correctness, edge cases, security issues, performance.
3. Provide specific, actionable feedback with line references where possible.
4. If the code is acceptable, state exactly: 'APPROVED. TASK COMPLETE'
5. If revisions are needed, list them clearly and address them to the Coder.
Be direct. Do not pad feedback with compliments.
""",
    llm_config=creative_llm_config,  # higher temperature for the critic
)
The User Proxy (Initiator + Code Executor)
user_proxy = autogen.UserProxyAgent(
    name="UserProxy",
    human_input_mode="NEVER",  # Fully automated
    max_consecutive_auto_reply=1,
    # content can be None, so fall back to an empty string before the check
    is_termination_msg=lambda msg: "TASK COMPLETE" in (msg.get("content") or ""),
    code_execution_config={
        "work_dir": "agent_workspace",
        "use_docker": False,  # Set True in production for safety
    },
    system_message="You initiate tasks and execute code when required.",
)
Two important things here:
- is_termination_msg checks for our termination phrase "TASK COMPLETE"; this is what stops the chat
- code_execution_config gives the UserProxy a working directory to run code in (with use_docker set to False, the code runs directly on the host, not in a sandbox)
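A one-line lambda is easy to get subtly wrong: msg.get("content", "") still returns None when the key exists with a None value, and the `in` check then raises a TypeError. The sketch below is one way to write the predicate as a named function that you can test on its own. The case-insensitive match is a design choice of this sketch, not something AutoGen requires, and TERMINATION_PHRASE is just a local constant.

```python
from typing import Any, Dict

TERMINATION_PHRASE = "TASK COMPLETE"


def is_termination_msg(msg: Dict[str, Any]) -> bool:
    """Return True when the message text contains the termination phrase."""
    content = msg.get("content") or ""  # content may be missing or None
    if not isinstance(content, str):
        return False
    # Compare in upper case so "Task complete" still counts.
    return TERMINATION_PHRASE in content.upper()


print(is_termination_msg({"content": "APPROVED. TASK COMPLETE"}))
print(is_termination_msg({"content": None}))
```

The function can be passed anywhere the tutorial passes the lambda, for example is_termination_msg=is_termination_msg.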
Setting Up the GroupChat
# Create the group chat with all agents
groupchat = autogen.GroupChat(
    agents=[user_proxy, researcher, coder, critic],
    messages=[],
    max_round=20,  # Absolute maximum; prevents infinite loops
    speaker_selection_method="auto",  # Let the manager decide
    allow_repeat_speaker=False,  # Prevents agents from talking twice in a row
)
# Create the manager that coordinates the conversation
manager = autogen.GroupChatManager(
    groupchat=groupchat,
    llm_config=llm_config,
    system_message="""You coordinate a team of three specialists: Researcher, Coder, and Critic.
Typical workflow:
1. Researcher gathers information first.
2. Coder implements based on research.
3. Critic reviews the code.
4. If revisions needed, Coder revises, then Critic re-reviews.
Enforce this order. If an agent speaks out of turn, redirect appropriately.
Stop the conversation when Critic says 'APPROVED. TASK COMPLETE'.
""",
)
The manager's system message matters a lot. I've found that giving the manager an explicit workflow description significantly improves conversation coherence — without it, the selection algorithm sometimes lets the Critic jump in before the Researcher has even started.
Running the System
# Start the group chat with a task
result = user_proxy.initiate_chat(
manager,
message="""Build a Python function that:
1. Takes a list of URLs as input
2. Fetches the content of each URL concurrently using asyncio
3. Returns a dictionary mapping each URL to its response status code and content length
4. Handles errors gracefully (timeouts, connection failures)
Include proper type hints and a usage example.
""",
)
What the Conversation Looks Like
Here's a condensed transcript of what you'd see:
UserProxy → Manager:
"Build a Python function that fetches URLs concurrently..."
Manager → Researcher:
"Researcher, please analyze the requirements and provide relevant background."
Researcher → All:
"For concurrent URL fetching in Python, the best approach is asyncio + aiohttp...
[detailed research on async patterns, error handling, type hints]
Research complete. Ready for implementation."
Manager → Coder:
"Coder, please implement based on the Researcher's findings."
Coder → All:
"```python
import asyncio
import aiohttp
from typing import Dict, Tuple
async def fetch_url(session: aiohttp.ClientSession, url: str) -> Tuple[int, int]:
...
Code complete. Ready for review."
Manager → Critic:
"Critic, please review the Coder's implementation."
Critic → All:
"Review findings:
- Line 15: Missing timeout configuration — could hang indefinitely
- Line 28: Exception catch is too broad, should catch specific exceptions
Revisions needed. Coder please address timeout and exception handling."
Manager → Coder:
"Coder, please address the Critic's feedback."
Coder → All: "```python
# Revised with timeout and specific exception handling
...
Code complete. Ready for review."
Critic → All:
"Revisions look good. Timeout set correctly, exceptions properly scoped.
APPROVED. TASK COMPLETE"
[Chat terminates]
Customizing Speaker Selection
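The speaker_selection_method parameter accepts one of four string values ("auto", "manual", "round_robin", "random") or a custom function. "manual" asks a human to pick each speaker; the three you are most likely to use in an automated pipeline are: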
"auto" — The GroupChatManager's LLM decides who speaks next. Most flexible, uses tokens for each routing decision.
"round_robin" — Agents speak in turn: UserProxy → Researcher → Coder → Critic → UserProxy... Predictable but rigid.
"random" — Random selection. Mostly useful for testing.
For production systems, I prefer "auto" with a well-crafted manager system message, or a custom function:
def custom_speaker_selection(last_speaker, groupchat):
    """Custom speaker selection based on workflow state."""
    messages = groupchat.messages
    if len(messages) == 0:
        return researcher  # Always start with researcher
    # content can be None, so fall back to an empty string
    last_msg = (messages[-1].get("content") or "").lower()
    last_name = last_speaker.name
    # After research is done, go to coder
    if last_name == "Researcher" and "research complete" in last_msg:
        return coder
    # After code is done, go to critic
    if last_name == "Coder" and "code complete" in last_msg:
        return critic
    # After critique, go back to coder if revisions needed
    if last_name == "Critic" and "revisions needed" in last_msg:
        return coder
    # Default: let auto decide
    return "auto"
groupchat = autogen.GroupChat(
    agents=[user_proxy, researcher, coder, critic],
    messages=[],
    max_round=20,
    speaker_selection_method=custom_speaker_selection,
)
This custom function enforces the workflow explicitly. It's more predictable and cheaper (fewer routing LLM calls) but requires you to anticipate all state transitions.
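Because the selection function only reads last_speaker.name and groupchat.messages, you can check its transitions without any LLM calls. The sketch below repeats the same rules against stand-in objects built with SimpleNamespace. The stubs and the chat_with helper are hypothetical test scaffolding, not AutoGen classes.

```python
from types import SimpleNamespace

# Stand-ins for the real agents; only .name is used by the selection logic.
researcher = SimpleNamespace(name="Researcher")
coder = SimpleNamespace(name="Coder")
critic = SimpleNamespace(name="Critic")


def custom_speaker_selection(last_speaker, groupchat):
    """Same transition rules as the tutorial function, runnable in isolation."""
    if not groupchat.messages:
        return researcher
    last_msg = (groupchat.messages[-1].get("content") or "").lower()
    if last_speaker.name == "Researcher" and "research complete" in last_msg:
        return coder
    if last_speaker.name == "Coder" and "code complete" in last_msg:
        return critic
    if last_speaker.name == "Critic" and "revisions needed" in last_msg:
        return coder
    return "auto"


def chat_with(content):
    """Build a fake group chat whose last message has the given content."""
    return SimpleNamespace(messages=[{"content": content}])


# Walk the expected workflow one hand-off at a time.
step1 = custom_speaker_selection(researcher, chat_with("Research complete. Ready for implementation."))
step2 = custom_speaker_selection(coder, chat_with("Code complete. Ready for review."))
step3 = custom_speaker_selection(critic, chat_with("Revisions needed. Coder please fix the timeout."))
step4 = custom_speaker_selection(critic, chat_with("APPROVED. TASK COMPLETE"))
print(step1.name, step2.name, step3.name, step4)
```

Any state you did not anticipate falls through to "auto", so a test like this is a cheap way to find the transitions you forgot.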
Termination Conditions — Getting This Right
Termination is where most tutorials skip important details. Here's what you need:
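1. Phrase-based termination: the is_termination_msg function on UserProxy:
is_termination_msg=lambda msg: "TASK COMPLETE" in (msg.get("content") or "")
The `or ""` guards against messages whose content is None. GroupChatManager is also a ConversableAgent and accepts the same is_termination_msg argument, so you can pass this function to the manager as well; the check then does not depend on which agent receives the final message.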
2. Max round limit — The max_round parameter on GroupChat. Set this to 2-3x your expected conversation length. It's a safety net, not a target.
3. Consecutive reply limit: max_consecutive_auto_reply on UserProxyAgent caps how many automatic replies the proxy sends in a row. Once the cap is reached with human_input_mode="NEVER", the proxy stops replying instead of waiting for a human.
For the termination phrase to work, your agents need to be prompted to use it consistently. Put the exact phrase in the Critic's system prompt: "If the code is acceptable, state exactly: 'APPROVED. TASK COMPLETE'". The word "exactly" matters: agents are less likely to paraphrase if you emphasize precision.
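The "2-3x your expected conversation length" rule can be turned into a small calculation. This sketch assumes every message in the thread, including the initial task, counts toward max_round, and that the workflow is one research message followed by one code-plus-review pair per cycle. suggest_max_round is a hypothetical helper, not an AutoGen function.

```python
import math


def suggest_max_round(expected_revisions: int, safety_factor: float = 2.0) -> int:
    """Estimate max_round from the number of revision cycles you expect."""
    # 1 task message + 1 research message + (code + review) per cycle,
    # where the first pass counts as a cycle too.
    expected_messages = 2 + 2 * (expected_revisions + 1)
    return math.ceil(expected_messages * safety_factor)


print(suggest_max_round(1))       # one revision, 2x margin
print(suggest_max_round(2, 2.5))  # two revisions, 2.5x margin
```

With one revision the expected thread is six messages, which matches the condensed transcript earlier in this article.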
Handling Errors and Retries
Production systems need error handling:
import logging
logging.basicConfig(level=logging.INFO)
try:
    result = user_proxy.initiate_chat(
        manager,
        message="Your task here",
        max_turns=20,
    )
    # Check if we hit max_round without terminating
    if len(groupchat.messages) >= groupchat.max_round:
        logging.warning("GroupChat hit max_round limit without clean termination")
        # Extract the last Coder output as fallback
        for msg in reversed(groupchat.messages):
            # .get() and `or ""` avoid errors on messages without a name or content
            if msg.get("name") == "Coder" and "```python" in (msg.get("content") or ""):
                print("Fallback: using last Coder output")
                print(msg["content"])
                break
except Exception as e:
    logging.error(f"GroupChat failed: {e}")
    raise
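The fallback above prints the whole Coder message, prose included. If you only want the code, a regular expression can pull out the fenced block. This is a sketch that assumes messages are dicts with "name" and "content" keys, as in the loop above, and that the Coder followed its instruction to use python fences. last_code_from is a hypothetical helper.

```python
import re
from typing import Any, Dict, List, Optional

FENCE = "`" * 3  # three backticks, built this way to keep the example readable
CODE_BLOCK = re.compile(FENCE + r"python\s*\n(.*?)" + FENCE, re.DOTALL)


def last_code_from(messages: List[Dict[str, Any]], agent_name: str = "Coder") -> Optional[str]:
    """Return the last python code block written by the given agent, if any."""
    for msg in reversed(messages):
        if msg.get("name") != agent_name:
            continue
        blocks = CODE_BLOCK.findall(msg.get("content") or "")
        if blocks:
            return blocks[-1].strip()
    return None


# Fake history standing in for groupchat.messages.
messages = [
    {"name": "Coder", "content": FENCE + "python\nprint('v1')\n" + FENCE + "\nCode complete."},
    {"name": "Critic", "content": "Revisions needed."},
    {"name": "Coder", "content": FENCE + "python\nprint('v2')\n" + FENCE + "\nCode complete."},
    {"name": "Critic", "content": None},
]
print(last_code_from(messages))
```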
Cost Awareness
This is something the docs don't emphasize enough. Group chats can get expensive fast because:
- Every message appends to the shared context
- The GroupChatManager's routing calls also consume tokens
- If max_round is too high, a looping conversation costs a lot before it terminates
For a 20-round group chat with GPT-4o, budget roughly $0.10–0.40 per run depending on task complexity. Use GPT-4o-mini for the manager and less critical agents if cost is a concern.
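The reason costs climb quickly is that input tokens grow roughly with the square of the number of rounds: each new reply rereads everything before it. The sketch below makes that arithmetic explicit. It rests on three loud assumptions: every message is about the same size, an "auto" routing call rereads the same history as the agent that speaks, and the prices are placeholder numbers, not real pricing. Both function names are hypothetical.

```python
from typing import Dict


def estimate_group_chat_tokens(rounds: int, avg_message_tokens: int, auto_routing: bool = True) -> Dict[str, int]:
    """Rough token totals for a group chat with a growing shared context."""
    # Reply k reads the k messages already in the thread (task + k-1 replies),
    # so total agent input is avg * (1 + 2 + ... + rounds).
    agent_input = avg_message_tokens * rounds * (rounds + 1) // 2
    # Assumption: each routing decision rereads the same history.
    routing_input = agent_input if auto_routing else 0
    output = rounds * avg_message_tokens
    return {"input": agent_input + routing_input, "output": output}


def estimate_cost(tokens: Dict[str, int], input_price_per_million: float, output_price_per_million: float) -> float:
    """Convert token totals to a cost using caller-supplied prices."""
    total = tokens["input"] * input_price_per_million + tokens["output"] * output_price_per_million
    return total / 1_000_000


tokens = estimate_group_chat_tokens(rounds=20, avg_message_tokens=300)
print(tokens)
# Placeholder prices for illustration; look up current pricing for your model.
print(round(estimate_cost(tokens, input_price_per_million=2.0, output_price_per_million=8.0), 2))
```

Under these assumptions, doubling max_round roughly quadruples the input tokens of a conversation that runs to the limit, and switching off LLM routing (round robin or a custom function) removes about half of them.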
For more on agent memory systems that can help manage context costs, see AI agent memory and planning. For a comparison of how different frameworks handle this, multi-agent frameworks comparison 2026 has cost benchmarks.
Complete Working Example
Putting it all together in a single file:
import os
import autogen
from dotenv import load_dotenv
load_dotenv()
llm_config = {"model": "gpt-4o", "api_key": os.environ["OPENAI_API_KEY"]}
researcher = autogen.AssistantAgent(
    name="Researcher",
    system_message="Analyze requirements and provide technical background. End with 'Research complete. Ready for implementation.'",
    llm_config=llm_config,
)
coder = autogen.AssistantAgent(
    name="Coder",
    system_message="Implement Python code based on research. End with 'Code complete. Ready for review.'",
    llm_config=llm_config,
)
critic = autogen.AssistantAgent(
    name="Critic",
    system_message="Review code for correctness and quality. If acceptable, say exactly: 'APPROVED. TASK COMPLETE'",
    llm_config=llm_config,
)
user_proxy = autogen.UserProxyAgent(
    name="UserProxy",
    human_input_mode="NEVER",
    # content can be None, so fall back to an empty string before the check
    is_termination_msg=lambda msg: "TASK COMPLETE" in (msg.get("content") or ""),
    code_execution_config={"work_dir": "workspace", "use_docker": False},
)
groupchat = autogen.GroupChat(
    agents=[user_proxy, researcher, coder, critic],
    messages=[],
    max_round=15,
)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)
user_proxy.initiate_chat(
    manager,
    message="Write a Python class for a rate-limited HTTP client with exponential backoff.",
)
The AutoGen group chat patterns article goes deeper into advanced patterns like nested chats and dynamic agent creation if you want to keep going.
Conclusion
AutoGen's GroupChat is one of the more capable multi-agent orchestration features available right now. The core workflow (define agents with clear roles, configure a manager, set termination conditions) is straightforward once you see it in action. The tricky parts are termination logic and speaker selection, both of which benefit from explicit configuration rather than relying on defaults.
Build the three-agent system from this tutorial first. Run it on a few tasks. Then experiment with adding a fourth agent (like a Tester that runs the code and reports results) or implementing custom speaker selection logic.
The build AI agent with LangChain guide covers an alternative approach using LangChain's agent primitives if you want to compare implementations.
Frequently Asked Questions
What is AutoGen GroupChat? AutoGen GroupChat is a feature that allows multiple agents to participate in a shared conversation thread. A GroupChatManager routes messages between agents, and you can configure selection methods to control who speaks next.
How do I stop an AutoGen group chat from running forever? Set max_round on the GroupChat object, and configure is_termination_msg on agent objects to detect a termination phrase like 'TASK COMPLETE'. Either condition stops the conversation, whichever triggers first.
Can AutoGen agents use tools like web search or code execution? Yes. In the pyautogen API used in this tutorial, tool functions are registered with autogen.register_function (or the register_for_llm and register_for_execution decorators), and UserProxyAgent can execute code automatically with code_execution_config. You can also use a custom executor with the ConversableAgent class.
AiTechWorlds Team