Build a Customer Service Agent with AutoGen (Multi-Turn)
Build a production-ready customer service agent with AutoGen featuring multi-turn conversations, escalation logic, FAQ tools, and handoff patterns.
Get more content like this on Telegram!
Daily AI tips, notes & resources — free
Support teams at scale face a simple math problem: the volume of incoming tickets grows faster than you can hire agents. AI-powered customer service is not about replacing human agents — it is about making sure humans only handle the conversations that actually need them.
AutoGen is particularly well-suited for this problem because of how it handles multi-agent conversations. You can build a layered system where a first-line AI agent handles common questions, a specialist agent handles product-specific queries, and a human handoff mechanism activates when the AI reaches its limits.
This guide builds that entire system from scratch.
What You Will Build
By the end of this guide, you will have a working customer service agent that:
- Answers FAQs using a structured knowledge tool
- Maintains context across a multi-turn conversation
- Detects when to escalate to a human
- Hands off gracefully with a conversation summary
The final code is production-structured, meaning you can extend it directly rather than treating it as a toy example.
Prerequisites
pip install pyautogen openai python-dotenv
Set your API key:
export OPENAI_API_KEY=sk-proj-your-key-here
For the full context on what AutoGen is and how its agent model works, start with AI agents explained.
Setting Up the Knowledge Base
Real customer service agents need access to actual product information. We will represent this as a structured FAQ tool — in production, you would replace this with a vector search over your documentation.
# knowledge_base.py
FAQ_DATA = {
"refund": {
"policy": "Refunds are accepted within 30 days of purchase for unused items.",
"process": "Submit a refund request through your account portal under Orders > Request Refund.",
"timeline": "Refunds are processed within 5-7 business days."
},
"shipping": {
"standard": "Standard shipping takes 5-7 business days.",
"express": "Express shipping takes 1-2 business days for an additional $12.99.",
"international": "International shipping takes 10-14 business days. Customs duties are the customer's responsibility."
},
"account": {
"password_reset": "Go to login page and click 'Forgot Password'. A reset link will be emailed within 2 minutes.",
"email_change": "Email changes require verification from both the old and new email address. Allow 24 hours.",
"deletion": "Account deletion requests are processed within 30 days. Contact support@example.com."
},
"billing": {
"payment_methods": "We accept Visa, Mastercard, Amex, PayPal, and Apple Pay.",
"invoice": "Invoices are emailed automatically after each purchase. You can also download them from Account > Billing.",
"subscription": "Subscriptions can be cancelled at any time. Access continues until the end of the billing period."
}
}
def search_faq(topic: str, subtopic: str = None) -> str:
"""Search the FAQ knowledge base."""
topic = topic.lower()
if topic not in FAQ_DATA:
available = ", ".join(FAQ_DATA.keys())
return f"Topic '{topic}' not found. Available topics: {available}"
if subtopic:
subtopic = subtopic.lower()
if subtopic in FAQ_DATA[topic]:
return FAQ_DATA[topic][subtopic]
else:
# Return all entries for this topic
entries = FAQ_DATA[topic]
return "\n".join([f"{k}: {v}" for k, v in entries.items()])
# Return all entries for the topic
entries = FAQ_DATA[topic]
return "\n".join([f"{k}: {v}" for k, v in entries.items()])
def detect_escalation_needed(message: str, turn_count: int) -> dict:
"""Determine if conversation should be escalated to a human."""
escalation_triggers = [
"speak to human",
"talk to agent",
"real person",
"supervisor",
"manager",
"this is unacceptable",
"legal action",
"complaint",
"frustrated",
"angry",
"lawsuit"
]
message_lower = message.lower()
triggered_keywords = [t for t in escalation_triggers if t in message_lower]
# Also escalate after too many turns without resolution
too_many_turns = turn_count > 8
return {
"should_escalate": len(triggered_keywords) > 0 or too_many_turns,
"reason": triggered_keywords[0] if triggered_keywords else ("unresolved after multiple turns" if too_many_turns else None),
"priority": "high" if triggered_keywords else "normal"
}
Building the Core Agent
Now let us build the AutoGen agent setup. The key insight here is that AutoGen uses a ConversableAgent with registered tools — the agent decides when to call a tool based on the conversation context.
# customer_service_agent.py
import os
import json
from autogen import ConversableAgent, UserProxyAgent
from knowledge_base import search_faq, detect_escalation_needed
# Configuration
llm_config = {
"config_list": [
{
"model": "gpt-4-turbo",
"api_key": os.environ.get("OPENAI_API_KEY"),
}
],
"temperature": 0.3,
"timeout": 30,
}
SYSTEM_MESSAGE = """You are a helpful customer service agent for Acme Store.
Your responsibilities:
- Answer customer questions about refunds, shipping, accounts, and billing
- Use the search_faq tool to look up accurate information before answering
- Be friendly, concise, and helpful
- If you cannot find relevant information, be honest about it
- If the customer is frustrated, upset, or explicitly requests a human, use check_escalation
Rules:
- Always use search_faq before answering product or policy questions
- Never make up policies or timelines — only use information from the tool
- Keep responses under 150 words unless the customer asks for detail
- Use the customer's name if they provide it
- End each response by asking if the customer has additional questions
You cannot help with: legal disputes, fraud investigations, or custom enterprise pricing.
For those topics, always escalate to a human immediately."""
# Create the customer service agent
service_agent = ConversableAgent(
name="CustomerServiceAgent",
system_message=SYSTEM_MESSAGE,
llm_config=llm_config,
human_input_mode="NEVER",
max_consecutive_auto_reply=10,
)
# Create a user proxy that represents the customer
customer_proxy = UserProxyAgent(
name="Customer",
human_input_mode="ALWAYS",
max_consecutive_auto_reply=0,
code_execution_config=False,
)
# Register tools with the service agent
@service_agent.register_for_llm(description="Search the FAQ knowledge base for customer questions about refunds, shipping, accounts, or billing.")
def faq_search(topic: str, subtopic: str = "") -> str:
"""Search FAQ knowledge base."""
return search_faq(topic, subtopic if subtopic else None)
@service_agent.register_for_llm(description="Check if the current conversation should be escalated to a human agent.")
def check_escalation(customer_message: str, turn_number: int) -> str:
"""Check escalation criteria."""
result = detect_escalation_needed(customer_message, turn_number)
return json.dumps(result)
# Register tools for execution
@customer_proxy.register_for_execution()
def faq_search(topic: str, subtopic: str = "") -> str:
return search_faq(topic, subtopic if subtopic else None)
@customer_proxy.register_for_execution()
def check_escalation(customer_message: str, turn_number: int) -> str:
result = detect_escalation_needed(customer_message, turn_number)
return json.dumps(result)
Adding Escalation and Handoff Logic
The escalation pattern is where most customer service agents fall short. When the AI cannot help, it needs to do three things: recognize the situation, stop trying, and transfer context cleanly.
# escalation_handler.py
from dataclasses import dataclass
from typing import Optional
import datetime
@dataclass
class EscalationTicket:
ticket_id: str
customer_id: Optional[str]
conversation_summary: str
escalation_reason: str
priority: str
timestamp: str
conversation_history: list
def create_escalation_ticket(
conversation_history: list,
reason: str,
priority: str = "normal",
customer_id: str = None
) -> EscalationTicket:
"""Create a structured handoff ticket for human agents."""
# Generate a simple ticket ID
ticket_id = f"ESC-{datetime.datetime.now().strftime('%Y%m%d%H%M%S')}"
# Summarize the conversation
summary = summarize_conversation(conversation_history)
return EscalationTicket(
ticket_id=ticket_id,
customer_id=customer_id,
conversation_summary=summary,
escalation_reason=reason,
priority=priority,
timestamp=datetime.datetime.now().isoformat(),
conversation_history=conversation_history
)
def summarize_conversation(history: list) -> str:
"""Create a concise summary for the human agent."""
if not history:
return "No conversation history available."
# Extract key points from the conversation
customer_messages = [
msg["content"] for msg in history
if msg.get("role") == "user" or msg.get("name") == "Customer"
]
if not customer_messages:
return "Customer contacted support."
first_message = customer_messages[0][:200]
turn_count = len(customer_messages)
last_message = customer_messages[-1][:200] if len(customer_messages) > 1 else ""
summary = f"Customer initiated contact about: {first_message}\n"
summary += f"Conversation length: {turn_count} customer messages\n"
if last_message and last_message != first_message:
summary += f"Most recent customer message: {last_message}\n"
return summary
def format_handoff_message(ticket: EscalationTicket) -> str:
"""Format the message shown to the customer during handoff."""
return f"""I understand this situation needs more attention than I can provide.
I've created a priority support ticket (ID: {ticket.ticket_id}) and a human agent will follow up shortly.
Here is what I've shared with the team:
- Summary: {ticket.conversation_summary[:200]}
- Priority: {ticket.priority.upper()}
A human agent will contact you within:
- High priority: 15 minutes
- Normal priority: 2-4 hours
Is there anything else I can note for the agent before they reach out?"""
The Multi-Turn Conversation Loop
Now let us wire everything together into a conversation manager that handles the full flow:
# conversation_manager.py
import os
from autogen import ConversableAgent, UserProxyAgent, GroupChat, GroupChatManager
from customer_service_agent import service_agent, customer_proxy
from escalation_handler import create_escalation_ticket, format_handoff_message
class CustomerServiceSession:
def __init__(self, customer_id: str = None):
self.customer_id = customer_id
self.turn_count = 0
self.escalated = False
self.conversation_history = []
def run_conversation(self, initial_message: str):
"""Start and manage a customer service conversation."""
print(f"\n{'='*60}")
print("Customer Service Session Started")
print(f"{'='*60}\n")
# Start the conversation
result = customer_proxy.initiate_chat(
service_agent,
message=initial_message,
max_turns=15,
)
# Store history
self.conversation_history = result.chat_history
return result
def check_and_handle_escalation(self, message: str) -> bool:
"""Check if escalation is needed and handle it."""
from knowledge_base import detect_escalation_needed
escalation_check = detect_escalation_needed(message, self.turn_count)
if escalation_check["should_escalate"]:
ticket = create_escalation_ticket(
conversation_history=self.conversation_history,
reason=escalation_check["reason"],
priority=escalation_check["priority"],
customer_id=self.customer_id
)
handoff_message = format_handoff_message(ticket)
print(f"\nAgent: {handoff_message}")
# In production, you would:
# - Create a ticket in your helpdesk (Zendesk, Freshdesk, etc.)
# - Notify the human agent queue
# - Store conversation in your CRM
print(f"\n[SYSTEM: Escalation ticket {ticket.ticket_id} created]")
self.escalated = True
return True
return False
# Specialist agent for billing issues
billing_specialist = ConversableAgent(
name="BillingSpecialist",
system_message="""You are a billing specialist with access to account information.
You handle complex billing disputes, payment failures, and subscription issues.
You have authority to issue credits up to $50 without manager approval.
Always verify the customer's identity before discussing account details.""",
llm_config={
"config_list": [{"model": "gpt-4-turbo", "api_key": os.environ.get("OPENAI_API_KEY")}],
"temperature": 0.2,
},
human_input_mode="NEVER",
)
def run_multi_agent_session(customer_message: str):
"""Run a session with automatic routing between agents."""
# Determine initial routing based on message content
billing_keywords = ["charge", "payment", "invoice", "subscription", "refund denied", "dispute"]
is_billing = any(kw in customer_message.lower() for kw in billing_keywords)
if is_billing:
print("[Routing to billing specialist]")
initial_agent = billing_specialist
else:
print("[Routing to general support]")
initial_agent = service_agent
session = CustomerServiceSession()
result = customer_proxy.initiate_chat(
initial_agent,
message=customer_message,
max_turns=12,
)
return result
if __name__ == "__main__":
print("Customer Service Agent (AutoGen)")
print("Type 'quit' to exit\n")
initial_message = input("How can we help you today? ")
if initial_message.lower() != "quit":
run_multi_agent_session(initial_message)
Testing the Agent
Let us walk through what a real interaction looks like:
# test_agent.py
from conversation_manager import run_multi_agent_session
# Test 1: Simple FAQ question
result = run_multi_agent_session("What is your refund policy?")
# Test 2: Multi-turn with escalation trigger
result = run_multi_agent_session(
"I need to return an item I bought 45 days ago and your system won't let me"
)
# Test 3: Angry customer requiring human
result = run_multi_agent_session(
"This is completely unacceptable. I want to speak to a manager immediately."
)
Expected flow for Test 2:
- Agent calls
search_faq("refund", "policy") - Returns the 30-day policy
- Agent explains the policy limitation
- Customer pushes back
- Agent calls
check_escalation - Escalation ticket created
- Handoff message delivered
Performance and Cost Considerations
Customer service at scale means understanding the economics:
| Configuration | Cost per 10-turn conversation | Quality |
|---|---|---|
| GPT-4 Turbo, 0 tools | ~$0.18 | High |
| GPT-4 Turbo, with tools | ~$0.22 | Higher (accurate) |
| GPT-3.5 Turbo, with tools | ~$0.03 | Good for simple queries |
| GPT-4o mini, with tools | ~$0.01 | Good baseline |
For high-volume deployments, run GPT-4o mini as the first-line agent and only escalate to GPT-4 Turbo when the mini agent flags a complex issue.
For building memory across sessions (recognizing returning customers), see AI agent memory and planning.
What to Build Next
This is a functional foundation. Production extensions worth adding:
- Session persistence: Store conversation history in Redis so customers can resume across page loads
- Sentiment tracking: Log sentiment scores per turn to identify struggling conversations early
- Handoff to live chat: Integrate with Intercom, Zendesk, or Freshdesk via their APIs
- Analytics dashboard: Track resolution rates, escalation reasons, and average turns to resolution
If you want a deeper look at how multi-agent patterns work in a research context, the AI research agent build guide shows similar orchestration patterns applied to knowledge retrieval.
For the CrewAI approach to multi-agent customer service, CrewAI tutorial covers how role-based agents handle specialization differently.
Frequently Asked Questions
How does AutoGen handle conversation state across multiple turns? AutoGen maintains conversation history in each agent's message list. The ConversableAgent class stores all prior messages and includes them in subsequent LLM calls, enabling context-aware multi-turn dialogue without manual state management.
Can I connect an AutoGen customer service agent to a live chat system? Yes. AutoGen agents expose a simple Python interface. You can wrap any AutoGen agent in a FastAPI or WebSocket server and route live chat messages through it. The agent's reply method accepts message strings and returns response strings.
How do I prevent the agent from answering questions outside its scope? Define a strict system message that limits the agent's domain, and implement a topic classifier as a tool. If the classifier returns out-of-scope, the agent can route to escalation rather than attempting to answer.
Frequently Asked Questions
AiTechWorlds Team
✓ Verified WriterThe AiTechWorlds team is passionate about AI, technology, and education. We create high-quality, research-backed content to help you learn, grow, and succeed in the modern digital world.
Related Articles
5 AutoGen Agent Roles (Assistant, UserProxy, CodeExecutor)
Understand the 5 core AutoGen agent types — AssistantAgent, UserProxyAgent, CodeExecutorAgent, and more — with code examples and a comparison table for each role.
How to Deploy AutoGen Agents as APIs with FastAPI (2026)
Learn to serve AutoGen multi-agent systems as production REST APIs using FastAPI with async endpoints and real-time streaming responses.
How to Use AutoGen with Azure OpenAI (Enterprise Security)
Connect Microsoft AutoGen to Azure OpenAI for enterprise-grade AI agents. Step-by-step setup with private endpoints, OAI_CONFIG_LIST, and deployment config.
Build a Code Debugging Agent with AutoGen (Auto-Fix PRs)
Build an AutoGen agent that reviews code, analyzes PR diffs, suggests fixes, and automates code quality improvements with a full working implementation.