Build a Code Debugging Agent with AutoGen (Auto-Fix PRs)
Build an AutoGen agent that reviews code, analyzes PR diffs, suggests fixes, and automates code quality improvements with a full working implementation.
Code review is one of the most time-consuming parts of software development. The mechanical parts — catching null pointer exceptions, identifying duplicate code, flagging incorrect error handling — can be automated. The judgment calls — architecture decisions, API design, business logic correctness — still need humans.
An AutoGen-powered debugging agent is well suited to the mechanical parts. It reads a PR diff, flags likely bugs, generates fixes, and, if you give it write access, can apply those fixes to the codebase directly. This frees human reviewers to focus on the decisions that actually require context and experience.
This guide builds a complete code debugging agent with PR diff analysis, fix suggestion, and optional auto-apply capabilities.
What the Agent Does
The final system:
- Accepts a PR diff or code file as input
- Analyzes the code for bugs, security issues, and code quality problems
- Generates specific fix suggestions with code
- Optionally executes test code to verify fixes
- Outputs a structured review report
For context on how autonomous code agents fit into broader AI agent patterns, see AI agents explained and AI research agent build.
Setup
pip install pyautogen openai pygithub python-dotenv
export OPENAI_API_KEY=sk-proj-your-key
export GITHUB_TOKEN=ghp_your_github_token # Only needed for GitHub integration
The Code Analyzer Tool
First, let us build the analysis engine that the agent will use as a tool:
# code_analyzer.py
import ast
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodeIssue:
    severity: str  # "error", "warning", "info"
    category: str  # "bug", "security", "performance", "style"
    line_number: Optional[int]
    description: str
    suggested_fix: Optional[str]
    confidence: float  # 0.0 to 1.0

def analyze_python_syntax(code: str) -> list[CodeIssue]:
    """Check for Python syntax errors."""
    issues = []
    try:
        ast.parse(code)
    except SyntaxError as e:
        issues.append(CodeIssue(
            severity="error",
            category="bug",
            line_number=e.lineno,
            description=f"Syntax error: {e.msg}",
            suggested_fix=None,
            confidence=1.0
        ))
    return issues

def detect_common_python_bugs(code: str) -> list[CodeIssue]:
    """Detect common Python bug patterns using regex heuristics."""
    issues = []
    lines = code.split("\n")
    for i, line in enumerate(lines, 1):
        # Check for bare except
        if re.match(r'\s*except\s*:', line):
            issues.append(CodeIssue(
                severity="warning",
                category="bug",
                line_number=i,
                description="Bare 'except:' catches all exceptions including KeyboardInterrupt and SystemExit",
                suggested_fix="Use 'except Exception:' or a specific exception type",
                confidence=0.95
            ))
        # Check for mutable default arguments
        if re.search(r'def\s+\w+\s*\([^)]*=\s*[\[\{]', line):
            issues.append(CodeIssue(
                severity="warning",
                category="bug",
                line_number=i,
                description="Mutable default argument (list or dict): shared across all function calls",
                suggested_fix="Use None as default and initialize inside the function",
                confidence=0.90
            ))
        # Check for == None instead of is None
        if re.search(r'==\s*None', line) or re.search(r'!=\s*None', line):
            issues.append(CodeIssue(
                severity="info",
                category="style",
                line_number=i,
                description="Use 'is None' or 'is not None' instead of '== None' or '!= None'",
                suggested_fix="Replace '== None' with 'is None'",
                confidence=0.99
            ))
        # Check for potential SQL injection
        if re.search(r'execute\s*\(\s*["\'].*%s.*["\'].*%', line):
            issues.append(CodeIssue(
                severity="error",
                category="security",
                line_number=i,
                description="Potential SQL injection: string formatting in SQL query",
                suggested_fix="Use parameterized queries: cursor.execute(query, (param,))",
                confidence=0.85
            ))
        # Check for hardcoded credentials
        if re.search(r'(password|secret|api_key|token)\s*=\s*["\'][^"\']{6,}["\']', line, re.IGNORECASE):
            issues.append(CodeIssue(
                severity="error",
                category="security",
                line_number=i,
                description="Potential hardcoded credential detected",
                suggested_fix="Use environment variables: os.environ.get('SECRET_KEY')",
                confidence=0.80
            ))
    return issues

def parse_diff(diff_text: str) -> dict[str, str]:
    """Parse a unified diff into {filename: added_code} pairs."""
    files = {}
    current_file = None
    current_code_lines = []
    for line in diff_text.split("\n"):
        if line.startswith("+++ b/"):
            if current_file and current_code_lines:
                files[current_file] = "\n".join(current_code_lines)
            current_file = line[6:]
            current_code_lines = []
        elif line.startswith("+") and not line.startswith("+++"):
            current_code_lines.append(line[1:])
    if current_file and current_code_lines:
        files[current_file] = "\n".join(current_code_lines)
    return files

def analyze_diff(diff_text: str) -> dict:
    """Analyze a PR diff for issues."""
    results = {}
    changed_files = parse_diff(diff_text)
    for filename, code in changed_files.items():
        file_issues = []
        if filename.endswith(".py"):
            file_issues.extend(analyze_python_syntax(code))
            file_issues.extend(detect_common_python_bugs(code))
        results[filename] = {
            "issues": [
                {
                    "severity": issue.severity,
                    "category": issue.category,
                    "line": issue.line_number,
                    "description": issue.description,
                    "fix": issue.suggested_fix,
                    "confidence": issue.confidence
                }
                for issue in file_issues
            ],
            "issue_count": len(file_issues),
            "error_count": sum(1 for i in file_issues if i.severity == "error"),
            "warning_count": sum(1 for i in file_issues if i.severity == "warning")
        }
    return results
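One limitation is worth knowing: parse_diff keeps only the added lines, so the line values in the report count positions within the added fragment rather than lines in the file, and a fragment cut out of a larger function can trigger a spurious syntax error. A possible way to recover real line numbers is to read the hunk headers. The sketch below is an illustration under that assumption, not part of the analyzer above; added_lines_with_numbers and HUNK_HEADER are hypothetical names.

```python
import re

# Matches unified-diff hunk headers such as "@@ -10,6 +12,8 @@ def foo():"
# and captures the starting line number in the new file.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_lines_with_numbers(diff_text: str) -> dict[str, list[tuple[int, str]]]:
    """Map each changed file to (new_file_line_number, added_line) pairs."""
    files: dict[str, list[tuple[int, str]]] = {}
    current_file = None
    new_line_no = 0
    for line in diff_text.split("\n"):
        if line.startswith("+++ b/"):
            current_file = line[6:]
            files[current_file] = []
            continue
        match = HUNK_HEADER.match(line)
        if match:
            new_line_no = int(match.group(1))
        elif current_file is None or line.startswith("-"):
            continue  # file headers and removed lines are not in the new file
        elif line.startswith("+"):
            files[current_file].append((new_line_no, line[1:]))
            new_line_no += 1
        elif line.startswith(" "):
            new_line_no += 1  # unchanged context line


    return files


sample = "\n".join([
    "--- a/app.py",
    "+++ b/app.py",
    "@@ -10,3 +10,4 @@ def load():",
    " import os",
    "+def f(x=[]):",
    "-    pass",
    "+    return x",
])
result = added_lines_with_numbers(sample)
print(result)  # {'app.py': [(11, 'def f(x=[]):'), (12, '    return x')]}
```

With numbers like these, a finding could be reported against the actual file line instead of its offset in the added text.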
The AutoGen Debugging Agent
Now the main agent setup. This uses a two-agent pattern: a code reviewer agent that analyzes issues, and an executor that runs the registered analysis tools and any test code.
# debugging_agent.py
import os
import json
from autogen import AssistantAgent, UserProxyAgent
from code_analyzer import analyze_diff, analyze_python_syntax, detect_common_python_bugs

llm_config = {
    "config_list": [
        {"model": "gpt-4-turbo", "api_key": os.environ.get("OPENAI_API_KEY")}
    ],
    "temperature": 0.1,  # Low temperature for more consistent (not fully deterministic) analysis
}

CODE_REVIEWER_PROMPT = """You are an expert code reviewer and debugging agent.
Your workflow for analyzing code or PRs:
1. Call analyze_code_diff or analyze_code_snippet first to get automated analysis
2. Review the automated findings and the full code context
3. Add any additional issues the automated tools missed
4. Generate specific, working fix suggestions with code
5. Prioritize issues: errors > security > warnings > style
6. Provide a final verdict: APPROVE / REQUEST_CHANGES / NEEDS_DISCUSSION
For each issue you identify, provide:
- Severity (error/warning/info)
- Category (bug/security/performance/style)
- Line number if applicable
- Clear description of the problem
- Working code fix
Format your final report as structured markdown with clear sections.
Be direct about problems — don't soften critical security or correctness issues.
"""

# Create the reviewer agent
reviewer_agent = AssistantAgent(
    name="CodeReviewer",
    system_message=CODE_REVIEWER_PROMPT,
    llm_config=llm_config,
    human_input_mode="NEVER",
)

# Create the executor (runs test code to verify fixes)
executor = UserProxyAgent(
    name="TestExecutor",
    human_input_mode="NEVER",
    code_execution_config={
        "work_dir": "review_workspace",
        "use_docker": True,  # Use Docker for safe execution
        "timeout": 30,
    },
    max_consecutive_auto_reply=5,
)

# Register each analysis tool once: the reviewer proposes the call,
# the executor runs it. Stacking the two decorators avoids defining
# every function twice.
@executor.register_for_execution()
@reviewer_agent.register_for_llm(description="Analyze a PR diff string for bugs, security issues, and code quality problems.")
def analyze_code_diff(diff_text: str) -> str:
    """Run automated analysis on a PR diff."""
    results = analyze_diff(diff_text)
    return json.dumps(results, indent=2)


@executor.register_for_execution()
@reviewer_agent.register_for_llm(description="Analyze a Python code snippet directly for common bugs and issues.")
def analyze_code_snippet(code: str, filename: str = "code.py") -> str:
    """Run automated analysis on a code snippet."""
    syntax_issues = analyze_python_syntax(code)
    pattern_issues = detect_common_python_bugs(code)
    all_issues = syntax_issues + pattern_issues
    result = {
        "filename": filename,
        "issues": [
            {
                "severity": i.severity,
                "category": i.category,
                "line": i.line_number,
                "description": i.description,
                "fix": i.suggested_fix,
                "confidence": i.confidence
            }
            for i in all_issues
        ],
        "total_issues": len(all_issues),
        "errors": sum(1 for i in all_issues if i.severity == "error"),
        "warnings": sum(1 for i in all_issues if i.severity == "warning")
    }
    return json.dumps(result, indent=2)
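The analyzer and the tool wrappers build the issue dictionaries by hand in several places. One way to keep them in sync is to serialize the dataclass with dataclasses.asdict and rename the two keys that differ. This is a sketch only: issues_to_report is a hypothetical helper, and CodeIssue is repeated here just so the example runs on its own.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class CodeIssue:  # same fields as the dataclass in code_analyzer.py
    severity: str
    category: str
    line_number: Optional[int]
    description: str
    suggested_fix: Optional[str]
    confidence: float


def issues_to_report(filename: str, issues: list[CodeIssue]) -> str:
    """Serialize issues to the JSON shape returned by the analysis tools."""
    rows = []
    for issue in issues:
        row = asdict(issue)
        # Rename to the shorter keys used in the tool output.
        row["line"] = row.pop("line_number")
        row["fix"] = row.pop("suggested_fix")
        rows.append(row)
    return json.dumps({
        "filename": filename,
        "issues": rows,
        "total_issues": len(rows),
        "errors": sum(1 for r in rows if r["severity"] == "error"),
        "warnings": sum(1 for r in rows if r["severity"] == "warning"),
    }, indent=2)
```

Both tools could then call the same helper, so the counts and key names stay identical whichever tool the reviewer invokes.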
PR Diff Analysis Workflow
Here is the complete workflow for analyzing a real GitHub PR:
# pr_reviewer.py
import os
from github import Github
from debugging_agent import reviewer_agent, executor


def get_pr_diff(repo_name: str, pr_number: int) -> str:
    """Fetch a PR diff from GitHub."""
    g = Github(os.environ.get("GITHUB_TOKEN"))
    repo = g.get_repo(repo_name)
    pr = repo.get_pull(pr_number)
    # Rebuild a diff-like text from the per-file patches
    diff_content = pr.get_files()
    diff_parts = []
    for file in diff_content:
        diff_parts.append(f"+++ b/{file.filename}")
        if file.patch:
            diff_parts.append(file.patch)
    return "\n".join(diff_parts)

def post_review_comment(repo_name: str, pr_number: int, review_text: str, verdict: str):
    """Post the review back to GitHub."""
    g = Github(os.environ.get("GITHUB_TOKEN"))
    repo = g.get_repo(repo_name)
    pr = repo.get_pull(pr_number)
    # Map verdict to GitHub review event
    event_map = {
        "APPROVE": "APPROVE",
        "REQUEST_CHANGES": "REQUEST_CHANGES",
        "NEEDS_DISCUSSION": "COMMENT"
    }
    pr.create_review(
        body=f"## AI Code Review\n\n{review_text}\n\n*Reviewed by AutoGen Code Debugging Agent*",
        event=event_map.get(verdict, "COMMENT")
    )

def review_pull_request(repo_name: str, pr_number: int, post_to_github: bool = False):
    """Full PR review workflow."""
    print(f"Fetching diff for PR #{pr_number} in {repo_name}...")
    try:
        diff = get_pr_diff(repo_name, pr_number)
        print(f"Diff fetched: {len(diff)} characters")
    except Exception as e:
        print(f"Error fetching PR: {e}")
        return None

    # Start the review conversation
    result = executor.initiate_chat(
        reviewer_agent,
        message=f"""Please review this pull request diff:
# start: python code
{diff}
# end code block
1. Call analyze_code_diff with the diff above
2. Review all findings thoroughly
3. Add any issues the automated analysis missed
4. Generate specific fixes for all errors and high-confidence warnings
5. Write a complete markdown review report
6. End with your verdict: APPROVE / REQUEST_CHANGES / NEEDS_DISCUSSION""",
        max_turns=8,
    )

    # Extract the review from the last meaningful message.
    # A message's content can be None (for example on tool-call messages),
    # so normalize it to "" before measuring its length.
    review_messages = [
        msg["content"] for msg in result.chat_history
        if msg.get("name") == "CodeReviewer" and len(msg.get("content") or "") > 200
    ]
    if not review_messages:
        return "Review could not be generated"

    final_review = review_messages[-1]
    # Determine verdict from review content (simple substring check)
    upper = final_review.upper()
    verdict = "NEEDS_DISCUSSION"
    if "REQUEST_CHANGES" in upper:
        verdict = "REQUEST_CHANGES"
    elif "APPROVE" in upper:
        verdict = "APPROVE"

    if post_to_github:
        post_review_comment(repo_name, pr_number, final_review, verdict)
        print(f"Review posted to GitHub with verdict: {verdict}")
    return final_review

# Review a specific file without a PR
def review_code_file(file_path: str):
    """Review a single code file."""
    with open(file_path, "r") as f:
        code = f.read()
    result = executor.initiate_chat(
        reviewer_agent,
        message=f"""Review this code file ({file_path}):
# start: python code
{code}
# end code block
Call analyze_code_snippet, then provide a comprehensive review with fixes.""",
        max_turns=6,
    )
    return result
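The verdict detection in review_pull_request is a substring check, so a sentence such as "I cannot approve this yet" in a report without the other keywords would be read as APPROVE. A stricter option is to look for the verdict only next to the word "verdict". The sketch below assumes the reviewer ends its report with a line like "Verdict: REQUEST_CHANGES"; extract_verdict and VERDICT_LINE are hypothetical names, not part of the code above.

```python
import re

# Finds a verdict keyword that directly follows the word "verdict",
# allowing punctuation in between (e.g. "**Verdict:** APPROVE").
VERDICT_LINE = re.compile(
    r"verdict\W*(REQUEST_CHANGES|NEEDS_DISCUSSION|APPROVE)", re.IGNORECASE
)


def extract_verdict(review_text: str, default: str = "NEEDS_DISCUSSION") -> str:
    """Return the last explicit verdict in the review, or the default."""
    matches = VERDICT_LINE.findall(review_text)
    if matches:
        return matches[-1].upper()
    return default


print(extract_verdict("I cannot approve this yet.\n\n**Verdict:** REQUEST_CHANGES"))
print(extract_verdict("Looks fine, approve when ready."))
```

Falling back to NEEDS_DISCUSSION when no explicit verdict line is found keeps an ambiguous review from being posted as an approval.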
Fix Suggestion Pattern
The fix suggestion workflow generates working replacement code for identified issues:
# fix_generator.py
import os
import re
from autogen import AssistantAgent, UserProxyAgent

FIX_GENERATOR_PROMPT = """You are a code fix generator. Given a code snippet and a list of issues:
1. Generate a corrected version of the code that fixes all reported issues
2. Add inline comments explaining what was changed and why
3. Preserve all original functionality; only fix the identified issues
4. If a fix requires understanding context not present in the snippet, note the assumption you made
5. Output ONLY the corrected code block followed by a brief change summary
Do not change variable names, function signatures, or logic unless fixing a reported bug.
"""

fix_agent = AssistantAgent(
    name="FixGenerator",
    system_message=FIX_GENERATOR_PROMPT,
    llm_config={
        "config_list": [{"model": "gpt-4-turbo", "api_key": os.environ.get("OPENAI_API_KEY")}],
        "temperature": 0.05,
    },
)

fix_executor = UserProxyAgent(
    name="FixVerifier",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "fix_workspace", "use_docker": True, "timeout": 20},
    max_consecutive_auto_reply=3,
)


def generate_and_verify_fix(original_code: str, issues: list[dict]) -> dict:
    """Generate a fix and let the executor run the agent's test for it."""
    issues_summary = "\n".join([
        f"- Line {i.get('line') or '?'}: [{i['severity']}] {i['description']} → {i.get('fix') or 'See agent suggestion'}"
        for i in issues
    ])
    result = fix_executor.initiate_chat(
        fix_agent,
        message=f"""Fix the following issues in this code:
Issues:
{issues_summary}
Original code:
# start: python code
{original_code}
# end code block
Generate the fixed version, then write a test that validates the fix runs without errors.""",
        max_turns=5,
    )

    # Extract the first fenced Python block from the FixGenerator's messages
    fixed_code = None
    for msg in result.chat_history:
        if msg.get("name") == "FixGenerator":
            # content can be None on some messages, so fall back to ""
            content = msg.get("content") or ""
            code_match = re.search(r'```python\n(.*?)\n```', content, re.DOTALL)
            if code_match:
                fixed_code = code_match.group(1)
                break

    return {
        "original_code": original_code,
        "fixed_code": fixed_code,
        "issues_addressed": len(issues),
        "conversation": result.chat_history
    }
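The extraction step above accepts the first fenced block it finds without checking it. A cheap extra guard is to confirm that the candidate at least parses before treating it as a fix. The sketch below is one possible approach, not the article's implementation: last_valid_python_block is a hypothetical helper, and it deliberately prefers the last block in a message on the assumption that later blocks are revisions.

```python
import ast
import re
from typing import Optional

FENCE = "`" * 3  # built this way so the example contains no literal fence
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)\n" + FENCE, re.DOTALL)


def last_valid_python_block(message: Optional[str]) -> Optional[str]:
    """Return the last fenced Python block that parses, or None."""
    for candidate in reversed(CODE_BLOCK.findall(message or "")):
        try:
            ast.parse(candidate)
        except SyntaxError:
            continue  # skip blocks that are not valid Python
        return candidate
    return None


reply = (
    f"First attempt:\n{FENCE}python\ndef f(:\n{FENCE}\n"
    f"Corrected:\n{FENCE}python\ndef f(x=None):\n    return x or []\n{FENCE}\n"
)
print(last_valid_python_block(reply))
```

Parsing only proves the code is syntactically valid; running the agent's test in the Docker sandbox remains the real check.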
Example: Reviewing Buggy Code
Let us test the agent on a code snippet with deliberate issues:
# test_review.py
from debugging_agent import reviewer_agent, executor

BUGGY_CODE = '''
import sqlite3

def get_user(username, password="default_pass"):
    conn = sqlite3.connect("users.db")
    cursor = conn.cursor()
    # Get user from database
    query = "SELECT * FROM users WHERE username = '%s' AND password = '%s'" % (username, password)
    cursor.execute(query)
    user = cursor.fetchone()
    if user != None:
        return user
    else:
        return None

def process_users(user_list=[]):
    for user in user_list:
        try:
            result = get_user(user)
            print(result)
        except:
            print("Error")

# Hardcoded admin credentials
ADMIN_PASSWORD = "admin123"
'''

result = executor.initiate_chat(
    reviewer_agent,
    message=f"""Review this code and fix all issues:
# start: python code
{BUGGY_CODE}
# end code block
Run the analyzer, identify all problems, then provide corrected code.""",
    max_turns=6,
)
The agent should identify:
- SQL injection in get_user, where the query is built with % string formatting (high confidence, error)
- Hardcoded credential on the last line (high confidence, error)
- Mutable default argument in process_users (high confidence, warning)
- != None instead of is not None (high confidence, info)
- Bare except: in process_users (high confidence, warning)
Each with a specific code fix.
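Not all of these findings come from the regex heuristics. The SQL check in detect_common_python_bugs only matches string formatting written inside the execute(...) call itself, while the buggy snippet builds the query on one line and executes it on the next. For this example, then, the SQL injection finding depends on the model's own review rather than on the analyzer tool. The short check below reuses the same pattern to show the difference; the two sample statements are made up for illustration.

```python
import re

# Same pattern as the SQL check in detect_common_python_bugs
SQL_FORMAT_IN_EXECUTE = re.compile(r'execute\s*\(\s*["\'].*%s.*["\'].*%')

# Formatting happens inside the execute() call: the pattern matches.
inline = 'cursor.execute("SELECT * FROM users WHERE name = \'%s\'" % name)'

# Query is built first and executed on a separate line: no single line matches.
two_step = [
    'query = "SELECT * FROM users WHERE name = \'%s\'" % name',
    "cursor.execute(query)",
]

inline_flagged = bool(SQL_FORMAT_IN_EXECUTE.search(inline))
two_step_flagged = any(SQL_FORMAT_IN_EXECUTE.search(line) for line in two_step)
print(inline_flagged)    # True
print(two_step_flagged)  # False
```

This is why the reviewer prompt asks the model to add issues the automated tools missed.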
Integrating with CI/CD
To run this on every PR automatically, add a GitHub Actions workflow:
# .github/workflows/ai-review.yml
name: AI Code Review
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  ai-review:
    runs-on: ubuntu-latest
    # Lets the workflow's GITHUB_TOKEN post the review on the PR
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: pip install pyautogen openai pygithub
      - name: Run AI code review
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          python -c "
          from pr_reviewer import review_pull_request
          review_pull_request(
              '${{ github.repository }}',
              ${{ github.event.pull_request.number }},
              post_to_github=True
          )
          "
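This posts the AI review on the PR automatically whenever a PR is opened or new commits are pushed to it. It assumes pr_reviewer.py, debugging_agent.py, and code_analyzer.py sit at the repository root so the inline script can import them. Note that by default GitHub does not let the Actions GITHUB_TOKEN approve pull requests, so an APPROVE verdict will fail unless the repository setting that allows Actions to approve pull requests is enabled.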
For the underlying LangChain-based code analysis patterns that complement this approach, see Build AI agent with LangChain. For deploying this as a production service, deploy AI model to production covers the infrastructure considerations.
Frequently Asked Questions
Can this agent automatically push fixes back to GitHub? Yes, if you give it a GitHub tool with write permissions (a personal access token from github.com/settings/tokens with the repo scope). With that in place, the agent can create branches, commit fixes, and open PRs. The code in this guide stops at posting a review, so pushing fixes is something you would add. We recommend keeping auto-push behind an optional flag rather than making it the default, to maintain human oversight.
How does the agent handle false positives in code review? The analyzer attaches a confidence score to each finding, and the reviewer prompt asks for fixes only for errors and high-confidence warnings. Low-confidence findings are best flagged for human review rather than auto-applied. The code shown in this guide does not define a MIN_CONFIDENCE constant or filter on one, so a threshold like that is something you would add to the agent configuration yourself.
What programming languages does the debugging agent support? The agent relies on GPT-4's code understanding capabilities, which cover Python, JavaScript, TypeScript, Java, Go, Rust, C/C++, and most other major languages. The regex and syntax checks in code_analyzer.py run only on .py files, so other languages are reviewed by the model alone. The code execution sandbox runs Python by default; for other languages, you would need language-specific execution environments.
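On the confidence threshold from the FAQ: the code in this guide does not show where MIN_CONFIDENCE is defined or applied. A minimal sketch of how such a threshold could work is below, operating on the issue dictionaries that the analysis tools return. The constant's value and the split_by_confidence helper are assumptions for illustration.

```python
MIN_CONFIDENCE = 0.85  # hypothetical threshold; tune for your codebase


def split_by_confidence(issues: list[dict], threshold: float = MIN_CONFIDENCE):
    """Split analyzer findings into auto-fix candidates and human-review items."""
    auto_fix = [i for i in issues if i.get("confidence", 0.0) >= threshold]
    needs_review = [i for i in issues if i.get("confidence", 0.0) < threshold]
    return auto_fix, needs_review


findings = [
    {"description": "Bare 'except:'", "confidence": 0.95},
    {"description": "Potential hardcoded credential detected", "confidence": 0.80},
]
auto_fix, needs_review = split_by_confidence(findings)
print([i["description"] for i in auto_fix])
print([i["description"] for i in needs_review])
```

Only the auto_fix list would be passed on to a fix generator; everything else stays in the report for a human to judge.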
AiTechWorlds Team