How to Use AutoGen with Code Interpreter (Execute Python)
Learn how to set up AutoGen's code interpreter with LocalCommandLineCodeExecutor and DockerCommandLineCodeExecutor to safely execute Python in agent workflows.
The ability to write and run code in the same conversation is what separates AutoGen from basic chatbot wrappers. When an agent can test its own output — run the script, see the error, fix it, run again — you get a feedback loop that produces working code instead of plausible-looking code that fails on first run.
AutoGen's code execution system gives you two main paths: local execution for development speed, and Docker execution for production safety. Knowing which to use, how to configure each, and how the self-correction loop works will save you hours of debugging.
For foundational context on agent architectures before diving into code execution specifics, AI agents explained is the right starting point.
How AutoGen Code Execution Works
AutoGen's code execution follows a specific flow:
- AssistantAgent generates a response containing a code block (usually marked with ```python)
- UserProxyAgent (or CodeExecutorAgent) detects the code block
- The executor runs the code and captures stdout, stderr, and return code
- The output is sent back to AssistantAgent as the next message
- If the code failed, AssistantAgent analyzes the error and generates a fix
- The loop repeats until success or max_consecutive_auto_reply is hit
This automatic error-correction loop is the core of what makes code agents useful.
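The loop can be sketched with the standard library alone. This is an illustration of the flow, not AutoGen's implementation: `fake_writer` is a hypothetical stand-in for the LLM that returns buggy code first and a fix once it sees the error, and `run_code` and `correction_loop` are names invented for this example.

```python
import re
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Optional

FENCE = "`" * 3  # built at runtime so this example contains no literal fence
CODE_BLOCK_RE = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def extract_code(message: str) -> Optional[str]:
    """Return the first python code block found in a chat message, if any."""
    match = CODE_BLOCK_RE.search(message)
    return match.group(1) if match else None


def run_code(code: str, timeout: int = 10) -> tuple[int, str]:
    """Run code in a subprocess and return (exit code, stdout + stderr)."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "script.py"
        script.write_text(code)
        proc = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True, text=True, timeout=timeout, cwd=tmp,
        )
    return proc.returncode, proc.stdout + proc.stderr


def fake_writer(history: list[str]) -> str:
    """Hypothetical LLM stand-in: buggy code first, a fix after a NameError."""
    if any("NameError" in message for message in history):
        return f"Fixed:\n{FENCE}python\ntotal = 2 + 3\nprint(total)\n{FENCE}"
    return f"Try this:\n{FENCE}python\nprint(total)\n{FENCE}"


def correction_loop(max_replies: int = 5) -> tuple[int, int, str]:
    """Alternate writer and executor until success or the reply cap is hit."""
    history: list[str] = []
    attempt, exit_code, output = 0, 1, ""
    for attempt in range(1, max_replies + 1):
        code = extract_code(fake_writer(history))
        if code is None:
            break  # nothing to execute, so the conversation ends
        exit_code, output = run_code(code)
        history.append(f"exitcode: {exit_code}\n{output}")  # fed back to the writer
        if exit_code == 0:
            break
    return attempt, exit_code, output
```

Calling `correction_loop()` fails on the first attempt with a NameError, passes that error back through `history`, and succeeds on the second attempt. The `max_replies` cap plays the role that max_consecutive_auto_reply plays in the list above.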
Setting Up LocalCommandLineCodeExecutor
The local executor runs Python directly on your machine. Fast to set up, ideal for development.
import autogen
from autogen.coding import LocalCommandLineCodeExecutor
from pathlib import Path

# Create a working directory for generated files
work_dir = Path("coding_workspace")
work_dir.mkdir(exist_ok=True)

# Configure the local executor
executor = LocalCommandLineCodeExecutor(
    timeout=60,         # seconds before killing execution
    work_dir=work_dir,  # where scripts are saved and run
)

# Create the code executor agent
code_executor_agent = autogen.ConversableAgent(
    name="CodeExecutor",
    llm_config=False,  # no LLM needed for execution
    code_execution_config={"executor": executor},
    human_input_mode="NEVER",
)

# Create the assistant that writes code
llm_config = {
    "config_list": [{"model": "gpt-4o", "api_key": "YOUR_KEY"}],
    "cache_seed": 42,
}
code_writer = autogen.ConversableAgent(
    name="CodeWriter",
    system_message="""You are a Python expert. Write clean, working Python code.
Always include error handling. Return results by printing them.""",
    llm_config=llm_config,
    code_execution_config=False,  # writer doesn't execute
)

# Run the coding conversation
result = code_executor_agent.initiate_chat(
    code_writer,
    message="Write Python code to fetch the current Bitcoin price from CoinGecko API and print it.",
    max_turns=5,
)
The code_executor_agent has llm_config=False because it does not need an LLM — it just runs code. The code_writer has code_execution_config=False because it only generates code, never runs it. Keeping these roles separate is cleaner and cheaper.
Using DockerCommandLineCodeExecutor
For production or any scenario involving untrusted code, Docker isolation is essential.
import autogen
from autogen.coding import DockerCommandLineCodeExecutor
from pathlib import Path

work_dir = Path("docker_workspace")
work_dir.mkdir(exist_ok=True)

# Docker executor configuration
docker_executor = DockerCommandLineCodeExecutor(
    image="python:3.11-slim",  # base Docker image
    timeout=120,               # longer timeout for complex tasks
    work_dir=work_dir,
    bind_dir=work_dir,         # mount local dir into container
    auto_remove=True,          # remove container after execution
    stop_container=True,       # stop container when done
)

code_executor_agent = autogen.ConversableAgent(
    name="SecureCodeExecutor",
    llm_config=False,
    code_execution_config={"executor": docker_executor},
    human_input_mode="NEVER",
)
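The Docker executor starts a container from the image, runs the code inside it, captures the output, and stops and removes the container when it is done (per stop_container and auto_remove above). Code runs inside the container rather than directly on the host, but the work directory is mounted into the container, so anything the code writes there is visible on your machine.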
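To make the isolation concrete, here is roughly what a single containerized run amounts to when written as a plain `docker run` command. This is a sketch only and not how AutoGen drives Docker internally; `docker_run_argv` and the `/workspace` mount point are hypothetical names chosen for the example, while `--rm`, `-v`, `-w`, and `--network` are standard Docker CLI flags.

```python
from pathlib import Path


def docker_run_argv(image, work_dir, script, network=None):
    """Build the argv for running one script in a throwaway container."""
    host_dir = Path(work_dir).resolve()  # Docker needs an absolute host path
    argv = [
        "docker", "run",
        "--rm",                           # remove the container when it exits
        "-v", f"{host_dir}:/workspace",   # mount only the work directory
        "-w", "/workspace",               # run from the mounted directory
    ]
    if network is not None:
        argv += ["--network", network]    # e.g. "none" disables networking
    argv += [image, "python", script]
    return argv


argv = docker_run_argv("python:3.11-slim", "docker_workspace", "script.py", network="none")
print(" ".join(argv))
```

The only host path the container can see is the mounted work directory, which is why the choice of that directory matters more than any other setting.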
Pre-installing packages in the Docker image:
If your workflows regularly use specific packages, build a custom image instead of relying on pip installs during execution:
# Dockerfile
FROM python:3.11-slim
RUN pip install --no-cache-dir \
    pandas \
    numpy \
    requests \
    matplotlib \
    scikit-learn \
    openai
WORKDIR /workspace

docker build -t autogen-executor:latest .

docker_executor = DockerCommandLineCodeExecutor(
    image="autogen-executor:latest",
    timeout=120,
    work_dir=work_dir,
)
Pre-installing packages dramatically speeds up execution and avoids network calls during agent runs.
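A quick way to confirm that an image really contains what the agents will import is a preflight check run inside the container. The sketch below uses only the standard library; `missing_packages` is a hypothetical helper name, and it takes import names, which can differ from pip names (scikit-learn is imported as `sklearn`).

```python
import importlib.util


def missing_packages(import_names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


# Import names for the packages installed in the Dockerfile above
required = ["pandas", "numpy", "requests", "matplotlib", "sklearn", "openai"]
missing = missing_packages(required)
print("Missing packages:", missing if missing else "none")
```

Running this once after `docker build` catches a forgotten package before an agent spends several turns discovering it.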
The Code Execution Config (Legacy API)
AutoGen also supports a simpler code_execution_config dictionary on UserProxyAgent, which many tutorials still use:
user_proxy = autogen.UserProxyAgent(
    name="User",
    human_input_mode="NEVER",
    code_execution_config={
        "work_dir": "workspace",
        "use_docker": False,   # True for Docker isolation
        "timeout": 60,
        "last_n_messages": 3,  # how many messages to look back for code
    },
    max_consecutive_auto_reply=10,
)
This is the older API. The newer LocalCommandLineCodeExecutor / DockerCommandLineCodeExecutor classes give you more control and are the recommended path for new projects.
Building a Data Analysis Agent
Here is a complete example — an agent that analyzes CSV data using code execution:
import autogen
from autogen.coding import LocalCommandLineCodeExecutor
from pathlib import Path
import pandas as pd

# Setup: create the work directory first so the CSV has somewhere to go
work_dir = Path("coding_workspace")
work_dir.mkdir(exist_ok=True)

# Create sample data inside the work directory
sample_data = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr", "May"],
    "revenue": [45000, 52000, 48000, 61000, 58000],
    "expenses": [32000, 35000, 31000, 38000, 40000],
})
sample_data.to_csv(work_dir / "sales_data.csv", index=False)

executor = LocalCommandLineCodeExecutor(
    timeout=30,
    work_dir=work_dir,
)

llm_config = {
    "config_list": [{"model": "gpt-4o", "api_key": "YOUR_KEY"}],
}

code_executor = autogen.ConversableAgent(
    name="Executor",
    llm_config=False,
    code_execution_config={"executor": executor},
    human_input_mode="NEVER",
    # content can be None for some messages, so fall back to an empty string
    is_termination_msg=lambda msg: "ANALYSIS_COMPLETE" in (msg.get("content") or ""),
)

data_analyst = autogen.ConversableAgent(
    name="DataAnalyst",
    system_message="""You are a Python data analyst.
When given data analysis tasks:
1. Write Python code using pandas
2. Always read from 'sales_data.csv' in the current directory
3. Print all results clearly
4. After the analysis is complete and results are printed, say ANALYSIS_COMPLETE
""",
    llm_config=llm_config,
    code_execution_config=False,
)

result = code_executor.initiate_chat(
    data_analyst,
    message="""Analyze the sales_data.csv file. Calculate:
1. Total revenue and expenses for all months
2. Average monthly profit (revenue - expenses)
3. The month with highest profit
Print each result on a separate line.""",
)
When you run this, the DataAnalyst will generate code like:
import pandas as pd
df = pd.read_csv("sales_data.csv")
df["profit"] = df["revenue"] - df["expenses"]
print(f"Total Revenue: ${df['revenue'].sum():,}")
print(f"Total Expenses: ${df['expenses'].sum():,}")
print(f"Average Monthly Profit: ${df['profit'].mean():,.0f}")
print(f"Highest Profit Month: {df.loc[df['profit'].idxmax(), 'month']}")
The executor runs it, captures the output, feeds it back, and the analyst confirms completion.
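It helps to know the expected answer before trusting an agent's output. The same figures can be worked out by hand from the sample data, shown here with plain Python lists so the check does not depend on pandas.

```python
months = ["Jan", "Feb", "Mar", "Apr", "May"]
revenue = [45000, 52000, 48000, 61000, 58000]
expenses = [32000, 35000, 31000, 38000, 40000]

# Profit per month: 13000, 17000, 17000, 23000, 18000
profit = [r - e for r, e in zip(revenue, expenses)]

total_revenue = sum(revenue)                    # 264000
total_expenses = sum(expenses)                  # 176000
average_profit = sum(profit) / len(profit)      # 88000 / 5 = 17600.0
best_month = months[profit.index(max(profit))]  # Apr

print(f"Total Revenue: ${total_revenue:,}")
print(f"Total Expenses: ${total_expenses:,}")
print(f"Average Monthly Profit: ${average_profit:,.0f}")
print(f"Highest Profit Month: {best_month}")
```

If the executor's captured output differs from these values, the generated code has a logic error even though it ran without an exception.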
Handling Multi-File Code Projects
AutoGen can manage multi-file projects within the work directory:
data_analyst = autogen.ConversableAgent(
    name="DataAnalyst",
    system_message="""You are a Python developer.
For complex projects, you can create multiple files.
Always specify filenames explicitly in your code blocks.
Use: # filename: script.py as the first comment in each code block.
""",
    llm_config=llm_config,
    code_execution_config=False,
)
With the # filename: comment convention, AutoGen saves each code block as a separate file in the work directory:
```python
# filename: data_loader.py
import pandas as pd
def load_data(filepath):
    return pd.read_csv(filepath)
```
```python
# filename: analysis.py
from data_loader import load_data
df = load_data("sales_data.csv")
print(df.describe())
```
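The convention is simple enough to sketch: read the first line of the block, pull out the filename, and write the block to that path inside the work directory. The code below illustrates the idea and is not AutoGen's own parser; `save_code_block` and the `tmp_code.py` fallback name are hypothetical.

```python
import re
from pathlib import Path

# Matches a first line such as "# filename: data_loader.py"
FILENAME_RE = re.compile(r"^\s*#\s*filename:\s*(\S+)")


def save_code_block(code, work_dir, default_name="tmp_code.py"):
    """Write a code block to the file named in its first-line comment."""
    first_line = code.split("\n", 1)[0]
    match = FILENAME_RE.match(first_line)
    name = match.group(1) if match else default_name

    root = Path(work_dir).resolve()
    target = (root / name).resolve()
    # Refuse names like "../evil.py" that would escape the work directory
    if root not in target.parents:
        raise ValueError(f"{name!r} is outside the work directory")

    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(code)
    return target
```

The path check matters because the filename comes from model output: without it, a generated block could overwrite files outside the workspace.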
Comparison: Local vs Docker Execution
| Feature | LocalCommandLineCodeExecutor | DockerCommandLineCodeExecutor |
|---|---|---|
| Setup complexity | Minimal | Requires Docker installed |
| Execution speed | Fast | Slower (container startup) |
| Isolation | None | Full container isolation |
| Network access | Full host access | Configurable |
| File system access | Full host access (scripts start in the work dir) | Container plus the mounted work dir |
| Best for | Development, trusted code | Production, untrusted code |
| Package management | Host Python packages | Container packages |
| Parallel execution | Limited | Multiple containers |
A benchmark using 100 code tasks found that Docker execution adds roughly 3–8 seconds per task for container startup. For quick scripts, that overhead matters. For long-running data processing tasks, it is negligible.
Configuring Timeouts and Error Handling
Code execution can hang. Always configure timeouts:
executor = LocalCommandLineCodeExecutor(
    timeout=30,  # kill after 30 seconds
    work_dir=work_dir,
)

# For the agent, limit how many auto-replies happen
code_executor = autogen.ConversableAgent(
    name="Executor",
    llm_config=False,
    code_execution_config={"executor": executor},
    human_input_mode="NEVER",
    max_consecutive_auto_reply=8,  # max 8 correction attempts
)
When execution times out, the executor returns a timeout error message. The AssistantAgent receives this and typically tries to generate a more efficient version of the code.
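The mechanics behind a timeout are worth seeing in isolation. The sketch below uses `subprocess.run` with its `timeout` argument and turns the resulting exception into an ordinary (exit code, message) pair that can be sent back to the code writer. `run_with_timeout` is a hypothetical helper, and the exit code 124 is borrowed from the coreutils `timeout` command as an assumption, so check what your AutoGen version actually reports.

```python
import subprocess
import sys

TIMEOUT_EXIT_CODE = 124  # assumption: same convention as the coreutils `timeout` command


def run_with_timeout(code, timeout):
    """Run a Python snippet; return (exit code, output) even if it hangs."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed once this many seconds pass
        )
    except subprocess.TimeoutExpired:
        return TIMEOUT_EXIT_CODE, f"Timeout: execution exceeded {timeout} seconds"
    return proc.returncode, proc.stdout + proc.stderr
```

For example, `run_with_timeout("import time; time.sleep(5)", 1)` returns the timeout code and message after about one second instead of blocking for five. Returning a message rather than raising keeps the conversation going, which is what lets the writer attempt a faster version.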
Using CodeExecutorAgent (Newer API)
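AutoGen 0.4+ ships a dedicated CodeExecutorAgent class. The 0.4 line is a redesign: it is installed as the autogen-agentchat and autogen-ext packages, the API is async, and agents take a model_client instead of llm_config. There is no initiate_chat; agents are run as a team instead. Check the imports below against your installed version:
import asyncio

from autogen_agentchat.agents import AssistantAgent, CodeExecutorAgent
from autogen_agentchat.conditions import MaxMessageTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.ui import Console
from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor
from autogen_ext.models.openai import OpenAIChatCompletionClient

model_client = OpenAIChatCompletionClient(model="gpt-4o", api_key="YOUR_KEY")
executor = LocalCommandLineCodeExecutor(work_dir="workspace", timeout=60)

code_executor = CodeExecutorAgent(
    name="CodeExecutor",
    code_executor=executor,
)
assistant = AssistantAgent(
    name="Assistant",
    model_client=model_client,
    system_message="Write and debug Python code. Always test your solutions.",
)

# Two-agent coding workflow: the agents alternate until the message cap is hit
team = RoundRobinGroupChat(
    [assistant, code_executor],
    termination_condition=MaxMessageTermination(6),
)

async def main():
    await Console(team.run_stream(
        task="Write a function to find all prime numbers up to 1000 using the Sieve of Eratosthenes. Print the count and the last 10 primes.",
    ))

asyncio.run(main())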
CodeExecutorAgent is purpose-built for execution tasks. It handles code block detection, execution, and output formatting automatically.
Streaming Code Output
For longer-running scripts, you may want to stream output rather than wait for completion:
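import subprocess
import sys

from autogen.coding import LocalCommandLineCodeExecutor
from autogen.coding.base import CommandLineCodeResult


class StreamingLocalExecutor(LocalCommandLineCodeExecutor):
    def execute_code_blocks(self, code_blocks):
        output_lines = []
        exit_code = 0
        for block in code_blocks:  # each item is a CodeBlock with .language and .code
            if block.language != "python":
                continue
            script_path = self.work_dir / "temp_script.py"
            script_path.write_text(block.code)
            process = subprocess.Popen(
                [sys.executable, "-u", str(script_path)],  # -u: unbuffered output
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                text=True,
                cwd=self.work_dir,
            )
            for line in process.stdout:
                print(f"[Live] {line}", end="")  # stream to terminal
                output_lines.append(line)
            exit_code = process.wait()
            if exit_code != 0:
                break  # stop at the first failing block
        # Lines already end with a newline, so join without a separator
        return CommandLineCodeResult(exit_code=exit_code, output="".join(output_lines))
This modification streams each line of output to your terminal as the code runs, which is useful for long data processing tasks where you want progress visibility. Note that it bypasses the parent class's timeout handling and only runs Python blocks, so treat it as a starting point and add your own timeout before relying on it.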
Security Considerations for Production
When running code execution in production environments, these practices matter:
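# Keep the Docker executor's footprint small: short timeout, a single work
# directory, and no extra volume mounts.
# Note: these arguments do not restrict network access. If your AutoGen version
# does not expose a network option, lock that down at the Docker or host level.
docker_executor = DockerCommandLineCodeExecutor(
    image="python:3.11-slim",
    timeout=60,
    work_dir=work_dir,
)

# Validate code before execution (basic check).
# Substring matching is easy to bypass and also rejects harmless code, so treat
# this as a first filter in front of the sandbox, not a replacement for it.
def safe_code_filter(code_block):
    dangerous_patterns = [
        "import os", "subprocess", "shutil.rmtree",
        "open('/etc", "__import__", "exec(", "eval(",
    ]
    for pattern in dangerous_patterns:
        if pattern in code_block:
            return False, f"Blocked: contains '{pattern}'"
    return True, "OK"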
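A substring check like the one above flags the text "import os" even inside a string literal, and misses `import os.path as p` written with different spacing. Parsing the code with Python's `ast` module is a somewhat sturdier variant of the same idea. This is a sketch under the same caveat: `find_violations` and the two blocklists are hypothetical, and a determined attacker can still get around static checks, so the container remains the real boundary.

```python
import ast

BLOCKED_MODULES = {"os", "subprocess", "shutil"}
BLOCKED_CALLS = {"eval", "exec", "__import__"}


def find_violations(code):
    """Return a list of blocked imports and calls found in the code."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    violations = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    violations.append(f"import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in BLOCKED_MODULES:
                violations.append(f"from {node.module} import ...")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                violations.append(f"call to {node.func.id}()")
    return violations
```

An empty list means nothing on the blocklists was found; it does not mean the code is safe.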
For a deeper look at deploying agents securely, Deploy AI model to production covers the infrastructure side of things.
Combining Code Execution with RAG
The real power emerges when you combine code execution with retrieval. An agent that can search documentation, write code based on what it finds, and verify that code works:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# This pattern connects to LangChain-based RAG
# For full RAG pipeline details, see the LangChain guide
# Build AI agent with LangChain: /post/build-ai-agent-langchain
rag_code_writer = autogen.ConversableAgent(
    name="RAGCodeWriter",
    system_message="""You are a Python developer with access to documentation.
Use the provided documentation context to write accurate code.
Always test your code and fix any errors.""",
    llm_config=llm_config,
    code_execution_config=False,
)
For the full integration pattern with vector stores, Vector database guide and Build AI agent with LangChain cover the retrieval side.
What Makes Code Execution Agents Actually Useful
The self-correction loop is the key insight. Without it, you get code that looks right but fails on edge cases. With it, you get code that was actually run, errors that were actually fixed, and output that was actually validated — all without human intervention.
The pattern scales from simple scripts to multi-file projects. Start with local execution during development, switch to Docker when you are ready to handle untrusted inputs or run in production. The agent architecture stays the same; only the executor changes.
For broader multi-agent patterns that build on code execution, see AutoGen group chat patterns and Build AI agent with LangChain.
Frequently Asked Questions
Is it safe to run AutoGen code execution without Docker? Running without Docker uses LocalCommandLineCodeExecutor, which executes code directly on your machine with your user's permissions. This is fine for development and trusted workloads, but for untrusted or user-supplied code, always use DockerCommandLineCodeExecutor. Docker isolates execution in a container, which sharply limits what malicious code can do to your host system, especially when you keep mounts and network access to a minimum.
What happens when AutoGen-generated code has a bug? AutoGen's code execution loop automatically feeds the error output back to the AssistantAgent, which then generates a corrected version of the code. This self-correction loop continues until the code runs successfully or max_consecutive_auto_reply is reached.
Can AutoGen install Python packages during code execution? Yes. If the generated code imports a missing package, the resulting ImportError is fed back to the code-writing agent like any other error, and the agent can respond with a pip install command before retrying. To control what gets installed, pre-install packages in the Docker image or run the executor inside a dedicated virtual environment.
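If you would rather decide yourself what may be installed than leave it to the model, the error output is easy to inspect. The sketch below pulls the module name out of a ModuleNotFoundError message and checks it against an allowlist; `missing_module`, `install_hint`, and the allowlist are hypothetical names for this example, and the import name is assumed to match the pip name, which is not always true (sklearn vs. scikit-learn).

```python
import re

MISSING_MODULE_RE = re.compile(r"ModuleNotFoundError: No module named '([^']+)'")


def missing_module(stderr):
    """Return the top-level module named in a ModuleNotFoundError, if any."""
    match = MISSING_MODULE_RE.search(stderr)
    return match.group(1).split(".")[0] if match else None


def install_hint(stderr, allowed):
    """Suggest a pip command only for modules on the allowlist."""
    name = missing_module(stderr)
    if name is None:
        return None  # the failure was not a missing import
    if name not in allowed:
        return f"'{name}' is not on the allowlist; not installing"
    return f"pip install {name}"


stderr = "Traceback (most recent call last):\nModuleNotFoundError: No module named 'requests'"
print(install_hint(stderr, allowed={"requests", "pandas"}))
```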
AiTechWorlds Team