7 AutoGPT Safety Features (Banned Commands, Approval, Sandbox)
AutoGPT's 7 core safety features explained: banned commands, human approval gates, sandbox config, and how to prevent runaway autonomous agent behavior.
Autonomous agents are powerful precisely because they make decisions and take actions without waiting for you. That same property makes them genuinely risky when the goal is ambiguous, the configuration is loose, or the agent reaches an unexpected state. AutoGPT has been adding safety features since its early versions, but they are not all enabled by default, and understanding them matters before you point an agent at anything important.
This guide covers all seven core safety mechanisms, how to configure each one, and what can go wrong when you skip them.
Why Agent Safety Is Not Optional
The risks from an autonomous agent are different from the risks from a chatbot. A chatbot gives you bad advice. An autonomous agent acts on bad advice. It can delete files, send emails, make API calls, post to social media, or execute code — all before you realize the plan went sideways.
The AI agents explained overview covers the general architecture, but safety requires understanding the specific control points where human oversight can intercept agent behavior. AutoGPT exposes several of these, and knowing them lets you dial the risk level appropriately for each task.
Safety Feature 1: Human Approval Gates
The most important safety control in AutoGPT is the simplest: requiring human approval before each action.
By default, AutoGPT runs in interactive mode. Before executing each step (writing a file, running a search, or calling an API), it prints the proposed action and waits for you to type y to proceed, n to exit, or feedback to steer the agent toward a different action.
NEXT ACTION: COMMAND = write_file ARGUMENTS = {'filename': 'output.txt', 'text': '...'}
Enter 'y' to authorize command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for the AI:
This is where most disasters are caught. An agent that decides to "clean up old files" by deleting your project directory gets stopped here, before the delete happens.
Continuous mode removes this gate entirely:
# Do NOT use this unless you have tested the workflow thoroughly
python -m autogpt --continuous
Or in .env:
CONTINUOUS_MODE=True
CONTINUOUS_LIMIT=10 # Auto-stop after 10 steps even in continuous mode
The CONTINUOUS_LIMIT setting is critical if you ever use continuous mode. It creates a hard stop after N steps regardless of whether the agent thinks it is done. Set it conservatively — 10 steps catches most runaway behavior, and you can always re-run with a fresh goal from where the agent left off.
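The reply formats shown in the prompt above (y, y -N, n, or free-text feedback) can be sketched as a small parser. This is an illustrative sketch only, not AutoGPT's actual implementation; the function name `parse_approval` and its return shape are assumptions made for the example.

```python
def parse_approval(reply: str) -> tuple[str, int, str]:
    """Classify a reply to the approval prompt.

    Returns (decision, steps, feedback), where decision is one of
    "approve", "exit", or "feedback". Hypothetical helper, not AutoGPT code.
    """
    text = reply.strip()
    lowered = text.lower()
    if lowered == "y":
        # Authorize exactly one command
        return ("approve", 1, "")
    if lowered.startswith("y -"):
        count = lowered[3:].strip()
        if count.isdigit() and int(count) > 0:
            # Authorize the next N commands without further prompts
            return ("approve", int(count), "")
        # Malformed count: pass it on as feedback rather than guessing
        return ("feedback", 0, text)
    if lowered == "n":
        return ("exit", 0, "")
    # Anything else is treated as feedback for the agent
    return ("feedback", 0, text)


print(parse_approval("y -5"))
print(parse_approval("write to a different file instead"))
```

Treating a malformed count as feedback instead of an approval keeps the parser fail-safe: an unclear reply never authorizes an action.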
Safety Feature 2: The Banned Commands List
AutoGPT ships with a list of commands it will never execute, regardless of goal or context. These are hard-coded prohibitions for actions with high destructive potential.
The default banned commands include:
| Command | Risk Prevented |
|---|---|
| execute_shell with rm -rf | Recursive file deletion |
| execute_shell with format | Drive formatting (Windows) |
| execute_shell with dd | Direct disk writes |
| execute_shell with mkfs | Filesystem creation |
| execute_python_file with system paths | Executing arbitrary system scripts |
| Direct cron/task scheduler writes | Persistence mechanisms |
| Recursive directory deletion | Mass file removal |
You can extend this list in your configuration:
# In autogpt_config.yaml
command_restrictions:
  deny_commands:
    - "execute_shell"   # Disable ALL shell execution
    - "delete_file"     # Disable file deletion entirely
    - "append_to_file"  # Prevent appending to existing files
    - "send_tweet"      # Disable social media posting
    - "send_email"      # Disable email sending
Or via environment variable:
# In .env
DENY_COMMANDS=execute_shell,delete_file,send_tweet
For production or shared deployments, disabling execute_shell entirely is the single most impactful security decision you can make. It removes an entire category of risk at the cost of some automation capability.
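To make the deny-list idea concrete, here is a minimal sketch of how a comma-separated DENY_COMMANDS value could be parsed and checked before a command runs. The helper names `parse_deny_list` and `is_denied` are hypothetical and do not come from AutoGPT's source; matching is by exact tool name.

```python
def parse_deny_list(raw: str) -> frozenset[str]:
    """Split a comma-separated DENY_COMMANDS value into a set of names."""
    return frozenset(name.strip() for name in raw.split(",") if name.strip())


def is_denied(command_name: str, deny_list: frozenset[str]) -> bool:
    """Return True when the command's tool name is on the deny list."""
    return command_name in deny_list


# Example value in the same format as the .env line above
deny = parse_deny_list("execute_shell, delete_file,send_tweet")

for name in ("execute_shell", "write_file"):
    status = "blocked" if is_denied(name, deny) else "allowed"
    print(f"{name}: {status}")
```

Exact-name matching is deliberately strict: a deny list that relies on substring or pattern matching is easier to get wrong in either direction.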
Safety Feature 3: Workspace Restriction
By default, AutoGPT can write files anywhere the running user has permission. RESTRICT_TO_WORKSPACE=True confines all file operations to a single designated directory.
WORKSPACE_DIRECTORY=/home/user/autogpt-workspace
RESTRICT_TO_WORKSPACE=True
With this enabled, any attempt by the agent to read or write outside /home/user/autogpt-workspace will fail with a permissions error, logged but not acted on. The agent knows the operation failed and must find another way to accomplish its goal — or stop.
This does not protect against network operations, API calls, or anything that does not touch the local filesystem. But it is effective at preventing the most common accidental damage: an agent that navigates up the directory tree and starts modifying files outside its intended scope.
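The kind of check a workspace restriction performs can be sketched with `pathlib`: resolve the requested path and confirm it still sits under the workspace root. This is a sketch under stated assumptions, not AutoGPT's own code; `is_inside_workspace` is a hypothetical name and the `/tmp/ws` paths are placeholders.

```python
from pathlib import Path


def is_inside_workspace(candidate: str, workspace: str) -> bool:
    """Return True if candidate resolves to a location under workspace.

    Relative paths are interpreted relative to the workspace root.
    resolve() collapses ".." segments and follows symlinks, so a path
    that escapes the workspace through either route is rejected.
    """
    root = Path(workspace).resolve()
    target = (root / candidate).resolve()
    return target == root or root in target.parents


print(is_inside_workspace("notes/output.txt", "/tmp/ws"))
print(is_inside_workspace("../secrets.txt", "/tmp/ws"))
print(is_inside_workspace("/etc/passwd", "/tmp/ws"))
```

Comparing resolved paths rather than raw strings matters: a prefix check on the unresolved string would accept `/tmp/ws/../secrets.txt`.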
Combining workspace restriction with a dedicated agent user account is the recommended approach:
# Create a dedicated user with minimal permissions
sudo useradd -m -s /bin/bash autogpt-runner
sudo -u autogpt-runner python -m autogpt
The agent process inherits the permissions of autogpt-runner, which has no write access outside its home directory apart from world-writable locations such as /tmp. Even if the workspace restriction is bypassed somehow, the OS-level permissions act as a second barrier.
Safety Feature 4: Docker Sandbox Deployment
The strongest isolation available for AutoGPT is running it inside a Docker container. The container has its own filesystem, network namespace, and process space. The agent cannot access files or processes on the host system.
# Dockerfile for sandboxed AutoGPT
FROM python:3.11-slim
WORKDIR /app
# Install AutoGPT
RUN pip install autogpt-core
# Create a non-root user inside the container
RUN useradd -m -u 1000 agentuser
USER agentuser
# Workspace is mounted from host at runtime
WORKDIR /home/agentuser/workspace
CMD ["python", "-m", "autogpt"]
# Run with strict volume mounting: the agent can only access the workspace volume
# No --network flag is set, so the container uses Docker's default bridge network
docker run \
  --rm \
  -v /path/to/safe/workspace:/home/agentuser/workspace \
  -e OPENAI_API_KEY="$OPENAI_API_KEY" \
  -e WORKSPACE_DIRECTORY=/home/agentuser/workspace \
  -e RESTRICT_TO_WORKSPACE=True \
  --memory=2g \
  --cpus=2 \
  autogpt-sandboxed
The --memory and --cpus flags prevent the container from consuming all host resources if the agent enters a tight loop. The --rm flag ensures the container is cleaned up after each run, preventing state accumulation.
For even stronger isolation, --network=none disables all network access. This makes sense for agents that only process local files and do not need API access. AutoGPT with OpenAI needs outbound network access, so keep Docker's default bridge network and add firewall rules to restrict which hosts the agent can reach. Avoid --network=host: it shares the host's network stack with the container and removes the network isolation described above.
Safety Feature 5: Smart Token and Cost Limits
An agent in a loop can burn your OpenAI API budget in minutes. AutoGPT includes token budgeting controls that stop the agent when it reaches a spending threshold.
# Maximum total token spend per session
SMART_TOKEN_LIMIT=8000 # Tokens for the "smart" (GPT-4) model
FAST_TOKEN_LIMIT=4000 # Tokens for the "fast" (GPT-4o-mini) model
# Warn when approaching limit
TOKEN_WARN_THRESHOLD=0.8 # Warn at 80% of limit
You can also set hard cost limits in USD:
# In your agent initialization code
from autogpt.core.runner import AgentRunner

runner = AgentRunner(
    max_cost_usd=0.50,      # Stop if API costs exceed $0.50
    warn_at_cost_usd=0.30,  # Print warning at $0.30
)
Combining CONTINUOUS_LIMIT with token limits gives you defense in depth: the agent stops after N steps regardless, and also stops if it spends more than $X. Each limit covers a gap the other leaves open: a step cap alone does not bound the cost of a few very expensive steps, and a cost cap alone does not bound a long run of cheap steps. Both together create a much tighter boundary.
A reasonable production configuration for non-critical automation:
CONTINUOUS_LIMIT=20
SMART_TOKEN_LIMIT=15000
FAST_TOKEN_LIMIT=8000
This allows about 20 reasoning steps with a moderate token budget before the agent must be manually re-authorized to continue.
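The combined stop condition can be sketched as a small budget tracker that halts on whichever limit is reached first. This is an illustrative sketch, not AutoGPT's accounting code; the `RunBudget` class, its fields, and the per-step costs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunBudget:
    """Track steps and spend; report when either limit is reached."""
    max_steps: int
    max_cost_usd: float
    steps: int = 0
    cost_usd: float = 0.0

    def record(self, step_cost_usd: float) -> None:
        """Account for one completed agent step and its API cost."""
        self.steps += 1
        self.cost_usd += step_cost_usd

    def stop_reason(self) -> Optional[str]:
        """Return why the run must stop, or None if it may continue."""
        if self.steps >= self.max_steps:
            return "step limit reached"
        if self.cost_usd >= self.max_cost_usd:
            return "cost limit reached"
        return None


budget = RunBudget(max_steps=20, max_cost_usd=0.50)
for cost in (0.10, 0.15, 0.30):  # made-up per-step costs
    budget.record(cost)
    if budget.stop_reason():
        print(f"Stopping after {budget.steps} steps: {budget.stop_reason()}")
        break
```

The check runs after every step, so the overshoot is bounded by the cost of a single step.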
Safety Feature 6: Memory Scope Controls
AutoGPT's long-term memory stores facts, decisions, and context across sessions. This memory can accumulate problematic patterns — an agent that learned a shortcut that worked once might apply it in contexts where it is dangerous.
Memory scope controls let you limit what gets stored and how long it persists:
# Memory backend options: local_json, redis, pinecone, milvus
MEMORY_BACKEND=local_json
# Clear memory between sessions (prevents carryover of bad patterns)
WIPE_REDIS_ON_START=True # If using Redis backend
# Limit memory entries to prevent unbounded growth
MEMORY_MAX_ENTRIES=1000
For sensitive tasks, running with ephemeral memory (cleared at session start) is safer than allowing the agent to build up context from previous runs. The trade-off is that the agent cannot learn from past sessions, which reduces efficiency for repetitive tasks.
# Reset memory programmatically before a sensitive run
from autogpt.memory.vector import get_memory
memory = get_memory(config)
memory.clear()
print("Memory cleared for fresh session")
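To illustrate what a bounded, ephemeral memory looks like in principle, the sketch below caps the number of stored entries and drops the oldest one when the cap is hit. The `EphemeralMemory` class is hypothetical, and the oldest-first eviction policy is an assumption; the article does not say how MEMORY_MAX_ENTRIES behaves once the limit is reached.

```python
from collections import deque


class EphemeralMemory:
    """In-process memory with a hard cap on stored entries."""

    def __init__(self, max_entries: int) -> None:
        # deque(maxlen=...) discards the oldest item when full
        self._entries: deque = deque(maxlen=max_entries)

    def add(self, text: str) -> None:
        self._entries.append(text)

    def clear(self) -> None:
        """Wipe everything, e.g. at the start of a sensitive run."""
        self._entries.clear()

    def entries(self) -> list:
        return list(self._entries)


memory_sketch = EphemeralMemory(max_entries=2)
for fact in ("first", "second", "third"):
    memory_sketch.add(fact)
print(memory_sketch.entries())
```

Because nothing is written to disk, the memory disappears with the process, which is the property that makes it safe for sensitive runs.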
The AI agent memory and planning guide goes deeper into memory architecture and the trade-offs between persistent and ephemeral memory for different agent types.
Safety Feature 7: Goal Constraints and Role Boundaries
The final safety layer is arguably the most underused: writing explicit constraints into the agent's role definition. AutoGPT lets you define not just what the agent should do, but what it should never do.
agent_constraints = [
    "You must not delete any files without explicit confirmation in the goal",
    "You must not send any emails or post to social media",
    "You must not make purchases or financial transactions",
    "You must not access files outside the designated workspace",
    "You must stop and report if you encounter unexpected errors",
    "You must not run for more than 15 steps without completing a goal",
    "You must not access any URLs not directly relevant to the stated goal",
]
These constraints are soft controls — they rely on the LLM following instructions, not on technical enforcement. A sufficiently motivated goal-pursuing agent might rationalize around them. That is why they work best in combination with the hard technical controls above, not as a replacement.
Think of the constraint list as the agent's professional ethics: it shapes behavior in the majority of cases, but the Docker sandbox and workspace restrictions are the actual security perimeter.
Command Approval Flow
The interaction between safety features creates a layered approval process for every action:
Agent proposes action
↓
Is command on DENY_COMMANDS list?
→ YES: Block, log, agent tries alternative
→ NO: Continue
↓
Does action affect files outside WORKSPACE?
→ YES: Block, log error
→ NO: Continue
↓
CONTINUOUS_MODE enabled?
→ NO: Prompt human for approval
    → Human approves (y) → Execute
    → Human enters feedback → Agent tries alternative
    → Human declines (n) → Program exits
→ YES: Skip to execution
↓
Execute action
↓
Log to audit trail
This flow means that even in continuous mode, the banned commands list and workspace restriction still protect you. The human approval gate is the only control that continuous mode bypasses.
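The ordering in the flow above can be expressed as one gate function that applies the checks in sequence and records every outcome. This is a simplified sketch of the decision order only, not AutoGPT's implementation; `gate_action`, its parameters, and the outcome strings are hypothetical, and the workspace check is reduced to a boolean supplied by the caller.

```python
from collections.abc import Callable, Collection


def gate_action(
    command: str,
    outside_workspace: bool,
    deny_list: Collection[str],
    continuous_mode: bool,
    ask_human: Callable[[str], bool],
    audit_log: list,
) -> str:
    """Apply the layered checks in order and log the outcome."""
    if command in deny_list:
        outcome = "blocked:deny_list"
    elif outside_workspace:
        outcome = "blocked:workspace"
    elif continuous_mode or ask_human(command):
        # Continuous mode skips only the human prompt, not the checks above
        outcome = "execute"
    else:
        outcome = "rejected:human"
    audit_log.append((command, outcome))
    return outcome


log: list = []
print(gate_action("execute_shell", False, {"execute_shell"}, True, lambda c: True, log))
print(gate_action("write_file", False, set(), False, lambda c: False, log))
```

Placing the deny-list and workspace checks ahead of the continuous-mode branch is what guarantees they still apply when the approval prompt is disabled.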
Risk Scenarios and Mitigations
Here are the most common ways AutoGPT safety goes wrong in practice, and the specific settings that prevent each one:
Scenario: Agent deletes project files while "cleaning up"
Mitigation: Add delete_file to DENY_COMMANDS. Enable RESTRICT_TO_WORKSPACE.
Scenario: Agent sends test emails to real customers
Mitigation: Add send_email to DENY_COMMANDS during testing. Use a test email account in staging.
Scenario: Agent runs in a loop, burning $50 in API calls overnight
Mitigation: Set CONTINUOUS_LIMIT=25 and SMART_TOKEN_LIMIT=20000. Set up OpenAI's hard spending limit in your account dashboard as a final backstop.
Scenario: Agent writes a script that achieves its goal by modifying system files
Mitigation: Run in Docker with --user flag and restricted volume mounts. Disable execute_shell.
Scenario: Agent stores sensitive data in memory from one run, leaks it in the next
Mitigation: Set WIPE_REDIS_ON_START=True or use local_json backend with session clearing.
For teams deploying agents at scale, the Deploy AI model to production guide covers infrastructure-level controls including network policies, secret management, and monitoring that complement AutoGPT's built-in safety features.
The Minimum Safe Configuration
If you take nothing else from this guide, use this baseline configuration for any non-trivial AutoGPT run:
# Minimum safe AutoGPT configuration
CONTINUOUS_MODE=False
CONTINUOUS_LIMIT=15
RESTRICT_TO_WORKSPACE=True
WORKSPACE_DIRECTORY=/path/to/isolated/folder
DENY_COMMANDS=execute_shell,delete_file
SMART_TOKEN_LIMIT=10000
EXECUTE_LOCAL_COMMANDS=False
MEMORY_BACKEND=local_json
And run it as a non-root user with no write permissions outside the workspace directory. These settings do not prevent all possible misuse, but they eliminate the most common and most damaging failure modes while preserving enough capability for meaningful automation.
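A baseline like this is easy to drift away from, so it can help to check a configuration against it before a run. The sketch below parses .env-style text and flags deviations from the settings listed above. The functions `parse_env` and `audit_config` are hypothetical helpers written for this example, and the treatment of missing keys (continuous mode off, workspace restriction off) follows the defaults this article describes rather than a verified AutoGPT specification.

```python
BASELINE = """
# Minimum safe AutoGPT configuration
CONTINUOUS_MODE=False
CONTINUOUS_LIMIT=15
RESTRICT_TO_WORKSPACE=True
WORKSPACE_DIRECTORY=/path/to/isolated/folder
DENY_COMMANDS=execute_shell,delete_file
SMART_TOKEN_LIMIT=10000
EXECUTE_LOCAL_COMMANDS=False
MEMORY_BACKEND=local_json
"""


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and '#' comments.

    Simplification: everything after '#' is dropped, so values that
    contain '#' are not supported.
    """
    settings: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def audit_config(settings: dict[str, str]) -> list[str]:
    """Return one warning per deviation from the baseline above."""
    warnings: list[str] = []
    if settings.get("CONTINUOUS_MODE", "False").lower() == "true":
        warnings.append("CONTINUOUS_MODE is enabled: approval gate bypassed")
        if "CONTINUOUS_LIMIT" not in settings:
            warnings.append("CONTINUOUS_LIMIT is not set")
    if settings.get("RESTRICT_TO_WORKSPACE", "False").lower() != "true":
        warnings.append("RESTRICT_TO_WORKSPACE is not enabled")
    denied = {n.strip() for n in settings.get("DENY_COMMANDS", "").split(",")}
    if "execute_shell" not in denied:
        warnings.append("execute_shell is not on DENY_COMMANDS")
    if settings.get("EXECUTE_LOCAL_COMMANDS", "False").lower() == "true":
        warnings.append("EXECUTE_LOCAL_COMMANDS is enabled")
    return warnings


for warning in audit_config(parse_env("CONTINUOUS_MODE=True")):
    print("WARNING:", warning)
```

Running the audit as a pre-flight step in whatever script launches the agent turns the baseline from advice into a check that fails loudly.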
Safety in autonomous agents is not about preventing the agent from doing anything — it is about making the consequences of unexpected behavior bounded and recoverable. Every setting above serves that goal.
Frequently Asked Questions
Can AutoGPT delete files on my computer? Yes, if EXECUTE_LOCAL_COMMANDS is enabled and no restrictions are configured: AutoGPT can delete files if its goal leads it there. This is why enabling RESTRICT_TO_WORKSPACE and reviewing the banned commands list before any run is essential.
What is continuous mode and why is it dangerous? Continuous mode skips the human approval prompt between each agent action. The agent runs until it completes or hits an error. It is dangerous because there is no checkpoint to catch a bad decision before it executes. Only use it for thoroughly tested workflows.
Does the sandbox actually isolate AutoGPT from my system? Docker-based sandbox deployment provides meaningful isolation: the agent cannot access host files outside the mounted volume. The software sandbox within a non-containerized install is more limited and only restricts the workspace path, not system calls or network access.
How do I add my own commands to the ban list? Add command names to the DENY_COMMANDS list in your .env file or configuration YAML. Commands are identified by the internal tool name AutoGPT uses, not the underlying system command. Check the commands registry in the source code to find the right names.
Should I run AutoGPT as root or administrator? Absolutely not. Always run AutoGPT under a user account with minimal permissions. If using Docker, avoid --privileged mode. The principle of least privilege applies directly — the agent should only have the access it needs, nothing more.
AiTechWorlds Team