ReAct Prompting: Combining Reasoning and Acting in AI Agents
ReAct prompting combines chain-of-thought reasoning with tool use in AI agents. Learn how it works, when to use it, and how to implement it in production.
Get more content like this on Telegram!
Daily AI tips, notes & resources β free
ReAct Prompting: Combining Reasoning and Acting in AI Agents
Here's something that tripped up my team during our first production agent deployment: a model with excellent reasoning performance on benchmarks was consistently wrong when answering questions about our product catalog. The reasoning was coherent, the logic was sound β the facts were just stale. The model was reasoning brilliantly from premises that were six months out of date.
ReAct was designed precisely for this failure mode.
The original ReAct paper from Google Brain (2022) made an observation that seems obvious in retrospect: language models can't fix their own knowledge gaps through reasoning alone. No amount of chain-of-thought will give you a model that knows today's stock prices. What you need is a loop β reason, act on the world, observe the result, reason again.
That's ReAct. And once you've built one agent with it, you'll be annoyed you didn't earlier.
What ReAct Actually Does
The core loop is three repeating steps:
- Thought β the model reasons about the current state and what it needs
- Action β the model invokes a tool (search, calculator, API call, database query)
- Observation β the tool result is fed back into context
This continues until the model has enough information to produce a final answer.
The contrast with plain chain-of-thought is sharpest on factual tasks. CoT produces a reasoning chain, then an answer. ReAct produces a reasoning chain, pauses for evidence, updates the chain, and so on. The "T" in the loop isn't just decoration β it's the mechanism that prevents the model from reasoning confidently toward a wrong answer.
The Prompt Structure
A minimal ReAct system prompt looks like this:
REACT_SYSTEM_PROMPT = """You are a research assistant with access to tools.
For each user query, follow this loop:
Thought: reason about what you need
Action: one of [{tool_names}]
Action Input: the input to the tool
Observation: (the tool result β provided by the system)
... (repeat Thought/Action/Observation as needed)
Thought: I now have enough information to answer
Final Answer: your answer to the user
Available tools:
{tool_descriptions}
Begin!
"""
The explicit Thought/Action/Observation labels aren't cosmetic. They serve as parsing anchors β you stop generation at "Observation:", inject the actual tool result, then resume generation. Without the labels, parsing becomes fragile.
Here's a full implementation using the OpenAI API with a Wikipedia search tool:
import re
import httpx
from openai import OpenAI
client = OpenAI()
def search_wikipedia(query: str) -> str:
"""Simple Wikipedia API search."""
url = "https://en.wikipedia.org/w/api.php"
params = {
"action": "query",
"list": "search",
"srsearch": query,
"format": "json",
"srlimit": 1,
}
resp = httpx.get(url, params=params, timeout=10)
results = resp.json()["query"]["search"]
if not results:
return "No results found."
return results[0]["snippet"].replace("<span class='searchmatch'>", "").replace("</span>", "")
def calculate(expression: str) -> str:
"""Safe eval for math expressions."""
try:
# Only allow numbers and math operators
if re.match(r'^[\d\s\+\-\*\/\(\)\.]+$', expression):
return str(eval(expression))
return "Invalid expression"
except Exception as e:
return f"Error: {e}"
TOOLS = {
"search": search_wikipedia,
"calculate": calculate,
}
TOOL_DESCRIPTIONS = """
- search(query): Search Wikipedia for information. Use for facts, dates, definitions.
- calculate(expression): Evaluate a math expression. Use for arithmetic.
"""
def run_react_agent(user_query: str, max_steps: int = 8) -> str:
system_prompt = f"""You are a research assistant. Use tools to answer questions accurately.
Follow this exact format:
Thought: what you need to figure out
Action: search or calculate
Action Input: the input
Observation: (filled in by system)
... repeat as needed ...
Thought: I have enough to answer
Final Answer: your complete answer
Available tools:{TOOL_DESCRIPTIONS}"""
messages = [
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_query},
]
for step in range(max_steps):
response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
stop=["Observation:"], # Stop before the observation
temperature=0.0,
)
agent_output = response.choices[0].message.content
messages.append({"role": "assistant", "content": agent_output})
# Check if we've reached a final answer
if "Final Answer:" in agent_output:
final = agent_output.split("Final Answer:")[-1].strip()
return final
# Parse the action
action_match = re.search(r'Action:\s*(\w+)', agent_output)
input_match = re.search(r'Action Input:\s*(.+?)(?:\n|$)', agent_output)
if not action_match or not input_match:
return "Agent failed to produce a valid action."
tool_name = action_match.group(1).strip().lower()
tool_input = input_match.group(1).strip()
if tool_name not in TOOLS:
observation = f"Unknown tool: {tool_name}"
else:
observation = TOOLS[tool_name](tool_input)
# Feed the observation back
messages.append({
"role": "user",
"content": f"Observation: {observation}\n"
})
return "Max steps reached without final answer."
# Example usage
result = run_react_agent(
"What year was the Transformer architecture introduced, and how many parameters did the original model have?"
)
print(result)
This is the skeleton most production ReAct agents are built on. The core loop is simple β the complexity lives in the tools and in handling edge cases.
ReAct vs Other Agent Patterns
ReAct doesn't exist in isolation. It's one point in a design space of agent patterns. Here's how they compare on real tasks:
| Pattern | Latency | Accuracy (factual Q&A) | Accuracy (multi-step) | Cost | Best For |
|---|---|---|---|---|---|
| Zero-shot | ~1s | 62% | 41% | $ | Simple generation |
| Chain-of-Thought | ~2s | 71% | 58% | $$ | Reasoning tasks |
| ReAct | ~5-15s | 89% | 76% | $$$ | Grounded Q&A |
| Plan-and-Execute | ~10-30s | 85% | 82% | $$$$ | Long complex tasks |
| ReAct + Reflection | ~8-20s | 91% | 81% | $$$$ | High-stakes tasks |
Benchmarks approximated from HotpotQA and ALFWorld evaluations. Actual numbers vary by model and domain.
The latency hit is real. A single ReAct trajectory on a complex question typically takes 3-6 LLM calls plus tool execution time. For synchronous user-facing applications, this matters. For background agents and async workflows, it usually doesn't.
Implementing ReAct with LangChain
If you're using LangChain, much of the above plumbing is handled:
from langchain.agents import create_react_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain.tools import Tool
from langchain import hub
# Load the ReAct prompt from LangChain hub
prompt = hub.pull("hwchase17/react")
llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [
Tool(
name="Search",
func=search_wikipedia,
description="Search for factual information. Input should be a search query."
),
Tool(
name="Calculator",
func=calculate,
description="Evaluate math expressions. Input should be a valid arithmetic expression."
),
]
agent = create_react_agent(llm, tools, prompt)
executor = AgentExecutor(
agent=agent,
tools=tools,
verbose=True, # See the thought/action/observation loop
max_iterations=8,
handle_parsing_errors=True,
)
result = executor.invoke({
"input": "What is the population of Tokyo divided by the population of London?"
})
print(result["output"])
verbose=True is your friend during development. Watching the thought/action/observation loop unfold in real time reveals failure modes you'd never catch from the final output alone.
Common Failure Modes and Fixes
Reasoning loops are the most frequent production issue. The agent asks "what is the current temperature in Paris?", gets an answer, then asks again with slightly different phrasing, loops indefinitely.
# Add explicit loop detection
from collections import Counter
def run_react_agent_with_loop_detection(query: str) -> str:
action_history = []
for step in range(max_steps):
# ... get action from model ...
action_key = f"{tool_name}:{tool_input}"
action_history.append(action_key)
# Detect if same action called more than twice
counts = Counter(action_history)
if counts[action_key] > 2:
# Inject a hint to move on
messages.append({
"role": "user",
"content": "Observation: You've already retrieved this information. Please synthesize what you have and provide a Final Answer."
})
Hallucinated tool parameters happen when the model constructs a tool input from its own knowledge rather than from previous observations. The fix is few-shot examples that explicitly show the model extracting parameters from observation text:
FEW_SHOT_EXAMPLE = """
User: What is the GDP of the country that hosted the 2022 FIFA World Cup?
Thought: I need to find which country hosted the 2022 FIFA World Cup first.
Action: search
Action Input: 2022 FIFA World Cup host country
Observation: The 2022 FIFA World Cup was hosted by Qatar.
Thought: Now I need to find Qatar's GDP.
Action: search
Action Input: Qatar GDP 2022
Observation: Qatar's GDP in 2022 was approximately $237 billion USD.
Thought: I have the answer.
Final Answer: Qatar hosted the 2022 FIFA World Cup, with a GDP of approximately $237 billion USD.
"""
Note how the second Action Input uses "Qatar" β derived from the observation, not hallucinated.
ReAct with Structured Tool Calls (Modern OpenAI API)
The original ReAct paper used text-based action parsing because function calling didn't exist yet. Modern implementations use the OpenAI tool-calling API, which is more reliable:
import json
from openai import OpenAI
client = OpenAI()
tools_spec = [
{
"type": "function",
"function": {
"name": "search_wikipedia",
"description": "Search Wikipedia for factual information",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "The search query"
}
},
"required": ["query"]
}
}
},
{
"type": "function",
"function": {
"name": "calculate",
"description": "Evaluate a mathematical expression",
"parameters": {
"type": "object",
"properties": {
"expression": {
"type": "string",
"description": "Arithmetic expression to evaluate"
}
},
"required": ["expression"]
}
}
}
]
def run_modern_react(query: str, max_steps: int = 10) -> str:
messages = [
{"role": "system", "content": "You are a helpful research assistant. Think step by step, use tools to gather accurate information, and provide a complete final answer."},
{"role": "user", "content": query}
]
for _ in range(max_steps):
response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
tools=tools_spec,
tool_choice="auto",
temperature=0.0,
)
msg = response.choices[0].message
messages.append(msg)
# No more tool calls β final answer
if not msg.tool_calls:
return msg.content
# Execute all tool calls
for tool_call in msg.tool_calls:
fn_name = tool_call.function.name
fn_args = json.loads(tool_call.function.arguments)
if fn_name == "search_wikipedia":
result = search_wikipedia(fn_args["query"])
elif fn_name == "calculate":
result = calculate(fn_args["expression"])
else:
result = f"Unknown function: {fn_name}"
messages.append({
"role": "tool",
"tool_call_id": tool_call.id,
"content": result
})
return "Max steps reached."
This is cleaner than text-based parsing. The model's reasoning is implicit in its tool selection choices rather than explicitly formatted β which actually works fine for production but makes debugging harder. For debugging, the text-based format has an advantage: you can read the chain of thought directly.
When ReAct Breaks Down
ReAct is not a solution to every agent problem. It degrades on:
Long-horizon tasks with 15+ steps. The model accumulates observations in context, and by step 10, the context is cluttered enough that attention quality drops. The model starts making decisions based on the most recent few observations rather than the full picture. For these tasks, look at Plan-and-Execute patterns or hierarchical agent architectures.
Tasks requiring parallel tool calls. Standard ReAct is sequential β one action, one observation, repeat. If you need to fetch information from five sources simultaneously, you're either burning latency or you need to modify the loop to support batched actions.
Adversarial inputs. If any tool can return attacker-controlled content, you have a prompt injection vulnerability baked into the architecture. Every observation gets fed back into the model's context β which means malicious content in a tool response can redirect the agent's behavior. See our full guide on prompt injection attacks for defense strategies.
ReAct in the Broader Prompting Landscape
ReAct builds on the foundation of chain-of-thought prompting, adding the outer feedback loop. It's closely related to tree-of-thought prompting, though ToT branches on the reasoning side while ReAct branches through tool calls.
For a comprehensive view of agent architectures that build on ReAct, the AI Agent Dev course walks through production implementations including memory management, parallel execution, and multi-agent coordination.
If you want to test your understanding of these patterns before building, the Advanced Prompting Quiz and AI Agents Quiz cover the key concepts with scenarios drawn from real agent failures.
For reference material while you're building, the Prompt Engineering Cheatsheet has ReAct prompt templates and the LLM Concepts Notes covers the theoretical background.
Practical Recommendations
After building several ReAct agents in production, the things that actually matter:
Use structured tool calls (function calling API) rather than text parsing in production. More reliable, easier to validate, better error messages.
Log everything. The full thought/action/observation trace is invaluable for debugging. Don't just log the final output.
Set hard step limits and handle them gracefully. An agent that loops 50 times before crashing is worse than one that stops at 8 steps and says "I couldn't find enough information."
Write tool descriptions obsessively. The model's tool selection quality is almost entirely determined by how well you describe what each tool does and when to use it. This is where prompt engineering matters most in ReAct systems.
Test with adversarial tool outputs. Inject unexpected formats, empty results, error messages, and (if relevant to your threat model) attacker-crafted strings into your tool responses. See how the agent behaves.
The Prompt Engineering course has a full module on agent prompt design that goes deeper on tool description best practices.
ReAct is the right default architecture for agents that need to be factually grounded. The implementation is straightforward, the failure modes are understandable, and the performance gains over pure chain-of-thought are consistent enough to justify the latency. Start with the simple text-based loop while you're learning the pattern, then migrate to function calling once you need reliability.
The one thing to internalize: the quality of your agent is mostly determined by the quality of your tools and tool descriptions, not by prompt cleverness. Invest there first.
π¬ DiscussionPowered by GitHub Discussions
Frequently Asked Questions
AiTechWorlds Team
β Verified WriterThe AiTechWorlds team is passionate about AI, technology, and education. We create high-quality, research-backed content to help you learn, grow, and succeed in the modern digital world.
Related Articles
Automatic Prompt Optimization: Using AI to Write Better Prompts
Automatic prompt optimization uses AI to iteratively improve prompts without manual tuning. Learn DSPy, APE, and gradient-free optimization methods with real benchmarks.
Meta-Prompting: Using LLMs to Generate and Improve Their Own Prompts
Meta-prompting uses LLMs to write, critique, and refine prompts β often outperforming human-written ones. Learn the patterns, failure modes, and production use cases.
Prompt Injection Attacks: How They Work and How to Defend Against Them
Prompt injection attacks let adversaries hijack AI behavior through malicious inputs. Learn how direct and indirect injection work, and how to build real defenses.
Jailbreak or Not? Understanding the Ethics of Prompt Manipulation
AI prompt ethics explained β the real difference between jailbreaking, clever prompting, and legitimate use, plus why AI safety guardrails exist and when to respect them.