The Core Reasoning Loop: ReAct
The reasoning loop is the heartbeat of your agent. Every capability — tool use, multi-step planning, self-correction — flows through it. In this lesson you will learn how the ReAct pattern (Reason + Act) works, why it produces better results than straight action generation, and how to implement a full Thought → Action → Observation cycle using LangChain.
Why Loops Matter
A single LLM call can answer many questions, but it cannot:
- Execute code and react to the runtime output
- Search the web and then synthesise multiple results into a coherent answer
- Retry gracefully when a tool fails or returns bad data
- Decompose a complex goal into sequenced sub-steps
For all of these, you need a loop — a repeated cycle where the model is given the result of its last action and asked what to do next. The ReAct pattern is the most widely validated design for this loop.
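Stripped to its essentials, that repeated cycle is just a few lines of control flow. The sketch below is illustrative only — `llm_decide` and `run_tool` are hypothetical stand-ins (here stubbed with toy fakes so it runs), not real library calls:

```python
def agent_loop(goal, llm_decide, run_tool, max_steps=10):
    """Minimal sketch of an agent loop: ask the model, act, observe, repeat."""
    history = []
    for _ in range(max_steps):
        decision = llm_decide(goal, history)      # model sees all prior results
        if decision.get("answer") is not None:
            return decision["answer"]
        observation = run_tool(decision["tool"], decision["args"])
        history.append((decision, observation))   # fed into the next call
    return "Step budget exhausted."

# Toy stand-ins so the loop can run without a real LLM or tools:
def fake_llm(goal, history):
    if history:  # second pass: an observation exists, so conclude
        return {"answer": f"Result: {history[-1][1]}"}
    return {"answer": None, "tool": "calc", "args": "2 + 2"}

def fake_tool(name, args):
    return str(eval(args))  # stub only — never eval untrusted input

print(agent_loop("add numbers", fake_llm, fake_tool))  # prints "Result: 4"
```

The essential property is the `history` list: every decision is made with the results of all previous actions in view.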
The ReAct Pattern
ReAct was introduced in a 2022 paper from Google Research and Princeton. The core insight is deceptively simple: let the model reason out loud before it acts. By generating an explicit Thought before choosing an Action, the model:
- Commits to a plan that can be inspected and debugged
- Grounds its next action in its own reasoning trace
- Can detect and recover from errors by reasoning about the Observation
The Three-Step Cycle
```
┌───────────────────────────────────────────────────────┐
│ ITERATION N                                           │
│                                                       │
│ Thought: "I need the current weather in Paris.        │
│           I should call the weather API tool."        │
│                                                       │
│ Action: weather_api(city="Paris")                     │
│                                                       │
│ Observation: {"temp": 18, "condition": "cloudy"}      │
│                                                       │
│ → Feed Observation back into prompt for Iteration N+1 │
└───────────────────────────────────────────────────────┘
```
The Observation from iteration N becomes part of the prompt for iteration N+1. The model therefore has a growing trace of its own reasoning and all tool outputs — this is its "scratchpad".
Key Insight: The model never sees a clean slate after the first iteration. Every thought and observation accumulates in the context window. This is why managing context length is a real concern in long-running agents.
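Since every observation lands in the context window permanently, one common mitigation is to cap each tool result before it enters the scratchpad. A minimal sketch (not part of any library API — `clip_observation` is a name invented for illustration):

```python
def clip_observation(observation: str, max_chars: int = 2000) -> str:
    """Truncate a long tool result before appending it to the scratchpad.

    Keeps the head and tail of the text, which usually carry the most
    useful signal, and marks how much was dropped in between.
    """
    if len(observation) <= max_chars:
        return observation
    head = observation[: max_chars // 2]
    tail = observation[-(max_chars // 2):]
    dropped = len(observation) - max_chars
    return f"{head}\n...[{dropped} chars truncated]...\n{tail}"
```

Call this on every observation before it is appended; a 50,000-token search result then costs a fixed, predictable amount of context on every subsequent iteration.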
Anatomy of a ReAct Prompt
```
You are a helpful research assistant. You have access to the following tools:

{tool_descriptions}

Use the following format EXACTLY:

Thought: <your reasoning about what to do next>
Action: <tool_name>
Action Input: <JSON arguments for the tool>
Observation: <result of the tool — this will be filled in for you>

Repeat Thought/Action/Action Input/Observation as needed.
When you have the final answer, respond with:

Final Answer: <your complete response to the user>

Begin!

User: {user_input}
{scratchpad}
```
The `{scratchpad}` placeholder is the accumulated history of previous Thought/Action/Observation triples. Your code appends to it after each iteration.
Implementing a ReAct Loop with LangChain
LangChain provides the plumbing so you can focus on the agent logic. The following example builds a minimal but complete ReAct agent using LangChain's primitives.
Step 1 — Define Your Tools
```python
from langchain.tools import tool


@tool
def search_web(query: str) -> str:
    """
    Search the web for current information.

    Use this when you need facts that may have changed recently or that
    are not in your training data. Returns a plain-text summary of results.
    """
    # In production, connect to SerpAPI, Tavily, or Brave Search.
    # For this example we return a stub response.
    return f"Search results for '{query}': [stub — integrate a real search API here]"


@tool
def calculate(expression: str) -> str:
    """
    Evaluate a mathematical expression safely.

    Input must be a valid Python arithmetic expression (no function calls).
    Supports +, -, *, /, ** operators. Returns the numeric result as a string.
    """
    import ast
    import operator as op

    allowed_ops = {
        ast.Add: op.add,
        ast.Sub: op.sub,
        ast.Mult: op.mul,
        ast.Div: op.truediv,
        ast.Pow: op.pow,
        ast.USub: op.neg,
    }

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Constant):
            return float(node.value)
        if isinstance(node, ast.BinOp):
            return allowed_ops[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp):
            return allowed_ops[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported node type: {type(node)}")

    try:
        tree = ast.parse(expression, mode="eval")
        result = _eval(tree.body)
        return str(result)
    except Exception as exc:
        return f"Error evaluating expression: {exc}"


tools = [search_web, calculate]
```
Step 2 — Build the ReAct Agent
```python
from langchain_openai import ChatOpenAI
from langchain.agents import create_react_agent, AgentExecutor
from langchain import hub

# Pull the standard ReAct prompt from LangChain Hub.
# This is the battle-tested prompt template — use it as your starting point
# before writing a custom one.
react_prompt = hub.pull("hwchase17/react")

llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0,  # Deterministic reasoning is preferable for agents
)

# create_react_agent wires the LLM, tools, and prompt together.
# It does NOT run the loop — that is AgentExecutor's job.
agent = create_react_agent(
    llm=llm,
    tools=tools,
    prompt=react_prompt,
)

executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,  # Prints each Thought/Action/Observation in real time
    max_iterations=10,
    handle_parsing_errors=True,  # Recover from malformed LLM outputs instead of crashing
    return_intermediate_steps=True,
)
```
Step 3 — Run the Agent and Inspect the Trace
```python
result = executor.invoke({
    "input": "What is 17% of 4500? Then search for what 'compound interest' means."
})

print("Final Answer:", result["output"])
print()
print("=== Reasoning Trace ===")
for step in result["intermediate_steps"]:
    action, observation = step
    print(f"\nThought: {action.log.strip()}")
    print(f"Action: {action.tool}({action.tool_input})")
    print(f"Obs: {observation}")
```
Expected console output structure:
```
Thought: I need to calculate 17% of 4500. I should use the calculate tool.
Action: calculate(4500 * 0.17)
Obs: 765.0

Thought: Now I need to search for 'compound interest'.
Action: search_web(compound interest definition finance)
Obs: Search results for 'compound interest definition finance': ...

Final Answer: 17% of 4500 is 765. Compound interest is interest calculated
on both the principal and previously accumulated interest...
```
Tip: Always set `verbose=True` during development. Watching the Thought/Action/Observation trace live is the fastest way to diagnose why your agent is making wrong decisions.
Implementing the Loop Manually
Understanding what AgentExecutor does under the hood will make you a better agent developer. Here is a simplified manual implementation that reveals every moving part:
```python
from dataclasses import dataclass, field
from typing import Optional
import re
import json
import logging

logger = logging.getLogger(__name__)


@dataclass
class ReActStep:
    """One completed Thought → Action → Observation cycle."""
    thought: str
    action: str
    action_input: dict
    observation: str


@dataclass
class ReActState:
    """Mutable state passed between loop iterations."""
    user_input: str
    steps: list[ReActStep] = field(default_factory=list)
    final_answer: Optional[str] = None

    def scratchpad(self) -> str:
        """Render the accumulated trace for inclusion in the next prompt."""
        parts = []
        for step in self.steps:
            parts.append(f"Thought: {step.thought}")
            parts.append(f"Action: {step.action}")
            parts.append(f"Action Input: {json.dumps(step.action_input)}")
            parts.append(f"Observation: {step.observation}")
        return "\n".join(parts)
```
```python
def parse_llm_output(
    raw: str,
) -> tuple[Optional[str], Optional[str], Optional[dict], Optional[str]]:
    """
    Extract (thought, action, action_input, final_answer) from raw LLM output.

    Handles common formatting variations such as extra whitespace.
    Returns None for fields not present in this particular output.
    """
    thought = None
    action = None
    action_input = None
    final_answer = None

    if m := re.search(r"Thought:\s*(.+?)(?=\nAction:|\nFinal Answer:|$)", raw, re.DOTALL):
        thought = m.group(1).strip()
    if m := re.search(r"Action:\s*(.+?)(?=\nAction Input:|$)", raw, re.DOTALL):
        action = m.group(1).strip()
    if m := re.search(r"Action Input:\s*(.+?)(?=\nObservation:|$)", raw, re.DOTALL):
        try:
            action_input = json.loads(m.group(1).strip())
            if not isinstance(action_input, dict):
                # A bare number or string parses as JSON but is not a kwargs dict
                action_input = {"input": m.group(1).strip()}
        except json.JSONDecodeError:
            # Fall back to treating the whole string as the primary argument
            action_input = {"input": m.group(1).strip()}
    if m := re.search(r"Final Answer:\s*(.+)$", raw, re.DOTALL):
        final_answer = m.group(1).strip()

    return thought, action, action_input, final_answer
```
```python
def run_react_loop(
    user_input: str,
    tool_registry: dict,
    llm_call,
    max_iterations: int = 10,
) -> str:
    """
    Execute the ReAct loop manually.

    Drives Thought → Action → Observation cycles until the LLM emits
    a 'Final Answer:' line or max_iterations is exhausted.

    Args:
        user_input: The raw user question or task description.
        tool_registry: Dict mapping tool name strings to callable functions.
        llm_call: A callable that accepts a prompt string and returns a string.
        max_iterations: Hard cap on the number of reasoning iterations.

    Returns:
        The final answer string, or an error message if the cap is hit.
    """
    state = ReActState(user_input=user_input)

    for iteration in range(max_iterations):
        logger.info("[run_react_loop] Starting iteration %d", iteration)

        prompt = _build_react_prompt(
            user_input=state.user_input,
            tool_registry=tool_registry,
            scratchpad=state.scratchpad(),
        )
        raw_output = llm_call(prompt)
        logger.debug("[run_react_loop] LLM output:\n%s", raw_output)

        thought, action_name, action_input, final_answer = parse_llm_output(raw_output)

        if final_answer:
            logger.info("[run_react_loop] Final Answer received at iteration %d", iteration)
            state.final_answer = final_answer
            return final_answer

        if action_name and action_name in tool_registry:
            try:
                observation = tool_registry[action_name](**(action_input or {}))
            except Exception as exc:
                # Surface the failure to the model instead of crashing the loop
                observation = f"Error: tool '{action_name}' raised {exc!r}"
        else:
            observation = (
                f"Error: unknown tool '{action_name}'. "
                f"Available tools: {list(tool_registry.keys())}"
            )

        state.steps.append(ReActStep(
            thought=thought or "",
            action=action_name or "",
            action_input=action_input or {},
            observation=str(observation),
        ))

    logger.warning(
        "[run_react_loop] Hit max_iterations=%d without Final Answer",
        max_iterations,
    )
    return "I was unable to complete this task within the iteration limit."
```
```python
def _build_react_prompt(user_input: str, tool_registry: dict, scratchpad: str) -> str:
    """Assemble the full ReAct prompt from its components."""
    tool_descriptions = "\n".join(
        f"- {name}: {fn.__doc__ or 'No description provided.'}"
        for name, fn in tool_registry.items()
    )
    return (
        f"You are a helpful assistant with access to tools.\n\n"
        f"Tools available:\n{tool_descriptions}\n\n"
        f"Format:\n"
        f"Thought: <reasoning>\n"
        f"Action: <tool_name>\n"
        f"Action Input: <JSON dict>\n"
        f"Observation: <will be filled>\n"
        f"... (repeat as needed)\n"
        f"Final Answer: <your answer>\n\n"
        f"User: {user_input}\n\n"
        f"{scratchpad}"
    )
```
Common ReAct Failure Modes
| Failure | Symptom | Fix |
|---|---|---|
| Action parsing error | LLM emits malformed JSON for Action Input | Add handle_parsing_errors=True and retry with an error message in the prompt |
| Infinite loop | Agent searches repeatedly without concluding | Lower max_iterations; add a prompt hint like "you have N steps remaining" |
| Hallucinated tool name | LLM calls a tool that does not exist | List tool names explicitly in the system prompt |
| Observation overflow | A tool returns tens of thousands of tokens | Truncate observations to a sane limit before appending to the scratchpad |
| Stale reasoning | Agent ignores earlier observations | Ensure the full scratchpad is included in every iteration's prompt |
| Premature termination | Agent answers "I don't know" after one failed call | Prompt it: "Try at least two different approaches before giving up" |
The LLM as the Reasoning Engine
One question students often ask: "Why not just write the logic in Python instead of calling an LLM every iteration?"
The answer is that LLMs bring capabilities that are difficult or impossible to hand-code:
- Natural language understanding — interpreting ambiguous, underspecified user goals
- Common-sense reasoning — knowing that "yesterday" means you should check the current date before using it
- Adaptive planning — changing approach when the first strategy fails
- Output synthesis — turning a pile of raw tool results into a coherent, readable narrative
The ReAct pattern exploits these strengths while keeping the agent grounded — every claim the model makes in its final answer must follow from an actual tool observation, not from its parametric memory.
Important: A well-designed ReAct agent should be able to say "I don't know" when the tools don't return useful results. Prompt the model explicitly: "If your tools do not return useful information after two attempts, say so clearly rather than guessing."
Key Takeaways
- ReAct separates reasoning from action by inserting an explicit `Thought` step before every tool call.
- The `scratchpad` is the model's working memory — it accumulates all Thought/Action/Observation triples.
- Use `AgentExecutor` for production agents; understand the manual loop to diagnose edge cases.
- Always cap iterations with `max_iterations` and handle parsing errors gracefully.
- The LLM's job is reasoning and synthesis; your Python code handles execution and state management.
Further Reading
- ReAct: Synergizing Reasoning and Acting in Language Models — the foundational paper
- LangChain ReAct Agent documentation — practical implementation guide
- Tree of Thoughts (Yao et al., 2023) — extension of ReAct to branching reasoning trees