Core Agent Logic

Implementing the Reasoning Loop


The Core Reasoning Loop: ReAct

The reasoning loop is the heartbeat of your agent. Every capability — tool use, multi-step planning, self-correction — flows through it. In this lesson you will learn how the ReAct pattern (Reason + Act) works, why it produces better results than straight action generation, and how to implement a full Thought → Action → Observation cycle using LangChain.


Why Loops Matter

A single LLM call can answer many questions, but it cannot:

  • Execute code and react to the runtime output
  • Search the web and then synthesise multiple results into a coherent answer
  • Retry gracefully when a tool fails or returns bad data
  • Decompose a complex goal into sequenced sub-steps

For all of these, you need a loop — a repeated cycle where the model is given the result of its last action and asked what to do next. The ReAct pattern is the most widely validated design for this loop.
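The essence of that loop can be shown without any framework at all. The sketch below is a minimal, library-free illustration: `fake_llm` is a scripted stand-in for a real model (not a real API), and the tool registry is just a dict of callables. The point is the feedback edge — each tool result re-enters the context before the next decision.

```python
def fake_llm(context: str) -> str:
    """Scripted stand-in for a chat model: acts once, then answers."""
    if "Observation:" not in context:
        return "ACT: get_time"
    return "ANSWER: It is 10:00."


def run_loop(user_input: str, tools: dict, max_iterations: int = 5) -> str:
    context = f"User: {user_input}"
    for _ in range(max_iterations):
        decision = fake_llm(context)
        if decision.startswith("ANSWER:"):
            return decision.removeprefix("ANSWER:").strip()
        tool_name = decision.removeprefix("ACT:").strip()
        result = tools[tool_name]()
        # The crucial step: the tool result re-enters the context,
        # so the next decision is conditioned on it.
        context += f"\nObservation: {result}"
    return "Iteration limit reached."


answer = run_loop("What time is it?", {"get_time": lambda: "10:00"})
```

A single LLM call is the degenerate case of this loop with zero iterations — no observation ever feeds back.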


The ReAct Pattern

ReAct was introduced in a 2022 paper from Google Research and Princeton. The core insight is deceptively simple: let the model reason out loud before it acts. By generating an explicit Thought before choosing an Action, the model:

  1. Commits to a plan that can be inspected and debugged
  2. Grounds its next action in its own reasoning trace
  3. Can detect and recover from errors by reasoning about the Observation

The Three-Step Cycle

┌─────────────────────────────────────────────────────────┐
│  ITERATION N                                            │
│                                                         │
│  Thought:  "I need the current weather in Paris.        │
│             I should call the weather API tool."        │
│                                                         │
│  Action:   weather_api(city="Paris")                    │
│                                                         │
│  Observation: {"temp": 18, "condition": "cloudy"}       │
│                                                         │
│  → Feed Observation back into prompt for Iteration N+1  │
└─────────────────────────────────────────────────────────┘

The Observation from iteration N becomes part of the prompt for iteration N+1. The model therefore has a growing trace of its own reasoning and all tool outputs — this is its "scratchpad".

Key Insight: The model never sees a clean slate after the first iteration. Every thought and observation accumulates in the context window. This is why managing context length is a real concern in long-running agents.
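A common first mitigation, sketched below with illustrative names, is to cap each observation's size before it enters the scratchpad. A character budget is a crude proxy for tokens; swap in a real tokenizer if you need precision.

```python
def truncate_observation(observation: str, max_chars: int = 2000) -> str:
    """Cap an observation's size before appending it to the scratchpad.

    Long tool outputs (search results, file dumps) are the usual cause of
    context-window blowups in long-running agents.
    """
    if len(observation) <= max_chars:
        return observation
    dropped = len(observation) - max_chars
    return observation[:max_chars] + f"\n[... truncated {dropped} chars]"


short = truncate_observation("small result")                 # returned unchanged
capped = truncate_observation("x" * 10_000, max_chars=100)   # capped, with a marker
```

More sophisticated schemes summarize or evict old steps instead of truncating, but a hard cap per observation is the cheapest guardrail.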


Anatomy of a ReAct Prompt

You are a helpful research assistant. You have access to the following tools:

{tool_descriptions}

Use the following format EXACTLY:

Thought: <your reasoning about what to do next>
Action: <tool_name>
Action Input: <JSON arguments for the tool>
Observation: <result of the tool — this will be filled in for you>

Repeat Thought/Action/Action Input/Observation as needed.
When you have the final answer, respond with:

Final Answer: <your complete response to the user>

Begin!

User: {user_input}

{scratchpad}

The {scratchpad} is the accumulated history of previous Thought/Action/Observation triples. Your code appends to it after each iteration.
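That append can be as simple as joining the new triple onto the history. The helper below is a sketch (the function name and argument layout are illustrative, not part of any library):

```python
def append_to_scratchpad(scratchpad: str, thought: str, action: str,
                         action_input: str, observation: str) -> str:
    """Return the scratchpad with one completed triple appended."""
    entry = (
        f"Thought: {thought}\n"
        f"Action: {action}\n"
        f"Action Input: {action_input}\n"
        f"Observation: {observation}"
    )
    return f"{scratchpad}\n{entry}" if scratchpad else entry


pad = append_to_scratchpad(
    "", "I need the weather in Paris.", "weather_api",
    '{"city": "Paris"}', '{"temp": 18, "condition": "cloudy"}',
)
```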


Implementing a ReAct Loop with LangChain

LangChain provides the plumbing so you can focus on the agent logic. The following example builds a minimal but complete ReAct agent using LangChain's primitives.

Step 1 — Define Your Tools

from langchain.tools import tool


@tool
def search_web(query: str) -> str:
    """
    Search the web for current information.
    Use this when you need facts that may have changed recently or that
    are not in your training data. Returns a plain-text summary of results.
    """
    # In production, connect to SerpAPI, Tavily, or Brave Search.
    # For this example we return a stub response.
    return f"Search results for '{query}': [stub — integrate a real search API here]"


@tool
def calculate(expression: str) -> str:
    """
    Evaluate a mathematical expression safely.
    Input must be a valid Python arithmetic expression (no function calls).
    Supports +, -, *, /, ** operators. Returns the numeric result as a string.
    """
    import ast
    import operator as op

    allowed_ops = {
        ast.Add: op.add,
        ast.Sub: op.sub,
        ast.Mult: op.mul,
        ast.Div: op.truediv,
        ast.Pow: op.pow,
        ast.USub: op.neg,
    }

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Constant):
            return float(node.value)
        if isinstance(node, ast.BinOp):
            return allowed_ops[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp):
            return allowed_ops[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported node type: {type(node)}")

    try:
        tree = ast.parse(expression, mode="eval")
        result = _eval(tree.body)
        return str(result)
    except Exception as exc:
        return f"Error evaluating expression: {exc}"


tools = [search_web, calculate]

Step 2 — Build the ReAct Agent

from langchain_openai import ChatOpenAI
from langchain.agents import create_react_agent, AgentExecutor
from langchain import hub

# Pull the standard ReAct prompt from LangChain Hub.
# This is the battle-tested prompt template — use it as your starting point
# before writing a custom one.
react_prompt = hub.pull("hwchase17/react")

llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0,          # Deterministic reasoning is preferable for agents
)

# create_react_agent wires the LLM, tools, and prompt together.
# It does NOT run the loop — that is AgentExecutor's job.
agent = create_react_agent(
    llm=llm,
    tools=tools,
    prompt=react_prompt,
)

executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,                # Prints each Thought/Action/Observation in real time
    max_iterations=10,
    handle_parsing_errors=True,  # Recover from malformed LLM outputs instead of crashing
    return_intermediate_steps=True,
)

Step 3 — Run the Agent and Inspect the Trace

result = executor.invoke({
    "input": "What is 17% of 4500? Then search for what 'compound interest' means."
})

print("Final Answer:", result["output"])
print()
print("=== Reasoning Trace ===")
for step in result["intermediate_steps"]:
    action, observation = step
    print(f"\nThought: {action.log.strip()}")
    print(f"Action:  {action.tool}({action.tool_input})")
    print(f"Obs:     {observation}")

Expected console output structure:

Thought: I need to calculate 17% of 4500. I should use the calculate tool.
Action:  calculate(4500 * 0.17)
Obs:     765.0

Thought: Now I need to search for 'compound interest'.
Action:  search_web(compound interest definition finance)
Obs:     Search results for 'compound interest definition finance': ...

Final Answer: 17% of 4500 is 765. Compound interest is interest calculated
on both the principal and previously accumulated interest...

Tip: Always set verbose=True during development. Watching the Thought/Action/Observation trace live is the fastest way to diagnose why your agent is making wrong decisions.


Implementing the Loop Manually

Understanding what AgentExecutor does under the hood will make you a better agent developer. Here is a simplified manual implementation that reveals every moving part:

from dataclasses import dataclass, field
from typing import Optional
import re
import json
import logging

logger = logging.getLogger(__name__)


@dataclass
class ReActStep:
    """One completed Thought → Action → Observation cycle."""
    thought: str
    action: str
    action_input: dict
    observation: str


@dataclass
class ReActState:
    """Mutable state passed between loop iterations."""
    user_input: str
    steps: list[ReActStep] = field(default_factory=list)
    final_answer: Optional[str] = None

    def scratchpad(self) -> str:
        """Render the accumulated trace for inclusion in the next prompt."""
        parts = []
        for step in self.steps:
            parts.append(f"Thought: {step.thought}")
            parts.append(f"Action: {step.action}")
            parts.append(f"Action Input: {json.dumps(step.action_input)}")
            parts.append(f"Observation: {step.observation}")
        return "\n".join(parts)


def parse_llm_output(
    raw: str,
) -> tuple[Optional[str], Optional[str], Optional[dict], Optional[str]]:
    """
    Extract (thought, action, action_input, final_answer) from raw LLM output.

    Handles common formatting variations such as extra whitespace and
    different capitalisation. Returns None for fields not present in
    this particular output.
    """
    thought = None
    action = None
    action_input = None
    final_answer = None

    if m := re.search(r"Thought:\s*(.+?)(?=\nAction:|\nFinal Answer:|$)", raw, re.DOTALL):
        thought = m.group(1).strip()

    if m := re.search(r"Action:\s*(.+?)(?=\nAction Input:|$)", raw, re.DOTALL):
        action = m.group(1).strip()

    if m := re.search(r"Action Input:\s*(.+?)(?=\nObservation:|$)", raw, re.DOTALL):
        try:
            action_input = json.loads(m.group(1).strip())
        except json.JSONDecodeError:
            # Fall back to treating the whole string as the primary argument
            action_input = {"input": m.group(1).strip()}

    if m := re.search(r"Final Answer:\s*(.+)$", raw, re.DOTALL):
        final_answer = m.group(1).strip()

    return thought, action, action_input, final_answer


def run_react_loop(
    user_input: str,
    tool_registry: dict,
    llm_call,
    max_iterations: int = 10,
) -> str:
    """
    Execute the ReAct loop manually.

    Drives Thought → Action → Observation cycles until the LLM emits
    a 'Final Answer:' line or max_iterations is exhausted.

    Args:
        user_input:     The raw user question or task description.
        tool_registry:  Dict mapping tool name strings to callable functions.
        llm_call:       A callable that accepts a prompt string and returns a string.
        max_iterations: Hard cap on the number of reasoning iterations.

    Returns:
        The final answer string, or an error message if the cap is hit.
    """
    state = ReActState(user_input=user_input)

    for iteration in range(max_iterations):
        logger.info("Starting iteration %d", iteration)

        prompt = _build_react_prompt(
            user_input=state.user_input,
            tool_registry=tool_registry,
            scratchpad=state.scratchpad(),
        )

        raw_output = llm_call(prompt)
        logger.debug("LLM output:\n%s", raw_output)

        thought, action_name, action_input, final_answer = parse_llm_output(raw_output)

        if final_answer:
            logger.info("Final Answer received at iteration %d", iteration)
            state.final_answer = final_answer
            return final_answer

        if action_name and action_name in tool_registry:
            observation = tool_registry[action_name](**(action_input or {}))
        else:
            observation = (
                f"Error: unknown tool '{action_name}'. "
                f"Available tools: {list(tool_registry.keys())}"
            )

        state.steps.append(ReActStep(
            thought=thought or "",
            action=action_name or "",
            action_input=action_input or {},
            observation=str(observation),
        ))

    logger.warning("Hit max_iterations=%d without Final Answer", max_iterations)
    return "I was unable to complete this task within the iteration limit."


def _build_react_prompt(user_input: str, tool_registry: dict, scratchpad: str) -> str:
    """Assemble the full ReAct prompt from its components."""
    tool_descriptions = "\n".join(
        f"- {name}: {fn.__doc__ or 'No description provided.'}"
        for name, fn in tool_registry.items()
    )
    return (
        f"You are a helpful assistant with access to tools.\n\n"
        f"Tools available:\n{tool_descriptions}\n\n"
        f"Format:\n"
        f"Thought: <reasoning>\n"
        f"Action: <tool_name>\n"
        f"Action Input: <JSON dict>\n"
        f"Observation: <will be filled>\n"
        f"... (repeat as needed)\n"
        f"Final Answer: <your answer>\n\n"
        f"User: {user_input}\n\n"
        f"{scratchpad}"
    )
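One practical payoff of the manual implementation is testability: you can drive the loop with a scripted stand-in for llm_call and assert on the result, no API key required. The self-contained sketch below condenses the loop and parser into a few lines for that purpose (scripted_llm and mini_react are illustrative test helpers, not library APIs):

```python
import json
import re


def scripted_llm(prompt: str) -> str:
    """Scripted stand-in for a real model: acts once, then answers."""
    if "Observation:" not in prompt:
        return ('Thought: I should look this up.\n'
                'Action: lookup\n'
                'Action Input: {"query": "capital of France"}')
    return "Final Answer: The capital of France is Paris."


def mini_react(user_input: str, tools: dict, llm_call, max_iterations: int = 5) -> str:
    scratchpad = ""
    for _ in range(max_iterations):
        raw = llm_call(f"User: {user_input}\n{scratchpad}")
        if m := re.search(r"Final Answer:\s*(.+)$", raw, re.DOTALL):
            return m.group(1).strip()
        # Scripted output is well-formed; a real parser must handle None here.
        action = re.search(r"Action:\s*(\S+)", raw).group(1)
        args = json.loads(re.search(r"Action Input:\s*(\{.*\})", raw, re.DOTALL).group(1))
        observation = tools[action](**args)
        scratchpad += f"{raw}\nObservation: {observation}\n"
    return "Iteration limit reached."


answer = mini_react(
    "What is the capital of France?",
    {"lookup": lambda query: "France's capital is Paris."},
    scripted_llm,
)
```

The same technique — swapping llm_call for a deterministic script — works for unit-testing run_react_loop itself.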

Common ReAct Failure Modes

  • Action parsing error — Symptom: the LLM emits malformed JSON for Action Input. Fix: set handle_parsing_errors=True and retry with the parse error included in the prompt.
  • Infinite loop — Symptom: the agent searches repeatedly without concluding. Fix: lower max_iterations and add a prompt hint like "you have N steps remaining".
  • Hallucinated tool name — Symptom: the LLM calls a tool that does not exist. Fix: list the available tool names explicitly in the system prompt.
  • Observation overflow — Symptom: a tool returns tens of thousands of tokens. Fix: truncate observations to a sane limit before appending them to the scratchpad.
  • Stale reasoning — Symptom: the agent ignores earlier observations. Fix: ensure the full scratchpad is included in every iteration's prompt.
  • Premature termination — Symptom: the agent answers "I don't know" after one failed call. Fix: prompt it to try at least two different approaches before giving up.
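The step-budget hint for the infinite-loop case is a one-line addition to prompt assembly. A sketch (the wording of the hint is illustrative):

```python
def with_step_budget(prompt: str, iteration: int, max_iterations: int) -> str:
    """Append a remaining-step hint so the model knows it must converge."""
    remaining = max_iterations - iteration
    return (f"{prompt}\n(You have {remaining} steps remaining. "
            f"Conclude before they run out.)")


hinted = with_step_budget("Thought:", iteration=8, max_iterations=10)
```

Models respond well to an explicit countdown; agents that would otherwise search indefinitely tend to commit to a Final Answer once the budget gets low.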

The LLM as the Reasoning Engine

One question students often ask: "Why not just write the logic in Python instead of calling an LLM every iteration?"

The answer is that LLMs bring capabilities that are difficult or impossible to hand-code:

  • Natural language understanding — interpreting ambiguous, underspecified user goals
  • Common-sense reasoning — knowing that "yesterday" means you should check the current date before using it
  • Adaptive planning — changing approach when the first strategy fails
  • Output synthesis — turning a pile of raw tool results into a coherent, readable narrative

The ReAct pattern exploits these strengths while keeping the agent grounded — every claim the model makes in its final answer must follow from an actual tool observation, not from its parametric memory.

Important: A well-designed ReAct agent should be able to say "I don't know" when the tools don't return useful results. Prompt the model explicitly: "If your tools do not return useful information after two attempts, say so clearly rather than guessing."


Key Takeaways

  • ReAct separates reasoning from action by inserting an explicit Thought step before every tool call.
  • The scratchpad is the model's working memory — it accumulates all Thought/Action/Observation triples.
  • Use AgentExecutor for production agents; understand the manual loop to diagnose edge cases.
  • Always cap iterations with max_iterations and handle parsing errors gracefully.
  • The LLM's job is reasoning and synthesis; your Python code handles execution and state management.

Further Reading