Project Setup and Architecture


Designing Your Agent's Architecture

Before writing a single line of agent code, you need a clear picture of how your agent is structured. Architecture decisions made early — component boundaries, data flow, the shape of the reasoning loop — are expensive to reverse later. This lesson walks you through the core components every agent shares, the trade-offs between monolithic and modular designs, and how to record the decisions you make so your future self (and your teammates) can understand why the system looks the way it does.


The Three Pillars of Every Agent

Regardless of framework or LLM provider, every AI agent can be decomposed into three functional layers:

Layer      | Responsibility                                     | Examples
---------- | -------------------------------------------------- | ----------------------------------------------------------------
Perception | Ingests and normalises inputs from the environment | User messages, tool outputs, file contents, API responses
Reasoning  | Decides what to do next given the current state    | LLM calls, chain-of-thought, planning, reflection
Action     | Executes decisions and returns observations        | HTTP requests, code execution, database writes, UI interactions

These three layers form a closed loop — the output of the action layer becomes new input for the perception layer on the next tick. Understanding this cycle is the foundation of every architecture decision you will make.


The Agent Loop

┌──────────────────────────────────────────────────────────────┐
│                        AGENT LOOP                            │
│                                                              │
│   ┌─────────────┐      ┌──────────────┐    ┌─────────────┐  │
│   │  PERCEPTION │─────▶│  REASONING   │───▶│   ACTION    │  │
│   │             │      │              │    │             │  │
│   │ • Normalise │      │ • LLM call   │    │ • Tool exec │  │
│   │ • Validate  │      │ • Plan       │    │ • Side fx   │  │
│   │ • Embed     │      │ • Reflect    │    │ • Emit obs  │  │
│   └──────▲──────┘      └──────────────┘    └──────┬──────┘  │
│          │                                         │         │
│          └──────────── Observation ────────────────┘         │
│                                                              │
│                    ┌──────────────┐                          │
│                    │    STATE     │                          │
│                    │ • History    │                          │
│                    │ • Memory     │                          │
│                    │ • Phase      │                          │
│                    └──────────────┘                          │
└──────────────────────────────────────────────────────────────┘

Each iteration of the loop produces an observation — the result of the action — which feeds back into the next perception step. State sits beneath all three layers and accumulates across iterations. This closed feedback loop is what allows agents to course-correct mid-task.
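As a concrete sketch, the loop above fits in a few lines of Python. The `perceive`, `reason`, and `act` callables are illustrative stand-ins for the three layers, not part of any specific framework:

```python
def agent_loop(task, perceive, reason, act, max_steps=10):
    """Minimal perception -> reasoning -> action cycle with shared state."""
    state = {"history": [], "observation": task}
    for _ in range(max_steps):
        percept = perceive(state["observation"])      # normalise the latest input
        decision = reason(percept, state["history"])  # decide the next step
        if decision.get("final"):                     # reasoning chose to answer
            return decision["answer"]
        observation = act(decision)                   # execute, produce observation
        state["history"].append((decision, observation))
        state["observation"] = observation            # feeds the next perception
    return None  # step budget exhausted without a final answer
```

Note that state persists across iterations while each layer stays stateless — exactly the separation the diagram above describes.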


Component Responsibilities

Input Handler: Validates, normalises, and sanitises user input before it reaches the agent core. Should validate type and length, guard against prompt injection, and extract structured metadata such as user ID and session ID.

Orchestrator (Core): The brain of the system. Manages the reasoning loop, decides when to use tools, when to return an answer, and when to escalate. Keep it thin — it coordinates, it does not implement business logic.

Memory Layer: Abstracts all memory operations behind a clean interface. The orchestrator calls memory.retrieve() and memory.store() — whether that hits Redis, Postgres, or a vector database is the memory layer's concern.

LLM Layer: Abstracts the LLM provider. The orchestrator calls llm.complete() — whether that hits GPT-4o, Claude 3.5, or a local model is the LLM layer's concern.

Tools Layer: Contains all tool implementations. Each tool is an isolated, independently testable unit.

Response Layer: Formats the agent's final answer for the consumer — API response, Slack message, structured JSON, etc.
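To make the Input Handler's job concrete, here is a minimal sketch. The length limit and injection patterns are illustrative placeholders — a production handler would use a maintained filter, not a two-entry regex list:

```python
import re
from dataclasses import dataclass

MAX_MESSAGE_LENGTH = 4000  # illustrative limit, tune per deployment

# Naive patterns that often signal prompt-injection attempts
# (illustrative, nowhere near exhaustive)
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"you are now", re.IGNORECASE),
]


@dataclass
class ValidatedInput:
    user_id: str
    session_id: str
    message: str


def handle_input(raw: dict) -> ValidatedInput:
    """Validate type and length, flag likely injection, extract metadata."""
    message = raw.get("message")
    if not isinstance(message, str):
        raise ValueError("message must be a string")
    message = message.strip()
    if not 0 < len(message) <= MAX_MESSAGE_LENGTH:
        raise ValueError("message is empty or too long")
    if any(p.search(message) for p in SUSPICIOUS_PATTERNS):
        raise ValueError("message rejected by injection heuristics")
    return ValidatedInput(
        user_id=str(raw.get("user_id", "anonymous")),
        session_id=str(raw.get("session_id", "default")),
        message=message,
    )
```

The point is the boundary: by the time a request reaches the orchestrator, it is a typed object, not a raw dict.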


Monolithic vs. Modular Agent Designs

Monolithic Agent

A monolithic agent lives in a single class or module. All logic — perception, reasoning, action selection, tool dispatch — is co-located.

class MonolithicAgent:
    """Single class that owns perception, reasoning, and action."""

    def __init__(self, llm, tools):
        self.llm = llm
        self.tools = tools
        self.history = []

    def run(self, task: str) -> str:
        self.history.append({"role": "user", "content": task})
        while True:
            response = self.llm.chat(self.history)
            # Record the assistant turn so the model sees its own
            # tool request on the next iteration.
            self.history.append({"role": "assistant", "content": response.content})
            if response.is_final:
                return response.content
            result = self._execute_tool(response.tool_call)
            self.history.append({"role": "tool", "content": result})

    def _execute_tool(self, tool_call):
        tool = self.tools[tool_call.name]
        return tool(**tool_call.args)

When to use it: Proof-of-concept or research prototype. Single-task agents with a fixed, small tool set. When you need to ship something in a day or two.

Drawbacks: Becomes a "god class" quickly. Hard to test components in isolation. Changing the LLM provider forces edits across the entire file.

Modular Agent

A modular agent distributes responsibilities across separate, independently testable components connected through well-defined interfaces.

# agent/architecture.py
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentRequest:
    """Normalised input to the agent."""
    user_id: str
    session_id: str
    message: str
    metadata: dict = field(default_factory=dict)


@dataclass
class AgentResponse:
    """Structured output from the agent."""
    content: str
    tool_calls_made: list[str]
    iterations: int
    success: bool
    error: Optional[str] = None


class MemoryInterface(ABC):
    """Abstract interface for all memory operations."""

    @abstractmethod
    def retrieve(self, query: str, user_id: str, k: int = 5) -> list[str]:
        """Retrieve relevant memories for a query."""

    @abstractmethod
    def store(self, content: str, user_id: str, metadata: Optional[dict] = None) -> str:
        """Store a new memory. Returns memory ID."""

    @abstractmethod
    def get_conversation_history(self, session_id: str) -> list[dict]:
        """Get conversation history for a session."""


class LLMInterface(ABC):
    """Abstract interface for LLM providers."""

    @abstractmethod
    def complete(self, messages: list[dict], tools: Optional[list] = None) -> dict:
        """Complete a chat conversation. Returns response dict."""

    @abstractmethod
    def count_tokens(self, text: str) -> int:
        """Count tokens in text for cost estimation."""


class ToolInterface(ABC):
    """Abstract interface for agent tools."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Tool name as seen by the agent."""

    @property
    @abstractmethod
    def description(self) -> str:
        """Tool description for the LLM."""

    @abstractmethod
    def execute(self, **kwargs) -> str:
        """Execute the tool. Always returns a string."""

When to use it: Production systems. Agents that will use more than ten tools. Multi-developer teams. Any time you expect to swap LLM providers, memory backends, or tool sets.

Tip: Start monolithic for proof-of-concept work (under two days), then refactor to modular once you understand where the real component boundaries are. Premature abstraction is as harmful as no abstraction at all.
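To show what implementing one of these contracts looks like, here is a hypothetical calculator tool. ToolInterface is redeclared so the snippet runs standalone — in the project it would be imported from agent/architecture.py — and a real calculator would use a proper expression parser rather than a character-whitelisted eval():

```python
from abc import ABC, abstractmethod


class ToolInterface(ABC):
    # Redeclared for a self-contained snippet; import the real one
    # from agent/architecture.py in project code.
    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def description(self) -> str: ...

    @abstractmethod
    def execute(self, **kwargs) -> str: ...


class CalculatorTool(ToolInterface):
    """Evaluates simple arithmetic expressions. Illustrative only."""

    @property
    def name(self) -> str:
        return "calculator"

    @property
    def description(self) -> str:
        return "Evaluate an arithmetic expression, e.g. '2 + 3 * 4'."

    def execute(self, expression: str = "") -> str:
        # Whitelist characters before eval(); good enough for a sketch,
        # not for production input.
        allowed = set("0123456789+-*/(). ")
        if not expression or not set(expression) <= allowed:
            return "ERROR: expression contains unsupported characters"
        try:
            return str(eval(expression))
        except (SyntaxError, ZeroDivisionError) as exc:
            return f"ERROR: {exc}"
```

Because execute() always returns a string — including on error — the orchestrator never has to special-case tool failures; they flow back to the LLM as ordinary observations.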


The Concrete Orchestrator

# agent/orchestrator.py
import logging

logger = logging.getLogger(__name__)


class AgentOrchestrator:
    """
    Core agent orchestrator. Manages the reasoning loop using injected
    dependencies for memory, LLM, and tools.

    All dependencies are injected via constructor, making this class
    easily testable with mock implementations.
    """

    def __init__(
        self,
        llm: LLMInterface,
        memory: MemoryInterface,
        tools: list[ToolInterface],
        max_iterations: int = 10,
        system_prompt: str = "",
    ):
        self.llm = llm
        self.memory = memory
        self.tools = {t.name: t for t in tools}
        self.max_iterations = max_iterations
        self.system_prompt = system_prompt

    def handle(self, request: AgentRequest) -> AgentResponse:
        """Process a request through the full agent pipeline."""
        tool_calls_made = []

        # Retrieve relevant memories and conversation history
        memories = self.memory.retrieve(request.message, request.user_id)
        history = self.memory.get_conversation_history(request.session_id)
        messages = self._build_messages(request, memories, history)

        # Run the reasoning loop
        for iteration in range(self.max_iterations):
            response = self.llm.complete(messages, tools=list(self.tools.values()))

            if response.get("finish_reason") == "stop":
                self.memory.store(
                    f"User asked: {request.message}\nAgent answered: {response['content']}",
                    user_id=request.user_id,
                )
                return AgentResponse(
                    content=response["content"],
                    tool_calls_made=tool_calls_made,
                    iterations=iteration + 1,
                    success=True,
                )

            if response.get("tool_calls"):
                # Record the assistant turn first so the model sees which
                # tools it requested on the next iteration.
                messages.append({
                    "role": "assistant",
                    "content": response.get("content", ""),
                    "tool_calls": response["tool_calls"],
                })
                for tool_call in response["tool_calls"]:
                    tool_name = tool_call["name"]
                    tool_calls_made.append(tool_name)
                    if tool_name in self.tools:
                        result = self.tools[tool_name].execute(**tool_call["args"])
                    else:
                        result = f"ERROR: Tool '{tool_name}' not found"
                    messages.append({
                        "role": "tool",
                        "name": tool_name,
                        "content": result,
                    })

        return AgentResponse(
            content="I was unable to complete this task within the allowed steps.",
            tool_calls_made=tool_calls_made,
            iterations=self.max_iterations,
            success=False,
            error="Max iterations exceeded",
        )

    def _build_messages(
        self,
        request: AgentRequest,
        memories: list[str],
        history: list[dict],
    ) -> list[dict]:
        messages = [{"role": "system", "content": self.system_prompt}]
        if memories:
            memory_context = "\n".join(f"- {m}" for m in memories)
            messages.append({
                "role": "system",
                "content": f"Relevant context:\n{memory_context}",
            })
        messages.extend(history)
        messages.append({"role": "user", "content": request.message})
        return messages

Dependency Injection for Testability

One of the biggest quality-of-life improvements you can make early on is ensuring your orchestrator accepts all its dependencies from the outside. This lets you swap in mocks during tests without touching any production code.

# agent/factory.py
from agent.orchestrator import AgentOrchestrator

def create_production_agent() -> AgentOrchestrator:
    """Create agent with real production dependencies."""
    from agent.llm.openai_provider import OpenAIProvider
    from agent.memory.redis_memory import RedisMemory
    from agent.tools.web_search import WebSearchTool
    from agent.config import settings

    return AgentOrchestrator(
        llm=OpenAIProvider(model=settings.default_model),
        memory=RedisMemory(url=settings.redis_url),
        tools=[WebSearchTool(api_key=settings.search_api_key)],
        max_iterations=settings.max_iterations,
    )


def create_test_agent(mock_llm=None, mock_memory=None) -> AgentOrchestrator:
    """Create agent with mock dependencies for testing."""
    from unittest.mock import MagicMock
    return AgentOrchestrator(
        llm=mock_llm or MagicMock(),
        memory=mock_memory or MagicMock(),
        tools=[],
        max_iterations=5,
    )

Architecture Decision Records for Agents

An Architecture Decision Record (ADR) is a short document that captures why you made a key design choice. For AI agents, these are especially valuable because "it seemed like a good idea at the time" is not a sufficient audit trail when a teammate needs to debug a production incident six months later.

# ADR-001: Use Modular Architecture Over Monolithic

## Status
Accepted

## Context
Our agent needs to support at least 3 different LLM backends and 12 tools.
We anticipate swapping the planner module as new models are released.
Multiple engineers will work on different tool groups simultaneously.

## Decision
We will use a modular architecture with separate Perception, Reasoning,
and Action layers connected via well-defined dataclass interfaces.
The Orchestrator imports from each layer but layers do not import from
each other.

## Consequences
+ Each layer can be unit-tested independently with mocked dependencies.
+ New LLM backends require only a new LLMInterface implementation.
+ Tool groups can be developed in parallel without merge conflicts.
- Initial setup requires defining interface contracts upfront.
- New contributors must understand the data-flow contracts between layers.

## Alternatives Considered
- Monolithic: Rejected — too brittle for the multi-backend requirement.
- LangGraph: Deferred to v2 — overhead not justified for MVP scope.

Note: Store ADRs in a docs/decisions/ directory alongside your code. When a future teammate asks "why is it structured this way?", the answer is one file-read away.


Architecture Selection Guide

Does your agent need multiple LLM backends?
    YES → Modular (abstracted LLMInterface)
    NO  ↓

Will you have more than 10 tools?
    YES → Modular (tool registry with dynamic dispatch)
    NO  ↓

Is this a 1-day prototype or throwaway experiment?
    YES → Monolithic (move fast, refactor later)
    NO  ↓

Do multiple engineers contribute to this agent?
    YES → Modular
    NO  → Either works; choose based on team familiarity

Key Takeaways

  • Every agent has three core layers: Perception → Reasoning → Action, connected by an observation feedback loop.
  • Monolithic designs are faster to prototype; modular designs are easier to scale, test, and maintain.
  • Define your request/response types as dataclasses from day one — it forces clarity about what information flows through your system.
  • Write an ADR for every significant architecture decision. Future maintainers will thank you.
  • Dependency injection is the single best thing you can do for testability — never instantiate LLM clients or database connections inside your core orchestrator.

Next Steps

In the next lesson, we implement the core reasoning loop — specifically the ReAct (Reason + Act) pattern — and show how an LLM drives the decision-making cycle within the architecture we have just designed.