Designing Your Agent's Architecture
Before writing a single line of agent code, you need a clear picture of how your agent is structured. Architecture decisions made early — component boundaries, data flow, the shape of the reasoning loop — are expensive to reverse later. This lesson walks you through the core components every agent shares, the trade-offs between monolithic and modular designs, and how to record the decisions you make so your future self (and your teammates) can understand why the system looks the way it does.
The Three Pillars of Every Agent
Regardless of framework or LLM provider, every AI agent can be decomposed into three functional layers:
| Layer | Responsibility | Examples |
|---|---|---|
| Perception | Ingests and normalises inputs from the environment | User messages, tool outputs, file contents, API responses |
| Reasoning | Decides what to do next given the current state | LLM calls, chain-of-thought, planning, reflection |
| Action | Executes decisions and returns observations | HTTP requests, code execution, database writes, UI interactions |
These three layers form a closed loop — the output of the action layer becomes new input for the perception layer on the next tick. Understanding this cycle is the foundation of every architecture decision you will make.
The Agent Loop
┌──────────────────────────────────────────────────────────────┐
│ AGENT LOOP │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌─────────────┐ │
│ │ PERCEPTION │─────▶│ REASONING │───▶│ ACTION │ │
│ │ │ │ │ │ │ │
│ │ • Normalise │ │ • LLM call │ │ • Tool exec │ │
│ │ • Validate │ │ • Plan │ │ • Side fx │ │
│ │ • Embed │ │ • Reflect │ │ • Emit obs │ │
│ └──────▲──────┘ └──────────────┘ └──────┬──────┘ │
│ │ │ │
│ └──────────── Observation ────────────────┘ │
│ │
│ ┌──────────────┐ │
│ │ STATE │ │
│ │ • History │ │
│ │ • Memory │ │
│ │ • Phase │ │
│ └──────────────┘ │
└──────────────────────────────────────────────────────────────┘
Each iteration of the loop produces an observation — the result of the action — which feeds back into the next perception step. State sits beneath all three layers and accumulates across iterations. This closed feedback loop is what allows agents to course-correct mid-task.
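The cycle above can be sketched in a few lines of Python. This is a minimal illustration, not a production pattern — `perceive`, `reason`, and `act` are hypothetical stand-ins for the three layers:

```python
# Minimal sketch of the agent loop. perceive/reason/act are hypothetical
# callables standing in for the Perception, Reasoning, and Action layers.
def run_loop(task, perceive, reason, act, max_ticks=10):
    observation = task
    state = {"history": []}          # State sits beneath all three layers
    for _ in range(max_ticks):
        percept = perceive(observation)      # Perception: normalise input
        decision = reason(percept, state)    # Reasoning: decide next step
        if decision.get("final"):            # Reasoning may end the loop
            return decision["answer"]
        observation = act(decision)          # Action: execute, emit observation
        state["history"].append(observation) # State accumulates across ticks
    return None                              # Gave up: max ticks exceeded
```

Note that the action's output becomes the next tick's perception input — that single wire is the feedback loop.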
Component Responsibilities
Input Handler: Validates, normalises, and sanitises user input before it reaches the agent core. Should validate type and length, guard against prompt injection, and extract structured metadata such as user ID and session ID.
Orchestrator (Core): The brain of the system. Manages the reasoning loop, decides when to use tools, when to return an answer, and when to escalate. Keep it thin — it coordinates, it does not implement business logic.
Memory Layer: Abstracts all memory operations behind a clean interface. The orchestrator calls memory.retrieve() and memory.store() — whether that hits Redis, Postgres, or a vector database is the memory layer's concern.
LLM Layer: Abstracts the LLM provider. The orchestrator calls llm.complete() — whether that hits GPT-4o, Claude 3.5, or a local model is the LLM layer's concern.
Tools Layer: Contains all tool implementations. Each tool is an isolated, independently testable unit.
Response Layer: Formats the agent's final answer for the consumer — API response, Slack message, structured JSON, etc.
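As one illustration of these responsibilities, here is a possible shape for the Input Handler. All names here are hypothetical, and the regex deny-list is only a placeholder — real prompt-injection defence needs far more than pattern matching:

```python
import re
from dataclasses import dataclass, field

@dataclass
class NormalisedInput:
    """Structured output of the input handler (hypothetical type)."""
    user_id: str
    session_id: str
    message: str
    metadata: dict = field(default_factory=dict)

class InputHandler:
    """Validates, normalises, and sanitises raw input before the agent core."""
    MAX_LENGTH = 8_000
    # Naive deny-list as a placeholder; not a real injection defence.
    INJECTION_PATTERNS = [
        re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    ]

    def handle(self, raw: dict) -> NormalisedInput:
        message = str(raw.get("message", "")).strip()
        if not message:
            raise ValueError("Empty message")
        if len(message) > self.MAX_LENGTH:
            raise ValueError("Message too long")
        for pattern in self.INJECTION_PATTERNS:
            if pattern.search(message):
                raise ValueError("Suspected prompt injection")
        # Extract structured metadata; everything non-core is passed through.
        return NormalisedInput(
            user_id=str(raw.get("user_id", "anonymous")),
            session_id=str(raw.get("session_id", "default")),
            message=message,
            metadata={k: v for k, v in raw.items()
                      if k not in ("user_id", "session_id", "message")},
        )
```

The key property is that everything downstream of the handler can assume a clean, typed `NormalisedInput` rather than a raw dict.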
Monolithic vs. Modular Agent Designs
Monolithic Agent
A monolithic agent lives in a single class or module. All logic — perception, reasoning, action selection, tool dispatch — is co-located.
```python
class MonolithicAgent:
    """Single class that owns perception, reasoning, and action."""

    def __init__(self, llm, tools):
        self.llm = llm
        self.tools = tools
        self.history = []

    def run(self, task: str) -> str:
        self.history.append({"role": "user", "content": task})
        while True:
            response = self.llm.chat(self.history)
            if response.is_final:
                return response.content
            result = self._execute_tool(response.tool_call)
            self.history.append({"role": "tool", "content": result})

    def _execute_tool(self, tool_call):
        tool = self.tools[tool_call.name]
        return tool(**tool_call.args)
```
When to use it: Proof-of-concept or research prototype. Single-task agents with a fixed, small tool set. When you need to ship something in a day or two.
Drawbacks: Becomes a "god class" quickly. Hard to test components in isolation. Changing the LLM provider forces edits across the entire file.
Modular Agent
A modular agent distributes responsibilities across separate, independently testable components connected through well-defined interfaces.
```python
# agent/architecture.py
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentRequest:
    """Normalised input to the agent."""
    user_id: str
    session_id: str
    message: str
    metadata: dict = field(default_factory=dict)


@dataclass
class AgentResponse:
    """Structured output from the agent."""
    content: str
    tool_calls_made: list[str]
    iterations: int
    success: bool
    error: Optional[str] = None


class MemoryInterface(ABC):
    """Abstract interface for all memory operations."""

    @abstractmethod
    def retrieve(self, query: str, user_id: str, k: int = 5) -> list[str]:
        """Retrieve relevant memories for a query."""

    @abstractmethod
    def store(self, content: str, user_id: str,
              metadata: Optional[dict] = None) -> str:
        """Store a new memory. Returns memory ID."""

    @abstractmethod
    def get_conversation_history(self, session_id: str) -> list[dict]:
        """Get conversation history for a session."""


class LLMInterface(ABC):
    """Abstract interface for LLM providers."""

    @abstractmethod
    def complete(self, messages: list[dict],
                 tools: Optional[list] = None) -> dict:
        """Complete a chat conversation. Returns response dict."""

    @abstractmethod
    def count_tokens(self, text: str) -> int:
        """Count tokens in text for cost estimation."""


class ToolInterface(ABC):
    """Abstract interface for agent tools."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Tool name as seen by the agent."""

    @property
    @abstractmethod
    def description(self) -> str:
        """Tool description for the LLM."""

    @abstractmethod
    def execute(self, **kwargs) -> str:
        """Execute the tool. Always returns a string."""
```
When to use it: Production systems. Agents that will use more than five tools. Multi-developer teams. Any time you expect to swap LLM providers, memory backends, or tool sets.
Tip: Start monolithic for proof-of-concept work (under two days), then refactor to modular once you understand where the real component boundaries are. Premature abstraction is as harmful as no abstraction at all.
The Concrete Orchestrator
```python
# agent/orchestrator.py
import logging

from agent.architecture import (
    AgentRequest,
    AgentResponse,
    LLMInterface,
    MemoryInterface,
    ToolInterface,
)

logger = logging.getLogger(__name__)


class AgentOrchestrator:
    """
    Core agent orchestrator. Manages the reasoning loop using injected
    dependencies for memory, LLM, and tools.

    All dependencies are injected via the constructor, making this class
    easily testable with mock implementations.
    """

    def __init__(
        self,
        llm: LLMInterface,
        memory: MemoryInterface,
        tools: list[ToolInterface],
        max_iterations: int = 10,
        system_prompt: str = "",
    ):
        self.llm = llm
        self.memory = memory
        self.tools = {t.name: t for t in tools}
        self.max_iterations = max_iterations
        self.system_prompt = system_prompt

    def handle(self, request: AgentRequest) -> AgentResponse:
        """Process a request through the full agent pipeline."""
        tool_calls_made = []

        # Retrieve relevant memories and conversation history
        memories = self.memory.retrieve(request.message, request.user_id)
        history = self.memory.get_conversation_history(request.session_id)
        messages = self._build_messages(request, memories, history)

        # Run the reasoning loop
        for iteration in range(self.max_iterations):
            response = self.llm.complete(messages, tools=list(self.tools.values()))

            if response.get("finish_reason") == "stop":
                self.memory.store(
                    f"User asked: {request.message}\n"
                    f"Agent answered: {response['content']}",
                    user_id=request.user_id,
                )
                return AgentResponse(
                    content=response["content"],
                    tool_calls_made=tool_calls_made,
                    iterations=iteration + 1,
                    success=True,
                )

            if response.get("tool_calls"):
                for tool_call in response["tool_calls"]:
                    tool_name = tool_call["name"]
                    tool_calls_made.append(tool_name)
                    if tool_name in self.tools:
                        result = self.tools[tool_name].execute(**tool_call["args"])
                    else:
                        result = f"ERROR: Tool '{tool_name}' not found"
                    messages.append({
                        "role": "tool",
                        "name": tool_name,
                        "content": result,
                    })

        return AgentResponse(
            content="I was unable to complete this task within the allowed steps.",
            tool_calls_made=tool_calls_made,
            iterations=self.max_iterations,
            success=False,
            error="Max iterations exceeded",
        )

    def _build_messages(
        self,
        request: AgentRequest,
        memories: list[str],
        history: list[dict],
    ) -> list[dict]:
        messages = [{"role": "system", "content": self.system_prompt}]
        if memories:
            memory_context = "\n".join(f"- {m}" for m in memories)
            messages.append({
                "role": "system",
                "content": f"Relevant context:\n{memory_context}",
            })
        messages.extend(history)
        messages.append({"role": "user", "content": request.message})
        return messages
```
Dependency Injection for Testability
One of the biggest quality-of-life improvements you can make early on is ensuring your orchestrator accepts all its dependencies from the outside. This lets you swap in mocks during tests without touching any production code.
```python
# agent/factory.py
from agent.orchestrator import AgentOrchestrator


def create_production_agent() -> AgentOrchestrator:
    """Create agent with real production dependencies."""
    from agent.llm.openai_provider import OpenAIProvider
    from agent.memory.redis_memory import RedisMemory
    from agent.tools.web_search import WebSearchTool
    from agent.config import settings

    return AgentOrchestrator(
        llm=OpenAIProvider(model=settings.default_model),
        memory=RedisMemory(url=settings.redis_url),
        tools=[WebSearchTool(api_key=settings.search_api_key)],
        max_iterations=settings.max_iterations,
    )


def create_test_agent(mock_llm=None, mock_memory=None) -> AgentOrchestrator:
    """Create agent with mock dependencies for testing."""
    from unittest.mock import MagicMock

    return AgentOrchestrator(
        llm=mock_llm or MagicMock(),
        memory=mock_memory or MagicMock(),
        tools=[],
        max_iterations=5,
    )
```
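In practice, bare `MagicMock()` dependencies are rarely enough on their own — the orchestrator expects `retrieve()` to return a list and `complete()` to return a response dict, so a test usually configures return values first. A possible pair of helpers (names hypothetical):

```python
from unittest.mock import MagicMock

def make_mock_llm(answer: str) -> MagicMock:
    """Mock LLM that immediately returns a final answer."""
    llm = MagicMock()
    llm.complete.return_value = {"finish_reason": "stop", "content": answer}
    return llm

def make_mock_memory() -> MagicMock:
    """Mock memory with no stored memories or history."""
    memory = MagicMock()
    memory.retrieve.return_value = []              # no long-term memories
    memory.get_conversation_history.return_value = []
    return memory
```

A test then builds the agent with `create_test_agent(mock_llm=make_mock_llm("42"), mock_memory=make_mock_memory())` and asserts on the returned `AgentResponse` — no network, no database, no API keys.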
Architecture Decision Records for Agents
An Architecture Decision Record (ADR) is a short document that captures why you made a key design choice. For AI agents, these are especially valuable because "it seemed like a good idea at the time" is not a sufficient audit trail when a teammate needs to debug a production incident six months later.
# ADR-001: Use Modular Architecture Over Monolithic
## Status
Accepted
## Context
Our agent needs to support at least 3 different LLM backends and 12 tools.
We anticipate swapping the planner module as new models are released.
Multiple engineers will work on different tool groups simultaneously.
## Decision
We will use a modular architecture with separate Perception, Reasoning,
and Action layers connected via well-defined dataclass interfaces.
The Orchestrator imports from each layer but layers do not import from
each other.
## Consequences
+ Each layer can be unit-tested independently with mocked dependencies.
+ New LLM backends require only a new LLMInterface implementation.
+ Tool groups can be developed in parallel without merge conflicts.
- Initial setup requires defining interface contracts upfront.
- New contributors must understand the data-flow contracts between layers.
## Alternatives Considered
- Monolithic: Rejected — too brittle for the multi-backend requirement.
- LangGraph: Deferred to v2 — overhead not justified for MVP scope.
Note: Store ADRs in a docs/decisions/ directory alongside your code. When a future teammate asks "why is it structured this way?", the answer is one file-read away.
Architecture Selection Guide
Does your agent need multiple LLM backends?
YES → Modular (abstracted LLMInterface)
NO ↓
Will you have more than 10 tools?
YES → Modular (tool registry with dynamic dispatch)
NO ↓
Is this a 1-day prototype or throwaway experiment?
YES → Monolithic (move fast, refactor later)
NO ↓
Do multiple engineers contribute to this agent?
YES → Modular
NO → Either works; choose based on team familiarity
Key Takeaways
- Every agent has three core layers: Perception → Reasoning → Action, connected by an observation feedback loop.
- Monolithic designs are faster to prototype; modular designs are easier to scale, test, and maintain.
- Define your request/response types as dataclasses from day one — it forces clarity about what information flows through your system.
- Write an ADR for every significant architecture decision. Future maintainers will thank you.
- Dependency injection is the single best thing you can do for testability — never instantiate LLM clients or database connections inside your core orchestrator.
Next Steps
In the next lesson, we implement the core reasoning loop — specifically the ReAct (Reason + Act) pattern — and show how an LLM drives the decision-making cycle within the architecture we have just designed.