Hierarchical Multi-Agent Architectures
Overview
Hierarchical multi-agent systems organize agents into layers of authority and responsibility — much like a corporate org chart. A supervisor agent sits at the top, breaking down complex goals into sub-tasks and delegating them to worker agents that specialize in execution. This pattern is one of the most robust and widely used architectures in production AI systems.
Why Hierarchy?
Flat agent systems struggle with complex tasks because no single agent can hold enough context, tools, or expertise to handle everything well. Hierarchy solves this through separation of concerns:
- The supervisor reasons about what needs to be done and in what order
- Workers focus exclusively on how to execute a specific sub-task
- The supervisor aggregates results and decides on next steps
This mirrors how human organizations operate: managers delegate, specialists execute, and the chain of command keeps work coherent.
Core Concepts
Supervisor Agent
The supervisor is the orchestrator — it receives the high-level goal, decomposes it into actionable tasks, and routes each task to the appropriate worker. The supervisor:
- Maintains the overall goal in context
- Tracks task completion and intermediate results
- Handles failures by re-routing or retrying with different workers
- Produces the final consolidated response
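The supervisor's responsibilities above can be sketched as a minimal, framework-agnostic loop. All names here (`decompose`, `route`, the worker callables) are hypothetical placeholders, not any framework's API:

```python
def run_supervisor(goal, workers, decompose, route, retries=2):
    """Decompose a goal, delegate each sub-task to a worker,
    retry on failure, and aggregate results keyed by worker name."""
    results = {}
    for task in decompose(goal):
        name = route(task)                      # pick a worker for this sub-task
        for attempt in range(retries + 1):
            try:
                results[name] = workers[name](task)
                break                           # sub-task succeeded
            except Exception:
                if attempt == retries:
                    results[name] = None        # give up after exhausting retries
    return results
```

With toy workers, `run_supervisor("LLM trends", {"research": ..., "write": ...}, ...)` returns one aggregated dict that the supervisor can consolidate into the final response.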
Worker Agents
Workers are focused executors. Each worker is configured with:
- A specific system prompt defining its role and constraints
- A curated set of tools relevant to its domain
- No awareness of the broader goal — only its assigned sub-task
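One way to capture this worker configuration is a small dataclass. This is illustrative only; the field names and tool names are assumptions, not part of any framework:

```python
from dataclasses import dataclass, field

@dataclass
class WorkerConfig:
    role: str                                   # e.g. "researcher"
    system_prompt: str                          # role definition and constraints
    tools: list = field(default_factory=list)   # curated, domain-specific tools

researcher = WorkerConfig(
    role="researcher",
    system_prompt="You find accurate, current information. Cite sources.",
    tools=["web_search", "doc_retrieval"],      # placeholder tool names
)
```

Note what is deliberately absent: nothing in the config references the broader goal, which keeps each worker's context small and focused.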
Chain of Command
The chain of command defines the delegation path from goal to action:
User Goal
└── Supervisor Agent
├── Research Worker → web search, document retrieval
├── Analysis Worker → data processing, reasoning
├── Code Worker → code generation, execution
└── Writer Worker → final synthesis and formatting
Task Decomposition
Effective task decomposition is the most critical skill of a supervisor agent. Poor decomposition leads to:
- Over-delegation: splitting trivial tasks unnecessarily, adding latency
- Under-delegation: assigning too much to one worker, losing specialization benefits
- Ambiguous handoffs: workers receiving unclear instructions and producing irrelevant output
Decomposition Strategies
| Strategy | When to Use | Example |
|---|---|---|
| Sequential | Tasks have strict dependencies | Research → Analyze → Write |
| Parallel | Tasks are independent | Fetch data sources simultaneously |
| Conditional | Next task depends on current result | If code fails, route to debugger |
| Recursive | Sub-tasks can themselves be complex | Sub-supervisor for large code modules |
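The Parallel row can be sketched with a thread pool when sub-tasks are truly independent. The `worker` callable here is a stand-in for a real agent invocation:

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(sub_tasks, worker, max_workers=4):
    """Parallel strategy: dispatch independent sub-tasks concurrently
    and collect results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, sub_tasks))

fetched = fan_out(["source A", "source B"], lambda t: f"fetched {t}")
```

Sequential and Conditional strategies, by contrast, must await each result before choosing or starting the next step, so they cannot use this fan-out shape.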
Implementation with LangGraph
LangGraph models multi-agent systems as directed graphs where nodes are agents and edges are conditional routing logic.
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage
from typing import TypedDict, Literal, Annotated
import operator
# --- Shared state schema ---
class AgentState(TypedDict):
messages: Annotated[list, operator.add]
task: str
next_worker: str
results: dict
final_output: str
# --- Worker agent factory ---
def make_worker(role: str, tools: list, model: str = "gpt-4o-mini"):
    llm = ChatOpenAI(model=model)
    if tools:  # binding an empty tool list is rejected by some providers
        llm = llm.bind_tools(tools)
system_prompt = SystemMessage(content=f"You are a specialized {role}. "
f"Execute the assigned task precisely and return structured results.")
def worker_node(state: AgentState) -> AgentState:
response = llm.invoke([system_prompt, HumanMessage(content=state["task"])])
return {
"messages": [response],
"results": {**state.get("results", {}), role: response.content}
}
return worker_node
# --- Supervisor agent ---
supervisor_llm = ChatOpenAI(model="gpt-4o")
WORKERS = ["researcher", "analyst", "writer"]
def supervisor_node(state: AgentState) -> AgentState:
system = SystemMessage(content=f"""You are a supervisor orchestrating a team of agents.
Available workers: {WORKERS}.
Based on the current state and results, decide which worker to invoke next,
or respond with 'FINISH' if the task is complete.
Respond with ONLY the worker name or FINISH.""")
history = state.get("messages", [])
response = supervisor_llm.invoke([system, *history, HumanMessage(content=state["task"])])
next_step = response.content.strip().lower()
return {"next_worker": next_step if next_step in WORKERS else "FINISH"}
def route_from_supervisor(state: AgentState) -> Literal["researcher", "analyst", "writer", "__end__"]:
nxt = state.get("next_worker", "FINISH")
    if nxt == "FINISH":
return END
return nxt
# --- Build the graph ---
workflow = StateGraph(AgentState)
workflow.add_node("supervisor", supervisor_node)
workflow.add_node("researcher", make_worker("researcher", tools=[]))
workflow.add_node("analyst", make_worker("analyst", tools=[]))
workflow.add_node("writer", make_worker("writer", tools=[]))
workflow.set_entry_point("supervisor")
# Supervisor routes to workers or ends
workflow.add_conditional_edges("supervisor", route_from_supervisor)
# All workers report back to supervisor
for worker in WORKERS:
workflow.add_edge(worker, "supervisor")
app = workflow.compile()
# --- Run the system ---
result = app.invoke({
"task": "Research the impact of LLMs on software development, analyze key trends, and write a 200-word summary.",
"messages": [],
"results": {},
"next_worker": "",
"final_output": ""
})
print(result["results"])
Tip: Always define a maximum iteration limit in your supervisor to prevent infinite delegation loops. LangGraph supports a `recursion_limit` key in the `invoke()` config.
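Assuming the `app` compiled above, the limit is passed through the standard config dict; this is a sketch of the call shape, not a complete runnable script:

```python
# Caps total supervisor <-> worker hops; LangGraph raises
# GraphRecursionError once the step count exceeds the limit.
result = app.invoke(
    {"task": "Summarize recent LLM trends.", "messages": [],
     "results": {}, "next_worker": "", "final_output": ""},
    config={"recursion_limit": 10},
)
```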
Implementation with CrewAI
CrewAI takes a more declarative approach — you define agents and tasks, and the framework handles orchestration.
from crewai import Agent, Task, Crew, Process
# Define specialist agents
researcher = Agent(
role="Senior Research Analyst",
goal="Find accurate, up-to-date information on assigned topics",
backstory="You are an expert researcher with 10 years of experience in AI and technology domains.",
verbose=True,
allow_delegation=False,
)
analyst = Agent(
role="Data Analyst",
goal="Synthesize research findings into actionable insights",
backstory="You excel at identifying patterns and drawing conclusions from complex information.",
verbose=True,
allow_delegation=False,
)
writer = Agent(
role="Technical Writer",
goal="Produce clear, concise, and accurate written content",
backstory="You transform complex technical content into accessible, well-structured documents.",
verbose=True,
allow_delegation=False,
)
# Define tasks with explicit delegation
research_task = Task(
description="Research the current state of multi-agent AI systems in 2025. Focus on adoption rates and key use cases.",
expected_output="A structured research brief with 5 key findings and supporting evidence.",
agent=researcher,
)
analysis_task = Task(
description="Analyze the research brief and identify the top 3 trends with business implications.",
expected_output="A trend analysis document with ranked findings and business impact assessment.",
agent=analyst,
context=[research_task], # depends on research
)
writing_task = Task(
description="Write a 300-word executive summary based on the analysis.",
expected_output="A polished executive summary suitable for C-suite presentation.",
agent=writer,
context=[research_task, analysis_task],
)
# Assemble the crew with hierarchical process
crew = Crew(
agents=[researcher, analyst, writer],
tasks=[research_task, analysis_task, writing_task],
process=Process.hierarchical, # enables supervisor orchestration
manager_llm="gpt-4o", # the supervisor model
verbose=True,
)
result = crew.kickoff()
print(result.raw)
Supervisor Design Patterns
Pattern 1: Static Routing
The supervisor has fixed routing rules. Simple, predictable, low overhead.
ROUTING_TABLE = {
"research": "researcher",
"code": "coder",
"review": "reviewer",
"default": "generalist"
}
Pattern 2: Dynamic LLM Routing
The supervisor uses an LLM call to decide routing. More flexible, higher latency and cost.
Pattern 3: Hybrid
Static rules for known task types, LLM fallback for ambiguous cases. Recommended for production.
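A minimal sketch of the hybrid pattern, reusing the routing-table idea from Pattern 1. `llm_classify` is a stand-in for a real LLM classification call:

```python
ROUTING_TABLE = {
    "research": "researcher",
    "code": "coder",
    "review": "reviewer",
}

def route(task_type, llm_classify=None):
    """Hybrid routing: static table first, LLM classifier as fallback."""
    if task_type in ROUTING_TABLE:
        return ROUTING_TABLE[task_type]     # fast, deterministic, no LLM cost
    if llm_classify is not None:
        return llm_classify(task_type)      # extra LLM call, only for unknowns
    return "generalist"                     # last-resort default
```

Known task types never pay the LLM-call latency; only genuinely ambiguous inputs fall through to the classifier.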
Note: Dynamic routing adds one LLM call per delegation step. For latency-sensitive applications, pre-classify task types at the entry point and use static routing tables.
Common Pitfalls
| Pitfall | Symptom | Fix |
|---|---|---|
| Supervisor bottleneck | High latency on all requests | Enable parallel worker invocation |
| Context loss between steps | Workers repeat work or miss context | Pass structured state, not raw messages |
| Overly large supervisor prompt | Supervisor makes poor routing decisions | Simplify to routing-only, no execution |
| No failure handling | Single worker failure breaks entire pipeline | Add retry logic and fallback workers |
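The last row's fix, retry plus fallback, can be sketched as a small wrapper. The worker callables are placeholders for real agent invocations:

```python
def call_with_fallback(task, primary, fallback, retries=2):
    """Retry the primary worker; on repeated failure, route to a
    fallback worker so one failure doesn't break the pipeline."""
    for _ in range(retries):
        try:
            return primary(task)
        except Exception:
            continue                        # transient failure: retry
    return fallback(task)                   # primary exhausted: degrade gracefully
```

Wrapping each delegation call this way keeps failure handling out of the supervisor's routing logic.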
Key Takeaways
- Hierarchical architectures scale well because responsibility is cleanly separated between orchestration (supervisor) and execution (workers)
- The supervisor should focus only on decomposition and routing — never on executing domain tasks itself
- Use sequential delegation for dependent tasks, parallel for independent ones
- Both LangGraph and CrewAI support hierarchical patterns but with different trade-offs in flexibility vs. simplicity
- Always implement iteration limits, timeouts, and fallback paths in production supervisors