Frameworks for Multi-Agent Systems: LangGraph vs CrewAI vs AutoGen
Overview
Choosing the right framework for a multi-agent system is an architectural decision with long-term consequences. LangGraph, CrewAI, and AutoGen each take a fundamentally different approach to the same problem — and each has scenarios where it genuinely excels. This lesson gives you a deep comparative understanding so you can make the right choice for your use case, and shows you the same workflow implemented in both LangGraph and CrewAI.
The Three Paradigms
| Framework | Core Paradigm | Mental Model | Best Analogy |
|---|---|---|---|
| LangGraph | Graph-based orchestration | State machine | Flowchart with conditional routing |
| CrewAI | Role-based crews | Team management | Department with job descriptions |
| AutoGen | Conversation-based | Chat room | Group chat where agents message each other |
LangGraph: Graph-Based Orchestration
LangGraph models your multi-agent system as a directed graph where:
- Nodes are processing units (agents, tools, human-in-the-loop steps)
- Edges are transitions, which can be conditional (routing logic)
- State flows through the graph, being modified at each node
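The mental model is worth seeing framework-free before any LangGraph specifics. The sketch below is illustrative plain Python, not the LangGraph API: nodes are state-transforming functions, static edges are name-to-name mappings, and a conditional edge is an ordinary routing function you wrote.

```python
# Plain-Python sketch of graph-based orchestration (illustrative, not the LangGraph API).
State = dict

def draft(state: State) -> State:
    # A node takes the current state and returns an updated copy.
    return {**state, "text": f"draft about {state['topic']}"}

def review(state: State) -> State:
    # Approve only after at least one revision pass.
    return {**state, "approved": state["revisions"] >= 1}

def revise(state: State) -> State:
    return {**state, "revisions": state["revisions"] + 1}

def route_after_review(state: State) -> str:
    # A conditional edge: the routing decision is just a function you wrote.
    return "END" if state["approved"] else "revise"

nodes = {"draft": draft, "review": review, "revise": revise}
edges = {"draft": "review", "review": route_after_review, "revise": "review"}

def run(state: State, entry: str = "draft") -> State:
    current = entry
    while current != "END":
        state = nodes[current](state)                    # state flows through the node
        nxt = edges[current]
        current = nxt(state) if callable(nxt) else nxt   # static or conditional edge
    return state

final = run({"topic": "pricing", "revisions": 0, "approved": False})
```

Running this traces draft → review → revise → review → END: the review node rejects the first draft, the conditional edge loops back once, and the second review passes.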
Architecture Philosophy
LangGraph gives you maximum control and observability. You explicitly define every possible state transition. Nothing happens "magically" — every routing decision is a function you wrote. This makes LangGraph the choice for:
- Complex conditional workflows with many branching paths
- Systems where state management is critical and must be auditable
- Production systems requiring fine-grained control over retries, timeouts, and error handling
- Workflows that need human-in-the-loop checkpoints
LangGraph Trade-offs
| Pro | Con |
|---|---|
| Explicit, fully auditable state transitions | More boilerplate than higher-level frameworks |
| Built-in state persistence and checkpointing | Graph definition can get complex at scale |
| First-class support for human-in-the-loop | Steeper learning curve |
| Streaming support for long-running workflows | Requires thinking in graph/state-machine terms |
| Fine-grained control over retry and error handling | Overkill for simple sequential tasks |
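The "fine-grained control over retry and error handling" row deserves a concrete picture. LangGraph lets you attach retry behavior to individual nodes; the decorator below is a framework-free sketch of that idea in plain Python, not LangGraph's actual retry API.

```python
# Plain-Python sketch of per-node retry control (illustrative, not the LangGraph API).
def with_retries(node, max_attempts=3):
    def wrapped(state):
        last_error = None
        for _ in range(max_attempts):
            try:
                return node(state)
            except RuntimeError as exc:  # retry only the errors you consider transient
                last_error = exc
        raise last_error
    return wrapped

attempts = {"count": 0}

def flaky_node(state):
    # Fails twice, then succeeds: simulates a transient API error.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("transient failure")
    return {**state, "done": True}

result_state = with_retries(flaky_node)({"done": False})
```

The point is that retry policy lives at the node boundary, so each node can get its own attempt count and its own definition of "retryable".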
CrewAI: Role-Based Crews
CrewAI abstracts orchestration behind a team metaphor. You define:
- Agents with roles, goals, and backstories
- Tasks with descriptions and expected outputs
- Crews that combine agents and tasks under a process (sequential, hierarchical, parallel)
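To make the abstraction concrete, here is a plain-Python sketch of what a sequential process does under the hood (illustrative only, not the CrewAI API): tasks execute in order, and each task sees the outputs of the tasks named in its context.

```python
# Plain-Python sketch of a sequential crew (illustrative, not the CrewAI API).
from dataclasses import dataclass, field

@dataclass
class Agent:
    role: str

@dataclass
class Task:
    description: str
    agent: Agent
    context: list = field(default_factory=list)  # tasks whose outputs this task sees
    output: str = ""

def run_sequential(tasks):
    # Tasks execute in declaration order; each receives its context tasks' outputs.
    for task in tasks:
        context_text = " | ".join(t.output for t in task.context)
        task.output = f"[{task.agent.role}] {task.description} (context: {context_text})"
    return tasks[-1].output

researcher = Agent("Researcher")
writer = Agent("Writer")
research = Task("research Acme", researcher)
memo = Task("write memo", writer, context=[research])
result_text = run_sequential([research, memo])
```

Notice that you never wrote routing logic: the ordering of the task list and the `context` links are the orchestration.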
Architecture Philosophy
CrewAI handles orchestration for you. The framework decides how tasks flow between agents based on the process setting and task context dependencies. This makes CrewAI ideal for:
- Rapid prototyping and proof-of-concept
- Standard research → analyze → write pipelines
- Teams who think in terms of job functions, not state machines
- Workflows without complex conditional branching
CrewAI Trade-offs
| Pro | Con |
|---|---|
| Intuitive role-based abstractions | Less control over exact orchestration flow |
| Very low boilerplate for standard workflows | Black-box routing in hierarchical mode |
| Rich built-in tooling ecosystem | Harder to implement complex conditional logic |
| Good defaults (retry, memory, caching) | State inspection requires custom callbacks |
| `Process.hierarchical` provides a built-in supervisor | Manager LLM adds cost at every step |
AutoGen: Conversation-Based Multi-Agent
AutoGen models multi-agent interaction as a group chat between `ConversableAgent` instances. Agents take turns responding to one another inside a `GroupChat`, with a `GroupChatManager` deciding which agent speaks next.
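The core loop can be sketched in plain Python (illustrative, not AutoGen's API): a manager keeps a shared transcript, picks the next speaker (round-robin here), and stops when a reply contains a termination keyword.

```python
# Plain-Python sketch of conversation-based orchestration (illustrative, not the AutoGen API).
def manager(agents, opening, max_turns=10):
    # The "chat manager" appends each reply to a shared transcript and picks
    # the next speaker round-robin until someone says TERMINATE.
    history = [opening]
    names = list(agents)
    for turn in range(max_turns):
        speaker = names[turn % len(names)]
        reply = agents[speaker](history)
        history.append(f"{speaker}: {reply}")
        if "TERMINATE" in reply:
            break
    return history

def coder(history):
    # Responds to bug reports; terminates once the last message mentions no bug.
    return "here is a patch" if "bug" in history[-1] else "TERMINATE"

def reviewer(history):
    return "patch looks good, issue resolved"

transcript = manager({"coder": coder, "reviewer": reviewer}, "there is a bug in parse()")
```

The solution path emerges from the dialogue rather than from explicit edges, which is exactly why both the flexibility and the unpredictability in the trade-off table below follow from this design.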
Architecture Philosophy
AutoGen is designed for open-ended, conversational collaboration — tasks where the solution path isn't known in advance and emerges through agent dialogue. Best for:
- Research assistants and brainstorming tools
- Code generation and debugging through dialogue
- Scenarios with unpredictable task structure
- When you want agents to naturally self-organize
AutoGen Trade-offs
| Pro | Con |
|---|---|
| Very natural for code generation/debugging | Less predictable execution paths |
| Minimal setup — agents just talk | Harder to enforce strict workflow structure |
| Great for open-ended problem solving | Token cost grows quadratically with conversation length (each turn re-sends the history) |
| Built-in code execution sandboxing | Less control over agent routing |
| Easy human proxy integration | Termination conditions can be tricky |
Side-by-Side: The Same Workflow in LangGraph and CrewAI
Task: Research a company, analyze it financially, then write an investment memo.
Implementation in LangGraph
```python
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage
from typing import TypedDict, Annotated, Literal
import operator

# --- State schema ---
class InvestmentState(TypedDict):
    company: str
    messages: Annotated[list, operator.add]  # appended to, never overwritten
    research_summary: str
    financial_analysis: str
    investment_memo: str
    quality_check_passed: bool
    current_step: str

# --- Agent nodes ---
research_llm = ChatOpenAI(model="gpt-4o-mini")
analysis_llm = ChatOpenAI(model="gpt-4o")
writing_llm = ChatOpenAI(model="gpt-4o")
qa_llm = ChatOpenAI(model="gpt-4o-mini")

def research_node(state: InvestmentState) -> InvestmentState:
    system = SystemMessage(content=(
        "You are a senior research analyst. Gather key facts about the company: "
        "business model, market position, recent news, leadership, and competitive landscape."
    ))
    response = research_llm.invoke([
        system,
        HumanMessage(content=f"Research company: {state['company']}")
    ])
    return {
        "research_summary": response.content,
        "messages": [response],
        "current_step": "analysis",
    }

def analysis_node(state: InvestmentState) -> InvestmentState:
    system = SystemMessage(content=(
        "You are a financial analyst. Based on the research provided, analyze: "
        "revenue trends, growth potential, risk factors, competitive moat, and valuation signals."
    ))
    response = analysis_llm.invoke([
        system,
        HumanMessage(content=f"Research to analyze:\n{state['research_summary']}")
    ])
    return {
        "financial_analysis": response.content,
        "messages": [response],
        "current_step": "writing",
    }

def writing_node(state: InvestmentState) -> InvestmentState:
    system = SystemMessage(content=(
        "You are a senior investment analyst. Write a concise, professional investment memo "
        "(400-600 words) with sections: Executive Summary, Business Overview, "
        "Investment Thesis, Key Risks, and Recommendation."
    ))
    response = writing_llm.invoke([
        system,
        HumanMessage(content=(
            f"Research: {state['research_summary']}\n\n"
            f"Analysis: {state['financial_analysis']}\n\n"
            f"Write the investment memo for: {state['company']}"
        ))
    ])
    return {
        "investment_memo": response.content,
        "messages": [response],
        "current_step": "qa",
    }

def qa_node(state: InvestmentState) -> InvestmentState:
    system = SystemMessage(content=(
        "You are a quality reviewer. Check this investment memo for: "
        "logical consistency, unsupported claims, missing sections, and factual alignment with research. "
        "Reply with 'APPROVED' or 'REVISION_NEEDED: <specific issues>'"
    ))
    response = qa_llm.invoke([
        system,
        HumanMessage(content=(
            f"Research: {state['research_summary']}\n\n"
            f"Memo to review:\n{state['investment_memo']}"
        ))
    ])
    passed = response.content.strip().startswith("APPROVED")
    return {
        "quality_check_passed": passed,
        "messages": [response],
    }

# --- Routing ---
def route_after_qa(state: InvestmentState) -> Literal["writing", "__end__"]:
    if state["quality_check_passed"]:
        return END
    # Loop back to rewrite; LangGraph's recursion_limit (default 25) caps this loop
    # so a memo that never passes QA fails loudly instead of looping forever.
    return "writing"

# --- Build graph ---
workflow = StateGraph(InvestmentState)
workflow.add_node("research", research_node)
workflow.add_node("analysis", analysis_node)
workflow.add_node("writing", writing_node)
workflow.add_node("qa", qa_node)

workflow.set_entry_point("research")
workflow.add_edge("research", "analysis")
workflow.add_edge("analysis", "writing")
workflow.add_edge("writing", "qa")
workflow.add_conditional_edges("qa", route_after_qa)

app = workflow.compile()

# Run
result = app.invoke({
    "company": "Anthropic",
    "messages": [],
    "research_summary": "",
    "financial_analysis": "",
    "investment_memo": "",
    "quality_check_passed": False,
    "current_step": "research",
})
print(result["investment_memo"])
```
Implementation in CrewAI
```python
from crewai import Agent, Task, Crew, Process
from langchain_openai import ChatOpenAI

# --- Agents ---
researcher = Agent(
    role="Senior Research Analyst",
    goal="Gather comprehensive information about companies for investment analysis",
    backstory=(
        "15 years in equity research at Goldman Sachs. You have an encyclopedic knowledge "
        "of how to quickly assess company fundamentals through public information."
    ),
    llm=ChatOpenAI(model="gpt-4o-mini"),
    verbose=True,
    allow_delegation=False,
)

financial_analyst = Agent(
    role="Financial Analyst",
    goal="Analyze research to derive investment insights and identify risks",
    backstory=(
        "CFA charterholder with a background in hedge funds. You see financial signals others miss "
        "and are known for your rigorous, quantitative approach to qualitative data."
    ),
    llm=ChatOpenAI(model="gpt-4o"),
    verbose=True,
    allow_delegation=False,
)

memo_writer = Agent(
    role="Investment Memo Writer",
    goal="Produce clear, compelling, and professionally structured investment memos",
    backstory=(
        "Former VP of Investor Relations who now specializes in translating complex analysis "
        "into executive-level documents. Your memos are known for clarity and precision."
    ),
    llm=ChatOpenAI(model="gpt-4o"),
    verbose=True,
    allow_delegation=False,
)

qa_reviewer = Agent(
    role="Senior Quality Reviewer",
    goal="Ensure all outputs meet professional investment standards",
    backstory=(
        "You spent years as a managing director signing off on research publications. "
        "Nothing passes your desk that isn't factually sound and logically consistent."
    ),
    llm=ChatOpenAI(model="gpt-4o-mini"),
    verbose=True,
    allow_delegation=False,
)

# --- Tasks ---
research_task = Task(
    description=(
        "Research {company}: business model, market position, recent developments (last 12 months), "
        "key competitors, and leadership team."
    ),
    expected_output=(
        "A structured research brief covering: Company Overview, Market Position, "
        "Recent Developments, Competitive Landscape, and Leadership Summary. "
        "Include a confidence rating (1-10) for each section."
    ),
    agent=researcher,
)

analysis_task = Task(
    description=(
        "Analyze the research brief for investment merit. Assess: growth potential, "
        "competitive moat, financial health signals, and key risk factors."
    ),
    expected_output=(
        "Financial analysis with: Investment Thesis (bull/bear case), Growth Potential score (1-10), "
        "Moat Strength score (1-10), Risk Matrix (likelihood × impact), "
        "and preliminary Recommendation (Buy/Hold/Avoid)."
    ),
    agent=financial_analyst,
    context=[research_task],
)

writing_task = Task(
    description=(
        "Write a professional investment memo for {company} based on the research and analysis. "
        "Structure: Executive Summary (100w), Business Overview (150w), Investment Thesis (200w), "
        "Key Risks (100w), Recommendation with conviction level."
    ),
    expected_output="A 500-600 word investment memo formatted in professional markdown.",
    agent=memo_writer,
    context=[research_task, analysis_task],
)

qa_task = Task(
    description=(
        "Review the investment memo against the research brief and analysis. "
        "Check: logical consistency, all claims are supported, all required sections present, "
        "recommendation aligns with analysis. Approve or provide specific revision notes."
    ),
    expected_output=(
        "Quality review report: APPROVED or REVISION_NEEDED with specific issues listed, "
        "each referencing the section and claim that needs correction."
    ),
    agent=qa_reviewer,
    context=[research_task, analysis_task, writing_task],
)

# --- Assemble crew ---
investment_crew = Crew(
    agents=[researcher, financial_analyst, memo_writer, qa_reviewer],
    tasks=[research_task, analysis_task, writing_task, qa_task],
    process=Process.sequential,
    verbose=True,
)

result = investment_crew.kickoff(inputs={"company": "Anthropic"})
print(result.raw)
```
Decision Framework: Which Framework to Choose?
```
Does your workflow have complex conditional branching?
├── YES → Does it also need explicit state persistence/checkpointing?
│   ├── YES → LangGraph (with checkpointer)
│   └── NO  → LangGraph (simpler setup)
└── NO → Is the workflow primarily conversational / open-ended?
    ├── YES → AutoGen
    └── NO → Is the workflow role-based with clear job functions?
        ├── YES → CrewAI
        └── NO  → LangGraph (simple sequential graph)
```
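The same tree can be encoded as an ordinary function (a hypothetical helper, shown only to make the branching explicit):

```python
# The decision tree above, encoded as a plain function with the same outcomes.
def choose_framework(complex_branching: bool, needs_checkpointing: bool,
                     conversational: bool, role_based: bool) -> str:
    if complex_branching:
        # Branching workflows point to LangGraph either way; checkpointing
        # only changes how you configure it.
        return "LangGraph (with checkpointer)" if needs_checkpointing else "LangGraph"
    if conversational:
        return "AutoGen"
    return "CrewAI" if role_based else "LangGraph (simple sequential graph)"

print(choose_framework(complex_branching=False, needs_checkpointing=False,
                       conversational=False, role_based=True))
```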
Note: Framework choice is not permanent. Many teams start with CrewAI for rapid prototyping, identify the bottlenecks and custom requirements, then migrate the production version to LangGraph for fine-grained control. The concepts transfer cleanly.
Key Takeaways
- LangGraph: Choose for production systems requiring explicit control, complex branching, and full observability. Higher boilerplate, but highest control.
- CrewAI: Choose for rapid development of role-based workflows. Excellent defaults, minimal boilerplate, but less control over exact orchestration.
- AutoGen: Choose for conversational, open-ended problem-solving tasks. Best for code-gen workflows and exploratory research.
- All three frameworks can express the same fundamental patterns; they differ in how much orchestration they abstract away versus expose.
- The same pipeline logic takes noticeably more code in LangGraph than in CrewAI, but those extra lines buy explicit control over every transition.