# Peer-to-Peer Agent Architectures

## Overview
Not every multi-agent problem needs a boss. Peer-to-peer (P2P) architectures place agents on equal footing — no single agent has authority over others. Instead, agents collaborate through direct communication, negotiation, and consensus mechanisms. This approach excels when no single agent should have privileged knowledge of the full solution, or when the problem benefits from diverse, independent perspectives challenging each other.
## When to Choose P2P Over Hierarchy
| Scenario | Best Architecture |
|---|---|
| Task has clear sequential steps with dependencies | Hierarchical |
| Multiple valid answers exist and quality improves through debate | Peer-to-peer |
| A central coordinator would become a bottleneck | Peer-to-peer |
| Strong auditability of reasoning required | Peer-to-peer (debate logs) |
| Simple delegation of well-defined sub-tasks | Hierarchical |
| Fact-checking, adversarial evaluation, bias detection | Peer-to-peer |
## Core Collaboration Patterns

### 1. Round-Robin Discussion
Agents take turns contributing to a shared conversation. Each agent reads the full conversation history before responding, building on or challenging prior contributions.
```
Round 1: Agent A proposes a solution
Round 2: Agent B critiques and refines
Round 3: Agent C synthesizes a consensus
Round 4: Agent A validates the synthesis
→ Terminate when consensus is reached or max rounds exceeded
```
### 2. Agent Debate
Two or more agents take opposing positions and argue toward a conclusion. This pattern is particularly effective for:
- Fact verification: One agent asserts a claim, another actively looks for counterexamples
- Decision analysis: One agent argues for a decision, another argues against
- Red team/blue team: One agent designs a system, another attacks it
### 3. Voting / Jury
Multiple agents independently produce answers, then vote. The majority answer wins. Useful when individual agents may be unreliable but the ensemble is robust.
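As a minimal sketch of this pattern, majority voting over independently produced answers can be implemented as follows. The `majority_vote` helper and its normalization step are illustrative, not part of any specific library:

```python
from collections import Counter

def majority_vote(answers: list[str]) -> tuple[str, float]:
    """Return the most common answer and its share of the vote.

    Answers are normalized (case and whitespace) so trivially
    different phrasings of the same answer count together.
    """
    normalized = [a.strip().lower() for a in answers]
    counts = Counter(normalized)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(answers)

# Three of five independent agents agree, so the ensemble answer wins
answer, share = majority_vote(["Paris", "paris ", "Lyon", "Paris", "Marseille"])
# answer == "paris", share == 0.6
```

In a real system each string would come from a separate agent run with an independent prompt (and ideally varied temperature), so that errors are uncorrelated and the ensemble is more reliable than any single agent.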
### 4. Blackboard Collaboration
Agents asynchronously read from and write to a shared workspace. No direct agent-to-agent communication — all coordination happens through the shared state. (Covered in depth in the Shared State module.)
## Consensus Mechanisms
Consensus is the mechanism by which a group of agents converges on a single agreed-upon output.
### Explicit Consensus
One agent explicitly proposes a conclusion and others vote to accept or reject:
```python
def vote(agents: list, proposal: str) -> bool:
    votes = [agent.evaluate(proposal) for agent in agents]
    approval_rate = sum(v == "approve" for v in votes) / len(votes)
    return approval_rate >= 0.6  # 60% approval threshold
```
### Emergent Consensus
Agents iterate through discussion until their outputs converge — measured by semantic similarity of their responses.
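A hedged sketch of such a convergence check: production systems would typically measure semantic similarity with embedding cosine similarity, but a token-overlap (Jaccard) score stands in here to keep the example self-contained.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Token-overlap similarity; a cheap stand-in for embedding cosine similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def has_converged(responses: list[str], threshold: float = 0.8) -> bool:
    """Declare emergent consensus when every pair of the latest agent
    responses scores at least `threshold` similarity."""
    pairs = [(a, b) for i, a in enumerate(responses) for b in responses[i + 1:]]
    return all(jaccard_similarity(a, b) >= threshold for a, b in pairs)
```

The debate loop would call `has_converged` on the most recent response from each agent after every round and terminate once it returns true.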
### Adversarial Consensus
One agent is designated the "devil's advocate" and must argue against any proposal, even if it agrees. Consensus is only reached when the devil's advocate can no longer find valid objections.
> **Tip:** Adversarial consensus can substantially reduce groupthink in LLM-based agent systems. When all agents use the same base model, they tend to agree too quickly unless one is explicitly forced to challenge.
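The devil's-advocate loop can be sketched as follows. The `adversarial_consensus` driver and the stub agents are illustrative placeholders; in a real system both roles would be backed by LLM calls:

```python
from typing import Callable, Optional

def adversarial_consensus(
    proposal: str,
    object_to: Callable[[str], Optional[str]],
    revise: Callable[[str, str], str],
    max_rounds: int = 5,
) -> tuple[str, bool]:
    """Devil's-advocate loop: keep revising the proposal until the advocate
    runs out of objections (consensus) or the round budget is spent."""
    for _ in range(max_rounds):
        objection = object_to(proposal)
        if objection is None:
            return proposal, True  # advocate has no remaining objections
        proposal = revise(proposal, objection)
    return proposal, False  # no consensus within the budget

# Stub agents for illustration only
def stub_advocate(p: str) -> Optional[str]:
    return None if "fallback" in p else "no failure handling"

def stub_revise(p: str, objection: str) -> str:
    return p + " with a fallback path"

final, reached = adversarial_consensus("cache all results", stub_advocate, stub_revise)
```

Note that the `max_rounds` budget is essential: without it, an advocate that always finds some objection would loop forever (see the pitfalls section below).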
## Implementing Agent Debate
The following example implements a two-agent debate where agents argue toward a fact-checked conclusion. A neutral judge evaluates when consensus is reached.
```python
from dataclasses import dataclass, field
from typing import Optional
import json

from langchain_core.messages import HumanMessage, SystemMessage, AIMessage
from langchain_openai import ChatOpenAI


@dataclass
class DebateMessage:
    agent_id: str
    round: int
    stance: str  # "affirm" | "challenge" | "synthesize"
    content: str
    confidence: float  # 0.0 - 1.0


@dataclass
class DebateSession:
    topic: str
    messages: list[DebateMessage] = field(default_factory=list)
    consensus: Optional[str] = None
    rounds_completed: int = 0


class DebateAgent:
    def __init__(self, agent_id: str, role: str, stance_prompt: str, model: str = "gpt-4o-mini"):
        self.agent_id = agent_id
        self.role = role
        self.stance_prompt = stance_prompt
        self.llm = ChatOpenAI(model=model, temperature=0.3)

    def respond(self, session: DebateSession) -> DebateMessage:
        # Build conversation history for context
        history = []
        for msg in session.messages:
            role_label = "assistant" if msg.agent_id == self.agent_id else "user"
            history.append({
                "role": role_label,
                "content": f"[{msg.agent_id} - Round {msg.round}]: {msg.content}",
            })

        system = SystemMessage(content=f"""You are {self.role} in a structured debate.
Topic: {session.topic}
Your stance: {self.stance_prompt}

Respond with a JSON object:
{{
  "stance": "affirm" | "challenge" | "synthesize",
  "content": "your argument (2-3 sentences)",
  "confidence": 0.0-1.0,
  "key_claim": "single most important point you're making"
}}

Be direct, specific, and evidence-based. Do not repeat previous arguments.""")

        messages = [system] + [
            HumanMessage(content=m["content"]) if m["role"] == "user"
            else AIMessage(content=m["content"])
            for m in history
        ]
        response = self.llm.invoke(messages)

        try:
            data = json.loads(response.content)
        except json.JSONDecodeError:
            # Fall back gracefully if the model returns non-JSON text
            data = {"stance": "synthesize", "content": response.content, "confidence": 0.5}

        return DebateMessage(
            agent_id=self.agent_id,
            round=session.rounds_completed,
            stance=data.get("stance", "synthesize"),
            content=data.get("content", ""),
            confidence=float(data.get("confidence", 0.5)),
        )


class DebateJudge:
    """Neutral agent that determines when consensus has been reached."""

    def __init__(self, model: str = "gpt-4o"):
        self.llm = ChatOpenAI(model=model, temperature=0.0)

    def evaluate(self, session: DebateSession) -> tuple[bool, Optional[str]]:
        if len(session.messages) < 4:
            return False, None  # not enough debate to judge

        recent = session.messages[-4:]
        history_text = "\n".join(f"[{m.agent_id}]: {m.content}" for m in recent)

        system = SystemMessage(content="""You are an impartial debate judge.
Review the recent exchanges and determine:
1. Has consensus been reached or are positions still diverging?
2. If consensus: what is the agreed-upon conclusion?

Respond as JSON:
{"consensus_reached": true/false, "conclusion": "..." or null, "reason": "..."}""")

        response = self.llm.invoke([system, HumanMessage(content=history_text)])
        try:
            result = json.loads(response.content)
            return result.get("consensus_reached", False), result.get("conclusion")
        except json.JSONDecodeError:
            return False, None


def run_debate(topic: str, max_rounds: int = 6) -> DebateSession:
    # Create peer agents with opposing stances
    affirmer = DebateAgent(
        agent_id="Agent-Affirm",
        role="Affirmative Debater",
        stance_prompt="Argue in FAVOR of the topic with evidence. Be open to updating your position.",
    )
    challenger = DebateAgent(
        agent_id="Agent-Challenge",
        role="Critical Challenger",
        stance_prompt="Critically examine claims. Point out weaknesses, edge cases, and counterexamples.",
    )
    judge = DebateJudge()
    session = DebateSession(topic=topic)

    print(f"\n{'=' * 60}")
    print(f"DEBATE: {topic}")
    print(f"{'=' * 60}\n")

    for round_num in range(max_rounds):
        session.rounds_completed = round_num

        # Agents take turns — true peer-to-peer, no moderator between rounds
        for agent in [affirmer, challenger]:
            msg = agent.respond(session)
            session.messages.append(msg)
            print(f"[Round {round_num + 1}] {msg.agent_id} ({msg.stance}, confidence={msg.confidence:.1f}):")
            print(f"  {msg.content}\n")

        # Judge checks for consensus after each full round
        reached, conclusion = judge.evaluate(session)
        if reached:
            session.consensus = conclusion
            print(f"\n✓ CONSENSUS REACHED after {round_num + 1} rounds:")
            print(f"  {conclusion}")
            break
    else:
        print(f"\n⚠ No consensus after {max_rounds} rounds. Review debate log.")

    return session


# Example usage
if __name__ == "__main__":
    session = run_debate(
        topic="Chain-of-thought prompting always improves LLM reasoning accuracy",
        max_rounds=6,
    )
```
## Round-Robin Implementation
For collaborative (non-adversarial) tasks, round-robin is simpler and avoids the overhead of a judge:
```python
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage
from langchain_openai import ChatOpenAI


def round_robin_brainstorm(topic: str, agents_config: list[dict], rounds: int = 3) -> list[str]:
    """
    Multiple agents collaboratively brainstorm, each building on prior contributions.
    No supervisor — pure peer collaboration.
    """
    agents = [
        (cfg["name"], cfg["persona"], ChatOpenAI(model="gpt-4o-mini", temperature=0.7))
        for cfg in agents_config
    ]
    conversation: list = []
    outputs: list[str] = []

    for round_idx in range(rounds):
        for agent_name, persona, llm in agents:
            system = SystemMessage(content=f"""You are {agent_name}: {persona}
You are collaborating with peers on: {topic}
Read all prior contributions carefully. Add a distinct, non-redundant perspective.
Be concise (2-3 sentences). Explicitly build on or respectfully challenge prior ideas.""")

            messages = [system] + conversation + [
                HumanMessage(content=f"[Round {round_idx + 1}] Add your contribution to the discussion.")
            ]
            response = llm.invoke(messages)

            contribution = f"[{agent_name}, Round {round_idx + 1}]: {response.content}"
            conversation.append(AIMessage(content=contribution))
            outputs.append(contribution)
            print(contribution + "\n")

    return outputs


# Usage
agents_config = [
    {"name": "Optimist", "persona": "You focus on opportunities and positive outcomes."},
    {"name": "Realist", "persona": "You focus on practical constraints and implementation challenges."},
    {"name": "Innovator", "persona": "You propose novel, unconventional approaches."},
]

round_robin_brainstorm(
    topic="How should AI systems handle ethical dilemmas in autonomous decision-making?",
    agents_config=agents_config,
    rounds=2,
)
```
## Avoiding Common P2P Pitfalls

### Groupthink
When agents share the same base model, they may converge on superficially similar answers too quickly. Mitigation:
- Use different temperatures per agent
- Explicitly assign opposing roles
- Use the adversarial consensus pattern
### Infinite Debate Loops
Agents may keep challenging each other indefinitely. Always implement:
- Maximum round limits
- A judge/terminator component
- Confidence threshold convergence check
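The last of these can be sketched as a simple helper. This is illustrative, assuming each contribution reports a confidence score like the `confidence` field on `DebateMessage` above:

```python
def confidence_converged(confidences: list[float], window: int = 4,
                         threshold: float = 0.85) -> bool:
    """Terminate the debate once the last `window` contributions all
    report confidence at or above `threshold`."""
    recent = confidences[-window:]
    return len(recent) == window and all(c >= threshold for c in recent)
```

Requiring a full window of high-confidence turns (rather than a single one) avoids terminating on a momentary spike before the challenger has actually conceded.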
### Communication Overhead
Every agent reads the full conversation history. With n agents, each round therefore incurs O(n²) token consumption: each of the n agents re-reads the n contributions from every prior turn.
> **Note:** For debates exceeding 10 rounds, consider summarizing older rounds into a "debate summary" that replaces raw history. This keeps context windows manageable without losing key arguments.
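That compaction step can be sketched as follows; the `summarize` argument is a placeholder, and in practice it would be an LLM summarization call:

```python
def compact_history(messages: list[str], keep_recent: int = 4,
                    summarize=lambda old: f"[summary of {len(old)} earlier turns]") -> list[str]:
    """Replace all but the most recent turns with a single summary entry,
    bounding context growth while preserving the latest exchanges verbatim."""
    if len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(old)] + recent

history = [f"turn {i}" for i in range(10)]
compacted = compact_history(history)  # 1 summary entry + the 4 most recent turns
```

Keeping the most recent turns verbatim matters because the judge (and the agents themselves) reason primarily over the latest exchanges.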
## Key Takeaways
- P2P architectures shine when problems benefit from diverse, independent reasoning — particularly for verification, quality assurance, and adversarial robustness
- The agent debate pattern is a strong defense against LLM hallucination, because every claim must survive active peer challenge
- Always include a termination condition — consensus threshold, judge evaluation, or round limit
- Round-robin collaboration works well for creative or brainstorming tasks where each agent contributes a distinct angle
- P2P systems consume more tokens per output than hierarchical systems — budget accordingly