When One Agent Isn't Enough
The previous eight articles built single-agent systems: one LLM, one set of tools, one conversation history. That architecture handles most problems well.
But some tasks are inherently multi-expert:
- Writing a technical article needs a researcher to gather facts, a writer to draft, and an editor to polish — three roles, three ways of thinking
- Handling a support ticket needs intent classification, knowledge base lookup, and reply generation — three stages, independently testable
- Code review needs static analysis, security scanning, and readability review — three dimensions, none interfering with the others
A single agent can handle these, but you'll find the System Prompt growing uncontrollably and output quality becoming erratic — because you're forcing one role to play everyone.
The core value of multi-agent: separation of concerns. Each agent does one thing well.
Two Common Architecture Patterns
Multi-agent systems have several topologies. Two dominate in practice:
Supervisor Pattern (dynamic routing):
classify → supervisor → researcher
                      ↘ writer
                      ↘ reviewer
                      ↘ FINISH
Key: one "control center" decides which agent to call next
Pipeline Pattern (fixed sequence):
outline_agent → draft_agent → polish_agent → END
Key: execution path is hardwired; each agent sees only its own context
These patterns aren't competitors — they fit different scenarios.
Demo 1: Supervisor Pattern
Design Approach
The core challenge of the Supervisor pattern is routing reliability. If the LLM decides "who to call next" at every step, it tends to:
- Call the same worker multiple times
- Forget which workers have already been called
- Fail to recognize when to terminate
A better design is a two-phase hybrid:
Phase 1: LLM classifies the task once (simple_fact vs full_article)
Phase 2: Python routes deterministically based on classification + called list
LLM handles "understanding what kind of task this is." Python handles "executing the right sequence." Each does what it's good at.
LangGraph Implementation
from typing import Annotated, TypedDict

from langgraph.graph.message import add_messages


class SupervisorState(TypedDict):
    messages: Annotated[list, add_messages]
    task: str
    task_type: str  # "simple_fact" or "full_article"
    called: list[str]  # workers that have already run
    next: str  # routing decision written by supervisor_node


def classify_node(state: SupervisorState) -> SupervisorState:
    """LLM classifies once; result persists in state for all subsequent routing."""
    # _ask(system_prompt, user_message) is the LLM helper; it is not shown in this excerpt.
    decision = _ask(
        "Classify this task:\n"
        "  simple_fact — a factual question with a direct short answer\n"
        "  full_article — needs research, writing, and editorial review\n"
        "Output one word only: simple_fact / full_article",
        f"Task: {state['task']}",
    ).strip().lower()
    # Anything that is not clearly "full_article" falls back to the short path.
    task_type = "full_article" if "full_article" in decision else "simple_fact"
    return {**state, "task_type": task_type}


def supervisor_node(state: SupervisorState) -> SupervisorState:
    """Pure Python routing — no LLM call, no risk of infinite loops."""
    called = state["called"]
    task_type = state["task_type"]
    if "researcher" not in called:
        next_worker = "researcher"
    elif task_type == "simple_fact":
        next_worker = "FINISH"  # simple questions stop after research
    elif "writer" not in called:
        next_worker = "writer"
    elif "reviewer" not in called:
        next_worker = "reviewer"
    else:
        next_worker = "FINISH"
    return {**state, "next": next_worker}
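The routing above only terminates if every worker records itself in called before handing control back. The following is a minimal sketch of that bookkeeping, run as a plain Python loop with no LangGraph dependency. The names next_worker, make_worker, and run are hypothetical, and the workers are stubs that do no LLM work.

```python
from typing import Callable

State = dict  # stand-in for SupervisorState; a plain dict keeps the sketch dependency-free


def next_worker(called: list[str], task_type: str) -> str:
    """Same decision table as supervisor_node, written as a pure function."""
    if "researcher" not in called:
        return "researcher"
    if task_type == "simple_fact":
        return "FINISH"
    for name in ("writer", "reviewer"):
        if name not in called:
            return name
    return "FINISH"


def make_worker(name: str) -> Callable[[State], State]:
    """Build a stub worker; a real one would call the LLM before returning."""
    def worker(state: State) -> State:
        # Build a new list instead of mutating the caller's state in place.
        return {**state, "called": state["called"] + [name]}
    return worker


WORKERS = {n: make_worker(n) for n in ("researcher", "writer", "reviewer")}


def run(task_type: str) -> list[str]:
    """Drive supervisor -> worker -> supervisor until the router says FINISH."""
    state: State = {"task_type": task_type, "called": []}
    while (step := next_worker(state["called"], state["task_type"])) != "FINISH":
        state = WORKERS[step](state)
    return state["called"]


print(run("full_article"))  # ['researcher', 'writer', 'reviewer']
print(run("simple_fact"))   # ['researcher']
```

Because the router is a pure function of called and task_type, it can be unit-tested without a model or a graph runtime.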
Graph Topology
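from langgraph.graph import END, StateGraph

g = StateGraph(SupervisorState)
# Node registration via g.add_node(...) is not shown in this excerpt.
g.set_entry_point("classify")
g.add_edge("classify", "supervisor")
g.add_conditional_edges(
    "supervisor",
    route_supervisor,  # returns one of the mapping keys below
    {"researcher": "researcher", "writer": "writer",
     "reviewer": "reviewer", "FINISH": END},
)
# Every worker hands control back to the supervisor.
g.add_edge("researcher", "supervisor")
g.add_edge("writer", "supervisor")
g.add_edge("reviewer", "supervisor")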
classify → supervisor → [workers] → supervisor → ... → FINISH forms a controlled loop. The classify node runs exactly once; the supervisor node acts as a lightweight state machine.
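The excerpt references route_supervisor without showing it. One plausible form, an assumption rather than the author's code, simply returns the label that supervisor_node stored in state["next"]; the mapping passed to add_conditional_edges then translates that label into a node name or END. The sketch below uses a local END constant as a stand-in for langgraph.graph.END so it runs without LangGraph installed, and resolve is a hypothetical helper that imitates the mapping lookup.

```python
END = "__end__"  # local stand-in for langgraph.graph.END (assumption for this sketch)

# Same label-to-node mapping that the article passes to add_conditional_edges.
ROUTES = {
    "researcher": "researcher",
    "writer": "writer",
    "reviewer": "reviewer",
    "FINISH": END,
}


def route_supervisor(state: dict) -> str:
    """Hypothetical router: return the label supervisor_node wrote to state['next']."""
    label = state["next"]
    if label not in ROUTES:
        # Fail loudly rather than let an unknown label reach the graph runtime.
        raise ValueError(f"unknown route label: {label!r}")
    return label


def resolve(state: dict) -> str:
    """Translate the router's label into the target node, as the mapping does."""
    return ROUTES[route_supervisor(state)]


print(resolve({"next": "writer"}))
print(resolve({"next": "FINISH"}))
```

Keeping the router this thin means all decision logic stays in supervisor_node, where it is easy to test.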
Measured Results
Task: "Write a short article about Python list comprehensions"
[classify] task_type = full_article
[supervisor] → researcher
[researcher] working...
[supervisor] → writer
[writer] working...
[supervisor] → reviewer
[reviewer] working...
[supervisor] → FINISH
Workers called: ['researcher', 'writer', 'reviewer']
Task type : full_article
One LLM call for classification, then pure Python routing; the workers still make their own LLM calls, but none of them decides what runs next. The execution chain is clean and predictable: researcher → writer → reviewer → FINISH.
Demo 2: Pipeline Pattern
Pipeline requires significantly less code than Supervisor — there's no routing logic to write:
from typing import TypedDict


class PipelineState(TypedDict):
    topic: str
    outline: str
    draft: str
    polished: str
    stage_log: list[str]


def outline_agent(state: PipelineState) -> PipelineState:
    # Reads topic, writes outline.
    outline = _ask("Create a 5-point outline...", state["topic"])
    return {**state, "outline": outline, "stage_log": [...]}


def draft_agent(state: PipelineState) -> PipelineState:
    # Reads outline, writes draft.
    draft = _ask("Expand the outline into a 200-word draft...", state["outline"])
    return {**state, "draft": draft, "stage_log": [...]}


def polish_agent(state: PipelineState) -> PipelineState:
    # Reads draft, writes polished.
    polished = _ask("Polish the draft...", state["draft"])
    return {**state, "polished": polished, "stage_log": [...]}


# Topology: outline → draft → polish → END
g.add_edge("outline_agent", "draft_agent")
g.add_edge("draft_agent", "polish_agent")
g.add_edge("polish_agent", END)
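The three stages can be exercised without a model by stubbing the LLM helper. The sketch below is one way to do that under stated assumptions: fake_ask is a hypothetical stand-in for _ask, make_stage and run_pipeline are illustrative helpers, and appending the stage name to stage_log is an assumed logging scheme, since the article elides that part.

```python
from typing import Callable, TypedDict


class PipelineState(TypedDict):
    topic: str
    outline: str
    draft: str
    polished: str
    stage_log: list[str]


def fake_ask(system: str, user: str) -> str:
    """Hypothetical stand-in for _ask; wraps the input so data flow is visible."""
    return f"{system.split()[0].upper()}({user})"


def make_stage(name: str, prompt: str, src: str, dst: str) -> Callable[[PipelineState], PipelineState]:
    """Build a stage that reads one field, writes one field, and logs its name."""
    def stage(state: PipelineState) -> PipelineState:
        output = fake_ask(prompt, state[src])
        # Assumed logging scheme: append the stage name to a new list.
        return {**state, dst: output, "stage_log": state["stage_log"] + [name]}
    return stage


STAGES = [
    make_stage("outline_agent", "Outline the topic", "topic", "outline"),
    make_stage("draft_agent", "Draft from the outline", "outline", "draft"),
    make_stage("polish_agent", "Polish the draft", "draft", "polished"),
]


def run_pipeline(topic: str) -> PipelineState:
    """Run the stages in their fixed order; there is no routing decision anywhere."""
    state: PipelineState = {
        "topic": topic, "outline": "", "draft": "", "polished": "", "stage_log": [],
    }
    for stage in STAGES:
        state = stage(state)
    return state


result = run_pipeline("list comprehensions")
print(result["stage_log"])  # ['outline_agent', 'draft_agent', 'polish_agent']
print(result["polished"])   # POLISH(DRAFT(OUTLINE(list comprehensions)))
```

The nested output makes the hand-off explicit: each stage consumes exactly the field the previous stage produced.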
Measured Results
[outline_agent] 957 chars
[draft_agent] 1846 chars
[polish_agent] 2168 chars
Final output (first 300 chars):
"### Unveiling the Power of List Comprehensions in Python
Python's lists are dynamic and powerful data containers..."
Each stage's output becomes the next stage's input. Content grows progressively: outline 957 chars → draft 1846 → polished 2168.
No LLM routing decisions at all — the path was determined when the graph was written.
Demo 3: Same Graph, Different Paths
The most compelling advantage of the Supervisor pattern: different task types take different execution paths through the same graph — no code changes needed.
Running the same Supervisor graph on a simple factual question:
Task: "What year was Python created?"
[classify] task_type = simple_fact
[supervisor] → researcher
[researcher] working...
[supervisor] → FINISH
Workers called : ['researcher']
Task type : simple_fact
Result : researcher → FINISH (writer + reviewer skipped)
Side-by-side comparison:
Task                             Workers Called                Steps
──────────────────────────────────────────────────────────────────────────
Write article (full_article)     researcher→writer→reviewer    3
Factual question (simple_fact)   researcher                    1
Same Supervisor graph. Different paths based on LLM classification.
Pipeline can't do this — its path is hardcoded.
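Since the path through the Supervisor graph depends entirely on the classification label, the parsing step is worth testing in isolation. The sketch below mirrors the parsing rule from classify_node in a hypothetical helper, parse_task_type, and runs it against a few invented sample replies.

```python
def parse_task_type(raw: str) -> str:
    """Mirror classify_node: anything not clearly full_article becomes simple_fact."""
    decision = raw.strip().lower()
    return "full_article" if "full_article" in decision else "simple_fact"


# Invented sample replies mapped to the label the parser should produce.
samples = {
    "full_article": "full_article",
    "  Full_Article\n": "full_article",
    "simple_fact": "simple_fact",
    "I'm not sure": "simple_fact",  # unparseable output falls back to the short path
    "": "simple_fact",
}

for raw, expected in samples.items():
    assert parse_task_type(raw) == expected, (raw, expected)

print("all classification samples parsed as expected")
```

Note the design consequence: a garbled reply silently takes the one-step path. Whether that default is the right one depends on the application; some systems may prefer to fall back to the full path or to raise.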
Pattern Selection Matrix
Dimension         Pipeline               Supervisor
─────────────────────────────────────────────────────────────────────
Execution path    Fixed, hardwired       Dynamic, classification-driven
Best for          ETL, doc processing    Research, open Q&A, mixed tasks
Debuggability     High (linear trace)    Medium (path varies per task)
LLM calls/turn    N (one per stage)      N + 1 (one classify call extra)
Flexibility       Low                    High
Predictability    High                   Lower
Implementation    Trivial                Medium
Rule of thumb:
- Know exactly what steps you need → Pipeline
- Need to adapt the steps per task → Supervisor
Multi-Agent Design Checklist
Pattern Selection
- [ ] Fixed, known steps → Pipeline; dynamic decision needed → Supervisor
- [ ] With ≤ 3 workers and clear responsibilities, either pattern works
- [ ] With > 5 workers, consider hierarchical Supervisor (Supervisor of Supervisors)
State Design
- [ ] Each worker reads only the fields it needs, writes only the fields it produces
- [ ] Supervisor state must include called: list[str] to prevent duplicate invocations
- [ ] Pipeline state uses stage-named fields (outline, draft, polished) for easy debugging
Routing Reliability
- [ ] Avoid pure LLM routing (LLMs cannot reliably track call history)
- [ ] Recommended: LLM for one-time classification + Python for deterministic routing
- [ ] Set recursion_limit (20–30 is a good range) to guard against accidental loops
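A step budget is the last line of defense when routing logic has a bug. In LangGraph, to my understanding, the limit is supplied through the recursion_limit key of the config passed at invocation. The sketch below shows the same idea in plain Python so it runs without the library; run_with_limit, StepLimitExceeded, and both routers are hypothetical names.

```python
from typing import Callable


class StepLimitExceeded(RuntimeError):
    """Raised when the loop runs more routing steps than the budget allows."""


def run_with_limit(route: Callable[[list[str]], str], limit: int = 25) -> list[str]:
    """Run a supervisor-style loop, aborting after `limit` routing steps."""
    called: list[str] = []
    for _ in range(limit):
        step = route(called)
        if step == "FINISH":
            return called
        called.append(step)
    raise StepLimitExceeded(f"no FINISH after {limit} steps; last calls: {called[-3:]}")


def good_route(called: list[str]) -> str:
    """Consults the called list, so it terminates after one worker."""
    return "researcher" if "researcher" not in called else "FINISH"


def buggy_route(called: list[str]) -> str:
    """Ignores the called list, so it would loop forever without a budget."""
    return "researcher"


print(run_with_limit(good_route))
try:
    run_with_limit(buggy_route, limit=5)
except StepLimitExceeded as exc:
    print("aborted:", exc)
```

The budget does not fix the bug; it turns a silent infinite loop into a visible error that names the workers involved.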
Worker Design
- [ ] Each worker does exactly one thing, with clear input/output contract
- [ ] Workers communicate via State, never by calling each other directly
- [ ] Write clear Worker System Prompts — don't make workers guess the context
Observability
- [ ] Log each node execution (worker name, input/output summary)
- [ ] Record the called list to trace routing decisions post-hoc
- [ ] Add warning logs on unusual branches (e.g., premature FINISH)
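The observability items above can be covered with a small decorator around each node. This is a sketch using only the standard logging module; logged and supervisor_stub are hypothetical names, and treating FINISH with an empty called list as the "premature" case is an assumption made for illustration.

```python
import functools
import logging
from typing import Callable

logger = logging.getLogger("agents")


def logged(node: Callable[[dict], dict]) -> Callable[[dict], dict]:
    """Wrap a node so each run logs its name, the called list, and changed keys."""
    @functools.wraps(node)
    def wrapper(state: dict) -> dict:
        result = node(state)
        changed = sorted(k for k in result if result[k] != state.get(k))
        logger.info("node=%s called=%s changed=%s",
                    node.__name__, result.get("called"), changed)
        if result.get("next") == "FINISH" and not result.get("called"):
            # Unusual branch: the run is ending before any worker has executed.
            logger.warning("node=%s finished with no workers called", node.__name__)
        return result
    return wrapper


@logged
def supervisor_stub(state: dict) -> dict:
    """Hypothetical supervisor that finishes immediately, to trigger the warning."""
    return {**state, "next": "FINISH"}


logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
out = supervisor_stub({"called": [], "next": ""})
```

Logging only the changed keys keeps each line short while still showing which fields a node actually wrote.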
Summary
Five core takeaways:
- Multi-agent is about separation of concerns: not for complexity's sake, but because a single Agent's System Prompt starts breaking down when it has to play too many roles
- Pipeline wins on simplicity and predictability: the execution path lives in code, traces are linear, testing and debugging cost is minimized
- Supervisor wins on adaptability: the same graph handles a one-step factual question and a three-step full article — no code change required
- LLM classification + Python execution is the best pairing: LLM does what it's good at (understanding task type), Python does what it's good at (reliable sequencing)
- The called list is the critical field in Supervisor State: the foundation of routing determinism — without it, Supervisor is prone to duplicate calls and infinite loops
References
- LangGraph Multi-Agent Concepts
- LangGraph Supervisor Tutorial
- Full demo code for this series: agent-08-multi-agent