Beginner

Multi-Agent Architecture Patterns

Multi-agent systems are how production AI workflows handle tasks too complex for a single prompt. This lesson covers the core orchestration patterns, when to use them, and real-world examples from coding agents, research workflows, and enterprise automation.

Single Agent vs Multi-Agent: When Do You Need More Than One?

A single agent is an LLM with a system prompt, a set of tools, and a loop that reasons about which tool to call next. This works well for straightforward tasks: answer a question, write a function, summarize a document.

Multi-agent systems become necessary when:

The task requires different expertise: A coding task needs one agent to write code, another to review it, and a third to write tests. Each agent has different system prompts, tools, and evaluation criteria.
The task has natural parallelism: Research across 10 sources can be done by 10 agents simultaneously, with a final agent synthesizing results.
Quality requires adversarial checking: A "writer" agent drafts content while a "critic" agent evaluates it. This debate pattern catches errors a single agent would miss.
The workflow has distinct phases: Planning, execution, and verification are fundamentally different tasks that benefit from specialized agents.

💡

Apply at work: Before building a multi-agent system, try a single agent first. If it fails because the task requires multiple distinct capabilities (e.g., code generation + code review + testing), that is your signal to split into multiple agents. Multi-agent adds complexity — only use it when single-agent clearly falls short.

The Four Core Orchestration Patterns

Every multi-agent system in production uses one of these four patterns (or a combination):

1. Sequential (Pipeline)

Agents execute one after another. The output of Agent A becomes the input of Agent B. This is the simplest multi-agent pattern.

# Sequential pipeline: Planner -> Coder -> Reviewer -> Tester
class SequentialPipeline:
    def __init__(self):
        self.agents = [
            PlannerAgent(),    # Breaks task into implementation steps
            CoderAgent(),      # Writes code based on the plan
            ReviewerAgent(),   # Reviews code for bugs, style, security
            TesterAgent(),     # Writes and runs tests
        ]

    async def run(self, task: str) -> dict:
        context = {"original_task": task}
        for agent in self.agents:
            result = await agent.execute(context)
            context[agent.name] = result  # Each agent sees all prior results
            if result.get("status") == "blocked":
                return {"status": "blocked", "blocked_at": agent.name, "reason": result["reason"]}
        return context

When to use: Tasks with clear phases (plan → implement → review → test). Code generation workflows, content pipelines, data processing chains.

2. Parallel (Fan-Out / Fan-In)

Multiple agents work on independent sub-tasks simultaneously. A coordinator agent splits the work and a synthesizer agent combines the results.

# Parallel fan-out: Research multiple sources simultaneously
import asyncio

class ParallelResearch:
    def __init__(self, num_researchers: int = 5):
        self.splitter = TaskSplitterAgent()
        self.researchers = [ResearchAgent(id=i) for i in range(num_researchers)]
        self.synthesizer = SynthesizerAgent()

    async def run(self, question: str) -> str:
        # Fan-out: split into sub-questions
        sub_questions = await self.splitter.split(question)

        # Execute in parallel
        tasks = [
            self.researchers[i % len(self.researchers)].research(sq)
            for i, sq in enumerate(sub_questions)
        ]
        results = await asyncio.gather(*tasks, return_exceptions=True)

        # Filter out failures
        valid_results = [r for r in results if not isinstance(r, Exception)]

        # Fan-in: synthesize
        return await self.synthesizer.combine(question, valid_results)

When to use: Research tasks, data gathering from multiple sources, batch processing where sub-tasks are independent.

3. Hierarchical (Supervisor)

A supervisor agent delegates tasks to specialist agents, monitors progress, and decides what to do next based on results. This is the most common pattern in production systems like coding agents.

# Hierarchical: Supervisor delegates to specialists
class SupervisorSystem:
    def __init__(self):
        self.supervisor = SupervisorAgent()
        self.specialists = {
            "coder": CoderAgent(),
            "researcher": ResearchAgent(),
            "debugger": DebuggerAgent(),
            "reviewer": ReviewerAgent(),
        }

    async def run(self, task: str) -> dict:
        plan = await self.supervisor.create_plan(task)

        for step in plan.steps:
            agent = self.specialists[step.agent_type]
            result = await agent.execute(step.instruction, step.context)

            # Supervisor decides next action based on result
            decision = await self.supervisor.evaluate(step, result)
            if decision.action == "retry":
                result = await agent.execute(decision.revised_instruction, step.context)
            elif decision.action == "escalate":
                result = await self.specialists[decision.escalate_to].execute(
                    decision.escalation_context, step.context
                )
            elif decision.action == "complete":
                continue

        return await self.supervisor.compile_final_result()

When to use: Complex tasks where the workflow cannot be predetermined. The supervisor adapts the plan based on intermediate results. Used by most coding agents (Devin, Claude Code, Cursor Agent).

4. Debate (Adversarial)

Two or more agents argue for different solutions. A judge agent evaluates the arguments and picks the best answer or synthesizes a consensus.

# Debate pattern: multiple agents argue, judge decides
class DebateSystem:
    def __init__(self, num_debaters: int = 3, max_rounds: int = 3):
        self.debaters = [DebaterAgent(id=i) for i in range(num_debaters)]
        self.judge = JudgeAgent()
        self.max_rounds = max_rounds

    async def run(self, question: str) -> dict:
        # Round 1: independent answers
        positions = await asyncio.gather(*[
            d.initial_position(question) for d in self.debaters
        ])

        for round_num in range(self.max_rounds):
            # Each debater sees others' positions and can revise
            revised = await asyncio.gather(*[
                self.debaters[i].revise(question, positions, i)
                for i in range(len(self.debaters))
            ])

            # Check for convergence
            if await self.judge.has_consensus(revised):
                break
            positions = revised

        # Judge renders final verdict
        return await self.judge.verdict(question, positions)

When to use: High-stakes decisions where accuracy matters more than speed. Code review, legal document analysis, fact-checking, architectural decisions.

Choosing the Right Pattern

Pattern	Latency	Accuracy	Cost	Best For
Sequential	High (serial)	Good	Moderate	Clear phase workflows (plan → build → test)
Parallel	Low (concurrent)	Good	High (many calls)	Research, data gathering, independent sub-tasks
Hierarchical	Variable	High	Variable	Complex tasks needing adaptive planning
Debate	Very high	Very high	Very high	High-stakes decisions, critical code review

📝

Production reality: Most production systems combine patterns. A coding agent might use hierarchical orchestration overall (supervisor delegates tasks) with a sequential sub-pipeline for each coding task (plan → code → test) and a debate pattern for critical decisions (architecture choices, security reviews).

Real-World Multi-Agent Systems

Coding Agents (Claude Code, Devin)

Hierarchical pattern with a supervisor that manages planning, file editing, terminal commands, and code review agents. The supervisor maintains a task list and delegates based on the current state of the codebase.

Research Workflows (Deep Research)

Parallel fan-out pattern. A planner breaks a research question into sub-queries, parallel agents search different sources, and a synthesizer combines findings with citations and confidence scores.

Customer Support Triage

Sequential pipeline: classifier agent routes the ticket, retrieval agent fetches relevant docs, response agent drafts the reply, quality agent checks for accuracy and tone before sending.

Data Pipeline Automation

Hierarchical with parallel sub-tasks. A coordinator agent manages ETL jobs, delegates data validation to specialist agents, runs quality checks in parallel, and escalates anomalies to human operators.

Key Takeaways

Multi-agent systems are for tasks that require multiple distinct capabilities, natural parallelism, or adversarial quality checking.
Four core patterns: sequential (pipeline), parallel (fan-out/fan-in), hierarchical (supervisor), and debate (adversarial).
Start with a single agent. Only split into multi-agent when you hit clear limitations in capability or quality.
Most production systems combine patterns — hierarchical overall with sequential or parallel sub-workflows.
The pattern choice depends on your latency, accuracy, and cost requirements.

What Is Next

In the next lesson, we will design individual agents — the building blocks of any multi-agent system. You will learn the ReAct pattern, how to structure agent memory and tools, and how to build a reusable agent framework in Python.

Next → Individual Agent Design