Lesson 12 of 19 in Phase 4 · Agents & Orchestration

Supervisor Multi-Agent Lab: Routing Work Across Specialist Workers

Intermediate · ~7 min read

Recommended prerequisite: #103 LATS: Tree Search Agents with Monte-Carlo Tree Search

A supervisor agent is the smallest interesting multi-agent system: one LLM that reads the running state, picks which specialist should act next, and decides when the job is done. This lab builds that loop from scratch in ~60 lines of Python so you can see exactly what the LangGraph supervisor pattern compiles down to. It is the runnable companion to multi-agent systems and LangGraph multi-agent — read those for the theory, run this to feel the routing loop and the turn-budget guardrail in your hands. The structure is grounded in the LangGraph hierarchical-supervisor architecture and the profiling/specialization framing from Wang et al., "A Survey on LLM based Autonomous Agents" (arXiv:2308.11432).

Mental Model

A supervisor is a router over specialist workers: a single LLM decides who acts next and when to stop, and each worker owns one narrow capability. Routing is not control flow buried in if statements — it is a value the supervisor emits each turn: the name of a worker, or FINISH.

The survey calls the per-agent identity a profile: a coder, a writer, a domain expert, each a role written into a short, sharp prompt. The supervisor holds none of the workers' tools. It holds only the routing decision. That separation is the whole point: it buys you modularity (each worker is a function you can test in isolation) and specialization (each worker is an expert with a tiny prompt instead of one generalist drowning in twenty tools).

Compare this to a single ReAct agent with many tools. The ReAct agent interleaves reasoning and tool calls in one context window; as the tool count grows the model confuses similar tools, the prompt bloats, and you cannot evaluate "is the retrieval step good?" separately from "is the writing good?" The supervisor splits that one context into N small ones plus a router. You pay a round-trip per step; you gain isolation and per-role evaluation. See agent orchestration for when each is the right call.

Topology

The supervisor sits at the center. Every worker reports back to it — workers never talk to each other. Each "supervisor → worker → supervisor" cycle is two logical steps; a task touching three specialists realistically runs 6–10 steps. The single edge that makes the system terminate is FINISH: the supervisor decides the task is fully addressed and emits the final answer instead of another worker name.

What each piece owns

Supervisor. Sees the task and the transcript of work so far. Owns exactly one decision per turn: next worker, or FINISH. Holds no domain tools.
Worker. Owns one narrow capability and its own scratchpad — whatever reasoning it does internally stays internal. It receives a task string and returns a result string. Because the contract is that small, a worker can be another agent: a ReAct agent as the mathematician, a RAG pipeline as the researcher.
Transcript. The shared channel: an ordered log of (worker, result) pairs the supervisor reads to route the next turn.
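The three pieces above can be written down as plain types. This is a minimal sketch, not the lab's actual definitions: the names `Worker`, `Transcript`, `Decision`, and `is_valid` are assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical names for illustration; the lab module may define these differently.
Worker = Callable[[str], str]        # task string in, result string out
Transcript = list[tuple[str, str]]   # ordered (worker_name, result) pairs

FINISH = "FINISH"


@dataclass(frozen=True)
class Decision:
    """The single value the supervisor emits each turn."""
    next: str                        # a worker name, or FINISH
    subtask: Optional[str] = None    # what the chosen worker should do
    answer: Optional[str] = None     # final answer, only meaningful with FINISH


def is_valid(decision: Decision, workers: dict[str, Worker]) -> bool:
    """Valid means: FINISH with an answer, or the name of a known worker."""
    if decision.next == FINISH:
        return decision.answer is not None
    return decision.next in workers


workers: dict[str, Worker] = {"mathematician": lambda t: "42"}
print(is_valid(Decision(next="mathematician", subtask="6*7"), workers))  # True
print(is_valid(Decision(next="poet"), workers))                          # False
```

Keeping the decision a small immutable value makes it easy to log, validate, and replay each routing turn.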

The guardrail: a turn budget

The classic multi-agent failure is unbounded ping-pong: a vague router prompt bounces the supervisor between two workers forever (mathematician → writer → mathematician → writer), never converging on FINISH. LangGraph guards against this with recursion_limit (default 25): once the graph has run that many steps without reaching a stop condition, it halts with a GraphRecursionError. Our lab applies the same idea under the name max_steps, which caps the number of supervisor turns.

The budget is not optional polish — it is the difference between a demo and a runaway bill. Two design rules:

Bias the prompt toward stopping. Tell the supervisor explicitly: "When the request is fully addressed, respond FINISH." Make FINISH a first-class option, not an afterthought.
Hard-cap regardless of the prompt. Even a well-prompted supervisor can loop on an ambiguous task. When max_steps is hit, stop and return the best answer so far rather than spinning. Degraded-but-bounded beats correct-but-infinite.
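To make the second rule concrete, here is a small sketch of a hard cap in action. Everything in it is a stand-in written for this example: `run_with_budget`, `bad_route`, and `good_route` are hypothetical, the routers are plain functions rather than LLM calls, and "best answer so far" is simplified to the last worker result.

```python
from itertools import cycle


def run_with_budget(route, workers, task, max_steps):
    """Bounded loop: route() returns a worker name or "FINISH"."""
    transcript = []
    finished = False
    for _ in range(max_steps):
        name = route(task, transcript)
        if name == "FINISH":
            finished = True
            break
        transcript.append((name, workers[name](task)))
    # Simplification: treat the most recent result as the best answer so far.
    best_so_far = transcript[-1][1] if transcript else ""
    return {"finished": finished, "answer": best_so_far, "transcript": transcript}


# A deliberately bad router: alternates forever and never says FINISH.
_ping_pong = cycle(["mathematician", "writer"])


def bad_route(task, transcript):
    return next(_ping_pong)


# A well-behaved router: each worker acts once, then FINISH.
def good_route(task, transcript):
    done = {name for name, _ in transcript}
    for name in ("mathematician", "writer"):
        if name not in done:
            return name
    return "FINISH"


workers = {"mathematician": lambda t: "42", "writer": lambda t: "It is 42."}
task = "Compute 6*7 then state it"

bounded = run_with_budget(bad_route, workers, task, max_steps=6)
clean = run_with_budget(good_route, workers, task, max_steps=6)
print(bounded["finished"], len(bounded["transcript"]))  # False 6
print(clean["finished"], len(clean["transcript"]))      # True 2
```

The bad router still costs six worker calls, but it costs exactly six: the cap turns an unbounded failure into a bounded, inspectable one.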

How the loop works

The core is a plain bounded loop, not a graph engine. Each iteration asks the supervisor to route, runs the chosen worker, and appends the result to the transcript. The loop ends on FINISH or when the budget runs out.

python

def run_supervisor(task, workers, max_steps=8):
    transcript = []
    for _ in range(max_steps):
        decision = route(task, workers, transcript)   # supervisor LLM call
        if decision.next == "FINISH":
            return {"answer": decision.answer, "transcript": transcript}
        worker = workers[decision.next]               # Callable[[str], str]
        result = worker(decision.subtask or task)     # worker owns its scratchpad
        transcript.append((decision.next, result))
    # budget exhausted: synthesize from what we have, never loop forever
    return {"answer": finalize(task, transcript), "transcript": transcript}

route is the supervisor LLM call: it sees the task, the available worker names, and the transcript, then returns a structured decision — either a worker name plus a subtask, or FINISH plus the final answer. That structured return mirrors LangGraph's Command(goto=...): routing and payload travel together, atomically.
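One way such a routing call could look is sketched below. This is an assumption-heavy illustration, not the lab's implementation: `route_sketch`, `fake_llm`, the `Decision` dataclass, the prompt wording, and the JSON reply format are all hypothetical, and the LLM is passed in as a plain `Callable[[str], str]` so the example runs without any API.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    next: str                       # worker name or "FINISH"
    subtask: Optional[str] = None
    answer: Optional[str] = None


def route_sketch(task, workers, transcript, llm: Callable[[str], str]) -> Decision:
    """Build a routing prompt, call the LLM, and parse a structured decision."""
    history = "\n".join(f"{name}: {result}" for name, result in transcript)
    prompt = (
        f"Task: {task}\n"
        f"Workers: {', '.join(sorted(workers))}\n"
        f"Work so far:\n{history or '(none yet)'}\n"
        "When the request is fully addressed, respond FINISH.\n"
        'Reply with JSON: {"next": <worker name or "FINISH">, '
        '"subtask": <string or null>, "answer": <string or null>}'
    )
    raw = json.loads(llm(prompt))
    decision = Decision(raw["next"], raw.get("subtask"), raw.get("answer"))
    # Reject names the model made up instead of failing later on a dict lookup.
    if decision.next != "FINISH" and decision.next not in workers:
        raise ValueError(f"unknown worker: {decision.next!r}")
    return decision


# Stub LLM: send work to the mathematician, finish once it has reported.
def fake_llm(prompt: str) -> str:
    if "mathematician: 42" in prompt:
        return json.dumps({"next": "FINISH", "answer": "6*7 = 42"})
    return json.dumps({"next": "mathematician", "subtask": "Compute 6*7"})


workers = {"mathematician": lambda t: "42", "writer": lambda t: t}
first = route_sketch("Compute 6*7", workers, [], fake_llm)
second = route_sketch("Compute 6*7", workers, [("mathematician", "42")], fake_llm)
print(first.next, "|", second.next, "|", second.answer)
```

Validating the worker name at the routing boundary is a design choice in this sketch: a hallucinated name surfaces as a clear error at the point where it was produced.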

Note what the worker signature is: Callable[[str], str]. A string in, a string out. That is the entire contract, which is why workers compose — any callable matching it slots in.
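A short sketch of what "any callable" covers in practice: a stateful object, a partially applied function, and a lambda all satisfy the same contract. The `Researcher` class and `shout` function are hypothetical stand-ins invented for this example (the class imitates a retrieval pipeline with a dictionary lookup).

```python
from functools import partial
from typing import Callable

Worker = Callable[[str], str]


class Researcher:
    """Hypothetical stand-in for a RAG pipeline: holds state, still str -> str."""

    def __init__(self, corpus: dict[str, str]):
        self.corpus = corpus

    def __call__(self, task: str) -> str:
        # Naive keyword match standing in for real retrieval.
        hits = [text for key, text in self.corpus.items() if key in task.lower()]
        return " ".join(hits) or "no documents found"


def shout(task: str, suffix: str) -> str:
    return task.upper() + suffix


workers: dict[str, Worker] = {
    "researcher": Researcher({"supervisor": "A supervisor routes work to specialists."}),
    "announcer": partial(shout, suffix="!"),  # extra config bound ahead of time
    "echo": lambda t: t,
}

for name, worker in workers.items():
    print(name, "->", worker("what is a supervisor"))
```

Configuration (a corpus, a suffix, an API client) is bound when the worker is constructed, so the call site the supervisor sees stays a single string argument.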

Run it

The runnable module lives at agents-lab/agents_lab/supervisor.py. DeepSeek is the only paid API in this lab (it powers the supervisor's routing calls); the workers you pass can be plain Python functions with no API cost at all.

python

from agents_lab.supervisor import run_supervisor

# Workers are just Callable[[str], str]. Stub them to test routing in isolation.
workers = {
    "mathematician": lambda t: "42",
    "writer": lambda t: "The answer to the computation is 42.",
}

final = run_supervisor(
    "Compute 6*7 then state it",
    workers,
    max_steps=6,
)

print(final["answer"])      # final synthesized answer
print(final["transcript"])  # [("mathematician", "42"), ("writer", "...")]

The return is a dict with answer (the supervisor's final synthesis) and transcript (the ordered list of (worker, result) tuples — your audit trail for who did what, in what order).
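Because the transcript is just a list of tuples, auditing a run needs nothing beyond the standard library. The sketch below works on a hand-written transcript in the shape described above, not on real output from the lab.

```python
from collections import Counter

# Hand-written example in the (worker, result) shape; not real lab output.
transcript = [
    ("mathematician", "42"),
    ("writer", "The answer to the computation is 42."),
]

calls_per_worker = Counter(name for name, _ in transcript)  # who acted, how often
order = [name for name, _ in transcript]                    # in what sequence
last_result = transcript[-1][1] if transcript else None     # most recent output

print(dict(calls_per_worker))
print(" -> ".join(order))
print(last_result)
```

A count that keeps climbing for the same two workers is the ping-pong pattern showing up in the audit trail.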

Because a worker is any Callable[[str], str], you can drop a real agent in where a stub was. Swap the lambda for a ReAct agent from this same lab and the mathematician becomes a genuine tool-using reasoner — the supervisor neither knows nor cares:

python

from agents_lab.react import run_react

def mathematician(subtask: str) -> str:
    # a full ReAct agent with a calculator tool, used as one worker
    return run_react(subtask, tools=["calculator"])["answer"]

workers = {
    "mathematician": mathematician,
    "writer": lambda t: f"Result: {t}",
}
final = run_supervisor("What is 6*7? Then phrase it nicely.", workers, max_steps=6)

This is the modularity payoff in code: you tested mathematician on its own, you tested the supervisor with stubs, and now you compose them with zero changes to either side.

From the CLI

bash

uv run python -m agents_lab.cli supervisor "Compute 6*7 then state it"

The CLI wires up a default worker set, runs the loop, and prints the answer plus the transcript. Try a task that needs no second worker and watch the supervisor FINISH on step one; try a deliberately ambiguous task and watch max_steps save you from the ping-pong.

What to take away

The supervisor pattern is a router LLM over specialist workers — routing is a value (worker name or FINISH), emitted each turn.
Modularity and specialization are the wins: small worker prompts you test in isolation, versus one generalist ReAct agent holding every tool.
Workers are Callable[[str], str], so they compose — a ReAct agent or RAG pipeline can be a worker.
The turn budget (max_steps here, recursion_limit in LangGraph) is the non-negotiable guardrail against unbounded supervisor↔worker loops.

For the full topology zoo — swarm, hierarchical teams, handoff mechanics, shared-vs-private state — continue to LangGraph multi-agent and the theory in multi-agent systems.
