A supervisor agent is the smallest interesting multi-agent system: one LLM that reads the running state, picks which specialist should act next, and decides when the job is done. This lab builds that loop from scratch in ~60 lines of Python so you can see exactly what the LangGraph supervisor pattern compiles down to. It is the runnable companion to multi-agent systems and LangGraph multi-agent: read those for the theory, run this to feel the routing loop and the turn-budget guardrail in your hands. The structure is grounded in the LangGraph hierarchical-supervisor architecture and the profiling/specialization framing from Wang et al., "A Survey on Large Language Model based Autonomous Agents" (arXiv:2308.11432).
Mental Model
A supervisor is a router over specialist workers: a single LLM decides who acts next and when to stop, and each worker owns one narrow capability. Routing is not control flow buried in if statements; it is a value the supervisor emits each turn: the name of a worker, or FINISH.
The survey calls the per-agent identity a profile: a coder, a writer, a domain expert, each a role written into a short, sharp prompt. The supervisor holds none of those tools. It holds only the routing decision. That separation is the whole point: it buys you modularity (each worker is a function you can test in isolation) and specialization (each worker is an expert with a tiny prompt instead of one generalist drowning in twenty tools).
Compare this to a single ReAct agent with many tools. The ReAct agent interleaves reasoning and tool calls in one context window; as the tool count grows the model confuses similar tools, the prompt bloats, and you cannot evaluate "is the retrieval step good?" separately from "is the writing good?" The supervisor splits that one context into N small ones plus a router. You pay a round-trip per step; you gain isolation and per-role evaluation. See agent orchestration for when each is the right call.
Topology
The supervisor sits at the center. Every worker reports back to it, and workers never talk to each other. Each "supervisor → worker → supervisor" cycle is two logical steps; a task touching three specialists realistically runs 6 to 10 steps. The single edge that makes the system terminate is FINISH: the supervisor decides the task is fully addressed and emits the final answer instead of another worker name.
What each piece owns
- Supervisor. Sees the task and the transcript of work so far. Owns exactly one decision per turn: next worker, or FINISH. Holds no domain tools.
- Worker. Owns one narrow capability and its own scratchpad; whatever reasoning it does internally stays internal. It receives a task string and returns a result string. Because the contract is that small, a worker can be another agent: a ReAct agent as the mathematician, a RAG pipeline as the researcher.
- Transcript. The shared channel: an ordered log of (worker, result) pairs the supervisor reads to route the next turn.
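To make the three roles concrete, here is a minimal sketch of the data the supervisor actually reads. The names `Worker`, `Transcript`, and `render_supervisor_view`, and the text layout of the view, are assumptions made for illustration; the lab module may format its prompt differently. The point it shows is that the supervisor sees worker names and results only, never the workers' tools or internal reasoning.

```python
from typing import Callable

Worker = Callable[[str], str]        # task string in, result string out
Transcript = list[tuple[str, str]]   # ordered (worker_name, result) pairs


def render_supervisor_view(task: str, workers: dict[str, Worker],
                           transcript: Transcript) -> str:
    """Build the text a supervisor would read before routing (hypothetical format)."""
    lines = [f"Task: {task}", "Workers: " + ", ".join(workers), "Work so far:"]
    if not transcript:
        lines.append("  (nothing yet)")
    for step, (name, result) in enumerate(transcript, start=1):
        lines.append(f"  {step}. {name}: {result}")
    lines.append("Reply with the next worker name, or FINISH.")
    return "\n".join(lines)


workers: dict[str, Worker] = {
    "mathematician": lambda t: "42",
    "writer": lambda t: "The answer is 42.",
}
transcript: Transcript = [("mathematician", "42")]
view = render_supervisor_view("Compute 6*7 then state it", workers, transcript)
print(view)
```

Only the dictionary keys reach the prompt, so swapping a stub for a real agent changes nothing about what the router sees.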
The guardrail: a turn budget
The classic multi-agent failure is unbounded ping-pong: a vague router prompt bounces the supervisor between two workers forever (mathematician → writer → mathematician → writer), never converging on FINISH. LangGraph guards against this with a recursion_limit (default 25) that hard-stops the graph. Our lab uses the same idea under the name max_steps.
The budget is not optional polish; it is the difference between a demo and a runaway bill. Two design rules:
- Bias the prompt toward stopping. Tell the supervisor explicitly: "When the request is fully addressed, respond FINISH." Make FINISH a first-class option, not an afterthought.
- Hard-cap regardless of the prompt. Even a well-prompted supervisor can loop on an ambiguous task. When max_steps is hit, stop and return the best answer so far rather than spinning. Degraded-but-bounded beats correct-but-infinite.
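The hard cap can be seen without any LLM at all. The sketch below replaces the supervisor with a deliberately broken router, `ping_pong_route`, that alternates between two workers and never emits FINISH; `run_bounded` and the `hit_budget` flag are hypothetical names for this illustration, and "best answer so far" is approximated by the most recent worker result rather than a real synthesis step.

```python
def ping_pong_route(transcript):
    """Stand-in for a badly prompted supervisor: alternates forever, never FINISHes."""
    return "mathematician" if len(transcript) % 2 == 0 else "writer"


def run_bounded(task, workers, route, max_steps=6):
    """Routing loop with a hard turn budget; `route` returns a worker name or FINISH."""
    transcript = []
    for _ in range(max_steps):
        name = route(transcript)
        if name == "FINISH":
            answer = transcript[-1][1] if transcript else ""
            return {"answer": answer, "transcript": transcript, "hit_budget": False}
        transcript.append((name, workers[name](task)))
    # Budget exhausted: return the most recent result instead of looping on.
    best = transcript[-1][1] if transcript else ""
    return {"answer": best, "transcript": transcript, "hit_budget": True}


workers = {
    "mathematician": lambda t: "42",
    "writer": lambda t: "The answer is 42.",
}
out = run_bounded("Compute 6*7 then state it", workers, ping_pong_route, max_steps=6)
print(len(out["transcript"]), out["hit_budget"], out["answer"])
```

Surfacing a flag such as `hit_budget` is one way to let callers tell a clean finish apart from a degraded one, for example to log it or to alert on it.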
How the loop works
The core is a plain bounded loop, not a graph engine. Each iteration: ask the supervisor to route, run the chosen worker, append the result to the transcript, and repeat until FINISH or the budget runs out.
def run_supervisor(task, workers, max_steps=8):
    transcript = []
    for _ in range(max_steps):
        decision = route(task, workers, transcript)  # supervisor LLM call
        if decision.next == "FINISH":
            return {"answer": decision.answer, "transcript": transcript}
        worker = workers[decision.next]  # Callable[[str], str]
        result = worker(decision.subtask or task)  # worker owns its scratchpad
        transcript.append((decision.next, result))
    # budget exhausted: synthesize from what we have, never loop forever
    return {"answer": finalize(task, transcript), "transcript": transcript}
route is the supervisor LLM call: it sees the task, the available worker names, and the transcript, then returns a structured decision: either a worker name plus a subtask, or FINISH plus the final answer. That structured return mirrors LangGraph's Command(goto=...): routing and payload travel together, atomically.
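One way to get from the model's raw text to that structured decision is sketched below. It assumes the supervisor is asked to reply in JSON with the keys `next`, `subtask`, and `answer`; that wire format, the `Decision` dataclass, and `parse_decision` are illustrative assumptions, not the lab's confirmed implementation. The sketch fails closed: anything malformed or naming an unknown worker becomes FINISH, so the loop never indexes `workers` with a bad key.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    next: str                      # a worker name, or "FINISH"
    subtask: Optional[str] = None  # instruction for the chosen worker
    answer: Optional[str] = None   # final answer, only meaningful on FINISH


def parse_decision(raw: str, worker_names: list[str]) -> Decision:
    """Parse the supervisor's reply; fall back to FINISH on anything malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Not JSON at all: treat the raw text as the final answer.
        return Decision(next="FINISH", answer=raw.strip())
    if not isinstance(data, dict):
        return Decision(next="FINISH", answer=raw.strip())
    target = data.get("next")
    if target == "FINISH":
        return Decision(next="FINISH", answer=data.get("answer") or "")
    if target in worker_names:
        return Decision(next=target, subtask=data.get("subtask"))
    # Unknown worker name: stop rather than raise a KeyError in the loop.
    return Decision(next="FINISH", answer="")


names = ["mathematician", "writer"]
step = parse_decision('{"next": "mathematician", "subtask": "Compute 6*7"}', names)
done = parse_decision('{"next": "FINISH", "answer": "6*7 is 42."}', names)
print(step, done, sep="\n")
```

Failing closed is a design choice; a stricter variant might retry the routing call once before giving up.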
Note what the worker signature is: Callable[[str], str]. A string in, a string out. That is the entire contract, which is why workers compose β any callable matching it slots in.
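As an illustration of how loose that contract is, the following sketch registers four different kinds of Python callable under one `workers` dict. All names here (`shout`, `PrefixWorker`, `truncate`) are toy examples invented for this sketch; the only requirement they share is taking one string and returning one string.

```python
from functools import partial
from typing import Callable

Worker = Callable[[str], str]


def shout(task: str) -> str:
    """A plain function worker."""
    return task.upper()


class PrefixWorker:
    """A stateful object worker: any instance with __call__(str) -> str qualifies."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix

    def __call__(self, task: str) -> str:
        return f"{self.prefix}{task}"


def truncate(task: str, limit: int) -> str:
    """Needs an extra argument, so it is adapted with functools.partial below."""
    return task[:limit]


workers: dict[str, Worker] = {
    "shouter": shout,
    "prefixer": PrefixWorker("Result: "),
    "truncator": partial(truncate, limit=5),  # configuration bound ahead of time
    "echo": lambda t: t,
}

results = {name: worker("hello world") for name, worker in workers.items()}
print(results)
```

Anything that needs more than a string, such as a client handle or a tool list, can be bound in advance with a closure, `partial`, or an object attribute, which keeps the routing loop unaware of it.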
Run it
The runnable module lives at agents-lab/agents_lab/supervisor.py. DeepSeek is the only paid API in this lab (it powers the supervisor's routing calls); the workers you pass can be plain Python functions with no API cost at all.
from agents_lab.supervisor import run_supervisor
# Workers are just Callable[[str], str]. Stub them to test routing in isolation.
workers = {
    "mathematician": lambda t: "42",
    "writer": lambda t: "The answer to the computation is 42.",
}
final = run_supervisor(
    "Compute 6*7 then state it",
    workers,
    max_steps=6,
)
print(final["answer"])      # final synthesized answer
print(final["transcript"])  # e.g. [("mathematician", "42"), ("writer", "...")]
The return is a dict with answer (the supervisor's final synthesis) and transcript (the ordered list of (worker, result) tuples, your audit trail for who did what, in what order).
Because a worker is any Callable[[str], str], you can drop a real agent in where a stub was. Swap the lambda for a ReAct agent from this same lab and the mathematician becomes a genuine tool-using reasoner; the supervisor neither knows nor cares:
from agents_lab.react import run_react
from agents_lab.supervisor import run_supervisor
def mathematician(subtask: str) -> str:
    # a full ReAct agent with a calculator tool, used as one worker
    return run_react(subtask, tools=["calculator"])["answer"]
workers = {
    "mathematician": mathematician,
    "writer": lambda t: f"Result: {t}",
}
final = run_supervisor("What is 6*7? Then phrase it nicely.", workers, max_steps=6)
This is the modularity payoff in code: you tested mathematician on its own, you tested the supervisor with stubs, and now you compose them with zero changes to either side.
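A small helper can make both halves of that testing story mechanical. The sketch below is an assumption-laden illustration: `recording` is a hypothetical wrapper, and the `mathematician` here is a toy that multiplies "a*b" strings, standing in for the real worker. It shows a worker tested on its own, then wrapped so a routing test can assert which subtasks the supervisor actually sent it.

```python
from typing import Callable

Worker = Callable[[str], str]


def recording(worker: Worker, calls: list[str]) -> Worker:
    """Wrap a worker so every subtask it receives is appended to `calls`."""
    def wrapped(subtask: str) -> str:
        calls.append(subtask)
        return worker(subtask)
    return wrapped


def mathematician(subtask: str) -> str:
    """Toy stand-in for the real worker: multiplies an 'a*b' expression."""
    a, b = subtask.split("*")
    return str(int(a) * int(b))


# 1) The worker is tested in isolation, with no supervisor involved.
assert mathematician("6*7") == "42"

# 2) The same worker, wrapped, records what it was asked to do.
math_calls: list[str] = []
workers: dict[str, Worker] = {"mathematician": recording(mathematician, math_calls)}
result = workers["mathematician"]("6*7")
print(result, math_calls)
```

Passing the wrapped `workers` dict to the routing loop would then let a test check `math_calls` after the run, since the wrapper still satisfies the same string-in, string-out contract.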
From the CLI
uv run python -m agents_lab.cli supervisor "Compute 6*7 then state it"
The CLI wires up a default worker set, runs the loop, and prints the answer plus the transcript. Try a task that needs no second worker and watch the supervisor FINISH on step one; try a deliberately ambiguous task and watch max_steps save you from the ping-pong.
What to take away
- The supervisor pattern is a router LLM over specialist workers: routing is a value (worker name or FINISH), emitted each turn.
- Modularity and specialization are the wins: small worker prompts you test in isolation, versus one generalist ReAct agent holding every tool.
- Workers are Callable[[str], str], so they compose: a ReAct agent or RAG pipeline can be a worker.
- The turn budget (max_steps here, recursion_limit in LangGraph) is the non-negotiable guardrail against unbounded supervisor-worker loops.
For the full topology zoo (swarm, hierarchical teams, handoff mechanics, shared-vs-private state), continue to LangGraph multi-agent and the theory in multi-agent systems.