Agents & Workflows

🧩 The control plane and data plane — how a new sales workflow ships as a graph plus a prompt, not a service. Built bottom-up: ELI5 first, then the system-design view, then each piece in depth with the real code.

The agentic-sales platform supports N independent sales workflows — discover, enrich, score, outreach, learn — on shared infrastructure, without standing up a service per flow. The design follows from one rule: a new workflow ships as a graph plus a prompt, not as a service.

This page covers how those workflows are defined and reached — the control plane (graph identity) and the data plane (routing and worker pools). The third plane, observability, has its own deep-dive reference.

It is built bottom-up: a plain-language ELI5 first, then the system-design view, then each piece in depth with the real code. Every part below is generated by LlamaIndex, grounded in the real source — the graph registry, the dispatcher's routing table, the typed graph client, and the architecture spec.

Explain it like I'm 5

Imagine a workshop with several specialized stations—one for cleaning parts, one for assembling, one for painting. Instead of building a separate workshop for each task, you have one workshop with shared power, supplies, and a single quality inspector. Each station is a distinct workflow: discovering companies, scoring contacts, composing emails. They all use the same infrastructure but stay isolated so a jam at the paint station doesn't halt the cleaning station. Without sharing, you'd waste resources building duplicate workshops for every task, and you'd lose the ability to see the whole production line at once. This is exactly the idea: run many independent sales workflows on one shared system.

The system-design view

The system is organized as three planes that together satisfy a non-negotiable constitutional constraint (from specs/agentic-sales/mission.md:17-27). The control plane owns graph identity via a registry and the routing contract expressed as WORKER_ROUTES. It is deliberately cheap to load — no LLM or database dependencies — because the registry is built at build time, not by compiling fifty graph modules at runtime. The data plane implements per-capability worker pools: EMAIL, CLASSIFY, DISCOVERY (extendable). A noisy or expensive graph stays inside its own pool, so the blast radius is at the pool level, not the platform. The observability plane glues the two together with a distributed run tree that crosses the TypeScript ↔ Python hop. Every cross-process trace hop carries up to four header families — traceparent and tracestate (W3C trace context injected via propagation.inject in langgraph-client.ts:217), baggage and langsmith-trace (injected via injectLangSmithHeaders() at lines 23–33) — so that one user action shows up as one debuggable thing across all three planes. On the worker side, _tracing.py:parse_inbound(headers) reads the same headers to either continue the caller’s run tree or self-root if absent.

The design explicitly rejects two alternatives. A monolith with branching prompts would mean every workflow executes in the same process; a runaway agent or a memory leak in one sales flow could take down every other flow. A microservice per flow would isolate failures perfectly, but it multiplies operational costs per flow (deploy, monitor, scale) and violates the mission’s goal: “support many independent sales workflows on shared infrastructure without standing up a service per flow.” The chosen architecture shares a single LangGraph runtime on the backend (the Render FastAPI app), but isolates failure-domain cost by routing different workflows to different worker pools. The routing contract (WORKER_ROUTES) is the only place that decides which pool handles a given request — no other path into the runtime exists. The cost axis that settles the trade-off is blast radius: a DISCOVERY graph that consumes 180 seconds of CPU does not starve an EMAIL graph, and a crash in the CLASSIFY worker does not orphan a running CAMPAIGN. The observability plane then ensures that even when a request fans out across pools, the trace is unified.

A concrete failure mode from the observability deep-dive illustrates why the three-plane boundary matters. Symptom: “Worker run shows as standalone root (not nested under TS span).” Cause: getCurrentRunTree(true) returned undefined — no LangSmith run tree was open in the Next.js scope. In this codebase the Next.js app is OTel-first, not LangSmith-first; a traceable() context is only opened when a route handler explicitly creates one (e.g., inside an ainvoke chain). When absent, the worker becomes the root of its own LangSmith run. This is not a bug — it is the intended best-effort design — but it means an engineer debugging via LangSmith alone sees an orphan worker tree and must switch to the OTel collector (which holds the traceparent-continued span chain) to reconstruct the full request. Another common breakage is “Two separate trace IDs for one user action” caused by a load balancer or proxy that strips the traceparent header before it reaches the worker; verifying that parse_inbound parses non-null headers is the first debugging step.
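As a debugging aid for the stripped-header case, here is a minimal sketch of checking whether an inbound request still carries a well-formed W3C traceparent. The function name parse_traceparent and the TraceParent tuple are illustrative assumptions, not part of the codebase; the real parse_inbound is not shown on this page and may behave differently.

```python
import re
from typing import Mapping, NamedTuple, Optional

# W3C Trace Context, version 00: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(headers: Mapping[str, str]) -> Optional[TraceParent]:
    """Return the parsed traceparent, or None if it is missing or malformed."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    match = _TRACEPARENT_RE.match(lowered.get("traceparent", "").strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero ids are invalid per the spec.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return TraceParent(trace_id, parent_id, bool(int(flags, 16) & 0x01))
```

A None result on the worker for a request that left the caller with a traceparent points at a hop in between (proxy, load balancer) rather than at the worker itself.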

In depth, piece by piece

Each piece below: the plain-language take, the system-design detail, then the real code it comes from.

What a workflow is here

In plain terms. Think of each workflow as a recipe in a shared kitchen. The recipe has a stable name (the assistant ID) that never changes, like a dish name on a menu—you always order "lasagna," not "the thing in the third drawer." A GraphSpec ties that name to a recipe card (the module) and tells the kitchen whether to use a pre-made dish (compiled attribute) or a build-your-own instruction set (builder attribute). The resumable flag marks whether the dish can be half-eaten and finished later (like a paused movie) or is eaten in one go. Without this stable identity, moving a recipe to a new notebook would break every order placed by that name.

System design. Every LangGraph workflow in this system is defined by a single GraphSpec dataclass (defined in registry.py). The dataclass is frozen and carries five fields:

  • assistant_id: a public, human-readable string like "extract_stack" or "contact_vertical_fit" that serves as the stable identifier used by the TypeScript client, the /runs/wait endpoint, langgraph.json, and worker routing tables.
  • module: a dotted import path, e.g. "graphs.contact_vertical_fit_graph", pointing to the Python module that contains the graph logic.
  • compiled_attr (default "graph"): the module-level attribute name holding the precompiled CompiledGraph instance.
  • builder_attr (default "build_graph", can be None): the module-level callable that takes a checkpointer and returns a CompiledGraph. When builder_attr is None, the module is expected to expose a pre-built instance under compiled_attr built at import time without a checkpointer — used for stateless, idempotent graphs like "deep_scrape".
  • resumable (default False): controls whether a checkpointer is attached to the graph at runtime. In the registry as documented here, every registered GraphSpec sets this to False, meaning the FastAPI runtime never wires a checkpointer and every invocation gets a random UUID thread_id. The only graph that uses persistence is vertical_activation_graph, which lives outside the registry and is built via build_resumable_graph (only exercised in dry-run tests).

The concrete mechanism for turning an assistant_id into a running graph is implemented in core/app.py:_compile_one (described in the resumable docstring). It reads the GraphSpec from the in-memory GRAPHS tuple, imports the module, and either uses the precompiled instance at compiled_attr (if builder_attr is None) or calls the module-level callable named by builder_attr, passing it the checkpointer, to construct a graph. The checkpointer argument is None when resumable is False, so checkpoint_blobs and checkpoint_writes tables are never written. The runtime then invokes the graph via graph.ainvoke(graph_input) inside a trace_run context manager (as shown in observability.md), and the result is returned as JSON. The assistant_id is the only thing that identifies which graph to run: it is looked up in the registry by its string, not by import path or file path.
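The resolution step described above can be sketched as follows. This is a simplified stand-in, not the real _compile_one: the function name compile_one, the fake module fake_demo_graph, and its contents are assumptions made so the example runs without the actual graph packages.

```python
import importlib
import sys
import types
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class GraphSpec:
    assistant_id: str
    module: str
    compiled_attr: str = "graph"
    builder_attr: Optional[str] = "build_graph"
    resumable: bool = False


def compile_one(spec: GraphSpec, checkpointer: Any = None) -> Any:
    """Resolve a spec to a graph object (hypothetical stand-in for _compile_one)."""
    module = importlib.import_module(spec.module)
    if spec.builder_attr is None:
        # Stateless graph: use the instance the module built at import time.
        return getattr(module, spec.compiled_attr)
    builder = getattr(module, spec.builder_attr)
    # Non-resumable graphs are always built without a checkpointer.
    return builder(checkpointer if spec.resumable else None)


# Fake module registered in sys.modules so the sketch is self-contained.
_fake = types.ModuleType("fake_demo_graph")
_fake.graph = "precompiled-demo"
_fake.build_graph = lambda checkpointer: ("built", checkpointer)
sys.modules["fake_demo_graph"] = _fake

precompiled = compile_one(GraphSpec("demo", "fake_demo_graph", builder_attr=None))
built = compile_one(GraphSpec("demo", "fake_demo_graph"), checkpointer="cp")
```

Here precompiled is the module-level instance, while built is constructed with None as the checkpointer even though one was offered, because the spec is not resumable.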

The trade‑off is that every workflow is addressed by a stable string rather than by its module import or file path. This decouples contract from implementation: a graph can be renamed, consolidated, or its module moved without breaking any caller. A concrete example is "extract_stack" — its GraphSpec points to graphs.employer_intel_graph because that functionality was merged into the jobs pillar, but the assistant_id remains "extract_stack" so that existing TypeScript callers never need to change. The same string is used in per‑worker allowlists (DEFAULT_CLASSIFY_WORKER_ASSISTANTS and DEFAULT_DISCOVERY_WORKER_ASSISTANTS in route_for.py) to route the graph to a specific worker pool, giving each assistant its own blast radius. The cost of this indirection is that the registry becomes a single source of truth that must be kept consistent — hence the assert at the end of registry.py that loudly fails at import time if any assistant_id is duplicated, preventing silent misrouting.

The rejected alternative would be to address graphs by their module import path or file path directly, which is exactly what the assistant_id abstraction avoids. If callers used "graphs.sales_tech_feature_graph" instead of "sales_tech_feature_graph", renaming the module would break every integration. Moreover, the builder_attr/compiled_attr pattern rejects the idea that every graph must be built the same way: some graphs are precompiled (no builder function, no checkpointer) while others are constructed on demand. The resumable flag explicitly rejects the assumption that all graphs need durability — the D1 checkpointer and compaction code are described as “effectively dormant” because no registered graph uses it, keeping storage costs predictable.

A concrete failure mode is a duplicated assistant_id: without the assertion, a typo that introduces two GraphSpec entries with the same id would cause the runtime to silently pick the last one registered, leading to unexpected behavior. Another edge case is a spec whose builder_attr is None while its module does not expose a precompiled instance under compiled_attr. The registry cannot catch this, because it is pure data and imports no graph modules (the mission spec: “the JSON generator builds the registry without compiling fifty graph modules”); the modules are imported later, so the missing attribute would likely surface as an AttributeError when the runtime compiles that spec at startup. The resumable=False default also hides a storage-bloat risk: if a graph were accidentally set to resumable=True but invoked with a random thread_id (as every non-resumable graph is), the checkpointer would write checkpoint rows that are never read, quickly exceeding the Neon storage cap.

The GraphSpec dataclass defines a workflow's stable identity by its assistant_id, not by import path; the runtime resolves workflows solely by that ID.

python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphSpec:
    assistant_id: str  # public id used in /runs/wait, langgraph.json, TS client
    module: str  # dotted import path, e.g. "graphs.email_compose_graph"
    compiled_attr: str = "graph"  # module-level symbol referenced in langgraph.json
    builder_attr: str | None = "build_graph"
    resumable: bool = False


# … GRAPHS tuple of GraphSpec rows elided …

# Fail loudly at import time if two rows share an assistant_id.
assert len({g.assistant_id for g in GRAPHS}) == len(GRAPHS), (
    "duplicate assistant_id in GRAPHS"
)

The graph registry — the control plane

In plain terms. Think of GRAPHS as a master key ring that holds the only official list of every graph in the codebase. When you need to add a new graph, you add one entry to that list and regenerate the config file from it. A safety check at the bottom runs as soon as the code loads, and if anyone accidentally duplicates a graph’s stable nickname, it crashes immediately so the mistake can’t hide. The list itself is kept deliberately lightweight: it never pulls in the heavy AI models or database libraries, so a separate tool can scan every entry quickly without loading all the expensive stuff. Without this single registry, different parts of the system could disagree on which graphs exist or use conflicting names.

System design. The GRAPHS tuple in registry.py is the single, explicit register of every LangGraph assistant identity in the system. Each element is a GraphSpec dataclass carrying the public assistant_id, the dotted module import path, the compiled_attr symbol name (default "graph"), an optional builder_attr (default "build_graph"), and a resumable boolean. To add a graph, one "drop[s] a row in GRAPHS and run[s] make gen-langgraph-json". That Makefile target invokes backend/scripts/gen_langgraph_json.py, which reads the raw GRAPHS structure and writes core/langgraph.json — the config that the langgraph CLI and the Cloudflare Containers runtime read. Separately, the FastAPI runtime in core/app.py imports GRAPHS directly at lifespan startup and compiles each spec (calling the module's builder_attr function with a checkpointer when needed, or using the pre‑compiled compiled_attr object). The tail of registry.py enforces uniqueness with assert len({g.assistant_id for g in GRAPHS}) == len(GRAPHS) — any duplicate assistant_id raises an AssertionError at import time, preventing silent misrouting.

The design is a deliberate trade‑off between centralized governance and build‑time hygiene. The registry module is kept "dependency‑free — it must import nothing from agentic_sales.*_graph at module top level". This ensures the JSON generator can resolve every row without compiling the 50+ graphs or pulling in the optional LLM/DB imports that many graph modules carry at import time. The cost is a two‑step flow: adding a graph to GRAPHS is one line, but a separate make gen-langgraph-json step must run afterward to regenerate the JSON config. The GraphSpec fields (module, compiled_attr, builder_attr) serve as deferred import instructions: the runtime can lazily import the module only when it needs to compile the graph, keeping the registry lightweight.
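To illustrate why a pure-data registry keeps generation cheap, here is a minimal sketch of deriving a config mapping from GraphSpec rows without importing any graph module. The function name build_langgraph_config and the "module:attr" entry format are assumptions for illustration; the output that the real gen_langgraph_json.py writes is not shown on this page.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class GraphSpec:
    assistant_id: str
    module: str
    compiled_attr: str = "graph"
    builder_attr: Optional[str] = "build_graph"
    resumable: bool = False


def build_langgraph_config(graphs: Iterable[GraphSpec]) -> dict:
    """Map each assistant_id to a "module:attr" reference using registry data only."""
    # Nothing is imported here; the strings are copied straight from the rows.
    return {
        "graphs": {g.assistant_id: f"{g.module}:{g.compiled_attr}" for g in graphs}
    }


GRAPHS = (
    GraphSpec("sales_tech_feature_graph", "graphs.sales_tech_feature_graph"),
)

config = build_langgraph_config(GRAPHS)
print(json.dumps(config, indent=2))
```

Because the function only formats strings, it runs in an environment that has none of the LLM or database dependencies installed.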

A natural alternative would be to have each graph self‑register by importing its module at module top level — the pattern many monoliths use. The source explicitly rejects this: "it must import nothing from agentic_sales.*_graph at module top level so the JSON generator can build the registry without compiling 50+ graphs". That alternative would force every developer to have all LLM/DB dependencies installed just to run the JSON generator, and it would make the generator a full‑blown graph compiler. By keeping the registry pure data, the JSON generator remains a cheap, dependency‑free script.

A concrete failure mode is a typo that creates duplicate assistant_id values. Without the assertion, both runtimes would silently route to the last-registered builder, producing unpredictable behavior for callers using the duplicate id. The assertion traps this at import time, forcing the developer to fix the row before the runtime starts. Another edge case: if a graph module changes its compiled_attr or builder_attr symbol name but the GraphSpec row is not updated, the runtime will fail to resolve the symbol, or will call the wrong function if the old name still exists. The registry provides no schema validation beyond the duplicate check, so such mismatches surface as a runtime ImportError or AttributeError when the graph is first compiled. That failure is localised to the graph in question, but it is still production-visible. The resumable flag, currently False for every graph in the shipped GRAPHS, exists to gate checkpointer attachment; flipping it to True would reactivate the dormant D1 checkpointer (infra/checkpointer.py) and compaction logic, a reminder that the registry is also the single place where such durability decisions are made.
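One way to surface symbol mismatches before production is a test-time check that resolves every row. The sketch below is hypothetical (check_spec and the fake module are not in the codebase) and would live in a test suite rather than in registry.py, since it imports graph modules and the registry must stay dependency-free.

```python
import importlib
import sys
import types
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GraphSpec:
    assistant_id: str
    module: str
    compiled_attr: str = "graph"
    builder_attr: Optional[str] = "build_graph"
    resumable: bool = False


def check_spec(spec: GraphSpec) -> List[str]:
    """Return human-readable problems for one row; an empty list means it resolves."""
    try:
        module = importlib.import_module(spec.module)
    except ImportError as exc:
        return [f"{spec.assistant_id}: cannot import {spec.module} ({exc})"]
    problems: List[str] = []
    if spec.builder_attr is None:
        # Precompiled graphs must expose the instance named by compiled_attr.
        if not hasattr(module, spec.compiled_attr):
            problems.append(f"{spec.assistant_id}: missing {spec.compiled_attr}")
    elif not callable(getattr(module, spec.builder_attr, None)):
        problems.append(f"{spec.assistant_id}: {spec.builder_attr} is not callable")
    return problems


# Fake module exposing only a precompiled instance, so the sketch is runnable.
_fake = types.ModuleType("fake_precompiled_graph")
_fake.graph = object()
sys.modules["fake_precompiled_graph"] = _fake

ok = check_spec(GraphSpec("demo", "fake_precompiled_graph", builder_attr=None))
stale = check_spec(GraphSpec("demo", "fake_precompiled_graph"))  # expects build_graph
```

In this run ok is empty, while stale reports that build_graph is not callable, which is the same mismatch that would otherwise appear as an AttributeError at compile time.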

The graph registry is a single source of truth defined as a tuple of GraphSpecs. A graph is added by inserting one row, an import-time assertion rejects duplicate assistant_id values, and the module is kept dependency-free so JSON generation stays cheap.

python
"""Single source of truth for the agentic-sales LangGraph registry.

... To add a graph: drop a row in ``GRAPHS`` and run ``make gen-langgraph-json``.
... Keep this module dependency-free — it must import nothing from
``agentic_sales.*_graph`` at module top level so the JSON generator can build
the registry without compiling 50+ graphs (and without dragging in optional
LLM/DB deps that some graph modules import at import time).
"""

from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class GraphSpec:
    assistant_id: str
    module: str
    compiled_attr: str = "graph"
    builder_attr: str | None = "build_graph"
    resumable: bool = False

# GRAPHS is the tuple of all registered graphs.

# Order is presentation only; runtime resolution is by ``assistant_id``.
GRAPHS = (
    GraphSpec("sales_tech_feature_graph", "graphs.sales_tech_feature_graph"),
    # … more entries …
)

assert len({g.assistant_id for g in GRAPHS}) == len(GRAPHS), (
    "duplicate assistant_id in GRAPHS"
)

Invoking a workflow

In plain terms. Think of it like mailing a package: you write the recipient's address (the assistant's stable name) and what's inside (the input), seal it with a security sticker (the bearer secret so only authorized mail gets through), and attach a return label that includes your order tracking number (the W3C trace-context and LangSmith headers). Without that label, the warehouse wouldn't know this package belongs to your existing order—it would start a new, disconnected shipment. The client ensures every request carries both the secret and the trace headers so the backend's work nests neatly under your original job, keeping the whole pipeline linked in one view.

System design. The mechanism begins with a singleton langgraphClient constructed from @langchain/langgraph-sdk's Client. It points at LANGGRAPH_DISPATCHER_URL (Render or http://127.0.0.1:8787) and sends a static Authorization: Bearer <LANGGRAPH_DISPATCHER_SECRET> via defaultHeaders. Every invocation goes through client.runs.wait(null, assistant_id, { input }), which performs a POST /runs/wait against that backend. The critical trace-context injection lives in the onRequest hook, called by the SDK at fetch time. Inside that hook, buildTraceCarrier() runs: it calls propagation.inject(context.active(), carrier) to write the W3C traceparent/tracestate headers from the active OpenTelemetry context, then calls injectLangSmithHeaders(carrier) which attempts getCurrentRunTree(true). If a LangSmith RunTree is active (e.g. inside a traceable() wrapper), it calls rt.toHeaders() and places the langsmith-trace and baggage headers; otherwise it silently returns. The resulting carrier is merged back into the request’s Headers. On the Python side, parse_inbound(headers) in _tracing.py reads those same two LangSmith-family headers to either continue the caller’s run tree or self-root. The response carries x-langsmith-run-id, x-langsmith-run-url, and x-trace-id, which the client then writes into the active OTel span as attributes like langsmith.run_id.

The design deliberately uses the SDK’s runs.wait instead of RemoteGraph.invoke() because the backend (the Render FastAPI app at backend/app.py) exposes only the synchronous endpoint POST /runs/wait and a sibling /runs/stream, not the full Agent Protocol that RemoteGraph expects (SSE-based threads, streaming, HITL). Using runs.wait gives a flat final state, which is exactly the contract the backend implements. The onRequest hook is the idiomatic way to attach per-request context to a reused client. Building a fresh client per call would be wasteful, and setting headers once at construction would freeze the trace context at client creation rather than at fetch time, losing the active OTel/LangSmith scope. The maxRetries: 0 and maxConcurrency: MAX_CONCURRENCY settings are explicit trade-offs: retry would re-fire expensive graph calls that have no idempotency key, and the SDK’s default concurrency of 4 would throttle the whole process. The LangSmith injection is explicitly “best-effort” because the Next.js app is OTel-first and only opens a LangSmith run tree when a route handler calls wrapOpenAI or traceable. In the common case, with no active LangSmith run, the worker self-roots, which the team considers acceptable.

A concrete failure mode arises when getCurrentRunTree(true) returns undefined because no traceable() context is open, which is the default for most request handlers. In that scenario, injectLangSmithHeaders exits early, so no langsmith-trace or baggage headers are set. The worker receives only W3C headers and must self‑root its LangSmith run tree, meaning the LangSmith UI shows the worker’s tree as a standalone run rather than nested under a Next.js parent span. Another edge case: if a load balancer or proxy strips the traceparent header en route, the OTel span tree collapses into flat orphans because the worker can no longer connect its span to the caller. This is documented in the failure‑mode catalog as “Two separate trace IDs for one user action.” The code handles this gracefully — the worker’s parse_inbound simply returns None when no LangSmith headers exist — but the operator must check the inbound headers on the worker to diagnose breakage. The Authorization header, if misconfigured or leaked, is also a risk: the backend compares it directly against its own LANGGRAPH_AUTH_TOKEN var, so a mismatch yields a 401 that the caller’s LangGraphError catch block must handle.

The langgraph-client initialization in index.ts configures each outbound call with the dispatcher URL, a Bearer secret, and per-request injection of W3C trace-context and LangSmith headers via buildTraceCarrier.

typescript
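import { Client } from "@langchain/langgraph-sdk";
import { context, propagation } from "@opentelemetry/api";

// LANGGRAPH_DISPATCHER_URL, LANGGRAPH_DISPATCHER_SECRET and
// injectLangSmithHeaders are defined elsewhere in this module.
const langgraphClient = new Client({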
  apiUrl: LANGGRAPH_DISPATCHER_URL,
  apiKey: null,
  defaultHeaders: LANGGRAPH_DISPATCHER_SECRET
    ? { Authorization: `Bearer ${LANGGRAPH_DISPATCHER_SECRET}` }
    : {},
  onRequest: (_url, init) => {
    const carrier = buildTraceCarrier();
    if (Object.keys(carrier).length === 0) return init;
    const headers = new Headers(init.headers);
    for (const [k, v] of Object.entries(carrier)) headers.set(k, v);
    return { ...init, headers };
  },
});

function buildTraceCarrier(): Record<string, string> {
  const carrier: Record<string, string> = {};
  propagation.inject(context.active(), carrier);
  injectLangSmithHeaders(carrier);
  return carrier;
}

The routing contract

In plain terms. Imagine the system as a mailroom. When a package arrives, the dispatcher checks each department’s list of accepted items and a valid room number. The first department that both accepts the item and has a working address gets the package, along with its security credentials. If no department matches, the package automatically goes to the default central office. Without this logic, packages would end up in the wrong department or get lost—the dispatcher ensures every request reaches exactly the right team, every time.

System design. The dispatcher's route_for function in route_for.py implements a simple ordered-allowlist match against a list of WorkerRoute entries. Each WorkerRoute carries a prefix string (e.g. "CLASSIFY" or "DISCOVERY"), a nullable url, a nullable secret (the bearer token for that sub-worker), and a frozenset of assistants identifiers. The function iterates through the list in order; if the route's url is set (the check is a truthiness test, so an empty string counts as unset) and the incoming assistant_id is present in that route's assistants, it returns a Decision with that route's url and secret, plus the route's prefix as a label. A miss on every route (either because url is unset or the assistant_id is not in the allowlist) falls through to a default Decision whose prefix is "CORE" and whose url and token come from the caller-supplied default_url and default_token. The routes are built by build_routes, which reads per-worker CSV strings from environment variables (with hardcoded defaults like DEFAULT_CLASSIFY_WORKER_ASSISTANTS and DEFAULT_DISCOVERY_WORKER_ASSISTANTS) and parses them into frozenset via _parse_assistants. This is a pure-Python, side-effect-free module deliberately designed to be tested with pytest without any infrastructure.

The design trades off generality for operational simplicity and testability. Because each assistant_id lives in exactly one allowlist (enforced by construction and documented in the source: “no overlap by construction”), the ordering of WorkerRoute entries matters only for explainability — the first match is always the only possible match. This eliminates any risk of ambiguous routing. The allowlists are plain comma‑separated strings read from environment variables, meaning a deploy‑time override (e.g. Wrangler vars) can ramp or rollback a single sub‑worker without touching the code module — a lightweight canary mechanism (“manual per‑assistant ramp by swapping *_WORKER_URL env vars”) as noted in observability.md. The code is deliberately “pure: no httpx, no env, no Workers globals” (from the module docstring), ensuring the routing logic is fully deterministic and unit‑testable in a standard Python environment, identical to the test harness used by the predecessor TS implementation.
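The CSV-to-allowlist step and the env-driven ramp can be sketched as below. The names parse_assistants and build_routes_from, and the *_WORKER_SECRET and *_WORKER_ASSISTANTS keys, are assumptions for illustration; only the *_WORKER_URL pattern and the frozenset result are stated in the source. The mapping is passed in by the caller so the function itself stays free of environment access.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Mapping, Optional


@dataclass(frozen=True)
class WorkerRoute:
    prefix: str
    url: Optional[str]
    secret: Optional[str]
    assistants: FrozenSet[str]


def parse_assistants(csv: str) -> FrozenSet[str]:
    """Split a comma-separated allowlist, dropping blanks and surrounding spaces."""
    return frozenset(part.strip() for part in csv.split(",") if part.strip())


def build_routes_from(config: Mapping[str, str], prefixes: List[str]) -> List[WorkerRoute]:
    """Build one route per prefix from an env-like mapping (hypothetical key names)."""
    return [
        WorkerRoute(
            prefix=p,
            # Treat a missing or empty value as "route disabled".
            url=config.get(f"{p}_WORKER_URL") or None,
            secret=config.get(f"{p}_WORKER_SECRET") or None,
            assistants=parse_assistants(config.get(f"{p}_WORKER_ASSISTANTS", "")),
        )
        for p in prefixes
    ]


routes = build_routes_from(
    {
        "CLASSIFY_WORKER_URL": "https://classify.example",
        "CLASSIFY_WORKER_ASSISTANTS": "classify_paper, classify_recruitment,",
    },
    ["CLASSIFY", "DISCOVERY"],
)
```

Rolling a pool back then amounts to clearing its URL value in the deploy-time configuration: the route stays in the list but can no longer match.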

The explicit alternative that this design replaced was the TS‑side routeFor in apps/agentic-sales/src/lib/langgraph-client.ts, which lived inside the Vercel–deployed frontend and was unreachable by non‑JS callers (e.g. the ai-engineer-roadmap CF Worker, bricks, or future Rust binaries). By lifting the routing matrix into a pure‑Python module in the dispatcher worker, those non‑TS callers can now forward /runs/wait requests to the correct sub‑worker pool using the same route_for contract, without importing TypeScript. Another implicit alternative — a single monolithic worker that runs all graphs — is rejected because the blast radius of a noisy graph is contained to its own pool (the data‑plane design from mission.md). The allowlist approach enforces that boundary per assistant_id.

A concrete failure mode arises when a misconfigured environment sets a sub‑worker’s url to an empty string or None while its allowlist still contains active assistants. In that case the route is effectively dead — its url is falsy, so route_for skips it entirely, and those assistants fall through to the default "CORE" route. This could silently shift traffic to the wrong worker, potentially overloading the default container and breaking performance SLAs. Another edge case: if an assistant_id were accidentally added to two allowlists (violating the “no overlap” rule), the first route in the list would win and the second would be shadowed — there is no warning or validation at runtime. The source explicitly notes that the ordering matters only for explainability because of the no‑overlap invariant, so such a mistake would be invisible until a deploy‑time audit.
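Since neither problem is flagged at runtime, a deploy-time audit could look roughly like this. The function audit_routes and its message wording are hypothetical, and the WorkerRoute shape is copied from the fields described above.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class WorkerRoute:
    prefix: str
    url: Optional[str]
    secret: Optional[str]
    assistants: FrozenSet[str]


def audit_routes(routes: List[WorkerRoute]) -> List[str]:
    """Report dead routes and assistants that appear in more than one allowlist."""
    findings: List[str] = []
    first_owner: Dict[str, str] = {}
    for route in routes:
        if route.assistants and not route.url:
            findings.append(
                f"{route.prefix}: allowlist is set but url is empty, "
                "so its assistants fall through to CORE"
            )
        for assistant_id in sorted(route.assistants):
            # Remember the first pool that claimed each assistant_id.
            owner = first_owner.setdefault(assistant_id, route.prefix)
            if owner != route.prefix:
                findings.append(
                    f"{assistant_id}: listed under both {owner} and {route.prefix}"
                )
    return findings


bad_routes = [
    WorkerRoute("CLASSIFY", "https://classify.example", "s1", frozenset({"classify_paper"})),
    WorkerRoute("DISCOVERY", "", None, frozenset({"classify_paper", "contact_discovery"})),
]
findings = audit_routes(bad_routes)
```

For bad_routes the audit returns two findings: the DISCOVERY route has an allowlist but no URL, and classify_paper is claimed by both pools.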

The route_for function implements the routing contract: it walks ordered sub-worker routes and returns the first matching Decision, or falls back to a CORE decision.

python
def route_for(
    assistant_id: str,
    *,
    default_url: str,
    default_token: str | None,
    routes: list[WorkerRoute],
) -> Decision:
    """Pick the downstream for a /runs/wait dispatch."""
    for r in routes:
        if r.url and assistant_id in r.assistants:
            return Decision(url=r.url, token=r.secret, prefix=r.prefix)
    return Decision(url=default_url, token=default_token, prefix="CORE")

Worker pools and blast radius

In plain terms. Think of it like separate checkout lanes in a supermarket: the CLASSIFY and DISCOVERY lanes are dedicated for specific types of orders—quick classifications or complex research tasks. Every graph has exactly one lane it belongs to. If a discovery graph gets stuck or noisy (say, a customer with a price check), it only backs up its own lane; the main CORE lane keeps moving for everything else, like groceries for routine shoppers. Without this separation, one jammed graph would block the entire store—every graph, every request. The pools contain the blast radius to just that lane, not the whole platform.

System design. The per-capability sub-worker pools (CLASSIFY and DISCOVERY alongside the CORE default) are implemented through a pure routing matrix in route_for.py. The function build_routes constructs a list of WorkerRoute objects, each holding a prefix label, an optional url and secret, and a frozenset[str] of allowed assistant_id values. The core dispatch function route_for(assistant_id, *, default_url, default_token, routes) iterates over these routes: if a route has a non-empty url and the assistant_id is present in that route’s assistants frozenset, it returns a Decision with that route’s URL, token, and prefix. If no route matches, the default container (CORE) is used. The allowlists are defined as module-level constants DEFAULT_CLASSIFY_WORKER_ASSISTANTS and DEFAULT_DISCOVERY_WORKER_ASSISTANTS, strings that mirror the TS WORKER_ROUTES array and are parsed by _parse_assistants into frozensets. By construction, each assistant_id lives in exactly one allowlist, so routing is deterministic and no assistant can match two sub-workers.

  • Concrete parts: WorkerRoute, Decision, build_routes, route_for, DEFAULT_CLASSIFY_WORKER_ASSISTANTS, DEFAULT_DISCOVERY_WORKER_ASSISTANTS.
  • Control flow: dispatcher calls route_for → iterates routes → first url + assistant_id in r.assistants wins → else CORE.

The trade-off is blast radius containment versus operational complexity. By isolating heavy or noisy graphs (e.g., contact_discovery under DISCOVERY, classify_paper under CLASSIFY) into dedicated worker pools, a resource spike or crash inside one pool cannot jam the entire platform—the outage stays at the pool level. This is explicitly called out in mission.md: "A noisy graph stays inside its own pool. The blast radius is at the pool level, not the platform." The downside is that operators must manage separate services, deployment pipelines, and scaling policies per pool. The system mitigates this by making the routing table entirely env-var driven (*_WORKER_URL, *_SECRET, *_assistants), so a sub-worker can be ramped, rolled back, or disabled without touching code—a manual form of the canary-and-ramp pattern noted as not yet automated in observability.md.

The rejected alternative is the monolithic default: all assistants handled by one CORE container with no sub-worker routing. That path still exists in the code as route_for's fallback, taken when no sub-worker URL is set or when the assistant_id is in no allowlist. The three-plane constitution (control, data, observability) deliberately chose to decompose the data plane into separate pools rather than letting every graph share one process. The alternative would simplify deployment but would allow a single noisy graph (e.g., a wide-network discovery) to degrade every request, including lightweight classifiers.

A concrete failure mode is silent routing to a dead sub-worker. The route_for logic inspects only the url field (presence check via if r.url and ...)—it does not probe liveness or health. If a sub-worker’s URL is configured but the service is down, the dispatcher still sends runs.wait requests there, and they fail. The CORE default would have handled those graphs had the sub-worker been unset. Another edge case: the _parse_assistants function returns an empty frozenset if the comma-separated string is empty; in that scenario no assistant_id can match that route, so all such assistants silently fall through to CORE—a safe degradation but potentially surprising to an operator expecting isolation. The system has no built-in circuit breaker or automatic re-route; the observability plane (run trees, LangSmith traces) is the only way to detect the misrouting post hoc.
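The fall-through cases above can be reproduced with a small table. The route_for body matches the function quoted on this page; the Decision dataclass is inferred from the Decision(url=..., token=..., prefix=...) call and is an assumption, as are the example URLs and the pool_of helper.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class WorkerRoute:
    prefix: str
    url: Optional[str]
    secret: Optional[str]
    assistants: FrozenSet[str]


@dataclass(frozen=True)
class Decision:
    # Shape inferred from how route_for constructs it; an assumption.
    url: str
    token: Optional[str]
    prefix: str


def route_for(
    assistant_id: str,
    *,
    default_url: str,
    default_token: Optional[str],
    routes: List[WorkerRoute],
) -> Decision:
    """Pick the downstream for a /runs/wait dispatch."""
    for r in routes:
        if r.url and assistant_id in r.assistants:
            return Decision(url=r.url, token=r.secret, prefix=r.prefix)
    return Decision(url=default_url, token=default_token, prefix="CORE")


ROUTES = [
    WorkerRoute("CLASSIFY", "https://classify.example", "s1", frozenset({"classify_paper"})),
    # URL unset: the allowlist is ignored and traffic falls through to CORE.
    WorkerRoute("DISCOVERY", None, None, frozenset({"contact_discovery"})),
]


def pool_of(assistant_id: str) -> str:
    """Hypothetical helper: return only the pool label for an assistant_id."""
    decision = route_for(
        assistant_id,
        default_url="https://core.example",
        default_token="core-secret",
        routes=ROUTES,
    )
    return decision.prefix
```

With this table, classify_paper resolves to CLASSIFY, while contact_discovery and any id that appears in no allowlist both resolve to CORE. Nothing in the decision says whether the chosen URL is actually reachable.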

Per-capability sub-worker pools are defined as allowlists of assistant IDs, and route_for sends each assistant to exactly one pool, which contains the blast radius to that pool.

python
DEFAULT_CLASSIFY_WORKER_ASSISTANTS = (
    "classify_paper,classify_recruitment,…,agentic_rag"
)
DEFAULT_DISCOVERY_WORKER_ASSISTANTS = (
    "consultancies_discovery,…,sales_tech_outreach"
)

@dataclass(frozen=True)
class WorkerRoute:
    prefix: str
    url: str | None
    secret: str | None
    assistants: frozenset[str]

def route_for(assistant_id: str, *, default_url: str, default_token: str | None, routes: list[WorkerRoute]) -> Decision:
    """Pick the downstream for a /runs/wait dispatch."""
    for r in routes:
        if r.url and assistant_id in r.assistants:
            return Decision(url=r.url, token=r.secret, prefix=r.prefix)
    return Decision(url=default_url, token=default_token, prefix="CORE")

See also