01. What This Is
This platform lets one shared set of infrastructure run many independent sales workflows at once. Instead of standing up a fresh service for every new behavior, you ship two small things: a graph and a prompt. So why build it this way? The answer is cost and safety. A control plane owns every workflow identity and the single routing contract that all requests must pass through. Because that plane carries no large language model and no database, it loads in milliseconds and builds its registry without compiling dozens of heavy graph modules. The real work lives in a data plane that sorts jobs into separate worker pools. A noisy or broken graph stays trapped inside its own pool, so the blast radius of a failure ends there rather than spreading across the platform. The trade-off is honest. You give up the simplicity of one giant program, and you accept a strict rule that nothing may bypass the contract. In return, adding a new sales behavior becomes a tiny change instead of a whole new deployment, and a single misbehaving graph can never take everything down.
02. The System Design View
The system rests on three planes that work together, and each one answers a different question. The control plane answers a question of identity. It owns the workflow name and the routing contract. The data plane answers a question of placement. It sorts work into pools such as email, classify, and discovery. The observability plane answers what happened. It links every step into one trace, even as a request hops from the TypeScript client to the Python worker. Why three planes and not something simpler? Two easy choices both fail. One big program would let a single bad graph drag down every workflow. One service per workflow would pile on cost and overhead until it broke the budget. This design sits in the middle. What decides the shape is the cost of a failure, not the lines of code you save. The control plane carries no model and no database, so it loads fast, and it stays fast by banning heavy module-level imports inside graph modules. The data plane keeps each pool walled off, so harm stays local. The observability plane builds one run tree per request, so one click becomes one thing you can debug.
03. What A Workflow Is
A workflow is, at heart, just a single stable name. That name maps to exactly one graph definition and one expected shape of input. Nothing else in the system may invent its own entry point, and every other component refers back to this identity. The control plane owns the naming contract, and it stays deliberately lightweight. With no large language model and no database behind it, it can build its registry without compiling dozens of graph modules. Why does so much weight rest on something so small? Because a fixed identity is what makes the whole platform inspectable. Open any trace and you can see precisely which graph produced it and with which inputs. That turns debugging and cost tracking from guesswork into a simple lookup. The trade-off is discipline. You must enforce the contract strictly and forbid every side door, because one ad-hoc entry point would quietly break the guarantee. The payoff is clarity that compounds. A tiny, unchanging name ends up carrying the integrity of the entire architecture on its back.
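As a minimal sketch of the idea above, a workflow identity can be modeled as an immutable record: one stable name, one graph reference, one expected input shape. Everything here is an assumption made for illustration, including the `WorkflowSpec` name, its fields, the example workflow name, and the choice to describe the input shape as a set of required keys.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class WorkflowSpec:
    """Hypothetical record: one stable name, one graph, one input shape."""

    name: str                        # the stable identity every component refers to
    graph_path: str                  # "module:attribute" string, not an imported object
    required_inputs: frozenset[str]  # assumed way to describe the expected input shape


def missing_inputs(spec: WorkflowSpec, payload: Mapping[str, Any]) -> list[str]:
    """Return the required keys that the payload does not supply, sorted."""
    return sorted(spec.required_inputs - set(payload))


# Illustrative values only; the real workflow names are not given in this document.
email = WorkflowSpec(
    name="email_compose",
    graph_path="graphs.email:build_graph",
    required_inputs=frozenset({"lead_id", "tone"}),
)
```

Because the record is frozen, the name cannot be reassigned after creation, which mirrors the requirement that identity stay fixed. Calling `missing_inputs(email, {"lead_id": 7})` reports the absent `tone` key before any graph runs.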
04. The Graph Registry
The control plane keeps a registry that lists every workflow in one place, where each entry is a plain record rather than a live object. This list is built to be cheap. It pulls in no model, no database, and no heavy graph code at import time. Reading it never forces the system to compile dozens of modules. The registry is generated from a simple data file, which keeps startup fast and the source of truth flat and reviewable. But cheapness has a price, and the price is strictness. When you strip away the machinery, correctness has to come from rules instead. The sharpest rule guards names. If two workflows ever claim the same name, the registry does not quietly let one shadow the other. It fails loudly the moment the list is read, surfacing the clash before any request can be misrouted. That early, noisy failure is the feature, not a bug. It forces graph identity to stay unique and stable, and it protects the routing contract that every call depends on. A second rule bans module-level imports inside graph submodules, so the cheap registry never accidentally drags in the expensive world.
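The two rules above, loud failure on duplicate names and no heavy imports at read time, can be sketched roughly as follows. This assumes the data file is JSON and that each entry stores its graph as a `"module:attribute"` string. The file format, the field names, and the `DuplicateWorkflowError` class are all hypothetical.

```python
import importlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    """Plain record; holds a path string rather than a compiled graph."""

    name: str
    graph_path: str  # assumed "module:attribute" convention
    pool: str


class DuplicateWorkflowError(ValueError):
    """Raised when two entries claim the same workflow name."""


def load_registry(raw_json: str) -> dict[str, RegistryEntry]:
    """Build the registry from flat data, failing on the first name clash."""
    registry: dict[str, RegistryEntry] = {}
    for row in json.loads(raw_json):
        entry = RegistryEntry(**row)
        if entry.name in registry:
            # Fail loudly instead of letting one entry shadow the other.
            raise DuplicateWorkflowError(f"duplicate workflow name: {entry.name!r}")
        registry[entry.name] = entry
    return registry


def resolve_graph(entry: RegistryEntry):
    """Import the graph module only when a worker actually needs it."""
    module_name, _, attribute = entry.graph_path.partition(":")
    return getattr(importlib.import_module(module_name), attribute)
```

Loading the registry touches only `json` and the dataclass, so reading it stays cheap. The import cost is paid inside `resolve_graph`, which in this sketch would be called by a worker rather than by the control plane.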
05. Invoking A Workflow
A caller never talks to a graph directly. It goes through a typed client, names the specific workflow it wants, hands over an input, and waits for a single flat result. Every workflow is reached this same uniform way. Adding a new one needs no fresh wiring; you simply mirror the pattern that the email migration already established. The client does quiet but crucial extra work along the way. It attaches tracing spans and propagation headers and carries them across the wire to the Python worker. That is what lets one user action surface as a single distributed trace spanning both TypeScript and Python. Here is the catch, and it is the whole point of the rule. If a teammate bypasses this canonical function and fires a raw request instead, the headers vanish and the trace silently breaks in half. So the trade-off is a small loss of freedom for a large gain in visibility. Always route through the one function, accept that the pattern is shaped by convention rather than enforced by a compiler, and every invocation stays observable from request to final answer.
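The role of the canonical function can be illustrated with a small sketch. The real client is TypeScript; this Python version only shows the shape of the idea. It assumes W3C `traceparent`-style headers, which this document does not confirm, and it takes the network call as an injected `transport` callable so the example runs without a server. The `invoke_workflow` name and its parameters are hypothetical.

```python
import secrets
from typing import Any, Callable, Optional

# A transport takes (workflow name, headers, payload) and returns a flat result.
Transport = Callable[[str, dict[str, str], dict[str, Any]], dict[str, Any]]


def invoke_workflow(
    name: str,
    payload: dict[str, Any],
    transport: Transport,
    trace_id: Optional[str] = None,
) -> dict[str, Any]:
    """Single entry point: always attaches trace headers before sending."""
    trace_id = trace_id or secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)                # 16 hex characters
    headers = {"traceparent": f"00-{trace_id}-{span_id}-01"}
    return transport(name, headers, payload)


def echo_transport(name: str, headers: dict[str, str], payload: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for the worker: reports what arrived over the wire."""
    return {"workflow": name, "trace_header": headers.get("traceparent"), "input": payload}
```

A raw request that skips `invoke_workflow` would reach the worker with no `traceparent` header, which is the broken-trace case described above. Nothing in this sketch stops a caller from doing that, matching the point that the rule is held by convention.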
06. The Routing Contract
Exactly one rule decides how an incoming request turns into a running workflow. That rule lives in the cheap control plane, the layer that carries no model and no database. The rule resolves three things at once: which graph to run, which worker pool should run it, and what inputs to pass along. Every request flows through this single contract. None are exempt. It can look like a bottleneck, and in a sense it is one on purpose. By funneling everything through one place, the system can promise that a failure stays boxed inside its pool. A runaway discovery workflow cannot bleed into the email workflows, because routing already separated them. The trade-off is real and worth naming. A single routing point adds a little coordination overhead. It is also one more thing that must stay correct. In exchange, it stops a single misbehaving graph from taking down the entire platform, and it keeps each pool independent and decoupled. That one contract is the spine of the design. It is what makes the architecture inspectable and its failure modes nameable, because no request is ever allowed to sneak around it.
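A rough sketch of such a contract follows: one function that resolves the graph, the pool, and the inputs in a single step, and rejects anything it cannot route. The table contents, the `Route` and `RoutingError` names, and the required-keys check are illustrative assumptions, not the platform's actual definitions.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Route:
    graph_path: str              # which graph to run
    pool: str                    # which worker pool runs it
    inputs: Mapping[str, Any]    # what inputs to pass along


class RoutingError(LookupError):
    """Raised when a request cannot be resolved by the contract."""


# Hypothetical routing table; real names and paths are not given in this document.
ROUTES: dict[str, dict[str, Any]] = {
    "email_compose": {"graph_path": "graphs.email:build", "pool": "email", "required": {"lead_id"}},
    "country_classify": {"graph_path": "graphs.classify:build", "pool": "classify", "required": {"text"}},
    "lead_discovery": {"graph_path": "graphs.discovery:build", "pool": "discovery", "required": {"query"}},
}


def resolve(name: str, payload: Mapping[str, Any]) -> Route:
    """The single rule: every request is resolved here or rejected."""
    try:
        spec = ROUTES[name]
    except KeyError:
        raise RoutingError(f"unknown workflow: {name!r}") from None
    missing = spec["required"] - set(payload)
    if missing:
        raise RoutingError(f"{name}: missing inputs {sorted(missing)}")
    return Route(spec["graph_path"], spec["pool"], dict(payload))
```

Since the pool is chosen inside `resolve`, a discovery request is assigned to the discovery pool before any work starts. In this sketch that assignment is what keeps it away from the email pool.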
07. Worker Pools And Blast Radius
The data plane splits its work into separate pools, with one pool for classification, one for email composition, and one for discovery. Each pool runs its own worker, with its own environment and its own project inside the tracing system. A noisy or broken workflow stays trapped where it started and cannot spill sideways into its neighbors. Crucially, the pools are also tuned independently. A high-priority assistant such as email composition can keep full tracing turned on. A bulk assistant such as country classification samples only a fraction of its runs, holding down cost and noise. That independence is exactly what bounds the damage from a mistake. If someone points a pool at the wrong trace project, only that one pool goes dark, and every other pool keeps serving traffic. So the trade-off lands in your favor. You maintain several sets of worker files and environment variables instead of one, and in return no single failure can reach across the platform. The blast radius of any problem ends at the pool that caused it, which is the whole reason the work was divided this way.
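Per-pool tuning might look something like the sketch below, where each pool carries its own trace project and sampling rate. The project names, the `0.1` sampling rate, and the `PoolConfig` and `should_trace` names are invented for illustration. The source says only that bulk pools sample "a fraction" of their runs.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PoolConfig:
    name: str
    trace_project: str        # each pool reports to its own tracing project
    trace_sample_rate: float  # 1.0 traces every run, 0.0 traces none


# Illustrative settings only.
POOLS: dict[str, PoolConfig] = {
    "email": PoolConfig("email", "traces-email", 1.0),            # high priority: full tracing
    "classify": PoolConfig("classify", "traces-classify", 0.1),   # bulk: sampled
    "discovery": PoolConfig("discovery", "traces-discovery", 1.0),
}


def should_trace(pool: str, rng: Callable[[], float] = random.random) -> bool:
    """Decide whether this run is traced, using only the pool's own setting."""
    return rng() < POOLS[pool].trace_sample_rate
```

Each decision reads a single `PoolConfig`, so a wrong `trace_project` or sampling rate in one entry changes behavior for that pool alone. The `rng` parameter exists so the decision can be tested deterministically.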