LangGraph Primer
🕸️ The runtime under every workflow: graphs, nodes, edges, and state, in plain language first. Every section leads with the simple take and the real code; the dense system-design detail is tucked behind a “deep dive” toggle.
Key terms, in plain words
Graph — the map of one whole job — every step and the arrows between them.
Node — one step on that map; it does a single thing (load data, check safety, write a draft).
Edge — an arrow saying which step runs next. A conditional edge looks at the notepad first and picks a path.
State — the shared notepad every step reads and writes; it carries everything learned so far.
Checkpointer — saves the notepad after every step, so a crashed run resumes where it stopped instead of starting over.
Registry — the menu of all graphs — each has one stable name (its assistant_id) that callers order by.
Dispatcher & worker pools — the mailroom: it reads the graph's name and sends the run to the right team of machines, so one noisy team can't take down the others.
The three planes — control = naming graphs (the registry), data = routing runs to pools (the dispatcher), observability = one trace that follows a run across languages.
LangGraph is the runtime under every workflow on the agentic-sales platform. A workflow is a graph: small steps (nodes) wired together by edges, all passing a single shared state object as they run. Compile that graph with a checkpointer and it becomes durable — it can pause, survive a crash, and resume exactly where it stopped.
This page is a primer, built bottom-up: first the LangGraph primitives themselves — graphs, nodes, edges, state, conditional routing, durable execution — then how the platform wraps them into the three planes: the control plane (graph identity in the registry), the data plane (routing onto per-capability worker pools), and a quick bridge to the observability plane. For the registry and routing in their own right, see the Agents & Workflows reference; for the run tree, the observability deep-dive.
Each piece below leads with a plain-language ELI5, then the system-design detail, then the real code it comes from. Every part is generated by LlamaIndex, grounded in the actual source — a concrete graph, the checkpointer, the registry, the dispatcher, and the typed client — not paraphrased from memory.
Explain it like I'm 5
Think of it as a recipe written as a small map instead of a long list. Each major step is its own station—a node—where a specific task happens. The arrows between them are edges that control the order, so the work can branch, pause, or skip depending on what’s needed. Everyone shares a single notepad—the state—that gets handed along, so no step loses track of what’s already done. This lets sales workflows handle interruptions, retry a failed step, or add new capabilities without rewriting everything. A long script would be brittle: pause it and you lose your place. A graph keeps the process flexible and recoverable.
In depth, piece by piece
Each piece below: the plain-language take, the system-design detail, then the real code it comes from.
What a graph is
In plain terms. Imagine you're building a flowchart for an automated email follow-up. You start with a blank canvas, then add each step as a box: first gather contact info, then load the full email history, then run a safety check, then decide on a follow-up point, then compose the email. You connect these boxes in order—start at the first box, then go to the next, and so on until the end. Finally, you "compile" that flowchart into an executable program that can actually run the process step by step. Without this structured assembly, the steps would be scattered and you couldn't guarantee they run in the right sequence.
The build_graph function defines the full graph structure in five moves. It creates a StateGraph over EmailFollowupState, adds five nodes via add_node, and wires sequential edges from START through hydrate → load_full_history → safety_gate. It then adds two conditional edges: one out of safety_gate that either skips to END or continues to derive_followup_point, and one out of derive_followup_point that either skips to END or goes on to compose. Finally, it compiles the builder into a runnable graph, passing the checkpointer through.
from typing import Any

from langgraph.graph import END, START, StateGraph


def build_graph(checkpointer: Any = None) -> Any:
    builder = StateGraph(EmailFollowupState)

    # Nodes: one step each. The two LLM-backed steps carry a retry policy.
    builder.add_node("hydrate", hydrate)
    builder.add_node("load_full_history", load_full_history)
    builder.add_node("safety_gate", safety_gate)
    builder.add_node("derive_followup_point", derive_followup_point, retry_policy=_LLM_RETRY)
    builder.add_node("compose", compose, retry_policy=_LLM_RETRY)

    # Plain edges: a fixed order for the first three steps.
    builder.add_edge(START, "hydrate")
    builder.add_edge("hydrate", "load_full_history")
    builder.add_edge("load_full_history", "safety_gate")

    # Conditional edges: the routing function's return value picks the path.
    builder.add_conditional_edges(
        "safety_gate", _route_after_gate, {"skip": END, "continue": "derive_followup_point"}
    )
    builder.add_conditional_edges(
        "derive_followup_point", _route_after_anchor, {"skip": END, "compose": "compose"}
    )
    builder.add_edge("compose", END)

    return builder.compile(checkpointer=checkpointer)
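As a rough mental model of what a compiled graph does at run time, the sketch below walks plain edges from a start marker to an end marker in ordinary Python, with no LangGraph dependency. The run function, the edge table, and the two stub nodes are illustrative stand-ins chosen for this example; they are not LangGraph internals and not the platform's real node bodies.

```python
from __future__ import annotations

from typing import Any, Callable

State = dict[str, Any]
Node = Callable[[State], State]

# Stand-in markers for this sketch only.
START, END = "__start__", "__end__"


def run(nodes: dict[str, Node], edges: dict[str, str], state: State) -> State:
    """Follow plain edges from START to END, merging each node's partial update."""
    current = edges[START]
    while current != END:
        update = nodes[current](state)   # a node returns only the keys it changed
        state = {**state, **update}      # the update is merged into the shared state
        current = edges[current]         # a plain edge always names one next node
    return state


def hydrate(state: State) -> State:
    # Hypothetical stub: pretend to load the contact record.
    return {"contact": {"id": state["contact_id"]}}


def load_full_history(state: State) -> State:
    # Hypothetical stub: pretend to load prior email threads.
    return {"prior_threads": ["thread-1", "thread-2"]}


nodes = {"hydrate": hydrate, "load_full_history": load_full_history}
edges = {START: "hydrate", "hydrate": "load_full_history", "load_full_history": END}

final_state = run(nodes, edges, {"contact_id": 42})
print(final_state)
```

The point of the sketch is the shape: nodes never call each other, they only return updates, and the edge table alone decides what runs next.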
State, the graph's shared notepad
In plain terms. Imagine a shared whiteboard that holds everything the workflow knows—contact info, email history, and a skip reason. Each step in the process takes the current whiteboard, reads what it needs, and writes its own piece (like "here’s the follow-up anchor") back to the same board. The board automatically merges all updates. This single board (rather than passing many separate notes) lets each step work without knowing what the other steps expect—they just agree on the board's layout. Without it, you’d need to hand off dozens of loose papers, and one missing note would break the whole handover.
Each node receives the typed EmailFollowupState and returns a dict of partial updates that are merged back — this is how the graph's shared notepad flows between nodes.
async def derive_followup_point(state: EmailFollowupState) -> dict:
    threads = state.get("prior_threads") or []
    point, tel = await _derive_anchor(
        threads=threads,
        history_total=int(state.get("history_total") or len(threads)),
        sent_count=int(state.get("sent_count") or 0),
        instructions=state.get("instructions") or "",
    )
    out: dict[str, Any] = {
        "followup_point": point,
        "prompt_version": PROMPT_VERSION,
        "model": deepseek_model_name("standard"),
        "graph_meta": {...},
    }
    if not point["should_follow_up"]:
        out["skip_reason"] = "not_worth_following_up"
    return out
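The merge step can be modeled in a few lines. In LangGraph, a returned key overwrites the existing value unless the state schema attaches a reducer to that key, in which case the reducer combines old and new. The sketch below imitates that rule in plain Python. The apply_update helper and the notes key are hypothetical, and the excerpt does not show whether EmailFollowupState declares any reducers.

```python
from __future__ import annotations

import operator
from typing import Any, Callable

Reducer = Callable[[Any, Any], Any]


def apply_update(
    state: dict[str, Any],
    update: dict[str, Any],
    reducers: dict[str, Reducer],
) -> dict[str, Any]:
    """Merge a node's partial update into the state without mutating the input."""
    merged = dict(state)
    for key, value in update.items():
        if key in reducers and key in merged:
            # A reducer combines the old and new values (here: list concatenation).
            merged[key] = reducers[key](merged[key], value)
        else:
            # Default rule: the new value replaces the old one.
            merged[key] = value
    return merged


state = {"sent_count": 1, "notes": ["hydrated"]}
update = {"skip_reason": "not_worth_following_up", "notes": ["anchor derived"]}

merged = apply_update(state, update, reducers={"notes": operator.add})
print(merged)
```

Keys the node did not return, such as sent_count here, pass through untouched, which is why a node only needs to know the parts of the state it reads and writes.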
Branching with conditional edges
In plain terms. Think of the graph like an assembly line with decision gates. A plain edge is a fixed conveyor belt: after one step, the next is always the same. A conditional edge is a quality-check station: it inspects the current state—say, a safety check that flags a problem—and then either sends the work forward to continue or diverts it to a skip bin to stop early. Without these explicit decision points, you'd hide those choices inside the steps themselves as if‑statements, making the flow harder to inspect and debug. The graph forces every fork to be visible, so you can see exactly where and why a run stopped short.
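A conditional edge calls a routing function on the current state and maps that function's return value to the next node (or to END), which keeps control flow explicit in the graph rather than buried inside a node.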
def build_graph(checkpointer: Any = None) -> Any:
    builder = StateGraph(EmailFollowupState)
    builder.add_node("hydrate", hydrate)
    builder.add_node("load_full_history", load_full_history)
    builder.add_node("safety_gate", safety_gate)
    builder.add_node("derive_followup_point", derive_followup_point)
    builder.add_node("compose", compose)
    builder.add_edge(START, "hydrate")
    builder.add_edge("hydrate", "load_full_history")
    builder.add_edge("load_full_history", "safety_gate")
    builder.add_conditional_edges(
        "safety_gate", _route_after_gate, {"skip": END, "continue": "derive_followup_point"}
    )
    builder.add_conditional_edges(
        "derive_followup_point", _route_after_anchor, {"skip": END, "compose": "compose"}
    )
    builder.add_edge("compose", END)
    return builder.compile(checkpointer=checkpointer)
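The excerpt above does not show the bodies of _route_after_gate or _route_after_anchor. A minimal sketch of what such a routing function might look like follows, assuming the gate signals a stop by setting skip_reason in the state (as derive_followup_point does). The function body, the next_node helper, and the sample skip reason are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any, Callable

END = "__end__"  # stand-in marker for this sketch

State = dict[str, Any]


def route_after_gate(state: State) -> str:
    """Hypothetical router: return a label, not a node name."""
    return "skip" if state.get("skip_reason") else "continue"


def next_node(state: State, router: Callable[[State], str], path_map: dict[str, str]) -> str:
    """Resolve the router's label through the path map, as a conditional edge does."""
    return path_map[router(state)]


path_map = {"skip": END, "continue": "derive_followup_point"}

blocked = next_node({"skip_reason": "unsubscribed"}, route_after_gate, path_map)
allowed = next_node({"prior_threads": []}, route_after_gate, path_map)
print(blocked, allowed)
```

Keeping the router's labels separate from node names means the path map is the only place that has to change if a branch is rewired.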
Durable execution and resume
In plain terms. Think of a graph like a multi-step recipe where each step saves a photo of the finished dish. If the power goes out mid-recipe, you can pick up from the last saved photo instead of starting over. That’s what the checkpointer does: after each step completes, it stores that step’s result in a durable database under a unique thread label. When the same thread resumes, it loads the last saved step and continues from there. Without this, any crash or long pause would force the entire workflow to restart from scratch, wasting time and resources.
Durable execution is enabled by the D1 checkpointer, which writes checkpoint state after each node—only for graphs marked resumable=True—so a re-invocation with the same thread resumes from the last completed node.
# ----------------------
# core, research, ml, scrape and outreach each opened their own AsyncConnectionPool
# + AsyncPostgresSaver against Neon — ~35 lines of near-identical pool plumbing.
# AsyncCloudflareD1Saver talks to D1 over the REST API: stateless, no pool, no
# idle-connection failure mode.
# The cost is per-write HTTP latency, acceptable because only GraphSpec.resumable
# graphs are wired with a checkpointer (the non-resumable ones still get None —
# that gating, enforced in each app's _compile_one, is the defense against the
# checkpoint-table storage blowups that previously hit the Neon cap).
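To make the resume behavior concrete, here is a toy model in plain Python: a store keyed by thread id records the state and the index of the next step after every completed step, and a second invocation with the same thread id picks up from there. The InMemoryCheckpoints class and run_thread function are hypothetical and much simpler than AsyncCloudflareD1Saver; passing None for the store imitates a non-resumable graph.

```python
from __future__ import annotations

from typing import Any, Callable, Optional

State = dict[str, Any]
Step = tuple[str, Callable[[State], State]]


class InMemoryCheckpoints:
    """Toy store: thread_id -> (index of next step, state so far)."""

    def __init__(self) -> None:
        self._rows: dict[str, tuple[int, State]] = {}

    def load(self, thread_id: str) -> tuple[int, State]:
        return self._rows.get(thread_id, (0, {}))

    def save(self, thread_id: str, next_index: int, state: State) -> None:
        self._rows[thread_id] = (next_index, dict(state))


def run_thread(
    steps: list[Step],
    thread_id: str,
    initial: State,
    checkpoints: Optional[InMemoryCheckpoints] = None,
) -> State:
    index, state = 0, dict(initial)
    if checkpoints is not None:
        saved_index, saved_state = checkpoints.load(thread_id)
        if saved_index:
            index, state = saved_index, saved_state  # resume after the last completed step
    for i in range(index, len(steps)):
        _name, fn = steps[i]
        state = {**state, **fn(state)}
        if checkpoints is not None:
            checkpoints.save(thread_id, i + 1, state)  # one write per completed step
    return state


calls: list[str] = []
attempts = {"compose": 0}


def hydrate(state: State) -> State:
    calls.append("hydrate")
    return {"contact": "ada"}


def compose(state: State) -> State:
    calls.append("compose")
    attempts["compose"] += 1
    if attempts["compose"] == 1:
        raise RuntimeError("simulated crash")
    return {"draft": f"Hi {state['contact']}"}


steps: list[Step] = [("hydrate", hydrate), ("compose", compose)]
store = InMemoryCheckpoints()

try:
    run_thread(steps, "thread-1", {}, store)
except RuntimeError:
    pass  # the first run dies inside compose, after hydrate was checkpointed

result = run_thread(steps, "thread-1", {}, store)
print(calls, result)
```

On the second call, hydrate is not executed again; only the step that failed is retried. The one-write-per-step cost is also visible here, which is the trade-off the comment above accepts for resumable graphs only.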
Graph identity — the registry (control plane)
In plain terms. Think of the GRAPHS registry like a restaurant’s menu board that lists every dish by its public name (the assistant_id) and where in the kitchen it’s made. Adding a new dish just means writing one row on that board. The board itself is kept lightweight—no heavy cooking gear or database connections on it—so it loads instantly. A built-in check at the start rejects any duplicate dish names, preventing chaos like two different meals having the same name. Without this single source of truth, cooks would have to search the kitchen by ingredient instead of ordering by name, and duplicate names would send customers the wrong meal.
The excerpt shows four things: the registry's GRAPHS list with one row per graph, the frozen GraphSpec dataclass that maps an assistant_id to a module path, the import-time guard that rejects duplicate ids, and the design rule that keeps the module cheap to load by importing no heavy dependencies.
"""Single source of truth for the agentic-sales LangGraph registry.

Both runtimes (the local ``langgraph dev`` server on :8002 and the FastAPI/
Cloudflare Containers app at ``core/app.py``) read graph identity from this
file. … Keep this module dependency-free — it must import nothing from
``agentic_sales.*_graph`` at module top level …"""

from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GraphSpec:
    assistant_id: str  # public id used in /runs/wait, langgraph.json, TS client
    module: str  # dotted import path, e.g. "graphs.email_compose_graph"
    compiled_attr: str = "graph"
    builder_attr: str | None = "build_graph"
    resumable: bool = False


GRAPHS = [
    GraphSpec("sales_tech_feature_graph", "graphs.sales_tech_feature_graph"),
    GraphSpec("extract_stack", "graphs.employer_intel_graph",
              compiled_attr="extract_stack_graph",
              builder_attr="build_extract_stack_graph"),
    # … many more rows, one per graph …
]

assert len({g.assistant_id for g in GRAPHS}) == len(GRAPHS), (
    "duplicate assistant_id in GRAPHS"
)
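One way a runtime might turn a registry row into a runnable graph is to import the module lazily and read the attribute the row names. The sketch below is an assumption about how such a loader could work, not the platform's _compile_one: the load_graph function, the demo rows, and the in-memory demo_graph module are all hypothetical, and the rule "resumable rows go through the builder with a checkpointer, the rest use the precompiled attribute" is a guess consistent with the gating described earlier.

```python
from __future__ import annotations

import importlib
import sys
import types
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class GraphSpec:
    assistant_id: str
    module: str
    compiled_attr: str = "graph"
    builder_attr: str | None = "build_graph"
    resumable: bool = False


def load_graph(spec: GraphSpec, checkpointer: Any = None) -> Any:
    """Hypothetical loader: import on demand so the registry itself stays cheap."""
    module = importlib.import_module(spec.module)
    if spec.resumable and spec.builder_attr:
        # Only resumable graphs are rebuilt with a checkpointer.
        return getattr(module, spec.builder_attr)(checkpointer)
    return getattr(module, spec.compiled_attr)


# Fake module standing in for a real graph module, so the sketch runs on its own.
fake = types.ModuleType("demo_graph")
fake.graph = "compiled-without-checkpointer"
fake.build_graph = lambda checkpointer=None: f"compiled-with-{checkpointer}"
sys.modules["demo_graph"] = fake

GRAPHS = [
    GraphSpec("demo_plain", "demo_graph"),
    GraphSpec("demo_resumable", "demo_graph", resumable=True),
]
by_id = {g.assistant_id: g for g in GRAPHS}

plain = load_graph(by_id["demo_plain"], checkpointer="d1")
resumable = load_graph(by_id["demo_resumable"], checkpointer="d1")
print(plain, resumable)
```

Because the heavy import happens inside load_graph, listing or validating the registry never pays for loading any graph module.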
Routing across worker pools (data plane)
In plain terms. Imagine the dispatcher as a mailroom that sorts every incoming package (a workflow request) by its label—the assistant id. Each label is checked against a small directory: if it matches a specialized department like "CLASSIFY" or "DISCOVERY," the mailroom sends the package to that department’s own building (its sub-worker URL) with the correct security badge (bearer token). If the label isn’t in any department’s list, the package goes to the main office (the default container). This separation means if one department makes a mess—say a noisy, slow graph—the spill stays inside that department’s walls, not the whole office.
The route_for function selects a sub-worker from an ordered allowlist for a given assistant_id, falling back to the default container when no match exists — containing a noisy graph's blast radius to its own pool.
def route_for(
    assistant_id: str,
    *,
    default_url: str,
    default_token: str | None,
    routes: list[WorkerRoute],
) -> Decision:
    """Pick the downstream for a /runs/wait dispatch.

    Mirrors langgraph-client.ts:104–114 — first sub-worker whose URL is set AND
    whose allowlist contains ``assistant_id`` wins; otherwise the default
    (container) route applies.
    """
    for r in routes:
        if r.url and assistant_id in r.assistants:
            return Decision(url=r.url, token=r.secret, prefix=r.prefix)
    return Decision(url=default_url, token=default_token, prefix="CORE")
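The three outcomes of route_for (allowlist hit, allowlisted but URL unset, no match) can be exercised with a small self-contained example. The WorkerRoute and Decision dataclasses below are inferred from the fields the function reads and are assumptions about their shape; the URLs, secrets, and assistant ids are made up for the demonstration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerRoute:  # assumed shape, inferred from the fields route_for uses
    prefix: str
    url: str | None
    secret: str | None
    assistants: frozenset[str]


@dataclass(frozen=True)
class Decision:  # assumed shape
    url: str
    token: str | None
    prefix: str


def route_for(
    assistant_id: str,
    *,
    default_url: str,
    default_token: str | None,
    routes: list[WorkerRoute],
) -> Decision:
    for r in routes:
        if r.url and assistant_id in r.assistants:
            return Decision(url=r.url, token=r.secret, prefix=r.prefix)
    return Decision(url=default_url, token=default_token, prefix="CORE")


routes = [
    WorkerRoute("CLASSIFY", "https://classify.example.internal", "classify-secret",
                frozenset({"classify_lead"})),
    # URL deliberately unset: this pool is configured but not deployed.
    WorkerRoute("DISCOVERY", None, "discovery-secret", frozenset({"discover_accounts"})),
]
defaults = {"default_url": "https://core.example.internal", "default_token": "core-secret"}

hit = route_for("classify_lead", routes=routes, **defaults)
undeployed = route_for("discover_accounts", routes=routes, **defaults)
unknown = route_for("email_followup", routes=routes, **defaults)
print(hit.prefix, undeployed.prefix, unknown.prefix)
```

The middle case is the one worth noticing: an allowlisted graph whose pool has no URL falls back to the default container instead of failing, so a pool can be switched off by clearing its URL.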
Invoking a graph across the hop
In plain terms. The typed TypeScript client works like mailing a package with a custom form: it takes the assistant's ID (a label for which graph to run) and the input data, then sends both to the dispatcher along with a shared secret (like a doorman's badge) to prove it's allowed. Before sending, it also stamps the package with two sets of trace headers: W3C trace-context (like a global tracking number) and LangSmith headers (like a note saying “this package belongs to order #123”). These let the Python worker's run nest as a child step inside the caller's larger trace. Without those headers, the worker's run would appear as an orphan, separate and unconnected, which makes debugging across the hop much harder.
The LangGraph Client is created with a bearer secret and an onRequest hook that injects W3C trace-context and LangSmith headers into every outbound request, including the runs.wait call.
const langgraphClient = new Client({
  apiUrl: LANGGRAPH_DISPATCHER_URL,
  apiKey: null,
  defaultHeaders: LANGGRAPH_DISPATCHER_SECRET
    ? { Authorization: `Bearer ${LANGGRAPH_DISPATCHER_SECRET}` }
    : {},
  callerOptions: { maxRetries: 0, maxConcurrency: MAX_CONCURRENCY },
  onRequest: (_url, init) => {
    const carrier = buildTraceCarrier();
    if (Object.keys(carrier).length === 0) return init;
    const headers = new Headers(init.headers);
    for (const [k, v] of Object.entries(carrier)) headers.set(k, v);
    return { ...init, headers };
  },
});
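The body of buildTraceCarrier is not shown above. As a sketch of the W3C half of that carrier, written in Python to match the other examples on this page, the code below builds a traceparent header in the format the W3C Trace Context specification defines (version, 32-hex trace id, 16-hex parent span id, 2-hex flags) and merges it into outbound headers only when there is something to send, mirroring the early return in the onRequest hook. The function names and the way ids are obtained are assumptions; the LangSmith headers are left out because their construction is not shown in the source.

```python
from __future__ import annotations

import re
import secrets

TRACEPARENT_RE = re.compile(r"^00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$")


def build_trace_carrier(
    trace_id: str | None,
    span_id: str | None,
    sampled: bool = True,
) -> dict[str, str]:
    """Return trace headers for the active span, or an empty dict when there is none."""
    if not trace_id or not span_id:
        return {}
    flags = "01" if sampled else "00"
    return {"traceparent": f"00-{trace_id}-{span_id}-{flags}"}


def with_trace_headers(headers: dict[str, str], carrier: dict[str, str]) -> dict[str, str]:
    """Merge carrier entries into a copy of the headers; pass through when empty."""
    if not carrier:
        return headers
    return {**headers, **carrier}


base = {"Authorization": "Bearer example-secret"}  # placeholder secret

# Random ids stand in for the caller's real active trace and span.
carrier = build_trace_carrier(secrets.token_hex(16), secrets.token_hex(8))
traced = with_trace_headers(base, carrier)
untraced = with_trace_headers(base, build_trace_carrier(None, None))
print(traced["traceparent"])
```

On the worker side, a tracer that understands this header can attach its spans to the caller's trace id, which is what turns two separate traces into one tree.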
See also
- Agents & Workflows — graph identity and the routing contract (control + data planes) in depth.
- Observability deep-dive — the distributed run tree that ties a single user action together across the hop.