LangGraph deep dive

Agents with LangGraph

An agent is a language model with tools in a loop: the model looks at the goal, picks a tool, reads the result, and decides the next step itself — the model owns the control flow, not your code. LangGraph is the runtime that makes that loop production-grade: an explicit graph you can checkpoint, stream, and inspect.

This page covers the fundamentals — what an agent is, the agent loop, tools, state and short-term memory, and the graph underneath. When you're ready for long-running autonomy — durable execution, human-in-the-loop guardrails, long-term memory, multi-agent systems — continue to Autonomous agents. For the full LangGraph deep-dive and audio guide, see LangGraph.

Each piece below leads with a plain-language ELI5, then the system-design detail, then real example code. Every part is generated by LlamaIndex, grounded in the official LangGraph and LangChain agents documentation — not paraphrased from memory.

Explain it like I'm 5

Think of a regular program like a fixed recipe: you follow each step exactly once, from start to finish. An AI agent is more like a chef who cooks by tasting as they go. The chef looks at the goal—say, a perfectly seasoned soup—then picks a tool (salt shaker), adds a pinch, tastes the result, and decides: “more salt” or “that’s good.” This loop repeats until the soup is right. The agent works the same way: the language model picks a tool, sees what came back, and decides what to do next on its own. Without this loop, the model would follow a fixed script and never adjust. Giving it tools lets it actually change the world—like adding salt—instead of just talking about it.

The system-design view

The agent architecture described in langgraph-workflows-agents.md is built manually on top of a StateGraph with MessagesState. The concrete mechanism is a three‑element loop:

  • llm_call node – a function that invokes a model with tools bound (llm_with_tools). It receives the current MessagesState (a list of messages) and returns an LLM response that may contain tool_calls.
  • ToolNode node – a prebuilt node that receives tool_calls from the last message and executes the corresponding tools in parallel, injecting their results back into the state.
  • Conditional edge should_continue – a function that inspects state["messages"][-1].tool_calls. If the list is non‑empty it routes to "tool_node"; otherwise it routes to END.

The fixed edge "tool_node" → "llm_call" closes the loop. Execution follows LangGraph’s message‑passing super‑step model: each node activation corresponds to a super‑step, and nodes that run in parallel (e.g., tool executions) belong to the same super‑step. The state accumulates monotonically via add_messages. Tools are bound to the model before the graph runs; the model decides which tools to call based on its prompt and the conversation history.
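The three-element loop above can be sketched without any framework. The following is a minimal, library-free illustration, not LangGraph's implementation: messages are plain dicts, fake_llm_call is a scripted stand-in for a model with tools bound, and the string "END" stands in for LangGraph's END constant. All of these names are assumptions made for the example.

```python
from typing import Callable

# Hypothetical stand-ins: plain dicts instead of LangChain message objects.
Message = dict
State = dict  # shape: {"messages": list[Message]}


def multiply(a: int, b: int) -> int:
    """Toy tool used by the scripted model below."""
    return a * b


TOOLS: dict[str, Callable[..., int]] = {"multiply": multiply}


def fake_llm_call(state: State) -> dict:
    """Scripted 'model': request one tool call, then answer from its result."""
    last = state["messages"][-1]
    if last["role"] == "tool":
        answer = f"The answer is {last['content']}."
        return {"messages": [{"role": "ai", "content": answer, "tool_calls": []}]}
    call = {"id": "call_1", "name": "multiply", "args": {"a": 6, "b": 7}}
    return {"messages": [{"role": "ai", "content": "", "tool_calls": [call]}]}


def tool_node(state: State) -> dict:
    """Execute every tool call on the last message and return tool messages."""
    results = []
    for call in state["messages"][-1]["tool_calls"]:
        output = TOOLS[call["name"]](**call["args"])
        results.append({"role": "tool", "content": str(output), "tool_call_id": call["id"]})
    return {"messages": results}


def should_continue(state: State) -> str:
    """Route to the tool node while the last message still requests tools."""
    return "tool_node" if state["messages"][-1].get("tool_calls") else "END"


def run(state: State) -> State:
    node = "llm_call"
    while node != "END":
        update = fake_llm_call(state) if node == "llm_call" else tool_node(state)
        # Append-only merge, mimicking what an add_messages-style reducer does.
        state = {"messages": state["messages"] + update["messages"]}
        # Conditional edge after the model; fixed edge back after the tools.
        node = should_continue(state) if node == "llm_call" else "llm_call"
    return state


final = run({"messages": [{"role": "user", "content": "What is 6 times 7?"}]})
print([m["role"] for m in final["messages"]])
```

The run ends after two model calls: the first emits a tool call, the second sees the tool result and replies without one, which is the only thing that routes to the end state.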

The trade‑off is between a fixed workflow (developer‑defined control flow) and an agent (model‑defined control flow). The source explicitly contrasts: “Workflows have predetermined code paths … Agents are dynamic and define their own processes and tool usage.” The agent loop earns its unpredictability by letting a model handle arbitrary sequences of tool calls without the developer enumerating every branch. The costs are non‑deterministic runs, higher latency (each iteration is another sequential LLM call), and the potential for runaway loops. The same source notes that create_agent from LangChain provides a higher‑level harness for exactly this pattern, but the low‑level StateGraph version gives full control over the loop structure.

A rejected alternative implied by the source is a fixed‑workflow graph where the developer hard‑codes the order of tool calls and does not give the model a choice to stop or loop. That approach is simpler to debug and cheaper (fewer LLM calls), but it cannot adapt to unexpected user inputs or tool results. The agent pattern sacrifices deterministic execution for flexibility. The source also hints that a single agent with too many tools degrades decision quality, which is why multi‑agent patterns (routers, subagents) exist as a fallback.

Concrete failure modes and edge cases:

  • Infinite loop: If the model keeps emitting tool_calls (e.g., because the prompt never lets it terminate, or a tool always returns a value that triggers further calls), should_continue never routes to END, since it only checks for tool_calls. The code shown adds no iteration guard of its own. At runtime LangGraph’s configurable recursion_limit eventually stops the run with a GraphRecursionError, but real deployments should set that limit deliberately and handle the error rather than rely on the default.
  • Unbounded state growth: The MessagesState accumulates every message (LLM response and tool result). Over many iterations this can exceed context windows or memory budgets. The pattern lacks a compaction or windowing strategy.
  • Tool execution failures: The ToolNode handles parallel execution and error injection, but if a tool raises an unhandled exception, the node may fail and the graph must rely on LangGraph’s persistence to retry or recover.
  • Model‑defined control flow unpredictability: Because the model decides when to stop, two runs with identical inputs can produce different numbers of loop iterations, making latency and cost hard to bound.
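The runaway-loop risk in the list above is usually handled with an explicit step budget. The sketch below is a library-free illustration of that idea, assuming a hypothetical step(state) function that returns the new state and a done flag; StepLimitExceeded and run_with_budget are names invented for the example. LangGraph provides its own mechanism (recursion_limit in the run config), which this sketch does not use.

```python
class StepLimitExceeded(RuntimeError):
    """Hypothetical error raised when the loop exceeds its step budget."""


def run_with_budget(step, state, max_steps=5):
    """Call step(state) -> (state, done) until done or the budget is spent."""
    for _ in range(max_steps):
        state, done = step(state)
        if done:
            return state
    raise StepLimitExceeded(f"no final answer after {max_steps} steps")


def stuck_model(state):
    """A 'model' that always asks for another tool call and never finishes."""
    return state + ["tool_call"], False


def well_behaved_model(state):
    """A 'model' that finishes after two tool calls."""
    if state.count("tool_call") >= 2:
        return state + ["final_answer"], True
    return state + ["tool_call"], False


ok = run_with_budget(well_behaved_model, [])

try:
    run_with_budget(stuck_model, [])
    hit_limit = False
except StepLimitExceeded:
    hit_limit = True
```

Raising an error, rather than silently returning the last state, makes the budget overrun visible to the caller, who can then decide whether to retry, escalate to a human, or report a failure.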

In depth, piece by piece

Each piece below: the plain-language take, the system-design detail, then real example code from the official docs.

What an agent is

In plain terms. Imagine a chef versus a conveyor belt. A workflow is like a conveyor belt in a factory: a fixed sequence of stations, each doing one job, always in the same order. An agent is like a chef at a kitchen station who has a set of tools—knives, pans, a stove—and decides for herself which tool to use next, whether to keep cooking or stop when the dish is done. Using create_agent is like hiring that chef: you give her the tools and a starting instruction, then she chooses her own path. Without this independence, you’d have to predict every step upfront, making it impossible to handle surprises or complex tasks.

System design. In LangChain and LangGraph, an agent is defined as “a model calling tools in a loop until a given task is complete,” where the model—not the developer—chooses which tool to invoke next and when to stop. The concrete entry point is create_agent, a highly configurable harness that wraps the model with its prompt and tool set. Under the hood, create_agent instantiates a loop: the model receives the current conversation state (messages, tool results, intermediate steps), decides whether to emit a final answer or invoke a tool, and if a tool is called, the harness executes it and feeds the result back into the conversation as a new message. This cycle continues until the model produces a natural‑language response. The harness’s job is to get the model “the right context at the right time” by composing middleware—such as SummarizationMiddleware, MemoryMiddleware, and SkillsMiddleware—that manage context‑window limits, load persistent knowledge across sessions, and surface domain‑specific knowledge on demand. In contrast, a workflow (as used in LangGraph) has “predetermined code paths … designed to operate in a certain order”; the developer fixes the graph’s Nodes and Edges at build time, so control flow is deterministic rather than model‑driven.

The trade‑off is between determinism and flexibility. A workflow is predictable, easier to debug, and guarantees a fixed sequence of steps, which is ideal when the task structure is known at design time. An agent can instead adapt to open‑ended tasks, deciding its own plan as it observes intermediate results. The create_agent harness exists precisely to manage the instability that comes with that freedom: as the agent runs, the context window fills with “accumulating history, tool results, and intermediate steps,” so summarization and memory are needed to prevent overflow. The harness centralizes these concerns so the model can focus on reasoning rather than housekeeping. The rejected alternative is to force the model into a fixed workflow even when the problem space is unknown. The documentation treats a custom LangGraph graph with fixed edges as a different pattern (e.g., the “custom workflow” option in multi‑agent systems), and it also cautions in the other direction: not every complex task needs multiple agents, because a single agent with the right tools is often sufficient.

A concrete failure mode is context‑window overflow, where the accumulating history exceeds the model’s token limit. Without intervention, earlier tool calls or user instructions are silently truncated, degrading decision quality. The harness addresses this with SummarizationMiddleware, which “compresses history before overflow hits,” and MemoryMiddleware, which loads persistent instructions at startup “so knowledge carries across sessions.” Another edge case is poor tool selection: the documentation warns that a single agent with “too many tools … makes poor decisions about which to use.” This can lead to excessive or irrelevant calls, wasting tokens and latency. The harness does not solve this directly—it is a design constraint—but the multi‑agent section suggests splitting tools across subagents or using a router pattern as an alternative when the tool set is large.

Creating a simple agent with a model and tools.

python
from langchain.agents import create_agent

# `tools` is assumed to be a list of tool functions defined elsewhere.
agent = create_agent("openai:gpt-5.4", tools=tools)

Workflows versus agents

In plain terms. Think of workflows like a cooking recipe: you follow fixed steps—mixing, baking, decorating—in a set order to get a consistent cake. The developer decides every step, and the LLM is just one ingredient in that process. Agents are like a chef who sees the kitchen, picks tools, and decides the order as they go—perfect for when you don’t know the exact dish. Use workflows when you need predictable, repeatable results (like a cake recipe). Use agents when problems are open-ended and require flexible, improvised problem-solving. Without the right choice, you’d either get a chaotic kitchen or a rigid recipe that can’t handle surprises.

System design. The spectrum between workflows and agents in LangGraph is a conscious design axis: workflows enforce developer-owned control flow where LLM calls are embedded as nodes inside a StateGraph, while agents grant the LLM dynamic control over tool usage and process sequencing. In a workflow, every step—llm_call_1, llm_call_2, llm_call_3—is a fixed node; routing between them is governed by a conditional edge function like route_decision that inspects a structured decision field produced by an augmented LLM (llm.with_structured_output(Route)). For parallelization, the orchestrator node generates a list of Section objects, and assign_workers uses the Send API to dynamically spawn llm_call workers, each receiving its own WorkerState. The synthesizer node then aggregates completed_sections via operator.add into final_report. In the agent pattern, the LLM is given tools (e.g., multiply), and the should_continue edge checks last_message.tool_calls to decide whether to route to tool_node or END. The agent loops until the LLM chooses to reply—no developer‑written route_decision function prescribes the next step.
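The orchestrator-worker flow described above (plan sections at runtime, fan out one worker per section, concatenate the results) can be approximated in plain Python. This is a sketch under stated assumptions: ThreadPoolExecutor stands in for the Send-based fan-out, the orchestrator is a fixed function rather than an LLM, and the section titles are invented. Only operator.add as the list-concatenating reducer comes from the text.

```python
import operator
from concurrent.futures import ThreadPoolExecutor


def orchestrator(topic: str) -> list[str]:
    """Stand-in planner: decide the sections at runtime."""
    return [f"{topic}: background", f"{topic}: method", f"{topic}: results"]


def worker(section: str) -> dict:
    """Each worker gets its own input and returns a partial state update."""
    return {"completed_sections": [f"## {section}\n(draft text)"]}


def run_report(topic: str) -> str:
    sections = orchestrator(topic)
    with ThreadPoolExecutor() as pool:
        # pool.map preserves input order even though workers run concurrently.
        updates = list(pool.map(worker, sections))
    completed: list[str] = []
    for update in updates:
        # operator.add on lists concatenates, which is how a list reducer merges.
        completed = operator.add(completed, update["completed_sections"])
    # Synthesizer step: join the aggregated sections into one report.
    return "\n\n".join(completed)


report = run_report("LangGraph")
print(report)
```

The number of workers is decided by the orchestrator's output, not by the graph's shape, which is the difference from plain parallelization with a fixed set of branches.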

The trade‑off is predictability vs. flexibility. Workflows excel when subtasks are known and repeatable—prompt chaining, parallelization, routing, and orchestrator‑worker all produce deterministic, testable execution paths. The orchestrator pattern, for example, is favored when subtasks “cannot be predefined the way they can with parallelization,” yet still supports bounded parallelism via Send. Agents, by contrast, are designed for “open‑ended problems where solutions are unpredictable.” They trade deterministic control for autonomous decision‑making: the LLM can decide to call zero, one, or many tools, and the should_continue edge only halts when the model emits a non‑tool response. This autonomy introduces nondeterminism and makes reproducibility harder, but is necessary for tasks where the path cannot be enumerated in advance.

The source also describes a simpler alternative to multi‑agent orchestration: “a single agent with the right (sometimes dynamic) tools and prompt can often achieve similar results.” Rather than decomposing work into separate agents with their own state and context, developers are encouraged to first try a single agent with a well‑curated tool set. Similarly, the orchestrator‑worker pattern is contrasted with plain parallelization: when subtasks are predefined, the simpler parallelization pattern suffices; when they are dynamic, the orchestrator‑worker (with Send and per‑worker state) is the appropriate choice. The router pattern (Route.step: Literal["poem", "story", "joke"]) is a hard‑coded alternative to letting the agent decide freely: it forces the LLM’s output into a closed set of next steps, trading flexibility for guaranteed structure.

Concrete failure modes emerge from the design choices. In a router workflow, if the LLM fails to produce a valid Route.step (e.g., due to schema drift or malformed output), the conditional edge function has no matching branch, so the graph stalls or raises an error. In an agent loop, the should_continue edge relies entirely on the LLM’s tool_calls; an LLM that keeps calling tools (e.g., always calling multiply without ever replying) never reaches END on its own and keeps consuming tokens until a step limit or external timeout stops it. The evaluator‑optimizer workflow has the same weakness in a different place: a grader (evaluator) with a grade field decides whether to loop, so without a human‑in‑the‑loop interrupt or a maximum iteration limit the loop can also run indefinitely if the evaluator never judges the output “funny,” the case used in the source’s example. Both patterns demand careful edge‑case handling: workflows need exhaustive route branches, and agents need budgeted tool‑call limits or human oversight.
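One way to make a router robust to an invalid route is to validate the model's output against the closed set before dispatching. The sketch below is library-free and hypothetical: raw_step stands in for whatever string the structured-output call produced, the handlers are placeholder lambdas, and falling back to "story" is an arbitrary choice made for illustration (raising an error is an equally valid policy).

```python
from typing import Callable, Literal, get_args

Step = Literal["poem", "story", "joke"]
VALID_STEPS = get_args(Step)  # ("poem", "story", "joke")

HANDLERS: dict[str, Callable[[str], str]] = {
    "poem": lambda topic: f"A poem about {topic}",
    "story": lambda topic: f"A story about {topic}",
    "joke": lambda topic: f"A joke about {topic}",
}


def route_decision(raw_step: str, default: str = "story") -> str:
    """Return a valid route, falling back to a default for unknown output."""
    step = raw_step.strip().lower()
    return step if step in VALID_STEPS else default


def run_router(raw_step: str, topic: str) -> str:
    """Dispatch to the handler for the validated route."""
    return HANDLERS[route_decision(raw_step)](topic)


print(run_router("Joke ", "cats"))
print(run_router("limerick", "cats"))
```

Deriving VALID_STEPS from the Literal keeps the schema and the validation in one place, so adding a route only requires updating the type and the handler table.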

Agent with tool-calling loop and conditional routing

python
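from typing import Literal

from langgraph.graph import END, START, MessagesState, StateGraph

# llm_call (the model node) and tool_node (the tool executor) are assumed
# to be defined earlier.


def should_continue(state: MessagesState) -> Literal["tool_node", END]:
    """Route to the tool node if the last message requested tools, else stop."""
    messages = state["messages"]
    last_message = messages[-1]
    if last_message.tool_calls:
        return "tool_node"
    return END


agent_builder = StateGraph(MessagesState)
agent_builder.add_node("llm_call", llm_call)
agent_builder.add_node("tool_node", tool_node)
agent_builder.add_edge(START, "llm_call")
agent_builder.add_conditional_edges("llm_call", should_continue, ["tool_node", END])
agent_builder.add_edge("tool_node", "llm_call")  # fixed edge that closes the loop
agent = agent_builder.compile()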

The agent loop

In plain terms. Think of the agent like a helper who can either answer your question directly or use a calculator. The loop works like this: the helper reads the whole conversation history, then decides. If they say "I need to calculate something," they use the calculator tool, get the result, and then re‑read the conversation with that new information. They keep doing this until they can give you a final answer without needing any more tools. Without this loop, the helper would either never use the calculator or would use it just once and get stuck. The loop ends naturally when the helper responds directly.

System design. The core ReAct-style agent loop is implemented as a stateful graph whose control flow is driven entirely by the model’s output. The entry point is a call to create_agent, which assembles a harness comprising a model, a set of tools, and optional middleware. At each iteration the model receives the full message history (system prompt, user messages, previous tool results) and decides one of two actions:

  • If the model emits tool_calls (structured requests with a tool name and arguments), the harness routes execution to the tool engine. In LangGraph-based agents this is the ToolNode, which invokes each tool and returns the results as tool messages appended to the state.
  • If the model responds without tool_calls, the loop terminates and that response becomes the agent’s final answer.

The termination condition is explicit: a conditional edge function (should_continue in the Graph API example) inspects last_message.tool_calls. If tool calls exist, the edge directs to tool_node; otherwise it returns END. After tool_node executes, a fixed edge returns to llm_call, feeding the updated state (history plus tool results) back into the model. Compiling the graph produces a runnable agent that executes as a sequence of super‑steps: the model call runs in one super‑step, any requested tool calls execute (in parallel where there are several) in the next, and control then returns to the model.

Why this loop? The design trades deterministic control for flexibility. By letting the model autonomously decide when to call tools and which tools to call, the agent can handle open‑ended tasks without a hard‑coded workflow. The harness (model + prompt + tools + middleware) ensures the right context is always presented. Summarization middleware (SummarizationMiddleware) compresses history before the context window overflows, and skills middleware (SkillsMiddleware) injects domain knowledge on demand rather than loading everything upfront. This keeps the loop stable even as the state grows.

The rejected alternative is a workflow – a graph with predetermined code paths and a fixed execution order. As noted in langgraph-workflows-agents.md, workflows are designed to “operate in a certain order” whereas agents “define their own processes and tool usage.” Workflows are more predictable and easier to debug but cannot adapt to novel tool‑use patterns. The agent loop sacrifices that predictability for generality: the same harness can answer a simple query or orchestrate a multi‑step data‑gathering plan, all within the same loop.

Failure modes and bounds.

  • Context overflow: Each iteration appends the model’s response and any tool results to the message history. Without compression the context window fills, causing either truncation or errors. The harness addresses this via summarization middleware, but if that middleware is missing or misconfigured, the agent will fail after enough turns.
  • Indefinite looping: The only natural termination signal is the model deciding to stop calling tools. If the model gets stuck in a cycle (e.g., calling a tool that returns an error, then retrying indefinitely) the loop never reaches END on its own. The provided source does not document a harness‑level iteration cap. At the graph level, LangGraph’s recursion_limit setting bounds the number of super‑steps and raises GraphRecursionError when it is exceeded, so set it deliberately (or add a cap via middleware) rather than letting a stuck agent run until the context window fills or an external timeout kills the process.
  • Tool errors: Errors during tool execution are caught by middleware (tool error handling is “configured through middleware”) and must be surfaced as tool messages so the model can decide how to proceed. Without proper error middleware, a failing tool could produce malformed messages that break the loop.

Streaming agent loop showing model responses and tool calls

python
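from langchain_core.messages import AIMessage

# Assumes `agent` was built earlier, e.g. with create_agent(...).
for chunk in agent.stream({
    "messages": [{"role": "user", "content": "Search for AI news and summarize the findings"}]
}, stream_mode="values"):
    # With stream_mode="values", each chunk is the full state at that point.
    latest_message = chunk["messages"][-1]
    if latest_message.content:
        if isinstance(latest_message, AIMessage):
            print(f"Agent: {latest_message.content}")
    # Only AI messages carry tool_calls, so guard the attribute access.
    elif getattr(latest_message, "tool_calls", None):
        print(f"Calling tools: {[tc['name'] for tc in latest_message.tool_calls]}")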

Tools, the agent's hands

In plain terms. Think of the model as a helpful assistant and each tool as a specialist it can call. You define a specialist by giving it a clear name, a short description of what it does, and the exact kind of information it needs (like a city name or numbers). The assistant learns about all available specialists and, based on your request, decides which one to call and what details to provide. When the specialist returns an answer—or an error if something goes wrong—that result is sent back as a message the assistant can read. This way, if the specialist fails, the assistant can try again or let you know what went wrong.

System design. The core mechanism begins with the @tool decorator. When you decorate a function like search_database(query: str, limit: int = 10) -> str, LangChain extracts the docstring as the description the model sees and the type hints as the argument schema. The model receives the tool’s name, description, and argument schema as part of its system prompt or tool definitions. At inference time, the model decides to invoke the tool by emitting a structured tool call containing an id, name, and args. This call is routed to a tool executor – in LangGraph workflows, this is typically the prebuilt ToolNode, which handles parallel execution and state injection automatically. The tool’s return value becomes a ToolMessage that is fed back into the conversation so the model can reason about the result. Error handling is configured through middleware; for example, wrapping a tool call with wrap_tool_call allows you to catch exceptions and return a ToolMessage with an error string, which the model can then use to correct its next action. A concrete middleware example in the source is handle_tool_errors, which converts any Exception into a ToolMessage whose content reads "Tool error: Please check your input and try again. ({e})".
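To make the "docstring becomes the description, type hints become the schema" idea concrete, here is a small standard-library sketch of how such a description could be derived. It is not how the @tool decorator is implemented: describe_tool is a hypothetical helper, the output shape is invented, and the docstring on search_database is illustrative. It also assumes annotations are plain classes such as str or int.

```python
import inspect
from typing import get_type_hints


def describe_tool(func) -> dict:
    """Build a minimal tool description from a function's signature and docstring."""
    hints = get_type_hints(func)
    params = {}
    for name, param in inspect.signature(func).parameters.items():
        # Assumes plain class annotations (str, int, ...); defaults to str if missing.
        entry = {"type": hints.get(name, str).__name__}
        if param.default is inspect.Parameter.empty:
            entry["required"] = True
        else:
            entry["required"] = False
            entry["default"] = param.default
        params[name] = entry
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "args": params,
    }


def search_database(query: str, limit: int = 10) -> str:
    """Search the customer database for records matching the query."""
    return f"Found {limit} results for '{query}'"


schema = describe_tool(search_database)
print(schema)
```

The point of the sketch is that everything the model sees (name, description, argument types, defaults) is recoverable from the function definition itself, which is why a clear docstring and accurate type hints matter for tool selection.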

The design trades off simplicity for flexibility and robustness. By keeping tool definitions schema-only on the server and executing the actual logic either in‑process or remotely (as with headless tools), the system separates what the model reasons about (the schema) from where the work happens. This allows the same agent definition to work with server-side code, client-side browser APIs, or even human-in-the-loop steps. The middleware layer for error recovery is a direct consequence: instead of crashing the agent graph on a malformed input or transient failure, the model receives a structured error message and can retry with corrected arguments, change strategy, or explain the failure to the user. The trade-off is additional complexity in defining middleware and handling the interrupt/resume pattern for headless tools, but it avoids the brittleness of a single‑process execution model.

The primary alternative described in the source is server‑side tool use: some chat models (e.g., those with built‑in web search or code interpreters) execute tools entirely on the model provider’s side. In that pattern, you do not define or host the tool logic; the model provider runs it and returns the result. This is contrasted with the explicit, user‑defined @tool approach, which gives you full control over the implementation but requires you to manage execution and error handling. Another variant is the headless tool pattern, where only the schema is registered on the server; the implementation lives on the client (e.g., browser) and runs after an interrupt/resume handshake. Ordinary tools (the standard @tool decorator) run inside the server process, while headless tools use a payload shaped like {'type': 'tool', 'tool_call': {'id', 'name', 'args'}} to pause the graph, execute elsewhere, and resume.

A concrete failure mode occurs when the model calls a tool with invalid arguments – e.g., passing a non‑numeric string to a calculator tool. Without middleware, this would raise an unhandled exception and break the agent loop. With the handle_tool_errors middleware in place, the exception is caught and converted into a ToolMessage containing the error description. The model then sees that message and can choose to re‑call the tool with properly formatted arguments, ask the user for clarification, or fall back to a different tool. An edge case arises with headless tools: if the client never implements the tool or the interrupt/resume handshake fails, the graph remains paused indefinitely. The source notes that for browser‑based flows you can “mirror the schema in the frontend and attach .implement(...) there” to avoid this, but the Python side has no .implement() API – the tool is truly schema‑only (HeadlessTool), and the server must rely on the client to resume the graph.

Tool error handling with middleware returns a ToolMessage so the model can recover.

python
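from collections.abc import Callable

from langchain.agents import create_agent
from langchain.agents.middleware import wrap_tool_call
from langchain_core.messages import ToolMessage
# Assumption: ToolCallRequest is used only for type annotations, and its import
# path may differ between LangChain releases; adjust to your installed version.
from langchain.agents.middleware.types import ToolCallRequest


@wrap_tool_call
def handle_tool_errors(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], ToolMessage],
) -> ToolMessage:
    """Convert tool exceptions into ToolMessages the model can handle."""
    try:
        return handler(request)
    except Exception as e:
        # Return the error as a tool result so the model can react to it.
        return ToolMessage(
            content=f"Tool error: Please check your input and try again. ({e})",
            tool_call_id=request.tool_call["id"],
        )

agent = create_agent(
    model="ollama:devstral-2",
    tools=[],
    middleware=[handle_tool_errors],
)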

State and short-term memory

In plain terms. Think of each conversation as a shared notebook where every message you send and every response the agent gives gets written down. The agent uses this list of past messages to stay on topic and remember what you said earlier. To make sure that notebook doesn’t disappear when you pause the chat, you attach a checkpointer that saves it under a unique label (like a thread ID). Without that, starting a new turn would be like opening a blank notebook every time—you’d lose all context. But if the notebook grows too long, it won’t fit in the agent’s short‑term memory, so you must trim or summarize old pages to keep things working.

System design. The agent's short-term memory is implemented as a thread-scoped message-list state that accumulates every HumanMessage, AIMessage, and ToolMessage across a single conversation. The state is defined by LangGraph’s MessagesState schema, and the agent’s execution graph reads this state at the start of each step and writes to it after each invocation or tool call. Persistence is achieved by attaching a checkpointer — a BaseCheckpointSaver implementation such as InMemorySaver — to the agent at construction via the checkpointer parameter of create_agent(...). The caller then addresses a specific conversation by passing a thread_id inside the configurable dict of the invocation config, e.g., {"configurable": {"thread_id": "1"}}. The checkpointer serializes the entire state (including the message list) to a database (or in-memory store) keyed by thread ID. On every subsequent agent.invoke(...) with the same thread_id, the checkpointer reads the prior state back, so the agent “remembers” the full history without any explicit memory logic in the system prompt. This mechanism is the concrete control flow: state → checkpointer → resume.
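The state → checkpointer → resume flow can be illustrated with a dictionary keyed by thread ID. This is a library-free sketch, not InMemorySaver: DictCheckpointer, its load and save methods, and the invoke function with its canned reply are all hypothetical names chosen for the example. Only the config shape {"configurable": {"thread_id": ...}} is taken from the text.

```python
import copy


class DictCheckpointer:
    """Hypothetical in-memory saver: one saved state per thread_id."""

    def __init__(self):
        self._store = {}

    def load(self, thread_id):
        # Deep copies keep callers from mutating the saved state by accident.
        return copy.deepcopy(self._store.get(thread_id, {"messages": []}))

    def save(self, thread_id, state):
        self._store[thread_id] = copy.deepcopy(state)


def invoke(user_text, config, checkpointer=None):
    """Append a user turn and a canned reply, persisting if a checkpointer is set."""
    thread_id = config["configurable"]["thread_id"]
    state = checkpointer.load(thread_id) if checkpointer else {"messages": []}
    state["messages"].append({"role": "user", "content": user_text})
    state["messages"].append({"role": "ai", "content": "(reply)"})
    if checkpointer:
        checkpointer.save(thread_id, state)
    return state


saver = DictCheckpointer()
config = {"configurable": {"thread_id": "1"}}

invoke("hi, my name is bob", config, saver)
second = invoke("what's my name?", config, saver)   # sees the earlier turn
forgetful = invoke("what's my name?", config)        # no checkpointer: fresh state
other = invoke("hello", {"configurable": {"thread_id": "2"}}, saver)
```

The forgetful call reuses the same thread_id but has nothing to load from, which mirrors the silent misconfiguration where a checkpointer is never attached.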

The trade-off is between faithful history preservation and context-window pressure. By storing the raw message list as state, the agent never discards information automatically, which is ideal for correctness in multi-turn dialogues. However, long conversations produce a message list that can easily exceed an LLM’s context window, causing truncation, costly token usage, and degraded performance — the source explicitly warns that LLMs “get ‘distracted’ by stale or off-topic content” and suffer “slower response times and higher costs.” The design chooses to let developers explicitly manage this tension by adding middleware (e.g., trim_messages) or summarization techniques (e.g., SummarizationMiddleware) to compress the history before overflow. This keeps the core mechanism simple and general, while pushing complexity to optional, composable components.
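A simple form of the trimming mentioned above keeps the first message (often the system prompt or opening instruction) plus the most recent few. The function below is a hypothetical, library-free sketch of that policy; it is not the trim_messages middleware from the source, and it counts messages rather than tokens. Real trimming also has to keep each tool call together with its tool result, which this sketch ignores.

```python
def trim_history(messages, max_recent=4):
    """Keep the first message plus the most recent max_recent messages."""
    if len(messages) <= max_recent + 1:
        return list(messages)
    return [messages[0]] + list(messages[-max_recent:])


history = [f"m{i}" for i in range(10)]
trimmed = trim_history(history)
short = trim_history(["m0", "m1"])
print(trimmed)
```

Summarization is the alternative when the dropped middle of the conversation still matters: instead of discarding those messages, they are replaced with a shorter summary message.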

The key rejected alternative is long-term memory (as implemented via LangGraph store). Long-term memory stores structured JSON documents in custom namespaces (e.g., (user_id, "memories")) and is explicitly scoped across threads, not within a single conversation. Using long-term memory for conversation history would require the agent to explicitly decide what to save, lose the sequential ordering inherent in a message list, and break the automatic resumption of a thread — the agent would have to look up past messages on every turn. The source draws a clear distinction: short-term memory is for “remembering previous interactions within a single thread,” while long-term memory is for “information across different conversations and sessions.” This separation of concerns keeps the chat loop simple and the memory system predictable.

Concrete failure modes arise from unbounded growth of the message list. If the total token count exceeds the model’s context window, the API may silently truncate the prompt (dropping the oldest messages) or raise an error, causing either context loss or outright failure. Even without hard truncation, as the message list grows, the LLM’s attention becomes diluted: the source states that “most LLMs still perform poorly over long contexts … they get ‘distracted’ by stale or off-topic content.” Another edge case occurs when the checkpointer is not configured at all — for example, if checkpointer is omitted from create_agent(...), then every invocation starts with an empty state, and the agent has no memory between turns even if the same thread_id is reused. This is a silent misconfiguration that can be hard to debug because the agent will respond as if the conversation is brand new.

Agent remembers user's name across conversation turns using a checkpointer and thread_id, while trimming messages to stay within the context window.

python
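from langchain.agents import create_agent
from langgraph.checkpoint.memory import InMemorySaver

# trim_messages is assumed to be a custom middleware defined elsewhere (for
# example, a hook that drops old messages before each model call); it is not
# defined in this snippet.
agent = create_agent(
    "gpt-5-nano",
    tools=[],
    middleware=[trim_messages],
    checkpointer=InMemorySaver()
)

# Reusing the same thread_id is what lets the checkpointer restore the history.
config = {"configurable": {"thread_id": "1"}}

agent.invoke({"messages": "hi, my name is bob"}, config)
agent.invoke({"messages": "write a short poem about cats"}, config)
agent.invoke({"messages": "what's my name?"}, config)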

Under the hood, a graph

In plain terms. Think of your agent as a factory assembly line. Each workstation (a step like calling a model or using a tool) is a node, and the product moving between them is a shared state—like a clipboard that workers update. Conveyor belts (edges) move the product from one station to the next. Some belts have sensors (conditional edges) that decide which station to go to based on the clipboard’s current info. Compiling the graph is like turning on the whole line and checking every connection is correct. If you instead used a hidden loop—like a single worker blindly repeating the same tasks—you couldn’t inspect each station, pause for human review, or save progress. The explicit graph makes everything visible, debuggable, and controllable.

System design. When you build an agent with LangGraph, what actually runs is a compiled StateGraph — an explicit directed graph where every decision and action is a named node connected by typed edges. The graph is constructed by adding nodes (functions that receive the shared state and return updates) and edges (normal or conditional). For a typical tool‑calling agent, you would do:

  • agent_builder = StateGraph(MessagesState) — the state schema is a TypedDict (e.g., MessagesState) that all nodes read and write.
  • agent_builder.add_node("llm_call", llm_call) — the LLM invocation node.
  • agent_builder.add_node("tool_node", tool_node) — the tool execution node.
  • agent_builder.add_edge(START, "llm_call") — entry point.
  • agent_builder.add_conditional_edges("llm_call", should_continue, ["tool_node", END]) — a conditional edge function (should_continue) inspects the last message’s .tool_calls and returns either "tool_node" or END.
  • agent_builder.add_edge("tool_node", "llm_call") — loop back to the LLM.
  • agent = agent_builder.compile() — produces a runnable object that executes the graph.

Under the hood, LangGraph’s execution engine uses message‑passing and discrete super‑steps, inspired by Google’s Pregel. A node becomes active upon receiving a message (state) on an incoming edge; it runs its function and sends updated state along outgoing edges. All nodes that can run in parallel execute in the same super‑step, while sequential nodes belong to successive super‑steps. The graph is a first‑class object: you can inspect it with agent.get_graph(xray=True).draw_mermaid_png(), persist its state through failures, stream intermediate updates (e.g., stream_mode='updates' to see per‑node outputs), attach caching to a node (cache_policy=CachePolicy(ttl=3)), and pause at an interrupt and later resume with Command(resume=...). This explicit structure is the mechanism that replaces a hidden while‑loop.
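The super-step model can be illustrated with a toy runner. This is a heavily simplified sketch, not LangGraph's engine: run_supersteps is a hypothetical function, edges are plain functions returning the next node names, every state key is assumed to hold a list merged by concatenation, and there are no channels, checkpoints, or real parallelism. What it does show is that nodes active in the same step read the same snapshot, and their updates are merged only after the step completes.

```python
def run_supersteps(nodes, edges, state, start, max_steps=10):
    """Run active nodes step by step; return the final state and a trace."""
    state = dict(state)
    active, trace = [start], []
    for _ in range(max_steps):
        if not active:
            break
        trace.append(sorted(active))
        snapshot = dict(state)
        # Every active node reads the same snapshot within a super-step.
        updates = [nodes[name](snapshot) for name in active]
        for update in updates:
            for key, value in update.items():
                state[key] = state.get(key, []) + value  # list-append reducer
        # Collect the next active set from each node's outgoing edges.
        next_active = []
        for name in active:
            for target in edges.get(name, lambda s: [])(state):
                if target not in next_active:
                    next_active.append(target)
        active = next_active
    return state, trace


nodes = {
    "plan": lambda s: {"log": ["plan"]},
    "a": lambda s: {"log": ["a"]},
    "b": lambda s: {"log": ["b"]},
    "join": lambda s: {"log": [f"join saw {len(s['log'])} entries"]},
}
edges = {
    "plan": lambda s: ["a", "b"],  # fan out to two nodes
    "a": lambda s: ["join"],
    "b": lambda s: ["join"],
}

final_state, trace = run_supersteps(nodes, edges, {"log": []}, "plan")
print(trace)
```

Here a and b share one super-step because both become active at the same time, while join runs in the following step and sees both of their updates.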

Building the agent as an explicit graph rather than an opaque loop is a deliberate trade-off in favor of inspectability, checkpointability, and streaming. An implicit while-loop, even if well structured, gives the runtime nothing to serialize mid-execution, nothing to visualize, no defined point at which to pause for human review, and no per-node boundary from which to stream partial results. The graph abstraction makes each of those properties straightforward: the framework knows exactly which node is running, what state it consumed, and what edges are possible next. Because nodes and routing functions are ordinary functions, you can add logging, caching, or conditional branching without modifying the core loop. The cost is that you must explicitly declare every node and edge, which is more verbose than a simple loop, but the benefit is that the runtime can offer durable execution (resume from a checkpoint), human-in-the-loop (inspect and modify state via Command), and full observability with LangSmith.
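The checkpointing argument can be sketched in a few lines of plain Python. This is a toy runner written for illustration, not LangGraph's checkpointer: `run`, `resume`, `NODES`, and `NEXT` are hypothetical names, edges are static, and checkpoints are JSON strings in a list rather than a real store.

```python
import json

END = "__end__"  # local stand-in for the graph's end sentinel


def draft(state):
    return {"text": state["text"] + " -> drafted"}


def review(state):
    return {"text": state["text"] + " -> reviewed"}


NODES = {"draft": draft, "review": review}
NEXT = {"draft": "review", "review": END}  # static edges only, for brevity


def run(state, node, checkpoints, pause_before=None):
    """Run node by node, saving a JSON checkpoint at every node boundary."""
    while node != END:
        checkpoints.append(json.dumps({"next": node, "state": state}))
        if node == pause_before:
            return None  # paused: the last checkpoint records where to resume
        state = {**state, **NODES[node](state)}
        node = NEXT[node]
    return state


def resume(checkpoints):
    """Continue from the most recent checkpoint."""
    saved = json.loads(checkpoints[-1])
    return run(saved["state"], saved["next"], checkpoints)


store = []
paused = run({"text": "idea"}, "draft", store, pause_before="review")
final = resume(store)
print(paused)  # None
print(final)   # {'text': 'idea -> drafted -> reviewed'}
```

The runner can pause and resume only because each step has a name and a well-defined input state, which is the property the explicit graph provides.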

The rejected alternative is a conventional while loop that calls the LLM, checks for tool calls, and iterates. That pattern is simple to write, but interrupting and resuming it, or streaming from it, means re-architecting the entire loop. LangGraph’s documentation explicitly positions the graph as a “general program” that can express arbitrary control flow, including patterns like map-reduce (via the Send API) or parallel fan-out (multiple outgoing edges). A while-loop does not naturally represent concurrent paths or a fan-out whose width is only known at runtime. Moreover, a while-loop hides the state transitions; with the graph, every transition is an edge that can be logged, cached, or subject to conditional logic. The graph is also the foundation for the higher-level “Deep Agents” harness, which adds middleware like SummarizationMiddleware and SkillsMiddleware, features that would be extremely hard to bolt onto a hidden loop.
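For comparison, this is roughly what the rejected loop looks like. It is a self-contained sketch with a hypothetical stub model (`fake_model`) and a one-entry tool table, so nothing here calls a real LLM or any LangGraph API.

```python
def fake_model(messages):
    """Hypothetical stub: request one tool call, then answer using its result."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"role": "assistant", "content": "",
                "tool_calls": [{"name": "add", "args": {"a": 2, "b": 3}}]}
    return {"role": "assistant",
            "content": f"The sum is {tool_results[-1]['content']}.",
            "tool_calls": []}


TOOLS = {"add": lambda a, b: a + b}


def run_agent_loop(question, max_turns=5):
    """Call the model, run any requested tools, and repeat until it answers."""
    messages = [{"role": "user", "content": question}]
    turns = 0
    while turns < max_turns:  # all control flow is buried inside this loop
        turns += 1
        reply = fake_model(messages)
        messages.append(reply)
        if not reply["tool_calls"]:
            return reply["content"], messages
        for call in reply["tool_calls"]:
            result = TOOLS[call["name"]](**call["args"])
            messages.append({"role": "tool", "content": str(result)})
    raise RuntimeError("agent did not finish within max_turns")


answer, transcript = run_agent_loop("What is 2 + 3?")
print(answer)  # The sum is 5.
```

The loop works, but there is no named step an outside caller could pause before, checkpoint after, or subscribe to; adding any of those means rewriting the function body.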

A concrete failure mode arises when the graph state schema is inconsistent across nodes. Because each node returns a dictionary of updates that are merged into the shared state, a node returning a key that doesn’t exist in the TypedDict (or returning the wrong type) can silently fail or corrupt state. For example, in the caching example, the second invoke returns __metadata__: {'cached': True}; if a node accidentally overwrites that metadata key, caching behavior may break. Another edge case surfaces with Send: if the conditional edge function continue_to_jokes returns an empty list of Send objects, no downstream node is scheduled, possibly leaving the graph in an incomplete state. Finally, when using CachePolicy(ttl=3), the cached result is returned without re-running the node; if the node’s computation depends on side effects (e.g., an external API call), the stale cached result can lead to incorrect application state. These failure modes are manageable because the graph’s explicit structure makes them inspectable and testable, unlike a hidden while-loop where such bugs stay buried inside the loop body.
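One way to catch the schema mismatch described above is a small test-time guard that compares a node's update against the state schema. The helper below is a hypothetical sketch (`check_update` and `AgentState` are names invented here, not LangGraph APIs), and it only type-checks fields annotated with plain classes such as `list` or `str`.

```python
from typing import TypedDict, get_origin, get_type_hints


class AgentState(TypedDict):
    messages: list
    topic: str


def check_update(schema, update: dict) -> list:
    """Return a list of problems found in a node's update dict."""
    hints = get_type_hints(schema)
    problems = []
    for key, value in update.items():
        if key not in hints:
            problems.append(f"unknown key: {key!r}")
            continue
        expected = hints[key]
        # Only plain classes are checked; parameterized generics are skipped.
        is_plain_class = isinstance(expected, type) and get_origin(expected) is None
        if is_plain_class and not isinstance(value, expected):
            problems.append(
                f"{key!r} expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems


print(check_update(AgentState, {"messages": ["hi"]}))  # []
print(check_update(AgentState, {"mesages": ["hi"]}))   # ["unknown key: 'mesages'"]
print(check_update(AgentState, {"topic": 42}))         # ["'topic' expected str, got int"]
```

Calling a guard like this from unit tests for each node turns a silent runtime mismatch into an explicit test failure.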

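Building an agent as a StateGraph with a node, a conditional edge, and compile. This example also uses the RemainingSteps managed value to stop before the recursion limit is reached.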

python
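from typing import Annotated, Literal, TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.managed import RemainingSteps

class State(TypedDict):
    # Reducer: each node's returned list is appended to the existing messages.
    messages: Annotated[list, lambda x, y: x + y]
    # Managed value supplied by the runtime: steps left before the recursion limit.
    remaining_steps: RemainingSteps

def agent_with_monitoring(state: State) -> dict:
    """The single node: report progress, or wrap up when the step budget is nearly spent."""
    remaining = state["remaining_steps"]
    if remaining <= 2:
        return {"messages": ["Approaching limit, returning partial result"]}
    return {"messages": [f"Processing... ({remaining} steps remaining)"]}

def route_decision(state: State) -> Literal["agent", END]:
    """Conditional edge: stop near the limit, otherwise loop back into the agent node."""
    if state["remaining_steps"] <= 2:
        return END
    return "agent"

builder = StateGraph(State)
builder.add_node("agent", agent_with_monitoring)
builder.add_edge(START, "agent")                         # entry point
builder.add_conditional_edges("agent", route_decision)   # loop or finish
graph = builder.compile()                                # validate and produce a runnable graph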

Watching it run, streaming

In plain terms. Think of an agent like a chef preparing a multi-course meal. Without streaming, you'd wait silently until everything is done, not knowing if they're chopping, cooking, or stuck. Streaming modes let you watch in real time: the "updates" mode shows you each cooking step as it finishes, while the "messages" mode spills out the chef's thoughts word by word. Custom progress events let the chef announce special milestones. This matters because an agent can make many model calls and tool uses—streaming keeps you in the loop, builds trust, and lets you spot problems early instead of staring at a blank screen until the final dish arrives.

System design. LangGraph surfaces an agent’s progress through the stream_mode parameter on agent.stream(). The most granular option uses stream_mode=["updates", "messages"] with version="v2", which emits typed chunks of two kinds. A chunk with type == "updates" carries the output of the node that just finished, keyed by node name in chunk["data"] (which also includes an "__interrupt__" key when a human-in-the-loop interrupt fires). A chunk with type == "messages" delivers a tuple (token, metadata) representing a single LLM token; the documented example prints it only when token.content is non-empty. This mechanism exposes both node-level state transitions and token-by-token LLM output in a single stream, so the consumer does not have to poll or wait for the run to finish.

The stream_mode="values" alternative emits the entire state after each step, which is simpler but less granular. With "values", a consumer must inspect chunk["messages"][-1] to distinguish HumanMessage from AIMessage and check tool_calls to see tool invocations. Neither mode natively includes custom progress events beyond what the agent’s middleware or nodes choose to write into state; such events go through the separate "custom" stream mode, while the "updates" mode mirrors node returns directly. LangGraph offers these modes because an agent run can span many model calls and tool executions, and invoke would block until the final output. Streaming lets a UI show intermediate state (e.g., “Agent is calling tools: [search]”) and display tokens as they arrive, which is critical for user trust and for long-running tasks.

The key trade-off is between fidelity and parse complexity. stream_mode="values" gives a simple, complete state snapshot each step but forces the consumer to re-inspect the entire state to detect changes. The ["updates", "messages"] v2 format separates concerns cleanly but requires version-aware consumers and deeper branching logic. The rejected alternative is to use only stream_mode="values" for everything: it would lose token-level streaming and force clients to reconstruct progress from state diffs. The source explicitly shows that version="v2" is required for the unified typed-chunk format; omitting it falls back to the older format, in which a list of stream modes yields (mode, chunk) tuples rather than dictionaries with a "type" key.

A concrete failure mode is the human-in-the-loop interrupt: after an interrupt, an "updates" chunk contains "__interrupt__", and no further LLM tokens arrive until the caller resumes with Command(resume=...). A consumer that only listens to "messages" chunks never sees that signal, so it appears to stall, waiting for tokens that will not come. Another edge case occurs when a node returns an empty update (e.g., a node that does its work but writes nothing to state): "updates" chunks may then carry empty data, causing key-lookup errors if not guarded. The code in the human-in-the-loop example explicitly checks if "__interrupt__" in chunk["data"] rather than assuming it exists, and checks if token.content before printing to skip empty tokens. These patterns are essential building blocks for robust streaming logic in production agent systems.
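The guards described above can be exercised without a live agent. The sketch below feeds hand-built chunks, shaped the way this section describes the typed format ("type" plus "data"), through a defensive consumer. The `consume` function and `fake_stream` list are hypothetical, and `SimpleNamespace` stands in for a real token object that exposes `.content`.

```python
from types import SimpleNamespace


def consume(chunks):
    """Collect tokens, node updates, and interrupt payloads from typed chunks."""
    text, updated_nodes, interrupts = [], [], []
    for chunk in chunks:
        if chunk["type"] == "messages":
            token, _metadata = chunk["data"]
            if token.content:  # skip empty tokens
                text.append(token.content)
        elif chunk["type"] == "updates":
            data = chunk["data"] or {}  # guard against an empty update
            if "__interrupt__" in data:
                # Surface the pause so the caller knows to resume the run.
                interrupts.append(data["__interrupt__"])
                continue
            updated_nodes.extend(data.keys())
    return "".join(text), updated_nodes, interrupts


# Hand-built stand-ins for the chunks a real stream would produce.
fake_stream = [
    {"type": "messages", "data": (SimpleNamespace(content="Hel"), {})},
    {"type": "messages", "data": (SimpleNamespace(content=""), {})},
    {"type": "messages", "data": (SimpleNamespace(content="lo"), {})},
    {"type": "updates", "data": {"llm_call": {"messages": ["Hello"]}}},
    {"type": "updates", "data": {}},
    {"type": "updates", "data": {"__interrupt__": ["approve tool call?"]}},
]

text, nodes, interrupts = consume(fake_stream)
print(text)        # Hello
print(nodes)       # ['llm_call']
print(interrupts)  # [['approve tool call?']]
```

Keeping the consumer as a pure function over an iterable of chunks makes it easy to test the interrupt and empty-update paths before wiring it to a real stream.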

Stream state updates after each node to watch agent progress.

python
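# Assumes `graph` is a compiled graph whose state has a "topic" key
# (a different graph from the RemainingSteps example above).
for chunk in graph.stream(
    {"topic": "ice cream"},
    stream_mode="updates",
    version="v2",
):
    # With version="v2", each chunk is a dict with "type" and "data" keys.
    if chunk["type"] == "updates":
        # "data" maps each node that just finished to the update it returned.
        for node_name, update in chunk["data"].items():
            print(f"Node `{node_name}` updated: {update}")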

See also

  • Autonomous agents — the autonomy spectrum, durable execution, guardrails, memory, and multi-agent systems.
  • LangGraph deep dive — the full narrative guide with the ~34 min audio companion.
  • LangSmith — observability for every agent run.