Structured Outputs

8 chapters · read at your own pace

01. What Structured Outputs Are

Structured outputs constrain a language model to follow a provided schema. In LangChain, tools achieve this. Tools are callable functions with well-defined inputs and outputs. They get passed to a chat model. The model decides when to invoke a tool based on the conversation. It also decides what argument values to provide. But those arguments must match a strict schema. Type hints are required because they define the input schema. For more complex inputs, you can use Pydantic models.

This solves a core problem. Downstream systems need type-safe data, not free-form text. Without structured outputs, the model might produce invalid or inconsistent arguments. With a schema, the output is reliable and predictable. The trade-off is that you must define the schema in advance. But this upfront work prevents errors later. The model understands exactly what fields are needed. So your application gets validated data every time.

Tools also extend what agents can do. They fetch real-time data, execute code, or query external databases. But for each action, the model needs to supply the right inputs. Structured outputs ensure those inputs are exactly what the tool expects. That keeps your system running smoothly.

Generate it: Structured outputs constrain a language model to follow a provided ______, so its arguments can't be free-form text. (cue: the contract the output must match; answer: schema)

Generate it: In LangChain, the thing you pass to a chat model so the model decides when to invoke it and what arguments to provide is a ____. (cue: a callable function with well-defined inputs and outputs; answer: tool)

Ask yourself: Downstream systems need type-safe data, not free-form text — so what would go wrong if the model returned arguments that didn't match the schema?

Recall check (try before reading the answer):

What do downstream systems need, and what happens without structured outputs? — __________________________________________________________________________ Answer: Downstream systems need type-safe data, not free-form text; without structured outputs, the model might produce invalid or inconsistent arguments.

Why are type hints required? — __________________________________________________________________________ Answer: Type hints are required because they define the input schema.

A tool with type hints defines the input schema for structured outputs.

python

from langchain.tools import tool

@tool
def search_database(query: str, limit: int = 10) -> str:
    """Search the customer database for records matching the query.

    Args:
        query: Search terms to look for
        limit: Maximum number of results to return
    """
    return f"Found {limit} results for '{query}'"

ELI5 — the plain-language version

Imagine ordering at a restaurant where the waiter hands you a menu with set categories—appetizer, main, dessert—and each dish has a fixed list of ingredients. You can pick any dish, but you must fill in exactly the required fields (like “no onions” or “medium rare”). That’s what structured outputs do for a language model in LangChain. Instead of letting the model blurt out free-form text, tools act like that menu: they are callable functions with well-defined inputs and outputs. The model decides when to call a tool based on the conversation, but the arguments must match a strict schema. For instance, type hints are required because they define the input schema, and you can use Pydantic models for more complex inputs. This ensures downstream systems receive type-safe data, not messy prose. Without this constraint, the model might return “temperature 75” or “today’s weather is warm” – a downstream weather app would break trying to parse that. A beginner would feel the frustration of a waiter bringing the wrong dish because the order was vague, leading to confusion and extra work fixing what should have been straightforward.

System design — mechanism, invariant, trade-off

The subsystem ensures the language model's output conforms to a strict schema by channeling all structured output through tools. Execution begins when the model decides to invoke a tool based on conversation context: it reads the tool’s name, description, and args_schema (derived from the function’s type hints or a Pydantic model like WeatherInput). The model must supply argument values that match that schema; type hints are required because they define the input schema. The tool then executes (e.g., search_database(query, limit)) and returns a ToolMessage. Optionally, if a tool is decorated with @tool(return_direct=True), the agent stops immediately and returns the tool’s output as the final response, bypassing any further LLM call. On error, middleware such as wrap_tool_call can intercept the ToolCallRequest to retry or return a custom error message.

The invariant the design preserves is that every tool invocation receives arguments that exactly match its input schema. The schema is enforced by LangChain’s conversion of type hints or Pydantic models into a JSON Schema that the model sees. This guarantees that downstream systems never receive free-form text where structured data is expected. The model cannot supply out-of‑schema types without causing a validation failure, which the agent’s middleware or the tool’s own logic would catch and report.
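
The conversion from type hints to a schema described above can be illustrated with a minimal, standard-library-only sketch. It is not LangChain's implementation: the helper `schema_from_signature` and the `_JSON_TYPES` mapping are hypothetical names introduced here, and only a few primitive types are handled.

```python
import inspect
from typing import get_type_hints

# Hypothetical mapping from Python types to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_from_signature(func):
    """Build a JSON-Schema-like dict from a function's type hints."""
    hints = get_type_hints(func)
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        if name not in hints:
            # Mirrors the rule that every tool argument needs a type hint.
            raise TypeError(f"parameter {name!r} has no type hint")
        properties[name] = {"type": _JSON_TYPES[hints[name]]}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default means the model must supply it
        else:
            properties[name]["default"] = param.default
    return {"type": "object", "properties": properties, "required": required}


def search_database(query: str, limit: int = 10) -> str:
    return f"Found {limit} results for '{query}'"


schema = schema_from_signature(search_database)
```

Here `query` ends up required because it has no default, while `limit` is optional and carries its default of 10. A real framework would also handle nested models, optional types, and descriptions.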

The key trade‑off is constraining the model’s creativity in exchange for type safety for downstream consumers. The obvious alternative is to let the model output raw text and parse it after the fact, which would require fragile regex or ad‑hoc heuristics. That alternative is rejected because it introduces a write-boundary problem: once free-form text enters a database or API, schema violations are hard to detect and fix. By forcing the model to follow the tool’s args_schema at the point of generation, the subsystem avoids the cost of downstream parsing errors, mis‑typed fields, and the resulting debugging effort.

A concrete failure mode occurs when the model tries to call search_database with a limit argument that is a string like "five" instead of an integer. The schema defined by the type hint limit: int = 10 rejects this at the point the tool is invoked. The operator would see a ToolMessage whose content is a validation error (e.g., "Input validation error: limit should be int") logged by the middleware, or the wrap_tool_call handler could surface a custom error. The signal is unambiguous: the tool node returns an error payload instead of a successful result, and the agent either retries or halts depending on the middleware configuration.
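
The failure described above, where `limit` arrives as the string "five", can be sketched without any framework. The example below is an assumption-laden illustration: `validate_args`, `call_tool`, `SEARCH_SCHEMA`, and the shape of the returned payload are all hypothetical, and the error wording is not what LangChain or Pydantic would emit.

```python
# Hand-written schema for the search_database tool (illustrative only).
SEARCH_SCHEMA = {
    "properties": {"query": {"type": "string"}, "limit": {"type": "integer"}},
    "required": ["query"],
}


def validate_args(args, schema):
    """Return a list of human-readable problems; an empty list means valid."""
    checkers = {"string": str, "integer": int}
    problems = []
    for name in schema["required"]:
        if name not in args:
            problems.append(f"{name}: field required")
    for name, value in args.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"{name}: unexpected argument")
            continue
        expected = checkers[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(
                f"{name}: expected {spec['type']}, got {type(value).__name__}"
            )
    return problems


def search_database(query, limit=10):
    return f"Found {limit} results for '{query}'"


def call_tool(args):
    """Validate first; only run the tool body when the arguments are valid."""
    problems = validate_args(args, SEARCH_SCHEMA)
    if problems:
        return {"status": "error", "content": "; ".join(problems)}
    return {"status": "success", "content": search_database(**args)}


bad = call_tool({"query": "acme", "limit": "five"})
good = call_tool({"query": "acme", "limit": 5})
```

The point of the sketch is the ordering: the tool body never runs for the bad call, and the caller receives an error payload it can log or hand back to the model.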

Failure modes — what breaks, what catches it

1. Missing Type Hints on Tool Arguments

Trigger – A tool is defined with the @tool decorator but one or more parameters lack type hints. The source states: “Type hints are required as they define the tool's input schema.”
Guard – No explicit guard identified in the source. The requirement is stated but no validation or exception handler is shown.
Posture – Fail-hard – The tool creation would likely raise a TypeError or schema-building failure, aborting the tool definition at import time.
Operator signal – A TypeError such as "Argument 'query' has no type annotation" or a schema validation error when the tool is registered with the model.
Recovery – Manual fix: add a type hint to every parameter. No automatic retry.

2. Reserved Parameter Name Used (`runtime` or `config`)

Trigger – A tool is defined with an argument named config or runtime that is not the injected ToolRuntime or RunnableConfig instance. The source warns: “Using these names will cause runtime errors.”
Guard – No guard identified in the source; it only documents the restriction.
Posture – Fail-hard – The tool call will raise a runtime error (likely a ValueError about a reserved name or a schema conflict) and the agent run aborts.
Operator signal – A ValueError message such as "Parameter name 'runtime' is reserved" or an error traceback in the agent logs.
Recovery – Manual rename of the conflicting parameter. No automatic retry.

3. Attempting to Access `runtime.execution_info` on an Unsupported Version

Trigger – Code calls runtime.execution_info when deepagents is below version 0.5.0 (or langgraph below 1.1.5). The source notes the requirement in a <Note>: “Requires deepagents>=0.5.0 (or langgraph>=1.1.5).”
Guard – No guard is shown; the feature simply does not exist in older versions.
Posture – Fail-hard – An AttributeError is raised because runtime does not have an execution_info attribute.
Operator signal – AttributeError: 'ToolRuntime' object has no attribute 'execution_info' in the logs.
Recovery – Upgrade deepagents or langgraph to the required version. No automatic retry.

4. Using `server_info` When Not Running on LangGraph Server

Trigger – A tool calls runtime.server_info.assistant_id or similar without checking for None. The source documents: “server_info is None when the tool is not running on LangGraph Server (e.g., during local development or testing).”
Guard – The example shows the guard if server is not None: before accessing attributes.
Posture – Fail-hard if the guard is omitted – an AttributeError on None object would occur. Fail-soft if the guard is used – the code gracefully falls through without error.
Operator signal – Without guard: AttributeError: 'NoneType' object has no attribute 'assistant_id'. With guard: silent absence (no log line, no metric).
Recovery – Without guard: manual fix to add a None check. With guard: the tool proceeds normally, returning a fallback behavior (e.g., skipping server-dependent logic).

5. Pydantic Schema Validation Failure Inside a Tool

Trigger – A tool’s input schema is defined with a Pydantic model (e.g., via args_schema or @tool with a Pydantic field), and the model passes an argument that violates the schema (e.g., wrong type, missing required field). The source says “For more complex inputs, you can use Pydantic models.”
Guard – Pydantic’s built-in ValidationError is the guard (raised automatically by Pydantic when the tool call’s arguments are validated against the schema, not by the language model).
Posture – Fail-hard – The tool call is rejected before the function body runs; the agent run typically aborts or retries with a different model output (depending on the runtime).
Operator signal – A pydantic.ValidationError log trace or a message with the error description (e.g., "Field required" or "Input should be a valid integer").
Recovery – The model may attempt to reformulate the tool call (some runtimes automatically retry), or the error is surfaced to the user for manual correction. The source does not specify a retry count or backoff.

6. Accidental Creation of a `HeadlessTool` Without Local Execution

Trigger – A tool is instantiated in Python by calling tool(...) with only name, description, and args_schema (no function body). The source states: “If you call tool(...) in Python with only name, description, and args_schema, LangChain returns a HeadlessTool. There is no .implement() API on the Python side.”
Guard – No guard is shown; the tool is silently returned as a HeadlessTool.
Posture – Fail-soft (but confusing) – The tool exists and can be passed to a model, but when the model calls it, the run interrupts instead of executing locally. The system does not crash, but the expected local execution does not happen.
Operator signal – An interrupt event (detectable via the useStream hook or a HeadlessTool interrupt identifier). No error is raised, but the agent does not complete autonomously.
Recovery – The graph must be resumed manually or through a frontend that implements the tool. No automatic retry; the operator must inspect the payload and provide the result.
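
Failure modes 1 and 2 above (missing type hints and reserved parameter names) could in principle be caught before a tool is ever registered. The following is a minimal sketch of such a pre-flight check using only the standard library. The function `lint_tool_signature` is hypothetical, and it is deliberately strict: it would also flag a legitimately injected `runtime` or `config` parameter, so a real check would need to inspect the annotation as well.

```python
import inspect

# The two names the source describes as reserved.
RESERVED_NAMES = {"config", "runtime"}


def lint_tool_signature(func):
    """Return a list of problems found in a would-be tool's signature."""
    problems = []
    for name, param in inspect.signature(func).parameters.items():
        if name in RESERVED_NAMES:
            problems.append(f"{name}: reserved parameter name")
        elif param.annotation is inspect.Parameter.empty:
            problems.append(f"{name}: missing type hint")
    return problems


def good_tool(query: str, limit: int = 10) -> str:
    return query


def bad_tool(query, config: dict) -> str:
    # 'query' lacks a type hint and 'config' collides with a reserved name.
    return query


good_report = lint_tool_signature(good_tool)
bad_report = lint_tool_signature(bad_tool)
```

Running a check like this in a unit test turns two runtime failures into failures at development time.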

02. Schemas And Pydantic Models

When you define a tool, you can declare its input schema with a Pydantic model or a plain JSON schema. Each field in the schema must have a type hint, which tells the model what kind of data to expect, like a string or a number.

You also give each field a description. This description is crucial because it guides the model on what information to extract for that field. The model reads the description to understand the purpose of the field. Without a clear description, the model might fill in the wrong value, but with a good description it knows exactly what you need. The trade off is that writing good descriptions takes a little extra effort, but it greatly improves the tool's accuracy.

For complex inputs, you use a Pydantic model with a base class and describe each field using the Field function. This gives the model clear guidance. The description steers the model's extraction, making sure the tool gets the right data every time. Type hints are required because they define the tool's input schema. When you use a function with the tool decorator, its docstring becomes the tool's description, helping the model decide when to use it. For a schema defined with a model, you write a description for each field separately. That clarity is what makes the tool reliable.

Generate it: Each field in the schema needs two things: a type hint, and a ___________ that guides the model on what information to extract for that field. (cue: the prose the model reads to understand a field's purpose; answer: description)

Generate it: When you use a function with the tool decorator, its ________ becomes the tool's description, helping the model decide when to use it. (cue: the text written under the function definition; answer: docstring)

Ask yourself: The field description "steers the model's extraction" — so concretely, how does a vague description versus a good one change what value the model fills in?

Recall check (try before reading the answer):

What does each field's type hint do? — ______________________________________________________________________________________________________ Answer: Each field in the schema must have a type hint, which tells the model what kind of data to expect, like a string or a number.

What is the trade off of writing good descriptions? — _______________________________________ Answer: The trade off is that writing good descriptions takes a little extra effort, but it greatly improves the tool's accuracy.

Defining a tool with input schema via type hints and docstring.

python

from langchain.tools import tool

@tool
def search_database(query: str, limit: int = 10) -> str:
    """Search the customer database for records matching the query.

    Args:
        query: Search terms to look for
        limit: Maximum number of results to return
    """
    return f"Found {limit} results for '{query}'"

ELI5 — the plain-language version

Imagine a vending machine where each button has a tiny label that explains what snack you'll get. But if the label is missing or confusing, you might press the button expecting chips and get candy instead. That’s exactly how a tool’s input schema works here. When you define a tool—like search_database(query: str, limit: int)—you give each field a type hint (like "str" or "int") and a description in the docstring. The description acts as the button label: it tells the model exactly what to put in that slot. For example, the description “Search terms to look for” guides the model to extract a search phrase for query, not a number. Without that clear description, the model might stuff a number into a text field or leave it blank, just like jamming a coin into the wrong slot. The real mechanism is that the model reads the docstring to decide what value to supply for each field. If descriptions are vague or missing, the tool call fails with wrong inputs—the vending machine gives you a snack you didn’t want, and you’re left frustrated with no way to fix it.
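
To make the "button label" idea concrete, here is a small sketch of how per-argument descriptions could be pulled out of a Google-style `Args:` docstring section. It uses only the standard library; `arg_descriptions` and `undocumented_params` are hypothetical helpers, and the parsing rules (a blank line ends the section, one line per argument) are simplifying assumptions rather than a description of how LangChain reads docstrings.

```python
import inspect


def arg_descriptions(func):
    """Extract {param: description} from a Google-style 'Args:' section."""
    doc = inspect.getdoc(func) or ""
    descriptions = {}
    in_args = False
    for line in doc.splitlines():
        stripped = line.strip()
        if stripped == "Args:":
            in_args = True
            continue
        if in_args:
            if not stripped:
                break  # in this sketch, a blank line ends the Args section
            name, sep, text = stripped.partition(":")
            if sep:
                descriptions[name.strip()] = text.strip()
    return descriptions


def undocumented_params(func):
    """List parameters that have no description in the docstring."""
    described = arg_descriptions(func)
    return [n for n in inspect.signature(func).parameters if n not in described]


def search_database(query: str, limit: int = 10) -> str:
    """Search the customer database for records matching the query.

    Args:
        query: Search terms to look for
        limit: Maximum number of results to return
    """
    return f"Found {limit} results for '{query}'"


def vague_tool(count: int) -> str:
    """Do something with a count."""
    return str(count)


labels = arg_descriptions(search_database)
unlabelled = undocumented_params(vague_tool)
```

A check like `undocumented_params` gives a quick way to spot arguments whose only "label" is the parameter name.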

System design — mechanism, invariant, trade-off

In the subsystem for schemas and Pydantic models, the ordered mechanism begins when a tool is defined using the @tool decorator. The developer must supply type hints for every parameter and a docstring that serves as the tool’s description. This schema—optionally expressed as a Pydantic model via BaseModel and Field—is then passed to a chat model. When the model decides to invoke the tool, it reads the schema to produce a tool call with the required arguments. If the call succeeds, the tool executes and returns a ToolMessage. On failure—for instance, if the model supplies an argument that violates the schema—the error is caught by LangChain’s agent middleware. The framework provides wrap_tool_call to retry the failed call or return a custom error message, allowing the agent to either recover or surface the failure.

The design preserves a critical invariant: every tool must have a well‑defined input schema enforced by required type hints. As the source states, “Type hints are required as they define the tool’s input schema.” Without them the model cannot know what data to supply. Additionally, the source recommends, rather than requires, snake_case tool names (“Prefer snake_case for tool names … Sticking to alphanumeric characters, underscores, and hyphens helps to improve compatibility across providers.”). The type-hint rule guarantees a schema the model can read; the naming convention improves the odds that any registered tool can be invoked reliably across providers.

The key trade‑off is the upfront cost of writing clear descriptions versus the downstream cost of model errors. By investing in “informative and concise” docstrings and explicit Field descriptions, the developer makes the tool’s purpose unambiguous. The obvious alternative is to omit or skimp on descriptions, expecting the model to infer the meaning from the parameter name alone. That approach is rejected because it risks the model “fill[ing] in the wrong value” for a field. The cost this rejection avoids is wasted LLM inference cycles (e.g., re‑invoking the model after a bad tool call) and unreliable outputs that surface as confusing ToolMessage payloads.

A concrete failure mode occurs when a field’s description is missing or vague. For example, consider a tool with a Pydantic field count: int and no description. The model might mistakenly supply a string like "many" because it does not understand the expected format. The tool’s execution then fails, and the agent middleware returns a ToolMessage containing a validation error. An operator monitoring the trace in LangSmith would see a ToolCallRequest that produced a ToolMessage with a "ValidationError" signal, often followed by retry attempts via wrap_tool_call. This signal—a clearly malformed tool result—is the concrete flag that the schema description was insufficient.
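
The retry behaviour described above can be sketched as a plain loop that feeds the validation error back to whatever proposes the arguments. This is not the `wrap_tool_call` API: `call_with_retry`, `count_items`, and `fake_model` are hypothetical names, and the fake model is a scripted stand-in for an LLM that corrects itself after seeing an error.

```python
def count_items(count):
    """Toy tool that insists on an integer argument."""
    if isinstance(count, bool) or not isinstance(count, int):
        raise ValueError(f"count: expected integer, got {type(count).__name__}")
    return f"Counted {count} items"


def call_with_retry(propose_args, tool, max_attempts=3):
    """Call the tool, feeding each error back to propose_args on failure."""
    last_error = None
    trace = []
    for attempt in range(1, max_attempts + 1):
        args = propose_args(last_error)
        try:
            result = tool(**args)
        except ValueError as exc:
            last_error = str(exc)
            trace.append({"attempt": attempt, "status": "error", "content": last_error})
            continue
        trace.append({"attempt": attempt, "status": "success", "content": result})
        return result, trace
    return None, trace  # every attempt failed


def fake_model(last_error):
    # Scripted stand-in for an LLM: wrong at first, corrected after an error.
    return {"count": "many"} if last_error is None else {"count": 3}


result, trace = call_with_retry(fake_model, count_items)
```

The `trace` list plays the role of what an operator would inspect: one error entry followed by one success entry.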

Failure modes — what breaks, what catches it

Reserved parameter name used in tool schema

Trigger – A field in the tool’s Pydantic model or JSON schema is named config or runtime.
Guard – None named; the source states “Using these names will cause runtime errors.” No exception handler or retry is specified.
Posture – fail-hard (the tool creation or invocation aborts with an error).
Operator signal – A runtime error when the tool is defined or called.
Recovery – Rename the field to avoid the reserved names config and runtime.

Missing or ambiguous field description

Trigger – A field in the tool schema lacks a description or has an unclear description.
Guard – None; the source says “the description is crucial” but provides no exception handler, validation, or fallback.
Posture – fail-soft (the tool executes but the LLM may fill in the wrong value; no error is raised).
Operator signal – Silent wrong output from the tool, with no log or metric indicating the cause.
Recovery – Add a clear, specific description to the field.

Field type hint omitted

Trigger – A field in the tool schema is defined without a type hint.
Guard – None explicitly; the source says “must have a type hint” but does not name any guard.
Posture – fail-hard (likely a Pydantic validation error during schema creation).
Operator signal – An error when the tool is defined (e.g., Pydantic validation error).
Recovery – Add a type hint to the field.

Using deprecated injection patterns

Trigger – Tool signature includes InjectedState, InjectedStore, get_runtime(), or InjectedToolCallId.
Guard – The source directs to “[Migrate from older injection patterns]” but does not name the guard (deprecation warning or error).
Posture – fail-hard (likely a runtime error or deprecated behavior).
Operator signal – Deprecation warning or runtime error during tool invocation.
Recovery – Migrate to the ToolRuntime parameter as shown in the source.

Tool defined with only schema (HeadlessTool) but no client implementation

Trigger – tool(...) called with only name, description, and args_schema.
Guard – The run interrupts instead of executing locally. The source identifies this interrupt as the guard.
Posture – fail-closed (the run refuses to execute the tool and pauses).
Operator signal – An interrupt observed in the graph execution; no local execution happens.
Recovery – Implement the tool in the client and resume the graph.

Accessing runtime.server_info when not on LangGraph Server

Trigger – Tool code accesses server = runtime.server_info and then uses server.assistant_id or server.graph_id without a None check.
Guard – The source shows if server is not None: as the conditional guard.
Posture – fail-hard (AttributeError if the guard is absent).
Operator signal – AttributeError when server is None.
Recovery – Add the if server is not None: guard before accessing server attributes.
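
The `if server is not None:` guard from the last failure mode above is easy to demonstrate in isolation. In this sketch, `FakeServerInfo` and `FakeRuntime` are hypothetical stand-ins for the real runtime objects, and `describe_deployment` is an invented helper; only the attribute names `server_info`, `assistant_id`, and `graph_id` come from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeServerInfo:
    """Hypothetical stand-in for the server info object."""
    assistant_id: str
    graph_id: str


@dataclass
class FakeRuntime:
    """Hypothetical stand-in for the tool runtime; server_info may be None."""
    server_info: Optional[FakeServerInfo] = None


def describe_deployment(runtime):
    server = runtime.server_info
    if server is not None:
        # Safe: attributes are only read when server info is present.
        return f"assistant {server.assistant_id} on graph {server.graph_id}"
    # Local development or tests: fall back instead of raising AttributeError.
    return "running locally, no server info"


local = describe_deployment(FakeRuntime())
deployed = describe_deployment(FakeRuntime(FakeServerInfo("asst_1", "graph_1")))
```

Without the guard, the local case would fail with an AttributeError on None, which is exactly the fail-hard posture listed above.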

03. Constrained Decoding

The provided material does not discuss any token-masking mechanism. It focuses on LangChain tools, headless tools, and tool runtime. There is no mention of guiding output structure from the first token. This chapter therefore cannot explain constrained decoding from the given source.

Generate it: The provided material does not discuss any ____________ mechanism, and there is no mention of guiding output structure from the first token. (cue: the per-token mechanism the source never mentions; answer: token-masking)

Ask yourself: The material focuses on LangChain tools, headless tools, and tool runtime — so why is there no source mention of guiding output structure from the first token?

Recall check (try before reading the answer):

What does the provided material actually focus on instead? — ________________________________________________________________ Answer: It focuses on LangChain tools, headless tools, and tool runtime; there is no mention of guiding output structure from the first token, so this chapter cannot explain that mechanism from the source.

Looking back: Recall "What Structured Outputs Are" — but does this source say anything about guiding output structure from the first token? Answer: No; there is no mention of guiding output structure from the first token.

No relevant code excerpt found for constrained decoding in the provided material.

ELI5 — the plain-language version

The provided context focuses on LangChain tools, including headless tools, tool runtime, and return types like strings, objects, and Commands. It does not contain any information about constrained decoding, token-masking mechanisms, or guiding output structure from the first token. No plain-language explanation of that subsystem can be given from the source alone.

System design — mechanism, invariant, trade-off

The headless-tools subsystem operates as a coordinated interrupt/resume handshake between a server-side LangGraph agent and a client-side environment. First, a HeadlessTool is defined on the server using tool(name=..., description=..., args_schema=...) with no in‑process implementation, then registered with the agent via create_agent or a LangGraph graph. When the model issues a tool call for this schema‑only tool, the graph interrupts instead of executing locally, producing a payload shaped as {"type": "tool", "tool_call": {"id", "name", "args"}}. The client—typically a browser using the JS SDK hooks—detects this interrupt, runs the matching implementation attached via .implement(...) (or manually inspects the payload), and then submits a resume command to continue the graph with the tool result. Failure at any step (e.g., missing client implementation or a dropped interrupt) leaves the graph paused; the onTool callback can signal start, success, or error events for operator observability.

The invariant the design preserves is environment‑exact execution: the tool’s logic runs only where the user’s app runs, typically the browser. This is captured in the source’s definition of headless tools as “tool definitions … that you register on the server … [with] implementation … registered only on the client.” The guarantee is that no server process ever executes the tool’s body, and that the tool result—once resumed—is the sole response the graph accepts before proceeding. The architecture rejects the obvious alternative of hosting tool logic on the server (e.g., ordinary @tool functions). That approach would require exposing browser‑only APIs (Geolocation, IndexedDB, Canvas) or sensitive user data to a remote process, increasing latency and violating privacy boundaries. The cost of this rejection is the added complexity of the interrupt/resume handshake and the need for a client-side runtime that can detect and respond to interrupts—but it avoids shipping private data over the network and keeps purely local operations instant.

A concrete failure mode is an unresolved interrupt caused by a missing or broken client implementation. An operator would see the graph remain paused indefinitely, with the interrupt payload logged as {"type": "tool", "tool_call": {"id": "call_abc123", "name": "geolocate", "args": {...}}} and no subsequent resume command. The onTool callback would emit an error event (if wired) or simply a start event that never transitions to success. Without a timeout or alert on the interrupt duration, the operator must manually inspect the graph’s state to discover that the tool’s .implement(...) was never registered in the frontend or that the client-side handler failed silently. The fix requires registering the correct client-side handler and re‑emitting the resume command.
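
The interrupt/resume handshake can be sketched as a small dispatcher that takes the interrupt payload and looks up a matching implementation. In practice the client is typically JavaScript in a browser, so the Python below is only a language-neutral illustration. The payload shape follows the source; `handle_interrupt`, the `implementations` registry, the stand-in `geolocate` function, and the shape of the returned resume value are all assumptions.

```python
def handle_interrupt(payload, implementations):
    """Run the client-side implementation for a tool interrupt payload.

    Returns a resume value, or None if the interrupt cannot be resolved.
    """
    if payload.get("type") != "tool":
        return None
    call = payload["tool_call"]
    handler = implementations.get(call["name"])
    if handler is None:
        # Nothing on the client implements this tool: the graph stays paused.
        return None
    try:
        output = handler(**call["args"])
        return {"tool_call_id": call["id"], "status": "success", "content": output}
    except Exception as exc:
        return {"tool_call_id": call["id"], "status": "error", "content": str(exc)}


def geolocate(high_accuracy=False):
    # Hypothetical stand-in for a browser-only API call.
    return {"lat": 0.0, "lon": 0.0, "high_accuracy": high_accuracy}


payload = {
    "type": "tool",
    "tool_call": {
        "id": "call_abc123",
        "name": "geolocate",
        "args": {"high_accuracy": True},
    },
}

resume = handle_interrupt(payload, {"geolocate": geolocate})
unresolved = handle_interrupt(payload, {})
```

The `unresolved` case corresponds to the failure mode above: with no registered handler there is nothing to resume with, so an operator would need a timeout or alert to notice the paused run.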

Failure modes — what breaks, what catches it

1. Reserved Parameter Name Collision

Trigger — A developer defines a tool parameter named config or runtime (the two names reserved by the framework).
Guard — No guard is shown in the source. Only a documentation warning is present: “Using these names will cause runtime errors.”
Posture — Fail‑hard. The runtime immediately raises an error when the tool is invoked, aborting the run.
Operator signal — A runtime error (e.g., ValueError: parameter name 'config' is reserved or an internal attribute collision) is thrown; no tool result is returned.
Recovery — The developer must rename the offending parameter to a non‑reserved name. The run must be resubmitted.

2. Stream Writer Used Outside LangGraph Context

Trigger — A tool calls runtime.stream_writer while running outside a LangGraph execution context (e.g., during a local script or a plain LangChain chain).
Guard — No guard is shown in the source. The documentation states: “the tool must be invoked within a LangGraph execution context.”
Posture — Fail‑hard. The runtime raises an exception because stream_writer is not available without the LangGraph context.
Operator signal — An AttributeError or RuntimeError indicating that stream_writer is None or not initialized.
Recovery — The tool must be invoked inside a proper LangGraph graph run. No automatic retry; manual correction of the invocation environment is required.

3. Accessing Execution Info Without Required Package Version

Trigger — A tool reads runtime.execution_info (e.g., info.thread_id) when deepagents is below 0.5.0 or langgraph is below 1.1.5.
Guard — No guard is shown in the source. Only a version requirement note is provided: “Requires deepagents>=0.5.0 (or langgraph>=1.1.5).”
Posture — Fail‑hard. The .execution_info attribute may be None or raise AttributeError, causing the tool to crash.
Operator signal — AttributeError: 'NoneType' object has no attribute 'thread_id' (if execution_info is None) or similar missing‑attribute error.
Recovery — Upgrade the package to meet the version requirement. No automatic fallback; the run must be retried after the upgrade.

4. Server Info Accessed During Local Development

Trigger — A tool reads runtime.server_info while running locally (not on LangGraph Server).
Guard — The example code uses the conditional if server is not None: before accessing server.assistant_id, server.graph_id, or server.user.identity.
Posture — Fail‑soft. If the guard is present, the tool degrades gracefully (e.g., returns early or uses a default path). If the guard is absent, the tool experiences AttributeError and fails hard.
Operator signal — When the guard is present: the operator sees no error, but server_info is None and relevant data is absent. When the guard is absent: AttributeError: 'NoneType' object has no attribute 'assistant_id'.
Recovery — With the guard, the tool continues without server-specific data. Without the guard, the run must be fixed by either adding the guard or deploying to LangGraph Server. No automatic retry.

5. Headless Tool Interrupt Without Client‑Side Handler

Trigger — The model issues a call to a headless tool (defined with only name, description, and args_schema), but the application does not implement the required client‑side .implement(...) or the frontend hooks (e.g., useStream) do not detect the interrupt.
Guard — No guard is shown in the source. The documentation describes the interrupt behavior but does not provide a fail‑safe; it states “the run interrupts instead of executing the tool locally.”
Posture — Fail‑closed. The graph pauses and refuses to proceed; no tool result is produced automatically.
Operator signal — The run remains in an “interrupted” state indefinitely. No error is raised, but the graph does not advance.
Recovery — A developer or operator must manually inspect the interrupted payload, perform the intended action (e.g., in a browser), and submit a resume command. No automatic retry or fallback is built in.
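
Failure mode 3 above depends on package versions, which can be checked before touching `runtime.execution_info`. The sketch below uses only the standard library. The helpers `parse_version`, `meets_minimum`, and `package_supports` are hypothetical, and the version parsing is deliberately naive (numeric components only); a real project might prefer a dedicated version-parsing library.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '1.1.5' into (1, 1, 5), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two dotted versions, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def package_supports(package, minimum):
    """True if the package is installed at or above the minimum version."""
    try:
        return meets_minimum(version(package), minimum)
    except PackageNotFoundError:
        return False
```

A tool could call `package_supports("langgraph", "1.1.5")` once at startup and skip any logic that depends on execution info when the result is False, instead of crashing mid-run.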

04. A Worked Example

You can define a tool that extracts a person's details. Use a Pydantic model with a name string, an age integer, and a city string. This schema tells the model exactly what inputs to expect. To bind the schema to the model, you simply pass the tool to it. The model then reads the schema and generates the arguments.

The age comes back as a number because the schema defines it as an integer. The model follows that instruction precisely. This ensures the output is consistent and easy for your code to handle. The trade off is that the model must stick to the schema exactly. It cannot use text like "thirty" or "twenty-five". Instead it provides a whole number, such as 25. That makes downstream processing reliable. You lose the flexibility of accepting varied formats but gain predictable structured data.

The model decides when to call this tool based on the conversation. It provides the name, age, and city as separate fields. This separation lets you store or process each piece independently. The schema acts as a contract between you and the model. Both sides agree on the shape of the data. This approach is common for tasks like extracting information from user messages. You get back exactly what you asked for every time.
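
The person schema described above might look like the following in Pydantic. This is a sketch that assumes Pydantic v2 is installed; the class name `Person` and the field descriptions are illustrative wording, not taken from the source, and the step of binding the schema to a chat model is left out.

```python
from pydantic import BaseModel, Field, ValidationError


class Person(BaseModel):
    """Details about a person mentioned in the conversation."""

    name: str = Field(description="The person's full name")
    age: int = Field(description="The person's age in whole years")
    city: str = Field(description="The city where the person lives")


# Arguments that match the schema validate cleanly.
ok = Person.model_validate({"name": "Ada", "age": 25, "city": "London"})

# A spelled-out age violates the integer type and is rejected.
try:
    Person.model_validate({"name": "Ada", "age": "twenty-five", "city": "London"})
    rejected = False
except ValidationError:
    rejected = True

# The JSON Schema generated from the model is what declares age as an integer.
age_type = Person.model_json_schema()["properties"]["age"]["type"]
```

Each field arrives as its own typed attribute (`ok.name`, `ok.age`, `ok.city`), which is what makes it possible to store or process the pieces independently.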

Generate it: The age comes back as a number because the schema defines it as an _______. (cue: the type that forces digits, not words; answer: integer)

Generate it: The schema acts as a ________ between you and the model. (cue: an agreement both sides honor about the data's shape; answer: contract)

Ask yourself: You lose the flexibility of accepting varied formats but gain predictable structured data — so when is that a trade worth making?

Recall check (try before reading the answer):

Why does the age come back as a number? — ___________________________________________________________________________ Answer: The age comes back as a number because the schema defines it as an integer.

What does providing name, age, and city as separate fields let you do? — __________________________________________________________ Answer: This separation lets you store or process each piece independently.

A tool returning a structured dictionary tells the model exactly what fields to expect and ensures consistent, predictable output.

python

from langchain.tools import tool


@tool
def get_weather_data(city: str) -> dict:
    """Get structured weather data for a city."""
    return {
        "city": city,
        "temperature_c": 22,
        "conditions": "sunny",
    }
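
The person-details contract from the narration can also be sketched without any library, to show what gets checked at the boundary. This is only a stand-in: it is not how Pydantic or LangChain validate internally, and PERSON_SCHEMA and validate_args are hypothetical names introduced for illustration.

```python
# Library-free stand-in for the name/age/city schema described in the narration.
# PERSON_SCHEMA and validate_args are illustrative names, not LangChain or Pydantic APIs.
PERSON_SCHEMA = {"name": str, "age": int, "city": str}


def validate_args(args: dict, schema: dict) -> dict:
    """Return args unchanged if every field is present with the declared type."""
    missing = [field for field in schema if field not in args]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    extra = [field for field in args if field not in schema]
    if extra:
        raise ValueError(f"unexpected fields: {extra}")
    for field, expected in schema.items():
        value = args[field]
        # bool is a subclass of int, so booleans are rejected explicitly
        # (no field in this schema is a boolean).
        if isinstance(value, bool) or not isinstance(value, expected):
            raise TypeError(
                f"{field!r} must be {expected.__name__}, got {type(value).__name__}"
            )
    return args


# Arguments that honor the contract pass through untouched.
good = validate_args({"name": "Ada", "age": 25, "city": "London"}, PERSON_SCHEMA)

# An age spelled out in words violates the contract and is rejected.
try:
    validate_args({"name": "Ada", "age": "twenty-five", "city": "London"}, PERSON_SCHEMA)
    error = None
except TypeError as exc:
    error = str(exc)
```

In a real tool the Pydantic model plays the role of PERSON_SCHEMA, and the error text is what would be reported back when the age arrives as words rather than digits.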

ELI5 — the plain-language version

Think of this subsystem like a waiter handing the chef a pre‑printed order form with blank fields for “dish name,” “quantity,” and “table number.” The chef is the model, and the form is the tool’s schema—a Pydantic model that defines a person’s details: name as a string, age as an integer, and city as a string. When you pass this schema to the model, it reads the exact fields and generates arguments that match. For example, age always comes back as a number because the schema says it must be an integer. The model follows that instruction precisely, ensuring the output is consistent and easy for your code to handle. Without this rigid form, the model could return age as “twenty‑five” instead of 25, or mix city and name together. That would break your code with unpredictable text, forcing you to write messy parsers and error‑handling logic. The strict schema prevents that confusion, keeping every tool call clean and reliable.

System design — mechanism, invariant, trade-off

The subsystem operates as a deterministic pipeline: a user query enters the agent, which invokes a tool defined via the @tool decorator (e.g., @tool("person_extractor")). The tool’s input schema—a Pydantic model built with BaseModel and Field, containing name: str, age: int, and city: str—is parsed by the model. The model reads the schema and generates structured arguments, which are passed to the tool’s Python function. The tool executes, returning a string wrapped in a ToolMessage, which the agent feeds back into the conversation loop. On failure, middleware such as wrap_tool_call can catch errors and produce a ToolMessage containing error details, allowing the agent to retry or respond gracefully.

The invariant the design preserves is type consistency—the schema instructs the model to produce an integer for age, a string for name, etc. The model “follows that instruction precisely,” ensuring the output matches the schema exactly, making it predictable and easy for downstream code to handle. This is a write-boundary guarantee: the tool schema enforces a strict contract on what the model can emit, preventing it from returning free-text or malformed data.

The key trade-off is that the model must stick to the schema exactly; it cannot output freeform text or omit fields. This rejects the obvious alternative of letting the model write natural-language answers for structured extractions. By choosing schema enforcement, the design avoids the cost of downstream parsing errors, validation logic, and inconsistent formatting that arise when models produce freeform text. The price is that the model cannot express uncertainty or nuance that falls outside the schema—it must fill all required fields, even if the input is ambiguous.

A concrete failure mode occurs when the model generates arguments that violate the schema—for example, passing a string "twenty" for the integer age. The tool call fails, and the middleware (or the tool node) produces a ToolMessage with an error description. An operator would see this error in LangSmith traces: a tool call with status “error” and an error message like “ValidationError: 'age' is not a valid integer.” The trace, combined with the ToolMessage content, signals exactly which field violated the schema and the expected type.

Failure modes — what breaks, what catches it

Failure 1 — Missing or mismatched runtime context for stream_writer

Trigger — A tool calls runtime.stream_writer outside a LangGraph execution context (e.g., in a local script or test).
Guard — None. The source only contains a <Note> stating the requirement; no try/except or fallback is shown.
Posture — Fail-hard: the run halts with a runtime error.
Operator signal — An exception traceback, likely AttributeError or a LangGraph-specific error about missing execution context.
Recovery — Manual: the developer must ensure the tool is only invoked within a LangGraph graph execution. No retry or backoff is provided.

Failure 2 — server_info is None when not on LangGraph Server

Trigger — The tool get_assistant_scoped_data is executed locally (e.g., during development).
Guard — The explicit check if server is not None in the source (line "if server is not None:").
Posture — Fail-soft: the tool continues, prints nothing (or returns "done"), and omits server-dependent logic.
Operator signal — Silent absence: no assistant_id, graph_id, or user output appears in logs.
Recovery — Built-in: the tool returns successfully with the fallback string "done". No retry needed; the tool is designed for both environments.

Failure 3 — Reserved parameter name runtime or config used incorrectly

Trigger — A tool author names a parameter config or runtime intending it to receive a user‑supplied value, not the injected ToolRuntime or RunnableConfig.
Guard — None in the code. The source warns: "Using these names will cause runtime errors." No validation or fallback is shown.
Posture — Fail-hard: a runtime error occurs (likely ValueError or a schema collision).
Operator signal — Error traceback indicating duplicate or invalid parameter injection.
Recovery — Manual: the developer must rename the parameter and use the ToolRuntime parameter (runtime reserved) instead. No automatic retry.
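
Since no guard exists for this failure, one option is a small definition-time check. The sketch below is hypothetical (find_reserved_misuse is not a LangChain function) and assumes the caller supplies the types that legitimately occupy the reserved names; FakeRuntime stands in for the real injected type.

```python
import inspect

# The two parameter names the source says are reserved for injection.
RESERVED_PARAMS = {"config", "runtime"}


def find_reserved_misuse(func, injected_types=()) -> list:
    """List reserved parameter names whose annotation is not an injected type."""
    flagged = []
    for name, param in inspect.signature(func).parameters.items():
        if name in RESERVED_PARAMS and param.annotation not in injected_types:
            flagged.append(name)
    return flagged


class FakeRuntime:
    """Placeholder for the injected runtime type (an assumption for this sketch)."""


def bad_tool(query: str, config: dict) -> str:
    """Uses 'config' for a user-supplied value, which collides with injection."""
    return query


def ok_tool(query: str, runtime: FakeRuntime) -> str:
    """Uses 'runtime' only for the injected object."""
    return query


bad_report = find_reserved_misuse(bad_tool, (FakeRuntime,))
ok_report = find_reserved_misuse(ok_tool, (FakeRuntime,))
```

Running such a check in a unit test would surface the collision before the tool is ever bound to a model.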

Failure 4 — State key "messages" missing due to custom state schema

Trigger — The tool get_last_user_message accesses runtime.state["messages"] but the graph’s state does not include a messages key (e.g., a custom AgentState without that field).
Guard — None. The source iterates reversed(messages) without a try/except or get() call for that key.
Posture — Fail-hard: raises KeyError.
Operator signal — KeyError: 'messages' in the log, possibly with the stack trace.
Recovery — Manual: the graph state must include a messages key, or the tool must be updated to use runtime.state.get("messages", []). No retry or fallback is provided.
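
The recovery above can be sketched with a plain dictionary standing in for runtime.state. Representing messages as dicts with "role" and "content" keys is an assumption made to keep the example self-contained; real message objects may expose these differently.

```python
def get_last_user_message(state: dict) -> str:
    """Return the most recent user message, or a fallback when there is none."""
    # .get with a default avoids the KeyError raised by state["messages"].
    messages = state.get("messages", [])
    for message in reversed(messages):
        if message.get("role") == "user":
            return message.get("content", "")
    return "No user messages found"


# A custom state without a "messages" key no longer aborts the tool.
without_messages = get_last_user_message({"user_preferences": {}})

# With messages present, the newest user message wins.
with_messages = get_last_user_message(
    {
        "messages": [
            {"role": "user", "content": "hi"},
            {"role": "assistant", "content": "hello"},
            {"role": "user", "content": "status?"},
        ]
    }
)
```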

Failure 5 — Tool import error due to missing version requirement

Trigger — A tool using ToolRuntime (or tool from langchain.tools) is run with deepagents<0.5.0 and langgraph<1.1.5.
Guard — None. The source contains a <Note> stating the version requirement but no runtime compatibility check.
Posture — Fail-hard: ImportError or ModuleNotFoundError at the point of importing ToolRuntime or Command.
Operator signal — ModuleNotFoundError: No module named '...' or ImportError: cannot import name 'ToolRuntime'.
Recovery — Manual: upgrade the package to the required version. No retry or fallback is implemented.

Failure 6 — Tool call ID correlation failure when using InjectedToolCallId (legacy pattern)

Trigger — A tool that depends on the tool call ID uses the deprecated InjectedToolCallId injection pattern (mentioned in the migration accordion).
Guard — None. The source only shows a note to “migrate from older injection patterns” but does not provide a guard for the deprecated code.
Posture — Fail-hard: the tool may throw an error or return incorrect results because the injection mechanism changed.
Operator signal — Potentially silent mis‑identification of tool calls, or an exception referencing InjectedToolCallId.
Recovery — Manual: rewrite the tool to use ToolRuntime and runtime.tool_call_id (as shown in the “Access context” table). No automatic fallback.

05. Trade-Offs And Alternatives

Using return_direct=True trades off LLM reasoning for deterministic speed.

python

from langchain.tools import tool


@tool(return_direct=True)
def fetch_order_status(order_id: str) -> str:
    """Fetch the current status of a customer order."""
    # In production, query your order management system here
    return f"Order {order_id} is shipped and will arrive in 2 days."


# For order_id="12345" the tool returns:
# "Order 12345 is shipped and will arrive in 2 days."

System design — mechanism, invariant, trade-off

The tool subsystem in LangChain operates as an interrupt-driven extension to agent reasoning. The ordered mechanism begins when a chat model (such as ChatOpenAI) decides, based on conversation context, to invoke a tool defined with the @tool decorator. The model emits a tool call containing the tool name and arguments; middleware sees that call as a ToolCallRequest. The tool node executes the tool, wraps the result in a ToolMessage and, by default, returns control to the model for further reasoning. However, if the tool is decorated with @tool(return_direct=True), the mechanism short-circuits: the tool’s output is returned immediately as the final response, bypassing any additional model call. On failure, middleware such as wrap_tool_call can intercept the error, log it, and optionally retry or return a custom message. For headless tools (defined as HeadlessTool objects with only name, description, and args_schema), the graph interrupts instead of executing locally, pausing with a payload shaped {"type": "tool", "tool_call": {"id", "name", "args"}}. The environment (e.g., browser) then performs the action and resumes the graph with the result.

The invariant the design preserves is the return_direct guarantee: when all tools called in a single turn have return_direct=True, the agent stops looping and returns the tool’s output unchanged. The model cannot rephrase, summarize, or act on that output, ensuring deterministic, unmodified delivery. For headless tools, the invariant is locality of execution: data remains on the device and side effects (e.g., using browser APIs) happen exactly once in the client environment, with the graph resuming only after the client signals completion via the resume command. These invariants prevent the model from altering authoritative results and keep sensitive operations confined to the user’s device.
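
The return_direct rule stated above (stop only when every tool called in the turn is marked return_direct) can be sketched as a toy turn runner. ToolSpec, run_turn, and lookup_customer are hypothetical names for this illustration; the real agent loop is more involved.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolSpec:
    """Minimal stand-in for a registered tool (not a LangChain class)."""
    func: Callable[..., str]
    return_direct: bool = False


def fetch_order_status(order_id: str) -> str:
    return f"Order {order_id} is shipped and will arrive in 2 days."


def lookup_customer(customer_id: str) -> str:
    return f"Customer {customer_id} found."


def run_turn(tool_calls: list, registry: dict) -> tuple:
    """Run one turn's tool calls and report whether the agent loop should stop."""
    outputs = [registry[call["name"]].func(**call["args"]) for call in tool_calls]
    # Stop only when there is at least one call and all of them are return_direct.
    stop = bool(tool_calls) and all(
        registry[call["name"]].return_direct for call in tool_calls
    )
    return outputs, stop


registry = {
    "fetch_order_status": ToolSpec(fetch_order_status, return_direct=True),
    "lookup_customer": ToolSpec(lookup_customer),
}

direct_outputs, direct_stop = run_turn(
    [{"name": "fetch_order_status", "args": {"order_id": "12345"}}], registry
)
mixed_outputs, mixed_stop = run_turn(
    [
        {"name": "fetch_order_status", "args": {"order_id": "12345"}},
        {"name": "lookup_customer", "args": {"customer_id": "c-1"}},
    ],
    registry,
)
```

When stop is true the outputs would be handed back unchanged; in the mixed case control would return to the model for another reasoning step.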

The key trade‑off is server‑side execution versus client‑side execution, and it is built this way to avoid the cost of sending private or device‑dependent data across the network. The obvious alternative—defining all tools with ordinary server‑side logic—is rejected because it cannot access client‑only APIs (geolocation, clipboard, file pickers) and would require transmitting data that should remain local. By using headless tools, the system avoids the latency and privacy risks of a server round‑trip for purely local operations. The trade‑off is a more complex interrupt/resume handshake and the need to mirror the schema on the client with .implement(...), but this cost is justified when the work depends on the environment that only exists on the client.

A concrete failure mode is a server‑side tool call that throws an unhandled exception (e.g., a database timeout in fetch_order_status). Without middleware, the run would abort. With the wrap_tool_call middleware, the operator would see a ToolMessage containing the error text emitted by the error handler, along with the tool’s ToolCallRequest ID enabling correlation across logs. The graph may then retry or return a custom error message. The signal an operator sees is a trace entry in LangSmith showing the tool node’s success or error lifecycle event, and, if onTool callbacks are used, a start event followed immediately by an error event with the exception details.

Failure modes — what breaks, what catches it

Reserved Parameter Name config or runtime in Tool Signature

Trigger — A developer defines a tool with an argument named config or runtime (e.g., def my_tool(config: dict)) instead of using the ToolRuntime parameter.
Guard — None in the source. The documentation states only that “using these names will cause runtime errors” and reserves them internally, but no explicit exception handler, validation, or fallback is shown.
Posture — Fail-hard. The runtime error aborts the tool execution entirely.
Operator signal — A runtime error (likely TypeError or ValueError) without a specific message in the source; the operator would see an unhandled exception traceback.
Recovery — No retry or fallback. The developer must rename the parameter (e.g., to cfg) and rely on ToolRuntime for runtime information.

Using runtime.stream_writer Outside a LangGraph Execution Context

Trigger — A tool annotated with runtime: ToolRuntime calls runtime.stream_writer while the graph is not running (e.g., during local testing or in a non‑LangGraph server).
Guard — None in the source. The documentation only warns that “the tool must be invoked within a LangGraph execution context,” but provides no try/except, conditional check, or fallback.
Posture — Fail-hard. The stream writer fails, likely raising an exception that aborts the tool.
Operator signal — An assertion or runtime error (details not given in source); the operator would observe a crashed tool call and no stream output.
Recovery — No automatic retry. The developer must ensure the tool is only used inside a valid LangGraph execution context (e.g., by checking runtime.execution_info or wrapping in a conditional).

Assuming runtime.server_info Is Not None When Running Locally

Trigger — A tool accesses runtime.server_info.assistant_id or server.user.identity without first checking for None, while the tool runs locally (not on LangGraph Server).
Guard — The example in the source uses if server is not None: before accessing server.assistant_id / server.user. This is an explicit guard via conditional check.
Posture — Fail‑soft (when the guard is used) — the tool continues by skipping server‑specific logic. Without the guard, it would be fail‑hard (AttributeError on None).
Operator signal — If guarded: silent absence of server info (no log or error). If unguarded: AttributeError: 'NoneType' object has no attribute 'assistant_id' in the traceback.
Recovery — The guard returns early or prints nothing; no retry. The developer can fall back to local defaults inside the if/else branch.

Calling tool(...) in Python Expecting a Client-Side .implement() API

Trigger — A developer calls tool(name="...", description="...", args_schema=...) in Python and expects to later call .implement(...) on the returned object to attach client logic.
Guard — None. The documentation explicitly states: “There is no .implement() API on the Python side.” No validation, exception, or fallback is provided.
Posture — Fail‑soft. The tool still exists as a HeadlessTool but cannot be executed locally; the tool call will interrupt the graph instead. No runtime error occurs, but the developer’s intent is unmet.
Operator signal — The operator would see a HeadlessTool object with no .implement attribute if they try to call it, but more commonly they would see a graph interruption (since headless tools interrupt at runtime). No explicit log line is defined in the source.
Recovery — No automatic retry. The developer must restructure by defining the tool on the frontend (using JS SDK hooks) and avoiding Python‑side .implement() expectations.

Directly Indexing runtime.state Without Fallback

Trigger — A tool uses runtime.state["messages"] (as in the example for get_last_user_message) when the state does not contain that key (e.g., a custom state without a "messages" field).
Guard — The source shows a defensive alternative: runtime.state.get("user_preferences", {}) for custom fields, but no guard is provided for the "messages" access itself. The example relies on the key always existing.
Posture — Fail‑hard if the key is missing. The tool raises a KeyError and execution aborts.
Operator signal — KeyError: 'messages' in the traceback.
Recovery — No retry or fallback. The developer must either guarantee the key exists or change the access pattern to use .get("messages", []) with a default.

06. Failure Modes

Designing tool schemas can be tricky. If your schema is too strict, the model might struggle to produce any valid input at all. Naming matters too: the documentation warns that some model providers reject tool names containing spaces or special characters. So using simple snake case, like web_search, helps avoid rejection. On the other hand, a schema with contradictory rules can confuse the model entirely. When constraints fight each other, the model may generate output that passes format checks but still contains factual mistakes. That kind of structurally valid but wrong answer can be hard to catch. Another common problem happens when extra text leaks outside the expected structure, such as commentary placed before the opening brace of a JSON object. The model might add commentary or stray characters, causing the tool call to fail. To prevent these issues, keep your tool descriptions clear and concise. Use Pydantic models for complex inputs, but avoid overly tight constraints that block every possible token. Remember, the model relies on your schema to decide when and how to call a tool. A well balanced schema gives the model enough freedom to succeed while still guiding it toward correct use. Always test your tools with real examples to catch contradictions or over restrictions. That way you avoid silent failures and keep your agent running smoothly.

Generate it: So using simple ________ case, like web_search, helps avoid rejection. (cue: the lower-case-with-underscores naming style; answer: snake)

Generate it: That kind of structurally _____ but wrong answer can be hard to catch. (cue: it passes format checks yet contains factual mistakes; answer: valid)

Ask yourself: A schema can be too strict or have contradictory rules — what is the danger at each extreme, and why does a well balanced schema give the model enough freedom to succeed?

Recall check (try before reading the answer):

What happens if your schema is too strict? — ___________________________________________________________________________________________________________ Answer: If your schema is too strict, the model might struggle to produce any valid input at all.

What happens when extra text leaks outside the expected structure? — _________________________________________________________________________________________ Answer: The model might add commentary or stray characters, causing the tool call to fail.

What should you always do to catch contradictions or over restrictions? — _____________________________________ Answer: Always test your tools with real examples.

Error handling middleware catches tool call exceptions and returns a structured ToolMessage.

python

from collections.abc import Callable

from langchain.agents import create_agent
from langchain.agents.middleware import wrap_tool_call
from langchain.messages import ToolMessage
from langchain.tools.tool_node import ToolCallRequest


@wrap_tool_call
def handle_tool_errors(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], ToolMessage],
) -> ToolMessage:
    """Convert tool exceptions into ToolMessages the model can handle."""
    try:
        return handler(request)
    except Exception as e:
        return ToolMessage(
            content=f"Tool error: Please check your input and try again. ({e})",
            tool_call_id=request.tool_call["id"],
        )

ELI5 — the plain-language version

Think of designing a tool schema like filling out a very picky online form. If the form demands your name in "firstname_lastname" with no spaces—and you write "John Doe"—the website just errors out and won't accept anything. That's exactly what happens when a tool schema is too strict: the model can't produce valid input. The source says some providers reject tool names that have spaces or special characters, so developers use simple snake case like web_search to avoid rejection. But there's another trap: if the form has contradictory rules—like "must include middle initial" and "no letters after the first name"—you might type something that passes all the format checks but still contains a factual mistake (like the wrong middle initial). Without careful schema design, either the model gets stuck (form rejection) or it feeds you a perfectly formatted but completely wrong answer—and you wouldn't even know until the error bites you.

System design — mechanism, invariant, trade-off

The subsystem uses a declarative tool-creation mechanism: a developer annotates a function with the @tool decorator, providing a name, description, and typed arguments. The decorator inspects the function’s signature (type hints) and docstring to build an input schema. When an agent such as create_agent runs, the schema is passed to the chat model. The model decides when to invoke the tool and what arguments to pass. On execution, the tool runs and returns a result wrapped in a ToolMessage. On failure—for example, if the model cannot produce arguments matching the schema—the model provider may return an error or the agent may retry. The process is linear: schema definition → model call → tool invocation → result or error.
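
To make the "signature and docstring become a schema" step concrete, here is a rough sketch using only the standard library. It is not the @tool decorator's actual implementation; build_input_schema, JSON_TYPES, and the output layout are assumptions chosen for illustration.

```python
import inspect
from typing import get_type_hints

# Simplified mapping from Python types to JSON-schema type names (an assumption).
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def build_input_schema(func) -> dict:
    """Derive a schema-like dict from a function's type hints and docstring."""
    hints = get_type_hints(func)
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        if name not in hints:
            # Mirrors the rule that type hints are required to define the schema.
            raise TypeError(f"parameter {name!r} needs a type hint")
        properties[name] = {"type": JSON_TYPES.get(hints[name], "object")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def search_database(query: str, limit: int = 10) -> str:
    """Search the customer database for matching records."""
    return f"Found {limit} results for '{query}'"


schema = build_input_schema(search_database)
```

The resulting dict is the kind of description a chat model receives: a name, a description taken from the docstring, and typed parameters with the defaulted ones left optional.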

The invariant the design preserves is schema comprehensibility for the model—the schema must be both non‑contradictory and easy for the model to satisfy. The documentation explicitly warns against contradictory rules because they confuse the model, causing it to generate output that “passes format checks but still contains factual mistakes.” The guarantee is that a well‑formed schema (using snake_case names, clear descriptions, and non‑conflicting constraints) minimises the chance of invalid or misleading tool calls.

The key trade-off is between schema strictness and model fluency. A strict schema (e.g., requiring a Pydantic WeatherInput model with Literal types) ensures correct inputs but risks the model struggling “to produce any valid input at all.” The obvious rejected alternative is a schema that is too loose or ambiguously defined, which would let the model generate structurally valid but factually wrong outputs—a cost of extra compute and debugging. The design thus rejects lax validation because the cost of processing many incorrect tool calls (and having the model re‑attempt) outweighs the benefit of rarely missing a valid call. Instead, it pushes for minimal, non‑contradictory schemas that let the model reliably succeed.

A concrete failure mode occurs when a tool name contains spaces or special characters, such as @tool("Web Search"). Some model providers reject the name outright with an error. The operator would see an exception or an error message from the model provider, like “Tool name ‘Web Search’ is invalid.” This signal is immediate and visible in logs or the API response. Another failure mode is a schema with contradictory rules (e.g., two fields that must simultaneously be required and excluded). The operator would see that the model’s tool call is structurally valid (the JSON passes schema checks) but contains factual mistakes—for example, returning “Order 12345 is shipped” when the real status is “pending.” The only clue is the wrong answer itself, requiring additional validation logic to detect.
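
The naming failure is cheap to catch before any provider sees the tool. The check below is a conservative sketch: the exact characters each provider accepts are not given in the source, so the snake_case pattern and the helper name is_safe_tool_name are assumptions.

```python
import re

# Lowercase letters, digits, and underscores, starting with a letter.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9_]*")


def is_safe_tool_name(name: str) -> bool:
    """Return True when the whole name is plain snake_case."""
    return SNAKE_CASE.fullmatch(name) is not None


accepted = is_safe_tool_name("web_search")
rejected = is_safe_tool_name("Web Search")
```

The second failure mode in the paragraph above (valid structure, wrong facts) has no equivalent static check; it needs tests against known answers.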

Failure modes — what breaks, what catches it

Reserved Parameter Name Conflict

Trigger — A tool function defines a parameter named config or runtime (the two reserved names) instead of using the ToolRuntime injection pattern.
Guard — No runtime exception handler is shown; the guard is the documentation statement: “Using these names will cause runtime errors.” There is no except clause or validation in the source.
Posture — Fail‑hard: the tool call raises a runtime error, aborting the current run.
Operator signal — An error message such as "'config' is a reserved parameter name" or similar runtime error.
Recovery — Rename the conflicting parameter to something else and use the ToolRuntime injection (e.g., runtime: ToolRuntime) for runtime access.

Missing Required Type Hints on Tool Parameters

Trigger — A function decorated with @tool lacks type hints on its parameters. The source states “Type hints are required as they define the tool's input schema.”
Guard — No explicit guard in the source; the requirement is only documented. The tool definition itself may fail or generate an incomplete schema.
Posture — Fail‑hard: the tool creation or registration raises an error, or the tool is registered with an empty/incorrect input schema, preventing model invocation.
Operator signal — An error during tool registration like TypeError: argument 'query' must have a type annotation or a silent failure when the model attempts to use the tool.
Recovery — Add proper type hints (e.g., query: str, limit: int = 10) to all parameters.

Using Older Injection Patterns (InjectedState, InjectedStore, get_runtime(), or InjectedToolCallId)

Trigger — A tool uses one of the deprecated injection mechanisms (e.g., InjectedState, InjectedStore, get_runtime(), InjectedToolCallId) instead of the current ToolRuntime pattern. The source includes an accordion titled “Migrate from older injection patterns.”
Guard — A migration guide is provided in the source, but no code‑level guard or deprecation warning is shown.
Posture — Fail‑soft or fail‑hard depending on version: older patterns may still work in earlier library versions but are unsupported in newer versions, potentially causing a silent failure or a hard error.
Operator signal — A deprecation warning, an ImportError, or a runtime error when the tool is invoked (e.g., AttributeError: 'InjectedState' object has no attribute ...).
Recovery — Rewrite the tool to use runtime: ToolRuntime and access state via runtime.state, store via runtime.store, etc.

Unchecked Access to runtime.server_info When Not on LangGraph Server

Trigger — A tool accesses runtime.server_info (e.g., server.assistant_id) without first checking if it is None. The source explicitly says “server_info is None when the tool is not running on LangGraph Server.”
Guard — The source shows an explicit guard in code: if server is not None: before accessing server attributes.
Posture — Fail‑soft if the guard is used (tool degrades gracefully, e.g., skips printing server info). If the guard is omitted, posture is fail‑hard (an AttributeError on None).
Operator signal — If guard is present: tool runs but prints nothing about server info. If guard is missing: error AttributeError: 'NoneType' object has no attribute 'assistant_id'.
Recovery — Add the if server is not None: check before using runtime.server_info attributes, or handle None with a fallback value.
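
The guard can be sketched with small stand-in classes. FakeServerInfo and FakeRuntime are hypothetical placeholders that mimic only the attributes named in the text (server_info, assistant_id, graph_id); they are not the real runtime types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeServerInfo:
    """Stand-in exposing two of the attributes mentioned in the text."""
    assistant_id: str
    graph_id: str


@dataclass
class FakeRuntime:
    """Stand-in runtime; server_info is None when running locally."""
    server_info: Optional[FakeServerInfo] = None


def describe_server(runtime: FakeRuntime) -> str:
    """Use server details when available, otherwise fall back gracefully."""
    server = runtime.server_info
    if server is not None:
        return f"assistant={server.assistant_id} graph={server.graph_id}"
    # Local run: skip server-dependent logic instead of raising AttributeError.
    return "running locally"


local_result = describe_server(FakeRuntime())
server_result = describe_server(
    FakeRuntime(FakeServerInfo(assistant_id="a-1", graph_id="g-1"))
)
```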

Calling runtime.stream_writer Outside a LangGraph Execution Context

Trigger — A tool attempts to use runtime.stream_writer inside its body, but the tool is not invoked within a LangGraph execution (e.g., during local testing or from a non‑LangGraph agent). The source note warns: “If you use runtime.stream_writer inside your tool, the tool must be invoked within a LangGraph execution context.”
Guard — No guard is provided in the source; the warning is purely documentation.
Posture — Fail‑hard: the tool call raises a runtime error because the stream writer is unavailable.
Operator signal — An error such as "Stream writer is not available outside of a LangGraph execution context" or a missing attribute error on runtime.stream_writer.
Recovery — Ensure the tool is only used inside a LangGraph graph, or conditionally check for the presence of stream_writer before using it.

Version Mismatch for execution_info or server_info Features

Trigger — Code uses runtime.execution_info or runtime.server_info but the installed library versions are older than deepagents>=0.5.0 or langgraph>=1.1.5. The source includes <Note> blocks specifying these version requirements.
Guard — No runtime version check is shown; the guard is only the version requirement in documentation.
Posture — Fail‑soft: the attributes may not exist and accessing them raises AttributeError, aborting that part of the tool. Alternatively, the tool may fail entirely if the missing attribute is required.
Operator signal — An AttributeError like 'ToolRuntime' object has no attribute 'execution_info' or a silent failure.
Recovery — Upgrade deepagents to >=0.5.0 or langgraph to >=1.1.5, or add a try‑except block (not present in source) to handle missing attributes.
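
Because no runtime version check is shown in the source, a start-up check is one way to turn this failure into an early, readable message. The sketch assumes plain dotted numeric versions with the same number of components (for example "1.1.5"); pre-release tags would need a proper version parser. meets_minimum and check_package are hypothetical helpers.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple:
    """Turn '1.1.5' into (1, 1, 5); assumes purely numeric components."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed: str, required: str) -> bool:
    """Compare two dotted numeric versions component by component."""
    return parse_version(installed) >= parse_version(required)


def check_package(name: str, required: str) -> str:
    """Describe whether an installed package satisfies a minimum version."""
    try:
        installed = version(name)
    except PackageNotFoundError:
        return f"{name} is not installed (need >= {required})"
    if meets_minimum(installed, required):
        return f"{name} {installed} is new enough"
    return f"{name} {installed} is too old (need >= {required})"


# Thresholds taken from the version notes quoted above.
too_old = meets_minimum("1.1.4", "1.1.5")
new_enough = meets_minimum("0.5.0", "0.5.0")
```

Calling check_package("langgraph", "1.1.5") at import time would report a mismatch before any tool touches runtime.execution_info or runtime.server_info.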

07. Testing And Operations

Tool returning order status directly to user without further LLM processing.

python

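from langchain.agents import create_agent
from langchain.tools import tool


@tool(return_direct=True)
def fetch_order_status(order_id: str) -> str:
    """Fetch the current status of a customer order."""
    return f"Order {order_id} is shipped and will arrive in 2 days."


# create_agent accepts a "provider:model" string and resolves the chat model itself.
agent = create_agent(
    model="ollama:devstral-2",
    tools=[fetch_order_status],
)

# The tool's output becomes the final answer without another model call.
result = agent.invoke({
    "messages": [{"role": "user", "content": "What is the status of order #12345?"}]
})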

ELI5 — the plain-language version

Think of the tool system like a self-service checkout at a grocery store. You scan an item, and the machine immediately shows the price and prints your receipt — no need to wait for a cashier to look up the price, walk to the shelf, and come back. In LangChain, the return_direct mechanism works the same way: when a tool like fetch_order_status(order_id: str) is called, the agent stops its reasoning loop right there and hands the tool’s output directly to you as the final answer. Instead of making an extra expensive call to the model to paraphrase the result, the agent simply returns the plain text or object the tool produced. Without this mechanism, the agent would always try to rephrase the tool’s answer — even when the tool already gave a perfect, ready-to-display response like “Order 12345 is shipped and will arrive in 2 days.” The failure a beginner would feel is delay and unpredictability: the model might change the wording, add unnecessary fluff, or even misinterpret the result, making a simple lookup take longer and feel less trustworthy.

System design — mechanism, invariant, trade-off

The headless tools subsystem defines a two‑phase execution order that separates schema from implementation. First, a developer registers a tool by calling tool(name, description, args_schema) on the server, producing a HeadlessTool with no executable body. This tool is provided to create_agent so the model sees a valid callable. When the model issues a tool call, the graph interrupts rather than executing locally, emitting a structured payload with the tool_call identifier. The client application (typically a browser) detects this interrupt via the supported JS SDK hooks, runs the matching implementation attached with .implement(...), and then submits a resume command to continue the graph. The optional onTool callback fires start, success, and error lifecycle events for UI feedback. On failure, the error event signals the client to handle retries or propagate the failure, and the resume flow can carry an error result back to the graph.
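
The interrupt and resume handshake described above can be modeled in a few lines of plain Python. This is a simulation under stated assumptions, not the LangChain or JS SDK API: `HeadlessToolSpec`, `PausedRun`, `interrupt_for`, and `client_resume` are hypothetical names, and the `get_location` tool is invented for illustration. Only the payload keys `id`, `name`, and `args` come from the text.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class HeadlessToolSpec:
    """Schema-only tool: a name, a description, and an argument schema."""
    name: str
    description: str
    args_schema: Dict[str, str]


@dataclass
class PausedRun:
    """A run waiting for the client to resolve one tool call."""
    payload: Dict[str, Any]
    result: Optional[Dict[str, Any]] = None

    @property
    def resolved(self) -> bool:
        return self.result is not None


def interrupt_for(spec: HeadlessToolSpec, args: Dict[str, Any]) -> PausedRun:
    """Server side: pause with a structured payload instead of executing."""
    return PausedRun(payload={"id": str(uuid.uuid4()), "name": spec.name, "args": args})


def client_resume(run: PausedRun, implementations: Dict[str, Callable[..., Any]]) -> PausedRun:
    """Client side: run the matching implementation and attach the outcome."""
    impl = implementations.get(run.payload["name"])
    if impl is None:
        return run  # no handler attached, so the run stays paused
    try:
        content, status = impl(**run.payload["args"]), "success"
    except Exception as exc:  # an error result is carried back too
        content, status = str(exc), "error"
    run.result = {"tool_call_id": run.payload["id"], "status": status, "content": content}
    return run


spec = HeadlessToolSpec("get_location", "Read the device location.", {"precision": "str"})
handlers = {"get_location": lambda precision: f"Berlin ({precision})"}

done = client_resume(interrupt_for(spec, {"precision": "city"}), handlers)
stalled = client_resume(interrupt_for(spec, {"precision": "city"}), {})
```

Here `done` carries a success result keyed by the tool call id, while `stalled` has no result at all, which is the paused state an operator would see when no client handler is attached.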

The design preserves a clear environment boundary invariant: tool logic never executes on the server. The guarantee is that all headless tool operations are performed where the user's app runs—typically the browser—so data that depends on device‑specific APIs, privacy constraints, or local state stays client‑side. This is not an exactly‑once guarantee; the interrupt/resume handshake can be retried, but the fundamental invariant is that the server cannot accidentally run the tool’s logic. The separation ensures that even if the server’s process is restarted or the agent is re‑executed, the tool’s real work never leaks into the server’s memory or network.

The key trade‑off rejects the obvious alternative of embedding all client‑side logic into ordinary server‑side tools. That alternative would require the server to host browser APIs (geolocation, IndexedDB, file pickers) and would force sensitive data to leave the device, increasing latency, privacy risk, and infrastructure complexity. By accepting the interrupt/resume round trip, the design avoids the cost of server‑side environment emulation and keeps the tool implementation lightweight, typed, and local. The overhead of a short pause and resume is deemed acceptable for the gain of zero data exfiltration and the ability to use any client‑only API.

A concrete failure mode occurs when the client side does not attach an .implement(...) handler for a headless tool. The model calls the tool, the graph interrupts with a payload containing {"id", "name", "args"}, but no client code submits a resume command. An operator monitoring the graph would see a stalled run with an unresolved interrupt, the tool call details sitting in the payload, and no ToolMessage ever appended to the message list. The onTool callback never fires a success or error event, leaving the agent in a permanent paused state. The operator would need to inspect the interrupt payload and either manually resume or implement the missing client handler.

Failure modes — what breaks, what catches it

Accessing server_info attributes when not on LangGraph Server

Trigger – A tool calls runtime.server_info.assistant_id without first checking whether server_info is None. This occurs when the tool runs locally or during testing, not on LangGraph Server.
Guard – The example get_assistant_scoped_data uses the condition if server is not None: before accessing server.assistant_id. If a tool omits this guard, there is no other protection shown in the source.
Posture – Fail-hard. An AttributeError is raised on the None object, aborting the entire tool call and likely the run.
Operator signal – AttributeError: 'NoneType' object has no attribute 'assistant_id' (or similar) in the logs.
Recovery – Add the if server is not None: check before using any server.* attributes, or move execution to LangGraph Server.
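
The guard named in this entry can be illustrated with stub classes. This is a sketch only: `StubServerInfo` and `StubRuntime` are hypothetical stand-ins for the real runtime objects, and the returned strings are invented. The function name `get_assistant_scoped_data` and the `if server is not None:` check follow the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubServerInfo:
    """Stand-in for the server metadata object (hypothetical)."""
    assistant_id: str


@dataclass
class StubRuntime:
    """Stand-in for the runtime; server_info is None off-server."""
    server_info: Optional[StubServerInfo] = None


def get_assistant_scoped_data(runtime: StubRuntime) -> str:
    server = runtime.server_info
    if server is not None:
        # Safe: only reached when server metadata is present.
        return f"data for assistant {server.assistant_id}"
    # Local run or test: fall back instead of raising AttributeError.
    return "no server context; using local defaults"


on_server = get_assistant_scoped_data(StubRuntime(StubServerInfo("asst-42")))
local = get_assistant_scoped_data(StubRuntime())
```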

Using reserved parameter names config or runtime unintentionally

Trigger – A tool function defines a parameter with the name config or runtime that is not the special ToolRuntime‑typed injection. The source states “Using these names will cause runtime errors.”
Guard – No explicit guard is shown; the runtime error is the consequence.
Posture – Fail-hard. A runtime error (likely TypeError or NameError) occurs when the tool is invoked.
Operator signal – A runtime error during tool instantiation or invocation, e.g., TypeError: got multiple values for argument 'config'.
Recovery – Rename the parameter to avoid the reserved name. Use the ToolRuntime type annotation when you need access to runtime information.
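
Since the source shows no guard for this case, one option is a small pre-flight check over tool signatures. The helper below is a hypothetical lint sketch, not part of LangChain; the local `ToolRuntime` class is a stub standing in for the real type, and the rule it applies (flag `config` or `runtime` unless annotated with `ToolRuntime`) is taken from the trigger description above.

```python
import inspect
from typing import Callable, List


class ToolRuntime:
    """Local stub standing in for the real ToolRuntime type."""


RESERVED_NAMES = {"config", "runtime"}


def reserved_name_conflicts(func: Callable) -> List[str]:
    """List reserved parameter names that are not ToolRuntime-typed."""
    conflicts = []
    for name, param in inspect.signature(func).parameters.items():
        if name in RESERVED_NAMES and param.annotation is not ToolRuntime:
            conflicts.append(name)
    return conflicts


def bad_tool(query: str, config: dict) -> str:
    return query


def good_tool(query: str, runtime: ToolRuntime) -> str:
    return query


bad = reserved_name_conflicts(bad_tool)
good = reserved_name_conflicts(good_tool)
```

Running a check like this in a unit test surfaces the collision before the tool is ever invoked.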

Stream writer used outside LangGraph execution context

Trigger – A tool calls runtime.stream_writer while the tool is not running inside a LangGraph graph (e.g., in a plain Python script or test).
Guard – No guard is shown in the source. The note only warns “must be invoked within a LangGraph execution context.”
Posture – Fail-hard. The stream writer invocation will raise an exception (not specified) and abort the tool.
Operator signal – An exception (likely RuntimeError) with a message about missing graph context, or a None attribute error on the stream writer.
Recovery – Ensure the tool is only called from within a LangGraph execution context (e.g., through a compiled graph's invoke call).

Using execution_info or server_info with incompatible library version

Trigger – A tool references runtime.execution_info or runtime.server_info while the installed deepagents is below 0.5.0 or langgraph is below 1.1.5.
Guard – No guard is shown; the source only notes the version requirement.
Posture – Fail-hard. The attribute does not exist on the ToolRuntime object, raising an AttributeError.
Operator signal – AttributeError: 'ToolRuntime' object has no attribute 'execution_info' (or server_info).
Recovery – Upgrade deepagents to >=0.5.0 or langgraph to >=1.1.5 as required.
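
A startup check can turn this failure into an early, readable message. The sketch below uses only the standard library; `parse_version`, `meets_minimum`, and `installed_meets` are hypothetical helper names, and the parser assumes simple dotted numeric versions (anything after the leading digits of a part is ignored).

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Turn '1.1.5' into (1, 1, 5), padding missing parts with zeros."""
    parts = (text.split(".") + ["0", "0", "0"])[:3]
    numbers = []
    for part in parts:
        match = re.match(r"\d+", part)
        numbers.append(int(match.group()) if match else 0)
    return (numbers[0], numbers[1], numbers[2])


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two version strings numerically, part by part."""
    return parse_version(installed) >= parse_version(minimum)


def installed_meets(package: str, minimum: str) -> bool:
    """Return False when the package is absent or too old."""
    try:
        return meets_minimum(version(package), minimum)
    except PackageNotFoundError:
        return False


too_old = meets_minimum("0.4.9", "0.5.0")
new_enough = meets_minimum("1.10.0", "1.9.9")
```

Comparing integer tuples avoids the string-comparison trap where "1.10.0" would sort before "1.9.9".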

HeadlessTool interrupt not resumed

Trigger – A tool is defined with only name, description, and args_schema (creating a HeadlessTool). The model issues a tool call, the run interrupts, but the application lacks the client‑side hooks to detect and resume the interrupt.
Guard – The onTool callback can observe lifecycle events (start, success, error) for UI feedback, but the source does not show it handling the resume. The JS SDK hooks are mentioned as a detection mechanism, but no Python-side guard exists.
Posture – Fail-hard. The graph remains interrupted and never receives the tool result.
Operator signal – The run is stuck in an “interrupt” state; no tool output is produced, and the run does not proceed.
Recovery – Implement the frontend hooks described in the headless tools pattern (e.g., using useStream) to run the client implementation and submit a resume command, or manually resume the graph.

08. Where It Scales And Breaks

Structured outputs work well when you define clear inputs and outputs. You create them with a function and a docstring. The tool decorator makes this easy. But there are places where they break. Using spaces in a tool name causes errors with some providers. So you must use underscores instead. That is a simple rule. Another trade-off comes with the return direct setting. This makes the tool output go straight back to the user. It skips the model for further processing. That is fast for simple replies. But for multi-turn tasks you might lose important reasoning. Stateful tools help with longer conversations. They can access the current messages and custom fields. That scales well for short-term memory. The source does not cover non-text targets or very deep nesting. It also does not mention multi-modal cases. For schemas that change every request, you would need a new tool definition each time. The basic tool decorator handles fixed schemas best. Overall, structured outputs shine when you keep names simple and use clear descriptions. They struggle when you need to change the input shape on every call or handle complex media.

Generate it: The ______ direct setting sends a tool's output straight back to the user, skipping the model for further processing — fast for simple replies, but you may lose reasoning on multi-turn tasks. (cue: the setting that bypasses the model on the way out; answer: return)

Generate it: For schemas that change on every request, the basic tool decorator struggles because it handles _____ schemas best. (cue: schemas that don't change between calls; answer: fixed)

Ask yourself: The chapter says structured outputs "shine" in some cases and "struggle" in others — what property of your inputs decides which side you land on?

Recall check (try before reading the answer):

What happens if you use spaces in a tool name? — ___________________________________ Answer: Using spaces in a tool name causes errors with some providers, so you must use underscores instead.

What do stateful tools help with, and what can they access? — ______________________________________________________________________ Answer: Stateful tools help with longer conversations; they can access the current messages and custom fields.

When do structured outputs struggle? — ______________________ Answer: They struggle when you need to change the input shape on every call or handle complex media.

Return_direct tool skips model processing for immediate answers.

python

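from langchain.agents import create_agent
from langchain.tools import tool

# return_direct=True makes the tool's output the final response.
@tool(return_direct=True)
def fetch_order_status(order_id: str) -> str:
    """Fetch the current status of a customer order."""
    return f"Order {order_id} is shipped and will arrive in 2 days."

# "provider:model" string; assumes a local Ollama server with this model available.
agent = create_agent(
    "ollama:devstral-2",
    tools=[fetch_order_status],
)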

ELI5 — the plain-language version

Think of structured outputs like a well-organized restaurant menu: each dish has a clear name, description, and list of ingredients (inputs) and what you get (outputs). The @tool decorator in LangChain is the chef’s recipe card that makes this easy—you write a function with type hints and a docstring, and the system knows exactly how to call it. But small choices can break the flow. For example, if you put a space in a tool name (like "order pizza"), some providers choke on it; you must use underscores ("order_pizza") to keep the kitchen running. Another quirk is the return_direct setting—it lets the tool’s result go straight to the customer without further cooking by the model. That’s fast for simple answers, but for a multi‑course conversation you lose the chance to reason, taste, or adjust the dish. Without these structured rules, the system would serve up errors when it hits a space, or skip crucial thinking steps, leaving you with a half‑baked reply that doesn’t fit the full order.

System design — mechanism, invariant, trade-off

The subsystem begins with the @tool decorator, which converts a function and its docstring into a callable tool with a typed schema. The model receives the tool definition and decides, based on conversation context, when to invoke it. Upon invocation, the tool executes and its output is wrapped in a ToolMessage. In the normal flow, that message is returned to the model for further reasoning. However, when return_direct=True is set on a tool, the output is sent straight back to the user as the final answer, bypassing any additional model call. If multiple tools are called in a single turn, return_direct only takes effect when all of those tools have it set to True. On error, the wrap_tool_call middleware from langchain.agents.middleware intercepts the failure, allowing retries or custom error messages to be returned instead of a crash.
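
The rule for several tools called in one turn can be written as a single check. This is a plain-Python model of the rule as the paragraph states it, not the agent's actual code; `ToolSpec` and `ends_turn` are hypothetical names, and the tool names in the registry are invented.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToolSpec:
    """Minimal record of a tool and its return_direct flag."""
    name: str
    return_direct: bool = False


def ends_turn(called: List[str], registry: Dict[str, ToolSpec]) -> bool:
    """True only when every tool called this turn has return_direct set."""
    return bool(called) and all(registry[name].return_direct for name in called)


registry = {
    "fetch_order_status": ToolSpec("fetch_order_status", return_direct=True),
    "fetch_tracking_link": ToolSpec("fetch_tracking_link", return_direct=True),
    "search_faq": ToolSpec("search_faq"),
}

single = ends_turn(["fetch_order_status"], registry)
all_direct = ends_turn(["fetch_order_status", "fetch_tracking_link"], registry)
mixed = ends_turn(["fetch_order_status", "search_faq"], registry)
```

In the mixed case the results would go back to the model as usual, because one of the called tools still needs further reasoning.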

The design relies on a naming invariant: tool names should contain only alphanumeric characters, underscores, and hyphens, with snake_case preferred. Some model providers reject names that contain spaces or special characters, and nothing in the source shows @tool enforcing the rule, so the developer must uphold it. A second invariant applies to return_direct=True: the tool's output must be the complete, user-ready answer and need no further reasoning; otherwise the feature is unsuitable. The source states this explicitly: “the agent stops looping and returns the tool's output as the final response, bypassing any additional model call”. It must hold for every tool called in the same turn for the shortcut to activate.

The key trade-off is embodied by return_direct=True: it trades determinism and latency against model reasoning. It rejects the obvious alternative of always feeding the tool result back to the model for rephrasing or chaining, which would add an extra LLM call and could rewrite or summarize the output. The cost avoided is that unnecessary model call—saving tokens and time—and the risk of the model distorting a correct, ready-to-display answer. The trade-off is accepted because for lookup-style results (e.g., order status) the output is final and no reasoning is needed. For multi-turn or chained tasks, the alternative is required, so return_direct=False remains the default.

A concrete failure mode arises when a developer uses a tool name with spaces (e.g., @tool("Web Search")). The failure is triggered when the model attempts to call the tool: the provider’s API returns an error indicating an invalid or malformed tool name. An operator monitoring logs in LangSmith would see a ToolCallRequest that was never translated into execution, accompanied by an error message from the chat model provider, such as “Invalid tool name: must match pattern ...”. Alternatively, if the tool is defined as a HeadlessTool (with only name, description, and args_schema), calling .implement() on the Python side would raise an AttributeError because the method does not exist, signaling that the tool is schema-only and must be implemented client-side.

Failure modes — what breaks, what catches it

Reserved Parameter Name Collision

Trigger — Using config or runtime as an explicit argument name in a tool function definition.
Guard — No exception handler in the source; only a documentation warning stating “Using these names will cause runtime errors” and a note that the names are reserved for internal use.
Posture — Fail‑hard – the runtime error immediately aborts the tool execution and the run.
Operator signal — A runtime error (likely TypeError or ValueError) raised by the tool‑creation or invocation logic, with no custom error message.
Recovery — The developer must rename the conflicting parameter; no automatic retry or fallback exists.

Spaces in Tool Name

Trigger — Providing a tool name that contains a space, e.g., @tool("my tool"). A Python function name cannot contain spaces, so this only happens when the name is overridden explicitly.
Guard — No guard present in source. The documentation only advises to use underscores instead.
Posture — Fail‑hard – the model provider rejects the tool schema, causing the model call to fail and aborting the run.
Operator signal — An error from the chat model provider (e.g., “Invalid tool name: …”) or a schema‑validation failure.
Recovery — Manually rename the tool using underscores; no retry or fallback is provided.
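
Because the source shows no guard here, a name check before registration is one way to catch the problem early. The sketch below is hypothetical: `is_safe_tool_name` and `to_safe_tool_name` are not LangChain functions, and the allowed character set (letters, digits, underscores, hyphens) is an assumption based on this chapter's description. Individual providers may apply different rules.

```python
import re

# Assumed allowed characters: letters, digits, underscores, hyphens.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_safe_tool_name(name: str) -> bool:
    """True when the whole name uses only the assumed safe characters."""
    return SAFE_NAME.fullmatch(name) is not None


def to_safe_tool_name(name: str) -> str:
    """Replace whitespace runs with underscores, then drop other characters."""
    collapsed = re.sub(r"\s+", "_", name.strip())
    return re.sub(r"[^A-Za-z0-9_-]", "", collapsed)


fixed = to_safe_tool_name("Web Search")
```

Rejecting an unsafe name in a test is usually clearer than silently rewriting it, since the model sees whatever name is finally registered.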

Old Injection Pattern Usage

Trigger — Using InjectedState, InjectedStore, get_runtime(), or InjectedToolCallId in a tool signature (the deprecated injection mechanisms).
Guard — A migration note directing users to “Migrate from older injection patterns”, but no runtime guard or fallback.
Posture — Fail‑hard – the tool fails to register or execute because the injection mechanism is no longer supported.
Operator signal — A runtime error (e.g., AttributeError or TypeError) when the tool is invoked, or an import warning.
Recovery — Rewrite the tool to use the runtime: ToolRuntime parameter instead; no automatic retry.

Version Mismatch for Execution Info

Trigger — Calling runtime.execution_info while the installed deepagents is below 0.5.0 or langgraph is below 1.1.5.
Guard — A documentation note stating the requirement, but no try‑except block or compatibility shim in the source.
Posture — Fail‑hard – the attribute access raises an AttributeError, aborting the tool.
Operator signal — AttributeError: 'ToolRuntime' object has no attribute 'execution_info' (or similar).
Recovery — Upgrade deepagents or langgraph to the required minimum version.

Unhandled Headless Tool Interrupt

Trigger — The model issues a tool call for a HeadlessTool (created by tool(...) with only name, description, and args_schema) and the application has not provided a client‑side implementation or resume command.
Guard — The optional onTool callback for lifecycle events (e.g., to show spinners), but no automatic fallback or error catch.
Posture — Fail‑soft – the run interrupts (pauses) instead of continuing; it degrades because the tool is not executed, but the graph can be resumed once a client responds.
Operator signal — An interrupt in the graph execution; the tool call payload is available for inspection, and no error is raised.
Recovery — The application must inspect the interrupt, implement the tool action in the appropriate environment, then submit a resume command.

Stream Writer Used Outside LangGraph Context

Trigger — Calling runtime.stream_writer inside a tool that is invoked outside a LangGraph execution context (e.g., during local testing or in a standalone script).
Guard — No guard present in source. Only a documentation note that it “must be invoked within a LangGraph execution context.”
Posture — Fail‑hard – the stream writer operation fails, likely raising an AttributeError or RuntimeError, aborting the tool.
Operator signal — An error like "Stream writer is not available in this context" or an AttributeError when trying to write.
Recovery — Ensure the tool is only called from within a LangGraph graph execution; no automatic retry.

Glossary — the domain terms, grounded in the code

14 terms, each defined from this subsystem’s real source.

@tool

The @tool decorator marks a Python function as a LangChain tool, enabling it to receive runtime context—such as ToolRuntime—injected as a parameter for accessing execution info, server info, or other session data during tool execution.

Memory hook @tool is the backstage pass that lets your function grab execution info and server data.

From langchain-tools.md

ToolRuntime

ToolRuntime is a parameter that can be added to a tool’s signature to access runtime information such as state, context, store, execution info, and server info; it is automatically injected into the tool call and hidden from the LLM.

Memory hook ToolRuntime is an invisible backstage pass for tools to access state, context, and store, hidden from the LLM.

From langchain-tools.md

args_schema

args_schema is a parameter accepted by the @tool decorator that provides a schema—either a JSON schema dictionary or a Pydantic model—defining the tool's expected arguments and their types, defaults, and descriptions.

Memory hook args_schema is the cookie cutter that shapes the tool's arguments before they're used.

From langchain-tools.md

Command

A Command is an object returned from a tool that updates graph state via its `update` field, optionally including a ToolMessage for the model to see, and is used when the tool mutates agent state rather than just returning data.

Memory hook Command is a state-editing wand: it inscribes updates directly onto the graph.

From langchain-tools.md

InjectedState

InjectedState is an older injection pattern for tool functions that provides access to conversation state via a parameter of type `InjectedState`, but it has been replaced by `ToolRuntime` which offers a single explicit interface to state, context, store, and execution metadata.

Memory hook InjectedState is the needle that injects conversation state directly into your tool's function.

From langchain-tools.md

RunnableConfig

RunnableConfig is an object accessible through the ToolRuntime's Config component that provides callbacks, tags, and metadata for the execution.

Memory hook RunnableConfig is the tool's runtime control panel, holding callbacks, tags, and metadata for execution.

From langchain-tools.md

ToolMessage

ToolMessage is a message class from langchain.messages used to represent the output of a tool invocation, such as converting tool exceptions into a structured response with a tool_call_id that the model can process.

Memory hook ToolMessage is a tool's rescue flare: it turns a crash into a structured message the model can read.

From langchain-tools.md

Stream Writer

Stream Writer is a component accessed via `runtime.stream_writer` that lets tools emit real-time custom updates during execution, providing progress feedback to users.

Memory hook Stream Writer is a live progress ticker streaming updates from inside a running tool.

From langchain-tools.md

Execution Info

Execution Info is a property on the `ToolRuntime` object that provides the current execution's thread ID, run ID, and node attempt number, accessible inside a tool via `runtime.execution_info`.

Memory hook Execution Info is your tool’s backstage pass – thread, run, and attempt in one object.

From langchain-tools.md

Server Info

Server Info is a property of the ToolRuntime object that provides server-specific metadata such as assistant ID, graph ID, and authenticated user when the tool runs within a LangGraph Server context, and is None otherwise.

Memory hook Server Info is the tool's backstage pass, revealing assistant and graph IDs only on LangGraph Server.

From langchain-tools.md

get_runtime

The `get_runtime()` function is an older injection pattern that was previously used to access runtime state, store, context, and execution metadata within tools, but it has been replaced by the `ToolRuntime` parameter for explicit injection.

Memory hook get_runtime is the obsolete backstage pass that ToolRuntime replaced with a direct mic.

From langchain-tools.md

config (reserved)

In the LangChain tool system, config is a reserved parameter name used to pass a `RunnableConfig` object to tools internally, providing access to callbacks, tags, and metadata during execution.

Memory hook Like a backstage pass, reserved `config` grants tools internal access to callbacks, tags, and metadata.

From langchain-tools.md

runtime (reserved)

"runtime" is a reserved parameter name that, when included in a tool signature, is automatically injected with a `ToolRuntime` object to provide access to runtime information such as state, store, execution info, and server info, and it is hidden from the LLM's tool schema.

Memory hook Runtime is a secret waiter automatically filling your tool's state and store orders behind the LLM's back.

From langchain-tools.md

tool_name

In the codebase, `tool_name` is the string identifier assigned to a tool, either automatically derived from the function name or overridden via the `@tool("custom_name")` decorator, and it is the name the model uses to invoke that tool.

Memory hook tool_name is the call sign the model uses to summon the tool.

From langchain-tools.md

