Tool Calling — Deep Dive

01. What Tool Calling Is

Tool calling lets a large language model ask a host application to run an external function. The model never executes anything itself. Instead, it returns a structured request that names the tool and supplies its arguments, and the application then runs that tool and sends the result back. This solves a key limitation. A stateless model would otherwise have to guess at live data, real computations, or actions it cannot perform on its own. With tool calling, it can fetch current information from the web. It can also run code, query external databases, or take real actions in the world. The tool is simply a callable function with well-specified inputs and outputs. The model decides when to invoke a tool based on the conversation context, which makes agents far more capable than before. They are no longer limited to their training data. They gain live facts, real math, and the power to change things. The trade is that the application must host and secure every tool. The model can only request a call; it never runs anything, so the host decides if and when to execute. Still, the result is a remarkably powerful way to give a static language model genuinely dynamic abilities.

Generate it: With tool calling, the model never runs the function itself — instead it returns a structured r______ that names the tool and supplies its arguments. (cue: r______; answer: request)

Generate it: Tool calling exists to fix one limitation: a s_______ model would otherwise have to guess at live data, real computations, or actions it cannot perform on its own. (cue: s_______; answer: stateless)

Ask yourself: If the model only emits a request and never executes anything, who actually runs the tool — and why does that split matter for security?

Recall check (try before reading the answer):

What does the model hand back instead of running the tool directly? — ____________________________________ Answer: A structured request that names the tool and supplies its arguments; the application then runs the tool and sends the result back.

What core limitation does tool calling solve? — ____________________________________ Answer: A stateless model would otherwise have to guess at live data, real computations, or actions it cannot perform on its own.

What is the trade for giving a static model dynamic abilities? — ____________________________________ Answer: The application must host and secure every tool; the model can only request a call, so the host decides if and when to execute.

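The example below defines a tool with the @tool decorator and gives it to an agent, which answers a user query by requesting a tool call. Because the tool sets return_direct=True, its output becomes the final response without another model call.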

python

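from langchain.agents import create_agent
from langchain.tools import tool

# return_direct=True makes the tool's output the final response.
@tool(return_direct=True)
def fetch_order_status(order_id: str) -> str:
    """Fetch the current status of a customer order."""
    # Stubbed result; a real tool would query an order system here.
    return f"Order {order_id} is shipped and will arrive in 2 days."

# create_agent accepts a provider-prefixed model string directly.
# The model name is the one used in this guide; substitute one you can
# access and install the matching provider package.
agent = create_agent(
    model="google_genai:gemini-3.5-flash",
    tools=[fetch_order_status],
)

# The agent loop runs until the model (or a return_direct tool) answers.
result = agent.invoke({
    "messages": [{"role": "user", "content": "What is the status of order #12345?"}]
})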

ELI5 — the plain-language version

Imagine a librarian who can read every book ever written but cannot walk to the shelves. When you ask for a specific report, she doesn’t fetch it herself—she scribbles a detailed request slip with the title and location, then hands it to a runner who brings back the actual document. That’s exactly how tool calling works. The large language model never executes code or pulls live data. Instead, it returns a structured request that names a tool (like a search function or a database query) and provides the required arguments. The host application then runs that tool—for example, querying an external database for real-time information—and sends the result back to the model. This mechanism lets the model access up‑to‑the‑minute facts, run calculations, or take real actions in the world. Without tool calling, the model would be that brilliant librarian stuck at her desk, forced to guess everything. Ask it for today’s weather, and it would confidently invent a sunny forecast from old training data. No verification, no live updates, no real actions—just static guesses that quickly become useless.
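The librarian-and-runner split can be sketched in plain Python with no framework at all. This is only an illustration: the request is a plain dict, and TOOLS, get_weather, and run_tool_request are hypothetical names, not part of LangChain or any provider API.

```python
# Hypothetical tool registry: the host owns every callable.
def get_weather(city: str) -> str:
    # Stubbed data; a real tool would call a weather service.
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def run_tool_request(request: dict) -> dict:
    """Host side: decide whether to run the requested tool, then run it."""
    name = request["name"]
    if name not in TOOLS:
        # The host, not the model, has the final say on execution.
        return {"id": request["id"], "error": f"unknown tool: {name}"}
    output = TOOLS[name](**request["args"])
    return {"id": request["id"], "content": output}

# What the model hands back: a structured request, not an executed result.
model_request = {"id": "call_1", "name": "get_weather", "args": {"city": "Oslo"}}
result = run_tool_request(model_request)
print(result)
```

The unknown-tool branch is where the security split shows up: a request for something the host never registered is refused rather than executed.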

System design — mechanism, invariant, trade-off

In the tool‑calling subsystem, the ordered mechanism begins when the chat model returns a structured ToolCallRequest that names a tool and its arguments. The host application intercepts this request; if the tool is a HeadlessTool (defined only by name, description, and args_schema without a local implementation), the graph execution interrupts immediately rather than executing the tool locally. Your app inspects the payload, performs the action in the appropriate environment (e.g., a browser or human review), and then resumes the graph by feeding back the tool result. If the tool is locally defined with the @tool decorator, the node function runs the tool directly, wraps its output in a ToolMessage, and continues the graph. An optional onTool callback observes lifecycle events (start, success, error) to provide UI feedback. When return_direct=True is set on a tool, the tool’s output becomes the final response without an additional model call, halting the agent loop early.

The invariant that the design preserves is the state‑update contract enforced by the StateGraph framework: every node (including tool‑calling nodes) must emit updates to the State that are applied via user‑defined reducer functions, and control flow must respect the graph’s static and dynamic edges. This guarantees that the graph’s execution is always a deterministic function of the current state and the messages in transit. No tool call can sidestep the state‑update mechanism; even Command returns combine update and goto in a single atomic step. This prevents partial or inconsistent state modifications and ensures that checkpointers (set at compile time) can capture consistent snapshots for fault tolerance.
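The state-update contract can be illustrated with a small plain-Python sketch. It does not use LangGraph; REDUCERS, apply_update, and tool_node are hypothetical names, and the sketch only assumes that each state key has one reducer that merges an update into the current value.

```python
import operator

# Each state key has a reducer that merges an update into the current value.
REDUCERS = {
    "messages": operator.add,            # append new messages to the list
    "step_count": lambda old, new: new,  # last write wins
}

def apply_update(state: dict, update: dict) -> dict:
    """Return a new state with every updated key passed through its reducer."""
    new_state = dict(state)
    for key, value in update.items():
        new_state[key] = REDUCERS[key](state[key], value)
    return new_state

def tool_node(state: dict) -> dict:
    # A node returns only an update; it never mutates the state directly.
    return {"messages": ["tool result: 42"], "step_count": state["step_count"] + 1}

state = {"messages": ["user: what is 6 * 7?"], "step_count": 0}
state = apply_update(state, tool_node(state))
print(state)
```

Because every change passes through apply_update, a snapshot taken between steps always reflects whole updates, never half of one.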

The key trade‑off is headless tools versus locally hosted tools. The design explicitly rejects the obvious alternative of executing every tool call within the graph’s own runtime (which would require shipping all tool logic to the model provider or hosting code server‑side). Instead, headless tools allow the application to interrupt the graph and delegate execution to an external service, a browser, or a human approval step. The cost this rejection avoids is the deployment complexity and security risk of running arbitrary third‑party code inside the agent’s process, as well as the inability to reuse existing enterprise systems without rewriting them. The trade‑off is latency: interrupting the graph and resuming it adds network round‑trips, but gains flexibility and separation of concerns.

A concrete failure mode occurs when the external service called by a headless tool crashes mid‑execution. The operator will see that the graph remains suspended at an interrupt point — the stream never receives a resume command and the onTool callback fires an error event with no ToolMessage returned. If a locally defined tool throws an exception, the middleware wrap_tool_call can catch it and either retry the call or return a custom error ToolMessage; the operator would see the error message in the trace logs and the agent may loop indefinitely if retry logic is misconfigured. In either case, the graph does not silently produce corrupted state because the state‑update contract enforces that no update is applied until the tool result is properly returned or the error is handled.
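The risk of an endless retry loop is easier to see in code. The following is a plain-Python sketch of a bounded retry wrapper; it is not the wrap_tool_call middleware API, and call_with_retries, flaky_lookup, and always_fails are hypothetical names. The point is that the retry count has a hard cap and the failure comes back as an error result rather than an exception.

```python
def call_with_retries(tool, args: dict, max_retries: int = 2) -> dict:
    """Run a tool, retrying on failure at most max_retries times."""
    attempts = 0
    while True:
        try:
            return {"status": "ok", "content": tool(**args)}
        except Exception as exc:
            attempts += 1
            if attempts > max_retries:
                # Give up with an error result the model can read.
                return {
                    "status": "error",
                    "content": f"Tool failed after {attempts} attempts: {exc}",
                }

calls = {"n": 0}

def flaky_lookup(order_id: str) -> str:
    # Fails on the first call only, to simulate a transient outage.
    calls["n"] += 1
    if calls["n"] < 2:
        raise TimeoutError("backend timed out")
    return f"Order {order_id} shipped"

def always_fails(order_id: str) -> str:
    raise ValueError("bad id")

ok = call_with_retries(flaky_lookup, {"order_id": "12345"})
failed = call_with_retries(always_fails, {"order_id": "x"}, max_retries=1)
print(ok, failed)
```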

Failure modes — what breaks, what catches it

Failure 1: Tool output exceeds maximum transcript length

Trigger — The tool returns an observation string longer than 8 000 characters.
Guard — The conditional if len(observation) > 8000: inside the agentic search graph truncates the observation and appends the literal marker "\n… (observation truncated)".
Posture — Fail‑soft. The graph continues execution with the truncated observation; no run is aborted.
Operator signal — The observation string ends with "\n… (observation truncated)". The operator sees the truncated content in the transcript.
Recovery — No retry or backoff. The truncated value is used as the tool result for the current step. The graph proceeds to the next node.
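The truncation guard described above can be sketched as a small helper. The 8000-character limit and the marker string come from the description; the function and constant names are assumptions made for this sketch.

```python
MAX_OBSERVATION_CHARS = 8000
TRUNCATION_MARKER = "\n… (observation truncated)"

def clamp_observation(observation: str) -> str:
    """Truncate an oversized tool observation and flag that it was cut."""
    if len(observation) > MAX_OBSERVATION_CHARS:
        return observation[:MAX_OBSERVATION_CHARS] + TRUNCATION_MARKER
    return observation

short = clamp_observation("ok")
long = clamp_observation("x" * 10_000)
print(len(short), len(long))
```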

Failure 2: Maximum turns exhausted without a definitive answer

Trigger — The agentic search graph iterates through tools for a configurable number of turns (max_turns) without reaching an explicit answer.
Guard — The fallback assignment fallback = transcript[-1] if transcript else "" and the subsequent run.end(outputs={"answer": fallback, "steps": max_turns, "total_tokens": acc_tokens, "total_cost_usd": acc_cost, "exhausted": True}).
Posture — Fail‑soft. The run ends gracefully with the last observation as a best‑effort answer, and the exhausted flag is set to True.
Operator signal — The exhausted field in the run output is True, and the answer field contains the last transcript entry, or an empty string if the transcript is empty.
Recovery — No automatic retry. The user or orchestrator receives the best‑effort answer and can decide to re‑invoke with different parameters.
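The exhaustion fallback can be sketched as a bounded loop. This is a simplified illustration, not the actual agentic search graph: the step callable and the function names are hypothetical, and the run.end call is replaced by a plain return value that keeps only the answer, steps, and exhausted fields.

```python
def agentic_search(step, max_turns: int) -> dict:
    """step(turn) returns (observation, answer); answer is None until found."""
    transcript = []
    for turn in range(max_turns):
        observation, answer = step(turn)
        transcript.append(observation)
        if answer is not None:
            return {"answer": answer, "steps": turn + 1, "exhausted": False}
    # Budget spent: fall back to the last observation, if there is one.
    fallback = transcript[-1] if transcript else ""
    return {"answer": fallback, "steps": max_turns, "exhausted": True}

def never_answers(turn: int):
    return (f"partial result {turn}", None)

def answers_on_second(turn: int):
    return ("found it", "42" if turn == 1 else None)

exhausted_run = agentic_search(never_answers, max_turns=3)
answered_run = agentic_search(answers_on_second, max_turns=3)
print(exhausted_run, answered_run)
```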

Failure 3: Recursion / super‑step limit approached (proactive degradation)

Trigger — The graph’s recursion limit is near exhaustion (e.g., state["remaining_steps"] <= 2), as tracked by the managed value RemainingSteps.
Guard — The conditional edge route_decision that reads state["remaining_steps"] and returns "fallback_node" instead of "reasoning_node". The fallback_node function returns {"messages": ["Reached complexity limit, providing best effort answer"]}.
Posture — Fail‑soft. The graph routes to a dedicated fallback node and completes normally without raising an exception.
Operator signal — The message list contains "Reached complexity limit, providing best effort answer" from the fallback_node. No error trace is raised.
Recovery — The graph finishes its execution path to the END node. The user receives the fallback message.
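The proactive routing decision reduces to two small functions. The sketch below shows them as plain Python operating on a dict; in a real graph the remaining_steps value would be supplied by the RemainingSteps managed value and the functions would be wired in as a conditional edge and a node, which is omitted here.

```python
def route_decision(state: dict) -> str:
    """Pick the next node based on how many steps are left."""
    if state["remaining_steps"] <= 2:
        return "fallback_node"
    return "reasoning_node"

def fallback_node(state: dict) -> dict:
    # Best-effort message returned instead of letting the limit be hit.
    return {"messages": ["Reached complexity limit, providing best effort answer"]}

plenty_left = route_decision({"remaining_steps": 10})
nearly_out = route_decision({"remaining_steps": 2})
print(plenty_left, nearly_out)
```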

Failure 4: Recursion / super‑step limit exceeded (reactive catch)

Trigger — The graph exhausts its recursion_limit (e.g., passed as {"recursion_limit": 10}) and raises a GraphRecursionError.
Guard — The except GraphRecursionError as e: block from the reactive example (outside the graph invocation). This guard catches the error and assigns result = {"messages": ["Fallback: recursion limit exceeded"]}.
Posture — Fail‑hard (graph execution is aborted), then fail‑soft handled externally. The graph run stops with an exception, but the calling code provides a fallback result.
Operator signal — A GraphRecursionError exception is raised. The caller’s catch block prints or logs the error message. The final result is the fallback dictionary.
Recovery — No retry. The fallback dictionary is returned to the user. A manual re‑invocation with an increased recursion limit is required.
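The reactive path can be sketched without LangGraph installed. In this illustration StepLimitExceeded is a stand-in for GraphRecursionError, and run_loop is a hypothetical toy runner rather than a compiled graph; only the shape of the try/except and the fallback dictionary follow the description above.

```python
class StepLimitExceeded(RuntimeError):
    """Stand-in for GraphRecursionError in this sketch."""

def run_loop(step, recursion_limit: int) -> dict:
    """Toy runner: call step until it reports done or the limit is spent."""
    state = {"messages": []}
    for _ in range(recursion_limit):
        if step(state):
            return state
    raise StepLimitExceeded(f"limit of {recursion_limit} reached")

def never_done(state: dict) -> bool:
    state["messages"].append("still working")
    return False

try:
    result = run_loop(never_done, recursion_limit=10)
except StepLimitExceeded as e:
    # The run is aborted; the caller supplies the fallback result.
    print(f"caught: {e}")
    result = {"messages": ["Fallback: recursion limit exceeded"]}
```

Note that the partial state built up inside the run is discarded here, which is why the proactive approach usually yields a more useful answer.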

Failure 5: Tool uses runtime.stream_writer outside a LangGraph execution context

Trigger — A tool contains a call to runtime.stream_writer but is invoked outside a LangGraph graph execution (e.g., during local testing or from a plain Python script).
Guard — No guard shown in the source. The documentation only warns: “the tool must be invoked within a LangGraph execution context.” No try/except, conditional check, or fallback is provided in the tool code exhibited.
Posture — Fail‑hard. The stream writer operation will likely raise an exception (e.g., AttributeError or RuntimeError) because the runtime context is missing.
Operator signal — An undefined error or exception is raised at the point of the stream_writer call. The operator sees a traceback with no specific service‑side signal.
Recovery — No automatic recovery. The developer must modify the caller to run the tool within a proper LangGraph execution context (e.g., by invoking the compiled graph).

Failure 6: server_info is None when tool expects it on LangGraph Server

Trigger — A tool accesses runtime.server_info but is running locally or during testing, not on LangGraph Server, causing server_info to be None.
Guard — The explicit if server is not None: check in the example tool get_assistant_scoped_data. When the guard is present, it avoids AttributeError by skipping the print statements.
Posture — Fail‑soft. The tool runs without error; the server‑dependent code is omitted silently.
Operator signal — No print output for assistant, graph, or user identity. The operator may notice the absence of those fields but no error is raised.
Recovery — The tool returns "done" normally. If the server info is required, the operator must run the graph on LangGraph Server. No retry mechanism is present in the source.
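The None check from Failure 6 can be tried out with a stand-in runtime object. In this sketch SimpleNamespace replaces the real runtime, the assistant_id attribute is a hypothetical field name, and a log list replaces the print statements so the behavior is easy to inspect.

```python
from types import SimpleNamespace

def get_assistant_scoped_data(runtime, log: list) -> str:
    """Use server-only details when present; otherwise skip them quietly."""
    server = getattr(runtime, "server_info", None)
    if server is not None:
        log.append(f"assistant: {server.assistant_id}")
    return "done"

local_log, server_log = [], []
local_runtime = SimpleNamespace(server_info=None)
server_runtime = SimpleNamespace(
    server_info=SimpleNamespace(assistant_id="asst_demo")
)

local_status = get_assistant_scoped_data(local_runtime, local_log)
server_status = get_assistant_scoped_data(server_runtime, server_log)
print(local_status, local_log, server_log)
```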

02. Defining A Tool

A tool is defined for the model with three parts: a name, a description, and a parameter schema. The model reads the description to decide when to reach for that tool. The schema declares the type of each argument, such as whether it expects a number or a block of text. A clear description helps the model pick the right tool, and a tight schema makes sure it passes valid arguments. Without a clear description, the model might call the tool at exactly the wrong moment. Without a tight schema, it might supply badly shaped input. That is why both parts matter. The description works as a guide, while the schema acts as a checklist that tells the model precisely what it must provide. Together they keep every tool call accurate and useful. So the trade is simple. A vague description leads to confusion, and a loose schema leads to errors. But when both are crafted carefully, the model works smoothly and reliably.

Generate it: Of a tool's three parts, the model reads the d__________ to decide when to reach for that tool. (cue: d__________; answer: description)

Generate it: The other part, the parameter s_____, declares the type of each argument and makes sure the model passes valid input. (cue: s_____; answer: schema)

Ask yourself: The description tells the model when to call a tool; the schema constrains what it passes. Which failure does a vague description cause, and which does a loose schema cause?

Recall check (try before reading the answer):

Which of the three parts does the model read to decide when to use a tool? — ____________________________________ Answer: The description — the model reads it to decide when to reach for that tool.

What does the schema do that the description does not? — ____________________________________ Answer: The schema declares the type of each argument and makes sure the model passes valid arguments.

What goes wrong when each part is weak? — ____________________________________ Answer: A vague description leads to confusion (the model calls at the wrong moment); a loose schema leads to errors (badly shaped input).

A tool defined with @tool uses the function name, docstring, and type hints to provide the name, description, and parameter schema.

python

from langchain.tools import tool

@tool
def search_database(query: str, limit: int = 10) -> str:
    """Search the customer database for records matching the query.

    Args:
        query: Search terms to look for
        limit: Maximum number of results to return
    """
    return f"Found {limit} results for '{query}'"

ELI5 — the plain-language version

Imagine a waiter holding a menu: each item has a name, a brief description, and a list of required details like "size" or "toppings." That is exactly how a tool is defined for the model. You give the tool a name, a clear description (written in the function's docstring), and a parameter schema built from Python type hints—saying whether an argument must be a number, a block of text, or something else. The model reads those three pieces like a waiter reads the menu. When a user says "Find me a customer," the model checks the description "Search the customer database" and decides this tool fits; then it looks at the schema to know it needs a string for the query and an integer for the limit. Without a sharp description, the model might try to use the search tool when the user actually wants weather—a wrong order. Without a tight schema, the model could pass "limit" as "ten" instead of 10, breaking the call. That confusion leaves the user with a useless reply or an error, just like getting burnt spaghetti when you asked for pizza.

System design — mechanism, invariant, trade-off

The tool-definition subsystem follows a fixed order. The developer first annotates a Python function with the @tool decorator, which reads the function’s docstring as the tool’s description and its type hints (e.g., query: str, limit: int) as the input schema. When the graph runs, the chat model receives the tool’s name, description, and schema, and decides from the conversation context when to invoke the tool and what arguments to provide. If the arguments are valid, the tool executes; if the model supplies an argument that violates the schema, the runtime rejects the call or the tool returns an error. A tool can also return a Command object (from langgraph.types) to update state or route to another node.
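To make the "name, description, schema" derivation concrete, here is a plain-Python sketch that builds a similar description from a function using only the standard library. It is not how the @tool decorator is implemented; describe_tool and JSON_TYPES are hypothetical names, and only a few simple types are handled.

```python
import inspect
from typing import get_type_hints

# Minimal mapping from Python types to JSON-schema-style type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def describe_tool(func) -> dict:
    """Build a name/description/schema record from a function signature."""
    hints = get_type_hints(func)
    hints.pop("return", None)  # the return type is not an input
    signature = inspect.signature(func)
    properties = {name: {"type": JSON_TYPES[tp]} for name, tp in hints.items()}
    required = [
        name
        for name, param in signature.parameters.items()
        if param.default is inspect.Parameter.empty
    ]
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func),
        "parameters": {"type": "object", "properties": properties, "required": required},
    }

def search_database(query: str, limit: int = 10) -> str:
    """Search the customer database for records matching the query."""
    return f"Found {limit} results for '{query}'"

spec = describe_tool(search_database)
print(spec)
```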

The invariant preserved is that the tool’s input schema is always derived from the function’s type hints. The source explicitly states: “Type hints are required as they define the tool’s input schema.” Because the schema is bound to the function signature at definition time, arguments supplied by the model can be checked against it before the function body runs, so a type mismatch surfaces as an error rather than passing silently. There is no exactly-once or idempotency guarantee here; the invariant is purely structural.
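A check of model-supplied arguments against the type hints might look like the sketch below. It is a deliberately small stand-in for real schema validation: validate_args is a hypothetical helper, it only handles plain classes as hints, and it does not check for missing required arguments.

```python
from typing import get_type_hints

def validate_args(func, args: dict) -> list:
    """Return a list of problems found in args; empty means they look valid."""
    hints = get_type_hints(func)
    hints.pop("return", None)
    errors = []
    for name, value in args.items():
        if name not in hints:
            errors.append(f"unexpected argument: {name}")
        elif not isinstance(value, hints[name]):
            expected = hints[name].__name__
            errors.append(f"{name}: expected {expected}, got {type(value).__name__}")
    return errors

def search_database(query: str, limit: int = 10) -> str:
    """Search the customer database for records matching the query."""
    return f"Found {limit} results for '{query}'"

good = validate_args(search_database, {"query": "acme", "limit": 5})
bad = validate_args(search_database, {"query": "acme", "limit": "ten"})
print(good, bad)
```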

The key trade‑off is declarative schema via type hints versus manual schema definition. The alternative rejected is writing a separate args_schema object (e.g., a Pydantic model) by hand. By choosing the @tool decorator approach, the system avoids the cost of schema drift: if the function’s signature changes but the manual schema is not updated, the model would receive contradictory or stale argument types. The decorator automatically regenerates the schema from the type hints, ensuring they remain in sync. This rejection also eliminates the developer’s burden of maintaining two descriptions (docstring vs. custom description field) – the docstring single‑handedly supplies the tool’s purpose.

A concrete failure mode is a misleading docstring that leads to incorrect tool invocation. For example, if search_database has a docstring that says “search customer database” but omits the fact that the limit parameter caps results, the model might call the tool with an extremely high limit, expecting exhaustive results. The operator would see a spike in tool‑call errors or unusually long responses, visible in LangSmith traces (as recommended in the source). The trace would show the model calling search_database with limit=1000, and the tool returning a truncated or error‑laden result. The signal is a high error rate logged under the tool’s name in the monitoring dashboard, alongside mismatched input arguments relative to typical usage.

Failure modes — what breaks, what catches it

1. Excessive tool invocations due to ambiguous description

Trigger — The tool’s description is vague or overly broad, causing the model to call the tool repeatedly even when a direct answer is available or a different tool would be more appropriate.
Guard — The RemainingSteps managed value is monitored in nodes such as reasoning_node or agent_with_monitoring, and a conditional edge (route_decision) checks state["remaining_steps"]. When that value drops to 2 or fewer, execution diverts to fallback_node which returns a best-effort message like “Reached complexity limit, providing best effort answer.”
Posture — fail‑soft – the graph degrades gracefully, delivering a partial result instead of crashing.
Operator signal — The graph’s output contains a message such as “Approaching limit, returning partial result” or “Reached complexity limit, providing best effort answer”, and no exception is raised.
Recovery — The graph completes normally. The operator can inspect the final state and revise the tool’s description to reduce ambiguity for future runs.

2. Tool returns overly large observation (output size is unbounded)

Trigger — Nothing bounds the size of the tool’s return value (e.g., a tool that retrieves documents can return thousands of characters). The parameter schema constrains only the arguments, not the output.
Guard — In the agentic_search_graph.py implementation, after receiving observation from a tool call, the code checks if len(observation) > 8000 and truncates to observation[:8000], appending the marker "\n… (observation truncated)".
Posture — fail‑soft – the tool output is truncated, and execution continues with the shortened observation.
Operator signal — The observation ends with the literal string "\n… (observation truncated)", and everything past the first 8000 characters is lost.
Recovery — No retry occurs; the truncated observation is used as‑is. If complete output is critical, the operator must cap or summarize the output inside the tool, or implement pagination (for example, through a limit or offset argument).

3. Tool uses runtime.stream_writer outside a LangGraph execution context

Trigger — A tool’s implementation references runtime.stream_writer for streaming, but the tool is invoked in an environment that is not a LangGraph run (e.g., during local testing or a direct script).
Guard — No guard exists in the provided source. The documentation note warns that “the tool must be invoked within a LangGraph execution context,” but there is no try/except or fallback.
Posture — fail‑hard – the tool raises a runtime error (likely RuntimeError or AttributeError), aborting the run immediately.
Operator signal — An unhandled exception traceback with a message indicating that stream_writer is not available.
Recovery — Manual deployment change: either remove the stream_writer dependency from the tool or ensure the tool runs inside LangGraph (e.g., via the compiled graph). No automatic retry.

4. Accessing runtime.execution_info with an outdated library version

Trigger — The tool calls runtime.execution_info (to obtain thread ID, run ID, or retry state) while the installed deepagents is below version 0.5.0 (or langgraph below 1.1.5).
Guard — No guard exists in the provided source. The documentation contains a <Note> stating the version requirement but no compatibility layer or conditional check.
Posture — fail‑hard – an AttributeError (or similar) occurs because the execution_info attribute is absent.
Operator signal — An exception traceback referencing 'ToolRuntime' object has no attribute 'execution_info'.
Recovery — Upgrade the package to deepagents>=0.5.0 (or langgraph>=1.1.5). No automatic fallback; the failure is final for that run.

5. Tool fails to produce a usable result, exhausting the turn budget

Trigger — The tool’s logic or the model’s call sequence does not yield a final answer after the maximum number of steps (e.g., the tool keeps returning intermediate data but never a conclusive response).
Guard — In agentic_search_graph.py, after the loop ends without an answer, the code sets fallback = transcript[-1] if transcript else "" and then calls run.end(outputs={"answer": fallback, "steps": max_turns, "total_tokens": acc_tokens, "total_cost_usd": acc_cost, "exhausted": True}).
Posture — fail‑soft – the last observation is returned as a best‑effort answer, and the run is marked exhausted.
Operator signal — The output object contains "exhausted": True and typically a short or truncated observation.
Recovery — No retry logic is shown; the run ends. The operator can inspect the transcript and the tool descriptions to improve clarity or loop‑breaking conditions.

03. The Round Trip

The model does not always answer directly. Instead, it may request a tool. The request carries a unique identifier along with the arguments for that tool, and this identifier is called the tool call identifier. The application then runs the tool, gets the result, and sends it back to the model tagged with the same identifier. That tagging lets the model match each result to the request it came from. The model then continues reasoning. It might ask for more tools if it needs them, and this back-and-forth repeats until the model produces a final reply. Throughout, the model decides when to use a tool based on the conversation context, and it also decides what arguments to provide. Each tool's description helps the model understand its purpose. The whole pattern is a round trip. The model calls out, the application runs the tool, and the result comes back. This lets the model fetch live data or take real actions in the world. The unique identifier keeps everything correlated, because without it the model might confuse one tool call for another. The cycle continues until the model has gathered enough information to answer. The trade is that the model does not respond immediately, yet in exchange it gains access to current data and the ability to run code. The model can ask for many tools across a single conversation, repeating this loop as many times as the task demands. Finally, once it has what it needs, it delivers the answer.

Generate it: Each tool request carries a unique i__________ called the tool call identifier, and the result comes back tagged with the same one. (cue: i__________; answer: identifier)

Generate it: The pattern is a round t___: the model calls out, the application runs the tool, and the result comes back — repeating until a final reply. (cue: t___; answer: trip)

Ask yourself: Why must the result be tagged with the same identifier as the request, rather than just sent back in order?

Recall check (try before reading the answer):

What keeps each result matched to the request it came from? — ____________________________________ Answer: The unique identifier — the result is tagged with the same identifier, so the model can match it; without it the model might confuse one tool call for another.

What is the cost of this round-trip pattern, and what is gained? — ____________________________________ Answer: The model does not respond immediately, yet in exchange it gains access to current data and the ability to run code.

How many tools can the model invoke before answering? — ____________________________________ Answer: As many as the task demands — it can ask for many tools across a single conversation, repeating the loop until it has enough to answer.

Looking back: In your own words, what is the model actually doing when it 'calls a tool' (from 'What Tool Calling Is')? Answer: It is asking a host application to run an external function — the model itself never executes anything.

Using tool_call_id to correlate tool results in a round trip.

python

from langchain.messages import ToolMessage
from langchain.tools import ToolRuntime, tool
from langgraph.types import Command

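@tool
def set_user_preference(runtime: ToolRuntime, key: str, value: str) -> Command:
    """Set a user preference."""
    # Build a new dict instead of mutating graph state in place.
    preferences = {**runtime.state.get("preferences", {}), key: value}
    return Command(
        update={
            "preferences": preferences,
            # Messages travel inside the update; tool_call_id ties this
            # result to the request that produced it.
            "messages": [
                ToolMessage(
                    content=f"Set preference {key} to {value}.",
                    tool_call_id=runtime.tool_call_id,
                )
            ],
        },
    )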

ELI5 — the plain-language version

Imagine a busy diner where the chef (the model) can’t leave the stove. When a customer orders something complicated, the chef writes a ticket with a unique number and the exact details of what is needed, and hands it to a server. The server (your application) runs off to the pantry (the tool), fetches the requested item, and brings it back. Crucially, the server returns it with the same numbered ticket attached. The chef sees that number, knows exactly which order it belongs to, and can continue cooking the next course or declare the meal complete.

In the LangChain round-trip flow, the model does not always answer directly. Instead it may issue a tool call that includes a unique tool call identifier and the arguments. Your app executes the tool, then sends back the result tagged with that same identifier. The model matches the result to its original request and continues reasoning, possibly requesting more tools. This back-and-forth repeats until the model produces a final reply.

Without that identifier, the kitchen would be chaos: the chef would receive dishes but have no idea which order they came from, so the model could not correctly link a tool’s output to its prior request. It would either hallucinate an answer, ignore the tool result, or keep asking for the same tool forever—leaving the user with a broken, never-ending conversation.
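The value of the identifier shows up as soon as results arrive out of order. The sketch below is plain Python with hypothetical names (run_tools, the add and upper tools, the call_a and call_b ids); it runs two tool calls, deliberately reverses the results, and still matches each one to its request.

```python
def run_tools(tool_calls: list, tools: dict) -> list:
    """Host side: run each requested tool and tag the result with its id."""
    results = []
    for call in tool_calls:
        output = tools[call["name"]](**call["args"])
        results.append({"tool_call_id": call["id"], "content": output})
    return results

tools = {
    "add": lambda a, b: str(a + b),
    "upper": lambda text: text.upper(),
}
tool_calls = [
    {"id": "call_a", "name": "add", "args": {"a": 2, "b": 3}},
    {"id": "call_b", "name": "upper", "args": {"text": "hi"}},
]

# Reverse the results to show that arrival order does not matter.
results = list(reversed(run_tools(tool_calls, tools)))
by_id = {r["tool_call_id"]: r["content"] for r in results}
matched = {call["name"]: by_id[call["id"]] for call in tool_calls}
print(matched)
```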

System design — mechanism, invariant, trade-off

The round-trip subsystem follows a two-phase protocol. In the first phase, the model issues a tool call (with its unique identifier and arguments), the graph interrupts at that point, and the application inspects the interrupt payload before executing the corresponding tool externally. In the second phase, the app resumes the graph with a Command carrying a resume value, passing back the tool result tagged with the same identifier; only then does the graph deliver that result to the model, which can continue reasoning. On failure (e.g., the external action never completes or returns an error), no resume Command is issued, so no new message reaches the waiting node and the execution stays suspended.
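A Python generator gives a compact mental model of this suspend-and-resume behavior. The sketch is an analogy only and does not use LangGraph: yield plays the role of the interrupt, send plays the role of resuming with a value, and agent_run and the lookup tool name are hypothetical.

```python
def agent_run(question: str):
    """Toy run that suspends on a tool call and continues when resumed."""
    tool_call = {"id": "call_1", "name": "lookup", "args": {"q": question}}
    # Phase 1: suspend here and hand the payload to the application.
    tool_result = yield tool_call
    # Phase 2: execution continues only once a result is sent back in.
    if tool_result["tool_call_id"] != tool_call["id"]:
        raise ValueError("result does not match the pending tool call")
    return f"Answer based on: {tool_result['content']}"

run = agent_run("order 12345")
pending = next(run)  # the run is now suspended at the yield

# The application executes the tool elsewhere, then resumes the run.
try:
    run.send({"tool_call_id": pending["id"], "content": "shipped"})
except StopIteration as finished:
    final_answer = finished.value

print(final_answer)
```

If send is never called, the generator simply stays suspended, which mirrors the stalled execution described for the failure case.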

The invariant preserved is exactly-once suspension and continuation: the Command primitive guarantees that the graph’s state is frozen at the interrupt point and can only be advanced by an explicit resume payload. This is rooted in the message-passing super-step model, in which nodes become active only when they receive a new message on an incoming edge. Interrupts prevent that message from arriving until the application explicitly provides a value via resume. The shared tool-call identifier correlates the request with its response, so a result cannot be attributed to the wrong call. One caveat: when resumed, LangGraph re-runs the interrupted node from its beginning, so any side effect placed before the interrupt point should be idempotent.

The key trade‑off is offloading tool execution to an external environment rather than running the tool inline inside the graph. The obvious alternative—implementing the tool as a synchronous node in the graph—is rejected because it would require the graph to block waiting for potentially slow, human‑in‑the‑loop, or browser–dependent actions. By using HeadlessTool (which exposes only name, description, and args_schema on the Python side) and the onTool callback for lifecycle events (start, success, error), the design avoids the cost of polling or timeouts inside the graph. The application is free to perform the action asynchronously and resume only when ready, keeping the graph non‑blocking.

A concrete failure mode: the model requests a headless tool, the graph interrupts, but the external service crashes before calling resume. The operator would see a stalled graph run with no new state transitions. Monitoring tools such as LangSmith traces would show a single tool‑call interrupt with no subsequent resume event; the graph’s steps counter would stop incrementing, and the application logs would contain an unhandled interrupt payload. The onTool("error") callback could surface the crash to a dashboard, but if the external process never starts, the operator sees only a hanging execution awaiting a resume that never arrives.

Failure modes — what breaks, what catches it

1. Graph recursion limit exceeded when tool-call loop does not converge

Trigger — The tool-call loop runs for more super-steps than the graph’s recursion_limit allows, causing LangGraph to raise GraphRecursionError.
Guard — The managed RemainingSteps value (imported from langgraph.managed) can be checked proactively inside a node to route to a fallback node before the limit is hit. Alternatively, a try/except GraphRecursionError block can catch the error externally.
Posture — Fail‑hard when caught externally: the graph execution is terminated and the exception propagates. Fail‑soft when RemainingSteps is used: the graph routes to a fallback node (e.g., fallback_node that returns “Reached complexity limit, providing best effort answer”) and completes without an exception.
Operator signal — In the proactive path, a conditional edge evaluates state["remaining_steps"] <= 2 and logs nothing by default; the fallback node’s output message is visible in the final state. In the reactive path, the GraphRecursionError exception is raised and can be inspected; the operator might see a stack trace or a custom fallback message from the except block.
Recovery — If proactive, the graph routes to a dedicated fallback node (e.g., fallback_node) that returns a partial answer and ends. If reactive, the operator must re‑invoke the graph with a higher recursion_limit or introduce a different routing strategy. No automatic retry is built in.
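
The difference between the proactive and reactive paths can be sketched with a plain-Python simulation. This does not use LangGraph: StepLimitExceeded stands in for GraphRecursionError, and run_loop and run_reactive are hypothetical helpers. The threshold of 2 remaining steps and the fallback strings mirror the ones quoted above.

```python
class StepLimitExceeded(RuntimeError):
    """Stand-in for a recursion-limit error in this sketch."""


def run_loop(needs_steps: int, limit: int, proactive: bool) -> str:
    """Simulate an agent that needs `needs_steps` steps under a step `limit`."""
    for step in range(1, needs_steps + 1):
        remaining = limit - step + 1  # steps left, including this one
        if proactive and remaining <= 2:
            # Fail-soft: route to a fallback before the limit is hit.
            return "Reached complexity limit, providing best effort answer"
        if step > limit:
            # Fail-hard: the limit is exhausted and an error is raised.
            raise StepLimitExceeded(f"limit of {limit} steps exceeded")
    return "final answer"


def run_reactive(needs_steps: int, limit: int) -> str:
    """Catch the limit error outside the loop and substitute a fallback."""
    try:
        return run_loop(needs_steps, limit, proactive=False)
    except StepLimitExceeded:
        return "Fallback: recursion limit exceeded"


converged = run_loop(needs_steps=3, limit=10, proactive=True)
soft = run_loop(needs_steps=20, limit=5, proactive=True)
hard = run_reactive(needs_steps=20, limit=5)
```

The proactive path ends inside the loop with a partial answer, while the reactive path only learns about the problem after the run has already been aborted.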

2. Tool execution failure (e.g., API call error, invalid arguments)

Trigger — The model’s tool call arguments cause the tool’s implementation to raise an exception, or the tool fails to return a result (e.g., network error, timeout).
Guard — The source does not show a dedicated exception handler or retry logic for tool execution failures. The only relevant mechanism is the optional onTool callback, which “observe[s] lifecycle events (start, success, error) for UI feedback such as spinners or toasts.” This callback can observe the error but does not recover the execution.
Posture — Fail‑hard: because no guard catches the error within the tool execution path, the exception propagates and terminates the graph run (or is caught only at the outermost level, if any). The graph does not degrade gracefully.
Operator signal — The tool’s exception is raised; the operator sees a traceback unless handled externally. If onTool is used, the callback receives an 'error' event, which can be logged or displayed, but the run still fails.
Recovery — No automatic recovery. The operator must fix the tool implementation or adjust the model’s tool‑calling behavior (e.g., by refining the tool’s args_schema) and re‑invoke the graph.
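
Since the source shows no retry logic, an operator who wants one has to add it. The following is a minimal sketch of one possible approach, a bounded retry around a tool call; call_with_retry, flaky_status, and the default of three attempts are all assumptions made for illustration, not part of any library.

```python
import time
from typing import Callable


def call_with_retry(
    tool: Callable[..., str],
    args: dict,
    attempts: int = 3,
    delay_s: float = 0.0,
) -> str:
    """Call `tool(**args)`, retrying on failure up to `attempts` times."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return tool(**args)
        except Exception as exc:  # sketch only: real code should catch narrower types
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_s * attempt)  # linear backoff between attempts
    # Fail-soft: return an error string instead of raising.
    return f"Tool error after {attempts} attempts: {last_error}"


calls = {"count": 0}


def flaky_status(order_id: str) -> str:
    """Hypothetical tool that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary network failure")
    return f"Order {order_id} is shipped"


status = call_with_retry(flaky_status, {"order_id": "12345"})
```

Returning an error string on the final failure turns the fail-hard posture into fail-soft, at the cost of hiding the original traceback unless it is logged separately.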

3. Precondition failure: tool uses runtime.stream_writer outside LangGraph execution

Trigger — A tool annotated with @tool reads runtime.stream_writer (accessed via ToolRuntime) but is invoked outside a LangGraph execution context (e.g., during local testing or in a plain Python script).
Guard — The source provides no guard for this condition. The note states: “If you use runtime.stream_writer inside your tool, the tool must be invoked within a LangGraph execution context.” The tool itself does not check for the context and will likely raise an attribute‑ or invocation‑related error.
Posture — Fail‑hard: the tool’s attempt to access runtime.stream_writer fails at runtime, raising an exception that aborts the current call. No graceful degradation is possible.
Operator signal — An exception (e.g., AttributeError or a LangGraph‑specific error) is raised, visible in the console or log. No structured metric is emitted.
Recovery — Manual fix: wrap the call in a proper LangGraph execution context, or conditionally skip the stream_writer usage when not in a graph. The source does not provide an automatic recovery path.

4. Server‑info unavailable when tool runs locally

Trigger — A tool calls runtime.server_info to access assistant_id, graph_id, or user.identity, but the graph is not running on LangGraph Server (e.g., during local development or testing).
Guard — The tool explicitly checks that server is not None before using the fields. The code prints the server-related data only when that condition holds; otherwise it prints nothing and returns "done".
Posture — Fail‑soft: the tool degrades gracefully. It silently skips the prints that require server info and continues to return a normal result. No exception occurs.
Operator signal — Silent absence of the “Assistant: …”, “Graph: …”, and “User: …” lines that would appear when running on the server. The tool still completes successfully, so the operator might not notice the missing data unless they inspect the output.
Recovery — No recovery needed; the tool’s core functionality is unaffected. If the server info is required, the operator must deploy the graph to LangGraph Server. The source does not provide an automatic fallback or alternative data source.
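
The guard described above can be sketched as follows. ServerInfo here is a hypothetical stand-in dataclass, not the real runtime object, and its field names are assumptions based on the fields named in the text. Unlike the behavior described, this variant emits an explicit line when server info is missing, which is one way to make the silent case visible to an operator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerInfo:
    """Hypothetical stand-in for the server metadata a tool may receive."""

    assistant_id: str
    graph_id: str
    user_identity: str


def describe_run(server: Optional[ServerInfo]) -> list:
    """Return the lines a tool would report, guarding against a missing server."""
    if server is None:
        # Local run: degrade gracefully, but say so instead of staying silent.
        return ["Server info unavailable (not running on a server)"]
    return [
        f"Assistant: {server.assistant_id}",
        f"Graph: {server.graph_id}",
        f"User: {server.user_identity}",
    ]


local_lines = describe_run(None)
server_lines = describe_run(ServerInfo("asst_1", "agent", "user_42"))
```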

5. Headless tool interrupt when no client‑side implementation is provided

Trigger — A tool is defined in Python with only name, description, and args_schema (creating a HeadlessTool), and the model issues a tool call for it. Because the tool has no .implement() method on the Python side, the graph interrupts instead of executing the tool locally.
Guard — The source does not show a guard that handles the interrupt within the graph. The optional onTool callback can observe the 'error' lifecycle event, but it cannot prevent the interrupt or provide a default result.
Posture — Fail‑hard (from the graph’s perspective): the run pauses with an interrupt. It cannot continue until the client (e.g., a browser or another service) resumes the graph with the tool result. The graph does not degrade; it halts.
Operator signal — The graph produces an interrupt event, typically visible in the LangGraph API as an “interrupt” status. The onTool callback receives an 'error' event if the interrupt is considered a failure. No error log is automatically written unless the callback logs it.
Recovery — The operator (or a client service) must inspect the interrupt payload, execute the tool in the appropriate environment (e.g., a browser or human step), and then resume the graph with the tool’s result. The source mentions that “your app can inspect the payload, perform the action … then resume the graph with the tool result.” No retry logic is provided; recovery is manual or requires a custom frontend hook.

04. Controlling Tool Choice

In LangChain, a model can decide whether to call a tool and which one to use. It makes this choice based on the conversation context. Tools are callable functions with clear inputs and outputs. They let the model fetch real-time data, execute code, or query external systems. There are different ways to control when and how tools are used.

Ordinary tools run on the server. Their function body executes there. Some models also offer built-in tools, like web search or a code interpreter, that run on the provider's side. Then there are headless tools. These are just definitions with a name, description, and argument schema. The actual implementation lives on the client, often in a browser. When the model calls a headless tool, the graph pauses. The app can then perform the action in the right environment. After that, it resumes the graph with the result.

This pattern is useful when the work depends on the client's environment or device. For example, accessing geolocation or local storage. The trade-off is that you need to handle the interrupt on both server and client. But it keeps data local and avoids extra server round trips for simple operations.

Overall, these controls give you flexibility. You can let the model decide freely, force a specific tool, or forbid tools entirely so it relies on its own knowledge. The system also supports multiple calls in one turn or sequential calls when one depends on another.
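
The three steering options can be sketched as a small policy resolver. This is a plain-Python illustration, not LangChain's API: resolve_tool_choice is a hypothetical helper, and the policy names "auto" and "none" are assumptions loosely modeled on common provider conventions.

```python
def resolve_tool_choice(tools: list, tool_choice: str) -> dict:
    """Map a tool-choice policy to the tools offered and whether a call is required."""
    if tool_choice == "auto":
        # The model decides freely whether to call a tool, and which one.
        return {"allowed": list(tools), "must_call": False}
    if tool_choice == "none":
        # Tools are forbidden; the model must answer from its own knowledge.
        return {"allowed": [], "must_call": False}
    if tool_choice in tools:
        # A specific tool is forced: only that tool is offered and it must be called.
        return {"allowed": [tool_choice], "must_call": True}
    raise ValueError(f"unknown tool_choice: {tool_choice!r}")


available = ["fetch_order_status", "get_weather"]
free = resolve_tool_choice(available, "auto")
forbidden = resolve_tool_choice(available, "none")
forced = resolve_tool_choice(available, "get_weather")
```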

Generate it: Unlike ordinary tools that run on the server, a h_______ tool is just a definition — its real implementation lives on the client, often in a browser. (cue: h_______; answer: headless)

Generate it: When the model calls a headless tool, the graph p_____ so the app can perform the action in the right environment, then resumes with the result. (cue: p_____; answer: pauses)

Ask yourself: When would you push a tool's implementation to the client instead of running it on the server — and what do you pay for that choice?

Recall check (try before reading the answer):

Where does a headless tool actually execute, and what does the server hold? — ____________________________________ Answer: The implementation lives on the client (often a browser); the server holds only the name, description, and argument schema.

What makes the headless pattern worth it despite the extra complexity? — ____________________________________ Answer: It keeps data local and avoids extra server round trips — useful when the work depends on the client's environment, like geolocation or local storage.

What three ways can you steer the model's tool choice? — ____________________________________ Answer: Let it decide freely, force a specific tool, or forbid tools entirely so it relies on its own knowledge.

Define a tool and create an agent that lets the model choose to call it.

python

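from langchain.agents import create_agent
from langchain.tools import tool


# return_direct=True hands the tool's output back to the caller as the final result.
@tool(return_direct=True)
def fetch_order_status(order_id: str) -> str:
    """Fetch the current status of a customer order."""
    return f"Order {order_id} is shipped and will arrive in 2 days."

# The "provider:model" string selects the chat model; this assumes the
# Ollama integration is installed and the model is available locally.
agent = create_agent(
    model="ollama:devstral-2",
    tools=[fetch_order_status],
)

result = agent.invoke({
    "messages": [{"role": "user", "content": "What is the status of order #12345?"}]
})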

ELI5 — the plain-language version

Imagine you're at a busy restaurant, and a waiter must decide which kitchen station to call based on what you order: a steak goes to the grill, a salad to the prep line. That's exactly what LangChain's tool choice system does. The model (your waiter) looks at the conversation and picks the right tool (the kitchen station) to handle the request—whether it's fetching real-time weather, running code, or querying a database. Tools are well-defined callable functions with clear inputs and outputs, like a station that only fires up the grill when you say "steak." Some tools run right on the server (the kitchen), others are built-in at the model provider (like a special oven), and some are "headless"—just a name and description that triggers an interrupt, so your own app can execute the action elsewhere, like sending a dish to a food truck. Without this choice mechanism, the model would be a waiter who can't tell the kitchen what to do: every request hits the wrong station, nothing gets cooked, and your order never arrives. You'd be left hungry and frustrated, wondering why the system can't even handle a simple ask.

System design — mechanism, invariant, trade-off

In the LangChain tool-choice subsystem, the model first receives the conversation context and decides whether to invoke a tool, which tool to call, and what arguments to supply. For ordinary tools defined with the @tool decorator, the function body executes server-side immediately. For headless tools—created by calling tool(...) with only name, description, and args_schema—the mechanism is different: the model’s tool call triggers an interrupt in the graph instead of local execution. The run pauses, and the payload (tool identity and arguments) is available for inspection. The application then performs the intended action in a separate environment (e.g., a browser, an external service, or a human review step) and, upon completion, issues a resume command to supply the tool result back to the graph. On failure—if the resume never arrives or the action cannot be completed—the graph remains blocked in the interrupted state, and no subsequent nodes execute.

This design preserves an execution-location invariant: the tool is never executed inside the LangChain server process. By requiring a resume command from an external actor, the system guarantees that headless tool calls are only performed in the designated environment, preventing unintended side effects or reliance on server-side dependencies. The source explicitly names this pattern “headless-tool interrupts,” where the HeadlessTool object has no .implement() method on the Python side; the only allowed path is through the interrupt-and-resume lifecycle.

The key trade-off is replacing immediate server-side execution with an asynchronous handoff that adds latency and complexity. The obvious alternative—implementing the tool’s logic on the server (as with regular @tool functions)—is rejected because it would force the server to carry the tool’s dependencies, environment, and security context. The headless approach avoids the cost of bundling browser automation, third‑party APIs, or human approval workflows into the agent server, at the expense of requiring a separate resume mechanism and the useStream frontend pattern to coordinate the handoff.

A concrete failure mode is a missing client-side implementation: the model calls a headless tool such as write_file (used in the HumanInTheLoopMiddleware example), but no frontend hook is registered to detect the interrupt and perform the file write. The operator sees an agent run that remains in an interrupted state indefinitely, with no tool result returned and no error message in the server logs because the graph simply awaits a resume that never comes. The onTool callback may fire a start event but never a success or error, leaving the operator to manually inspect the interrupted graph’s payload and either force a resume or cancel the run.

Failure modes — what breaks, what catches it

Headless tool interrupt not resumed

Trigger – The model issues a tool call for a headless tool (defined only with name, description, and args_schema), the graph interrupts, and the frontend or service never sends a resume command.
Guard – No guard is shown in the source. The onTool callback is mentioned only to observe lifecycle events (start, success, error) for UI feedback, not to handle missing resumes.
Posture – fail-hard: the graph remains in an interrupted state indefinitely; execution halts.
Operator signal – The operator would see a stalled run (e.g., “interrupted” status in LangGraph Server) with no subsequent output or a timeout. No explicit log line is given.
Recovery – Manual step required: inspect the interrupt payload and send a resume command with the tool result. No automatic retry.
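
Because nothing in the source detects a resume that never arrives, a host application would need its own watchdog. A minimal sketch, assuming the application records a start timestamp for each interrupted tool call; stale_interrupts and the 300-second threshold are hypothetical.

```python
def stale_interrupts(pending: dict, now: float, max_wait_s: float) -> list:
    """Return ids of interrupts that have waited longer than `max_wait_s` seconds."""
    return sorted(
        call_id
        for call_id, started_at in pending.items()
        if now - started_at > max_wait_s
    )


# Tool-call id -> time (in seconds) at which the graph interrupted.
pending_calls = {"call_a": 100.0, "call_b": 390.0, "call_c": 50.0}
overdue = stale_interrupts(pending_calls, now=400.0, max_wait_s=300.0)
```

An operator could feed the overdue list into an alert, or use it to cancel runs that no client is going to resume.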

Tool execution error (server‑side tool)

Trigger – An ordinary tool’s function body raises an exception (e.g., an unhandled ValueError or connection failure) during execution.
Guard – No guard is shown in the source. There is no try/except, retry logic, or fallback demonstrated around tool execution.
Posture – fail-hard: the exception propagates and terminates the graph run (unless an outer handler exists, but none is shown).
Operator signal – The operator would observe an unhandled exception traceback in the logs, likely including the tool name and error message.
Recovery – No automatic recovery. The developer must fix the tool and re‑run the graph.

Recursion limit reached during tool choice loop

Trigger – The model repeatedly calls tools (ordinary, built‑in, or headless) without reaching a final answer, exceeding the recursion limit.
Guard – The GraphRecursionError exception from langgraph.errors is caught in a reactive fallback block, or the RemainingSteps managed value from langgraph.managed is used proactively to route to a safe node before the limit.
Posture – In proactive mode (using RemainingSteps): fail-soft – the graph completes gracefully via a fallback node (e.g., fallback_node). In reactive mode (catching GraphRecursionError): fail-hard – graph execution terminates, but the error can be caught externally.
Operator signal – Proactive: output like "Approaching limit, returning partial result" or "Reached complexity limit, providing best effort answer". Reactive: a GraphRecursionError is raised.
Recovery – Proactive: graph returns a partial result. Reactive: the external except GraphRecursionError block returns a fallback dictionary (e.g., {"messages": ["Fallback: recursion limit exceeded"]}). No retry.

Missing runtime context for stream writer

Trigger – A tool uses runtime.stream_writer but is invoked outside a LangGraph execution context (e.g., during local testing or from a non‑graph call).
Guard – No guard is shown in the source. Only a warning note states: “If you use runtime.stream_writer inside your tool, the tool must be invoked within a LangGraph execution context.”
Posture – fail-hard: an error (likely AttributeError or a missing context) occurs when the tool attempts to write to the stream.
Operator signal – The operator would see an error trace indicating that the stream writer is not available or not configured.
Recovery – No automatic recovery. The developer must ensure the tool is called only within a LangGraph‑executed graph.

Model calls a non‑existent tool

Trigger – The language model generates a tool call with a name that does not match any defined tool in the graph (ordinary, built‑in, or headless).
Guard – No guard is shown in the source. There is no validation, fallback, or error handler for unrecognized tool names.
Posture – fail-hard: an error such as "Tool 'xyz' not found" is raised when the runtime attempts to dispatch execution, or the call is silently ignored (the source does not specify).
Operator signal – The operator would see an error log indicating an unknown tool name, or perhaps a missing handler.
Recovery – No automatic recovery. The developer must either filter the model’s tool choices or add a validation node to handle invalid names.
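
One way to add the missing validation is to check the requested name against the registered tools before dispatching, and to return an error the model can read instead of raising. The sketch below is a plain-Python illustration; dispatch, the registry layout, and the result dictionary shape are assumptions, not a library interface.

```python
def dispatch(tool_call: dict, registry: dict) -> dict:
    """Run a requested tool, or return a readable error for an unknown name."""
    name = tool_call.get("name")
    tool = registry.get(name)
    if tool is None:
        known = ", ".join(sorted(registry))
        return {
            "tool_call_id": tool_call["id"],
            "status": "error",
            "content": f"Unknown tool {name!r}. Available tools: {known}",
        }
    return {
        "tool_call_id": tool_call["id"],
        "status": "ok",
        "content": tool(**tool_call["args"]),
    }


def get_weather(city: str) -> str:
    """Hypothetical tool used only for this example."""
    return f"Sunny in {city}"


tools_by_name = {"get_weather": get_weather}
good = dispatch({"id": "c1", "name": "get_weather", "args": {"city": "Oslo"}}, tools_by_name)
bad = dispatch({"id": "c2", "name": "get_wether", "args": {"city": "Oslo"}}, tools_by_name)
```

Listing the available names in the error gives the model what it needs to correct a misspelled tool on the next turn.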

05. A Tool Loop Deployed

In production, a real tool calling loop must be carefully bounded, and the agentic search graph shows exactly how. It binds its tools to the model and runs a fixed number of turns. On each turn, the model chooses one tool, or it decides the task is done. The graph then dispatches that tool with its arguments and feeds the result back into the loop. On the very last turn, it forces a final answer, which prevents an infinite loop. The tools follow a clear cost hierarchy. Glob is near zero and returns only paths. Grep is lightweight and gives file and line matches, while Read is heavy and returns full file content with line numbers. The system reaches for cheap tools first, which saves both money and time. But there is a trade. The loop might exhaust its turns without finding a good answer, and in that case it returns the last observation as a best effort. The whole loop stays portable across providers because the system forces the model to output JavaScript Object Notation, or JSON. It then parses that JSON to recover the tool name and arguments, and that works even when the model wraps the output in code blocks. The graph relies on a simple prompt-driven routing pattern. This design is simple yet reliable. It is a production-ready loop.
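
The JSON recovery step can be sketched as below. The field names "tool" and "args" are assumptions, since the source does not show the exact output schema, and parse_tool_call is a hypothetical helper. The fence marker is built from single backticks so the example stays easy to embed.

```python
import json
import re

FENCE = "`" * 3  # a three-backtick code fence marker
# Capture whatever sits between an opening fence (optionally tagged json) and a closing fence.
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_tool_call(text: str) -> dict:
    """Recover the tool name and arguments from raw model output."""
    match = _FENCED.search(text)
    payload = match.group(1) if match else text
    data = json.loads(payload.strip())
    return {"tool": data["tool"], "args": data.get("args", {})}


wrapped = (
    "Here is my next step:\n"
    + FENCE + "json\n"
    + '{"tool": "grep", "args": {"pattern": "TODO"}}\n'
    + FENCE
)
from_fence = parse_tool_call(wrapped)
from_plain = parse_tool_call('{"tool": "done"}')
```

Because only text is parsed, the same function works for any provider that can be prompted to emit JSON, which is the portability argument made above.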

Generate it: The loop stays portable across providers because the system forces the model to output J___ and then parses it to recover the tool name and arguments. (cue: J___; answer: JSON)

Generate it: Among the tools, Glob is near zero, Grep is lightweight, and R___ is heavy because it returns full file content with line numbers. (cue: R___; answer: Read)

Ask yourself: Why does the loop force a final answer on the very last turn, and why does it reach for cheap tools before expensive ones?

Recall check (try before reading the answer):

How does the loop avoid running forever? — ____________________________________ Answer: It runs a fixed number of turns and, on the very last turn, forces a final answer, which prevents an infinite loop.

Why is parsing the model's JSON the key to provider portability? — ____________________________________ Answer: Forcing JSON output lets the system recover the tool name and arguments the same way for any provider, even when the model wraps output in code blocks.

What happens if the loop exhausts its turns without a good answer? — ____________________________________ Answer: It returns the last observation as a best effort.

Looking back: Why doesn't the model always reply directly in a single turn (from 'The Round Trip')? Answer: It may request a tool first and reason over the result, repeating the round trip until it has enough to give a final reply.

The agentic search worker runs a bounded tool loop with a cost hierarchy and forces a final answer on the last turn.

python

import logging
from pathlib import Path

log = logging.getLogger(__name__)


async def _run_worker(
    angle: str,
    sub_query: str,
    root: Path,
    max_turns: int,
) -> dict[str, str]:
    """Run one bounded search worker; _tool_loop is defined elsewhere in the module."""
    system = (
        "You are a parallel codebase search worker.\n"
        f'Your angle: "{angle}".\n'
        f"Project root: {root}\n"
        "Use tools cheapest-first — glob (near-zero) before grep (lightweight) before read (heavy). "
        "Be focused. Return all relevant findings with file:line references."
    )
    try:
        findings = await _tool_loop(
            system=system,
            user=sub_query,
            root=root,
            max_turns=max_turns,
            label=angle,
            provider="deepseek",
        )
    except Exception as exc:  # noqa: BLE001
        # Fail-soft: log the failure and return it as this worker's findings.
        log.warning("[%s] worker error: %s", angle, exc)
        findings = f"worker error: {exc}"
    return {"angle": angle, "findings": findings}

ELI5 — the plain-language version

Imagine a smart assistant with a strict budget: it can only ask three types of questions, each costing more. Glob is like glancing at a bookshelf’s spines – nearly free, tells you only which books exist. Grep is flipping through a book’s index – medium cost, gives you page numbers where a term appears. Read is reading whole chapters – expensive, gives you the full content. The assistant is given a fixed number of turns (up to 8) per task. On each turn, it picks one of these tools based on what it needs, or it decides it’s done. It tries cheapest first, so it doesn’t waste budget. On the last turn, it’s forced to give a final answer. Without this careful loop, the assistant could spiral into an endless cycle of expensive reads, burning through cost and never delivering an answer. A beginner would feel that frustration: the system gets stuck, spends too much, or returns nothing useful.

System design — mechanism, invariant, trade-off

The subsystem's ordered mechanism begins with decompose_node, which breaks the query into up to workers sub-queries via the LLM, then workers_node spawns parallel _run_worker coroutines, each of which calls _tool_loop with a max_turns bound. Inside _tool_loop, the model is invoked with tools bound; on each turn it either selects a tool or decides the task is done. Tools are dispatched with arguments, the result feeds back, and on the very last turn the loop forces a final answer. On failure — if a worker’s _tool_loop raises an exception — the error is caught with except Exception as exc and the worker returns f"worker error: {exc}" as its findings, which propagates to the parent workers_node and ultimately into the state. The graph then proceeds to a synthesis step (not shown) after the workers complete.

The invariant the design preserves is boundedness, enforced by the max_turns parameter in _tool_loop. Each worker runs at most max_turns iterations; if no answer emerges by the final turn, a fallback of the last observation is returned and the run is marked with "exhausted": True. This guarantees that no single worker can loop indefinitely, even if the model repeatedly fails to call a tool or produce a final answer. The bound is a hard cap — the loop does not rely on the model’s own termination heuristic alone — so the overall graph execution is predictable in the worst case.
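
The boundedness invariant can be sketched in a few lines. This is a simplified stand-in for _tool_loop, not its actual code: bounded_loop and the step callback are hypothetical, and the forced final answer on the last turn is omitted so that the fallback path is easy to see.

```python
from typing import Callable


def bounded_loop(step: Callable, max_turns: int) -> dict:
    """Run `step` at most `max_turns` times, falling back to the last observation."""
    transcript = []
    for turn in range(max_turns):
        kind, text = step(turn, transcript)
        if kind == "answer":
            return {"answer": text, "turns": turn + 1, "exhausted": False}
        transcript.append(text)  # a tool observation feeds the next turn
    # Hard cap reached: return the last observation as a best effort.
    fallback = transcript[-1] if transcript else ""
    return {"answer": fallback, "turns": max_turns, "exhausted": True}


def never_answers(turn: int, transcript: list) -> tuple:
    """Hypothetical model step that only ever requests tools."""
    return ("observation", f"obs {turn}")


def answers_on_second_turn(turn: int, transcript: list) -> tuple:
    """Hypothetical model step that answers once it has one observation."""
    if transcript:
        return ("answer", "found it")
    return ("observation", "obs 0")


stuck = bounded_loop(never_answers, max_turns=3)
done = bounded_loop(answers_on_second_turn, max_turns=3)
```

The cap is enforced by the for loop itself, so termination does not depend on anything the model returns.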

The key trade-off is parallel decomposition versus sequential cost-awareness. The design rejects a single monolithic model call that reads all files upfront, because that would be prohibitively expensive and slow. Instead, it decomposes the query into independent sub-queries, runs them in parallel via asyncio.gather, and inside each worker enforces a strict cost hierarchy: use glob (near‑zero cost, returns only paths) first, then grep (lightweight, returns file/line matches), then read (heavy, returns full content) only as a last resort. This avoids the cost of reading irrelevant files — the alternative of a flat read‑everything approach would waste I/O and tokens. The cost saved is the cumulative latency and token consumption of loading entire files that contain no relevant information.

A concrete failure mode: the model in a worker repeatedly calls read on hundreds of files and exhausts all max_turns without ever producing a final answer. The operator would see a log line "[{angle}] worker error: …" if the loop itself throws. In the more common case, where the loop simply runs out of turns, the log would show "[{angle}] findings received ({len} chars)", with an observation truncated at 8000 characters, and the run metadata would contain "exhausted": True. The state would contain a findings entry with that worker's last truncated observation as a best-effort result, signaling that the worker failed to converge, while the graph continues to the synthesis step with whatever partial information was gathered.

Failure modes — what breaks, what catches it

Max turns exhausted without an answer

Trigger — The model repeatedly calls tools without ever producing a final answer, until the max_turns count is reached.
Guard — The exhaustion logic inside _tool_loop: the fallback assignment fallback = transcript[-1] if transcript else "" selects the last observation, and if run is not None: run.end(outputs={"answer": fallback, ... , "exhausted": True}) records it on the run.
Posture — fail-soft: the loop returns the last observation as a best‑effort answer rather than hanging or crashing.
Operator signal — The run’s metadata includes "exhausted": True, and the count of max_turns steps is recorded. No log line is emitted by the tool loop itself in the shown source.
Recovery — No retry; the fallback value is passed up to _run_worker, which returns it under "findings". The overall graph continues normally with this degraded result.

Worker tool‑call exception

Trigger — Any exception (e.g., network error, invalid tool arguments, tool timeout) raised inside _tool_loop during tool execution.
Guard — The except Exception as exc clause in _run_worker (line: # noqa: BLE001). It catches all Exception subclasses, logs a warning, and substitutes findings with the string f"worker error: {exc}".
Posture — fail-soft: the worker returns an error description instead of real findings; the graph continues processing the other workers’ results.
Operator signal — The log line log.warning("[%s] worker error: %s", angle, exc) with the exact angle and exception message.
Recovery — No retry; the erroneous finding is returned to the graph as‑is. A downstream node (e.g., the synthesizer) must handle the error message text.

Observation truncated due to length limit

Trigger — A tool returns an observation larger than 8000 characters.
Guard — The conditional if len(observation) > 8000: observation = observation[:8000] + "\n… (observation truncated)" inside _tool_loop.
Posture — fail-soft: the observation is cropped and the loop continues with the truncated content.
Operator signal — The truncation is silent; no log or error is emitted. The only indication is the appended truncation marker in the transcript (which may be visible in the final answer).
Recovery — None; the shortened observation is fed into subsequent turns. The lost tail cannot be recovered without a manual retry with a larger limit.

Cost hierarchy violated by the model

Trigger — The model chooses a “heavy” tool (e.g., read) before using cheaper ones (glob, grep) or uses them out of order.
Guard — No guard exists. The system prompt in _run_worker says “Use tools cheapest‑first …” but there is no code that validates the tool choice or enforces an ordering.
Posture — fail-soft: the system continues to work correctly but wastes tokens and may incur higher latency or cost.
Operator signal — No direct signal; an operator might notice unexpectedly high token usage or latency in the run logs.
Recovery — No automatic recovery. The operator must modify the prompt, inject a tool‑ordering validation node, or add a cost‑budget monitor.
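
A tool-ordering check of the kind suggested above could look like the following sketch. The cost ranks and the check_cost_order helper are assumptions for illustration; the source only states the ordering glob, then grep, then read. The check reports violations rather than blocking them, since reading a file first is sometimes legitimate.

```python
# Relative cost ranks, cheapest first (assumed values for this sketch).
TOOL_COST = {"glob": 0, "grep": 1, "read": 2}


def check_cost_order(calls: list) -> list:
    """Return a warning for each call made before every cheaper tool was tried."""
    warnings = []
    seen = set()
    for index, name in enumerate(calls):
        cheaper_untried = [
            tool
            for tool, cost in TOOL_COST.items()
            if cost < TOOL_COST[name] and tool not in seen
        ]
        if cheaper_untried:
            warnings.append(
                f"call {index}: {name} used before {', '.join(cheaper_untried)}"
            )
        seen.add(name)
    return warnings


in_order = check_cost_order(["glob", "grep", "read"])
out_of_order = check_cost_order(["read", "glob"])
```

The warnings give the operator the direct signal that is otherwise missing, without changing what the worker is allowed to do.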

Unhandled exception in the parallel worker gather

Trigger — A worker task raises a non‑Exception (e.g., KeyboardInterrupt, asyncio.CancelledError) or the _run_worker itself fails outside the inner try/except (e.g., a crash in asyncio.gather itself).
Guard — No guard is shown in the source. The except Exception in _run_worker does not cover BaseException, and workers_node has no try/except around asyncio.gather(*tasks).
Posture — fail-hard: the exception propagates out of workers_node, aborting the current graph run (unless the graph executor itself handles it, which is not shown).
Operator signal — The runtime error (e.g., Traceback (most recent call last):) would appear in the application logs or be returned as a graph execution failure. No structured log from the worker system.
Recovery — No automatic retry. The operator must restart the run manually, possibly after adjusting the environment or fixing the worker code.
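
One possible guard at the gather level is asyncio.gather with return_exceptions=True, which returns a worker's exception as a result instead of letting it abort the whole batch. The sketch below uses hypothetical worker and run_workers functions rather than the real workers_node; process-level interrupts such as KeyboardInterrupt still propagate and are not handled here.

```python
import asyncio


async def worker(angle: str, fail: bool = False) -> dict:
    """Hypothetical worker that either returns findings or crashes."""
    if fail:
        raise RuntimeError(f"{angle} crashed")
    return {"angle": angle, "findings": f"results for {angle}"}


async def run_workers(angles: list, failing: set) -> list:
    """Run all workers in parallel and convert failures into error findings."""
    tasks = [worker(angle, fail=angle in failing) for angle in angles]
    # return_exceptions=True puts raised exceptions into the result list.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    findings = []
    for angle, result in zip(angles, results):
        if isinstance(result, BaseException):
            findings.append({"angle": angle, "findings": f"worker error: {result}"})
        else:
            findings.append(result)
    return findings


outcome = asyncio.run(run_workers(["auth", "db"], failing={"db"}))
```

With this in place, one crashed worker degrades a single findings entry while the other workers' results still reach the synthesis step.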

06. Validating Tool Calls

When an agent runs, it can call tools to fetch information or take actions. But sometimes the model tries to use a tool that does not exist, or it omits a required argument. To guard against this, every tool call is checked against the tool's defined input schema, which comes straight from the type hints in the tool's code. The model must supply arguments that fit that schema. If the call does not match, the system detects the mismatch and returns a clear error, and then the model can try again. This keeps the agent from crashing on a single bad request. There is another safeguard worth knowing. A recursion limit caps how many turns the agent may take, so if the model keeps calling tools without ever finishing, the graph simply stops. A managed value called remaining steps tracks how many steps are left before that limit is reached. The graph can read this value and act early. For example, it might switch to a fallback node just before hitting the ceiling. This design stops endless loops and keeps the agent running reliably.
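
The idea of deriving a schema from type hints and checking a call against it can be sketched with the standard library alone. This is a simplified illustration, not how LangChain validates calls internally: validate_call is a hypothetical helper that only handles plain classes such as str and bool, not generic or nested types.

```python
import inspect
from typing import get_type_hints


def validate_call(func, args: dict) -> list:
    """Check `args` against the parameters and type hints of `func`."""
    hints = get_type_hints(func)
    params = inspect.signature(func).parameters
    errors = []
    for name, param in params.items():
        if name not in args:
            # A parameter without a default is required.
            if param.default is inspect.Parameter.empty:
                errors.append(f"missing required argument: {name}")
            continue
        expected = hints.get(name)
        if isinstance(expected, type) and not isinstance(args[name], expected):
            got = type(args[name]).__name__
            errors.append(f"{name}: expected {expected.__name__}, got {got}")
    for name in args:
        if name not in params:
            errors.append(f"unexpected argument: {name}")
    return errors


def fetch_order_status(order_id: str, verbose: bool = False) -> str:
    """Example tool whose type hints define its input schema."""
    return f"Order {order_id} is shipped"


ok = validate_call(fetch_order_status, {"order_id": "12345"})
missing = validate_call(fetch_order_status, {})
wrong_type = validate_call(fetch_order_status, {"order_id": 12345})
```

Returning a list of readable messages, rather than raising, is what lets the host send the errors back to the model so it can retry with corrected arguments.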

Generate it: Every tool call is checked against the tool's defined input s_____, which comes straight from the type hints in the tool's code. (cue: s_____; answer: schema)

Generate it: A r________ limit caps how many turns the agent may take, so the graph stops if the model keeps calling tools without finishing. (cue: r________; answer: recursion)

Ask yourself: A model can call a tool that doesn't exist or omit a required argument. What two distinct mechanisms keep one bad call from crashing or hanging the agent?

Recall check (try before reading the answer):

What happens when a tool call doesn't match the schema? — ____________________________________ Answer: The system detects the mismatch and returns a clear error, and then the model can try again — so it doesn't crash on a single bad request.

What does the 'remaining steps' value let the graph do? — ____________________________________ Answer: It tracks how many steps are left before the recursion limit, so the graph can act early — for example, switch to a fallback node just before hitting the ceiling.

Where does the input schema the call is checked against come from? — ____________________________________ Answer: Straight from the type hints in the tool's code.

Middleware catches tool call exceptions and returns a clear error message.

python

from collections.abc import Callable
from langchain.agents.middleware import wrap_tool_call
from langchain.messages import ToolMessage
from langchain.tools.tool_node import ToolCallRequest

@wrap_tool_call
def handle_tool_errors(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], ToolMessage],
) -> ToolMessage:
    """Convert tool exceptions into ToolMessages the model can handle."""
    try:
        return handler(request)
    except Exception as e:
        return ToolMessage(
            content=f"Tool error: Please check your input and try again. ({e})",
            tool_call_id=request.tool_call["id"],
        )

ELI5 — the plain-language version

Think of an agent as a chef who cooks only from written order slips. Without a proper slip, the chef might throw in salt when the dish needs sugar, or reach for an ingredient that doesn’t exist. That’s why every tool call gets checked against a predefined order form: the tool’s input schema, which is built directly from the type hints in the tool’s code. When the model asks to run a tool, the system verifies that every argument matches that schema: required fields are present and types are correct (for example, a string for a customer ID, not a number). If the call is missing a required argument or names a non‑existent tool, the system catches the mismatch and returns a clear error message, like a waiter saying, “Sorry, we don’t have that dish.” Then the model can correct itself and try again. Without this validation, a bad tool call could crash the agent mid‑step: the kitchen would stop, orders would pile up, and the meal would be ruined with no way to fix it.

System design — mechanism, invariant, trade-off

The subsystem for validating tool calls operates through a precise sequence. First, a tool is defined using the @tool decorator, where Python type hints automatically become its input schema. When the agent (constructed via create_agent and executed inside a StateGraph) invokes the tool, the system checks the model‑supplied arguments against that schema. A mismatch, such as a missing required argument or an invalid type, produces a ToolMessage with a clear error description rather than crashing the process. The model can then retry the call with corrected arguments. In parallel, the graph enforces a recursion guard: a recursion_limit value (passed in the run config at invocation time, not fixed at compile time) caps the number of super-steps, and the managed RemainingSteps value allows proactive routing via functions like route_decision, which inspect remaining_steps and can redirect to END when the limit approaches. If the limit is reached despite these guards, a GraphRecursionError is raised, halting the graph.
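
The schema check described above can be illustrated without any framework. The following is a minimal sketch using only the standard library, not LangChain's actual validation code: validate_call, run_tool_call, and the result dictionaries are hypothetical names, and search_database is a stand-in tool body. It derives the expected arguments from the function's type hints and returns errors as data so a model could retry.

```python
import inspect
from typing import Any, Callable, get_type_hints


def search_database(query: str, limit: int = 10) -> str:
    """Stand-in tool body; a real tool would query a database."""
    return f"{limit} results for {query!r}"


def validate_call(tool: Callable[..., Any], args: dict[str, Any]) -> list[str]:
    """Return a list of schema problems; an empty list means the call is valid."""
    hints = get_type_hints(tool)
    params = inspect.signature(tool).parameters
    problems = []
    for name, param in params.items():
        if name not in args:
            # Only parameters without a default are required.
            if param.default is inspect.Parameter.empty:
                problems.append(f"Missing required argument: {name}")
            continue
        expected = hints.get(name)
        if isinstance(expected, type) and not isinstance(args[name], expected):
            problems.append(
                f"Argument {name!r} should be {expected.__name__}, "
                f"got {type(args[name]).__name__}"
            )
    for name in args:
        if name not in params:
            problems.append(f"Unexpected argument: {name}")
    return problems


def run_tool_call(tools: dict[str, Callable[..., Any]], call: dict[str, Any]) -> dict[str, Any]:
    """Validate first, then execute; errors come back as data, not exceptions."""
    tool = tools.get(call["name"])
    if tool is None:
        return {"status": "error", "content": f"Unknown tool: {call['name']}"}
    problems = validate_call(tool, call["args"])
    if problems:
        return {"status": "error", "content": "; ".join(problems)}
    return {"status": "success", "content": tool(**call["args"])}


TOOLS = {"search_database": search_database}
bad = run_tool_call(TOOLS, {"name": "search_database", "args": {}})
good = run_tool_call(TOOLS, {"name": "search_database", "args": {"query": "orders"}})
```

Because the tool body runs only after validation passes, a malformed call cannot cause a side effect before it fails.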

The design preserves two invariants: the type‑hint schema invariant (every tool call must match the schema derived from the tool’s type annotations) and the recursion limit guarantee that the graph will not run indefinitely. The schema invariant keeps malformed arguments from reaching tool logic, preventing runtime crashes there. The recursion guarantee avoids infinite loops from repeated failed calls or cyclic edges. The schema check runs before any tool body executes, and the step limit is checked as the graph advances, so both kinds of failure are deterministic and recoverable.

This approach rejects the obvious alternative of accepting any call and relying on in‑tool error handling (e.g., try‑except blocks). The cost avoided is the need to implement complex, per‑tool recovery logic and the risk of partial state corruption when an unexpected argument causes a side‑effect before failing. By validating against the declared schema at the invocation boundary—using the same schema that @tool derives—the system keeps errors predictable and curable by the model itself. The trade‑off is that schema validation adds a small up‑front cost and requires the model to adhere strictly to the schema, but this strictness dramatically simplifies debugging and prevents silent data contamination.

A concrete failure mode: the model calls search_database (from the @tool example) but omits the required query argument. The schema validation immediately returns a ToolMessage with an error summary such as “Missing required argument: query”. An operator monitoring LangSmith traces would see a tool call that produced a ToolMessage whose payload contains an error status field and the missing‑argument message. If the model repeatedly fails with the same error, the route_decision function will eventually see remaining_steps drop below the threshold, route to END, and the graph completes without raising an exception. If the recursion limit is reached without graceful exit, the operator would see a GraphRecursionError exception in the logs, accompanied by a trace that includes the final state and the sequence of tool calls that led to the limit.

Failure modes — what breaks, what catches it

Failure 1: Recursion Limit Reached

Trigger – The agent loops beyond the configured recursion_limit (e.g., 10), either because a tool call fails and the model retries indefinitely, or because the graph itself cycles without termination.
Guard – The GraphRecursionError exception (reactive) or the RemainingSteps managed value (proactive). In proactive mode, the if state["remaining_steps"] <= 2 check routes to a fallback_node that returns a truncated answer. In reactive mode, the except GraphRecursionError block catches the error and supplies a fallback message.
Posture – Reactive: fail-hard (graph execution terminates, exception raised). Proactive: fail-soft (graph completes gracefully via conditional edge to END or fallback_node, returning a best‑effort result).
Operator signal – In proactive mode, the graph eventually prints "Approaching limit, wrapping up..." or "Reached complexity limit, providing best effort answer". In reactive mode, the operator sees GraphRecursionError in the logs and the fallback message "Fallback: recursion limit exceeded" (if caught externally).
Recovery – Proactive: no retry; the graph returns a partial answer. Reactive: no automatic retry; the operator must re‑invoke with a higher recursion_limit or a revised graph.
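
The proactive and reactive postures can be compared in a framework-free simulation. This is a sketch under stated assumptions: StepLimitExceeded stands in for GraphRecursionError, the run function is a hypothetical driver for a loop that never finishes on its own, and the limit of 10 and threshold of 2 are the example values from the text.

```python
RECURSION_LIMIT = 10     # example value from the text
WRAP_UP_THRESHOLD = 2    # route to the fallback when this few steps remain


class StepLimitExceeded(RuntimeError):
    """Stand-in for GraphRecursionError in this framework-free sketch."""


def route_decision(state: dict) -> str:
    """Proactive guard: pick the fallback node shortly before the ceiling."""
    if state["remaining_steps"] <= WRAP_UP_THRESHOLD:
        return "fallback_node"
    return "agent_node"


def run(proactive: bool) -> dict:
    """Drive a loop that never finishes on its own, with or without the guard."""
    state = {"messages": [], "remaining_steps": RECURSION_LIMIT}
    while True:
        if state["remaining_steps"] == 0:
            raise StepLimitExceeded("recursion limit reached")
        if proactive and route_decision(state) == "fallback_node":
            state["messages"].append("Approaching limit, returning partial result")
            return state
        state["messages"].append("tool call")   # the model keeps calling tools
        state["remaining_steps"] -= 1


# Proactive: the run ends gracefully with a best-effort message.
soft = run(proactive=True)

# Reactive: the limit is hit and the caller supplies the fallback.
try:
    hard = run(proactive=False)
except StepLimitExceeded:
    hard = {"messages": ["Fallback: recursion limit exceeded"]}
```

The proactive path keeps the partial transcript, while the reactive path here discards it and returns only the fallback message.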

Failure 2: Tool Call Argument Schema Mismatch

Trigger – The model generates a tool call that either targets a non‑existent tool name or omits/incorrectly types a required argument, violating the tool’s Pydantic schema (derived from the function’s type hints).
Guard – Schema validation at the invocation boundary, plus the handle_tool_errors middleware (wrap_tool_call) shown earlier in this section, which converts an exception into a ToolMessage. No dedicated validation exception class is named in this section.
Posture – Fail-soft when the middleware is installed: the error returns to the model as a ToolMessage and the run continues. Without such a handler the exception may propagate and abort the run; the exact behavior is not confirmed here.
Operator signal – A ToolMessage whose content begins "Tool error: Please check your input and try again." followed by the exception text (for example, a missing‑argument message for query).
Recovery – The model may retry with corrected arguments. No retry count or backoff is defined, so repeated failures are bounded only by the recursion limit. Persistent mismatches may need manual intervention, such as fixing the tool definition or the prompt.

Failure 3: Tool Invoked Outside LangGraph Execution Context

Trigger – A tool that uses runtime.stream_writer or runtime.execution_info is called without being within a LangGraph graph execution (e.g., during a local test script or a standalone call).
Guard – None shown in the source. The documentation warns “the tool must be invoked within a LangGraph execution context”, but no conditional check or fallback is provided.
Posture – Likely fail-hard: calling runtime.execution_info outside the context would raise an AttributeError or similar, halting execution.
Operator signal – The operator will see a traceback referencing runtime.execution_info or runtime.stream_writer, e.g., "AttributeError: 'NoneType' object has no attribute 'thread_id'".
Recovery – No automatic recovery. The operator must ensure the tool is only called inside a StateGraph.invoke() or astream() context, or modify the tool to check runtime is not None.

Failure 4: Server Info Not Available When Expected

Trigger – A tool calls runtime.server_info and then attempts to access server.assistant_id, server.graph_id, or server.user.identity while running locally (not on LangGraph Server), causing server_info to be None.
Guard – The source provides an explicit guard: if server is not None: before accessing server attributes. The tool get_assistant_scoped_data checks this condition, so it degrades gracefully.
Posture – fail-soft: if the guard is present, the tool skips the server‑info prints and returns "done" without error. If the guard is absent (e.g., a different tool), it would fail‑hard.
Operator signal – When the guard is used, no error is raised and no user/assistant info is printed. The operator would notice the silent absence of the expected "User: ..." log line. Without the guard, a TypeError or AttributeError would appear.
Recovery – No retry. If the guard is present, the tool continues normally. If the guard is missing, the operator must add the if server is not None: check or run only on LangGraph Server.
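
The server-info guard can be sketched as follows. FakeRuntime and FakeServerInfo are hypothetical stand-ins for the real runtime objects, included only so the example runs on its own; the point is the if server is not None: check, which lets the same tool work locally and on a server.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeServerInfo:
    """Hypothetical stand-in for the object behind runtime.server_info."""
    assistant_id: str
    graph_id: str


@dataclass
class FakeRuntime:
    """Hypothetical stand-in for the tool runtime; server_info is None locally."""
    server_info: Optional[FakeServerInfo] = None


def get_assistant_scoped_data(runtime: FakeRuntime) -> str:
    """Log server details only when they exist, then finish normally."""
    server = runtime.server_info
    if server is not None:
        print(f"Assistant: {server.assistant_id}, Graph: {server.graph_id}")
    return "done"


# Local run: no server info, so the prints are skipped without an error.
local_result = get_assistant_scoped_data(FakeRuntime())

# Server-like run: the same tool prints the scoped details.
server_result = get_assistant_scoped_data(
    FakeRuntime(FakeServerInfo(assistant_id="asst-1", graph_id="graph-1"))
)
```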

Failure 5: Tool Observation Exceeds Length Limit

Trigger – A tool returns a very long result (e.g., over 8000 characters), which would exceed the context window or log size.
Guard – In agentic_search_graph.py, the code explicitly truncates: if len(observation) > 8000: observation = observation[:8000] + "\n… (observation truncated)". This is a proactive length guard.
Posture – fail-soft: the observation is truncated, and the string "… (observation truncated)" is appended. The graph continues normally with the shortened observation.
Operator signal – The operator sees the exact suffix "… (observation truncated)" in the logs or the model’s next response, indicating information loss.
Recovery – No retry; the truncated observation is used as‑is. If the truncated data causes a poor answer, the operator must manually redesign the tool or increase the limit.
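
The length guard is small enough to show whole. This sketch wraps the truncation rule quoted above in a helper; the function name clip_observation and the constant names are illustrative, while the 8000-character limit and the suffix come from the text.

```python
MAX_OBSERVATION_CHARS = 8000
TRUNCATION_SUFFIX = "\n… (observation truncated)"


def clip_observation(observation: str, limit: int = MAX_OBSERVATION_CHARS) -> str:
    """Keep at most `limit` characters and mark the cut so it is visible in logs."""
    if len(observation) > limit:
        return observation[:limit] + TRUNCATION_SUFFIX
    return observation


short = clip_observation("ok")                 # under the limit, returned unchanged
clipped = clip_observation("x" * 9000)         # over the limit, cut and marked
was_truncated = clipped.endswith("(observation truncated)")
```

Checking for the suffix is the only way downstream code can tell that information was lost, so the marker should stay stable.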

07. Securing Tool Use

Tools extend what agents can do, letting them fetch live data, execute code, and query external databases. But a security boundary is critical. The model only suggests tool calls; the application decides what actually runs, and that separation keeps control firmly in your hands. For headless tools, the pattern is clear. You define the tool on the server with just its name, description, and argument schema, while the real implementation lives on the client. When the model invokes the tool, the graph pauses and sends a payload shaped like a tool call, carrying an identifier, a name, and the arguments. Your app then performs the action in the right environment. For browser-based flows, you mirror the schema in the frontend and attach the implementation there, then resume the graph with the result. The model sees a normal tool it can call, yet the actual execution happens outside the server process. This design strengthens security: it avoids exposing server-side logic and keeps sensitive data local, because data stays on the device for browser-based tools like geolocation or file pickers. That is the trade. You gain privacy and locality, but the handshake adds an extra round trip, and each round trip costs latency and spends model tokens on the tool call frame and the resume. Use headless tools when the work depends on an environment that only exists on the client.

Generate it: The security boundary rests on one rule: the model only s________ tool calls, while the application decides what actually runs. (cue: s________; answer: suggests)

Generate it: For headless tools, the server holds only the name, description, and argument schema, while the real i______________ lives on the client. (cue: i______________; answer: implementation)

Ask yourself: How does keeping the implementation off the server both strengthen security and protect privacy — and what is the price you pay each time?

Recall check (try before reading the answer):

Why does 'the model only suggests' keep control in your hands? — ____________________________________ Answer: The model only suggests tool calls; the application decides what actually runs, so that separation keeps control firmly in your hands.

What does the headless pattern cost on every call? — ____________________________________ Answer: An extra round trip — latency plus model tokens spent on the tool call frame and the resume.

Why does this design strengthen security and privacy? — ____________________________________ Answer: It avoids exposing server-side logic and keeps sensitive data local, since data stays on the device for browser-based tools like geolocation or file pickers.

Looking back: Why is forcing the model's output into JSON the trick that makes the tool loop portable across providers (from 'A Tool Loop Deployed')? Answer: Any provider's JSON can be parsed the same way to recover the tool name and arguments, even when the model wraps it in code blocks.

Error handling middleware gives the application control over tool execution by wrapping the handler.

python

from collections.abc import Callable
from langchain.agents.middleware import wrap_tool_call
from langchain.messages import ToolMessage
from langchain.tools.tool_node import ToolCallRequest

@wrap_tool_call
def handle_tool_errors(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], ToolMessage],
) -> ToolMessage:
    """Convert tool exceptions into ToolMessages the model can handle."""
    try:
        return handler(request)
    except Exception as e:
        return ToolMessage(
            content=f"Tool error: Please check your input and try again. ({e})",
            tool_call_id=request.tool_call["id"],
        )

ELI5 — the plain-language version

Think of the model as a well-meaning assistant who points at a tool and says, “You should use this—here’s exactly how.” But the assistant never actually picks it up. You, the human, are the one who decides whether to grab it, how to handle it, and when. That’s the core security boundary: the model only suggests tool calls—it never runs them. The application, your code, stays in control.

Concretely, with headless tools, you register a tool on the server with just its name, description, and input schema (like args_schema). The actual implementation lives only on the client—your app’s environment. When the model invokes that tool, the graph pauses. It sends a structured payload containing the tool call’s id, name, and arguments. Your app then performs the action in the browser or wherever the work needs to happen, then resumes the graph.

Without this separation, the model could execute arbitrary code, fetch sensitive data, or trigger side effects you never approved—like a robot that reaches for the knife on its own. A beginner would feel the system run out of control, doing things they didn’t intend, with no way to stop it. That’s the failure: loss of trust and safety in what should be a helpful tool.

System design — mechanism, invariant, trade-off

The system’s ordered mechanism begins when the chat model, such as one passed to a graph node, decides it should invoke a tool. The model does not execute the tool itself; instead it emits a structured suggestion in the form of a tool call with an ID, name, and arguments. That suggestion triggers an interrupt in the graph. The graph pauses execution, and the payload is handed to the application layer. The application runs the actual tool (defined using the @tool decorator, with type hints that define the input schema) in its own environment, isolated from the model. After the tool returns, the application resumes the graph by invoking it with a Command whose resume value is the tool’s result. Command plays no part in raising the interrupt; it only carries the resume. On failure, the application can return an error message as the result. If the tool raises an unhandled exception and no resume is ever sent, the graph may remain paused indefinitely, waiting for a valid resume.
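
The pause-and-resume handshake can be traced end to end in a small simulation. This is a sketch under assumptions, not the LangGraph API: server_pause, client_execute, and server_resume are hypothetical functions, get_location is an invented example tool, and the coordinates are made-up sample data. It shows the server holding only the schema while the client holds the implementation.

```python
import json
from typing import Any, Callable

# Server side: only the tool's name, description, and argument schema exist here.
HEADLESS_TOOLS = {
    "get_location": {
        "description": "Return the user's current coordinates.",
        "args_schema": {"precision": "str"},
    }
}

# Client side: the real implementation, keyed by the same tool name.
CLIENT_IMPLEMENTATIONS: dict[str, Callable[..., Any]] = {
    "get_location": lambda precision: {"lat": 52.52, "lon": 13.40, "precision": precision},
}


def server_pause(tool_call: dict[str, Any]) -> str:
    """Server: instead of executing, emit the tool-call payload and pause."""
    if tool_call["name"] not in HEADLESS_TOOLS:
        raise KeyError(f"Unknown headless tool: {tool_call['name']}")
    return json.dumps(tool_call)


def client_execute(payload: str) -> dict[str, Any]:
    """Client: run the matching implementation and build the resume value."""
    call = json.loads(payload)
    result = CLIENT_IMPLEMENTATIONS[call["name"]](**call["args"])
    return {"tool_call_id": call["id"], "result": result}


def server_resume(pending_id: str, resume: dict[str, Any]) -> Any:
    """Server: accept the result only if it answers the pending call."""
    if resume["tool_call_id"] != pending_id:
        raise ValueError("resume does not match the pending tool call")
    return resume["result"]


call = {"id": "call_1", "name": "get_location", "args": {"precision": "city"}}
payload = server_pause(call)                      # graph pauses here
answer = server_resume(call["id"], client_execute(payload))
```

Matching the resume against the pending call id is one simple way to keep a stale or misrouted result from being applied to the wrong tool call.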

The invariant the design preserves is a strict separation between model suggestion and tool execution, which can be thought of as a write boundary: the graph state is never updated by the model directly, only by the application after it has validated and executed the tool. This guarantee prevents the model from accidentally or maliciously triggering side‑effects outside the controlled environment. The graph’s own execution guarantees, such as the super‑step discipline in which a run ends only when all nodes are inactive and no messages are in transit, are preserved because the application’s resume via Command re‑engages the normal message‑passing flow.

The key trade‑off rejects the obvious alternative of allowing the model to invoke tools directly (e.g., by executing a search_database function server‑side as part of the model’s own runtime). That alternative would eliminate the pause‑and‑resume overhead but would require trusting the model with arbitrary code execution, opening the system to injection attacks, resource exhaustion, or data exfiltration. The cost avoided is the entire security surface of letting an untrusted model run arbitrary operations. Instead, the LangGraph design forces the model to produce only structured, schema‑constrained outputs, and the application retains full control over when and how tools are executed, even filtering or rejecting calls before they happen.
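
Filtering or rejecting calls before they run can be as simple as a policy check in the application layer. The sketch below is illustrative only: review_tool_call, the allowlist, and the delete_record tool are hypothetical, and a real deployment would likely add argument checks and audit logging.

```python
from typing import Any

ALLOWED_TOOLS = {"search_database", "fetch_order_status"}   # example allowlist
NEEDS_APPROVAL = {"delete_record"}                          # hypothetical risky tool


def review_tool_call(
    call: dict[str, Any], approved_ids: frozenset = frozenset()
) -> dict[str, Any]:
    """Decide whether the application will run a call the model suggested."""
    name = call["name"]
    if name in NEEDS_APPROVAL:
        # Risky tools run only after a human has approved this specific call.
        if call["id"] in approved_ids:
            return {"action": "run"}
        return {"action": "reject", "reason": f"{name} requires human approval"}
    if name in ALLOWED_TOOLS:
        return {"action": "run"}
    return {"action": "reject", "reason": f"{name} is not an allowed tool"}


safe = review_tool_call({"id": "c1", "name": "search_database", "args": {}})
blocked = review_tool_call({"id": "c2", "name": "run_shell", "args": {}})
pending = review_tool_call({"id": "c3", "name": "delete_record", "args": {}})
approved = review_tool_call(
    {"id": "c3", "name": "delete_record", "args": {}}, approved_ids=frozenset({"c3"})
)
```

A rejection can be returned to the model as an error result, so the model learns the call was refused instead of waiting on it.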

A concrete failure mode is a network outage in the client‑side tool implementation. The operator would see the graph stall at an interrupt point: no progress is made because no resume is ever delivered. The observable signal is a checkpoint record showing the graph in a paused state with a pending tool call (a checkpointer is attached when the graph is compiled and saves state as the run proceeds), and a timeout alarm raised by the application’s monitoring layer after a configured interval. No further nodes become active, and the graph stays paused unless the operator manually provides a resume value through an external Command invocation.

Failure modes — what breaks, what catches it

Missing LangGraph Execution Context for stream_writer

Trigger — A tool calls runtime.stream_writer (e.g., to stream output) when the graph is not actively running – for instance, during local testing or a standalone invocation.
Guard — No explicit exception handler or guard is shown in the source; the note only states “must be invoked within a LangGraph execution context.” The runtime likely raises a generic error, but no identifier is given.
Posture — fail-hard: the tool call aborts the run because the required context is absent.
Operator signal — An unhandled RuntimeError or similar exception (no exact class from source) appears in logs, and the graph stops early.
Recovery — The operator must either remove the stream_writer usage or ensure the tool is always called inside a compiled LangGraph graph. There is no retry or fallback in the source.

server_info Returns None During Local Development

Trigger — A tool that accesses runtime.server_info runs in a non-server environment (e.g., local Python script, unit test). server_info is documented as being None in that situation.
Guard — The explicit if server is not None: check inside get_assistant_scoped_data prevents use of the None value.
Posture — fail-soft: the tool skips the server‑specific prints (“Assistant”, “Graph”, “User”) but continues to return "done" normally.
Operator signal — No log output for assistant‑scoped data; server_info is silently absent. The operator sees only the normal "done" return.
Recovery — None needed; the tool degrades gracefully. For full data, the operator must deploy on LangGraph Server.

Version Requirement Not Met for execution_info

Trigger — A tool calls runtime.execution_info but the installed library is older than deepagents>=0.5.0 or langgraph>=1.1.5. The attribute may be missing or raise an AttributeError.
Guard — No guard is shown in the source; the note serves as a prerequisite but does not include a try/except or fallback.
Posture — fail-hard: the tool use fails at runtime because execution_info does not exist or is undefined.
Operator signal — An AttributeError such as 'ToolRuntime' object has no attribute 'execution_info' appears in logs.
Recovery — Upgrade the library to meet the version requirement. No automatic retry or fallback is provided.
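
Where upgrading is not immediately possible, a tool can degrade instead of failing. This is a minimal sketch, assuming the attribute is simply absent on older versions; read_execution_info is a hypothetical helper and SimpleNamespace stands in for the runtime object.

```python
from types import SimpleNamespace
from typing import Any, Optional


def read_execution_info(runtime: Any) -> Optional[Any]:
    """Return runtime.execution_info, or None when the attribute is missing."""
    return getattr(runtime, "execution_info", None)


# Stand-in for an older runtime that has no execution_info attribute.
old_runtime = SimpleNamespace()

# Stand-in for a newer runtime; thread_id is an illustrative field.
new_runtime = SimpleNamespace(execution_info=SimpleNamespace(thread_id="t-1"))

missing = read_execution_info(old_runtime)
present = read_execution_info(new_runtime)
```

The caller still has to handle the None case, so this turns a hard failure into an explicit branch; it does not replace the upgrade.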

Recursion Limit Exceeded During Tool‑Chaining Cycle

Trigger — The graph’s recursion limit (e.g., recursion_limit=10) is reached because tools repeatedly call each other or loop back to the same node.
Guard — Either proactive RemainingSteps (managed value in state) routes to a fallback node when remaining <= 2, or reactive except GraphRecursionError catches the exception externally. Both are shown in the source.
Posture — With RemainingSteps: fail-soft (graph completes gracefully via fallback node). With GraphRecursionError: fail-hard (graph terminates with exception).
Operator signal — Proactive: state messages like "Approaching limit, returning partial result". Reactive: GraphRecursionError traceback and no final output.
Recovery — Proactive: the fallback node returns a best‑effort answer. Reactive: the outer try block returns a fallback dictionary (e.g., {"messages": ["Fallback: recursion limit exceeded"]}). No automatic retry.

Observation Truncation in Search Tool Results

Trigger — A tool returns an observation longer than 8000 characters. The agentic_search_graph.py snippet truncates it to the first 8000 chars and appends "\n… (observation truncated)".
Guard — The conditional if len(observation) > 8000: checks length and slices to 8000. No further validation or error handling is shown.
Posture — fail-soft: the tool call succeeds, but the returned data is incomplete, potentially degrading downstream reasoning.
Operator signal — The truncated observation is stored in the transcript list; the presence of "… (observation truncated)" in the log or final answer indicates truncation.
Recovery — The operator can increase the truncation threshold in the source code or implement pagination. No automatic retry is provided.

08. Scaling Tool Calling

Tool calling works well for simple tasks, but it can break down as workloads grow. Two things scale well. Independent tool calls run concurrently, which speeds the whole process up. Idempotent lookups cache their results by argument, so repeated calls stay cheap and efficient. Trouble starts with deeply interdependent tool chains, because they have to run strictly one after another, and this forced serialization slows everything down. Long lists of tool definitions inflate the prompt, and that bloat can push the model past its context window. Large results cause trouble too: they must be summarized or paged to fit inside the limit. If a single result exceeds eight thousand characters, it gets truncated.

Generate it: Scaling breaks down on deeply interdependent tool chains, because forced s____________ makes them run strictly one after another. (cue: s____________; answer: serialization)

Generate it: I__________ lookups cache their results by argument, so repeated calls stay cheap and efficient. (cue: I__________; answer: Idempotent)

Ask yourself: Independent calls scale well but interdependent chains don't — what property of a call decides whether it can be parallelized or cached?

Recall check (try before reading the answer):

Why do interdependent tool chains slow everything down? — ____________________________________ Answer: They have to run strictly one after another, and that forced serialization slows everything down.

What two pressures can push the model past its context window? — ____________________________________ Answer: Long lists of tool definitions inflate the prompt, and large results must be summarized or paged to fit inside the limit.

What happens to an oversized single result? — ____________________________________ Answer: If a single result exceeds eight thousand characters, it gets truncated.
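
Caching an idempotent lookup by argument can be sketched with the standard library. The example assumes the lookup really is idempotent for the lifetime of the cache; lookup_order_status and the backend_calls counter are illustrative names, not part of any library.

```python
from functools import lru_cache

backend_calls = 0  # counts how often the slow lookup actually runs


@lru_cache(maxsize=256)
def lookup_order_status(order_id: str) -> str:
    """Idempotent lookup: the same order_id gives the same answer within a run."""
    global backend_calls
    backend_calls += 1
    return f"Order {order_id} is shipped"


first = lookup_order_status("12345")    # runs the backend lookup
second = lookup_order_status("12345")   # served from the cache
third = lookup_order_status("67890")    # different argument, new backend call
```

A lookup whose answer changes over time, such as a live order status, would need an expiry policy on top of this; lru_cache alone never invalidates entries.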

Tool output can bypass the model when no further reasoning is needed, saving a model call and reducing token usage.

python

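from langchain.agents import create_agent
from langchain.tools import tool

@tool(return_direct=True)
def fetch_order_status(order_id: str) -> str:
    """Fetch the current status of a customer order."""
    # Placeholder result; a real tool would look the order up.
    return f"Order {order_id} is shipped and will arrive in 2 days."

# Pass the provider-prefixed model string directly. Wrapping it in ChatOpenAI
# would send "ollama:devstral-2" to OpenAI as a model name.
agent = create_agent(
    model="ollama:devstral-2",
    tools=[fetch_order_status],
)

# With return_direct=True, the tool output is returned without another model call.
result = agent.invoke({
    "messages": [{"role": "user", "content": "What is the status of order #12345?"}]
})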

ELI5 — the plain-language version

Imagine a busy kitchen where the chef (the model) has a set of special gadgets (tools) that can fetch ingredients, check temperatures, or update the menu. Some gadgets are simple—the chef uses them and gets a result instantly. But other gadgets can't be used in the kitchen; they require a waiter to step out into the dining room and perform the action manually. This is exactly what headless tools do: when the model issues a tool call for one of these, the graph interrupts instead of executing locally. The app (the waiter) inspects the request, does the real work (e.g., in a browser, another service, or after human review), then resumes the graph with the result. The supported JS SDK hooks detect these headless-tool interrupts, run the matching client-side implementation, and submit the resume command automatically. Without this interrupt-and-resume mechanism, the chef would try to use a gadget that can't work in the kitchen—the recipe would stall, the order would never complete, and the whole meal service would grind to a halt. The system simply cannot proceed when a tool requires an environment the model doesn't have.

System design — mechanism, invariant, trade-off

The mechanism unfolds in ordered super-steps, as defined in the LangGraph Graph API. Execution begins when a node function—such as the one inside agentic_search_graph.py that calls ainvoke_json_with_telemetry—passes the current state (the transcript) along with the system prompt and TOOLS_DOC to the LLM. The model responds with a tool call specifying tool and args. That call is wrapped in a tool_call_span context manager that tracks the attempt number via turn. Within that span, execute_tool is invoked to run the tool logic. Independent tool calls are dispatched concurrently, because the graph assigns them to separate nodes that become active simultaneously in the same super‑step. After execution, the result is appended to the transcript via transcript.append. If the observation length exceeds 8000 characters, it is forcibly truncated: observation[:8000] + "\n… (observation truncated)". The loop continues, incrementing turn, until a final answer is extracted (the model returns {"answer": ...}) or the max_turns limit is reached. On failure—for example, an exception from execute_tool—the tool_call_span calls finish(error=exc) and re‑raises the error, halting that turn.
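
The turn loop described above can be reduced to a short simulation. This is a sketch, not the code in agentic_search_graph.py: run_loop, scripted_model, and fake_execute_tool are hypothetical stand-ins, and telemetry spans are omitted. It keeps the three behaviors the text names: append each observation to the transcript, truncate past 8000 characters, and stop at an answer or at max_turns.

```python
from typing import Any, Callable

MAX_OBSERVATION_CHARS = 8000


def run_loop(
    model: Callable[[list], dict],
    execute_tool: Callable[[str, dict], Any],
    question: str,
    max_turns: int = 5,
) -> dict:
    """Alternate model and tool turns until an answer appears or turns run out."""
    transcript = [f"question: {question}"]
    for turn in range(1, max_turns + 1):
        reply = model(transcript)
        if "answer" in reply:
            return {"answer": reply["answer"], "turns": turn}
        observation = str(execute_tool(reply["tool"], reply["args"]))
        if len(observation) > MAX_OBSERVATION_CHARS:
            observation = observation[:MAX_OBSERVATION_CHARS] + "\n… (observation truncated)"
        transcript.append(f"{reply['tool']} -> {observation}")
    return {"answer": None, "turns": max_turns}


def scripted_model(transcript: list) -> dict:
    """Fake model: call one tool, then answer from what it has seen."""
    if len(transcript) == 1:
        return {"tool": "search", "args": {"query": "tool calling"}}
    return {"answer": f"Based on {len(transcript) - 1} observation(s)."}


def stuck_model(transcript: list) -> dict:
    """Fake model that never produces an answer."""
    return {"tool": "search", "args": {"query": "again"}}


def fake_execute_tool(tool: str, args: dict) -> str:
    return f"{tool} results for {args['query']}"


outcome = run_loop(scripted_model, fake_execute_tool, "What is tool calling?")
exhausted = run_loop(stuck_model, fake_execute_tool, "What is tool calling?", max_turns=3)
```

Returning an explicit None answer when turns run out makes the exhausted case easy to tell apart from a real answer.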

The design preserves a strict context‑window invariant: the total token count of the LLM input must never exceed the model’s maximum context length. This guarantee is enforced through several middleware components identified in langchain-agents.md: SummarizationMiddleware compresses accumulated history before overflow, MemoryMiddleware loads persistent instructions at startup so knowledge carries across sessions, and SkillsMiddleware surfaces domain knowledge on demand rather than loading everything upfront. Additionally, any tool observation that grows beyond 8000 characters is truncated at the source, preventing a single large result from pushing the payload over the limit. The invariant is that every call to ainvoke_json_with_telemetry receives a payload that fits within the model’s context window, ensuring the LLM can always process the input.

The key trade‑off is between throughput (concurrency) and correctness for interdependent tool chains. Independent tool calls run concurrently, speeding the overall process. However, deeply interdependent tool chains must execute serially, because later tools depend on the output of earlier ones. The obvious rejected alternative is to run all tool calls in parallel regardless of dependencies, which would produce incorrect state—later nodes would operate on stale or missing data. The cost avoided by rejecting that alternative is the need to re‑execute failed chains or to implement expensive rollback logic. A second trade‑off appears in result handling: large tool observations are truncated at 8000 characters rather than being fully loaded or paged. The alternative of no truncation risks exceeding the context window and breaking the invariant; truncation sacrifices information but avoids a catastrophic run failure.
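
The difference between independent and interdependent calls shows up directly in how they are awaited. The sketch below uses asyncio with invented stand-in tools (fetch_weather, fetch_news, find_customer, list_orders) and simulated I/O; it is an illustration of the scheduling rule, not how any particular graph runtime dispatches nodes.

```python
import asyncio


async def fetch_weather(city: str) -> str:
    await asyncio.sleep(0.01)   # simulated I/O
    return f"weather:{city}"


async def fetch_news(topic: str) -> str:
    await asyncio.sleep(0.01)
    return f"news:{topic}"


async def find_customer(email: str) -> str:
    await asyncio.sleep(0.01)
    return "cust-42"


async def list_orders(customer_id: str) -> str:
    await asyncio.sleep(0.01)
    return f"orders-for:{customer_id}"


async def main() -> dict:
    # Independent calls: neither needs the other's output, so run them together.
    weather, news = await asyncio.gather(fetch_weather("Oslo"), fetch_news("ai"))
    # Dependent chain: list_orders needs find_customer's result, so it must wait.
    customer_id = await find_customer("a@example.com")
    orders = await list_orders(customer_id)
    return {"weather": weather, "news": news, "orders": orders}


results = asyncio.run(main())
```

Running list_orders in parallel with find_customer is not possible here at all, because its argument does not exist until the first call returns; that data dependency is what forces serialization.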

A concrete failure mode arises when a tool call raises an exception during execute_tool. The code inside agentic_search_graph.py logs the event via log.info("[%s] turn %d/%d tool=%s", …), then the tool_call_span context manager calls finish(error=exc) and re‑raises the exception. An operator would see the log line showing the turn, tool name, and max_turns, immediately followed by a Python traceback from the unhandled exception. If the exception is caught upstream, the run may fall through the loop and return the last truncated observation as a best‑effort answer, producing a nonsensical or incomplete response. In either case, the signal is either an error trace in the logs or an unexpected final output that lacks a proper "answer" key.

Failure modes — what breaks, what catches it

Large Tool Result Exceeds Eight‑Thousand‑Character Limit

Trigger – A tool returns an observation whose length is greater than 8000 characters.
Guard – The inline if len(observation) > 8000: statement in agentic_search_graph.py truncates the string to the first 8000 characters and appends "\n… (observation truncated)".
Posture – fail‑soft. The truncated result is used in the transcript, and execution continues, though information is lost.
Operator signal – The literal string "… (observation truncated)" appears at the end of the observation.
Recovery – No retry or backoff occurs; the agent proceeds with the partial observation as‑is.

Recursion Limit Exceeded

Trigger – The graph’s recursion_limit (e.g., 10) is reached because the tool chain has too many sequential steps or an infinite loop.
Guard – Proactive: the RemainingSteps managed value (state["remaining_steps"]) is checked; when remaining <= 2, a conditional edge routes to a fallback node or END. Reactive: a try/except block catches GraphRecursionError.
Posture – Proactive: fail‑soft – the graph completes gracefully via the fallback node (e.g., "Approaching limit, returning partial result"). Reactive: fail‑hard – the graph raises GraphRecursionError and terminates, but the except clause handles it externally.
Operator signal – Proactive: the fallback node’s output message (e.g., "Approaching limit, returning partial result"). Reactive: the raised GraphRecursionError.
Recovery – Proactive: no retry; the graph ends with a best‑effort answer. Reactive: the external except clause returns a fallback like {"messages": ["Fallback: recursion limit exceeded"]}; no automatic retry is shown.

Context Window Overflow from Long Tool Definitions (No Guard)

Trigger – The cumulative size of tool definitions in the prompt exceeds the model’s context window.
Guard – No guard, exception handler, or validation is shown. The section only notes that long lists of tool definitions inflate the prompt and that the bloat can push the model past its context window.
Posture – fail‑hard (implied). The model will reject the input or produce truncated/garbled output, effectively aborting the run.
Operator signal – Not shown in source; the operator would likely observe a token‑limit error from the model provider (e.g., "max_tokens exceeded") or silent truncation.
Recovery – None automatic. The developer must manually reduce tool definitions, use paging, or implement a custom guard.
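
One possible custom guard is a size budget applied before the prompt is built. This is a rough sketch under stated assumptions: serialized character count is used as a crude proxy for tokens, the tool definitions and the budget of 100 are invented for illustration, and select_tools is a hypothetical helper.

```python
import json

# Hypothetical tool definitions, listed in priority order.
TOOL_DEFS = [
    {"name": "a", "description": "x" * 10},
    {"name": "b", "description": "y" * 200},
    {"name": "c", "description": "z" * 10},
]


def definition_size(tool_def: dict) -> int:
    """Approximate prompt cost of one definition by its serialized length."""
    return len(json.dumps(tool_def))


def select_tools(tool_defs: list, budget_chars: int) -> list:
    """Keep tools in priority order, skipping any that would exceed the budget."""
    kept, used = [], 0
    for tool_def in tool_defs:
        size = definition_size(tool_def)
        if used + size > budget_chars:
            continue   # too large for the remaining budget; try the next one
        kept.append(tool_def)
        used += size
    return kept


selected = select_tools(TOOL_DEFS, budget_chars=100)
selected_names = [t["name"] for t in selected]
```

A production guard would count tokens with the provider's tokenizer, since character counts only approximate what the model actually sees.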

Headless Tool Interrupt Not Resolved

Trigger – The model issues a tool call for a tool defined with only name, description, and args_schema (i.e., a HeadlessTool), and the client does not implement the tool or resume the graph.
Guard – No guard in the source automatically resolves the interrupt. The optional onTool callback only observes lifecycle events (start, success, error); it does not implement the tool. The source explicitly notes that “there is no .implement() API on the Python side.”
Posture – fail‑hard. The graph remains in the interrupted state indefinitely, producing no output.
Operator signal – The run hangs with no further log lines; the operator sees a stalled execution.
Recovery – Manual intervention required: inspect the payload, perform the action externally, and submit a resume command. No automatic retry or backoff is provided.
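
One way to avoid the silent stall is for the client to answer every pending tool call, even when it has no implementation for it. The sketch below is plain Python and makes several assumptions: the payload shape ({"name": ..., "args": ...}), the result shape, and the name resolve_interrupt are all hypothetical, and the step that submits the returned value as a resume command is left out because the source does not show it.

```python
def resolve_interrupt(payload, handlers):
    """Build a resume value for one pending tool-call payload.

    `handlers` maps tool names to client-side callables (hypothetical registry).
    """
    name = payload["name"]
    handler = handlers.get(name)
    if handler is None:
        # No client implementation: report it instead of leaving the run stalled.
        return {"status": "error",
                "content": f"No client implementation for tool '{name}'"}
    try:
        return {"status": "success", "content": handler(**payload["args"])}
    except Exception as exc:  # surface tool failures as data, not a hang
        return {"status": "error", "content": str(exc)}


handlers = {"add": lambda a, b: a + b}
print(resolve_interrupt({"name": "add", "args": {"a": 2, "b": 3}}, handlers))
print(resolve_interrupt({"name": "open_door", "args": {}}, handlers))
```

With this pattern the missing implementation shows up as an explicit error result rather than an execution that never produces output.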

Server Info Unavailable in Local Development

Trigger – The tool runs outside LangGraph Server (e.g., during local development or testing), so runtime.server_info is None.
Guard – The tool get_assistant_scoped_data wraps its reads of server.assistant_id, server.graph_id, and server.user.identity in an explicit "if server is not None:" check, so none of them is touched when server info is absent.
Posture – fail‑soft. The tool skips the server‑specific prints and returns "done" normally; no server data is read, but execution continues.
Operator signal – No error; the operator sees that the prints for assistant ID, graph ID, and user identity are absent (silent skip).
Recovery – No retry; the tool completes without server information. The developer can later deploy on LangGraph Server to obtain those fields.
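
The guard described above can be sketched without LangGraph installed. The dataclasses below are hypothetical stand-ins for the real runtime objects; only the attribute names (server_info, assistant_id, graph_id, user.identity), the tool name, and the "done" return value come from the entry above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:  # stand-in type, not the library's class
    identity: str


@dataclass
class ServerInfo:  # stand-in type, not the library's class
    assistant_id: str
    graph_id: str
    user: User


@dataclass
class Runtime:  # stand-in type; server_info is None outside LangGraph Server
    server_info: Optional[ServerInfo] = None


def get_assistant_scoped_data(runtime):
    """Print server-scoped fields when available, then finish normally."""
    server = runtime.server_info
    if server is not None:
        print(f"Assistant: {server.assistant_id}")
        print(f"Graph: {server.graph_id}")
        print(f"User: {server.user.identity}")
    return "done"


# Local development: the prints are skipped silently.
print(get_assistant_scoped_data(Runtime()))
# Simulated server deployment: all three fields are printed first.
print(get_assistant_scoped_data(Runtime(ServerInfo("asst-1", "graph-1", User("user-1")))))
```

Because the skip is silent, a tool that depends on these fields may want to log a warning in the else branch; that is a design choice the source does not make.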

Tool Calling — Deep Dive

01. What Tool Calling Is

Failure 1: Tool output exceeds maximum transcript length

Failure 2: Maximum turns exhausted without a definitive answer

Failure 3: Recursion / super‑step limit approached (proactive degradation)

Failure 4: Recursion / super‑step limit exceeded (reactive catch)

Failure 5: Tool uses runtime.stream_writer outside a LangGraph execution context

Failure 6: server_info is None when tool expects it on LangGraph Server

02. Defining A Tool

1. Excessive tool invocations due to ambiguous description

2. Tool returns overly large observation (schema allows unbounded text)

3. Tool uses runtime.stream_writer outside a LangGraph execution context

4. Accessing runtime.execution_info with an outdated library version

5. Tool fails to produce a usable result, exhausting the turn budget

03. The Round Trip

1. Graph recursion limit exceeded when tool-call loop does not converge

2. Tool execution failure (e.g., API call error, invalid arguments)

3. Precondition failure: tool uses runtime.stream_writer outside LangGraph execution

4. Server‑info unavailable when tool runs locally

5. Headless tool interrupt when no client‑side implementation is provided

04. Controlling Tool Choice

05. A Tool Loop Deployed

06. Validating Tool Calls

Failure 1: Recursion Limit Reached

Failure 2: Tool Call Argument Schema Mismatch

Failure 3: Tool Invoked Outside LangGraph Execution Context

Failure 4: Server Info Not Available When Expected

Failure 5: Tool Observation Exceeds Length Limit

07. Securing Tool Use

08. Scaling Tool Calling
