01. What Tool Calling Is
Tool calling lets a large language model ask a host application to run an external function. The model never executes anything itself. Instead, it returns a structured request that names the tool and supplies its arguments, and the application then runs that tool and sends the result back. This solves a key limitation: a stateless model would otherwise have to guess at live data, real computations, or actions it cannot perform on its own. With tool calling, it can fetch current information from the web, run code, query external databases, or take real actions in the world. A tool is simply a callable function with well-specified inputs and outputs, and the model decides when to invoke one based on the conversation context. Agents built this way are no longer limited to their training data: they gain live facts, real math, and the power to change things. The trade is that the application must host and secure every tool. Because the model can only request a call, the host decides if and when to execute it. The result is a practical way to give a static language model dynamic abilities.
<!-- mem:begin -->Generate it: With tool calling, the model never runs the function itself — instead it returns a structured r______ that names the tool and supplies its arguments. (cue: r______; answer: request)
Generate it: Tool calling exists to fix one limitation: a s_______ model would otherwise have to guess at live data, real computations, or actions it cannot perform on its own. (cue: s_______; answer: stateless)
Ask yourself: If the model only emits a request and never executes anything, who actually runs the tool — and why does that split matter for security?
<!-- mem:end -->Recall check (try before reading the answer):
What does the model hand back instead of running the tool directly? — ____________________________________ Answer: A structured request that names the tool and supplies its arguments; the application then runs the tool and sends the result back.
What core limitation does tool calling solve? — ____________________________________ Answer: A stateless model would otherwise have to guess at live data, real computations, or actions it cannot perform on its own.
What is the trade for giving a static model dynamic abilities? — ____________________________________ Answer: The application must host and secure every tool; the model can only request a call, so the host decides if and when to execute.
The example below defines a tool with the @tool decorator and hands it to an agent. When the user asks about an order, the model requests a call to that tool instead of answering from memory.

    from langchain.tools import tool
    from langchain.agents import create_agent


    @tool(return_direct=True)
    def fetch_order_status(order_id: str) -> str:
        """Fetch the current status of a customer order."""
        # Placeholder result; a real tool would query an order system here.
        return f"Order {order_id} is shipped and will arrive in 2 days."


    # The provider-prefixed model string goes straight to create_agent.
    # Wrapping a google_genai model name in ChatOpenAI would target the wrong provider.
    agent = create_agent(
        model="google_genai:gemini-3.5-flash",
        tools=[fetch_order_status],
    )

    # The model responds by requesting a call to fetch_order_status.
    result = agent.invoke({
        "messages": [{"role": "user", "content": "What is the status of order #12345?"}]
    })
Imagine a librarian who can read every book ever written but cannot walk to the shelves. When you ask for a specific report, she doesn’t fetch it herself—she scribbles a detailed request slip with the title and location, then hands it to a runner who brings back the actual document. That’s exactly how tool calling works. The large language model never executes code or pulls live data. Instead, it returns a structured request that names a tool (like a search function or a database query) and provides the required arguments. The host application then runs that tool—for example, querying an external database for real-time information—and sends the result back to the model. This mechanism lets the model access up‑to‑the‑minute facts, run calculations, or take real actions in the world. Without tool calling, the model would be that brilliant librarian stuck at her desk, forced to guess everything. Ask it for today’s weather, and it would confidently invent a sunny forecast from old training data. No verification, no live updates, no real actions—just static guesses that quickly become useless.
In the tool‑calling subsystem, the ordered mechanism begins when the chat model returns a structured ToolCallRequest that names a tool and its arguments. The host application intercepts this request; if the tool is a HeadlessTool (defined only by name, description, and args_schema without a local implementation), the graph execution interrupts immediately rather than executing the tool locally. Your app inspects the payload, performs the action in the appropriate environment (e.g., a browser or human review), and then resumes the graph by feeding back the tool result. If the tool is locally defined with the @tool decorator, the node function runs the tool directly, wraps its output in a ToolMessage, and continues the graph. An optional onTool callback observes lifecycle events (start, success, error) to provide UI feedback. When return_direct=True is set on a tool, the tool’s output becomes the final response without an additional model call, halting the agent loop early.
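The branching described above can be sketched in plain Python. This is a minimal model of the host-side decision, not the library's implementation: `ToolSpec`, `StepResult`, `handle_tool_request`, and the `approve_refund` tool are hypothetical names introduced for illustration, and a tool request is assumed to be a dictionary with `name` and `args` keys.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolSpec:
    """Hypothetical tool record; func=None marks a headless tool."""
    name: str
    description: str
    func: Optional[Callable[..., str]] = None
    return_direct: bool = False


@dataclass
class StepResult:
    """What the host should do after handling one tool request."""
    action: str   # "interrupt", "continue", or "finish"
    payload: dict


def handle_tool_request(request: dict, registry: dict) -> StepResult:
    spec = registry[request["name"]]
    if spec.func is None:
        # Headless tool: pause and hand the request to the application.
        return StepResult("interrupt", {"pending": request})
    # Local tool: run it and wrap the output as a tool message.
    output = spec.func(**request["args"])
    message = {"role": "tool", "name": spec.name, "content": output}
    if spec.return_direct:
        # The tool output becomes the final response; no further model call.
        return StepResult("finish", message)
    return StepResult("continue", message)


registry = {
    "fetch_order_status": ToolSpec(
        "fetch_order_status",
        "Look up an order.",
        func=lambda order_id: f"Order {order_id} is shipped.",
        return_direct=True,
    ),
    "approve_refund": ToolSpec("approve_refund", "Needs human review."),
}

local = handle_tool_request(
    {"name": "fetch_order_status", "args": {"order_id": "12345"}}, registry
)
headless = handle_tool_request(
    {"name": "approve_refund", "args": {"amount": 20}}, registry
)
```

Here `local.action` is `"finish"` because the tool sets `return_direct`, while `headless.action` is `"interrupt"` and carries the untouched request so the application can act on it and resume later.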
The invariant that the design preserves is the state‑update contract enforced by the StateGraph framework: every node (including tool‑calling nodes) must emit updates to the State that are applied via user‑defined reducer functions, and control flow must respect the graph’s static and dynamic edges. As a result, once a node has returned its update, the next state is a deterministic function of the current state and the messages in transit, even though the model’s own output may vary between runs. No tool call can sidestep the state‑update mechanism; even Command returns combine update and goto in a single atomic step. This prevents partial or inconsistent state modifications and ensures that checkpointers (set at compile time) can capture consistent snapshots for fault tolerance.
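To make the reducer contract concrete, here is a small plain-Python model of how an update might be merged into state. It is a sketch under stated assumptions rather than LangGraph's code: `apply_update` is a hypothetical helper, keys with a registered reducer are merged through it, and keys without one are overwritten.

```python
import operator
from typing import Any, Callable, Dict

Reducer = Callable[[Any, Any], Any]


def apply_update(
    state: Dict[str, Any],
    update: Dict[str, Any],
    reducers: Dict[str, Reducer],
) -> Dict[str, Any]:
    """Return a new state; the input state is never mutated."""
    new_state = dict(state)
    for key, value in update.items():
        reducer = reducers.get(key)
        if reducer is not None and key in new_state:
            # Merge through the reducer (here: list concatenation).
            new_state[key] = reducer(new_state[key], value)
        else:
            # No reducer registered: the new value replaces the old one.
            new_state[key] = value
    return new_state


reducers = {"messages": operator.add}
state = {"messages": ["user: status of order 12345?"], "step": 0}
tool_update = {"messages": ["tool: Order 12345 is shipped."], "step": 1}

next_state = apply_update(state, tool_update, reducers)
```

Because the tool node only returns `tool_update` and never writes to `state` directly, a snapshot taken before or after `apply_update` is always internally consistent.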
The key trade‑off is headless tools versus locally hosted tools. The design explicitly rejects the obvious alternative of executing every tool call within the graph’s own runtime, which would require shipping all tool logic to the model provider or hosting the code server‑side. Instead, headless tools let the application interrupt the graph and delegate execution to an external service, a browser, or a human approval step. Rejecting the in‑runtime approach avoids the deployment complexity and security risk of running arbitrary third‑party code inside the agent’s process, and it lets existing enterprise systems be reused without rewriting them. The price is latency: interrupting and resuming the graph adds network round‑trips in exchange for flexibility and separation of concerns.
A concrete failure mode occurs when the external service called by a headless tool crashes mid‑execution. The operator will see the graph remain suspended at an interrupt point: the stream never receives a resume command, and the onTool callback fires an error event with no ToolMessage returned. If a locally defined tool throws an exception, the wrap_tool_call middleware can catch it and either retry the call or return a custom error ToolMessage. The operator would see the error message in the trace logs, and if the retry logic is misconfigured (for example, with no cap on attempts) the agent may loop indefinitely. In either case, the graph does not silently produce corrupted state, because the state‑update contract ensures that no update is applied until the tool result is properly returned or the error is handled.
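The retry-or-report behaviour for a locally defined tool can be illustrated without any framework. The sketch below is an assumption-laden stand-in, not the wrap_tool_call middleware API: `call_tool_with_retry`, `flaky_lookup`, and `broken_lookup` are hypothetical, and the cap on attempts is what keeps a failing tool from looping forever.

```python
from typing import Any, Callable, Dict


def call_tool_with_retry(
    tool: Callable[..., str],
    args: Dict[str, Any],
    max_attempts: int = 3,
) -> Dict[str, str]:
    """Run a tool with a bounded number of attempts, then report the error."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return {"role": "tool", "status": "success", "content": tool(**args)}
        except Exception as exc:  # broad on purpose: any tool failure is reported
            last_error = exc
    # All attempts failed: return an error message instead of raising.
    return {
        "role": "tool",
        "status": "error",
        "content": f"Tool failed after {max_attempts} attempts: {last_error}",
    }


calls = {"count": 0}


def flaky_lookup(order_id: str) -> str:
    """Fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("upstream timed out")
    return f"Order {order_id} is shipped."


def broken_lookup(order_id: str) -> str:
    """Always fails."""
    raise ConnectionError("service unavailable")


recovered = call_tool_with_retry(flaky_lookup, {"order_id": "12345"})
failed = call_tool_with_retry(broken_lookup, {"order_id": "12345"}, max_attempts=2)
```

Returning an error message rather than raising gives the model a chance to explain the failure or try a different tool, while the attempt cap bounds the worst case.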
Failure 1: Tool output exceeds maximum transcript length
- Trigger: The tool returns an observation string longer than 8,000 characters.
- Guard: The conditional `if len(observation) > 8000:` inside the agentic search graph truncates the observation and appends the literal marker `"\n… (observation truncated)"`.
- Posture: Fail‑soft. The graph continues execution with the truncated observation; no run is aborted.
- Operator signal: The observation string ends with `"\n… (observation truncated)"`. The operator sees the truncated content in the transcript.
- Recovery: No retry or backoff. The truncated value is used as the tool result for the current step. The graph proceeds to the next node.
Failure 2: Maximum turns exhausted without a definitive answer
- Trigger: The agentic search graph iterates through tools for a configurable number of turns (`max_turns`) without reaching an explicit answer.
- Guard: The fallback assignment `fallback = transcript[-1] if transcript else ""` and the subsequent `run.end(outputs={"answer": fallback, "steps": max_turns, "total_tokens": acc_tokens, "total_cost_usd": acc_cost, "exhausted": True})`.
- Posture: Fail‑soft. The run ends gracefully with the last observation as a best‑effort answer, and the `exhausted` flag is set to `True`.
- Operator signal: The `exhausted` field in the run output is `True`, and the `answer` field contains the last transcript entry (possibly truncated) or an empty string.
- Recovery: No automatic retry. The user or orchestrator receives the best‑effort answer and can decide to re‑invoke with different parameters.
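The exhaustion fallback from Failure 2 can be sketched as a bounded loop. The `fallback` expression and the `answer`, `steps`, and `exhausted` keys mirror the text above; the `run_search` function, its `step` callback, and the omission of the token and cost fields are simplifying assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple

Step = Callable[[int], Tuple[str, Optional[str]]]


def run_search(step: Step, max_turns: int) -> Dict[str, object]:
    """Call step(turn) until it yields an answer or the turn budget runs out."""
    transcript: List[str] = []
    for turn in range(1, max_turns + 1):
        observation, answer = step(turn)
        transcript.append(observation)
        if answer is not None:
            return {"answer": answer, "steps": turn, "exhausted": False}
    # Budget spent: fall back to the last observation, or "" if there is none.
    fallback = transcript[-1] if transcript else ""
    return {"answer": fallback, "steps": max_turns, "exhausted": True}


def never_answers(turn: int) -> Tuple[str, Optional[str]]:
    return (f"observation {turn}", None)


def answers_on_second(turn: int) -> Tuple[str, Optional[str]]:
    return (f"observation {turn}", "42" if turn == 2 else None)


exhausted = run_search(never_answers, max_turns=3)
answered = run_search(answers_on_second, max_turns=3)
```

A caller can branch on the `exhausted` flag to decide whether to show the best-effort answer or re-invoke with a larger budget.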
Failure 3: Recursion / super‑step limit approached (proactive degradation)
- Trigger: The graph’s recursion limit is near exhaustion (e.g., `state["remaining_steps"] <= 2`), as tracked by the managed value `RemainingSteps`.
- Guard: The conditional edge `route_decision` that reads `state["remaining_steps"]` and returns `"fallback_node"` instead of `"reasoning_node"`. The `fallback_node` function returns `{"messages": ["Reached complexity limit, providing best effort answer"]}`.
- Posture: Fail‑soft. The graph routes to a dedicated fallback node and completes normally without raising an exception.
- Operator signal: The message list contains `"Reached complexity limit, providing best effort answer"` from the `fallback_node`. No error trace is raised.
- Recovery: The graph finishes its execution path to the `END` node. The user receives the fallback message.
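The proactive guard from Failure 3 reduces to two small functions. The names, threshold, and fallback message come from the text above; treating the state as a plain dictionary is an assumption made so the sketch runs on its own. In a real graph, `remaining_steps` would presumably be declared with the `RemainingSteps` managed value and `route_decision` registered as a conditional edge.

```python
from typing import Dict, List


def route_decision(state: dict) -> str:
    """Route to the fallback node when the step budget is nearly spent."""
    if state["remaining_steps"] <= 2:
        return "fallback_node"
    return "reasoning_node"


def fallback_node(state: dict) -> Dict[str, List[str]]:
    """Emit a best-effort message instead of continuing to reason."""
    return {"messages": ["Reached complexity limit, providing best effort answer"]}


plenty_left = route_decision({"remaining_steps": 10, "messages": []})
almost_out = route_decision({"remaining_steps": 2, "messages": []})
```

Routing on the remaining budget lets the graph end normally, which is why the operator sees a fallback message rather than an exception.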
Failure 4: Recursion / super‑step limit exceeded (reactive catch)
- Trigger: The graph exhausts its `recursion_limit` (e.g., passed as `{"recursion_limit": 10}`) and raises a `GraphRecursionError`.
- Guard: The `except GraphRecursionError as e:` block from the reactive example (outside the graph invocation). This guard catches the error and assigns `result = {"messages": ["Fallback: recursion limit exceeded"]}`.
- Posture: Fail‑hard (graph execution is aborted), then fail‑soft handled externally. The graph run stops with an exception, but the calling code provides a fallback result.
- Operator signal: A `GraphRecursionError` exception is raised. The caller’s catch block prints or logs the error message. The final result is the fallback dictionary.
- Recovery: No retry. The fallback dictionary is returned to the user. A manual re‑invocation with an increased recursion limit is required.
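The reactive catch from Failure 4 might look like the sketch below. `GraphRecursionError` is imported from `langgraph.errors` when the package is available; the stand-in class, the `invoke_with_fallback` helper, and the `LoopingGraph` stub are assumptions added so the example runs without a real compiled graph.

```python
try:
    from langgraph.errors import GraphRecursionError
except ImportError:
    class GraphRecursionError(RecursionError):
        """Stand-in so the sketch runs without langgraph installed."""


def invoke_with_fallback(graph, inputs: dict, recursion_limit: int = 10) -> dict:
    """Invoke a graph and substitute a fallback result if the limit is hit."""
    try:
        return graph.invoke(inputs, {"recursion_limit": recursion_limit})
    except GraphRecursionError as e:
        # The run is already aborted; log it and hand back a fallback result.
        print(f"Recursion limit exceeded: {e}")
        return {"messages": ["Fallback: recursion limit exceeded"]}


class LoopingGraph:
    """Hypothetical stub for a compiled graph that never reaches END."""

    def invoke(self, inputs: dict, config: dict) -> dict:
        limit = config["recursion_limit"]
        raise GraphRecursionError(f"Recursion limit of {limit} reached")


result = invoke_with_fallback(LoopingGraph(), {"messages": []})
```

Unlike the proactive route, this handler sits outside the graph, so any partial progress inside the aborted run is not part of the fallback result.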
Failure 5: Tool uses runtime.stream_writer outside a LangGraph execution context
- Trigger: A tool contains a call to `runtime.stream_writer` but is invoked outside a LangGraph graph execution (e.g., during local testing or from a plain Python script).
- Guard: No guard shown in the source. The documentation only warns: “the tool must be invoked within a LangGraph execution context.” No try/except, conditional check, or fallback is provided in the tool code exhibited.
- Posture: Fail‑hard. The stream writer operation will likely raise an exception (e.g., `AttributeError` or `RuntimeError`) because the runtime context is missing.
- Operator signal: An unhandled exception is raised at the point of the `stream_writer` call. The operator sees a traceback with no specific service‑side signal.
- Recovery: No automatic recovery. The developer must modify the caller to run the tool within a proper LangGraph execution context (e.g., by invoking the compiled graph).
Failure 6: server_info is None when tool expects it on LangGraph Server
- Trigger: A tool accesses `runtime.server_info` but is running locally or during testing, not on LangGraph Server, causing `server_info` to be `None`.
- Guard: The explicit `if server is not None:` check in the example tool `get_assistant_scoped_data`. When the guard is present, it avoids `AttributeError` by skipping the print statements.
- Posture: Fail‑soft. The tool runs without error; the server‑dependent code is omitted silently.
- Operator signal: No print output for assistant, graph, or user identity. The operator may notice the absence of those fields but no error is raised.
- Recovery: The tool returns `"done"` normally. If the server info is required, the operator must run the graph on LangGraph Server. No retry mechanism is present in the source.