This is a code-first companion to agent architectures. That lesson surveys the theory across ReAct, Reflexion, Plan-and-Execute, and LATS; here we slow down on the simplest of them and build a working ReAct agent you can run in this repo. The goal is to make the loop concrete: you will see exactly where reasoning becomes a tool call, where a tool result re-enters the prompt, and where the loop is forced to stop. Everything below maps onto runnable code in the agents-lab/ Python package.
ReAct comes from Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models" (ICLR 2023, arXiv:2210.03629). The paper's claim is small and durable: if you interleave a model's reasoning traces with the actions it takes, the two reinforce each other. Reasoning decides which action to take and how to interpret what comes back; actions pull real observations from the world that keep the reasoning honest.
Mental Model
ReAct is chain-of-thought with its eyes open: the model thinks one step, acts once, looks at the result, and only then thinks again. Pure chain-of-thought reasons in a closed room — it can produce a fluent multi-step argument that is confidently wrong because nothing ever checks it against reality. ReAct breaks the chain into single steps and inserts a real observation between each one. That observation is ground truth from a tool, not the model's guess, so the next thought is conditioned on a fact rather than on the model's own prior tokens. This is why the paper reports that ReAct reduces the hallucination and error-propagation that plague chain-of-thought-only prompting on knowledge tasks (HotpotQA, FEVER): a wrong intermediate belief gets corrected by the next observation instead of compounding.
The cost of opening the loop is that it might never close. A reasoning-only prompt always terminates — it emits text once. A loop can spin forever, repeating a failing action or chasing a dead end. So a real ReAct agent is two things welded together: the reason-act-observe cycle, and a hard budget that guarantees the cycle ends.
What the paper actually showed
The original ReAct prompt is a flat text trace of three repeating tags:
Thought: I need the population of Tokyo before I can compute the ratio.
Action: search("Tokyo population 2024")
Observation: Approximately 13.96 million in the city proper.
Thought: Now I have what I need.
Action: finish("about 14 million")
Two ablations in the paper are worth internalizing because they justify every line of the loop:
- Reasoning without acting (chain-of-thought only) hallucinates. The model invents an "observation" it never retrieved and then reasons confidently from the fiction. Errors propagate forward with nothing to stop them.
- Acting without reasoning (tool calls with no thought traces) flails. Without an explicit thought to decide why the next action, the model picks tools poorly and cannot recover when an observation is surprising.
ReAct's contribution is the synergy: the thought before each action is where decomposition, tool selection, and error recovery happen, and the observation after each action is where the model's beliefs get re-grounded in fact. On the interactive benchmarks (ALFWorld, WebShop) the same interleaving let a one- or two-shot prompted model beat imitation- and RL-trained baselines, because the thoughts let it plan and re-plan inside the trajectory.
Modern implementations rarely parse Thought:/Action: text by hand. Instead the "action" is a structured function calling request and the "observation" is the tool's return value fed back as a tool message. The semantics are identical; only the transport changed from regex-parsed text to JSON tool calls. If you have not seen how tools are declared and invoked, tool use covers the mechanics this lab assumes.
The loop, made of two nodes
The runnable agent in this repo is the explicit-graph version of what LangGraph ships as a prebuilt react agent. Modeling it as a graph makes the two halves of ReAct literal nodes:
- an
agentnode — one LLM call, the model bound to its tools, producing either a final answer or one-or-more tool calls (the "reason" step); - a
ToolNode— runs whatever tool calls the agent emitted and appends their results as observations (the "act + observe" step).
A conditional edge sits between them. After the agent node, it inspects the last message: if the model requested tool calls and the step budget is not exhausted, route to the tools node; otherwise stop. The tools node always routes back to the agent. That single conditional edge is where both the loop and its termination live.
from langgraph.graph import END, StateGraph
from langgraph.prebuilt import ToolNode
def build_react_agent(llm, tools, max_steps):
tool_node = ToolNode(tools)
def agent_node(state):
response = llm.bind_tools(tools).invoke(state["messages"])
update = {"messages": [response], "steps": state.get("steps", 0) + 1}
if not response.tool_calls: # no action -> this is the answer
update["answer"] = response.content
return update
def route(state):
last = state["messages"][-1]
if last.tool_calls and state["steps"] < max_steps:
return "tools" # keep looping
return END # answered, or out of budget
graph = StateGraph(AgentState)
graph.add_node("agent", agent_node)
graph.add_node("tools", tool_node)
graph.set_entry_point("agent")
graph.add_conditional_edges("agent", route, {"tools": "tools", END: END})
graph.add_edge("tools", "agent")
return graph.compile()
The shape — agent node, ToolNode, conditional edge back to agent — is the canonical LangGraph ReAct skeleton. Writing it out by hand (rather than calling the prebuilt helper) is the whole point of the lab: the loop has no hidden machinery.
Why the step budget is load-bearing
The most important line above is state["steps"] < max_steps. ReAct's strength — re-deciding after every observation — is also its failure mode: a model can loop indefinitely, re-calling the same tool, or oscillate between two actions, burning tokens forever. The paper's benchmarks ran in bounded environments; a production loop has no such guarantee.
So max_steps is not a tuning knob, it is a correctness requirement. The graph also passes a recursion_limit to LangGraph as a hard backstop, so even a malformed routing decision cannot run away. When the budget is hit mid-task, the agent returns its best current message rather than crashing — graceful degradation over an infinite hang. This is the cheapest, bluntest termination strategy on the autonomy-vs-control spectrum that agent architectures lays out, and for a minimal ReAct loop it is exactly the right one.
The tools
The lab binds an offline-safe, open-source tool set so you can run the loop without any external services beyond the LLM. The two default tools are plain LangChain @tool functions:
calculator(expression)— evaluates an arithmetic expression and returns the result as a string;read_file(path)— reads a local file and returns its contents.
These are deliberately tiny. They exist to give the loop a real observation to react to — a computed number, a file's contents — so you can watch reasoning get re-grounded in fact. The arithmetic task in the next section forces at least one calculator call, which means at least one real reason-act-observe cycle before the model can answer.
from langchain_core.tools import tool
@tool
def calculator(expression: str) -> str:
"""Evaluate an arithmetic expression like '17 * 23 - 100'."""
...
OFFLINE_TOOLS = [calculator, read_file]
Run it
The runnable agent lives at agents-lab/agents_lab/react.py. The model is DeepSeek — the only paid API in the lab — wired up in agents_lab/llm.py; every tool is open-source and runs locally, so DeepSeek is the single external dependency.
From Python:
from agents_lab.react import run_react
final = run_react("What is 17 * 23, minus 100?")
print(final["answer"]) # -> 291
run_react returns the full final state, not just a string: final["messages"] is the complete reason-act-observe trace (system prompt, your question, the model's tool call, the calculator's observation, the final answer), and final["answer"] is the model's last message once it stops calling tools. Reading the message list is the fastest way to see the loop actually loop.
From the CLI:
uv run python -m agents_lab.cli react "What is 17 * 23, minus 100?"
For this task you should see one trip through the loop: the agent node reasons that it needs arithmetic and emits a calculator tool call, the ToolNode runs it and appends 391 - 100 = 291 as an observation, the agent node runs again, sees the tool result, calls no further tool, and returns 291. Swap in a question that needs no tool ("What is the capital of France?") and the loop terminates after a single agent step with no tool call at all — the same graph, a zero-action trajectory.
Where to go next
This is the floor, not the ceiling. Once the bare loop is clear, the other patterns in agent architectures are each a specific addition to it: Reflexion wraps a memory of past failures around the loop, Plan-and-Execute front-loads a plan to cut per-step drift, and LATS explores multiple loop futures with backtracking. They all live in the same agents-lab/ package, built on the same AgentState and tool set, so you can diff them against this file to see precisely what each one adds. And whether any of them actually succeeds is a question for agent evaluation, which scores the whole trajectory in final["messages"], not just the final string.