Agent Autonomy — Field Guide

🧭 9 chapters · the autonomy spectrum · agent loops · human-in-the-loop · durable execution · guardrails · go deeper with the LangGraph autonomy deep-dive →

01. The Autonomy Spectrum

There are two main ways to build a system. Workflows have predetermined code paths. They are designed to operate in a certain order. Agents are different. They are dynamic. Agents define their own processes and tool usage. Workflows give you a fixed sequence that runs step by step. Agents can adapt and choose their own path as they go. Many systems actually combine both ideas. You get the predictability of a fixed plan with the flexibility to change based on new information. That blend is common in real applications. For example, an agent might use a workflow for routine checks. That keeps things reliable. Then it switches to dynamic decision making for open ended problems. This way you do not lose control. But you also get the ability to handle unexpected tasks. So think of it as a spectrum. On one end you have strict workflows. On the other end you have free agents. Most real systems sit somewhere in between. They mix the two approaches to get the best of both worlds.

02. Levels Of Autonomy

A single call to a large language model can write a poem or a joke. That is the simplest level. Next, you might chain several calls in a fixed order. For example, an agent can write a story, then a joke, then a poem. But each step is predetermined. To add flexibility, you can use a router. The router decides which task to run based on the user’s request. That gives the model more control. The next level gives the model access to tools. An agent can call a tool to write a file or run a query. But you may want to pause for approval when the action is risky. That is where conditional interrupts come in. Finally, the agent can run in a loop. It remembers past interactions using short-term memory. It chooses its own next action and can decide when the task is done. Only climb to a higher level when the problem demands that extra freedom. For simple tasks, a single call or fixed chain is enough. More complex tasks need the ability to use tools or make decisions on the fly. That trade-off keeps the system efficient. Each jump adds power but also complexity. So you only take that step when the problem truly needs it.

03. The Agent Loop

An agent works in a loop. It looks at the current state of the conversation and decides what to do next. The runtime then performs that action and adds the result back to the state. This cycle repeats until the agent chooses to stop. Each step uses the large language model's context window. That window can only hold so many messages. Too many steps would push older messages out. That is why a limit on the number of steps is important. It prevents the loop from running without end. It also controls how much the process costs. The agent's memory is saved with a checkpointer. This lets the thread be picked up again later. The model makes each decision by routing the input to one of several possible tasks. Those tasks might be writing a story or a joke. After each task, the result feeds back into the state. The loop continues until the model signals it is finished. A bound on steps keeps everything practical and efficient.

04. Tools Turn Text Into Action

Giving a model tools lets it turn words into real actions. The model decides which tool to call and what arguments to use. A clear description and a typed signature help the model make the right choice. The runtime then runs the tool and returns the output as an observation. That observation enters the model's context and influences the next decision. This cycle repeats. The model picks another tool or generates a final response based on the new information.

Why does a clear description matter? The model needs to understand what each tool does. A good explanation guides it to pick the right one. A typed signature tells the model what kind of input the tool expects. For example, a tool for weather queries might require a city name as a string. Without that, the model could pass the wrong kind of data. That leads to errors.

When the tool runs, it produces a result. That result re-enters the model's memory. The model sees it as a new piece of information. It can then use that result to decide what to do next. Maybe it calls another tool. Or it gives a final answer. This back and forth creates a powerful loop. The model learns from each step.

In practice, the main agent keeps track of the conversation. It holds the context. Observations from tools become part of that context. The model can refer to them later. That makes the interaction feel natural and responsive. It is like having a conversation where each answer builds on the last one.

So giving a model tools with clear names, descriptions, and input types turns it from a simple text generator into an active problem solver. It can take real actions and react to what happens.

05. Humans In The Loop

The human in the loop pattern pauses an agent run for a person’s review. The pause happens when a tool call matches an interrupt condition. You can set allowed decisions like approve, edit, or reject. A predicate checks the tool’s arguments. If the check returns true, the run stops and waits. If false, the call runs without a pause. The run waits until you respond. That wait can last as long as needed. The agent’s state is saved by a checkpointer. So you can resume the run later from exactly where it paused. Your decision is threaded back into the agent. Then the agent continues from that point. The point of the pause is to catch actions that might be risky or hard to undo. For example, writing a file outside the workspace or changing a database. Those actions pause unless you approve them. That gives you control over important steps. The agent does not burn resources while waiting. It simply stops and holds its place. When you come back, you pick up right there. That makes the pattern practical for long running tasks.

06. Durable Execution

An agent uses a checkpointer to save its state. After each step, the runtime records the state. It stores that state in a database under a thread identifier. A thread groups all interactions in a single conversation. So if the run stops for any reason, the agent can resume exactly where it left off. It does not need to start over from scratch. The checkpoint holds the full message history. That includes human inputs and model responses. The agent can access the whole context for that thread. Short term memory is kept at thread level persistence. This keeps different conversations separate. The checkpointer stores state so the thread can be resumed at any time. That is the key benefit. The trade off is that each step must be safe to replay, but that is a detail for developers. The point is reliability. The agent can survive long waits. Because the state is saved, it never loses its place. Every interaction is recorded permanently. Even long conversations are preserved. They do not get lost. The agent can pick up later and continue smoothly. The system works across many turns in a thread. It makes long running agents practical.

07. Short And Long Memory

Agents use two kinds of memory. Short-term memory keeps track of the current conversation. It stores the list of messages that alternate between human and model responses. This list grows longer over time. Because models have limited context windows, a full message list can become too long. That forces the system to remove or summarize older messages to stay within the window. So short-term memory is tied to one thread, one session. It helps the model follow that single conversation.

Long-term memory is different. It saves information across different conversations or sessions. Instead of storing raw messages, it stores distilled facts. Those facts can be about a user's preferences or past experiences. The system writes them into custom namespaces. There are types of long-term memory: semantic memory for facts, episodic memory for past events, and procedural memory for rules. The model uses these stored facts to personalize responses later.

The two types differ in where and how they exist. Short-term memory lives inside the thread. It is automatically saved by a checkpointer so the thread can resume. Long-term memory exists outside any thread. It can be recalled in a completely new conversation. That means the agent can remember something learned yesterday even when starting a fresh chat today. Short-term memory handles the immediate flow. Long-term memory handles lasting knowledge. Each serves a different purpose, so they are stored separately.

08. Many Agents One System

When one agent tries to handle too many tasks, its memory gets cluttered. The model struggles to keep track of everything. It starts making worse decisions because the context window fills up with stale information. That is the core problem: a single broad agent becomes inefficient.

A better approach is to split the work among several focused agents, each responsible for a smaller domain. For example, a supervisor can route each task to the right agent. The source calls this a router. But routers are stateless. Each new request requires a separate routing call from the language model. That costs extra time and money.

Another pattern is handoffs. One agent stays active and passes control directly to another agent. State persists across turns, so the second agent does not start fresh. This saves calls on repeat requests. Skills work the same way. The relevant skill is already loaded in the conversation history, so the agent can call its tool without reloading everything.

Subagents are different. They start fresh each time, providing strong context isolation. But that isolation comes at a cost. Each invocation repeats the full flow, leading to four calls per turn instead of two.

The trade-off is clear. Stateful patterns like handoffs and skills save forty to fifty percent of calls on repeat requests. But they require more moving parts and more model calls to coordinate. The extra overhead buys you centralized control. So the price of better decisions is more complexity behind the scenes.

09. Guardrails For Autonomy

To put safe autonomy into practice, you gate every action that could cause harm. For example, the system can pause a tool call and ask a human to approve it before it runs. You can set conditions that only interrupt when the tool is about to do something risky. A file write that tries to go outside the workspace directory triggers a review. A SQL query that is not a read only SELECT also gets paused. Calls that pass the safety check run without any interruption. That way a reviewer only sees the actions that truly need a decision. The trade off is speed. If you pause every call, the agent becomes slow. By making the guardrails conditional you let safe actions flow automatically while still locking down dangerous ones. Autonomy is earned step by step as you add those checks.