A ReAct loop decides one step at a time, with only the immediate next action in view. A Plan-and-Execute agent inverts that: it spends one expensive call up front to decompose the whole task into a plan, executes the steps one at a time, and replans after each one, finalizing if the task is done or revising the remaining steps if it is not. This article pairs with runnable code in this repo's agents-lab/ package (agents_lab/plan_execute.py) so you can run the loop, read the plan it emits, and watch the replanner decide when to stop. It is the runnable companion to agent architectures, where Plan-and-Execute sits on the more-structured end of the autonomy-vs-control spectrum.
Mental Model
Front-load the plan to reduce per-step drift; trade flexibility for reliability. A flat ReAct loop is myopic: each thought sees only the last observation, so on a long-horizon task it wanders, repeats failed approaches, and burns the context window before reaching the goal. Plan-and-Execute hands the model the entire task at once and asks for a decomposition before any action. The plan becomes a global roadmap: the executor follows it step by step, and a separate replanner re-checks the remaining plan against what actually happened.
The split is the whole point. A planner is one big planning call that sees the task as a whole. An executor runs a single step at a time (and can itself be a small ReAct sub-loop or a single tool use call). A replanner is the control valve: after each step it either emits FINAL: <answer> to terminate, or revises the remaining steps. Tune that valve too eagerly and you collapse back into ReAct (replanning every step); tune it too lazily and the agent marches off a cliff following a stale plan. This is fundamentally an agent orchestration decision about where structure lives.
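The three roles can be sketched without any framework. The following is a minimal, dependency-free illustration and not the lab's plan_execute.py: the function plan_and_execute, the toy_* stubs, and their signatures are hypothetical stand-ins for real model calls.

```python
from typing import Callable, List, Optional, Tuple

History = List[Tuple[str, str]]  # executed (step, result) pairs


def plan_and_execute(
    task: str,
    planner: Callable[[str], List[str]],
    executor: Callable[[str, History], str],
    replanner: Callable[[str, List[str], History], Tuple[Optional[str], List[str]]],
    max_steps: int = 8,
) -> dict:
    plan = planner(task)  # one up-front call that sees the whole task
    past_steps: History = []
    answer: Optional[str] = None
    # Stop on a final answer, an empty plan, or an exhausted step budget.
    while plan and answer is None and len(past_steps) < max_steps:
        step, rest = plan[0], plan[1:]
        result = executor(step, past_steps)  # run only the first remaining step
        past_steps.append((step, result))
        # The replanner either finalizes (answer is set) or returns revised steps.
        answer, plan = replanner(task, rest, past_steps)
    return {"plan": plan, "past_steps": past_steps, "answer": answer}


# Hypothetical stubs standing in for model calls.
def toy_planner(task: str) -> List[str]:
    return ["compute 6 * 7", "state the result"]


def toy_executor(step: str, history: History) -> str:
    if "compute" in step:
        return "42"
    return f"The result is {history[-1][1]}."


def toy_replanner(task: str, remaining: List[str], history: History):
    if not remaining:  # nothing left to do: finalize with the last result
        return history[-1][1], []
    return None, remaining  # keep the rest of the plan unchanged


state = plan_and_execute(
    "Multiply 6 and 7, then state it", toy_planner, toy_executor, toy_replanner
)
print(state["answer"])
```

Swapping the stubs for real model calls leaves the control flow unchanged, which is the practical benefit of keeping the three roles separate.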
Contrast with ReAct
ReAct (interleave Thought → Action → Observation, repeat) makes one decision per turn. That works beautifully for short tool-use tasks and Q&A, and you should reach for it first (see the ReAct lab). Its failure mode is long-horizon tasks: tasks with many dependent sub-steps, where a wrong early turn compounds, and where the model never zooms out to see the overall shape of the work.
Plan-and-Execute beats a flat ReAct loop precisely there:
- The planner sees the whole task at once. Decomposition happens with full context, so the agent commits to a coherent strategy instead of discovering it one myopic step at a time.
- Fewer wasted steps. A plan front-loads the structure, so the executor doesn't re-derive "what should I even do next?" on every turn. That cuts redundant searches and dead-end exploration.
- Modularity. The planner can be a stronger (slower, pricier) reasoning model while the executor uses a cheaper one: planning happens once, execution happens N times.
- Observability. The plan is a first-class artifact you can log, show the user, or diff across replans. ReAct only gives you a flat trace.
The cost is rigidity. A plan written before any action is a hypothesis about a world the agent hasn't touched yet. When execution reveals the plan was wrong, you must replan, which is why the replanner, not the planner, is the hard part. Wang et al.'s Plan-and-Solve prompting (ACL 2023, arXiv:2305.04091) showed the core intuition even at the prompt level: explicitly asking a model to "devise a plan, then carry out the subtasks" reduces missing-step errors in zero-shot chain-of-thought. Plan-and-Execute agents lift that idea from a single prompt into a multi-call control loop.
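For contrast with the multi-call loop, here is roughly what the prompt-level version looks like. This is a sketch: plan_and_solve_prompt is a hypothetical helper, and the trigger sentence paraphrases the idea quoted above rather than reproducing the paper's exact wording.

```python
def plan_and_solve_prompt(question: str) -> str:
    """Build one zero-shot prompt that asks for a plan before the solution."""
    # Paraphrased trigger sentence (an assumption, not the paper's exact text).
    trigger = (
        "Let's first devise a plan to solve the problem, "
        "then carry out the subtasks step by step."
    )
    return f"Q: {question}\nA: {trigger}"


prompt = plan_and_solve_prompt("Multiply 6 and 7, then state it")
print(prompt)
```

Everything happens inside a single completion here, so there is no opportunity to observe a result and revise the plan. That missing feedback point is what the replanner adds.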
| | ReAct | Plan-and-Execute |
|---|---|---|
| Decision granularity | one step at a time | whole-task plan up front |
| Global view of task | no (myopic) | yes (planner sees everything) |
| Best for | short tool-use, Q&A | long-horizon, multi-step workflows |
| Main failure mode | wandering, redundant steps | stale plan after the world changes |
| Replanning | implicit (next thought) | explicit (replanner finalizes or revises) |
The Planning Taxonomy
Huang et al., "Understanding the Planning of LLM Agents: A Survey" (2024, arXiv:2402.02716), gives the first systematic map of how LLM agents plan. Their taxonomy is the vocabulary for why Plan-and-Execute is one design point among several:
- Task decomposition. Break the goal into smaller sub-tasks and solve them in sequence. This is the backbone of Plan-and-Execute: the planner is the decomposer. Two flavors: decompose-then-plan (split first, plan each piece) and interleaved (decompose and execute together, closer to ReAct).
- Plan selection (multi-plan). Generate several candidate plans and choose among them rather than committing to the first. Tree-of-Thoughts and LATS-style search live here; you trade many more LLM calls for the chance to pick a better route.
- Reflection and refinement. Critique the plan or its results and revise. The replanner in Plan-and-Execute is a lightweight reflection step, and Reflexion-style verbal self-critique is the heavier version.
- External module-aided. Lean on a symbolic planner, solver, or verifier outside the LLM to produce or check the plan (e.g., LLM+P handing off to a PDDL planner).
- Memory-augmented. Condition planning on retrieved past experience (episodic memory of what worked before) so the agent plans better on familiar tasks across sessions.
The lab's plan_execute.py deliberately implements the simplest useful slice: decomposition (planner) + reflection/refinement (replanner). Plan selection (LATS) and memory live in sibling modules so you can compose them later.
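To make the difference from plan selection concrete: the multi-plan family adds one step before execution, namely scoring several candidate plans and keeping one. The toy sketch below is not taken from the lab's sibling modules. The names select_plan and prefer_short are hypothetical, and the length-based score is deliberately naive; search methods such as Tree-of-Thoughts typically rely on model-based evaluation instead.

```python
from typing import Callable, List, Sequence

Plan = List[str]


def select_plan(candidates: Sequence[Plan], score: Callable[[Plan], float]) -> Plan:
    """Return the highest-scoring candidate plan."""
    if not candidates:
        raise ValueError("need at least one candidate plan")
    return max(candidates, key=score)


def prefer_short(plan: Plan) -> float:
    # Naive illustrative heuristic: fewer steps scores higher.
    return -float(len(plan))


candidates = [
    ["search for a RAG overview", "draft outline", "write intro"],
    ["draft outline", "write intro"],
]
best = select_plan(candidates, prefer_short)
print(best)
```

Each extra candidate costs at least one more planner call, which is the trade named in the taxonomy: more LLM calls for the chance of a better route.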
How the Lab Implements It
The implementation is a small LangGraph state machine with three nodes (plan, execute, replan) and a routing function that closes the loop. The protocol is kept deliberately line-oriented so it is easy to test and easy to read:
- Planner emits a numbered plan, one step per line.
- Executor runs exactly the first remaining step and reports only its result, with prior (step, result) pairs supplied as context.
- Replanner emits FINAL: <answer> to stop, or fresh step lines to continue.
- A max_steps guard bounds the replan loop so a model that never says FINAL still terminates (plus a LangGraph recursion_limit as a hard backstop).
PLANNER_SYSTEM = (
    "Devise a short numbered plan (one step per line) to solve the task. "
    "Keep it to the minimum number of concrete steps."
)
EXECUTOR_SYSTEM = "Execute exactly this one step and report only its result."
REPLANNER_SYSTEM = (
    "Given the task and the steps already done, either reply 'FINAL: <answer>' if "
    "the task is solved, or give the remaining steps (one per line)."
)
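A line-oriented protocol like the one above is cheap to parse. The lab's own parsing code is not shown in this article, so the helpers below (parse_plan, parse_replan) are hypothetical. They are one plausible way to turn model text into plan steps or a final answer, assuming steps may arrive numbered ("1." or "1)") or bulleted.

```python
import re
from typing import List, Optional, Tuple

# Leading "1." / "1)" numbering or "-" / "*" bullets that a model may emit.
_STEP_PREFIX = re.compile(r"^\s*(?:\d+[.)]|[-*])\s*")


def parse_plan(text: str) -> List[str]:
    """Split model output into clean step strings, one per non-empty line."""
    steps = []
    for line in text.splitlines():
        cleaned = _STEP_PREFIX.sub("", line).strip()
        if cleaned:
            steps.append(cleaned)
    return steps


def parse_replan(text: str) -> Tuple[Optional[str], List[str]]:
    """Return (answer, []) for a FINAL reply, else (None, remaining_steps)."""
    stripped = text.strip()
    if stripped.upper().startswith("FINAL:"):
        return stripped[len("FINAL:"):].strip(), []
    return None, parse_plan(stripped)


print(parse_plan("1. Multiply 6 and 7\n2) State the result\n"))
print(parse_replan("FINAL: 6 multiplied by 7 is 42."))
print(parse_replan("1. State the result"))
```

Matching FINAL: case-insensitively is a design choice in this sketch. A stricter parser would reject anything that deviates from the prompt's exact format.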
The routing function is where the reliability/flexibility trade-off is encoded: it stops on a FINAL answer, on an empty plan, or when the step budget is exhausted, and otherwise loops back to execute the next step:
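from langgraph.graph import END  # LangGraph's terminal-node sentinel

def route_after_replan(state) -> str:
    # max_steps comes from the enclosing scope, which this excerpt does not show.
    # Stop as soon as the replanner has produced a FINAL answer.
    if state.get("answer") is not None:
        return END
    # Stop when no steps remain or the step budget is spent.
    if not state.get("plan") or state.get("steps", 0) >= max_steps:
        return END
    return "execute"  # otherwise run the next remaining step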
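The three exits are easy to exercise in isolation. The sketch below assumes the router closes over max_steps through a small factory, here called make_router (a hypothetical name). It substitutes a plain string for LangGraph's END so that it runs without the library installed.

```python
END = "__end__"  # stand-in sentinel for this sketch; the lab uses LangGraph's END


def make_router(max_steps: int):
    """Return a routing function that closes over the step budget."""

    def route_after_replan(state: dict) -> str:
        if state.get("answer") is not None:
            return END  # replanner finalized
        if not state.get("plan") or state.get("steps", 0) >= max_steps:
            return END  # nothing left to run, or budget exhausted
        return "execute"

    return route_after_replan


route = make_router(max_steps=3)

print(route({"answer": "42", "plan": ["leftover step"], "steps": 1}))  # finalized
print(route({"answer": None, "plan": [], "steps": 1}))                 # empty plan
print(route({"answer": None, "plan": ["step"], "steps": 3}))           # budget spent
print(route({"answer": None, "plan": ["step"], "steps": 2}))           # keep going
```

In the first call a final answer ends the run even though a step is still left in the plan.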
Note that every execution turn appends a (step, result) pair to past_steps, and the replanner sees that full history. That is what lets it finalize once the accumulated results already answer the task, instead of dutifully executing leftover steps that are no longer needed.
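One way to picture "the replanner sees that full history" is to render past_steps into the message the replanner receives. The helper build_replanner_input and its layout are assumptions for illustration only. The article does not show how the lab formats this message.

```python
from typing import List, Tuple


def build_replanner_input(
    task: str, past_steps: List[Tuple[str, str]], remaining: List[str]
) -> str:
    """Render the task, executed steps, and leftover plan as one text message."""
    done = "\n".join(
        f"{i}. {step} -> {result}" for i, (step, result) in enumerate(past_steps, 1)
    )
    todo = "\n".join(f"- {step}" for step in remaining)
    return (
        f"Task: {task}\n"
        f"Steps done so far:\n{done or '(none)'}\n"
        f"Remaining plan:\n{todo or '(none)'}"
    )


message = build_replanner_input(
    "Multiply 6 and 7, then state it",
    [("Multiply 6 and 7", "42")],
    ["State the result"],
)
print(message)
```

With the result "42" already in view, a replanner can reasonably answer with FINAL instead of running the leftover step.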
Run it
The module lives at agents-lab/agents_lab/plan_execute.py. DeepSeek is the only paid API this lab calls, so set DEEPSEEK_API_KEY in your environment first.
From Python:
from agents_lab.plan_execute import run_plan_execute
final = run_plan_execute("Multiply 6 and 7, then state it", max_steps=8)
print(final["answer"]) # -> e.g. "6 multiplied by 7 is 42."
print(final["past_steps"]) # -> [(step, result), (step, result), ...]
run_plan_execute returns the final state dict: plan (any steps left, usually empty at the end), past_steps (the executed (step, result) pairs, which serve as your execution trace), and answer (the replanner's FINAL string). Inspecting past_steps is the easiest way to see decomposition in action: the planner split a two-part task into discrete steps, and the replanner stopped as soon as the results sufficed.
From the CLI:
uv run python -m agents_lab.cli plan-execute "Multiply 6 and 7, then state it"
The CLI dispatches to the same run_plan_execute and enforces the same --max-steps budget (default in state.DEFAULT_MAX_STEPS). Try a longer-horizon prompt, such as "Outline a blog post on RAG, then write the intro paragraph", and watch the plan grow to several steps and the replanner revise before finalizing.
Things to try
- Shrink the budget. Run with max_steps=2 on a task that genuinely needs more steps and watch the guard force termination: the agent returns its best partial state instead of looping forever. This is the termination-strategy lesson from agent architectures made concrete.
- Compare against ReAct. Run the same multi-step task through agents_lab.cli react and plan-execute. On a short task ReAct wins on latency; on a long, dependent task Plan-and-Execute usually reaches the goal in fewer wasted turns.
- Make the replanner eager vs. lazy. Edit REPLANNER_SYSTEM to replan aggressively (rewrite the whole remaining plan every turn) versus conservatively (only revise on failure). Eager replanning degenerates toward ReAct; lazy replanning risks following a stale plan. You are tuning exactly the control valve described in the mental model.
When to Reach for It
Use Plan-and-Execute when the task is long-horizon and multi-step, when sub-steps have dependencies that a planner should resolve up front, when you want a stronger model planning and a cheaper model executing, or when you need the plan itself as an auditable artifact. Stay with a flat ReAct loop for short tool-use and Q&A where a global plan adds latency without buying reliability. And remember that the more the world changes mid-task, the more your replanner, not your planner, determines whether the agent succeeds. From there, plan selection (search over multiple plans) and memory-augmented planning are the next steps up the survey's taxonomy, and they compose cleanly on top of the decomposition core you just ran.