โ† all lessons/๐Ÿค– Phase 4 ยท Agents & Orchestration/#102
Lesson 10 of 19 in Phase 4 · Agents & Orchestration

Plan-and-Execute Agents: Decompose First, Then Act

🤖 Phase 4 · Agents & Orchestration · Intermediate · ~9 min read
Recommended prerequisite: #101 Reflexion: Self-Improving Agents via Verbal Reinforcement
Previous: Reflexion: Self-Improving Agents via Verbal Reinforcement · Next: LATS: Tree Search Agents with Monte-Carlo Tree Search

A ReAct loop decides one step at a time, with only the immediate next action in view. A Plan-and-Execute agent inverts that: it spends one expensive call up front to decompose the whole task into a plan, executes the steps one at a time, and replans after each one, finalizing if the task is done or revising the remaining steps if it is not. This article pairs with runnable code in this repo's agents-lab/ package (agents_lab/plan_execute.py), so you can run the loop, read the plan it emits, and watch the replanner decide when to stop. It is the runnable companion to agent architectures, where Plan-and-Execute sits on the more-structured end of the autonomy-vs-control spectrum.

Mental Model

Front-load the plan to reduce per-step drift; trade flexibility for reliability. A flat ReAct loop is myopic: each thought responds to the latest observation without an explicit global plan, so on a long-horizon task it can wander, repeat failed approaches, and burn the context window before reaching the goal. Plan-and-Execute hands the model the entire task at once and asks for a decomposition before any action. The plan becomes a global roadmap: the executor follows it step by step, and a separate replanner re-checks the remaining plan against what actually happened.

The split is the whole point. The planner is one big planning call that sees the task as a whole. The executor runs a single step at a time (and can itself be a small ReAct sub-loop or a single tool-use call). The replanner is the control valve: after each step it either emits FINAL: <answer> to terminate, or revises the remaining steps. Tune that valve too eagerly and you collapse back into ReAct (replanning every step); tune it too lazily and the agent marches off a cliff following a stale plan. This is fundamentally an agent orchestration decision about where structure lives.
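The three roles can be sketched as a plain Python loop. This is a minimal illustration, not the lab's implementation: `plan_and_execute` and the `toy_*` callables are hypothetical names, and the toy functions stand in for LLM calls so the control flow can be read and run on its own.

```python
from typing import Callable, List, Optional, Tuple

PastStep = Tuple[str, str]  # (step, result)


def plan_and_execute(
    task: str,
    planner: Callable[[str], List[str]],
    executor: Callable[[str, List[PastStep]], str],
    replanner: Callable[[str, List[PastStep], List[str]], Tuple[Optional[str], List[str]]],
    max_steps: int = 8,
) -> dict:
    """Hypothetical sketch: plan once, then execute and replan one step at a time."""
    plan = planner(task)                # one up-front planning call
    past_steps: List[PastStep] = []
    answer: Optional[str] = None
    while plan and answer is None and len(past_steps) < max_steps:
        step = plan[0]
        result = executor(step, past_steps)   # run exactly one step
        past_steps.append((step, result))
        # The replanner either returns an answer or the revised remaining steps.
        answer, plan = replanner(task, past_steps, plan[1:])
    return {"plan": plan, "past_steps": past_steps, "answer": answer}


# Toy stand-ins for the three LLM calls (assumptions for illustration only).
def toy_planner(task: str) -> List[str]:
    return ["Multiply 6 and 7", "State the product"]


def toy_executor(step: str, past: List[PastStep]) -> str:
    return "42" if "Multiply" in step else "The product is 42."


def toy_replanner(task, past, remaining):
    if not remaining:               # nothing left to do: finalize
        return past[-1][1], []
    return None, remaining          # otherwise keep the remaining plan


state = plan_and_execute(
    "Multiply 6 and 7, then state it", toy_planner, toy_executor, toy_replanner
)
print(state["answer"])
```

Swapping `toy_replanner` for one that rewrites `remaining` on every call is the "eager" end of the valve; one that never touches it is the "lazy" end.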

Contrast with ReAct

ReAct (interleave Thought → Action → Observation, repeat) makes one decision per turn. That works beautifully for short tool-use tasks and Q&A, and you should reach for it first (see the ReAct lab). Its failure mode is long-horizon tasks: those with many dependent sub-steps, where a wrong early turn compounds and the model never zooms out to see the overall shape of the work.

Plan-and-Execute beats a flat ReAct loop precisely there:

  • The planner sees the whole task at once. Decomposition happens with full context, so the agent commits to a coherent strategy instead of discovering it one myopic step at a time.
  • Fewer wasted steps. A plan front-loads the structure, so the executor doesn't re-derive "what should I even do next?" on every turn. That cuts redundant searches and dead-end exploration.
  • Modularity. The planner can be a stronger (slower, pricier) reasoning model while the executor uses a cheaper one, because the initial planning happens once while execution happens N times.
  • Observability. The plan is a first-class artifact you can log, show the user, or diff across replans. ReAct only gives you a flat trace.
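To make the observability point concrete, here is one way to diff a plan across a replan using Python's standard `difflib`. The `diff_plans` helper and both example plans are hypothetical; the lab may log plans differently.

```python
import difflib
from typing import List


def diff_plans(old_plan: List[str], new_plan: List[str]) -> List[str]:
    """Return a unified diff between two versions of a plan (one step per line)."""
    return list(
        difflib.unified_diff(
            old_plan,
            new_plan,
            fromfile="plan@before",
            tofile="plan@after",
            lineterm="",
        )
    )


# Hypothetical plans before and after one replanning turn.
old_plan = ["Search for RAG overviews", "Outline the post", "Write the intro"]
new_plan = ["Outline the post", "Add a section on reranking", "Write the intro"]

changes = diff_plans(old_plan, new_plan)
print("\n".join(changes))
```

Lines prefixed with `-` are steps the replanner dropped and lines prefixed with `+` are steps it added, which is the kind of artifact a flat ReAct trace does not give you directly.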

The cost is rigidity. A plan written before any action is a hypothesis about a world the agent hasn't touched yet. When execution reveals the plan was wrong, you must replan, which is why the replanner, not the planner, is the hard part. Wang et al.'s Plan-and-Solve prompting (ACL 2023, arXiv:2305.04091) showed the core intuition even at the prompt level: explicitly asking a model to "devise a plan, then carry out the subtasks" reduces missing-step errors in zero-shot chain-of-thought. Plan-and-Execute agents lift that idea from a single prompt into a multi-call control loop.
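At the prompt level, the idea amounts to swapping the zero-shot chain-of-thought trigger for one that asks for a plan first. The sketch below is an approximation: the trigger wording is recalled from the Plan-and-Solve paper and should be checked against it, and `plan_and_solve_prompt` is a hypothetical helper name.

```python
# Trigger sentence in the spirit of Plan-and-Solve prompting.
# Assumption: wording approximates the paper's; verify before relying on it.
PS_TRIGGER = (
    "Let's first understand the problem and devise a plan to solve the problem. "
    "Then, let's carry out the plan and solve the problem step by step."
)


def plan_and_solve_prompt(question: str) -> str:
    """Build a single zero-shot prompt that asks for a plan before the solution."""
    return f"Q: {question}\nA: {PS_TRIGGER}"


prompt = plan_and_solve_prompt("A shop sells 6 boxes of 7 pens. How many pens in total?")
print(prompt)
```

Everything still happens in one model call here; a Plan-and-Execute agent separates the "devise" and "carry out" halves into distinct calls with a replanning check between them.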

| | ReAct | Plan-and-Execute |
|---|---|---|
| Decision granularity | one step at a time | whole-task plan up front |
| Global view of task | no (myopic) | yes (planner sees everything) |
| Best for | short tool-use, Q&A | long-horizon, multi-step workflows |
| Main failure mode | wandering, redundant steps | stale plan after the world changes |
| Replanning | implicit (next thought) | explicit (replanner finalizes or revises) |

The Planning Taxonomy

Huang et al., "Understanding the Planning of LLM Agents: A Survey" (2024, arXiv:2402.02716), gives the first systematic map of how LLM agents plan. Their taxonomy is the vocabulary for why Plan-and-Execute is one design point among several:

  • Task decomposition. Break the goal into smaller sub-tasks and solve them in sequence. This is the backbone of Plan-and-Execute: the planner is the decomposer. Two flavors: decompose-then-plan (split first, plan each piece) and interleaved (decompose and execute together, closer to ReAct).
  • Plan selection (multi-plan). Generate several candidate plans and choose among them rather than committing to the first. Tree-of-Thoughts and LATS-style search live here; you trade many more LLM calls for the chance to pick a better route.
  • Reflection and refinement. Critique the plan or its results and revise. The replanner in Plan-and-Execute is a lightweight reflection step, and Reflexion-style verbal self-critique is the heavier version.
  • External module-aided. Lean on a symbolic planner, solver, or verifier outside the LLM to produce or check the plan (e.g., LLM+P handing off to a PDDL planner).
  • Memory-augmented. Condition planning on retrieved past experience (episodic memory of what worked before) so the agent plans better on familiar tasks across sessions.
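Of the categories above, plan selection is the easiest to picture in code: produce several candidate plans, score each, and keep the best. The sketch below is a toy under stated assumptions; `select_plan` and the `fewest_steps` scorer are hypothetical, and real systems such as Tree-of-Thoughts or LATS use model-based evaluation and search rather than a fixed heuristic.

```python
from typing import Callable, List, Sequence

Plan = List[str]


def select_plan(candidates: Sequence[Plan], score: Callable[[Plan], float]) -> Plan:
    """Pick the highest-scoring candidate plan (first one wins on ties)."""
    if not candidates:
        raise ValueError("need at least one candidate plan")
    return max(candidates, key=score)


def fewest_steps(plan: Plan) -> float:
    """Toy scorer: prefer shorter plans. A stand-in for an LLM or verifier score."""
    return -float(len(plan))


# Hypothetical candidates, as if sampled from several planner calls.
candidates = [
    ["Search the web", "Read three articles", "Summarize", "Write the intro"],
    ["Outline the post", "Write the intro"],
    ["Brainstorm titles", "Outline the post", "Write the intro"],
]

best = select_plan(candidates, fewest_steps)
print(best)
```

The cost shows up in the candidate list: each extra candidate is another planner call, which is the trade the taxonomy describes.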

The lab's plan_execute.py deliberately implements the simplest useful slice: decomposition (planner) + reflection/refinement (replanner). Plan selection (LATS) and memory live in sibling modules so you can compose them later.

How the Lab Implements It

The implementation is a small LangGraph state machine with three nodes (plan, execute, replan) and a routing function that closes the loop. The protocol is kept deliberately line-oriented so it is easy to test and easy to read:

  • Planner emits a numbered plan, one step per line.
  • Executor runs exactly the first remaining step and reports only its result, with prior (step, result) pairs supplied as context.
  • Replanner emits FINAL: <answer> to stop, or fresh step lines to continue.
  • A max_steps guard bounds the replan loop so a model that never says FINAL still terminates (plus a LangGraph recursion_limit as a hard backstop).

```python
PLANNER_SYSTEM = (
    "Devise a short numbered plan (one step per line) to solve the task. "
    "Keep it to the minimum number of concrete steps."
)
EXECUTOR_SYSTEM = "Execute exactly this one step and report only its result."
REPLANNER_SYSTEM = (
    "Given the task and the steps already done, either reply 'FINAL: <answer>' if "
    "the task is solved, or give the remaining steps (one per line)."
)
```
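Because the protocol is line-oriented, parsing model output reduces to a few string operations. The helpers below are a sketch of what such parsing could look like; `parse_plan` and `parse_final` are hypothetical names, and the lab's own parsing may differ (for example in which list prefixes it tolerates).

```python
import re
from typing import List, Optional

# Strips a leading "1.", "2)", "-" or "*" marker from a plan line.
_STEP_PREFIX = re.compile(r"^\s*(?:\d+[.)]|[-*])\s*")


def parse_plan(text: str) -> List[str]:
    """Turn a numbered or bulleted plan (one step per line) into a list of steps."""
    steps = []
    for line in text.splitlines():
        cleaned = _STEP_PREFIX.sub("", line).strip()
        if cleaned:                      # skip blank lines
            steps.append(cleaned)
    return steps


def parse_final(text: str) -> Optional[str]:
    """Return the answer if the reply starts with 'FINAL:', else None."""
    stripped = text.strip()
    if stripped.upper().startswith("FINAL:"):
        return stripped[len("FINAL:"):].strip()
    return None


print(parse_plan("1. Multiply 6 and 7\n2) State the result\n"))
print(parse_final("FINAL: 6 multiplied by 7 is 42."))
```

A replanner reply is then handled by trying `parse_final` first and falling back to `parse_plan` when it returns `None`.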

The routing function is where the reliability/flexibility trade-off is encoded: it stops on a FINAL answer, on an empty plan, or when the step budget is exhausted, and otherwise loops back to execute the next step:

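```python
from langgraph.graph import END

# max_steps is not defined in this excerpt; it is assumed to come from the enclosing scope.
def route_after_replan(state) -> str:
    # Stop as soon as the replanner has produced a FINAL answer.
    if state.get("answer") is not None:
        return END
    # Stop when no steps remain or the step budget is exhausted.
    if not state.get("plan") or state.get("steps", 0) >= max_steps:
        return END
    return "execute"
```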

Note that every execution turn appends a (step, result) pair to past_steps, and the replanner sees that full history. That is what lets it finalize once the accumulated results already answer the task, instead of dutifully executing leftover steps that are no longer needed.
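The routing rules are easy to check in isolation. The sketch below re-creates the same three stop conditions without importing LangGraph: `END` is a local stand-in string, and `make_router` is a hypothetical factory used here only to bind `max_steps`.

```python
END = "__end__"  # local stand-in for LangGraph's END sentinel (assumption)


def make_router(max_steps: int):
    """Hypothetical factory that binds the step budget into the routing function."""

    def route_after_replan(state: dict) -> str:
        if state.get("answer") is not None:          # replanner said FINAL
            return END
        if not state.get("plan") or state.get("steps", 0) >= max_steps:
            return END                               # nothing left, or budget spent
        return "execute"                             # run the next remaining step

    return route_after_replan


route = make_router(max_steps=3)

cases = {
    "final answer": {"answer": "42", "plan": ["leftover"], "steps": 1},
    "empty plan": {"answer": None, "plan": [], "steps": 1},
    "budget spent": {"answer": None, "plan": ["next"], "steps": 3},
    "keep going": {"answer": None, "plan": ["next"], "steps": 1},
}
for name, state in cases.items():
    print(f"{name}: {route(state)}")
```

Note the first case: a FINAL answer ends the run even though a step is still in the plan, which is the early-finalize behavior described above.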

Run it

The module lives at agents-lab/agents_lab/plan_execute.py. DeepSeek is the only paid API this lab calls, so set DEEPSEEK_API_KEY in your environment first.

From Python:

```python
from agents_lab.plan_execute import run_plan_execute

final = run_plan_execute("Multiply 6 and 7, then state it", max_steps=8)
print(final["answer"])      # -> e.g. "6 multiplied by 7 is 42."
print(final["past_steps"])  # -> [(step, result), (step, result), ...]
```

run_plan_execute returns the final state dict: plan (any steps left, usually empty at the end), past_steps (the executed (step, result) pairs, which form your execution trace), and answer (the replanner's FINAL string). Inspecting past_steps is the easiest way to see decomposition in action: the planner split a two-part task into discrete steps, and the replanner stopped as soon as the results sufficed.
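For reading the trace, a small formatter over the returned dict can help. This is a sketch: `format_trace` is a hypothetical helper, and `sample_state` is hand-written to mimic the shape described above rather than captured from a real run.

```python
def format_trace(state: dict) -> str:
    """Render past_steps and the final answer as numbered, human-readable lines."""
    lines = []
    for i, (step, result) in enumerate(state.get("past_steps", []), start=1):
        lines.append(f"{i}. {step} -> {result}")
    answer = state.get("answer")
    if answer is None:
        lines.append("answer: (none; the run stopped without a FINAL reply)")
    else:
        lines.append(f"answer: {answer}")
    return "\n".join(lines)


# Hand-written example state (assumption: same keys as the lab's final state).
sample_state = {
    "plan": [],
    "past_steps": [
        ("Multiply 6 and 7", "42"),
        ("State the result", "6 multiplied by 7 is 42."),
    ],
    "answer": "6 multiplied by 7 is 42.",
}

print(format_trace(sample_state))
```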

From the CLI:

```bash
uv run python -m agents_lab.cli plan-execute "Multiply 6 and 7, then state it"
```

The CLI dispatches to the same run_plan_execute and enforces the same --max-steps budget (default in state.DEFAULT_MAX_STEPS). Try a longer-horizon prompt, such as "Outline a blog post on RAG, then write the intro paragraph", and watch the plan grow to several steps and the replanner revise before finalizing.

Things to try

  • Shrink the budget. Run with max_steps=2 on a task that genuinely needs more steps and watch the guard force termination: the agent returns its best partial state instead of looping forever. This is the termination-strategy lesson from agent architectures made concrete.
  • Compare against ReAct. Run the same multi-step task through agents_lab.cli react and plan-execute. On a short task ReAct wins on latency; on a long, dependent task Plan-and-Execute usually reaches the goal in fewer wasted turns.
  • Make the replanner eager vs. lazy. Edit REPLANNER_SYSTEM to replan aggressively (rewrite the whole remaining plan every turn) versus conservatively (only revise on failure). Eager replanning degenerates toward ReAct; lazy replanning risks following a stale plan. You are tuning exactly the control valve described in the mental model.
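For the eager-versus-lazy experiment, one option is to keep two prompt variants side by side and switch between them. The wording of both variants below is an assumption written for illustration, not text from the lab; only the FINAL: convention and the one-step-per-line format come from the original prompt.

```python
# Shared prefix keeps the FINAL: convention from the original REPLANNER_SYSTEM.
_BASE = (
    "Given the task and the steps already done, reply 'FINAL: <answer>' if "
    "the task is solved. "
)

# Hypothetical eager variant: always rewrite the remaining plan.
EAGER_REPLANNER_SYSTEM = _BASE + (
    "Otherwise rewrite the entire remaining plan from scratch (one step per line), "
    "even if the last step succeeded."
)

# Hypothetical lazy variant: only revise when something went wrong.
LAZY_REPLANNER_SYSTEM = _BASE + (
    "Otherwise repeat the remaining steps unchanged (one per line); revise them "
    "only if the last step failed or contradicted the plan."
)


def replanner_prompt(mode: str) -> str:
    """Return the replanner system prompt for 'eager' or 'lazy' mode."""
    prompts = {"eager": EAGER_REPLANNER_SYSTEM, "lazy": LAZY_REPLANNER_SYSTEM}
    if mode not in prompts:
        raise ValueError(f"unknown mode: {mode!r}")
    return prompts[mode]


print(replanner_prompt("lazy"))
```

Running the same task under both modes and comparing past_steps gives a quick read on how much plan churn each setting produces.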

When to Reach for It

Use Plan-and-Execute when the task is long-horizon and multi-step, when sub-steps have dependencies that a planner should resolve up front, when you want a stronger model planning and a cheaper model executing, or when you need the plan itself as an auditable artifact. Stay with a flat ReAct loop for short tool-use and Q&A, where a global plan adds latency without buying reliability. Remember that the more the world changes mid-task, the more your replanner, not your planner, determines whether the agent succeeds. From there, plan selection (search over multiple plans) and memory-augmented planning are the next steps up the survey's taxonomy, and they compose cleanly on top of the decomposition core you just ran.

โ† PreviousReflexion: Self-Improving Agents via Verbal ReinforcementNext โ†’LATS: Tree Search Agents with Monte-Carlo Tree Search