01. Why Outreach Is A Graph
Cold outreach is not a single bulk send. It is a directed graph of small, gated steps. Each step owns exactly one concern: who to contact, whether we may, what to say, and when to follow up. The most important rule is that the graph drafts copy but never sends. Sending is a separate decision that the caller owns every time.
The same drafting engine is reused in three distinct ways. First, an autonomous pipeline sends without a human in the loop. Second, a campaign pauses for human approval. Third, a one-shot preview shows a draft and sends nothing. Because the graph never assumes how it is invoked, the caller decides approval and sending.
The system keeps a registry that lists each graph by a short name and pairs that name with the module that builds it. The outreach graph, the compose graph, the reply graph, and the durable campaign engine are all separate entries. The outreach graph returns a subject line, a text-only body, and an HTML body, plus bookkeeping: a skip reason, an engagement signal, and the time of the next touch. The graph never calls send.
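A minimal sketch of the registry and the outreach graph's return shape, assuming a plain dict maps each short name to a dotted module path and a frozen dataclass carries the result. `GRAPH_REGISTRY`, `resolve_builder_module`, `OutreachResult`, the module paths, and the field names are all hypothetical illustrations, not the project's identifiers. The point is that the result type has no "sent" field, so the contract is visible in the type itself.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional

# Hypothetical registry: short graph name -> dotted path of the module that builds it.
# The paths are illustrative; the real module layout is not shown in this section.
GRAPH_REGISTRY: Dict[str, str] = {
    "outreach": "graphs.outreach",
    "compose": "graphs.compose",
    "reply": "graphs.reply",
    "campaign": "graphs.campaign",
}


def resolve_builder_module(name: str) -> str:
    """Return the builder module registered under a short name, failing loudly if absent."""
    try:
        return GRAPH_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(GRAPH_REGISTRY))
        raise KeyError(f"unknown graph {name!r}; registered graphs: {known}") from None


@dataclass(frozen=True)
class OutreachResult:
    """Drafted copy plus bookkeeping. There is deliberately no 'sent' field."""

    subject: str
    text: str
    html: str
    skip_reason: Optional[str] = None         # set when a gate ended the run early
    engagement_signal: Optional[str] = None   # e.g. an earlier open or click
    next_touch_at: Optional[datetime] = None  # when the caller may schedule the next touch
```

A caller would import the resolved module (for example with `importlib.import_module`) and build the graph from it. Adding a new flow means adding one registry entry, which is the "additive growth" the trade-off below refers to.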
Now consider the three-way trade-off. A single big send function has the fewest moving parts, but it has one failure domain and no place to insert a human or a safety gate. Separate microservices per flow give clean isolation, but they make you pay a platform tax (deployment, tracing, and state plumbing) once per flow. A shared graph runtime fronted by a registry gives gated, traceable steps with additive growth. The cost is one routing layer and the discipline to keep the registry simple.
Failure modes matter. One is a missing contact row in the database. That leaves the personalization with nothing to stand on. Another is a stale engagement signal. An open recorded late will bias the next-touch gap the wrong way. The suppression gate fails closed: if the check cannot complete, the contact is treated as suppressed. This avoids risking a wrong send.
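The fail-closed rule is easy to state and easy to get backwards, so here is a minimal sketch. It assumes the suppression store is reachable through some `lookup(fingerprint) -> bool` callable; both that callable and `is_suppressed` are hypothetical names.

```python
from typing import Callable


def is_suppressed(fingerprint: str, lookup: Callable[[str], bool]) -> bool:
    """Return True when the contact must not be emailed.

    `lookup` is a hypothetical callable that asks the central do-not-contact
    store whether the fingerprint is listed. Any exception (timeout, outage,
    bad response) is treated as "suppressed": the check fails closed, so an
    outage can only cost a send, never cause a wrong one.
    """
    try:
        return bool(lookup(fingerprint))
    except Exception:
        # Fail closed: an incomplete check counts as a match.
        return True
```

Catching every exception is intentional here. A narrower `except` would let an unexpected error type escape the gate, and the surrounding code might then treat "no answer" as "not suppressed".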
The final design was a deliberate choice between these options. The team rejected a single monolithic function because it offered no seam for a human gate. They also rejected per-flow microservices because the platform tax multiplied with each new flow. The shared graph runtime with a registry was chosen for its additive growth and traceability.
The transferable rule: use this shape when safety, grounding, and observability each need their own testable seam, and when the same copy engine must be reused under different approval policies. Do not use it when the number of verticals overwhelms static configuration or when you cannot measure the faithfulness judge's accuracy.
<!-- mem:begin -->Generate it: The most important rule is that the graph drafts copy but never _____. (cue: never _____; answer: sends)
Generate it: Because the graph never assumes how it is invoked, the ______ decides approval and sending. (cue: the ______; answer: caller)
Ask yourself: Why does the graph draft copy but never send it itself?
Answer: Sending is a separate decision the caller owns every time, so the same engine can serve an autonomous pipeline, a human-approved campaign, or a draft-only preview without assuming how it was invoked.
<!-- mem:end -->Recall check (try before reading the answer):
What three distinct ways is the one drafting engine reused? Answer: An autonomous pipeline (no human), a campaign that pauses for human approval, and a preview that shows only a draft.
Besides subject and bodies, what bookkeeping does the outreach graph return? Answer: A skip reason, an engagement signal, and the time of the next touch.
Why was the per-flow microservices option rejected? Answer: It makes you pay a platform tax — deployment, tracing, and state plumbing — once per flow.
The outreach graph is a directed graph of small, gated steps that drafts copy but never sends, with the caller owning the send decision.
"""Email outreach graph.
Flow:
lookup_contact
→ suppression_gate (E22: central do-not-contact suppression list check — fail-closed)
→ check_stop_conditions (skip if recipient already replied / bounced / unsubscribed)
→ decide_cadence (V81: engagement-aware next-touch scheduling)
→ select_template (free-form: always returns no template)
→ select_sequence (V84: emit structured {sequence_id, touches} plan for the vertical)
→ extract_hook
→ draft_step (V38: per-vertical multi-step copy; falls back to draft node for
step=0 when no vertical is set — backward-compatible)
→ draft (free-form cold email referencing the hook — step=0 fallback)
→ format_html
Produces {subject, text, html, contact_id, skip_reason}. When ``skip_reason``
is set the graph short-circuits before any LLM/IO work and returns an empty
draft for the resolver layer to handle.
"""
import logging

from langgraph.graph import END, START, StateGraph

log = logging.getLogger(__name__)
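The docstring's short-circuit rule can be shown without LangGraph. The sketch below is a plain-Python stand-in for the compiled graph, not its actual wiring: it runs stub nodes in order over a dict and, as soon as one sets `skip_reason`, returns an empty draft without reaching the (stubbed) LLM step. The node bodies, the `llm_calls` counter, and `run` are hypothetical.

```python
from html import escape
from typing import Callable, Dict, List

State = Dict[str, object]
Node = Callable[[State], State]


def suppression_gate(state: State) -> State:
    # Stub: a real gate consults the do-not-contact list (and fails closed).
    if state.get("suppressed"):
        return {**state, "skip_reason": "suppressed"}
    return state


def check_stop_conditions(state: State) -> State:
    # Stub: end the run if the thread already reached a terminal state.
    if state.get("thread_state") in ("replied", "bounced", "unsubscribed"):
        return {**state, "skip_reason": state["thread_state"]}
    return state


def draft(state: State) -> State:
    # Stub for the LLM drafting step; counts calls so the short-circuit is visible.
    calls = int(state.get("llm_calls", 0)) + 1
    return {**state, "subject": "Quick question", "text": "Hi there", "llm_calls": calls}


def format_html(state: State) -> State:
    return {**state, "html": f"<p>{escape(str(state['text']))}</p>"}


NODES: List[Node] = [suppression_gate, check_stop_conditions, draft, format_html]


def run(state: State) -> State:
    """Run nodes in order; once skip_reason is set, stop and return an empty draft."""
    for node in NODES:
        state = node(state)
        if state.get("skip_reason"):
            return {**state, "subject": "", "text": "", "html": ""}
    return state
```

With `run({"thread_state": "replied"})` the drafting stub never executes, which is the property the docstring promises: no LLM or IO work after a gate says stop.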
Think of this outreach system like a restaurant kitchen where each station has one job—someone preps vegetables, another grills, another plates—but no station is allowed to serve the food. Serving is the waiter’s job. That’s the core idea: cold outreach is not one big “send” button; it’s a directed graph of small, gated steps, each owning exactly one concern—who to contact, whether we may, what to say, and when to follow up. The most important rule is that the graph drafts copy but never sends. Sending is a separate decision the caller owns every time.
Concretely, the same drafting engine is reused in three distinct ways. An autonomous pipeline sends without a human in the loop—like a self-serve salad bar. A human-approved campaign pauses for a person to check each draft before it goes out—like a tasting menu. And a one-shot preview just shows a draft without sending anything—like looking at a recipe. Because the graph never assumes how it’s invoked, it stays safe and flexible.
Without this separation, a single big send function would have one failure domain: a bug or a misstep could blast unsolicited emails, burn sender reputation, or skip compliance checks, and there would be nowhere to insert a human gate or a safety check. In practice that chaos looks like a missed bounce, an accidental repeat send, or a fabricated claim that erodes trust. The graph keeps each risk isolated and inspectable.
- registry lookup — Resolves the graph by its short name from the registry, pairing it with the module that builds the outreach graph.
  reads the registry record for outreach_graph; writes a graph builder instance.
  branch: no early return; the happy path returns the graph builder.
- look up the contact — Reads the contact from the database once and loads their role, seniority, department, and profile into the working state.
  reads the contact database row; writes role, seniority, department, and profile into working state.
  branch: a missing contact row leaves personalization with nothing to stand on (failure); the happy path proceeds with the snapshot.
- suppression gate — Checks a central do-not-contact list using a one-way fingerprint of the email address plus the domain; fails closed if the check cannot be completed.
  reads the fingerprint (email + domain hash) from the contact; writes an audit record of the decision.
  branch: contact on the list → end the run with a skip reason (early return); not suppressed → continue.
- stop conditions — Examines the contact's current thread state and ends the run with a machine-readable reason if any stop condition holds (replied, bounced, unsubscribed, unverified).
  reads thread_state from the contact; writes reason (one of "replied", "bounced", "unsubscribed", "unverified").
  branch: any condition true → end the run with that reason; none → continue to the next step.
- plan the sequence — Looks up the sequence definition in the vertical-level VERTICAL_SEQUENCE_DEFS map, using a sub-niche-level override if one exists.
  reads vertical and sub_niche from the contact snapshot; writes sequence_def (touch_angles, steps, cadence_days, fallback_step).
  branch: missing sub_niche or no match → fall back to the vertical-level definition; otherwise the sub-niche variant is used.
- extract the hook — Reads the supplied post text (a recent public post or job description) and picks exactly one concrete hook to ground the opener.
  reads post_text from the request; writes hook (a single grounded fact).
  branch: empty post_text → failure mode (the opener has nothing real to stand on); non-empty → happy path.
- drafting step — Looks up the directive for the current step index in the sequence definition and writes copy that fits that step's role (opener, value, or soft close).
  reads step_index, the steps directives from sequence_def, and an optional opportunity link; writes draft (body text).
  branch: step index past the end of the sequence → uses fallback_step (generic drafting); within range → the per-step directive is used.
- faithfulness gate — Uses a judge model to audit the draft against the assembled evidence (the hook and contact profile), removing any sentence whose claim is not supported.
  reads draft and evidence (hook + contact snapshot); writes filtered_draft and a score (0–1).
  branch: an over-aggressive judge may strip a true but tersely worded claim; an empty evidence set leaves the gate nothing to compare against.
- return_output — Compiles the final result: subject line, plain-text body, HTML body, skip reason (if any), engagement signal, and next-touch time.
  reads filtered_draft, skip_reason, thread_state; writes subject, plain_body, html_body, engagement_signal, next_touch_time.
  branch: no early return; always produces the output struct. The caller owns the send decision.
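The stop-conditions step maps thread state to one machine-readable reason. A minimal sketch, assuming the thread state is exposed as a mapping of boolean flags; the flag names (including `email_verified`), `TERMINAL_FLAGS`, and `stop_reason` are hypothetical.

```python
from typing import Mapping, Optional

# Terminal thread flags, checked in a fixed order so the reported reason is deterministic.
TERMINAL_FLAGS = ("replied", "bounced", "unsubscribed")


def stop_reason(thread_state: Mapping[str, bool]) -> Optional[str]:
    """Return the machine-readable reason to stop, or None to keep drafting.

    Assumes a hypothetical flag mapping such as {"replied": False, "email_verified": True}.
    A missing email_verified flag is treated as unverified (the conservative choice).
    """
    for flag in TERMINAL_FLAGS:
        if thread_state.get(flag):
            return flag
    if not thread_state.get("email_verified", False):
        return "unverified"
    return None
```

Checking in a fixed order matters when several conditions hold at once (a contact who replied and later bounced): the same input always yields the same reason, which keeps downstream counts stable.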
The subsystem is a directed graph of small, gated steps, each owning exactly one concern. The ordered mechanism begins with the lookup step, which reads the contact from the database once and loads role, seniority, department, and profile into working state. Next, the suppression gate runs early: it checks a central do-not-contact list keyed on a one‑way fingerprint of the email address plus domain, and fails closed—any incomplete check treats the contact as suppressed. The stop conditions step then examines the contact’s current thread state, ending the run with a distinct machine‑readable reason if the contact has already replied, bounced, unsubscribed, or has an unverified address. Only after these guards pass does the vertical sequence selector perform a deterministic lookup for the contact’s vertical (and an optional narrower niche), returning a structured three‑touch plan. The hook step then reads the supplied post text and extracts exactly one concrete grounded fact. The drafting step uses a directive lookup per step in the sequence, with a generic fallback, and writes the body. Finally, the faithfulness gate uses a judge model to audit every personalized sentence against the assembled evidence, removing any unsupported claim. The entire graph returns a subject line, plain‑text body, HTML body, and bookkeeping (skip reason, engagement signal, next‑touch time)—but never calls send. That sending decision is always owned by the caller.
The invariant the design preserves is stated explicitly: “the graph drafts copy but never sends.” This rule is the single most important structural guarantee. It means the graph is stateless with respect to transmission and can be invoked identically by three distinct flows: an autonomous pipeline that sends without a human, a human‑approved campaign that pauses for sign‑off, and a one‑shot preview that shows only a draft. Because the graph never assumes how it is invoked, the caller decides approval and sending every time, and no accidental send can escape from the drafting steps. The design also ensures that all personalized claims are grounded in evidence, enforced by the faithfulness gate that produces a score between zero and one, posted as feedback for observability.
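A minimal sketch of the invariant: one drafting function, three callers, and only the callers touch `send`. `draft_email`, the `send` and `approve` callables, and the three `run_*` names are hypothetical stand-ins for the graph and its invoking flows.

```python
from typing import Callable, Dict

Draft = Dict[str, str]
Contact = Dict[str, str]


def draft_email(contact: Contact) -> Draft:
    # Stand-in for invoking the outreach graph: returns copy, never sends.
    return {"subject": f"Hello {contact['name']}", "text": "..."}


def run_autonomous(contact: Contact, send: Callable[[Draft], None]) -> Draft:
    d = draft_email(contact)
    send(d)  # no human in the loop
    return d


def run_with_approval(contact: Contact, approve: Callable[[Draft], bool],
                      send: Callable[[Draft], None]) -> Draft:
    d = draft_email(contact)
    if approve(d):  # a person signs off before anything goes out
        send(d)
    return d


def run_preview(contact: Contact) -> Draft:
    return draft_email(contact)  # shows the draft, sends nothing
```

Because `draft_email` has no `send` parameter at all, there is no code path by which drafting alone can transmit anything; the approval policy lives entirely in the caller.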
The key trade‑off behind this shape rejects two obvious alternatives. A single big send function has the fewest moving parts but creates “one failure domain and nowhere to insert a human or a safety gate”—a monolithic routine cannot pause for approval or run a suppression check without coupling it into the same code path. Separate microservices per flow give clean isolation but “make you pay the platform tax — deployment, tracing, state plumbing — once per flow.” The chosen design uses a shared graph runtime fronted by a registry, yielding “gated, traceable steps with additive growth, at the cost of one routing layer and the discipline to keep the registry simple.” This cost is accepted because it prevents the monolithic send’s inflexibility and avoids the per‑service overhead of independent microservices.
A concrete failure mode in this subsystem is “a step index past the end of the sequence.” This occurs when the drafting step looks up a directive at an index that does not exist in the sequence definitions—for example, after a sequence selector returned a plan with three touches but the engine tries to compose a nonexistent fourth touch. The signal an operator would actually see is a skip reason logged against that contact in the outreach graph’s bookkeeping fields, specifically the skip reason that short‑circuits the contact’s processing. The trace log would show that the graph stopped early for that thread, with no draft returned, and the counter “email.compose.vertical_hook_rate” would not increment because no hook was ever extracted.
cadence_days default
- Knob — cadence_days: [0, 4, 7] in VERTICAL_SEQUENCE_DEFS
- Bounds — Controls the minimum days between successive touches in a sequence.
- Effect — Larger values stretch the campaign timeline, increasing latency before each follow‑up; smaller values compress the schedule, raising request throughput and the potential for faster iteration.
- Risk — Too short risks appearing aggressive or violating sender‑reputation limits; too long lets leads go cold or the campaign stall.
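Turning cadence_days into calendar dates depends on how the list is read, and the section does not pin that down. The sketch below offers both readings behind a flag; `touch_dates` and `as_offsets` are hypothetical.

```python
from datetime import date, timedelta
from itertools import accumulate
from typing import List, Sequence


def touch_dates(start: date, cadence_days: Sequence[int], as_offsets: bool = True) -> List[date]:
    """Calendar date of each touch in a sequence.

    as_offsets=True  -> each entry is a day offset from the first touch ([0, 4, 7] -> days 0, 4, 7)
    as_offsets=False -> each entry is a gap after the previous touch ([0, 4, 7] -> days 0, 4, 11)
    """
    if any(d < 0 for d in cadence_days):
        raise ValueError("cadence_days entries must be non-negative")
    days = list(cadence_days) if as_offsets else list(accumulate(cadence_days))
    if any(later < earlier for earlier, later in zip(days, days[1:])):
        raise ValueError("touches must not move backwards in time")
    return [start + timedelta(days=d) for d in days]
```

Starting on 2024-01-01, the offset reading gives January 1, 5, and 8 (gaps of 4 and 3 days); the gap reading gives January 1, 5, and 12.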
fallback_step
- Knob — fallback_step: 2 in VERTICAL_SEQUENCE_DEFS (integer)
- Bounds — Defines which step directive is used when the current touch index exceeds the sequence length (e.g., a touch requested after step 2 of a 3-step sequence).
- Effect — A higher fallback gives a more static “last resort” copy; a lower one may reuse an earlier directive. This trades off adaptation (model cost) for predictability (no extra model call to handle overrun).
- Risk — Mis‑set it and a step‑past‑end produces copy that is either too generic or repeats an earlier angle, confusing the recipient.
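A minimal sketch of how fallback_step could absorb an overrun, assuming zero-based step indexes and a plain list of directive strings. `directive_for_step` is hypothetical and is not the source's `get_step_directive`.

```python
from typing import Sequence


def directive_for_step(steps: Sequence[str], step_index: int, fallback_step: int) -> str:
    """Return the directive for a touch, reusing fallback_step once the index runs past the end.

    Assumes zero-based step indexes; the function name and signature are hypothetical.
    """
    if not 0 <= fallback_step < len(steps):
        raise ValueError(f"fallback_step {fallback_step} is outside a {len(steps)}-step sequence")
    if step_index < 0:
        raise ValueError("step_index must be non-negative")
    return steps[step_index] if step_index < len(steps) else steps[fallback_step]
```

Validating fallback_step before using it turns a mis-set knob into an immediate error instead of confusing copy several touches later.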
Number of touches (sequence length)
- Knob — Implicit length of the steps list in each vertical sequence definition (default 3)
- Bounds — Determines how many discrete emails are drafted per campaign, directly driving LLM call count per thread.
- Effect — More touches increase total drafting cost proportionally and extend the campaign timeline; fewer touches reduce dollar spend and total latency but may convert fewer leads.
- Risk — Too many touches wastes budget and risks inbox fatigue; too few may not nurture the contact long enough for a reply.
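The cost effect is linear in the touch count. Assuming one drafting call and one judge call per touch (the judge adds one inference per drafted email; hook extraction and any other calls are left out), 3 touches cost 6 calls per thread, so 1,000 threads cost 6,000 calls, and a 5-touch sequence raises that to 10,000. `llm_calls` is a hypothetical helper for that arithmetic.

```python
def llm_calls(threads: int, touches: int, draft_calls: int = 1, judge_calls: int = 1) -> int:
    """Estimated model calls for a campaign.

    Assumes each touch costs `draft_calls` drafting calls plus `judge_calls`
    faithfulness-judge calls; other per-step calls are not counted.
    """
    if min(threads, touches, draft_calls, judge_calls) < 0:
        raise ValueError("counts must be non-negative")
    return threads * touches * (draft_calls + judge_calls)
```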
Faithfulness judge model
- Knob — No env var; the choice of which LLM serves as the judge in the faithfulness gate (described as “a judge that compares each claim to the evidence” — “at the cost of one extra model call”)
- Bounds — Adds one model inference per drafted email, gating the final output on that judge’s score.
- Effect — A cheaper/faster judge reduces per‑email dollar cost and latency but may miss unsupported claims; a more expensive/thorough judge raises cost and latency but improves safety.
- Risk — A too‑strict judge strips true claims (degrades personalization); a too‑lenient judge lets fabricated claims through (erodes trust and compliance).
Missing Contact Row
- Trigger — The contact lookup step runs but the contact row is not found in the database, so no recipient_name, recipient_role, or profile attributes are loaded into state.
- Guard — No explicit guard is shown in the source. The lookup step simply reads the contact once; a missing row is identified as a failure mode, but no error handler, retry, or fallback is mentioned.
- Posture — Fail‑soft: the source says the missing row “leaves the personalization with nothing to stand on,” implying the run continues with empty personalization fields, degrading the output.
- Operator signal — The source does not specify a log line or metric; the operator would observe that personalization fields are empty in the final draft, or that the contact attributes used later are blank.
- Recovery — No automated recovery is described. The operator must manually verify that the contact exists in the database and, if necessary, re‑run or add the contact before the next attempt.
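The source describes no guard here. One option is to turn the fail-soft path into a short-circuit with its own skip reason. The sketch assumes the contact fetch returns None for a missing row; `lookup_contact`, `fetch`, `PROFILE_FIELDS`, and the `contact_not_found` reason code are all hypothetical.

```python
from typing import Callable, Dict, Optional

Row = Dict[str, Optional[str]]
PROFILE_FIELDS = ("role", "seniority", "department", "profile")


def lookup_contact(state: Dict[str, object],
                   fetch: Callable[[str], Optional[Row]]) -> Dict[str, object]:
    """Load the contact snapshot once, or short-circuit if the row is missing.

    `fetch` is a hypothetical accessor that returns None for an unknown id;
    the "contact_not_found" reason code is also hypothetical.
    """
    row = fetch(str(state["contact_id"]))
    if row is None:
        return {**state, "skip_reason": "contact_not_found"}
    return {**state, **{field: row.get(field) for field in PROFILE_FIELDS}}
```

A distinct reason code gives the operator the signal the current design lacks: a countable skip instead of a silently generic draft.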
Empty Post Text
- Trigger — The hook-extraction step (hook in the code) receives an empty or whitespace-only post_text field, so there is no concrete fact to ground the opener.
- Guard — No guard is shown. The source states: "The failure mode is an empty post text, which leaves the opener with nothing real to stand on." The code later uses post_raw = (state.get("post_text", "") or "")[:1000] and then post_safe = wrap_untrusted(post_raw) if post_raw.strip() else "", but this only substitutes an empty string; it does not stop the run or replace the missing hook.
- Posture — Fail-soft: the opener is drafted with no grounded fact, producing a generic or unfounded first sentence.
- Operator signal — The operator would see that the opener lacks any specific personalization, or that the hook value is "none" (the code sets hook_safe = "none" when hook_raw.strip() is false).
- Recovery — The run continues; the only recovery is for the caller to provide a non-empty post_text on a subsequent attempt. No automatic retry or fallback is implemented.
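The two quoted lines can be run in isolation to see why an empty post never stops the run. `wrap_untrusted` below is a hypothetical stand-in (the real helper's signature and delimiters are not shown), and passing a non-blank hook through stripped is an assumption; the source only shows the `"none"` branch.

```python
from typing import Dict, Optional, Tuple


def wrap_untrusted(text: str, label: str = "UNTRUSTED") -> str:
    # Hypothetical stand-in: the real helper's signature and delimiters are not shown.
    return f"<{label}>\n{text}\n</{label}>"


def prepare_hook_inputs(state: Dict[str, Optional[str]], hook_raw: str) -> Tuple[str, str]:
    # The first two lines mirror the code quoted above.
    post_raw = (state.get("post_text", "") or "")[:1000]  # None-safe, capped at 1,000 chars
    post_safe = wrap_untrusted(post_raw) if post_raw.strip() else ""
    # Assumption: a non-blank hook is passed through stripped.
    hook_safe = hook_raw.strip() if hook_raw.strip() else "none"
    return post_safe, hook_safe
```

Nothing here raises, so a blank post reaches drafting as `("", "none")`. A caller that prefers to fail hard could check for `hook_safe == "none"` and set a skip reason instead of drafting.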
Step Index Past End of Sequence
- Trigger — The drafting step receives a sequence_step index that exceeds the length of the sequence plan returned by the deterministic sequence selector (e.g., a three-step sequence is defined but step index 4 is requested).
- Guard — No explicit guard is shown. The source mentions the failure mode but does not describe an exception handler or validation that catches an out-of-range step.
- Posture — Likely fail-hard: the directive lookup get_step_directive(company_vertical, sequence_step, sub_niche) would probably raise an error or return None; the code then falls back to await draft(state), but if the step does not exist in the sequence, the draft may produce irrelevant copy or error out. The source gives no specific behavior.
- Operator signal — The operator would observe a missing directive or a generic draft where step-specific copy was expected. If an exception occurs, an unhandled error trace would appear.
- Recovery — No automated retry is described. The operator must correct the sequence definition or reset the campaign to a valid step index before re‑running.
Over‑Aggressive Faithfulness Judge
- Trigger — The faithfulness_gate judge model audits each claim against the evidence and strips any sentence it deems unsupported. A true but tersely worded claim (e.g., "You spoke at X" when the evidence says "Keynote at X") is incorrectly removed.
- Guard — The only guard is that the gate produces a score between zero and one and "posts it as feedback," allowing prompt and model versions to be ranked. No retry or fallback is described for the gate itself; the judge's decision is final for that run.
- Posture — Fail‑soft: the draft is edited to remove the false‑positive claim, continuing with a less personalized or less accurate email.
- Operator signal — The operator sees the gate’s feedback score (e.g., a low faithfulness score) and observes that a claim known to be true was removed from the final draft.
- Recovery — No automated recovery. The operator must adjust the judge model’s prompt or sensitivity, or manually re‑insert the claim and resend.
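The judge model itself cannot be reproduced here, but the over-aggressive failure is easy to demonstrate with a deliberately crude stand-in. The sketch swaps the judge for token overlap (a technique the source does not use) and assumes the score is the fraction of sentences kept; `toy_supported`, `faithfulness_filter`, and the 0.5 threshold are illustrative only.

```python
import re
from typing import Set, Tuple

_TOKEN = re.compile(r"[a-z0-9]+")


def _content_words(text: str) -> Set[str]:
    # Lowercased alphanumeric tokens longer than two characters.
    return {w for w in _TOKEN.findall(text.lower()) if len(w) > 2}


def toy_supported(sentence: str, evidence: str, threshold: float = 0.5) -> bool:
    """Crude stand-in for the judge: enough of the sentence's words must appear in the evidence."""
    words = _content_words(sentence)
    if not words:
        return True
    evidence_words = set(_TOKEN.findall(evidence.lower()))
    return len(words & evidence_words) / len(words) >= threshold


def faithfulness_filter(draft: str, evidence: str) -> Tuple[str, float]:
    """Drop unsupported sentences; score = fraction kept (an assumed definition)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", draft) if s.strip()]
    kept = [s for s in sentences if toy_supported(s, evidence)]
    score = len(kept) / len(sentences) if sentences else 1.0
    return " ".join(kept), score
```

With evidence "Keynote at DataConf 2024 on streaming pipelines." and the draft "You spoke at DataConf. Your keynote covered streaming pipelines.", the first sentence is true but shares only one of its three content words with the evidence, so it is stripped and the score drops to 0.5. That is the terse-true-claim failure in miniature.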
Suppression Gate Address Normalization Failure
- Trigger — The suppression gate keys on a one‑way fingerprint of the email address plus domain. If the address was not normalized (e.g., different casing or sub‑addressing) before fingerprinting, the fingerprint will not match the suppression record, and a suppressed contact is treated as unsuppressed.
- Guard — No guard is shown. The source explicitly flags this as a failure mode: “The failure mode is an address that was not normalized before fingerprinting, which could let a suppressed contact slip through.” The gate fails closed when the check cannot be completed, but not for a mismatch caused by normalization.
- Posture — Fail‑soft (dangerous): the contact passes the gate and proceeds to drafting and eventually sending, violating the opt‑out.
- Operator signal — The operator would notice that a suppressed contact received an email, or an audit record of the suppression gate decision would show a miss (the source says it writes an audit record of the decision).
- Recovery — No automated recovery. The operator must normalize the address and re‑fingerprint the suppression entry, then manually suppress the contact again.
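A minimal sketch of normalizing before fingerprinting. It assumes the whole address is case-insensitive and that "+tag" sub-addressing can be dropped; both hold for many mailbox providers but not universally. How the address and domain are combined before hashing is also an assumption, and `normalize_address` and `suppression_fingerprint` are hypothetical names.

```python
import hashlib


def normalize_address(email: str, strip_plus_tag: bool = True) -> str:
    """Canonical form used before fingerprinting (lowercase, optional +tag removal)."""
    local, sep, domain = email.strip().partition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {email!r}")
    local, domain = local.lower(), domain.lower()
    if strip_plus_tag:
        local = local.split("+", 1)[0]
    return f"{local}@{domain}"


def suppression_fingerprint(email: str) -> str:
    """One-way SHA-256 fingerprint over the normalized address plus its domain."""
    address = normalize_address(email)
    domain = address.rsplit("@", 1)[1]
    return hashlib.sha256(f"{address}|{domain}".encode("utf-8")).hexdigest()
```

Whatever rule is chosen, the same function has to run both when a suppression entry is written and when a contact is checked; a mismatch between those two sides is exactly the slip-through described above.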
Thread Left Waiting Forever (Timer Stops)
- Trigger — The campaign engine’s external timer that drains threads whose wake time has passed stops or fails, leaving a thread in a waiting status indefinitely.
- Guard — No guard is shown. The source notes: “The failure mode is a thread left waiting forever if the timer stops.” There is no mention of a watchdog, alert, or retry mechanism for the timer itself.
- Posture — Fail‑soft (silent): the thread remains pending, no further touches are scheduled, and no error is raised because the system simply pauses.
- Operator signal — The operator would see that the thread has a “waiting” status and a past wake time, with no subsequent send. The source does not specify a specific log line; the signal is the silent absence of progress.
- Recovery — Manual intervention required: restart the timer service or manually resume the thread from its checkpointed state in the database.
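The source names no watchdog for the timer. A cheap independent check, sketched under the assumption that thread rows expose a status and a wake time, flags threads that should have woken more than a grace period ago. `ThreadRow`, `stalled_threads`, and the 30-minute default grace are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class ThreadRow:
    thread_id: str
    status: str        # e.g. "waiting", "done"
    wake_at: datetime  # when the timer should have resumed the thread


def stalled_threads(rows: Iterable[ThreadRow], now: datetime,
                    grace: timedelta = timedelta(minutes=30)) -> List[str]:
    """Ids of waiting threads whose wake time passed more than `grace` ago."""
    return [r.thread_id for r in rows if r.status == "waiting" and now - r.wake_at > grace]
```

Run from a scheduler that does not depend on the campaign timer itself, a non-empty result turns the "silent absence of progress" into an explicit alert.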
Q — "The system defines cold outreach as a directed graph of gated steps. Can you name the specific nodes or functions that enforce the rule 'draft but never send'?"
- A — The drafting logic lives inside the outreach engine that invokes build_outreach_evidence and the VERTICAL_SEQUENCE_DEFS lookup, but no node calls an SMTP library. The graph produces a pending_draft state, and the campaign engine holds it until an external caller explicitly decides to send. The reply graph also never sends; it only classifies the inbound message and adds a suppression entry for an unsubscribe.
- Follow-up — "Where does the caller actually trigger the send?"
Answer — The caller owns the send decision; the graph only returns a draft or a classification label.
- Weak answer misses — A shallow answer would omit that the campaign engine pauses for human approval and that the reply graph's routing is decided in code, not by the model.
Q — "Why build a separate faithfulness_check node instead of trusting the drafting model to stay grounded or using a simple keyword check?"
- A — A keyword check is deterministic but blind to meaning, and trusting the model is cheapest but can ship a single confident fabrication. The faithfulness_check node uses a judge model to audit each claim against the assembled evidence (the faithfulness_evidence block built by build_outreach_evidence) and removes unsupported sentences before finalization. This catches semantic fabrication at the cost of one extra model call. The failure mode is an over-aggressive judge that strips a true but tersely worded claim.
- Follow-up — "How does the evidence block differ from a compose-style context_summary?"
Answer — Outreach evidence has no context_summary; it concatenates hook, source post, memory, and contact facts, wrapped in wrap_untrusted with the label EVIDENCE.
- Weak answer misses — A shallow answer would fail to mention that build_outreach_evidence is called per step and that the judge posts a score between zero and one as feedback for ranking model versions.
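The evidence assembly described in that answer can be sketched in a few lines. This is written in the spirit of `build_outreach_evidence`, not its actual code: `build_evidence_block` is hypothetical, `wrap_untrusted` is a stand-in whose real delimiters are not shown, and omitting blank sources is an assumption.

```python
from typing import List, Mapping, Optional


def wrap_untrusted(text: str, label: str) -> str:
    # Hypothetical stand-in for the project's helper; the real delimiters are not shown.
    return f"<<{label}>>\n{text}\n<</{label}>>"


def build_evidence_block(hook: str, source_post: str, memory: str,
                         contact_facts: Mapping[str, Optional[str]]) -> str:
    """Concatenate the four evidence sources and wrap them as untrusted EVIDENCE."""
    parts: List[str] = []
    for title, body in (("Hook", hook), ("Source post", source_post), ("Memory", memory)):
        if body and body.strip():
            parts.append(f"{title}: {body.strip()}")
    facts = "; ".join(f"{k}={v}" for k, v in contact_facts.items() if v)
    if facts:
        parts.append(f"Contact: {facts}")
    return wrap_untrusted("\n".join(parts), label="EVIDENCE")
```

Wrapping the whole block as untrusted keeps scraped post text from being read as instructions by the judge, while the labeled sections give it something concrete to compare each claim against.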
Q — "The same drafting engine is reused in three modes: autonomous pipeline, campaign with human approval, and preview-only. Why design it so the graph never knows which mode it’s in?"
- A — The graph never assumes how it is called because it only returns a pending draft (or a classification label). This clean separation means the drafting logic, the faithfulness_check node, and the evidence assembly (build_outreach_evidence) are identical in all three uses. The autonomous pipeline sends without a human, the campaign pauses for approval, and the preview shows only the draft; the graph does not need to branch on the mode, which keeps the safety rules in one copy.
- Follow-up — "How does the campaign survive restarts if the graph has no state about the mode?"
Answer — The campaign engine runs a durable thread per campaign and contact, checkpointed in the database under a stable thread name, so the graph itself stays stateless and restarts pick up from the pending draft.
- Weak answer misses — A shallow answer would overlook the durable thread and database checkpointing mechanism that makes reuse possible without mode awareness.
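The durable-thread idea can be sketched with the standard library's `sqlite3`, assuming the checkpointed state is JSON-serializable. The naming scheme, table layout, and function names are hypothetical; the source says only that each campaign-and-contact thread is checkpointed in the database under a stable name.

```python
import json
import sqlite3
from typing import Any, Dict, Optional


def thread_name(campaign_id: str, contact_id: str) -> str:
    # Hypothetical naming scheme: deterministic, so a restarted worker finds the same row.
    return f"campaign:{campaign_id}:contact:{contact_id}"


def open_store(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints (thread TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    return conn


def save_checkpoint(conn: sqlite3.Connection, thread: str, state: Dict[str, Any]) -> None:
    # INSERT OR REPLACE keeps exactly one row per thread name; `with conn` commits.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO checkpoints (thread, state) VALUES (?, ?)",
            (thread, json.dumps(state)),
        )


def load_checkpoint(conn: sqlite3.Connection, thread: str) -> Optional[Dict[str, Any]]:
    row = conn.execute("SELECT state FROM checkpoints WHERE thread = ?", (thread,)).fetchone()
    return json.loads(row[0]) if row else None
```

Because `thread_name` is a pure function of its inputs, a worker that restarts needs no in-memory state to find the pending draft again; with a file path instead of `":memory:"`, the row outlives the process.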
Q — "Why does the sub-niche sequence lookup use a nested map that falls back to a vertical-level definition, rather than forcing every caller to provide the exact sequence for every sub-niche?"
- A — The nested map {vertical: {sub_niche: seq_def}} is additive: a missing vertical, a missing sub_niche, or sub_niche None all fall back to the vertical-level VERTICAL_SEQUENCE_DEFS entry. This avoids brittle hard-coding: a new vertical works immediately with the generic sequence, and only the calibrated sub-niches (those with per-sub-niche score weights) get tailored copy. The failure mode is a niche tag that no longer matches any definition after the taxonomy changes.
- Follow-up — "What happens if the sub_niche tag resolves correctly but the step index is out of range?"
Answer — The failure mode is a step index past the end of the sequence; the fallback_step (e.g., 2) is used to avoid a crash.
- Weak answer misses — A shallow answer would omit the exact keys that must match (the sub_niches tuple in micro_verticals.py) and the fact that fallback_step exists specifically for that off-by-one failure mode.
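A minimal sketch of the nested lookup with its fallbacks. `VERTICAL_SEQUENCE_DEFS` is the source's name, but the shape and contents here are illustrative; `SUB_NICHE_SEQUENCE_DEFS`, `GENERIC_SEQUENCE`, `resolve_sequence`, and the vertical and sub-niche names are hypothetical.

```python
from typing import Any, Dict, Optional

SeqDef = Dict[str, Any]

# Illustrative contents only: vertical names, sub-niches, and directives are made up.
GENERIC_SEQUENCE: SeqDef = {
    "steps": ["opener", "value", "soft close"], "cadence_days": [0, 4, 7], "fallback_step": 2,
}
VERTICAL_SEQUENCE_DEFS: Dict[str, SeqDef] = {
    "saas": {"steps": ["product opener", "roi value", "soft close"],
             "cadence_days": [0, 4, 7], "fallback_step": 2},
}
# Hypothetical nested overrides: {vertical: {sub_niche: seq_def}}.
SUB_NICHE_SEQUENCE_DEFS: Dict[str, Dict[str, SeqDef]] = {
    "saas": {"devtools": {"steps": ["technical opener", "benchmark value", "soft close"],
                          "cadence_days": [0, 3, 7], "fallback_step": 2}},
}


def resolve_sequence(vertical: str, sub_niche: Optional[str]) -> SeqDef:
    """Most specific definition wins: sub-niche, then vertical, then generic."""
    overrides = SUB_NICHE_SEQUENCE_DEFS.get(vertical, {})
    if sub_niche is not None and sub_niche in overrides:
        return overrides[sub_niche]
    return VERTICAL_SEQUENCE_DEFS.get(vertical, GENERIC_SEQUENCE)
```

The same fallback that makes the map additive also hides the taxonomy failure mode: a renamed sub-niche silently lands on the vertical default. A startup check that every override key appears in the canonical sub-niche list would surface that drift early.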