01. Why agentic-sales Classifies
The agentic sales platform takes in a noisy stream of signals and turns it into clear decisions about who to contact. The raw input is messy. A company describes itself in its own words. A headquarters address arrives as plain text. A job posting carries a workplace field. A person leaves recent social posts. Before any outreach happens, each signal passes through a classifier. A classifier is a small, focused part that reads one kind of input and returns one structured verdict. One asks whether a company is a staffing agency. Another asks which country an address names. A third asks whether a reply sounds interested. Classification is the gate that keeps the pipeline honest, so only verified leads flow downstream. Each classifier stays deliberately narrow. It answers one question and returns a strict, checkable shape. That narrowness is the core trade-off. A focused part is easy to measure and its mistakes stay contained, but you need many small parts instead of one clever model. This guide walks through the six classifiers that run in production and the principles that make them worth trusting.
The inbound email classifier takes raw subject/body text and returns a structured verdict (label, confidence, intent, route) with validation and safe fallback, ensuring a wrong answer degrades without corrupting the pipeline.
async def classify(state: InboundEmailClassifyState) -> dict:
    subject = (state.get("subject") or "")[:500]
    body_raw = state.get("body") or ""
    thread_context = (state.get("thread_context") or "").strip()
    sender = (state.get("from_email") or "").strip()
    vertical_hint = (state.get("company_vertical") or "").strip()

    # Fence untrusted body text…
    fenced_body = wrap_untrusted(body_raw, label="INBOUND EMAIL BODY")
    user_msg = f"Subject: {subject or '(no subject)'}\n{fenced_body}"
    if sender:
        user_msg = f"From: {sender}\n{user_msg}"
    if thread_context:
        user_msg = (
            f"--- Original outbound email (context) ---\n{thread_context}\n\n"
            f"--- Inbound reply to classify ---\n{user_msg}"
        )
    if vertical_hint:
        user_msg += f"\n\n[Vertical context hint: {vertical_hint}]"

    # … LLM invocation omitted for brevity …
    # result = ainvoke_json_with_telemetry(...)
    raw = result if isinstance(result, dict) else {}

    # Unknown labels fall back to the safest label
    label = str(raw.get("label", "")).strip().lower()
    if label not in VALID_LABELS:
        label = "not_interested"
        fallback = True
    else:
        fallback = False

    # Map label to deterministic route
    raw_intent = str(raw.get("intent", "")).strip().lower()
    intent = raw_intent if raw_intent in VALID_INTENTS else LABEL_TO_INTENT.get(label, "out")
    route = INTENT_ROUTES.get(intent, "suppress")

    # … (elided: confidence and vertical are computed here)
    return {
        "label": label,
        "confidence": confidence,
        "vertical": vertical,
        "intent": intent,
        "route": route,
    }
Imagine you’re a mailroom worker facing a giant pile of letters: some are handwritten, some typed, some have return addresses, some don’t. You have to decide whether each one is a bill, a fan letter, or junk. That is what the classification subsystem does for business outreach. It takes messy, raw signals (a company’s vague self-description, a job posting, an email reply) and runs each one through a tiny, focused component called a classifier. For example, one classifier reads an inbound email and stamps it with a label such as interested, not_interested, or auto_reply; a deterministic routing table, not the model, then decides where the email goes next. Another picks out buying-intent cues like “evaluating AI vendors” and assigns a confidence score. Without this sorting, the system would be overwhelmed: a spam email could trigger a sales call, a company that is just bragging about AI would be treated like a hot buyer, and you would waste time chasing dead ends. The classifiers turn noise into clean, actionable labels so every outreach decision starts from a clear verdict, not a hunch.
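The validation and routing steps from the snippet above can be isolated into a small, testable function. This is a minimal sketch, not the production code: the table names come from the snippet, but their contents here are hypothetical placeholders (only the "not_interested", "out", and "suppress" defaults appear in the source), and normalise_verdict is an illustrative name.

```python
# Hypothetical table contents; only the table names and the
# "not_interested" / "out" / "suppress" defaults come from the source.
VALID_LABELS = {"interested", "not_interested", "auto_reply"}
VALID_INTENTS = {"engage", "out", "ignore"}
LABEL_TO_INTENT = {
    "interested": "engage",
    "not_interested": "out",
    "auto_reply": "ignore",
}
INTENT_ROUTES = {
    "engage": "sales_handoff",
    "out": "suppress",
    "ignore": "no_action",
}


def normalise_verdict(raw: object) -> dict:
    """Turn arbitrary LLM output into a validated label, intent and route."""
    data = raw if isinstance(raw, dict) else {}

    # Unknown or missing labels collapse to the safest label.
    label = str(data.get("label", "")).strip().lower()
    fallback = label not in VALID_LABELS
    if fallback:
        label = "not_interested"

    # The model's intent is used only if it is a known value;
    # otherwise it is derived from the validated label.
    raw_intent = str(data.get("intent", "")).strip().lower()
    intent = raw_intent if raw_intent in VALID_INTENTS else LABEL_TO_INTENT.get(label, "out")

    # The route always comes from the table, never from the model.
    route = INTENT_ROUTES.get(intent, "suppress")
    return {"label": label, "intent": intent, "route": route, "fallback": fallback}
```

Whatever the model returns, including plain prose instead of JSON, the caller receives one of a fixed set of routes, which is what keeps a wrong answer from corrupting the pipeline.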
1. Entry into company_enrichment_graph – the graph is invoked with a CompanyEnrichmentState containing company, company_id, home_markdown, careers_markdown, and domain.
- reads / writes – reads the state object; no writes yet.
- branch – no branch at entry; happy path proceeds.

2. Node enrich_vertical_fit begins – the function checks state.get("_error") or state.get("_skip_reason"); if either is set, it returns {} immediately.
- reads – _error, _skip_reason.
- writes – none on early return.
- branch – happy: no error/skip; empty/fallback: returns empty dict, skipping all LLM work.

3. Wrapping untrusted inputs – wrap_untrusted(home_markdown, label='HOME PAGE', max_chars=6000) and wrap_untrusted(careers_markdown, label='CAREERS PAGE', max_chars=2000) are called to fence scraped text against prompt injection.
- reads – home_markdown, careers_markdown from state.
- writes – local variables (fenced strings).
- branch – none.

4. LLM classification call – ainvoke_json_with_telemetry is invoked with the vertical-fit system prompt (not shown in snippet) and the fenced user text, requesting a JSON verdict with fields like vertical, confidence, reason, vertical_fit.
- reads – company, domain from state; fenced text.
- writes – returns a dict containing vertical_fit (with vertical, confidence, reason), agent_timings, and graph_meta.
- branch – if the LLM fails or times out, the node returns {} (non-fatal fallback).

5. D1 telemetry insert – the result is inserted into a D1 table using parameterised SQL with keys such as company_id, f"vertical_fit.{vertical}", and the LLM-output values. The insert is wrapped in try/except D1Error: pass.
- reads – company_id, domain, the LLM result, EXTRACTOR_VERSION, timestamps.
- writes – D1 row (non-critical).
- branch – on D1 error the insert is silently dropped; the node still returns the state updates.

6. Conditional edge after enrich_vertical_fit – the graph inspects state["vertical_fit"]["vertical"] (or the returned vertical). If it equals "legal-pi-demand", control fans out to extract_pi_signals; otherwise the graph skips that node.
- reads – vertical from vertical_fit.
- writes – nothing yet; decides next node.
- branch – happy (vertical matches) leads to step 7; else jumps to step 10.

7. Node extract_pi_signals begins – again checks state.get("_error") and state.get("_skip_reason"), then verifies vertical == "legal-pi-demand". Returns {} early if either check fails.
- reads – _error, _skip_reason, vertical.
- writes – none on early exit.
- branch – early exit on error/vertical mismatch; happy continues.

8. Wrapping and LLM call for PI signals – wrap_untrusted is applied again to home_markdown and careers_markdown. The system prompt (provided in snippet) asks for demand_automation, medical_record_summarization, and case_intake, each with detected, confidence, evidence. ainvoke_json_with_telemetry extracts the JSON.
- reads – same state fields; fenced text.
- writes – state["pi_signals"] (a dict with the three signal objects), agent_timings, graph_meta.
- branch – LLM failure returns {}; the node is non-fatal.

9. D1 telemetry insert for PI signals – same pattern: insert of a pi_signals.{vertical} row; failures are silently ignored.
- reads – company_id, domain, result, version.
- writes – D1 row.
- branch – errors are non-fatal.

10. Conditional edge for immigration signals – the graph checks if vertical == "legal-immigration". If true, control moves to extract_immigration_signals; else it skips to terminal.
- reads – vertical from state.
- writes – none.
- branch – happy (vertical matches) leads to step 11; otherwise to step 13.

11. Node extract_immigration_signals begins – same early-exit checks (_error, _skip_reason, vertical equality). Returns {} if the checks fail.
- reads – _error, _skip_reason, vertical.
- writes – none on early exit.
- branch – early exit; happy continues.

12. LLM call for immigration signals – ainvoke_json_with_telemetry is invoked with a system prompt (not fully shown, but it hints at petition_drafting, rfe_response, visa_categories). The result is written to state["immigration_signals"]. Then a D1 insert of immigration_signals.{vertical} is attempted.
- reads – state["company"], home_markdown, careers_markdown, domain, company_id.
- writes – immigration_signals; D1 row; agent_timings, graph_meta.
- branch – LLM or DB failure returns empty dict; non-fatal.

13. Terminal – the graph returns the final CompanyEnrichmentState, now containing vertical_fit, pi_signals (if extracted), immigration_signals (if extracted), agent_timings, and graph_meta.
- reads – all accumulated state.
- writes – final state returned to caller.
- branch – none; this is the only exit.
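The early-exit checks and the two conditional edges in the walkthrough above can be sketched as plain functions over a state dict. This is an illustration under assumptions, not the graph's actual wiring: the helper names (should_skip, current_vertical, nodes_past_early_exit) are hypothetical, and the real graph library is not used here.

```python
from typing import Any, List, Mapping

PI_VERTICAL = "legal-pi-demand"
IMMIGRATION_VERTICAL = "legal-immigration"


def should_skip(state: Mapping[str, Any]) -> bool:
    """Early-exit check shared by every node: any error or skip reason."""
    return bool(state.get("_error") or state.get("_skip_reason"))


def current_vertical(state: Mapping[str, Any]) -> str:
    """Read the vertical from vertical_fit, falling back to a top-level key."""
    fit = state.get("vertical_fit") or {}
    return str(fit.get("vertical") or state.get("vertical") or "")


def nodes_past_early_exit(state: Mapping[str, Any]) -> List[str]:
    """Given the state after enrich_vertical_fit, list the nodes that
    get past their early-exit checks and go on to call the LLM."""
    if should_skip(state):
        return []  # every node returns {} immediately
    nodes = ["enrich_vertical_fit"]
    vertical = current_vertical(state)
    if vertical == PI_VERTICAL:  # conditional edge, step 6
        nodes.append("extract_pi_signals")
    if vertical == IMMIGRATION_VERTICAL:  # conditional edge, step 10
        nodes.append("extract_immigration_signals")
    return nodes
```

Because the vertical comparisons are exact string matches, a vertical of "legal-pi" or "Legal-Immigration" would run neither extractor, which is the silent-skip behaviour described later in the failure modes.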
The subsystem operates as a pipeline of focused classifiers, each triggered by a specific vertical or signal type. For company enrichment, the graph first checks the vertical field: if it equals legal-pi-demand, the extract_pi_signals function runs; if it equals health-applied, extract_health_signals runs instead. These are gated by LLM_KILL_SWITCH and are explicitly non-fatal — any failure returns an empty dict and does not block enrichment already committed in the persist node. For every company regardless of vertical, the extract_buying_intent function runs in parallel, returning a structured verdict with cue_type, strength, confidence, evidence, and source. On the inbound email side, the InboundEmailClassifyState graph first classifies the reply into one of nine labels and derives vertical, intent, opportunity_score, and route; if the intent is “interested”, a second node extracts a scheduling-handoff payload with fields like meeting_intent and proposed_times. The route decision is deterministic from a hardcoded table, not LLM-driven.
The central invariant the design preserves is the non-fatal boundary: once the persist node has committed enrichment, any subsequent classifier failure cannot roll that data back. The extractors may fail silently, but the base enrichment that was already written stays intact. A second invariant, visible in the email classifier, is that the routing table is the only source of truth for route decisions. Editing that table is explicitly the sole way to change routing behaviour, so the route is a deterministic function of the validated intent and never of free-form LLM output.
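One way to picture the non-fatal boundary is a wrapper that converts any extractor failure into an empty update. The sketch below is an assumption about shape only: the source does not show how the real nodes catch errors, and non_fatal, flaky_extractor, and healthy_extractor are hypothetical names.

```python
import asyncio
import functools
import logging

logger = logging.getLogger("classifiers")


def non_fatal(node):
    """Wrap an async node so that any failure yields {} instead of raising."""

    @functools.wraps(node)
    async def wrapper(state):
        try:
            result = await node(state)
        except Exception as exc:  # deliberately broad: the boundary must hold
            logger.warning("%s failed, returning {}: %s", node.__name__, exc)
            return {}
        # A non-dict result is treated the same way as a failure.
        return result if isinstance(result, dict) else {}

    return wrapper


@non_fatal
async def flaky_extractor(state):
    # Stand-in for an LLM call that times out.
    raise TimeoutError("LLM timed out")


@non_fatal
async def healthy_extractor(state):
    return {"buying_intent": {"cue_type": "none"}}
```

An empty dict means "no state update", so data committed by earlier nodes is never touched by a failing extractor.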
The key trade-off is LLM-based extraction over deterministic, rule-based classification. The alternative rejected is a keyword or regex approach that would parse home-page markdown and job-postings text for hardcoded patterns (e.g., “HIPAA”, “EHR integration”, “Twilio”). That alternative would be cheaper and more predictable, but it costs false negatives from loose language and evolving product copy: a company that describes “BAA availability and SOC 2” without the exact string “HIPAA” would be missed, and integration with “Epic” spelled out in a non-standard format would be lost. The LLM route avoids that brittle maintenance burden by using semantic understanding, accepting higher per-call latency and token cost in exchange for recall on ambiguous signals. The wrap_untrusted fencing and max_chars limits mitigate prompt-injection and cost blowout.
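The false-negative cost of the rejected keyword approach is easy to reproduce. The sketch below is illustrative only: the pattern and the two sample sentences are made up for this example and are not taken from the codebase.

```python
import re

# A hardcoded pattern of the kind the rejected design would rely on.
HIPAA_PATTERN = re.compile(r"\bHIPAA\b", re.IGNORECASE)


def keyword_detects_compliance(text: str) -> bool:
    """Return True only when the literal keyword appears in the text."""
    return HIPAA_PATTERN.search(text) is not None


# Hypothetical product copy: the same compliance posture, phrased two ways.
explicit = "We are HIPAA compliant and integrate with major EHRs."
paraphrased = "BAA availability and SOC 2 reports are provided on request."

explicit_hit = keyword_detects_compliance(explicit)        # matched
paraphrased_hit = keyword_detects_compliance(paraphrased)  # missed
```

Adding more patterns narrows the gap but never closes it, which is the maintenance burden the LLM route trades for higher latency and token cost.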
A concrete failure mode: the extract_buying_intent function hits an LLM parse error — the model returns prose instead of strict JSON. The function catches the exception and returns {}, so the buying_intent fact is simply absent from company_facts. An operator would see a warning-level log entry with the function name and an error message indicating JSON parse failure, alongside a gen_ai.* span attribute showing the raw LLM output. They would not see the enrichment fail; no alerts would fire, but downstream ranking logic (V73) that expects that field would silently degrade, producing a lower composite confidence score for that company. If the error is systematic (e.g., a prompt regression), the operator notices a drop in average opportunity_score across the queue or a surge in companies with cue_type='none' in the buying_intent field.
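The parse-failure path described above might look roughly like the following. This is a sketch under assumptions: parse_verdict and merge_fact are hypothetical helpers, the log format is invented, and the real function's error handling is not shown in the source.

```python
import json
import logging

logger = logging.getLogger("company_enrichment")


def parse_verdict(raw_text, node_name: str) -> dict:
    """Parse strict JSON from the model; return {} on prose or malformed output."""
    try:
        parsed = json.loads(raw_text)
    except (TypeError, ValueError) as exc:  # JSONDecodeError subclasses ValueError
        logger.warning("%s: JSON parse failure: %s", node_name, exc)
        return {}
    # Valid JSON that is not an object is also rejected.
    return parsed if isinstance(parsed, dict) else {}


def merge_fact(company_facts: dict, field: str, verdict: dict) -> dict:
    """Add the fact only when a verdict exists, so a failure leaves it absent."""
    if verdict:
        company_facts[field] = verdict
    return company_facts
```

Note that the field ends up absent rather than null, so any downstream consumer has to treat a missing key as "no signal" rather than as an error.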
The subsystem spends time and money primarily on LLM inference (token processing), database queries, and remote API calls (GitHub, possibly others). The following four knobs, visible in the source, directly control these costs.
_GH_ANALYSE_REFRESH_DAYS
- Knob – Constant _GH_ANALYSE_REFRESH_DAYS in company_enrichment_graph.py. No default numeric value appears in the excerpt.
- Bounds – Limits how often a company’s GitHub organization is re-analysed. Re-analysis is skipped only if the last scan is younger than this many days.
- Effect – Lowering the value increases API call frequency (more GitHub API requests, more compute), raising both latency and dollar spend. Raising the value reduces repeat work, lowering costs but accepting staler data.
- Risk – Too low: unnecessary API calls waste money and can trigger rate limits. Too high: stale GitHub signals (e.g., old commit activity) degrade downstream scoring.
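A refresh gate of this kind usually reduces to one age comparison. In the sketch below the value 30 is an assumption (the excerpt shows no default), and needs_github_reanalysis is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed value for illustration; the excerpt does not show the real one.
_GH_ANALYSE_REFRESH_DAYS = 30


def needs_github_reanalysis(
    last_scan_at: Optional[datetime],
    now: Optional[datetime] = None,
) -> bool:
    """Re-analyse when there is no previous scan or the last one is too old."""
    if last_scan_at is None:
        return True
    now = now or datetime.now(timezone.utc)
    # Skip only while the last scan is younger than the refresh window.
    return now - last_scan_at >= timedelta(days=_GH_ANALYSE_REFRESH_DAYS)
```

Passing now explicitly keeps the check deterministic in tests; both timestamps are expected to be timezone-aware.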
max_chars parameters in wrap_untrusted
- Knob – Hardcoded integer arguments: max_chars=6000 for home page, max_chars=2000 or 3000 for careers page.
- Bounds – Truncates scraped markdown text before it is passed to the LLM prompt, capping token consumption. Trades off input completeness for reduced token cost and latency.
- Effect – Increasing max_chars sends more content to the LLM, improving signal quality but raising token spend and response latency. Decreasing it saves money and speeds up classification but may miss relevant evidence.
- Risk – Too high: ballooning token counts dramatically increase LLM cost and timeout probability. Too low: the model cannot find crucial phrases (e.g., pricing mentions, hiring language) and returns low-confidence signals.
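The source shows how wrap_untrusted is called but not how it is implemented. The following is a hypothetical stand-in that illustrates the two jobs the text attributes to it, truncation and fencing; the delimiter format, the marker stripping, and the default of 6000 are all assumptions.

```python
from typing import Optional


def wrap_untrusted_sketch(text: Optional[str], label: str, max_chars: int = 6000) -> str:
    """Hypothetical stand-in for wrap_untrusted: truncate, then fence.

    The real delimiter format is not shown in the source.
    """
    # Strip anything that could imitate the fence markers, so scraped
    # text cannot close the fence early.
    cleaned = (text or "").replace("<<<", "").replace(">>>", "")
    # Cap the payload so token cost stays bounded.
    body = cleaned[:max_chars]
    return (
        f"<<<BEGIN UNTRUSTED {label}>>>\n"
        f"{body}\n"
        f"<<<END UNTRUSTED {label}>>>"
    )
```

The system prompt would then instruct the model to treat everything between the markers as data, never as instructions. Truncation is a hard cut, so evidence beyond max_chars is simply invisible to the model.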
LLM_KILL_SWITCH
- Knob – Environment variable or constant LLM_KILL_SWITCH (referenced in docstrings of extract_pricing_model, extract_buying_intent, extract_hiring_velocity). No default value shown.
- Bounds – When set, all LLM-dependent extraction functions return {} immediately, completely bypassing inference.
- Effect – Turning this switch on reduces time and money to zero for those nodes but also drops all signal outputs, leading to empty stanzas in downstream scoring.
- Risk – Mis-setting it to True accidentally disables the entire LLM classification pipeline; downstream nodes then receive no pricing, intent, or hiring data. Setting it to False when the LLM key is missing causes repeated timeouts or errors.
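A kill-switch gate can be sketched as a check that runs before any inference. The source does not say whether LLM_KILL_SWITCH is an environment variable or a module constant; this example assumes an environment variable, and kill_switch_engaged and run_gated_extractor are hypothetical names.

```python
import os
from typing import Callable, Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}


def kill_switch_engaged(env: Optional[Mapping[str, str]] = None) -> bool:
    """Assumes LLM_KILL_SWITCH is read from the environment as a truthy string."""
    env = os.environ if env is None else env
    return str(env.get("LLM_KILL_SWITCH", "")).strip().lower() in _TRUTHY


def run_gated_extractor(
    state: dict,
    call_llm: Callable[[dict], dict],
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Return {} without touching the LLM when the switch is on."""
    if kill_switch_engaged(env):
        return {}
    return call_llm(state)
```

Checking the switch first means an engaged switch costs no tokens and no network round-trip, at the price of every gated signal being absent.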
LANGSMITH_TRACING
- Knob – Environment variable LANGSMITH_TRACING (mentioned in the docstring of inbound_email_classify_graph.py). When true, LangGraph automatically creates tracing spans for each classification invocation.
- Bounds – Adds telemetry overhead (network calls to LangSmith, span serialization) without changing classification logic.
- Effect – Enabling tracing increases request latency by a small amount and adds outbound network traffic, raising operational cost. Disabling it removes that overhead entirely.
- Risk – Leaving it on in high-volume production can introduce unpredictable latency spikes or expensive telemetry storage. Turning it off during debugging removes observability, making it harder to diagnose failures.
All identifiers above are taken verbatim from the provided source excerpts; no knobs are invented.
1. LLM API call failure (timeout, rate limit, service outage)
- Trigger – The ainvoke_json_with_telemetry call to DeepSeek (inside extract_immigration_signals, extract_pi_signals, classify, etc.) raises a network error, 5xx response, or timeout.
- Guard – The docstring of each extraction function states “any failure (LLM error, kill-switch, parse failure) returns {}”. No explicit try/except identifier appears in the snippet; the guard is the implicit exception-catching wrapper that returns an empty dict.
- Posture – fail-soft. The node returns {} and the rest of the enrichment graph continues unaffected. The failed signal is simply absent.
- Operator signal – The gen_ai.* span attributes will contain an error status; the agent_timings entry for that node will be missing or show an unusual elapsed time (much shorter than a normal LLM round-trip for fast failures, or close to the timeout limit for timeouts). No explicit log line is shown in the source.
- Recovery – No retry is implemented. The signal is lost for this run; the operator must re-trigger enrichment later or accept the missing field.
2. LLM kill switch engaged
- Trigger – The global LLM_KILL_SWITCH flag is set to True (e.g., during maintenance or after high cost). The extraction functions (extract_buying_intent, extract_pi_signals, etc.) are explicitly “Gated by LLM_KILL_SWITCH”.
- Guard – A check against LLM_KILL_SWITCH is performed at the top of each gated node. The source does not show the exact boolean variable name, but LLM_KILL_SWITCH is the identifier used in the docstring.
- Posture – fail-soft. The node returns {} immediately; downstream nodes run with missing signal fields.
- Operator signal – No explicit log; the operator observes that immigration_signals or pi_signals fields remain null in company_facts. The span attributes for that node will have a kill_switch=true tag (implied but not shown).
- Recovery – Manual operator action: flip LLM_KILL_SWITCH off and re-trigger enrichment for the impacted companies. No automatic retry.
3. Classification grade verdict triggers CRAG retry exhaustion
- Trigger – The grade node returns verdict: "not_ok" for one of _CRAG_GATED_FIELDS (category_ok, tier_ok, remote_policy_ok), and the counter grade_attempts reaches _CRAG_MAX_ATTEMPTS (2). The router loops back to classify up to two times, then proceeds to score.
- Guard – The grade node’s verdict and the _CRAG_MAX_ATTEMPTS constant (set to 2) limit the retry loop. Additionally, heuristic-sourced classifications bypass grading entirely (via state.get("classify_source") == "heuristic").
- Posture – fail-soft. After exhausting retries, the system continues to score using the potentially incorrect classification. The classification is not blocked.
- Operator signal – The grade_attempts counter in the state (incremented in grade) provides the number of retries. A span attribute grade_attempts: 2 would be visible if telemetry captures it. No explicit log line is shown.
- Recovery – No further automatic recovery; the low-confidence classification is used. Manual inspection of the grade issues and re-running with corrected context is the only recourse.
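The bounded retry loop in failure mode 3 comes down to a small routing function after the grade node. The sketch below is an assumption about shape: the constants are named in the source, but the layout of the grade result (a dict mapping each gated field to "ok" or "not_ok") and the function name route_after_grade are hypothetical, as is the exact point at which the counter stops the loop.

```python
from typing import Any, Mapping

_CRAG_MAX_ATTEMPTS = 2
_CRAG_GATED_FIELDS = ("category_ok", "tier_ok", "remote_policy_ok")


def route_after_grade(state: Mapping[str, Any]) -> str:
    """Decide whether to retry classification or move on to scoring."""
    # Heuristic-sourced classifications bypass grading entirely.
    if state.get("classify_source") == "heuristic":
        return "score"

    # Assumed shape: {"tier_ok": "not_ok", ...}
    grade = state.get("grade") or {}
    flagged = [f for f in _CRAG_GATED_FIELDS if grade.get(f) == "not_ok"]

    # Retry only while a gated field is flagged and attempts remain.
    if flagged and state.get("grade_attempts", 0) < _CRAG_MAX_ATTEMPTS:
        return "classify"

    # Exhausted or clean: continue with whatever classification exists.
    return "score"
```

The cap is what makes the loop fail-soft: a persistently flagged row still reaches score, just with a classification the grader never approved.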
4. Heuristic fallback after LLM classification failure
- Trigger – The classify node’s LLM call fails or returns malformed JSON, and no retry is attempted (or CRAG retries are exhausted). The node falls back to a keyword-based heuristic.
- Guard – The heuristic function (within classify) returns a classification with confidence: 0.3, source: "heuristic", and evidence listing matched keywords. The grade node skips heuristic-sourced results (checks classify_source == "heuristic").
- Posture – fail-soft. The classification proceeds with low confidence; downstream scoring uses the lower weight to minimise impact.
- Operator signal – The classify span will show source: heuristic; the grade span will have skipped: heuristic. The heuristic reason and evidence are stored, and the low confidence (0.3) is observable in the company_facts row.
- Recovery – No retry; the operator can manually override the classification or re-run enrichment with different markdown if the heuristic is wrong.
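A keyword fallback of the kind described in failure mode 4 might look like this. The confidence, source, and reason values are the ones quoted in this guide; the keyword patterns, the category names, and the function name heuristic_classify are hypothetical.

```python
import re
from typing import Optional

# Hypothetical keyword patterns; the real lists are not shown in the source.
_KEYWORDS = {
    "staffing": re.compile(r"\b(staffing|recruit(?:ing|ment))\b", re.IGNORECASE),
    "legal": re.compile(r"\b(law firm|attorneys?)\b", re.IGNORECASE),
}


def heuristic_classify(home_markdown: Optional[str], careers_markdown: Optional[str]) -> dict:
    """Last-resort classification by keyword match, stamped as a guess."""
    text = f"{home_markdown or ''}\n{careers_markdown or ''}"
    best_category, best_hits = "unknown", []
    for category, pattern in _KEYWORDS.items():
        hits = [m.group(0).lower() for m in pattern.finditer(text)]
        if len(hits) > len(best_hits):
            best_category, best_hits = category, hits
    return {
        "category": best_category,
        "confidence": 0.3,      # fixed low confidence for heuristic results
        "source": "heuristic",  # lets grade and persist treat it as a guess
        "reason": "heuristic fallback (regex keyword match)",
        "evidence": sorted(set(best_hits)),
    }
```

The fixed confidence and the source stamp matter more than the match itself: they let downstream nodes discount the result without inspecting how it was produced.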
5. Vertical mismatch causes silent skip of vertical-specific extraction
- Trigger – The vertical field in state is not exactly "legal-immigration" (for extract_immigration_signals) or _PI_VERTICAL (for extract_pi_signals). This can happen due to a misspelling, a bug in the vertical classifier, or a temporary vertical set incorrectly.
- Guard – The explicit if state.get("vertical") != ...: return {} check at the start of each vertical-specific node.
- Posture – fail-soft. The node returns {}; no error is raised, and the rest of the graph continues. The missing signal fields are simply absent from the enrichment.
- Operator signal – No log or error; the operator must cross-check the vertical value stored in the run state with the expected value. The agent_timings will show a very short elapsed time for that node.
- Recovery – No automatic recovery. The operator must correct the vertical assignment in the pipeline upstream (e.g., the company classifier) and re-trigger enrichment.
6. Truncated or empty home/careers markdown degrades LLM output
- Trigger – home_markdown or careers_markdown is empty, or is truncated at max_chars (e.g., 6000/3000 in extract_immigration_signals, 5000/2000 in classify and grade) such that key product features are omitted.
- Guard – No guard is shown in the source. The functions pass the truncated markdown directly to the LLM via wrap_untrusted. There is no validation that the markdown is non-empty or sufficient.
- Posture – fail-soft but introduces silent inaccuracy. The LLM may guess or return low confidence, but the system proceeds. No error is raised.
- Operator signal – The LLM’s confidence field may be low, or the reason field may mention that no relevant information was found. The operator can inspect the stored evidence string to see the truncated source text.
- Recovery – No automatic recovery. The operator must ensure the scraper collects sufficient content and re-run enrichment. A manual check of the markdown length could be added.
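The markdown length check suggested under Recovery could be as small as the following. Nothing like this exists in the source; the function name, the flag names, and the min_home threshold of 200 are hypothetical, while the 6000/3000 limits mirror the values quoted above.

```python
from typing import List, Optional


def markdown_quality_flags(
    home_markdown: Optional[str],
    careers_markdown: Optional[str],
    home_max: int = 6000,
    careers_max: int = 3000,
    min_home: int = 200,  # hypothetical threshold
) -> List[str]:
    """Report input problems before the LLM is asked to classify."""
    flags: List[str] = []
    home = (home_markdown or "").strip()
    careers = (careers_markdown or "").strip()

    if not home:
        flags.append("home_empty")
    elif len(home) < min_home:
        flags.append("home_too_short")
    if len(home) > home_max:
        flags.append("home_truncated")  # content past the cap will be cut

    if not careers:
        flags.append("careers_empty")
    if len(careers) > careers_max:
        flags.append("careers_truncated")
    return flags
```

Recording these flags next to the verdict would let an operator tell a genuinely weak signal apart from a verdict built on thin or truncated input.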
Q1 (warm-up)
Q: When the LLM classifier fails to classify a company, what fallback mechanism ensures we still get a structured verdict?
A: The classify function in company_enrichment_graph.py returns a heuristic fallback dictionary with confidence: 0.3, source: "heuristic", and a reason: "heuristic fallback (regex keyword match)". This fallback uses regex keyword matching on the company’s home and careers markdown, recording evidence as matched keywords. It marks the source as "heuristic" (not "LLM") so that downstream scoring can distinguish guesses from grounded facts.
Follow‑up: How does the heuristic fallback affect the downstream scoring?
Answer: Downstream scoring weights the confidence (0.3) less than LLM‑produced signals, and the source="heuristic" label prevents a guess from being treated as a grounded fact in the persist layer.
Weak answer misses: The exact confidence value (0.3) and the explicit source: "heuristic" field that marks the method in the persist layer.
Q2 (medium)
Q: Why does the system use a separate, no‑LLM heuristic classifier for buyer‑fit (buyer_fit_classifier.py) while company classification uses an LLM?
A: The buyer_fit_classifier.py module is a heuristic, no‑LLM verdict on whether a contact’s affiliation is a plausible B2B buyer. It relies on structured fields from OpenAlex (institution_type, institution_name) and a curated keyword list (_ACADEMIC_NAME_KEYWORDS), plus GitHub topic signals (_GH_AI_TOPIC_SIGNALS). The design choice is deliberate: buyer‑fit only needs structural facts (academic vs. company) and a small set of topical signals — a fast, deterministic rule set is sufficient and avoids the latency/cost of an LLM call. Company classification, by contrast, requires nuanced semantic understanding of free‑text home and careers pages, which justifies the LLM.
Follow‑up: What degrades gracefully when Team A’s affiliation_type is unavailable?
Answer: The docstring states “affiliation_type … may be None … this module degrades gracefully” by falling back on institution name keyword matching via _ACADEMIC_NAME_KEYWORDS.
Weak answer misses: The specific curated lists (_ACADEMIC_NAME_KEYWORDS, _GH_AI_TOPIC_SIGNALS) and the reliance on OpenAlex’s institution_type field, not just name heuristics.
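The rule-based shape of the buyer-fit verdict in Q2 can be sketched as follows. Only the identifier _ACADEMIC_NAME_KEYWORDS comes from the source; its contents here, the _ACADEMIC_TYPES set, the function name, and the returned fields are assumptions, and the GitHub topic signals are left out for brevity.

```python
from typing import Optional

# Hypothetical contents; the real curated list is not shown in the source.
_ACADEMIC_NAME_KEYWORDS = ("university", "institute", "college")
# Assumed set of institution types that count as academic.
_ACADEMIC_TYPES = {"education"}


def buyer_fit(institution_type: Optional[str], institution_name: Optional[str]) -> dict:
    """Deterministic verdict: prefer the structured type, fall back to the name."""
    if institution_type:
        academic = institution_type.strip().lower() in _ACADEMIC_TYPES
        basis = "institution_type"
    else:
        # Graceful degradation when the structured field is missing.
        name = (institution_name or "").lower()
        academic = any(keyword in name for keyword in _ACADEMIC_NAME_KEYWORDS)
        basis = "name_keywords"
    return {"plausible_buyer": not academic, "basis": basis}
```

Recording the basis makes it visible when a verdict rests on the weaker name match rather than on the structured field.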
Q3 (hard)
Q: The classify function mentions a “CRAG retry” mechanism. Explain the design rationale and how it interacts with the heuristic fallback.
A: In company_enrichment_graph.py, the classify function includes a CRAG retry: when an earlier “grade” pass flagged the row, the critic’s issues are folded into the user prompt so the second LLM pass can correct itself instead of repeating the same mistake. This is a guided self-correction loop. The heuristic fallback sits outside that loop: a flagged grade triggers another LLM pass, whereas the heuristic is returned only when the LLM call itself fails to produce a valid structured result (e.g., a parse error or API error). The heuristic is a last-resort output, not part of the retry loop.
Follow‑up: What prevents the heuristic output from being persisted as a “grounded fact”?
Answer: The heuristic dictionary sets source: "heuristic" and confidence: 0.3; the persist layer uses the source field to label the method as HEURISTIC (not LLM), ensuring the fact is stamped as a guess.
Weak answer misses: The key detail that the critic’s output is injected into the LLM prompt (not into the heuristic branch), and that the heuristic is a pure final fallback, not a retry alternative.
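Folding the critic's issues into the retry prompt, as described in Q3, can be sketched as a small prompt builder. The wording of the injected section and the name build_classify_prompt are hypothetical; the source only states that the issues are added to the user prompt.

```python
from typing import Iterable, Optional


def build_classify_prompt(base_user_msg: str, grade_issues: Optional[Iterable[str]] = None) -> str:
    """Append the grader's issues to the user prompt on a retry pass."""
    issues = [str(issue).strip() for issue in (grade_issues or []) if str(issue).strip()]
    if not issues:
        # First pass: nothing to correct, so the prompt is unchanged.
        return base_user_msg
    bullet_list = "\n".join(f"- {issue}" for issue in issues)
    return (
        f"{base_user_msg}\n\n"
        "A previous attempt was flagged by the grader. Address these issues:\n"
        f"{bullet_list}"
    )
```

The heuristic branch never sees this text: the issues only shape a second LLM pass, which is why the fallback stays a final resort rather than a retry alternative.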
Q4 (design alternative)
Q: Why does extract_buying_intent run for every company regardless of vertical, rather than being gated on a prior filter?
A: The docstring of extract_buying_intent in company_enrichment_graph.py states: “Runs for every company regardless of vertical.” The function is designed to detect buying‑intent signals (RFP, migration, intent‑hiring) for all companies, exposing the signal for composite ranking consumption (V73). Making it unconditional ensures no potential buyer is missed by a pre‑filter. The function is non‑fatal – any failure returns {} so the rest of the graph is unaffected – which means the cost of running it on every row is acceptable because it never blocks downstream nodes.
Follow‑up: How is the signal persisted for later consumption?
Answer: The state key buying_intent is persisted to company_facts under field='buying_intent', and is consumed by the score node to affect the composite ICP score.
Weak answer misses: The explicit mention that the signal is “consumed by V73” (composite ranking) and that the function is gated by LLM_KILL_SWITCH but not by vertical.
Q5 (hard)
Q: Why does the inbound email classification step include a separate meeting extraction assistant with a few‑shot prompt, rather than integrating meeting extraction into the main classification prompt?
A: The inbound_email_classify_graph.py defines two separate prompts: SYSTEM_PROMPT for reply classification (label, intent, opportunity score) and _MEETING_EXTRACTION_SYSTEM for extracting meeting‑specific fields (meeting_intent, proposed_times, timezone, evidence). The meeting extraction is a focused, structured parsing task that benefits from a few‑shot example (_MEETING_EXTRACTION_FEW_SHOT) to demonstrate exact formatting. Combining them into one prompt risks diluting the classification signal or producing hallucinated times. The separate prompt also makes it easy to gate or bypass meeting extraction independently (e.g., only call it when the label is “interested”).
Follow‑up: What rule prevents fabricated time slots from being returned?
Answer: The system prompt explicitly states: “Only include times EXPLICITLY stated in the email — never fabricate or infer times.”
Weak answer misses: The few‑shot example structure (_MEETING_EXTRACTION_FEW_SHOT) and the fact that meeting_intent is a boolean separate from the main label field.