feat(llm): route search.ts classify/extract/synthesize through the LLM router #463
HomenShum wants to merge 1 commit into
…M router

Track B of the LLM-router rollout. The /search route hardcoded gemini-3.1-flash-lite-preview at 7 Gemini call sites. Wire each through the shared planner-on-a-pool router (shared/llm/router.ts) so model choice is owned in one place and long / analytical / multi-entity turns can escalate.

ADDITIVE + behavior-preserving: the floor of every classify/extract/synthesize pool is the same flash-lite model, and signals are derived cheaply + locally (query length, retrieved source count, multiEntity for comparison branches), so a simple single-entity query routes to the exact same model as before.

Call sites wired (server/routes/search.ts):
- classifyQueryWithLLM (query classification) -> routeLLM classify [single-candidate pool, guaranteed no-op]
- agent_synthesize trace (synthesizeResults) -> routeLLM synthesize [surfaces chosen model + reason in trace; wire-level call lives in agentHarness.ts]
- why-this-team credibility enrichment -> routeLLM extract
- multi-entity comparison extraction -> routeLLM extract (multiEntity true)
- single-entity extraction -> routeLLM extract
- founder-direction extraction -> routeLLM extract

Observability: the chosen model lands in each trace step tool field and the route reason is appended to the step detail, matching the existing SearchTraceEntry shape exactly.

Reliability (.claude/rules/agentic_reliability.md): searchRouteSignals is a pure function (no Date/random) so routing is DETERMINISTIC + replay-safe; NaN/negative source counts are coerced to 0. The AbortController/Promise.race budget gates and the grounding pipeline are untouched.

Tests: server/searchRouteLlmRouting.test.ts -- scenario-based (founder lookup, investor comparison, banker diligence), asserting the no-op floor for simple queries, escalation for hard turns, classify-never-escalates, and determinism.

Verification: tsc --noEmit clean; vitest 21 routing + 25 existing search-route tests pass; npm run build clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
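For readers without the diff open, the signal derivation described in the commit message can be pictured roughly as follows. This is a minimal sketch, not the code in server/routes/search.ts: the name deriveSignalsSketch, the exact coercion rules (including flooring fractional counts and treating Infinity as 0), and the defaulted multiEntity parameter are all assumptions. Only the three field names come from this PR.

```typescript
// Hypothetical shape: the field names follow the PR description,
// everything else here is assumed for illustration.
interface RouteSignalsSketch {
  inputChars: number;
  sourceCount: number;
  multiEntity: boolean;
}

// Pure: the result depends only on the arguments (no Date, no Math.random),
// so replaying the same request yields the same signals.
function deriveSignalsSketch(
  query: string,
  sourceCount: number,
  multiEntity: boolean = false,
): RouteSignalsSketch {
  // NaN, Infinity, and negative counts collapse to 0 so that no NaN can
  // reach a downstream complexity score.
  const safeCount =
    Number.isFinite(sourceCount) && sourceCount > 0 ? Math.floor(sourceCount) : 0;
  return { inputChars: query.length, sourceCount: safeCount, multiEntity };
}
```

Because the function reads nothing but its arguments, calling it twice with the same query and count returns structurally identical signals, which is the property the replay-safety claim rests on.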
🤖 Augment PR Summary
Summary: This PR completes “Track B” of the LLM-router rollout by routing the search pipeline’s Gemini calls through the shared deterministic router, centralizing model choice and enabling controlled escalation on hard turns.
Changes:
Technical Notes: Routing remains replay-safe (pure + deterministic), and the classify pool remains a guaranteed no-op (single-candidate floor).
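To make "single-candidate floor" and "controlled escalation" concrete, here is a rough sketch of a floor-plus-escalation pool. It is not the router in shared/llm/router.ts: the pool layout, the scoring weights, the 0.5 threshold, and the names routeSketch and complexitySketch are assumptions made for illustration. The two model names and the light/balanced tier labels are taken from this PR.

```typescript
type RoleSketch = "classify" | "extract" | "synthesize";

interface SignalsSketch {
  inputChars: number;
  sourceCount: number;
  multiEntity: boolean;
}

interface PoolSketch {
  floor: string;
  escalated?: string; // absent means a single-candidate pool
}

const FLOOR_MODEL = "gemini-3.1-flash-lite-preview";
const BALANCED_MODEL = "gemini-3-flash-preview";

// Assumed pool layout: classify has no escalation target at all.
const POOLS_SKETCH: Record<RoleSketch, PoolSketch> = {
  classify: { floor: FLOOR_MODEL },
  extract: { floor: FLOOR_MODEL, escalated: BALANCED_MODEL },
  synthesize: { floor: FLOOR_MODEL, escalated: BALANCED_MODEL },
};

// Assumed scoring: each signal is clamped to [0, 1] and weighted.
function complexitySketch(s: SignalsSketch): number {
  const length = Math.min(s.inputChars / 2000, 1);
  const sources = Math.min(s.sourceCount / 20, 1);
  const entities = s.multiEntity ? 1 : 0;
  return 0.4 * length + 0.3 * sources + 0.3 * entities;
}

function routeSketch(
  role: RoleSketch,
  s: SignalsSketch,
): { model: string; tier: "light" | "balanced"; complexity: number } {
  const pool = POOLS_SKETCH[role];
  const complexity = complexitySketch(s);
  // Escalate only when the pool has a second candidate and the score is high.
  if (pool.escalated !== undefined && complexity >= 0.5) {
    return { model: pool.escalated, tier: "balanced", complexity };
  }
  return { model: pool.floor, tier: "light", complexity };
}
```

Under these assumptions a classify call can never leave the floor, whatever the signals, because its pool simply has nothing to escalate to. A short single-entity extract scores near zero and also stays on the floor.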
// flash-lite, escalates only on heavy local context).
const { model: credModel } = routeSearchModel(
  "extract",
  searchRouteSignals(query, 0),
searchRouteSignals(query, 0) here only reflects the raw query, but the actual Gemini prompt includes synthesized.answer plus potentially large localContext, so routing may stay on the floor even when the input is long/complex. Consider deriving signals from the real prompt/context size (or passing a meaningful sourceCount) so routing decisions match the workload.
Severity: medium
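One way to act on this comment, sketched under assumptions: measure the pieces that actually go into the prompt rather than the raw query alone. The helper signalsFromPromptSketch is hypothetical and not part of this PR, and the variable names in the usage lines only mirror the identifiers mentioned in the comment (synthesized.answer, localContext).

```typescript
// Hypothetical helper for illustration; not an API from this PR.
interface PromptSignalsSketch {
  inputChars: number;
  sourceCount: number;
}

function signalsFromPromptSketch(
  promptParts: ReadonlyArray<string | undefined>,
  sourceCount: number,
): PromptSignalsSketch {
  // Sum the length of every piece that will be sent to the model,
  // skipping pieces that are absent.
  let inputChars = 0;
  for (const part of promptParts) {
    inputChars += part?.length ?? 0;
  }
  // Same defensive coercion the PR describes for source counts.
  const safeCount =
    Number.isFinite(sourceCount) && sourceCount > 0 ? Math.floor(sourceCount) : 0;
  return { inputChars, sourceCount: safeCount };
}

// Usage with stand-in values: the signal now reflects the full prompt size.
const query = "who founded Acme?";
const synthesizedAnswer = "A".repeat(3000);
const localContext = "B".repeat(5000);
const credSignals = signalsFromPromptSketch([query, synthesizedAnswer, localContext], 0);
```

This stays pure and cheap (string lengths only), so it would not weaken the determinism guarantee, but it would mean the credibility-enrichment call can escalate where it previously stayed on the floor.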
What

Track B of the LLM-router rollout. Wires the 7 hardcoded gemini-3.1-flash-lite-preview call sites in server/routes/search.ts through the shared planner-on-a-pool router (shared/llm/router.ts, merged in #460), so model choice is owned in one place and genuinely hard turns can escalate.

ADDITIVE + behavior-preserving. The floor of every classify/extract/synthesize pool is the same flash-lite model, and signals are derived cheaply + locally (query length -> inputChars, retrieved source count -> sourceCount, comparison branches -> multiEntity). A simple single-entity query routes to the exact same model as before; only long, analytical, multi-entity, or many-source turns escalate to gemini-3-flash-preview.

Call sites wired (server/routes/search.ts)

- classifyQueryWithLLM query classification: classify
- agent_synthesize trace (synthesizeResults): synthesize. synthesizeResults itself lives in server/agentHarness.ts (out of scope); this routes + labels the search-side decision
- why-this-team credibility enrichment: extract
- multi-entity comparison extraction: extract (multiEntity: true)
- single-entity extraction: extract (allSnippets.length)
- founder-direction extraction: extract (genWebSnippets.length)

Observability

The chosen model lands in each trace step's tool field, and the route reason ("<model> -- floor light (complexity N.NN)" / "<model> -- escalated to balanced (complexity N.NN)") is appended to the step's detail, matching the existing SearchTraceEntry shape exactly. Escalations are visible in "How we got this answer".

Reliability (.claude/rules/agentic_reliability.md)

- searchRouteSignals is a pure function (no Date/random), so routing is replay-safe.
- NaN/negative source counts are coerced to 0 (no NaN leak into the complexity score).
- AbortController/Promise.race request-budget gates and the 4-layer grounding pipeline are untouched.

Tests

server/searchRouteLlmRouting.test.ts: scenario-based (founder bare-name lookup, investor head-to-head comparison, banker diligence teardown), asserting the no-op floor for simple queries, escalation for hard turns, that classify never escalates regardless of complexity, and determinism.

Verification

- npx tsc --noEmit --pretty false: 0 errors
- npx vitest run server/searchRouteLlmRouting.test.ts shared/llm/router.test.ts: 21 passed
- npx vitest run server/searchRoute.test.ts: 25 passed (no regression)
- npm run build: clean

Do not enable auto-merge; this PR is for review.
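For reference while reviewing the trace output, the route-reason strings quoted under Observability could be produced along these lines. This is a sketch under assumptions: TraceStepSketch is a stand-in (the real SearchTraceEntry shape is not shown in this PR description), the " | " separator used when appending to detail is invented, and routeReasonSketch and labelStepSketch are hypothetical names. Only the two reason formats are taken from the description.

```typescript
// Stand-in for the trace entry; only tool and detail are named in the PR.
interface TraceStepSketch {
  step: string;
  tool: string;
  detail: string;
}

// Builds "<model> -- floor light (complexity N.NN)" or
// "<model> -- escalated to balanced (complexity N.NN)".
function routeReasonSketch(model: string, escalated: boolean, complexity: number): string {
  const tier = escalated ? "escalated to balanced" : "floor light";
  return `${model} -- ${tier} (complexity ${complexity.toFixed(2)})`;
}

// Returns a new step with the chosen model in tool and the reason appended
// to detail. The " | " separator is an assumption.
function labelStepSketch(
  entry: TraceStepSketch,
  model: string,
  escalated: boolean,
  complexity: number,
): TraceStepSketch {
  const reason = routeReasonSketch(model, escalated, complexity);
  return { ...entry, tool: model, detail: `${entry.detail} | ${reason}` };
}
```

Returning a copy rather than mutating the entry keeps the labeling step side-effect free, in line with the purity constraint the PR applies to signal derivation.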
🤖 Generated with Claude Code