feat(llm): route search.ts classify/extract/synthesize through the LLM router #463

Open
HomenShum wants to merge 1 commit into main from feat/llm-router-search

@HomenShum
Owner

What

Track B of the LLM-router rollout. Wires the 7 hardcoded gemini-3.1-flash-lite-preview call sites in server/routes/search.ts through the shared planner-on-a-pool router (shared/llm/router.ts, merged in #460), so model choice is owned in one place and genuinely hard turns can escalate.

ADDITIVE + behavior-preserving. The floor of every classify/extract/synthesize pool is the same flash-lite model, and signals are derived cheaply + locally (query length -> inputChars, retrieved source count -> sourceCount, comparison branches -> multiEntity). A simple single-entity query routes to the exact same model as before — only long, analytical, multi-entity, or many-source turns escalate to gemini-3-flash-preview.
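
For reviewers, a minimal sketch of the floor-plus-escalation idea described above. This is not the code in shared/llm/router.ts: the names `pickModel`, `complexityScore`, `POOLS`, and the weights and threshold are assumptions made up for illustration. Only the two model ids and the three task classes come from this PR.

```typescript
// Hypothetical sketch of pool-based routing; the real router may score and select differently.
type TaskClass = "classify" | "extract" | "synthesize";

interface RouteSignals {
  inputChars: number;   // derived from query length
  sourceCount: number;  // retrieved source count
  multiEntity: boolean; // true on comparison branches
}

const FLOOR_MODEL = "gemini-3.1-flash-lite-preview";
const ESCALATED_MODEL = "gemini-3-flash-preview";

interface Pool {
  floor: string;
  escalated?: string; // absent for single-candidate pools
}

// classify has a single candidate, so it can never escalate.
const POOLS: Record<TaskClass, Pool> = {
  classify: { floor: FLOOR_MODEL },
  extract: { floor: FLOOR_MODEL, escalated: ESCALATED_MODEL },
  synthesize: { floor: FLOOR_MODEL, escalated: ESCALATED_MODEL },
};

// Assumed weights: each signal is clamped to [0, 1] before weighting.
function complexityScore(s: RouteSignals): number {
  const length = Math.min(s.inputChars / 2000, 1) * 0.4;
  const sources = Math.min(s.sourceCount / 10, 1) * 0.3;
  const entities = s.multiEntity ? 0.3 : 0;
  return length + sources + entities;
}

const ESCALATION_THRESHOLD = 0.5; // assumed value

function pickModel(task: TaskClass, s: RouteSignals): string {
  const pool = POOLS[task];
  if (pool.escalated !== undefined && complexityScore(s) >= ESCALATION_THRESHOLD) {
    return pool.escalated;
  }
  return pool.floor;
}
```

Under these assumed weights, a short single-entity query with a couple of sources stays on the floor model, while a long multi-entity turn with many sources escalates for extract and synthesize but not for classify.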

Call sites wired (server/routes/search.ts)

| Site | Task class | Notes |
| --- | --- | --- |
| classifyQueryWithLLM (query classification) | classify | single-candidate pool; guaranteed no-op, just centralizes the model id |
| agent_synthesize trace (synthesizeResults) | synthesize | surfaces chosen model + reason in the trace. The wire-level model for synthesizeResults itself lives in server/agentHarness.ts (out of scope); this routes + labels the search-side decision |
| why-this-team credibility enrichment | extract | structured extraction over synthesized result + local context |
| multi-entity comparison extraction | extract (multiEntity: true) | most likely branch to escalate |
| single-entity extraction | extract | source count = allSnippets.length |
| founder-direction extraction | extract | source count = genWebSnippets.length |

Observability

The chosen model lands in each trace step's tool field, and the route reason ("<model> -- floor light (complexity N.NN)" / "<model> -- escalated to balanced (complexity N.NN)") is appended to the step's detail — matching the existing SearchTraceEntry shape exactly. Escalations are visible in "How we got this answer".
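
As an illustration of the reason format quoted above, here is a hedged sketch of building the string and attaching it to a trace step. `TraceStepSketch` is a stand-in with assumed fields, not the real SearchTraceEntry; `formatRouteReason`, `annotateStep`, and the " | " separator are hypothetical as well.

```typescript
// Stand-in for the trace entry; the real SearchTraceEntry shape is not shown in this PR text.
interface TraceStepSketch {
  step: string;
  tool?: string;
  detail?: string;
}

// Builds "<model> -- floor light (complexity N.NN)" or
// "<model> -- escalated to balanced (complexity N.NN)".
function formatRouteReason(model: string, escalated: boolean, complexity: number): string {
  const tier = escalated ? "escalated to balanced" : "floor light";
  return `${model} -- ${tier} (complexity ${complexity.toFixed(2)})`;
}

// Returns a new step with the model in `tool` and the reason appended to `detail`.
// The " | " separator is an assumption.
function annotateStep(step: TraceStepSketch, model: string, reason: string): TraceStepSketch {
  const detail = step.detail ? `${step.detail} | ${reason}` : reason;
  return { ...step, tool: model, detail };
}
```

Returning a new object rather than mutating the step keeps the helper side-effect free, which fits the replay-safety goal stated in this PR.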

Reliability (.claude/rules/agentic_reliability.md)

  • DETERMINISTIC: searchRouteSignals is a pure function (no Date/random), so routing is replay-safe. NaN/negative source counts are coerced to 0 (no NaN leak into the complexity score).
  • The AbortController / Promise.race request-budget gates and the 4-layer grounding pipeline are untouched.
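
The purity and coercion guarantees above could look roughly like the following. This is a sketch under assumptions: `searchRouteSignalsSketch` and `sanitizeCount` are hypothetical names, the third parameter is invented for illustration, and treating Infinity as 0 is a choice made here rather than something the PR states.

```typescript
interface RouteSignalsSketch {
  inputChars: number;
  sourceCount: number;
  multiEntity: boolean;
}

// Coerces NaN, negative, and (by assumption) non-finite counts to 0.
function sanitizeCount(n: number): number {
  return Number.isFinite(n) && n > 0 ? Math.floor(n) : 0;
}

// Pure: depends only on its arguments, with no clock, randomness, or I/O.
function searchRouteSignalsSketch(
  query: string,
  sourceCount: number,
  multiEntity = false,
): RouteSignalsSketch {
  return {
    inputChars: query.length,
    sourceCount: sanitizeCount(sourceCount),
    multiEntity,
  };
}
```

Because the function reads nothing but its inputs, replaying a recorded request yields the same signals and therefore the same routing decision.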

Tests

server/searchRouteLlmRouting.test.ts — scenario-based (founder bare-name lookup, investor head-to-head comparison, banker diligence teardown), asserting:

  • the no-op floor for simple single-entity queries (the additive guarantee),
  • escalation for hard multi-entity / analytical / many-source turns,
  • classify never escalates regardless of complexity,
  • determinism (identical inputs -> identical model).

Verification

  • npx tsc --noEmit --pretty false — 0 errors
  • npx vitest run server/searchRouteLlmRouting.test.ts shared/llm/router.test.ts — 21 passed
  • npx vitest run server/searchRoute.test.ts — 25 passed (no regression)
  • npm run build — clean

Do not enable auto-merge — for review.

🤖 Generated with Claude Code

…M router

Track B of the LLM-router rollout. The /search route hardcoded
gemini-3.1-flash-lite-preview at 7 Gemini call sites. Wire each through the
shared planner-on-a-pool router (shared/llm/router.ts) so model choice is owned
in one place and long / analytical / multi-entity turns can escalate.

ADDITIVE + behavior-preserving: the floor of every classify/extract/synthesize
pool is the same flash-lite model, and signals are derived cheaply + locally
(query length, retrieved source count, multiEntity for comparison branches), so
a simple single-entity query routes to the exact same model as before.

Call sites wired (server/routes/search.ts):
- classifyQueryWithLLM (query classification)  -> routeLLM classify [single-candidate pool, guaranteed no-op]
- agent_synthesize trace (synthesizeResults)   -> routeLLM synthesize [surfaces chosen model + reason in trace; wire-level call lives in agentHarness.ts]
- why-this-team credibility enrichment         -> routeLLM extract
- multi-entity comparison extraction           -> routeLLM extract (multiEntity true)
- single-entity extraction                     -> routeLLM extract
- founder-direction extraction                 -> routeLLM extract

Observability: the chosen model lands in each trace step tool field and the
route reason is appended to the step detail, matching the existing
SearchTraceEntry shape exactly.

Reliability (.claude/rules/agentic_reliability.md): searchRouteSignals is a pure
function (no Date/random) so routing is DETERMINISTIC + replay-safe; NaN/negative
source counts are coerced to 0. The AbortController/Promise.race budget gates and
the grounding pipeline are untouched.

Tests: server/searchRouteLlmRouting.test.ts -- scenario-based (founder lookup,
investor comparison, banker diligence), asserting the no-op floor for simple
queries, escalation for hard turns, classify-never-escalates, and determinism.

Verification: tsc --noEmit clean; vitest 21 routing + 25 existing search-route
tests pass; npm run build clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented Jun 2, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| nodebench-ai | Ready | Preview, Comment | Jun 2, 2026 7:34am |

@augmentcode

augmentcode Bot commented Jun 2, 2026

🤖 Augment PR Summary

Summary: This PR completes “Track B” of the LLM-router rollout by routing the search pipeline’s Gemini calls through the shared deterministic router, centralizing model choice and enabling controlled escalation on hard turns.

Changes:

  • Added deterministic signal derivation in server/routes/search.ts via searchRouteSignals() (query length, source count, multi-entity flag, analytical hint).
  • Introduced routeSearchModel() to map task class + signals to a router decision (routeLLM) and emit a trace-friendly reason string.
  • Rewired search route call sites that previously hardcoded gemini-3.1-flash-lite-preview (classify, extract branches, and search-side synthesize decision) to use the router-selected model.
  • Enhanced trace observability by recording the chosen model in the trace step’s tool field and appending route reasoning to the step detail.
  • Added scenario-based Vitest coverage (server/searchRouteLlmRouting.test.ts) to assert the behavior-preserving floor, escalation on hard multi-entity/analytical/many-source turns, and determinism.

Technical Notes: Routing remains replay-safe (pure + deterministic), and the classify pool remains a guaranteed no-op (single-candidate floor).

@augmentcode augmentcode Bot left a comment

Review completed. 1 suggestion posted.

Comment thread server/routes/search.ts
// flash-lite, escalates only on heavy local context).
const { model: credModel } = routeSearchModel(
  "extract",
  searchRouteSignals(query, 0),

@augmentcode augmentcode Bot Jun 2, 2026

searchRouteSignals(query, 0) here only reflects the raw query, but the actual Gemini prompt includes synthesized.answer plus potentially large localContext, so routing may stay on the floor even when the input is long/complex. Consider deriving signals from the real prompt/context size (or passing a meaningful sourceCount) so routing decisions match the workload.

Severity: medium
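
One possible shape for the suggestion above, offered as a sketch only: measure every string that feeds the prompt instead of the raw query. `promptInputChars` is a hypothetical helper, and the sample values are invented to show the size gap. The names `synthesized.answer` and `localContext` are taken from the review comment.

```typescript
// Hypothetical helper: total characters across every part that goes into the prompt.
// Missing parts (null/undefined) count as zero.
function promptInputChars(parts: Array<string | null | undefined>): number {
  return parts.reduce<number>((total, part) => total + (part?.length ?? 0), 0);
}

// Invented sample values to illustrate the gap between query size and prompt size.
const query = "Acme Robotics";
const synthesized = { answer: "A".repeat(1800) };
const localContext = "B".repeat(2500);

const rawChars = query.length; // what a query-only signal would see
const realChars = promptInputChars([query, synthesized.answer, localContext]);
```

Feeding a prompt-sized character count into the signals would let the credibility-enrichment call escalate on heavy local context, which is what the code comment in the diff says it should do.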

@HomenShum HomenShum enabled auto-merge (squash) June 2, 2026 07:37