feat(llm): eval-gated demote-down — the router cost lever#466
Roadmap #2 of the LLM router. Adds DEMOTE-DOWN: a pool opts in with `mode: "demote"` to default to its quality TARGET (heaviest) and drop to a cheaper candidate ONLY on clearly-light turns, and ONLY to models eval-CLEARED for the task class. This is the Prism cost lever for over-provisioned paths (e.g. a persona router pinning Opus for every turn regardless of difficulty).

router (shared/llm/router.ts):
- `RouteMode = "escalate" | "demote"`; `TaskPool.mode?` (default "escalate", so every existing pool is UNCHANGED).
- `RouteDecision.demoted` (additive field).
- `DEMOTE_THRESHOLD` (0.25): only clearly-trivial turns demote.
- `isDemoteCleared(taskClass, model, opts)`: a conservative static `DEMOTE_CLEARANCE` allowlist + a pluggable `RouteOptions.clearance` hook, the seam the live agentRunJudge / dogfood rolling-agreement feed plugs into.
- FAIL-SAFE: if no cheaper model is cleared, STAY on the target; quality is never dropped un-cleared. forceTarget always pins the target.
- `agent_reason` reframed as the first demote pool (target Opus, demote -> Sonnet on light turns). NO live caller yet (the cache-sticky agent wiring, roadmap #3, is the first caller), so this is behavior-preserving today.

Pure + DETERMINISTIC (no Date/random). Additive: the new optional `opts` param and the `demoted` field don't touch the /ask or search.ts callers (tsc clean).

Tests (shared/llm/router.test.ts, +8 scenario): demote a trivial agent turn -> Sonnet; hard turn -> stays Opus; forceTarget -> Opus; fail-safe (un-cleared) -> Opus; live-clearance override both directions; determinism; threshold boundary; escalate pools never report demoted.

Docs: docs/architecture/LLM_ROUTER.md roadmap updated.

Verification: tsc --noEmit clean, 20 router tests pass, build clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
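The demote-down decision described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in shared/llm/router.ts: the function name `routeDemotePool`, the model ids `"sonnet"` and `"opus"`, the cheapest-first candidate ordering, the contents of `DEMOTE_CLEARANCE`, and the exact shape of the `clearance` hook are all assumptions. Only the 0.25 threshold, the strict `<` comparison, and the fail-safe behavior come from the PR text and diff.

```typescript
// Hypothetical sketch of demote-down routing; shapes and values are assumed.
type ClearanceFn = (taskClass: string, model: string) => boolean | undefined;

interface RouteOptions {
  forceTarget?: boolean;   // always pin the target when true
  clearance?: ClearanceFn; // assumed hook shape: undefined means "no opinion"
}

interface RouteDecision {
  model: string;
  demoted: boolean;
}

const DEMOTE_THRESHOLD = 0.25; // value stated in the PR

// Assumed contents of the static allowlist (placeholder model ids).
const DEMOTE_CLEARANCE: Record<string, readonly string[]> = {
  agent_reason: ["sonnet"],
};

function isDemoteCleared(taskClass: string, model: string, opts: RouteOptions = {}): boolean {
  // A live verdict, when present, overrides the static allowlist in both directions.
  const live = opts.clearance?.(taskClass, model);
  if (live !== undefined) return live;
  return (DEMOTE_CLEARANCE[taskClass] ?? []).some((m) => m === model);
}

// `candidates` is assumed to be ordered cheapest first, heaviest (the target) last.
function routeDemotePool(
  taskClass: string,
  candidates: readonly string[],
  score: number,
  opts: RouteOptions = {},
): RouteDecision {
  const heaviest = candidates[candidates.length - 1];
  if (heaviest === undefined) throw new Error("empty candidate pool");

  // Default to the target; only clearly-light turns are eligible to demote.
  if (opts.forceTarget || !(score < DEMOTE_THRESHOLD)) {
    return { model: heaviest, demoted: false };
  }

  // Cheapest cleared candidate below the target, if any.
  const cleared = candidates
    .slice(0, -1)
    .filter((m) => isDemoteCleared(taskClass, m, opts))[0];

  // Fail-safe: nothing cleared means stay on the target.
  return cleared === undefined
    ? { model: heaviest, demoted: false }
    : { model: cleared, demoted: true };
}
```

Under these assumptions, `routeDemotePool("agent_reason", ["sonnet", "opus"], 0.1)` picks the cheaper model, while a score of exactly 0.25, a `forceTarget` call, or an un-cleared task class all stay on the target.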
Augment PR Summary
Summary: This PR adds an opt-in demote-down routing mode to the shared LLM router, enabling cost savings on over-provisioned paths while keeping quality protected by an eval-style clearance gate.
Changes:
Technical Notes: The demote mechanism is additive to existing callers (default mode remains
 * un-cleared. agent_reason is the first demote pool (no live caller yet).
 *
 * Reliability (.claude/rules/agentic_reliability.md):
 * - DETERMINISTIC: routeLLM is a pure function of (taskClass, signals, env).
shared/llm/router.ts:L42 — The header comment says routeLLM is a pure function of (taskClass, signals, env), but it now also depends on opts (and especially opts.clearance). Consider updating this invariant (or documenting that clearance must be deterministic for replay safety) so the determinism guarantee matches the new API.
Severity: low
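One way to keep the determinism guarantee once `opts.clearance` is in play is to build the hook from an immutable snapshot of the clearance data, so that replaying the same inputs against the same snapshot yields the same answer. The sketch below illustrates that idea only; `ClearanceSnapshot`, `clearanceFromSnapshot`, the `version` field, and the hook signature are hypothetical names and shapes, not part of the PR.

```typescript
// Hypothetical sketch: a clearance hook that is a pure function of a frozen snapshot.
type ClearanceFn = (taskClass: string, model: string) => boolean | undefined;

interface ClearanceSnapshot {
  readonly version: string; // assumed: recorded alongside the decision for replay
  readonly cleared: Readonly<Record<string, readonly string[]>>;
}

function clearanceFromSnapshot(snapshot: ClearanceSnapshot): ClearanceFn {
  return (taskClass, model) => {
    const models = snapshot.cleared[taskClass];
    // No entry for this task class: return undefined so the caller can
    // fall back to its static allowlist.
    if (models === undefined) return undefined;
    return models.some((m) => m === model);
  };
}

// Example snapshot with placeholder model ids.
const exampleSnapshot: ClearanceSnapshot = {
  version: "example-1",
  cleared: { agent_reason: ["sonnet"] },
};
const exampleClearance = clearanceFromSnapshot(exampleSnapshot);
```

Because the returned function reads nothing but its arguments and the captured snapshot, the header invariant could then be stated as "a pure function of (taskClass, signals, env, opts)", provided callers only pass hooks built this way.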
// to the cheapest demote-CLEARED candidate below the target. Fail-safe:
// nothing cleared → stay on target (never sacrifice quality un-cleared).
let chosen = heaviest;
if (score < DEMOTE_THRESHOLD) {
shared/llm/router.ts:L298 — In demote mode, signals defaults to {}, so computeComplexityScore becomes 0 and a caller that forgets to pass signals will demote on any cleared candidate, which seems to contradict the “uncertainty resolves UP” guarantee. Consider treating missing/empty signal cases as uncertain and staying on the target (heaviest) rather than demoting.
Severity: medium
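The suggestion above, treating missing or empty signals as uncertain rather than as a score of 0, could look something like the following guard. It is a sketch under assumptions: the `RouteSignals` fields, `hasEvidence`, and `mayDemote` are hypothetical, and the real `computeComplexityScore` is represented only by a scoring callback passed in by the caller.

```typescript
// Hypothetical sketch: refuse to demote when there is no signal evidence at all.
interface RouteSignals {
  // Assumed example fields; the real signal shape is not shown in the PR.
  promptTokens?: number;
  toolCalls?: number;
}

const DEMOTE_THRESHOLD = 0.25; // value stated in the PR

// True only when at least one signal field is actually populated.
function hasEvidence(signals?: RouteSignals): boolean {
  if (signals === undefined) return false;
  const present: RouteSignals = signals;
  return (Object.keys(present) as Array<keyof RouteSignals>).some(
    (k) => present[k] !== undefined,
  );
}

// `scoreOf` stands in for the router's complexity scorer.
function mayDemote(
  signals: RouteSignals | undefined,
  scoreOf: (s: RouteSignals) => number,
): boolean {
  // Uncertainty resolves UP: no evidence means stay on the target.
  if (signals === undefined || !hasEvidence(signals)) return false;
  return scoreOf(signals) < DEMOTE_THRESHOLD;
}
```

With this guard, a caller that forgets to pass signals stays on the heaviest model, and only a turn with real evidence of being light becomes eligible for the clearance check.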
Roadmap #2 of the LLM router. Adds DEMOTE-DOWN — the Prism cost lever.
A pool opts in with `mode: "demote"` to default to its quality TARGET (heaviest) and drop to a cheaper candidate only on clearly-light turns, and only to models eval-CLEARED for the task class. For over-provisioned paths (e.g. a persona router pinning Opus every turn).

router (shared/llm/router.ts)
- `RouteMode` + `TaskPool.mode?` (default `escalate`; existing pools UNCHANGED)
- `RouteDecision.demoted` (additive)
- `DEMOTE_THRESHOLD` (0.25): only trivial turns demote
- `isDemoteCleared`: conservative static `DEMOTE_CLEARANCE` + pluggable `RouteOptions.clearance` hook (the seam for the live agentRunJudge feed)
- `agent_reason` is the first demote pool (Opus target, demote→Sonnet). No live caller yet → behavior-preserving today; the cache-sticky agent wiring (roadmap #3) is the first caller.

Pure + DETERMINISTIC. Additive: the `opts` param + `demoted` field don't touch /ask or search callers (tsc clean).

Tests (+8 scenario)
demote trivial→Sonnet; hard→Opus; forceTarget→Opus; fail-safe un-cleared→Opus; live-clearance override both ways; determinism; threshold boundary; escalate pools never demoted.
Verification
tsc --noEmit clean; 20 router tests pass; build clean.
🤖 Generated with Claude Code