feat(llm): eval-gated demote-down — the router cost lever by HomenShum · Pull Request #466 · HomenShum/nodebench-ai

HomenShum · 2026-06-02T19:09:27Z

Roadmap #2 of the LLM router. Adds DEMOTE-DOWN — the Prism cost lever.

A pool opts in with mode: "demote" to default to its quality TARGET (heaviest) and drop to a cheaper candidate only on clearly-light turns, and only to models eval-CLEARED for the task class. This targets over-provisioned paths (e.g. a persona router pinning Opus on every turn).

router (shared/llm/router.ts)

RouteMode + TaskPool.mode? (default escalate — existing pools UNCHANGED)
RouteDecision.demoted (additive)
DEMOTE_THRESHOLD (0.25) — only trivial turns demote
isDemoteCleared: conservative static DEMOTE_CLEARANCE + pluggable RouteOptions.clearance hook (the seam for the live agentRunJudge feed)
FAIL-SAFE: nothing cleared → stay on target; quality never dropped un-cleared
agent_reason is the first demote pool (Opus target, demote→Sonnet). No live caller yet → behavior-preserving today; the cache-sticky agent wiring (roadmap #3) is the first caller.

Pure + DETERMINISTIC. Additive — the opts param + demoted field don't touch /ask or search callers (tsc clean).
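The demote path described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in shared/llm/router.ts. Only the names `RouteMode`, `TaskPool.mode`, `RouteDecision.demoted`, `DEMOTE_THRESHOLD` (0.25), `DEMOTE_CLEARANCE`, `isDemoteCleared`, `RouteOptions.clearance` and `forceTarget` come from this PR. The `candidates` field, the hook signature, the placement of `forceTarget` on the options object, the model ids, the allowlist contents and the helper name `routeDemotePool` are hypothetical.

```typescript
// Sketch only. Anything marked "assumed" is not taken from the PR.
type RouteMode = "escalate" | "demote";

interface TaskPool {
  mode?: RouteMode;               // default "escalate"
  candidates: readonly string[];  // assumed: ordered cheapest -> heaviest
}

interface RouteOptions {
  forceTarget?: boolean;          // assumed to live on the options object
  // Assumed signature: true/false overrides the static allowlist,
  // undefined defers to it.
  clearance?: (taskClass: string, model: string) => boolean | undefined;
}

interface RouteDecision {
  model: string;
  demoted: boolean;
}

const DEMOTE_THRESHOLD = 0.25;

// Assumed contents: one conservative entry for the first demote pool.
const DEMOTE_CLEARANCE: Record<string, readonly string[]> = {
  agent_reason: ["sonnet"],
};

function isDemoteCleared(
  taskClass: string,
  model: string,
  opts: RouteOptions = {},
): boolean {
  // A live verdict, when present, wins in either direction.
  const live = opts.clearance?.(taskClass, model);
  if (live !== undefined) return live;
  return (DEMOTE_CLEARANCE[taskClass] ?? []).indexOf(model) !== -1;
}

// Hypothetical helper covering demote-mode pools only; escalate-mode
// routing is unchanged by the PR and is not modelled here.
function routeDemotePool(
  taskClass: string,
  pool: TaskPool,
  score: number,
  opts: RouteOptions = {},
): RouteDecision {
  const heaviest = pool.candidates[pool.candidates.length - 1];
  if (heaviest === undefined) throw new Error("pool has no candidates");

  let chosen = heaviest;
  if (!opts.forceTarget && score < DEMOTE_THRESHOLD) {
    // Cheapest cleared candidate below the target wins.
    for (const model of pool.candidates.slice(0, -1)) {
      if (isDemoteCleared(taskClass, model, opts)) {
        chosen = model;
        break;
      }
    }
  }
  // Fail-safe: if nothing was cleared, chosen is still the target.
  return { model: chosen, demoted: chosen !== heaviest };
}
```

In this sketch the hook returns `undefined` to defer to the static allowlist, which is one way to get the "override both ways" behaviour the tests describe; the real hook may be shaped differently.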

Tests (+8 scenario)

demote trivial→Sonnet; hard→Opus; forceTarget→Opus; fail-safe un-cleared→Opus; live-clearance override both ways; determinism; threshold boundary; escalate pools never demoted.

Verification

tsc --noEmit clean; 20 router tests pass; build clean.

🤖 Generated with Claude Code

Roadmap #2 of the LLM router. Adds DEMOTE-DOWN: a pool opts in with `mode: "demote"` to default to its quality TARGET (heaviest) and drop to a cheaper candidate ONLY on clearly-light turns, and ONLY to models eval-CLEARED for the task class. This is the Prism cost lever for over-provisioned paths (e.g. a persona router pinning Opus for every turn regardless of difficulty).

router (shared/llm/router.ts):
- `RouteMode = "escalate" | "demote"`; `TaskPool.mode?` (default "escalate"; every existing pool is UNCHANGED).
- `RouteDecision.demoted` (additive field).
- `DEMOTE_THRESHOLD` (0.25): only clearly-trivial turns demote.
- `isDemoteCleared(taskClass, model, opts)`: a conservative static `DEMOTE_CLEARANCE` allowlist + a pluggable `RouteOptions.clearance` hook, the seam the live agentRunJudge / dogfood rolling-agreement feed plugs into.
- FAIL-SAFE: if no cheaper model is cleared, STAY on the target; quality is never dropped un-cleared. forceTarget always pins the target.
- `agent_reason` reframed as the first demote pool (target Opus, demote -> Sonnet on light turns). NO live caller yet (the cache-sticky agent wiring, roadmap #3, is the first caller), so this is behavior-preserving today.

Pure + DETERMINISTIC (no Date/random). Additive: the new optional `opts` param and the `demoted` field don't touch the /ask or search.ts callers (tsc clean).

Tests (shared/llm/router.test.ts, +8 scenario): demote a trivial agent turn -> Sonnet; hard turn -> stays Opus; forceTarget -> Opus; fail-safe (un-cleared) -> Opus; live-clearance override both directions; determinism; threshold boundary; escalate pools never report demoted.

Docs: docs/architecture/LLM_ROUTER.md roadmap updated.

Verification: tsc --noEmit clean, 20 router tests pass, build clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vercel · 2026-06-02T19:09:30Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
nodebench-ai	Ready	Preview, Comment	Jun 2, 2026 7:10pm

augmentcode · 2026-06-02T19:12:40Z

🤖 Augment PR Summary

Summary: This PR adds an opt-in demote-down routing mode to the shared LLM router, enabling cost savings on over-provisioned paths while keeping quality protected by an eval-style clearance gate.

Changes:

Introduces RouteMode (escalate default, demote opt-in) and TaskPool.mode.
Adds RouteDecision.demoted, DEMOTE_THRESHOLD, and a conservative static DEMOTE_CLEARANCE allowlist.
Adds RouteOptions.clearance hook to override static clearance (seam for future live eval feed).
Implements demote-mode routing: default to the heaviest target, demote only on clearly-light turns and only to cleared cheaper candidates (fail-safe: nothing cleared → stay on target).
Marks agent_reason as the first demote pool (Opus target, demote→Sonnet) and updates architecture docs/roadmap accordingly.
Adds scenario tests covering demotion behavior, fail-safe, override hook, determinism, threshold boundary, and ensuring escalate pools are unaffected.

Technical Notes: The demote mechanism is additive to existing callers (default mode remains escalate), and the clearance system is designed to be conservative until a live agentRunJudge/dogfood feed is wired.


augmentcode

Review completed. 2 suggestions posted.


augmentcode · 2026-06-02T19:12:42Z

+ *     un-cleared. agent_reason is the first demote pool (no live caller yet).
 *
 * Reliability (.claude/rules/agentic_reliability.md):
 *   - DETERMINISTIC: routeLLM is a pure function of (taskClass, signals, env).


shared/llm/router.ts:L42 — The header comment says routeLLM is a pure function of (taskClass, signals, env), but it now also depends on opts (and especially opts.clearance). Consider updating this invariant (or documenting that clearance must be deterministic for replay safety) so the determinism guarantee matches the new API.
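One way to keep the determinism invariant once `opts.clearance` is part of the input would be to build the hook from an immutable snapshot of the eval data, so a replay with the same snapshot always returns the same verdict. The sketch below assumes a hook that returns `boolean | undefined`; the snapshot shape, the `MIN_AGREEMENT` bar of 0.95 and the name `makeSnapshotClearance` are all hypothetical and are not part of this PR.

```typescript
// Hypothetical snapshot: taskClass -> model -> rolling judge-agreement rate.
type ClearanceSnapshot = Readonly<Record<string, Readonly<Record<string, number>>>>;

// Assumed hook shape: undefined means "no opinion, use the static allowlist".
type ClearanceHook = (taskClass: string, model: string) => boolean | undefined;

const MIN_AGREEMENT = 0.95; // assumed bar, not a value from the PR

function makeSnapshotClearance(snapshot: ClearanceSnapshot): ClearanceHook {
  // Deep-copy once so later mutation of the source object cannot change
  // decisions that were (or will be) replayed against this hook.
  const frozen: Record<string, Record<string, number>> = JSON.parse(
    JSON.stringify(snapshot),
  );

  return (taskClass, model) => {
    const byModel = frozen[taskClass];
    const rate = byModel ? byModel[model] : undefined;
    if (rate === undefined) return undefined; // no data: defer
    return rate >= MIN_AGREEMENT;
  };
}
```

A hook built this way reads no clock, network or mutable state at call time, so the router's output stays a pure function of (taskClass, signals, env, snapshot).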

Severity: low


augmentcode · 2026-06-02T19:12:42Z

+    // to the cheapest demote-CLEARED candidate below the target. Fail-safe:
+    // nothing cleared → stay on target (never sacrifice quality un-cleared).
+    let chosen = heaviest;
+    if (score < DEMOTE_THRESHOLD) {


shared/llm/router.ts:L298 — In demote mode, signals defaults to {}, so computeComplexityScore becomes 0 and a caller that forgets to pass signals will demote on any cleared candidate, which seems to contradict the “uncertainty resolves UP” guarantee. Consider treating missing/empty signal cases as uncertain and staying on the target (heaviest) rather than demoting.
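The suggestion amounts to separating "no signals were supplied" from "the signals indicate a trivial turn". A minimal sketch of such a guard follows. The `Signals` fields, the stand-in body of `computeComplexityScore` and the helper name `isClearlyLight` are assumptions made for illustration; the real signal shape and scorer in router.ts are not shown in this PR.

```typescript
// Hypothetical signal shape for illustration only.
interface Signals {
  promptTokens?: number;
  toolCount?: number;
}

const DEMOTE_THRESHOLD = 0.25;

// Stand-in scorer (assumed): maps signals to a value in [0, 1].
// With no fields set it returns 0, which mirrors the reported problem.
function computeComplexityScore(signals: Signals): number {
  const tokens = Math.min((signals.promptTokens ?? 0) / 4000, 1);
  const tools = Math.min((signals.toolCount ?? 0) / 4, 1);
  return Math.max(tokens, tools);
}

// True only when the caller supplied at least one signal AND the
// resulting score is below the demote threshold.
function isClearlyLight(signals?: Signals): boolean {
  if (signals === undefined) return false;
  const hasEvidence =
    signals.promptTokens !== undefined || signals.toolCount !== undefined;
  if (!hasEvidence) return false; // uncertainty resolves UP: stay on target
  return computeComplexityScore(signals) < DEMOTE_THRESHOLD;
}
```

Gating the demote branch on a check like this, instead of on the raw score alone, would mean a caller that forgets to pass signals stays on the heaviest model.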

Severity: medium


HomenShum enabled auto-merge (squash) June 2, 2026 19:09

vercel Bot deployed to Preview June 2, 2026 19:10

augmentcode Bot reviewed Jun 2, 2026


HomenShum closed this Jun 3, 2026

auto-merge was automatically disabled June 3, 2026 22:32
Pull request was closed

HomenShum deleted the feat/llm-router-demote branch June 3, 2026 22:32

HomenShum wants to merge 1 commit into main from feat/llm-router-demote
