
feat(llm): eval-gated demote-down — the router cost lever#466

Closed
HomenShum wants to merge 1 commit into main from feat/llm-router-demote

Conversation

@HomenShum
Owner

Roadmap #2 of the LLM router. Adds DEMOTE-DOWN — the Prism cost lever.

A pool opts in with mode: "demote" to default to its quality TARGET (heaviest) and drop to a cheaper candidate only on clearly-light turns, and only to models eval-CLEARED for the task class. This is aimed at over-provisioned paths (e.g. a persona router pinning Opus on every turn regardless of difficulty).

router (shared/llm/router.ts)

  • RouteMode + TaskPool.mode? (default escalate — existing pools UNCHANGED)
  • RouteDecision.demoted (additive)
  • DEMOTE_THRESHOLD (0.25) — only trivial turns demote
  • isDemoteCleared: conservative static DEMOTE_CLEARANCE + pluggable RouteOptions.clearance hook (the seam for the live agentRunJudge feed)
  • FAIL-SAFE: nothing cleared → stay on target; quality never dropped un-cleared
  • agent_reason is the first demote pool (Opus target, demote→Sonnet). No live caller yet, so this is behavior-preserving today; the cache-sticky agent wiring (roadmap #3) is the first caller.

Pure + DETERMINISTIC. Additive — the opts param + demoted field don't touch /ask or search callers (tsc clean).
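
To make the demote branch concrete, here is a minimal sketch of the decision described above. Only `RouteMode`, `TaskPool.mode`, `RouteDecision.demoted`, `DEMOTE_THRESHOLD` (0.25), `forceTarget`, and the fail-safe rule come from this PR. The `candidates` field, its cheapest-to-heaviest ordering, the `routeDemote` helper, and the model strings are assumptions for illustration, not the code in shared/llm/router.ts.

```typescript
type RouteMode = "escalate" | "demote";

interface TaskPool {
  candidates: string[]; // assumed: ordered cheapest -> heaviest
  mode?: RouteMode; // omitted means "escalate"
}

interface RouteDecision {
  model: string;
  demoted: boolean;
}

const DEMOTE_THRESHOLD = 0.25;

// Hypothetical helper covering only the demote branch of a router.
function routeDemote(
  pool: TaskPool,
  score: number,
  isCleared: (model: string) => boolean,
  forceTarget: boolean = false,
): RouteDecision {
  if (pool.mode !== "demote") {
    throw new Error("routeDemote only sketches pools that opted into demote mode");
  }
  const target = pool.candidates[pool.candidates.length - 1];
  if (target === undefined) {
    throw new Error("pool has no candidates");
  }
  // Default to the quality target; only clearly-light turns may demote.
  if (forceTarget || score >= DEMOTE_THRESHOLD) {
    return { model: target, demoted: false };
  }
  // Pick the cheapest cleared candidate strictly below the target.
  for (let i = 0; i < pool.candidates.length - 1; i++) {
    const candidate = pool.candidates[i];
    if (candidate !== undefined && isCleared(candidate)) {
      return { model: candidate, demoted: true };
    }
  }
  // Fail-safe: nothing cleared, so stay on the target.
  return { model: target, demoted: false };
}

// Placeholder model names, not real model IDs.
const agentReason: TaskPool = { candidates: ["sonnet", "opus"], mode: "demote" };
const onlySonnetCleared = (model: string): boolean => model === "sonnet";
```

In this sketch a score of exactly 0.25 stays on the target, matching the strict `score < DEMOTE_THRESHOLD` comparison quoted in the review thread.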

Tests (+8 scenario)

demote trivial→Sonnet; hard→Opus; forceTarget→Opus; fail-safe un-cleared→Opus; live-clearance override both ways; determinism; threshold boundary; escalate pools never demoted.

Verification

tsc --noEmit clean; 20 router tests pass; build clean.

🤖 Generated with Claude Code

Roadmap #2 of the LLM router. Adds DEMOTE-DOWN: a pool opts in with
`mode: "demote"` to default to its quality TARGET (heaviest) and drop to a
cheaper candidate ONLY on clearly-light turns, and ONLY to models eval-CLEARED
for the task class. This is the Prism cost lever for over-provisioned paths
(e.g. a persona router pinning Opus for every turn regardless of difficulty).

router (shared/llm/router.ts):
- `RouteMode = "escalate" | "demote"`; `TaskPool.mode?` (default "escalate" —
  every existing pool is UNCHANGED).
- `RouteDecision.demoted` (additive field).
- `DEMOTE_THRESHOLD` (0.25) — only clearly-trivial turns demote.
- `isDemoteCleared(taskClass, model, opts)`: a conservative static
  `DEMOTE_CLEARANCE` allowlist + a pluggable `RouteOptions.clearance` hook —
  the seam the live agentRunJudge / dogfood rolling-agreement feed plugs into.
- FAIL-SAFE: if no cheaper model is cleared, STAY on the target — quality is
  never dropped un-cleared. forceTarget always pins the target.
- `agent_reason` reframed as the first demote pool (target Opus, demote -> Sonnet
  on light turns). NO live caller yet — the cache-sticky agent wiring (roadmap
  #3) is the first caller — so this is behavior-preserving today.

Pure + DETERMINISTIC (no Date/random). Additive: the new optional `opts` param
and the `demoted` field don't touch the /ask or search.ts callers (tsc clean).
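
The clearance gate above could look roughly like the following sketch. The names `isDemoteCleared`, `DEMOTE_CLEARANCE`, and `RouteOptions.clearance` are from this commit; the hook's signature (returning `undefined` to defer to the static list), the allowlist contents, and the model strings are assumptions made for illustration.

```typescript
type TaskClass = string;

interface RouteOptions {
  // Assumed shape: true/false overrides the static list, undefined defers to it.
  clearance?: (taskClass: TaskClass, model: string) => boolean | undefined;
}

// Conservative static allowlist (contents are illustrative placeholders).
const DEMOTE_CLEARANCE: Record<TaskClass, readonly string[]> = {
  agent_reason: ["sonnet"],
};

function isDemoteCleared(
  taskClass: TaskClass,
  model: string,
  opts: RouteOptions = {},
): boolean {
  // A live verdict, when present, wins in both directions.
  const live = opts.clearance ? opts.clearance(taskClass, model) : undefined;
  if (live !== undefined) {
    return live;
  }
  // Otherwise fall back to the static list; unknown task classes are un-cleared.
  const allowlist = Object.prototype.hasOwnProperty.call(DEMOTE_CLEARANCE, taskClass)
    ? DEMOTE_CLEARANCE[taskClass]
    : undefined;
  return allowlist !== undefined && allowlist.indexOf(model) !== -1;
}
```

Under these assumptions, a hook that returns `false` revokes a statically cleared model and a hook that returns `true` clears one the static list omits, which is the "override both directions" case the tests cover.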

Tests (shared/llm/router.test.ts, +8 scenario): demote a trivial agent turn ->
Sonnet; hard turn -> stays Opus; forceTarget -> Opus; fail-safe (un-cleared) ->
Opus; live-clearance override both directions; determinism; threshold boundary;
escalate pools never report demoted.

Docs: docs/architecture/LLM_ROUTER.md roadmap updated.

Verification: tsc --noEmit clean, 20 router tests pass, build clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented Jun 2, 2026

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| nodebench-ai | Ready | Preview, Comment | Jun 2, 2026 7:10pm |


@augmentcode

augmentcode Bot commented Jun 2, 2026

🤖 Augment PR Summary

Summary: This PR adds an opt-in demote-down routing mode to the shared LLM router, enabling cost savings on over-provisioned paths while keeping quality protected by an eval-style clearance gate.

Changes:

  • Introduces RouteMode (escalate default, demote opt-in) and TaskPool.mode.
  • Adds RouteDecision.demoted, DEMOTE_THRESHOLD, and a conservative static DEMOTE_CLEARANCE allowlist.
  • Adds RouteOptions.clearance hook to override static clearance (seam for future live eval feed).
  • Implements demote-mode routing: default to the heaviest target, demote only on clearly-light turns and only to cleared cheaper candidates (fail-safe: nothing cleared → stay on target).
  • Marks agent_reason as the first demote pool (Opus target, demote→Sonnet) and updates architecture docs/roadmap accordingly.
  • Adds scenario tests covering demotion behavior, fail-safe, override hook, determinism, threshold boundary, and ensuring escalate pools are unaffected.

Technical Notes: The demote mechanism is additive to existing callers (default mode remains escalate), and the clearance system is designed to be conservative until a live agentRunJudge/dogfood feed is wired.



@augmentcode augmentcode Bot left a comment


Review completed. 2 suggestions posted.


Comment thread shared/llm/router.ts
* un-cleared. agent_reason is the first demote pool (no live caller yet).
*
* Reliability (.claude/rules/agentic_reliability.md):
* - DETERMINISTIC: routeLLM is a pure function of (taskClass, signals, env).

@augmentcode augmentcode Bot Jun 2, 2026


shared/llm/router.ts:L42 — The header comment says routeLLM is a pure function of (taskClass, signals, env), but it now also depends on opts (and especially opts.clearance). Consider updating this invariant (or documenting that clearance must be deterministic for replay safety) so the determinism guarantee matches the new API.

Severity: low
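
One way to keep the determinism invariant honest once `opts.clearance` is an input is to build the hook from an immutable snapshot of the live feed, so replaying a run with the same snapshot gives the same verdicts. This is only a sketch: the hook signature, the `snapshotClearance` helper, and the `"taskClass:model"` key format are hypothetical and not part of this PR.

```typescript
// Assumed hook shape: undefined means "no live verdict, use the static list".
type Clearance = (taskClass: string, model: string) => boolean | undefined;

// Copy and freeze the feed so later mutations cannot change routing results.
function snapshotClearance(feed: Readonly<Record<string, boolean>>): Clearance {
  const frozen: Readonly<Record<string, boolean>> = Object.freeze({ ...feed });
  return (taskClass: string, model: string): boolean | undefined =>
    frozen[`${taskClass}:${model}`];
}

// Illustrative feed; keys and model names are placeholders.
const liveFeed: Record<string, boolean> = { "agent_reason:sonnet": true };
const clearanceAtT0 = snapshotClearance(liveFeed);

// The live feed changes afterwards, but the snapshot hook is unaffected.
liveFeed["agent_reason:sonnet"] = false;
```

With a hook like this, the header comment could state the invariant as "pure function of (taskClass, signals, env, opts), provided `opts.clearance` is itself deterministic".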


Comment thread shared/llm/router.ts
// to the cheapest demote-CLEARED candidate below the target. Fail-safe:
// nothing cleared → stay on target (never sacrifice quality un-cleared).
let chosen = heaviest;
if (score < DEMOTE_THRESHOLD) {

@augmentcode augmentcode Bot Jun 2, 2026


shared/llm/router.ts:L298 — In demote mode, signals defaults to {}, so computeComplexityScore becomes 0 and a caller that forgets to pass signals will demote on any cleared candidate, which seems to contradict the “uncertainty resolves UP” guarantee. Consider treating missing/empty signal cases as uncertain and staying on the target (heaviest) rather than demoting.

Severity: medium
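
A guard along the following lines would address this by treating absent or empty signals as uncertain and keeping the target. It is a sketch only: the `Signals` shape and the `mayDemote` helper are hypothetical, and the real `computeComplexityScore` is not reproduced here, so the score is passed in directly.

```typescript
const DEMOTE_THRESHOLD = 0.25;

// Hypothetical signal bag; the real type in router.ts may differ.
type Signals = Record<string, unknown>;

// True only when the caller actually supplied at least one signal.
function hasSignals(signals: Signals | undefined): boolean {
  return signals !== undefined && Object.keys(signals).length > 0;
}

// Demotion requires evidence of a light turn, not merely a zero score
// that results from missing input.
function mayDemote(signals: Signals | undefined, score: number): boolean {
  return hasSignals(signals) && score < DEMOTE_THRESHOLD;
}
```

A caller that forgets to pass signals then stays on the heaviest model, which keeps "uncertainty resolves UP" true for demote pools.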


@HomenShum HomenShum closed this Jun 3, 2026
auto-merge was automatically disabled June 3, 2026 22:32

Pull request was closed

@HomenShum HomenShum deleted the feat/llm-router-demote branch June 3, 2026 22:32

2 participants