feat(llm): LLM routing observability panel on the telemetry surface #470

Closed
HomenShum wants to merge 1 commit into main from feat/llm-routing-dashboard

@HomenShum
Owner

What

Roadmap #3 of the NodeBench LLM Router: a real, honest LLM Routing observability panel that surfaces the existing per-answer routing telemetry the /ask path already persists on liveEventAnswers (modelId / provider / agentMode / estimatedCostCents). An operator can now see shared/llm/router.ts working in production — the Haiku floor vs. Sonnet escalation split, escalation rate, avg cost/answer, and the provider-fallback rate.

Read-only and additive. Does NOT touch routeLLM or the /ask write path.

Real data source

New global, bounded Convex query — events:getAskRoutingTelemetry (in convex/events.ts):

  • args: { limit? }, capped at 1000 rows via .take(cap) (BOUND). liveEventAnswers has no global time index, so this is a bounded table scan and the .take() is the hard cap.
  • Delegates the math to a pure, dependency-free aggregator aggregateAskRouting() in shared/llm/askRoutingTelemetry.ts, so the logic is scenario-tested directly with plain arrays (no DB), the same way router.ts is tested.
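
A minimal sketch of the bounding step described above, run against a plain array so it needs no database. `resolveCap`, `boundedRead`, the default of 1000 when `limit` is omitted, and the "read filled the cap" meaning of `capped` are all assumptions for illustration; only the 1000-row ceiling and the `.take(cap)` idea come from the PR.

```typescript
// Hypothetical helpers; the real query lives in convex/events.ts.
const MAX_ROWS = 1000; // hard ceiling stated in the PR
const DEFAULT_ROWS = 1000; // assumption: used when `limit` is omitted or invalid

// Clamp a caller-supplied limit into [1, MAX_ROWS].
function resolveCap(limit?: number): number {
  if (limit === undefined || !Number.isFinite(limit)) return DEFAULT_ROWS;
  return Math.min(MAX_ROWS, Math.max(1, Math.floor(limit)));
}

// In-memory stand-in for a `.take(cap)` read.
function boundedRead<T>(rows: readonly T[], limit?: number): { rows: T[]; capped: boolean } {
  const cap = resolveCap(limit);
  const taken = rows.slice(0, cap);
  // Assumption: "capped" means the read filled the cap, so more rows may exist.
  return { rows: taken, capped: taken.length === cap };
}
```

Clamping before the read means a caller passing a huge `limit` still touches at most 1000 rows.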

Returned fields: total, capped, routedCount, floorCount, escalatedCount, escalationRate, providerFallbackRate, avgCostCents, totalCostCents, agentModes mix, modelMix (per-model count + floor/escalated/other tier), providerMix.
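
For orientation, the returned fields could be typed roughly as follows. This is a guess at the shape, not the types shipped in shared/llm/askRoutingTelemetry.ts: the entry shapes (`CountEntry`, `ModelMixEntry`), the nullability of the cost fields, and `EMPTY_TELEMETRY` are assumptions.

```typescript
type RoutingTier = "floor" | "escalated" | "other";

// Hypothetical entry shapes for the breakdown lists.
interface CountEntry {
  key: string;
  count: number;
}

interface ModelMixEntry {
  modelId: string;
  count: number;
  tier: RoutingTier;
}

interface AskRoutingTelemetry {
  total: number;
  capped: boolean;
  routedCount: number;
  floorCount: number;
  escalatedCount: number;
  escalationRate: number | null; // null when nothing was routed
  providerFallbackRate: number | null;
  avgCostCents: number | null;
  totalCostCents: number | null; // assumption: also null without routed traffic
  agentModes: CountEntry[];
  modelMix: ModelMixEntry[];
  providerMix: CountEntry[];
}

// What an empty table might produce: counts are true zeros, rates are unknown.
const EMPTY_TELEMETRY: AskRoutingTelemetry = {
  total: 0,
  capped: false,
  routedCount: 0,
  floorCount: 0,
  escalatedCount: 0,
  escalationRate: null,
  providerFallbackRate: null,
  avgCostCents: null,
  totalCostCents: null,
  agentModes: [],
  modelMix: [],
  providerMix: [],
};
```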

Floor vs. escalated is decided by the same modelId.includes("haiku") convention the existing cost estimator and the router's Haiku floor use, so the panel can never disagree with what was actually routed/billed.
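
The tier convention can be sketched as below. Only the `includes("haiku")` check comes from the PR; treating `"sonnet"` as the escalation marker and sending everything else (including blank IDs) to `"other"` are assumptions, and `tierForModelIdSketch` is a hypothetical name, not the shipped `tierForModelId()`.

```typescript
type RoutingTier = "floor" | "escalated" | "other";

function tierForModelIdSketch(modelId: string | undefined): RoutingTier {
  const id = (modelId ?? "").trim();
  if (id.includes("haiku")) return "floor"; // convention named in the PR
  if (id.includes("sonnet")) return "escalated"; // assumption: Sonnet is the escalation tier
  return "other"; // assumption: blank or unrecognized model IDs land here
}
```

For example, `tierForModelIdSketch("claude-3-5-haiku-20241022")` yields `"floor"`, while an empty string yields `"other"`.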

Honesty (agentic_reliability)

  • HONEST_SCORES: every rate is null (panel renders "—" / a "No routed /ask traffic yet" card) when there's no denominator — never a fabricated 0% or $0. cache/deterministic answers (which never invoked routeLLM) are excluded from the routing denominator but still counted in the agentMode mix.
  • BOUND: ≤1000 capped read.
  • DETERMINISTIC: pure aggregator, sorted breakdowns.
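
The three rules above can be illustrated with a reduced aggregator. This is a sketch under stated assumptions, not the PR's `aggregateAskRouting()`: the `"cache"` and `"deterministic"` agentMode strings, the `"sonnet"` escalation marker, the choice of denominators, the `"unknown"` fallback mode, and the sort order are all hypothetical.

```typescript
// Field names follow the PR; everything else is assumed.
interface AnswerRow {
  modelId?: string;
  agentMode?: string;
  estimatedCostCents?: number;
}

interface CountEntry {
  key: string;
  count: number;
}

// Assumption: these agentMode values mark answers that never invoked routeLLM.
const UNROUTED_MODES: ReadonlySet<string> = new Set(["cache", "deterministic"]);

// A rate with no denominator is unknown, so return null instead of 0.
function rateOrNull(numerator: number, denominator: number): number | null {
  return denominator > 0 ? numerator / denominator : null;
}

function summarizeRouting(rows: readonly AnswerRow[]) {
  const modeCounts = new Map<string, number>();
  let routedCount = 0;
  let escalatedCount = 0;
  let costSum = 0;
  let costedCount = 0;

  for (const row of rows) {
    const mode = row.agentMode ?? "unknown";
    modeCounts.set(mode, (modeCounts.get(mode) ?? 0) + 1); // every row counts in the mix
    if (UNROUTED_MODES.has(mode)) continue; // excluded from routing denominators

    routedCount += 1;
    if ((row.modelId ?? "").includes("sonnet")) escalatedCount += 1; // assumed marker
    if (typeof row.estimatedCostCents === "number") {
      costSum += row.estimatedCostCents;
      costedCount += 1;
    }
  }

  // Sort by count (descending), then key (ascending), so output order is stable.
  const agentModes: CountEntry[] = Array.from(modeCounts.entries())
    .map(([key, count]) => ({ key, count }))
    .sort((a, b) => b.count - a.count || (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

  return {
    total: rows.length,
    routedCount,
    escalatedCount,
    escalationRate: rateOrNull(escalatedCount, routedCount),
    avgCostCents: rateOrNull(costSum, costedCount),
    agentModes,
  };
}

// Display helper: null renders as a dash placeholder, a true zero as "0.0%".
function formatRate(rate: number | null): string {
  return rate === null ? "\u2014" : `${(rate * 100).toFixed(1)}%`;
}
```

Keeping `null` distinct from `0` lets the panel tell "no routed traffic" apart from "routed traffic that never escalated".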

Files changed

  • convex/events.ts — new getAskRoutingTelemetry query (thin bounded DB read → pure aggregator)
  • shared/llm/askRoutingTelemetry.ts — pure aggregateAskRouting() + tierForModelId() + types
  • shared/llm/askRoutingTelemetry.test.ts — 10 scenario-based tests
  • src/features/telemetry/LlmRoutingPanel.tsx — glass-card panel (headline stats, model mix, provider mix, honest empty + loading states, aria labels, reduced-motion-safe)
  • src/features/telemetry/index.ts — barrel export
  • src/features/monitoring/views/AgentTelemetryDashboard.tsx — compose the panel into the named telemetry surface

Verification

  • npx tsc --noEmit (app): 0 errors
  • npx tsc -p convex --noEmit: 0 errors
  • npx vitest run shared/llm/askRoutingTelemetry.test.ts shared/llm/router.test.ts: 22 pass (10 new scenario + 12 router)
  • npx vitest run convex/__tests__/scratchnode.events.test.ts convex/events.runtime-boundary.test.ts: 78 pass (no regression to existing getAskTelemetry)
  • npm run build: clean
  • React render smoke (temp, not committed): empty / data / loading states all render correctly.

npx convex codegen could not run in the isolated worktree: CONVEX_DEPLOYMENT is not set, and the shared node_modules has a pre-existing broken @mistralai/mistralai ESM install unrelated to this change. CI runs codegen on a fresh npm install; both tsc gates pass locally and are the authoritative type checks.

Honest caveat for the reviewer

AgentTelemetryDashboard (the file the task named as the telemetry surface) is currently orphaned in the prod-parity build: no live route or JSX mounts it, so Vite tree-shakes it (and therefore this panel) out of the bundle. The panel is fully implemented, compiles, and is tested, but it will not be visually live until its host component is wired into a route. I did not bolt it into the redesign nav (Home/Reports/Chat/Inbox/Me) or add a new route, per the project's "preserve the nav / no sixth tab" rules; that routing decision is yours.

🤖 Generated with Claude Code

Surface the EXISTING per-answer routing telemetry the /ask path persists on
liveEventAnswers (modelId / provider / agentMode / estimatedCostCents) so an
operator can see shared/llm/router.ts working in production: Haiku floor vs.
Sonnet escalation split, escalation rate, avg cost/answer, and provider-
fallback rate. Read-only + additive - does NOT touch routeLLM or the /ask
write path.

Backend (additive, bounded, honest):
- convex/events.ts: new GLOBAL query getAskRoutingTelemetry - a <=1000-row
  bounded scan over recent liveEventAnswers (BOUND via .take(cap)). Delegates
  the math to a pure aggregator so it is scenario-tested without a DB.
- shared/llm/askRoutingTelemetry.ts: pure aggregateAskRouting() - floor vs.
  escalated decided by the same modelId.includes("haiku") convention the cost
  estimator + router floor use (DETERMINISTIC, sorted breakdowns). Rates are
  null (not a fabricated 0%) when there is no denominator (HONEST_SCORES).

Frontend:
- src/features/telemetry/LlmRoutingPanel.tsx: glass-card panel with headline
  stats, model mix, provider mix, an honest "No routed /ask traffic yet" empty
  state, loading state, aria labels + reduced-motion-safe.
- Composed into AgentTelemetryDashboard (the named telemetry surface) +
  exported from the telemetry barrel.

Tests: shared/llm/askRoutingTelemetry.test.ts - 10 scenario-based tests
(operator floor/escalation split, true-0 vs null, cache/deterministic-only,
provider-fallback, cost-over-routed-only, adversarial pinned/blank models,
1000-row scale, determinism).

Verification: app tsc clean, convex tsc clean, vitest 22 pass (10 new + 12
router) + 78 existing event tests green, npm run build clean.

Note: AgentTelemetryDashboard is currently orphaned (no live route mounts it)
in the prod-parity build, so the panel compiles + is tested but is not yet
visually live until its host is routed. Reported for reviewer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vercel Bot commented Jun 2, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
nodebench-ai | Ready | Preview, Comment | Jun 2, 2026 7:28pm



github-actions Bot commented Jun 2, 2026

PR size advisory

This PR adds 642 lines of substantive change. CONTRIBUTING.md defines a soft limit of ~400 LOC.

If the PR is genuinely cohesive (e.g. an architecture map, a generated migration, a deletion of a dead module), no action is needed. Otherwise consider:

  • splitting into 2-3 PRs along independent concerns
  • pre-discussing the architecture change in a GitHub Discussion before merge

This is advisory — it does not block the merge.

@HomenShum HomenShum enabled auto-merge (squash) June 2, 2026 19:25

github-actions Bot commented Jun 2, 2026

✅ Dogfood Visual QA Gate: PASSED

Check | Status
Screenshots | 23 captured (pass)
Walkthrough | 9 chapters (pass)
Key Frames | 9 extracted (pass)
Scribe Steps | 8 how-to steps (pass)
Build | success
Artifacts

Download the dogfood-evidence-8e2b050 artifact from the Actions tab for full screenshots, frames, and walkthrough video.


Generated by Dogfood QA Gate

@HomenShum HomenShum closed this Jun 3, 2026
auto-merge was automatically disabled June 3, 2026 22:32

Pull request was closed

@HomenShum HomenShum deleted the feat/llm-routing-dashboard branch June 3, 2026 22:32