feat(llm): LLM routing observability panel on the telemetry surface by HomenShum · Pull Request #470 · HomenShum/nodebench-ai

HomenShum · 2026-06-02T19:22:52Z

What

Roadmap item #3 of the NodeBench LLM Router: an LLM routing observability panel backed by real data. It surfaces the per-answer routing telemetry that the /ask path already persists on liveEventAnswers (modelId / provider / agentMode / estimatedCostCents), so an operator can see shared/llm/router.ts working in production: the Haiku floor vs. Sonnet escalation split, the escalation rate, the average cost per answer, and the provider-fallback rate.

Read-only and additive. Does NOT touch routeLLM or the /ask write path.

Real data source

New global, bounded Convex query — events:getAskRoutingTelemetry (in convex/events.ts):

args: { limit? }, capped at ≤1000 rows via .take(cap) (BOUND). No global time index exists on liveEventAnswers, so this is a bounded table scan — the .take() is the hard cap.
Delegates the math to a pure, dependency-free aggregator aggregateAskRouting() in shared/llm/askRoutingTelemetry.ts, so the logic is scenario-tested directly with plain arrays (no DB), the same way router.ts is tested.
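A minimal sketch of the BOUND contract, with a plain array standing in for the table so it runs without a database. `resolveCap` and `takeBounded` are hypothetical names; falling back to the 1000-row maximum when `limit` is omitted, and reading `capped` as "the read filled the cap", are assumptions rather than what convex/events.ts necessarily does. In the real query the `slice` would be the Convex `.take(cap)` call.

```typescript
// Hard upper bound on rows read, per the PR (≤1000).
const MAX_ROWS = 1000;

// Hypothetical helper: normalize the optional `limit` arg into a safe cap.
// Assumption: an omitted or non-finite limit falls back to MAX_ROWS.
function resolveCap(limit?: number): number {
  if (limit === undefined || !Number.isFinite(limit)) return MAX_ROWS;
  return Math.min(MAX_ROWS, Math.max(1, Math.floor(limit)));
}

// Stand-in for the bounded `.take(cap)` read: never returns more than `cap`
// rows. Assumption: `capped` means the read filled the cap, so older rows
// may exist beyond the window.
function takeBounded<T>(
  rows: readonly T[],
  limit?: number,
): { rows: T[]; capped: boolean } {
  const cap = resolveCap(limit);
  const taken = rows.slice(0, cap);
  return { rows: taken, capped: taken.length === cap };
}
```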

Returned fields: total, capped, routedCount, floorCount, escalatedCount, escalationRate, providerFallbackRate, avgCostCents, totalCostCents, agentModes mix, modelMix (per-model count + floor/escalated/other tier), providerMix.
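The agentModes, modelMix and providerMix breakdowns are what the DETERMINISTIC guarantee applies to. A sketch of one way to sort them, assuming count descending with the key as a tie-breaker (the PR only says "sorted breakdowns"); `toSortedMix` and `MixEntry` are hypothetical names. Keys are compared with `<` rather than `localeCompare` so the order does not depend on the runtime locale.

```typescript
interface MixEntry {
  key: string;
  count: number;
}

// Turn a count map into a stable breakdown: count descending, then key
// ascending as a tie-breaker, so identical counts always render identically
// regardless of the order rows were read in.
function toSortedMix(counts: Record<string, number>): MixEntry[] {
  return Object.entries(counts)
    .map(([key, count]) => ({ key, count }))
    .sort((a, b) => b.count - a.count || (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}
```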

Floor vs. escalated is decided by the same modelId.includes("haiku") convention the existing cost estimator and the router's Haiku floor use, so the panel can never disagree with what was actually routed/billed.
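A sketch of the tier rule. `tierForModelId()` is named in this PR but its signature is not shown, so the one below is hypothetical. It applies the stated `modelId.includes("haiku")` convention verbatim; treating blank or missing ids as `"other"` and every remaining id as escalated are assumptions.

```typescript
type Tier = "floor" | "escalated" | "other";

// Same convention as the cost estimator and router floor: "haiku" => floor.
// Assumption: blank/missing ids are "other"; any other id is escalated.
function tierForModelId(modelId: string | null | undefined): Tier {
  if (!modelId || modelId.trim() === "") return "other";
  if (modelId.includes("haiku")) return "floor";
  return "escalated";
}
```

A plain substring test keeps the panel, the cost estimator and the router floor in agreement for any model id, since all three apply the same check.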

Honesty (agentic_reliability)

HONEST_SCORES: every rate is null when there is no denominator, and the panel then renders "—" or a "No routed /ask traffic yet" card instead of a fabricated 0% or $0. Cache/deterministic answers, which never invoke routeLLM, are excluded from the routing denominator but still counted in the agentMode mix.
BOUND: ≤1000 capped read.
DETERMINISTIC: pure aggregator, sorted breakdowns.
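A sketch of the HONEST_SCORES rule and the cache/deterministic exclusion, with plain arrays in place of the DB read. It is not the real aggregateAskRouting(): the row shape only mirrors the persisted fields named above, the `"cache"` / `"deterministic"` agentMode strings and the `"unknown"` fallback are assumptions, and `honestRate` / `summarizeRouting` are hypothetical names. Escalation reuses the `includes("haiku")` convention, counting any other non-blank model id as escalated (also an assumption).

```typescript
// Minimal row shape; field names follow the PR, optionality is assumed.
interface AskAnswerRow {
  modelId?: string;
  provider?: string;
  agentMode?: string;
  estimatedCostCents?: number;
}

// Assumption: these agentMode values mark answers that never called routeLLM.
const NON_ROUTED_MODES = new Set<string>(["cache", "deterministic"]);

// HONEST_SCORES: no denominator means null, never a fabricated 0.
function honestRate(numerator: number, denominator: number): number | null {
  return denominator > 0 ? numerator / denominator : null;
}

// Hypothetical reduced aggregator (not the real aggregateAskRouting()).
function summarizeRouting(rows: readonly AskAnswerRow[]) {
  const agentModes: Record<string, number> = {};
  let routedCount = 0;
  let escalatedCount = 0;
  let totalCostCents = 0;

  for (const row of rows) {
    const mode = row.agentMode ?? "unknown";
    // Every answer counts toward the agentMode mix...
    agentModes[mode] = (agentModes[mode] ?? 0) + 1;
    // ...but cache/deterministic answers are skipped for routing stats.
    if (NON_ROUTED_MODES.has(mode)) continue;
    routedCount += 1;
    const modelId = row.modelId?.trim();
    if (modelId && !modelId.includes("haiku")) escalatedCount += 1;
    totalCostCents += row.estimatedCostCents ?? 0;
  }

  return {
    total: rows.length,
    routedCount,
    escalatedCount,
    escalationRate: honestRate(escalatedCount, routedCount),
    // Average cost is taken over routed answers only.
    avgCostCents: honestRate(totalCostCents, routedCount),
    agentModes,
  };
}
```

Note the difference between a true 0 and null: a window where every routed answer used Haiku yields `escalationRate === 0`, while a window with no routed answers yields `null`.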

Files changed

convex/events.ts — new getAskRoutingTelemetry query (thin bounded DB read → pure aggregator)
shared/llm/askRoutingTelemetry.ts — pure aggregateAskRouting() + tierForModelId() + types
shared/llm/askRoutingTelemetry.test.ts — 10 scenario-based tests
src/features/telemetry/LlmRoutingPanel.tsx — glass-card panel (headline stats, model mix, provider mix, honest empty + loading states, aria labels, reduced-motion-safe)
src/features/telemetry/index.ts — barrel export
src/features/monitoring/views/AgentTelemetryDashboard.tsx — compose the panel into the named telemetry surface
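The panel's loading / empty / data split can be kept as a pure function, assuming it receives the query result the way Convex's React `useQuery` hook delivers it (`undefined` while the query is loading). `panelView`, `formatRate` and `formatCents` are hypothetical names, and the percentage and cents formats are assumptions.

```typescript
interface RoutingSummary {
  routedCount: number;
  escalationRate: number | null;
  avgCostCents: number | null;
}

type PanelView =
  | { kind: "loading" }
  | { kind: "empty"; message: string }
  | { kind: "data"; escalation: string; avgCost: string };

// Null means "no denominator": show a dash, never a made-up 0%.
function formatRate(rate: number | null): string {
  return rate === null ? "\u2014" : `${(rate * 100).toFixed(1)}%`;
}

// Assumption: average cost is shown in cents with two decimals.
function formatCents(cents: number | null): string {
  return cents === null ? "\u2014" : `${cents.toFixed(2)}¢`;
}

// Hypothetical: decide which state the panel renders.
function panelView(data: RoutingSummary | undefined): PanelView {
  if (data === undefined) return { kind: "loading" };
  if (data.routedCount === 0) {
    return { kind: "empty", message: "No routed /ask traffic yet" };
  }
  return {
    kind: "data",
    escalation: formatRate(data.escalationRate),
    avgCost: formatCents(data.avgCostCents),
  };
}
```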

Verification

npx tsc --noEmit (app) — 0 errors
npx tsc -p convex --noEmit — 0 errors
npx vitest run shared/llm/askRoutingTelemetry.test.ts shared/llm/router.test.ts — 22 pass (10 new scenario + 12 router)
npx vitest run convex/__tests__/scratchnode.events.test.ts convex/events.runtime-boundary.test.ts — 78 pass (no regression to existing getAskTelemetry)
npm run build — clean
React render smoke test (temporary, not committed): the empty, data, and loading states all render correctly.

npx convex codegen could not run in the isolated worktree: there is no CONVEX_DEPLOYMENT, and the shared node_modules has a pre-existing broken @mistralai/mistralai ESM install unrelated to this change. CI runs codegen on a fresh npm install; both tsc gates pass locally and are the authoritative type checks.

Honest caveat for the reviewer

AgentTelemetryDashboard (the file the task named as the telemetry surface) is currently orphaned in the prod-parity build — no live route/JSX mounts it, so Vite tree-shakes it (and therefore this panel) out of the bundle. The panel is fully implemented, compiles, and is tested, but it is not yet visually live until its host component is wired into a route. I did not bolt it into the redesign nav (Home/Reports/Chat/Inbox/Me) or add a new route, per the project's "preserve the nav / no sixth tab" rules — that routing decision is yours.

🤖 Generated with Claude Code

Surface the EXISTING per-answer routing telemetry the /ask path persists on liveEventAnswers (modelId / provider / agentMode / estimatedCostCents) so an operator can see shared/llm/router.ts working in production: Haiku floor vs. Sonnet escalation split, escalation rate, avg cost/answer, and provider-fallback rate. Read-only + additive - does NOT touch routeLLM or the /ask write path.

Backend (additive, bounded, honest):
- convex/events.ts: new GLOBAL query getAskRoutingTelemetry - a <=1000-row bounded scan over recent liveEventAnswers (BOUND via .take(cap)). Delegates the math to a pure aggregator so it is scenario-tested without a DB.
- shared/llm/askRoutingTelemetry.ts: pure aggregateAskRouting() - floor vs. escalated decided by the same modelId.includes("haiku") convention the cost estimator + router floor use (DETERMINISTIC, sorted breakdowns). Rates are null (not a fabricated 0%) when there is no denominator (HONEST_SCORES).

Frontend:
- src/features/telemetry/LlmRoutingPanel.tsx: glass-card panel with headline stats, model mix, provider mix, an honest "No routed /ask traffic yet" empty state, loading state, aria labels + reduced-motion-safe.
- Composed into AgentTelemetryDashboard (the named telemetry surface) + exported from the telemetry barrel.

Tests: shared/llm/askRoutingTelemetry.test.ts - 10 scenario-based tests (operator floor/escalation split, true-0 vs null, cache/deterministic-only, provider-fallback, cost-over-routed-only, adversarial pinned/blank models, 1000-row scale, determinism).

Verification: app tsc clean, convex tsc clean, vitest 22 pass (10 new + 12 router) + 78 existing event tests green, npm run build clean.

Note: AgentTelemetryDashboard is currently orphaned (no live route mounts it) in the prod-parity build, so the panel compiles + is tested but is not yet visually live until its host is routed. Reported for reviewer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vercel · 2026-06-02T19:22:58Z

The latest updates on your projects.

Project	Deployment	Actions	Updated (UTC)
nodebench-ai	Ready	Preview, Comment	Jun 2, 2026 7:28pm

github-actions · 2026-06-02T19:24:03Z

PR size advisory

This PR adds 642 lines of substantive change. CONTRIBUTING.md defines a soft limit of ~400 LOC.

If the PR is genuinely cohesive (e.g. an architecture map, a generated migration, a deletion of a dead module), no action is needed. Otherwise consider:

splitting into 2-3 PRs along independent concerns
pre-discussing the architecture change in a GitHub Discussion before merge

This is advisory — it does not block the merge.

github-actions · 2026-06-02T19:27:59Z

✅ Dogfood Visual QA Gate: PASSED

Check	Status
Screenshots	23 captured (pass)
Walkthrough	9 chapters (pass)
Key Frames	9 extracted (pass)
Scribe Steps	8 how-to steps (pass)
Build	success

Artifacts

Download the dogfood-evidence-8e2b050 artifact from the Actions tab for full screenshots, frames, and walkthrough video.

Generated by Dogfood QA Gate

HomenShum enabled auto-merge (squash) June 2, 2026 19:25

vercel Bot deployed to Preview June 2, 2026 19:28

HomenShum closed this Jun 3, 2026

auto-merge was automatically disabled June 3, 2026 22:32
Pull request was closed

HomenShum deleted the feat/llm-routing-dashboard branch June 3, 2026 22:32

HomenShum wants to merge 1 commit into main from feat/llm-routing-dashboard
