
fix: measure token speed from streaming phase only (exclude TTFB) #227

Merged
kianwoon merged 1 commit into main from fix/streaming-only-tps
Apr 11, 2026


@kianwoon
Owner

Summary

  • Token speed (TOK/S) previously used total wall-clock latency (including TTFB wait), producing misleadingly low averages
  • Now captures _streamStartTime when the first streaming chunk arrives, and computes TPS using streaming-only duration
  • Adds a 200ms minimum duration threshold — falls back to total latency for fast responses where timing precision is unreliable (prevents inflated numbers like 64K tok/s)
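The calculation described above can be sketched roughly as follows. This is an illustration, not the code from this PR: `MIN_STREAM_MS`, `Timing`, and `computeTokensPerSecond` are hypothetical names, and whether the real check at the 200ms boundary is inclusive is an assumption.

```typescript
// Hypothetical constant: the 200ms minimum streaming duration from the summary.
const MIN_STREAM_MS = 200;

// Hypothetical shape; streamStart plays the role of _streamStartTime.
interface Timing {
  requestStart: number; // ms timestamp when the request began
  streamStart?: number; // ms timestamp of the first streaming chunk, if any
  end: number;          // ms timestamp when the response finished
}

function computeTokensPerSecond(tokens: number, t: Timing): number {
  const totalMs = t.end - t.requestStart;
  const streamMs =
    t.streamStart !== undefined ? t.end - t.streamStart : undefined;

  // Use streaming-only duration when it is long enough to be trustworthy;
  // otherwise fall back to total latency (assumed inclusive at 200ms).
  const durationMs =
    streamMs !== undefined && streamMs >= MIN_STREAM_MS ? streamMs : totalMs;

  if (durationMs <= 0) return 0; // guard against division by zero
  return tokens / (durationMs / 1000);
}
```

With made-up numbers: 500 tokens, a 3000ms TTFB, and 5000ms of streaming give 100 tok/s on the streaming-only basis, versus 62.5 tok/s over the full 8000ms. A 64-token response whose streaming phase lasts 1ms would compute to 64,000 tok/s without the floor; with it, the function falls back to the total latency.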

Changes

  • src/types.ts — added _streamStartTime?: number to RequestContext
  • src/server.ts — capture streaming start on first chunk (SSE + JSON paths), use streaming-only duration in recordMetrics with 200ms floor
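A minimal sketch of the first-chunk capture, assuming the timestamp is set once and never overwritten by later chunks. Only `_streamStartTime` and `RequestContext` come from this PR; `markStreamStart` and the loop below are hypothetical and do not reflect the actual structure of src/server.ts.

```typescript
// Stand-in for the PR's RequestContext; only the new field is shown.
interface RequestContext {
  _streamStartTime?: number;
}

// Hypothetical helper: record the arrival time of the first chunk only.
function markStreamStart(ctx: RequestContext, now: number = Date.now()): void {
  if (ctx._streamStartTime === undefined) {
    ctx._streamStartTime = now;
  }
}

// Hypothetical usage: call on every chunk; only the first call has an effect.
const exampleCtx: RequestContext = {};
for (const chunkArrivalMs of [1200, 1250, 1300]) {
  markStreamStart(exampleCtx, chunkArrivalMs);
}
```

Guarding on `undefined` keeps the call safe to place in both the SSE and JSON paths, since whichever path sees data first sets the value.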

Test plan

  • npx tsc --noEmit — passes
  • npm run build — succeeds
  • npx vitest run — 323/323 tests passing
  • Manual: verify TOK/S shows realistic values in GUI for long streaming responses (should be higher than before)
  • Manual: verify short/fast responses don't show inflated numbers

Token speed (TOK/S) was calculated using total wall-clock latency including
TTFB wait time, producing misleadingly low averages. Now captures the first
streaming chunk timestamp and uses streaming-only duration for the TPS formula.
A 200ms minimum threshold prevents inflated numbers from imprecise timing on
fast responses.
@kianwoon kianwoon force-pushed the fix/streaming-only-tps branch from f6fc943 to e0d3be9 Compare April 11, 2026 06:26
@kianwoon kianwoon merged commit 525ea9d into main Apr 11, 2026
4 checks passed
@kianwoon kianwoon deleted the fix/streaming-only-tps branch April 11, 2026 06:26
