fix(harness-llm): surface OpenAI streaming token usage before Finish #59

Draft
TYRMars wants to merge 1 commit into main from claude/vibrant-dijkstra-eWZQK

TYRMars commented May 30, 2026

Summary

Fixes #48 — on the default OpenAI Chat Completions streaming path, token usage was silently dropped.

OpenAI ships the usage payload in a separate final SSE chunk (choices: []) that arrives after the finish_reason chunk. The agent loop breaks out of the stream the moment it sees Finish (crates/harness-core/src/agent.rs:636), so the trailing Usage chunk was never consumed and usage accounting read zero.
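To make the failure mode concrete, here is a minimal sketch of a consumer that stops at Finish. The `StreamEvent` enum and `total_tokens_until_finish` function are hypothetical stand-ins, not the actual types in harness-core; they only illustrate why a Usage event that arrives after Finish is never counted.

```rust
/// Hypothetical event type for illustration; the real harness-core enum may differ.
#[derive(Debug, Clone, PartialEq)]
enum StreamEvent {
    TextDelta(String),
    Usage { prompt_tokens: u32, completion_tokens: u32 },
    Finish,
}

/// Sums token usage from a stream of events, stopping at the first `Finish`,
/// which mimics an agent loop that breaks out as soon as it sees `Finish`.
fn total_tokens_until_finish(events: impl IntoIterator<Item = StreamEvent>) -> u32 {
    let mut total = 0;
    for event in events {
        match event {
            StreamEvent::Usage { prompt_tokens, completion_tokens } => {
                total += prompt_tokens + completion_tokens;
            }
            // Anything after this point is never read.
            StreamEvent::Finish => break,
            StreamEvent::TextDelta(_) => {}
        }
    }
    total
}
```

Fed the order `[TextDelta, Finish, Usage]`, this loop never reaches the Usage event and reports zero tokens; fed `[TextDelta, Usage, Finish]`, it sees the usage before it stops.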

Fix

Buffer the terminal Finish in StreamAccumulator (a new pending_finish field) instead of emitting it inline. It's released either:

  • when the trailing usage-only chunk is ingested — emitting Usage first, then Finish, or
  • when the stream closes (new flush() helper, which also covers gateways that close the body without a finish_reason).

This matches the Usage-before-Finish ordering the other three providers (Anthropic / Responses / Google) already produce, so the agent loop no longer drops usage on the default provider.
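The buffering described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `Event`, `Chunk`, `Accumulator`, and `ingest` are hypothetical names, and only `pending_finish` and `flush()` are taken from the description. The sketch also leaves out the case where a gateway closes the body without any finish_reason.

```rust
/// Hypothetical events emitted to the agent loop.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Usage { total_tokens: u32 },
    Finish { reason: String },
}

/// Simplified, already-parsed view of one SSE chunk (assumed shape).
struct Chunk {
    finish_reason: Option<String>,
    usage_total_tokens: Option<u32>,
}

/// Stand-in for the accumulator; only the Finish buffering is modelled.
#[derive(Default)]
struct Accumulator {
    pending_finish: Option<Event>,
}

impl Accumulator {
    /// Ingests one chunk and returns the events that are ready to emit.
    fn ingest(&mut self, chunk: Chunk) -> Vec<Event> {
        let mut out = Vec::new();
        if let Some(reason) = chunk.finish_reason {
            // Hold the Finish back instead of emitting it inline.
            self.pending_finish = Some(Event::Finish { reason });
        }
        if let Some(total_tokens) = chunk.usage_total_tokens {
            out.push(Event::Usage { total_tokens });
            // Usage has arrived, so release the buffered Finish right after it.
            if let Some(finish) = self.pending_finish.take() {
                out.push(finish);
            }
        }
        out
    }

    /// Called when the stream closes; releases a Finish that never saw a usage chunk.
    fn flush(&mut self) -> Vec<Event> {
        self.pending_finish.take().into_iter().collect()
    }
}
```

With this shape, the finish_reason chunk emits nothing by itself, the trailing usage-only chunk emits Usage followed by Finish, and a stream that closes without a usage chunk still gets its Finish from `flush()`.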

Tests

  • New regression test usage_emitted_before_finish_when_usage_trails asserts the [Usage, Finish] ordering across the two-chunk sequence.
  • Existing accumulator tests updated to pull the buffered Finish from flush() on stream close.
  • cargo test -p harness-llm and cargo clippy -p harness-llm --all-targets -- -D warnings both pass.

https://claude.ai/code/session_01E28FLiYKcDuos5wiA8vVUC


Generated by Claude Code

OpenAI Chat Completions streaming ships the token-usage payload in a
separate SSE chunk (`choices: []`) that arrives *after* the
`finish_reason` chunk. The agent loop breaks out of the stream the
moment it sees `Finish`, so the trailing `Usage` chunk was never
consumed and usage accounting read zero on the default provider.

Buffer the terminal `Finish` in `StreamAccumulator` and release it
either when the trailing usage-only chunk is ingested (emitting `Usage`
first) or when the stream closes. This matches the ordering the other
three providers already produce. Adds a regression test and updates the
existing accumulator tests to flush the buffered Finish on close.

Closes #48

Development

Successfully merging this pull request may close these issues.

OpenAI Chat Completions streaming silently drops token usage (Usage chunk arrives after Finish, never consumed)
