refactor(usage): ♻️ normalize token usage accounting across providers#42
refactor(usage): ♻️ normalize token usage accounting across providers#42HayWolf wants to merge 2 commits into
Conversation
Standardize how `input`, `cache_read`, and `cache_write` are computed in the Anthropic and Google protocol modules so they follow a uniform model: - Anthropic: take final cache read/write counts from `message_delta` (the `message_start` values are placeholders) so the three-segment billable input remains non-overlapping. - Google: subtract `cached_content_token_count` from `prompt_token_count` to derive the uncached `input`, matching OpenAI's convention where `input + cache_read == prompt_tokens`. This allows downstream code to use a single formula for the true context footprint of any request, exposed via the new `Usage::context_size()` method. Comprehensive documentation now explains the per-protocol field semantics at both the module and struct level. Also removes the implicit `total_tokens` recomputation in `Usage::add()` since that field carries provider-specific wire semantics and must not be re-aggregated across requests or protocols.
AI Code Review SummaryPR: #42 (refactor(usage): ♻️ normalize token usage accounting across providers) Overall AssessmentDetected 3 actionable findings, prioritize CRITICAL/HIGH before merge. Major Findings by Severity
Actionable Suggestions
Potential Risks
Test Suggestions
File-Level Coverage Notes
Inline Downgraded Items (processed but not inline)
Coverage Status
Uncovered list:
No-patch covered list:
Runtime/Budget
|
| /// separately and do **not** overlap with this counter). | ||
| pub input: u64, | ||
| /// Number of output tokens. | ||
| /// Number of output tokens (model-generated). |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
| output.usage.input = usage.prompt_token_count; | ||
| output.usage.input = usage | ||
| .prompt_token_count | ||
| .saturating_sub(usage.cached_content_token_count); |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
…omits them; update test_usage_add for new contract - Anthropic: change MessageDeltaUsage cache fields to Option<u64> so a message_delta event without cache fields no longer clobbers the values populated from message_start. - test_usage_add: assert the new `add` contract — total_tokens is not recomputed, use context_size() for the cross-protocol footprint.
| // `message_start` values and break three-segment | ||
| // billable accounting. | ||
| if let Some(v) = usage.cache_read_input_tokens { | ||
| output.usage.cache_read = v; |
There was a problem hiding this comment.
[MEDIUM] Missing test for Anthropic message_delta optional cache fields
No test for the Anthropic message_delta event parsing with optional cache fields.
Suggestion: Add a test that simulates a message_delta with and without the new cache fields, and asserts that the resulting Usage.cache_read/cache_write match expectations (preserving prior values when absent).
Risk: Possible incorrect cache usage tracking in production if Anthropic varies the field presence unexpectedly.
Confidence: 0.90
| output.usage.input = usage.prompt_token_count; | ||
| output.usage.input = usage | ||
| .prompt_token_count | ||
| .saturating_sub(usage.cached_content_token_count); |
There was a problem hiding this comment.
[MEDIUM] Missing test for Google usage normalization saturating_sub
No test covers the scenario where Google's cached_content_token_count > prompt_token_count, which would clamp to 0 via saturating_sub.
Suggestion: Add a unit test in the protocol test suite that constructs a mock Google chunk with prompt_token_count < cached_content_token_count and verifies input is 0.
Risk: Potential silent undercount of input tokens if Google API reports erroneous values.
Confidence: 0.85
| self.output += other.output; | ||
| self.cache_read += other.cache_read; | ||
| self.cache_write += other.cache_write; | ||
| self.total_tokens = self.input + self.output + self.cache_read + self.cache_write; |
There was a problem hiding this comment.
[MEDIUM] Usage::add() no longer recomputes total_tokens (behavior change)
Usage::add previously recomputed total_tokens as the component sum after merging. Now it leaves total_tokens untouched, which is semantically correct for the new normalization model but a breaking change for any caller that relied on the old behavior.
Suggestion: If external callers rely on total_tokens being an aggregate after accumulation, consider providing a merge_total() helper that recomputes from components, or document that callers should use context_size() instead. Given the internal scope of this change, risk is moderate but worth flagging.
Risk: Callers that depend on total_tokens reflecting accumulated usage after add() will see the wrong values (the old value from the last-merged Usage instance) unless they switch to context_size().
Confidence: 0.85
Summary
统一
Usage字段在不同协议间的语义,使上层应用可以跨协议准确地计算"本次请求真实占用的上下文 token 数",同时修复两处会让 cache 计费失真的 bug。Changes
Bug 修复
src/protocol/google.rs—input改为prompt_token_count.saturating_sub(cached_content_token_count)。原来直接赋prompt_token_count,导致cache_read字段被 cache 子集二次累加。src/protocol/anthropic.rs—MessageDeltaUsage新增cache_read_input_tokens/cache_creation_input_tokens字段,并在message_delta事件阶段写入。Anthropic 在message_start阶段上报的 cache 计数是占位值,真正的最终值在message_delta才有;旧实现只更新output,导致三段计费(input/cache_read/cache_write)中后两段被丢弃。API & 文档
src/types/usage.rsUsage::context_size(),跨协议返回input + output + cache_read + cache_write。Usage各字段加 doc 注释,明确 OpenAI/Google(含 cache 子集)vs Anthropic(三段互斥)两种语义。Usage::add不再覆盖total_tokens:旧逻辑用互斥求和重算会污染 OpenAI/Google 字段,新逻辑保留原 wire 值并让调用方用context_size()自算。Motivation
不同协议对
cache的上报方式不一致:prompt_tokens含 cache,cached_tokens是子集prompt=2006, cached=1920prompt_token_count含cached_content_token_countinput + cache_read + cache_write旧实现让 Google 漏做减法(cache 重复计),让 Anthropic 在
message_delta阶段漏读 cache 字段(三段失真)。同时Usage.total_tokens在跨协议累加时无意义——prompt + completion和互斥三段和不能直接相加。修复后
Usage::context_size()是跨协议统一的"上下文占用"指标,total_tokens保留作为 wire 透传字段供调试。Testing
cargo check --all-targets通过cargo test --lib133/133 通过cargo clippy --all-targets无新增 warninggit stash比对确认clippy的 4 个 error 来自anthropic.rs第 532 / 939 行的历史问题,与本次改动无关Breaking Changes
是的,会改变
Usage::add累加后的total_tokens字段值。 旧实现下total_tokens = input + output + cache_read + cache_write(仅对 Anthropic 正确,对 OpenAI/Google 是被 cache 污染的数字);新实现下total_tokens保持最后一次 wire 透传值不变。usage.add(other).total_tokens做账单累计的代码。usage.context_size()替代usage.total_tokens跨协议计算上下文占用;按字段分别计费时直接用usage.input/usage.output/usage.cache_read/usage.cache_write。Checklist
cargo check/cargo test/cargo clippy🤖 Generated with TiyCode