Skip to content

refactor(usage): ♻️ normalize token usage accounting across providers#42

Open
HayWolf wants to merge 2 commits into
masterfrom
refactor/usage-normalization
Open

refactor(usage): ♻️ normalize token usage accounting across providers#42
HayWolf wants to merge 2 commits into
masterfrom
refactor/usage-normalization

Conversation

@HayWolf

@HayWolf HayWolf commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Summary

统一 Usage 字段在不同协议间的语义,使上层应用可以跨协议准确地计算"本次请求真实占用的上下文 token 数",同时修复两处会让 cache 计费失真的 bug。

Changes

Bug 修复

  • src/protocol/google.rsinput 改为 prompt_token_count.saturating_sub(cached_content_token_count)。原来直接赋 prompt_token_count,导致 cache_read 字段被 cache 子集二次累加。
  • src/protocol/anthropic.rsMessageDeltaUsage 新增 cache_read_input_tokens / cache_creation_input_tokens 字段,并在 message_delta 事件阶段写入。Anthropic 在 message_start 阶段上报的 cache 计数是占位值,真正的最终值在 message_delta 才有;旧实现只更新 output,导致三段计费(input / cache_read / cache_write)中后两段被丢弃。

API & 文档

  • src/types/usage.rs
    • 新增 Usage::context_size(),跨协议返回 input + output + cache_read + cache_write
    • Usage 各字段加 doc 注释,明确 OpenAI/Google(含 cache 子集)vs Anthropic(三段互斥)两种语义。
    • Usage::add 不再覆盖 total_tokens:旧逻辑用 互斥求和 重算会污染 OpenAI/Google 字段,新逻辑保留原 wire 值并让调用方用 context_size() 自算。
  • 模块级文档表格列出每个字段在每个协议下的含义。

Motivation

不同协议对 cache 的上报方式不一致:

协议 形态 例子
OpenAI Chat Completions / Responses prompt_tokens 含 cache,cached_tokens 是子集 prompt=2006, cached=1920
Google Generative AI 同上 prompt_token_countcached_content_token_count
Anthropic 三段互斥 input + cache_read + cache_write

旧实现让 Google 漏做减法(cache 重复计),让 Anthropic 在 message_delta 阶段漏读 cache 字段(三段失真)。同时 Usage.total_tokens 在跨协议累加时无意义——prompt + completion互斥三段和 不能直接相加。

修复后 Usage::context_size() 是跨协议统一的"上下文占用"指标,total_tokens 保留作为 wire 透传字段供调试。

Testing

  • cargo check --all-targets 通过
  • cargo test --lib 133/133 通过
  • cargo clippy --all-targets 无新增 warning
  • git stash 比对确认 clippy 的 4 个 error 来自 anthropic.rs 第 532 / 939 行的历史问题,与本次改动无关

Breaking Changes

是的,会改变 Usage::add 累加后的 total_tokens 字段值。 旧实现下 total_tokens = input + output + cache_read + cache_write(仅对 Anthropic 正确,对 OpenAI/Google 是被 cache 污染的数字);新实现下 total_tokens 保持最后一次 wire 透传值不变。

  • 受影响调用方:依赖 usage.add(other).total_tokens 做账单累计的代码。
  • 迁移方法:用 usage.context_size() 替代 usage.total_tokens 跨协议计算上下文占用;按字段分别计费时直接用 usage.input / usage.output / usage.cache_read / usage.cache_write

Checklist

  • Code follows project style guidelines
  • Tests have been added/updated
  • Documentation has been updated
  • No new warnings or errors introduced
  • Verified cargo check / cargo test / cargo clippy

🤖 Generated with TiyCode

Standardize how `input`, `cache_read`, and `cache_write` are computed
in the Anthropic and Google protocol modules so they follow a uniform
model:

- Anthropic: take final cache read/write counts from `message_delta`
  (the `message_start` values are placeholders) so the three-segment
  billable input remains non-overlapping.
- Google: subtract `cached_content_token_count` from `prompt_token_count`
  to derive the uncached `input`, matching OpenAI's convention where
  `input + cache_read == prompt_tokens`.

This allows downstream code to use a single formula for the true context
footprint of any request, exposed via the new `Usage::context_size()`
method. Comprehensive documentation now explains the per-protocol field
semantics at both the module and struct level.

Also removes the implicit `total_tokens` recomputation in `Usage::add()`
since that field carries provider-specific wire semantics and must not
be re-aggregated across requests or protocols.
@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown

AI Code Review Summary

PR: #42 (refactor(usage): ♻️ normalize token usage accounting across providers)
Preferred language: English

Overall Assessment

Detected 3 actionable findings, prioritize CRITICAL/HIGH before merge.

Major Findings by Severity

  • MEDIUM (3)
    • src/protocol/anthropic.rs:1495 - Missing test for Anthropic message_delta optional cache fields
    • src/protocol/google.rs:1091 - Missing test for Google usage normalization saturating_sub
    • src/types/usage.rs:46 - Usage::add() no longer recomputes total_tokens (behavior change)

Actionable Suggestions

  • Audit all callers of Usage::add in the codebase (agent loop, billing, cost tracking) to ensure they are not relying on total_tokens recomputation
  • Consider adding a deprecation warning or migration note in CHANGELOG for the Usage::add behavior change
  • Add a unit test for Google protocol chunk parsing where cached_content_token_count > prompt_token_count (saturating_sub clamp).
  • Add a unit test for Anthropic message_delta parsing with both Some and None cache fields.

Potential Risks

  • Downstream code using Usage::add for cumulative cost calculation may report incorrect total_tokens if not updated to use context_size()
  • Production bug if Anthropic message_delta omits cache fields and existing code incorrectly resets them.
  • Production bug if Google API returns inconsistent usage metadata causing saturating_sub to hide real undercount.

Test Suggestions

  • Add integration test for Anthropic message_delta cache fields using recorded HTTP fixtures
  • Add integration test for Google Gemini with cache hit scenario
  • Add unit test for Usage::add with explicit cache_read/cache_write values to verify all four counters accumulate correctly and total_tokens is preserved
  • Add a test for Google chunk parsing with cached > total
  • Add a test for Anthropic message_delta with null cache fields
  • Add a test for Anthropic message_delta with explicit cache override

File-Level Coverage Notes

  • src/protocol/anthropic.rs: The parsing change for optional cache fields is correct, but lacks dedicated test coverage.
  • src/protocol/google.rs: The normalization using saturating_sub is sensible, but needs test for edge case where cached tokens exceed total.
  • src/types/usage.rs: The new context_size method and documentation are added correctly. Existing usage_add test updated appropriately. (Documentation is clear and new context_size is intuitively testable.)
  • tests/test_types.rs: Test updated to reflect the new non-recomputation behavior of total_tokens and includes a test for context_size. (Good update.)

Inline Downgraded Items (processed but not inline)

  • None

Coverage Status

  • Target files: 4
  • Covered files: 4
  • Uncovered files: 0
  • No-patch/binary covered as file-level: 0
  • Findings with unknown confidence (N/A): 0

Uncovered list:

  • None

No-patch covered list:

  • None

Runtime/Budget

  • Rounds used: 1/4
  • Planned batches: 1
  • Executed batches: 1
  • Sub-agent runs: 2
  • Planner calls: 1
  • Reviewer calls: 2
  • Model calls: 3/64
  • Structured-output summary-only degradation: NO

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 2
  • Findings with unknown confidence: 0
  • Inline comments attempted: 2
  • Target files: 3
  • Covered files: 3
  • Uncovered files: 0
    See the summary comment for detailed analysis and coverage details.

Comment thread src/types/usage.rs
/// separately and do **not** overlap with this counter).
pub input: u64,
/// Number of output tokens.
/// Number of output tokens (model-generated).

This comment was marked as outdated.

Comment thread src/protocol/google.rs
output.usage.input = usage.prompt_token_count;
output.usage.input = usage
.prompt_token_count
.saturating_sub(usage.cached_content_token_count);

This comment was marked as outdated.

…omits them; update test_usage_add for new contract

- Anthropic: change MessageDeltaUsage cache fields to Option<u64> so a
  message_delta event without cache fields no longer clobbers the values
  populated from message_start.
- test_usage_add: assert the new `add` contract — total_tokens is not
  recomputed, use context_size() for the cross-protocol footprint.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 3
  • Findings with unknown confidence: 0
  • Inline comments attempted: 3
  • Target files: 4
  • Covered files: 4
  • Uncovered files: 0
    See the summary comment for detailed analysis and coverage details.

Comment thread src/protocol/anthropic.rs
// `message_start` values and break three-segment
// billable accounting.
if let Some(v) = usage.cache_read_input_tokens {
output.usage.cache_read = v;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Missing test for Anthropic message_delta optional cache fields

No test for the Anthropic message_delta event parsing with optional cache fields.

Suggestion: Add a test that simulates a message_delta with and without the new cache fields, and asserts that the resulting Usage.cache_read/cache_write match expectations (preserving prior values when absent).

Risk: Possible incorrect cache usage tracking in production if Anthropic varies the field presence unexpectedly.

Confidence: 0.90

[From SubAgent: testing]

Comment thread src/protocol/google.rs
output.usage.input = usage.prompt_token_count;
output.usage.input = usage
.prompt_token_count
.saturating_sub(usage.cached_content_token_count);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Missing test for Google usage normalization saturating_sub

No test covers the scenario where Google's cached_content_token_count > prompt_token_count, which would clamp to 0 via saturating_sub.

Suggestion: Add a unit test in the protocol test suite that constructs a mock Google chunk with prompt_token_count < cached_content_token_count and verifies input is 0.

Risk: Potential silent undercount of input tokens if Google API reports erroneous values.

Confidence: 0.85

[From SubAgent: testing]

Comment thread src/types/usage.rs
self.output += other.output;
self.cache_read += other.cache_read;
self.cache_write += other.cache_write;
self.total_tokens = self.input + self.output + self.cache_read + self.cache_write;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Usage::add() no longer recomputes total_tokens (behavior change)

Usage::add previously recomputed total_tokens as the component sum after merging. Now it leaves total_tokens untouched, which is semantically correct for the new normalization model but a breaking change for any caller that relied on the old behavior.

Suggestion: If external callers rely on total_tokens being an aggregate after accumulation, consider providing a merge_total() helper that recomputes from components, or document that callers should use context_size() instead. Given the internal scope of this change, risk is moderate but worth flagging.

Risk: Callers that depend on total_tokens reflecting accumulated usage after add() will see the wrong values (the old value from the last-merged Usage instance) unless they switch to context_size().

Confidence: 0.85

[From SubAgent: general]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants