refactor(usage): ♻️ normalize token usage accounting across providers by HayWolf · Pull Request #42 · tiylabs/tiycore

HayWolf · 2026-06-10T16:41:46Z

Summary

统一 Usage 字段在不同协议间的语义，使上层应用可以跨协议准确地计算"本次请求真实占用的上下文 token 数"，同时修复两处会让 cache 计费失真的 bug。

Changes

Bug 修复

src/protocol/google.rs — input 改为 prompt_token_count.saturating_sub(cached_content_token_count)。原来直接赋 prompt_token_count，导致 cache_read 字段被 cache 子集二次累加。
src/protocol/anthropic.rs — MessageDeltaUsage 新增 cache_read_input_tokens / cache_creation_input_tokens 字段，并在 message_delta 事件阶段写入。Anthropic 在 message_start 阶段上报的 cache 计数是占位值，真正的最终值在 message_delta 才有；旧实现只更新 output，导致三段计费（input / cache_read / cache_write）中后两段被丢弃。

API & 文档

src/types/usage.rs
- 新增 Usage::context_size()，跨协议返回 input + output + cache_read + cache_write。
- Usage 各字段加 doc 注释，明确 OpenAI/Google（含 cache 子集）vs Anthropic（三段互斥）两种语义。
- Usage::add 不再覆盖 total_tokens：旧逻辑用 互斥求和 重算会污染 OpenAI/Google 字段，新逻辑保留原 wire 值并让调用方用 context_size() 自算。
模块级文档表格列出每个字段在每个协议下的含义。

Motivation

不同协议对 cache 的上报方式不一致：

协议	形态	例子
OpenAI Chat Completions / Responses	`prompt_tokens` 含 cache，`cached_tokens` 是子集	`prompt=2006, cached=1920`
Google Generative AI	同上	`prompt_token_count` 含 `cached_content_token_count`
Anthropic	三段互斥	`input + cache_read + cache_write`

旧实现让 Google 漏做减法（cache 重复计），让 Anthropic 在 message_delta 阶段漏读 cache 字段（三段失真）。同时 Usage.total_tokens 在跨协议累加时无意义——prompt + completion 和 互斥三段和 不能直接相加。

修复后 Usage::context_size() 是跨协议统一的"上下文占用"指标，total_tokens 保留作为 wire 透传字段供调试。

Testing

cargo check --all-targets 通过
cargo test --lib 133/133 通过
cargo clippy --all-targets 无新增 warning
用 git stash 比对确认 clippy 的 4 个 error 来自 anthropic.rs 第 532 / 939 行的历史问题，与本次改动无关

Breaking Changes

是的，会改变 Usage::add 累加后的 total_tokens 字段值。 旧实现下 total_tokens = input + output + cache_read + cache_write（仅对 Anthropic 正确，对 OpenAI/Google 是被 cache 污染的数字）；新实现下 total_tokens 保持最后一次 wire 透传值不变。

受影响调用方：依赖 usage.add(other).total_tokens 做账单累计的代码。
迁移方法：用 usage.context_size() 替代 usage.total_tokens 跨协议计算上下文占用；按字段分别计费时直接用 usage.input / usage.output / usage.cache_read / usage.cache_write。

Checklist

Code follows project style guidelines
Tests have been added/updated
Documentation has been updated
No new warnings or errors introduced
Verified cargo check / cargo test / cargo clippy

🤖 Generated with TiyCode

Standardize how `input`, `cache_read`, and `cache_write` are computed in the Anthropic and Google protocol modules so they follow a uniform model: - Anthropic: take final cache read/write counts from `message_delta` (the `message_start` values are placeholders) so the three-segment billable input remains non-overlapping. - Google: subtract `cached_content_token_count` from `prompt_token_count` to derive the uncached `input`, matching OpenAI's convention where `input + cache_read == prompt_tokens`. This allows downstream code to use a single formula for the true context footprint of any request, exposed via the new `Usage::context_size()` method. Comprehensive documentation now explains the per-protocol field semantics at both the module and struct level. Also removes the implicit `total_tokens` recomputation in `Usage::add()` since that field carries provider-specific wire semantics and must not be re-aggregated across requests or protocols.

github-actions · 2026-06-10T16:45:53Z

AI Code Review Summary

PR: #42 (refactor(usage): ♻️ normalize token usage accounting across providers)
Preferred language: English

Overall Assessment

Detected 3 actionable findings, prioritize CRITICAL/HIGH before merge.

Major Findings by Severity

MEDIUM (3)
- src/protocol/anthropic.rs:1495 - Missing test for Anthropic message_delta optional cache fields
- src/protocol/google.rs:1091 - Missing test for Google usage normalization saturating_sub
- src/types/usage.rs:46 - Usage::add() no longer recomputes total_tokens (behavior change)

Actionable Suggestions

Audit all callers of Usage::add in the codebase (agent loop, billing, cost tracking) to ensure they are not relying on total_tokens recomputation
Consider adding a deprecation warning or migration note in CHANGELOG for the Usage::add behavior change
Add a unit test for Google protocol chunk parsing where cached_content_token_count > prompt_token_count (saturating_sub clamp).
Add a unit test for Anthropic message_delta parsing with both Some and None cache fields.

Potential Risks

Downstream code using Usage::add for cumulative cost calculation may report incorrect total_tokens if not updated to use context_size()
Production bug if Anthropic message_delta omits cache fields and existing code incorrectly resets them.
Production bug if Google API returns inconsistent usage metadata causing saturating_sub to hide real undercount.

Test Suggestions

Add integration test for Anthropic message_delta cache fields using recorded HTTP fixtures
Add integration test for Google Gemini with cache hit scenario
Add unit test for Usage::add with explicit cache_read/cache_write values to verify all four counters accumulate correctly and total_tokens is preserved
Add a test for Google chunk parsing with cached > total
Add a test for Anthropic message_delta with null cache fields
Add a test for Anthropic message_delta with explicit cache override

File-Level Coverage Notes

src/protocol/anthropic.rs: The parsing change for optional cache fields is correct, but lacks dedicated test coverage.
src/protocol/google.rs: The normalization using saturating_sub is sensible, but needs test for edge case where cached tokens exceed total.
src/types/usage.rs: The new context_size method and documentation are added correctly. Existing usage_add test updated appropriately. (Documentation is clear and new context_size is intuitively testable.)
tests/test_types.rs: Test updated to reflect the new non-recomputation behavior of total_tokens and includes a test for context_size. (Good update.)

Inline Downgraded Items (processed but not inline)

None

Coverage Status

Target files: 4
Covered files: 4
Uncovered files: 0
No-patch/binary covered as file-level: 0
Findings with unknown confidence (N/A): 0

Uncovered list:

None

No-patch covered list:

None

Runtime/Budget

Rounds used: 1/4
Planned batches: 1
Executed batches: 1
Sub-agent runs: 2
Planner calls: 1
Reviewer calls: 2
Model calls: 3/64
Structured-output summary-only degradation: NO

github-actions

Automated PR review completed.

Findings kept: 2
Findings with unknown confidence: 0
Inline comments attempted: 2
Target files: 3
Covered files: 3
Uncovered files: 0
See the summary comment for detailed analysis and coverage details.

Sign in to view

+    ///   separately and do **not** overlap with this counter).
    pub input: u64,
-    /// Number of output tokens.
+    /// Number of output tokens (model-generated).


Sign in to view

-                        output.usage.input = usage.prompt_token_count;
+                        output.usage.input = usage
+                            .prompt_token_count
+                            .saturating_sub(usage.cached_content_token_count);


…omits them; update test_usage_add for new contract - Anthropic: change MessageDeltaUsage cache fields to Option<u64> so a message_delta event without cache fields no longer clobbers the values populated from message_start. - test_usage_add: assert the new `add` contract — total_tokens is not recomputed, use context_size() for the cross-protocol footprint.

github-actions

Automated PR review completed.

Findings kept: 3
Findings with unknown confidence: 0
Inline comments attempted: 3
Target files: 4
Covered files: 4
Uncovered files: 0
See the summary comment for detailed analysis and coverage details.

github-actions · 2026-06-10T16:53:16Z

+                            // `message_start` values and break three-segment
+                            // billable accounting.
+                            if let Some(v) = usage.cache_read_input_tokens {
+                                output.usage.cache_read = v;


[MEDIUM] Missing test for Anthropic message_delta optional cache fields

No test for the Anthropic message_delta event parsing with optional cache fields.

Suggestion: Add a test that simulates a message_delta with and without the new cache fields, and asserts that the resulting Usage.cache_read/cache_write match expectations (preserving prior values when absent).

Risk: Possible incorrect cache usage tracking in production if Anthropic varies the field presence unexpectedly.

Confidence: 0.90

[From SubAgent: testing]

github-actions · 2026-06-10T16:53:16Z

-                        output.usage.input = usage.prompt_token_count;
+                        output.usage.input = usage
+                            .prompt_token_count
+                            .saturating_sub(usage.cached_content_token_count);


[MEDIUM] Missing test for Google usage normalization saturating_sub

No test covers the scenario where Google's cached_content_token_count > prompt_token_count, which would clamp to 0 via saturating_sub.

Suggestion: Add a unit test in the protocol test suite that constructs a mock Google chunk with prompt_token_count < cached_content_token_count and verifies input is 0.

Risk: Potential silent undercount of input tokens if Google API reports erroneous values.

Confidence: 0.85

[From SubAgent: testing]

github-actions · 2026-06-10T16:53:16Z

        self.output += other.output;
        self.cache_read += other.cache_read;
        self.cache_write += other.cache_write;
-        self.total_tokens = self.input + self.output + self.cache_read + self.cache_write;


[MEDIUM] Usage::add() no longer recomputes total_tokens (behavior change)

Usage::add previously recomputed total_tokens as the component sum after merging. Now it leaves total_tokens untouched, which is semantically correct for the new normalization model but a breaking change for any caller that relied on the old behavior.

Suggestion: If external callers rely on total_tokens being an aggregate after accumulation, consider providing a merge_total() helper that recomputes from components, or document that callers should use context_size() instead. Given the internal scope of this change, risk is moderate but worth flagging.

Risk: Callers that depend on total_tokens reflecting accumulated usage after add() will see the wrong values (the old value from the last-merged Usage instance) unless they switch to context_size().

Confidence: 0.85

[From SubAgent: general]

github-actions Bot reviewed Jun 10, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(usage): ♻️ normalize token usage accounting across providers#42

refactor(usage): ♻️ normalize token usage accounting across providers#42
HayWolf wants to merge 2 commits into
masterfrom
refactor/usage-normalization

HayWolf commented Jun 10, 2026

Uh oh!

github-actions Bot commented Jun 10, 2026 •

edited

Loading

Uh oh!

github-actions Bot left a comment

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot Jun 10, 2026

Uh oh!

github-actions Bot Jun 10, 2026

Uh oh!

github-actions Bot Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

HayWolf commented Jun 10, 2026

Summary

Changes

Bug 修复

API & 文档

Motivation

Testing

Breaking Changes

Checklist

Uh oh!

github-actions Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Code Review Summary

Overall Assessment

Major Findings by Severity

Actionable Suggestions

Potential Risks

Test Suggestions

File-Level Coverage Notes

Inline Downgraded Items (processed but not inline)

Coverage Status

Runtime/Budget

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions Bot commented Jun 10, 2026 •

edited

Loading