fix: Google GenAI usage metadata mapping for cached tokens #2779

Merged: RogerHYang merged 6 commits into Arize-ai:main from s-bessing:main on Feb 26, 2026

Conversation

@s-bessing (Contributor) commented Feb 23, 2026

Problem
The Google GenAI API reports cached_content_token_count as part of the overall prompt_token_count. Currently, this leads to:

  • Inaccurate Costing: Downstream tools (like Langfuse) calculate the full prompt price for tokens that were actually served from cache.
  • Lack of Visibility: There is no breakdown of cache hits vs. fresh inputs.

Solution
This PR refines the usage metadata extraction to align with OpenTelemetry standards:

  1. Subtract cached_content_token_count from the total prompt_token_count so it represents only the "active" (non-cached) input tokens.
  2. Populate the llm.token_count.prompt_details.cache_read field with the cached token count.


Note

Medium Risk
Changes how GenAI token counts are derived and reported (including llm.token_count.total), which can impact downstream costing/analytics and alerting. Logic now prefers computed prompt+completion totals over the API-provided total_token_count when they differ.

Overview
Updates Google GenAI usage-metadata extraction to account for cached_content_token_count: emits llm.token_count.prompt_details.cache_read and includes cached tokens in the derived prompt total.

Also changes llm.token_count.total to be emitted whenever any tokens are present, using max(prompt+completion, usage_metadata.total_token_count) to resolve inconsistencies, and extends tests (including a cached-token scenario) to lock in the new mapping behavior.

Written by Cursor Bugbot for commit a2f7a34. This will update automatically on new commits.
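
To make the described mapping concrete, below is a minimal sketch in Python of how a Google GenAI usage_metadata object could be translated into OpenInference span attributes. It is not the instrumentation's actual code: the field names follow the usage metadata quoted later in this thread, the cache_read attribute key comes from this PR, and the other attribute keys and the max-based total follow the note above. Whether cached tokens also need to be added to the prompt count, or are already included in it by the API, is exactly the question discussed further down.

from typing import Any, Dict

def usage_metadata_to_attributes(usage_metadata: Any) -> Dict[str, int]:
    # Sketch only; attribute keys are OpenInference span attribute names.
    attributes: Dict[str, int] = {}

    prompt = getattr(usage_metadata, "prompt_token_count", None) or 0
    completion = getattr(usage_metadata, "candidates_token_count", None) or 0
    cached = getattr(usage_metadata, "cached_content_token_count", None) or 0
    reported_total = getattr(usage_metadata, "total_token_count", None) or 0

    if cached:
        # Cached tokens get their own prompt-details breakdown entry.
        attributes["llm.token_count.prompt_details.cache_read"] = cached
    if prompt:
        # The prompt total covers the full context length, cached tokens included.
        attributes["llm.token_count.prompt"] = prompt
    if completion:
        attributes["llm.token_count.completion"] = completion
    if prompt or completion or reported_total:
        # Emit a total whenever any tokens are present, preferring the computed
        # prompt + completion sum over the API-provided total when they differ.
        attributes["llm.token_count.total"] = max(prompt + completion, reported_total)

    return attributes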

@dosubot (bot) added the size:M label (This PR changes 30-99 lines, ignoring generated files.) on Feb 23, 2026
@github-actions (bot) commented Feb 23, 2026

CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅

@s-bessing (Contributor, Author) commented

I have read the CLA Document and I hereby sign the CLA

github-actions Bot added a commit that referenced this pull request Feb 23, 2026
@RogerHYang RogerHYang self-assigned this Feb 24, 2026
@s-bessing (Contributor, Author) commented

@RogerHYang, can you run the checks again? I double-checked locally and fixed the ruff issue.

@RogerHYang RogerHYang changed the title Fix Google GenAI usage metadata mapping for cached tokens fix: Google GenAI usage metadata mapping for cached tokens Feb 26, 2026
@RogerHYang RogerHYang requested a review from a team as a code owner February 26, 2026 07:46
@cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable autofix in the Cursor dashboard.

@RogerHYang (Contributor) left a comment

Thank you for this PR! We really appreciate it.

Under our convention, prompt tokens include cached tokens, meaning they reflect the full context length. This is reflected by the prompt details breakdown, where all components should sum to the total prompt token count.

For pricing calculations, token prices are typically specified separately for cached tokens and "input" tokens. So we would derive the "input" tokens by subtracting cached tokens from the total prompt tokens.
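
As an illustration of that convention, here is a minimal sketch of such a downstream pricing calculation; the function name and per-token rates are hypothetical, and this is not Phoenix's or Langfuse's actual costing code.

def prompt_cost(
    prompt_tokens: int,
    cached_tokens: int,
    input_price_per_token: float,
    cached_price_per_token: float,
) -> float:
    # Prompt tokens reflect the full context length, cached tokens included,
    # so the freshly processed "input" portion is derived by subtraction.
    input_tokens = prompt_tokens - cached_tokens
    return (
        input_tokens * input_price_per_token
        + cached_tokens * cached_price_per_token
    )

# With the usage metadata quoted below (prompt_token_count=2560,
# cached_content_token_count=2295), only 2560 - 2295 = 265 tokens would be
# billed at the full input rate; the 2295 cached tokens use the cached rate.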

@RogerHYang RogerHYang merged commit 01ed0c1 into Arize-ai:main Feb 26, 2026
18 checks passed
@mikeldking mikeldking mentioned this pull request Feb 26, 2026
SpanAttributes.LLM_TOKEN_COUNT_PROMPT_DETAILS_CACHE_READ,
usage_metadata.cached_content_token_count,
)
prompt_token_count += usage_metadata.cached_content_token_count

@s-bessing (Contributor, Author) commented on the lines above:
@RogerHYang I am not sure that is correct. In the Google response, the cached tokens are already included in prompt_token_count:

{
  "cache_tokens_details": [
    {
      "modality": "TEXT",
      "token_count": 2295
    }
  ],
  "cached_content_token_count": 2295,
  "candidates_token_count": 163,
  "prompt_token_count": 2560,
  "prompt_tokens_details": [
    {
      "modality": "TEXT",
      "token_count": 2560
    }
  ],
  "total_token_count": 2723
}

I would guess the tool tokens are included as well; I have not checked this yet.
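
A quick arithmetic check on the metadata above is consistent with that reading; the snippet below only restates the numbers quoted in this comment.

usage = {
    "cached_content_token_count": 2295,
    "candidates_token_count": 163,
    "prompt_token_count": 2560,
    "total_token_count": 2723,
}

# 2560 + 163 == 2723, with no separate term for the 2295 cached tokens,
# so those tokens can only be accounted for inside prompt_token_count.
assert (
    usage["prompt_token_count"] + usage["candidates_token_count"]
    == usage["total_token_count"]
)
assert usage["cached_content_token_count"] <= usage["prompt_token_count"]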

Contributor replied:

Thank you for the information. It appears the situation is more complex than I initially understood. I was following this (PDF) example from the Google documentation and observed usage metadata indicating that cached tokens were not included in either the prompt token count or the total token count.

Labels

size:M This PR changes 30-99 lines, ignoring generated files.

Projects

Status: Done
