fix: Google GenAI usage metadata mapping for cached tokens #2779

RogerHYang merged 6 commits into Arize-ai:main from
Conversation
CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅
I have read the CLA Document and I hereby sign the CLA |
@RogerHYang can you run the checks again? I double-checked it locally and fixed the ruff issue.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
RogerHYang left a comment
Thank you for this PR! We really appreciate it.
Under our convention, prompt tokens include cached tokens, meaning they reflect the full context length. This is reflected in the prompt details breakdown, where all components should sum to the total prompt token count.
For pricing calculations, token prices are typically specified separately for cached tokens and "input" tokens. So we would derive the "input" tokens by subtracting cached tokens from the total prompt tokens.
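To make that concrete, here is a minimal sketch of the pricing derivation under this convention (the function name and price parameters are illustrative, not part of the instrumentation):

```python
# Minimal sketch of the convention: prompt tokens already include cached
# tokens, so the separately priced "input" tokens are the remainder.
# Function name and prices are illustrative, not part of this PR.

def prompt_cost(
    prompt_tokens: int,   # full context length, cached tokens included
    cached_tokens: int,   # portion of the prompt served from the cache
    input_price: float,   # price per non-cached input token
    cached_price: float,  # (usually discounted) price per cached token
) -> float:
    input_tokens = prompt_tokens - cached_tokens  # "input" tokens for billing
    return input_tokens * input_price + cached_tokens * cached_price
```

With the usage metadata quoted later in this thread (prompt 2560, cached 2295), only 265 tokens would be billed at the full input rate.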
```python
            SpanAttributes.LLM_TOKEN_COUNT_PROMPT_DETAILS_CACHE_READ,
            usage_metadata.cached_content_token_count,
        )
        prompt_token_count += usage_metadata.cached_content_token_count
```
@RogerHYang I am not sure that is correct. In Google's response, the cached tokens are already included in `prompt_token_count`:
```json
{
  "cache_tokens_details": [
    {
      "modality": "TEXT",
      "token_count": 2295
    }
  ],
  "cached_content_token_count": 2295,
  "candidates_token_count": 163,
  "prompt_token_count": 2560,
  "prompt_tokens_details": [
    {
      "modality": "TEXT",
      "token_count": 2560
    }
  ],
  "total_token_count": 2723
}
```
I would guess the tool tokens are included as well; I have not checked this yet.
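For reference, the arithmetic in that payload only adds up if cached tokens are counted inside `prompt_token_count` (plain Python restating the numbers above):

```python
# Sanity check of the usage metadata above: cached tokens are already part
# of prompt_token_count, and prompt + candidates equals the reported total.
usage = {
    "cached_content_token_count": 2295,
    "candidates_token_count": 163,
    "prompt_token_count": 2560,
    "total_token_count": 2723,
}

# Tokens not served from the cache: 2560 - 2295 = 265.
non_cached = usage["prompt_token_count"] - usage["cached_content_token_count"]
assert non_cached == 265

# Cached tokens are NOT added on top again when forming the total.
assert (
    usage["prompt_token_count"] + usage["candidates_token_count"]
    == usage["total_token_count"]
)
```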
There was a problem hiding this comment.
Thank you for the information. It appears the situation is more complex than I initially understood. I was following this (PDF) example from the Google documentation and observed usage metadata indicating that cached tokens were not included in either the prompt token count or the total token count.
Problem

The Google GenAI API includes `cached_content_token_count` within the overall `prompt_token_count`. Currently, this means cached tokens are indistinguishable from regular input tokens in the reported counts.

Solution
This PR refines the usage metadata extraction to align with OpenTelemetry standards:

- Subtract `cached_content_token_count` from the total `prompt_token_count` to represent only the "active" input tokens.
- Populate the `llm.token_count.prompt_details.cache_read` field with the cached token count.
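A simplified sketch of the extraction described in this PR description might look like the following; the helper and its signature are illustrative, though the attribute names come from the OpenInference semantic conventions:

```python
from openinference.semconv.trace import SpanAttributes

def extract_token_attributes(usage_metadata) -> dict:
    """Illustrative helper, not the exact code in this PR."""
    attributes = {}
    prompt_tokens = usage_metadata.prompt_token_count or 0
    cached_tokens = usage_metadata.cached_content_token_count or 0
    if cached_tokens:
        # Surface the cached portion in the prompt-details breakdown.
        attributes[
            SpanAttributes.LLM_TOKEN_COUNT_PROMPT_DETAILS_CACHE_READ
        ] = cached_tokens
        # Keep only the "active" (non-cached) input tokens in the prompt count.
        prompt_tokens -= cached_tokens
    attributes[SpanAttributes.LLM_TOKEN_COUNT_PROMPT] = prompt_tokens
    return attributes
```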
Note

Medium Risk

Changes how GenAI token counts are derived and reported (including `llm.token_count.total`), which can impact downstream costing/analytics and alerting. Logic now prefers computed prompt+completion totals over the API-provided `total_token_count` when they differ.
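As a sketch of that resolution rule (the function name is hypothetical; the PR's actual code may differ):

```python
from typing import Optional

def resolve_total_tokens(
    prompt_tokens: int,
    completion_tokens: int,
    reported_total: Optional[int],
) -> int:
    # Resolve disagreements between the computed prompt + completion sum
    # and the API-provided total by taking the larger of the two.
    computed = prompt_tokens + completion_tokens
    return max(computed, reported_total or 0)

# With the numbers from this thread: resolve_total_tokens(2560, 163, 2723) == 2723
```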
Overview

Updates Google GenAI usage-metadata extraction to account for `cached_content_token_count`: emits `llm.token_count.prompt_details.cache_read` and includes cached tokens in the derived prompt total.

Also changes `llm.token_count.total` to be emitted whenever any tokens are present, using `max(prompt + completion, usage_metadata.total_token_count)` to resolve inconsistencies, and extends tests (including a cached-token scenario) to lock in the new mapping behavior.

Written by Cursor Bugbot for commit a2f7a34.