fix: Google GenAI usage metadata mapping for cached tokens #2779

RogerHYang merged 6 commits into Arize-ai:main from
Conversation
CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅
I have read the CLA Document and I hereby sign the CLA |
@RogerHYang can you run the checks again? I double-checked it locally and fixed the ruff issue.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
RogerHYang left a comment
Thank you for this PR! We really appreciate it.
Under our convention, prompt tokens include cached tokens, meaning they reflect the full context length. This is reflected in the prompt details breakdown, where all components should sum to the total prompt token count.
For pricing calculations, token prices are typically specified separately for cached tokens and "input" tokens. So we would derive the "input" tokens by subtracting cached tokens from the total prompt tokens.
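To make that concrete, here is a minimal sketch of the pricing derivation under this convention (the function name and price parameters are illustrative, not part of the instrumentation):

```python
# Minimal sketch of the convention: prompt tokens already include cached
# tokens, so the separately priced "input" tokens are the remainder.
# Function name and prices are illustrative, not part of this PR.

def prompt_cost(
    prompt_tokens: int,   # full context length, cached tokens included
    cached_tokens: int,   # portion of the prompt served from the cache
    input_price: float,   # price per non-cached input token
    cached_price: float,  # (usually discounted) price per cached token
) -> float:
    input_tokens = prompt_tokens - cached_tokens  # "input" tokens for billing
    return input_tokens * input_price + cached_tokens * cached_price
```

With the usage metadata quoted later in this thread (prompt 2560, cached 2295), only 265 tokens would be billed at the full input rate.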
```python
            SpanAttributes.LLM_TOKEN_COUNT_PROMPT_DETAILS_CACHE_READ,
            usage_metadata.cached_content_token_count,
        )
        prompt_token_count += usage_metadata.cached_content_token_count
```
@RogerHYang I am not sure that is correct. In Google's response, the cached tokens are already included in `prompt_token_count`:
```json
{
  "cache_tokens_details": [
    {
      "modality": "TEXT",
      "token_count": 2295
    }
  ],
  "cached_content_token_count": 2295,
  "candidates_token_count": 163,
  "prompt_token_count": 2560,
  "prompt_tokens_details": [
    {
      "modality": "TEXT",
      "token_count": 2560
    }
  ],
  "total_token_count": 2723
}
```
I would guess the tool tokens are included as well; I have not checked this yet.
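For reference, the arithmetic in that payload only adds up if cached tokens are counted inside `prompt_token_count` (plain Python restating the numbers above):

```python
# Sanity check of the usage metadata above: cached tokens are already part
# of prompt_token_count, and prompt + candidates equals the reported total.
usage = {
    "cached_content_token_count": 2295,
    "candidates_token_count": 163,
    "prompt_token_count": 2560,
    "total_token_count": 2723,
}

# Tokens not served from the cache: 2560 - 2295 = 265.
non_cached = usage["prompt_token_count"] - usage["cached_content_token_count"]
assert non_cached == 265

# Cached tokens are NOT added on top again when forming the total.
assert (
    usage["prompt_token_count"] + usage["candidates_token_count"]
    == usage["total_token_count"]
)
```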
There was a problem hiding this comment.
Thank you for the information. It appears the situation is more complex than I initially understood. I was following this (PDF) example from the Google documentation and observed usage metadata indicating that cached tokens were not included in either the prompt token count or the total token count.
Problem

The Google GenAI API includes `cached_content_token_count` within the overall `prompt_token_count`. Currently, this means cached tokens are indistinguishable from regular input tokens in the reported counts.

Solution
This PR refines the usage metadata extraction to align with OpenTelemetry standards:

- Subtract `cached_content_token_count` from the total `prompt_token_count` to represent only the "active" input tokens.
- Populate the `llm.token_count.prompt_details.cache_read` field with the cached token count.
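A simplified sketch of the extraction described in this PR description might look like the following; the helper and its signature are illustrative, though the attribute names come from the OpenInference semantic conventions:

```python
from openinference.semconv.trace import SpanAttributes

def extract_token_attributes(usage_metadata) -> dict:
    """Illustrative helper, not the exact code in this PR."""
    attributes = {}
    prompt_tokens = usage_metadata.prompt_token_count or 0
    cached_tokens = usage_metadata.cached_content_token_count or 0
    if cached_tokens:
        # Surface the cached portion in the prompt-details breakdown.
        attributes[
            SpanAttributes.LLM_TOKEN_COUNT_PROMPT_DETAILS_CACHE_READ
        ] = cached_tokens
        # Keep only the "active" (non-cached) input tokens in the prompt count.
        prompt_tokens -= cached_tokens
    attributes[SpanAttributes.LLM_TOKEN_COUNT_PROMPT] = prompt_tokens
    return attributes
```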
Note

Medium Risk

Changes how GenAI token counts are derived and reported (including `llm.token_count.total`), which can impact downstream costing/analytics and alerting. Logic now prefers computed prompt+completion totals over the API-provided `total_token_count` when they differ.
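As a sketch of that resolution rule (the function name is hypothetical; the PR's actual code may differ):

```python
from typing import Optional

def resolve_total_tokens(
    prompt_tokens: int,
    completion_tokens: int,
    reported_total: Optional[int],
) -> int:
    # Resolve disagreements between the computed prompt + completion sum
    # and the API-provided total by taking the larger of the two.
    computed = prompt_tokens + completion_tokens
    return max(computed, reported_total or 0)

# With the numbers from this thread: resolve_total_tokens(2560, 163, 2723) == 2723
```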
Overview

Updates Google GenAI usage-metadata extraction to account for `cached_content_token_count`: emits `llm.token_count.prompt_details.cache_read` and includes cached tokens in the derived prompt total.

Also changes `llm.token_count.total` to be emitted whenever any tokens are present, using `max(prompt + completion, usage_metadata.total_token_count)` to resolve inconsistencies, and extends tests (including a cached-token scenario) to lock in the new mapping behavior.

Written by Cursor Bugbot for commit a2f7a34.