Adds tracking of actual token usage per model. #124

Open
davidbreitgand wants to merge 2 commits into llm-d:main from davidbreitgand:issue-115-track-actual-token-usage

Conversation

@davidbreitgand
Contributor

What does this PR do?

  1. Adds tracking of actual token usage per model.
  2. Hardens the implementation: guards against accidental writes to a nil datastore.
  3. The current implementation is limited to OpenAI as Phase 1; other providers will be added in subsequent PRs.
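
As an illustration of items 1 and 2, here is a minimal sketch of a per-model usage store whose methods tolerate a nil receiver. The names `UsageStore`, `TokenUsage`, `Record`, and `Get` are hypothetical and are not taken from this PR's code; the sketch only shows one way a nil guard can be written in Go.

```go
package main

import (
	"fmt"
	"sync"
)

// TokenUsage holds accumulated token counts for one model (hypothetical type).
type TokenUsage struct {
	PromptTokens     uint64
	CompletionTokens uint64
}

// UsageStore is a hypothetical in-memory datastore keyed by model name.
type UsageStore struct {
	mu      sync.Mutex
	byModel map[string]TokenUsage
}

// NewUsageStore returns an initialized, empty store.
func NewUsageStore() *UsageStore {
	return &UsageStore{byModel: make(map[string]TokenUsage)}
}

// Record adds the given counts to the model's totals.
// On a nil store it does nothing and returns false instead of panicking.
func (s *UsageStore) Record(model string, prompt, completion uint64) bool {
	if s == nil {
		return false
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	u := s.byModel[model]
	u.PromptTokens += prompt
	u.CompletionTokens += completion
	s.byModel[model] = u
	return true
}

// Get returns the accumulated usage for a model, if any was recorded.
func (s *UsageStore) Get(model string) (TokenUsage, bool) {
	if s == nil {
		return TokenUsage{}, false
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	u, ok := s.byModel[model]
	return u, ok
}

func main() {
	var missing *UsageStore
	fmt.Println(missing.Record("gpt-4o", 10, 5)) // prints "false"

	store := NewUsageStore()
	store.Record("gpt-4o", 10, 5)
	store.Record("gpt-4o", 7, 3)
	u, _ := store.Get("gpt-4o")
	fmt.Println(u.PromptTokens, u.CompletionTokens) // prints "17 8"
}
```

Returning a boolean from `Record` lets the caller decide whether a missing datastore is worth logging; a real plugin might prefer to return an error instead.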

Why is this change needed?

requestmetadata/plugin.go extracts max_tokens on the inbound request event. max_tokens is optional and is only a proxy for the actual token usage. The actual token usage will be instrumental for advanced cost-aware scoring, accurate cost tracking to capture price reversals, cache management, and visualization.
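
To make the difference between `max_tokens` and actual usage concrete: a non-streaming OpenAI chat completion response carries a `usage` object with `prompt_tokens`, `completion_tokens`, and `total_tokens`. The sketch below decodes that object from a response body. The function name `ExtractUsage` and the struct names are hypothetical; how the PR actually hooks into the response path is not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// openAIUsage mirrors the "usage" object of an OpenAI chat completion response.
type openAIUsage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
	TotalTokens      int `json:"total_tokens"`
}

// openAIResponse keeps only the fields this sketch needs.
type openAIResponse struct {
	Model string       `json:"model"`
	Usage *openAIUsage `json:"usage"`
}

// ExtractUsage (hypothetical helper) returns the model name and reported usage.
// ok is false when the body is not valid JSON or has no usage object.
func ExtractUsage(body []byte) (model string, usage openAIUsage, ok bool) {
	var resp openAIResponse
	if err := json.Unmarshal(body, &resp); err != nil || resp.Usage == nil {
		return "", openAIUsage{}, false
	}
	return resp.Model, *resp.Usage, true
}

func main() {
	body := []byte(`{
		"model": "gpt-4o",
		"usage": {"prompt_tokens": 12, "completion_tokens": 30, "total_tokens": 42}
	}`)
	model, usage, ok := ExtractUsage(body)
	fmt.Println(model, usage.PromptTokens, usage.CompletionTokens, usage.TotalTokens, ok)
	// prints "gpt-4o 12 30 42 true"
}
```

Using a pointer for `Usage` distinguishes a response that omits the object from one that reports zero tokens, so callers can skip recording rather than store zeros.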

How was this tested?

  • Unit tests added/updated
  • Integration/e2e tests added/updated
  • Manual testing performed

Checklist

  • Commits are signed off (git commit -s) per DCO
  • Code follows project contributing guidelines
  • Tests pass locally (make test)
  • Linters pass (make lint)
  • Documentation updated (if applicable)

Related Issues

Fixes #115

Signed-off-by: David Breitgand <davidbreitgand@users.noreply.github.com>
github-actions bot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) on May 20, 2026
@davidbreitgand
Contributor Author

davidbreitgand commented May 20, 2026

@shmuelk, @Mohammad-nassar10 please review

@davidbreitgand
Contributor Author

/cc @shmuelk
/cc @Mohammad-nassar10

Signed-off-by: David Breitgand <davidbreitgand@users.noreply.github.com>
@davidbreitgand
Contributor Author

/cc @ronenkat

@ronenkat: thanks for the feedback on the counter overflow. Fixed. Please take another look.
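
For readers following the overflow discussion: this thread does not show how the counters were fixed. One common way to keep an unsigned counter from wrapping is saturating addition, sketched below under that assumption. `SaturatingAdd` is a hypothetical helper, not a function from this PR.

```go
package main

import (
	"fmt"
	"math"
)

// SaturatingAdd returns a+b, clamped to math.MaxUint64 instead of wrapping around.
func SaturatingAdd(a, b uint64) uint64 {
	if b > math.MaxUint64-a {
		return math.MaxUint64
	}
	return a + b
}

func main() {
	fmt.Println(SaturatingAdd(100, 23))                                // prints "123"
	fmt.Println(SaturatingAdd(math.MaxUint64-1, 5) == math.MaxUint64) // prints "true"
}
```

Clamping keeps the totals monotonic, at the cost of silently under-counting once the limit is reached; resetting the counter or reporting an error are alternative designs.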



Development

Successfully merging this pull request may close these issues.

[Feature]: Track actual token usage of the inference requests
