Distribute tool budget evenly across tools in a batch #2820

Open
onmete wants to merge 1 commit into openshift:main from onmete:fix/distribute-tool-budget-across-batch

Conversation

@onmete
Contributor

@onmete onmete commented Mar 17, 2026

Description

When the LLM requests N tools in a single round, effective_per_tool_limit was previously set to the full remaining budget. Every tool in the batch could then consume up to that limit independently, meaning a batch of N tools could collectively consume up to N × remaining_budget tokens — far exceeding the budget.

Fix: divide remaining_tool_budget by the number of tool calls in the batch before capping against max_tokens_per_tool_output:

per_tool_share = remaining_tool_budget // max(len(tool_calls), 1)
effective_per_tool_limit = min(max_tokens_per_tool_output, per_tool_share)

This ensures a single batch cannot consume more than the remaining budget in total. Individual tools still get up to max_tokens_per_tool_output when the budget allows.

The debug log is also updated to include batch_size so truncation behaviour is easier to diagnose.
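The fix described above can be sketched as a small helper. The function and parameter names follow the PR description; the actual implementation in the repository is not shown here, so treat this as an illustrative sketch rather than the merged code:

```python
def effective_limit(remaining_tool_budget: int,
                    num_tool_calls: int,
                    max_tokens_per_tool_output: int) -> int:
    """Per-tool token cap chosen so the batch as a whole cannot
    exceed the remaining budget."""
    # Split the remaining budget evenly across the batch
    # (max(..., 1) guards against an empty batch).
    per_tool_share = remaining_tool_budget // max(num_tool_calls, 1)
    # Still respect the configured per-tool ceiling.
    return min(max_tokens_per_tool_output, per_tool_share)

# A 1000-token budget split across a 4-tool batch with a 600-token
# per-tool ceiling: each tool gets 250, so the batch totals at most 1000.
print(effective_limit(1000, 4, 600))  # -> 250
```

With the previous behaviour each of the four tools would have received the full 1000-token limit, allowing the batch to consume up to 4000 tokens.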

Type of change

  • Bug fix
  • Optimization

Related Tickets & Documents

  • Related Issue #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Added test_tool_budget_distributed_across_batch which patches execute_tool_calls, triggers a 2-tool batch with a 1000-token total budget, and asserts that the per-tool limit passed to execute_tool_calls is ≤ 500 (budget / 2). Also asserts batch_size=2 appears in the debug log.
  • All 884 unit tests pass (make test-unit).
  • make verify passes (black, ruff, pylint 10/10, mypy clean).

Made with Cursor

When the LLM requests N tools in one round, the effective per-tool
token limit is now calculated as remaining_budget // N instead of
the full remaining budget. This prevents a single batch from
consuming more than the remaining token budget in total.

Also updates the debug log to include the batch size for easier
troubleshooting.

Made-with: Cursor
@openshift-ci

openshift-ci bot commented Mar 17, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign onmete for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@onmete
Contributor Author

onmete commented Mar 17, 2026

Known edge case (deferred): The 100-token floor applied at max(effective_per_tool_limit, 100) can still cause the batch total to slightly exceed remaining_tool_budget when the budget is very tight (i.e. remaining / batch_size < 100). Each tool gets 100 tokens, so the total batch allowance becomes batch_size × 100 rather than remaining. In practice this only occurs when the budget is already nearly exhausted, the overshoot is bounded to batch_size × 100 tokens at most, and the real token usage tracked in tool_tokens_used reflects the actual output so the next round correctly accounts for it. A strict fix would cap the floor itself at min(100, per_tool_share), left for a follow-up.
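The floor interaction and the proposed strict fix can be illustrated with a sketch (function names are hypothetical; only the `max(effective_per_tool_limit, 100)` floor is taken from the comment above):

```python
def capped_limit(remaining: int, batch_size: int, floor: int = 100) -> int:
    """Current behaviour: the 100-token floor can push the batch total
    past the remaining budget when remaining // batch_size < floor."""
    share = remaining // max(batch_size, 1)
    return max(share, floor)

def strict_limit(remaining: int, batch_size: int, floor: int = 100) -> int:
    """Follow-up idea from the comment: cap the floor itself at the
    per-tool share, so the batch can never exceed `remaining`."""
    share = remaining // max(batch_size, 1)
    return max(share, min(floor, share))

# 150 tokens left, 3-tool batch: the even share is 50 per tool.
print(capped_limit(150, 3))   # -> 100, so the batch allowance is 300 > 150
print(strict_limit(150, 3))   # -> 50, so the batch allowance is exactly 150
```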

@blublinsky
Contributor

Using

per_tool_share = remaining_tool_budget // max(len(tool_calls), 1)
effective_per_tool_limit = min(max_tokens_per_tool_output, per_tool_share)

seems a bit crude. It assumes that each tool will produce roughly the same number of tokens, and I do not think that's a valid assumption.

@onmete
Contributor Author

onmete commented Mar 17, 2026

Fair point, but the equal split is not meant to be a prediction of how much each tool will produce — it is a defensive ceiling to prevent a single batch from exhausting all remaining budget in one round.

Without this change, effective_per_tool_limit equals the full remaining_tool_budget. Every tool in the batch independently gets that limit, so 10 tools could collectively consume 10× the remaining budget. The actual root cause we observed was a single round of 20+ tool calls depleting the entire budget, leaving every subsequent tool with only the 100-token minimum floor.

The equal split is the simplest way to enforce that the batch as a whole cannot exceed the remaining budget. Yes, in practice some tools may use far less than their share and that space is "wasted" — but that is a much better outcome than one greedy tool starving all the others. A smarter adaptive strategy (e.g. re-distributing unspent share) would require tracking per-tool actual usage mid-batch, which is more complex and deferred for now.
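The deferred adaptive strategy could, for instance, execute tools sequentially and roll any unspent share over to the tools that have not run yet. This is a hypothetical sketch (all names invented here; the project's real `execute_tool_calls` path may schedule tools differently):

```python
def run_batch_with_redistribution(tool_calls, remaining_budget,
                                  per_tool_cap, execute):
    """Run tools one at a time; each tool's limit is an even share of
    whatever budget is left *now*, so unspent tokens flow forward."""
    outputs = []
    for i, call in enumerate(tool_calls):
        tools_left = len(tool_calls) - i
        limit = min(per_tool_cap, remaining_budget // tools_left)
        # `execute` is assumed to return (output, tokens_actually_used).
        output, tokens_used = execute(call, limit)
        remaining_budget -= tokens_used
        outputs.append(output)
    return outputs, remaining_budget
```

The trade-off is that tools can no longer run concurrently, which is likely why the PR defers this in favour of the simpler static split.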

@openshift-ci

openshift-ci bot commented Mar 17, 2026

@onmete: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@blublinsky
Contributor

> Fair point, but the equal split is not meant to be a prediction of how much each tool will produce — it is a defensive ceiling to prevent a single batch from exhausting all remaining budget in one round. […]

Maybe we should think about a better approach? This PR is not associated with any story yet. I am just not convinced that an equal pre-split is the right direction. How about acting once we have collected all the tool outputs: run the checks and cut the outputs down after all tools have executed?
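The post-execution alternative suggested here could look roughly like the following. Everything in this sketch is hypothetical (`trim_after_execution` and `count_tokens` are invented names, and character slicing stands in for real token-level truncation):

```python
def trim_after_execution(outputs, remaining_budget, count_tokens):
    """Run every tool first, then trim the collected outputs
    proportionally so the batch fits the remaining budget."""
    total = sum(count_tokens(o) for o in outputs)
    if total <= remaining_budget:
        return outputs  # everything fits; nothing to cut
    ratio = remaining_budget / total
    # Proportional cut; the 1-char minimum means a batch of many tiny
    # outputs could still overshoot slightly.
    return [o[: max(1, int(len(o) * ratio))] for o in outputs]
```

Compared with the pre-split, this never wastes budget on tools that produce little output, but it means all tools run at full length first, so the cost of generating the untrimmed output is still paid.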

