Distribute tool budget evenly across tools in a batch #2820

Open
onmete wants to merge 1 commit into openshift:main from onmete:fix/distribute-tool-budget-across-batch

Conversation

@onmete
Contributor

@onmete onmete commented Mar 17, 2026

Description

When the LLM requests N tools in a single round, effective_per_tool_limit was previously set to the full remaining budget. Every tool in the batch could then consume up to that limit independently, meaning a batch of N tools could collectively consume up to N × remaining_budget tokens — far exceeding the budget.

Fix: divide remaining_tool_budget by the number of tool calls in the batch before capping against max_tokens_per_tool_output:

per_tool_share = remaining_tool_budget // max(len(tool_calls), 1)
effective_per_tool_limit = min(max_tokens_per_tool_output, per_tool_share)

This ensures a single batch cannot consume more than the remaining budget in total. Individual tools still get up to max_tokens_per_tool_output when the budget allows.

The debug log is also updated to include batch_size so truncation behaviour is easier to diagnose.
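The fix described above can be sketched as a small helper. The function and parameter names follow the PR description; the actual implementation in the repository is not shown here, so treat this as an illustrative sketch rather than the merged code:

```python
def effective_limit(remaining_tool_budget: int,
                    num_tool_calls: int,
                    max_tokens_per_tool_output: int) -> int:
    """Per-tool token cap chosen so the batch as a whole cannot
    exceed the remaining budget."""
    # Split the remaining budget evenly across the batch
    # (max(..., 1) guards against an empty batch).
    per_tool_share = remaining_tool_budget // max(num_tool_calls, 1)
    # Still respect the configured per-tool ceiling.
    return min(max_tokens_per_tool_output, per_tool_share)

# A 1000-token budget split across a 4-tool batch with a 600-token
# per-tool ceiling: each tool gets 250, so the batch totals at most 1000.
print(effective_limit(1000, 4, 600))  # -> 250
```

With the previous behaviour each of the four tools would have received the full 1000-token limit, allowing the batch to consume up to 4000 tokens.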

Type of change

  • Bug fix
  • Optimization

Related Tickets & Documents

  • Related Issue #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Added test_tool_budget_distributed_across_batch which patches execute_tool_calls, triggers a 2-tool batch with a 1000-token total budget, and asserts that the per-tool limit passed to execute_tool_calls is ≤ 500 (budget / 2). Also asserts batch_size=2 appears in the debug log.
  • All 884 unit tests pass (make test-unit).
  • make verify passes (black, ruff, pylint 10/10, mypy clean).

Made with Cursor

When the LLM requests N tools in one round, the effective per-tool
token limit is now calculated as remaining_budget // N instead of
the full remaining budget. This prevents a single batch from
consuming more than the remaining token budget in total.

Also updates the debug log to include the batch size for easier
troubleshooting.

Made-with: Cursor
@openshift-ci

openshift-ci bot commented Mar 17, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign onmete for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@onmete
Contributor Author

onmete commented Mar 17, 2026

Known edge case (deferred): The 100-token floor applied at max(effective_per_tool_limit, 100) can still cause the batch total to slightly exceed remaining_tool_budget when the budget is very tight (i.e. remaining / batch_size < 100). Each tool gets 100 tokens, so the total batch allowance becomes batch_size × 100 rather than remaining. In practice this only occurs when the budget is already nearly exhausted, the overshoot is bounded to batch_size × 100 tokens at most, and the real token usage tracked in tool_tokens_used reflects the actual output so the next round correctly accounts for it. A strict fix would cap the floor itself at min(100, per_tool_share), left for a follow-up.
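The floor interaction and the proposed strict fix can be illustrated with a sketch (function names are hypothetical; only the `max(effective_per_tool_limit, 100)` floor is taken from the comment above):

```python
def capped_limit(remaining: int, batch_size: int, floor: int = 100) -> int:
    """Current behaviour: the 100-token floor can push the batch total
    past the remaining budget when remaining // batch_size < floor."""
    share = remaining // max(batch_size, 1)
    return max(share, floor)

def strict_limit(remaining: int, batch_size: int, floor: int = 100) -> int:
    """Follow-up idea from the comment: cap the floor itself at the
    per-tool share, so the batch can never exceed `remaining`."""
    share = remaining // max(batch_size, 1)
    return max(share, min(floor, share))

# 150 tokens left, 3-tool batch: the even share is 50 per tool.
print(capped_limit(150, 3))   # -> 100, so the batch allowance is 300 > 150
print(strict_limit(150, 3))   # -> 50, so the batch allowance is exactly 150
```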

@blublinsky
Contributor

Using

per_tool_share = remaining_tool_budget // max(len(tool_calls), 1)
effective_per_tool_limit = min(max_tokens_per_tool_output, per_tool_share)

seems a bit crude. It assumes that each tool will produce roughly the same number of tokens, and I do not think that's a valid assumption.

@onmete
Contributor Author

onmete commented Mar 17, 2026

Fair point, but the equal split is not meant to be a prediction of how much each tool will produce — it is a defensive ceiling to prevent a single batch from exhausting all remaining budget in one round.

Without this change, effective_per_tool_limit equals the full remaining_tool_budget. Every tool in the batch independently gets that limit, so 10 tools could collectively consume 10× the remaining budget. The actual root cause we observed was a single round of 20+ tool calls depleting the entire budget, leaving every subsequent tool with only the 100-token minimum floor.

The equal split is the simplest way to enforce that the batch as a whole cannot exceed the remaining budget. Yes, in practice some tools may use far less than their share and that space is "wasted" — but that is a much better outcome than one greedy tool starving all the others. A smarter adaptive strategy (e.g. re-distributing unspent share) would require tracking per-tool actual usage mid-batch, which is more complex and deferred for now.
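The deferred adaptive strategy could, for instance, execute tools sequentially and roll any unspent share over to the tools that have not run yet. This is a hypothetical sketch (all names invented here; the project's real `execute_tool_calls` path may schedule tools differently):

```python
def run_batch_with_redistribution(tool_calls, remaining_budget,
                                  per_tool_cap, execute):
    """Run tools one at a time; each tool's limit is an even share of
    whatever budget is left *now*, so unspent tokens flow forward."""
    outputs = []
    for i, call in enumerate(tool_calls):
        tools_left = len(tool_calls) - i
        limit = min(per_tool_cap, remaining_budget // tools_left)
        # `execute` is assumed to return (output, tokens_actually_used).
        output, tokens_used = execute(call, limit)
        remaining_budget -= tokens_used
        outputs.append(output)
    return outputs, remaining_budget
```

The trade-off is that tools can no longer run concurrently, which is likely why the PR defers this in favour of the simpler static split.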

@openshift-ci

openshift-ci bot commented Mar 17, 2026

@onmete: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@blublinsky
Contributor

> Fair point, but the equal split is not meant to be a prediction of how much each tool will produce — it is a defensive ceiling to prevent a single batch from exhausting all remaining budget in one round. […]

Maybe we should think about a better approach? This PR is not associated with any story yet. I am just not convinced that an equal pre-split is the right direction. How about acting once we have collected all the tool outputs: run the checks and cut the outputs down after all tools have executed?
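The post-execution alternative suggested here could look roughly like the following. Everything in this sketch is hypothetical (`trim_after_execution` and `count_tokens` are invented names, and character slicing stands in for real token-level truncation):

```python
def trim_after_execution(outputs, remaining_budget, count_tokens):
    """Run every tool first, then trim the collected outputs
    proportionally so the batch fits the remaining budget."""
    total = sum(count_tokens(o) for o in outputs)
    if total <= remaining_budget:
        return outputs  # everything fits; nothing to cut
    ratio = remaining_budget / total
    # Proportional cut; the 1-char minimum means a batch of many tiny
    # outputs could still overshoot slightly.
    return [o[: max(1, int(len(o) * ratio))] for o in outputs]
```

Compared with the pre-split, this never wastes budget on tools that produce little output, but it means all tools run at full length first, so the cost of generating the untrimmed output is still paid.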

