Distribute tool budget evenly across tools in a batch #2820
onmete wants to merge 1 commit into openshift:main
Conversation
When the LLM requests N tools in one round, the effective per-tool token limit is now calculated as `remaining_budget // N` instead of the full remaining budget. This prevents a single batch from consuming more than the remaining token budget in total. Also updates the debug log to include the batch size for easier troubleshooting. Made-with: Cursor
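The calculation described above can be sketched as follows. The function and parameter names echo the PR's terminology (`remaining_tool_budget`, `max_tokens_per_tool_output`), and the 100-token floor mentioned later in the thread is included; the helper itself is illustrative, not the project's actual code:

```python
def effective_per_tool_limit(remaining_tool_budget: int,
                             batch_size: int,
                             max_tokens_per_tool_output: int,
                             floor: int = 100) -> int:
    # Split the remaining budget evenly across the batch, then cap by the
    # configured per-tool maximum; the 100-token floor keeps tiny shares usable.
    share = remaining_tool_budget // max(batch_size, 1)
    return max(min(share, max_tokens_per_tool_output), floor)

# A 2-tool batch sharing a 1000-token budget: each tool gets at most 500.
print(effective_per_tool_limit(1000, 2, 800))  # -> 500
```

A single tool (`batch_size=1`) still gets the full `max_tokens_per_tool_output` when the budget allows, matching the behaviour the description claims.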
[APPROVALNOTIFIER] This PR is NOT APPROVED. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files; approvers can indicate their approval by commenting.
Known edge case (deferred): the 100-token floor is applied after the split, so when remaining_budget // N falls below 100 each tool is still granted 100 tokens, and a large enough batch can collectively exceed the remaining budget.
Fair point, but the equal split is not meant to be a prediction of how much each tool will produce; it is a defensive ceiling that stops a single batch from exhausting all remaining budget in one round. Without this change, every tool in a batch of N could independently consume the full remaining budget, up to N × remaining_budget in total. The equal split is the simplest way to enforce that the batch as a whole cannot exceed the remaining budget. Yes, in practice some tools may use far less than their share and that space is "wasted", but that is a much better outcome than one greedy tool starving all the others. A smarter adaptive strategy (e.g. re-distributing unspent share) would require tracking per-tool actual usage mid-batch, which is more complex and deferred for now.
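The deferred adaptive strategy mentioned here could, for example, hand each tool an even share of whatever budget is still unspent when its turn comes. This is a hypothetical sequential variant, not anything implemented in the PR; the tool-callable shape (returning tokens used plus output) is an assumption for illustration:

```python
def run_batch_redistributing(tools, remaining_budget: int):
    # Hypothetical variant: tools run in sequence, and any share a tool
    # leaves unspent flows back into the pool for the tools after it.
    outputs, budget = [], remaining_budget
    for i, tool in enumerate(tools):
        share = budget // (len(tools) - i)   # even split over what's left
        used, out = tool(share)              # tool reports tokens it used
        budget -= min(used, share)           # never charge above the share
        outputs.append(out)
    return outputs, budget

# The first tool uses only 100 of its 500-token share, so the second
# tool's share grows to the remaining 900 instead of a fixed 500.
cheap = lambda limit: (100, "short output")
greedy = lambda limit: (limit, "long output")
outs, left = run_batch_redistributing([cheap, greedy], 1000)
print(left)  # -> 0
```

The total charged can never exceed the starting budget, which preserves the PR's invariant while wasting less of each share.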
@onmete: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Maybe we should think about a better approach? This PR is not associated with any story yet, and I am just not convinced that an equal pre-split is the right direction. How about handling this when we collect all the tool outputs instead: run the tools, then do the checks and cut the outputs after everything has executed?
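The alternative suggested here, trimming after execution rather than pre-splitting, might look roughly like this. It is a sketch only: whitespace word count stands in for a real token counter, and the function name is invented for illustration:

```python
def truncate_collected_outputs(outputs: list[str], remaining_budget: int) -> list[str]:
    # Run every tool first, then cut the collected outputs so their
    # combined size fits the remaining budget (words stand in for tokens).
    kept, budget = [], remaining_budget
    for out in outputs:
        words = out.split()
        if len(words) <= budget:
            kept.append(out)
            budget -= len(words)
        else:
            kept.append(" ".join(words[:budget]))
            budget = 0
    return kept

print(truncate_collected_outputs(["alpha beta gamma", "delta epsilon"], 4))
# -> ['alpha beta gamma', 'delta']
```

One trade-off versus the pre-split: the tools have already spent compute producing output that is then thrown away, which may be why the PR applies the limit up front instead.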
Description
When the LLM requests N tools in a single round, `effective_per_tool_limit` was previously set to the full remaining budget. Every tool in the batch could then consume up to that limit independently, meaning a batch of N tools could collectively consume up to N × remaining_budget tokens, far exceeding the budget.

Fix: divide `remaining_tool_budget` by the number of tool calls in the batch before capping against `max_tokens_per_tool_output`. This ensures a single batch cannot consume more than the remaining budget in total. Individual tools still get up to `max_tokens_per_tool_output` when the budget allows.

The debug log is also updated to include `batch_size` so truncation behaviour is easier to diagnose.

Type of change
Related Tickets & Documents
Checklist before requesting a review
Testing
- Added `test_tool_budget_distributed_across_batch`, which patches `execute_tool_calls`, triggers a 2-tool batch with a 1000-token total budget, and asserts that the per-tool limit passed to `execute_tool_calls` is ≤ 500 (budget / 2). Also asserts that `batch_size=2` appears in the debug log.
- All unit tests pass (`make test-unit`).
- `make verify` passes (black, ruff, pylint 10/10, mypy clean).

Made with Cursor
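A self-contained sketch of the assertion that test makes: the real test patches the project's `execute_tool_calls`, while here a `MagicMock` and a stand-in dispatcher are used so the snippet runs on its own (names and signatures are assumptions, not the project's actual layout):

```python
from unittest.mock import MagicMock

def dispatch_batch(tool_calls, remaining_tool_budget, execute_tool_calls,
                   max_tokens_per_tool_output=800):
    # Stand-in for the code under test: split the budget across the batch,
    # cap per tool, then hand the limit to the executor.
    limit = min(remaining_tool_budget // len(tool_calls),
                max_tokens_per_tool_output)
    return execute_tool_calls(tool_calls, limit)

executor = MagicMock()
dispatch_batch(["tool_a", "tool_b"], 1000, executor)
_, per_tool_limit = executor.call_args.args
assert per_tool_limit <= 500  # 1000-token budget split across 2 tools
print(per_tool_limit)  # -> 500
```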