fix: Custom Model pricing / Allow zero‑cost models to stay unblocked when BudgetExceeded is true (#14004)#25645
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
| @@ -1,9 +1,11 @@ | |||
| from fastapi import HTTPException | |||
|
|
|||
| import litellm | |||
Greptile SummaryThis PR moves end-user budget enforcement out of Confidence Score: 5/5Safe to merge — the logic is correct, security is preserved (bypass only triggers on explicitly configured 0.0 costs), and prior review concerns are addressed. No P0 or P1 issues found. The router fix correctly adds the missing No files require special attention.
|
| Filename | Overview |
|---|---|
| litellm/router.py | Added missing elif custom_model_info is not None branch so custom deployments with explicit zero-cost pricing are returned instead of falling through to None |
| litellm/proxy/auth/auth_checks.py | Removed _check_end_user_budget and its two call sites from get_end_user_object; budget enforcement now lives in the auth flow where the zero-cost skip flag is available |
| litellm/proxy/auth/user_api_key_auth.py | Adds "Check 5c" (key-auth) and "4b" (custom-auth) end-user max-budget checks inside if not skip_budget_checks:; also moves _is_model_cost_zero import to module level and propagates end_user_max_budget through update_valid_token_with_end_user_params |
| litellm/proxy/hooks/max_budget_limiter.py | New is_end_user_within_budget method implements the moved budget check, correctly skipping info routes and raising BudgetExceededError on overspend |
| tests/proxy_unit_tests/test_default_end_user_budget_simple.py | Restructured test_budget_enforcement_blocks_over_budget_users to test object fetch and limiter invocation separately; budget enforcement is now unit-tested against the limiter rather than through a full auth-flow integration path |
| tests/test_litellm/proxy/auth/test_custom_auth_end_user_budget.py | Adds test_custom_auth_zero_cost_model_skips_budget_checks verifying all three budget checks are skipped and common_checks receives skip_budget_checks=True for zero-cost models in custom-auth path |
| tests/test_litellm/test_router.py | Adds Test Case 6 verifying the new elif custom_model_info branch returns correct zero-cost pricing when no model-name info exists |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Incoming request] --> B[_user_api_key_auth_builder]
B --> C[get_end_user_object\nreturns object only\nno budget raise]
C --> D[_is_model_cost_zero\ncheck router pricing]
D -- "zero cost\nexplicit 0.0 in both costs" --> E[skip_budget_checks = True]
D -- "non-zero / unknown cost" --> F[skip_budget_checks = False]
E --> G[skip Check 5c\nskip common_checks budget step]
F --> H[Check 5c\nis_end_user_within_budget\nspend vs max_budget]
H -- "within budget" --> I[common_checks\nalso checks end-user budget]
H -- "over budget" --> J[BudgetExceededError]
I --> K[Request allowed]
G --> K
subgraph router_fix["Router.get_deployment_model_info fix"]
L[litellm_model_name_model_info?]
L -- "both present" --> M[merge: custom overrides base]
L -- "only litellm" --> N[use litellm info]
L -- "only custom NEW" --> O[use custom_model_info\nprice 0.0 preserved]
end
D -.->|uses pricing from| router_fix
Reviews (7): Last reviewed commit: "fix(tests): add test for zero-cost model..." | Re-trigger Greptile
| # 4b. Check end-user max_budget | ||
| end_user_mb = valid_token.end_user_max_budget | ||
| if ( | ||
| end_user_mb is not None | ||
| and end_user_object is not None | ||
| and valid_token.end_user_id is not None | ||
| ): | ||
| await proxy_logging_obj.max_budget_limiter.is_end_user_within_budget( | ||
| end_user_id=valid_token.end_user_id, | ||
| end_user_max_budget=end_user_mb, | ||
| end_user_spend=end_user_object.spend, | ||
| route=route, | ||
| ) |
There was a problem hiding this comment.
Zero-cost bypass missing in custom-auth path
In the standard key-auth and JWT auth flows, "Check 5c" and the common_checks end-user budget check both sit inside if not skip_budget_checks:, so zero-cost models skip them correctly. However, _run_post_custom_auth_checks never evaluates _is_model_cost_zero or sets skip_budget_checks, so "4b" here will block over-budget end-users even when the requested model is free. The same gap exists for the existing model_max_budget check at line 1923, but this new check makes the asymmetry more visible. Consider computing skip_budget_checks at the start of _run_post_custom_auth_checks (same logic as lines 1283-1295) and guarding both "4b" and the model-budget check with it.
There was a problem hiding this comment.
@greptile Is the problem now fixed
There was a problem hiding this comment.
Yes, the issue is now fixed. The PR adds exactly the guard I described:
# 2a. Check if model has zero cost - if so, skip all budget checks
model = get_model_from_request(request_data, route)
skip_budget_checks = False
if model is not None and llm_router is not None:
skip_budget_checks = _is_model_cost_zero(model=model, llm_router=llm_router)All three budget checks in _run_post_custom_auth_checks — key model budget (check 3), end-user model budget (check 4), and end-user max budget (check 4b) — are now wrapped under if not skip_budget_checks:. The flag is also forwarded into common_checks at the bottom of the function, so zero-cost models correctly bypass all budget enforcement in the custom-auth path.
| # Check 5c. End user max budget | ||
| end_user_mb = valid_token.end_user_max_budget | ||
| if ( | ||
| end_user_mb is not None | ||
| and _end_user_object is not None | ||
| and valid_token.end_user_id is not None | ||
| ): | ||
| await proxy_logging_obj.max_budget_limiter.is_end_user_within_budget( | ||
| end_user_id=valid_token.end_user_id, | ||
| end_user_max_budget=end_user_mb, | ||
| end_user_spend=_end_user_object.spend, | ||
| route=route, | ||
| ) |
There was a problem hiding this comment.
Redundant check with
common_checks
common_checks (called a few lines later at line 1519) already performs the identical end-user budget comparison using end_user_object.litellm_budget_table.max_budget, and it is also inside if not skip_budget_checks:. "Check 5c" will always agree with the common_checks result, so this block fires twice on every non-zero-cost request. It isn't harmful, but it adds noise and a second potential raise path. Consider removing "Check 5c" and letting common_checks remain the single source of truth for this check in the key-auth flow.
There was a problem hiding this comment.
@greptile There were already duplication in Team Budget Checks and Global Proxy Budget Check between common_checks and the logic inside user_api_key_auth.py. This is a job for another refactor.
There was a problem hiding this comment.
Fair point — if Team and Global Proxy budget checks already follow the same "duplicated in both places" pattern, then this PR is consistent with the existing style rather than introducing a new problem. The broader deduplication is a separate concern.
Tip: You can give me more context with .greptile/rules so I get this right next time.
2829293 to
99fbc7a
Compare
| Raises: | ||
| BudgetExceededError: If the end-user has exceeded overall budget. | ||
| """ | ||
| from litellm.proxy.auth.route_checks import RouteChecks |
…l_info is missing Return custom_model_info from get_deployment_model_info when litellm_model_name_model_info is unavailable. This preserves explicit input_cost_per_token and output_cost_per_token values (including 0.0) for custom deployments and prevents model_group_info from defaulting to None costs.
…tion Remove _check_end_user_budget functionality from auth_checks.py, which is being triggered each time get_end_user_object is called, causing all the models to be blocked (even the Zero Cost Ones), and move this functionality to max_budget_limiter.py, mimicing the max_model_budget_limiter.py functionality with handling exceeded model budget. This unifies all budget checks inside user_api_key_auth.py, make the code more streamlined, and with the combination with the previous commit solving None model cost values, it fixes the zero-cost models triggering budget exceeded.
…n get_end_user_object Since the budget enforcement is relocated from get_end_user_object inside auth_checks.py to is_end_user_within_budget inside max_budget_limiter.py, both test cases test_get_end_user_object and test_budget_enforcement_blocks_over_budget_users are edited accordingly
99fbc7a to
ea47665
Compare
ea47665 to
23a78f1
Compare
There was a problem hiding this comment.
isn't this a regression for jwt auth?
There was a problem hiding this comment.
@greptile can you confirm this?
There was a problem hiding this comment.
Based on my analysis of the diff, yes — this is a regression for JWT auth.
Here's why:
Before: _check_end_user_budget was called directly inside get_end_user_object, which is invoked from both the key auth path AND the JWT auth path (handle_jwt.py also calls get_end_user_object). Budget enforcement was guaranteed regardless of auth method.
After: The budget check was removed from get_end_user_object and moved to two places in user_api_key_auth.py:
- Check 5c in
_user_api_key_auth_builder(key auth path) - Check 4b in
_run_post_custom_auth_checks(custom auth)
The new Check 5c only fires if valid_token.end_user_max_budget is not None. This field is populated via update_valid_token_with_end_user_params. For the JWT auth path, the end_user_params dict is constructed from the JWT builder result — and there's no evidence in this PR that end_user_max_budget is being populated in that flow. If it's None on the token (the default), the check is silently skipped.
The net result: a JWT-authenticated end user who has exceeded their max_budget will no longer be blocked, because:
get_end_user_objectno longer raisesBudgetExceededError- Check 5c won't trigger if
valid_token.end_user_max_budgetwas never set from the JWT path
Can you confirm whether end_user_max_budget is reliably populated on valid_token during JWT auth? If not, you'll need to either ensure the JWT builder populates it, or add a fallback in Check 5c that reads _end_user_object.litellm_budget_table.max_budget directly when valid_token.end_user_max_budget is None.
Tip: You can customize Greptile's behavior for this repo with .greptile/rules and .greptile/config.
There was a problem hiding this comment.
@greptile does adding a simple fallback like:
if end_user_mb is None and end_user_object is not None:
end_user_mb = getattr(end_user_object.litellm_budget_table, 'max_budget', None)
in 5c and 4b solves the issue?
There was a problem hiding this comment.
Yes, the fallback is the right approach, but your proposed code has a subtle bug: getattr(end_user_object.litellm_budget_table, 'max_budget', None) will raise AttributeError if litellm_budget_table itself is None — the getattr default only catches missing attributes, not None.attr access.
The safe version:
end_user_mb = valid_token.end_user_max_budget
if end_user_mb is None and _end_user_object is not None:
budget_table = _end_user_object.litellm_budget_table
end_user_mb = budget_table.max_budget if budget_table is not None else NoneApply this before the existing if end_user_mb is not None check in both 5c and 4b (using end_user_object instead of _end_user_object in 4b, since that's the variable name there).
This ensures:
- JWT auth path: even if
valid_token.end_user_max_budgetwas never populated fromend_user_params, the budget is read directly from the fetchedend_user_object - Key auth path: no change in behavior since
valid_token.end_user_max_budgetis already populated viaupdate_valid_token_with_end_user_params - No
Nonepointer risk from a missinglitellm_budget_table
Tip: You can customize Greptile's behavior for this repo with .greptile/rules and .greptile/config.
Relevant issues
Fixes #14004
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
tests/test_litellm/directory, Adding at least 1 test is a hard requirement - see detailsmake test-unit@greptileaiand received a Confidence Score of at least 4/5 before requesting a maintainer reviewDelays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Screenshots / Proof of Fix
Type
🐛 Bug Fix
✅ Test
Changes
This PR fixes critical issues regarding zero-cost models correctly being white listed without triggering budget limits.
Router Pricing Fix: Updated
get_deployment_model_infoinsiderouter.pyto returncustom_model_infowhen standardlitelll_model_name_model_infois unavailable. This ensures explicitinput/output_cost_per_tokenvalues (including0.0) are preserved for custom deployments, preventing costs from defaulting toNone.Auth Budget Refactor: Moved
_check_end_user_budgetlogic fromauth_checks.pytomax_budget_limiter.py.user_api_key_auth.py, streamlining the code and fixing the bug where zero-cost models erroneously exceeded budget limits.Updated tests: reflecting new budget enforcement logic since
_check_end_user_budgetis not called whenget_end_user_objectis executed.