Skip to content

fix: Custom Model pricing / Allow zero‑cost models to stay unblocked when BudgetExceeded is true (#14004)#25645

Open
suleimanelkhoury wants to merge 6 commits intoBerriAI:litellm_oss_staging_04_14_2026_p1from
suleimanelkhoury:zero-cost-fix
Open

fix: Custom Model pricing / Allow zero‑cost models to stay unblocked when BudgetExceeded is true (#14004)#25645
suleimanelkhoury wants to merge 6 commits intoBerriAI:litellm_oss_staging_04_14_2026_p1from
suleimanelkhoury:zero-cost-fix

Conversation

@suleimanelkhoury
Copy link
Copy Markdown

@suleimanelkhoury suleimanelkhoury commented Apr 13, 2026

Relevant issues

Fixes #14004

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Screenshots / Proof of Fix

Type

🐛 Bug Fix
✅ Test

Changes

This PR fixes critical issues regarding zero-cost models correctly being white listed without triggering budget limits.

  • Router Pricing Fix: Updated get_deployment_model_info inside router.py to return custom_model_info when standard litelll_model_name_model_info is unavailable. This ensures explicit input/output_cost_per_token values (including 0.0) are preserved for custom deployments, preventing costs from defaulting to None.

  • Auth Budget Refactor: Moved _check_end_user_budget logic from auth_checks.py to max_budget_limiter.py.

    • Why: The previous implementation triggered checks prematurely, blocking all models (including zero-cost ones) incorrectly.
    • Result: Unifies budget checks within user_api_key_auth.py, streamlining the code and fixing the bug where zero-cost models erroneously exceeded budget limits.
  • Updated tests: reflecting new budget enforcement logic since _check_end_user_budget is not called when get_end_user_object is executed.

@vercel
Copy link
Copy Markdown

vercel bot commented Apr 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Ready Ready Preview, Comment Apr 15, 2026 9:50am

Request Review

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 13, 2026

CLA assistant check
All committers have signed the CLA.

@codspeed-hq
Copy link
Copy Markdown
Contributor

codspeed-hq bot commented Apr 13, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing suleimanelkhoury:zero-cost-fix (23a78f1) with main (e64d98f)

Open in CodSpeed

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 25.00000% with 6 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
litellm/proxy/hooks/max_budget_limiter.py 25.00% 6 Missing ⚠️

📢 Thoughts on this report? Let us know!

Comment thread litellm/proxy/hooks/max_budget_limiter.py Fixed
@@ -1,9 +1,11 @@
from fastapi import HTTPException

import litellm
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 13, 2026

Greptile Summary

This PR moves end-user budget enforcement out of get_end_user_object and into the auth flow (key-auth "Check 5c" and custom-auth "4b"), both gated behind if not skip_budget_checks. A companion fix to Router.get_deployment_model_info ensures custom deployments with explicit zero costs are returned (previously the elif custom_model_info is not None branch was missing), so _is_model_cost_zero can correctly identify free models and skip all budget gates for them. Tests were updated to reflect that get_end_user_object no longer raises BudgetExceededError, and new tests cover the zero-cost bypass in the custom-auth path.

Confidence Score: 5/5

Safe to merge — the logic is correct, security is preserved (bypass only triggers on explicitly configured 0.0 costs), and prior review concerns are addressed.

No P0 or P1 issues found. The router fix correctly adds the missing elif custom_model_info branch. Budget enforcement is logically equivalent to before for non-zero-cost models (same comparison, same raise), just gated by skip_budget_checks so free models bypass it. The _is_model_cost_zero function is conservative (requires both costs explicitly set to 0.0 AND present in litellm.model_cost), preventing accidental bypasses. Test changes correctly reflect the intentional behavior change rather than masking regressions.

No files require special attention.

Important Files Changed

Filename Overview
litellm/router.py Added missing elif custom_model_info is not None branch so custom deployments with explicit zero-cost pricing are returned instead of falling through to None
litellm/proxy/auth/auth_checks.py Removed _check_end_user_budget and its two call sites from get_end_user_object; budget enforcement now lives in the auth flow where the zero-cost skip flag is available
litellm/proxy/auth/user_api_key_auth.py Adds "Check 5c" (key-auth) and "4b" (custom-auth) end-user max-budget checks inside if not skip_budget_checks:; also moves _is_model_cost_zero import to module level and propagates end_user_max_budget through update_valid_token_with_end_user_params
litellm/proxy/hooks/max_budget_limiter.py New is_end_user_within_budget method implements the moved budget check, correctly skipping info routes and raising BudgetExceededError on overspend
tests/proxy_unit_tests/test_default_end_user_budget_simple.py Restructured test_budget_enforcement_blocks_over_budget_users to test object fetch and limiter invocation separately; budget enforcement is now unit-tested against the limiter rather than through a full auth-flow integration path
tests/test_litellm/proxy/auth/test_custom_auth_end_user_budget.py Adds test_custom_auth_zero_cost_model_skips_budget_checks verifying all three budget checks are skipped and common_checks receives skip_budget_checks=True for zero-cost models in custom-auth path
tests/test_litellm/test_router.py Adds Test Case 6 verifying the new elif custom_model_info branch returns correct zero-cost pricing when no model-name info exists

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming request] --> B[_user_api_key_auth_builder]
    B --> C[get_end_user_object\nreturns object only\nno budget raise]
    C --> D[_is_model_cost_zero\ncheck router pricing]
    D -- "zero cost\nexplicit 0.0 in both costs" --> E[skip_budget_checks = True]
    D -- "non-zero / unknown cost" --> F[skip_budget_checks = False]
    E --> G[skip Check 5c\nskip common_checks budget step]
    F --> H[Check 5c\nis_end_user_within_budget\nspend vs max_budget]
    H -- "within budget" --> I[common_checks\nalso checks end-user budget]
    H -- "over budget" --> J[BudgetExceededError]
    I --> K[Request allowed]
    G --> K

    subgraph router_fix["Router.get_deployment_model_info fix"]
        L[litellm_model_name_model_info?]
        L -- "both present" --> M[merge: custom overrides base]
        L -- "only litellm" --> N[use litellm info]
        L -- "only custom NEW" --> O[use custom_model_info\nprice 0.0 preserved]
    end
    D -.->|uses pricing from| router_fix
Loading

Reviews (7): Last reviewed commit: "fix(tests): add test for zero-cost model..." | Re-trigger Greptile

Comment thread litellm/proxy/auth/user_api_key_auth.py Outdated
Comment thread litellm/proxy/auth/user_api_key_auth.py Outdated
Comment on lines +1952 to +1964
# 4b. Check end-user max_budget
end_user_mb = valid_token.end_user_max_budget
if (
end_user_mb is not None
and end_user_object is not None
and valid_token.end_user_id is not None
):
await proxy_logging_obj.max_budget_limiter.is_end_user_within_budget(
end_user_id=valid_token.end_user_id,
end_user_max_budget=end_user_mb,
end_user_spend=end_user_object.spend,
route=route,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Zero-cost bypass missing in custom-auth path

In the standard key-auth and JWT auth flows, "Check 5c" and the common_checks end-user budget check both sit inside if not skip_budget_checks:, so zero-cost models skip them correctly. However, _run_post_custom_auth_checks never evaluates _is_model_cost_zero or sets skip_budget_checks, so "4b" here will block over-budget end-users even when the requested model is free. The same gap exists for the existing model_max_budget check at line 1923, but this new check makes the asymmetry more visible. Consider computing skip_budget_checks at the start of _run_post_custom_auth_checks (same logic as lines 1283-1295) and guarding both "4b" and the model-budget check with it.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@greptile Is the problem now fixed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the issue is now fixed. The PR adds exactly the guard I described:

# 2a. Check if model has zero cost - if so, skip all budget checks
model = get_model_from_request(request_data, route)
skip_budget_checks = False
if model is not None and llm_router is not None:
    skip_budget_checks = _is_model_cost_zero(model=model, llm_router=llm_router)

All three budget checks in _run_post_custom_auth_checks — key model budget (check 3), end-user model budget (check 4), and end-user max budget (check 4b) — are now wrapped under if not skip_budget_checks:. The flag is also forwarded into common_checks at the bottom of the function, so zero-cost models correctly bypass all budget enforcement in the custom-auth path.

Comment on lines +1431 to +1443
# Check 5c. End user max budget
end_user_mb = valid_token.end_user_max_budget
if (
end_user_mb is not None
and _end_user_object is not None
and valid_token.end_user_id is not None
):
await proxy_logging_obj.max_budget_limiter.is_end_user_within_budget(
end_user_id=valid_token.end_user_id,
end_user_max_budget=end_user_mb,
end_user_spend=_end_user_object.spend,
route=route,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Redundant check with common_checks

common_checks (called a few lines later at line 1519) already performs the identical end-user budget comparison using end_user_object.litellm_budget_table.max_budget, and it is also inside if not skip_budget_checks:. "Check 5c" will always agree with the common_checks result, so this block fires twice on every non-zero-cost request. It isn't harmful, but it adds noise and a second potential raise path. Consider removing "Check 5c" and letting common_checks remain the single source of truth for this check in the key-auth flow.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@greptile There were already duplication in Team Budget Checks and Global Proxy Budget Check between common_checks and the logic inside user_api_key_auth.py. This is a job for another refactor.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point — if Team and Global Proxy budget checks already follow the same "duplicated in both places" pattern, then this PR is consistent with the existing style rather than introducing a new problem. The broader deduplication is a separate concern.

Tip: You can give me more context with .greptile/rules so I get this right next time.

@suleimanelkhoury
Copy link
Copy Markdown
Author

@greptileai

@suleimanelkhoury suleimanelkhoury changed the title Zero cost fix fix: Allow zero‑cost models to stay unblocked when BudgetExceeded is true Apr 14, 2026
Raises:
BudgetExceededError: If the end-user has exceeded overall budget.
"""
from litellm.proxy.auth.route_checks import RouteChecks
…l_info is missing

Return custom_model_info from get_deployment_model_info when
litellm_model_name_model_info is unavailable. This preserves explicit
input_cost_per_token and output_cost_per_token values (including 0.0)
for custom deployments and prevents model_group_info from defaulting to
None costs.
…tion

Remove _check_end_user_budget functionality from auth_checks.py, which is being triggered each time get_end_user_object is called, causing all the models to be blocked (even the Zero Cost Ones), and move this functionality to max_budget_limiter.py, mimicing the max_model_budget_limiter.py functionality with handling exceeded model budget. This unifies all budget checks inside user_api_key_auth.py, make the code more streamlined, and with the combination with the previous commit solving None model cost values, it fixes the zero-cost models triggering budget exceeded.
…n get_end_user_object

Since the budget enforcement is relocated from get_end_user_object inside auth_checks.py to  is_end_user_within_budget inside max_budget_limiter.py, both test cases test_get_end_user_object and test_budget_enforcement_blocks_over_budget_users are edited accordingly
@suleimanelkhoury suleimanelkhoury changed the title fix: Allow zero‑cost models to stay unblocked when BudgetExceeded is true fix: Custom Model pricing / Allow zero‑cost models to stay unblocked when BudgetExceeded is true Apr 14, 2026
@suleimanelkhoury suleimanelkhoury changed the title fix: Custom Model pricing / Allow zero‑cost models to stay unblocked when BudgetExceeded is true fix: Custom Model pricing / Allow zero‑cost models to stay unblocked when BudgetExceeded is true (#14004) Apr 14, 2026
@krrish-berri-2 krrish-berri-2 changed the base branch from main to litellm_oss_staging_04_14_2026_p1 April 15, 2026 02:10
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this a regression for jwt auth?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@greptile can you confirm this?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my analysis of the diff, yes — this is a regression for JWT auth.

Here's why:

Before: _check_end_user_budget was called directly inside get_end_user_object, which is invoked from both the key auth path AND the JWT auth path (handle_jwt.py also calls get_end_user_object). Budget enforcement was guaranteed regardless of auth method.

After: The budget check was removed from get_end_user_object and moved to two places in user_api_key_auth.py:

  1. Check 5c in _user_api_key_auth_builder (key auth path)
  2. Check 4b in _run_post_custom_auth_checks (custom auth)

The new Check 5c only fires if valid_token.end_user_max_budget is not None. This field is populated via update_valid_token_with_end_user_params. For the JWT auth path, the end_user_params dict is constructed from the JWT builder result — and there's no evidence in this PR that end_user_max_budget is being populated in that flow. If it's None on the token (the default), the check is silently skipped.

The net result: a JWT-authenticated end user who has exceeded their max_budget will no longer be blocked, because:

  1. get_end_user_object no longer raises BudgetExceededError
  2. Check 5c won't trigger if valid_token.end_user_max_budget was never set from the JWT path

Can you confirm whether end_user_max_budget is reliably populated on valid_token during JWT auth? If not, you'll need to either ensure the JWT builder populates it, or add a fallback in Check 5c that reads _end_user_object.litellm_budget_table.max_budget directly when valid_token.end_user_max_budget is None.

Tip: You can customize Greptile's behavior for this repo with .greptile/rules and .greptile/config.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@greptile does adding a simple fallback like:
if end_user_mb is None and end_user_object is not None:
end_user_mb = getattr(end_user_object.litellm_budget_table, 'max_budget', None)
in 5c and 4b solves the issue?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the fallback is the right approach, but your proposed code has a subtle bug: getattr(end_user_object.litellm_budget_table, 'max_budget', None) will raise AttributeError if litellm_budget_table itself is None — the getattr default only catches missing attributes, not None.attr access.

The safe version:

end_user_mb = valid_token.end_user_max_budget
if end_user_mb is None and _end_user_object is not None:
    budget_table = _end_user_object.litellm_budget_table
    end_user_mb = budget_table.max_budget if budget_table is not None else None

Apply this before the existing if end_user_mb is not None check in both 5c and 4b (using end_user_object instead of _end_user_object in 4b, since that's the variable name there).

This ensures:

  1. JWT auth path: even if valid_token.end_user_max_budget was never populated from end_user_params, the budget is read directly from the fetched end_user_object
  2. Key auth path: no change in behavior since valid_token.end_user_max_budget is already populated via update_valid_token_with_end_user_params
  3. No None pointer risk from a missing litellm_budget_table

Tip: You can customize Greptile's behavior for this repo with .greptile/rules and .greptile/config.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: API blocks free (no cost) models when budget is exceeded

4 participants