fix: Custom Model pricing / Allow zero‑cost models to stay unblocked when BudgetExceeded is true (#14004) by suleimanelkhoury · Pull Request #25645 · BerriAI/litellm

suleimanelkhoury · 2026-04-13T17:50:23Z

Relevant issues

Fixes #14004

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem
I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Type

🐛 Bug Fix
✅ Test

Changes

This PR fixes issues that caused zero-cost models to be blocked by budget limits instead of being allowed through.

Router Pricing Fix: Updated get_deployment_model_info inside router.py to return custom_model_info when the standard litellm_model_name_model_info is unavailable. This ensures explicit input_cost_per_token and output_cost_per_token values (including 0.0) are preserved for custom deployments, preventing costs from defaulting to None.
Auth Budget Refactor: Moved _check_end_user_budget logic from auth_checks.py to max_budget_limiter.py.
- Why: The previous implementation triggered checks prematurely, blocking all models (including zero-cost ones) incorrectly.
- Result: Unifies budget checks within user_api_key_auth.py, streamlining the code and fixing the bug where zero-cost models erroneously exceeded budget limits.
Updated tests: Reflect the new budget enforcement logic, since _check_end_user_budget is no longer called when get_end_user_object is executed.
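The router change described above can be pictured with a minimal sketch. The function name `resolve_model_info` and the plain-dict types are hypothetical and only illustrate the three-way selection (both present, only the litellm entry, only the custom entry); this is not the actual body of get_deployment_model_info.

```python
from typing import Any, Dict, Optional

ModelInfo = Dict[str, Any]


def resolve_model_info(
    custom_model_info: Optional[ModelInfo],
    litellm_model_name_model_info: Optional[ModelInfo],
) -> Optional[ModelInfo]:
    """Hypothetical stand-in for the selection logic, not the real router code."""
    if custom_model_info is not None and litellm_model_name_model_info is not None:
        # Both present: custom values override the base entry.
        return {**litellm_model_name_model_info, **custom_model_info}
    elif litellm_model_name_model_info is not None:
        # Only the standard entry is available.
        return litellm_model_name_model_info
    elif custom_model_info is not None:
        # Only custom pricing is available: explicit 0.0 costs are preserved.
        return custom_model_info
    return None


free = resolve_model_info(
    {"input_cost_per_token": 0.0, "output_cost_per_token": 0.0}, None
)
print(free)
```

Without the last `elif`, the custom-only case would fall through to `None`, which is the situation the PR description says led to costs defaulting to None.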

vercel · 2026-04-13T17:50:29Z

Project	Deployment	Actions	Updated (UTC)
litellm	Ready	Preview, Comment	Apr 15, 2026 9:50am

CLAassistant · 2026-04-13T17:50:42Z

All committers have signed the CLA.

codspeed-hq · 2026-04-13T17:52:26Z

Merging this PR will not alter performance

✅ 16 untouched benchmarks

Comparing suleimanelkhoury:zero-cost-fix (23a78f1) with main (e64d98f)

codecov · 2026-04-13T17:53:56Z

Codecov Report

❌ Patch coverage is 25.00000% with 6 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
litellm/proxy/hooks/max_budget_limiter.py	25.00%	6 Missing ⚠️

@@ -1,9 +1,11 @@
 from fastapi import HTTPException

+import litellm


greptile-apps · 2026-04-13T18:04:10Z

Greptile Summary

This PR moves end-user budget enforcement out of get_end_user_object and into the auth flow (key-auth "Check 5c" and custom-auth "4b"), both gated behind if not skip_budget_checks. A companion fix to Router.get_deployment_model_info ensures custom deployments with explicit zero costs are returned (previously the elif custom_model_info is not None branch was missing), so _is_model_cost_zero can correctly identify free models and skip all budget gates for them. Tests were updated to reflect that get_end_user_object no longer raises BudgetExceededError, and new tests cover the zero-cost bypass in the custom-auth path.

Confidence Score: 5/5

Safe to merge — the logic is correct, security is preserved (bypass only triggers on explicitly configured 0.0 costs), and prior review concerns are addressed.

No P0 or P1 issues found. The router fix correctly adds the missing elif custom_model_info branch. Budget enforcement is logically equivalent to before for non-zero-cost models (same comparison, same raise), just gated by skip_budget_checks so free models bypass it. The _is_model_cost_zero function is conservative (requires both costs explicitly set to 0.0 AND present in litellm.model_cost), preventing accidental bypasses. Test changes correctly reflect the intentional behavior change rather than masking regressions.

No files require special attention.
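To make the "conservative" property concrete, here is a small sketch of a zero-cost test that treats unknown pricing as non-free. The function name `is_model_cost_zero_sketch` and the dict-based input are assumptions for illustration; the real `_is_model_cost_zero` takes a model name and a router, and the additional lookup in `litellm.model_cost` mentioned above is omitted here.

```python
from typing import Any, Dict, Optional


def is_model_cost_zero_sketch(model_info: Optional[Dict[str, Any]]) -> bool:
    """Return True only when both token costs are explicitly set to zero."""
    if model_info is None:
        return False
    input_cost = model_info.get("input_cost_per_token")
    output_cost = model_info.get("output_cost_per_token")
    # A missing (None) cost means "unknown", which must not count as free.
    if input_cost is None or output_cost is None:
        return False
    return input_cost == 0 and output_cost == 0


print(is_model_cost_zero_sketch({"input_cost_per_token": 0.0, "output_cost_per_token": 0.0}))
print(is_model_cost_zero_sketch({"input_cost_per_token": 0.0}))
```

The design point is that a missing value and a zero value are kept distinct, so a deployment with no pricing configured does not accidentally bypass budget enforcement.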

Important Files Changed

Filename	Overview
litellm/router.py	Added missing `elif custom_model_info is not None` branch so custom deployments with explicit zero-cost pricing are returned instead of falling through to None
litellm/proxy/auth/auth_checks.py	Removed `_check_end_user_budget` and its two call sites from `get_end_user_object`; budget enforcement now lives in the auth flow where the zero-cost skip flag is available
litellm/proxy/auth/user_api_key_auth.py	Adds "Check 5c" (key-auth) and "4b" (custom-auth) end-user max-budget checks inside `if not skip_budget_checks:`; also moves `_is_model_cost_zero` import to module level and propagates `end_user_max_budget` through `update_valid_token_with_end_user_params`
litellm/proxy/hooks/max_budget_limiter.py	New `is_end_user_within_budget` method implements the moved budget check, correctly skipping info routes and raising `BudgetExceededError` on overspend
tests/proxy_unit_tests/test_default_end_user_budget_simple.py	Restructured `test_budget_enforcement_blocks_over_budget_users` to test object fetch and limiter invocation separately; budget enforcement is now unit-tested against the limiter rather than through a full auth-flow integration path
tests/test_litellm/proxy/auth/test_custom_auth_end_user_budget.py	Adds `test_custom_auth_zero_cost_model_skips_budget_checks` verifying all three budget checks are skipped and `common_checks` receives `skip_budget_checks=True` for zero-cost models in custom-auth path
tests/test_litellm/test_router.py	Adds Test Case 6 verifying the new `elif custom_model_info` branch returns correct zero-cost pricing when no model-name info exists
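The behavior attributed to `is_end_user_within_budget` in the table (skip info routes, raise on overspend) could look roughly like the sketch below. The exception class, the route set, the boolean return value, and the strict `>` comparison are all assumptions made for illustration; the real method lives in max_budget_limiter.py and resolves info routes through RouteChecks.

```python
import asyncio


class BudgetExceededSketchError(Exception):
    """Hypothetical stand-in for the proxy's BudgetExceededError."""


# Hypothetical set of read-only routes; the real check goes through RouteChecks.
INFO_ROUTES_SKETCH = frozenset({"/key/info", "/model/info"})


async def is_end_user_within_budget_sketch(
    end_user_id: str,
    end_user_max_budget: float,
    end_user_spend: float,
    route: str,
) -> bool:
    # Info routes do not consume budget, so they are never blocked.
    if route in INFO_ROUTES_SKETCH:
        return True
    # Assumption: the limit is exceeded only when spend is strictly greater.
    if end_user_spend > end_user_max_budget:
        raise BudgetExceededSketchError(
            f"End user {end_user_id} spent {end_user_spend}, "
            f"budget is {end_user_max_budget}"
        )
    return True


ok = asyncio.run(
    is_end_user_within_budget_sketch(
        end_user_id="user-1",
        end_user_max_budget=10.0,
        end_user_spend=2.5,
        route="/chat/completions",
    )
)
print(ok)
```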

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming request] --> B[_user_api_key_auth_builder]
    B --> C[get_end_user_object\nreturns object only\nno budget raise]
    C --> D[_is_model_cost_zero\ncheck router pricing]
    D -- "zero cost\nexplicit 0.0 in both costs" --> E[skip_budget_checks = True]
    D -- "non-zero / unknown cost" --> F[skip_budget_checks = False]
    E --> G[skip Check 5c\nskip common_checks budget step]
    F --> H[Check 5c\nis_end_user_within_budget\nspend vs max_budget]
    H -- "within budget" --> I[common_checks\nalso checks end-user budget]
    H -- "over budget" --> J[BudgetExceededError]
    I --> K[Request allowed]
    G --> K

    subgraph router_fix["Router.get_deployment_model_info fix"]
        L[litellm_model_name_model_info?]
        L -- "both present" --> M[merge: custom overrides base]
        L -- "only litellm" --> N[use litellm info]
        L -- "only custom NEW" --> O[use custom_model_info\nprice 0.0 preserved]
    end
    D -.->|uses pricing from| router_fix

Reviews (7): Last reviewed commit: "fix(tests): add test for zero-cost model..."

greptile-apps · 2026-04-13T18:04:15Z

+    # 4b. Check end-user max_budget
+    end_user_mb = valid_token.end_user_max_budget
+    if (
+        end_user_mb is not None
+        and end_user_object is not None
+        and valid_token.end_user_id is not None
+    ):
+        await proxy_logging_obj.max_budget_limiter.is_end_user_within_budget(
+            end_user_id=valid_token.end_user_id,
+            end_user_max_budget=end_user_mb,
+            end_user_spend=end_user_object.spend,
+            route=route,
+        )


Zero-cost bypass missing in custom-auth path

In the standard key-auth and JWT auth flows, "Check 5c" and the common_checks end-user budget check both sit inside if not skip_budget_checks:, so zero-cost models skip them correctly. However, _run_post_custom_auth_checks never evaluates _is_model_cost_zero or sets skip_budget_checks, so "4b" here will block over-budget end-users even when the requested model is free. The same gap exists for the existing model_max_budget check at line 1923, but this new check makes the asymmetry more visible. Consider computing skip_budget_checks at the start of _run_post_custom_auth_checks (same logic as lines 1283-1295) and guarding both "4b" and the model-budget check with it.

@greptile Is the problem now fixed

Yes, the issue is now fixed. The PR adds exactly the guard I described:

    # 2a. Check if model has zero cost - if so, skip all budget checks
    model = get_model_from_request(request_data, route)
    skip_budget_checks = False
    if model is not None and llm_router is not None:
        skip_budget_checks = _is_model_cost_zero(model=model, llm_router=llm_router)

All three budget checks in _run_post_custom_auth_checks — key model budget (check 3), end-user model budget (check 4), and end-user max budget (check 4b) — are now wrapped under if not skip_budget_checks:. The flag is also forwarded into common_checks at the bottom of the function, so zero-cost models correctly bypass all budget enforcement in the custom-auth path.
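As a rough illustration of the gating described here, the sketch below computes a skip flag once and wraps the budget check under it. The `zero_cost_models` set is a hypothetical stand-in for the router pricing lookup, and `post_auth_checks_sketch` collapses the separate budget checks into a single `over_budget` flag; neither is part of the PR.

```python
from typing import Optional, Set


def compute_skip_budget_checks(
    model: Optional[str], zero_cost_models: Optional[Set[str]]
) -> bool:
    # Mirrors the shape of the guard: without a model or a pricing source,
    # the flag stays False and budget checks run as usual.
    if model is None or zero_cost_models is None:
        return False
    return model in zero_cost_models


def post_auth_checks_sketch(
    model: Optional[str],
    zero_cost_models: Optional[Set[str]],
    over_budget: bool,
) -> str:
    skip_budget_checks = compute_skip_budget_checks(model, zero_cost_models)
    if not skip_budget_checks:
        # Stand-in for the budget checks that sit under the flag.
        if over_budget:
            raise RuntimeError("budget exceeded")
    return "allowed"


print(post_auth_checks_sketch("free-model", {"free-model"}, over_budget=True))
```

An over-budget caller is allowed through only for the model flagged as free; any other model, or an unknown model, still hits the budget check.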

greptile-apps · 2026-04-13T18:04:16Z

+                    # Check 5c. End user max budget
+                    end_user_mb = valid_token.end_user_max_budget
+                    if (
+                        end_user_mb is not None
+                        and _end_user_object is not None
+                        and valid_token.end_user_id is not None
+                    ):
+                        await proxy_logging_obj.max_budget_limiter.is_end_user_within_budget(
+                            end_user_id=valid_token.end_user_id,
+                            end_user_max_budget=end_user_mb,
+                            end_user_spend=_end_user_object.spend,
+                            route=route,
+                        )


Redundant check with common_checks

common_checks (called a few lines later at line 1519) already performs the identical end-user budget comparison using end_user_object.litellm_budget_table.max_budget, and it is also inside if not skip_budget_checks:. "Check 5c" will always agree with the common_checks result, so this block fires twice on every non-zero-cost request. It isn't harmful, but it adds noise and a second potential raise path. Consider removing "Check 5c" and letting common_checks remain the single source of truth for this check in the key-auth flow.

@greptile There was already duplication in the Team Budget Checks and the Global Proxy Budget Check between common_checks and the logic inside user_api_key_auth.py. This is a job for another refactor.

Fair point — if Team and Global Proxy budget checks already follow the same "duplicated in both places" pattern, then this PR is consistent with the existing style rather than introducing a new problem. The broader deduplication is a separate concern.

suleimanelkhoury · 2026-04-13T18:05:18Z

@greptileai

+        Raises:
+            BudgetExceededError: If the end-user has exceeded overall budget.
+        """
+        from litellm.proxy.auth.route_checks import RouteChecks


…l_info is missing Return custom_model_info from get_deployment_model_info when litellm_model_name_model_info is unavailable. This preserves explicit input_cost_per_token and output_cost_per_token values (including 0.0) for custom deployments and prevents model_group_info from defaulting to None costs.

…tion Remove the _check_end_user_budget functionality from auth_checks.py, which was triggered each time get_end_user_object was called and caused all models to be blocked (even the zero-cost ones). Move this functionality to max_budget_limiter.py, mimicking how max_model_budget_limiter.py handles an exceeded model budget. This unifies all budget checks inside user_api_key_auth.py and streamlines the code. Combined with the previous commit, which resolves None model cost values, it fixes zero-cost models triggering budget exceeded.

…n get_end_user_object Since the budget enforcement is relocated from get_end_user_object inside auth_checks.py to is_end_user_within_budget inside max_budget_limiter.py, both test cases test_get_end_user_object and test_budget_enforcement_blocks_over_budget_users are edited accordingly

krrish-berri-2 · 2026-04-15T02:11:34Z

isn't this a regression for jwt auth?

@greptile can you confirm this?

Based on my analysis of the diff, yes — this is a regression for JWT auth.

Here's why:

Before: _check_end_user_budget was called directly inside get_end_user_object, which is invoked from both the key auth path AND the JWT auth path (handle_jwt.py also calls get_end_user_object). Budget enforcement was guaranteed regardless of auth method.

After: The budget check was removed from get_end_user_object and moved to two places in user_api_key_auth.py:

Check 5c in _user_api_key_auth_builder (key auth path)

Check 4b in _run_post_custom_auth_checks (custom auth)

The new Check 5c only fires if valid_token.end_user_max_budget is not None. This field is populated via update_valid_token_with_end_user_params. For the JWT auth path, the end_user_params dict is constructed from the JWT builder result — and there's no evidence in this PR that end_user_max_budget is being populated in that flow. If it's None on the token (the default), the check is silently skipped.

The net result: a JWT-authenticated end user who has exceeded their max_budget will no longer be blocked, because:

get_end_user_object no longer raises BudgetExceededError

Check 5c won't trigger if valid_token.end_user_max_budget was never set from the JWT path

Can you confirm whether end_user_max_budget is reliably populated on valid_token during JWT auth? If not, you'll need to either ensure the JWT builder populates it, or add a fallback in Check 5c that reads _end_user_object.litellm_budget_table.max_budget directly when valid_token.end_user_max_budget is None.

@greptile does adding a simple fallback like:

    if end_user_mb is None and end_user_object is not None:
        end_user_mb = getattr(end_user_object.litellm_budget_table, 'max_budget', None)

in 5c and 4b solve the issue?

Yes, the fallback is the right approach. Your proposed code also handles a None litellm_budget_table: getattr(end_user_object.litellm_budget_table, 'max_budget', None) returns the default when called on None rather than raising. An explicit None check makes that case easier to see, though.

The explicit version:

    end_user_mb = valid_token.end_user_max_budget
    if end_user_mb is None and _end_user_object is not None:
        budget_table = _end_user_object.litellm_budget_table
        end_user_mb = budget_table.max_budget if budget_table is not None else None

Apply this before the existing if end_user_mb is not None check in both 5c and 4b (using end_user_object instead of _end_user_object in 4b, since that's the variable name there).

This ensures:

JWT auth path: even if valid_token.end_user_max_budget was never populated from end_user_params, the budget is read directly from the fetched end_user_object

Key auth path: no change in behavior since valid_token.end_user_max_budget is already populated via update_valid_token_with_end_user_params

No None pointer risk from a missing litellm_budget_table
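The fallback order discussed in this thread can be sketched in isolation. The `BudgetTable` and `EndUser` dataclasses and the helper `resolve_end_user_max_budget` are hypothetical simplifications of the proxy's objects, used only to show the resolution order: token value first, then the end-user's budget table, otherwise None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BudgetTable:
    max_budget: Optional[float] = None


@dataclass
class EndUser:
    spend: float = 0.0
    litellm_budget_table: Optional[BudgetTable] = None


def resolve_end_user_max_budget(
    token_max_budget: Optional[float], end_user: Optional[EndUser]
) -> Optional[float]:
    # Prefer the value already carried on the token (key-auth path).
    if token_max_budget is not None:
        return token_max_budget
    # Fall back to the fetched end-user object (e.g. when the token field is unset).
    if end_user is None or end_user.litellm_budget_table is None:
        return None
    return end_user.litellm_budget_table.max_budget


user = EndUser(spend=3.0, litellm_budget_table=BudgetTable(max_budget=5.0))
print(resolve_end_user_max_budget(None, user))
```

When the helper returns None, no budget is known and the subsequent `is not None` guard in 5c and 4b skips the check.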

vercel bot deployed to Preview April 13, 2026 17:51 View deployment

github-advanced-security AI found potential problems Apr 13, 2026

Comment thread litellm/proxy/hooks/max_budget_limiter.py Fixed

suleimanelkhoury mentioned this pull request Apr 13, 2026

[Bug]: API blocks free (no cost) models when budget is exceeded #14004

Open

greptile-apps bot reviewed Apr 13, 2026

vercel bot deployed to Preview April 13, 2026 18:07 View deployment

suleimanelkhoury changed the title ~~Zero cost fix~~ fix: Allow zero‑cost models to stay unblocked when BudgetExceeded is true Apr 14, 2026

suleimanelkhoury force-pushed the zero-cost-fix branch from 2829293 to 99fbc7a Compare April 14, 2026 09:47

vercel bot deployed to Preview April 14, 2026 09:48 View deployment

github-advanced-security AI found potential problems Apr 14, 2026

suleimanelkhoury added 3 commits April 14, 2026 13:14

suleimanelkhoury force-pushed the zero-cost-fix branch from 99fbc7a to ea47665 Compare April 14, 2026 11:19

vercel bot deployed to Preview April 14, 2026 11:20 View deployment

suleimanelkhoury force-pushed the zero-cost-fix branch from ea47665 to 23a78f1 Compare April 14, 2026 11:32

vercel bot deployed to Preview April 14, 2026 11:33 View deployment

suleimanelkhoury changed the title ~~fix: Allow zero‑cost models to stay unblocked when BudgetExceeded is true~~ fix: Custom Model pricing / Allow zero‑cost models to stay unblocked when BudgetExceeded is true Apr 14, 2026

suleimanelkhoury changed the title ~~fix: Custom Model pricing / Allow zero‑cost models to stay unblocked when BudgetExceeded is true~~ fix: Custom Model pricing / Allow zero‑cost models to stay unblocked when BudgetExceeded is true (#14004) Apr 14, 2026

krrish-berri-2 changed the base branch from main to litellm_oss_staging_04_14_2026_p1 April 15, 2026 02:10

krrish-berri-2 reviewed Apr 15, 2026

fix(auth): add fallback for end-user max budget when not set on token

9b27d10

vercel bot deployed to Preview April 15, 2026 08:56 View deployment

suleimanelkhoury added 2 commits April 15, 2026 11:48

fix(auth): optimize budget checks by skipping for zero-cost models fo…

fe79bbd

…r custom auth workflow

fix(tests): add test for zero-cost model to skip budget checks in cus…

3082325

…tom auth

vercel bot deployed to Preview April 15, 2026 09:50 View deployment

suleimanelkhoury requested a review from krrish-berri-2 April 17, 2026 10:57
