fix(team-routing): preserve sibling deployment candidates for team public models #24688
Conversation
Use a deterministic internal model_name for team-scoped deployments so sibling deployments with the same public model share a routing group. This makes team alias writes idempotent and preserves multi-deployment failover/load balancing behavior. Made-with: Cursor
Remove team model_alias rewrites and resolve team deployments by team_public_model_name with team_id so sibling deployments stay in the routing candidate pool, with explicit logs showing candidate selection before load balancing. Made-with: Cursor
Remove temporary fire-emoji router logs used for local verification while keeping team sibling deployment routing behavior unchanged. Made-with: Cursor
- Add None guard for original_model_name in _add_team_model_to_db
- Remove stale old public name when renaming team model
- Add comment clarifying team deployment early-return priority
Made-with: Cursor
- Update map_team_model test to expect public name return
- Only remove old public name if no sibling deployments use it
Made-with: Cursor
- Guard against llm_router=None to prevent silent deletion
- Add O(1) team_model index to avoid O(n) scan on every team request
Made-with: Cursor
Guard should_include_deployment fallback to only return deployments matching the requested team_id, preventing public-name collisions from leaking deployments across teams Made-with: Cursor
- Add clarifying comments to test assertions
- Query prisma DB instead of in-memory router to avoid stale state
- Prevents incorrect deletion of old public name when siblings exist
Made-with: Cursor
- Guard against None model_info in sibling deployment check
- Extract _update_team_model_index helper to eliminate duplication
Made-with: Cursor
…routing
- Skip model_aliases rewrite if model resolves to team deployments
- Add test coverage for sibling-preservation branch
- Update MockPrismaClient to support sibling deployment scenarios
Made-with: Cursor
- Use O(1) team index lookup instead of map_team_model in alias guard
- Fix MockPrismaClient to validate where clause filters
- Add comment explaining DB query trade-off for team deployments
Made-with: Cursor
- Check alias target pattern to detect stale team aliases
- Fix PrismaClient type annotation to Optional
- Eliminate in-place mutation in index update logic
Made-with: Cursor
- Add deduplication guard in _update_team_model_index to prevent duplicate indices
- Add wildcard comment in map_team_model for clarity
- Add monkeypatch to test_team_alias_stale_bypass_disabled_by_default for determinism
- Extract _get_team_deployments helper to centralize DB access pattern
- Add clarifying comments for team_public_model_name assignment ordering
Made-with: Cursor
- Cache LITELLM_ENABLE_TEAM_STALE_ALIAS_BYPASS at module level to avoid hot-path secret lookups
- Add clarifying comments for should_include_deployment team isolation logic
- Add negative assertion for update_team.assert_not_called() in test
- Add docstring clarification for _get_team_deployments helper pattern
- Add explicit assertion message in test_get_model_list_alias_optimization
Made-with: Cursor
- Reorder team_public_model_name assignment to happen before model_name mutation for clarity
- Add comment explaining no-rename fast-exit case in _update_existing_team_model_assignment
- Add comment explaining final patch_data.model_name = None applies to all code paths
Made-with: Cursor
Reset _ENABLE_TEAM_STALE_ALIAS_BYPASS to None in both test functions to ensure test isolation and prevent ordering-dependent failures Made-with: Cursor
…lback2 feat(router): order-based fallback across deployment priority levels
Greptile Summary

This PR fixes two team routing bugs: (1) sibling team deployments sharing the same public model name were collapsed to a single routing candidate, because the team model_alias rewrite resolved the public name to a single deployment's internal name.

Confidence Score: 4/5

Safe to merge after addressing the deleted test coverage in test_constants.py. The core team routing fix is well-designed and comprehensively tested with new test classes. The health-check routing and order-based fallback features are clean additions. The score is 4 rather than 5 because test_constants.py had its only test function deleted to avoid exposing that DEFAULT_HEALTH_CHECK_STALENESS_MULTIPLIER cannot be overridden, a direct test coverage regression that should be resolved before merging.

Flagged files: tests/test_litellm/test_constants.py (test deleted, zero tests remain); litellm/utils.py _get_order_filtered_deployments (target_order fallback may re-select failed-order deployments).
| Filename | Overview |
|---|---|
| litellm/router.py | Core routing logic: adds team_model_to_deployment_indices for O(1) team-scoped lookups, moves order filtering from _common_checks_available_deployment to async_get_healthy_deployments (with target_order support), adds order-based fallback in async_function_with_fallbacks_common_utils, and introduces health-check-driven routing filters. map_team_model now returns the public name unchanged instead of the internal UUID name. |
| litellm/proxy/management_endpoints/model_management_endpoints.py | Removes model_alias writes from team model create/update paths; _add_team_model_to_db and _setup_new_team_model_assignment no longer call update_team for alias registration. _update_existing_team_model_assignment now queries sibling deployments to decide whether to remove old public name from team models list. Adds _get_team_deployments helper with direct Prisma call. |
| litellm/proxy/litellm_pre_call_utils.py | update_model_if_team_alias_exists now detects stale team aliases (format model_name{team_id}_{uuid}) and optionally bypasses the alias rewrite when sibling team deployments exist; controlled by LITELLM_ENABLE_TEAM_STALE_ALIAS_BYPASS env var cached at module level. |
| litellm/router_utils/health_state_cache.py | New file: DeploymentHealthCache wraps DualCache for per-deployment health state, with staleness-based filtering and safety-net (never excludes all deployments). Clean, well-tested implementation. |
| litellm/proxy/proxy_server.py | Adds _write_health_state_to_router_cache to push background health check results into the router's DeploymentHealthCache after each health check cycle; wires enable_health_check_routing and health_check_staleness_threshold from general_settings into Router constructor params. |
| litellm/proxy/health_check.py | Adds model_id to health check endpoint results for deployment tracking; adds build_deployment_health_states helper to convert endpoint lists to deployment_id -> health state mapping. |
| litellm/utils.py | _get_order_filtered_deployments gains optional target_order parameter: when set, filters to that exact order level (returns all if no match); default behavior (min order) preserved. |
| tests/test_litellm/test_constants.py | test_all_numeric_constants_can_be_overridden removed without replacement; file now contains only an unused helper function _build_constant_env_var_map() — zero tests remain, masking that DEFAULT_HEALTH_CHECK_STALENESS_MULTIPLIER is not env-var-overridable. |
| tests/test_litellm/proxy/management_endpoints/test_model_management_endpoints.py | Adds TestTeamModelSiblingRouting class with tests for no-alias-write behavior, sibling router discovery, and global deployment accessibility; updates TestTeamModelUpdate to assert update_team is not called; adds sibling_deployments support to MockPrismaClient. |
| tests/test_litellm/router_utils/test_health_check_routing.py | New file: comprehensive tests for health-check routing filter (sync/async, disabled, safety-net, stale cache, empty cache) and build_deployment_health_states helper. |
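The `target_order` behavior summarized for litellm/utils.py above can be sketched as below. This is a reconstruction from the file summary only, not the actual implementation: the function name is simplified, the deployment dict shape is assumed, and treating deployments without an `order` as belonging to the minimum-order group is an assumption.

```python
def get_order_filtered_deployments(deployments, target_order=None):
    """Sketch: filter candidate deployments by priority 'order'.

    - target_order given: keep only that exact order level; if no
      deployment matches, fall back to returning all (per the summary).
    - target_order None: keep the minimum (highest-priority) order level,
      preserving the pre-existing default behavior.
    """
    orders = [d.get("litellm_params", {}).get("order") for d in deployments]
    known = [o for o in orders if o is not None]
    if not known:
        return deployments  # no ordering info: nothing to filter

    if target_order is not None:
        exact = [
            d for d in deployments
            if d.get("litellm_params", {}).get("order") == target_order
        ]
        return exact or deployments  # no match at that level: return all

    min_order = min(known)
    return [
        d for d in deployments
        if d.get("litellm_params", {}).get("order", min_order) <= min_order
    ]
```

The `exact or deployments` fallback is the behavior the review below flags: a retry targeting an order level with no surviving deployments silently widens back to the full pool.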
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming Request with team_id] --> B[route_llm_request]
    B --> C["map_team_model\npublic_name, team_id"]
    C --> D{Team deployments\nexist?}
    D -- "Yes: returns public_name unchanged" --> E[data.model = public_name]
    D -- No --> F[Standard model routing]
    E --> G["llm_router.acompletion\nmodel=public_name"]
    G --> H["_common_checks_available_deployment\nmodel=public_name, team_id"]
    H --> I{model in\nmodel_names?}
    I -- No --> J["team_model_to_deployment_indices\nO(1) lookup via team_id+public_name"]
    J --> K[All sibling deployments returned]
    I -- Yes --> L[Standard name lookup]
    K --> M[async_get_healthy_deployments]
    M --> N[Health-check filter]
    N --> O[Cooldown filter]
    O --> P[_get_order_filtered_deployments]
    P --> Q{Deployments available?}
    Q -- Yes --> R["Load balancing: select one"]
    Q -- No --> S[Exception raised]
    S --> T["async_function_with_fallbacks\norder-based fallback entries"]
    T --> U[Retry with _target_order=next_level]
    U --> G
```
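The O(1) lookup step (node J in the flowchart) can be sketched as a small index keyed by team and public name. The attribute name `team_model_to_deployment_indices` comes from the PR; the exact key shape and the wrapper class here are assumptions for illustration.

```python
from typing import Dict, List, Tuple


class TeamModelIndex:
    """Hypothetical sketch of the router's team deployment index."""

    def __init__(self) -> None:
        # (team_id, public_model_name) -> indices into the deployment list
        self._index: Dict[Tuple[str, str], List[int]] = {}

    def add(self, team_id: str, public_name: str, deployment_idx: int) -> None:
        indices = self._index.setdefault((team_id, public_name), [])
        if deployment_idx not in indices:  # dedup guard, as in the commit log
            indices.append(deployment_idx)

    def lookup(self, team_id: str, public_name: str) -> List[int]:
        # Single dict hit returns ALL sibling deployment indices, keeping
        # the full candidate pool for load balancing and failover.
        return self._index.get((team_id, public_name), [])
```

Keying on the pair rather than the public name alone is what keeps deployments from leaking across teams when two teams reuse the same public model name.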
Reviews (4): Last reviewed commit: "Fix codeql"
```python
if len(order_values) > 1 and not _skip_order_fallback:
    # Determine which order levels have already been tried
    current_target = kwargs.get("_target_order")
    skip_up_to = (
        current_target if current_target is not None else order_values[0]
    )
    # Build order-based fallback entries (skip already-tried levels)
    order_fallback_entries: List = [
        {"model": original_model_group, "_target_order": o}
        for o in order_values
        if o > skip_up_to
    ]
```
_target_order tracking may be unreliable after pop in the same kwargs dict
async_function_with_fallbacks reads current_target = kwargs.get("_target_order") to know which order level is currently being attempted. However, in async_get_healthy_deployments (and its sync equivalent) the value is removed via (request_kwargs or {}).pop("_target_order", None) before the LLM call is made. If request_kwargs is the same dict object as kwargs (which is typical in litellm's call chain), then _target_order is already gone from kwargs by the time the exception is caught and async_function_with_fallbacks runs.
The practical consequence is that current_target is always None, skip_up_to is always order_values[0] (min order), and the order-fallback list always includes every level above the minimum — meaning a failing order=2 fallback would re-enqueue an order=2 entry and potentially rely on max_fallbacks to terminate rather than cleanly progressing to order=3.
The existing tests pass because run_async_fallback iterates the combined fallback list sequentially (so order=3 is still reached), but the progression is less deterministic than intended. Consider preserving _target_order before it is consumed:

```python
# In async_function_with_fallbacks, capture before any pop can happen
current_target = kwargs.get("_target_order") or input_kwargs.get("_target_order")
```

Or pass it separately rather than through the shared kwargs dict.
Background health checks now feed deployment health state into the router candidate-filtering pipeline. Unhealthy deployments are excluded proactively instead of waiting for request failures to trigger cooldown. Gated by `enable_health_check_routing: true` in general_settings. Off by default — zero behavior change for existing users. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- model_id is now added after _clean_endpoint_data() so it survives health_check_details: False (MINIMAL_DISPLAY_PARAMS filtering)
- Health state write failures logged at warning instead of debug
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The isort hook picked up a stale rename from the working directory. Both router.py and proxy_server.py need litellm._uuid, not _litellm_uuid. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…iven-routing2 feat(router): health-check-driven routing
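The staleness-gated health filter with its never-exclude-all safety net, described in the file summary above, can be sketched as a self-contained stand-in. The real DeploymentHealthCache wraps DualCache, so the plain-dict storage and the method names here are assumptions.

```python
import time


class DeploymentHealthCacheSketch:
    """Hypothetical sketch of staleness-gated deployment health filtering."""

    def __init__(self, staleness_threshold: float = 60.0) -> None:
        self.staleness_threshold = staleness_threshold
        # deployment_id -> (healthy: bool, written_at: float)
        self._states = {}

    def set_state(self, deployment_id: str, healthy: bool) -> None:
        self._states[deployment_id] = (healthy, time.time())

    def filter_healthy(self, deployment_ids):
        now = time.time()
        healthy = []
        for dep_id in deployment_ids:
            state = self._states.get(dep_id)
            if state is None or now - state[1] > self.staleness_threshold:
                # Unknown or stale state: do not exclude the deployment
                healthy.append(dep_id)
            elif state[0]:
                healthy.append(dep_id)
        # Safety net: never exclude every deployment
        return healthy or list(deployment_ids)
```

Treating unknown and stale entries as includable, plus the final safety net, is what guarantees the filter can only narrow the pool, never empty it, so a lapsed background health check cannot black-hole all traffic.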
@greptile-apps the test has been added into another PR, please don't decrease the score
Understood. If the test coverage is handled in another PR, that's fine. Can you share the PR number for reference so I can verify the test addresses the scenarios mentioned in this fix?
Summary

- Resolve team deployments by `team_public_model_name` + `team_id` so sibling deployments remain in the candidate pool