
fix(team-routing): preserve sibling deployment candidates for team public models #24688

Merged
yuneng-berri merged 34 commits into BerriAI:main from Sameerlite:litellm_litellm_team-model-group-name-routing-fix
Mar 27, 2026

Conversation

@Sameerlite
Collaborator

Summary

  • Remove team model_alias writes from the team model create/update paths to avoid collapsing team public names to a single internal deployment.
  • Keep team model routing on public names, and add a team-aware deployment lookup by team_public_model_name + team_id so sibling deployments remain in the candidate pool.
  1. Fixes Scenario 1, where model aliases created via new team models were overwritten.
  2. Fixes Scenario 2 (first issue): routing not working.
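The team-aware lookup described above can be sketched as a small index keyed by (team_id, public name). This is a hypothetical illustration: the class and method names here are invented for clarity and do not match litellm's actual internals, which the PR calls team_model_to_deployment_indices.

```python
# Hypothetical sketch of an O(1) team-scoped deployment index.
# TeamModelIndex and its methods are illustrative names, not litellm's API.
from typing import Dict, List, Tuple


class TeamModelIndex:
    def __init__(self) -> None:
        # (team_id, team_public_model_name) -> indices into the deployment list
        self._index: Dict[Tuple[str, str], List[int]] = {}

    def add_deployment(self, team_id: str, public_name: str, deployment_idx: int) -> None:
        key = (team_id, public_name)
        indices = self._index.setdefault(key, [])
        if deployment_idx not in indices:  # deduplication guard, as in the PR
            indices.append(deployment_idx)

    def lookup(self, team_id: str, public_name: str) -> List[int]:
        # All sibling deployments sharing the public name stay in the pool
        return self._index.get((team_id, public_name), [])


idx = TeamModelIndex()
idx.add_deployment("team-1", "gpt-4o", 0)
idx.add_deployment("team-1", "gpt-4o", 0)  # duplicate write is a no-op
idx.add_deployment("team-1", "gpt-4o", 3)
idx.add_deployment("team-2", "gpt-4o", 5)
print(idx.lookup("team-1", "gpt-4o"))  # [0, 3]
```

Because the key includes team_id, two teams can expose the same public name without leaking deployments across teams, which is the isolation property the should_include_deployment guard enforces.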

Sameerlite and others added 27 commits March 27, 2026 20:11
Use a deterministic internal model_name for team-scoped deployments so sibling deployments with the same public model share a routing group. This makes team alias writes idempotent and preserves multi-deployment failover/load balancing behavior.

Made-with: Cursor
Remove team model_alias rewrites and resolve team deployments by team_public_model_name with team_id so sibling deployments stay in the routing candidate pool, with explicit logs showing candidate selection before load balancing.

Made-with: Cursor
Remove temporary fire-emoji router logs used for local verification while keeping team sibling deployment routing behavior unchanged.

Made-with: Cursor
- Add None guard for original_model_name in _add_team_model_to_db
- Remove stale old public name when renaming team model
- Add comment clarifying team deployment early-return priority

Made-with: Cursor
- Update map_team_model test to expect public name return
- Only remove old public name if no sibling deployments use it

Made-with: Cursor
- Guard against llm_router=None to prevent silent deletion
- Add O(1) team_model index to avoid O(n) scan on every team request

Made-with: Cursor
Guard should_include_deployment fallback to only return deployments
matching the requested team_id, preventing public-name collisions
from leaking deployments across teams

Made-with: Cursor
- Add clarifying comments to test assertions
- Query prisma DB instead of in-memory router to avoid stale state
- Prevents incorrect deletion of old public name when siblings exist

Made-with: Cursor
- Guard against None model_info in sibling deployment check
- Extract _update_team_model_index helper to eliminate duplication

Made-with: Cursor
…routing

- Skip model_aliases rewrite if model resolves to team deployments
- Add test coverage for sibling-preservation branch
- Update MockPrismaClient to support sibling deployment scenarios

Made-with: Cursor
- Use O(1) team index lookup instead of map_team_model in alias guard
- Fix MockPrismaClient to validate where clause filters
- Add comment explaining DB query trade-off for team deployments

Made-with: Cursor
- Check alias target pattern to detect stale team aliases
- Fix PrismaClient type annotation to Optional
- Eliminate in-place mutation in index update logic

Made-with: Cursor
- Add deduplication guard in _update_team_model_index to prevent duplicate indices
- Add wildcard comment in map_team_model for clarity
- Add monkeypatch to test_team_alias_stale_bypass_disabled_by_default for determinism
- Extract _get_team_deployments helper to centralize DB access pattern
- Add clarifying comments for team_public_model_name assignment ordering

Made-with: Cursor
- Cache LITELLM_ENABLE_TEAM_STALE_ALIAS_BYPASS at module level to avoid hot-path secret lookups
- Add clarifying comments for should_include_deployment team isolation logic
- Add negative assertion for update_team.assert_not_called() in test
- Add docstring clarification for _get_team_deployments helper pattern
- Add explicit assertion message in test_get_model_list_alias_optimization

Made-with: Cursor
- Reorder team_public_model_name assignment to happen before model_name mutation for clarity
- Add comment explaining no-rename fast-exit case in _update_existing_team_model_assignment
- Add comment explaining final patch_data.model_name = None applies to all code paths

Made-with: Cursor
Reset _ENABLE_TEAM_STALE_ALIAS_BYPASS to None in both test functions
to ensure test isolation and prevent ordering-dependent failures

Made-with: Cursor
…lback2

feat(router): order-based fallback across deployment priority levels
@vercel

vercel bot commented Mar 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Ready Ready Preview, Comment Mar 27, 2026 6:33pm


@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ Sameerlite
❌ yuneng-berri
You have signed the CLA already but the status is still pending? Let us recheck it.

@codspeed-hq
Contributor

codspeed-hq bot commented Mar 27, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing Sameerlite:litellm_litellm_team-model-group-name-routing-fix (c4159a2) with main (8c2c6a4)

Open in CodSpeed

@greptile-apps
Contributor

greptile-apps bot commented Mar 27, 2026

Greptile Summary

This PR fixes two team routing bugs: (1) sibling team deployments sharing the same public model name being collapsed to a single candidate because model_aliases was overwritten on each new deployment, and (2) team-scoped routing not working because map_team_model was returning an internal UUID name instead of the public name. The fix removes model_aliases writes from the team model create/update paths entirely, introduces a new team_model_to_deployment_indices index for O(1) team-scoped deployment lookups, and changes map_team_model to return the public name unchanged so all sibling deployments are found via team_id filtering. A backwards-compatible LITELLM_ENABLE_TEAM_STALE_ALIAS_BYPASS flag handles legacy deployments that still have stale aliases in the database.

Alongside the core fix, the PR adds two independent features: health-check-driven routing (enable_health_check_routing) that proactively excludes unhealthy deployments based on background health check results, and first-class order-based fallback that triggers automatically when a lower-order deployment fails, without requiring enable_pre_call_checks=True.

Key findings:

- tests/test_litellm/test_constants.py: The entire body of the only test in this file (test_all_numeric_constants_can_be_overridden) was removed and replaced with a plain helper function that pytest never executes. The PR adds DEFAULT_HEALTH_CHECK_STALENESS_MULTIPLIER = 2 as a bare literal that cannot be overridden via environment variables — exactly the case the deleted test was designed to catch. The file now has zero tests.
- litellm/utils.py _get_order_filtered_deployments: When called with a target_order that has no matching deployments in the already-cooldown-filtered list, the function returns all healthy deployments — which may include lower-order deployments that were the ones that originally failed.
- litellm/proxy/litellm_pre_call_utils.py: _ENABLE_TEAM_STALE_ALIAS_BYPASS is cached at module level on first call and cannot be refreshed at runtime without a process restart. This is intentional for performance but should be documented explicitly.
- The upsert_deployment change (also checking model_info equality, not just litellm_params) is a correct fix ensuring deployments with changed team_public_model_name are properly updated on the router.

Confidence Score: 4/5

Safe to merge after addressing the deleted test coverage in test_constants.py.

The core team routing fix is well-designed and comprehensively tested with new test classes. The health-check routing and order-based fallback features are clean additions. The score is 4 rather than 5 because test_constants.py had its only test function deleted to avoid exposing that DEFAULT_HEALTH_CHECK_STALENESS_MULTIPLIER cannot be overridden — a direct test coverage regression that should be resolved before merging.

tests/test_litellm/test_constants.py (test deleted, zero tests remain), litellm/utils.py _get_order_filtered_deployments (target_order fallback may re-select failed-order deployments)
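The target_order fallback concern flagged above can be illustrated with a small sketch. This is not litellm's actual implementation; the function name and dict shape mirror the review text but are illustrative, written under the assumption that each deployment carries an integer "order" field.

```python
# Illustrative sketch of order-level filtering; not the actual litellm code.
from typing import List, Optional


def get_order_filtered_deployments(
    healthy_deployments: List[dict], target_order: Optional[int] = None
) -> List[dict]:
    orders = [d.get("order", 0) for d in healthy_deployments]
    if not orders:
        return healthy_deployments
    if target_order is not None:
        exact = [d for d in healthy_deployments if d.get("order", 0) == target_order]
        # Reviewer's concern: when nothing matches target_order, falling back
        # to *all* healthy deployments can re-select the lower-order
        # deployments that already failed.
        return exact if exact else healthy_deployments
    # Default behavior: keep only the minimum (highest-priority) order level
    min_order = min(orders)
    return [d for d in healthy_deployments if d.get("order", 0) == min_order]


deployments = [{"id": "a", "order": 1}, {"id": "b", "order": 2}]
print([d["id"] for d in get_order_filtered_deployments(deployments)])      # ['a']
print([d["id"] for d in get_order_filtered_deployments(deployments, 2)])   # ['b']
print([d["id"] for d in get_order_filtered_deployments(deployments, 3)])   # ['a', 'b']
```

The last call shows the flagged edge case: target_order=3 matches nothing, so both deployments come back, including the order=1 deployment that may have just failed.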

Important Files Changed

Filename Overview
litellm/router.py Core routing logic: adds team_model_to_deployment_indices for O(1) team-scoped lookups, moves order filtering from _common_checks_available_deployment to async_get_healthy_deployments (with target_order support), adds order-based fallback in async_function_with_fallbacks_common_utils, and introduces health-check-driven routing filters. map_team_model now returns the public name unchanged instead of the internal UUID name.
litellm/proxy/management_endpoints/model_management_endpoints.py Removes model_alias writes from team model create/update paths; _add_team_model_to_db and _setup_new_team_model_assignment no longer call update_team for alias registration. _update_existing_team_model_assignment now queries sibling deployments to decide whether to remove old public name from team models list. Adds _get_team_deployments helper with direct Prisma call.
litellm/proxy/litellm_pre_call_utils.py update_model_if_team_alias_exists now detects stale team aliases (format model_name{team_id}_{uuid}) and optionally bypasses the alias rewrite when sibling team deployments exist; controlled by LITELLM_ENABLE_TEAM_STALE_ALIAS_BYPASS env var cached at module level.
litellm/router_utils/health_state_cache.py New file: DeploymentHealthCache wraps DualCache for per-deployment health state, with staleness-based filtering and safety-net (never excludes all deployments). Clean, well-tested implementation.
litellm/proxy/proxy_server.py Adds _write_health_state_to_router_cache to push background health check results into the router's DeploymentHealthCache after each health check cycle; wires enable_health_check_routing and health_check_staleness_threshold from general_settings into Router constructor params.
litellm/proxy/health_check.py Adds model_id to health check endpoint results for deployment tracking; adds build_deployment_health_states helper to convert endpoint lists to deployment_id -> health state mapping.
litellm/utils.py _get_order_filtered_deployments gains optional target_order parameter: when set, filters to that exact order level (returns all if no match); default behavior (min order) preserved.
tests/test_litellm/test_constants.py test_all_numeric_constants_can_be_overridden removed without replacement; file now contains only an unused helper function _build_constant_env_var_map() — zero tests remain, masking that DEFAULT_HEALTH_CHECK_STALENESS_MULTIPLIER is not env-var-overridable.
tests/test_litellm/proxy/management_endpoints/test_model_management_endpoints.py Adds TestTeamModelSiblingRouting class with tests for no-alias-write behavior, sibling router discovery, and global deployment accessibility; updates TestTeamModelUpdate to assert update_team is not called; adds sibling_deployments support to MockPrismaClient.
tests/test_litellm/router_utils/test_health_check_routing.py New file: comprehensive tests for health-check routing filter (sync/async, disabled, safety-net, stale cache, empty cache) and build_deployment_health_states helper.
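The DeploymentHealthCache behavior summarized in the table (staleness-based filtering plus a never-exclude-everything safety net) can be sketched as follows. This is a minimal illustration under stated assumptions; the class name, timestamp handling, and method signatures are invented here and do not reproduce the real file, which wraps DualCache.

```python
# Minimal sketch of staleness-filtered health routing with a safety net.
# HealthStateCache is an illustrative stand-in for DeploymentHealthCache.
import time
from typing import Dict, List, Optional, Tuple


class HealthStateCache:
    def __init__(self, staleness_threshold: float) -> None:
        # deployment_id -> (healthy, written_at)
        self._states: Dict[str, Tuple[bool, float]] = {}
        self._staleness_threshold = staleness_threshold

    def write(self, deployment_id: str, healthy: bool, now: Optional[float] = None) -> None:
        self._states[deployment_id] = (healthy, now if now is not None else time.time())

    def filter_healthy(self, deployment_ids: List[str], now: Optional[float] = None) -> List[str]:
        now = now if now is not None else time.time()
        kept = []
        for dep_id in deployment_ids:
            state = self._states.get(dep_id)
            if state is None:
                kept.append(dep_id)  # no health data -> do not exclude
                continue
            healthy, written_at = state
            if now - written_at > self._staleness_threshold:
                kept.append(dep_id)  # stale result -> do not exclude
            elif healthy:
                kept.append(dep_id)
        # Safety net: never exclude every deployment
        return kept if kept else deployment_ids


cache = HealthStateCache(staleness_threshold=60)
cache.write("a", healthy=False, now=0)
cache.write("b", healthy=True, now=0)
print(cache.filter_healthy(["a", "b"], now=10))   # ['b']
print(cache.filter_healthy(["a"], now=10))        # ['a'] (safety net)
print(cache.filter_healthy(["a", "b"], now=100))  # ['a', 'b'] (stale, ignored)
```

The stale-result branch is why DEFAULT_HEALTH_CHECK_STALENESS_MULTIPLIER matters: a too-small threshold silently disables the filter between background check cycles.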

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming Request with team_id] --> B[route_llm_request]
    B --> C[map_team_model\npublic_name, team_id]
    C --> D{Team deployments\nexist?}
    D -- Yes returns public_name unchanged --> E[data.model = public_name]
    D -- No --> F[Standard model routing]
    E --> G[llm_router.acompletion\nmodel=public_name]
    G --> H[_common_checks_available_deployment\nmodel=public_name, team_id]
    H --> I{model in\nmodel_names?}
    I -- No --> J[team_model_to_deployment_indices\nO1 lookup via team_id+public_name]
    J --> K[All sibling deployments returned]
    I -- Yes --> L[Standard name lookup]
    K --> M[async_get_healthy_deployments]
    M --> N[Health-check filter]
    N --> O[Cooldown filter]
    O --> P[_get_order_filtered_deployments]
    P --> Q{Deployments available?}
    Q -- Yes --> R[Load balancing: select one]
    Q -- No --> S[Exception raised]
    S --> T[async_function_with_fallbacks\norder-based fallback entries]
    T --> U[Retry with _target_order=next_level]
    U --> G
```

Reviews (4): Last reviewed commit: "Fix codeql"

Comment on lines +5306 to +5316
```python
if len(order_values) > 1 and not _skip_order_fallback:
    # Determine which order levels have already been tried
    current_target = kwargs.get("_target_order")
    skip_up_to = (
        current_target if current_target is not None else order_values[0]
    )
    # Build order-based fallback entries (skip already-tried levels)
    order_fallback_entries: List = [
        {"model": original_model_group, "_target_order": o}
        for o in order_values
        if o > skip_up_to
```
Contributor


P2 _target_order tracking may be unreliable after pop in the same kwargs dict

async_function_with_fallbacks reads current_target = kwargs.get("_target_order") to know which order level is currently being attempted. However, in async_get_healthy_deployments (and its sync equivalent) the value is removed via (request_kwargs or {}).pop("_target_order", None) before the LLM call is made. If request_kwargs is the same dict object as kwargs (which is typical in litellm's call chain), then _target_order is already gone from kwargs by the time the exception is caught and async_function_with_fallbacks runs.

The practical consequence is that current_target is always None, skip_up_to is always order_values[0] (min order), and the order-fallback list always includes every level above the minimum — meaning a failing order=2 fallback would re-enqueue an order=2 entry and potentially rely on max_fallbacks to terminate rather than cleanly progressing to order=3.

The existing tests pass because run_async_fallback iterates the combined fallback list sequentially (so order=3 is still reached), but the progression is less deterministic than intended. Consider preserving _target_order before it is consumed:

```python
# In async_function_with_fallbacks, capture before any pop can happen
current_target = kwargs.get("_target_order") or input_kwargs.get("_target_order")
```

Or pass it separately rather than through the shared kwargs dict.
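The aliasing concern in this comment is easy to reproduce in isolation. The sketch below is a toy reduction, not litellm code: the two function names stand in for async_get_healthy_deployments and async_function_with_fallbacks, and the point is only that popping a key from a shared dict erases it for every later reader of that dict.

```python
# Toy reproduction of the shared-dict concern: if request_kwargs is the same
# object as kwargs, popping "_target_order" during deployment lookup erases it
# before the fallback handler reads it. Function names are illustrative.
from typing import Optional


def get_healthy_deployments(request_kwargs: dict) -> None:
    # Value consumed (and removed) before the LLM call, as in the PR
    request_kwargs.pop("_target_order", None)


def fallback_handler(kwargs: dict) -> Optional[int]:
    # Runs after the exception is caught; the key may already be gone
    return kwargs.get("_target_order")


kwargs = {"model": "gpt-4o", "_target_order": 2}
get_healthy_deployments(kwargs)  # same dict object flows through the call chain
print(fallback_handler(kwargs))  # None — the target order was lost
```

A copy (`dict(kwargs)`) at the hand-off boundary, or passing the value as a separate argument, avoids the aliasing entirely.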

Sameerlite and others added 5 commits March 27, 2026 20:57
Background health checks now feed deployment health state into the
router candidate-filtering pipeline. Unhealthy deployments are excluded
proactively instead of waiting for request failures to trigger cooldown.

Gated by `enable_health_check_routing: true` in general_settings.
Off by default — zero behavior change for existing users.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- model_id is now added after _clean_endpoint_data() so it survives
  health_check_details: False (MINIMAL_DISPLAY_PARAMS filtering)
- Health state write failures logged at warning instead of debug

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The isort hook picked up a stale rename from the working directory.
Both router.py and proxy_server.py need litellm._uuid, not _litellm_uuid.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…iven-routing2

feat(router): health-check-driven routing
@Sameerlite
Collaborator Author

@greptile-apps the test has been added in another PR, please don't decrease the score

@greptile-apps
Contributor

greptile-apps bot commented Mar 27, 2026

Understood. If the test coverage is handled in another PR, that's fine. Can you share the PR number for reference so I can verify the test addresses the scenarios mentioned in this fix?

@yuneng-berri yuneng-berri merged commit 1b111d2 into BerriAI:main Mar 27, 2026
40 of 41 checks passed

Labels

None yet

Projects

None yet


4 participants