fix(team-routing): preserve sibling deployment candidates for team public models #24688
Conversation
Use a deterministic internal model_name for team-scoped deployments so sibling deployments with the same public model share a routing group. This makes team alias writes idempotent and preserves multi-deployment failover/load balancing behavior. Made-with: Cursor
Remove team model_alias rewrites and resolve team deployments by team_public_model_name with team_id so sibling deployments stay in the routing candidate pool, with explicit logs showing candidate selection before load balancing. Made-with: Cursor
Remove temporary fire-emoji router logs used for local verification while keeping team sibling deployment routing behavior unchanged. Made-with: Cursor
- Add None guard for original_model_name in _add_team_model_to_db
- Remove stale old public name when renaming team model
- Add comment clarifying team deployment early-return priority
Made-with: Cursor
- Update map_team_model test to expect public name return
- Only remove old public name if no sibling deployments use it
Made-with: Cursor
- Guard against llm_router=None to prevent silent deletion
- Add O(1) team_model index to avoid O(n) scan on every team request
Made-with: Cursor
Guard should_include_deployment fallback to only return deployments matching the requested team_id, preventing public-name collisions from leaking deployments across teams Made-with: Cursor
- Add clarifying comments to test assertions
- Query prisma DB instead of in-memory router to avoid stale state
- Prevents incorrect deletion of old public name when siblings exist
Made-with: Cursor
- Guard against None model_info in sibling deployment check
- Extract _update_team_model_index helper to eliminate duplication
Made-with: Cursor
…routing
- Skip model_aliases rewrite if model resolves to team deployments
- Add test coverage for sibling-preservation branch
- Update MockPrismaClient to support sibling deployment scenarios
Made-with: Cursor
- Use O(1) team index lookup instead of map_team_model in alias guard
- Fix MockPrismaClient to validate where clause filters
- Add comment explaining DB query trade-off for team deployments
Made-with: Cursor
- Check alias target pattern to detect stale team aliases
- Fix PrismaClient type annotation to Optional
- Eliminate in-place mutation in index update logic
Made-with: Cursor
- Add deduplication guard in _update_team_model_index to prevent duplicate indices
- Add wildcard comment in map_team_model for clarity
- Add monkeypatch to test_team_alias_stale_bypass_disabled_by_default for determinism
- Extract _get_team_deployments helper to centralize DB access pattern
- Add clarifying comments for team_public_model_name assignment ordering
Made-with: Cursor
- Cache LITELLM_ENABLE_TEAM_STALE_ALIAS_BYPASS at module level to avoid hot-path secret lookups
- Add clarifying comments for should_include_deployment team isolation logic
- Add negative assertion for update_team.assert_not_called() in test
- Add docstring clarification for _get_team_deployments helper pattern
- Add explicit assertion message in test_get_model_list_alias_optimization
Made-with: Cursor
- Reorder team_public_model_name assignment to happen before model_name mutation for clarity
- Add comment explaining no-rename fast-exit case in _update_existing_team_model_assignment
- Add comment explaining final patch_data.model_name = None applies to all code paths
Made-with: Cursor
Reset _ENABLE_TEAM_STALE_ALIAS_BYPASS to None in both test functions to ensure test isolation and prevent ordering-dependent failures Made-with: Cursor
…lback2 feat(router): order-based fallback across deployment priority levels
Greptile Summary

This PR fixes two team routing bugs: (1) sibling team deployments sharing the same public model name were collapsed to a single routing candidate, because the team model_alias rewrite resolved the public name to a single deployment's internal name.

Confidence Score: 4/5

Safe to merge after addressing the deleted test coverage in test_constants.py. The core team routing fix is well-designed and comprehensively tested with new test classes. The health-check routing and order-based fallback features are clean additions. The score is 4 rather than 5 because test_constants.py had its only test function deleted to avoid exposing that DEFAULT_HEALTH_CHECK_STALENESS_MULTIPLIER cannot be overridden, a direct test coverage regression that should be resolved before merging.

Flagged files: tests/test_litellm/test_constants.py (test deleted, zero tests remain); litellm/utils.py _get_order_filtered_deployments (target_order fallback may re-select failed-order deployments).
| Filename | Overview |
|---|---|
| litellm/router.py | Core routing logic: adds team_model_to_deployment_indices for O(1) team-scoped lookups, moves order filtering from _common_checks_available_deployment to async_get_healthy_deployments (with target_order support), adds order-based fallback in async_function_with_fallbacks_common_utils, and introduces health-check-driven routing filters. map_team_model now returns the public name unchanged instead of the internal UUID name. |
| litellm/proxy/management_endpoints/model_management_endpoints.py | Removes model_alias writes from team model create/update paths; _add_team_model_to_db and _setup_new_team_model_assignment no longer call update_team for alias registration. _update_existing_team_model_assignment now queries sibling deployments to decide whether to remove old public name from team models list. Adds _get_team_deployments helper with direct Prisma call. |
| litellm/proxy/litellm_pre_call_utils.py | update_model_if_team_alias_exists now detects stale team aliases (format model_name{team_id}_{uuid}) and optionally bypasses the alias rewrite when sibling team deployments exist; controlled by LITELLM_ENABLE_TEAM_STALE_ALIAS_BYPASS env var cached at module level. |
| litellm/router_utils/health_state_cache.py | New file: DeploymentHealthCache wraps DualCache for per-deployment health state, with staleness-based filtering and safety-net (never excludes all deployments). Clean, well-tested implementation. |
| litellm/proxy/proxy_server.py | Adds _write_health_state_to_router_cache to push background health check results into the router's DeploymentHealthCache after each health check cycle; wires enable_health_check_routing and health_check_staleness_threshold from general_settings into Router constructor params. |
| litellm/proxy/health_check.py | Adds model_id to health check endpoint results for deployment tracking; adds build_deployment_health_states helper to convert endpoint lists to deployment_id -> health state mapping. |
| litellm/utils.py | _get_order_filtered_deployments gains optional target_order parameter: when set, filters to that exact order level (returns all if no match); default behavior (min order) preserved. |
| tests/test_litellm/test_constants.py | test_all_numeric_constants_can_be_overridden removed without replacement; file now contains only an unused helper function _build_constant_env_var_map() — zero tests remain, masking that DEFAULT_HEALTH_CHECK_STALENESS_MULTIPLIER is not env-var-overridable. |
| tests/test_litellm/proxy/management_endpoints/test_model_management_endpoints.py | Adds TestTeamModelSiblingRouting class with tests for no-alias-write behavior, sibling router discovery, and global deployment accessibility; updates TestTeamModelUpdate to assert update_team is not called; adds sibling_deployments support to MockPrismaClient. |
| tests/test_litellm/router_utils/test_health_check_routing.py | New file: comprehensive tests for health-check routing filter (sync/async, disabled, safety-net, stale cache, empty cache) and build_deployment_health_states helper. |
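The `target_order` behavior summarized for litellm/utils.py above can be sketched as below. This is a reconstruction from the file summary only, not the actual implementation: the function name is simplified, the deployment dict shape is assumed, and treating deployments without an `order` as belonging to the minimum-order group is an assumption.

```python
def get_order_filtered_deployments(deployments, target_order=None):
    """Sketch: filter candidate deployments by priority 'order'.

    - target_order given: keep only that exact order level; if no
      deployment matches, fall back to returning all (per the summary).
    - target_order None: keep the minimum (highest-priority) order level,
      preserving the pre-existing default behavior.
    """
    orders = [d.get("litellm_params", {}).get("order") for d in deployments]
    known = [o for o in orders if o is not None]
    if not known:
        return deployments  # no ordering info: nothing to filter

    if target_order is not None:
        exact = [
            d for d in deployments
            if d.get("litellm_params", {}).get("order") == target_order
        ]
        return exact or deployments  # no match at that level: return all

    min_order = min(known)
    return [
        d for d in deployments
        if d.get("litellm_params", {}).get("order", min_order) <= min_order
    ]
```

The `exact or deployments` fallback is the behavior the review below flags: a retry targeting an order level with no surviving deployments silently widens back to the full pool.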
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming Request with team_id] --> B[route_llm_request]
    B --> C["map_team_model\npublic_name, team_id"]
    C --> D{Team deployments\nexist?}
    D -- "Yes: returns public_name unchanged" --> E[data.model = public_name]
    D -- No --> F[Standard model routing]
    E --> G["llm_router.acompletion\nmodel=public_name"]
    G --> H["_common_checks_available_deployment\nmodel=public_name, team_id"]
    H --> I{model in\nmodel_names?}
    I -- No --> J["team_model_to_deployment_indices\nO(1) lookup via team_id+public_name"]
    J --> K[All sibling deployments returned]
    I -- Yes --> L[Standard name lookup]
    K --> M[async_get_healthy_deployments]
    M --> N[Health-check filter]
    N --> O[Cooldown filter]
    O --> P[_get_order_filtered_deployments]
    P --> Q{Deployments available?}
    Q -- Yes --> R["Load balancing: select one"]
    Q -- No --> S[Exception raised]
    S --> T["async_function_with_fallbacks\norder-based fallback entries"]
    T --> U[Retry with _target_order=next_level]
    U --> G
```
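The O(1) lookup step (node J in the flowchart) can be sketched as a small index keyed by team and public name. The attribute name `team_model_to_deployment_indices` comes from the PR; the exact key shape and the wrapper class here are assumptions for illustration.

```python
from typing import Dict, List, Tuple


class TeamModelIndex:
    """Hypothetical sketch of the router's team deployment index."""

    def __init__(self) -> None:
        # (team_id, public_model_name) -> indices into the deployment list
        self._index: Dict[Tuple[str, str], List[int]] = {}

    def add(self, team_id: str, public_name: str, deployment_idx: int) -> None:
        indices = self._index.setdefault((team_id, public_name), [])
        if deployment_idx not in indices:  # dedup guard, as in the commit log
            indices.append(deployment_idx)

    def lookup(self, team_id: str, public_name: str) -> List[int]:
        # Single dict hit returns ALL sibling deployment indices, keeping
        # the full candidate pool for load balancing and failover.
        return self._index.get((team_id, public_name), [])
```

Keying on the pair rather than the public name alone is what keeps deployments from leaking across teams when two teams reuse the same public model name.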
Reviews (4): Last reviewed commit: "Fix codeql"
```python
if len(order_values) > 1 and not _skip_order_fallback:
    # Determine which order levels have already been tried
    current_target = kwargs.get("_target_order")
    skip_up_to = (
        current_target if current_target is not None else order_values[0]
    )
    # Build order-based fallback entries (skip already-tried levels)
    order_fallback_entries: List = [
        {"model": original_model_group, "_target_order": o}
        for o in order_values
        if o > skip_up_to
    ]
```
_target_order tracking may be unreliable after pop in the same kwargs dict
async_function_with_fallbacks reads current_target = kwargs.get("_target_order") to know which order level is currently being attempted. However, in async_get_healthy_deployments (and its sync equivalent) the value is removed via (request_kwargs or {}).pop("_target_order", None) before the LLM call is made. If request_kwargs is the same dict object as kwargs (which is typical in litellm's call chain), then _target_order is already gone from kwargs by the time the exception is caught and async_function_with_fallbacks runs.
The practical consequence is that current_target is always None, skip_up_to is always order_values[0] (min order), and the order-fallback list always includes every level above the minimum — meaning a failing order=2 fallback would re-enqueue an order=2 entry and potentially rely on max_fallbacks to terminate rather than cleanly progressing to order=3.
The existing tests pass because run_async_fallback iterates the combined fallback list sequentially (so order=3 is still reached), but the progression is less deterministic than intended. Consider preserving _target_order before it is consumed:

```python
# In async_function_with_fallbacks, capture before any pop can happen
current_target = kwargs.get("_target_order") or input_kwargs.get("_target_order")
```

Or pass it separately rather than through the shared kwargs dict.
Background health checks now feed deployment health state into the router candidate-filtering pipeline. Unhealthy deployments are excluded proactively instead of waiting for request failures to trigger cooldown. Gated by `enable_health_check_routing: true` in general_settings. Off by default — zero behavior change for existing users. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- model_id is now added after _clean_endpoint_data() so it survives health_check_details: False (MINIMAL_DISPLAY_PARAMS filtering)
- Health state write failures logged at warning instead of debug
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The isort hook picked up a stale rename from the working directory. Both router.py and proxy_server.py need litellm._uuid, not _litellm_uuid. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…iven-routing2 feat(router): health-check-driven routing
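The staleness-gated health filter with its never-exclude-all safety net, described in the file summary above, can be sketched as a self-contained stand-in. The real DeploymentHealthCache wraps DualCache, so the plain-dict storage and the method names here are assumptions.

```python
import time


class DeploymentHealthCacheSketch:
    """Hypothetical sketch of staleness-gated deployment health filtering."""

    def __init__(self, staleness_threshold: float = 60.0) -> None:
        self.staleness_threshold = staleness_threshold
        # deployment_id -> (healthy: bool, written_at: float)
        self._states = {}

    def set_state(self, deployment_id: str, healthy: bool) -> None:
        self._states[deployment_id] = (healthy, time.time())

    def filter_healthy(self, deployment_ids):
        now = time.time()
        healthy = []
        for dep_id in deployment_ids:
            state = self._states.get(dep_id)
            if state is None or now - state[1] > self.staleness_threshold:
                # Unknown or stale state: do not exclude the deployment
                healthy.append(dep_id)
            elif state[0]:
                healthy.append(dep_id)
        # Safety net: never exclude every deployment
        return healthy or list(deployment_ids)
```

Treating unknown and stale entries as includable, plus the final safety net, is what guarantees the filter can only narrow the pool, never empty it, so a lapsed background health check cannot black-hole all traffic.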
@greptile-apps the test has been added into another PR, please don't decrease the score
Understood. If the test coverage is handled in another PR, that's fine. Can you share the PR number for reference so I can verify the test addresses the scenarios mentioned in this fix?
Summary

- Resolve team deployments by `team_public_model_name` + `team_id` so sibling deployments remain in the candidate pool