feat(router): pluggable load-balancing strategies + live dashboard sw…#7

Merged: milk333445 merged 1 commit into main from feat/routing-strategies on Jun 20, 2026

@milk333445

Contributor

…itch

Make request routing a swappable policy instead of a hard-coded least-load selector. Adds a strategy registry with 7 strategies: least_load (default, unchanged behaviour), round_robin, random, least_inflight, p2c, and session_affinity / prefix_affinity (sticky routing with a load escape valve, for multi-turn chat and shared-prompt cache reuse).
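To illustrate what "sticky routing with a load escape valve" could look like, here is a minimal sketch. It is not the code from routing_strategies.py: `Instance`, `p2c`, `sticky`, and the `max_inflight` threshold are hypothetical names, and it assumes p2c means the usual power-of-two-choices scheme.

```python
import hashlib
import random
from dataclasses import dataclass


@dataclass
class Instance:
    # Hypothetical stand-in for a backend; the router's real type is not shown in this PR.
    name: str
    inflight: int = 0


def p2c(instances, rng=random):
    """Power of two choices: sample two distinct backends, keep the less loaded one."""
    if len(instances) == 1:
        return instances[0]
    a, b = rng.sample(instances, 2)
    return a if a.inflight <= b.inflight else b


def sticky(instances, key, max_inflight=8, rng=random):
    """Hash the key to a preferred backend; fall back to p2c when it is saturated."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    preferred = instances[int.from_bytes(digest[:8], "big") % len(instances)]
    if preferred.inflight < max_inflight:
        return preferred
    # Escape valve (assumed behaviour): give up cache reuse once the sticky target is overloaded.
    others = [i for i in instances if i is not preferred]
    return p2c(others, rng) if others else preferred
```

The same key keeps landing on the same backend while that backend has headroom, which is what makes multi-turn chat and shared-prompt prefixes reuse cache. The threshold is illustrative only.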

Router:

  • routing_strategies.py: SelectContext + registry + score_instance (extracted from the old selector, so least_load is byte-for-byte identical); failover, in-flight accounting, and per-backend cooldown stay in the proxy and apply to every strategy.
  • backend_selector.select_instance_least_load kept as a thin wrapper (compat).
  • Strategy resolution order: per-group (model_config.routing_strategy), then the global env var (LLMOPS_ROUTING_STRATEGY), then the default; session/prefix keys are extracted in the proxy.
  • GET/POST /routing to read + hot-swap the global strategy (no reload), exposed via nginx; dashboard Traffic page gets a selector + a per-strategy help card.
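The registry and the per-group > global > default lookup described above can be sketched roughly as follows. Only `routing_strategy`, `LLMOPS_ROUTING_STRATEGY`, and the strategy names come from this PR; `register`, `resolve_strategy_name`, the placeholder strategy bodies, and the choice to skip unknown names are assumptions for illustration.

```python
import os

# Hypothetical registry: strategy name -> callable that picks one backend.
_REGISTRY = {}
DEFAULT_STRATEGY = "least_load"


def register(name):
    """Decorator that adds a strategy function to the registry under `name`."""
    def decorator(fn):
        _REGISTRY[name] = fn
        return fn
    return decorator


@register("least_load")
def least_load(backends):
    # Placeholder body: here a backend is a (name, load) tuple.
    return min(backends, key=lambda b: b[1])


@register("round_robin")
def round_robin(backends):
    # Placeholder body: a real round-robin would keep a rotating cursor.
    return backends[0]


def resolve_strategy_name(model_config, env=None):
    """Per-group setting wins, then the global env var, then the default."""
    env = os.environ if env is None else env
    candidates = (
        model_config.get("routing_strategy"),
        env.get("LLMOPS_ROUTING_STRATEGY"),
    )
    for candidate in candidates:
        # Assumption: unknown or missing names fall through to the next level.
        if candidate in _REGISTRY:
            return candidate
    return DEFAULT_STRATEGY
```

For example, `resolve_strategy_name({"routing_strategy": "round_robin"}, {"LLMOPS_ROUTING_STRATEGY": "least_load"})` picks the per-group value, while an empty config with an empty environment falls back to `least_load`.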

Frontend:

  • routing strategy is a first-class field in the add/edit dialog and the detail drawer, kept out of the raw vLLM-param list (shared routingStrategies.ts).

Fixes:

  • launchers: never pass router-only keys (routing_strategy) to `vllm serve`, which rejected the unknown argument.
  • AddModelDialog: a null lora_modules from the config no longer leaks into the param list. Previously it was re-submitted as "" and failed list validation.
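As a rough sketch of the launcher fix, the command builder can drop router-only keys before building the `vllm serve` argument list. `ROUTER_ONLY_KEYS` and `build_vllm_args` are hypothetical names, and skipping `None` values here is an extra assumption rather than something this PR states about the launcher.

```python
# Hypothetical set of keys that only the router understands.
ROUTER_ONLY_KEYS = frozenset({"routing_strategy"})


def build_vllm_args(model, params):
    """Turn a params dict into a `vllm serve` argv, omitting router-only keys."""
    args = ["vllm", "serve", model]
    for key, value in params.items():
        # Router-only keys would be rejected by vllm as unknown arguments.
        if key in ROUTER_ONLY_KEYS or value is None:
            continue
        flag = "--" + key.replace("_", "-")
        if value is True:
            args.append(flag)  # boolean switch, no value
        elif value is not False:
            args.extend([flag, str(value)])
    return args
```

Called with `{"routing_strategy": "p2c", "max_model_len": 4096}`, this yields `["vllm", "serve", model, "--max-model-len", "4096"]`, so the routing choice stays in the router's own config.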

Tests: router unit suite (strategies + /routing endpoint) and a launcher regression; frontend type-check clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@milk333445 milk333445 merged commit 72db782 into main Jun 20, 2026
3 checks passed