feat(router): pluggable load-balancing strategies + live dashboard sw… by milk333445 · Pull Request #7 · LLMSystems/vLLMux

milk333445 · 2026-06-20T11:10:44Z

…itch

Make request routing a swappable policy instead of a hard-coded least-load selector. Adds a strategy registry with 7 strategies: least_load (default, unchanged behaviour), round_robin, random, least_inflight, p2c, session_affinity, and prefix_affinity. The two affinity strategies provide sticky routing with a load escape valve, for multi-turn chat and shared-prompt cache reuse.
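
To make two of the named strategies concrete, here is a minimal sketch of p2c (power of two choices) and of sticky routing with a load escape valve. It is not the code in this PR: `Backend`, `pick_p2c`, `pick_sticky`, and the `max_inflight` threshold are hypothetical names, and the rule that triggers the escape valve is an assumption.

```python
import hashlib
import random
from dataclasses import dataclass


@dataclass
class Backend:
    """Hypothetical backend record; the real router's fields may differ."""
    name: str
    inflight: int = 0


def pick_p2c(backends, rng=random):
    """Power of two choices: sample two distinct backends, keep the less loaded."""
    if len(backends) == 1:
        return backends[0]
    a, b = rng.sample(backends, 2)
    return a if a.inflight <= b.inflight else b


def pick_sticky(backends, key, max_inflight=8):
    """Map a session/prefix key to a stable 'home' backend.

    Escape valve (assumed rule): if the home backend already has
    max_inflight or more requests, fall back to the least-loaded backend.
    """
    if key is None:
        return min(backends, key=lambda b: b.inflight)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    home = backends[int.from_bytes(digest[:8], "big") % len(backends)]
    if home.inflight >= max_inflight:
        return min(backends, key=lambda b: b.inflight)
    return home
```

A stable hash (rather than Python's per-process `hash()`) keeps the key-to-backend mapping consistent across router restarts, which is what lets a backend's prompt cache be reused.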

Router:

- routing_strategies.py: SelectContext + registry + score_instance (extracted from the old selector, so least_load is byte-for-byte identical); failover, in-flight accounting, and per-backend cooldown stay in the proxy and apply to every strategy.
- backend_selector.select_instance_least_load kept as a thin wrapper (compat).
- Strategy chosen per-group (model_config.routing_strategy) > global env (LLMOPS_ROUTING_STRATEGY) > default; session/prefix keys extracted in the proxy.
- GET/POST /routing to read + hot-swap the global strategy (no reload), exposed via nginx; dashboard Traffic page gets a selector + a per-strategy help card.
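
The registry and the per-group > global env > default precedence could look roughly like the sketch below. Only `routing_strategy`, `LLMOPS_ROUTING_STRATEGY`, and the strategy names come from this PR; `STRATEGIES`, `register`, `resolve_strategy`, and the dict-based backend shape are hypothetical, and falling through on an unknown name is an assumption. The runtime override set through POST /routing is not modelled here.

```python
import os
from typing import Callable, Dict, List

# Hypothetical registry shape: strategy name -> selector over candidate backends.
Selector = Callable[[List[dict]], dict]
STRATEGIES: Dict[str, Selector] = {}
DEFAULT_STRATEGY = "least_load"


def register(name: str):
    """Decorator that adds a selector function to the registry under `name`."""
    def decorator(fn: Selector) -> Selector:
        STRATEGIES[name] = fn
        return fn
    return decorator


@register("least_load")
def least_load(backends):
    return min(backends, key=lambda b: b["load"])


@register("least_inflight")
def least_inflight(backends):
    return min(backends, key=lambda b: b["inflight"])


def resolve_strategy(model_config: dict, env=None) -> str:
    """Per-group setting wins, then the global env var, then the default.

    Unknown or missing names fall through to the next level (an assumption).
    """
    env = os.environ if env is None else env
    candidates = (
        model_config.get("routing_strategy"),
        env.get("LLMOPS_ROUTING_STRATEGY"),
    )
    for candidate in candidates:
        if candidate in STRATEGIES:
            return candidate
    return DEFAULT_STRATEGY
```

For example, `resolve_strategy({"routing_strategy": "least_inflight"}, {})` selects the per-group value, while an empty config and empty environment resolve to `least_load`.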

Frontend:

- Routing strategy is a first-class field in the add/edit dialog and the detail drawer, kept out of the raw vLLM-param list (shared routingStrategies.ts).

Fixes:

- launchers: never pass router-only keys (routing_strategy) to `vllm serve`, which errored on the unknown argument.
- AddModelDialog: a null lora_modules from the config no longer leaks into the param list and gets re-submitted as "" (failed list validation).
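
The launcher fix amounts to filtering router-only keys before the command line is built. A minimal sketch, assuming a dict of parameters is turned into `--flag value` pairs; `ROUTER_ONLY_KEYS` and `build_vllm_args` are hypothetical names, and skipping `None` values is an assumption rather than something this PR states:

```python
# Keys consumed by the router that must never reach `vllm serve`.
ROUTER_ONLY_KEYS = {"routing_strategy"}


def build_vllm_args(model: str, params: dict) -> list:
    """Build a `vllm serve` argv, dropping router-only keys and unset values."""
    args = ["vllm", "serve", model]
    for key, value in params.items():
        if key in ROUTER_ONLY_KEYS or value is None:
            continue  # router-only or unset: not a vLLM argument
        flag = "--" + key.replace("_", "-")
        if value is True:
            args.append(flag)  # boolean switch, no value
        elif value is not False:
            args.extend([flag, str(value)])
    return args
```

With `{"max_model_len": 8192, "routing_strategy": "p2c", "enforce_eager": True}`, the resulting argv contains `--max-model-len 8192` and `--enforce-eager` but no `--routing-strategy`.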

Tests: router unit suite (strategies + /routing endpoint) and a launcher regression; frontend type-check clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

milk333445 merged commit 72db782 into main Jun 20, 2026
3 checks passed

milk333445 merged 1 commit into main from feat/routing-strategies
