Add sglang router minimal/experimental support #3194
Closed
Bihan wants to merge 12 commits into dstackai:master from
added 11 commits
October 13, 2025 14:56
peterschmidt85
requested changes
Oct 16, 2025
Contributor
- Add an example of how to test the new router
- Please ensure auto-scaling works (including downscaling to 0), and that dstack uses the router's API to add/remove workers without restarting the gateway
- Only after that, refactor the code to move the sgl-router implementation into a separate sglang-related subclass, so that the normal gateway code contains no sgl-router-specific code, similar to how each backend encapsulates its own logic
- Ensure tests are working
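The add/remove-workers request above can be sketched as a small reconciliation step: compare the workers the router currently knows about with the replicas that should be serving, and issue only the difference, so nothing has to be restarted. This is a minimal sketch, not the code in this PR. The function names are hypothetical, and the `/add_worker` and `/remove_worker` endpoints with a `url` query parameter are an assumption based on the sglang-router documentation; they should be checked against the installed router version.

```python
from urllib.parse import quote


def plan_worker_changes(current, desired):
    """Return (to_add, to_remove) so the router's worker set becomes `desired`."""
    current_set, desired_set = set(current), set(desired)
    to_add = sorted(desired_set - current_set)
    to_remove = sorted(current_set - desired_set)
    return to_add, to_remove


def build_requests(router_url, current, desired):
    """Build (method, url) pairs for the router.

    Assumption: the router exposes POST /add_worker?url=... and
    POST /remove_worker?url=...; verify against the router version in use.
    """
    to_add, to_remove = plan_worker_changes(current, desired)
    base = router_url.rstrip("/")
    requests = [("POST", f"{base}/add_worker?url={quote(w, safe='')}") for w in to_add]
    requests += [("POST", f"{base}/remove_worker?url={quote(w, safe='')}") for w in to_remove]
    return requests
```

Downscaling to 0 is the case where `desired` is empty: every known worker ends up in the removal list and nothing is added.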
Collaborator
Author
Step 1
Eg:
Step 2 Apply below gateway config.
Step 3
Step 4 Apply below service config
Step 5
Logs:
Step 6 You can query from terminal
Note: You can check sglang-router logs: cat ~/dstack/router_logs/sgl-router. Also, maybe in the future we can show sglang-router's log instead of replica's log in dstack CLI
Eg:
Collaborator
Author
Will send a new PR
Intro
We want to make it possible to create a gateway that extends the standard gateway with additional features (all sgl-router features, such as cache-aware routing) while keeping all the standard gateway features (such as authentication and rate limits).
For the user, using such a gateway should be very simple, e.g. setting router to sglang in the gateway configuration. Eg:
The rest should look the same for the user: the same service endpoint, authentication and rate limits working, etc.
While this first experimental version should only bring minimum features, i.e. routing replica traffic through the router (dstack's gateway/nginx -> sglang-router -> replica workers), in the future this may be extended with router-specific scaling metrics such as ttft and e2e, Prefill-Decode Disaggregation, etc.
For the first experimental version, the most critical thing is to come up with the minimum, thoroughly tested changes that allow embedding router: sglang without breaking any existing functionality.
Note:
In this version, pip and sglang-router are installed on the gateway machine irrespective of whether router: sglang is in the gateway config or not. To make this conditional in the future, it should be implemented across the backends that support gateways.
Modified the upstream block of src/dstack/_internal/proxy/gateway/resources/nginx/service.jinja2 to respect router: sglang in the gateway config.
src/dstack/_internal/proxy/gateway/resources/nginx/sglang_workers.jinja2
This nginx conf forwards HTTP to a Unix socket. dstack workers listen on Unix sockets, while the sglang-router speaks HTTP, so this bridge lets the router reach each worker via a local TCP port.
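To illustrate the bridge described above, here is a minimal sketch that renders one nginx server block per worker, each listening on a local TCP port and proxying to that worker's Unix socket. It is not the content of sglang_workers.jinja2, which is not shown here. The socket paths, the starting port, and the function name are hypothetical, and `string.Template` stands in for Jinja2 only to keep the example dependency-free.

```python
from string import Template

# One server block per worker: local TCP port -> worker's Unix socket.
SERVER_BLOCK = Template("""\
server {
    listen 127.0.0.1:${port};
    location / {
        proxy_pass http://unix:${socket_path}:;
    }
}
""")


def render_worker_bridges(socket_paths, first_port=10001):
    """Return (nginx_conf, worker_urls) for the given Unix socket paths.

    Ports are assigned sequentially from `first_port` (a hypothetical choice);
    `worker_urls` are the HTTP URLs the router would be given for each worker.
    """
    blocks, urls = [], []
    for offset, socket_path in enumerate(socket_paths):
        port = first_port + offset
        blocks.append(SERVER_BLOCK.substitute(port=port, socket_path=socket_path))
        urls.append(f"http://127.0.0.1:{port}")
    return "\n".join(blocks), urls
```

With this layout the router only ever sees plain HTTP URLs on 127.0.0.1, while the replicas keep listening on their sockets.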