Closed
Conversation
added 16 commits
October 13, 2025 14:56
Collaborator
Author
|
Completed
Next Steps
|
Collaborator
Author
|
A new PR with generic implementation of router integration will be sent. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Intro
We want to make it possible to create a gateway which extends the gateway functionality with additional features (all sgl-router features such as cache aware routing, etc) while keeping all the standard gateway features (such as authentication, rate limits).
For the user, using such gateway should be very simple, e.g. setting router to sglang in gateway configurations. Eg:
The rest for the user should look the same - the
same service endpoint,authenticationandrate limits working,etc.While this first
experimental versionshould only bring minimum features - allow to route replicas traffic through the router (dstack’s gateway/ngnix -> sglang-router -> replica workers), in the future this may be extended with router-specific scaling metrics, such as ttft, e2e, Prefill-Decode Disaggregation, etc).As the first experimental version, the most critical is to come up with the minimum changes that are tested thoroughly that would allow embedding the
router: sglangwithout breaking any existing functionality.Note:
In this version installation of pip & sglang-router is done in gateway machine, irrespective of whether
router:sglangis in gateway config or not. To make it conditional in future, it should be implemented across backends that support gateway.Modified upstream block of
src/dstack/_internal/proxy/gateway/resources/nginx/service.jinja2to respectrouter: sglangin gateway config.src/dstack/_internal/proxy/gateway/resources/nginx/sglang_workers.jinja2This nginx conf forwards HTTP to Unix socket. dstack workers listen on Unix sockets, while the sglang-router speaks HTTP, so this bridge lets the router reach each worker via local TCP ports.
How To Test
Step 1
Replace return value as shown in below example in method
get_dstack_gateway_wheel(exact path see here) .Eg:
Step 2
Apply below gateway config.
Step 3
Update DNS
Step 4
Apply below service config
Step 5
To automate request and test autoscaling, you can use below script:
autoscale_test_sglang.pyStep 6
After updating
tokenandservice endpoint, run above scriptpython autoscale_test_sglang.pyfrom your local machine.Once the automated requests start hitting the service endpoint; dstack submits the job. When the service get's deployed and
/healthcheck from sglang-router responds with 200 as shown below, you will start to see response from the model.As the automated requests continue, first dstack scales up to 3 jobs and later adjusts to 2 jobs. If we stop the requests, dstack scales down to 0 jobs.
Logs:
Step 7
You can also use dstack-frontend `http://localhost:3000/projects/main/models/sglang-service for manual requests.
Note: You can check sglang-router logs: cat ~/dstack/router_logs/sgl-router.
Also, maybe in the future we can show sglang-router's log instead of replica's log in dstack CLI
Eg: