Merged
49 changes: 45 additions & 4 deletions docs/docs/concepts/gateways.md
@@ -1,10 +1,9 @@
# Gateways

Gateways manage the ingress traffic of running [services](services.md),
provide an HTTPS endpoint mapped to your domain, handle auto-scaling and rate limits.
Gateways manage ingress traffic for running [services](services.md), handle auto-scaling and rate limits, enable HTTPS, and allow you to configure a custom domain. They also support custom routers, such as the [SGLang Model Gateway :material-arrow-top-right-thin:{ .external }](https://docs.sglang.ai/advanced_features/router.html#){:target="_blank"}.

> If you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"},
> the gateway is already set up for you.
<!-- > If you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"},
> the gateway is already set up for you. -->

## Apply a configuration

@@ -57,6 +56,48 @@ You can create gateways with the `aws`, `azure`, `gcp`, or `kubernetes` backends
Gateways in `kubernetes` backend require an external load balancer. Managed Kubernetes solutions usually include a load balancer.
For self-hosted Kubernetes, you must provide a load balancer yourself.

### Router

By default, the gateway uses its own load balancer to route traffic between replicas. However, you can delegate routing to an external router by setting the `router` property. Currently, the only supported external router is `sglang`.

#### SGLang

The `sglang` router delegates routing logic to the [SGLang Model Gateway :material-arrow-top-right-thin:{ .external }](https://docs.sglang.ai/advanced_features/router.html#){:target="_blank"}.

To enable it, set the `type` field under `router` to `sglang`:

<div editor-title="gateway.dstack.yml">

```yaml
type: gateway
name: sglang-gateway

backend: aws
region: eu-west-1

domain: example.com

router:
type: sglang
policy: cache_aware
```

</div>

!!! info "Policy"

The `router` property allows you to configure the routing `policy`:

* `cache_aware` &mdash; Default policy; combines cache locality with load balancing, falling back to shortest queue.
* `power_of_two` &mdash; Samples two workers and picks the lighter one.
* `random` &mdash; Uniform random selection.
* `round_robin` &mdash; Cycles through workers in order.

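To build intuition for how these policies differ, here's a minimal, hypothetical sketch (not the actual SGLang Model Gateway implementation) of `round_robin` and `power_of_two` selection:

```python
import itertools
import random


class Worker:
    """A worker replica with a count of in-flight requests."""

    def __init__(self, name: str, queue_len: int = 0):
        self.name = name
        self.queue_len = queue_len


def make_round_robin(workers):
    # Cycle through workers in a fixed order, one request per turn.
    cycle = itertools.cycle(workers)
    return lambda: next(cycle)


def pick_power_of_two(workers, rng=random):
    # Sample two workers at random and send the request to the
    # one with the shorter queue ("power of two choices").
    a, b = rng.sample(workers, 2)
    return a if a.queue_len <= b.queue_len else b


workers = [Worker("w0", 5), Worker("w1", 1), Worker("w2", 3)]

rr = make_round_robin(workers)
print([rr().name for _ in range(4)])  # ['w0', 'w1', 'w2', 'w0']

picked = pick_power_of_two(workers, rng=random.Random(0))
print(picked.name)  # whichever of the two sampled workers is lighter
```

`cache_aware` adds one more signal on top of this: it prefers the worker whose KV cache likely holds the request's prefix, falling back to queue length when no worker is a good cache match.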

> Currently, services using this type of gateway must run standard SGLang workers. See the [example](../../examples/inference/sglang/index.md).
>
> Support for prefill/decode disaggregation and auto-scaling based on inter-token latency is coming soon.

### Public IP

If you don't want a public IP for the gateway, set `public_ip` to `false` (the default is `true`) to make the gateway private.
5 changes: 3 additions & 2 deletions docs/docs/concepts/services.md
@@ -100,12 +100,13 @@ If [authorization](#authorization) is not disabled, the service endpoint require
However, you'll need a gateway in the following cases:

* To use auto-scaling or rate limits
* To use a custom router, such as the [SGLang Model Gateway :material-arrow-top-right-thin:{ .external }](https://docs.sglang.ai/advanced_features/router.html#){:target="_blank"}
* To enable HTTPS for the endpoint and map it to your domain
* If your service requires WebSockets
* If your service cannot work with a [path prefix](#path-prefix)

Note, if you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"},
a gateway is already pre-configured for you.
<!-- Note, if you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"},
a gateway is already pre-configured for you. -->

If a [gateway](gateways.md) is configured, the service endpoint will be accessible at
`https://<run name>.<gateway domain>/`.
10 changes: 10 additions & 0 deletions docs/docs/reference/dstack.yml/gateway.md
@@ -10,6 +10,16 @@ The `gateway` configuration type allows creating and updating [gateways](../../c
type:
required: true

### `router`

=== "SGLang Model Gateway"

#SCHEMA# dstack._internal.core.models.routers.SGLangRouterConfig
overrides:
show_root_heading: false
type:
required: true

### `certificate`

=== "Let's encrypt"
54 changes: 21 additions & 33 deletions examples/inference/sglang/README.md
@@ -2,32 +2,21 @@

This example shows how to deploy DeepSeek-R1-Distill-Llama 8B and 70B using [SGLang :material-arrow-top-right-thin:{ .external }](https://github.com/sgl-project/sglang){:target="_blank"} and `dstack`.

??? info "Prerequisites"
Once `dstack` is [installed](https://dstack.ai/docs/installation), clone the repo with examples.

<div class="termy">

```shell
$ git clone https://github.com/dstackai/dstack
$ cd dstack
```

</div>
## Apply a configuration

## Deployment
Here's an example of a service that deploys DeepSeek-R1-Distill-Llama 8B and 70B using SGLang.

=== "AMD"
=== "NVIDIA"

<div editor-title="examples/inference/sglang/amd/.dstack.yml">
<div editor-title="examples/inference/sglang/nvidia/.dstack.yml">

```yaml
type: service
name: deepseek-r1-amd
name: deepseek-r1-nvidia

image: lmsysorg/sglang:v0.4.1.post4-rocm620
image: lmsysorg/sglang:latest
env:
- MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-70B
- MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B

commands:
- python3 -m sglang.launch_server
@@ -36,25 +25,24 @@ Here's an example of a service that deploys DeepSeek-R1-Distill-Llama 8B and 70B
--trust-remote-code

port: 8000
model: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B

resources:
gpu: MI300x
disk: 300GB
gpu: 24GB
```
</div>

=== "NVIDIA"
=== "AMD"

<div editor-title="examples/inference/sglang/nvidia/.dstack.yml">
<div editor-title="examples/inference/sglang/amd/.dstack.yml">

```yaml
type: service
name: deepseek-r1-nvidia
name: deepseek-r1-amd

image: lmsysorg/sglang:latest
image: lmsysorg/sglang:v0.4.1.post4-rocm620
env:
- MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
- MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-70B

commands:
- python3 -m sglang.launch_server
@@ -63,16 +51,14 @@ Here's an example of a service that deploys DeepSeek-R1-Distill-Llama 8B and 70B
--trust-remote-code

port: 8000
model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
model: deepseek-ai/DeepSeek-R1-Distill-Llama-70B

resources:
gpu: 24GB
gpu: MI300x
disk: 300GB
```
</div>


### Applying the configuration

To run a configuration, use the [`dstack apply`](https://dstack.ai/docs/reference/cli/dstack/apply.md) command.

<div class="termy">
@@ -118,8 +104,10 @@ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
```
</div>

When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
is available at `https://gateway.<gateway domain>/`.
!!! info "SGLang Model Gateway"
If you'd like to use a custom routing policy, e.g. via the [SGLang Model Gateway :material-arrow-top-right-thin:{ .external }](https://docs.sglang.ai/advanced_features/router.html#){:target="_blank"}, create a gateway with `router` set to `sglang`. See [gateways](https://dstack.ai/docs/concepts/gateways#router) for more details.

> If a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured (e.g. to enable auto-scaling, HTTPS, rate limits, etc.), the OpenAI-compatible endpoint is available at `https://gateway.<gateway domain>/`.

## Source code

@@ -128,5 +116,5 @@ The source code of this example can be found in

## What's next?

1. Check [services](https://dstack.ai/docs/services)
1. Read about [services](https://dstack.ai/docs/concepts/services) and [gateways](https://dstack.ai/docs/concepts/gateways)
2. Browse the [SGLang DeepSeek Usage](https://docs.sglang.ai/references/deepseek.html) and [Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X](https://rocm.blogs.amd.com/artificial-intelligence/DeepSeekR1-Part2/README.html)
12 changes: 10 additions & 2 deletions src/dstack/_internal/core/models/routers.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@
from enum import Enum
from typing import Literal

from pydantic import Field
from typing_extensions import Annotated

from dstack._internal.core.models.common import CoreModel


@@ -9,8 +12,13 @@ class RouterType(str, Enum):


class SGLangRouterConfig(CoreModel):
type: Literal["sglang"] = "sglang"
policy: Literal["random", "round_robin", "cache_aware", "power_of_two"] = "cache_aware"
type: Annotated[Literal["sglang"], Field(description="The router type")] = "sglang"
policy: Annotated[
Literal["random", "round_robin", "cache_aware", "power_of_two"],
Field(
description="The routing policy. Options: `random`, `round_robin`, `cache_aware`, `power_of_two`"
),
] = "cache_aware"


AnyRouterConfig = SGLangRouterConfig
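
For illustration, here's a standalone sketch of how a config like this validates input. It substitutes plain pydantic `BaseModel` for dstack's internal `CoreModel` (an assumption made so the example is self-contained):

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError
from typing_extensions import Annotated


class SGLangRouterConfig(BaseModel):
    type: Annotated[Literal["sglang"], Field(description="The router type")] = "sglang"
    policy: Annotated[
        Literal["random", "round_robin", "cache_aware", "power_of_two"],
        Field(description="The routing policy"),
    ] = "cache_aware"


# The policy defaults to `cache_aware` when omitted.
cfg = SGLangRouterConfig()
print(cfg.policy)  # cache_aware

# Values outside the Literal are rejected at validation time.
try:
    SGLangRouterConfig(policy="sticky")
except ValidationError as e:
    print("rejected:", type(e).__name__)
```

The `Literal` types mean an unsupported router `type` or `policy` in `gateway.dstack.yml` fails fast with a validation error instead of being silently passed through to the router.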