diff --git a/docs/blog/posts/sglang-router.md b/docs/blog/posts/sglang-router.md
new file mode 100644
index 0000000000..b6a2bef836
--- /dev/null
+++ b/docs/blog/posts/sglang-router.md
@@ -0,0 +1,172 @@
+---
+title: "SGLang router integration and disaggregated inference roadmap"
+date: 2025-11-25
+description: "TBA"
+slug: sglang-router
+image: https://dstack.ai/static-assets/static-assets/images/dstack-sglang-router.png
+categories:
+ - Changelog
+---
+
+# SGLang router integration and disaggregated inference roadmap
+
+[dstack](https://github.com/dstackai/dstack/) provides a streamlined way to handle GPU provisioning and workload orchestration across GPU clouds, Kubernetes clusters, or on-prem environments. Built for interoperability, dstack bridges diverse hardware and open-source tooling.
+
+
+
+As disaggregated, low-latency inference stacks emerge, we aim to ensure they run natively on `dstack`. To move this forward, we’re introducing a native integration between `dstack` and [SGLang’s Model Gateway](https://docs.sglang.ai/advanced_features/router.html) (formerly known as the SGLang Router).
+
+
+
+Although `dstack` can run on Kubernetes, it differs by offering higher-level abstractions that cover the core AI use cases: [dev environments](../../docs/concepts/dev-environments.md) for development, [tasks](../../docs/concepts/tasks.md) for training, and [services](../../docs/concepts/services.md) for inference.
+
+## Services
+
+Here’s an example of a service:
+
+=== "NVIDIA"
+
+
+
+ ```yaml
+ type: service
+ name: qwen
+
+ image: lmsysorg/sglang:latest
+ env:
+ - HF_TOKEN
+ - MODEL_ID=qwen/qwen2.5-0.5b-instruct
+ commands:
+ - |
+ python3 -m sglang.launch_server \
+ --model-path $MODEL_ID \
+ --port 8000 \
+ --trust-remote-code
+ port: 8000
+ model: qwen/qwen2.5-0.5b-instruct
+
+ resources:
+ gpu: 8GB..24GB:1
+ ```
+
+
+
+=== "AMD"
+
+
+ ```yaml
+ type: service
+ name: qwen
+
+ image: lmsysorg/sglang:v0.5.5.post3-rocm700-mi30x
+ env:
+ - HF_TOKEN
+ - MODEL_ID=qwen/qwen2.5-0.5b-instruct
+ commands:
+ - |
+ python3 -m sglang.launch_server \
+ --model-path $MODEL_ID \
+ --port 8000 \
+ --trust-remote-code
+ port: 8000
+ model: qwen/qwen2.5-0.5b-instruct
+
+ resources:
+ gpu: MI300X:1
+ ```
+
+
+
+This service can be deployed via the following command:
+
+
+
+```shell
+$ export HF_TOKEN=...
+$ dstack apply -f qwen.dstack.yml
+```
+
+
+
+This deploys the service as an OpenAI-compatible endpoint, with provisioning and replica management handled automatically.
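+
+Once the service is up, you can query the model through that endpoint. Here’s a minimal sketch, assuming a dstack server at `localhost:3000`, a project named `main`, and your dstack token in `DSTACK_TOKEN` (adjust these to your setup):
+
+```shell
+curl http://localhost:3000/proxy/models/main/chat/completions \
+    -H 'Content-Type: application/json' \
+    -H "Authorization: Bearer $DSTACK_TOKEN" \
+    -d '{
+        "model": "qwen/qwen2.5-0.5b-instruct",
+        "messages": [{"role": "user", "content": "Say hello!"}],
+        "max_tokens": 64
+    }'
+```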
+
+## Gateways
+
+If you’d like auto-scaling, HTTPS, or a custom domain, create a gateway:
+
+
+
+ ```yaml
+ type: gateway
+ name: my-gateway
+
+ backend: aws
+ region: eu-west-1
+
+ # Specify your custom domain
+ domain: example.com
+ ```
+
+
+
+This gateway can be created via the following command:
+
+
+
+```shell
+$ dstack apply -f gateway.dstack.yml
+```
+
+
+
+Once the gateway has a hostname, update your domain’s DNS settings by adding a record for `*.example.com` (a wildcard under your configured domain) pointing to the gateway’s hostname.
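+
+For example, assuming the gateway reports the hostname `my-gateway.eu-west-1.elb.amazonaws.com` (hypothetical) and you add a wildcard `CNAME`, you can check that names under your domain resolve to it:
+
+```shell
+$ dig +short chat.example.com
+my-gateway.eu-west-1.elb.amazonaws.com.
+```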
+
+After that, if you configure [replicas and scaling](../../docs/concepts/services.md#replicas-and-scaling), the gateway will automatically scale the number of replicas and route traffic across them.
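+
+For reference, here’s a sketch of what the replica and scaling settings can look like in the service configuration (see the services docs for the exact schema), scaling between one and four replicas based on requests per second:
+
+```yaml
+replicas: 1..4
+scaling:
+  metric: rps
+  target: 10
+```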
+
+### Router
+
+By default, the gateway uses its built-in load balancer to route traffic across replicas. With the latest release, you can instead delegate traffic routing to the [SGLang Model Gateway](https://docs.sglang.ai/advanced_features/router.html) by setting the `router` property’s `type` to `sglang`:
+
+
+
+ ```yaml
+ type: gateway
+ name: my-gateway
+
+ backend: aws
+ region: eu-west-1
+
+ # Specify your custom domain
+ domain: example.com
+
+ router:
+ type: sglang
+ policy: cache_aware
+ ```
+
+
+
+The `policy` property allows you to configure the routing policy:
+
+* `cache_aware` — Default policy; combines cache locality with load balancing, falling back to shortest queue.
+* `power_of_two` — Samples two workers and picks the lighter one.
+* `random` — Uniform random selection.
+* `round_robin` — Cycles through workers in order.
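+
+For example, to favor strictly even distribution over cache locality, you could switch the policy in the gateway configuration above:
+
+```yaml
+router:
+  type: sglang
+  policy: round_robin
+```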
+
+With this integration, KV cache reuse across replicas becomes possible, a key step toward low-latency inference. It also paves the way for fully disaggregated inference and native auto-scaling. Fundamentally, it reflects our commitment to collaborating with the open-source ecosystem instead of reinventing its core components.
+
+## Limitations and roadmap
+
+Looking ahead, this integration also shapes our roadmap. Over the coming releases, we plan to expand support in several key areas:
+
+* Enabling prefill and decode worker separation for full disaggregation (today, only standard workers are supported).
+* Introducing auto-scaling based on TTFT (Time to First Token) and ITL (Inter-Token Latency), complementing the current requests-per-second scaling metric.
+* Extending native support to more emerging inference stacks.
+
+## What's next?
+
+1. Check [dev environments](../../docs/concepts/dev-environments.md),
+ [tasks](../../docs/concepts/tasks.md), [services](../../docs/concepts/services.md),
+ and [gateways](../../docs/concepts/gateways.md)
+2. Follow [Quickstart](../../docs/quickstart.md)
+3. Join [Discord](https://discord.gg/u8SmfwPpMd)
diff --git a/docs/docs/concepts/gateways.md b/docs/docs/concepts/gateways.md
index eb433f7d36..728077addb 100644
--- a/docs/docs/concepts/gateways.md
+++ b/docs/docs/concepts/gateways.md
@@ -85,8 +85,7 @@ router:
!!! info "Policy"
-
- The `router` property allows you to configure the routing `policy`:
+ The `policy` property allows you to configure the routing policy:
* `cache_aware` — Default policy; combines cache locality with load balancing, falling back to shortest queue.
* `power_of_two` — Samples two workers and picks the lighter one.