Skip to content

Commit f5a32c3

Browse files
Bihan  RanaBihan  Rana
authored andcommitted
Add pd.dstack.yml file
1 parent c2ff413 commit f5a32c3

2 files changed

Lines changed: 56 additions & 8 deletions

File tree

examples/inference/sglang/README.md

Lines changed: 4 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -156,9 +156,7 @@ replicas:
156156
--disaggregation-transfer-backend mooncake \
157157
--host 0.0.0.0 \
158158
--port 8000 \
159-
--disaggregation-bootstrap-port 8998 \
160-
--log-level debug \
161-
> worker-server.log 2>&1
159+
--disaggregation-bootstrap-port 8998
162160
resources:
163161
gpu: H200
164162

@@ -173,9 +171,7 @@ replicas:
173171
--disaggregation-mode decode \
174172
--disaggregation-transfer-backend mooncake \
175173
--host 0.0.0.0 \
176-
--port 8000 \
177-
--log-level debug \
178-
> worker-server.log 2>&1
174+
--port 8000
179175
resources:
180176
gpu: H200
181177

@@ -195,8 +191,8 @@ router:
195191
196192
## Source code
197193
198-
The source-code of this example can be found in
199-
[`examples/llms/deepseek/sglang`](https://github.com/dstackai/dstack/blob/master/examples/llms/deepseek/sglang).
194+
The source-code of these examples can be found in
195+
[`examples/llms/deepseek/sglang`](https://github.com/dstackai/dstack/blob/master/examples/llms/deepseek/sglang) and [`examples/inference/sglang`](https://github.com/dstackai/dstack/blob/master/examples/inference/sglang).
200196

201197
## What's next?
202198

Lines changed: 52 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,52 @@
1+
type: service
2+
name: prefill-decode-test
3+
https: false
4+
image: lmsysorg/sglang:latest
5+
6+
env:
7+
- HF_TOKEN
8+
- MODEL_ID=zai-org/GLM-4.5-Air-FP8
9+
10+
replicas:
11+
- count: 1..2
12+
scaling:
13+
metric: rps
14+
target: 3
15+
commands:
16+
- echo "Group Prefill" > /tmp/version.txt
17+
- |
18+
python -m sglang.launch_server \
19+
--model-path $MODEL_ID \
20+
--disaggregation-mode prefill \
21+
--disaggregation-transfer-backend mooncake \
22+
--host 0.0.0.0 \
23+
--port 8000 \
24+
--disaggregation-bootstrap-port 8998
25+
resources:
26+
gpu: 1
27+
28+
- count: 1
29+
commands:
30+
- echo "Group Decode" > /tmp/version.txt
31+
- |
32+
python -m sglang.launch_server \
33+
--model-path $MODEL_ID \
34+
--disaggregation-mode decode \
35+
--disaggregation-transfer-backend mooncake \
36+
--host 0.0.0.0 \
37+
--port 8000
38+
resources:
39+
gpu: 1
40+
41+
port: 8000
42+
model: zai-org/GLM-4.5-Air-FP8
43+
44+
probes:
45+
- type: http
46+
url: /health_generate
47+
interval: 15s
48+
49+
router:
50+
type: sglang
51+
policy: round_robin
52+
pd_disaggregation: true

0 commit comments

Comments
 (0)