examples/inference/sglang
@@ -156,9 +156,7 @@ replicas:
           --disaggregation-transfer-backend mooncake \
           --host 0.0.0.0 \
           --port 8000 \
-          --disaggregation-bootstrap-port 8998 \
-          --log-level debug \
-          > worker-server.log 2>&1
+          --disaggregation-bootstrap-port 8998
     resources:
       gpu: H200
 
@@ -173,9 +171,7 @@ replicas:
           --disaggregation-mode decode \
           --disaggregation-transfer-backend mooncake \
           --host 0.0.0.0 \
-          --port 8000 \
-          --log-level debug \
-          > worker-server.log 2>&1
+          --port 8000
     resources:
       gpu: H200
 
@@ -195,8 +191,8 @@ router:
 
 ## Source code
 
-The source-code of this example can be found in
-[`examples/llms/deepseek/sglang`](https://github.com/dstackai/dstack/blob/master/examples/llms/deepseek/sglang).
+The source-code of these examples can be found in
+[`examples/llms/deepseek/sglang`](https://github.com/dstackai/dstack/blob/master/examples/llms/deepseek/sglang) and [`examples/inference/sglang`](https://github.com/dstackai/dstack/blob/master/examples/inference/sglang).
 
 ## What's next?
 
+type: service
+name: prefill-decode-test
+https: false
+image: lmsysorg/sglang:latest
+
+env:
+  - HF_TOKEN
+  - MODEL_ID=zai-org/GLM-4.5-Air-FP8
+
+replicas:
+  - count: 1..2
+    scaling:
+      metric: rps
+      target: 3
+    commands:
+      - echo "Group Prefill" > /tmp/version.txt
+      - |
+        python -m sglang.launch_server \
+          --model-path $MODEL_ID \
+          --disaggregation-mode prefill \
+          --disaggregation-transfer-backend mooncake \
+          --host 0.0.0.0 \
+          --port 8000 \
+          --disaggregation-bootstrap-port 8998
+    resources:
+      gpu: 1
+
+  - count: 1
+    commands:
+      - echo "Group Decode" > /tmp/version.txt
+      - |
+        python -m sglang.launch_server \
+          --model-path $MODEL_ID \
+          --disaggregation-mode decode \
+          --disaggregation-transfer-backend mooncake \
+          --host 0.0.0.0 \
+          --port 8000
+    resources:
+      gpu: 1
+
+port: 8000
+model: zai-org/GLM-4.5-Air-FP8
+
+probes:
+  - type: http
+    url: /health_generate
+    interval: 15s
+
+router:
+  type: sglang
+  policy: round_robin
+  pd_disaggregation: true
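
Reviewer note: in the new service file, the prefill and decode launch commands differ only in `--disaggregation-mode` and in the prefill-only `--disaggregation-bootstrap-port`. The sketch below makes that difference explicit. The helper `build_launch_command` is hypothetical (it is not part of dstack or SGLang); it only assembles the flags that already appear in this diff, and it assumes the bootstrap port is passed for the prefill group alone, as the YAML shows.

```python
def build_launch_command(mode, model_path="$MODEL_ID", host="0.0.0.0",
                         port=8000, bootstrap_port=None):
    """Assemble the sglang.launch_server argument list used in the diff.

    Hypothetical helper for illustration only; every flag is copied from
    the service configuration in this change.
    """
    if mode not in ("prefill", "decode"):
        raise ValueError(f"unsupported disaggregation mode: {mode!r}")

    args = [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--disaggregation-mode", mode,
        "--disaggregation-transfer-backend", "mooncake",
        "--host", host,
        "--port", str(port),
    ]
    # In the diff, only the prefill group sets a bootstrap port.
    if bootstrap_port is not None:
        args += ["--disaggregation-bootstrap-port", str(bootstrap_port)]
    return args


if __name__ == "__main__":
    # Print both commands in the multi-line shell form used by the YAML.
    print(" \\\n  ".join(build_launch_command("prefill", bootstrap_port=8998)))
    print(" \\\n  ".join(build_launch_command("decode")))
```

Returning an argument list rather than a single string keeps `$MODEL_ID` untouched, so the shell in the container still expands it from the `env` section.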