<h2 id="sglang-router-integration-and-disaggregated-inference-roadmap"><a class="toclink" href="../sglang-router/">SGLang router integration and disaggregated inference roadmap</a></h2>
<p><a href="https://github.com/dstackai/dstack/">dstack</a> provides a streamlined way to handle GPU provisioning and workload orchestration across GPU clouds, Kubernetes clusters, and on-prem environments. Built for interoperability, dstack bridges diverse hardware and open-source tooling.</p>
<p>As disaggregated, low-latency inference emerges, we aim to ensure this new stack runs natively on <code>dstack</code>. To move this forward, we’re introducing native integration between dstack and <a href="https://docs.sglang.ai/advanced_features/router.html">SGLang’s Model Gateway</a> (formerly known as the SGLang Router).</p>
<h2 id="using-tpus-for-fine-tuning-and-deploying-llms"><a class="toclink" href="../../../tpu-on-gcp/">Using TPUs for fine-tuning and deploying LLMs</a></h2>
<p>If you’re using or planning to use TPUs with Google Cloud, you can now do so via <code>dstack</code>. Just specify the TPU version and the number of cores (separated by a dash) in the <code>gpu</code> property under <code>resources</code>.</p>
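<p>As a minimal sketch of the syntax described above, a <code>dstack</code> task configuration might look like the following (the task name, image commands, and TPU version shown are illustrative, not taken from the original post):</p>
<pre><code class="language-yaml"># Hypothetical dstack task configuration illustrating the TPU syntax:
# the TPU version and the number of cores are joined by a dash in the
# gpu property under resources.
type: task
name: tpu-finetune        # illustrative name

commands:
  - python train.py       # illustrative command

resources:
  gpu: v5litepod-8        # TPU version "v5litepod", 8 cores
</code></pre>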
<p>Read below to find out how to use TPUs with <code>dstack</code> for fine-tuning and deploying LLMs, leveraging open-source tools like Hugging Face’s