---
title: Orchestrating GPUs in data centers and private clouds
date: 2025-02-18
description: "TBA"
slug: data-centers-and-private-clouds
image: https://github.com/dstackai/static-assets/blob/main/static-assets/images/data-centers-and-private-clouds.png?raw=true
categories:
  - Fleets
  - Data centers
  - Private clouds
---

# Orchestrating GPUs in data centers and private clouds

Recent breakthroughs in open-source AI have made AI infrastructure accessible beyond public clouds, driving demand for
running AI workloads in on-premises data centers and private clouds.
This shift offers organizations both high-performance clusters and greater flexibility and control.

However, Kubernetes, while a popular choice for traditional deployments, is often too complex and low-level to address
the needs of AI teams.

Originally, `dstack` focused on public clouds. With the new release, `dstack`
extends support to data centers and private clouds, offering a simpler, AI-native solution that replaces Kubernetes and
Slurm.

<img src="https://github.com/dstackai/static-assets/blob/main/static-assets/images/data-centers-and-private-clouds.png?raw=true" width="630"/>

<!-- more -->

Private clouds offer the scalability and performance needed for large GPU clusters, while on-premises data centers
provide stronger security and privacy controls.

In both cases, the focus isn’t just on seamless orchestration but also on maximizing infrastructure efficiency. This has
long been a strength of Kubernetes, which enables concurrent workload execution across provisioned nodes to minimize
resource waste.

## GPU blocks

The newest version of `dstack` introduces GPU blocks, a feature that brings this level of efficiency to `dstack`. It
enables optimal hardware utilization by allowing concurrent workloads to run on the same host, each using a slice of
that host's available resources.

> For example, imagine you’ve reserved a cluster from
> [Hot Aisle :material-arrow-top-right-thin:{ .external }](https://hotaisle.xyz/){:target="_blank"}
> with multiple bare-metal nodes, each equipped with 8x MI300X GPUs.

With `dstack`, you can define your fleet configuration like this:

<div editor-title="my-hotaisle-fleet.dstack.yml">

```yaml
type: fleet
name: my-hotaisle-fleet

ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/hotaisle_id_rsa
  hosts:
    - hostname: ssh.hotaisle.cloud
      port: 22013
      blocks: auto
    - hostname: ssh.hotaisle.cloud
      port: 22014
      blocks: auto

placement: cluster
```

</div>

By setting `blocks` to `auto`, you automatically slice each host into as many blocks as it has GPUs, in this case 8.
When you run `dstack apply`, each host appears as an available fleet instance showing `0/8` next to `busy`, indicating
that none of its 8 blocks are occupied yet.

<div class="termy">

```shell
$ dstack apply -f my-hotaisle-fleet.dstack.yml

Provisioning...
---> 100%

 FLEET              INSTANCE  RESOURCES         STATUS    CREATED
 my-hotaisle-fleet  0         8xMI300X (192GB)  0/8 busy  3 mins ago
                    1         8xMI300X (192GB)  0/8 busy  3 mins ago
```

</div>

For example, you can run two workloads, each using 4 GPUs, and `dstack` will execute them concurrently on a single instance.
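
Here's a minimal sketch of what one such 4-GPU workload could look like as a `dstack` task. The name, image, and command
are hypothetical placeholders; the relevant part is the `resources` section requesting 4 of the host's 8 GPUs:

<div editor-title="train-half-node.dstack.yml">

```yaml
type: task
# Hypothetical name; pick your own
name: train-half-node

# Hypothetical image and command, for illustration only
image: rocm/pytorch
commands:
  - python train.py

resources:
  # Request 4 MI300X GPUs; with `blocks: auto`, this run
  # occupies 4 of the host's 8 single-GPU blocks
  gpu: MI300X:4
```

</div>

Submitting two such tasks lets them share a single instance, each claiming half of its blocks.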

As the fleet owner, you can also set the `blocks` parameter to a specific number. If you set it to `2`, `dstack` will
slice each host into 2 blocks of 4 GPUs each. This flexibility allows you to define the minimum block size, ensuring the
most efficient utilization of your resources.
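
Here's what that setting could look like, shown as a sketch of a single host entry from the Hot Aisle fleet above:

```yaml
ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/hotaisle_id_rsa
  hosts:
    - hostname: ssh.hotaisle.cloud
      port: 22013
      # Slice this 8-GPU host into 2 blocks of 4 GPUs each,
      # so workloads claim GPUs in multiples of 4
      blocks: 2
```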

!!! info "Fractional GPU"
    While we plan to eventually support fractions of a single GPU too, this is not the primary use case, as most modern
    AI teams require full GPUs for their workloads.

Regardless of whether you're using `dstack` with a data center or a private cloud, once a fleet is created,
you’re free to run [dev environments](../../docs/concepts/dev-environments.md),
[tasks](../../docs/concepts/tasks.md), and [services](../../docs/concepts/services.md) while maximizing the
cost-efficiency of GPU utilization through concurrent runs.

## Proxy jump

Private clouds typically provide access to GPU clusters via SSH through a login node. In these setups, only the login
node is internet-accessible, while the cluster nodes can be reached via SSH only from the login node. This prevents you
from creating an SSH fleet by directly listing the cluster nodes' hostnames.

The latest `dstack` release introduces the `proxy_jump` property in SSH fleet configurations, making it possible to
create fleets through a login node.

> For example, imagine you’ve reserved a 1-Click Cluster from
> [Lambda :material-arrow-top-right-thin:{ .external }](https://lambdalabs.com/){:target="_blank"} with multiple nodes, each equipped with 8x H100 GPUs.

With `dstack`, you can define your fleet configuration like this:

<div editor-title="my-lambda-fleet.dstack.yml">

```yaml
type: fleet
name: my-lambda-fleet

ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/lambda_node_id_rsa
  hosts:
    - us-east-2-1cc-node-1
    - us-east-2-1cc-node-2
    - us-east-2-1cc-node-3
    - us-east-2-1cc-node-4
  proxy_jump:
    hostname: 12.34.567.890
    user: ubuntu
    identity_file: ~/.ssh/lambda_head_id_rsa

placement: cluster
```

</div>

When you run `dstack apply`, `dstack` creates an SSH fleet and connects to the configured hosts through the login node
specified via `proxy_jump`. Fleet instances appear as normal instances, enabling you to run
[dev environments](../../docs/concepts/dev-environments.md),
[tasks](../../docs/concepts/tasks.md), and [services](../../docs/concepts/services.md)
just as you would without `proxy_jump`.

<div class="termy">

```shell
$ dstack apply -f my-lambda-fleet.dstack.yml

Provisioning...
---> 100%

 FLEET            INSTANCE  RESOURCES      STATUS  CREATED
 my-lambda-fleet  0         8xH100 (80GB)  idle    3 mins ago
                  1         8xH100 (80GB)  idle    3 mins ago
                  2         8xH100 (80GB)  idle    3 mins ago
                  3         8xH100 (80GB)  idle    3 mins ago
```

</div>

The `dstack` CLI automatically handles SSH tunneling and port forwarding when running workloads.
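
For example, a dev environment like the sketch below could be started on this fleet with `dstack apply`, and the CLI
would transparently tunnel its SSH connection through the login node. The name is a hypothetical placeholder:

<div editor-title="vscode-on-1cc.dstack.yml">

```yaml
type: dev-environment
# Hypothetical name; pick your own
name: vscode-on-1cc

# Open the environment in your desktop VS Code
ide: vscode

resources:
  # Claim a full 8x H100 node from the fleet
  gpu: H100:8
```

</div>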

## What's next

To sum it up, the latest release enables `dstack` to be used efficiently not only with public clouds but also with private
clouds and data centers. It natively supports NVIDIA, AMD, and Intel Gaudi, with more upcoming chips on the way.

What’s also important is that `dstack` comes with a control plane that not only simplifies orchestration but also provides
a console for monitoring and managing workloads across projects (also known as tenants).

As a container orchestrator, `dstack` remains a streamlined alternative to Kubernetes and Slurm for AI teams, focusing on
an AI-native experience, simplicity, and vendor-agnostic orchestration for both cloud and on-prem.

!!! info "Roadmap"
    We plan to further enhance `dstack`'s support for both cloud and on-premises setups. For more details on our roadmap,
    refer to our [GitHub :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/issues/2184){:target="_blank"}.

> Have questions? You're welcome to join
> our [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd){:target="_blank"} or talk
> directly to [our team :material-arrow-top-right-thin:{ .external }](https://calendly.com/dstackai/discovery-call){:target="_blank"}.
