Commit 236d193

1. Added Nebius example under Clusters
2. Minor updates (for consistency) to `Lambda` and `Crusoe`
1 parent 0155a28 commit 236d193

6 files changed

+311
-41
lines changed

docs/examples.md

Lines changed: 10 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -122,6 +122,16 @@ hide:
122122
Set up Crusoe clusters with optimized networking
123123
</p>
124124
</a>
125+
<a href="/examples/clusters/nebius"
126+
class="feature-cell sky">
127+
<h3>
128+
Nebius
129+
</h3>
130+
131+
<p>
132+
Set up Nebius clusters with optimized networking
133+
</p>
134+
</a>
125135
<a href="/examples/clusters/nccl-rccl-tests"
126136
class="feature-cell sky">
127137
<h3>

docs/examples/clusters/nebius/index.md

Whitespace-only changes.

examples/clusters/crusoe/README.md

Lines changed: 26 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -1,24 +1,25 @@
11
---
22
title: Crusoe
3-
description: Setting up Crusoe clusters using Managed Kubernetes or VMs with InfiniBand support
3+
description: Using Crusoe clusters with InfiniBand support via Kubernetes or VMs
44
---
55

66
# Crusoe
77

8-
Crusoe offers two ways to use clusters with fast interconnect:
8+
`dstack` allows using Crusoe clusters with fast interconnect via two ways:
99

10-
* [Crusoe Managed Kubernetes](#kubernetes) – Lets you interact with clusters through the Kubernetes API and includes support for NVIDIA and AMD GPU operators and related tools.
11-
* [Virtual Machines (VMs)](#vms) – Gives you direct access to clusters in the form of virtual machines with NVIDIA and AMD GPUs.
10+
* [Kubernetes](#kubernetes) – If you create a Kubernetes cluster on Crusoe and configure a `kubernetes` backend and create a backend fleet in `dstack`, `dstack` lets you fully use this cluster through `dstack`.
11+
* [VMs](#vms) – If you create a VM cluster on Crusoe and create an SSH fleet in `dstack`, `dstack` lets you fully use this cluster through `dstack`.
12+
13+
## Kubernetes
1214

13-
Both options use the same underlying networking infrastructure. This example walks you through how to set up Crusoe clusters to use with `dstack`.
15+
### Create a cluster
1416

15-
## Crusoe Managed Kubernetes { #kubernetes }
17+
1. Go `Networking``Firewall Rules`, click `Create Firewall Rule`, and allow ingress traffic on port `30022`. This port will be used by the `dstack` server to access the jump host.
18+
2. Go to `Orchestration` and click `Create Cluster`. Make sure to enable the `NVIDIA GPU Operator` add-on.
19+
3. Go the the cluster, and click `Create Node Pool`. Select the right type of the instance, and `Desired Number of Nodes`.
20+
4. Wait until nodes are provisioned.
1621

17-
!!! info "Prerequsisites"
18-
1. Go `Networking``Firewall Rules`, click `Create Firewall Rule`, and allow ingress traffic on port `30022`. This port will be used by the `dstack` server to access the jump host.
19-
2. Go to `Orchestration` and click `Create Cluster`. Make sure to enable the `NVIDIA GPU Operator` add-on.
20-
3. Go the the cluster, and click `Create Node Pool`. Select the right type of the instance. If you intend to auto-scale the cluster, make sure to set `Desired Number of Nodes` at least to `1`, since `dstack` doesn't currently support clusters that scale down to `0` nodes.
21-
4. Wait until at least one node is running.
22+
> Even if you enable `autoscaling`, `dstack` can use only the nodes that are already provisioned.
2223
2324
### Configure the backend
2425

@@ -56,7 +57,7 @@ backends: [kubernetes]
5657
5758
resources:
5859
# Specify requirements to filter nodes
59-
gpu: 1..8
60+
gpu: 8
6061
```
6162

6263
</div>
@@ -75,12 +76,13 @@ Once the fleet is created, you can run [dev environments](https://dstack.ai/docs
7576

7677
## VMs
7778

78-
Another way to work with Crusoe clusters is through VMs. While `dstack` typically supports VM-based compute providers via [dedicated backends](https://dstack.ai/docs/concepts/backends#vm-based) that automate provisioning, Crusoe does not yet have [such a backend](https://github.com/dstackai/dstack/issues/3378). As a result, to use a VM-based Crusoe cluster with `dstack`, you should use [SSH fleets](https://dstack.ai/docs/concepts/fleets).
79+
Another way to work with Crusoe clusters is through VMs. While `dstack` typically supports VM-based compute providers via [dedicated backends](https://dstack.ai/docs/concepts/backends#vm-based) that automate provisioning, Crusoe does not yet have [such a backend](https://github.com/dstackai/dstack/issues/3378). As a result, to use a VM-based Crusoe cluster with `dstack`, you should use [SSH fleets](https://dstack.ai/docs/concepts/fleets#ssh-fleets).
7980

80-
!!! info "Prerequsisites"
81-
1. Go to `Compute`, then `Instances`, and click `Create Instance`. Make sure to select the right instance type and VM image (that [support interconnect](https://docs.crusoecloud.com/networking/infiniband/managing-infiniband-networks/index.html)). Make sure to create as many instances as needed.
81+
### Create instances
8282

83-
### Create a fleet
83+
1. Go to `Compute`, then `Instances`, and click `Create Instance`. Make sure to select the right instance type and VM image (that [support interconnect](https://docs.crusoecloud.com/networking/infiniband/managing-infiniband-networks/index.html)). Make sure to create as many instances as needed.
84+
85+
### Create a `dstack` fleet
8486

8587
Follow the standard instructions for setting up an [SSH fleet](https://dstack.ai/docs/concepts/fleets/#ssh-fleets):
8688

@@ -104,7 +106,7 @@ ssh_config:
104106
</div>
105107

106108
Pass the fleet configuration to `dstack apply`:
107-
109+
p
108110
<div class="termy">
109111

110112
```shell
@@ -115,9 +117,9 @@ $ dstack apply -f crusoe-fleet.dstack.yml
115117

116118
Once the fleet is created, you can run [dev environments](https://dstack.ai/docs/concepts/dev-environments), [tasks](https://dstack.ai/docs/concepts/tasks), and [services](https://dstack.ai/docs/concepts/services).
117119

118-
## Run NCCL tests
120+
## NCCL tests
119121

120-
Use a [distributed task](https://dstack.ai/docs/concepts/tasks#distributed-task) that runs NCCL tests to validate cluster network bandwidth.
122+
Use a [distributed task](https://dstack.ai/docs/concepts/tasks#distributed-tasks) that runs NCCL tests to validate cluster network bandwidth.
121123

122124
=== "Crusoe Managed Kubernetes"
123125

@@ -253,9 +255,9 @@ Provisioning...
253255
254256
nccl-tests provisioning completed (running)
255257
256-
# out-of-place in-place
257-
# size count type redop root time algbw busbw #wrong time algbw busbw #wrong
258-
# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
258+
out-of-place in-place
259+
size count type redop root time algbw busbw #wrong time algbw busbw #wrong
260+
(B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
259261
8 2 float sum -1 27.70 0.00 0.00 0 29.82 0.00 0.00 0
260262
16 4 float sum -1 28.78 0.00 0.00 0 28.99 0.00 0.00 0
261263
32 8 float sum -1 28.49 0.00 0.00 0 28.16 0.00 0.00 0
@@ -285,8 +287,8 @@ nccl-tests provisioning completed (running)
285287
536870912 134217728 float sum -1 5300.49 101.29 189.91 0 5314.91 101.01 189.40 0
286288
1073741824 268435456 float sum -1 10472.2 102.53 192.25 0 10485.6 102.40 192.00 0
287289
2147483648 536870912 float sum -1 20749.1 103.50 194.06 0 20745.7 103.51 194.09 0
288-
# Out of bounds values : 0 OK
289-
# Avg bus bandwidth : 53.7387
290+
Out of bounds values : 0 OK
291+
Avg bus bandwidth : 53.7387
290292
```
291293

292294
</div>

examples/clusters/lambda/README.md

Lines changed: 17 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -5,18 +5,17 @@ description: Setting up Lambda clusters using Kubernetes or 1-Click Clusters wit
55

66
# Lambda
77

8-
[Lambda](https://lambda.ai/) offers two ways to use clusters with a fast interconnect:
8+
`dstack` allows using Lambda clusters with fast interconnect via two ways:
99

10-
* [Kubernetes](#kubernetes) – Lets you interact with clusters through the Kubernetes API and includes support for NVIDIA GPU operators and related tools.
11-
* [1-Click Clusters (1CC)](#1-click-clusters) – Gives you direct access to clusters in the form of bare-metal nodes.
12-
13-
Both options use the same underlying networking infrastructure. This example walks you through how to set up Lambda clusters to use with `dstack`.
10+
* [Kubernetes](#kubernetes) – If you create a Kubernetes cluster on Lambda and configure a `kubernetes` backend and create a backend fleet in `dstack`, `dstack` lets you fully use this cluster through `dstack`.
11+
* [VMs](#vms) – If you create a 1CC cluster on Lambda and create an SSH fleet in `dstack`, `dstack` lets you fully use this cluster through `dstack`.
1412

1513
## Kubernetes
1614

17-
!!! info "Prerequsisites"
18-
1. Follow the instructions in [Lambda's guide](https://docs.lambda.ai/public-cloud/1-click-clusters/managed-kubernetes/#accessing-mk8s) on accessing MK8s.
19-
2. Go to `Firewall``Edit rules`, click `Add rule`, and allow ingress traffic on port `30022`. This port will be used by the `dstack` server to access the jump host.
15+
### Prerequsisites
16+
17+
1. Follow the instructions in [Lambda's guide](https://docs.lambda.ai/public-cloud/1-click-clusters/managed-kubernetes/#accessing-mk8s) on accessing MK8s.
18+
2. Go to `Firewall``Edit rules`, click `Add rule`, and allow ingress traffic on port `30022`. This port will be used by the `dstack` server to access the jump host.
2019

2120
### Configure the backend
2221

@@ -75,8 +74,9 @@ Once the fleet is created, you can run [dev environments](https://dstack.ai/docs
7574

7675
Another way to work with Lambda clusters is through [1CC](https://lambda.ai/1-click-clusters). While `dstack` supports automated cluster provisioning via [VM-based backends](https://dstack.ai/docs/concepts/backends#vm-based), there is currently no programmatic way to provision Lambda 1CCs. As a result, to use a 1CC cluster with `dstack`, you must use [SSH fleets](https://dstack.ai/docs/concepts/fleets).
7776

78-
!!! info "Prerequsisites"
79-
1. Follow the instructions in [Lambda's guide](https://docs.lambda.ai/public-cloud/1-click-clusters/) on working with 1-Click Clusters
77+
### Prerequsisites
78+
79+
1. Follow the instructions in [Lambda's guide](https://docs.lambda.ai/public-cloud/1-click-clusters/) on working with 1-Click Clusters
8080

8181
### Create a fleet
8282

@@ -171,11 +171,11 @@ $ dstack apply -f lambda-nccl-tests.dstack.yml
171171
Provisioning...
172172
---> 100%
173173
174-
# nccl-tests version 2.17.6 nccl-headers=22602 nccl-library=22602
175-
# Collective test starting: all_reduce_perf
176-
#
177-
# size count type redop root time algbw busbw #wrong time algbw busbw #wrong
178-
# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
174+
nccl-tests version 2.17.6 nccl-headers=22602 nccl-library=22602
175+
Collective test starting: all_reduce_perf
176+
177+
size count type redop root time algbw busbw #wrong time algbw busbw #wrong
178+
(B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
179179
8 2 float sum -1 36.50 0.00 0.00 0 36.16 0.00 0.00 0
180180
16 4 float sum -1 35.55 0.00 0.00 0 35.49 0.00 0.00 0
181181
32 8 float sum -1 35.49 0.00 0.00 0 36.28 0.00 0.00 0
@@ -205,8 +205,8 @@ Provisioning...
205205
536870912 134217728 float sum -1 1625.63 330.25 619.23 0 1687.31 318.18 596.59 0
206206
1073741824 268435456 float sum -1 2972.25 361.26 677.35 0 2971.33 361.37 677.56 0
207207
2147483648 536870912 float sum -1 5784.75 371.23 696.06 0 5728.40 374.88 702.91 0
208-
# Out of bounds values : 0 OK
209-
# Avg bus bandwidth : 137.179
208+
Out of bounds values : 0 OK
209+
Avg bus bandwidth : 137.179
210210
```
211211

212212
</div>
