# Crusoe

Crusoe offers two ways to use clusters with fast interconnect:

* [Kubernetes](#kubernetes) – Lets you interact with clusters through the Kubernetes API and includes support for the NVIDIA GPU Operator and related tools.
* [Virtual Machines (VMs)](#vms) – Gives you direct access to clusters in the form of virtual machines.

Both options use the same underlying networking infrastructure. This example walks you through setting up Crusoe clusters for use with `dstack`.

## Kubernetes

!!! info "Prerequisites"
    1. Go to `Networking` → `Firewall Rules`, click `Create Firewall Rule`, and allow ingress traffic on port `30022`. This port is used by the `dstack` server to access the jump host.
    2. Go to `Orchestration` and click `Create Cluster`. Make sure to enable the `NVIDIA GPU Operator` add-on.
    3. Go to the cluster and click `Create Node Pool`. Select the appropriate instance type. If you intend to auto-scale the cluster, set `Desired Number of Nodes` to at least `1`, since `dstack` doesn't currently support clusters that scale down to `0` nodes.
    4. Wait until at least one node is running.

### Configure the backend

Follow the standard instructions for setting up a [Kubernetes](https://dstack.ai/docs/concepts/backends/#kubernetes) backend:

<div editor-title="~/.dstack/server/config.yml">

```yaml
projects:
  - name: main
    backends:
      - type: kubernetes
        kubeconfig:
          filename: <kubeconfig path>
        proxy_jump:
          port: 30022
```

</div>

### Create a fleet

Once the Kubernetes cluster and the `dstack` server are running, you can create a fleet:

<div editor-title="crusoe-fleet.dstack.yml">

```yaml
type: fleet
name: crusoe-fleet

placement: cluster
nodes: 0..

backends: [kubernetes]

resources:
  # Specify requirements to filter nodes
  gpu: 1..8
```

</div>

Pass the fleet configuration to `dstack apply`:

<div class="termy">

```shell
$ dstack apply -f crusoe-fleet.dstack.yml
```

</div>

Once the fleet is created, you can run [dev environments](https://dstack.ai/docs/concepts/dev-environments), [tasks](https://dstack.ai/docs/concepts/tasks), and [services](https://dstack.ai/docs/concepts/services).
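
For example, here's a minimal distributed task you could run on this fleet — a sketch with a hypothetical name and resource spec; `DSTACK_NODE_RANK` is set by `dstack` on each node:

<div editor-title="hello-task.dstack.yml">

```yaml
type: task
name: hello-task

nodes: 2

# Print each node's rank; dstack sets DSTACK_NODE_RANK per node
commands:
  - echo "Hello from node $DSTACK_NODE_RANK"

resources:
  gpu: 1
```

</div>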

## VMs

Another way to work with Crusoe clusters is through VMs. While `dstack` typically supports VM-based compute providers via [dedicated backends](https://dstack.ai/docs/concepts/backends#vm-based) that automate provisioning, Crusoe does not yet have [such a backend](https://github.com/dstackai/dstack/issues/3378). As a result, to use a VM-based Crusoe cluster with `dstack`, use [SSH fleets](https://dstack.ai/docs/concepts/fleets).

!!! info "Prerequisites"
    1. Go to `Compute` → `Instances` and click `Create Instance`. Select an instance type and VM image that [support interconnect](https://docs.crusoecloud.com/networking/infiniband/managing-infiniband-networks/index.html), and create as many instances as you need.

### Create a fleet

Follow the standard instructions for setting up an [SSH fleet](https://dstack.ai/docs/concepts/fleets/#ssh-fleets):

<div editor-title="crusoe-fleet.dstack.yml">

```yaml
type: fleet
name: crusoe-fleet

placement: cluster

# SSH credentials for the on-prem servers
ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 3.255.177.51
    - 3.255.177.52
```

</div>

Pass the fleet configuration to `dstack apply`:

<div class="termy">

```shell
$ dstack apply -f crusoe-fleet.dstack.yml
```

</div>

Once the fleet is created, you can run [dev environments](https://dstack.ai/docs/concepts/dev-environments), [tasks](https://dstack.ai/docs/concepts/tasks), and [services](https://dstack.ai/docs/concepts/services).

## Run NCCL tests

Use a [distributed task](https://dstack.ai/docs/concepts/tasks#distributed-task) that runs NCCL tests to validate cluster network bandwidth.

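The tasks below invoke `all_reduce_perf` with `-b 8 -e 2G -f 2`, which sweeps message sizes from 8 B to 2 GiB, doubling at each step. A quick sketch of the resulting sweep, assuming the standard nccl-tests semantics for these flags:

```python
# Enumerate the message sizes all_reduce_perf tests with -b 8 -e 2G -f 2:
# start at 8 bytes and multiply by the factor 2 until exceeding 2 GiB.
def sweep_sizes(begin: int = 8, end: int = 2 * 1024**3, factor: int = 2) -> list[int]:
    sizes = []
    size = begin
    while size <= end:
        sizes.append(size)
        size *= factor
    return sizes

sizes = sweep_sizes()
print(len(sizes), sizes[0], sizes[-1])  # → 29 8 2147483648
```

This matches the 29 rows you'll see in the benchmark output further below.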
=== "Kubernetes"

    If you’re running on Crusoe’s Kubernetes, make sure to install HPC-X and provide an up-to-date topology file.

    <div editor-title="crusoe-nccl-tests.dstack.yml">

    ```yaml
    type: task
    name: nccl-tests

    nodes: 2
    startup_order: workers-first
    stop_criteria: master-done

    commands:
      # Install NCCL topology files
      - curl -sSL https://gist.github.com/un-def/48df8eea222fa9547ad4441986eb15af/archive/df51d56285c5396a0e82bb42f4f970e7bb0a9b65.tar.gz -o nccl_topo.tar.gz
      - mkdir -p /etc/crusoe/nccl_topo
      - tar -C /etc/crusoe/nccl_topo -xf nccl_topo.tar.gz --strip-components=1
      # Install and initialize HPC-X
      - curl -sSL https://content.mellanox.com/hpc/hpc-x/v2.21.3/hpcx-v2.21.3-gcc-doca_ofed-ubuntu22.04-cuda12-x86_64.tbz -o hpcx.tar.bz
      - mkdir -p /opt/hpcx
      - tar -C /opt/hpcx -xf hpcx.tar.bz --strip-components=1 --checkpoint=10000
      - . /opt/hpcx/hpcx-init.sh
      - hpcx_load
      # Run NCCL tests
      - |
        if [ $DSTACK_NODE_RANK -eq 0 ]; then
          mpirun \
            --allow-run-as-root \
            --hostfile $DSTACK_MPI_HOSTFILE \
            -n $DSTACK_GPUS_NUM \
            -N $DSTACK_GPUS_PER_NODE \
            --bind-to none \
            -mca btl tcp,self \
            -mca coll_hcoll_enable 0 \
            -x PATH \
            -x LD_LIBRARY_PATH \
            -x CUDA_DEVICE_ORDER=PCI_BUS_ID \
            -x NCCL_SOCKET_NTHREADS=4 \
            -x NCCL_NSOCKS_PERTHREAD=8 \
            -x NCCL_TOPO_FILE=/etc/crusoe/nccl_topo/a100-80gb-sxm-ib-cloud-hypervisor.xml \
            -x NCCL_IB_MERGE_VFS=0 \
            -x NCCL_IB_AR_THRESHOLD=0 \
            -x NCCL_IB_PCI_RELAXED_ORDERING=1 \
            -x NCCL_IB_SPLIT_DATA_ON_QPS=0 \
            -x NCCL_IB_QPS_PER_CONNECTION=2 \
            -x NCCL_IB_HCA=mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_7:1,mlx5_8:1 \
            -x UCX_NET_DEVICES=mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_7:1,mlx5_8:1 \
            /opt/nccl-tests/build/all_reduce_perf -b 8 -e 2G -f 2 -t 1 -g 1 -c 1 -n 100
        else
          sleep infinity
        fi

    # Required for IB
    privileged: true

    resources:
      gpu: A100:8
      shm_size: 16GB
    ```

    </div>

    > The task above downloads an A100 topology file from a Gist. The most reliable way to obtain the latest topology is to copy it from a Crusoe-provisioned VM (see [VMs](#vms)).

    ??? info "Privileged"
        When running on Kubernetes, set `privileged` to `true` to ensure access to InfiniBand.

=== "SSH fleets"

    With Crusoe VMs, HPC-X and up-to-date topology files are already available on the hosts. When using SSH fleets, simply mount them via [instance volumes](https://dstack.ai/docs/concepts/volumes#instance-volumes).

    <div editor-title="crusoe-nccl-tests.dstack.yml">

    ```yaml
    type: task
    name: nccl-tests

    nodes: 2
    startup_order: workers-first
    stop_criteria: master-done

    volumes:
      - /opt/hpcx:/opt/hpcx
      - /etc/crusoe/nccl_topo:/etc/crusoe/nccl_topo

    commands:
      - . /opt/hpcx/hpcx-init.sh
      - hpcx_load
      # Run NCCL tests
      - |
        if [ $DSTACK_NODE_RANK -eq 0 ]; then
          mpirun \
            --allow-run-as-root \
            --hostfile $DSTACK_MPI_HOSTFILE \
            -n $DSTACK_GPUS_NUM \
            -N $DSTACK_GPUS_PER_NODE \
            --bind-to none \
            -mca btl tcp,self \
            -mca coll_hcoll_enable 0 \
            -x PATH \
            -x LD_LIBRARY_PATH \
            -x CUDA_DEVICE_ORDER=PCI_BUS_ID \
            -x NCCL_SOCKET_NTHREADS=4 \
            -x NCCL_NSOCKS_PERTHREAD=8 \
            -x NCCL_TOPO_FILE=/etc/crusoe/nccl_topo/a100-80gb-sxm-ib-cloud-hypervisor.xml \
            -x NCCL_IB_MERGE_VFS=0 \
            -x NCCL_IB_AR_THRESHOLD=0 \
            -x NCCL_IB_PCI_RELAXED_ORDERING=1 \
            -x NCCL_IB_SPLIT_DATA_ON_QPS=0 \
            -x NCCL_IB_QPS_PER_CONNECTION=2 \
            -x NCCL_IB_HCA=mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_7:1,mlx5_8:1 \
            -x UCX_NET_DEVICES=mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_7:1,mlx5_8:1 \
            /opt/nccl-tests/build/all_reduce_perf -b 8 -e 2G -f 2 -t 1 -g 1 -c 1 -n 100
        else
          sleep infinity
        fi

    resources:
      gpu: A100:8
      shm_size: 16GB
    ```

    </div>

Pass the configuration to `dstack apply`:

<div class="termy">

```shell
$ dstack apply -f crusoe-nccl-tests.dstack.yml

Provisioning...
---> 100%

nccl-tests provisioning completed (running)

#                                                              out-of-place                       in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
           8             2     float     sum      -1    27.70    0.00    0.00      0    29.82    0.00    0.00      0
          16             4     float     sum      -1    28.78    0.00    0.00      0    28.99    0.00    0.00      0
          32             8     float     sum      -1    28.49    0.00    0.00      0    28.16    0.00    0.00      0
          64            16     float     sum      -1    28.41    0.00    0.00      0    28.69    0.00    0.00      0
         128            32     float     sum      -1    28.94    0.00    0.01      0    28.58    0.00    0.01      0
         256            64     float     sum      -1    29.46    0.01    0.02      0    29.45    0.01    0.02      0
         512           128     float     sum      -1    30.23    0.02    0.03      0    29.85    0.02    0.03      0
        1024           256     float     sum      -1    30.79    0.03    0.06      0    34.03    0.03    0.06      0
        2048           512     float     sum      -1    37.90    0.05    0.10      0    33.22    0.06    0.12      0
        4096          1024     float     sum      -1    35.91    0.11    0.21      0    35.30    0.12    0.22      0
        8192          2048     float     sum      -1    36.84    0.22    0.42      0    38.30    0.21    0.40      0
       16384          4096     float     sum      -1    47.08    0.35    0.65      0    37.26    0.44    0.82      0
       32768          8192     float     sum      -1    45.20    0.72    1.36      0    48.70    0.67    1.26      0
       65536         16384     float     sum      -1    49.43    1.33    2.49      0    50.97    1.29    2.41      0
      131072         32768     float     sum      -1    51.08    2.57    4.81      0    50.17    2.61    4.90      0
      262144         65536     float     sum      -1   192.78    1.36    2.55      0   100.00    2.62    4.92      0
      524288        131072     float     sum      -1    68.02    7.71   14.45      0    69.40    7.55   14.16      0
     1048576        262144     float     sum      -1    81.71   12.83   24.06      0    88.58   11.84   22.20      0
     2097152        524288     float     sum      -1   113.03   18.55   34.79      0   102.21   20.52   38.47      0
     4194304       1048576     float     sum      -1   123.50   33.96   63.68      0   131.71   31.84   59.71      0
     8388608       2097152     float     sum      -1   189.42   44.29   83.04      0   183.01   45.84   85.95      0
    16777216       4194304     float     sum      -1   274.05   61.22  114.79      0   265.91   63.09  118.30      0
    33554432       8388608     float     sum      -1   490.77   68.37  128.20      0   490.53   68.40  128.26      0
    67108864      16777216     float     sum      -1   854.62   78.52  147.23      0   853.49   78.63  147.43      0
   134217728      33554432     float     sum      -1  1483.43   90.48  169.65      0  1479.22   90.74  170.13      0
   268435456      67108864     float     sum      -1  2700.36   99.41  186.39      0  2700.49   99.40  186.38      0
   536870912     134217728     float     sum      -1  5300.49  101.29  189.91      0  5314.91  101.01  189.40      0
  1073741824     268435456     float     sum      -1  10472.2  102.53  192.25      0  10485.6  102.40  192.00      0
  2147483648     536870912     float     sum      -1  20749.1  103.50  194.06      0  20745.7  103.51  194.09      0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 53.7387
```

</div>
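
In this output, `busbw` is derived from `algbw` using the standard nccl-tests correction for all_reduce, `busbw = algbw × 2(n−1)/n`, where `n` is the total number of ranks. A quick sanity check against the largest message size:

```python
def allreduce_busbw(algbw: float, n_ranks: int) -> float:
    # nccl-tests bus bandwidth for all_reduce: algbw * 2*(n-1)/n
    return algbw * 2 * (n_ranks - 1) / n_ranks

# 2 nodes x 8 GPUs = 16 ranks; last row of the table above
print(round(allreduce_busbw(103.50, 16), 2))  # → 194.06
```

`busbw` is the number to compare against the link speed, since it's independent of the rank count.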

## What's next

1. Learn about [dev environments](https://dstack.ai/docs/concepts/dev-environments), [tasks](https://dstack.ai/docs/concepts/tasks), and [services](https://dstack.ai/docs/concepts/services)
2. Read the [Kubernetes](https://dstack.ai/docs/guides/kubernetes) and [Clusters](https://dstack.ai/docs/guides/clusters) guides
3. Check Crusoe's docs on [networking](https://docs.crusoecloud.com/networking/infiniband/) and [Kubernetes](https://docs.crusoecloud.com/orchestration/cmk/index.html)