**`docs/docs/reference/dstack.yml/service.md`** (11 additions, 1 deletion)
````diff
@@ -63,7 +63,7 @@ The `service` configuration type allows running [services](../../concepts/servic
 1. Doesn't work if your `chat_template` uses `bos_token`. As a workaround, replace `bos_token` inside `chat_template` with the token content itself.
 2. Doesn't work if `eos_token` is defined in the model repository as a dictionary. As a workaround, set `eos_token` manually, as shown in the example above (see Chat template).
 
-If you encounter any other issues, please make sure to file a
+If you encounter any other issues, please make sure to file a
````
**`examples/clusters/crusoe/README.md`** (25 additions, 23 deletions)
````diff
@@ -1,24 +1,25 @@
 ---
 title: Crusoe
-description: Setting up Crusoe clusters using Managed Kubernetes or VMs with InfiniBand support
+description: Using Crusoe clusters with InfiniBand support via Kubernetes or VMs
 ---
 
 # Crusoe
 
-Crusoe offers two ways to use clusters with fast interconnect:
+`dstack` allows using Crusoe clusters with fast interconnect in two ways:
 
-* [Crusoe Managed Kubernetes](#kubernetes) – Lets you interact with clusters through the Kubernetes API and includes support for NVIDIA and AMD GPU operators and related tools.
-* [Virtual Machines (VMs)](#vms) – Gives you direct access to clusters in the form of virtual machines with NVIDIA and AMD GPUs.
+* [Kubernetes](#kubernetes) – Create a Kubernetes cluster on Crusoe, configure a `kubernetes` backend, and create a backend fleet in `dstack` to use the cluster through `dstack`.
+* [VMs](#vms) – Create a VM cluster on Crusoe and create an SSH fleet in `dstack` to use the cluster through `dstack`.
+
+## Kubernetes
 
-Both options use the same underlying networking infrastructure. This example walks you through how to set up Crusoe clusters to use with `dstack`.
+### Create a cluster
 
-## Crusoe Managed Kubernetes { #kubernetes }
+1. Go to `Networking` → `Firewall Rules`, click `Create Firewall Rule`, and allow ingress traffic on port `30022`. This port will be used by the `dstack` server to access the jump host.
+2. Go to `Orchestration` and click `Create Cluster`. Make sure to enable the `NVIDIA GPU Operator` add-on.
+3. Go to the cluster, and click `Create Node Pool`. Select the instance type and the `Desired Number of Nodes`.
+4. Wait until the nodes are provisioned.
 
-!!! info "Prerequsisites"
-    1. Go `Networking` → `Firewall Rules`, click `Create Firewall Rule`, and allow ingress traffic on port `30022`. This port will be used by the `dstack` server to access the jump host.
-    2. Go to `Orchestration` and click `Create Cluster`. Make sure to enable the `NVIDIA GPU Operator` add-on.
-    3. Go the the cluster, and click `Create Node Pool`. Select the right type of the instance. If you intend to auto-scale the cluster, make sure to set `Desired Number of Nodes` at least to `1`, since `dstack` doesn't currently support clusters that scale down to `0` nodes.
-    4. Wait until at least one node is running.
+> Even if you enable `autoscaling`, `dstack` can use only the nodes that are already provisioned.
 
 ### Configure the backend
````
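For reference, the `kubernetes` backend entry in the `dstack` server config (typically `~/.dstack/server/config.yml`) might look roughly like the sketch below. This is an illustration only: the project name, kubeconfig path, and jump-host address are placeholders, and the exact `networking` fields should be verified against the `dstack` backend reference.

```yaml
projects:
- name: main
  backends:
  - type: kubernetes
    kubeconfig:
      # Placeholder: path to the kubeconfig of the Crusoe cluster
      filename: ~/.kube/crusoe-config
    networking:
      # Placeholder: public IP of a node reachable from the dstack server
      ssh_host: 203.0.113.10
      # Must match the firewall rule allowing ingress on this port
      ssh_port: 30022
```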
````diff
@@ -56,7 +57,7 @@ backends: [kubernetes]
 
 resources:
   # Specify requirements to filter nodes
-  gpu: 1..8
+  gpu: 8
 ```
 
 </div>
````
````diff
@@ -75,12 +76,13 @@ Once the fleet is created, you can run [dev environments](https://dstack.ai/docs
 
 ## VMs
 
-Another way to work with Crusoe clusters is through VMs. While `dstack` typically supports VM-based compute providers via [dedicated backends](https://dstack.ai/docs/concepts/backends#vm-based) that automate provisioning, Crusoe does not yet have [such a backend](https://github.com/dstackai/dstack/issues/3378). As a result, to use a VM-based Crusoe cluster with `dstack`, you should use [SSH fleets](https://dstack.ai/docs/concepts/fleets).
+Another way to work with Crusoe clusters is through VMs. While `dstack` typically supports VM-based compute providers via [dedicated backends](https://dstack.ai/docs/concepts/backends#vm-based) that automate provisioning, Crusoe does not yet have [such a backend](https://github.com/dstackai/dstack/issues/3378). As a result, to use a VM-based Crusoe cluster with `dstack`, you should use [SSH fleets](https://dstack.ai/docs/concepts/fleets#ssh-fleets).
 
-!!! info "Prerequsisites"
-    1. Go to `Compute`, then `Instances`, and click `Create Instance`. Make sure to select the right instance type and VM image (that [support interconnect](https://docs.crusoecloud.com/networking/infiniband/managing-infiniband-networks/index.html)). Make sure to create as many instances as needed.
+### Create instances
 
-### Create a fleet
+1. Go to `Compute`, then `Instances`, and click `Create Instance`. Make sure to select an instance type and a VM image that [support interconnect](https://docs.crusoecloud.com/networking/infiniband/managing-infiniband-networks/index.html), and create as many instances as needed.
+
+### Create a `dstack` fleet
````
Follow the standard instructions for setting up an [SSH fleet](https://dstack.ai/docs/concepts/fleets/#ssh-fleets):
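Following those instructions, the fleet configuration might look roughly like the sketch below; the fleet name, SSH user, identity file, and host IPs are placeholders to adapt to your instances.

```yaml
type: fleet
name: crusoe-fleet

# Ensures the instances are treated as an interconnected cluster
placement: cluster

ssh_config:
  # Placeholders: the SSH user, key, and host IPs of your VMs
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 203.0.113.11
    - 203.0.113.12
```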
Once the fleet is created, you can run [dev environments](https://dstack.ai/docs/concepts/dev-environments), [tasks](https://dstack.ai/docs/concepts/tasks), and [services](https://dstack.ai/docs/concepts/services).
````diff
 
-## Run NCCL tests
+## NCCL tests
 
-Use a [distributed task](https://dstack.ai/docs/concepts/tasks#distributed-task) that runs NCCL tests to validate cluster network bandwidth.
+Use a [distributed task](https://dstack.ai/docs/concepts/tasks#distributed-tasks) that runs NCCL tests to validate cluster network bandwidth.
````
=== "Crusoe Managed Kubernetes"

````diff
@@ -253,9 +255,9 @@ Provisioning...
 
 nccl-tests provisioning completed (running)
 
-#                                                              out-of-place                       in-place
-#       size         count      type   redop    root     time   algbw   busbw  #wrong     time   algbw   busbw  #wrong
````
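The distributed task producing output like the above might be sketched roughly as follows. This is an illustration only, not the exact example from the repo: the image, node count, command, and resources are assumptions (it presumes an image where the nccl-tests binaries are already built).

```yaml
type: task
name: nccl-tests

# Run across all nodes of the cluster fleet
nodes: 2

# Assumption: an image with nccl-tests pre-built under ./build
image: nvcr.io/nvidia/pytorch:24.04-py3
commands:
  # all-reduce benchmark from https://github.com/NVIDIA/nccl-tests
  - ./build/all_reduce_perf -b 8 -e 8G -f 2 -g 1

resources:
  gpu: 8
```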
**`examples/clusters/lambda/README.md`** (17 additions, 17 deletions)
````diff
@@ -5,18 +5,17 @@ description: Setting up Lambda clusters using Kubernetes or 1-Click Clusters wit
 
 # Lambda
 
-[Lambda](https://lambda.ai/) offers two ways to use clusters with a fast interconnect:
+`dstack` allows using Lambda clusters with fast interconnect in two ways:
 
-* [Kubernetes](#kubernetes) – Lets you interact with clusters through the Kubernetes API and includes support for NVIDIA GPU operators and related tools.
-* [1-Click Clusters (1CC)](#1-click-clusters) – Gives you direct access to clusters in the form of bare-metal nodes.
-
-Both options use the same underlying networking infrastructure. This example walks you through how to set up Lambda clusters to use with `dstack`.
+* [Kubernetes](#kubernetes) – Create a Kubernetes cluster on Lambda, configure a `kubernetes` backend, and create a backend fleet in `dstack` to use the cluster through `dstack`.
+* [VMs](#vms) – Create a 1CC cluster on Lambda and create an SSH fleet in `dstack` to use the cluster through `dstack`.
 
 ## Kubernetes
 
-!!! info "Prerequsisites"
-    1. Follow the instructions in [Lambda's guide](https://docs.lambda.ai/public-cloud/1-click-clusters/managed-kubernetes/#accessing-mk8s) on accessing MK8s.
-    2. Go to `Firewall` → `Edit rules`, click `Add rule`, and allow ingress traffic on port `30022`. This port will be used by the `dstack` server to access the jump host.
+### Prerequisites
+
+1. Follow the instructions in [Lambda's guide](https://docs.lambda.ai/public-cloud/1-click-clusters/managed-kubernetes/#accessing-mk8s) on accessing MK8s.
+2. Go to `Firewall` → `Edit rules`, click `Add rule`, and allow ingress traffic on port `30022`. This port will be used by the `dstack` server to access the jump host.
 
 ### Configure the backend
````
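As with any `kubernetes` backend, the server config entry might look roughly like the sketch below. The kubeconfig path and jump-host address are placeholders, and the exact `networking` fields should be checked against the `dstack` backend reference.

```yaml
projects:
- name: main
  backends:
  - type: kubernetes
    kubeconfig:
      # Placeholder: path to the kubeconfig of the Lambda MK8s cluster
      filename: ~/.kube/lambda-config
    networking:
      # Placeholder: public IP of a node reachable from the dstack server
      ssh_host: 203.0.113.20
      # Must match the firewall rule allowing ingress on this port
      ssh_port: 30022
```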
````diff
@@ -75,8 +74,9 @@ Once the fleet is created, you can run [dev environments](https://dstack.ai/docs
 
 Another way to work with Lambda clusters is through [1CC](https://lambda.ai/1-click-clusters). While `dstack` supports automated cluster provisioning via [VM-based backends](https://dstack.ai/docs/concepts/backends#vm-based), there is currently no programmatic way to provision Lambda 1CCs. As a result, to use a 1CC cluster with `dstack`, you must use [SSH fleets](https://dstack.ai/docs/concepts/fleets).
 
-!!! info "Prerequsisites"
-    1. Follow the instructions in [Lambda's guide](https://docs.lambda.ai/public-cloud/1-click-clusters/) on working with 1-Click Clusters
+### Prerequisites
+
+1. Follow the instructions in [Lambda's guide](https://docs.lambda.ai/public-cloud/1-click-clusters/) on working with 1-Click Clusters.
````