docs/docs/guides/migration/slurm.md
+14 −3 (14 additions & 3 deletions)
@@ -1,9 +1,14 @@
+---
+title: Migrate from Slurm
+description: This guide compares Slurm and dstack, and shows how to orchestrate equivalent GPU-based workloads using dstack.
+---
+
 # Migrate from Slurm
 
 Both Slurm and `dstack` are open-source workload orchestration systems designed to manage compute resources and schedule jobs. This guide compares Slurm and `dstack`, maps features between the two systems, and shows their `dstack` equivalents.
 
 !!! tip "Slurm vs dstack"
-    Slurm is a battle-tested system with decades of production use in HPC environments. `dstack` is designed for modern ML/AI workloads with cloud-native provisioning and container-first architecture. Slurm is better suited for traditional HPC centers with static clusters; `dstack` is better suited for cloud-native ML teams working with cloud GPUs. Both systems can handle distributed training and batch workloads—the choice depends on your preferences.
+    Slurm is a battle-tested system with decades of production use in HPC environments. `dstack` is designed for modern ML/AI workloads with cloud-native provisioning and container-first architecture. Slurm is better suited for traditional HPC centers with static clusters; `dstack` is better suited for cloud-native ML teams working with cloud GPUs. Both systems can handle distributed training and batch workloads.
 
 || Slurm | dstack |
 |---|-------|--------|
@@ -12,7 +17,7 @@ Both Slurm and `dstack` are open-source workload orchestration systems designed
 |**Use cases**| Batch job scheduling and distributed training | Interactive development, distributed training, and production inference services |
 |**Personas**| HPC centers, academic institutions, research labs | ML engineering teams, AI startups, cloud-native organizations |
 
-While `dstack` is use-case agnostic and natively supports development and production-grade inference, this guide focuses only on training workloads.
+While `dstack` is designed to be use-case agnostic and supports both development and production-grade inference, this guide focuses specifically on training workloads.
 
 ## Architecture
 
@@ -424,7 +429,7 @@ Both systems support core scheduling features and efficient resource utilization
 
 ### Slurm
 
-Slurm may use a multi-factor priority system, and limit usage across accounts, QOS, users, and single runs.
+Slurm can use a multi-factor priority system and limit usage across accounts, users, and runs.
 
 #### QOS
 
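QOS-based limits are applied at submission time via the `--qos` flag. A minimal sketch of a batch script using one (the QOS name `gpu_low` and its limits are hypothetical, assumed to have been created by the cluster admin with `sacctmgr`):

```shell
#!/bin/bash
#SBATCH --job-name=train
#SBATCH --qos=gpu_low       # hypothetical QOS, e.g. with MaxTRESPerUser=gres/gpu=4
#SBATCH --gres=gpu:4
#SBATCH --time=04:00:00

srun python train.py
```

If the request exceeds the limits attached to the QOS, Slurm holds or rejects the job rather than scheduling it.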
@@ -1837,3 +1842,9 @@ fi
 ### dstack
 
 `dstack` does not support heterogeneous jobs natively. Use separate runs with [workflow orchestration tools (Prefect, Airflow)](#dstack-workflow-orchestration) or submit multiple runs programmatically to coordinate components with different resource requirements.
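Each component can then be expressed as its own run configuration. A minimal sketch of one such component (the file name, task name, command, and resource values are hypothetical):

```yaml
# trainer.dstack.yml — one component of a multi-run setup (hypothetical)
type: task
name: trainer
commands:
  - python train.py
resources:
  gpu: A100:8
```

Each component would then be submitted separately, e.g. with `dstack apply -f trainer.dstack.yml`, leaving cross-component coordination to the external orchestrator.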
+
+## What's next?
+
+1. Check out [Quickstart](../../quickstart.md).
+2. Read about [dev environments](../../concepts/dev-environments.md), [tasks](../../concepts/tasks.md), and [services](../../concepts/services.md).