0.18.38
Intel Gaudi
dstack now supports Intel Gaudi accelerators with SSH fleets.
To use Intel Gaudi with dstack, create an SSH fleet. Once it's up, specify gaudi, gaudi2, or gaudi3 as the GPU name (or intel as the vendor name) in your run configuration:
type: dev-environment
python: "3.12"
ide: vscode
resources:
  gpu: gaudi2:8 # 8 × Gaudi 2
Note
To use SSH fleets with Intel Gaudi, ensure that the Gaudi software stack is installed on each host: the drivers, hl-smi, and Habana Container Runtime.
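As a rough illustration of the fleet step, an SSH fleet configuration for Gaudi hosts might look like the sketch below. The fleet name, user, key path, and host addresses are placeholders, not values from this release; adjust them to your environment.

```yaml
# fleet.dstack.yml (hypothetical file name)
type: fleet
name: gaudi-fleet            # placeholder fleet name
ssh_config:
  user: ubuntu               # placeholder SSH user on the Gaudi hosts
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 192.0.2.10             # placeholder addresses (documentation range)
    - 192.0.2.11
```

Once the fleet is created with dstack apply -f fleet.dstack.yml, a run can request the accelerators by vendor instead of by name. The long form below assumes the gpu property accepts vendor and count keys, as it does for other vendors:

```yaml
resources:
  gpu:
    vendor: intel            # assumed long-form equivalent of naming gaudi/gaudi2/gaudi3
    count: 8
```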
Volumes
Stop duration and force detachment
In some cases, a volume may get stuck in the detaching state. When this happens, the run is marked as stopped, but the instance remains in an inconsistent state, preventing its deletion or reuse. Additionally, the volume cannot be used with other runs.
To address this, dstack now keeps the run in the terminating state until the volume is fully detached. By default, dstack waits 5m before forcing a detach. You can override this with stop_duration: set a different duration, or set it to off to wait indefinitely without forcing a detach.
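A minimal sketch of overriding the default, assuming stop_duration is set as a top-level property of the run configuration. The task name, command, volume name, and mount path are hypothetical:

```yaml
type: task
name: train                  # hypothetical run name
commands:
  - python train.py          # hypothetical command
volumes:
  - name: my-volume          # hypothetical volume name
    path: /volume_data
# Wait up to 10 minutes for the volume to detach before forcing it.
# Use `stop_duration: off` to wait indefinitely and never force a detach.
stop_duration: 10m
```

A longer duration suits volumes that are slow to detach. Setting off avoids the file system risk of a forced detach, but the run stays in the terminating state until the volume is released.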
Note
Force detaching a volume may corrupt the file system and should only be used as a last resort. If volumes frequently require force detachment, contact your cloud provider’s support to identify the root cause.
Bug-fixes
This update also resolves an issue where dstack mistakenly marked a volume as attached even though it was actually detached.
UI
Fleets
The UI has been updated to simplify fleet and instance management. The Fleets page now allows users to terminate fleets and displays both active and terminated fleets. The new Instances page shows active and terminated instances across all fleets.
What's changed
- Add Intel Gaudi support for SSH fleets by @un-def in #2216
- Support models with non-standard finish_reason by @jvstme in #2229
- [Internal]: Ensure all files end with a newline by @jvstme in #2227
- [chore]: Refactor gateway modules by @jvstme in #2226
- [chore]: Move connection pool to proxy deps by @jvstme in #2235
- [chore]: Update migration ffa99edd1988 by @jvstme in #2217
- [chore]: Update/remove dstack-proxy TODOs by @jvstme in #2239
- [UI] New UI for fleets and instances by @olgenn in #2236
- Improve UX when no offers found by @jvstme in #2240
- Implement volumes force detach by @r4victor in #2242
Full changelog: 0.18.37...0.18.38
