0.19.35
Runpod
Instant Clusters
dstack adds support for Runpod Instant Clusters, enabling multi-node tasks on Runpod:
```shell
$ dstack apply -f nccl-tests.dstack.yaml -b runpod

 Project              main
 User                 admin
 Configuration        .dstack/confs/nccl-tests-simple.yaml
 Type                 task
 Resources            cpu=2.. mem=8GB.. disk=100GB.. gpu:1..8
 Spot policy          auto
 Max price            -
 Retry policy         -
 Creation policy      reuse-or-create
 Idle duration        5m
 Max duration         -
 Reservation          -

 #  BACKEND            RESOURCES                                  INSTANCE TYPE     PRICE
 1  runpod (US-KS-2)   cpu=128 mem=2008GB disk=100GB A100:80GB:8  NVIDIA A100-SXM…  $16.7…
 2  runpod (US-MO-1)   cpu=128 mem=2008GB disk=100GB A100:80GB:8  NVIDIA A100-SXM…  $16.7…
 3  runpod (CA-MTL-1)  cpu=160 mem=1504GB disk=100GB H100:80GB:8  NVIDIA H100 80G…  $25.8…
    ...
 Shown 3 of 5 offers, $34.464max

Submit the run nccl-tests? [y/n]:
```
Runpod offers clusters of 2 to 8 nodes with H200, B200, H100, and A100 GPUs and InfiniBand networking up to 3200 Gbps.
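A minimal sketch of what a multi-node task targeting Runpod Instant Clusters might look like, assuming the standard dstack distributed-task properties (`nodes`, `backends`, `resources`) and the `DSTACK_NODE_RANK`, `DSTACK_NODES_NUM`, and `DSTACK_MASTER_NODE_IP` environment variables that dstack sets on each node. The task name and commands are hypothetical placeholders, not the `nccl-tests` configuration used in the transcript.

```yaml
# Hypothetical multi-node task; the name and commands are placeholders.
type: task
name: runpod-cluster-check

# Runpod Instant Clusters span 2 to 8 nodes
nodes: 2

commands:
  # Each node prints its rank, the cluster size, and the master node address
  - echo "node $DSTACK_NODE_RANK of $DSTACK_NODES_NUM, master $DSTACK_MASTER_NODE_IP"
  - nvidia-smi

# Restricts offers to Runpod, like passing -b runpod to dstack apply
backends: [runpod]

resources:
  # Same GPU count range as shown in the transcript (gpu:1..8)
  gpu: 1..8
```

Saved under a file name of your choice, it would be submitted the same way as in the transcript, with `dstack apply -f <file>`; dstack then lists matching cluster offers before asking for confirmation.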
What's Changed
- Fix Postgres migration deadlocks by @r4victor in #3220
- Detect NVIDIA inside WSL2 by @r4victor in #3221
- Fix examples link in contributing doc by @matiasinsaurralde in #3228
- Fix context usage in internal metrics package by @matiasinsaurralde in #3226
- Fix working_dir compatibility with pre-0.19.27 clients by @un-def in #3231
- Fix autocreated fleets warning by @r4victor in #3233
- Improve Go code error handling by @r4victor in #3230
- Support Runpod Instant Clusters by @r4victor in #3214
- Switch to Nebius SDK 0.3 by @r4victor in #3222
- Do not terminate fleet instances on idle_duration at nodes.min by @r4victor in #3235
- [shim] Log successful API calls with trace level by @un-def in #3237
- Drop hardcoded Nebius InfiniBand fabrics by @jvstme in #3234
- [runner] Clone repo before working dir is created by @un-def in #3240
New Contributors
- @matiasinsaurralde made their first contribution in #3228
Full Changelog: 0.19.34...0.19.35