Release 0.19.35 · dstackai/dstack

Runpod

Instant Clusters

dstack now supports Runpod Instant Clusters, enabling multi-node tasks on Runpod:

$ dstack apply -f nccl-tests.dstack.yaml -b runpod

 Project          main                                    
 User             admin                                   
 Configuration    .dstack/confs/nccl-tests-simple.yaml    
 Type             task                                    
 Resources        cpu=2.. mem=8GB.. disk=100GB.. gpu:1..8 
 Spot policy      auto                                    
 Max price        -                                       
 Retry policy     -                                       
 Creation policy  reuse-or-create                         
 Idle duration    5m                                      
 Max duration     -                                       
 Reservation      -                                       

 #  BACKEND           RESOURCES                          INSTANCE TYPE     PRICE    
 1  runpod (US-KS-2)  cpu=128 mem=2008GB disk=100GB      NVIDIA A100-SXM…  $16.7…   
                      A100:80GB:8                                                   
 2  runpod (US-MO-1)  cpu=128 mem=2008GB disk=100GB      NVIDIA A100-SXM…  $16.7…   
                      A100:80GB:8                                                   
 3  runpod            cpu=160 mem=1504GB disk=100GB      NVIDIA H100 80G…  $25.8…   
    (CA-MTL-1)        H100:80GB:8                                                   
    ...                                                                             
 Shown 3 of 5 offers, $34.464max

Submit the run nccl-tests? [y/n]:

Runpod offers clusters of 2 to 8 nodes with H200, B200, H100, and A100 GPUs and InfiniBand networking up to 3200 Gbps.
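
For illustration, a multi-node task configuration might look roughly like the sketch below. It is not the configuration used for the run above: the file contents are assumed, and the command is a placeholder that only prints each node's rank using environment variables dstack sets for distributed tasks. The `nodes` value is chosen from the 2 to 8 range mentioned above, and the `gpu` range mirrors the `gpu:1..8` shown in the plan.

```yaml
# Hypothetical nccl-tests.dstack.yaml (illustrative only, not the actual example file)
type: task
name: nccl-tests

# Number of nodes in the cluster; Runpod Instant Clusters offer 2 to 8
nodes: 2

commands:
  # Placeholder command: a real NCCL test would launch the benchmark here
  - echo "node $DSTACK_NODE_RANK of $DSTACK_NODES_NUM, master at $DSTACK_MASTER_NODE_IP"

resources:
  # Same GPU range as in the plan output above
  gpu: 1..8
```

Such a file would be submitted with the same `dstack apply -f ... -b runpod` command shown earlier.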

What's Changed

Fix postgres migrations deadlocks by @r4victor in #3220
Detect nvidia inside WSL2 by @r4victor in #3221
Fix examples link in contributing doc by @matiasinsaurralde in #3228
Fix context usage in internal metrics package by @matiasinsaurralde in #3226
Fix working_dir compatibility with pre-0.19.27 clients by @un-def in #3231
Fix autocreated fleets warning by @r4victor in #3233
Improve Go code error handling by @r4victor in #3230
Support Runpod Instant Clusters by @r4victor in #3214
Switch to nebius sdk 0.3 by @r4victor in #3222
Do not terminate fleet instances on idle_duration at nodes.min by @r4victor in #3235
[shim] Log successful API calls with trace level by @un-def in #3237
Drop hardcoded Nebius InfiniBand fabrics by @jvstme in #3234
[runner] Clone repo before working dir is created by @un-def in #3240

New Contributors

@matiasinsaurralde made their first contribution in #3228

Full Changelog: 0.19.34...0.19.35
