Implement requirements-independent offers cache by r4victor · Pull Request #3091 · dstackai/dstack

r4victor · 2025-09-11T11:05:48Z

Part of #3021

This PR reworks how dstack caches backend offers. The common get_offers_cached() is dropped. For most backends, offers do not depend on requirements (or can be adjusted afterwards, as with disk size). For these backends, the PR introduces the ComputeWithAllOffersCached class, which caches all offers with availability and filters them by requirements afterwards. This allows reusing the cache when getting offers for different requirements. For cudo, vastai, and kubernetes, the old behavior is preserved via ComputeWithFilteredOffersCached.
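A minimal sketch of the idea, not the actual dstack code: the `Offer`, `Requirements`, and `AllOffersCache` names, the fields, the TTL value, and the sample offers below are all hypothetical. It shows one cache entry holding every offer, a post-filter by requirements, and a disk-size adjustment applied to the cached offers rather than being baked into the cache.

```python
import time
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Offer:  # hypothetical, simplified offer model
    name: str
    gpu_name: Optional[str]
    gpu_count: int
    price: float
    disk_gb: int = 100


@dataclass(frozen=True)
class Requirements:  # hypothetical, simplified requirements model
    gpu_name: Optional[str] = None
    min_gpu_count: int = 0
    disk_gb: Optional[int] = None


class AllOffersCache:
    """Caches the full offer list once; filters per request."""

    def __init__(self, fetch_all: Callable[[], List[Offer]], ttl: float = 180.0):
        self._fetch_all = fetch_all
        self._ttl = ttl  # assumed TTL in seconds, for illustration only
        self._offers: Optional[List[Offer]] = None
        self._fetched_at = 0.0
        self.backend_calls = 0

    def _all_offers(self) -> List[Offer]:
        now = time.monotonic()
        if self._offers is None or now - self._fetched_at > self._ttl:
            self._offers = self._fetch_all()  # the slow, requirements-independent call
            self._fetched_at = now
            self.backend_calls += 1
        return self._offers

    def get_offers(self, req: Requirements) -> List[Offer]:
        result = []
        for offer in self._all_offers():
            # Post-filter: requirements are applied after the cache lookup.
            if req.gpu_name is not None and offer.gpu_name != req.gpu_name:
                continue
            if offer.gpu_count < req.min_gpu_count:
                continue
            # Modifier: adjust a configurable property on a copy of the offer.
            if req.disk_gb is not None:
                offer = replace(offer, disk_gb=req.disk_gb)
            result.append(offer)
        return sorted(result, key=lambda o: o.price)


def fetch_all_offers() -> List[Offer]:
    # Made-up data standing in for a backend API call.
    return [
        Offer("gpu-8x", "H100", 8, 50.0),
        Offer("gpu-1x", "A100", 1, 3.0),
        Offer("cpu-small", None, 0, 0.1),
    ]


cache = AllOffersCache(fetch_all_offers)
h100 = cache.get_offers(Requirements(gpu_name="H100", disk_gb=500))
everything = cache.get_offers(Requirements())
```

Both calls are served from a single backend fetch, even though their requirements differ.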

Upsides:

The cache is reused frequently because it is requirements-independent. dstack offer and dstack apply become quick even when called consecutively with different requirements.
It's now possible to implement fleet selection that considers backend offers for each fleet (Consider backend offers when choosing optimal fleet for run #3020).
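To illustrate the cache-reuse point, here is a toy comparison of the two caching strategies. The class names, the single `gpu` requirement, and the catalog are assumptions made for this sketch and do not mirror dstack's classes. It counts how many times each strategy has to go to the backend for a sequence of different requirements.

```python
from typing import Dict, List, Optional, Tuple

GpuOffer = Tuple[Optional[str], int]  # (GPU name, GPU count), made-up shape
CATALOG: List[GpuOffer] = [("H100", 8), ("A100", 4), (None, 0)]


def matches(offer: GpuOffer, gpu: Optional[str]) -> bool:
    return gpu is None or offer[0] == gpu


class RequirementsKeyedCache:
    """Old-style behavior: one cache entry per distinct requirement."""

    def __init__(self) -> None:
        self._by_req: Dict[Optional[str], List[GpuOffer]] = {}
        self.backend_calls = 0

    def get_offers(self, gpu: Optional[str]) -> List[GpuOffer]:
        if gpu not in self._by_req:
            self.backend_calls += 1  # cache miss for every new requirement
            self._by_req[gpu] = [o for o in CATALOG if matches(o, gpu)]
        return self._by_req[gpu]


class RequirementsIndependentCache:
    """New-style behavior: cache everything once, filter on read."""

    def __init__(self) -> None:
        self._all: Optional[List[GpuOffer]] = None
        self.backend_calls = 0

    def get_offers(self, gpu: Optional[str]) -> List[GpuOffer]:
        if self._all is None:
            self.backend_calls += 1  # only the first call reaches the backend
            self._all = list(CATALOG)
        return [o for o in self._all if matches(o, gpu)]


keyed = RequirementsKeyedCache()
independent = RequirementsIndependentCache()
for gpu in ["H100", "A100", None, "H100"]:
    assert keyed.get_offers(gpu) == independent.get_offers(gpu)

print(keyed.backend_calls, independent.backend_calls)
```

In this sketch the requirements-keyed cache reaches the backend once per distinct requirement, while the requirements-independent cache does so only once.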

Downsides:

When the cache is cold, requesting offers may be slower since backends always collect all offers. In my testing, time dstack apply -b aws --gpu H100 went from ~15s to ~20s, and time dstack apply -b gcp --gpu H100 went from ~10s to ~15s.

This implementation is for a transition period. The plan is to experiment with fleet selection logic that considers fleet offers and see if it works well. Then, implement a requirements-independent cache for cudo and vastai (this requires caching the provider's API calls, which currently can be done only in gpuhunt). Then move offers collection into dstack and simplify the code.

Tested offers for all backends except for cloudrift.

jvstme · 2025-09-12T15:34:19Z

src/dstack/_internal/core/backends/template/compute.py.jinja


 class {{ backend_name }}Compute(
    # TODO: Choose ComputeWith* classes to extend and implement
+    # ComputeWithAllOffersCached,


(nit) I think ComputeWithFilteredOffersCached could be a good default for the template, since it works for all backends. Guidance on how to choose between ComputeWithAllOffersCached and ComputeWithFilteredOffersCached could also be helpful, unless we expect to move away from this model before anyone contributes the next backend.

My idea is to drop ComputeWithFilteredOffersCached soon – all providers should implement requirements-independent cache.

r4victor and others added 23 commits September 10, 2025 10:56

Cache GCP offers with availability

9e2a6b0

refactor: update get_offers method signature to remove optional requi…

b183ae7

…rements Co-authored-by: aider (bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0) <aider@aider.chat>

Introduce ComputeWithAllOffersCached

a3b5136

feat: migrate AWSCompute to use ComputeWithAllOffersCached with reser…

191a408

…vation handling Co-authored-by: aider (bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0) <aider@aider.chat>

refactor: update compute classes to use flexible requirements filtering

f3ceb96

Co-authored-by: aider (bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0) <aider@aider.chat>

Cache AWS offers with availability

ee49d7d

refactor: migrate AzureCompute to use ComputeWithAllOffersCached

dbbf3dc

Co-authored-by: aider (bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0) <aider@aider.chat>

refactor: migrate CloudriftCompute to use ComputeWithAllOffersCached

43a8b63

Co-authored-by: aider (bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0) <aider@aider.chat>

refactor: migrate DatacrunchCompute to use ComputeWithAllOffersCached

fa6d39b

Co-authored-by: aider (bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0) <aider@aider.chat>

fix missing Compute

693a33f

Migrate all backends to ComputeWithAllOffersCached

aa3e6ac

refactor: inherit from ComputeWithAllOffersCached and update get_offe…

29a0fbc

…rs method Co-authored-by: aider (bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0) <aider@aider.chat>

Move by requirements cache to ComputeWithFilteredOffersCached

16ba873

Implement get_offers_modifier for AWS

c64e01e

Implement get_offers_modifier for all backends with CONFIGURABLE_DISK…

cadd0f1

…_SIZE

Fix backend offers

044c14d

Fix nebius

03d15b3

Fix oci

469a9e2

Use ComputeWithAllOffersCached for kuberenetes

a8babca

Cache AWS.get_offers_post_filter

9707064

Update template

9d64349

Fix tests

7a5de8f

Lint

e14c6fb

r4victor requested a review from jvstme September 11, 2025 11:20

jvstme approved these changes Sep 12, 2025

Merge branch 'master' into pr_offers_with_availability_cache

d6dece8

r4victor merged commit 90fc7c9 into master Sep 15, 2025
28 checks passed

r4victor deleted the pr_offers_with_availability_cache branch September 15, 2025 07:18

r4victor mentioned this pull request Sep 17, 2025

Optimize getting run plan when using region/instance-type filters #1206

Closed

r4victor mentioned this pull request Oct 9, 2025

Implement backend-specific offers cache #3021

Closed

r4victor merged 24 commits into master from pr_offers_with_availability_cache
