fix(cloud-sdk): reclaim builder sandbox on cancellation#727
Open
jare1686 wants to merge 1 commit into
Contributor
@jare1686 Thanks for the contribution! Can you please update the PRs so that lint passes and rebase from main?
af16889 to a3d0d7e
Contributor Author
Done! My pleasure.
Summary
`build_sandbox_image` (`crates/cloud-sdk/src/sandbox_images.rs`) creates a remote builder
sandbox and reclaims it with a sequential `delete` after the build's awaited block. If the
future is cancelled (a caller timeout, a losing `select!` branch, or client shutdown) during
the build, that delete never runs and the builder sandbox is leaked until the server-side
timeout reaps it. The keepalive task compounds this: it is aborted only on the success path, so
on cancellation its `JoinHandle` is dropped and detached, leaving a background loop pinging the
orphaned sandbox.
This change ties cleanup to ownership. A small `BuilderSandboxCleanup` guard captures the
runtime handle at creation and reclaims the sandbox via a spawned (detached) delete, so
cancellation before or during cleanup cannot suppress it; the normal path still awaits the
delete and emits the existing warning, and a `404` is treated as success. The keepalive handle
is wrapped so it aborts on drop. No new dependencies and no public API surface.
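For reviewers unfamiliar with the pattern, here is a minimal std-only sketch of an ownership-tied cleanup guard. It is not the code in this PR: `SandboxGuard` and `DeleteFn` are hypothetical names, the delete call is a plain closure, and a detached `std::thread` stands in for spawning the delete on a captured runtime handle.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for the SDK's delete call (assumption, not the real API).
type DeleteFn = Arc<dyn Fn(&str) + Send + Sync>;

/// Illustrative guard: owns the sandbox ID until the sandbox has been reclaimed.
struct SandboxGuard {
    sandbox_id: Option<String>,
    delete: DeleteFn,
}

impl SandboxGuard {
    fn new(sandbox_id: impl Into<String>, delete: DeleteFn) -> Self {
        Self {
            sandbox_id: Some(sandbox_id.into()),
            delete,
        }
    }

    /// Normal path: delete inline and disarm the guard, so the `Drop`
    /// impl below finds nothing left to do (no double delete).
    fn reclaim(mut self) {
        if let Some(id) = self.sandbox_id.take() {
            (self.delete)(&id);
        }
    }
}

impl Drop for SandboxGuard {
    fn drop(&mut self) {
        // Reached with an ID still present only when `reclaim` never ran,
        // for example because the owning future was cancelled.
        if let Some(id) = self.sandbox_id.take() {
            let delete = Arc::clone(&self.delete);
            // Detached: the cleanup does not depend on the dropped owner.
            thread::spawn(move || delete(&id));
        }
    }
}
```

Taking the ID out of the `Option` in both paths is what keeps the awaited reclaim and the drop-time reclaim mutually exclusive.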
This is best-effort SDK-side cleanup once the sandbox ID is known; guaranteed reclamation across
process exit or runtime shutdown is a server-side concern (a sandbox lease/TTL or a reaper) and
is out of scope here.
Validation
Tests cover delete-on-drop, awaited reclaim (no double delete), cancellation during cleanup,
404-as-success, non-404 error propagation, successful delete, and keepalive abort-on-drop.
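The 404-as-success and non-404 propagation cases can be pictured as a small status mapping. The sketch below is illustrative only: `DeleteError` and `interpret_delete_status` are hypothetical names, and the PR's actual error type and HTTP client are not shown here.

```rust
/// Hypothetical error type carrying the HTTP status of a failed delete.
#[derive(Debug, PartialEq)]
struct DeleteError {
    status: u16,
}

/// Maps a delete response status to an outcome. A 404 means the sandbox
/// is already gone, which is the state cleanup wanted, so it counts as success.
fn interpret_delete_status(status: u16) -> Result<(), DeleteError> {
    match status {
        200..=299 | 404 => Ok(()),
        other => Err(DeleteError { status: other }),
    }
}
```

Treating 404 as success makes the delete idempotent, which matters when the server-side timeout and the SDK-side guard may both try to reclaim the same sandbox.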
cargo nextest run -p tensorlake
cargo fmt -p tensorlake --check

Related: #528