Degrade gracefully when sticky disk mount fails#60

Merged
aayushshah15 merged 3 commits into main from fix/graceful-mount-failure on Jun 12, 2026

@aayushshah15 aayushshah15 commented Jun 12, 2026

When the action fails to obtain or mount the sticky disk (e.g. a Connect/gRPC error like ConnectError: [canceled] This operation was aborted), it warned and continued without ever creating the user-specified path. Downstream steps referencing files under it then failed with "No such file or directory" (real example: tar -xf /mnt/deps/deps.tzst failing because /mnt/deps didn't exist). This PR makes a failed mount look like a fresh, empty sticky disk (a cache miss) instead of a missing path.

How it works:

  • In the main step's error path, after the existing "Error getting sticky disk" warning, ensureFallbackDirectory creates the path using the same sudo mkdir -p and non-recursive chown to the runner user that a successful mount performs (including the nested-workspace parent chown). Because mkdir -p is a no-op on an existing directory and the chown does not recurse, an existing directory and its contents are left untouched. The step still succeeds, preserving the non-blocking behavior.
  • The mount-failed state was already tracked via the STICKYDISK_ERROR action state; the post step now reads it up front. When the path isn't mounted and the mount had failed, it logs an info notice and returns before any unmount or commit, so commit: true can't push the empty fallback directory back and clobber the customer's existing cached snapshot.
  • The rare case where the device mounted but a later setup step failed still flows through the existing unmount path, which already skips the commit on STICKYDISK_ERROR.
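
The two behaviors described above can be sketched roughly as follows. This is an illustrative TypeScript sketch under stated assumptions, not the action's actual code: fallbackDirectoryCommands, decidePostStep, PostState, PostDecision, and the "runner" user name in the usage note are hypothetical, and only ensureFallbackDirectory and STICKYDISK_ERROR are names that appear in this PR. The sketch models the shell commands as data instead of executing them, and it leaves out the nested-workspace parent chown.

```typescript
// Hypothetical sketch; names here are illustrative, not the action's API.

type Command = readonly string[];

// Commands a fallback could run so that `path` exists as a directory
// owned by the runner user. Whether the real action passes a bare user
// or user:group to chown is an assumption left open here.
function fallbackDirectoryCommands(path: string, runnerUser: string): Command[] {
  return [
    // mkdir -p succeeds without changes if the directory already exists.
    ["sudo", "mkdir", "-p", path],
    // No -R flag: only the directory itself changes owner, so files
    // already inside it are not touched.
    ["sudo", "chown", runnerUser, path],
  ];
}

interface PostState {
  // Is the user-specified path currently a mount point?
  mounted: boolean;
  // Did the main step record a mount failure (the STICKYDISK_ERROR state)?
  mountFailed: boolean;
}

interface PostDecision {
  // True when the post step should return before any unmount or commit.
  skipEntirely: boolean;
  // False when a recorded mount failure rules out committing a snapshot.
  commitAllowed: boolean;
}

// Decide what the post step may do, given the state saved by the main step.
function decidePostStep(state: PostState): PostDecision {
  if (!state.mounted && state.mountFailed) {
    // Nothing was mounted: the path is only the empty fallback directory,
    // so committing it would overwrite the existing cached snapshot.
    return { skipEntirely: true, commitAllowed: false };
  }
  // Otherwise the existing unmount path runs; it skips the commit
  // whenever a failure was recorded.
  return { skipEntirely: false, commitAllowed: !state.mountFailed };
}
```

For example, fallbackDirectoryCommands("/mnt/deps", "runner") yields a mkdir -p followed by a single non-recursive chown, and decidePostStep({ mounted: false, mountFailed: true }) reports that the post step should be skipped entirely. Keeping the decision in a pure function makes the "never commit an empty fallback" rule easy to unit test without touching a real mount.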

Validated with eslint, prettier, and tsc. dist/ is rebuilt and committed (built in CI with BUF_TOKEN access since the private @buf/blacksmith_vm-agent.* packages cannot be installed locally); the Build Action check passes.


aayushshah15 and others added 3 commits June 12, 2026 01:25
When the mount fails (e.g. a Connect/gRPC error talking to the host
agent), create the requested path as an empty runner-owned directory so
downstream steps see a cache miss instead of a missing path. In the
post step, skip unmount and commit when the disk was never mounted so
an empty fallback directory can't clobber existing cached data.

Co-authored-by: Codesmith Staging <codesmith-bot@users.noreply.github.com>
Co-authored-by: Codesmith Staging <codesmith-bot@users.noreply.github.com>
dist/ was rebuilt by CI (run 27388547061) with BUF_TOKEN access since
the private @buf packages cannot be installed outside CI.

Co-authored-by: Codesmith Staging <codesmith-bot@users.noreply.github.com>
@aayushshah15 aayushshah15 requested a review from adityamaru June 12, 2026 01:36
@aayushshah15 aayushshah15 merged commit 4c034ba into main Jun 12, 2026
12 checks passed