Post-step cleanup fails & skips sticky-disk commit when buildkitd already exited (pkill exit 1 treated as fatal) #102

@ben-xo

Summary

In the post-job step, shutdownBuildkitd() runs sudo pkill -TERM buildkitd and awaits it directly. pkill exits with code 1 when no process matches. If buildkitd has already exited by the time the post-step runs, that exit 1 rejects the promise, shutdownBuildkitd() throws, and the action reports a fatal cleanup failure, which in turn skips the sticky-disk cache commit. A perfectly successful build therefore loses its layer cache for that run, even though the job still reports green.

This isn't only cosmetic noise: the skipped commit defeats the action's main purpose (persisting the BuildKit cache on the sticky disk).

Environment

  • Action: useblacksmith/setup-docker-builder@v1 (resolves to v1.8.0, tag sha a592b831ebb20e68f7cf47329cf2c3c67b8a7655)
  • buildkitd: 0.29.3-blacksmith
  • Followed by docker/bake-action@v7; max-cache-size-mb: "30720"

Observed logs (post-job cleanup)

Starting buildkitd with command: nohup sudo buildkitd --debug --config=buildkitd.toml ... &
buildkitd daemon started successfully with PID 4159
buildkitd version: 0.29.3-blacksmith
...
Post job cleanup.
buildkitd addr: tcp://127.0.0.1:1234
buildkitd process: 4159
Sending SIGTERM to buildkitd for graceful shutdown
##[error]error shutting down buildkitd process: Command failed: sudo pkill -TERM buildkitd
##[error]Cleanup failed: Command failed: sudo pkill -TERM buildkitd
##[warning]Skipping sticky disk commit due to cleanup error: Command failed: sudo pkill -TERM buildkitd

The build itself succeeded; only the post-step cleanup "failed". The job is green overall, but the two red ##[error] annotations are misleading and the cache commit is dropped.

Root cause

Decompiled from dist/index.js (v1.8.0), lightly reformatted:

async function shutdownBuildkitd() {
  const TIMEOUT = 3e4;
  try {
    info("Sending SIGTERM to buildkitd for graceful shutdown");
    await exec(`sudo pkill -TERM buildkitd`);          // ← throws when pkill exits 1 (no match)
    const start = Date.now();
    while (Date.now() - start < TIMEOUT) {
      try {
        const { stdout } = await exec("pgrep buildkitd");
        debug(`buildkitd process still running with PID: ${stdout.trim()}, waiting...`);
        await new Promise(r => setTimeout(r, 300));
      } catch (e) {
        if (e.code === 1) { info("buildkitd successfully shutdown gracefully"); return }  // ← already handles "no process"
        throw e;
      }
    }
    // ... SIGKILL fallback ...
  } // ... catch/finally block elided ...
}

The initial await exec("sudo pkill -TERM buildkitd") does not tolerate pkill's exit code 1 ("no processes matched"). When buildkitd has already exited before the post-step, whether through an idle reap or a crash (the action even ships a logBuildkitdCrashLogs() helper, so this case is anticipated), pkill exits with 1, the promise rejects, and the whole shutdown is treated as an error.

Notably, the very next loop already treats "no buildkitd process" (pgrep exit 1) as the success case (buildkitd successfully shutdown gracefully). The initial pkill just needs the same tolerance.
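
For reference, the exit statuses in play can be summarised in a small lookup. This is only a sketch, not code from the action: describePkillExit is a hypothetical name, and the meanings are taken from the procps-ng pkill/pgrep man page (other pkill implementations may differ).

```javascript
// Hypothetical helper, not part of the action. Maps a pkill/pgrep exit
// status to what it means for shutdown, following the procps-ng man page.
function describePkillExit(code) {
  switch (code) {
    case 0:
      // At least one process matched (and, for pkill, was signalled).
      return { fatal: false, meaning: "matched" };
    case 1:
      // No process matched, or none could be signalled. Under sudo the
      // "could not signal" case is unlikely, so this normally means
      // buildkitd is already gone.
      return { fatal: false, meaning: "no-match" };
    case 2:
      return { fatal: true, meaning: "syntax-error" };
    case 3:
      return { fatal: true, meaning: "fatal-error" };
    default:
      // Anything else (for example sudo itself failing) stays fatal.
      return { fatal: true, meaning: "unknown" };
  }
}
```

Only status 1 is the "nothing to do" case; every other non-zero status still points to a real problem and should keep failing the cleanup.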

Suggested fix

Treat exit code 1 from the initial pkill -TERM buildkitd as "already gone → success", mirroring the existing pgrep handling. For example:

try {
  await exec(`sudo pkill -TERM buildkitd`);
} catch (e) {
  if (e.code === 1) { info("buildkitd already stopped"); return; }  // nothing to terminate
  throw e;
}

(A blunter alternative is sudo pkill -TERM buildkitd || true, but that swallows every failure, whereas catching exit 1 specifically keeps real failures fatal.) This would clear the spurious ##[error] annotations and, more importantly, stop dropping the sticky-disk commit when the daemon exited on its own.
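
To make the behaviour easy to pin down in a regression test, the same idea could be factored into a helper that takes the command runner as an argument. This is a minimal sketch under assumptions, not the action's actual code: signalBuildkitd, failingRun, and the returned strings are hypothetical names, run stands in for the promisified exec used in dist/index.js, and log stands in for info().

```javascript
// Hypothetical wrapper, not the action's code. `run` is any function that
// executes a shell command and rejects with `err.code` set to the exit
// status, as promisified child_process.exec does.
async function signalBuildkitd(run, log = console.log) {
  try {
    await run("sudo pkill -TERM buildkitd");
    return "signalled"; // caller should go on to wait for exit
  } catch (e) {
    if (e && e.code === 1) {
      // pkill found nothing to terminate: treat as success.
      log("buildkitd already stopped");
      return "already-stopped";
    }
    throw e; // any other failure stays fatal
  }
}

// Test stub: builds a runner that rejects with the given exit status.
function failingRun(code) {
  return async (cmd) => {
    const err = new Error(`Command failed: ${cmd}`);
    err.code = code;
    throw err;
  };
}
```

With this shape, shutdownBuildkitd() could return early on "already-stopped" and run the existing pgrep loop only on "signalled". A test can pass failingRun(1) to check that the already-exited case no longer throws, and failingRun(2) to check that other failures still do.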


Bug report drafted with assistance from Claude Code.
