fix(cubebox): allow deleting paused sandboxes via out-of-band teardown#480
fix(cubebox): allow deleting paused sandboxes via out-of-band teardown#480HuChundong wants to merge 1 commit into
Conversation
|
Claude finished @HuChundong's task in 1m 29s —— View job Code Review for PR #480PR Title: OverviewThis PR addresses a real problem: a paused/frozen VM can't serve its ttrpc runtime API, so the shim rejects per-container delete with "sandbox not in normal state" and the sandbox becomes undeletable. The fix routes paused sandbox teardown through a binary force-delete that SIGKILLs the shim and detaches it from the runtime manager. Files changed: 2 files, +34/-8 ✅ Good Design Decisions
|
| Category | Verdict |
|---|---|
| Correctness | The approach is correct and addresses a real bug. |
| Design | Clean extraction of destroySandboxByBinary; consistent across Destroy/CleanUp. |
| Safety | Mostly safe. Stale-PID risk is mitigated by shims.Delete. |
| Test Coverage | Missing tests for the new function and the paused-sandbox path. |
| Edge Cases | ESRCH and IsNotFound handled; stale PID is the notable gap. |
The PR is well-structured and solves the stated problem. The most impactful improvement would be adding unit tests for destroySandboxByBinary. The stale-PID concern is defense-in-depth and unlikely to manifest in practice given how sandbox PID lifecycle works.
|
Claude finished @HuChundong's task in 1m 39s —— View job Review of PR #480: fix(cubebox): allow deleting paused sandboxes via out-of-band teardownSummaryThe codebase has been updated since the initial automated scan. The PR now contains 2 commits and the stale-PID concern from the first-pass review has been addressed with the ✅ What Was Improved Since First Scan
Design ReviewSeparation of concerns in
|
| Priority | Suggestion |
|---|---|
| 🟢 Low | Add a one-line comment in isCubeShimPid noting the expected argv format (-id <sandboxID>) so future readers understand why a raw token match is correct. |
| 🟢 Low | Consider adding a brief comment in CleanUp's paused check acknowledging the unlocked read, so future maintainers know it's intentional. |
| 🟡 Medium | The sandboxDeletable double-call at destroy.go:77-79 and 87-89 (pre-existing) is a no-op duplicate — consider cleaning up separately. |
Conclusion
The PR is correct, well-structured, and solves a real problem (undeletable paused sandboxes). The second commit (4b1636e) addresses the stale-PID safety concern raised in the initial review by adding isCubeShimPid. The force-delete path is consistently applied across both Destroy and CleanUp. No blocking issues remain.
Verification Checklist
- Paused sandbox
Destroy→ selectsdestroySandboxByBinary→ SIGKILLs shim (with stale-PID guard) → detaches from shim manager - Running sandbox
Destroy→ unchanged path (per-containerdestroyContainer) - Paused sandbox
CleanUp(GC) → samedestroySandboxByBinarypath, preventing infinite GC loop - Running sandbox
CleanUp→ unchanged path (per-containerstopTask) -
isCubeShimPidguards against stale/recycled PIDs via/proc/<pid>/cmdlineverification - Tests cover the PID guard function
-
syscall.ESRCHanderrdefs.IsNotFoundare handled gracefully -
procfsdependency already exists inCubelet/go.mod
| Branch: fix/delete-paused-sandbox
|
Claude finished @HuChundong's task in 1m 2s —— View job Review of PR #480Branch: SummaryThis PR fixes a real problem: a paused/frozen VM cannot serve its ttrpc runtime API, so the shim rejects per-container delete with "sandbox not in normal state", making the sandbox permanently undeletable (errors 130593/130506). The fix routes paused sandbox teardown through a binary force-delete that SIGKILLs the shim and detaches it from containerd's shim manager, letting dead-shim cleanup reclaim the workdir and pause snapshot. ✅ What's Good1. Clean extraction of
|
| # | Severity | Issue | Suggestion |
|---|---|---|---|
| 1 | 🟡 Medium | Non-fatal SIGKILL failure may orphan the shim (destroy.go:181-184) |
If syscall.Kill fails with a reason other than ESRCH (e.g., EPERM from a security restriction like AppArmor or seccomp), the function logs a warning and continues to shims.Delete. This detaches the sandbox from containerd's management while the shim may still be running — creating an orphan process. Consider returning the kill error rather than just logging it, or adding a post-kill ProcessExists check to confirm the shim actually exited. At minimum, a comment explaining why falling through is safe would help future readers. |
| 2 | 🟡 Medium | Unlocked status read in CleanUp (local.go:459) |
Destroy holds sb.Lock() (destroy.go:65-66) before checking IsPaused(), but CleanUp reads info.GetStatus().IsPaused() without any lock. A state transition (paused → running) between the check and the SIGKILL would kill the shim of a now-running sandbox, terminating the VM. This matches the pre-existing pattern in CleanUp (it never locked), and killing the shim is safe — containerd would restart it — but adding a comment acknowledging the unlocked read would help future maintainers understand it's intentional. |
| 3 | 🟢 Low | isCubeShimPid uses exact token matching (destroy.go:205-209) |
The function iterates over cmdline args and matches arg == sandboxID. The shim is launched with -id <sandboxID> as separate argv entries, so this is correct. However, if any other argument happens to equal the sandbox ID (e.g., a filename in the same directory), there's a false positive. In practice the sandbox ID is a UUID, so collision risk is negligible. Consider documenting the expected argv shape in a comment, or tightening to look for a -id / <sandboxID> pair pattern. |
| 4 | 🟢 Low | collectSandboxMaxMemoryUsage also reads /proc/<pid> without stale-PID guard (destroy.go:396-397) |
The pre-existing readVmHWM(pid) call uses procfs.NewProc(pid) to read memory stats, but unlike the new isCubeShimPid, it doesn't verify the PID still belongs to the right shim. A recycled PID could lead to reading memory stats from an unrelated process. This is pre-existing and likely low-impact (the metric read is advisory), but worth noting for consistency. |
| 5 | 🟢 Low | procfs import footprint (destroy.go:24) |
procfs was already imported by local.go for readVmHWM. Using the same library in destroy.go for isCubeShimPid is consistent, but note that procfs.NewProc does a stat(2) on /proc/<pid>, which has a tiny race window with syscall.Kill. The isCubeShimPid → syscall.Kill sequence is not atomic — the PID could die between the check and the kill, but that's benign (returns ESRCH which is handled). |
Code Structure Notes
Destroy control flow (destroy.go:114-124)
The separation of detection (is paused?) from action (how to delete?) is clean:
if sb.GetStatus().IsPaused() {
ctx = constants.WithSkipRuntimeAPI(ctx)
}
if constants.SkipRuntimeAPI(ctx) {
if err := l.destroySandboxByBinary(ctx, sb); err != nil {
...
}
} else {
// normal per-container deletion
}This is extensible — future callers can set the SkipRuntimeAPI flag independently without duplicating the paused check.
CleanUp GC parity (local.go:458-470)
The mirroring of destroySandboxByBinary in CleanUp is critical: without it, the GC loop would keep finding the shim process still running (ProcessExists at line 473) and retry indefinitely. This fix breaks that loop cleanly.
Redundant sandboxDeletable removal (commit 0129d18)
The commit correctly identifies that sandboxDeletable is a pure function of sb.Labels and the filter — nothing between the two calls changes either, so the second check (previously at line 87) always returned the same result as the first. Removing it is the right cleanup.
Test Coverage Assessment
| Function | Tested? | Notes |
|---|---|---|
isCubeShimPid |
✅ Yes | Good coverage of match, non-match, nonexistent PID |
destroySandboxByBinary |
❌ No | Depends on l.shims (ShimManager), which requires a running containerd. Acceptable for now. |
Destroy paused path |
❌ No | Integration-level; would need a test harness with a mock shim |
CleanUp paused path |
❌ No | Same integration constraint |
Adding an integration test or interface abstraction over ShimManager to enable unit testing of destroySandboxByBinary would be a nice follow-up but is out of scope for this PR.
Conclusion
| Category | Verdict |
|---|---|
| Correctness | ✅ The approach is correct and addresses the root cause |
| Safety | ✅ Stale-PID guard (isCubeShimPid) is well-implemented |
| Design | ✅ Clean extraction of destroySandboxByBinary; consistent across Destroy/CleanUp |
| Edge Cases | ✅ ESRCH, IsNotFound, zero-PID handled. Stale-PID mitigated |
| Tests | 🟡 PID guard tested. destroySandboxByBinary untested (integration constraint) |
| Non-blocking concerns |
The PR is well-structured, addresses a real and impactful bug, and the second commit (4b1636e) appropriately addresses the stale-PID safety concern. No blocking issues remain.
Most impactful follow-up: Consider hardening the SIGKILL failure path in destroySandboxByBinary — returning the error (or at minimum confirming the process exited) for non-ESRCH failures would prevent potential orphaned shims.
| Branch: fix/delete-paused-sandbox
|
Claude finished @HuChundong's task in 1m 17s —— View job Code Review for PR #480Branch: SummaryThis PR fixes a real bug: a paused/frozen VM cannot serve its ttrpc runtime API, so the shim rejects per-container delete with "sandbox not in normal state", making the sandbox permanently undeletable. The fix routes paused sandbox teardown through a binary force-delete that SIGKILLs the shim (with stale-PID guard) and detaches it from containerd's shim manager, letting dead-shim cleanup reclaim the workdir and pause snapshot. The code has evolved through 4 commits and has been reviewed multiple times. The latest commit (3028b33) addresses several concerns raised in earlier passes. ✅ What's Good1. Clean extraction of
|
| Category | Verdict |
|---|---|
| Correctness | ✅ Addresses the root cause — paused sandboxes are now deletable |
| Safety | ✅ Stale-PID guard (cmdlineServesSandbox) properly prevents signaling recycled PIDs |
| SIGKILL failure | ✅ Now correctly returns an error instead of potentially orphaning the shim |
| Edge cases | ✅ ESRCH, IsNotFound, zero-PID, nonexistent PID all handled |
| Tests | ✅ cmdlineServesSandbox well-tested with 7 cases; isCubeShimPid tested for nonexistent and non-shim PIDs |
| Design | ✅ Clean extraction of destroySandboxByBinary; consistent across Destroy/CleanUp |
| Unlocked CleanUp read | 🟡 Pre-existing pattern, safe in practice, no action needed |
destroySandboxByBinary test |
🟢 Integration constraint, acceptable |
The PR is well-structured and the safety concerns from earlier iterations have been properly addressed. The cmdlineServesSandbox function is a particularly nice touch — it's more conservative than a simple token match, requiring the -id flag prefix. No blocking issues remain.
|
Can you squash it into one commit? |
A paused/frozen VM cannot serve its runtime (ttrpc) API, so the shim rejects the regular per-container delete with "sandbox not in normal state" and the sandbox can never be removed (error 130593/130506). Route teardown of paused sandboxes through a binary force-delete that SIGKILLs the shim (reaping the in-process cube-hypervisor) and detaches it from the runtime manager, letting containerd's dead-shim cleanup reclaim the VM workdir and pause snapshot: - Destroy: select the SkipRuntimeAPI path when the sandbox is paused and factor the force-delete into destroySandboxByBinary. - CleanUp (GC reconcile / workflow compensation): mirror the same path so a paused sandbox is reclaimed instead of looping forever on "shim process still Exists". Harden destroySandboxByBinary: - Verify via /proc that the recorded shim PID is still this sandbox's cube shim (matching the "-id <sandboxID>" launch arg) before SIGKILL, so a stale/recycled PID is never signaled. - Propagate non-ESRCH SIGKILL failures instead of detaching a shim we failed to kill, which would orphan the shim and its in-process VMM. - Drop a redundant sandboxDeletable check in Destroy. Covered by table-driven tests for the PID/cmdline matching. Verified end-to-end (running -> pause -> delete): RetCode 200, shim and VMM reaped, VM workdir / pause snapshot / bundle all reclaimed. Assisted-by: Cursor:Claude-Opus-4.8 Co-authored-by: Cursor <cursoragent@cursor.com> Signed-off-by: carmake <gycm520@gmail.com>
3028b33 to
598dfdf
Compare
| // The stored shim PID may be stale (shim already gone, PID recycled), so only | ||
| // SIGKILL it after /proc confirms it is still this sandbox's cube shim. | ||
| if pid := int(sb.Endpoint.Pid); pid != 0 && isCubeShimPid(pid, sb.ID) { | ||
| if err := syscall.Kill(pid, syscall.SIGKILL); err != nil && !errors.Is(err, syscall.ESRCH) { | ||
| // Don't detach a shim we couldn't kill: shims.Delete would drop it from | ||
| // containerd's management and orphan the shim (and its in-process VMM). | ||
| return fmt.Errorf("kill shim %d of sandbox %s: %w", pid, sb.ID, err) | ||
| } | ||
| } |
There was a problem hiding this comment.
TOCTOU race: PID verified then signaled, but PID can be recycled between the two operations
isCubeShimPid() reads /proc/<pid>/cmdline on line 180, and then syscall.Kill(pid, SIGKILL) is called on line 181. If the shim exits between these two calls, the kernel can recycle that PID to an unrelated process (another sandbox's VMM, a Cubelet worker, etc.), and the SIGKILL terminates an innocent process.
On Linux 5.3+, use pidfd_open(pid, 0) to pin the PID, verify via /proc/self/fdinfo/<pidfd>, and signal through the pidfd using pidfd_send_signal. This closes the recycle window because the pidfd holds a reference preventing PID reuse.
If pidfd is unavailable, explicitly document this race and consider that shims.Delete (line 188) is the safer primary mechanism — the manual syscall.Kill is a best-effort emergency measure.
| if info.GetStatus().IsPaused() { | ||
| if er := l.destroySandboxByBinary(ctx, info); er != nil { | ||
| stepLog.Warnf("CleanUp force delete sandbox %s by binary fail: %v", sandBoxID, er) | ||
| } |
There was a problem hiding this comment.
Race condition: CleanUp checks IsPaused() without holding the sandbox lock
The Destroy path (destroy.go:66) holds sb.Lock() before checking IsPaused(), preventing a concurrent Resume from changing state mid-flight. But here in CleanUp, no lock is held on info. If a concurrent Resume transitions the sandbox from paused to running between the IsPaused() check and the destroySandboxByBinary call, this code will force-kill a now-running sandbox with active workloads.
Consider acquiring the sandbox lock before the IsPaused() check, or atomically re-verify the state under the lock before the destructive operation.
| if err := l.destroySandboxByBinary(ctx, sb); err != nil { | ||
| result = multierror.Append(result, fmt.Errorf("delete sandbox by binary failed: %w", err)) | ||
| } |
There was a problem hiding this comment.
Missing container metadata/filesystem cleanup in the paused Destroy path
When the paused path is taken, destroySandboxByBinary is called (which kills the shim and detaches it from containerd), but the container-by-container teardown in the else branch (lines 129-162) is entirely skipped. This means:
deleteContainer()is never called — containerd container metadata entries remain in boltdbrootfs.CleanRootfs()is not called for individual containers — rootfs mounts may leaktaskio.Clean()is not called — FIFO directories may leak
Compare with CleanUp (local.go:564-575), which DOES call cleanFn for each container after destroySandboxByBinary to perform this cleanup.
Consider adding container metadata cleanup after the runc.Clean call, something like:
for id := range sb.All() {
if cntr, _ := l.client.LoadContainer(ctx, id); cntr != nil {
_ = deleteContainer(ctx, cntr)
}
}
if first := sb.FirstContainer(); first != nil && first.Container != nil {
_ = deleteContainer(ctx, first.Container)
}
fslongjin
left a comment
There was a problem hiding this comment.
Issues
The intent (allowing deletion of paused sandboxes) is correct, but the implementation has problems:
-
Shouldn't write custom kill logic. The cube shim already implements
delete_shim()(srv.rs:56-72), which handles SIGKILL of the original shim process, socket cleanup, and removal of VM workdir and pause snapshot. The PR reimplements this in Cubelet with manual SIGKILL +/procPID verification (isCubeShimPid/cmdlineServesSandbox), duplicating the shim's own logic while also missing VM workdir and pause snapshot cleanup. -
Bypasses containerd's standard mechanism. containerd has a standard
binary.Deleteflow for unresponsive shims: launch the shim binary with thedeletesubcommand, which triggersdelete_shim()to handle cleanup. The PR doesn't use this path.
Suggestion
Use the shim's binary.Delete: launch the cube-shim binary with -id <sandboxID> delete, let the shim's own delete_shim() handle kill + resource cleanup, then call l.shims.Delete to remove from containerd. Cubelet doesn't need any custom kill logic.
Remove:
isCubeShimPid/cmdlineServesSandboxfunctions and their tests- Manual
syscall.Kill+/procverification logic
Revert:
sandboxDeletablecheck (unrelated to the paused sandbox fix)
A paused/frozen VM cannot serve its runtime (ttrpc) API, so the shim rejects the regular per-container delete with "sandbox not in normal state" and the sandbox can never be removed (error 130593/130506).
Route teardown of paused sandboxes through a binary force-delete that SIGKILLs the shim (reaping the in-process cube-hypervisor) and detaches it from the runtime manager, letting containerd's dead-shim cleanup reclaim the VM workdir and pause snapshot:
Verified end-to-end (running -> pause -> delete): RetCode 200, shim and VMM reaped, VM workdir / pause snapshot / bundle all reclaimed, no error spam.
Assisted-by: Cursor:Claude-Opus-4.8