fix(onboard): detach sandbox providers before rebuild deletes the sandbox#4943
Conversation
…dbox Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds a shared SANDBOX_PROVIDER_SUFFIXES contract and a pre-delete cleanup that best-effort detaches per-sandbox providers; onboard and destroy flows invoke the cleanup before sandbox deletion and tests cover tolerant-not-found cases, failures, redaction, length-capping, and ordering. ChangesSandbox Provider Cleanup Before Deletion
sequenceDiagram
participant CLI as CLI (onboard/destroy)
participant PreDelete as runSandboxProviderPreDeleteCleanup
participant OpenShell as runOpenshell
CLI->>PreDelete: runSandboxProviderPreDeleteCleanup(sandboxName, { runOpenshell, redact })
PreDelete->>OpenShell: sandbox provider detach <sandbox>-<suffix> (iterates SANDBOX_PROVIDER_SUFFIXES)
OpenShell-->>PreDelete: exit status + stdout/stderr
PreDelete->>CLI: return { detached, failures } (warn emitted for failures)
CLI->>OpenShell: openshell sandbox delete <sandbox> (runs after detach pass)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
PR Review AdvisorFindings: 2 needs attention, 6 worth checking, 0 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
Consider writing more tests for
Since last review detailsCurrent findings:
This is an automated advisory review. A human maintainer must make the final merge decision. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/onboard.ts`:
- Around line 3205-3212: The new explanatory comment block around the sandbox
deletion/ provider-detach logic (referencing "sandbox delete" and the follow-up
`provider delete` driven by `upsertMessagingProviders` with `replaceExisting:
true`) added near the sandbox teardown code is causing +20 net lines; trim or
relocate it to keep net file growth within CI budget—either shorten the comment
to a one- or two-line summary, move the full explanation to a nearby README or a
single shared comment at top of the file, or offset by removing or consolidating
other non-essential comments in the same region (also apply the same trimming
for the similar block around lines referenced near `upsertMessagingProviders` at
the second occurrence).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 2fd74ae6-b656-421c-8d08-9a6895645c13
📒 Files selected for processing (5)
src/lib/actions/sandbox/destroy.tssrc/lib/onboard.tssrc/lib/onboard/sandbox-provider-cleanup.tstest/destroy-cleanup-sandbox-services.test.tstest/sandbox-provider-cleanup.test.ts
E2E Advisor RecommendationRequired E2E: Dispatch hint: Auto-dispatched E2E: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
E2E Scenario Advisor RecommendationRequired scenario E2E: Dispatch required scenario E2E:
Full scenario advisor summaryE2E Scenario AdvisorBase: Required scenario E2E
Optional scenario E2E
Relevant changed files
|
… destroy Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ✅ All requested jobs passedRun: 27128281702
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/sandbox-provider-cleanup.test.ts (1)
15-26:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winFix the mock signature to match the actual
RunOpenshelltype.The mock function on line 20 only accepts one argument
(args: Argv), but the actualRunOpenshelltype accepts two:(args: string[], opts?: Record<string, unknown>). This causes a TypeScript compilation error at line 192 where the mock is called with both arguments, as confirmed by the static analysis hint.The actual
RunOpenshellsignature fromdestroy.tslines 55-58 is:type RunOpenshell = ( args: string[], opts?: Record<string, unknown>, ) => { status: number | null };🔧 Proposed fix
function buildRunOpenshell( responses: Map<string, RunResult>, defaultResponse: RunResult = { status: 0 }, ) { const calls: Argv[] = []; - const fn = vi.fn((args: Argv) => { + const fn = vi.fn((args: Argv, _opts?: Record<string, unknown>) => { calls.push(args); const key = args.join(" "); return responses.get(key) ?? defaultResponse; }); return { runOpenshell: fn, calls }; }Note: The parameter is prefixed with
_since it's unused, following the coding guideline for unused variables in TypeScript.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/sandbox-provider-cleanup.test.ts` around lines 15 - 26, The mock in buildRunOpenshell has the wrong signature — change the vi.fn to match the real RunOpenshell type so it accepts (args: string[], opts?: Record<string, unknown>) and preserve tracking of calls; specifically update the anonymous function used for runOpenshell to (args: string[], _opts?: Record<string, unknown>) => { calls.push(args as Argv); const key = args.join(" "); return responses.get(key) ?? defaultResponse; } so runOpenshell and buildRunOpenshell match the RunOpenshell signature and TypeScript no longer errors when the second opts argument is passed.Sources: Coding guidelines, Linters/SAST tools
🧹 Nitpick comments (1)
src/lib/onboard/sandbox-provider-cleanup.ts (1)
38-39: 💤 Low valueConsider adding word boundaries to lowercase variants for precision.
The regex correctly matches camelCase variants with word boundaries (
\bNotFound\b,\bNotAttached\b) but lacks them for lowercase variants (not\s+found,not\s+attached). This could theoretically match partial words like "not foundational", though in the specific context of OpenShell provider detach errors this is unlikely.For maximum precision, consider:
-const TOLERATED_DETACH_OUTPUT_RE = - /\bNotFound\b|\bNotAttached\b|not\s+found|not\s+attached/i; +const TOLERATED_DETACH_OUTPUT_RE = + /\bNotFound\b|\bNotAttached\b|\bnot\s+found\b|\bnot\s+attached\b/i;Given the narrow domain and the upstream tolerance pattern, this is optional.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/onboard/sandbox-provider-cleanup.ts` around lines 38 - 39, The current TOLERATED_DETACH_OUTPUT_RE constant lacks word boundaries around the lowercase alternatives ("not\s+found" and "not\s+attached"), which can match unintended substrings; update the regex in TOLERATED_DETACH_OUTPUT_RE to use word boundaries around those alternatives (e.g., \bnot\s+found\b and \bnot\s+attached\b) while preserving the existing \bNotFound\b, \bNotAttached\b variants and the case‑insensitive flag so the pattern remains precise and tolerant.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@test/sandbox-provider-cleanup.test.ts`:
- Around line 15-26: The mock in buildRunOpenshell has the wrong signature —
change the vi.fn to match the real RunOpenshell type so it accepts (args:
string[], opts?: Record<string, unknown>) and preserve tracking of calls;
specifically update the anonymous function used for runOpenshell to (args:
string[], _opts?: Record<string, unknown>) => { calls.push(args as Argv); const
key = args.join(" "); return responses.get(key) ?? defaultResponse; } so
runOpenshell and buildRunOpenshell match the RunOpenshell signature and
TypeScript no longer errors when the second opts argument is passed.
---
Nitpick comments:
In `@src/lib/onboard/sandbox-provider-cleanup.ts`:
- Around line 38-39: The current TOLERATED_DETACH_OUTPUT_RE constant lacks word
boundaries around the lowercase alternatives ("not\s+found" and
"not\s+attached"), which can match unintended substrings; update the regex in
TOLERATED_DETACH_OUTPUT_RE to use word boundaries around those alternatives
(e.g., \bnot\s+found\b and \bnot\s+attached\b) while preserving the existing
\bNotFound\b, \bNotAttached\b variants and the case‑insensitive flag so the
pattern remains precise and tolerant.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ad397301-b80b-4813-bdec-78db40796975
📒 Files selected for processing (4)
src/lib/actions/sandbox/destroy.tssrc/lib/onboard.tssrc/lib/onboard/sandbox-provider-cleanup.tstest/sandbox-provider-cleanup.test.ts
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/cli/destroy-detach-order.test.ts (1)
21-23: 💤 Low valueConsider adding cleanup for the temporary directory.
The test creates a temporary directory but doesn't explicitly clean it up after completion. While the OS will eventually clean
/tmp, adding an explicit cleanup (e.g.,fs.rmSync(home, { recursive: true })in anafterEachor try-finally block) would improve test hygiene.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/cli/destroy-detach-order.test.ts` around lines 21 - 23, The test creates a temporary directory assigned to the variable home but never removes it; add explicit cleanup by removing home after the test (e.g., call fs.rmSync(home, { recursive: true, force: true }) in an afterEach or finally block) so the temp directory is deleted; update the test harness around the code that creates home (or the surrounding describe/it) to ensure fs.rmSync runs even on failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@test/cli/destroy-detach-order.test.ts`:
- Around line 21-23: The test creates a temporary directory assigned to the
variable home but never removes it; add explicit cleanup by removing home after
the test (e.g., call fs.rmSync(home, { recursive: true, force: true }) in an
afterEach or finally block) so the temp directory is deleted; update the test
harness around the code that creates home (or the surrounding describe/it) to
ensure fs.rmSync runs even on failure.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b07f9e2d-9202-4ca7-b60b-e06aff998f2d
📒 Files selected for processing (3)
src/lib/onboard/sandbox-provider-cleanup.tstest/cli/destroy-detach-order.test.tstest/sandbox-provider-cleanup.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- test/sandbox-provider-cleanup.test.ts
- src/lib/onboard/sandbox-provider-cleanup.ts
Selective E2E Results — ✅ All requested jobs passedRun: 27129306865
|
Selective E2E Results — ✅ All requested jobs passedRun: 27130844850
|
…ach before resume upsert Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
…dbox list and retrying delete Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ✅ All requested jobs passedRun: 27133989585
|
…me detach, surface recovery failures Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ✅ All requested jobs passedRun: 27135372965
|
There was a problem hiding this comment.
🧹 Nitpick comments (2)
src/lib/onboard/providers.test.ts (2)
409-409: ⚡ Quick winUse consistent type casting for process.exit mock.
For better type safety and consistency with the existing test at line 298-300, consider using
as typeof process.exitinstead ofas never.♻️ Suggested improvement for consistency
- process.exit = ((code?: number) => { + process.exit = ((code?: number | string | null) => { - throw new Error(`exit(${code})`); + throw new Error(`exit(${code ?? 1})`); - }) as never; + }) as typeof process.exit;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/onboard/providers.test.ts` at line 409, The test replaces process.exit with a mock using an unsafe cast; update the mock assignment of process.exit (the line assigning process.exit = ((code?: number) => { ... })) to use a consistent and type-safe cast "as typeof process.exit" instead of "as never" so the mocked function matches the original process.exit signature and aligns with the earlier test pattern.
398-398: 💤 Low valueConsider using
constfor immutable variable.Since
originalExitis never reassigned,constwould more clearly communicate the immutability intent.♻️ Optional style improvement
- let originalExit: typeof process.exit = process.exit; + const originalExit: typeof process.exit = process.exit;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/onboard/providers.test.ts` at line 398, Change the declaration of the test variable originalExit from a mutable binding to an immutable one by replacing "let originalExit: typeof process.exit = process.exit" with a const declaration (retain the type annotation typeof process.exit and the assigned value), so the intent that originalExit is never reassigned is explicit in the test setup for the providers tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/lib/onboard/providers.test.ts`:
- Line 409: The test replaces process.exit with a mock using an unsafe cast;
update the mock assignment of process.exit (the line assigning process.exit =
((code?: number) => { ... })) to use a consistent and type-safe cast "as typeof
process.exit" instead of "as never" so the mocked function matches the original
process.exit signature and aligns with the earlier test pattern.
- Line 398: Change the declaration of the test variable originalExit from a
mutable binding to an immutable one by replacing "let originalExit: typeof
process.exit = process.exit" with a const declaration (retain the type
annotation typeof process.exit and the assigned value), so the intent that
originalExit is never reassigned is explicit in the test setup for the providers
tests.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 58be8791-aa4a-4e52-a3d3-ba29ce01aac0
📒 Files selected for processing (6)
src/lib/actions/sandbox/destroy.tssrc/lib/onboard.tssrc/lib/onboard/providers.test.tssrc/lib/onboard/providers.tssrc/lib/onboard/sandbox-provider-cleanup.tstest/sandbox-provider-cleanup.test.ts
🚧 Files skipped from review as they are similar to previous changes (5)
- src/lib/onboard/providers.ts
- src/lib/onboard.ts
- src/lib/actions/sandbox/destroy.ts
- test/sandbox-provider-cleanup.test.ts
- src/lib/onboard/sandbox-provider-cleanup.ts
…alidate parsed sandbox names Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ✅ All requested jobs passedRun: 27136992165
|
Selective E2E Results — ✅ All requested jobs passedRun: 27137705209
|
Selective E2E Results — ✅ All requested jobs passedRun: 27147228417
|
Summary
nemoclaw onboardrebuilds (andnemoclaw <name> destroy) tore down the sandbox without first detaching its attached messaging and search providers, so the immediately-followingprovider delete(driven by either the destroy cleanup loop orupsertMessagingProviders({replaceExisting: true})on rebuild) hitFailedPrecondition: provider '<name>' is attached to sandbox(es): <sandbox>and the per-failureprocess.exit(1)aborted onboard. The same orphan-attachment also brokeonboard --resumewhen the previous sandbox had been pruned out from under NemoClaw.This PR adds an explicit
sandbox provider detachpass over the shared per-sandbox provider suffix set before every destructive lifecycle step, recovers from any residual FailedPrecondition by parsing the attached-sandbox list out of the diagnostic and forcing a per-sandbox detach + retry, and surfaces destroy-time residuals with a manual cleanup hint.Related Issue
Fixes #4890
Changes
src/lib/onboard/sandbox-provider-cleanup.ts:SANDBOX_PROVIDER_SUFFIXESshared suffix set (now includesbrave-search).detachSandboxProviders()— best-effort detach across the suffix set, with a tightened tolerance regex that only acceptsNotAttached/ "not attached" and provider-scopedNotFound/ "not found"; bare sandbox-not-found is intentionally not tolerated so stale-attachment failures stay visible.runSandboxProviderPreDeleteCleanup()— wraps the detach pass, redacts and length-caps any non-tolerated failure output through the injectedwarnchannel, and returns the failure list to callers that want stricter semantics.parseAttachedSandboxes()+recoverAttachedProvider()— parse a FailedPrecondition diagnostic of the shapeprovider 'X' is attached to sandbox(es): A, Band detach the provider from each named sandbox, even when the local pre-delete pass couldn't reach the now-missing sandbox (resume case).src/lib/onboard.tsrebuild path now callsrunSandboxProviderPreDeleteCleanup(sandboxName)immediately beforerunOpenshell(["sandbox", "delete", ...]), and again right beforeupsertMessagingProviders({replaceExisting: true})so the--resume-after-prune path also clears stale attachments before the replace-existing provider deletes run.src/lib/onboard/providers.tsupsertProviderrecovery: on aprovider deleteFailedPrecondition with an attached-sandbox list, force the detach viarecoverAttachedProviderand retry the delete before failing.src/lib/actions/sandbox/destroy.tsconsumes the shared constant in place of its previous hard-coded suffix list (which addsbrave-searchto the destroy-time cleanup), runsrunSandboxProviderPreDeleteCleanupbefore its ownsandbox delete, captures the result, and prints a loud manual-cleanup hint at the end if any non-tolerated detach failures occurred.NotAttachedspelling, sandbox-not-found is not tolerated, redaction, length cap, parser shapes, recovery success / partial / failure), 1 providers integration test asserting the fullget → delete → recover → delete → createchain on FailedPrecondition, 1 CLI integration test asserting the destroy path issues all six detach calls beforesandbox delete, and the existingtest/destroy-cleanup-sandbox-services.test.tsassertion for the expanded suffix list.Type of Change
Verification
npx prek run --all-filespassesnpm testpassesnpm run docsbuilds without warnings (doc changes only)Signed-off-by: Tinson Lai tinsonl@nvidia.com
Summary by CodeRabbit
Bug Fixes
New Features
Tests