fix(onboard): harden GPU patch reconnect and rc cleanup against non-fatal failures by laitingsheng · Pull Request #4777 · NVIDIA/NemoClaw

laitingsheng · 2026-06-04T18:11:28Z

Summary

Two related hardening changes on the Docker GPU patch path.

The supervisor reconnect wait now gets a larger default debounce window. The patch also rolls back to the pre-patch container when reconnect fails. A transient or environment-specific reconnect issue no longer leaves the user with a deleted-backup, failed-new state.

The non-root rc shim cleanup at container start now exits cleanly when it cannot legally rewrite a root-owned rc file. Previously it crashed the container under errexit.

Related Issue

Fixes #4664
Fixes #4713

Changes

Raised DOCKER_GPU_SUPERVISOR_RECONNECT_ERROR_PHASE_DEFAULT_DEBOUNCE_POLLS from 5 to 15. That is ~30 seconds of sustained Error tolerated instead of ~10 seconds. The original WSL2 repro for [WSL2 x86_64][Sandbox] OpenShell supervisor fails to reconnect to GPU-patched sandbox container; sandbox enters Error phase #4664 showed about 12 seconds of sandbox list cache divergence after recreate, sitting at the edge of the old window on slower hosts.
Reordered recreateOpenShellDockerSandboxWithGpu so the supervisor reconnect wait runs before the backup container is removed. If the wait fails, a new rollbackToBackupContainer helper takes over. It stops and removes the new GPU container, renames the backup back to the original name, then starts it. The thrown error message reports whether the rollback succeeded.
Added a dockerStart adapter alongside the existing dockerStop and friends. Threaded it through DockerGpuPatchDeps and depsWithDefaults so the rollback path can issue the start.
Extended DockerGpuPatchFailureContext with a rolledBack flag. Surfaced it as a rolled_back= line in the on-disk diag summary.
Added an ownership guard to the ensure_runtime_shell_env_shim Python heredoc in scripts/nemoclaw-start.sh. When the rc file is not owned by the current uid and we are not root, the cleanup now logs a [SECURITY] marker and exits 0. Previously it tried fchmod on a file it could not chmod, raised EPERM, and took the entire container down through errexit. This matches the failure path observed in sandbox container is fallen with exit code 1 when deploying NemoClaw with GPU #4713.
Tests cover the new debounce default, the rollback success path, the rollback restore-failure path, and both the ownership-skip and owner-rewrite branches of the rc cleanup.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai tinsonl@nvidia.com

Summary by CodeRabbit

Bug Fixes
- GPU patch rollback now preserves backups and reliably restores them if supervisor reconnect fails; backups removed only after successful reconnect (or reported when not).
Improvements
- Increased supervisor-reconnect debounce window (default ~30s) for more stable GPU patching.
- Entrypoint cleanup now delegates to an installed runtime helper (with configurable fallback) and ships that helper in the runtime image.
- Added a helper to start Docker containers programmatically.
Tests
- New and expanded tests covering rc cleanup, finalize/rollback and composed GPU patch flows.
Documentation
- Updated troubleshooting guidance to reflect new debounce defaults.

…atal failures Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

coderabbitai · 2026-06-04T18:11:43Z

Warning

Review limit reached

@laitingsheng, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 58 minutes and 43 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9098dfb5-8f59-4ec5-a02b-3fef337e5615

📥 Commits

Reviewing files that changed from the base of the PR and between 756d64c and de04db9.

📒 Files selected for processing (2)

src/lib/sandbox/build-context.ts
test/sandbox-build-context.test.ts

📝 Walkthrough

Walkthrough

Adds supervisor-reconnect reconciliation and finalize/rollback for Docker GPU patches, bumps reconnect debounce default to 15 polls, exports a dockerStart helper, and extracts rc-file cleanup into a standalone Python helper included in the runtime image with tests and startup wiring updates.

Changes

Docker GPU patch rollback and reconnect tuning

Layer / File(s)	Summary
Docker start helper `src/lib/adapters/docker/container.ts`	`dockerStart(containerName, opts)` exported as a wrapper around `dockerRun(["start", ...])` used by finalize/rollback logic.
Supervisor reconnect debounce tuning `src/lib/onboard/docker-gpu-supervisor-reconnect.ts`, `src/lib/onboard/docker-gpu-supervisor-reconnect.test.ts`, `docs/reference/troubleshooting.mdx`, `.agents/...`, `skills/...`	Default consecutive Error-phase debounce polls increased from 5→15; tests and troubleshooting docs updated to reflect new default and example override guidance.
Finalize / rollback API and impl `src/lib/onboard/docker-gpu-patch-finalize.ts`, `src/lib/onboard/docker-gpu-patch.ts`	Adds `rollbackToBackupContainer`, `finalizeDockerGpuPatchBackup`, and `reconcileSupervisorReconnect` with Docker-dep resolution and zero-status handling; updates types and default deps.
Sandbox create: finalize wiring & overrides `src/lib/onboard/docker-gpu-sandbox-create.ts`, `src/lib/onboard/docker-gpu-patch.ts`	Wires finalize into recreate/create failure path, adds overrideable test seams, extends patch result/context (`originalName`, `backupRemoved`, `rolledBack`), and persists `rolled_back` to diagnostics.
GPU patch tests and ordering assertions `src/lib/onboard/docker-gpu-patch.test.ts`, `src/lib/onboard/docker-gpu-patch-finalize.test.ts`, `src/lib/onboard/docker-gpu-patch-rollback.test.ts`, `src/lib/onboard/docker-gpu-sandbox-create.test.ts`	Adds/updates tests covering backup removal ordering, deferred cleanup, reconnect-failure rollback, restore/start failure cases, finalize outcomes, and composed create flow behavior.

RC cleanup packaging and ownership guard

Layer / File(s)	Summary
Packaged rc cleanup script and image wiring `scripts/lib/clean_runtime_shell_env_shim.py`, `scripts/nemoclaw-start.sh`, `Dockerfile`	Extracts inline heredoc cleanup to `clean_runtime_shell_env_shim.py`, resolves helper by precedence (`/usr/local/lib/nemoclaw/...` → `NEMOCLAW_RC_CLEAN_SCRIPT` → `scripts/lib/...`), calls it with `(rc_file, shim_text, uid)`, includes script in image, and updates permissions. Script performs secure open, ownership checks, in-place rewrite or atomic replace, and strict error reporting.
RC cleanup tests and startup wrapper updates `test/clean-runtime-shell-env-shim.test.ts`, `test/nemoclaw-start.test.ts`, `test/service-env.test.ts`, `test/sandbox-provisioning.test.ts`	Adds Vitest coverage for owned-file cleanup, ownership-skip diagnostics, symlink refusal, idempotence, and partial-stanza removal; updates startup-snippet extraction and test wrapper headers; adds composed regression test for uid-mismatch verifying `proxy-env.sh` mode `444` and unchanged `.bashrc`.

Sequence Diagram

sequenceDiagram
  participant Recreate as recreateOpenShellDockerSandboxWithGpu
  participant DockerAdapter as Docker adapter
  participant Supervisor as waitForOpenShellSupervisorReconnect
  participant Finalize as finalizeDockerGpuPatchBackup

  Recreate->>DockerAdapter: create GPU-enabled container (rename/backup → new)
  DockerAdapter-->>Recreate: new container created
  Recreate->>Supervisor: waitForOpenShellSupervisorReconnect (debounced)
  alt supervisor reconnects
    Supervisor-->>Recreate: execReady
    Recreate->>DockerAdapter: docker rm backup container
    DockerAdapter-->>Recreate: rm result
    Recreate-->>Recreate: return success (backupRemoved=true)
  else reconnect fails
    Supervisor-->>Recreate: timeout/error
    Recreate->>Finalize: finalizeDockerGpuPatchBackup({result, supervisorReady:false})
    Finalize->>DockerAdapter: stop/remove new container
    Finalize->>DockerAdapter: rename backup -> original
    Finalize->>DockerAdapter: docker start original (dockerStart)
    DockerAdapter-->>Finalize: start result
    Finalize-->>Recreate: {backupRemoved:false, rolledBack}
    Recreate-->>Recreate: record rolledBack in failure context and throw error
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

NVIDIA/NemoClaw#4668: Related supervisor-reconnect debounce logic and its introduction.
NVIDIA/NemoClaw#4407: Related GPU patch diagnostics changes that touch summary.txt output.

Suggested labels

fix, Sandbox, Docker, area: onboarding, platform: container

Suggested reviewers

prekshivyas
cv

"🐰 A rollback hop when reconnects stall,
The backup waits — it steadies all.
Fifteen beats the watchful drum,
Start the old, or let the new one hum.
RC guards keep files safe for all."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 4.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'fix(onboard): harden GPU patch reconnect and rc cleanup against non-fatal failures' accurately and concisely describes the main changes: increasing reconnect tolerance, adding rollback support, and hardening rc cleanup against non-fatal failures.
Linked Issues check	✅ Passed	The PR comprehensively addresses both linked issues `#4664` and `#4713`: increases debounce polls for WSL2 timing race (`#4664`), implements rollback path when supervisor reconnect fails, adds dockerStart adapter, hardens rc cleanup with ownership guard against non-fatal failures (`#4713`), and includes test coverage for all hardened paths.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to addressing GPU patch reconnect hardening and rc cleanup ownership issues. Modifications include debounce constant updates, rollback logic, supervisorReconnect reconciliation, rc cleanup safety, Docker adapter additions, and corresponding test coverage with no unrelated changes detected.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/4664-4713-gpu-patch-hardening

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-04T18:12:04Z

E2E Advisor Recommendation

Required E2E: test-non-root-sandbox-smoke, test-e2e-sandbox, cloud-e2e, openclaw-onboard-security-posture-e2e, gpu-e2e
Optional E2E: gpu-double-onboard-e2e, sandbox-operations-e2e, runtime-overrides-e2e, test-e2e-gateway-isolation

Dispatch hint: cloud-e2e,openclaw-onboard-security-posture-e2e,gpu-e2e

Auto-dispatched E2E: cloud-e2e, openclaw-onboard-security-posture-e2e, gpu-e2e via nightly-e2e.yaml at de04db9e067397c0d30b2e1ddfd160cd2ada3fa5 — nightly run

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

test-non-root-sandbox-smoke (low): Validates the production sandbox entrypoint under --security-opt no-new-privileges. This is the most direct existing E2E guard for the new rc cleanup helper and non-root startup path in nemoclaw-start.sh.
test-e2e-sandbox (medium): Builds the changed Dockerfile into a sandbox image and runs the in-container sandbox/blueprint smoke suite, catching Dockerfile copy/chmod/build-context regressions before merge.
cloud-e2e (high): Runs the full install → onboard → sandbox verify → live inference user journey. Required because the PR changes sandbox startup, provisioning/build context, and Docker-backed onboarding behavior that can break non-GPU users too.
openclaw-onboard-security-posture-e2e (high): Full OpenClaw onboarding with non-root host/security-posture assertions is required for the rc-file cleanup/security-boundary change and proxy-env trust invariant.
gpu-e2e (high): Exercises real local Ollama GPU onboarding on a GPU runner. Required because the PR changes the Docker GPU patch/create path, supervisor reconnect wait, backup removal, and rollback-adjacent lifecycle behavior.

Optional E2E

gpu-double-onboard-e2e (high): Useful adjacent coverage for repeated Ollama/GPU onboarding and token/proxy consistency after the GPU patch changes, but the core success path is covered by gpu-e2e.
sandbox-operations-e2e (high): Adds confidence for general sandbox lifecycle operations, multi-sandbox behavior, and gateway recovery after changes to sandbox provisioning/build context and container lifecycle helpers.
runtime-overrides-e2e (medium): Adjacent confidence for runtime environment propagation after moving legacy shell-env cleanup into an installed Python helper.
test-e2e-gateway-isolation (medium): Optional image-level security boundary check because Dockerfile and startup environment wiring changed, though the PR does not directly alter network isolation policy.

New E2E recommendations

Docker GPU rollback on supervisor reconnect failure (high): Existing GPU E2E validates successful GPU onboarding, while the new rollback/finalize behavior is primarily unit-tested. Add an E2E that forces or simulates post-recreate supervisor reconnect failure and asserts the backup container is restored, the backup is not prematurely removed, and diagnostics report rolled_back correctly.
- Suggested test: docker-gpu-supervisor-reconnect-rollback-e2e
legacy rc shim cleanup with mismatched ownership (medium): Unit tests cover the helper, and non-root smoke covers startup, but there is no full image E2E fixture that starts with a legacy /tmp/nemoclaw-proxy-env.sh rc stanza in a mismatched-owner rc file and verifies startup remains safe and proxy env file permissions stay hardened.
- Suggested test: legacy-runtime-shell-env-shim-mismatched-owner-e2e

Dispatch hint

Workflow: .github/workflows/nightly-e2e.yaml
jobs input: cloud-e2e,openclaw-onboard-security-posture-e2e,gpu-e2e

github-actions · 2026-06-04T18:12:05Z

E2E Scenario Advisor Recommendation

Required scenario E2E: ubuntu-repo-cloud-openclaw, gpu-repo-local-ollama-openclaw
Optional scenario E2E: wsl-repo-cloud-openclaw

Dispatch required scenario E2E:

gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw
gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=gpu-repo-local-ollama-openclaw

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

ubuntu-repo-cloud-openclaw: Primary Ubuntu repo-checkout Docker scenario exercises Dockerfile/build-context changes plus sandbox startup via scripts/nemoclaw-start.sh and the new rc-cleanup helper during normal onboarding and smoke validation.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw
gpu-repo-local-ollama-openclaw: Only routed scenario that exercises the NVIDIA GPU Docker runtime path affected by docker-gpu-patch, supervisor reconnect, deferred backup finalization, rollback, and dockerStart adapter changes.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=gpu-repo-local-ollama-openclaw

Optional scenario E2E

wsl-repo-cloud-openclaw: Optional adjacent platform coverage for sandbox startup and Docker Desktop/WSL behavior; useful because the changed startup and GPU reconnect documentation mention slow WSL2 + Docker Desktop setups, but the routed WSL scenario does not exercise the GPU patch path directly.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=wsl-repo-cloud-openclaw

Relevant changed files

Dockerfile
scripts/lib/clean_runtime_shell_env_shim.py
scripts/nemoclaw-start.sh
src/lib/adapters/docker/container.ts
src/lib/onboard/docker-gpu-patch-finalize.ts
src/lib/onboard/docker-gpu-patch.ts
src/lib/onboard/docker-gpu-sandbox-create.ts
src/lib/onboard/docker-gpu-supervisor-reconnect.ts
src/lib/sandbox/build-context.ts

github-actions · 2026-06-04T18:12:08Z

PR Review Advisor

Findings: 0 needs attention, 2 worth checking, 1 nice ideas
Since last review: 1 prior item resolved, 3 still apply, 0 new items found

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Runtime validation is still important for the changed sandbox lifecycle paths: The PR adds useful unit and composed tests, including rc cleanup fixture tests and mocked Docker rollback/finalize tests. However, the changed behavior depends on real Docker container state, OpenShell supervisor reconnect timing, rollback side effects, production image helper precedence, root/non-root file ownership, and /tmp trust-boundary permissions. Those interactions are security-sensitive and not fully proven by mocks or extracted shell snippets.
- Recommendation: Add or identify targeted runtime validation for the behavior-specific cases listed in testDepth.suggestedTests, focused on actual Docker/OpenShell and ownership behavior rather than external job status.
- Evidence: Changed surfaces include scripts/nemoclaw-start.sh, scripts/lib/clean_runtime_shell_env_shim.py, Dockerfile, src/lib/onboard/docker-gpu-patch-finalize.ts, src/lib/onboard/docker-gpu-patch.ts, src/lib/onboard/docker-gpu-sandbox-create.ts, and src/lib/onboard/docker-gpu-supervisor-reconnect.ts. Tests cover helper and rollback contracts mostly through fixtures/mocks.
[WSL2 x86_64][Sandbox] OpenShell supervisor fails to reconnect to GPU-patched sandbox container; sandbox enters Error phase #4664 Health=starting retry clause is addressed by an alternate design (src/lib/onboard/docker-gpu-supervisor-reconnect.ts:39): One [WSL2 x86_64][Sandbox] OpenShell supervisor fails to reconnect to GPU-patched sandbox container; sandbox enters Error phase #4664 suggested fix says to retry reconnect when the patched Docker container is `Status=running` plus `Health=starting`. This PR intentionally implements a generic Error-phase debounce plus rollback path instead; that may be acceptable, but the reconnect wait still does not inspect or test the exact Docker State/Health condition from the issue.
- Recommendation: Confirm the debounce-plus-rollback alternate design is the accepted resolution for [WSL2 x86_64][Sandbox] OpenShell supervisor fails to reconnect to GPU-patched sandbox container; sandbox enters Error phase #4664, or add a Docker-state observation path and behavior test for `Status=running`, `ExitCode=0`, `Health=starting` while `openshell sandbox list` temporarily reports Error.
- Evidence: [WSL2 x86_64][Sandbox] OpenShell supervisor fails to reconnect to GPU-patched sandbox container; sandbox enters Error phase #4664 suggested fix 2 is: "If the patched container is `Status=running` + `Health=starting`, retry reconnect rather than declaring fatal". The code comment in docker-gpu-supervisor-reconnect.ts explains this was considered and rejected in favor of generic debounce plus rollback.

🌱 Nice ideas

Deferred create success path still drops backup cleanup failure status (src/lib/onboard/docker-gpu-sandbox-create.ts:184): The helper now returns `backupRemoved=false` when backup removal fails after a successful supervisor reconnect, but the deferred create caller returns immediately on `supervisorReady` without surfacing or logging that incomplete cleanup state. The sandbox is healthy, so this is not fatal, but it can leave a leaked backup container without a clear diagnostic from this path.
- Recommendation: Consider logging or otherwise preserving `finalizeOutcome.backupRemoved=false` on the `supervisorReady=true` path, and add a behavior test where the deferred create flow succeeds but backup `docker rm` fails.
- Evidence: finalizeDockerGpuPatchBackup returns `{ backupRemoved: false, rolledBack: false }` when docker rm fails, while waitForSupervisorReconnectIfNeeded calls finalizeBackup and then immediately returns when `supervisorReady` is true.

Consider writing more tests for

**Runtime validation** — Runtime/image: installed `/usr/local/lib/nemoclaw/clean_runtime_shell_env_shim.py` exists while `NEMOCLAW_RC_CLEAN_SCRIPT` points to a failing executable, and `ensure_runtime_shell_env_shim` still uses the installed helper.. The PR adds meaningful unit and composed tests, but the changed runtime/sandbox/infrastructure behavior still depends on real Docker daemon semantics, OpenShell supervisor timing, container rollback side effects, production image staging, root/non-root file ownership, and /tmp trust-boundary permissions.
**Runtime validation** — Runtime/image: non-root `nemoclaw-start` with a root-owned `.bashrc` containing the legacy shim exits 0 and leaves `/tmp/nemoclaw-proxy-env.sh` non-user-writable or mode 444 where privilege separation applies.. The PR adds meaningful unit and composed tests, but the changed runtime/sandbox/infrastructure behavior still depends on real Docker daemon semantics, OpenShell supervisor timing, container rollback side effects, production image staging, root/non-root file ownership, and /tmp trust-boundary permissions.
**Runtime validation** — Real Docker/OpenShell: after a replacement GPU container starts but supervisor reconnect fails, the backup container is restored under the original name and started.. The PR adds meaningful unit and composed tests, but the changed runtime/sandbox/infrastructure behavior still depends on real Docker daemon semantics, OpenShell supervisor timing, container rollback side effects, production image staging, root/non-root file ownership, and /tmp trust-boundary permissions.
**Runtime validation** — Real Docker/OpenShell: supervisor reconnect succeeds but backup `docker rm` fails, and the caller logs or persists that backup cleanup was incomplete.. The PR adds meaningful unit and composed tests, but the changed runtime/sandbox/infrastructure behavior still depends on real Docker daemon semantics, OpenShell supervisor timing, container rollback side effects, production image staging, root/non-root file ownership, and /tmp trust-boundary permissions.
**Runtime validation** — Unit behavior: `waitForOpenShellSupervisorReconnect` sees 14 consecutive Error polls followed by Ready at poll 15 and succeeds with the default K=15.. The PR adds meaningful unit and composed tests, but the changed runtime/sandbox/infrastructure behavior still depends on real Docker daemon semantics, OpenShell supervisor timing, container rollback side effects, production image staging, root/non-root file ownership, and /tmp trust-boundary permissions.
**Runtime validation is still important for the changed sandbox lifecycle paths** — Add or identify targeted runtime validation for the behavior-specific cases listed in testDepth.suggestedTests, focused on actual Docker/OpenShell and ownership behavior rather than external job status.
**Acceptance clause:** [WSL2 x86_64][Sandbox] OpenShell supervisor fails to reconnect to GPU-patched sandbox container; sandbox enters Error phase #4664 Description: "the GPU-patched container starts with `--gpus all` OK and shows `Status=running`, `ExitCode=0`, `Health=starting`." — add test evidence or identify existing coverage. The PR raises the generic Error-phase debounce to 15 polls and tests transient Error recovery, but waitForOpenShellSupervisorReconnect does not inspect Docker State.Status, ExitCode, or Health.Status.
**Acceptance clause:** [WSL2 x86_64][Sandbox] OpenShell supervisor fails to reconnect to GPU-patched sandbox container; sandbox enters Error phase #4664 Expected Result: "After GPU patch recreates the sandbox container with `--gpus all`, OpenShell supervisor should reconnect to the new container, the sandbox phase should transition to `Ready`, and onboard should proceed to `[7/8]` / `[8/8]`." — add test evidence or identify existing coverage. createDockerGpuSandboxCreatePatch waits for supervisor reconnect and finalizes the backup after `supervisorReady=true`, with composed tests for that flow. Issue comments report PR fix(onboard): harden GPU patch reconnect and rc cleanup against non-fatal failures #4777 reaching `[7/8]` and `[8/8]`, but there is no in-repo test that drives the full onboard progression through those phases.

Since last review details

Current findings:

Runtime validation is still important for the changed sandbox lifecycle paths: The PR adds useful unit and composed tests, including rc cleanup fixture tests and mocked Docker rollback/finalize tests. However, the changed behavior depends on real Docker container state, OpenShell supervisor reconnect timing, rollback side effects, production image helper precedence, root/non-root file ownership, and /tmp trust-boundary permissions. Those interactions are security-sensitive and not fully proven by mocks or extracted shell snippets.
- Recommendation: Add or identify targeted runtime validation for the behavior-specific cases listed in testDepth.suggestedTests, focused on actual Docker/OpenShell and ownership behavior rather than external job status.
- Evidence: Changed surfaces include scripts/nemoclaw-start.sh, scripts/lib/clean_runtime_shell_env_shim.py, Dockerfile, src/lib/onboard/docker-gpu-patch-finalize.ts, src/lib/onboard/docker-gpu-patch.ts, src/lib/onboard/docker-gpu-sandbox-create.ts, and src/lib/onboard/docker-gpu-supervisor-reconnect.ts. Tests cover helper and rollback contracts mostly through fixtures/mocks.
[WSL2 x86_64][Sandbox] OpenShell supervisor fails to reconnect to GPU-patched sandbox container; sandbox enters Error phase #4664 Health=starting retry clause is addressed by an alternate design (src/lib/onboard/docker-gpu-supervisor-reconnect.ts:39): One [WSL2 x86_64][Sandbox] OpenShell supervisor fails to reconnect to GPU-patched sandbox container; sandbox enters Error phase #4664 suggested fix says to retry reconnect when the patched Docker container is `Status=running` plus `Health=starting`. This PR intentionally implements a generic Error-phase debounce plus rollback path instead; that may be acceptable, but the reconnect wait still does not inspect or test the exact Docker State/Health condition from the issue.
- Recommendation: Confirm the debounce-plus-rollback alternate design is the accepted resolution for [WSL2 x86_64][Sandbox] OpenShell supervisor fails to reconnect to GPU-patched sandbox container; sandbox enters Error phase #4664, or add a Docker-state observation path and behavior test for `Status=running`, `ExitCode=0`, `Health=starting` while `openshell sandbox list` temporarily reports Error.
- Evidence: [WSL2 x86_64][Sandbox] OpenShell supervisor fails to reconnect to GPU-patched sandbox container; sandbox enters Error phase #4664 suggested fix 2 is: "If the patched container is `Status=running` + `Health=starting`, retry reconnect rather than declaring fatal". The code comment in docker-gpu-supervisor-reconnect.ts explains this was considered and rejected in favor of generic debounce plus rollback.
Deferred create success path still drops backup cleanup failure status (src/lib/onboard/docker-gpu-sandbox-create.ts:184): The helper now returns `backupRemoved=false` when backup removal fails after a successful supervisor reconnect, but the deferred create caller returns immediately on `supervisorReady` without surfacing or logging that incomplete cleanup state. The sandbox is healthy, so this is not fatal, but it can leave a leaked backup container without a clear diagnostic from this path.
- Recommendation: Consider logging or otherwise preserving `finalizeOutcome.backupRemoved=false` on the `supervisorReady=true` path, and add a behavior test where the deferred create flow succeeds but backup `docker rm` fails.
- Evidence: finalizeDockerGpuPatchBackup returns `{ backupRemoved: false, rolledBack: false }` when docker rm fails, while waitForSupervisorReconnectIfNeeded calls finalizeBackup and then immediately returns when `supervisorReady` is true.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard/docker-gpu-patch.ts`:
- Around line 896-903: The rollback currently treats dockerStart as
fire-and-forget and always returns true; update the rollback to capture the
result of d.dockerStart(refs.originalName, containerOpts), check it with
isZeroStatus (like the restored variable), and return false if dockerStart fails
(non-zero), otherwise return true; reference d.dockerRename/restored,
isZeroStatus, d.dockerStart, refs.backupContainerName and refs.originalName so
the change is applied to the same rollback flow and ensure any caller sees the
actual start failure.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8c35031f-d881-49ad-b95c-b752b4a841f6

📥 Commits

Reviewing files that changed from the base of the PR and between 949dde7 and 8f580c2.

📒 Files selected for processing (7)

scripts/nemoclaw-start.sh
src/lib/adapters/docker/container.ts
src/lib/onboard/docker-gpu-patch.test.ts
src/lib/onboard/docker-gpu-patch.ts
src/lib/onboard/docker-gpu-supervisor-reconnect.test.ts
src/lib/onboard/docker-gpu-supervisor-reconnect.ts
test/nemoclaw-start.test.ts

…ainer Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

github-actions · 2026-06-04T18:45:53Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26972019075
Target ref: bf9577bd8a47131c80bd9d6dab412ae16622a62a
Workflow ref: main
Requested jobs: gpu-e2e,openclaw-onboard-security-posture-e2e
Summary: 1 passed, 0 failed, 1 skipped

Job	Result
gpu-e2e	⏭️ skipped
openclaw-onboard-security-posture-e2e	✅ success

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard/docker-gpu-sandbox-create.ts`:
- Around line 157-161: Update the deferred finalize error message to explicitly
indicate rollback failure when finalizeOutcome.rolledBack is false: locate the
new Error(...) construction that uses finalizeOutcome.rolledBack and replace the
generic "did not reconnect" message in the rolledBack === false branch with a
clear message such as "OpenShell supervisor did not reconnect to the GPU-enabled
container and rollback failed; pre-patch sandbox was NOT restored" (or
equivalent wording) so operators can distinguish reconnect-only failures from
failed rollbacks; keep the rolledBack === true message unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 036a97c9-da7c-4dd1-a309-28633f77f5a0

📥 Commits

Reviewing files that changed from the base of the PR and between 8f580c2 and 73a053a.

📒 Files selected for processing (3)

src/lib/onboard/docker-gpu-patch.test.ts
src/lib/onboard/docker-gpu-patch.ts
src/lib/onboard/docker-gpu-sandbox-create.ts

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

github-actions · 2026-06-05T02:16:14Z

🌿 Preview your docs: https://nvidia-preview-pr-4777.docs.buildwithfern.com/nemoclaw

github-actions · 2026-06-05T02:25:18Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26991352191
Target ref: 97c77f53e07779c44d706469d055a0b908708f59
Workflow ref: main
Requested jobs: gpu-e2e,gpu-double-onboard-e2e,openclaw-onboard-security-posture-e2e
Summary: 1 passed, 0 failed, 2 skipped

Job	Result
gpu-double-onboard-e2e	⏭️ skipped
gpu-e2e	⏭️ skipped
openclaw-onboard-security-posture-e2e	✅ success

…ch-hardening

cv · 2026-06-05T06:18:44Z

@laitingsheng can you address the findings and suggestions in #4777 (comment), please? Otherwise, looking great!

…ocused module Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

…oot rc ownership mismatch is fixture-tested Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

github-actions · 2026-06-05T08:10:21Z

Selective E2E Results — ❌ Some jobs failed

Run: 27003399558
Target ref: e8b94a91a39d843217054248a4c9edf0b08c8029
Workflow ref: main
Requested jobs: gpu-e2e,openclaw-onboard-security-posture-e2e
Summary: 0 passed, 1 failed, 1 skipped

Job	Result
gpu-e2e	⏭️ skipped
openclaw-onboard-security-posture-e2e	❌ failure

Failed jobs: openclaw-onboard-security-posture-e2e. Check run artifacts for logs.

… it against CodeQL findings Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

github-actions · 2026-06-05T08:56:27Z

Selective E2E Results — ❌ Some jobs failed

Run: 27005449457
Target ref: e37e7e18fe3a4ee408be756a0d867c754084a050
Workflow ref: main
Requested jobs: gpu-e2e,openclaw-onboard-security-posture-e2e
Summary: 0 passed, 1 failed, 1 skipped

Job	Result
gpu-e2e	⏭️ skipped
openclaw-onboard-security-posture-e2e	❌ failure

Failed jobs: openclaw-onboard-security-posture-e2e. Check run artifacts for logs.

…invariant, trim docker-gpu-patch test hotspot Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

github-actions · 2026-06-05T09:27:12Z

Selective E2E Results — ❌ Some jobs failed

Run: 27006820492
Target ref: 6647bf25e4db2081dc13dff303c21838c92b1d9e
Workflow ref: main
Requested jobs: gpu-e2e,openclaw-onboard-security-posture-e2e
Summary: 0 passed, 1 failed, 1 skipped

Job	Result
gpu-e2e	⏭️ skipped
openclaw-onboard-security-posture-e2e	❌ failure

Failed jobs: openclaw-onboard-security-posture-e2e. Check run artifacts for logs.

…inst advisor findings Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

…nd drop issue refs from comments Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

…d context Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

…ch-hardening

github-actions · 2026-06-05T12:05:33Z

Selective E2E Results — ✅ All requested jobs passed

Run: 27013516426
Target ref: fix/4664-4713-gpu-patch-hardening
Requested jobs: openclaw-onboard-security-posture-e2e,gpu-e2e,gpu-double-onboard-e2e
Summary: 1 passed, 0 failed, 2 skipped

Job	Result
gpu-double-onboard-e2e	⏭️ skipped
gpu-e2e	⏭️ skipped
openclaw-onboard-security-posture-e2e	✅ success

github-actions · 2026-06-05T12:08:36Z

Selective E2E Results — ✅ All requested jobs passed

Run: 27013613190
Target ref: de04db9e067397c0d30b2e1ddfd160cd2ada3fa5
Workflow ref: main
Requested jobs: cloud-e2e,openclaw-onboard-security-posture-e2e,gpu-e2e
Summary: 2 passed, 0 failed, 1 skipped

Job	Result
cloud-e2e	✅ success
gpu-e2e	⏭️ skipped
openclaw-onboard-security-posture-e2e	✅ success

prekshivyas

Re-reviewed against the current head (de04db9) — refreshing the earlier approval, which predated a substantial rework. Verified the substantive logic:

Rollback path (docker-gpu-patch-finalize.ts): reconcileSupervisorReconnect runs the reconnect probe before removing the backup, so a failed reconnect rolls back instead of leaving the deleted-backup/failed-new state from #4664. rollbackToBackupContainer ordering is correct — stop+rm the new container (freeing the original name), rename backup→original (bail with false if that fails), start it — and the thrown error honestly reports whether the pre-patch sandbox was restored. On the healthy path, backup removal is best-effort with the real rm status surfaced for diagnostics.
rc-shim cleanup (clean_runtime_shell_env_shim.py): the ownership guard if uid != 0 && st.st_uid != uid → skip(exit 0) is correct — it only leaves the legacy stanza when it genuinely cannot legally rewrite (not root, not owner), fixing the #4713 errexit crash. The script is otherwise hardened: O_NOFOLLOW (symlink refusal), regular-file check, post-clean verification that the shim is actually gone (exit 1 if it survives a rewritable case), TOCTOU-safe /proc/self/fd reopen with same_file re-validation, and original-mode preservation. The threat-model note (leftover shim only sources the mode-444 proxy-env file) holds.
Shipping: the extracted script is staged via build-context.ts, COPY'd to /usr/local/lib/nemoclaw/ in the Dockerfile, and invoked from there by nemoclaw-start.sh (with a source-relative fallback for tests) — no missing-file risk.

The debounce bump (5→15 polls) is a straightforward tolerance increase. Tests cover the rollback success/restore-failure paths, the debounce default, and both rc-cleanup branches. CI green. Good to merge.

## Summary - Adds the `v0.0.60` section to `docs/about/release-notes.mdx` using the dev announcement from discussion #4877. - Fills the source-doc gaps found during release-prep review across inference, policy tiers, command behavior, security boundaries, Hermes dashboard/tooling, runtime context, and troubleshooting. - Refreshes generated agent skills under `.agents/skills/` from the current Fern docs output and upgrades Fern from `5.44.3` to `5.45.0`. ## Source summary - #4037 -> `docs/reference/architecture.mdx`, `docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents system-only runtime context that stays out of visible chat. - #4875 -> `docs/reference/architecture.mdx`, `docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents try-first sandbox network/filesystem guidance and clearer failure classification. - #4788 -> `docs/security/best-practices.mdx`, `docs/about/release-notes.mdx`: Documents shared OpenClaw device-approval policy for startup and connect. - #4768 -> `docs/reference/network-policies.mdx`, `docs/network-policy/integration-policy-examples.mdx`, `docs/get-started/quickstart.mdx`, `docs/get-started/quickstart-hermes.mdx`, `docs/reference/commands.mdx`: Documents `weather`, `public-reference`, and Hermes managed-tool gateway preset behavior. - #3788 and #4864 -> `docs/reference/network-policies.mdx`, `docs/reference/commands.mdx`: Documents non-interactive policy-tier fail-fast behavior and interactive prompt fallback. - #4756 and #4866 -> `docs/reference/commands.mdx`: Documents env-aware default sandbox resolution for `list`, `status`, and `tunnel` commands. - #4320 -> `docs/reference/commands.mdx`: Documents `$$nemoclaw tunnel status` behavior. - #4328 -> `docs/reference/commands.mdx`: Documents line-scoped policy preset descriptions in `policy-list`. - #4580 and #4748 -> `docs/reference/architecture.mdx`: Documents package-managed OpenShell gateway service and Docker-driver gateway-marker behavior. - #4598 -> `docs/manage-sandboxes/lifecycle.mdx`: Documents concurrent gateway/dashboard cleanup isolation by sandbox name and port. - #4777 -> `docs/reference/troubleshooting.mdx`: Documents Docker GPU patch rollback behavior. - #4610 -> `docs/reference/troubleshooting.mdx`, `docs/reference/commands.mdx`: Keeps mutable OpenClaw config permission guidance aligned and removes skipped experimental wording. - #4868 -> `docs/reference/commands.mdx`: Keeps `.dockerignore` handling for custom `onboard --from <Dockerfile>` contexts in generated skills. - #4870 -> `docs/reference/commands.mdx`, `docs/manage-sandboxes/runtime-controls.mdx`: Documents `NEMOCLAW_MINIMAL_BOOTSTRAP` and generated skill coverage. - #4641 -> `docs/inference/inference-options.mdx`, `docs/reference/troubleshooting.mdx`: Documents local NVIDIA NIM platform-digest pulls and served-model id adoption. - #4810 and #4867 -> `docs/inference/inference-options.mdx`: Documents stable NGC managed-vLLM image lineage and DGX Station DeepSeek V4 Flash coverage. - #4852 -> `docs/inference/use-local-inference.mdx`, `docs/reference/troubleshooting.mdx`: Documents Ollama model fit filtering, 16K context floor, cold-load retry, and failed-model exclusion. - #4847 -> `docs/inference/switch-inference-providers.mdx`: Documents API-family sync, Hermes `api_mode`, and Bedrock Runtime exception. - #4800 -> `docs/inference/tool-calling-reliability.mdx`: Documents Nemotron managed-inference native tool-search fallback. - #4333 -> `docs/inference/switch-inference-providers.mdx`: Documents interactive multimodal input prompting. - #4086 -> `docs/reference/troubleshooting.mdx`: Keeps proxy bypass normalization in generated troubleshooting coverage. - #4811 and #4855 -> `docs/get-started/quickstart-hermes.mdx`: Documents prebuilt Hermes dashboard assets and TUI recovery without runtime rebuilds. - #4854 -> `docs/inference/switch-inference-providers.mdx`, `docs/reference/commands.mdx`: Documents Hermes proxy API-key placeholder preservation during inference switches. - #4248 -> `docs/manage-sandboxes/messaging-channels.mdx`, `.agents/skills/`: Keeps messaging enrollment behavior aligned with manifest-hook implementation. - #4771 -> `docs/security/best-practices.mdx`, `docs/security/credential-storage.mdx`: Documents Hermes placeholder-only secret boundary for sandbox-visible runtime files. - #4787 -> `docs/security/best-practices.mdx`, `docs/about/release-notes.mdx`: Documents expanded memory scanner examples for OpenAI project keys and Slack app-level tokens. - #4848 -> `docs/reference/commands.mdx`: Documents OpenClaw skill install mirroring into the agent home directory. - #4790 -> `docs/about/release-notes.mdx`: Uses the prior release-prep structure and generated `.agents/skills/` refresh as the template for this release. ## Verification - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx` - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ skills/ --prefix nemoclaw-user --doc-platform fern-mdx --dry-run` - `npm run docs` - `git diff --check` - skip-term scan across `docs/`, `.agents/skills/`, and `skills/` - `npm run build:cli` - `npm run typecheck:cli` - Commit and pre-push hook suites, including markdownlint, gitleaks, env-var docs gate, docs-to-skills verification, and skills YAML tests  ## Summary by CodeRabbit ## Release Notes * **New Features** * DeepSeek-V4-Flash now available as default inference model for DGX Station. * Hermes dashboard improved with dedicated port and OAuth-authenticated tool gateway selection. * Added weather and public-reference policy presets for expanded agent capabilities. * Enhanced Ollama model selection with GPU memory filtering and automatic retry for timeouts. * **Bug Fixes** * Improved policy tier validation to prevent invalid configurations. * Better sandbox cleanup scoping by port to prevent conflicts across deployments. * Added GPU patch failure recovery with automatic rollback. * **Documentation** * Expanded troubleshooting guides for inference, security, and sandbox lifecycle. * Added .dockerignore best practices for custom deployments.  --------- Co-authored-by: Carlos Villela <cvillela@nvidia.com>

fix(onboard): harden GPU patch reconnect and rc cleanup against non-f…

8f580c2

…atal failures Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

laitingsheng added platform: ubuntu Affects Ubuntu Linux environments bug-fix PR fixes a bug or regression area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery and removed platform: ubuntu Affects Ubuntu Linux environments labels Jun 4, 2026

coderabbitai Bot reviewed Jun 4, 2026

View reviewed changes

Comment thread src/lib/onboard/docker-gpu-patch.ts Outdated

fix(onboard): propagate dockerStart status in rollback to backup cont…

bf9577b

…ainer Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

fix(onboard): defer GPU patch backup removal to caller-driven finalizer

73a053a

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

coderabbitai Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread src/lib/onboard/docker-gpu-sandbox-create.ts Outdated

docs(troubleshooting): refresh debounce default to 15 polls / 30s

97c77f5

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

MasakiYamagishiSCSK mentioned this pull request Jun 5, 2026

sandbox container is fallen with exit code 1 when deploying NemoClaw with GPU #4713

Closed

2 tasks

Merge remote-tracking branch 'origin/main' into fix/4664-4713-gpu-pat…

c6c0ed6

…ch-hardening

laitingsheng added v0.0.59 Release target v0.0.60 Release target and removed v0.0.59 Release target labels Jun 5, 2026

cv approved these changes Jun 5, 2026

View reviewed changes

laitingsheng added 2 commits June 5, 2026 08:04

refactor(onboard): split GPU patch rollback and finalisation into a f…

c69777a

…ocused module Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

fix(scripts): extract rc-cleanup helper to standalone Python so non-r…

e8b94a9

…oot rc ownership mismatch is fixture-tested Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

github-advanced-security AI found potential problems Jun 5, 2026

View reviewed changes

Comment thread scripts/lib/clean_runtime_shell_env_shim.py Fixed

Comment thread scripts/lib/clean_runtime_shell_env_shim.py Fixed

github-advanced-security AI found potential problems Jun 5, 2026

View reviewed changes

Comment thread test/clean-runtime-shell-env-shim.test.ts Fixed

Comment thread test/clean-runtime-shell-env-shim.test.ts Fixed

Comment thread test/clean-runtime-shell-env-shim.test.ts Fixed

Comment thread test/clean-runtime-shell-env-shim.test.ts Fixed

fix(onboard): keep rc-cleanup script discoverable in tests and harden…

e37e7e1

… it against CodeQL findings Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

laitingsheng removed the v0.0.60 Release target label Jun 5, 2026

fix(onboard): cover GPU patch composed create-flow and #4713 startup …

6647bf2

…invariant, trim docker-gpu-patch test hotspot Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

laitingsheng added 4 commits June 5, 2026 09:54

fix(onboard): harden GPU patch finalisation and rc-cleanup helper aga…

4748d84

…inst advisor findings Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

fix(onboard): track docker rm status through reconnect success path a…

756d64c

…nd drop issue refs from comments Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

fix(sandbox): stage clean_runtime_shell_env_shim.py into sandbox buil…

cff0334

…d context Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Merge remote-tracking branch 'origin/main' into fix/4664-4713-gpu-pat…

de04db9

…ch-hardening

laitingsheng added the v0.0.60 Release target label Jun 5, 2026

prekshivyas self-assigned this Jun 5, 2026

cv merged commit 43efeba into main Jun 5, 2026
97 of 98 checks passed

cv deleted the fix/4664-4713-gpu-patch-hardening branch June 5, 2026 16:02

prekshivyas approved these changes Jun 5, 2026

View reviewed changes

miyoungc mentioned this pull request Jun 6, 2026

docs: refresh v0.0.60 release notes #4879

Merged

Conversation

laitingsheng commented Jun 4, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Type of Change

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

github-actions Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Scenario Advisor Recommendation

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files

Uh oh!

github-actions Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions Bot commented Jun 4, 2026

Selective E2E Results — ✅ All requested jobs passed

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions Bot commented Jun 5, 2026

Uh oh!

github-actions Bot commented Jun 5, 2026

Selective E2E Results — ✅ All requested jobs passed

Uh oh!

cv commented Jun 5, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Jun 5, 2026

Selective E2E Results — ❌ Some jobs failed

Uh oh!

github-actions Bot commented Jun 5, 2026

Selective E2E Results — ❌ Some jobs failed

Uh oh!

github-actions Bot commented Jun 5, 2026

Selective E2E Results — ❌ Some jobs failed

Uh oh!

github-actions Bot commented Jun 5, 2026

laitingsheng commented Jun 4, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 4, 2026 •

edited

Loading

github-actions Bot commented Jun 4, 2026 •

edited

Loading

github-actions Bot commented Jun 4, 2026 •

edited

Loading

github-actions Bot commented Jun 4, 2026 •

edited

Loading