fix(ci): stabilize platform Vitest failures by cv · Pull Request #4903 · NVIDIA/NemoClaw

cv · 2026-06-07T05:20:39Z

Summary

Stabilizes the macOS and WSL platform Vitest failures from the main watch run by removing platform-dependent assumptions in the gateway env and Hermes startup fixtures.

Changes

Split the OpenShell gateway env-file write path so package-managed startup does not re-run service detection after it has already selected the service path.
Isolate the Hermes startup cleanup fixture from host uid/account state by aligning the embedded Python helper with the mocked non-root shell identity.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

Refactor
- Restructured Docker gateway env handling for improved maintainability and clearer error behavior during startup.
Tests
- Improved gateway startup test harness to align Python/runtime fixtures with shell mocks.
- Strengthened CLI tests to include richer failure diagnostics when parser-required arguments are missing.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

coderabbitai · 2026-06-07T05:20:51Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0035f09a-f2fa-424f-80b7-77c312fc2955

📥 Commits

Reviewing files that changed from the base of the PR and between dac36d1 and 2a85605.

📒 Files selected for processing (1)

test/cli/sandbox-mutations.test.ts

✅ Files skipped from review due to trivial changes (1)

test/cli/sandbox-mutations.test.ts

📝 Walkthrough

Walkthrough

Extracts the gateway env-file write into an internal helper, updates the exported wrapper to delegate behind the OpenShell availability check, changes startup wiring to call the internal helper directly, and adjusts tests to inject a Python UID override and improve an assertion message.

Changes

Gateway env-file write refactoring and test alignment

Layer / File(s)	Summary
Gateway env-file write logic refactoring `src/lib/onboard/docker-driver-gateway-env.ts`	New internal helper `writeDockerGatewayDebEnvOverrideFile` encapsulates the env-file write; exported `writeDockerGatewayDebEnvOverride` delegates to it while keeping the `hasOpenShellGatewayUserService` guard; startup path `startPackageManagedDockerDriverGatewayWithEnvOverride` now calls the internal writer directly.
Test fixture and assertion updates `test/hermes-start.test.ts`, `test/cli/sandbox-mutations.test.ts`	Test harness creates a temporary `sitecustomize.py` overriding `os.geteuid` and exports `PYTHONPATH` so the Python helper sees it; a missing-argument test assertion now reports the actual exit code and stdout/stderr when it fails.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

NVIDIA/NemoClaw#4580: Earlier changes to docker-driver-gateway-env.ts that introduced hasOpenShellGatewayUserService-gated gateway env writing.

Suggested labels

area: ci, bug-fix, area: cli, chore

Suggested reviewers

prekshivyas

Poem

🐰 I hopped through files to tidy the gate,
Pulled out the writer and left checks in their state,
Tests wear a sitecustomize hat for the run,
Assertions now whisper the output they've spun,
A tiny refactor, and the suite basks in sun.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'fix(ci): stabilize platform Vitest failures' accurately describes the main objective: stabilizing CI test failures on different platforms. However, it is somewhat generic about what aspect of tests is being fixed.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch codex/investigate-ci-failures

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-07T05:21:20Z

E2E Advisor Recommendation

Required E2E: cloud-onboard-e2e, sandbox-survival-e2e, openshell-gateway-upgrade-e2e
Optional E2E: gateway-drift-preflight-e2e, hermes-e2e

Dispatch hint: cloud-onboard-e2e,sandbox-survival-e2e,openshell-gateway-upgrade-e2e

Auto-dispatched E2E: cloud-onboard-e2e, sandbox-survival-e2e, openshell-gateway-upgrade-e2e via nightly-e2e.yaml at 2a85605dcd342cacc61d6d81635a6be1acf3a736 — nightly run

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: medium

Required E2E

cloud-onboard-e2e (high): Runs a real install/onboard flow and is the most direct existing E2E guard for the changed Docker-driver gateway env/start path before a sandbox is created.
sandbox-survival-e2e (medium): Validates that an onboarded sandbox survives gateway stop/start and that gateway restart recovery keeps sandbox state and inference usable, covering lifecycle risk from the gateway service env change.
openshell-gateway-upgrade-e2e (high): Exercises the OpenShell gateway install/upgrade path and host-process gateway behavior; useful as merge-blocking coverage because the changed code is specifically in package-managed gateway env preparation.

Optional E2E

gateway-drift-preflight-e2e (low): Focused regression coverage around Docker-driver/host-process gateway state and drift detection. Adjacent to the changed gateway startup code, but it does not directly validate service env-file preparation.
hermes-e2e (high): Provides optional multi-agent confidence that the shared gateway startup path still works for Hermes onboarding and live inference. The Hermes file touched in this PR is test-only, so this is not merge-blocking.

New E2E recommendations

package-managed OpenShell gateway user-service env override (high): Existing E2E coverage appears to validate onboarding/gateway health broadly, but not the exact contract changed here: writing ~/.config/openshell/gateway.env for the package-managed user service, preserving unmanaged keys, enforcing 0700/0600 permissions, and starting without a second service-availability gate in the prepare callback.
- Suggested test: Add a focused E2E or scenario that installs/uses the package-managed openshell-gateway user service, runs onboarding through startPackageManagedDockerDriverGatewayWithEnvOverride, and asserts gateway.env content/permissions plus successful gateway health.

Dispatch hint

Workflow: .github/workflows/nightly-e2e.yaml
jobs input: cloud-onboard-e2e,sandbox-survival-e2e,openshell-gateway-upgrade-e2e

github-actions · 2026-06-07T05:21:21Z

E2E Scenario Advisor Recommendation

Required scenario E2E: ubuntu-repo-cloud-openclaw
Optional scenario E2E: ubuntu-repo-cloud-hermes, wsl-repo-cloud-openclaw

Dispatch required scenario E2E:

gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

ubuntu-repo-cloud-openclaw: The PR changes Docker driver gateway environment/override handling used during Linux repo-current onboarding. The Ubuntu repo cloud OpenClaw scenario is the smallest routed scenario that exercises Docker-backed onboarding, gateway startup, gateway health, and sandbox readiness on the primary Linux path.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Optional scenario E2E

ubuntu-repo-cloud-hermes: Optional adjacent coverage for the same Docker-backed gateway startup path with the Hermes agent, especially relevant because the PR also adjusts Hermes startup unit tests outside the scenario tree.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-hermes
wsl-repo-cloud-openclaw: Optional platform coverage for Docker-backed OpenClaw onboarding under WSL. It exercises related gateway/sandbox startup behavior on a special Windows/WSL runner, so it is not required for the primary Linux onboarding helper change.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=wsl-repo-cloud-openclaw

Relevant changed files

src/lib/onboard/docker-driver-gateway-env.ts

github-actions · 2026-06-07T05:23:54Z

PR Review Advisor

Findings: 0 needs attention, 4 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 3 still apply, 0 new items found

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Source-of-truth review needed: Hermes runtime cleanup fixture Python UID monkeypatch: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `sitecustomize.py` sets `os.geteuid = lambda: 1000`; the generated script exports `PYTHONPATH`; production `ensure_hermes_history_file()` branches on `os.geteuid() == 0` before `os.fchown()`.
Restore security coverage for Hermes root history repair after UID monkeypatch (test/hermes-start.test.ts:446): The Hermes runtime cleanup fixture now forces the embedded Python helper to see os.geteuid() as 1000. That stabilizes host-dependent tests, but it also means these fixture runs no longer exercise the production root branch that performs sandbox account lookup and os.fchown() after the no-follow, regular-file, and hardlink checks.
- Recommendation: Add or identify a targeted test for the root path, such as proving Hermes history repair fchowns only the opened regular single-link history file when os.geteuid() is 0, while preserving the symlink/directory/hardlink refusal checks.
- Evidence: test/hermes-start.test.ts writes sitecustomize.py with `os.geteuid = lambda: 1000` and exports PYTHONPATH before extracting cleanup functions. agents/hermes/start.sh branches on `os.geteuid() == 0` before pwd/grp lookup and `os.fchown(fd, uid, gid)`.
Add negative coverage for package-managed gateway env preparation (src/lib/onboard/docker-driver-gateway-env.ts:185): The package-managed startup callback now calls the private env-file writer directly instead of the exported writer that re-checks service availability. Nearby production code still gates the callback behind package-managed service selection and trusted service startup, but the wrapper test covers only the successful path where the mocked service invokes prepareServiceEnv.
- Recommendation: Add tests proving `gateway.env` is not created when package-managed service detection returns false, when user-service startup falls back before invoking prepareServiceEnv, and that prepareServiceEnv write failures are surfaced as non-fallback package-managed startup failures.
- Evidence: `startPackageManagedDockerDriverGatewayWithEnvOverride()` passes `writeDockerGatewayDebEnvOverrideFile(() => gatewayEnv)` as `prepareOpenShellGatewayUserServiceEnv`; `src/lib/onboard/docker-driver-gateway-env.test.ts` currently verifies only a successful mocked `startOpenShellGatewayUserService` that calls `opts?.prepareServiceEnv?.()`.
Complete source-of-truth rationale for the Hermes Python UID fixture workaround (test/hermes-start.test.ts:435): The test fixture workaround handles a host-dependent mismatch between mocked shell identity and Python's real effective UID, but the code does not fully capture the source boundary, why the source cannot be fixed more directly, what regression test pins the invariant, or when the workaround can be removed.
- Recommendation: Document or encode the fixture invariant more explicitly, or move the fix closer to the source boundary. At minimum, identify the invalid state, why direct fixture identity/account injection is not used here, the regression test that would fail if shell and Python UID diverge again, and the removal condition.
- Evidence: The added comment says `Keep the Python helper aligned with the shell fixture's mocked id -u`, while the fixture globally injects `sitecustomize.py` and `PYTHONPATH`; production `agents/hermes/start.sh` uses Python `os.geteuid()` to choose the root-only fchown path.

🌱 Nice ideas

None.

Consider writing more tests for

**Runtime validation** — startPackageManagedDockerDriverGatewayWithEnvOverride does not create gateway.env when package-managed service detection returns false. The changed surfaces sit on package-managed gateway startup and sandbox/Hermes startup cleanup boundaries. Static review supports the intended refactor, but negative-path and root-branch behavioral coverage would improve confidence at security-sensitive trusted-code boundaries.
**Runtime validation** — startPackageManagedDockerDriverGatewayWithEnvOverride does not create gateway.env when user service startup returns fallbackAllowed before invoking prepareServiceEnv. The changed surfaces sit on package-managed gateway startup and sandbox/Hermes startup cleanup boundaries. Static review supports the intended refactor, but negative-path and root-branch behavioral coverage would improve confidence at security-sensitive trusted-code boundaries.
**Runtime validation** — startPackageManagedDockerDriverGatewayWithEnvOverride reports prepareServiceEnv write failures as non-fallback package-managed startup failures. The changed surfaces sit on package-managed gateway startup and sandbox/Hermes startup cleanup boundaries. Static review supports the intended refactor, but negative-path and root-branch behavioral coverage would improve confidence at security-sensitive trusted-code boundaries.
**Runtime validation** — Hermes history repair root path fchowns only the opened regular single-link history file when os.geteuid() is 0. The changed surfaces sit on package-managed gateway startup and sandbox/Hermes startup cleanup boundaries. Static review supports the intended refactor, but negative-path and root-branch behavioral coverage would improve confidence at security-sensitive trusted-code boundaries.
**Runtime validation** — Hermes cleanup fixture fails or clearly reports if mocked shell id -u and embedded Python os.geteuid diverge. The changed surfaces sit on package-managed gateway startup and sandbox/Hermes startup cleanup boundaries. Static review supports the intended refactor, but negative-path and root-branch behavioral coverage would improve confidence at security-sensitive trusted-code boundaries.
**Restore security coverage for Hermes root history repair after UID monkeypatch** — Add or identify a targeted test for the root path, such as proving Hermes history repair fchowns only the opened regular single-link history file when os.geteuid() is 0, while preserving the symlink/directory/hardlink refusal checks.
**Add negative coverage for package-managed gateway env preparation** — Add tests proving `gateway.env` is not created when package-managed service detection returns false, when user-service startup falls back before invoking prepareServiceEnv, and that prepareServiceEnv write failures are surfaced as non-fallback package-managed startup failures.
**Acceptance clause:** Stabilizes the macOS and WSL platform Vitest failures from the main watch run by removing platform-dependent assumptions in the gateway env and Hermes startup fixtures. — add test evidence or identify existing coverage. The diff removes a second direct service-detection call from the package-managed gateway env prepare callback and aligns the Hermes embedded Python helper with the mocked shell UID. Static review did not directly validate macOS or WSL runtime behavior.

Since last review details

Current findings:

Source-of-truth review needed: Hermes runtime cleanup fixture Python UID monkeypatch: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `sitecustomize.py` sets `os.geteuid = lambda: 1000`; the generated script exports `PYTHONPATH`; production `ensure_hermes_history_file()` branches on `os.geteuid() == 0` before `os.fchown()`.
Restore security coverage for Hermes root history repair after UID monkeypatch (test/hermes-start.test.ts:446): The Hermes runtime cleanup fixture now forces the embedded Python helper to see os.geteuid() as 1000. That stabilizes host-dependent tests, but it also means these fixture runs no longer exercise the production root branch that performs sandbox account lookup and os.fchown() after the no-follow, regular-file, and hardlink checks.
- Recommendation: Add or identify a targeted test for the root path, such as proving Hermes history repair fchowns only the opened regular single-link history file when os.geteuid() is 0, while preserving the symlink/directory/hardlink refusal checks.
- Evidence: test/hermes-start.test.ts writes sitecustomize.py with `os.geteuid = lambda: 1000` and exports PYTHONPATH before extracting cleanup functions. agents/hermes/start.sh branches on `os.geteuid() == 0` before pwd/grp lookup and `os.fchown(fd, uid, gid)`.
Add negative coverage for package-managed gateway env preparation (src/lib/onboard/docker-driver-gateway-env.ts:185): The package-managed startup callback now calls the private env-file writer directly instead of the exported writer that re-checks service availability. Nearby production code still gates the callback behind package-managed service selection and trusted service startup, but the wrapper test covers only the successful path where the mocked service invokes prepareServiceEnv.
- Recommendation: Add tests proving `gateway.env` is not created when package-managed service detection returns false, when user-service startup falls back before invoking prepareServiceEnv, and that prepareServiceEnv write failures are surfaced as non-fallback package-managed startup failures.
- Evidence: `startPackageManagedDockerDriverGatewayWithEnvOverride()` passes `writeDockerGatewayDebEnvOverrideFile(() => gatewayEnv)` as `prepareOpenShellGatewayUserServiceEnv`; `src/lib/onboard/docker-driver-gateway-env.test.ts` currently verifies only a successful mocked `startOpenShellGatewayUserService` that calls `opts?.prepareServiceEnv?.()`.
Complete source-of-truth rationale for the Hermes Python UID fixture workaround (test/hermes-start.test.ts:435): The test fixture workaround handles a host-dependent mismatch between mocked shell identity and Python's real effective UID, but the code does not fully capture the source boundary, why the source cannot be fixed more directly, what regression test pins the invariant, or when the workaround can be removed.
- Recommendation: Document or encode the fixture invariant more explicitly, or move the fix closer to the source boundary. At minimum, identify the invalid state, why direct fixture identity/account injection is not used here, the regression test that would fail if shell and Python UID diverge again, and the removal condition.
- Evidence: The added comment says `Keep the Python helper aligned with the shell fixture's mocked id -u`, while the fixture globally injects `sitecustomize.py` and `PYTHONPATH`; production `agents/hermes/start.sh` uses Python `os.geteuid()` to choose the root-only fchown path.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

github-actions · 2026-06-07T06:20:28Z

Selective E2E Results — ✅ All requested jobs passed

Run: 27084384078
Target ref: 2a85605dcd342cacc61d6d81635a6be1acf3a736
Workflow ref: main
Requested jobs: cloud-onboard-e2e,sandbox-survival-e2e,openshell-gateway-upgrade-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job	Result
cloud-onboard-e2e	✅ success
openshell-gateway-upgrade-e2e	✅ success
sandbox-survival-e2e	✅ success

fix(ci): stabilize platform Vitest failures

dac36d1

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv self-assigned this Jun 7, 2026

test(ci): include parser failure output

2a85605

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv merged commit dceeb60 into main Jun 7, 2026
40 checks passed

cv deleted the codex/investigate-ci-failures branch June 7, 2026 06:36

coderabbitai Bot mentioned this pull request Jun 7, 2026

test(cli): replace slow subprocess matrices with unit coverage #4911

Merged

12 tasks

cv added the v0.0.61 Release target label Jun 7, 2026

Conversation

cv commented Jun 7, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Type of Change

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

github-actions Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Scenario Advisor Recommendation

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files

Uh oh!

github-actions Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

github-actions Bot commented Jun 7, 2026

Selective E2E Results — ✅ All requested jobs passed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cv commented Jun 7, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 7, 2026 •

edited

Loading

github-actions Bot commented Jun 7, 2026 •

edited

Loading

github-actions Bot commented Jun 7, 2026 •

edited

Loading

github-actions Bot commented Jun 7, 2026 •

edited

Loading