Skip to content

fix(ci): stabilize platform Vitest failures#4903

Merged
cv merged 2 commits into
mainfrom
codex/investigate-ci-failures
Jun 7, 2026
Merged

fix(ci): stabilize platform Vitest failures#4903
cv merged 2 commits into
mainfrom
codex/investigate-ci-failures

Conversation

@cv
Copy link
Copy Markdown
Collaborator

@cv cv commented Jun 7, 2026

Summary

Stabilizes the macOS and WSL platform Vitest failures from the main watch run by removing platform-dependent assumptions in the gateway env and Hermes startup fixtures.

Changes

  • Split the OpenShell gateway env-file write path so package-managed startup does not re-run service detection after it has already selected the service path.
  • Isolate the Hermes startup cleanup fixture from host uid/account state by aligning the embedded Python helper with the mocked non-root shell identity.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

  • Refactor

    • Restructured Docker gateway env handling for improved maintainability and clearer error behavior during startup.
  • Tests

    • Improved gateway startup test harness to align Python/runtime fixtures with shell mocks.
    • Strengthened CLI tests to include richer failure diagnostics when parser-required arguments are missing.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv cv self-assigned this Jun 7, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 7, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0035f09a-f2fa-424f-80b7-77c312fc2955

📥 Commits

Reviewing files that changed from the base of the PR and between dac36d1 and 2a85605.

📒 Files selected for processing (1)
  • test/cli/sandbox-mutations.test.ts
✅ Files skipped from review due to trivial changes (1)
  • test/cli/sandbox-mutations.test.ts

📝 Walkthrough

Walkthrough

Extracts the gateway env-file write into an internal helper, updates the exported wrapper to delegate behind the OpenShell availability check, changes startup wiring to call the internal helper directly, and adjusts tests to inject a Python UID override and improve an assertion message.

Changes

Gateway env-file write refactoring and test alignment

Layer / File(s) Summary
Gateway env-file write logic refactoring
src/lib/onboard/docker-driver-gateway-env.ts
New internal helper writeDockerGatewayDebEnvOverrideFile encapsulates the env-file write; exported writeDockerGatewayDebEnvOverride delegates to it while keeping the hasOpenShellGatewayUserService guard; startup path startPackageManagedDockerDriverGatewayWithEnvOverride now calls the internal writer directly.
Test fixture and assertion updates
test/hermes-start.test.ts, test/cli/sandbox-mutations.test.ts
Test harness creates a temporary sitecustomize.py overriding os.geteuid and exports PYTHONPATH so the Python helper sees it; a missing-argument test assertion now reports the actual exit code and stdout/stderr when it fails.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#4580: Earlier changes to docker-driver-gateway-env.ts that introduced hasOpenShellGatewayUserService-gated gateway env writing.

Suggested labels

area: ci, bug-fix, area: cli, chore

Suggested reviewers

  • prekshivyas

Poem

🐰 I hopped through files to tidy the gate,
Pulled out the writer and left checks in their state,
Tests wear a sitecustomize hat for the run,
Assertions now whisper the output they've spun,
A tiny refactor, and the suite basks in sun.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'fix(ci): stabilize platform Vitest failures' accurately describes the main objective: stabilizing CI test failures on different platforms. However, it is somewhat generic about what aspect of tests is being fixed.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/investigate-ci-failures

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 7, 2026

E2E Advisor Recommendation

Required E2E: cloud-onboard-e2e, sandbox-survival-e2e, openshell-gateway-upgrade-e2e
Optional E2E: gateway-drift-preflight-e2e, hermes-e2e

Dispatch hint: cloud-onboard-e2e,sandbox-survival-e2e,openshell-gateway-upgrade-e2e

Auto-dispatched E2E: cloud-onboard-e2e, sandbox-survival-e2e, openshell-gateway-upgrade-e2e via nightly-e2e.yaml at 2a85605dcd342cacc61d6d81635a6be1acf3a736nightly run

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: medium

Required E2E

  • cloud-onboard-e2e (high): Runs a real install/onboard flow and is the most direct existing E2E guard for the changed Docker-driver gateway env/start path before a sandbox is created.
  • sandbox-survival-e2e (medium): Validates that an onboarded sandbox survives gateway stop/start and that gateway restart recovery keeps sandbox state and inference usable, covering lifecycle risk from the gateway service env change.
  • openshell-gateway-upgrade-e2e (high): Exercises the OpenShell gateway install/upgrade path and host-process gateway behavior; useful as merge-blocking coverage because the changed code is specifically in package-managed gateway env preparation.

Optional E2E

  • gateway-drift-preflight-e2e (low): Focused regression coverage around Docker-driver/host-process gateway state and drift detection. Adjacent to the changed gateway startup code, but it does not directly validate service env-file preparation.
  • hermes-e2e (high): Provides optional multi-agent confidence that the shared gateway startup path still works for Hermes onboarding and live inference. The Hermes file touched in this PR is test-only, so this is not merge-blocking.

New E2E recommendations

  • package-managed OpenShell gateway user-service env override (high): Existing E2E coverage appears to validate onboarding/gateway health broadly, but not the exact contract changed here: writing ~/.config/openshell/gateway.env for the package-managed user service, preserving unmanaged keys, enforcing 0700/0600 permissions, and starting without a second service-availability gate in the prepare callback.
    • Suggested test: Add a focused E2E or scenario that installs/uses the package-managed openshell-gateway user service, runs onboarding through startPackageManagedDockerDriverGatewayWithEnvOverride, and asserts gateway.env content/permissions plus successful gateway health.

Dispatch hint

  • Workflow: .github/workflows/nightly-e2e.yaml
  • jobs input: cloud-onboard-e2e,sandbox-survival-e2e,openshell-gateway-upgrade-e2e

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 7, 2026

E2E Scenario Advisor Recommendation

Required scenario E2E: ubuntu-repo-cloud-openclaw
Optional scenario E2E: ubuntu-repo-cloud-hermes, wsl-repo-cloud-openclaw

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • ubuntu-repo-cloud-openclaw: The PR changes Docker driver gateway environment/override handling used during Linux repo-current onboarding. The Ubuntu repo cloud OpenClaw scenario is the smallest routed scenario that exercises Docker-backed onboarding, gateway startup, gateway health, and sandbox readiness on the primary Linux path.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Optional scenario E2E

  • ubuntu-repo-cloud-hermes: Optional adjacent coverage for the same Docker-backed gateway startup path with the Hermes agent, especially relevant because the PR also adjusts Hermes startup unit tests outside the scenario tree.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-hermes
  • wsl-repo-cloud-openclaw: Optional platform coverage for Docker-backed OpenClaw onboarding under WSL. It exercises related gateway/sandbox startup behavior on a special Windows/WSL runner, so it is not required for the primary Linux onboarding helper change.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=wsl-repo-cloud-openclaw

Relevant changed files

  • src/lib/onboard/docker-driver-gateway-env.ts

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 7, 2026

PR Review Advisor

Findings: 0 needs attention, 4 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 3 still apply, 0 new items found

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • Source-of-truth review needed: Hermes runtime cleanup fixture Python UID monkeypatch: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `sitecustomize.py` sets `os.geteuid = lambda: 1000`; the generated script exports `PYTHONPATH`; production `ensure_hermes_history_file()` branches on `os.geteuid() == 0` before `os.fchown()`.
  • Restore security coverage for Hermes root history repair after UID monkeypatch (test/hermes-start.test.ts:446): The Hermes runtime cleanup fixture now forces the embedded Python helper to see os.geteuid() as 1000. That stabilizes host-dependent tests, but it also means these fixture runs no longer exercise the production root branch that performs sandbox account lookup and os.fchown() after the no-follow, regular-file, and hardlink checks.
    • Recommendation: Add or identify a targeted test for the root path, such as proving Hermes history repair fchowns only the opened regular single-link history file when os.geteuid() is 0, while preserving the symlink/directory/hardlink refusal checks.
    • Evidence: test/hermes-start.test.ts writes sitecustomize.py with `os.geteuid = lambda: 1000` and exports PYTHONPATH before extracting cleanup functions. agents/hermes/start.sh branches on `os.geteuid() == 0` before pwd/grp lookup and `os.fchown(fd, uid, gid)`.
  • Add negative coverage for package-managed gateway env preparation (src/lib/onboard/docker-driver-gateway-env.ts:185): The package-managed startup callback now calls the private env-file writer directly instead of the exported writer that re-checks service availability. Nearby production code still gates the callback behind package-managed service selection and trusted service startup, but the wrapper test covers only the successful path where the mocked service invokes prepareServiceEnv.
    • Recommendation: Add tests proving `gateway.env` is not created when package-managed service detection returns false, when user-service startup falls back before invoking prepareServiceEnv, and that prepareServiceEnv write failures are surfaced as non-fallback package-managed startup failures.
    • Evidence: `startPackageManagedDockerDriverGatewayWithEnvOverride()` passes `writeDockerGatewayDebEnvOverrideFile(() => gatewayEnv)` as `prepareOpenShellGatewayUserServiceEnv`; `src/lib/onboard/docker-driver-gateway-env.test.ts` currently verifies only a successful mocked `startOpenShellGatewayUserService` that calls `opts?.prepareServiceEnv?.()`.
  • Complete source-of-truth rationale for the Hermes Python UID fixture workaround (test/hermes-start.test.ts:435): The test fixture workaround handles a host-dependent mismatch between mocked shell identity and Python's real effective UID, but the code does not fully capture the source boundary, why the source cannot be fixed more directly, what regression test pins the invariant, or when the workaround can be removed.
    • Recommendation: Document or encode the fixture invariant more explicitly, or move the fix closer to the source boundary. At minimum, identify the invalid state, why direct fixture identity/account injection is not used here, the regression test that would fail if shell and Python UID diverge again, and the removal condition.
    • Evidence: The added comment says `Keep the Python helper aligned with the shell fixture's mocked id -u`, while the fixture globally injects `sitecustomize.py` and `PYTHONPATH`; production `agents/hermes/start.sh` uses Python `os.geteuid()` to choose the root-only fchown path.

🌱 Nice ideas

  • None.
Consider writing more tests for
  • **Runtime validation** — startPackageManagedDockerDriverGatewayWithEnvOverride does not create gateway.env when package-managed service detection returns false. The changed surfaces sit on package-managed gateway startup and sandbox/Hermes startup cleanup boundaries. Static review supports the intended refactor, but negative-path and root-branch behavioral coverage would improve confidence at security-sensitive trusted-code boundaries.
  • **Runtime validation** — startPackageManagedDockerDriverGatewayWithEnvOverride does not create gateway.env when user service startup returns fallbackAllowed before invoking prepareServiceEnv. The changed surfaces sit on package-managed gateway startup and sandbox/Hermes startup cleanup boundaries. Static review supports the intended refactor, but negative-path and root-branch behavioral coverage would improve confidence at security-sensitive trusted-code boundaries.
  • **Runtime validation** — startPackageManagedDockerDriverGatewayWithEnvOverride reports prepareServiceEnv write failures as non-fallback package-managed startup failures. The changed surfaces sit on package-managed gateway startup and sandbox/Hermes startup cleanup boundaries. Static review supports the intended refactor, but negative-path and root-branch behavioral coverage would improve confidence at security-sensitive trusted-code boundaries.
  • **Runtime validation** — Hermes history repair root path fchowns only the opened regular single-link history file when os.geteuid() is 0. The changed surfaces sit on package-managed gateway startup and sandbox/Hermes startup cleanup boundaries. Static review supports the intended refactor, but negative-path and root-branch behavioral coverage would improve confidence at security-sensitive trusted-code boundaries.
  • **Runtime validation** — Hermes cleanup fixture fails or clearly reports if mocked shell id -u and embedded Python os.geteuid diverge. The changed surfaces sit on package-managed gateway startup and sandbox/Hermes startup cleanup boundaries. Static review supports the intended refactor, but negative-path and root-branch behavioral coverage would improve confidence at security-sensitive trusted-code boundaries.
  • **Restore security coverage for Hermes root history repair after UID monkeypatch** — Add or identify a targeted test for the root path, such as proving Hermes history repair fchowns only the opened regular single-link history file when os.geteuid() is 0, while preserving the symlink/directory/hardlink refusal checks.
  • **Add negative coverage for package-managed gateway env preparation** — Add tests proving `gateway.env` is not created when package-managed service detection returns false, when user-service startup falls back before invoking prepareServiceEnv, and that prepareServiceEnv write failures are surfaced as non-fallback package-managed startup failures.
  • **Acceptance clause:** Stabilizes the macOS and WSL platform Vitest failures from the main watch run by removing platform-dependent assumptions in the gateway env and Hermes startup fixtures. — add test evidence or identify existing coverage. The diff removes a second direct service-detection call from the package-managed gateway env prepare callback and aligns the Hermes embedded Python helper with the mocked shell UID. Static review did not directly validate macOS or WSL runtime behavior.
Since last review details

Current findings:

  • Source-of-truth review needed: Hermes runtime cleanup fixture Python UID monkeypatch: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `sitecustomize.py` sets `os.geteuid = lambda: 1000`; the generated script exports `PYTHONPATH`; production `ensure_hermes_history_file()` branches on `os.geteuid() == 0` before `os.fchown()`.
  • Restore security coverage for Hermes root history repair after UID monkeypatch (test/hermes-start.test.ts:446): The Hermes runtime cleanup fixture now forces the embedded Python helper to see os.geteuid() as 1000. That stabilizes host-dependent tests, but it also means these fixture runs no longer exercise the production root branch that performs sandbox account lookup and os.fchown() after the no-follow, regular-file, and hardlink checks.
    • Recommendation: Add or identify a targeted test for the root path, such as proving Hermes history repair fchowns only the opened regular single-link history file when os.geteuid() is 0, while preserving the symlink/directory/hardlink refusal checks.
    • Evidence: test/hermes-start.test.ts writes sitecustomize.py with `os.geteuid = lambda: 1000` and exports PYTHONPATH before extracting cleanup functions. agents/hermes/start.sh branches on `os.geteuid() == 0` before pwd/grp lookup and `os.fchown(fd, uid, gid)`.
  • Add negative coverage for package-managed gateway env preparation (src/lib/onboard/docker-driver-gateway-env.ts:185): The package-managed startup callback now calls the private env-file writer directly instead of the exported writer that re-checks service availability. Nearby production code still gates the callback behind package-managed service selection and trusted service startup, but the wrapper test covers only the successful path where the mocked service invokes prepareServiceEnv.
    • Recommendation: Add tests proving `gateway.env` is not created when package-managed service detection returns false, when user-service startup falls back before invoking prepareServiceEnv, and that prepareServiceEnv write failures are surfaced as non-fallback package-managed startup failures.
    • Evidence: `startPackageManagedDockerDriverGatewayWithEnvOverride()` passes `writeDockerGatewayDebEnvOverrideFile(() => gatewayEnv)` as `prepareOpenShellGatewayUserServiceEnv`; `src/lib/onboard/docker-driver-gateway-env.test.ts` currently verifies only a successful mocked `startOpenShellGatewayUserService` that calls `opts?.prepareServiceEnv?.()`.
  • Complete source-of-truth rationale for the Hermes Python UID fixture workaround (test/hermes-start.test.ts:435): The test fixture workaround handles a host-dependent mismatch between mocked shell identity and Python's real effective UID, but the code does not fully capture the source boundary, why the source cannot be fixed more directly, what regression test pins the invariant, or when the workaround can be removed.
    • Recommendation: Document or encode the fixture invariant more explicitly, or move the fix closer to the source boundary. At minimum, identify the invalid state, why direct fixture identity/account injection is not used here, the regression test that would fail if shell and Python UID diverge again, and the removal condition.
    • Evidence: The added comment says `Keep the Python helper aligned with the shell fixture's mocked id -u`, while the fixture globally injects `sitecustomize.py` and `PYTHONPATH`; production `agents/hermes/start.sh` uses Python `os.geteuid()` to choose the root-only fchown path.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 7, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 27084384078
Target ref: 2a85605dcd342cacc61d6d81635a6be1acf3a736
Workflow ref: main
Requested jobs: cloud-onboard-e2e,sandbox-survival-e2e,openshell-gateway-upgrade-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job Result
cloud-onboard-e2e ✅ success
openshell-gateway-upgrade-e2e ✅ success
sandbox-survival-e2e ✅ success

@cv cv merged commit dceeb60 into main Jun 7, 2026
40 checks passed
@cv cv deleted the codex/investigate-ci-failures branch June 7, 2026 06:36
@cv cv added the v0.0.61 Release target label Jun 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

v0.0.61 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants