test(e2e): add scenario fixture context by cv · Pull Request #4965 · NVIDIA/NemoClaw

cv · 2026-06-08T16:42:05Z

Summary

Adds the core Vitest fixture context for E2E scenarios, including artifacts, cleanup, secret redaction, and shell probing. Also hardens cleanup/probe behavior based on advisor feedback.

Related Issue

Refs #4941.

Changes

Adds artifact collection helpers under test/e2e-scenario/framework/artifacts.ts.
Adds cleanup registration with redacted failure reporting and negative-path cleanup coverage.
Adds secret redaction utilities for fixture-scoped resources.
Adds ShellProbe helpers with trusted command descriptors, explicit environment passing, redacted artifacts, timeout/abort SIGKILL escalation, and spawn-error cleanup handling.
Exposes the scenario-aware Vitest fixture from framework/e2e-test.ts using Vitest object fixtures.
Adds framework tests for fixture lifecycle, artifacts, cleanup, redaction, environment isolation, timeout/abort handling, trusted command guardrails, and missing-command failures.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

Tests
- Expanded E2E test suite with fixtures for artifacts, secrets, cleanup, and shell probing; adds artifact writing, redaction, and cleanup assertions.
- New scenarios cover artifact-path confinement, secret redaction/enforcement, cleanup reverse-order semantics, robust shell-execution handling (spawn failures, aborts, timeouts, SIGKILL escalation, process-group reaping), and probe evidence path validation.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

copy-pr-bot · 2026-06-08T16:42:09Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-06-08T16:42:13Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7839890b-04db-4330-921b-f0b9b8f2758c

📥 Commits

Reviewing files that changed from the base of the PR and between 9f30578 and 6e6c80b.

📒 Files selected for processing (8)

test/e2e-scenario/framework-tests/e2e-fixture-context.test.ts
test/e2e-scenario/framework-tests/e2e-probes.test.ts
test/e2e-scenario/framework/secrets.ts
test/e2e-scenario/framework/shell-probe.ts
test/e2e-scenario/scenarios/probes/injection-blocked.ts
test/e2e-scenario/scenarios/probes/network-policy.ts
test/e2e-scenario/scenarios/probes/shields-config.ts
test/e2e-scenario/scenarios/probes/util.ts

🚧 Files skipped from review as they are similar to previous changes (3)

test/e2e-scenario/framework/secrets.ts
test/e2e-scenario/framework/shell-probe.ts
test/e2e-scenario/framework-tests/e2e-fixture-context.test.ts

📝 Walkthrough

Walkthrough

Adds an E2E test framework: confined artifact sink, secret-store redaction, cleanup registry, trusted ShellProbe (validation, timeout/abort escalation, redacted artifacts), Vitest fixture wiring, and tests + probe-evidence utilities.

Changes

E2E Test Framework

Layer / File(s)	Summary
Artifact sink and path safety `test/e2e-scenario/framework/artifacts.ts`	`ArtifactSink` ensures root exists, resolves paths confined to the root (rejects absolute/traversal), and writes text/JSON artifacts; `slugifyArtifactName` and `createArtifactSink` provide per-test sinks.
Cleanup management and ordering `test/e2e-scenario/framework/cleanup.ts`	`CleanupRegistry` registers cleanup functions, runs them in reverse registration order with optional redaction, returns passed/failures, and `assertCleanupPassed` aggregates failures into an Error.
Vitest fixture orchestration `test/e2e-scenario/framework/e2e-test.ts`	`E2EScenarioFixtures` and `test` fixture factory wire `artifacts`, `secrets`, `cleanup`, and `shellProbe`, manage setup/teardown, write summary artifacts, and re-export `expect`.
Secrets and text redaction `test/e2e-scenario/framework/secrets.ts`	`redactText()` replaces explicit secret values with `[REDACTED]` and runs shared redactor; `SecretStore` reads env with `optional()`/`required()`, collects redaction candidates, and exposes `redact()` and `redactionValues()`.
Trusted shell command and probe runtime `test/e2e-scenario/framework/shell-probe.ts`	`trustedShellCommand()` validates command/reason (trims, rejects NULs, optional args validator) and returns branded descriptor. `ShellProbe.run()` spawns a detached child, captures stdout/stderr, enforces timeout (SIGTERM→SIGKILL) and AbortSignal escalation, writes redacted stdout/stderr/result artifacts, and throws normalized redacted errors on spawn/runtime failure.
Framework validation and shell-probe tests `test/e2e-scenario/framework-tests/*`	Vitest suites covering helper utilities, artifact-root confinement, cleanup reverse-order/redaction/clear semantics, secret redaction and required-secret behavior, trusted command validation, spawn-failure redaction, abort/timeout escalation (including SIGKILL), process-group reaping, and fixture-context end-to-end scenarios.
Probe evidence API and callsites `test/e2e-scenario/scenarios/probes/util.ts`, `test/e2e-scenario/scenarios/probes/*`, `test/e2e-scenario/framework-tests/e2e-probes.test.ts`	`writeProbeEvidence` now accepts `ProbeContext`, centrally resolves/validates the evidence path against `ctx.contextDir` (rejecting escapes), and all probe callsites updated to pass `ctx`; tests verify in-context writes and ignored escape-path attempts.

Sequence Diagram

sequenceDiagram
  participant Vitest
  participant ShellProbe
  participant ChildProcess
  participant ArtifactSink
  Vitest->>ShellProbe: run(trustedCommand, options)
  ShellProbe->>ChildProcess: spawn(detached, cwd, env)
  ChildProcess-->>ShellProbe: stdout/stderr/events
  ShellProbe->>ShellProbe: enforce timeout/abort (SIGTERM -> SIGKILL)
  ShellProbe->>ArtifactSink: writeText(redacted stdout)
  ShellProbe->>ArtifactSink: writeText(redacted stderr)
  ShellProbe->>ArtifactSink: writeJson(redacted result)
  ShellProbe-->>Vitest: return ShellProbeResult

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Consolidate E2E fixture shell probe with scenario spawn infra #4988: New shell-probe features (trusted command validation, NUL-byte rejection, env handling, redaction, group cleanup) overlap the objectives described in this issue.
Consolidate E2E fixture secret redaction with scenario redaction infra #4989: The fixture-only SecretStore and redactText layer explicit secret-value replacement over existing redaction, matching consolidation concerns raised in the issue.

Suggested labels

bug-fix

Suggested reviewers

prekshivyas

Poem

🐇 I hid the secrets, penned each log with care,
Artifacts safe in a root that's fair,
Cleanup hops backward, tidy and spry,
Shell probes time out — then KILL says goodbye,
Tests sing redacted tales beneath the moonlit air.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'test(e2e): add scenario fixture context' clearly and concisely describes the main change—introducing a Vitest fixture context for E2E scenarios. It is specific enough to distinguish from generic updates, directly relates to the primary deliverable across the changeset, and follows conventional commit patterns.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch codex/e2e-4941-stack-03-fixture-context

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-08T16:43:38Z

E2E Advisor Recommendation

Required E2E: ubuntu-repo-cloud-openclaw
Optional E2E: telegram-injection

Dispatch hint: ubuntu-repo-cloud-openclaw

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

ubuntu-repo-cloud-openclaw (high): This canonical typed scenario includes the supplemental required security suites security-shields, security-policy, and security-injection, so it directly exercises the changed shieldsConfigProbe, networkPolicyProbe, injectionBlockedProbe, and shared probe evidence writer against a live sandbox/gateway.

Optional E2E

telegram-injection (high): Useful adjacent regression coverage for command-injection defenses through a real assistant messaging path, but not merge-blocking because the changed typed injectionBlockedProbe is already exercised by ubuntu-repo-cloud-openclaw.

New E2E recommendations

None.

Dispatch hint

Workflow: .github/workflows/e2e-scenarios.yaml
jobs input: ubuntu-repo-cloud-openclaw

github-actions · 2026-06-08T16:43:40Z

E2E Scenario Advisor Recommendation

Required scenario E2E: e2e-scenarios-all
Optional scenario E2E: None

Dispatch required scenario E2E:

gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

e2e-scenarios-all: Changes touch shared scenario E2E fixture/probe infrastructure and built-in security probe utilities under test/e2e-scenario. These modules are part of the scenario runner/probe surface and can affect scenario startup, evidence handling, and security assertions across the dispatchable catalog, so run the all-scenarios fan-out rather than a single targeted scenario.
- Dispatch: gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Optional scenario E2E

None.

Relevant changed files

test/e2e-scenario/framework-tests/e2e-fixture-context.test.ts
test/e2e-scenario/framework-tests/e2e-probes.test.ts
test/e2e-scenario/framework/artifacts.ts
test/e2e-scenario/framework/cleanup.ts
test/e2e-scenario/framework/e2e-test.ts
test/e2e-scenario/framework/secrets.ts
test/e2e-scenario/framework/shell-probe.ts
test/e2e-scenario/scenarios/probes/injection-blocked.ts
test/e2e-scenario/scenarios/probes/network-policy.ts
test/e2e-scenario/scenarios/probes/shields-config.ts
test/e2e-scenario/scenarios/probes/util.ts

github-actions · 2026-06-08T16:45:52Z

PR Review Advisor

Findings: 0 needs attention, 8 worth checking, 1 nice ideas
Since last review: 0 prior items resolved, 6 still apply, 1 new item found

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Source-of-truth review needed: test/e2e-scenario/framework/shell-probe.ts bridge-only host shell probe: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: Comment identifies the helper as bridge-only; implementation uses `detached: true` and `process.kill(-pgid, signal)`, but the `finally` block leaves `killTimer` armed when termination started.
Source-of-truth review needed: test/e2e-scenario/scenarios/probes/util.ts best-effort evidence writer: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `resolveProbeEvidencePath(ctx)` rejects escapes, but `writeProbeEvidence()` catches all errors and does not surface validation failures.
Malformed probe evidence paths are silently dropped (test/e2e-scenario/scenarios/probes/util.ts:294): `writeProbeEvidence()` now validates that evidence stays under `ProbeContext.contextDir`, which is the right security boundary, but it catches the validation error together with ordinary IO failures. A malformed or escaping evidence path therefore makes required security-probe evidence disappear while the probe can still report pass/fail normally, reducing auditability of sandbox, shields, injection, and network-policy checks.
- Recommendation: Keep ordinary evidence IO best-effort if that is the intended contract, but distinguish path-validation failures. Fail the probe, return a write status, or write a redacted diagnostic to a safe fallback artifact when `evidencePath` escapes `contextDir`.
- Evidence: `resolveProbeEvidencePath(ctx)` throws on paths outside `contextDir`, and `writeProbeEvidence(ctx, payload)` immediately catches all errors. The new test only asserts that the outside file is not created; it does not assert that the invalid state is surfaced.
ShellProbe leaves the SIGKILL escalation timer armed after terminated children close (test/e2e-scenario/framework/shell-probe.ts:219): When timeout or abort starts termination, `terminationStarted` is set to true and the `finally` block intentionally does not clear `killTimer`. `run()` can return after the child has closed while an unref'd SIGKILL callback remains scheduled against the old process group id. This is low probability, but it is undesirable at a host process boundary because a delayed cleanup action can fire after ownership of that process group has ended.
- Recommendation: Clear `killTimer` in `finally` once the child has closed, regardless of whether termination was started. Add a regression test with a short grace window that verifies no post-return escalation callback remains or takes effect.
- Evidence: `terminate()` sets `terminationStarted = true` and schedules `killTimer`; the `finally` block only runs `clearTimeout(killTimer)` when `killTimer && !terminationStarted`.
Abort process-group cleanup still lacks direct grandchild coverage (test/e2e-scenario/framework-tests/e2e-fixture-context.test.ts:192): The timeout path now verifies that a background grandchild exits, but the abort path only exercises a direct child. Vitest cancellation uses the abort path, so the process-group behavior that protects live fixture cleanup is not directly guarded against regression.
- Recommendation: Add a behavior test where an abort signal terminates a trusted command that spawns a SIGTERM-ignoring grandchild, and assert the grandchild is gone before `ShellProbe.run()` returns.
- Evidence: The abort test runs `process.execPath` directly, while the process-group grandchild test is timeout-only (`shell probe reaps timed-out command process groups`).
Bridge-only ShellProbe still needs source-of-truth follow-up (test/e2e-scenario/framework/shell-probe.ts:8): `ShellProbe` is explicitly a bridge-only host shell probe pending shared spawn/evidence helper consolidation. The local bridge does identify Consolidate E2E fixture shell probe with scenario spawn infra #4988, but the current implementation still carries cleanup edge cases and incomplete abort-grandchild regression coverage, so the localized behavior is not yet fully justified as a safe temporary source of truth.
- Recommendation: Before relying on this as the fixture shell boundary, either fix the cleanup/test gaps in this PR or document the exact invalid states, source boundary, source-fix constraint, regression coverage, and removal condition tied to the Consolidate E2E fixture shell probe with scenario spawn infra #4988 consolidation.
- Evidence: The file comment says the end state is a shared spawn/evidence helper tracked by Consolidate E2E fixture shell probe with scenario spawn infra #4988. The bridge uses detached process groups and negative pgid signaling, but the stale escalation timer and abort-grandchild coverage gaps remain.
Best-effort evidence writer handles invalid paths locally instead of fixing the source (test/e2e-scenario/scenarios/probes/util.ts:294): `writeProbeEvidence()` locally validates and suppresses malformed evidence paths, but the code does not show where invalid `ProbeContext.evidencePath` values are created, why that source cannot be made impossible here, or when this tolerant behavior can be removed. For required security probes, losing evidence is a source-of-truth concern, not just an IO convenience.
- Recommendation: Move evidence-path normalization/validation to the `ProbeContext` construction boundary where possible. If best-effort writes must remain, document which invalid state is tolerated, why the source cannot be fixed now, and how invalid-path regressions are surfaced.
- Evidence: `resolveProbeEvidencePath(ctx)` rejects escapes under `contextDir`, but `writeProbeEvidence()` catches all errors and does not surface the invalid configured path.
Inherited environment mode needs explicit credential-redaction coverage (test/e2e-scenario/framework/shell-probe.ts:151): `ShellProbe` defaults to an explicit env, which is good, but `inheritEnv: true` can pass the full parent environment to a host child. The redaction path should cover sensitive inherited values, but this high-risk mode is not directly tested for TOKEN, SECRET, and API_KEY values across stdout, stderr, command metadata, and result artifacts.
- Recommendation: Add a regression test for `inheritEnv: true` that sets inherited TOKEN, SECRET, and API_KEY-like values and verifies they are redacted from stdout, stderr, command argv metadata, thrown errors, and result artifacts.
- Evidence: The spawn env is `{ ...process.env, ...(options.env ?? {}) }` when `inheritEnv` is true. Existing tests cover explicit env isolation and explicit redaction values, but not inherited sensitive env redaction.

🌱 Nice ideas

Artifact directories can collide for duplicate Vitest task names (test/e2e-scenario/framework/artifacts.ts:50): `createArtifactSink()` derives the artifact directory only from the slugified Vitest task name. Duplicate test names in different suites or nested scopes can write to the same `.e2e/vitest/<slug>` directory, mixing shell evidence and cleanup summaries.
- Recommendation: Include stable suite/file context or a unique Vitest task identifier in the artifact path, or reject duplicate scenario test names. Add a regression test covering duplicate Vitest task names.
- Evidence: `createArtifactSink(testName)` returns `new ArtifactSink(path.join(baseDir, slugifyArtifactName(testName)))`, and `e2e-test.ts` passes only `task.name`.

Consider writing more tests for

**Runtime validation** — ShellProbe abort kills a SIGTERM-ignoring grandchild before run returns. Unit/framework coverage is strong for the new fixture primitives, but the PR introduces live host process behavior, signal escalation, process-group cleanup, inherited environment handling, and security-probe evidence capture. Targeted runtime validation would reduce risk at these boundaries.
**Runtime validation** — ShellProbe clears pending SIGKILL escalation timer after terminated child close. Unit/framework coverage is strong for the new fixture primitives, but the PR introduces live host process behavior, signal escalation, process-group cleanup, inherited environment handling, and security-probe evidence capture. Targeted runtime validation would reduce risk at these boundaries.
**Runtime validation** — ShellProbe inheritEnv true redacts inherited TOKEN SECRET and API_KEY values from stdout stderr command and result artifacts. Unit/framework coverage is strong for the new fixture primitives, but the PR introduces live host process behavior, signal escalation, process-group cleanup, inherited environment handling, and security-probe evidence capture. Targeted runtime validation would reduce risk at these boundaries.
**Runtime validation** — writeProbeEvidence records or reports an escaping evidence path instead of silently losing required probe evidence. Unit/framework coverage is strong for the new fixture primitives, but the PR introduces live host process behavior, signal escalation, process-group cleanup, inherited environment handling, and security-probe evidence capture. Targeted runtime validation would reduce risk at these boundaries.
**Runtime validation** — createArtifactSink keeps duplicate Vitest task names distinct or rejects duplicate task names. Unit/framework coverage is strong for the new fixture primitives, but the PR introduces live host process behavior, signal escalation, process-group cleanup, inherited environment handling, and security-probe evidence capture. Targeted runtime validation would reduce risk at these boundaries.
**Abort process-group cleanup still lacks direct grandchild coverage** — Add a behavior test where an abort signal terminates a trusted command that spawns a SIGTERM-ignoring grandchild, and assert the grandchild is gone before `ShellProbe.run()` returns.
**Artifact directories can collide for duplicate Vitest task names** — Include stable suite/file context or a unique Vitest task identifier in the artifact path, or reject duplicate scenario test names. Add a regression test covering duplicate Vitest task names.
**Acceptance clause:** Refs Adopt Vitest fixtures as the E2E scenario execution model #4941. — add test evidence or identify existing coverage. No linked issue body or comments were included in the deterministic context, so literal Adopt Vitest fixtures as the E2E scenario execution model #4941 acceptance clauses could not be verified. Repository docs identify Adopt Vitest fixtures as the E2E scenario execution model #4941 as the Vitest fixture scenario-runner decision, and this PR adds the initial fixture context.

Since last review details

Current findings:

Source-of-truth review needed: test/e2e-scenario/framework/shell-probe.ts bridge-only host shell probe: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: Comment identifies the helper as bridge-only; implementation uses `detached: true` and `process.kill(-pgid, signal)`, but the `finally` block leaves `killTimer` armed when termination started.
Source-of-truth review needed: test/e2e-scenario/scenarios/probes/util.ts best-effort evidence writer: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `resolveProbeEvidencePath(ctx)` rejects escapes, but `writeProbeEvidence()` catches all errors and does not surface validation failures.
Malformed probe evidence paths are silently dropped (test/e2e-scenario/scenarios/probes/util.ts:294): `writeProbeEvidence()` now validates that evidence stays under `ProbeContext.contextDir`, which is the right security boundary, but it catches the validation error together with ordinary IO failures. A malformed or escaping evidence path therefore makes required security-probe evidence disappear while the probe can still report pass/fail normally, reducing auditability of sandbox, shields, injection, and network-policy checks.
- Recommendation: Keep ordinary evidence IO best-effort if that is the intended contract, but distinguish path-validation failures. Fail the probe, return a write status, or write a redacted diagnostic to a safe fallback artifact when `evidencePath` escapes `contextDir`.
- Evidence: `resolveProbeEvidencePath(ctx)` throws on paths outside `contextDir`, and `writeProbeEvidence(ctx, payload)` immediately catches all errors. The new test only asserts that the outside file is not created; it does not assert that the invalid state is surfaced.
ShellProbe leaves the SIGKILL escalation timer armed after terminated children close (test/e2e-scenario/framework/shell-probe.ts:219): When timeout or abort starts termination, `terminationStarted` is set to true and the `finally` block intentionally does not clear `killTimer`. `run()` can return after the child has closed while an unref'd SIGKILL callback remains scheduled against the old process group id. This is low probability, but it is undesirable at a host process boundary because a delayed cleanup action can fire after ownership of that process group has ended.
- Recommendation: Clear `killTimer` in `finally` once the child has closed, regardless of whether termination was started. Add a regression test with a short grace window that verifies no post-return escalation callback remains or takes effect.
- Evidence: `terminate()` sets `terminationStarted = true` and schedules `killTimer`; the `finally` block only runs `clearTimeout(killTimer)` when `killTimer && !terminationStarted`.
Abort process-group cleanup still lacks direct grandchild coverage (test/e2e-scenario/framework-tests/e2e-fixture-context.test.ts:192): The timeout path now verifies that a background grandchild exits, but the abort path only exercises a direct child. Vitest cancellation uses the abort path, so the process-group behavior that protects live fixture cleanup is not directly guarded against regression.
- Recommendation: Add a behavior test where an abort signal terminates a trusted command that spawns a SIGTERM-ignoring grandchild, and assert the grandchild is gone before `ShellProbe.run()` returns.
- Evidence: The abort test runs `process.execPath` directly, while the process-group grandchild test is timeout-only (`shell probe reaps timed-out command process groups`).
Bridge-only ShellProbe still needs source-of-truth follow-up (test/e2e-scenario/framework/shell-probe.ts:8): `ShellProbe` is explicitly a bridge-only host shell probe pending shared spawn/evidence helper consolidation. The local bridge does identify Consolidate E2E fixture shell probe with scenario spawn infra #4988, but the current implementation still carries cleanup edge cases and incomplete abort-grandchild regression coverage, so the localized behavior is not yet fully justified as a safe temporary source of truth.
- Recommendation: Before relying on this as the fixture shell boundary, either fix the cleanup/test gaps in this PR or document the exact invalid states, source boundary, source-fix constraint, regression coverage, and removal condition tied to the Consolidate E2E fixture shell probe with scenario spawn infra #4988 consolidation.
- Evidence: The file comment says the end state is a shared spawn/evidence helper tracked by Consolidate E2E fixture shell probe with scenario spawn infra #4988. The bridge uses detached process groups and negative pgid signaling, but the stale escalation timer and abort-grandchild coverage gaps remain.
Best-effort evidence writer handles invalid paths locally instead of fixing the source (test/e2e-scenario/scenarios/probes/util.ts:294): `writeProbeEvidence()` locally validates and suppresses malformed evidence paths, but the code does not show where invalid `ProbeContext.evidencePath` values are created, why that source cannot be made impossible here, or when this tolerant behavior can be removed. For required security probes, losing evidence is a source-of-truth concern, not just an IO convenience.
- Recommendation: Move evidence-path normalization/validation to the `ProbeContext` construction boundary where possible. If best-effort writes must remain, document which invalid state is tolerated, why the source cannot be fixed now, and how invalid-path regressions are surfaced.
- Evidence: `resolveProbeEvidencePath(ctx)` rejects escapes under `contextDir`, but `writeProbeEvidence()` catches all errors and does not surface the invalid configured path.
Inherited environment mode needs explicit credential-redaction coverage (test/e2e-scenario/framework/shell-probe.ts:151): `ShellProbe` defaults to an explicit env, which is good, but `inheritEnv: true` can pass the full parent environment to a host child. The redaction path should cover sensitive inherited values, but this high-risk mode is not directly tested for TOKEN, SECRET, and API_KEY values across stdout, stderr, command metadata, and result artifacts.
- Recommendation: Add a regression test for `inheritEnv: true` that sets inherited TOKEN, SECRET, and API_KEY-like values and verifies they are redacted from stdout, stderr, command argv metadata, thrown errors, and result artifacts.
- Evidence: The spawn env is `{ ...process.env, ...(options.env ?? {}) }` when `inheritEnv` is true. Existing tests cover explicit env isolation and explicit redaction values, but not inherited sensitive env redaction.
Artifact directories can collide for duplicate Vitest task names (test/e2e-scenario/framework/artifacts.ts:50): `createArtifactSink()` derives the artifact directory only from the slugified Vitest task name. Duplicate test names in different suites or nested scopes can write to the same `.e2e/vitest/<slug>` directory, mixing shell evidence and cleanup summaries.
- Recommendation: Include stable suite/file context or a unique Vitest task identifier in the artifact path, or reject duplicate scenario test names. Add a regression test covering duplicate Vitest task names.
- Evidence: `createArtifactSink(testName)` returns `new ArtifactSink(path.join(baseDir, slugifyArtifactName(testName)))`, and `e2e-test.ts` passes only `task.name`.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

# Conflicts: # test/e2e-scenario/framework-tests/e2e-live-project-config.test.ts # test/e2e-scenario/framework/live-project-gate.ts # vitest.config.ts

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv · 2026-06-08T19:08:04Z

Addressed the PR review advisor feedback in ea012f918: cleanup failure reporting is redacted, cleanup failure handling has negative-path coverage, ShellProbe defaults to explicit env only with inheritEnv for broad inheritance, and timeout now escalates from SIGTERM to SIGKILL.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv · 2026-06-08T19:25:42Z

Addressed the remaining PR review advisor item in b117d3cec: ShellProbe now clears timeout/abort cleanup in a finally path for spawn errors, writes redacted failure artifacts before rethrowing, and has a missing-command regression test covering listener cleanup and redaction.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv · 2026-06-08T19:37:42Z

Addressed the PR review advisor fixture API finding in 5c3cc53b2: framework/e2e-test.ts now uses Vitest object fixtures with use(...) and teardown after the test body for artifact summaries and cleanup assertions.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv · 2026-06-08T19:51:44Z

Addressed the remaining ShellProbe advisor items in 9f305784e: ShellProbe.run now accepts a TrustedShellCommand descriptor instead of raw command strings, trustedShellCommand(...) records the trusted boundary and validates command/argument tokens, fixture tests assert the descriptor API at type level, and abort-triggered termination now shares the timeout SIGTERM→SIGKILL escalation path with regression coverage.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e-scenario/framework/shell-probe.ts`:
- Around line 170-175: ShellProbe.run() currently attaches an "abort" listener
to this.signal without checking if this.signal.aborted first, so a pre-aborted
AbortSignal can skip immediate termination; update ShellProbe.run() to check
this.signal.aborted before adding the listener and call the existing
abort/terminate logic immediately if true (referencing ShellProbe.run(),
this.signal, the abort handler, timeout/terminate logic), keep the existing
setTimeout fallback, and ensure the listener is still added only when not
already aborted; additionally add an e2e test that constructs a pre-aborted
AbortSignal and passes it to ShellProbe.run() to verify immediate termination
behavior.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7e3216a3-ba1a-46ea-94cd-1a87acbe7589

📥 Commits

Reviewing files that changed from the base of the PR and between 015a1b4 and 9f30578.

📒 Files selected for processing (6)

test/e2e-scenario/framework-tests/e2e-fixture-context.test.ts
test/e2e-scenario/framework/artifacts.ts
test/e2e-scenario/framework/cleanup.ts
test/e2e-scenario/framework/e2e-test.ts
test/e2e-scenario/framework/secrets.ts
test/e2e-scenario/framework/shell-probe.ts

jyaunches

Thanks for landing the foundation — the Vitest fixture model is the right target architecture and the chained stack (#4965 → #4969) is a clean way to ship it. But before this foundation lands, three of the five new modules reimplement infrastructure that #4380 just hardened, and two of those reimplementations are weaker than the existing typed-shell-runner code. Calling these out as a Request Changes so we don't ship two parallel spawn boundaries / two redaction layers / two artifact layouts in test/e2e-scenario/.

Concrete duplications

1. `framework/shell-probe.ts` reimplements `scenarios/orchestrators/phase.ts:executeStep` + `scenarios/probes/util.ts:spawnBash`/`runHostCmd`

Both spawn a child, capture stdout/stderr, enforce a timeout, write artifacts. The differences are regressions:

Timeout reaping: shell-probe.ts does SIGTERM → grace → SIGKILL on the leader pid only. The existing phase.ts uses a detached process group + negative-pid SIGTERM specifically to reap orphan grandchildren (bash + sleep is the canonical example). On a wedged sandbox-exec the new fixture leaves the inner process running.
Input validation contract: probes/util.ts has a NUL-byte rejector + an lgtm[js/shell-command-injection-from-environment] justification we just landed in #4380 (commits bd4d3ce0b, 7887adb92, 70df0519b) to clear CodeQL alerts 715/792/793. shell-probe.ts has no equivalent contract; it'll trip the same alerts as soon as a fixture runs against an untrusted-shape input.
Sandbox-exec transport: runSandboxCmd routes through validation_suites/sandbox-exec.sh so ssh-config-preferred / openshell-exec-fallback lives in one place. shell-probe.ts doesn't have a sandbox-exec story yet, which is fine for host commands but means scenarios that need to enter the sandbox can't use it without re-importing the existing helper.

2. `framework/secrets.ts` reimplements `scenarios/orchestrators/redaction.ts` — with weaker coverage

This one is a real correctness regression:

Existing: redaction.ts imports TOKEN_PREFIX_PATTERNS and CONTEXT_PATTERNS directly from src/lib/security/secret-patterns.ts (the canonical product source). framework-tests/e2e-redaction-parity.test.ts (landed in #4380) locks the framework mirror to the product source via fingerprint comparison. Adding a new token shape in secret-patterns.ts automatically protects the framework path.
framework/secrets.ts: uses SENSITIVE_NAME_PATTERN = /(api[_-]?key|token|secret|password|credential)/i against env var names, then substring-replaces the values it found. Tokens that appear in output without a matching env-var name (e.g. an nvapi-... echoed from a help string) are not redacted at all.

The new framework participates in zero parity contract with secret-patterns.ts. Adding a new token shape upstream will silently leak through framework/.

3. `framework/artifacts.ts` reimplements `phase.ts:writeEvidence` + `probes/util.ts:writeProbeEvidence` — with one improvement

ArtifactSink.pathFor() rejects absolute paths and escape paths via prefix check. The existing writeProbeEvidence does plain mkdirSync + writeFileSync — no path validation. That part of framework/ is genuinely better and we should backport it to probes/util.ts.
The artifact layouts are incompatible: framework/ writes <base>.stdout.txt/<base>.stderr.txt/<base>.result.json rooted at .e2e/vitest/<test-slug>/, the existing typed-shell-runner writes .e2e/logs/<step-id>.log + .e2e/actions/<action-id>.log + .e2e/<phase>.result.json. Two layouts under the same .e2e/ root will fragment the artifact-bundle redaction allowlist in e2e-scenarios.yaml.

4 + 5: Genuinely new

framework/cleanup.ts: per-test cleanup-registry pattern. The existing ScenarioRunner does imperative phase teardown without a registry. New, fine.
framework/e2e-test.ts: Vitest test.extend fixture composition. This is the architectural shift the migration exists for, not duplication.

What I'd ask before merging

Either of these works:

Option A — consume shared infrastructure (preferred).
Extract the existing typed-shell-runner spawn/redact/evidence code into a shared test/e2e-scenario/framework-infra/ module that both scenarios/orchestrators/ and framework/ consume. The Vitest fixtures inherit:

the parity-tested redaction patterns,
the NUL-validated + lgtm-justified spawn contract,
the detached-pgid timeout reaping.

Option B — explicitly bridge-only + backport.
Mark framework/secrets.ts and framework/shell-probe.ts as bridge-only in their doc comments, file follow-up issues to consolidate (one per module), and:

update framework/secrets.ts to import the canonical regex sets so e2e-redaction-parity.test.ts covers both layers, or add a separate parity test for framework/secrets.ts against secret-patterns.ts,
backport artifacts.ts:pathFor path-traversal check to probes/util.ts:writeProbeEvidence.

Without one of those, this PR ships test/e2e-scenario/ with two spawn boundaries, two redaction layers, and two artifact layouts — and two of the three are weaker than what we just hardened to land #4380.

Happy to pair on Option A if it'd help — extracting framework-infra/spawn.ts + framework-infra/redact.ts + framework-infra/artifacts.ts is mostly mechanical, and we'd land #4380's hardening on the new fixture path for free.

Refs: #4380, parity test in test/e2e-scenario/framework-tests/e2e-redaction-parity.test.ts, spawn safety contract in test/e2e-scenario/scenarios/probes/util.ts:spawnBash.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv · 2026-06-08T20:05:04Z

Addressed the CodeRabbit pre-aborted AbortSignal finding in a013e04: ShellProbe now checks signal.aborted before registering the abort listener and immediately runs the same terminate path for pre-aborted signals. Added a regression test that proves a pre-aborted signal terminates the child promptly without waiting for the timeout.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv · 2026-06-08T20:22:48Z

Addressed the changes-requested review in 929efde.

What changed:

Marked the fixture shell probe and SecretStore as bridge-only and filed the consolidation follow-ups: Consolidate E2E fixture shell probe with scenario spawn infra #4988 for shared shell/spawn infra, Consolidate E2E fixture secret redaction with scenario redaction infra #4989 for shared fixture redaction infra.
Updated SecretStore redaction to layer explicit fixture secret-value replacement over the parity-tested framework redactor from scenarios/orchestrators/redaction.ts, so canonical token shapes are covered by the same test(e2e): execute real shell assertions; delete dry-run, --validate-only, and the bash runner #4380 redaction source.
Updated ShellProbe to run commands in a detached process group and terminate the group on timeout/abort, while keeping the trusted descriptor + NUL rejection + explicit-env defaults.
Backported artifact path safety into writeProbeEvidence by validating ProbeContext.evidencePath under ProbeContext.contextDir, and updated built-in probes to pass the full context.
Added regressions for canonical fixture redaction, process-group reaping, pre-aborted AbortSignal handling, and probe evidence path escape prevention.

cv added 3 commits June 8, 2026 09:34

docs(e2e): choose Vitest fixtures as scenario runner

3e98b63

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

test(e2e): add opt-in live scenario project

9e355f0

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

test(e2e): add scenario fixture context

a11bc2a

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv added the area: e2e End-to-end tests, nightly failures, or validation infrastructure label Jun 8, 2026

cv self-assigned this Jun 8, 2026

cv requested a review from jyaunches June 8, 2026 16:42

cv mentioned this pull request Jun 8, 2026

Adopt Vitest fixtures as the E2E scenario execution model #4941

Open

7 tasks

Base automatically changed from codex/e2e-4941-stack-02-live-project to main June 8, 2026 18:57

cv added 2 commits June 8, 2026 11:58

merge: bring e2e fixture context branch onto main

c96c0c0

# Conflicts: # test/e2e-scenario/framework-tests/e2e-live-project-config.test.ts # test/e2e-scenario/framework/live-project-gate.ts # vitest.config.ts

test(e2e): harden scenario fixture cleanup and probes

ea012f9

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv marked this pull request as ready for review June 8, 2026 19:08

test(e2e): harden shell probe spawn errors

b117d3c

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

test(e2e): use vitest fixture object api

5c3cc53

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

test(e2e): add shell probe command guardrails

9f30578

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

coderabbitai Bot reviewed Jun 8, 2026

View reviewed changes

Comment thread test/e2e-scenario/framework/shell-probe.ts

jyaunches requested changes Jun 8, 2026

View reviewed changes

test(e2e): handle pre-aborted shell probe signals

a013e04

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv added the v0.0.62 Release target label Jun 8, 2026

This was referenced Jun 8, 2026

Consolidate E2E fixture shell probe with scenario spawn infra #4988

Open

Consolidate E2E fixture secret redaction with scenario redaction infra #4989

Open

test(e2e): align fixture bridge with scenario infra

929efde

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv requested a review from jyaunches June 8, 2026 20:36

jyaunches mentioned this pull request Jun 8, 2026

Adopt phase fixtures + registry-driven test discovery for Vitest E2E scenarios #4990

Open

8 tasks

Merge branch 'main' into codex/e2e-4941-stack-03-fixture-context

6e6c80b

jyaunches approved these changes Jun 8, 2026

View reviewed changes

cv merged commit 0151ff3 into main Jun 8, 2026
40 checks passed

cv deleted the codex/e2e-4941-stack-03-fixture-context branch June 8, 2026 22:13

coderabbitai Bot mentioned this pull request Jun 8, 2026

test(e2e): add fixture-friendly clients #4966

Merged

12 tasks

wscurran added the chore Build, CI, dependency, or tooling maintenance label Jun 8, 2026

coderabbitai Bot mentioned this pull request Jun 8, 2026

chore(e2e): delete deprecated bash entrypoints and orphan helper libs #4996

Merged

12 tasks

jyaunches mentioned this pull request Jun 9, 2026

Vitest E2E coverage: assert NemoClaw's high-value behavioral catalog #4999

Open

This was referenced Jun 9, 2026

test(e2e): add environment phase fixture #5002

Merged

test(e2e): add onboarding phase fixture #5003

Open

refactor(test/e2e): route fixture redaction through canonical entry #5022

Open

Conversation

cv commented Jun 8, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Type of Change

Verification

Summary by CodeRabbit

Uh oh!

copy-pr-bot Bot commented Jun 8, 2026

Uh oh!

coderabbitai Bot commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related issues

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

github-actions Bot commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Scenario Advisor Recommendation

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files

Uh oh!

github-actions Bot commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

cv commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cv commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cv commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cv commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jyaunches left a comment

Choose a reason for hiding this comment

Concrete duplications

1. framework/shell-probe.ts reimplements scenarios/orchestrators/phase.ts:executeStep + scenarios/probes/util.ts:spawnBash/runHostCmd

2. framework/secrets.ts reimplements scenarios/orchestrators/redaction.ts — with weaker coverage

3. framework/artifacts.ts reimplements phase.ts:writeEvidence + probes/util.ts:writeProbeEvidence — with one improvement

4 + 5: Genuinely new

What I'd ask before merging

Uh oh!

cv commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cv commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

cv commented Jun 8, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 8, 2026 •

edited

Loading

github-actions Bot commented Jun 8, 2026 •

edited

Loading

github-actions Bot commented Jun 8, 2026 •

edited

Loading

github-actions Bot commented Jun 8, 2026 •

edited

Loading

cv commented Jun 8, 2026 •

edited

Loading

cv commented Jun 8, 2026 •

edited

Loading

cv commented Jun 8, 2026 •

edited

Loading

cv commented Jun 8, 2026 •

edited

Loading

1. `framework/shell-probe.ts` reimplements `scenarios/orchestrators/phase.ts:executeStep` + `scenarios/probes/util.ts:spawnBash`/`runHostCmd`

2. `framework/secrets.ts` reimplements `scenarios/orchestrators/redaction.ts` — with weaker coverage

3. `framework/artifacts.ts` reimplements `phase.ts:writeEvidence` + `probes/util.ts:writeProbeEvidence` — with one improvement

cv commented Jun 8, 2026 •

edited

Loading

cv commented Jun 8, 2026 •

edited

Loading