Skip to content

test(e2e): add scenario fixture context#4965

Merged
cv merged 11 commits into
mainfrom
codex/e2e-4941-stack-03-fixture-context
Jun 8, 2026
Merged

test(e2e): add scenario fixture context#4965
cv merged 11 commits into
mainfrom
codex/e2e-4941-stack-03-fixture-context

Conversation

@cv

@cv cv commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds the core Vitest fixture context for E2E scenarios, including artifacts, cleanup, secret redaction, and shell probing. Also hardens cleanup/probe behavior based on advisor feedback.

Related Issue

Refs #4941.

Changes

  • Adds artifact collection helpers under test/e2e-scenario/framework/artifacts.ts.
  • Adds cleanup registration with redacted failure reporting and negative-path cleanup coverage.
  • Adds secret redaction utilities for fixture-scoped resources.
  • Adds ShellProbe helpers with trusted command descriptors, explicit environment passing, redacted artifacts, timeout/abort SIGKILL escalation, and spawn-error cleanup handling.
  • Exposes the scenario-aware Vitest fixture from framework/e2e-test.ts using Vitest object fixtures.
  • Adds framework tests for fixture lifecycle, artifacts, cleanup, redaction, environment isolation, timeout/abort handling, trusted command guardrails, and missing-command failures.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

  • Tests
    • Expanded E2E test suite with fixtures for artifacts, secrets, cleanup, and shell probing; adds artifact writing, redaction, and cleanup assertions.
    • New scenarios cover artifact-path confinement, secret redaction/enforcement, cleanup reverse-order semantics, robust shell-execution handling (spawn failures, aborts, timeouts, SIGKILL escalation, process-group reaping), and probe evidence path validation.

cv added 3 commits June 8, 2026 09:34
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv cv added the area: e2e End-to-end tests, nightly failures, or validation infrastructure label Jun 8, 2026
@cv cv self-assigned this Jun 8, 2026
@cv cv requested a review from jyaunches June 8, 2026 16:42
@copy-pr-bot

copy-pr-bot Bot commented Jun 8, 2026

Copy link
Copy Markdown

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7839890b-04db-4330-921b-f0b9b8f2758c

📥 Commits

Reviewing files that changed from the base of the PR and between 9f30578 and 6e6c80b.

📒 Files selected for processing (8)
  • test/e2e-scenario/framework-tests/e2e-fixture-context.test.ts
  • test/e2e-scenario/framework-tests/e2e-probes.test.ts
  • test/e2e-scenario/framework/secrets.ts
  • test/e2e-scenario/framework/shell-probe.ts
  • test/e2e-scenario/scenarios/probes/injection-blocked.ts
  • test/e2e-scenario/scenarios/probes/network-policy.ts
  • test/e2e-scenario/scenarios/probes/shields-config.ts
  • test/e2e-scenario/scenarios/probes/util.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • test/e2e-scenario/framework/secrets.ts
  • test/e2e-scenario/framework/shell-probe.ts
  • test/e2e-scenario/framework-tests/e2e-fixture-context.test.ts

📝 Walkthrough

Walkthrough

Adds an E2E test framework: confined artifact sink, secret-store redaction, cleanup registry, trusted ShellProbe (validation, timeout/abort escalation, redacted artifacts), Vitest fixture wiring, and tests + probe-evidence utilities.

Changes

E2E Test Framework

Layer / File(s) Summary
Artifact sink and path safety
test/e2e-scenario/framework/artifacts.ts
ArtifactSink ensures root exists, resolves paths confined to the root (rejects absolute/traversal), and writes text/JSON artifacts; slugifyArtifactName and createArtifactSink provide per-test sinks.
Cleanup management and ordering
test/e2e-scenario/framework/cleanup.ts
CleanupRegistry registers cleanup functions, runs them in reverse registration order with optional redaction, returns passed/failures, and assertCleanupPassed aggregates failures into an Error.
Vitest fixture orchestration
test/e2e-scenario/framework/e2e-test.ts
E2EScenarioFixtures and test fixture factory wire artifacts, secrets, cleanup, and shellProbe, manage setup/teardown, write summary artifacts, and re-export expect.
Secrets and text redaction
test/e2e-scenario/framework/secrets.ts
redactText() replaces explicit secret values with [REDACTED] and runs shared redactor; SecretStore reads env with optional()/required(), collects redaction candidates, and exposes redact() and redactionValues().
Trusted shell command and probe runtime
test/e2e-scenario/framework/shell-probe.ts
trustedShellCommand() validates command/reason (trims, rejects NULs, optional args validator) and returns branded descriptor. ShellProbe.run() spawns a detached child, captures stdout/stderr, enforces timeout (SIGTERM→SIGKILL) and AbortSignal escalation, writes redacted stdout/stderr/result artifacts, and throws normalized redacted errors on spawn/runtime failure.
Framework validation and shell-probe tests
test/e2e-scenario/framework-tests/*
Vitest suites covering helper utilities, artifact-root confinement, cleanup reverse-order/redaction/clear semantics, secret redaction and required-secret behavior, trusted command validation, spawn-failure redaction, abort/timeout escalation (including SIGKILL), process-group reaping, and fixture-context end-to-end scenarios.
Probe evidence API and callsites
test/e2e-scenario/scenarios/probes/util.ts, test/e2e-scenario/scenarios/probes/*, test/e2e-scenario/framework-tests/e2e-probes.test.ts
writeProbeEvidence now accepts ProbeContext, centrally resolves/validates the evidence path against ctx.contextDir (rejecting escapes), and all probe callsites updated to pass ctx; tests verify in-context writes and ignored escape-path attempts.

Sequence Diagram

sequenceDiagram
  participant Vitest
  participant ShellProbe
  participant ChildProcess
  participant ArtifactSink
  Vitest->>ShellProbe: run(trustedCommand, options)
  ShellProbe->>ChildProcess: spawn(detached, cwd, env)
  ChildProcess-->>ShellProbe: stdout/stderr/events
  ShellProbe->>ShellProbe: enforce timeout/abort (SIGTERM -> SIGKILL)
  ShellProbe->>ArtifactSink: writeText(redacted stdout)
  ShellProbe->>ArtifactSink: writeText(redacted stderr)
  ShellProbe->>ArtifactSink: writeJson(redacted result)
  ShellProbe-->>Vitest: return ShellProbeResult
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Suggested labels

bug-fix

Suggested reviewers

  • prekshivyas

Poem

🐇 I hid the secrets, penned each log with care,
Artifacts safe in a root that's fair,
Cleanup hops backward, tidy and spry,
Shell probes time out — then KILL says goodbye,
Tests sing redacted tales beneath the moonlit air.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'test(e2e): add scenario fixture context' clearly and concisely describes the main change—introducing a Vitest fixture context for E2E scenarios. It is specific enough to distinguish from generic updates, directly relates to the primary deliverable across the changeset, and follows conventional commit patterns.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/e2e-4941-stack-03-fixture-context

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: ubuntu-repo-cloud-openclaw
Optional E2E: telegram-injection

Dispatch hint: ubuntu-repo-cloud-openclaw

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • ubuntu-repo-cloud-openclaw (high): This canonical typed scenario includes the supplemental required security suites security-shields, security-policy, and security-injection, so it directly exercises the changed shieldsConfigProbe, networkPolicyProbe, injectionBlockedProbe, and shared probe evidence writer against a live sandbox/gateway.

Optional E2E

  • telegram-injection (high): Useful adjacent regression coverage for command-injection defenses through a real assistant messaging path, but not merge-blocking because the changed typed injectionBlockedProbe is already exercised by ubuntu-repo-cloud-openclaw.

New E2E recommendations

  • None.

Dispatch hint

  • Workflow: .github/workflows/e2e-scenarios.yaml
  • jobs input: ubuntu-repo-cloud-openclaw

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: e2e-scenarios-all
Optional scenario E2E: None

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • e2e-scenarios-all: Changes touch shared scenario E2E fixture/probe infrastructure and built-in security probe utilities under test/e2e-scenario. These modules are part of the scenario runner/probe surface and can affect scenario startup, evidence handling, and security assertions across the dispatchable catalog, so run the all-scenarios fan-out rather than a single targeted scenario.
    • Dispatch: gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Optional scenario E2E

  • None.

Relevant changed files

  • test/e2e-scenario/framework-tests/e2e-fixture-context.test.ts
  • test/e2e-scenario/framework-tests/e2e-probes.test.ts
  • test/e2e-scenario/framework/artifacts.ts
  • test/e2e-scenario/framework/cleanup.ts
  • test/e2e-scenario/framework/e2e-test.ts
  • test/e2e-scenario/framework/secrets.ts
  • test/e2e-scenario/framework/shell-probe.ts
  • test/e2e-scenario/scenarios/probes/injection-blocked.ts
  • test/e2e-scenario/scenarios/probes/network-policy.ts
  • test/e2e-scenario/scenarios/probes/shields-config.ts
  • test/e2e-scenario/scenarios/probes/util.ts

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 0 needs attention, 8 worth checking, 1 nice ideas
Since last review: 0 prior items resolved, 6 still apply, 1 new item found

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • Source-of-truth review needed: test/e2e-scenario/framework/shell-probe.ts bridge-only host shell probe: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: Comment identifies the helper as bridge-only; implementation uses `detached: true` and `process.kill(-pgid, signal)`, but the `finally` block leaves `killTimer` armed when termination started.
  • Source-of-truth review needed: test/e2e-scenario/scenarios/probes/util.ts best-effort evidence writer: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `resolveProbeEvidencePath(ctx)` rejects escapes, but `writeProbeEvidence()` catches all errors and does not surface validation failures.
  • Malformed probe evidence paths are silently dropped (test/e2e-scenario/scenarios/probes/util.ts:294): `writeProbeEvidence()` now validates that evidence stays under `ProbeContext.contextDir`, which is the right security boundary, but it catches the validation error together with ordinary IO failures. A malformed or escaping evidence path therefore makes required security-probe evidence disappear while the probe can still report pass/fail normally, reducing auditability of sandbox, shields, injection, and network-policy checks.
    • Recommendation: Keep ordinary evidence IO best-effort if that is the intended contract, but distinguish path-validation failures. Fail the probe, return a write status, or write a redacted diagnostic to a safe fallback artifact when `evidencePath` escapes `contextDir`.
    • Evidence: `resolveProbeEvidencePath(ctx)` throws on paths outside `contextDir`, and `writeProbeEvidence(ctx, payload)` immediately catches all errors. The new test only asserts that the outside file is not created; it does not assert that the invalid state is surfaced.
  • ShellProbe leaves the SIGKILL escalation timer armed after terminated children close (test/e2e-scenario/framework/shell-probe.ts:219): When timeout or abort starts termination, `terminationStarted` is set to true and the `finally` block intentionally does not clear `killTimer`. `run()` can return after the child has closed while an unref'd SIGKILL callback remains scheduled against the old process group id. This is low probability, but it is undesirable at a host process boundary because a delayed cleanup action can fire after ownership of that process group has ended.
    • Recommendation: Clear `killTimer` in `finally` once the child has closed, regardless of whether termination was started. Add a regression test with a short grace window that verifies no post-return escalation callback remains or takes effect.
    • Evidence: `terminate()` sets `terminationStarted = true` and schedules `killTimer`; the `finally` block only runs `clearTimeout(killTimer)` when `killTimer && !terminationStarted`.
  • Abort process-group cleanup still lacks direct grandchild coverage (test/e2e-scenario/framework-tests/e2e-fixture-context.test.ts:192): The timeout path now verifies that a background grandchild exits, but the abort path only exercises a direct child. Vitest cancellation uses the abort path, so the process-group behavior that protects live fixture cleanup is not directly guarded against regression.
    • Recommendation: Add a behavior test where an abort signal terminates a trusted command that spawns a SIGTERM-ignoring grandchild, and assert the grandchild is gone before `ShellProbe.run()` returns.
    • Evidence: The abort test runs `process.execPath` directly, while the process-group grandchild test is timeout-only (`shell probe reaps timed-out command process groups`).
  • Bridge-only ShellProbe still needs source-of-truth follow-up (test/e2e-scenario/framework/shell-probe.ts:8): `ShellProbe` is explicitly a bridge-only host shell probe pending shared spawn/evidence helper consolidation. The local bridge does identify Consolidate E2E fixture shell probe with scenario spawn infra #4988, but the current implementation still carries cleanup edge cases and incomplete abort-grandchild regression coverage, so the localized behavior is not yet fully justified as a safe temporary source of truth.
  • Best-effort evidence writer handles invalid paths locally instead of fixing the source (test/e2e-scenario/scenarios/probes/util.ts:294): `writeProbeEvidence()` locally validates and suppresses malformed evidence paths, but the code does not show where invalid `ProbeContext.evidencePath` values are created, why that source cannot be made impossible here, or when this tolerant behavior can be removed. For required security probes, losing evidence is a source-of-truth concern, not just an IO convenience.
    • Recommendation: Move evidence-path normalization/validation to the `ProbeContext` construction boundary where possible. If best-effort writes must remain, document which invalid state is tolerated, why the source cannot be fixed now, and how invalid-path regressions are surfaced.
    • Evidence: `resolveProbeEvidencePath(ctx)` rejects escapes under `contextDir`, but `writeProbeEvidence()` catches all errors and does not surface the invalid configured path.
  • Inherited environment mode needs explicit credential-redaction coverage (test/e2e-scenario/framework/shell-probe.ts:151): `ShellProbe` defaults to an explicit env, which is good, but `inheritEnv: true` can pass the full parent environment to a host child. The redaction path should cover sensitive inherited values, but this high-risk mode is not directly tested for TOKEN, SECRET, and API_KEY values across stdout, stderr, command metadata, and result artifacts.
    • Recommendation: Add a regression test for `inheritEnv: true` that sets inherited TOKEN, SECRET, and API_KEY-like values and verifies they are redacted from stdout, stderr, command argv metadata, thrown errors, and result artifacts.
    • Evidence: The spawn env is `{ ...process.env, ...(options.env ?? {}) }` when `inheritEnv` is true. Existing tests cover explicit env isolation and explicit redaction values, but not inherited sensitive env redaction.

🌱 Nice ideas

  • Artifact directories can collide for duplicate Vitest task names (test/e2e-scenario/framework/artifacts.ts:50): `createArtifactSink()` derives the artifact directory only from the slugified Vitest task name. Duplicate test names in different suites or nested scopes can write to the same `.e2e/vitest/<slug>` directory, mixing shell evidence and cleanup summaries.
    • Recommendation: Include stable suite/file context or a unique Vitest task identifier in the artifact path, or reject duplicate scenario test names. Add a regression test covering duplicate Vitest task names.
    • Evidence: `createArtifactSink(testName)` returns `new ArtifactSink(path.join(baseDir, slugifyArtifactName(testName)))`, and `e2e-test.ts` passes only `task.name`.
Consider writing more tests for
  • **Runtime validation** — ShellProbe abort kills a SIGTERM-ignoring grandchild before run returns. Unit/framework coverage is strong for the new fixture primitives, but the PR introduces live host process behavior, signal escalation, process-group cleanup, inherited environment handling, and security-probe evidence capture. Targeted runtime validation would reduce risk at these boundaries.
  • **Runtime validation** — ShellProbe clears pending SIGKILL escalation timer after terminated child close. Unit/framework coverage is strong for the new fixture primitives, but the PR introduces live host process behavior, signal escalation, process-group cleanup, inherited environment handling, and security-probe evidence capture. Targeted runtime validation would reduce risk at these boundaries.
  • **Runtime validation** — ShellProbe inheritEnv true redacts inherited TOKEN SECRET and API_KEY values from stdout stderr command and result artifacts. Unit/framework coverage is strong for the new fixture primitives, but the PR introduces live host process behavior, signal escalation, process-group cleanup, inherited environment handling, and security-probe evidence capture. Targeted runtime validation would reduce risk at these boundaries.
  • **Runtime validation** — writeProbeEvidence records or reports an escaping evidence path instead of silently losing required probe evidence. Unit/framework coverage is strong for the new fixture primitives, but the PR introduces live host process behavior, signal escalation, process-group cleanup, inherited environment handling, and security-probe evidence capture. Targeted runtime validation would reduce risk at these boundaries.
  • **Runtime validation** — createArtifactSink keeps duplicate Vitest task names distinct or rejects duplicate task names. Unit/framework coverage is strong for the new fixture primitives, but the PR introduces live host process behavior, signal escalation, process-group cleanup, inherited environment handling, and security-probe evidence capture. Targeted runtime validation would reduce risk at these boundaries.
  • **Abort process-group cleanup still lacks direct grandchild coverage** — Add a behavior test where an abort signal terminates a trusted command that spawns a SIGTERM-ignoring grandchild, and assert the grandchild is gone before `ShellProbe.run()` returns.
  • **Artifact directories can collide for duplicate Vitest task names** — Include stable suite/file context or a unique Vitest task identifier in the artifact path, or reject duplicate scenario test names. Add a regression test covering duplicate Vitest task names.
  • **Acceptance clause:** Refs Adopt Vitest fixtures as the E2E scenario execution model #4941. — add test evidence or identify existing coverage. No linked issue body or comments were included in the deterministic context, so literal Adopt Vitest fixtures as the E2E scenario execution model #4941 acceptance clauses could not be verified. Repository docs identify Adopt Vitest fixtures as the E2E scenario execution model #4941 as the Vitest fixture scenario-runner decision, and this PR adds the initial fixture context.
Since last review details

Current findings:

  • Source-of-truth review needed: test/e2e-scenario/framework/shell-probe.ts bridge-only host shell probe: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: Comment identifies the helper as bridge-only; implementation uses `detached: true` and `process.kill(-pgid, signal)`, but the `finally` block leaves `killTimer` armed when termination started.
  • Source-of-truth review needed: test/e2e-scenario/scenarios/probes/util.ts best-effort evidence writer: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `resolveProbeEvidencePath(ctx)` rejects escapes, but `writeProbeEvidence()` catches all errors and does not surface validation failures.
  • Malformed probe evidence paths are silently dropped (test/e2e-scenario/scenarios/probes/util.ts:294): `writeProbeEvidence()` now validates that evidence stays under `ProbeContext.contextDir`, which is the right security boundary, but it catches the validation error together with ordinary IO failures. A malformed or escaping evidence path therefore makes required security-probe evidence disappear while the probe can still report pass/fail normally, reducing auditability of sandbox, shields, injection, and network-policy checks.
    • Recommendation: Keep ordinary evidence IO best-effort if that is the intended contract, but distinguish path-validation failures. Fail the probe, return a write status, or write a redacted diagnostic to a safe fallback artifact when `evidencePath` escapes `contextDir`.
    • Evidence: `resolveProbeEvidencePath(ctx)` throws on paths outside `contextDir`, and `writeProbeEvidence(ctx, payload)` immediately catches all errors. The new test only asserts that the outside file is not created; it does not assert that the invalid state is surfaced.
  • ShellProbe leaves the SIGKILL escalation timer armed after terminated children close (test/e2e-scenario/framework/shell-probe.ts:219): When timeout or abort starts termination, `terminationStarted` is set to true and the `finally` block intentionally does not clear `killTimer`. `run()` can return after the child has closed while an unref'd SIGKILL callback remains scheduled against the old process group id. This is low probability, but it is undesirable at a host process boundary because a delayed cleanup action can fire after ownership of that process group has ended.
    • Recommendation: Clear `killTimer` in `finally` once the child has closed, regardless of whether termination was started. Add a regression test with a short grace window that verifies no post-return escalation callback remains or takes effect.
    • Evidence: `terminate()` sets `terminationStarted = true` and schedules `killTimer`; the `finally` block only runs `clearTimeout(killTimer)` when `killTimer && !terminationStarted`.
  • Abort process-group cleanup still lacks direct grandchild coverage (test/e2e-scenario/framework-tests/e2e-fixture-context.test.ts:192): The timeout path now verifies that a background grandchild exits, but the abort path only exercises a direct child. Vitest cancellation uses the abort path, so the process-group behavior that protects live fixture cleanup is not directly guarded against regression.
    • Recommendation: Add a behavior test where an abort signal terminates a trusted command that spawns a SIGTERM-ignoring grandchild, and assert the grandchild is gone before `ShellProbe.run()` returns.
    • Evidence: The abort test runs `process.execPath` directly, while the process-group grandchild test is timeout-only (`shell probe reaps timed-out command process groups`).
  • Bridge-only ShellProbe still needs source-of-truth follow-up (test/e2e-scenario/framework/shell-probe.ts:8): `ShellProbe` is explicitly a bridge-only host shell probe pending shared spawn/evidence helper consolidation. The local bridge does identify Consolidate E2E fixture shell probe with scenario spawn infra #4988, but the current implementation still carries cleanup edge cases and incomplete abort-grandchild regression coverage, so the localized behavior is not yet fully justified as a safe temporary source of truth.
  • Best-effort evidence writer handles invalid paths locally instead of fixing the source (test/e2e-scenario/scenarios/probes/util.ts:294): `writeProbeEvidence()` locally validates and suppresses malformed evidence paths, but the code does not show where invalid `ProbeContext.evidencePath` values are created, why that source cannot be made impossible here, or when this tolerant behavior can be removed. For required security probes, losing evidence is a source-of-truth concern, not just an IO convenience.
    • Recommendation: Move evidence-path normalization/validation to the `ProbeContext` construction boundary where possible. If best-effort writes must remain, document which invalid state is tolerated, why the source cannot be fixed now, and how invalid-path regressions are surfaced.
    • Evidence: `resolveProbeEvidencePath(ctx)` rejects escapes under `contextDir`, but `writeProbeEvidence()` catches all errors and does not surface the invalid configured path.
  • Inherited environment mode needs explicit credential-redaction coverage (test/e2e-scenario/framework/shell-probe.ts:151): `ShellProbe` defaults to an explicit env, which is good, but `inheritEnv: true` can pass the full parent environment to a host child. The redaction path should cover sensitive inherited values, but this high-risk mode is not directly tested for TOKEN, SECRET, and API_KEY values across stdout, stderr, command metadata, and result artifacts.
    • Recommendation: Add a regression test for `inheritEnv: true` that sets inherited TOKEN, SECRET, and API_KEY-like values and verifies they are redacted from stdout, stderr, command argv metadata, thrown errors, and result artifacts.
    • Evidence: The spawn env is `{ ...process.env, ...(options.env ?? {}) }` when `inheritEnv` is true. Existing tests cover explicit env isolation and explicit redaction values, but not inherited sensitive env redaction.
  • Artifact directories can collide for duplicate Vitest task names (test/e2e-scenario/framework/artifacts.ts:50): `createArtifactSink()` derives the artifact directory only from the slugified Vitest task name. Duplicate test names in different suites or nested scopes can write to the same `.e2e/vitest/<slug>` directory, mixing shell evidence and cleanup summaries.
    • Recommendation: Include stable suite/file context or a unique Vitest task identifier in the artifact path, or reject duplicate scenario test names. Add a regression test covering duplicate Vitest task names.
    • Evidence: `createArtifactSink(testName)` returns `new ArtifactSink(path.join(baseDir, slugifyArtifactName(testName)))`, and `e2e-test.ts` passes only `task.name`.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

Base automatically changed from codex/e2e-4941-stack-02-live-project to main June 8, 2026 18:57
cv added 2 commits June 8, 2026 11:58
# Conflicts:
#	test/e2e-scenario/framework-tests/e2e-live-project-config.test.ts
#	test/e2e-scenario/framework/live-project-gate.ts
#	vitest.config.ts
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv

cv commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed the PR review advisor feedback in ea012f918: cleanup failure reporting is redacted, cleanup failure handling has negative-path coverage, ShellProbe defaults to explicit env only with inheritEnv for broad inheritance, and timeout now escalates from SIGTERM to SIGKILL.

@cv cv marked this pull request as ready for review June 8, 2026 19:08
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv

cv commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed the remaining PR review advisor item in b117d3cec: ShellProbe now clears timeout/abort cleanup in a finally path for spawn errors, writes redacted failure artifacts before rethrowing, and has a missing-command regression test covering listener cleanup and redaction.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv

cv commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed the PR review advisor fixture API finding in 5c3cc53b2: framework/e2e-test.ts now uses Vitest object fixtures with use(...) and teardown after the test body for artifact summaries and cleanup assertions.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv

cv commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed the remaining ShellProbe advisor items in 9f305784e: ShellProbe.run now accepts a TrustedShellCommand descriptor instead of raw command strings, trustedShellCommand(...) records the trusted boundary and validates command/argument tokens, fixture tests assert the descriptor API at type level, and abort-triggered termination now shares the timeout SIGTERM→SIGKILL escalation path with regression coverage.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e-scenario/framework/shell-probe.ts`:
- Around line 170-175: ShellProbe.run() currently attaches an "abort" listener
to this.signal without checking if this.signal.aborted first, so a pre-aborted
AbortSignal can skip immediate termination; update ShellProbe.run() to check
this.signal.aborted before adding the listener and call the existing
abort/terminate logic immediately if true (referencing ShellProbe.run(),
this.signal, the abort handler, timeout/terminate logic), keep the existing
setTimeout fallback, and ensure the listener is still added only when not
already aborted; additionally add an e2e test that constructs a pre-aborted
AbortSignal and passes it to ShellProbe.run() to verify immediate termination
behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7e3216a3-ba1a-46ea-94cd-1a87acbe7589

📥 Commits

Reviewing files that changed from the base of the PR and between 015a1b4 and 9f30578.

📒 Files selected for processing (6)
  • test/e2e-scenario/framework-tests/e2e-fixture-context.test.ts
  • test/e2e-scenario/framework/artifacts.ts
  • test/e2e-scenario/framework/cleanup.ts
  • test/e2e-scenario/framework/e2e-test.ts
  • test/e2e-scenario/framework/secrets.ts
  • test/e2e-scenario/framework/shell-probe.ts

Comment thread test/e2e-scenario/framework/shell-probe.ts

@jyaunches jyaunches left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for landing the foundation — the Vitest fixture model is the right target architecture and the chained stack (#4965#4969) is a clean way to ship it. But before this foundation lands, three of the five new modules reimplement infrastructure that #4380 just hardened, and two of those reimplementations are weaker than the existing typed-shell-runner code. Calling these out as a Request Changes so we don't ship two parallel spawn boundaries / two redaction layers / two artifact layouts in test/e2e-scenario/.

Concrete duplications

1. framework/shell-probe.ts reimplements scenarios/orchestrators/phase.ts:executeStep + scenarios/probes/util.ts:spawnBash/runHostCmd

Both spawn a child, capture stdout/stderr, enforce a timeout, write artifacts. The differences are regressions:

  • Timeout reaping: shell-probe.ts does SIGTERM → grace → SIGKILL on the leader pid only. The existing phase.ts uses a detached process group + negative-pid SIGTERM specifically to reap orphan grandchildren (bash + sleep is the canonical example). On a wedged sandbox-exec the new fixture leaves the inner process running.
  • Input validation contract: probes/util.ts has a NUL-byte rejector + an lgtm[js/shell-command-injection-from-environment] justification we just landed in #4380 (commits bd4d3ce0b, 7887adb92, 70df0519b) to clear CodeQL alerts 715/792/793. shell-probe.ts has no equivalent contract; it'll trip the same alerts as soon as a fixture runs against an untrusted-shape input.
  • Sandbox-exec transport: runSandboxCmd routes through validation_suites/sandbox-exec.sh so ssh-config-preferred / openshell-exec-fallback lives in one place. shell-probe.ts doesn't have a sandbox-exec story yet, which is fine for host commands but means scenarios that need to enter the sandbox can't use it without re-importing the existing helper.

2. framework/secrets.ts reimplements scenarios/orchestrators/redaction.ts — with weaker coverage

This one is a real correctness regression:

  • Existing: redaction.ts imports TOKEN_PREFIX_PATTERNS and CONTEXT_PATTERNS directly from src/lib/security/secret-patterns.ts (the canonical product source). framework-tests/e2e-redaction-parity.test.ts (landed in #4380) locks the framework mirror to the product source via fingerprint comparison. Adding a new token shape in secret-patterns.ts automatically protects the framework path.
  • framework/secrets.ts: uses SENSITIVE_NAME_PATTERN = /(api[_-]?key|token|secret|password|credential)/i against env var names, then substring-replaces the values it found. Tokens that appear in output without a matching env-var name (e.g. an nvapi-... echoed from a help string) are not redacted at all.

The new framework participates in zero parity contract with secret-patterns.ts. Adding a new token shape upstream will silently leak through framework/.

3. framework/artifacts.ts reimplements phase.ts:writeEvidence + probes/util.ts:writeProbeEvidence — with one improvement

  • ArtifactSink.pathFor() rejects absolute paths and escape paths via prefix check. The existing writeProbeEvidence does plain mkdirSync + writeFileSyncno path validation. That part of framework/ is genuinely better and we should backport it to probes/util.ts.
  • The artifact layouts are incompatible: framework/ writes <base>.stdout.txt/<base>.stderr.txt/<base>.result.json rooted at .e2e/vitest/<test-slug>/, the existing typed-shell-runner writes .e2e/logs/<step-id>.log + .e2e/actions/<action-id>.log + .e2e/<phase>.result.json. Two layouts under the same .e2e/ root will fragment the artifact-bundle redaction allowlist in e2e-scenarios.yaml.

4 + 5: Genuinely new

  • framework/cleanup.ts: per-test cleanup-registry pattern. The existing ScenarioRunner does imperative phase teardown without a registry. New, fine.
  • framework/e2e-test.ts: Vitest test.extend fixture composition. This is the architectural shift the migration exists for, not duplication.

What I'd ask before merging

Either of these works:

Option A — consume shared infrastructure (preferred).
Extract the existing typed-shell-runner spawn/redact/evidence code into a shared test/e2e-scenario/framework-infra/ module that both scenarios/orchestrators/ and framework/ consume. The Vitest fixtures inherit:

  • the parity-tested redaction patterns,
  • the NUL-validated + lgtm-justified spawn contract,
  • the detached-pgid timeout reaping.

Option B — explicitly bridge-only + backport.
Mark framework/secrets.ts and framework/shell-probe.ts as bridge-only in their doc comments, file follow-up issues to consolidate (one per module), and:

  • update framework/secrets.ts to import the canonical regex sets so e2e-redaction-parity.test.ts covers both layers, or add a separate parity test for framework/secrets.ts against secret-patterns.ts,
  • backport artifacts.ts:pathFor path-traversal check to probes/util.ts:writeProbeEvidence.

Without one of those, this PR ships test/e2e-scenario/ with two spawn boundaries, two redaction layers, and two artifact layouts — and two of the three are weaker than what we just hardened to land #4380.

Happy to pair on Option A if it'd help — extracting framework-infra/spawn.ts + framework-infra/redact.ts + framework-infra/artifacts.ts is mostly mechanical, and we'd land #4380's hardening on the new fixture path for free.

Refs: #4380, parity test in test/e2e-scenario/framework-tests/e2e-redaction-parity.test.ts, spawn safety contract in test/e2e-scenario/scenarios/probes/util.ts:spawnBash.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv

cv commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed the CodeRabbit pre-aborted AbortSignal finding in a013e04: ShellProbe now checks signal.aborted before registering the abort listener and immediately runs the same terminate path for pre-aborted signals. Added a regression test that proves a pre-aborted signal terminates the child promptly without waiting for the timeout.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv

cv commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed the changes-requested review in 929efde.

What changed:

@cv cv merged commit 0151ff3 into main Jun 8, 2026
40 checks passed
@cv cv deleted the codex/e2e-4941-stack-03-fixture-context branch June 8, 2026 22:13
@coderabbitai coderabbitai Bot mentioned this pull request Jun 8, 2026
12 tasks
@wscurran wscurran added the chore Build, CI, dependency, or tooling maintenance label Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: e2e End-to-end tests, nightly failures, or validation infrastructure chore Build, CI, dependency, or tooling maintenance v0.0.62 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants