Goal
Create a proof checklist and deterministic local e2e/restart harness that proves the production runner path can survive local execution and restart/recovery without duplicate workers or unsafe state advancement.
Merged scope
This canonical proof/test issue merges the previous local smoke/e2e harness (#21) with the restart/recovery proof work from #17.
Current state / evidence
- package.json has npm test and npm run check; tests already cover registry store, gate ledger, fix/review loop artifacts, SCM handoff, workflow policy, recovery path safety, and runner behavior.
- test/fixtures/packet-list.mixed.json provides packet fixtures for validation/intake.
- README.md says interrupted work should recover from recorded registry evidence.
- docs/execution-run-schema.md documents atomicity and recovery expectations for the JSON registry.
- test/registry-store.test.js and test/gate-ledger.test.js already cover quarantine/replay cases for stale or inconsistent local state.
- Current tests are mostly module/contract level; there is no single production smoke harness or restart matrix around pending spawned worker tasks and in-flight stage ownership.
Scope
- Add a deterministic e2e smoke harness that creates a temporary sandbox repo/workspace, intakes a fixture packet, acquires a lease, runs the autonomous loop with fake adapters, and reaches handoff-ready or ready-for-manual-review without network writes.
- Add fake implementation, verification, internal-review, and fix adapters that can produce PASS/FAIL/BLOCK outcomes and immutable evidence.
- Add a restart/recovery proof checklist and automated kill/restart matrix for implementation dispatch, verification, internal review, fix loop, and handoff-ready projection.
- Prove no duplicate workers are created after restart when worker task intent/spawn/result evidence already exists.
- Validate fake-adapter scenarios: happy path, verification fail -> fix -> pass, review fail -> fix -> pass, blocked worker, worker failure, and restart/resume after simulated kill points.
- Verify registry snapshot, events, worker task records, artifact hashes, formatted status/report output, quarantine/blocker output, and no unexpected external writes.
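As a rough illustration of the fake-adapter and no-duplicate-worker items above, here is a minimal sketch. Everything in it is hypothetical: createFakeAdapter, dispatchOnce, the result shape, and the use of a Map in place of the JSON registry are assumptions for illustration, not the repo's existing API.

```javascript
// Hypothetical fake adapter: replays scripted outcomes in order and counts spawns.
function createFakeAdapter(stage, scriptedOutcomes) {
  let next = 0;
  return {
    stage,
    spawnCount: 0,
    run(taskId) {
      if (next >= scriptedOutcomes.length) {
        throw new Error(`no scripted outcome left for ${stage}`);
      }
      this.spawnCount += 1;
      // Frozen so later stages cannot mutate the recorded evidence.
      return Object.freeze({ stage, taskId, outcome: scriptedOutcomes[next++] });
    },
  };
}

// Hypothetical dispatcher: spawn only when the registry holds no result for the task.
function dispatchOnce(registry, adapter, taskId) {
  const recorded = registry.get(taskId);
  if (recorded) return recorded; // evidence exists, so a restart must not respawn
  const result = adapter.run(taskId);
  registry.set(taskId, result);
  return result;
}

// Simulated restart: the registry (a Map here) survives, the in-process state does not.
const registry = new Map();
const verification = createFakeAdapter("verification", ["FAIL", "PASS"]);
const first = dispatchOnce(registry, verification, "task-1");
const afterRestart = dispatchOnce(registry, verification, "task-1");
```

In this sketch the second dispatchOnce call returns the already recorded result and verification.spawnCount stays at 1, which is the property the restart matrix would assert at each kill point.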
Acceptance criteria
- Harness runs in CI through npm test or npm run check without GitHub credentials and without real OpenClaw live workers.
- A run can be killed/restarted during implementation dispatch, verification, internal review, fix loop, and handoff-ready projection without duplicate worker creation.
- Recovery distinguishes: intent recorded before spawn, worker spawned before result recorded, result artifact written before event recorded, event recorded before snapshot update, stale lease/timeout, and conflicting/late completion.
- Pending worker tasks are resumed, reattached, timed out, or marked human-recovery-needed by explicit deterministic rules.
- Ambiguous state fails closed with a structured blocker/quarantine report rather than trying to repair remote/external state.
- Recovery/status report tells the operator current stage, pending worker/task id, last durable artifact, blocker, and next safe action.
- Harness verifies no unexpected external writes and keeps fixture data small/public-safe.
- Documentation explains how a developer/operator can run the smoke/restart proof locally.
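The recovery distinctions listed in the criteria above could be expressed as one deterministic classifier. The sketch below is an assumption-heavy illustration: the evidence flag names, the point labels, and the mapping from each point to an action are all hypothetical and would need to be settled in the recovery matrix.

```javascript
// Hypothetical classifier: maps durable evidence flags to a recovery point and next action.
function classifyRecoveryPoint(e) {
  // Evidence that cannot occur in a healthy run fails closed instead of being repaired.
  const inconsistent =
    e.conflictingResult ||
    (e.spawnRecorded && !e.intentRecorded) ||
    (e.resultArtifactWritten && !e.spawnRecorded) ||
    (e.eventRecorded && !e.resultArtifactWritten) ||
    (e.snapshotUpdated && !e.eventRecorded);
  if (inconsistent) return { point: "ambiguous", action: "quarantine" };

  if (!e.intentRecorded) return { point: "no-intent", action: "dispatch" };
  if (e.snapshotUpdated) return { point: "complete", action: "advance" };
  if (e.eventRecorded) return { point: "event-before-snapshot", action: "replay-snapshot" };
  if (e.resultArtifactWritten) return { point: "artifact-before-event", action: "record-event" };

  // From here on the task is still pending, so a stale lease takes priority.
  if (e.leaseExpired) return { point: "stale-lease", action: "time-out" };
  if (e.spawnRecorded) return { point: "spawned-before-result", action: "reattach" };
  return { point: "intent-before-spawn", action: "resume" };
}

// Example: the worker was spawned, but the process died before a result was recorded.
const pending = classifyRecoveryPoint({ intentRecorded: true, spawnRecorded: true });
```

Keeping the rules in a single pure function like this would let the kill/restart matrix be written as a table of evidence inputs and expected actions, with no timing dependence.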
Non-goals
- No real GitHub PR creation in the smoke path.
- No reliance on OpenClaw live workers.
- No broad performance/load testing.
- No automatic remote repair when local state is ambiguous.
- No distributed consensus or multi-host scheduler.
- No hiding manual recovery requirements when evidence is insufficient.
Planning notes / questions
- Decide whether this lives as test/e2e-autonomous-runner.test.js, a script plus tests, or both.
- Build fake adapters once and reuse them across production-runner tests.
- Define the exact recovery matrix before coding and reuse existing registry quarantine patterns.
- Decide whether worker timeouts are wall-clock based, heartbeat based, or both.
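To make the timeout question concrete, one option is to support both limits behind a single pure check with an injected clock, so tests stay deterministic. This is only a sketch of that option: checkWorkerTimeout, the task fields, and the policy names (maxRunMs, maxHeartbeatGapMs) are hypothetical.

```javascript
// Hypothetical timeout check: either limit alone can mark a pending worker as timed out.
// The current time is passed in so the harness never depends on the real clock.
function checkWorkerTimeout(task, policy, nowMs) {
  const wallClockExceeded = nowMs - task.startedAtMs > policy.maxRunMs;
  // A worker that never sent a heartbeat is measured from its start time.
  const lastSeenMs = task.lastHeartbeatAtMs ?? task.startedAtMs;
  const heartbeatStale = nowMs - lastSeenMs > policy.maxHeartbeatGapMs;

  if (wallClockExceeded) return { timedOut: true, reason: "wall-clock" };
  if (heartbeatStale) return { timedOut: true, reason: "heartbeat" };
  return { timedOut: false, reason: null };
}

// Example policy: 10 minute run limit, 30 second heartbeat gap.
const policy = { maxRunMs: 600000, maxHeartbeatGapMs: 30000 };
const task = { startedAtMs: 0, lastHeartbeatAtMs: 50000 };
const healthy = checkWorkerTimeout(task, policy, 60000);
const stale = checkWorkerTimeout(task, policy, 90000);
```

With these inputs, healthy reports no timeout (the last heartbeat was 10 seconds ago) and stale reports a heartbeat timeout (40 seconds without a heartbeat).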
Suggested labels