Production runner: proof checklist for local e2e and restart recovery

## Goal
Create a proof checklist and a deterministic local e2e/restart harness proving that the production runner path can complete a local run and recover from a restart without creating duplicate workers or advancing state unsafely.

## Merged scope
This canonical proof/test issue merges the previous local smoke/e2e harness (#21) with the restart/recovery proof work from #17.

## Current state / evidence
- `package.json` has `npm test` and `npm run check`; tests already cover registry store, gate ledger, fix/review loop artifacts, SCM handoff, workflow policy, recovery path safety, and runner behavior.
- `test/fixtures/packet-list.mixed.json` provides packet fixtures for validation/intake.
- `README.md` says interrupted work should recover from recorded registry evidence.
- `docs/execution-run-schema.md` documents atomicity and recovery expectations for the JSON registry.
- `test/registry-store.test.js` and `test/gate-ledger.test.js` already cover quarantine/replay cases for stale or inconsistent local state.
- Current tests are mostly module/contract level; there is no single production smoke harness or restart matrix around pending spawned worker tasks and in-flight stage ownership.

## Scope
- Add a deterministic e2e smoke harness that creates a temporary sandbox repo/workspace, intakes a fixture packet, acquires a lease, runs the autonomous loop with fake adapters, and reaches handoff-ready or ready-for-manual-review without network writes.
- Add fake implementation, verification, internal-review, and fix adapters that can produce PASS/FAIL/BLOCK outcomes and immutable evidence.
- Add a restart/recovery proof checklist and automated kill/restart matrix for implementation dispatch, verification, internal review, fix loop, and handoff-ready projection.
- Prove no duplicate workers are created after restart when worker task intent/spawn/result evidence already exists.
- Validate fake-adapter scenarios: happy path, verification fail -> fix -> pass, review fail -> fix -> pass, blocked worker, worker failure, and restart/resume after simulated kill points.
- Verify registry snapshot, events, worker task records, artifact hashes, formatted status/report output, quarantine/blocker output, and no unexpected external writes.
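As a rough illustration of the fake-adapter scenarios above, the following sketch scripts PASS/FAIL/BLOCK outcomes per stage and returns frozen evidence carrying a SHA-256 hash. Every name here (`createFakeAdapter`, `runFakeLoop`, the evidence fields, the `maxFixes` limit) is hypothetical and not taken from the existing codebase; the real runner loop and registry would replace the in-memory `evidence` array.

```javascript
'use strict';
const { createHash } = require('node:crypto');

// Hypothetical fake adapter: replays a scripted list of outcomes
// ('PASS' | 'FAIL' | 'BLOCK') and returns frozen, hashed evidence.
function createFakeAdapter(stage, scriptedOutcomes) {
  const remaining = [...scriptedOutcomes];
  let attempt = 0;
  return {
    run(packetId) {
      if (remaining.length === 0) {
        throw new Error(`fake ${stage} adapter has no scripted outcome left`);
      }
      attempt += 1;
      const record = { stage, packetId, attempt, outcome: remaining.shift() };
      const sha256 = createHash('sha256').update(JSON.stringify(record)).digest('hex');
      return Object.freeze({ ...record, sha256 });
    },
  };
}

// Hypothetical loop: implementation, then verification and review, with one
// fix attempt after each FAIL. A real loop might re-verify after a review fix.
function runFakeLoop(packetId, adapters, maxFixes = 3) {
  const evidence = [];
  const record = (entry) => { evidence.push(entry); return entry; };
  const blocked = () => ({ status: 'blocked', evidence });

  if (record(adapters.implementation.run(packetId)).outcome !== 'PASS') return blocked();

  let fixes = 0;
  for (const stage of ['verification', 'review']) {
    for (;;) {
      const result = record(adapters[stage].run(packetId));
      if (result.outcome === 'PASS') break;
      if (result.outcome === 'BLOCK' || fixes >= maxFixes) return blocked();
      fixes += 1;
      if (record(adapters.fix.run(packetId)).outcome !== 'PASS') return blocked();
    }
  }
  return { status: 'handoff-ready', evidence };
}

// Scenario: verification fail -> fix -> pass.
const scenario = runFakeLoop('packet-1', {
  implementation: createFakeAdapter('implementation', ['PASS']),
  verification: createFakeAdapter('verification', ['FAIL', 'PASS']),
  review: createFakeAdapter('review', ['PASS']),
  fix: createFakeAdapter('fix', ['PASS']),
});
console.log(scenario.status, scenario.evidence.map((e) => e.stage));
```

Scripting outcomes up front keeps each scenario deterministic, and an exhausted script throws rather than silently passing, which should surface unexpected extra adapter calls after a restart.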

## Acceptance criteria
- Harness runs in CI through `npm test` or `npm run check` without GitHub credentials and without real OpenClaw live workers.
- A run can be killed/restarted during implementation dispatch, verification, internal review, fix loop, and handoff-ready projection without duplicate worker creation.
- Recovery distinguishes: intent recorded before spawn, worker spawned before result recorded, result artifact written before event recorded, event recorded before snapshot update, stale lease/timeout, and conflicting/late completion.
- Pending worker tasks are resumed, reattached, timed out, or marked human-recovery-needed by explicit deterministic rules.
- Ambiguous state fails closed with a structured blocker/quarantine report rather than trying to repair remote/external state.
- The recovery/status report tells the operator the current stage, the pending worker/task id, the last durable artifact, any blocker, and the next safe action.
- Harness verifies no unexpected external writes and keeps fixture data small/public-safe.
- Documentation explains how a developer/operator can run the smoke/restart proof locally.
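One possible shape for the deterministic recovery rules is a pure classifier over the durable evidence recorded for a worker task. The sketch below is only an assumption about how that could look: the flag names (`intent`, `spawn`, `artifact`, `event`, `snapshot`, `leaseExpired`, `conflict`), the state labels, and the mapping from state to action are all hypothetical and would need to be settled in the recovery matrix.

```javascript
'use strict';

// Hypothetical classifier: maps recorded evidence for one worker task to a
// recovery state and a next action. Anything inconsistent fails closed.
function classifyRecovery(task) {
  const { intent, spawn, artifact, event, snapshot, leaseExpired, conflict } = task;

  // Evidence is expected to form a prefix of:
  // intent -> spawn -> artifact -> event -> snapshot.
  const chain = [intent, spawn, artifact, event, snapshot].map(Boolean);
  const firstGap = chain.indexOf(false);
  const hasHole = firstGap !== -1 && chain.slice(firstGap).includes(true);

  if (conflict || hasHole) {
    // Conflicting/late completion or a gap in the chain: do not guess.
    return { state: 'ambiguous', action: 'human-recovery-needed' };
  }
  if (snapshot) return { state: 'complete', action: 'none' };
  if (event) return { state: 'event-before-snapshot', action: 'resume' };
  if (artifact) return { state: 'artifact-before-event', action: 'resume' };
  if (spawn) {
    return leaseExpired
      ? { state: 'stale-lease', action: 'timed-out' }
      : { state: 'spawned-before-result', action: 'reattach' };
  }
  if (intent) return { state: 'intent-before-spawn', action: 'resume' };
  return { state: 'not-started', action: 'dispatch' };
}

// A worker was spawned but no result was recorded before the kill point:
// the rule is to reattach, never to spawn a second worker.
console.log(classifyRecovery({ intent: true, spawn: true }));
```

Because the function takes only recorded evidence and returns a value, each cell of the kill/restart matrix can be asserted directly in a unit test, with `dispatch` being the only action that is allowed to create a new worker.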

## Non-goals
- No real GitHub PR creation in the smoke path.
- No reliance on OpenClaw live workers.
- No broad performance/load testing.
- No automatic remote repair when local state is ambiguous.
- No distributed consensus or multi-host scheduler.
- No hiding manual recovery requirements when evidence is insufficient.

## Planning notes / questions
- Decide whether this lives as `test/e2e-autonomous-runner.test.js`, a script plus tests, or both.
- Build fake adapters once and reuse them across production-runner tests.
- Define the exact recovery matrix before coding and reuse existing registry quarantine patterns.
- Decide whether worker timeouts are wall-clock based, heartbeat based, or both.
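To make the timeout question concrete, here is a minimal sketch that supports wall-clock limits, heartbeat limits, or both, depending on which policy fields are set. The names `isWorkerTimedOut`, `maxRunMs`, `maxSilenceMs`, `startedAtMs`, and `lastHeartbeatAtMs` are illustrative assumptions, not existing fields. The current time is passed in explicitly so the harness can stay deterministic.

```javascript
'use strict';

// Hypothetical timeout check. A limit left undefined in the policy is not
// enforced, so one function covers wall-clock, heartbeat, or both.
function isWorkerTimedOut(task, nowMs, policy) {
  const wallClockExceeded =
    policy.maxRunMs !== undefined && nowMs - task.startedAtMs > policy.maxRunMs;
  const heartbeatStale =
    policy.maxSilenceMs !== undefined && nowMs - task.lastHeartbeatAtMs > policy.maxSilenceMs;
  return { timedOut: wallClockExceeded || heartbeatStale, wallClockExceeded, heartbeatStale };
}

// The worker is within its run budget but has been silent for 10 s
// against a 5 s heartbeat limit.
const verdict = isWorkerTimedOut(
  { startedAtMs: 0, lastHeartbeatAtMs: 50_000 },
  60_000,
  { maxRunMs: 120_000, maxSilenceMs: 5_000 },
);
console.log(verdict); // timedOut is true because heartbeatStale is true
```

Returning which limit fired, rather than a bare boolean, would let the status report name the reason for a timeout.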

## Suggested labels
- enhancement

