Production runner: proof checklist for local e2e and restart recovery #21

@MrFlashAccount

Description

Goal

Create a proof checklist and a deterministic local e2e/restart harness that prove the production runner path can complete a local run and recover from a restart without creating duplicate workers or advancing state unsafely.

Merged scope

This canonical proof/test issue merges the previous local smoke/e2e harness (#21) with the restart/recovery proof work from #17.

Current state / evidence

  • package.json has npm test and npm run check; tests already cover registry store, gate ledger, fix/review loop artifacts, SCM handoff, workflow policy, recovery path safety, and runner behavior.
  • test/fixtures/packet-list.mixed.json provides packet fixtures for validation/intake.
  • README.md says interrupted work should recover from recorded registry evidence.
  • docs/execution-run-schema.md documents atomicity and recovery expectations for the JSON registry.
  • test/registry-store.test.js and test/gate-ledger.test.js already cover quarantine/replay cases for stale or inconsistent local state.
  • Current tests are mostly module/contract level; there is no single production smoke harness or restart matrix around pending spawned worker tasks and in-flight stage ownership.

Scope

  • Add a deterministic e2e smoke harness that creates a temporary sandbox repo/workspace, intakes a fixture packet, acquires a lease, runs the autonomous loop with fake adapters, and reaches handoff-ready or ready-for-manual-review without network writes.
  • Add fake implementation, verification, internal-review, and fix adapters that can produce PASS/FAIL/BLOCK outcomes and immutable evidence.
  • Add a restart/recovery proof checklist and automated kill/restart matrix for implementation dispatch, verification, internal review, fix loop, and handoff-ready projection.
  • Prove no duplicate workers are created after restart when worker task intent/spawn/result evidence already exists.
  • Validate fake-adapter scenarios: happy path, verification fail -> fix -> pass, review fail -> fix -> pass, blocked worker, worker failure, and restart/resume after simulated kill points.
  • Verify registry snapshot, events, worker task records, artifact hashes, formatted status/report output, quarantine/blocker output, and no unexpected external writes.
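
To make the fake-adapter idea concrete, here is a minimal sketch of a scripted adapter that replays PASS/FAIL/BLOCK outcomes and returns frozen, hashed evidence. The names createFakeAdapter, stage, script, and the evidence shape are assumptions for illustration; they are not the repository's existing API.

```javascript
const { createHash } = require("node:crypto");

// Outcome names taken from the PASS/FAIL/BLOCK wording in the scope list.
const OUTCOMES = new Set(["PASS", "FAIL", "BLOCK"]);

// Build a fake adapter that replays a scripted list of outcomes, one per call.
// `stage` and `script` are hypothetical names used only in this sketch.
function createFakeAdapter(stage, script) {
  for (const outcome of script) {
    if (!OUTCOMES.has(outcome)) throw new Error(`unknown outcome: ${outcome}`);
  }
  let calls = 0;
  return {
    stage,
    run(taskId) {
      if (calls >= script.length) {
        throw new Error(`${stage}: script exhausted after ${calls} calls`);
      }
      const outcome = script[calls];
      calls += 1;
      // The evidence body is deterministic: no timestamps, no random ids.
      const body = JSON.stringify({ stage, taskId, attempt: calls, outcome });
      const sha256 = createHash("sha256").update(body).digest("hex");
      // Freeze so later stages cannot mutate recorded evidence in place.
      return Object.freeze({ outcome, evidence: Object.freeze({ body, sha256 }) });
    },
  };
}

// Scenario: verification fail -> fix -> pass.
const verification = createFakeAdapter("verification", ["FAIL", "PASS"]);
const fix = createFakeAdapter("fix", ["PASS"]);
const first = verification.run("task-1");
const fixed = fix.run("task-1");
const second = verification.run("task-1");
console.log(first.outcome, fixed.outcome, second.outcome); // FAIL PASS PASS
```

Because the evidence body contains no clock or random input, the same script always yields the same hashes, which is what lets the harness compare artifact hashes across a kill/restart.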

Acceptance criteria

  • Harness runs in CI through npm test or npm run check without GitHub credentials and without real OpenClaw live workers.
  • A run can be killed/restarted during implementation dispatch, verification, internal review, fix loop, and handoff-ready projection without duplicate worker creation.
  • Recovery distinguishes: intent recorded before spawn, worker spawned before result recorded, result artifact written before event recorded, event recorded before snapshot update, stale lease/timeout, and conflicting/late completion.
  • Pending worker tasks are resumed, reattached, timed out, or marked human-recovery-needed by explicit deterministic rules.
  • Ambiguous state fails closed with a structured blocker/quarantine report rather than trying to repair remote/external state.
  • Recovery/status report tells the operator current stage, pending worker/task id, last durable artifact, blocker, and next safe action.
  • Harness verifies no unexpected external writes and keeps fixture data small/public-safe.
  • Documentation explains how a developer/operator can run the smoke/restart proof locally.
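
One possible shape for the deterministic recovery rules is a pure classifier over recorded evidence. The sketch below assumes evidence is recorded in the order intent, spawn, result artifact, result event, snapshot update. Every field name, state name, and state-to-action mapping here is hypothetical and would need to be settled in the recovery matrix.

```javascript
// Ordered evidence chain; all field names are hypothetical.
const CHAIN = ["intent", "spawned", "resultArtifact", "resultEvent", "snapshotUpdated"];

// Rule index = number of chain steps recorded. The mapping is an assumption.
const RULES = [
  { state: "nothing-recorded", action: "dispatch" },
  { state: "intent-before-spawn", action: "resume" },
  { state: "spawned-before-result", action: "reattach" },
  { state: "artifact-before-event", action: "resume" },
  { state: "event-before-snapshot", action: "resume" },
  { state: "complete", action: "none" },
];

function classifyRecovery(evidence) {
  const flags = CHAIN.map((key) => evidence[key] === true);
  const recorded = flags.filter(Boolean).length;
  // A valid chain is a prefix: each recorded step has all earlier steps recorded.
  const isPrefix = flags.every((flag, i) => flag === (i < recorded));
  if (!isPrefix) {
    // Fail closed: a gap in the chain is never repaired automatically.
    return { state: "ambiguous", action: "quarantine", spawnAllowed: false };
  }
  if (evidence.conflictingResult === true) {
    return {
      state: "conflicting-or-late-completion",
      action: "human-recovery-needed",
      spawnAllowed: false,
    };
  }
  if (recorded === 2 && evidence.leaseStale === true) {
    return { state: "stale-lease", action: "time-out", spawnAllowed: false };
  }
  // A new worker may be spawned only while no spawn evidence exists.
  return { ...RULES[recorded], spawnAllowed: recorded <= 1 };
}

console.log(classifyRecovery({ intent: true, spawned: true }).action); // reattach
console.log(classifyRecovery({ intent: true, resultEvent: true }).action); // quarantine
```

Keeping the classifier free of I/O means the kill/restart matrix can be written as a table of evidence inputs and expected actions, and spawnAllowed gives the harness a single flag to assert on when proving that no duplicate worker is created.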

Non-goals

  • No real GitHub PR creation in the smoke path.
  • No reliance on OpenClaw live workers.
  • No broad performance/load testing.
  • No automatic remote repair when local state is ambiguous.
  • No distributed consensus or multi-host scheduler.
  • No hiding manual recovery requirements when evidence is insufficient.
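
One way the smoke path could enforce "no real GitHub PR creation" is to route every external write through a guard that records the attempt and refuses it, so the harness can assert on the log afterwards. This is only a sketch; createExternalWriteGuard and the target string are made-up names, not part of the existing SCM handoff code.

```javascript
// Hypothetical guard: every external write goes through `write`, which
// records the attempt and then refuses it. Names are illustrative only.
function createExternalWriteGuard() {
  const attempts = [];
  return {
    write(target, payload) {
      attempts.push({ target, payload });
      throw new Error(`external write blocked in smoke path: ${target}`);
    },
    attempts() {
      return attempts.slice(); // return a copy so callers cannot edit the log
    },
  };
}

const guard = createExternalWriteGuard();
let blocked = false;
try {
  guard.write("github:create-pr", { title: "example" });
} catch (err) {
  blocked = true;
}
console.log(blocked, guard.attempts().length); // true 1
```

A passing smoke run would then end with an empty attempts log, while any recorded attempt points at the code path that tried to write.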

Planning notes / questions

  • Decide whether this lives as test/e2e-autonomous-runner.test.js, a script plus tests, or both.
  • Build fake adapters once and reuse them across production-runner tests.
  • Define the exact recovery matrix before coding and reuse existing registry quarantine patterns.
  • Decide whether worker timeouts are wall-clock based, heartbeat based, or both.
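
For the timeout question, the three options can be compared with a small pure function that takes the current time as an argument, which keeps the check deterministic under test. The task fields, the limits object, and the policy names below are assumptions made for this sketch.

```javascript
// Decide whether a pending worker task has timed out.
// `policy` is "wall-clock", "heartbeat", or "both"; all names are hypothetical.
function isTimedOut(task, limits, now, policy = "both") {
  const wallClockExpired = now - task.startedAtMs > limits.maxRunMs;
  // With no heartbeat recorded yet, measure silence from the start time.
  const lastBeat = task.lastHeartbeatMs ?? task.startedAtMs;
  const heartbeatExpired = now - lastBeat > limits.maxSilenceMs;
  if (policy === "wall-clock") return wallClockExpired;
  if (policy === "heartbeat") return heartbeatExpired;
  if (policy === "both") return wallClockExpired || heartbeatExpired;
  throw new Error(`unknown timeout policy: ${policy}`);
}

const task = { startedAtMs: 0, lastHeartbeatMs: 50_000 };
const limits = { maxRunMs: 60_000, maxSilenceMs: 5_000 };
const now = 58_000; // injected clock value, never Date.now() inside the check

console.log(isTimedOut(task, limits, now, "wall-clock")); // false
console.log(isTimedOut(task, limits, now, "heartbeat")); // true
console.log(isTimedOut(task, limits, now, "both")); // true
```

In this example the task is still inside its total run budget but has been silent for 8 seconds, so only the heartbeat-based policies flag it. That is the kind of case the recovery matrix would need to decide on.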

Suggested labels

  • enhancement
