enderzcx · Jun 15, 2026
diff --git a/docs/CORE.md b/docs/CORE.md
@@ -27,6 +27,8 @@ then returns the result to the same conversation by foreground, background, or h
 12. Async without drift: long workflows may use SDK/background/heartbeat adapters, but the coordinator contract and safety gates stay the same.
 13. Controller-first state: `cwf-start.mjs` creates preview, run-plan, state, return-envelope, final, worker-packets, and worker-results slots before any worker dispatch.
 14. Adapter honesty: native subagent, SDK, Desktop-thread, and heartbeat helpers must write `fixture`, `real-smoke`, `requires_approval`, `unavailable`, or `deferred` evidence labels instead of upgrading claims silently.
+15. Checker-owned verification: maker workers may write attempted/proposed/changed state, but `verified`, `passed`, `done`, and `regression_locked` belong to a verifier, deterministic test, replay, or human reviewer.
+16. Failure to regression: a run that repairs a recurring workflow, helper, route, connector, skill, or harness failure should preserve the failing input or trace and leave behind a regression artifact or an explicit skip reason.
 
 ## Failure Modes
 
@@ -49,6 +51,8 @@ A non-trivial run plan should name:
 - exact scope and exclusions;
 - phases and workers;
 - verifier or challenger role;
+- verified-state owner;
+- failure-to-regression receipt when applicable;
 - write scopes;
 - untrusted input route;
 - token budget and stop rule;
@@ -95,6 +99,30 @@ Run experience is part of the core contract:
 
 The proven return path is coordinator synthesis in the originating conversation. Heartbeat synthesis is allowed only after a real heartbeat reply with the expected marker is observed in the originating thread; `heartbeat-scheduled` and `heartbeat-scheduled-not-returned` are not delivery proof. Platform automatic callback is not claimed until a future Codex platform API and real smoke prove it.
 
+## Verified State
+
+CWF treats verification state as a separate ownership boundary:
+
+- maker workers can write `attempted`, `proposed`, `changed`, and `needs_review`;
+- verifier workers, deterministic tests, replay commands, external evidence, or human reviewers can write `verified`, `passed`, `done`, and `regression_locked`;
+- the coordinator may synthesize verified state only by pointing at the verifier receipt.
+
+Persistent run artifacts should avoid mixing maker narrative with checker-owned truth. If a status file or `goal_delta` will be read by a future run, write verified state only after the verifier receipt exists, and keep partial writes from looking authoritative.
+
+## Failure To Regression
+
+When a CWF run repairs a repeated failure or a harness-level issue, the repair is not complete until the failing input is replayed or preserved as a future check when feasible:
+
+```text
+failing input / trace
+  -> diagnosis
+  -> fix or mitigation
+  -> replay
+  -> regression artifact
+```
+
+Valid regression artifacts include a test, fixture, eval case, route trigger case, helper smoke, documented replay command, or sanitized error-pattern entry. If the input contains secrets, customer data, or private chat, sanitize or hash it before storing. If no safe artifact exists, record the skip reason in the run plan and closeout.
+
 ## Budget
 
 Every saved workflow should include a visible `budget` with a token cap and stop rule. Dynamic workflows can cost far more than a normal Codex turn; budget is part of the contract, not an afterthought.
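
Review note on the `## Verified State` section: the ownership boundary could be pictured as a small write guard. This is a minimal sketch, not code from this patch. `canWriteStatus` and the role names `maker`, `checker`, and `coordinator` are hypothetical; only the status lists come from the doc text. The sketch is deliberately strict and assumes a checker never writes maker-owned statuses, which the doc does not state either way.

```javascript
// Hypothetical guard illustrating the maker/checker ownership split.
// Status names come from docs/CORE.md; function and role names are assumptions.
const MAKER_OWNED = new Set(["attempted", "proposed", "changed", "needs_review"]);
const CHECKER_OWNED = new Set(["verified", "passed", "done", "regression_locked"]);

function canWriteStatus(role, status, receipt = "") {
  const hasReceipt = String(receipt).trim().length > 0;
  // Makers may only record what they tried or changed.
  if (role === "maker") return MAKER_OWNED.has(status);
  // Checkers (verifier, deterministic test, replay, human reviewer) own the verdicts.
  if (role === "checker") return CHECKER_OWNED.has(status);
  // The coordinator may restate a verdict only by pointing at a verifier receipt.
  if (role === "coordinator") return CHECKER_OWNED.has(status) && hasReceipt;
  return false;
}

console.log(canWriteStatus("maker", "changed"));                                // true
console.log(canWriteStatus("maker", "verified"));                               // false
console.log(canWriteStatus("coordinator", "verified", "npm run check passed")); // true
console.log(canWriteStatus("coordinator", "verified"));                         // false
```

The point of the receipt parameter is that coordinator synthesis never creates verified state on its own; it only forwards one that a checker already produced.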

diff --git a/docs/CWF_ASYNC_RUNTIME.md b/docs/CWF_ASYNC_RUNTIME.md
@@ -62,6 +62,9 @@ Async runs should record these fields in `.cwf/runs/RUN_ID/return-envelope.json`
 - `heartbeat_status`: `not_requested`, `fixture`, `scheduled`, `scheduled-not-returned`, `delivered`, `failed`, or `unavailable`;
 - `sdk_thread_ids`: SDK worker ids when known;
 - `desktop_thread_ids`: visible Desktop worker thread ids when created;
+- `closeout_gate`: whether completed status can stand or must be downgraded pending checker-owned verification or regression lock;
+- `verified_state`: maker-owned versus checker-owned state and the verification receipt;
+- `failure_to_regression`: recurring-failure receipt, including regression artifact or skip reason when required;
 - `final_summary_path`;
 - `evidence_path`;
 - `deferred_items`.
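
Review note: for readers of this doc it may help to see what the three new fields look like together. The fragment below is illustrative only. The field shapes follow the fixture in `scripts/check-core.mjs` and the `{ status, issues }` return value of `evaluateCloseoutGate` from this same PR; the variable name `envelopeFragment` is an assumption and the values are example data.

```javascript
// Illustrative return-envelope fragment; values are example data, not real run output.
const envelopeFragment = {
  closeout_gate: { status: "pass", issues: [] },
  verified_state: {
    maker_owned: ["changed"],
    checker_owned: ["npm run check"],
    verification_receipt: "npm run check passed",
    status: "verified",
  },
  failure_to_regression: {
    required: false,
    regression_artifact: "",
    verified_by: "",
    skip_reason: "",
  },
};

console.log(JSON.stringify(envelopeFragment, null, 2));
```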

diff --git a/docs/CWF_RELEASE_READINESS.md b/docs/CWF_RELEASE_READINESS.md
@@ -31,14 +31,14 @@ This checklist tracks public-release readiness evidence. It is not an npm publis
 
 | Phase | Current implementation evidence |
 |---|---|
-| E1 Return envelope | `scripts/cwf-return-envelope.mjs`; `cwf-run-state init/update` writes `.cwf/runs/RUN_ID/return-envelope.json`; `npm run check` validates required fields and deferred platform callback status. |
+| E1 Return envelope | `scripts/cwf-return-envelope.mjs`; `cwf-run-state init/update` writes `.cwf/runs/RUN_ID/return-envelope.json`; `npm run check` validates required fields, closeout gate downgrade, checker-owned verified state, regression lock fields, and deferred platform callback status. |
 | Full native runtime v1 real smoke | `scripts/cwf-start.mjs` initializes controller artifacts; `scripts/cwf-worker-sdk.mjs` now calls `@openai/codex-sdk` for real marker runs; host-native `spawn_agent` explorers returned to the coordinator. Checked-in evidence: [docs/evidence/CWF_FULL_NATIVE_RUNTIME_REAL_SMOKE_20260609.md](evidence/CWF_FULL_NATIVE_RUNTIME_REAL_SMOKE_20260609.md). Fixture evidence remains in [docs/evidence/CWF_FULL_NATIVE_RUNTIME_FIXTURES_20260608.md](evidence/CWF_FULL_NATIVE_RUNTIME_FIXTURES_20260608.md). |
 | E2 Desktop-thread preflight | `desktop-thread-stdio-observed`: the failed probe used the wrong path (`codex app-server proxy` against the remote-control socket). The correct path is a fresh `codex app-server --listen stdio://` JSONL session. Historical evidence recorded thread `019ea726-a070-73f2-b182-602b905cd9ec` and marker `CWF_LEFT_THREAD_TURN_OK_20260608`. Latest checked-in local dynamic smoke evidence is [docs/evidence/CWF_REAL_DYNAMIC_SMOKE_20260608.md](evidence/CWF_REAL_DYNAMIC_SMOKE_20260608.md). This proves Desktop-thread creation/execution/readback locally, not platform automatic callback. |
 | E3 Resume/checkpoint | `scripts/cwf-run-state.mjs` resumes only from the last contiguous completed phase boundary; `npm run check` covers completed, blocked, failed, skipped, missing, and partial fixtures. |
 | E4 Safe write | `scripts/cwf-safe-write.mjs` evaluates approval gate, changed paths, forbidden/out-of-scope paths, apply-check result, verification status, changed files, and rollback command. A disposable `/tmp` git-repo real-smoke passed after approval with `git apply --check`, apply, verification, changed files, and rollback evidence. |
 | E5 Dynamic generation | `scripts/cwf-generate-workflow.mjs` generates bounded data-only repo-audit and safe-fix-loop workflows and rejects unsafe generated content tokens. |
 | E6 Catalog/user workflows | `scripts/cwf-catalog.mjs` contains built-in catalog metadata and project-local `.cwf/workflows/*.workflow.js` discovery with fail-closed validation. |
-| E7 Verifier gates | `scripts/cwf-safe-write.mjs` implements `pass`, `blocked`, `needs-waiver`, and `advisory`; `blocked` and unwaived findings prevent final pass. |
+| E7 Verifier gates | `scripts/cwf-safe-write.mjs` implements `pass`, `blocked`, `needs-waiver`, and `advisory`; `scripts/cwf-return-envelope.mjs` prevents completed status unless checker-owned closeout state passes; `blocked` and unwaived findings prevent final pass. |
 | E8 Budget/cost | Preview helpers fail closed without `budget.max_tokens` or `budget.stop_when`, warn before workers run when `max_tokens > 50000`, and label local token accounting as `estimated`. `npm run check` covers expensive-run warning and unbounded-refusal fixtures. |
 | E9 Human status UX | `scripts/cwf-run-state.mjs status` includes conclusion, phase, worker counts, blocker, evidence, next action, final destination, return mode, and verifier status. Final summaries start with a Chinese conclusion. |
 | E10 Public readiness | This file plus README/docs/skill synchronization, package dry-run, old-runtime absence, and final review. |

diff --git a/docs/RUN_EXPERIENCE.md b/docs/RUN_EXPERIENCE.md
@@ -129,7 +129,7 @@ node scripts/cwf-run-state.mjs status --run-id demo
 node scripts/cwf-run-state.mjs resume-plan --run-id demo
 ```
 
-The return envelope records `final_destination`, `return_mode`, `final_summary_path`, `evidence_path`, `verifier_status`, deferred items, and completion status. `return_mode=coordinator_synthesis` is the proven default. Platform automatic callback remains deferred until a real platform smoke proves it.
+The return envelope records `final_destination`, `return_mode`, `final_summary_path`, `evidence_path`, `verifier_status`, `closeout_gate`, `verified_state`, `failure_to_regression`, deferred items, and completion status. A completed run is downgraded to a pending closeout status when checker-owned verified state is missing, or when a required failure-to-regression receipt has neither a regression artifact nor a skip reason. `return_mode=coordinator_synthesis` is the proven default. Platform automatic callback remains deferred until a real platform smoke proves it.
 
 For async runs, also record `runtime_mode`, adapter status, `sdk_thread_ids`, and `desktop_thread_ids` when known. `return_mode=heartbeat_synthesis` means the background run completed, a follow-up in the originating conversation read the local result, and the coordinator observed the expected marker reply before recording delivery. It is not the same as platform automatic callback.
 

diff --git a/scripts/check-core.mjs b/scripts/check-core.mjs
@@ -147,6 +147,8 @@ mustContain(skill, "sunny_skill_type: library");
 mustContain(skill, "Agent-readable Skill Registry");
 mustContain(skill, "Goal Anchor");
 mustContain(skill, "goal_delta");
+mustContain(skill, "checker-owned");
+mustContain(skill, "failure-to-regression");
 mustContain(skill, "Output Contract");
 mustContain(skill, "references/routing.md");
 mustContain(skillRouting, "goal-writer");
@@ -157,6 +159,8 @@ mustContain(skillRunPlanTemplate, "## Objective");
 mustContain(skillRunPlanTemplate, "## Goal Anchor");
 mustContain(skillRunPlanTemplate, "## Goal Delta");
 mustContain(skillRunPlanTemplate, "## Resume Checkpoint");
+mustContain(skillRunPlanTemplate, "## Verified State Ownership");
+mustContain(skillRunPlanTemplate, "## Failure To Regression");
 for (const key of ["should_trigger", "should_not_trigger", "near_neighbors"]) {
   if (!Array.isArray(skillTriggerCases[key]) || skillTriggerCases[key].length === 0) {
     throw new Error(`skills/codex-workflows/evals/trigger_cases.json missing ${key}`);
@@ -589,9 +593,13 @@ async function checkRunPlanRules() {
     "## Budget",
     "## Stop Rules",
     "## Evidence",
+    "## Verified State Ownership",
+    "## Failure To Regression",
     "## Resume Checkpoint",
     "## Goal Delta",
     "goal_delta:",
+    "verified_by:",
+    "regression_added:",
     ".cwf/runs/check/",
   ]) {
     mustContain(markdown, needle);
@@ -718,6 +726,18 @@ function checkReturnEnvelopeRules() {
     status: "completed",
     updated_at: "2026-06-08T00:00:00.000Z",
     verifier_evaluations: [{ status: "advisory", summary: "follow-up optional" }],
+    verified_state: {
+      maker_owned: ["changed"],
+      checker_owned: ["npm run check"],
+      verification_receipt: "npm run check passed",
+      status: "verified",
+    },
+    failure_to_regression: {
+      required: false,
+      regression_artifact: "",
+      verified_by: "",
+      skip_reason: "",
+    },
     deferred_items: [{ id: "desktop-thread-execution-preflight", status: "requires_approval" }],
   };
   const envelope = buildReturnEnvelope(state);
@@ -735,6 +755,9 @@ function checkReturnEnvelopeRules() {
     "sdk_thread_ids",
     "desktop_thread_ids",
     "verifier_status",
+    "closeout_gate",
+    "verified_state",
+    "failure_to_regression",
     "deferred_items",
     "completion_status",
   ]) {
@@ -752,6 +775,9 @@ function checkReturnEnvelopeRules() {
   if (!envelope.deferred_items.some((item) => item.status === "requires_approval")) {
     throw new Error("return envelope must preserve deferred approval items");
   }
+  if (envelope.completion_status !== "completed" || envelope.closeout_gate.status !== "pass") {
+    throw new Error("return envelope must require closeout gate pass before completed status");
+  }
 
   const idsEnvelope = buildReturnEnvelope({
     ...state,
@@ -768,6 +794,37 @@ function checkReturnEnvelopeRules() {
   if (heartbeatEnvelope.return_mode !== "heartbeat_synthesis") {
     throw new Error("return envelope must preserve state return_mode when no override is provided");
   }
+
+  const missingVerifiedEnvelope = buildReturnEnvelope({
+    ...state,
+    verified_state: {
+      maker_owned: ["changed"],
+      checker_owned: [],
+      verification_receipt: "",
+      status: "pending",
+    },
+  });
+  if (missingVerifiedEnvelope.completion_status !== "pending-verified-state") {
+    throw new Error("return envelope must not complete without checker-owned verified state");
+  }
+
+  const missingRegressionEnvelope = buildReturnEnvelope({
+    ...state,
+    failure_to_regression: {
+      required: true,
+      failing_input_or_trace: "sanitized trace id fixture",
+      diagnosis: "fixture recurring helper failure",
+      fix_or_mitigation: "fixture patch",
+      replay_command_or_fixture: "npm run check",
+      regression_artifact: "",
+      verified_by: "npm run check",
+      sensitive_data_handling: "sanitized",
+      skip_reason: "",
+    },
+  });
+  if (missingRegressionEnvelope.completion_status !== "pending-regression-lock") {
+    throw new Error("return envelope must not complete required regression loop without artifact or skip reason");
+  }
 }
 
 function checkDynamicGenerationRules() {
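
Review note: the new fixtures cover each closeout issue in isolation. One case that appears uncovered is a run that misses both, which `evaluateCloseoutGate` reports as `pending-closeout-gate`, and another is a required regression loop closed by a skip reason. The sketch below uses a condensed stand-in so it runs on its own. `closeoutStatus` is a hypothetical name, and the stand-in assumes the run is already `completed` with a passing verifier gate.

```javascript
// Condensed stand-in for evaluateCloseoutGate (hypothetical name closeoutStatus).
// Assumes state.status === "completed" and verifier.final_pass === true.
function closeoutStatus(state) {
  const issues = [];
  const verified = state.verified_state ?? {};
  const checkerOwned = Array.isArray(verified.checker_owned)
    ? verified.checker_owned.filter(Boolean)
    : [];
  const receipt = String(verified.verification_receipt ?? "").trim();
  const status = String(verified.status ?? "pending");
  if (status === "pending" || status === "needs_review" || (checkerOwned.length === 0 && !receipt)) {
    issues.push("pending-verified-state");
  }

  const regression = state.failure_to_regression ?? {};
  const hasArtifact = String(regression.regression_artifact ?? "").trim() !== "";
  const hasSkipReason = String(regression.skip_reason ?? "").trim() !== "";
  if (regression.required === true && !hasArtifact && !hasSkipReason) {
    issues.push("pending-regression-lock");
  }

  if (issues.length === 0) return "pass";
  return issues.length === 1 ? issues[0] : "pending-closeout-gate";
}

const verifiedOk = {
  maker_owned: ["changed"],
  checker_owned: ["npm run check"],
  verification_receipt: "npm run check passed",
  status: "verified",
};

// Both the verified state and the regression artifact are missing.
const bothMissing = closeoutStatus({
  verified_state: { maker_owned: ["changed"], checker_owned: [], verification_receipt: "", status: "pending" },
  failure_to_regression: { required: true, regression_artifact: "", skip_reason: "" },
});

// A required regression loop closed by an explicit skip reason.
const skipped = closeoutStatus({
  verified_state: verifiedOk,
  failure_to_regression: { required: true, regression_artifact: "", skip_reason: "input contains customer data" },
});

console.log(bothMissing); // pending-closeout-gate
console.log(skipped);     // pass
```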

diff --git a/scripts/cwf-return-envelope.mjs b/scripts/cwf-return-envelope.mjs
@@ -8,7 +8,8 @@ import { parseArgs, printHelp, readJsonFile, wantsHelp } from "./lib/cli.mjs";
 export function buildReturnEnvelope(state, options = {}) {
   const runDir = options.runDir ?? `.cwf/runs/${state.run_id}`;
   const verifier = evaluateVerifierGate(state.verifier_evaluations ?? []);
-  const completionStatus = deriveCompletionStatus(state, verifier);
+  const closeoutGate = evaluateCloseoutGate(state, verifier);
+  const completionStatus = deriveCompletionStatus(state, verifier, closeoutGate);
   const deferredItems = [
     ...(state.deferred_items ?? []),
     ...(options.deferredItems ?? []),
@@ -39,13 +40,66 @@ export function buildReturnEnvelope(state, options = {}) {
     desktop_thread_ids: collectWorkerIds(state, "desktop_thread_id"),
     verifier_status: verifier.status,
     verifier: verifier,
+    closeout_gate: closeoutGate,
+    verified_state: state.verified_state ?? {
+      maker_owned: [],
+      checker_owned: [],
+      verification_receipt: "",
+      status: "pending",
+    },
+    failure_to_regression: state.failure_to_regression ?? {
+      required: false,
+      regression_artifact: "",
+      verified_by: "",
+      skip_reason: "",
+    },
     deferred_items: deferredItems,
     completion_status: completionStatus,
     run_status: state.status ?? "planned",
     updated_at: state.updated_at ?? new Date().toISOString(),
   };
 }
 
+export function evaluateCloseoutGate(state, verifier = evaluateVerifierGate(state.verifier_evaluations ?? [])) {
+  if (state.status !== "completed" || !verifier.final_pass) {
+    return { status: "not_applicable", issues: [] };
+  }
+
+  const issues = [];
+  const verifiedState = state.verified_state ?? {};
+  const checkerOwned = Array.isArray(verifiedState.checker_owned) ? verifiedState.checker_owned.filter(Boolean) : [];
+  const verificationReceipt = String(verifiedState.verification_receipt ?? "").trim();
+  const verifiedStatus = String(verifiedState.status ?? "pending");
+  if (
+    verifiedStatus === "pending" ||
+    verifiedStatus === "needs_review" ||
+    (checkerOwned.length === 0 && !verificationReceipt)
+  ) {
+    issues.push({
+      id: "verified-state-missing",
+      status: "pending-verified-state",
+      reason: "Completed runs need a non-pending verified status backed by checker-owned state or a verification receipt before they can claim done.",
+    });
+  }
+
+  const regression = state.failure_to_regression ?? {};
+  if (
+    regression.required === true &&
+    !String(regression.regression_artifact ?? "").trim() &&
+    !String(regression.skip_reason ?? "").trim()
+  ) {
+    issues.push({
+      id: "regression-lock-missing",
+      status: "pending-regression-lock",
+      reason: "Recurring-failure repairs need a regression artifact or an explicit skip reason before closeout.",
+    });
+  }
+
+  if (issues.length === 0) return { status: "pass", issues: [] };
+  if (issues.length === 1) return { status: issues[0].status, issues };
+  return { status: "pending-closeout-gate", issues };
+}
+
 function collectWorkerIds(state, key) {
   return [...new Set((state.workers ?? []).map((worker) => worker[key]).filter(Boolean))];
 }
@@ -58,9 +112,12 @@ export async function writeReturnEnvelope(runDir, state, options = {}) {
   return { path: outputPath, envelope };
 }
 
-function deriveCompletionStatus(state, verifier) {
-  if (state.status === "completed" && verifier.final_pass) return "completed";
+function deriveCompletionStatus(state, verifier, closeoutGate = evaluateCloseoutGate(state, verifier)) {
+  if (state.status === "completed" && verifier.final_pass && closeoutGate.status === "pass") return "completed";
   if (state.status === "completed" && verifier.status === "pending") return "pending-verification";
+  if (state.status === "completed" && closeoutGate.status !== "not_applicable" && closeoutGate.status !== "pass") {
+    return closeoutGate.status;
+  }
   if (verifier.status === "blocked") return "blocked";
   if (verifier.status === "needs-waiver") return "needs-waiver";
   if (state.status === "cancelled") return "cancelled";
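
Review note: the branch order in `deriveCompletionStatus` matters, so a small decision table may help reviewers. The sketch mirrors only the branches visible in this hunk. The real function continues past the diff context, so the final `"unknown"` fallback is a placeholder assumption, and `completionStatusSketch` is a hypothetical name. The verifier objects are illustrative inputs.

```javascript
// Mirrors only the branches visible in the hunk; the fallback is an assumption.
function completionStatusSketch(state, verifier, gateStatus) {
  if (state.status === "completed" && verifier.final_pass && gateStatus === "pass") return "completed";
  if (state.status === "completed" && verifier.status === "pending") return "pending-verification";
  if (state.status === "completed" && gateStatus !== "not_applicable" && gateStatus !== "pass") {
    return gateStatus;
  }
  if (verifier.status === "blocked") return "blocked";
  if (verifier.status === "needs-waiver") return "needs-waiver";
  if (state.status === "cancelled") return "cancelled";
  return "unknown"; // placeholder for branches not shown in the diff
}

// [state, verifier, closeout gate status]
const cases = [
  [{ status: "completed" }, { status: "pass", final_pass: true }, "pass"],
  [{ status: "completed" }, { status: "pending", final_pass: false }, "not_applicable"],
  [{ status: "completed" }, { status: "pass", final_pass: true }, "pending-regression-lock"],
  [{ status: "completed" }, { status: "blocked", final_pass: false }, "not_applicable"],
];

const results = cases.map(([state, verifier, gate]) => completionStatusSketch(state, verifier, gate));
console.log(results);
// [ 'completed', 'pending-verification', 'pending-regression-lock', 'blocked' ]
```

Because `evaluateCloseoutGate` returns `not_applicable` unless the verifier gate passes, a blocked or pending verifier is always reported ahead of any closeout downgrade.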

diff --git a/scripts/cwf-run-plan.mjs b/scripts/cwf-run-plan.mjs
@@ -130,6 +130,28 @@ export function renderRunPlanMarkdown(plan) {
   lines.push("- Record final synthesis in the originating Codex conversation.");
   lines.push("- Label evidence as local, fixture, dry-run, real-smoke, requires_approval, or blocked.");
 
+  lines.push("", "## Verified State Ownership");
+  lines.push("- Maker-owned fields: attempted / proposed / changed / needs_review");
+  lines.push("- Checker-owned fields: verified / passed / done / regression_locked");
+  if (verifierAgents.length > 0) {
+    lines.push(`- Checker: ${verifierAgents.map((agent) => agent.id).join(", ")}`);
+  } else {
+    lines.push("- Checker: coordinator-held deterministic test, replay command, external evidence, or human reviewer");
+  }
+  lines.push("- Verification receipt: required before any verified/passed/done claim");
+  lines.push(`- Atomic status artifact: ${runId ? `.cwf/runs/${runId}/state.json` : ".cwf/runs/RUN_ID/state.json"}`);
+  lines.push("- Rule: implementer/maker workers must not write verified state directly.");
+
+  lines.push("", "## Failure To Regression");
+  lines.push("- Failing input or trace: N/A unless this run repairs a recurring workflow, helper, route, connector, skill, or harness failure.");
+  lines.push("- Diagnosis:");
+  lines.push("- Fix or mitigation:");
+  lines.push("- Replay command or fixture:");
+  lines.push("- Regression artifact:");
+  lines.push("- Verified by:");
+  lines.push("- Sensitive data handling:");
+  lines.push("- Skip reason:");
+
   lines.push("", "## Resume Checkpoint");
   lines.push(`- ${resumeCheckpoint}`);
 
@@ -139,6 +161,8 @@ export function renderRunPlanMarkdown(plan) {
   lines.push(`  run_id: ${runId || ""}`);
   lines.push("  completed:");
   lines.push("  evidence_added:");
+  lines.push("  verified_by:");
+  lines.push("  regression_added:");
   lines.push("  blockers:");
   lines.push("  next_slice:");
   lines.push("  next_cwf_run:");
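
Review note: to preview what the new `## Verified State Ownership` section renders to, here is a standalone extraction. `renderVerifiedStateSection` is a hypothetical helper written for this comment; in the patch the same strings are pushed inline inside `renderRunPlanMarkdown`. The agent id `verifier-1` and run id `demo` are example values.

```javascript
// Hypothetical standalone extraction of the new run-plan section.
function renderVerifiedStateSection(verifierAgents = [], runId = "") {
  // Fall back to a coordinator-held check when the plan names no verifier agent.
  const checker = verifierAgents.length > 0
    ? verifierAgents.map((agent) => agent.id).join(", ")
    : "coordinator-held deterministic test, replay command, external evidence, or human reviewer";
  return [
    "## Verified State Ownership",
    "- Maker-owned fields: attempted / proposed / changed / needs_review",
    "- Checker-owned fields: verified / passed / done / regression_locked",
    `- Checker: ${checker}`,
    "- Verification receipt: required before any verified/passed/done claim",
    `- Atomic status artifact: .cwf/runs/${runId || "RUN_ID"}/state.json`,
    "- Rule: implementer/maker workers must not write verified state directly.",
  ];
}

const withVerifier = renderVerifiedStateSection([{ id: "verifier-1" }], "demo");
const withoutVerifier = renderVerifiedStateSection();
console.log(withVerifier.join("\n"));
```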