diff --git a/docs/CORE.md b/docs/CORE.md index 03356d3..be370e8 100644 --- a/docs/CORE.md +++ b/docs/CORE.md @@ -27,6 +27,8 @@ then returns the result to the same conversation by foreground, background, or h 12. Async without drift: long workflows may use SDK/background/heartbeat adapters, but the coordinator contract and safety gates stay the same. 13. Controller-first state: `cwf-start.mjs` creates preview, run-plan, state, return-envelope, final, worker-packets, and worker-results slots before any worker dispatch. 14. Adapter honesty: native subagent, SDK, Desktop-thread, and heartbeat helpers must write `fixture`, `real-smoke`, `requires_approval`, `unavailable`, or `deferred` evidence labels instead of upgrading claims silently. +15. Checker-owned verification: maker workers may write attempted/proposed/changed state, but `verified`, `passed`, `done`, and `regression_locked` belong to a verifier, deterministic test, replay, or human reviewer. +16. Failure to regression: recurring workflow, helper, route, connector, skill, or harness failures should preserve the failing input or trace and leave behind a regression artifact or explicit skip reason. ## Failure Modes @@ -49,6 +51,8 @@ A non-trivial run plan should name: - exact scope and exclusions; - phases and workers; - verifier or challenger role; +- verified-state owner; +- failure-to-regression receipt when applicable; - write scopes; - untrusted input route; - token budget and stop rule; @@ -95,6 +99,30 @@ Run experience is part of the core contract: The proven return path is coordinator synthesis in the originating conversation. Heartbeat synthesis is allowed only after a real heartbeat reply with the expected marker is observed in the originating thread; `heartbeat-scheduled` and `heartbeat-scheduled-not-returned` are not delivery proof. Platform automatic callback is not claimed until a future Codex platform API and real smoke prove it. 
+## Verified State
+
+CWF treats verification state as a separate ownership boundary:
+
+- maker workers can write `attempted`, `proposed`, `changed`, and `needs_review`;
+- verifier workers, deterministic tests, replay commands, external evidence, or human reviewers can write `verified`, `passed`, `done`, and `regression_locked`;
+- the coordinator may synthesize verified state only by pointing at the verifier receipt.
+
+Persistent run artifacts should avoid mixing maker narrative with checker-owned truth. If a status file or `goal_delta` will be read by a future run, write verified state after the verifier receipt exists and keep partial writes from looking authoritative.
+
+## Failure To Regression
+
+When a CWF run repairs a repeated failure or a harness-level issue, the repair is not complete until the failing input is replayed or preserved as a future check when feasible:
+
+```text
+failing input / trace
+ -> diagnosis
+ -> fix or mitigation
+ -> replay
+ -> regression artifact
+```
+
+Valid regression artifacts include a test, fixture, eval case, route trigger case, helper smoke, documented replay command, or sanitized error-pattern entry. If the input contains secrets, customer data, or private chat, sanitize or hash it before storing. If no safe artifact exists, record the skip reason in the run plan and closeout.
+
 ## Budget
 
 Every saved workflow should include a visible `budget` with a token cap and stop rule. Dynamic workflows can cost far more than a normal Codex turn; budget is part of the contract, not an afterthought.
diff --git a/docs/CWF_ASYNC_RUNTIME.md b/docs/CWF_ASYNC_RUNTIME.md index 8afd241..6208138 100644 --- a/docs/CWF_ASYNC_RUNTIME.md +++ b/docs/CWF_ASYNC_RUNTIME.md @@ -62,6 +62,9 @@ Async runs should record these fields in `.cwf/runs/RUN_ID/return-envelope.json` - `heartbeat_status`: `not_requested`, `fixture`, `scheduled`, `scheduled-not-returned`, `delivered`, `failed`, or `unavailable`; - `sdk_thread_ids`: SDK worker ids when known; - `desktop_thread_ids`: visible Desktop worker thread ids when created; +- `closeout_gate`: whether completed status can stand or must be downgraded pending checker-owned verification or regression lock; +- `verified_state`: maker-owned versus checker-owned state and the verification receipt; +- `failure_to_regression`: recurring-failure receipt, including regression artifact or skip reason when required; - `final_summary_path`; - `evidence_path`; - `deferred_items`. diff --git a/docs/CWF_RELEASE_READINESS.md b/docs/CWF_RELEASE_READINESS.md index 693559a..5fbfdd6 100644 --- a/docs/CWF_RELEASE_READINESS.md +++ b/docs/CWF_RELEASE_READINESS.md @@ -31,14 +31,14 @@ This checklist tracks public-release readiness evidence. It is not an npm publis | Phase | Current implementation evidence | |---|---| -| E1 Return envelope | `scripts/cwf-return-envelope.mjs`; `cwf-run-state init/update` writes `.cwf/runs/RUN_ID/return-envelope.json`; `npm run check` validates required fields and deferred platform callback status. | +| E1 Return envelope | `scripts/cwf-return-envelope.mjs`; `cwf-run-state init/update` writes `.cwf/runs/RUN_ID/return-envelope.json`; `npm run check` validates required fields, closeout gate downgrade, checker-owned verified state, regression lock fields, and deferred platform callback status. | | Full native runtime v1 real smoke | `scripts/cwf-start.mjs` initializes controller artifacts; `scripts/cwf-worker-sdk.mjs` now calls `@openai/codex-sdk` for real marker runs; host-native `spawn_agent` explorers returned to the coordinator. 
Checked-in evidence: [docs/evidence/CWF_FULL_NATIVE_RUNTIME_REAL_SMOKE_20260609.md](evidence/CWF_FULL_NATIVE_RUNTIME_REAL_SMOKE_20260609.md). Fixture evidence remains in [docs/evidence/CWF_FULL_NATIVE_RUNTIME_FIXTURES_20260608.md](evidence/CWF_FULL_NATIVE_RUNTIME_FIXTURES_20260608.md). | | E2 Desktop-thread preflight | `desktop-thread-stdio-observed`: the failed probe used the wrong path (`codex app-server proxy` against the remote-control socket). The correct path is a fresh `codex app-server --listen stdio://` JSONL session. Historical evidence recorded thread `019ea726-a070-73f2-b182-602b905cd9ec` and marker `CWF_LEFT_THREAD_TURN_OK_20260608`. Latest checked-in local dynamic smoke evidence is [docs/evidence/CWF_REAL_DYNAMIC_SMOKE_20260608.md](evidence/CWF_REAL_DYNAMIC_SMOKE_20260608.md). This proves Desktop-thread creation/execution/readback locally, not platform automatic callback. | | E3 Resume/checkpoint | `scripts/cwf-run-state.mjs` resumes only from the last contiguous completed phase boundary; `npm run check` covers completed, blocked, failed, skipped, missing, and partial fixtures. | | E4 Safe write | `scripts/cwf-safe-write.mjs` evaluates approval gate, changed paths, forbidden/out-of-scope paths, apply-check result, verification status, changed files, and rollback command. A disposable `/tmp` git-repo real-smoke passed after approval with `git apply --check`, apply, verification, changed files, and rollback evidence. | | E5 Dynamic generation | `scripts/cwf-generate-workflow.mjs` generates bounded data-only repo-audit and safe-fix-loop workflows and rejects unsafe generated content tokens. | | E6 Catalog/user workflows | `scripts/cwf-catalog.mjs` contains built-in catalog metadata and project-local `.cwf/workflows/*.workflow.js` discovery with fail-closed validation. | -| E7 Verifier gates | `scripts/cwf-safe-write.mjs` implements `pass`, `blocked`, `needs-waiver`, and `advisory`; `blocked` and unwaived findings prevent final pass. 
| +| E7 Verifier gates | `scripts/cwf-safe-write.mjs` implements `pass`, `blocked`, `needs-waiver`, and `advisory`; `scripts/cwf-return-envelope.mjs` prevents completed status unless checker-owned closeout state passes; `blocked` and unwaived findings prevent final pass. | | E8 Budget/cost | Preview helpers fail closed without `budget.max_tokens` or `budget.stop_when`, warn before workers run when `max_tokens > 50000`, and label local token accounting as `estimated`. `npm run check` covers expensive-run warning and unbounded-refusal fixtures. | | E9 Human status UX | `scripts/cwf-run-state.mjs status` includes conclusion, phase, worker counts, blocker, evidence, next action, final destination, return mode, and verifier status. Final summaries start with a Chinese conclusion. | | E10 Public readiness | This file plus README/docs/skill synchronization, package dry-run, old-runtime absence, and final review. | diff --git a/docs/RUN_EXPERIENCE.md b/docs/RUN_EXPERIENCE.md index 0a4bf4d..745fbbd 100644 --- a/docs/RUN_EXPERIENCE.md +++ b/docs/RUN_EXPERIENCE.md @@ -129,7 +129,7 @@ node scripts/cwf-run-state.mjs status --run-id demo node scripts/cwf-run-state.mjs resume-plan --run-id demo ``` -The return envelope records `final_destination`, `return_mode`, `final_summary_path`, `evidence_path`, `verifier_status`, deferred items, and completion status. `return_mode=coordinator_synthesis` is the proven default. Platform automatic callback remains deferred until a real platform smoke proves it. +The return envelope records `final_destination`, `return_mode`, `final_summary_path`, `evidence_path`, `verifier_status`, `closeout_gate`, `verified_state`, `failure_to_regression`, deferred items, and completion status. A completed run is downgraded to pending closeout when checker-owned verified state is missing or a required regression artifact has neither artifact nor skip reason. `return_mode=coordinator_synthesis` is the proven default. 
Platform automatic callback remains deferred until a real platform smoke proves it. For async runs, also record `runtime_mode`, adapter status, `sdk_thread_ids`, and `desktop_thread_ids` when known. `return_mode=heartbeat_synthesis` means the background run completed, a follow-up in the originating conversation read the local result, and the coordinator observed the expected marker reply before recording delivery. It is not the same as platform automatic callback. diff --git a/scripts/check-core.mjs b/scripts/check-core.mjs index f84dd68..51bec54 100644 --- a/scripts/check-core.mjs +++ b/scripts/check-core.mjs @@ -147,6 +147,8 @@ mustContain(skill, "sunny_skill_type: library"); mustContain(skill, "Agent-readable Skill Registry"); mustContain(skill, "Goal Anchor"); mustContain(skill, "goal_delta"); +mustContain(skill, "checker-owned"); +mustContain(skill, "failure-to-regression"); mustContain(skill, "Output Contract"); mustContain(skill, "references/routing.md"); mustContain(skillRouting, "goal-writer"); @@ -157,6 +159,8 @@ mustContain(skillRunPlanTemplate, "## Objective"); mustContain(skillRunPlanTemplate, "## Goal Anchor"); mustContain(skillRunPlanTemplate, "## Goal Delta"); mustContain(skillRunPlanTemplate, "## Resume Checkpoint"); +mustContain(skillRunPlanTemplate, "## Verified State Ownership"); +mustContain(skillRunPlanTemplate, "## Failure To Regression"); for (const key of ["should_trigger", "should_not_trigger", "near_neighbors"]) { if (!Array.isArray(skillTriggerCases[key]) || skillTriggerCases[key].length === 0) { throw new Error(`skills/codex-workflows/evals/trigger_cases.json missing ${key}`); @@ -589,9 +593,13 @@ async function checkRunPlanRules() { "## Budget", "## Stop Rules", "## Evidence", + "## Verified State Ownership", + "## Failure To Regression", "## Resume Checkpoint", "## Goal Delta", "goal_delta:", + "verified_by:", + "regression_added:", ".cwf/runs/check/", ]) { mustContain(markdown, needle); @@ -718,6 +726,18 @@ function 
checkReturnEnvelopeRules() { status: "completed", updated_at: "2026-06-08T00:00:00.000Z", verifier_evaluations: [{ status: "advisory", summary: "follow-up optional" }], + verified_state: { + maker_owned: ["changed"], + checker_owned: ["npm run check"], + verification_receipt: "npm run check passed", + status: "verified", + }, + failure_to_regression: { + required: false, + regression_artifact: "", + verified_by: "", + skip_reason: "", + }, deferred_items: [{ id: "desktop-thread-execution-preflight", status: "requires_approval" }], }; const envelope = buildReturnEnvelope(state); @@ -735,6 +755,9 @@ function checkReturnEnvelopeRules() { "sdk_thread_ids", "desktop_thread_ids", "verifier_status", + "closeout_gate", + "verified_state", + "failure_to_regression", "deferred_items", "completion_status", ]) { @@ -752,6 +775,9 @@ function checkReturnEnvelopeRules() { if (!envelope.deferred_items.some((item) => item.status === "requires_approval")) { throw new Error("return envelope must preserve deferred approval items"); } + if (envelope.completion_status !== "completed" || envelope.closeout_gate.status !== "pass") { + throw new Error("return envelope must require closeout gate pass before completed status"); + } const idsEnvelope = buildReturnEnvelope({ ...state, @@ -768,6 +794,37 @@ function checkReturnEnvelopeRules() { if (heartbeatEnvelope.return_mode !== "heartbeat_synthesis") { throw new Error("return envelope must preserve state return_mode when no override is provided"); } + + const missingVerifiedEnvelope = buildReturnEnvelope({ + ...state, + verified_state: { + maker_owned: ["changed"], + checker_owned: [], + verification_receipt: "", + status: "pending", + }, + }); + if (missingVerifiedEnvelope.completion_status !== "pending-verified-state") { + throw new Error("return envelope must not complete without checker-owned verified state"); + } + + const missingRegressionEnvelope = buildReturnEnvelope({ + ...state, + failure_to_regression: { + required: true, + 
failing_input_or_trace: "sanitized trace id fixture", + diagnosis: "fixture recurring helper failure", + fix_or_mitigation: "fixture patch", + replay_command_or_fixture: "npm run check", + regression_artifact: "", + verified_by: "npm run check", + sensitive_data_handling: "sanitized", + skip_reason: "", + }, + }); + if (missingRegressionEnvelope.completion_status !== "pending-regression-lock") { + throw new Error("return envelope must not complete required regression loop without artifact or skip reason"); + } } function checkDynamicGenerationRules() { diff --git a/scripts/cwf-return-envelope.mjs b/scripts/cwf-return-envelope.mjs index 4ecf389..c45a892 100644 --- a/scripts/cwf-return-envelope.mjs +++ b/scripts/cwf-return-envelope.mjs @@ -8,7 +8,8 @@ import { parseArgs, printHelp, readJsonFile, wantsHelp } from "./lib/cli.mjs"; export function buildReturnEnvelope(state, options = {}) { const runDir = options.runDir ?? `.cwf/runs/${state.run_id}`; const verifier = evaluateVerifierGate(state.verifier_evaluations ?? []); - const completionStatus = deriveCompletionStatus(state, verifier); + const closeoutGate = evaluateCloseoutGate(state, verifier); + const completionStatus = deriveCompletionStatus(state, verifier, closeoutGate); const deferredItems = [ ...(state.deferred_items ?? []), ...(options.deferredItems ?? []), @@ -39,6 +40,19 @@ export function buildReturnEnvelope(state, options = {}) { desktop_thread_ids: collectWorkerIds(state, "desktop_thread_id"), verifier_status: verifier.status, verifier: verifier, + closeout_gate: closeoutGate, + verified_state: state.verified_state ?? { + maker_owned: [], + checker_owned: [], + verification_receipt: "", + status: "pending", + }, + failure_to_regression: state.failure_to_regression ?? { + required: false, + regression_artifact: "", + verified_by: "", + skip_reason: "", + }, deferred_items: deferredItems, completion_status: completionStatus, run_status: state.status ?? 
"planned", @@ -46,6 +60,46 @@ export function buildReturnEnvelope(state, options = {}) { }; } +export function evaluateCloseoutGate(state, verifier = evaluateVerifierGate(state.verifier_evaluations ?? [])) { + if (state.status !== "completed" || !verifier.final_pass) { + return { status: "not_applicable", issues: [] }; + } + + const issues = []; + const verifiedState = state.verified_state ?? {}; + const checkerOwned = Array.isArray(verifiedState.checker_owned) ? verifiedState.checker_owned.filter(Boolean) : []; + const verificationReceipt = String(verifiedState.verification_receipt ?? "").trim(); + const verifiedStatus = String(verifiedState.status ?? "pending"); + if ( + verifiedStatus === "pending" || + verifiedStatus === "needs_review" || + (checkerOwned.length === 0 && !verificationReceipt) + ) { + issues.push({ + id: "verified-state-missing", + status: "pending-verified-state", + reason: "Completed runs need checker-owned state or a verification receipt before they can claim done.", + }); + } + + const regression = state.failure_to_regression ?? {}; + if ( + regression.required === true && + !String(regression.regression_artifact ?? "").trim() && + !String(regression.skip_reason ?? "").trim() + ) { + issues.push({ + id: "regression-lock-missing", + status: "pending-regression-lock", + reason: "Recurring-failure repairs need a regression artifact or an explicit skip reason before closeout.", + }); + } + + if (issues.length === 0) return { status: "pass", issues: [] }; + if (issues.length === 1) return { status: issues[0].status, issues }; + return { status: "pending-closeout-gate", issues }; +} + function collectWorkerIds(state, key) { return [...new Set((state.workers ?? 
[]).map((worker) => worker[key]).filter(Boolean))]; } @@ -58,9 +112,12 @@ export async function writeReturnEnvelope(runDir, state, options = {}) { return { path: outputPath, envelope }; } -function deriveCompletionStatus(state, verifier) { - if (state.status === "completed" && verifier.final_pass) return "completed"; +function deriveCompletionStatus(state, verifier, closeoutGate = evaluateCloseoutGate(state, verifier)) { + if (state.status === "completed" && verifier.final_pass && closeoutGate.status === "pass") return "completed"; if (state.status === "completed" && verifier.status === "pending") return "pending-verification"; + if (state.status === "completed" && closeoutGate.status !== "not_applicable" && closeoutGate.status !== "pass") { + return closeoutGate.status; + } if (verifier.status === "blocked") return "blocked"; if (verifier.status === "needs-waiver") return "needs-waiver"; if (state.status === "cancelled") return "cancelled"; diff --git a/scripts/cwf-run-plan.mjs b/scripts/cwf-run-plan.mjs index c8166ef..18f12ef 100644 --- a/scripts/cwf-run-plan.mjs +++ b/scripts/cwf-run-plan.mjs @@ -130,6 +130,28 @@ export function renderRunPlanMarkdown(plan) { lines.push("- Record final synthesis in the originating Codex conversation."); lines.push("- Label evidence as local, fixture, dry-run, real-smoke, requires_approval, or blocked."); + lines.push("", "## Verified State Ownership"); + lines.push("- Maker-owned fields: attempted / proposed / changed / needs_review"); + lines.push("- Checker-owned fields: verified / passed / done / regression_locked"); + if (verifierAgents.length > 0) { + lines.push(`- Checker: ${verifierAgents.map((agent) => agent.id).join(", ")}`); + } else { + lines.push("- Checker: coordinator-held deterministic test, replay command, external evidence, or human reviewer"); + } + lines.push("- Verification receipt: required before any verified/passed/done claim"); + lines.push(`- Atomic status artifact: ${runId ? 
`.cwf/runs/${runId}/state.json` : ".cwf/runs/RUN_ID/state.json"}`); + lines.push("- Rule: implementer/maker workers must not write verified state directly."); + + lines.push("", "## Failure To Regression"); + lines.push("- Failing input or trace: N/A unless this run repairs a recurring workflow, helper, route, connector, skill, or harness failure."); + lines.push("- Diagnosis:"); + lines.push("- Fix or mitigation:"); + lines.push("- Replay command or fixture:"); + lines.push("- Regression artifact:"); + lines.push("- Verified by:"); + lines.push("- Sensitive data handling:"); + lines.push("- Skip reason:"); + lines.push("", "## Resume Checkpoint"); lines.push(`- ${resumeCheckpoint}`); @@ -139,6 +161,8 @@ export function renderRunPlanMarkdown(plan) { lines.push(` run_id: ${runId || ""}`); lines.push(" completed:"); lines.push(" evidence_added:"); + lines.push(" verified_by:"); + lines.push(" regression_added:"); lines.push(" blockers:"); lines.push(" next_slice:"); lines.push(" next_cwf_run:"); diff --git a/scripts/cwf-start.mjs b/scripts/cwf-start.mjs index 6a834d1..0cf0bfe 100644 --- a/scripts/cwf-start.mjs +++ b/scripts/cwf-start.mjs @@ -61,6 +61,23 @@ export async function startRun(options = {}) { })), verification_evidence: [], verifier_evaluations: [], + verified_state: { + maker_owned: [], + checker_owned: [], + verification_receipt: "", + status: "pending", + }, + failure_to_regression: { + required: false, + failing_input_or_trace: "", + diagnosis: "", + fix_or_mitigation: "", + replay_command_or_fixture: "", + regression_artifact: "", + verified_by: "", + sensitive_data_handling: "", + skip_reason: "", + }, adapter_status: { native_subagent: "pending", sdk_background_worker: "pending", @@ -148,8 +165,10 @@ async function writeWorkerPackets(runDir, state, preview) { "", "## Return Contract", "- Write or report a normalized worker result with status, summary, evidence, and runtime ids.", + "- Maker workers may report attempted/proposed/changed, but must not 
mark verified/passed/done.", "- Do not apply writes directly; patch proposals must return to the coordinator safe-write gate.", "- Label fixture, local, real-smoke, unavailable, deferred, and requires_approval honestly.", + "- If you diagnose a recurring harness/helper/route/connector failure, preserve the failing input or a sanitized replay pointer for the coordinator.", "", ]; await writeFile(join(runDir, worker.worker_packet_path), lines.join("\n"), "utf8"); diff --git a/skills/codex-workflows/SKILL.md b/skills/codex-workflows/SKILL.md index 0ff3219..56d8064 100644 --- a/skills/codex-workflows/SKILL.md +++ b/skills/codex-workflows/SKILL.md @@ -40,7 +40,8 @@ Goal Anchor when needed -> optionally promote important workers to Desktop threads -> wait or run in background -> adapt if needed - -> verify + -> verify with checker-owned state + -> preserve recurring failures as regression artifacts when applicable -> emit goal_delta when goal-anchored -> answer in this same conversation ``` @@ -110,6 +111,8 @@ When routing is ambiguous, read `references/routing.md` and prefer the narrower 12. Wait for worker results only when needed for the next critical-path step. 13. Summarize results back in the current conversation. 14. For long runs, prefer background + heartbeat instead of making the main conversation wait. +15. Treat maker and verifier state separately: workers can report attempted/proposed/changed, but only a verifier, test, replay, or human review may mark verified/passed/done. +16. When a workflow, helper, route, connector, or harness failure is likely to recur, preserve the failing input or trace and add a regression artifact, fixture, eval, or explicit skip reason before calling the repair complete. If native subagent tools are unavailable, stop and say the workflow cannot run natively in this host. Do not silently fall back to an external runner. @@ -129,6 +132,8 @@ They are readable JavaScript specs, not executable Node scripts. 
They may use pl - `run_experience` - `write_scopes` for workflow-level write boundaries, or per-agent `write_scope` for worker ownership - `verification` +- `verified_state` +- `failure_to_regression` - `stop_conditions` - `quarantine_rules` - `failure_policy` @@ -268,6 +273,8 @@ Every workflow closeout must include: - which agents were spawned and why; - what changed, if anything; - verification evidence; +- who owns verified state, and which evidence allowed any `verified` / `passed` / `done` claim; +- regression artifact or skip reason when the run fixed a recurring workflow, helper, route, connector, or harness failure; - `goal_delta` when the run is under Goal Mode or a Goal Anchor; - remaining risks or stop condition; - a short human-readable summary. @@ -285,6 +292,8 @@ Every CWF response should include the smallest useful subset of this contract: - `goal anchor`: goal id, acceptance, current slice, continue/stop/pause conditions when applicable; - `execution summary`: worker count, which workers ran, which were skipped, and why; - `goal_delta`: `run_id`, `completed`, `evidence_added`, `blockers`, `next_slice`, `next_cwf_run`, `continue_or_stop`, and `progress_artifact_update` when applicable; +- `verified state`: maker-owned attempted/proposed state versus checker-owned verified/passed/done state; +- `failure-to-regression`: failing input or trace, replay command, regression artifact, or explicit skip reason when applicable; - `return path`: coordinator_synthesis or heartbeat_synthesis status; - `write boundary`: no writes, proposed patch only, or approved safe write gate; - `verification`: commands, artifacts, thread ids, screenshots, logs, or explicit not-verified reason; diff --git a/skills/codex-workflows/templates/run-plan.md b/skills/codex-workflows/templates/run-plan.md index e1b9a60..1c32263 100644 --- a/skills/codex-workflows/templates/run-plan.md +++ b/skills/codex-workflows/templates/run-plan.md @@ -78,6 +78,29 @@ If no trigger boundary is met, 
stop and use the smaller route instead of CWF. - Evidence required: - Commands or artifacts: +## Verified State Ownership + +- Maker-owned fields: attempted / proposed / changed / needs_review +- Checker-owned fields: verified / passed / done / regression_locked +- Checker: test / verifier agent / human reviewer / external system +- Verification receipt: +- Atomic status artifact: + +Rule: implementer/maker workers must not write `verified`, `passed`, `done`, or `regression_locked` into state. The coordinator may promote those fields only from checker/test/replay/human-review evidence. + +## Failure To Regression + +Required when this run fixes a recurring workflow, helper, route, connector, skill, or harness failure. Use `N/A` only when no reusable failing input or safe regression artifact exists. + +- Failing input or trace: +- Diagnosis: +- Fix or mitigation: +- Replay command or fixture: +- Regression artifact: +- Verified by: +- Sensitive data handling: +- Skip reason: + ## Write Gate - Write mode: none / proposed patch / approved safe write @@ -108,6 +131,8 @@ goal_delta: run_id: completed: evidence_added: + verified_by: + regression_added: blockers: next_slice: next_cwf_run: @@ -120,6 +145,8 @@ goal_delta: ```text 这次 CWF 做了什么: 证据在哪: +谁验证了: +是否新增回归夹具: 目标推进了什么: 还没做什么: 下一步: diff --git a/workflows/safe-fix-loop.workflow.js b/workflows/safe-fix-loop.workflow.js index ab2fcab..b955e94 100644 --- a/workflows/safe-fix-loop.workflow.js +++ b/workflows/safe-fix-loop.workflow.js @@ -12,10 +12,10 @@ export default { }, run_experience: { preview: "Show diagnosis agents, proposed write scope, implementer visibility, verification commands, budget, and stop conditions.", - status: "Report diagnosis / fix / verify phase, changed files if any, last verification result, and budget pressure.", + status: "Report diagnosis / fix / verify phase, changed files if any, checker-owned verification result, regression artifact status, and budget pressure.", cancel: "Stop further fixes, 
keep current diff evidence, and say whether the target is safe to keep or should be reverted by the user.", resume: "Continue from the last verification result; if the diff state changed, rediagnose before writing.", - final_output: "Return changed files, verification evidence, remaining risks, and whether the acceptance criteria passed.", + final_output: "Return changed files, checker-owned verification evidence, regression artifact or skip reason, remaining risks, and whether the acceptance criteria passed.", }, phases: [ { @@ -47,7 +47,7 @@ export default { }, { id: "verify", - coordinator: "Run the narrowest meaningful verification. If it fails, spawn one debugger or stop with a concrete blocker.", + coordinator: "Run the narrowest meaningful verification with checker-owned state. If it fails, spawn one debugger or stop with a concrete blocker. When the failure is likely to recur, preserve the failing input and add a regression artifact or explicit skip reason.", }, ], write_rules: [ @@ -60,6 +60,8 @@ export default { verification: [ "Run git apply --check or an equivalent dry-run before applying any patch.", "Run the declared targeted verification command after applying the patch.", + "Only the verifier, deterministic test, replay command, or human reviewer may mark verified/passed/done; the implementer may only mark attempted/proposed/changed.", + "If the fix addresses a recurring failure, route confusion, helper bug, connector drift, or harness issue, replay the failing input and add a regression test, fixture, eval case, trigger case, helper smoke, or documented replay command.", "Record changed files, rollback command, and remaining risks before final synthesis.", ], stop_conditions: [