fall-out-bug · May 15, 2026
@@ -116,6 +116,17 @@ Every commit in SDP-managed repos SHOULD carry provenance trailer:
 - Edit files in main tree (always use worktree)
 - Commit raw `.sdp/runs/pi-review/*` telemetry unless the workstream explicitly requires it; use compact verdict/evidence instead.
 
+## Common Rationalizations
+
+| Rationalization | Reality |
+|---|---|
+| "The change is small enough to skip the workstream." | Small changes still need an executable owner. If no WS exists, stop and create or request one. |
+| "I can test at the end." | Late testing hides which slice introduced the failure. Use the narrowest relevant test before and after behavior changes. |
+| "The model says it verified this." | Model prose is not evidence. Use tool output, file state, schema validation, or Beads/GitHub state. |
+| "Prompt instructions are enough to prevent unsafe actions." | Prompt-only boundaries are not security boundaries. Runtime support is `not_assessed_runtime` unless dispatch evidence proves enforcement. |
+| "One broad review after implementation is enough." | Trust-sensitive changes need selected review planes, and degraded evidence must remain visible. |
+| "Unrelated cleanup will leave the repo better." | Cleanup is in scope only when required by the WS or explicitly accepted in the write plan. |
+
 ## Response Format
 
 After completing work, report:

@@ -54,6 +54,17 @@ gates are green, no P0/P1 remain, and `.sdp/review_verdict.json` records a
 compact maintainer note. Never commit raw `.sdp/runs/pi-review/*` telemetry by
 default.
 
+## Common Rationalizations
+
+| Rationalization | Reality |
+|---|---|
+| "The reviewer returned nothing, so there were no findings." | Empty, timed-out, or off-task output is degraded evidence, not PASS. |
+| "All reviewers used the same strong model, so the panel is strong." | Multi-plane review and model-family diversity are separate. For trust-sensitive work, record missing diversity as `not_assessed_runtime`. |
+| "The adapter files exist, so the harness is supported." | Static parity is not runtime dispatch evidence. Mark runtime coverage `not_assessed_runtime` until the harness loads and runs the surface. |
+| "Network access means the reviewer verified the current docs." | Network permission is not evidence. Cite the source or mark the claim unverified. |
+| "Rubber-stamp roles are harmless." | They are acceptable only when explicitly recorded as shallow coverage; do not blend them into a full green verdict. |
+| "A compact maintainer note can hide provider failure." | It may justify accepting degraded coverage, but the degraded state must remain visible. |
+
 ## Routing Rules
 
 Dimension based on: (1) Diff size: small (<50 lines) → code only, large → multiple dimensions.

@@ -358,6 +358,7 @@
 {"_type":"issue","id":"sdplab-7","title":"F061-02: bd ready → sdp ready bridge","status":"closed","priority":1,"issue_type":"task","owner":"a_v_zhukov@outlook.com","created_at":"2026-02-28T21:35:17Z","created_by":"Andrey Zhukov","updated_at":"2026-04-20T14:27:34Z","closed_at":"2026-04-20T14:27:34Z","close_reason":"Verified: code exists, 218 tests pass across guard/evidence/monitor/beads/workstream packages. WS files marked done with all acceptance criteria checked.","labels":["F061","beads","ecosystem"],"dependency_count":0,"dependent_count":0,"comment_count":0}
 {"_type":"issue","id":"sdplab-2","title":"F059-02: Session evidence emitter","status":"closed","priority":1,"issue_type":"task","owner":"a_v_zhukov@outlook.com","created_at":"2026-02-28T21:34:39Z","created_by":"Andrey Zhukov","updated_at":"2026-04-20T14:27:00Z","closed_at":"2026-04-20T14:27:00Z","close_reason":"Verified: code exists, 218 tests pass across guard/evidence/monitor/beads/workstream packages. WS files marked done with all acceptance criteria checked.","labels":["F059","ecosystem","ohmyopencode"],"dependency_count":0,"dependent_count":0,"comment_count":0}
 {"_type":"issue","id":"sdplab-3","title":"F059-01: Pre-tool-call guard hook","status":"closed","priority":1,"issue_type":"task","owner":"a_v_zhukov@outlook.com","created_at":"2026-02-28T21:34:39Z","created_by":"Andrey Zhukov","updated_at":"2026-04-20T14:27:00Z","closed_at":"2026-04-20T14:27:00Z","close_reason":"Verified: code exists, 218 tests pass across guard/evidence/monitor/beads/workstream packages. WS files marked done with all acceptance criteria checked.","labels":["F059","ecosystem","ohmyopencode"],"dependency_count":0,"dependent_count":0,"comment_count":0}
+{"_type":"issue","id":"sdplab-4cxu","title":"F168-09: Apply harness/skill operating discipline phase 1","description":"Follow-up to the 2026-05-15 harness/skill synthesis. Apply Phase 1 only: update skill authoring policy with trigger/exclusion/verification/degraded-evidence requirements; add reference vocabulary for tool risk classes and degraded evidence; add common rationalizations to build and review. Runtime manifest enforcement and model-routing measurement are out of scope.","acceptance_criteria":"- [ ] docs/reference/skill-authoring.md defines Do Not Use When, Verification, and Degraded Evidence requirements.\n- [ ] docs/reference contains a reusable tool-risk/degraded-evidence reference.\n- [ ] prompts/skills/build/SKILL.md has Common Rationalizations for skipped specs, evidence, prompt-only safety, and review shortcuts.\n- [ ] prompts/skills/review/SKILL.md has Common Rationalizations for empty reviewer output, single-family review, missing provenance, and rubber-stamp coverage.\n- [ ] Runtime support claims introduced by this work are explicitly not_assessed_runtime unless dispatch evidence exists.\n- [ ] Skill lint is run and result recorded.","status":"in_progress","priority":2,"issue_type":"task","assignee":"Andrei","owner":"a_v_zhukov@outlook.com","created_at":"2026-05-15T08:26:12Z","created_by":"Andrei","updated_at":"2026-05-15T08:26:18Z","started_at":"2026-05-15T08:26:18Z","labels":["F168","docs","harness","skills"],"dependency_count":0,"dependent_count":0,"comment_count":0}
 {"_type":"issue","id":"sdplab-tsbi","title":"F168 finding: stale Claude sweep command is outside manifest source of truth","description":"source=pi-review+local verification; feature=F168; workstream=00-168-02; blocking=false. .claude/commands/sweep.md exists and advertises broad autonomous backlog execution, but sdp.manifest.yaml and prompts/commands have no sweep command source. Decide whether to delete it, add it to manifest as experimental, or move it behind explicit future-work docs so generated adapter inventory is truthful.","status":"closed","priority":2,"issue_type":"bug","owner":"a_v_zhukov@outlook.com","created_at":"2026-05-13T09:34:15Z","created_by":"Andrei","updated_at":"2026-05-14T08:47:41Z","closed_at":"2026-05-14T08:47:41Z","close_reason":"merged in PR #153 (F168 onboarding quality taxonomy)","dependencies":[{"issue_id":"sdplab-tsbi","depends_on_id":"sdplab-o8gk","type":"discovered-from","created_at":"2026-05-13T12:34:15Z","created_by":"Andrei","metadata":"{}"}],"dependency_count":0,"dependent_count":0,"comment_count":0}
 {"_type":"issue","id":"sdplab-o8gk.8","title":"F168-08: End-to-end onboarding quality calibration run","description":"Run the completed F168 flow against SDP onboarding and record calibration evidence: actual commands, docs promises, review axes, created findings, and unresolved gaps.","status":"closed","priority":2,"issue_type":"task","owner":"a_v_zhukov@outlook.com","created_at":"2026-05-13T05:46:50Z","created_by":"Andrei","updated_at":"2026-05-14T08:47:53Z","closed_at":"2026-05-14T08:47:53Z","close_reason":"merged in PR #153 (F168 onboarding quality taxonomy)","labels":["F168","calibration","onboarding","pi-review","quality"],"dependencies":[{"issue_id":"sdplab-o8gk.8","depends_on_id":"sdplab-o8gk","type":"parent-child","created_at":"2026-05-13T08:46:49Z","created_by":"Andrei","metadata":"{}"}],"dependency_count":0,"dependent_count":0,"comment_count":0}
 {"_type":"issue","id":"sdplab-o8gk.7","title":"F168-07: CI/advisory rollout and Beads findings loop","description":"Connect deterministic and model-review axes into CI/advisory rollout with Beads finding creation for blocking issues. Avoid fake-green checks; absent credentials and missing tools must produce cannot_verify or not_assessed.","status":"closed","priority":2,"issue_type":"task","owner":"a_v_zhukov@outlook.com","created_at":"2026-05-13T05:46:49Z","created_by":"Andrei","updated_at":"2026-05-14T08:47:52Z","closed_at":"2026-05-14T08:47:52Z","close_reason":"merged in PR #153 (F168 onboarding quality taxonomy)","labels":["F168","beads","ci","onboarding","pi-review","quality"],"dependencies":[{"issue_id":"sdplab-o8gk.7","depends_on_id":"sdplab-o8gk","type":"parent-child","created_at":"2026-05-13T08:46:48Z","created_by":"Andrei","metadata":"{}"}],"dependency_count":0,"dependent_count":0,"comment_count":0}

@@ -40,6 +40,17 @@ Continuation is the orchestrator's job (@oneshot / sdp orchestrate).
 5. **MODERN GO FOR GO CODE** — When touched files are Go, load `@go-modern` and prefer safe stdlib modernizations before inventing helpers.
 6. **PI FINDINGS NEED REGRESSION TESTS** — For prompt-injection or review-finding fixes, add a deterministic regression test for the exact failed vector before closing the finding bead.
 
+## Common Rationalizations
+
+| Rationalization | Reality |
+|---|---|
+| "The change is small enough to skip the workstream." | Small changes still need an executable owner. If no WS exists, stop and create or request one. |
+| "I can test at the end." | Late testing hides which slice introduced the failure. Use the narrowest relevant test before and after behavior changes. |
+| "The model says it verified this." | Model prose is not evidence. Use tool output, file state, schema validation, or Beads/GitHub state. |
+| "Prompt instructions are enough to prevent unsafe actions." | Prompt-only boundaries are not security boundaries. Runtime support is `not_assessed_runtime` unless dispatch evidence proves enforcement. |
+| "One broad review after implementation is enough." | Trust-sensitive changes need selected review planes, and degraded evidence must remain visible. |
+| "Unrelated cleanup will leave the repo better." | Cleanup is in scope only when required by the WS or explicitly accepted in the write plan. |
+
 ---
 
 ## Git Safety

@@ -111,6 +111,17 @@ Rules:
   huge provider error text or full prompts into the verdict, replace it with a
   compact verdict that preserves model status, P0/P1 counts, and override reason.
 
+## Common Rationalizations
+
+| Rationalization | Reality |
+|---|---|
+| "The reviewer returned nothing, so there were no findings." | Empty, timed-out, or off-task output is degraded evidence, not PASS. |
+| "All reviewers used the same strong model, so the panel is strong." | Multi-plane review and model-family diversity are separate. For trust-sensitive work, record missing diversity as `not_assessed_runtime`. |
+| "The adapter files exist, so the harness is supported." | Static parity is not runtime dispatch evidence. Mark runtime coverage `not_assessed_runtime` until the harness loads and runs the surface. |
+| "Network access means the reviewer verified the current docs." | Network permission is not evidence. Cite the source or mark the claim unverified. |
+| "Rubber-stamp roles are harmless." | They are acceptable only when explicitly recorded as shallow coverage; do not blend them into a full green verdict. |
+| "A compact maintainer note can hide provider failure." | It may justify accepting degraded coverage, but the degraded state must remain visible. |
+
 ## Write Plan (F101)
 
 Before writing review output files (verdict, findings), emit a write plan:

@@ -40,6 +40,17 @@ Continuation is the orchestrator's job (@oneshot / sdp orchestrate).
 5. **MODERN GO FOR GO CODE** — When touched files are Go, load `@go-modern` and prefer safe stdlib modernizations before inventing helpers.
 6. **PI FINDINGS NEED REGRESSION TESTS** — For prompt-injection or review-finding fixes, add a deterministic regression test for the exact failed vector before closing the finding bead.
 
+## Common Rationalizations
+
+| Rationalization | Reality |
+|---|---|
+| "The change is small enough to skip the workstream." | Small changes still need an executable owner. If no WS exists, stop and create or request one. |
+| "I can test at the end." | Late testing hides which slice introduced the failure. Use the narrowest relevant test before and after behavior changes. |
+| "The model says it verified this." | Model prose is not evidence. Use tool output, file state, schema validation, or Beads/GitHub state. |
+| "Prompt instructions are enough to prevent unsafe actions." | Prompt-only boundaries are not security boundaries. Runtime support is `not_assessed_runtime` unless dispatch evidence proves enforcement. |
+| "One broad review after implementation is enough." | Trust-sensitive changes need selected review planes, and degraded evidence must remain visible. |
+| "Unrelated cleanup will leave the repo better." | Cleanup is in scope only when required by the WS or explicitly accepted in the write plan. |
+
 ---
 
 ## Git Safety

@@ -111,6 +111,17 @@ Rules:
   huge provider error text or full prompts into the verdict, replace it with a
   compact verdict that preserves model status, P0/P1 counts, and override reason.
 
+## Common Rationalizations
+
+| Rationalization | Reality |
+|---|---|
+| "The reviewer returned nothing, so there were no findings." | Empty, timed-out, or off-task output is degraded evidence, not PASS. |
+| "All reviewers used the same strong model, so the panel is strong." | Multi-plane review and model-family diversity are separate. For trust-sensitive work, record missing diversity as `not_assessed_runtime`. |
+| "The adapter files exist, so the harness is supported." | Static parity is not runtime dispatch evidence. Mark runtime coverage `not_assessed_runtime` until the harness loads and runs the surface. |
+| "Network access means the reviewer verified the current docs." | Network permission is not evidence. Cite the source or mark the claim unverified. |
+| "Rubber-stamp roles are harmless." | They are acceptable only when explicitly recorded as shallow coverage; do not blend them into a full green verdict. |
+| "A compact maintainer note can hide provider failure." | It may justify accepting degraded coverage, but the degraded state must remain visible. |
+
 ## Write Plan (F101)
 
 Before writing review output files (verdict, findings), emit a write plan:

@@ -40,6 +40,17 @@ Continuation is the orchestrator's job (@oneshot / sdp orchestrate).
 5. **MODERN GO FOR GO CODE** — When touched files are Go, load `@go-modern` and prefer safe stdlib modernizations before inventing helpers.
 6. **PI FINDINGS NEED REGRESSION TESTS** — For prompt-injection or review-finding fixes, add a deterministic regression test for the exact failed vector before closing the finding bead.
 
+## Common Rationalizations
+
+| Rationalization | Reality |
+|---|---|
+| "The change is small enough to skip the workstream." | Small changes still need an executable owner. If no WS exists, stop and create or request one. |
+| "I can test at the end." | Late testing hides which slice introduced the failure. Use the narrowest relevant test before and after behavior changes. |
+| "The model says it verified this." | Model prose is not evidence. Use tool output, file state, schema validation, or Beads/GitHub state. |
+| "Prompt instructions are enough to prevent unsafe actions." | Prompt-only boundaries are not security boundaries. Runtime support is `not_assessed_runtime` unless dispatch evidence proves enforcement. |
+| "One broad review after implementation is enough." | Trust-sensitive changes need selected review planes, and degraded evidence must remain visible. |
+| "Unrelated cleanup will leave the repo better." | Cleanup is in scope only when required by the WS or explicitly accepted in the write plan. |
+
 ---
 
 ## Git Safety

@@ -111,6 +111,17 @@ Rules:
   huge provider error text or full prompts into the verdict, replace it with a
   compact verdict that preserves model status, P0/P1 counts, and override reason.
 
+## Common Rationalizations
+
+| Rationalization | Reality |
+|---|---|
+| "The reviewer returned nothing, so there were no findings." | Empty, timed-out, or off-task output is degraded evidence, not PASS. |
+| "All reviewers used the same strong model, so the panel is strong." | Multi-plane review and model-family diversity are separate. For trust-sensitive work, record missing diversity as `not_assessed_runtime`. |
+| "The adapter files exist, so the harness is supported." | Static parity is not runtime dispatch evidence. Mark runtime coverage `not_assessed_runtime` until the harness loads and runs the surface. |
+| "Network access means the reviewer verified the current docs." | Network permission is not evidence. Cite the source or mark the claim unverified. |
+| "Rubber-stamp roles are harmless." | They are acceptable only when explicitly recorded as shallow coverage; do not blend them into a full green verdict. |
+| "A compact maintainer note can hide provider failure." | It may justify accepting degraded coverage, but the degraded state must remain visible. |
+
 ## Write Plan (F101)
 
 Before writing review output files (verdict, findings), emit a write plan:

@@ -0,0 +1,58 @@
+# Harness Risk And Evidence
+
+Status: reference
+
+This vocabulary keeps SDP skill and harness claims honest. It is intentionally
+small: use it in skills, review reports, adapter checks, and evidence summaries
+without turning every task into a policy project.
+
+## Tool Risk Classes
+
+| Class | Meaning | Default policy |
+|---|---|---|
+| `perception` | Read-only inspection: files, logs, docs, links, local state. | Allowed for most roles. |
+| `analysis` | Local computation or synthesis without writes or external side effects. | Allowed with recorded evidence. |
+| `local_write` | Edits, generated artifacts, local database or checkpoint changes. | Implementer/workflow scope only. |
+| `external_write` | Push, publish, create or update a remote system, send messages. | Explicit workflow gate required. |
+| `irreversible` | Merge, deploy, delete, rotate credentials, spend money. | Explicit human or workflow authorization required. |
+
+Prompt text may describe a boundary, but it is not the boundary. If the harness
+cannot enforce a risk-class gate, record the claim as `not_assessed_runtime` or
+`manual_gate_only`.
+
+## Evidence States
+
+| State | Meaning |
+|---|---|
+| `passed` | Evidence completed and supports the claim. |
+| `failed` | Evidence completed and contradicts the claim. |
+| `not_assessed` | The plane was not run. |
+| `failed_provider` | Provider returned an explicit error. |
+| `timeout` | Run exceeded the bounded window. |
+| `empty_output` | Run completed with no useful content. |
+| `off_task` | Output did not address the requested plane. |
+| `unavailable_cli` | Required local tool was missing or could not run. |
+| `unverified_benchmark` | Vendor or third-party claim was not validated on SDP tasks. |
+| `not_assessed_runtime` | Static files exist, but runtime behavior was not proven. |
+| `manual_gate_only` | The workflow used an explicit human/workflow gate because runtime enforcement is unavailable. |
+
+Missing evidence is not a pass. Use the degraded state that preserves what
+actually happened.
+
+## Assignment Rule
+
+Deterministic tool output wins over model prose. If a model claims a check
+passed but the tool output is missing, classify the check as `not_assessed`.
+If states conflict, report the more conservative degraded state until a human or
+orchestrator inspects the evidence.
+
+## Common Examples
+
+- A review provider returns no findings because the process timed out:
+  `timeout`, not `passed`.
+- A harness adapter file exists but no dispatch run proves it loads:
+  `not_assessed_runtime`.
+- A skill says "do not push", but the harness cannot block pushing:
+  `manual_gate_only` for that action class unless another runtime gate exists.
+- A model vendor page claims strong coding benchmarks:
+  `unverified_benchmark` until reproduced on SDP tasks.
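
The Assignment Rule in the new `Harness Risk And Evidence` reference could be sketched roughly as follows. This is a minimal illustration, not SDP code: the function names `assign_state` and `resolve_conflict` are hypothetical, and the ranking used to pick the "more conservative" state (`passed` lowest, any degraded state above it, `failed` highest, ties going to the first state reported) is an assumption, since the reference does not define an ordering.

```python
from typing import Optional, Sequence

# Degraded states copied from the Evidence States table in the reference.
DEGRADED = (
    "not_assessed", "failed_provider", "timeout", "empty_output", "off_task",
    "unavailable_cli", "unverified_benchmark", "not_assessed_runtime",
    "manual_gate_only",
)
STATES = ("passed", "failed") + DEGRADED


def _rank(state: str) -> int:
    """Assumed conservatism ranking: passed < any degraded state < failed."""
    if state not in STATES:
        raise ValueError(f"unknown evidence state: {state!r}")
    if state == "passed":
        return 0
    if state == "failed":
        return 2
    return 1


def assign_state(tool_state: Optional[str], model_claim: Optional[str] = None) -> str:
    """Deterministic tool output wins over model prose.

    model_claim is accepted only to show that it is ignored: with no tool
    output the check is not_assessed, whatever the model says.
    """
    if tool_state is None:
        return "not_assessed"
    _rank(tool_state)  # validate the state name; raises on unknown values
    return tool_state


def resolve_conflict(states: Sequence[str]) -> str:
    """Report the most conservative state; ties go to the first one reported."""
    if not states:
        return "not_assessed"
    return max(states, key=_rank)


if __name__ == "__main__":
    print(assign_state(None, model_claim="passed"))
    print(resolve_conflict(["passed", "timeout"]))
```

Keeping the ranking in one small function means a maintainer or orchestrator could swap in a different ordering without touching the two rules themselves.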