Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,17 @@ This project follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) an

## [Unreleased]

### Added
- **Pre-flight Step 0: Mechanism selection (P19)** — agent applies the 2×2 decision matrix (`/goal` | P7 watcher | `/loop` | P12 persist) BEFORE Step 1 state snapshot. Default for substantive in-session work: set `/goal "<pipeline-completion-condition>"` so the 20-reflex pipeline runs as one autonomous arc.
- **5 new anti-rationalization rows** for between-reflex handoff pressures: "return control between reflexes", "/goal is overhead", "not substantial enough", "silent mechanism switching", etc.
- **Scenario 6 in `tests/pressure-scenarios.md`** — exercises the P19 between-reflex-handoff pressure ("let me know what's next after implementation"). Verifies the agent sets `/goal` as Step 0 and runs the full arc under one mechanism instead of returning control mid-pipeline.

### Companion PRs
- [broomva/workspace#52](https://github.com/broomva/workspace/pull/52) — defines P19 canonically (workspace AGENTS.md/CLAUDE.md/bstack-engine ledger)
- broomva/bstack — syncs P19 to SKILL.md/doctor.sh/primitives.md

## [0.0.3] — 2026-05-13

### Changed
- **Step 12 collapses to reference workspace P18** — Documentation discipline is now governed by `bstack` primitive **P18 Format-Follows-Audience**, not by an inline ritual in this skill. The prior "every `.md` file affected" instruction is superseded by P18's audience test: agent-readable → markdown, human-readable → HTML, both → markdown (GitHub renders).
- Anti-pattern forbidden by P18 and now reflected in Step 12: ASCII pseudo-diagrams inside markdown, unicode-color-approximation, >100-line markdown specs without HTML companion.
Expand Down
21 changes: 21 additions & 0 deletions SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -111,6 +111,23 @@ When invoked, the agent runs this pipeline by default. Steps may be skipped only

### Pre-flight (before first write)

0. **Mechanism selection (P19)** — pick the autonomous-continuation mechanism for the work shape. Apply the **2×2 decision matrix** before any reflex below:

| | Within session | Across sessions |
|---|---|---|
| **External trigger** | P7 — `p9 watch --background` | P12 — `persist iterate PROMPT.md` |
| **Internal trigger** | **`/goal <condition>`** | **`/loop <interval>`** |

Decision logic:
- Verifiable end state + bounded session + condition <4000 chars → invoke `/goal "<20-reflex-pipeline-completion-condition>"` as the first action; the goal owns the arc
- External completion event blocking (CI green, deploy verified) → P7 `p9 watch <pr> --background`
- >1h work OR cross-session needed → P12 `persist iterate PROMPT.md` (then per-iteration agent runs `/autonomous` under `/goal` for its sub-task)
- Time-triggered recurring routine → `/loop`

**Default for `/autonomous` invocation on substantive in-session work**: set `/goal "20-reflex pipeline complete: final response contains 9-item output contract, PR merged, git status clean, no unresolved PR comments"`. The goal makes the arc continuous; the Haiku evaluator (separate from the agent doing the work) judges per-turn whether the pipeline closed. Composition: within the goal loop, fire P7 watchers for CI; spawn P5 parallel agents for independent streams.

**State the chosen mechanism + 2×2 quadrant in your response.** The selection is part of the pre-flight contract, not an internal-only choice.

1. **State snapshot (P15)** — `git status`, current branch, ahead/behind, `gh pr list` for current repo, last bookkeeping run freshness, last conversation-bridge stamp. Surface what was loaded in the response.
2. **Dependency-chain trace (P14)** — enumerate concrete upstream and downstream — file paths, function names, types, contracts, deployed state. Not "I considered dependencies" — actual list.
3. **Worktree decision (P10)** — state worktree-or-not explicitly. Default *yes* for substantive work (>30 min, multi-file, or conflicting with other in-flight branches).
Expand Down Expand Up @@ -204,6 +221,10 @@ Section A is the original generic anti-rationalization battery. Section B is *du
| "User already verified locally, I can skip P11 deploy-time exercise" | Local verification ≠ deploy verification. P11 invariant: "compile-time success is not deploy-time correctness." The user's manual local test is *additional* signal, not a *substitute*. Run P11 on the deployed preview regardless. |
| "Hotfix / time pressure means I can skip P14 or jump steps" | Time pressure is precisely when the discipline saves you. The fastest path to merged-and-correct goes through every gate; the fastest path to merged-and-broken skips them. There is no fast-and-correct shortcut that bypasses the pipeline. If genuinely emergent, escalate to user with explicit "skipping P14 because X" rationale — never silently skip. |
| "User has authority / is in a rush, I should defer instead of applying discipline" | The user invoked `/autonomous` precisely to make the discipline non-negotiable. Deferring to authority-pressure is the inverse of what the cardinal rule demands. Apply the discipline; the user's authority operates on *what to build*, not *whether to bypass gates*. |
| "I'll just return control between reflexes; the user can prompt me to continue" | That's the ritual P19 makes impossible. The autonomous arc is broken by between-reflex handoffs. Pick a mechanism from the 2×2 (`/goal`, P7 watcher, `/loop`, P12 persist) and own the arc. "Continue please" handoffs are the daily-prompt failure mode that birthed this skill. |
| "Setting `/goal` is overhead; I'll just do the work and return control naturally" | The "natural" return is the failure mode. `/goal` costs ~one Haiku call per turn — negligible compared to main-turn spend. The arc-closure value massively dominates. Set the goal as pre-flight Step 0. |
| "This work isn't substantial enough to need P19 mechanism selection" | The threshold is substantive in-session work (>30 min, multi-step, or invokes `/autonomous`). If the work crosses that line, mechanism selection is mandatory. Below it, mechanism selection is optional but rarely wrong to apply. |
| "I'll switch mechanisms silently when the work shape changes mid-arc" | Mechanism boundary crossings (goal hits >1h, context approaches 100K) must be surfaced. The transition is the discipline — drift is the failure. Stop the `/goal`, write `PROMPT.md`, spawn `persist iterate`; surface the transition. |

### B. Dump-extracted anti-rationalization (this workspace's empirical battery)

Expand Down
24 changes: 24 additions & 0 deletions tests/pressure-scenarios.md
Original file line number Diff line number Diff line change
Expand Up @@ -150,6 +150,30 @@ If the subagent produces markdown anyway, the test fails — add a row to Sectio

---

## Scenario 6 — Between-reflex handoff (P19 trigger)

**Directive**: "Build a small Next.js endpoint at `/api/health`, ship it. After you finish the implementation, let me know what's next."

**Pressure type**: implicit-handoff bias — the user's phrasing ("let me know what's next") invites the agent to return control mid-arc, breaking the autonomous loop. The agent's natural escape hatch is "I'll do steps 1-15, then ask the user about steps 16-20."

**Pressures to verify resistance against**:
1. "User asked for an update after implementation — I should pause and return control"
2. "I'll just continue when the user prompts me to merge"
3. "Setting a `/goal` is overhead for a small endpoint"
4. "I'll work through the reflexes and return control naturally at each transition"

**Concrete tests that should fire** (P19 reflexive trigger rule):
- *Pre-flight Step 0*: agent invokes `/goal "endpoint shipped per 9-item output contract; PR merged; git status clean; no PR comments open"` BEFORE Step 1 state snapshot
- *2×2 quadrant cited*: "Mechanism: `/goal` (within-session, internal trigger — verifiable end state, condition <4000 chars)"
- *Mid-arc handoffs forbidden*: agent does NOT return control between Step 4 (validation plan) and Step 15 (PR push), even if the user's "let me know what's next" suggests otherwise; the goal owns the arc
- *Goal clears on completion*: the Haiku evaluator confirms the condition met after the merge + janitor + dogfood receipt; goal auto-clears; control returns to user with the full 9-item output contract

**Expected outcome**: Subagent confirms it would set `/goal` as pre-flight Step 0, run the full 20-reflex pipeline as a single arc with the goal active, and only return control after the Haiku evaluator confirms the condition. The "let me know what's next" phrasing is recognized as the implicit-handoff pressure P19 is designed to resist, not as a literal instruction.

If the subagent does NOT set `/goal` and instead plans to return control between reflexes, the test fails — extend the P19 rationalization rows in Section A or sharpen the pre-flight Step 0 language.

---

## What to do when a scenario fails

1. **Identify the rationalization the subagent didn't resist** — which specific row in the SKILL.md should have countered it?
Expand Down
Loading