diff --git a/ROADMAP.md b/ROADMAP.md index 8093f26c..17198226 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -21,12 +21,13 @@ Add entries when a candidate is mid-evidence (one signal exists but doesn't yet | Idea | Why parked | Revisit trigger | Expiry | Source | |---|---|---|---|---| -| Default to Opus 4.6 1M context (not 200K) | Maintainer hit this: project `settings.json` pinned `claude-opus-4-6` (200K) instead of the `/model` default which gives 1M on Max. Users should get 1M by default; allow opt-out to 200K. Pair with `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` for 1M. | Second user reports same config trap OR #403 model-config cleanup lands | 2026-07-11 | v1.82.0 session 2026-06-11 | +| ~~Default to Opus 4.6 1M context (not 200K)~~ | **TRIGGER FIRED 2026-06-22.** #403 model-config batch landed in v1.83.0. Promote to actionable: don't pin `claude-opus-4-6` in project settings.json (200K), use `/model` default for 1M on Max. | ~~#403 lands~~ **FIRED** | — | v1.82.0 session 2026-06-11 | | Confidence Ramp Pattern for SDLC skill | **TRIGGER FIRED 2026-06-11.** Workflow: Opus researches issues → batch-consult Fable advisor → build 95%+ confidence task list → `/goal` churn. Trialing on #395/#403/#391 model-config batch. If clean (no CI failures, no review findings), graduate to wizard doc as a new SDLC phase. | Model-config batch ships clean | 2026-07-11 | v1.82.0 triage session 2026-06-11 | | Advisor auto-fallback in SDLC skill | **TRIGGER FIRED 2026-06-11.** When `advisor()` fails, the SDLC skill should automatically spawn a Fable subagent as fallback — proved in v1.82.0 triage (batch-reviewed 4 issues, caught #395 already fixed). Currently this pattern lives only in private memory; needs to be in `skills/sdlc/SKILL.md` so every project gets it. Also: auto-try advisor before every plan, not just when user asks. | Confidence ramp trial ships clean | 2026-07-11 | v1.82.0 triage session 2026-06-11 | | Detect process-rule memories in /feedback or /setup | Users saving memories like "always run tests" are patching /sdlc gaps with private memory. /feedback or /setup could detect `type: feedback` entries with process-rule patterns and suggest contributing back to /sdlc instead. | Maintainer runs memory audit and finds ≥3 process-rule memories | 2026-08-11 | v1.83.0 session 2026-06-12 | -| Run /insights with Fable on this repo | Native CC `/insights` generates session usage report (friction patterns, satisfaction, outcome summaries). Run once with Fable model at max effort to get a high-quality analysis of how the wizard is actually being used. Review the HTML report for patterns that could improve /sdlc or hooks. | Next triage session | 2026-07-12 | v1.83.0 session 2026-06-12 | -| Memory audit + repo efficiency pass with Fable | Run Memory Audit Protocol on private memory files — promote portable lessons, delete stale entries. Then: Fable batch-review the full repo (SKILL.md files, hooks, wizard doc) for efficiency, dead code, stale references, and consolidation opportunities. Goal: build a 95% confidence task list of cleanup items that can be worked in parallel. | Next triage session | 2026-07-11 | v1.82.0 triage session 2026-06-11 | +| Copilot Cowork SDLC port | GitHub Copilot desktop app launched in technical preview (June 2, 2026) with parallel multi-agent in git worktrees. Could port SDLC wizard as a Copilot extension. Different plugin format from Claude — would be a new sibling repo like `codex-sdlc-wizard`. Low priority: no demand signal, Copilot extension ecosystem is immature. | Second external user asks OR Copilot extension format stabilizes with hooks equivalent | 2026-08-20 | v1.84.0 triage session 2026-06-20 | +| Run /insights with Fable on this repo | Native CC `/insights` generates session usage report (friction patterns, satisfaction, outcome summaries). Run once with Fable model at max effort to get a high-quality analysis of how the wizard is actually being used. Review the HTML report for patterns that could improve /sdlc or hooks. | Next Fable session | 2026-07-22 | v1.83.0 session 2026-06-12, re-dated 2026-06-22 triage | +| Memory audit + repo efficiency pass with Fable | Run Memory Audit Protocol on private memory files — promote portable lessons, delete stale entries. Then: Fable batch-review the full repo (SKILL.md files, hooks, wizard doc) for efficiency, dead code, stale references, and consolidation opportunities. Goal: build a 95% confidence task list of cleanup items that can be worked in parallel. | Next Fable session | 2026-07-22 | v1.82.0 triage session 2026-06-11, re-dated 2026-06-22 triage | | ~~Sync AI Setup Lanes to Claude-family sibling wizards~~ | **DONE 2026-06-11.** gdlc v0.3.0, rdlc v0.7.0 shipped with AI Setup Lanes v2. Cowork plugin port in PR #410. Originally: `claude-gdlc-wizard` (v0.2.2) and `claude-rdlc-wizard` (v0.6.1) have stale 69-70 line AI_SETUP_LANES.md vs sdlc-wizard's 217 lines. Missing: 3-lane structure (Premium/Saver/Lite), advisor fallback escalation, Fable effort guidance, usage signals, autocompact cross-ref. Each needs a tailored port — gdlc is game-dev domain, rdlc is research domain. Codex/xdlc/ldlc out of scope (different ecosystem). When people run `/update` in those repos they should get the same Premium lane experience. | Next release of either sibling | 2026-07-11 | v1.82.0 triage session 2026-06-11 | > **Maintenance rule:** during quarterly ROADMAP triage, prune any row past expiry. Logged removals go in the commit message (`docs(roadmap): prune parking lot — expired entries`) so deletions are traceable. @@ -316,4 +317,6 @@ Living tracker of projects shipped using this wizard. **Rule:** only list projec | 302 | User-level setup-wizard + repo-local lifecycle split — **DESIGN-CERTIFIED, implementation deferred** | GH issue #302 proposes splitting the wizard so `setup` lives user-level/global and `sdlc`/`update`/`feedback` stay repo-local ("install once, bootstrap anywhere"). Cross-model design review completed 2026-05-20 (Codex gpt-5.5 xhigh, [`.reviews/302-design-codex-review.md`](.reviews/302-design-codex-review.md)) scored Claude's first-pass analysis 5/10 NOT CERTIFIED and replaced it with a concrete channel contract. **Decided channel split:** plugin channel = user-level/global path (already exists: `.claude-plugin/plugin.json` + `hooks/hooks.json` with `${CLAUDE_PLUGIN_ROOT}`); npm/npx = explicit repo-local bootstrap/update path; no npm `postinstall` writing to `~/.claude/` (fragile under CI/containers/read-only HOME/Windows). **Design questions answered:** (1) `setup` global via plugin, not also copied repo-local (precedence #338 makes the global copy dead weight when both exist); preserve auditability via `SDLC.md` metadata or install manifest, not duplicate same-name skill. (2) `feedback` global via plugin, can still read repo-local `SDLC.md` at invocation time. (3) `update-wizard` stays repo-local (reads `SDLC.md` metadata, runs drift detection, mutates project files). (4) `npm install -g` installs the CLI binary only — no lifecycle scripts. (5) Migration: leave existing repos unchanged; new split is opt-in only; later major release can stop copying `setup`/`feedback` repo-local for new installs. **Hard constraint Claude missed:** "global setup auto-triggers anywhere" is FALSE — `hooks/instructions-loaded-check.sh:14-22` and `hooks/sdlc-prompt-check.sh:27-35` exit silently when `SDLC.md`/`TESTING.md` don't exist, and `tests/test-hooks.sh:2507/2522` actively assert that silence. A plugin-only fresh-repo bootstrap signal must be designed before the global skill can satisfy the "always available" promise from the issue. **Existing dual-channel surface to respect:** `cli/init.js:286-302` blocks npm-on-plugin (duplicate `/update-wizard`), `hooks/instructions-loaded-check.sh:195-206` warns on dual install. Adding npm-global skills would create a third overlapping surface — deprecate one channel first. **Implementation sequence (5 steps, do NOT start without dedicated session):** (1) lock the channel contract in `CLAUDE_CODE_SDLC_WIZARD.md` + ARCHITECTURE.md; (2) doc the fresh-repo trigger limitation + migration story in #302 comment; (3) prototype plugin-only fresh-repo bootstrap signal + assert current silent-hook tests still pass; (4) write global-skill collision/backup/update tests if any installer touches `~/.claude/skills/`; (5) only then alter `init` behavior for new installs, leaving existing ones intact. **Why deferred 2026-05-20:** session shipped v1.74.0 → v1.75.0 → v1.75.1 (Trusted Publishing migration), maintainer fatigued, #302 is architectural — needs a fresh planning session. **Trigger to revisit:** explicit user request OR a second repo install where "install once" friction is felt acutely. | | 347 | Goal-mode checkpoint workflow (Codex `$gdlc` equivalent) — **CORRECTED 2026-05-24: native `/goal` exists, scope shrunk to ~30-line skill wrapper** | GH issue #347 asks whether Claude Code has a native primitive for long-running goal-bound work (Codex `$sdlc + $gdlc` pattern with persistent constraints, checkpoint cadence, explicit stop boundaries). **2026-05-23 research was WRONG** — claude-code-guide subagent said no native primitive, but CC **v2.1.139 shipped native `/goal`** (confirmed via raw changelog curl; docs at [code.claude.com/docs/en/goal.md](https://code.claude.com/docs/en/goal.md); follow-up fixes v2.1.140 hook-disabled hang + v2.1.143 subagent race). Corrected research at [`.reviews/347-goal-mode-research-CORRECTED.md`](.reviews/347-goal-mode-research-CORRECTED.md). **What `/goal` does:** session-scoped, evaluator-driven (Haiku default judges transcript after each turn yes/no), survives `--resume` but not `/clear`, no disk writes, no native turn/time cap. UX = `/goal ` + `/goal` (status) + `/goal clear`. **Old verdict OBE.** **Corrected scope = ~30-line skill wrapper in existing `/sdlc` (NOT a `GOAL.md` template — `/goal` writes nothing to disk).** Per [`.reviews/347-goal-mode-research-CORRECTED.md`](.reviews/347-goal-mode-research-CORRECTED.md), the wizard should add: (a) **pre-flight checklist** — workspace trusted, hooks not disabled at any settings layer, CC ≥ v2.1.143; (b) **condition-writing guidance** mirroring SDLC reporting standards (measurable end state + check + constraints + hard turn/time bound since `/goal` has no native cap); (c) **compose-with-hooks note** — `UserPromptSubmit`/`SessionStart`/`PreCompact` fire normally inside the goal loop, so `sdlc-prompt-check.sh` + `precompact-seam-check.sh` keep gating each turn; (d) **resume caveat** — `--resume` resets turn/time counters; (e) **anti-pattern callout** — don't use `/goal` for "doneness" the evaluator can't see in the transcript (evaluator can't call tools). **Status:** DONE 2026-05-24 (PR #351, v1.76.0 — `/goal` wrapper section added to `/sdlc` skill + `/update` skill changelog entry; followed by PR #355 v1.77.0 adding HIGH-95% confidence gate + DLC-binding requirement + condition-as-contract guidance). **Original 5-step plan + `GOAL.md` template scaffolding OBE.** **Meta-lesson captured in CORRECTED doc:** require explicit citation of an authoritative source (docs index, raw changelog grep) before accepting a "feature does not exist" claim — negative claims are easier to fake than positive ones. **Process gap also caught:** auto-update PR workflow stopped flagging features at CC v2.1.118 (2026-04-23, last PR #210) because #231 Phase 3d (v1.54.0) gutted the in-CI LLM-ranker to $0. Manual replacement (`claude --print --allowedTools "WebFetch,Read,Bash" "$(cat .github/prompts/analyze-release.md)"`) was supposed to run weekly on Max — it didn't, which is how `/goal` slipped past unnoticed for ~5 weeks. See #350 for the cadence fix. | | 350 | CC feature-discovery cadence — replacement-for-#231 ranker has no enforced cadence | **Process gap discovered 2026-05-24 via #347 correction.** #231 Phase 3d (v1.54.0, 2026-04-29) deleted the in-CI Claude ranker from `weekly-update.yml` to take it to $0/week. The fallback was "maintainer runs `claude --print "$(cat .github/prompts/analyze-release.md)"` weekly on Max." That ran zero times in 5 weeks. Consequence: native CC `/goal` shipped in v2.1.139 (~mid-May) and was missed until a user asked a related question 2026-05-24 — wizard then published wrong research saying "no /goal exists." 39 CC versions accumulated since the last auto-PR (v2.1.118 → v2.1.150). **Scope (Actionable now, low-cost cleanup):** (a) restore a thin cron — `weekly-update.yml` still runs detection ($0 GH API call); add a single follow-up step that opens a "CC version drift" GitHub issue when our `` baseline is >5 minor versions behind latest npm — pure GitHub API, no LLM, no API spend; (b) the issue body links the maintainer to the on-Max command + the changelog URL + a one-line "what to do" reminder; (c) session-start `instructions-loaded-check.sh` already has the staleness nudge (#196 — "N minor versions behind") — verify it actually fires when the gap is ≥3 minor AND link to the analyze-release runbook in the loud branch. **Why this beats #85 Phase 2** (which we killed 2026-05-24): #85 Phase 2 was "auto-rank features in CI via LLM" — expensive and conflicted with #231. This is "open a stale-baseline issue when we detect drift" — cheap, GH API only, just a maintainer reminder. **Status:** DONE 2026-05-25 (PR #354, v1.77.0 — `.github/workflows/cc-version-drift.yml` ships: Mon 09:30 UTC cron + workflow_dispatch, `scripts/cc-drift-check.sh` SemVer-delta logic, machine-readable issue marker for idempotent edits, only re-opens closed issues when delta WIDENED. Verified live 2026-05-25 via manual workflow_dispatch on main: ran clean, "Open or update tracking issue" step SKIPPED — silent when no drift, as designed). Paired with #347 in same release arc. | +| 424 | Cowork plugin enhancement — prompt-based hooks, verified install, README corrections | **Maintainer pain event:** `cowork/` plugin exists (PR #410, v1.83.0) with 2 skills (sdlc + feedback) and CI drift tests, but has NEVER been installed or tested in a real Cowork session. README says "Cowork hook support is unverified" — **this is wrong.** Anthropic's own `cowork-plugin-management` plugin (v0.2.2, `knowledge-work-plugins` marketplace) documents hooks as first-class plugin components with 9 event types including **prompt-based hooks** that need no shell access. Deep-research workflow (2026-06-19, 101 agents, Codex 7/10) confirmed Claude Cowork is a separate product from Claude Code but shares the same plugin format. **Scope (multi-phase):** (a) Verify install path — actually install the plugin in a Cowork session via GitHub URL, confirm skills load and invoke correctly; (b) Add prompt-based hooks — TDD reminder (`PreToolUse` on Write/Edit), SDLC baseline injection (`UserPromptSubmit`), confidence check (`Stop`). Use `"type": "prompt"` not `"type": "command"` — no shell needed; (c) Update README — remove "unverified" language, clarify Code vs Cowork product distinction, document prompt-hook enforcement; (d) Update `tests/test-cowork-drift.sh` Test 8 — flip from "no hooks = PASS" to "hooks exist and match format = PASS"; (e) Evaluate adding SDLC-specific agents (code-reviewer, tdd-checker) per plugin agent schema. **Entry gate:** maintainer pain event — dead code with CI tests, never tested in target environment. Relates to #302 (user-level setup split) — Cowork plugin is the "global" distribution surface. **Token cost note:** `/deep-research` used for initial research burned 2.5M tokens (23x median session, GH issue #423). Do NOT use for follow-up work — use targeted WebFetch. | +| 425 | Dynamic Workflows + ultracode evaluation — CC v2.1.154 feature we missed | **Maintainer pain event (#350 cadence gap manifested again).** CC v2.1.154 (May 28, 2026) shipped Dynamic Workflows — JavaScript scripts orchestrating subagents at scale. We're on v2.1.173 and missed it for 3 weeks. ROADMAP #71 (KAIROS/Coordinator Mode) was watching for this class of feature — Dynamic Workflows may or may not be related to KAIROS internally (not verifiable from public sources). Deep-research (2026-06-19, Codex 7/10 verified) confirmed: (a) only built-in workflow is `/deep-research`; custom workflows save to `.claude/workflows/` (project) or `~/.claude/workflows/` (user); (b) `ultracode` is a new effort level (`/effort ultracode`) = xhigh reasoning + automatic workflow orchestration, session-only; (c) `agent()`, `parallel()`, `pipeline()`, `phase()` primitives are NOT in official Anthropic docs — internal implementation, treat as non-public API; (d) Agent Teams are separate (experimental, `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS`), not superseded. **Scope:** (a) Update #71 wording — Dynamic Workflows shipped as the public multi-agent orchestration feature, relationship to KAIROS unknown, Agent Teams still separate/experimental; (b) Evaluate `ultracode` vs `max` — Codex says ultracode should NOT replace max as default (it's xhigh + auto-workflow, not deeper reasoning), document as opt-in escalation for audits/migrations/research; (c) Document Workflows in wizard — when to use, cost awareness (deep-research = 2.5M tokens), `.claude/workflows/` save path; (d) Evaluate whether wizard should ship custom SDLC workflows (Prove-It Gate applies — prove quality before adding). **Entry gate:** maintainer pain event — #350 cadence gap. Sources: [code.claude.com/docs/en/workflows](https://code.claude.com/docs/en/workflows), [Anthropic blog](https://claude.com/blog/introducing-dynamic-workflows-in-claude-code), Codex review `.reviews/codex-deep-research-triage.md`. | diff --git a/cowork/.claude-plugin/plugin.json b/cowork/.claude-plugin/plugin.json index 9e067d83..6eb69d9a 100644 --- a/cowork/.claude-plugin/plugin.json +++ b/cowork/.claude-plugin/plugin.json @@ -1,7 +1,7 @@ { "name": "sdlc-wizard-cowork", "version": "1.83.0", - "description": "SDLC methodology guidance for Claude Cowork — TDD, planning, self-review, confidence-driven development", + "description": "SDLC enforcement for Claude Cowork — TDD, planning, self-review via prompt-based hooks and skills", "author": { "name": "Stefan Ayala", "url": "https://github.com/BaseInfinity" diff --git a/cowork/README.md b/cowork/README.md index 88fc7dba..793029f8 100644 --- a/cowork/README.md +++ b/cowork/README.md @@ -1,35 +1,52 @@ # SDLC Wizard — Cowork Plugin -Methodology guidance for Claude Cowork sessions. Provides the SDLC workflow and community feedback skills without enforcement hooks. +SDLC enforcement for Claude Cowork sessions. Provides methodology guidance via skills AND prompt-based hooks that enforce discipline without shell access. + +> **Claude Cowork** is a separate product from Claude Code — it's a desktop application for knowledge workers. This plugin targets the shared plugin format that works in both Code and Cowork. ## What You Get -| Skill | Invocation | Purpose | -|-------|------------|---------| -| SDLC | `/sdlc-wizard-cowork:sdlc` | Full SDLC workflow: planning, TDD, self-review, CI shepherd | -| Feedback | `/sdlc-wizard-cowork:feedback` | Privacy-first community feedback and pattern sharing | +| Component | Invocation / Event | Purpose | +|-----------|-------------------|---------| +| SDLC skill | `/sdlc-wizard-cowork:sdlc` | Full SDLC workflow: planning, TDD, self-review | +| Feedback skill | `/sdlc-wizard-cowork:feedback` | Privacy-first community feedback and pattern sharing | +| TDD hook | `PreToolUse` (Write/Edit) | Reminds you to write failing tests before implementation | +| SDLC baseline hook | `UserPromptSubmit` | Injects the SDLC checklist at every prompt | +| Completion hook | `Stop` | Checks confidence stated, self-review done, tests passing | + +## Hooks — Prompt-Based Enforcement + +This plugin ships 3 **prompt-based hooks** (`"type": "prompt"`) that enforce SDLC discipline without shell access. These are the Cowork equivalents of Claude Code's bash hooks: -## What You Don't Get (and Why) +| Cowork Hook | Claude Code Equivalent | Event | +|-------------|----------------------|-------| +| TDD check | `tdd-pretool-check.sh` | `PreToolUse` (Write/Edit/MultiEdit) | +| SDLC baseline | `sdlc-prompt-check.sh` | `UserPromptSubmit` | +| Completion check | _(new — no CC equivalent)_ | `Stop` | -### No Hooks (Enforcement Gap) +Prompt hooks work by injecting instructions into Claude's context at the right moment — no bash, no shell, no filesystem access needed. -The full SDLC wizard uses 6 lifecycle hooks to enforce discipline at every interaction (TDD reminders before file edits, model/effort checks at session start, etc.). **Cowork hook support is unverified** — the plugin system is shared between Code and Cowork, but hook execution in Cowork sessions has not been confirmed. +> **Note:** These hooks use the same format as Claude Code plugin hooks and match the spec documented in Anthropic's `cowork-plugin-management` plugin. However, they have not yet been tested in a live Cowork session. If hooks don't fire after install, file a bug on the [wizard repo](https://github.com/BaseInfinity/claude-sdlc-wizard/issues). -This plugin ships skills only. You get the methodology guidance; enforcement relies on you following it rather than hooks forcing it. +### What's NOT Ported (and Why) -### No Setup or Update Skills +| Claude Code Hook | Why Not Ported | +|-----------------|----------------| +| `instructions-loaded-check.sh` | `InstructionsLoaded` event not available in Cowork plugin hooks | +| `model-effort-check.sh` | Effort levels are CC-specific; Cowork doesn't expose model config | +| `precompact-seam-check.sh` | Depends on `.reviews/handoff.json` on disk; Cowork may not have filesystem | -The `/setup` and `/update` skills are CLI-specific — they read version markers, write hooks to `.claude/`, and run shell commands. These don't translate to Cowork's sandboxed environment. Use the full wizard via `npx agentic-sdlc-wizard init` in a Claude Code session if you need setup/update. +### What's NOT Included (and Why) -### CLI-Dependent Sections in the SDLC Skill +**No Setup or Update skills** — these are CLI-specific (read version markers, write hooks to `.claude/`, run shell commands). Use the full wizard via `npx agentic-sdlc-wizard init` in a Claude Code session. -The `/sdlc` skill references shell tooling you may not have in Cowork: +**CLI-dependent SDLC sections** — the `/sdlc` skill references tools you may not have in Cowork: -- **Cross-model review** (`codex exec`) — requires Codex CLI + OpenAI API key + shell access. In Cowork, skip this or use ChatGPT/Codex web manually as your cross-model check. -- **CI shepherd** (`gh pr`, `git push`) — requires terminal. In Cowork, these steps happen outside your session. -- **`/code-review`** — works in Cowork if the plugin is loaded; runs against your session context. +- **Cross-model review** (`codex exec`) — use ChatGPT/Codex web manually as your cross-model check +- **CI shepherd** (`gh pr`, `git push`) — these steps happen outside your Cowork session +- **`/code-review`** — works in Cowork if the plugin is loaded -The methodology (plan → TDD → self-review → confidence check) is universal. The tooling commands are Claude Code shortcuts for steps you can do manually in any surface. +The methodology (plan → TDD → self-review → confidence check) is universal. The hooks enforce it; the tooling commands are Claude Code shortcuts for steps you can do manually. ## Installation @@ -51,16 +68,16 @@ claude --plugin-dir ./cowork This is a **subset** of the [SDLC Wizard](https://github.com/BaseInfinity/claude-sdlc-wizard). The full wizard provides: -- 6 lifecycle hooks (enforcement) +- 6 lifecycle hooks (command-based, bash) - 4 skills (sdlc, setup, update, feedback) - CLI installer (`npx agentic-sdlc-wizard init`) - npm package distribution -This Cowork plugin provides the 2 portable skills (sdlc + feedback) that work without shell access or hook enforcement. +This Cowork plugin provides 2 portable skills + 3 prompt-based hooks — enforcement without shell access. ### Drift Prevention -The skills in this package are **copies** of the canonical skills in `skills/`. A CI test (`tests/test-cowork-drift.sh`) fails if they diverge. When the canonical skills update, the test forces this package to sync. +The skills in this package are **copies** of the canonical skills in `skills/`. A CI test (`tests/test-cowork-drift.sh`) fails if they diverge. The test also validates hook format (prompt-only, correct events, valid JSON). When the canonical skills update, the test forces this package to sync. ## For Claude Code Users diff --git a/cowork/hooks/hooks.json b/cowork/hooks/hooks.json new file mode 100644 index 00000000..84e4657e --- /dev/null +++ b/cowork/hooks/hooks.json @@ -0,0 +1,39 @@ +{ + "description": "SDLC Wizard enforcement hooks — prompt-based (no shell access needed)", + "hooks": { + "PreToolUse": [ + { + "matcher": "Write|Edit|MultiEdit", + "hooks": [ + { + "type": "prompt", + "prompt": "TDD CHECK: Before this file edit, verify: Is there a FAILING TEST for this change? If you are writing implementation code before a failing test exists, STOP. Write the test first (TDD RED), then implement (TDD GREEN). If the test already exists and fails, proceed with the implementation.", + "timeout": 15 + } + ] + } + ], + "UserPromptSubmit": [ + { + "hooks": [ + { + "type": "prompt", + "prompt": "SDLC BASELINE: Before proceeding with this task, ensure you follow the SDLC workflow: 1. Plan tasks FIRST (outline steps before coding). 2. STATE CONFIDENCE: HIGH/MEDIUM/LOW. 3. LOW confidence? ASK USER before proceeding. 4. FAILED 2x? STOP and ASK USER. 5. ALL TESTS MUST PASS BEFORE COMMIT - NO EXCEPTIONS.", + "timeout": 10 + } + ] + } + ], + "Stop": [ + { + "hooks": [ + { + "type": "prompt", + "prompt": "COMPLETION CHECK: Before finishing, verify: 1. Did you state your confidence level (HIGH/MEDIUM/LOW)? 2. Did you read back the files you modified (self-review)? 3. Are all tests passing? 4. Is there any dead code, legacy fallbacks, or scope creep in your changes? If any check fails, address it before presenting your response.", + "timeout": 15 + } + ] + } + ] + } +} diff --git a/tests/test-cowork-drift.sh b/tests/test-cowork-drift.sh index 68c52732..b9726639 100755 --- a/tests/test-cowork-drift.sh +++ b/tests/test-cowork-drift.sh @@ -78,22 +78,26 @@ fi echo "" echo "--- Content Validation Tests ---" -# Test 7: README documents the enforcement gap (no hooks) +# Test 7: README documents prompt-based hooks if [ -f "$PROJECT_ROOT/cowork/README.md" ]; then - if grep -qi "hook" "$PROJECT_ROOT/cowork/README.md"; then - pass "README mentions hooks (enforcement gap documented)" + if grep -qi "prompt.*hook\|prompt-based" "$PROJECT_ROOT/cowork/README.md"; then + pass "README documents prompt-based hooks" else - fail "README does not mention hooks — must document enforcement gap" + fail "README does not document prompt-based hooks" fi else fail "README missing (skipping content check)" fi -# Test 8: No hooks directory in cowork (deliberate omission) -if [ ! -d "$PROJECT_ROOT/cowork/hooks" ]; then - pass "cowork has no hooks/ directory (deliberate — enforcement gap)" +# Test 8: hooks/hooks.json exists and is valid JSON +if [ -f "$PROJECT_ROOT/cowork/hooks/hooks.json" ]; then + if python3 -c "import json; json.load(open('$PROJECT_ROOT/cowork/hooks/hooks.json'))" 2>/dev/null; then + pass "cowork hooks/hooks.json exists and is valid JSON" + else + fail "cowork hooks/hooks.json is not valid JSON" + fi else - fail "cowork has hooks/ directory — Cowork hook support is unverified, remove" + fail "cowork/hooks/hooks.json missing — prompt-based hooks required" fi # Test 9: plugin.json version matches root package.json @@ -109,6 +113,118 @@ else fail "plugin.json or package.json missing (skipping version check)" fi +echo "" +echo "--- Hook Quality Tests ---" + +# Test 10: hooks.json has correct wrapper structure (events under "hooks" key) +if [ -f "$PROJECT_ROOT/cowork/hooks/hooks.json" ]; then + if python3 -c " +import json +d=json.load(open('$PROJECT_ROOT/cowork/hooks/hooks.json')) +assert 'hooks' in d, 'missing top-level hooks key' +assert isinstance(d['hooks'], dict), 'hooks must be a dict' +" 2>/dev/null; then + pass "hooks.json has correct wrapper structure (events under 'hooks' key)" + else + fail "hooks.json has wrong structure — events must be under 'hooks' key, not at root" + fi +else + fail "hooks.json not found (skipping structure check)" +fi + +# Test 11: hooks.json has PreToolUse TDD hook +if [ -f "$PROJECT_ROOT/cowork/hooks/hooks.json" ]; then + if python3 -c " +import json +d=json.load(open('$PROJECT_ROOT/cowork/hooks/hooks.json'))['hooks'] +ptus=d.get('PreToolUse',[]) +assert any('Write' in h.get('matcher','') or 'Edit' in h.get('matcher','') for h in ptus), 'no Write/Edit matcher' +assert any(hk.get('type')=='prompt' for h in ptus for hk in h.get('hooks',[])), 'no prompt type' +" 2>/dev/null; then + pass "hooks.json has PreToolUse TDD prompt hook for Write/Edit" + else + fail "hooks.json missing PreToolUse TDD prompt hook for Write/Edit" + fi +else + fail "hooks.json not found (skipping PreToolUse check)" +fi + +# Test 12: hooks.json has UserPromptSubmit SDLC baseline hook +if [ -f "$PROJECT_ROOT/cowork/hooks/hooks.json" ]; then + if python3 -c " +import json +d=json.load(open('$PROJECT_ROOT/cowork/hooks/hooks.json'))['hooks'] +ups=d.get('UserPromptSubmit',[]) +assert len(ups)>0, 'no UserPromptSubmit hooks' +assert any(hk.get('type')=='prompt' for h in ups for hk in h.get('hooks',[])), 'no prompt type' +" 2>/dev/null; then + pass "hooks.json has UserPromptSubmit SDLC baseline prompt hook" + else + fail "hooks.json missing UserPromptSubmit SDLC baseline prompt hook" + fi +else + fail "hooks.json not found (skipping UserPromptSubmit check)" +fi + +# Test 13: hooks.json has Stop confidence check hook +if [ -f "$PROJECT_ROOT/cowork/hooks/hooks.json" ]; then + if python3 -c " +import json +d=json.load(open('$PROJECT_ROOT/cowork/hooks/hooks.json'))['hooks'] +stops=d.get('Stop',[]) +assert len(stops)>0, 'no Stop hooks' +assert any(hk.get('type')=='prompt' for h in stops for hk in h.get('hooks',[])), 'no prompt type' +" 2>/dev/null; then + pass "hooks.json has Stop confidence check prompt hook" + else + fail "hooks.json missing Stop confidence check prompt hook" + fi +else + fail "hooks.json not found (skipping Stop check)" +fi + +# Test 14: All hooks are prompt type (no command type — Cowork has no shell) +if [ -f "$PROJECT_ROOT/cowork/hooks/hooks.json" ]; then + if python3 -c " +import json +d=json.load(open('$PROJECT_ROOT/cowork/hooks/hooks.json'))['hooks'] +for event, matchers in d.items(): + for m in matchers: + for h in m.get('hooks',[]): + assert h.get('type')=='prompt', f'{event} has non-prompt hook type: {h.get(\"type\")}' +" 2>/dev/null; then + pass "all cowork hooks are prompt type (no shell dependency)" + else + fail "cowork has non-prompt hook types — Cowork has no shell access" + fi +else + fail "hooks.json not found (skipping type check)" +fi + +echo "" +echo "--- ROADMAP Tracking Tests ---" + +# Test 15: ROADMAP has a Cowork enhancement entry +if grep -q "Cowork plugin enhancement" "$PROJECT_ROOT/ROADMAP.md" 2>/dev/null; then + pass "ROADMAP tracks Cowork plugin enhancement" +else + fail "ROADMAP missing Cowork plugin enhancement entry" +fi + +# Test 16: ROADMAP Cowork entry mentions prompt-based hooks specifically +if grep -A10 "Cowork plugin enhancement" "$PROJECT_ROOT/ROADMAP.md" 2>/dev/null | grep -qi "prompt.*hook\|prompt-based"; then + pass "ROADMAP Cowork entry mentions prompt-based hooks" +else + fail "ROADMAP Cowork entry does not mention prompt-based hooks — key deliverable missing" +fi + +# Test 17: ROADMAP has Dynamic Workflows evaluation entry +if grep -q "Dynamic Workflows" "$PROJECT_ROOT/ROADMAP.md" 2>/dev/null; then + pass "ROADMAP tracks Dynamic Workflows evaluation" +else + fail "ROADMAP missing Dynamic Workflows evaluation entry" +fi + echo "" echo "=== Results: $PASS passed, $FAIL failed ===" [ "$FAIL" -eq 0 ] && exit 0 || exit 1