Merged
25 commits
7352f2a
Retire complexity.npath rule, recalibrate complexity thresholds, and …
mattyhansen May 30, 2026
16ab00a
Bump version to 1.0.0, update CHANGELOG for stable release, and adjus…
mattyhansen May 30, 2026
0209f86
Add ignoredPathDetails to JSON report and implement check-ignore comm…
mattyhansen May 30, 2026
9063a7c
Add baseline movement reporting and failure conditions count gate
mattyhansen May 30, 2026
3427251
Add support for new-findings gate in analysis command and reporting
mattyhansen May 30, 2026
1c8a780
Add new-findings-only gate and enhance reporting features in analysis…
mattyhansen May 30, 2026
645980c
Add result caching mechanism and disable cache option in analysis com…
mattyhansen May 30, 2026
9336bd1
Add ADR-020 for incremental per-file result caching in analysis command
mattyhansen May 30, 2026
47453f2
Add config presets and `extends:` inheritance for simplified configur…
mattyhansen May 30, 2026
c559a6d
Add config presets and `extends:` inheritance for simplified configur…
mattyhansen May 30, 2026
3c2f2cb
Decompose AnalyseCommand into AnalysisFindingSupport and BranchReview…
mattyhansen May 31, 2026
39396b9
retire synthetic design.god-method rubric, and restore docs.return-co…
mattyhansen May 31, 2026
b69f647
Update documentation for missing param tag rule and introduce cluster…
mattyhansen May 31, 2026
460591c
Add security rule fixtures for URL embedded credentials and debug mod…
mattyhansen May 31, 2026
9370fbd
Add advisory comments to rules for clarity on behavior and intent
mattyhansen May 31, 2026
9e889c1
Add comments for clarity and context in various files; enhance docume…
mattyhansen Jun 1, 2026
e4cc3d7
Enhance documentation by adding detailed return descriptions to vario…
mattyhansen Jun 1, 2026
511a47c
Add gruff-code-quality hook and enhance dependency update script
mattyhansen Jun 1, 2026
964dcac
Update goat-flow reference version to 1.9.0 across multiple files
mattyhansen Jun 1, 2026
4168c5e
Add new classes and rules for dead code detection and update composer…
mattyhansen Jun 1, 2026
fab8e73
Enhance documentation by adding hyphens to parameter and return descr…
mattyhansen Jun 1, 2026
d6dd641
Remove redundant comments from various files to improve code clarity
mattyhansen Jun 1, 2026
8b5abb4
Remove redundant comments to enhance code clarity across multiple files
mattyhansen Jun 1, 2026
2161b5a
Remove redundant comments to improve code clarity in multiple rule files
mattyhansen Jun 1, 2026
23df30b
Enhance code clarity by adding comments, updating test assertions, an…
mattyhansen Jun 1, 2026
.agents/skills/goat-critique/SKILL.md (16 changes: 8 additions & 8 deletions)
@@ -1,7 +1,7 @@
---
name: goat-critique
description: "Use when a decision or analysis needs multi-lens critique to surface blind spots before shipping."
-goat-flow-skill-version: "1.7.0"
+goat-flow-skill-version: "1.9.0"
---
# /goat-critique

@@ -67,7 +67,7 @@ All three perspectives must appear in every critique from Agents A and B. The te

| Agent | Reads | Does NOT read |
|---|---|---|
-| A (Risk) | artifact + architecture.md + footguns + lessons + rubric | git history, config.yaml |
+| A (Risk) | artifact + architecture.md + targeted grep-first footgun/lesson hits + rubric | git history, config.yaml |
| B (Alternatives) | artifact + architecture.md + `git log --oneline -20` + config.yaml + rubric | footguns, lessons |
| C (Fresh Eyes) | artifact + rubric ONLY | everything else (isolation enforced) |

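Read as data, the table is a per-agent allowlist. A minimal sketch, assuming each context source is reduced to a short label; labels such as `footgun-lesson-hits` and `git-log-20` are hypothetical, not goat-flow identifiers. Reporting anything outside the allowlist covers both the explicit "Does NOT read" entries and Agent C's "everything else".

```python
# Hypothetical labels for the context sources in the Reads column above.
ALLOWED_READS = {
    "A": {"artifact", "architecture.md", "footgun-lesson-hits", "rubric"},
    "B": {"artifact", "architecture.md", "git-log-20", "config.yaml", "rubric"},
    "C": {"artifact", "rubric"},  # isolation enforced: nothing else
}


def context_violations(agent: str, sources_read: list[str]) -> list[str]:
    """Return each source the agent read that is not on its allowlist, in read order."""
    allowed = ALLOWED_READS[agent]
    return [source for source in sources_read if source not in allowed]


# Agent C reading architecture.md breaks isolation.
print(context_violations("C", ["artifact", "architecture.md"]))  # → ['architecture.md']
```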
@@ -79,7 +79,7 @@ Full directives: `references/sub-agent-directives.md`.
- **B (Alternatives):** SKEPTIC/ANALYST/STRATEGIST on alternatives, ranked by implementation friction. Must surface at least one alternative.
- **C (Fresh Eyes):** No project context. Flags unstated assumptions. ISOLATION RULE enforced.

-Each sub-agent MUST return 3-7 findings, each with: title, severity, evidence (file + semantic anchor), confidence, Proof attempt, Evidence quality (OBSERVED/INFERRED/UNVERIFIED), SKEPTIC/ANALYST/STRATEGIST lines, and rubric dimensions covered. Plus: overall assessment (STRONG/ADEQUATE/WEAK/FLAWED) and one thing the artifact gets RIGHT.
+Each sub-agent MUST return 3-7 findings, each with: title, severity, evidence (file + semantic anchor), confidence, Proof attempt, Proof class (`RUNTIME | CONTRACT-GREP | STATIC | NOT-REPRODUCED`), Evidence quality (OBSERVED/INFERRED/UNVERIFIED), SKEPTIC/ANALYST/STRATEGIST lines, and rubric dimensions covered. Plus: overall assessment (STRONG/ADEQUATE/WEAK/FLAWED) and one thing the artifact gets RIGHT.

**Lens-finding floor:** each lens must surface >= 1 finding per sub-agent or re-run once; convergence allowed after one re-run. See anti-fabrication constraint. Full floor spec in the sub-agent directives reference pack.

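The return contract above is concrete enough to check mechanically before Phase 2. A minimal sketch, assuming each sub-agent report has already been parsed into a dict; the key names (`findings`, `proof_class`, `gets_right`, and so on) are hypothetical, while the allowed values are taken from the contract.

```python
# Allowed values come from the return contract; the dict keys are hypothetical.
PROOF_CLASSES = {"RUNTIME", "CONTRACT-GREP", "STATIC", "NOT-REPRODUCED"}
EVIDENCE_QUALITY = {"OBSERVED", "INFERRED", "UNVERIFIED"}
ASSESSMENTS = {"STRONG", "ADEQUATE", "WEAK", "FLAWED"}
REQUIRED_FIELDS = (
    "title", "severity", "evidence", "confidence", "proof_attempt", "proof_class",
    "evidence_quality", "skeptic", "analyst", "strategist", "rubric_dimensions",
)


def report_problems(report: dict) -> list[str]:
    """List every way a sub-agent report breaks the contract (empty list = complete)."""
    problems = []
    findings = report.get("findings", [])
    if not 3 <= len(findings) <= 7:
        problems.append(f"expected 3-7 findings, got {len(findings)}")
    for i, finding in enumerate(findings, start=1):
        missing = [name for name in REQUIRED_FIELDS if not finding.get(name)]
        if missing:
            problems.append(f"finding {i}: missing {', '.join(missing)}")
        proof_class = finding.get("proof_class")
        if proof_class and proof_class not in PROOF_CLASSES:
            problems.append(f"finding {i}: unknown proof class {proof_class}")
        quality = finding.get("evidence_quality")
        if quality and quality not in EVIDENCE_QUALITY:
            problems.append(f"finding {i}: unknown evidence quality {quality}")
    if report.get("overall_assessment") not in ASSESSMENTS:
        problems.append("missing or unknown overall assessment")
    if not report.get("gets_right"):
        problems.append("missing the one thing the artifact gets RIGHT")
    return problems
```

Under the completeness rule in the constraints, a non-empty result would trigger one re-spawn; if the second report still fails, the critique records `sub-agent completeness limited`.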
@@ -154,7 +154,7 @@ Then the full critique:

**Blind spot check:** List unaddressed artifact sections, unmapped rubric aspects, and unread referenced files as "What Wasn't Critiqued." Must never be empty.

-**Proof Gate:** Apply the Proof Gate (see Constraints) to every synthesised finding before inclusion.
+**Proof Gate:** Apply the Proof Gate (see Constraints) to every synthesised finding before inclusion. Every synthesised finding must carry proof class `RUNTIME | CONTRACT-GREP | STATIC | NOT-REPRODUCED`.

**Phase 5.5 - Meta-audit.** Spawn a lightweight meta-agent (budget: 2 tool calls, no context beyond the draft Phase 5 output). Audit the critique for internal consistency against the 10-point rubric in `references/rubric-examples.md`. If issues found, insert an `## Auto-Detected Issues` block before presenting. Verdict block updated with `Meta-score: N/100`.

@@ -190,10 +190,10 @@ The rubric determines what sub-agents evaluate. Match to artifact type. Dimensio
- MUST set max 5 tool-call budget per critique sub-agent; log calls/limit when exposed, otherwise unavailable markers. Do not claim mechanical enforcement when counts are unavailable.
- MUST log per spawned critique/cross-exam/meta agent: id/handle if exposed, calls/limit, or unavailable markers.
- MUST Scan Agent C output for context leaks before any other Phase 2 work. Only flag references absent from the input artifact. Any untraceable match = CONTEXT LEAK; discard and re-spawn.
-- MUST Check sub-agent completeness: verify each sub-agent returned 3-7 findings plus required lens fields, severity, evidence, confidence, rubric dimensions, overall assessment, and preservation note. Incomplete → re-spawn once; if still incomplete, record `sub-agent completeness limited`.
+- MUST Check sub-agent completeness: verify each sub-agent returned 3-7 findings plus required lens fields, severity, evidence, confidence, proof class, rubric dimensions, and overall assessment. Incomplete → re-spawn once; if still incomplete, record `sub-agent completeness limited`.
- MUST enforce cross-examination budget: Max 3 cross-examination agents total, max 3 tool calls per agent.
- Recommendations are never auto-applied. After synthesis, stop. Do not enter implementation mode unless the user explicitly asks to apply changes.
-- MUST apply the Proof Gate from `skill-preamble.md` to every synthesised finding. Sub-agent reports are inputs to verify, not evidence to launder. Re-read applies to findings surviving to Phase 5 (typically 3-7 after Phase 3/4 filtering), not to all findings raised in Phase 1.
+- MUST apply the Proof Gate from `skill-preamble.md` to every synthesised finding and preserve one proof class tag (`RUNTIME | CONTRACT-GREP | STATIC | NOT-REPRODUCED`) on each. Sub-agent reports are inputs to verify, not evidence to launder. Re-read applies to findings surviving to Phase 5 (typically 3-7 after Phase 3/4 filtering), not to all findings raised in Phase 1.
- MUST NOT fabricate findings. Do not fabricate findings to meet the lens-finding floor; convergence allowed after one re-run.
- Universal constraints from skill-preamble.md apply.

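The Agent C leak scan only flags references absent from the input artifact. One way to approximate it, assuming a "reference" means a path-like token (a file with a common extension, or anything under `.goat-flow/`); the regex is a hypothetical heuristic, not the check goat-critique actually runs.

```python
import re

# Hypothetical heuristic: path-like tokens with a common file extension,
# or anything under .goat-flow/.
REFERENCE = re.compile(r"[\w./-]+\.(?:md|ya?ml|ts|js|py|php|json)\b|\.goat-flow/[\w./-]+")


def context_leaks(agent_c_output: str, artifact: str) -> list[str]:
    """Return references in Agent C's output that never appear in the input artifact."""
    leaks = []
    for ref in REFERENCE.findall(agent_c_output):
        if ref not in artifact and ref not in leaks:
            leaks.append(ref)
    return leaks
```

A non-empty result corresponds to a CONTEXT LEAK (discard and re-spawn). Leaks phrased without a path, such as a paraphrased footgun, slip past a pattern like this, so it can only supplement reading the output.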
@@ -209,13 +209,13 @@ The rubric determines what sub-agents evaluate. Match to artifact type. Dimensio
## Sub-Agent Rankings
## Rubric Coverage Gaps
## Control Group Delta
-## Validated Findings <!-- source pool for Recommended Changes -->
+## Validated Findings <!-- source pool for Recommended Changes; every finding includes proof class -->
## Cross-Examination Results
## Auto-Detected Issues <!-- from Phase 5.5 meta-audit, if any -->
## Retracted Findings
## Human Decisions
## Strengths
-## Recommended Changes <!-- subset of Validated Findings; ordered by severity; each with concrete action -->
+## Recommended Changes <!-- subset of Validated Findings; ordered by severity; each with concrete action and proof class -->
## Open Questions
## Integration Hooks <!-- for-goat-plan, for-goat-debug, for-implementation -->
## What Wasn't Critiqued
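The template comments tie Recommended Changes to Validated Findings: a subset, ordered by severity, each with a concrete action and a proof class. A sketch of that consistency check, assuming findings are available as dicts with hypothetical `title`, `severity`, `proof_class`, and `action` keys, and that titles identify findings.

```python
# Hypothetical finding shape: {"title", "severity", "proof_class", "action"}.
SEVERITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}
PROOF_CLASSES = {"RUNTIME", "CONTRACT-GREP", "STATIC", "NOT-REPRODUCED"}


def recommended_changes_problems(validated: list[dict], recommended: list[dict]) -> list[str]:
    """Check Recommended Changes against the constraints in the template comments."""
    validated_titles = {finding["title"] for finding in validated}
    problems = []
    for rec in recommended:
        if rec["title"] not in validated_titles:
            problems.append(f"{rec['title']}: not in Validated Findings")
        if rec.get("proof_class") not in PROOF_CLASSES:
            problems.append(f"{rec['title']}: missing or unknown proof class")
        if not rec.get("action"):
            problems.append(f"{rec['title']}: no concrete action")
    # Most severe first: ranks must be non-decreasing.
    ranks = [SEVERITY_RANK[rec["severity"]] for rec in recommended]
    if ranks != sorted(ranks):
        problems.append("not ordered by severity")
    return problems
```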
.agents/skills/goat-critique/references/rubric-examples.md (22 changes: 12 additions & 10 deletions)
@@ -1,46 +1,46 @@
---
-goat-flow-reference-version: "1.7.0"
+goat-flow-reference-version: "1.9.0"
---
# Critique Rubric Examples (Reference Pack)

*Extracted from the goat-critique SKILL.md to stay within the 2500-word skill cap. Canonical rubric definitions remain in SKILL.md; worked examples and context-map details live here.*

## Rubric Context Maps

-Each rubric has a context map that Step 0 reads and passes to sub-agent spawn directives. Agent C's isolation enforcement (Phase 2 step 1 grep check) is unchanged regardless of context map. Generic fallback uses the default split.
+Each rubric has a context map that Step 0 reads and passes to sub-agent spawn directives. Footgun/lesson entries mean targeted grep-first hits from those buckets, not whole-directory reads. Agent C's isolation enforcement (Phase 2 step 1 grep check) is unchanged regardless of context map. Generic fallback uses the default split.

### Plan
-- **A:** footguns, lessons, `.goat-flow/decisions/`
+- **A:** targeted grep-first footgun/lesson hits, `.goat-flow/decisions/`
- **B:** `.goat-flow/tasks/.active`, `git log --oneline -20`, milestone logs
- **C:** [] (isolation enforced)

### Security assessment
-- **A:** footguns, lessons, threat-model docs, `.goat-flow/decisions/`
+- **A:** targeted grep-first footgun/lesson hits, threat-model docs, `.goat-flow/decisions/`
- **B:** `git log --oneline -20`, config.yaml, dependency manifests
- **C:** [] (isolation enforced)

### Debug hypotheses
-- **A:** footguns, lessons, `.goat-flow/logs/sessions/`
+- **A:** targeted grep-first footgun/lesson hits, `.goat-flow/logs/sessions/`
- **B:** `git log --oneline -20`, config.yaml, test output
- **C:** [] (isolation enforced)

### Review findings
-- **A:** footguns, lessons, `.goat-flow/decisions/`
+- **A:** targeted grep-first footgun/lesson hits, `.goat-flow/decisions/`
- **B:** `git log --oneline -20`, config.yaml, CI logs
- **C:** [] (isolation enforced)

### Test strategy
-- **A:** footguns, lessons, `.goat-flow/decisions/`
+- **A:** targeted grep-first footgun/lesson hits, `.goat-flow/decisions/`
- **B:** `git log --oneline -20`, config.yaml, test manifests
- **C:** [] (isolation enforced)

### Architecture/refactor
-- **A:** footguns, lessons, `.goat-flow/decisions/`, dependency maps
+- **A:** targeted grep-first footgun/lesson hits, `.goat-flow/decisions/`, dependency maps
- **B:** `git log --oneline -20`, config.yaml, module boundaries
- **C:** [] (isolation enforced)

### Generic (fallback)
-- **A:** footguns, lessons
+- **A:** targeted grep-first footgun/lesson hits
- **B:** `git log --oneline -20`, config.yaml
- **C:** [] (isolation enforced)

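The context-map note says footgun/lesson entries mean targeted grep-first hits rather than whole-directory reads. A minimal sketch of that distinction, assuming the buckets are directories of Markdown files (their locations, for example a `.goat-flow/footguns/` directory, are hypothetical; the excerpt does not name them) and that keywords are drawn from the artifact under critique. Only matching lines, with file and line number, would reach Agent A.

```python
from pathlib import Path


def grep_first_hits(buckets: list[Path], keywords: list[str], max_hits: int = 10) -> list[str]:
    """Return matching lines as "path:line: text" instead of whole files."""
    needles = [keyword.lower() for keyword in keywords]
    hits = []
    for bucket in buckets:
        for path in sorted(bucket.rglob("*.md")):
            lines = path.read_text(encoding="utf-8").splitlines()
            for lineno, line in enumerate(lines, start=1):
                if any(needle in line.lower() for needle in needles):
                    hits.append(f"{path}:{lineno}: {line.strip()}")
                    if len(hits) >= max_hits:  # keep the context small
                        return hits
    return hits
```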
Expand All @@ -53,6 +53,7 @@ Each rubric has a context map that Step 0 reads and passes to sub-agent spawn di
- **Severity:** HIGH | **Confidence:** HIGH
- **Evidence:** Milestone plan excerpt (search: "Phase 2 additions") - Phase 2 additions depend on Phase 1 extraction completing first
- **Proof attempt:** Read the milestone plan excerpt, confirmed extraction must precede additions
+- **Proof class:** STATIC
- **Evidence quality:** OBSERVED
- **SKEPTIC:** If extraction doesn't reclaim enough words, Phase 2 additions blow the 2500 cap
- **ANALYST:** Current 2532w minus ~100w extraction gives ~80w budget for additions; tight but feasible
Expand All @@ -67,6 +68,7 @@ Each rubric has a context map that Step 0 reads and passes to sub-agent spawn di
- **Severity:** CRITICAL | **Confidence:** HIGH
- **Evidence:** `src/api/handler.ts` (search: "database query") - user input passed directly to database query
- **Proof attempt:** Read handler.ts around the database query, confirmed no sanitization before query construction
+- **Proof class:** STATIC
- **Evidence quality:** OBSERVED
- **SKEPTIC:** SQL injection vector; worst case is full database compromise
- **ANALYST:** Direct string interpolation in query; parameterised queries would eliminate the risk at zero performance cost
Expand All @@ -79,7 +81,7 @@ Each rubric has a context map that Step 0 reads and passes to sub-agent spawn di
The meta-agent scores the draft critique against these 10 points:

1. **Gate-finding match** - Gate value matches highest surviving severity
-2. **Evidence quality per finding** - every finding has Proof attempt + Evidence quality fields
+2. **Evidence quality per finding** - every finding has Proof attempt + Proof class + Evidence quality fields
3. **Rubric coverage completeness** - no unaddressed mandatory dimensions
4. **Rec-changes actionability** - every recommendation has a concrete next step
5. **No orphan retractions** - every retracted finding has rationale
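Rubric points 1 and 2 are mechanical enough to sketch. This assumes the critique's gate uses the same CRITICAL/HIGH/MEDIUM/LOW scale as finding severity (the excerpt does not say so) and returns `None` when no finding survives; the dict keys are hypothetical.

```python
from typing import Optional

SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]
EVIDENCE_FIELDS = ("proof_attempt", "proof_class", "evidence_quality")


def expected_gate(surviving_severities: list[str]) -> Optional[str]:
    """Point 1: the gate should equal the highest surviving severity."""
    if not surviving_severities:
        return None
    return max(surviving_severities, key=SEVERITY_ORDER.index)


def missing_evidence_fields(finding: dict) -> list[str]:
    """Point 2: Proof attempt, Proof class, and Evidence quality must all be present."""
    return [name for name in EVIDENCE_FIELDS if not finding.get(name)]
```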
@@ -1,15 +1,15 @@
---
-goat-flow-reference-version: "1.7.0"
+goat-flow-reference-version: "1.9.0"
---
# Critique Sub-Agent Directives (Reference Pack)

*Extracted from the goat-critique SKILL.md to stay within the 2500-word skill cap. Canonical detail lives here; SKILL.md retains concise summaries.*

## Sub-agent A (Risk Focus - backward-looking context)

-**Directive:** "Apply SKEPTIC/ANALYST/STRATEGIST. Focus on RISKS: what could go wrong, what the evidence says about cost/benefit, what the 2nd-order systemic impacts are (local fix → global break patterns), and what the fastest safe path looks like. For any 2nd-order claim, you MUST cite the downstream file or system by name - speculation without a named target gets retracted in Phase 3. Your context includes past mistakes (footguns, lessons) - use them."
+**Directive:** "Apply SKEPTIC/ANALYST/STRATEGIST. Focus on RISKS: what could go wrong, what the evidence says about cost/benefit, what the 2nd-order systemic impacts are (local fix → global break patterns), and what the fastest safe path looks like. For any 2nd-order claim, you MUST cite the downstream file or system by name - speculation without a named target gets retracted in Phase 3. Your context includes targeted grep-first past-mistake hits - use them."

-**Context reads:** artifact + architecture.md + footguns + lessons + rubric
+**Context reads:** artifact + architecture.md + targeted grep-first footgun/lesson hits + rubric
**Does NOT read:** git history, config.yaml

## Sub-agent B (Alternatives Focus - current-state context)
@@ -31,6 +31,7 @@ goat-flow-reference-version: "1.7.0"
Every finding MUST include:

- **Proof attempt:** exact command/read executed in sub-agent's tool budget, or "N/A - purely structural"
+- **Proof class:** `RUNTIME | CONTRACT-GREP | STATIC | NOT-REPRODUCED`
- **Evidence quality:** OBSERVED / INFERRED / UNVERIFIED
- Title, severity (CRITICAL/HIGH/MEDIUM/LOW), evidence (file + semantic anchor or artifact section reference), confidence (HIGH/MEDIUM/LOW)
- **SKEPTIC:** one line - what could go wrong, worst case (or "N/A - [reason]" if genuinely inapplicable)
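Each finding uses `**Name:** value` bullet fields, so the new Proof class line can be checked by parsing them. A rough sketch, assuming completed findings follow the worked examples, where Severity and Confidence share one line separated by `|`. Because values stop at `|`, it suits filled-in findings, not the template line that lists the allowed classes. The function names are hypothetical.

```python
import re

# One "**Name:** value" pair; a value stops at "|" or end of line, so
# "**Severity:** HIGH | **Confidence:** HIGH" yields two fields.
FIELD = re.compile(r"\*\*(?P<name>[^*:\n]+):\*\*[ \t]*(?P<value>[^|\n]*)")
PROOF_CLASSES = {"RUNTIME", "CONTRACT-GREP", "STATIC", "NOT-REPRODUCED"}


def parse_finding(block: str) -> dict[str, str]:
    """Map lower-cased field names to their stripped values for one finding block."""
    return {m["name"].strip().lower(): m["value"].strip() for m in FIELD.finditer(block)}


def proof_problems(fields: dict[str, str]) -> list[str]:
    """Report missing proof fields and any proof class outside the allowed set."""
    problems = [
        f"missing {name}"
        for name in ("proof attempt", "proof class", "evidence quality")
        if not fields.get(name)
    ]
    if fields.get("proof class") and fields["proof class"] not in PROOF_CLASSES:
        problems.append(f"unknown proof class {fields['proof class']}")
    return problems
```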
.agents/skills/goat-debug/SKILL.md (2 changes: 1 addition & 1 deletion)
@@ -1,7 +1,7 @@
---
name: goat-debug
description: "Use when diagnosing a bug, unexpected behaviour, or system failure that needs structured investigation."
-goat-flow-skill-version: "1.7.0"
+goat-flow-skill-version: "1.9.0"
---
# /goat-debug

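This PR moves each skill and reference pack from 1.7.0 to 1.9.0 by editing the frontmatter pin in every file. A sketch of a drift check, assuming pins use the `goat-flow-skill-version` / `goat-flow-reference-version` keys with double-quoted values as in this diff; the helper names are hypothetical and not an existing goat-flow command.

```python
import re
from pathlib import Path
from typing import Optional

VERSION_LINE = re.compile(r'^goat-flow-(?:skill|reference)-version:\s*"([^"]+)"\s*$', re.MULTILINE)


def frontmatter_version(text: str) -> Optional[str]:
    """Return the goat-flow version pinned in a leading --- frontmatter block, if any."""
    if not text.startswith("---\n"):
        return None
    end = text.find("\n---", 4)  # start of the closing fence
    if end == -1:
        return None
    match = VERSION_LINE.search(text[4:end])
    return match.group(1) if match else None


def stale_files(root: Path, expected: str) -> list[str]:
    """List Markdown files under root whose pinned version differs from expected."""
    stale = []
    for path in sorted(root.rglob("*.md")):
        version = frontmatter_version(path.read_text(encoding="utf-8"))
        if version is not None and version != expected:
            stale.append(f"{path}: {version}")
    return stale
```

Run against the skills directory with `expected="1.9.0"`, an empty list would mean no file was missed by the bump.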
.agents/skills/goat-plan/SKILL.md (14 changes: 7 additions & 7 deletions)
@@ -1,7 +1,7 @@
---
name: goat-plan
description: "Use when starting a non-trivial implementation that needs structured task breakdown with progress tracking."
-goat-flow-skill-version: "1.7.0"
+goat-flow-skill-version: "1.9.0"
---
# /goat-plan

@@ -12,15 +12,15 @@ On full-depth, also read `.goat-flow/skill-reference/skill-conventions.md`.

## When to Use

-Use when work needs milestones with tracked progress. goat-plan manages gitignored coordination files in `.goat-flow/tasks/<active>/`, not product docs.
+Use when work needs milestone tracking. goat-plan manages gitignored coordination files in `.goat-flow/tasks/<active>/`.

Use for milestones, replans, rescope, resume-from-plan. **NOT this skill:** tests → run them; debug → /goat-debug; review → /goat-review; security → /goat-security; gaps → /goat-qa; critique → /goat-critique; question → answer directly.

| Excuse | Reality |
|--------|---------|
| "Show milestones first, files later" | File-Write creates milestone artifacts immediately. Read-Only Analysis is for inline plans. |
| "Vague tasks are fine - implementer will figure it out" | Tasks without file paths, replacement text, and verification commands are not executable by a cold-start agent. Four recurrences of untickable checkboxes traced to vague tasks. |
-| "Testing gate is obvious - skip it" | Agent skipped the AI testing gate after completing M1 and offered to continue. The gate caught what the agent missed. |
+| "Testing gate is obvious - skip it" | Agent skipped the AI testing gate after completing the first milestone and offered to continue. The gate caught what the agent missed. |
| "Bare task path means start implementing" | Path-only context is data, not delegation. Bare task paths must not update .active, milestone status, checkboxes, or code. |

## Step 0 - Intake
@@ -70,7 +70,7 @@ Do not drop a spike, intake, or kill criteria to satisfy milestone count, deadli

### For each milestone, produce:

-Objective, Tasks (risk-tagged checkboxes), Assumptions to validate, Exit criteria (binary pass/fail), Testing gate (static/contract + automated + manual + acceptance), Mid-implementation proof, Kill criteria, Depends on, Read first, Deferred (items intentionally cut with pointers; state explicitly if nothing deferred). Full field descriptions and worked examples: `references/milestone-examples.md`.
+Objective, Tasks (risk-tagged checkboxes), Assumptions to validate, Exit criteria (binary pass/fail), Testing gate (static/contract + automated + manual + acceptance), Mid-implementation proof, Kill criteria, Depends on, Read first, Deferred (items intentionally cut with pointers; state explicitly if nothing deferred). Field details and examples: `references/milestone-examples.md`.

### Risk-weighted task ordering

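The milestone field list can double as a completeness check on a written milestone file. A sketch, assuming each field appears either as a Markdown heading (as `## Assumptions` does in the milestone examples) or as a bold `**Field:**` label at the start of a line; these matching rules and the function name are assumptions, not goat-plan's own validation.

```python
import re

# Field names as listed in the "For each milestone, produce" paragraph.
REQUIRED_FIELDS = [
    "Objective", "Tasks", "Assumptions", "Exit criteria", "Testing gate",
    "Mid-implementation proof", "Kill criteria", "Depends on", "Read first", "Deferred",
]


def missing_milestone_fields(milestone_md: str) -> list[str]:
    """Return required fields that appear neither as a heading nor as a bold label."""
    missing = []
    for field in REQUIRED_FIELDS:
        name = re.escape(field)
        # Accept "## Tasks (risk-tagged)" headings or "**Kill criteria:** ..." labels.
        pattern = re.compile(rf"^(?:#+[ \t]*{name}\b|\*\*{name}:\*\*)", re.IGNORECASE | re.MULTILINE)
        if not pattern.search(milestone_md):
            missing.append(field)
    return missing
```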
@@ -147,7 +147,7 @@ Write artifacts immediately. Do NOT invoke/ask about `/goat-critique`; run it on

For a fresh plan, create a slugged task directory and update `.goat-flow/tasks/.active` to that slug in the same batch. Write one milestone per `.goat-flow/tasks/<active>/M*.md` file.

-**Filename format:** `M<NN>-<slug>.md`, e.g. `M01-prove-api-integration.md`.
+**Filename format:** start with `M` so dashboard and task tooling can discover it; use a readable slug, e.g. `Milestone-prove-api-integration.md`.

**File format:** use existing milestone structure: title, Status, Objective, Depends on, Kill criteria, Read first, Assumptions, Tasks (risk-tagged), Exit Criteria, Testing Gate (static/contract + automated + manual + acceptance), Mid-implementation proof.

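The new filename rule only requires a leading `M`. Assuming the tooling discovers milestones with an `M*.md` glob inside the directory named by `.goat-flow/tasks/.active` (the diff states the requirement, not the tooling's actual matcher), both the old `M01-prove-api-integration.md` style and the new `Milestone-prove-api-integration.md` example are found, while other files in the task directory are not. `active_milestones` is a hypothetical name.

```python
from pathlib import Path


def active_milestones(repo_root: Path) -> list[Path]:
    """Return milestone files (M*.md) in the task directory named by .active."""
    tasks = repo_root / ".goat-flow" / "tasks"
    slug = (tasks / ".active").read_text(encoding="utf-8").strip()
    return sorted(p for p in (tasks / slug).glob("M*.md") if p.is_file())
```

pathlib's glob matches case-sensitively on POSIX systems by default, so under this assumption a lower-case `milestone-...` name would be missed there.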
@@ -249,12 +249,12 @@ Summary format for presentation:
```markdown
## Milestones for [feature]

-### M01: [name] - [archetype]
+### Milestone 01: [name] - [archetype]
**Objective:** [1-2 sentences]
**Tasks:** [N] | **Exit criteria:** [N] | **Testing gate:** [auto + manual + acceptance]
**Kill criteria:** [condition]

-### M02: [name] - [archetype]
+### Milestone 02: [name] - [archetype]
...

**Total milestones:** [N] | **Estimated sessions:** [rough guess]
.agents/skills/goat-plan/references/issue-format.md (2 changes: 1 addition & 1 deletion)
@@ -1,5 +1,5 @@
---
-goat-flow-reference-version: "1.7.0"
+goat-flow-reference-version: "1.9.0"
---
# ISSUE.md Format

.agents/skills/goat-plan/references/milestone-examples.md (6 changes: 3 additions & 3 deletions)
@@ -1,5 +1,5 @@
---
-goat-flow-reference-version: "1.7.0"
+goat-flow-reference-version: "1.9.0"
---
# Milestone Template - Detailed Field Reference

@@ -29,9 +29,9 @@ Assumptions are not tasks - they're beliefs about the system that affect the pla

```markdown
## Assumptions
-- [x] Background job queue handles 500-item batches (benchmarked in M1)
+- [x] Background job queue handles 500-item batches (benchmarked in the spike)
- [ ] File upload endpoint accepts multipart form data (untested)
-- [x] Database migration runs without downtime (spike confirmed in M1)
+- [x] Database migration runs without downtime (spike confirmed in the first milestone)
- [ ] Rate limiting handles concurrent requests correctly (assumed, not tested)
```

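The example checklist mixes validated (`[x]`) and untested (`[ ]`) assumptions. A small sketch for separating them, for instance to list what a milestone still needs to validate; it assumes GitHub-style task-list syntax as in the example, and `split_assumptions` is a hypothetical helper, not part of goat-plan.

```python
import re

# A Markdown task-list item: "- [x] text" (validated) or "- [ ] text" (open).
CHECKBOX = re.compile(r"^[ \t]*- \[(?P<mark>[ xX])\] (?P<text>.+)$", re.MULTILINE)


def split_assumptions(markdown: str) -> tuple[list[str], list[str]]:
    """Split an Assumptions checklist into (validated, still_open) item texts."""
    validated, still_open = [], []
    for match in CHECKBOX.finditer(markdown):
        target = validated if match["mark"] in "xX" else still_open
        target.append(match["text"].strip())
    return validated, still_open
```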