feat: test coverage gate + plan completion audit + auto-verification (v0.11.13.0) by garrytan · Pull Request #428 · garrytan/gstack

garrytan · 2026-03-24T06:00:19Z

Summary

Test coverage gate in /ship — AI-assessed coverage below 60% is a hard stop, 60-79% prompts, 80%+ passes. Configurable via CLAUDE.md ## Test Coverage section.
Coverage warning in /review — Low coverage flagged before reaching the /ship gate.
Plan completion audit — /ship reads plan file, extracts actionable items, cross-references against diff, gates on NOT DONE items. /review uses it to augment scope drift detection.
Auto-verification via /qa-only — /ship invokes /qa-only inline with plan's verification section, conditional on localhost server availability.
Shared plan file discovery — conversation context first, grep fallback second. DRY with existing plan file review report.
Ship metrics logging — coverage %, plan completion ratio, verification results to review JSONL.
Plan completion in /retro — weekly retros show plan completion rates.
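The threshold rule in the first bullet can be written down as a small decision function. This is only an illustrative sketch: per the Pre-Landing Review, the PR ships prompt template text rather than runtime code, and the coverage figure is AI-assessed. The names `coverageGate`, `CoverageThresholds`, and `DEFAULT_THRESHOLDS` are hypothetical; only the 60%/80% defaults come from the description.

```typescript
// Hypothetical names; only the 60% / 80% defaults come from the PR description.
type GateDecision = "hard_stop" | "prompt" | "pass";

interface CoverageThresholds {
  minimum: number; // below this value: hard stop
  target: number; // at or above this value: pass
}

const DEFAULT_THRESHOLDS: CoverageThresholds = { minimum: 60, target: 80 };

function coverageGate(
  coveragePct: number,
  thresholds: CoverageThresholds = DEFAULT_THRESHOLDS,
): GateDecision {
  if (coveragePct < thresholds.minimum) return "hard_stop";
  if (coveragePct < thresholds.target) return "prompt";
  return "pass";
}
```

Passing a second argument models the per-project override that the description says lives in the CLAUDE.md ## Test Coverage section.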

Test Coverage

All new code paths have test coverage. 162 lines of new tests in gen-skill-docs.test.ts covering:

Plan completion audit placeholders (ship + review modes)
Plan verification placeholder
Coverage gate thresholds + graceful degradation
Ship metrics logging step
Plan file discovery shared helper
Retro plan completion section

Pre-Landing Review

No issues found. All changes are prompt template text generators — no SQL, no runtime code, no security surface.

Design Review

No frontend files changed — design review skipped.

Eval Results

No prompt-related files changed — evals skipped.

Plan Completion

Plan file: ~/.claude/plans/eventual-puzzling-peacock.md
All items DONE — CEO review (4 expansions accepted), Eng review (0 issues), Codex outside voice (3 findings accepted).

Test plan

All gen-skill-docs tests pass (2 pre-existing failures unrelated to this PR)
All skill validation tests pass
No unresolved placeholders in any generated SKILL.md
New placeholders (PLAN_COMPLETION_AUDIT_SHIP, PLAN_COMPLETION_AUDIT_REVIEW, PLAN_VERIFICATION_EXEC) expand correctly
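The last two checks (no unresolved placeholders, new placeholders expand correctly) could be approximated as below. This is a sketch under a loud assumption: the PR does not show the placeholder syntax, so the `{{UPPER_SNAKE_CASE}}` form, along with `expandPlaceholders` and `findUnresolvedPlaceholders`, is hypothetical and not the repository's actual generator.

```typescript
// Assumption: placeholders look like {{UPPER_SNAKE_CASE}}. The PR does not show the real syntax.
const PLACEHOLDER_PATTERN = /\{\{([A-Z][A-Z0-9_]*)\}\}/g;

// Replace each known placeholder with its resolver output; leave unknown ones visible.
function expandPlaceholders(
  template: string,
  resolvers: Record<string, () => string>,
): string {
  return template.replace(PLACEHOLDER_PATTERN, (match: string, name: string) => {
    const resolver = resolvers[name];
    return resolver ? resolver() : match;
  });
}

// Return the distinct placeholder names still present in generated text, sorted.
function findUnresolvedPlaceholders(text: string): string[] {
  const matches = text.match(PLACEHOLDER_PATTERN) ?? [];
  const names = matches.map((m) => m.slice(2, -2));
  return names.filter((n, i) => names.indexOf(n) === i).sort();
}
```

A generated SKILL.md would pass the "no unresolved placeholders" check when `findUnresolvedPlaceholders` returns an empty array.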

🤖 Generated with Claude Code

Three new gates in /ship and /review:

1. Test coverage gate: configurable thresholds (60%/80% default), hard stop below minimum with user override
2. Plan completion audit: discovers plan file, extracts actionable items, cross-references against diff, gates on NOT DONE items
3. Auto-verification: invokes /qa-only inline with plan's verification section, conditional on localhost reachability

Also: coverage warning in /review, plan completion data in /retro, shared plan file discovery helper (DRY), ship metrics logging.
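The plan completion audit (extract items, cross-reference against the diff, gate on NOT DONE) can be pictured with a deterministic stand-in. In the PR this judgment is made by the model following prompt text, so everything here is an assumption for illustration: that plan items are Markdown checkboxes, that an item counts as DONE when every backticked path it names appears among the changed files, and the names `extractPlanItems`, `auditPlan`, and `shouldBlockShip`.

```typescript
// Hypothetical stand-in for the audit; the PR implements it as prompt text, not code.
type ItemStatus = "DONE" | "NOT DONE";

interface PlanItem {
  text: string;
  paths: string[]; // backticked paths mentioned in the item (assumed convention)
}

interface AuditResult {
  item: PlanItem;
  status: ItemStatus;
}

// Assumption: actionable items are Markdown checkbox lines such as "- [ ] ...".
function extractPlanItems(planMarkdown: string): PlanItem[] {
  const items: PlanItem[] = [];
  for (const line of planMarkdown.split("\n")) {
    const m = /^\s*[-*]\s+\[[ xX]\]\s+(.*)$/.exec(line);
    if (!m) continue;
    const text = m[1].trim();
    const paths = (text.match(/`[^`]+`/g) ?? []).map((s) => s.slice(1, -1));
    items.push({ text, paths });
  }
  return items;
}

// An item is DONE only if it names at least one path and all named paths changed.
function auditPlan(items: PlanItem[], changedFiles: string[]): AuditResult[] {
  const changed = new Set(changedFiles);
  return items.map((item): AuditResult => ({
    item,
    status:
      item.paths.length > 0 && item.paths.every((p) => changed.has(p))
        ? "DONE"
        : "NOT DONE",
  }));
}

// The gate: any NOT DONE item blocks the ship step.
function shouldBlockShip(results: AuditResult[]): boolean {
  return results.some((r) => r.status === "NOT DONE");
}
```

Items that name no path fall through to NOT DONE here, which errs on the side of asking the user rather than silently passing.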

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-03-24T06:16:05Z

E2E Evals: ❌ FAIL

81/100 tests passed | $19.04 total cost | 12 parallel runners

Suite	Result	Status	Cost
e2e-browse	7/7	✅	$0.33
e2e-deploy	4/4	✅	$0.60
e2e-design	7/7	✅	$1.85
e2e-plan	6/6	✅	$2.53
e2e-qa-bugs	3/3	✅	$1.39
e2e-qa-workflow	6/6	✅	$2.86
e2e-review	7/7	✅	$1.84
e2e-routing	7/21	❌	$3.50
e2e-workflow	4/9	❌	$0.80
llm-judge	24/24	✅	$0.48
e2e-qa-workflow	6/6	✅	$2.86

12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite
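The headline figures (81/100 tests, $19.04) can be recomputed from the suite rows. Both e2e-qa-workflow rows have to be counted to reproduce them. The sketch below uses hypothetical names (`SuiteRow`, `summarize`) and sums costs in integer cents to avoid floating-point drift.

```typescript
// Rows transcribed from the table in this comment; type and function names are hypothetical.
interface SuiteRow {
  suite: string;
  passed: number;
  total: number;
  costUsd: number;
}

const rows: SuiteRow[] = [
  { suite: "e2e-browse", passed: 7, total: 7, costUsd: 0.33 },
  { suite: "e2e-deploy", passed: 4, total: 4, costUsd: 0.6 },
  { suite: "e2e-design", passed: 7, total: 7, costUsd: 1.85 },
  { suite: "e2e-plan", passed: 6, total: 6, costUsd: 2.53 },
  { suite: "e2e-qa-bugs", passed: 3, total: 3, costUsd: 1.39 },
  { suite: "e2e-qa-workflow", passed: 6, total: 6, costUsd: 2.86 },
  { suite: "e2e-review", passed: 7, total: 7, costUsd: 1.84 },
  { suite: "e2e-routing", passed: 7, total: 21, costUsd: 3.5 },
  { suite: "e2e-workflow", passed: 4, total: 9, costUsd: 0.8 },
  { suite: "llm-judge", passed: 24, total: 24, costUsd: 0.48 },
  { suite: "e2e-qa-workflow", passed: 6, total: 6, costUsd: 2.86 },
];

function summarize(suites: SuiteRow[]) {
  const passed = suites.reduce((n, r) => n + r.passed, 0);
  const total = suites.reduce((n, r) => n + r.total, 0);
  // Sum in whole cents so the total is exact.
  const cents = suites.reduce((n, r) => n + Math.round(r.costUsd * 100), 0);
  return {
    passed,
    total,
    failed: total - passed,
    costUsd: (cents / 100).toFixed(2),
    ok: passed === total,
  };
}

const summary = summarize(rows);
console.log(`${summary.passed}/${summary.total} tests passed | $${summary.costUsd} total cost`);
```

The 19 failed tests in the summary match the 19 entries under Failures: 14 from e2e-routing and 5 from e2e-workflow.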

Failures

❌ journey-ideation: success
❌ journey-design-system: success
❌ journey-think-bigger: success
❌ journey-ideation: success
❌ journey-debug: success
❌ journey-visual-qa: success
❌ journey-think-bigger: success
❌ journey-ideation: success
❌ journey-think-bigger: success
❌ journey-visual-qa: success
❌ journey-debug: success
❌ journey-design-system: success
❌ journey-visual-qa: success
❌ journey-debug: success
❌ /ship local workflow: success
❌ /ship local workflow: success
❌ /ship local workflow: success
❌ /setup-browser-cookies detect: error_max_turns
❌ /setup-browser-cookies detect: error_max_turns

garrytan and others added 4 commits March 23, 2026 22:58

chore: regenerate SKILL.md files

68ee3d3

chore: merge main and resolve conflicts

48cb411

chore: bump version and changelog (v0.11.13.0)

2400cce

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

chore: merge main and resolve conflicts

b126d5f

Ported plan completion audit, coverage gate, and auto-verification resolvers into main's modular resolver pipeline. Updated CHANGELOG version to 0.11.14.0 (main took 0.11.13.0).

garrytan wants to merge 5 commits into main from garrytan/ship-test-plan-gates
