Skip to content

feat: test coverage gate + plan completion audit + auto-verification (v0.11.13.0)#428

Open
garrytan wants to merge 5 commits intomainfrom
garrytan/ship-test-plan-gates
Open

feat: test coverage gate + plan completion audit + auto-verification (v0.11.13.0)#428
garrytan wants to merge 5 commits intomainfrom
garrytan/ship-test-plan-gates

Conversation

@garrytan
Copy link
Owner

Summary

  • Test coverage gate in /ship — AI-assessed coverage below 60% is a hard stop, 60-79% prompts, 80%+ passes. Configurable via CLAUDE.md ## Test Coverage section.
  • Coverage warning in /review — Low coverage flagged before reaching the /ship gate.
  • Plan completion audit — /ship reads plan file, extracts actionable items, cross-references against diff, gates on NOT DONE items. /review uses it to augment scope drift detection.
  • Auto-verification via /qa-only — /ship invokes /qa-only inline with plan's verification section, conditional on localhost server availability.
  • Shared plan file discovery — conversation context first, grep fallback second. DRY with existing plan file review report.
  • Ship metrics logging — coverage %, plan completion ratio, verification results to review JSONL.
  • Plan completion in /retro — weekly retros show plan completion rates.

Test Coverage

All new code paths have test coverage. 162 lines of new tests in gen-skill-docs.test.ts covering:

  • Plan completion audit placeholders (ship + review modes)
  • Plan verification placeholder
  • Coverage gate thresholds + graceful degradation
  • Ship metrics logging step
  • Plan file discovery shared helper
  • Retro plan completion section

Pre-Landing Review

No issues found. All changes are prompt template text generators — no SQL, no runtime code, no security surface.

Design Review

No frontend files changed — design review skipped.

Eval Results

No prompt-related files changed — evals skipped.

Plan Completion

Plan file: ~/.claude/plans/eventual-puzzling-peacock.md
All items DONE — CEO review (4 expansions accepted), Eng review (0 issues), Codex outside voice (3 findings accepted).

Test plan

  • All gen-skill-docs tests pass (2 pre-existing failures unrelated to this PR)
  • All skill validation tests pass
  • No unresolved placeholders in any generated SKILL.md
  • New placeholders (PLAN_COMPLETION_AUDIT_SHIP, PLAN_COMPLETION_AUDIT_REVIEW, PLAN_VERIFICATION_EXEC) expand correctly

🤖 Generated with Claude Code

garrytan and others added 4 commits March 23, 2026 22:58
Three new gates in /ship and /review:
1. Test coverage gate: configurable thresholds (60%/80% default), hard stop
   below minimum with user override
2. Plan completion audit: discovers plan file, extracts actionable items,
   cross-references against diff, gates on NOT DONE items
3. Auto-verification: invokes /qa-only inline with plan's verification
   section, conditional on localhost reachability

Also: coverage warning in /review, plan completion data in /retro,
shared plan file discovery helper (DRY), ship metrics logging.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Mar 24, 2026

E2E Evals: ❌ FAIL

81/100 tests passed | $19.04 total cost | 12 parallel runners

Suite Result Status Cost
e2e-browse 7/7 $0.33
e2e-deploy 4/4 $0.6
e2e-design 7/7 $1.85
e2e-plan 6/6 $2.53
e2e-qa-bugs 3/3 $1.39
e2e-qa-workflow 6/6 $2.86
e2e-review 7/7 $1.84
e2e-routing 7/21 $3.5
e2e-workflow 4/9 $0.8
llm-judge 24/24 $0.48
e2e-qa-workflow 6/6 $2.86

12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

Failures

  • ❌ journey-ideation: success
  • ❌ journey-design-system: success
  • ❌ journey-think-bigger: success
  • ❌ journey-ideation: success
  • ❌ journey-debug: success
  • ❌ journey-visual-qa: success
  • ❌ journey-think-bigger: success
  • ❌ journey-ideation: success
  • ❌ journey-think-bigger: success
  • ❌ journey-visual-qa: success
  • ❌ journey-debug: success
  • ❌ journey-design-system: success
  • ❌ journey-visual-qa: success
  • ❌ journey-debug: success
  • ❌ /ship local workflow: success
  • ❌ /ship local workflow: success
  • ❌ /ship local workflow: success
  • ❌ /setup-browser-cookies detect: error_max_turns
  • ❌ /setup-browser-cookies detect: error_max_turns

Ported plan completion audit, coverage gate, and auto-verification
resolvers into main's modular resolver pipeline. Updated CHANGELOG
version to 0.11.14.0 (main took 0.11.13.0).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant