feat: creation workflow quality guidance system #7
Merged
…omptlet

Replaced ~308 lines of fragile regex-based policy extraction with a schema-driven reasoning-agent promptlet architecture. This architectural improvement makes the system more reliable and leverages agent reasoning capabilities instead of brittle pattern matching.

Background: Started by investigating a hanging integration test (test_very_large_data_handling, where 600KB of text caused regex backtracking). The investigation revealed that the regex extraction approach was architecturally wrong: agents should reason about policies from the schema, not rely on pattern matching to extract them.

Architectural change:
- Removed 7 regex extraction methods (~308 lines total):
  * _suggest_policy_from_alternatives
  * _suggest_import_policies
  * _suggest_pattern_policies
  * _suggest_architecture_policies
  * _suggest_config_policies
  * _suggest_rationales
  * _normalize_library_name
- Replaced _generate_policy_guidance() with a promptlet providing:
  * agent_task with role, objective, and 5 reasoning steps
  * policy_capabilities (full schema documentation)
  * example_workflow (concrete scenario showing the decision → policy mapping)
  * guidance (dos/don'ts for constraint extraction)

This implements Task 2 (Policy Construction) of the two-step reasoning flow. Task 1 (Decision Creation) guidance is documented in a new DEC backlog task.

Test updates:
- Removed 14 regex-based integration tests (TestPolicySuggestionLogic class)
- Removed 3 regex-based unit tests (TestPolicySuggestion class)
- Added 3 new promptlet-validation tests
- All 161 tests passing in 4.93s (was hanging before)
- Coverage increased from 21% to 49%

Performance:
- test_very_large_data_handling: passes in 0.40s (was hanging indefinitely)
- Full test suite: 4.93s (130 unit + 31 integration)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implement complete decision quality guidance to strengthen ADR creation
workflow. This establishes the foundation for high-quality architectural
decisions that enable effective policy extraction.
## What Changed
**New Decision Guidance Module** (`decision_guidance.py`):
- Comprehensive promptlet with ADR structure explanation
- Quality criteria (specific, actionable, complete, policy-ready, balanced)
- Good vs bad examples for database, frontend, and generic decisions
- Anti-patterns with fixes (vague, one-sided, missing context, etc.)
- Connection between Task 1 (decision quality) and Task 2 (policy extraction)
- Dos/don'ts and workflow guidance
**Enhanced Creation Workflow** (`creation.py`):
- New `_assess_decision_quality()` method with scoring system (0-100)
- Detects 6 quality dimensions:
1. Specificity (generic terms vs specific tech names)
2. Balance (pros AND cons documented)
3. Context quality (explains WHY)
4. Explicit constraints (for policy extraction)
5. Alternatives documentation (enables disallow policies)
6. Decision completeness
- Returns quality_feedback with:
- Quality score and grade (A-F)
- Issues found with severity and suggestions
- Strengths recognized
- Prioritized recommendations
- Context-aware next steps
- Improved validation error messages with examples
**Enhanced MCP Tool** (`server.py`):
- Expanded `adr_create` docstring with inline guidance:
- ADR structure (Context/Decision/Consequences/Alternatives)
- Quality guidelines (be specific, document trade-offs, explain WHY)
- Explicit constraint language examples
- Response contents explanation
**Comprehensive Tests**:
- 14 unit tests for decision guidance module
- 12 integration tests for quality assessment
- All existing tests still pass (11 in test_workflow_creation.py)
- 80% code coverage for creation.py
## Why This Matters
**Foundation Enhancement**: Decision quality directly impacts:
- Policy extraction effectiveness (Task 2)
- Agent understanding of constraints
- Future decision reasoning
- Automated enforcement reliability
**Two-Step Creation Flow**:
1. Task 1 (NEW): Guide agents to write high-quality decisions
2. Task 2 (existing): Extract enforceable policies from decisions
Good Task 1 output makes Task 2 trivial. Example:
- Bad: "Use a modern framework"
- Good: "Use FastAPI. Don't use Flask or Django."
→ Enables: {'imports': {'disallow': ['flask', 'django']}}
**Agent Experience**:
- Before: Minimal guidance, vague validation errors
- After: Inline structure guide, quality scoring, actionable feedback
## Implementation Details
**Scoring System**:
- Start at 100, deduct for issues:
- Vague terms: -15
- One-sided consequences: -25
- Weak context: -20
- No explicit constraints: -15
- Missing alternatives: -15
- Too brief: -10
- Grades: A (90+), B (75+), C (60+), D (40+), F (<40)
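The deductions and grade boundaries above can be sketched as a small scoring helper. The penalty values and letter cutoffs come from this description; the function and issue-key names are assumptions, not the actual `_assess_decision_quality()` internals.

```python
# Sketch of the scoring scheme described above. Deduction values and grade
# boundaries come from the PR text; function and key names are assumptions.
DEDUCTIONS = {
    "vague_terms": 15,
    "one_sided_consequences": 25,
    "weak_context": 20,
    "no_explicit_constraints": 15,
    "missing_alternatives": 15,
    "too_brief": 10,
}

def quality_score(issues: list[str]) -> int:
    """Start at 100 and deduct a fixed penalty for each detected issue."""
    return max(0, 100 - sum(DEDUCTIONS[issue] for issue in issues))

def grade(score: int) -> str:
    """Map a 0-100 score to the letter grades listed above."""
    for cutoff, letter in ((90, "A"), (75, "B"), (60, "C"), (40, "D")):
        if score >= cutoff:
            return letter
    return "F"
```

Under this scheme a decision with only vague terms still earns a B (85), while one-sided consequences plus weak context alone drop it to a D (55).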
**Quality Checks**:
- Pattern matching for generic terms
- Keyword detection for balance (pros AND cons)
- Regex for explicit constraints ("don't use", "must have")
- Length checks for completeness
- Alternatives presence validation
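The explicit-constraint check, for instance, could be a small regex scan like the sketch below. Only the phrases "don't use" and "must have" appear in the text above; the exact patterns in creation.py are assumptions.

```python
import re

# Sketch of the explicit-constraint detector described above. The two phrases
# come from the PR text; any further patterns would be assumptions.
CONSTRAINT_PATTERNS = [
    re.compile(r"\bdon'?t\s+use\b", re.IGNORECASE),
    re.compile(r"\bmust\s+have\b", re.IGNORECASE),
]

def has_explicit_constraints(text: str) -> bool:
    """Return True if the decision text states an enforceable constraint."""
    return any(pattern.search(text) for pattern in CONSTRAINT_PATTERNS)
```

This is the kind of check that stays cheap and deterministic, in contrast to the removed policy-extraction regexes: it only flags whether constraint language is present, never tries to parse the constraint itself.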
**Feedback Structure**:
```
{
  "quality_score": 85,
  "grade": "B",
  "issues": [{category, severity, issue, suggestion, example_fix}],
  "strengths": ["..."],
  "recommendations": ["..."],
  "next_steps": ["..."]
}
```
## Testing
All tests pass (26 new + 11 existing = 37 total):
- Unit tests verify guidance structure completeness
- Integration tests validate quality assessment accuracy
- Edge cases covered (vague, one-sided, missing context, etc.)
- Existing creation workflow tests unaffected
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…tion loop

BREAKING CHANGE: The quality gate now blocks ADR creation if the score is < 75.

Previously, the quality assessment ran AFTER creating the ADR file, which led to file pollution when agents needed to revise low-quality decisions. The new flow enables a clean correction loop:
1. Agent submits an ADR creation request
2. Quality gate runs deterministic checks (BEFORE file I/O)
3. If score < 75: return REQUIRES_ACTION with feedback, no file created
4. Agent revises and resubmits (correction loop)
5. Only create the ADR file once quality passes the threshold

## Changes

### Core Workflow (`adr_kit/workflows/creation.py`)
- Add `_quick_quality_gate()` method for pre-validation quality checks
- Refactor `execute()` to run the quality gate BEFORE `_generate_adr_id()`
- Return `WorkflowStatus.REQUIRES_ACTION` when quality < threshold
- Add `skip_quality_gate` parameter to `CreationInput` for test override

### Enum Extension (`adr_kit/workflows/base.py`)
- Add `WorkflowStatus.REQUIRES_ACTION` status for quality gate failures

### Test Updates
- Update test_decision_quality_assessment.py: expect `success=False` + `REQUIRES_ACTION`
- Add `skip_quality_gate=True` to test fixtures that use minimal inputs
- Improve the `sample_creation_input` fixture to be high-quality (passes the gate)

## Quality Threshold
- B grade (75/100) minimum required
- Scoring: Specificity (15), Balance (25), Context (20), Constraints (15), Alternatives (15), Completeness (10)

## Backward Compatibility
- Tests can set `skip_quality_gate=True` to bypass validation
- When the quality gate is skipped, a placeholder feedback structure is returned

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
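The correction loop above can be sketched as follows. Only `REQUIRES_ACTION`, `skip_quality_gate`, and the 75-point threshold come from this commit; `WorkflowResult`, the callables, and everything else are simplified stand-ins for the real workflow classes.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace

# Illustrative sketch of the gate-before-file-I/O flow. Only REQUIRES_ACTION,
# skip_quality_gate, and the threshold of 75 come from the commit message;
# the rest is a simplified stand-in for the real adr-kit workflow classes.
@dataclass
class WorkflowResult:
    success: bool
    status: str                      # "REQUIRES_ACTION" or "COMPLETED"
    feedback: dict = field(default_factory=dict)

def execute(input_data, quality_gate, create_adr_file, threshold: int = 75):
    score, feedback = quality_gate(input_data)
    if not input_data.skip_quality_gate and score < threshold:
        # Nothing touches disk: the agent revises and resubmits.
        return WorkflowResult(False, "REQUIRES_ACTION", feedback)
    create_adr_file(input_data)      # file I/O happens only after the gate passes
    return WorkflowResult(True, "COMPLETED", feedback)

# Example input: a draft that has not opted out of the gate.
draft = SimpleNamespace(skip_quality_gate=False)
```

A below-threshold submission returns `REQUIRES_ACTION` with the feedback attached and no file created, which is exactly what makes the loop pollution-free.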
Added skip_quality_gate=True to integration tests that use minimal ADR content for testing error scenarios, edge cases, and workflow integration rather than testing decision quality.

**Tests Fixed (14 failing → passing):**

1. test_comprehensive_scenarios.py (3 tests):
   - test_disk_full_simulation: Testing disk I/O errors
   - test_malformed_input_data: Testing malformed policy handling
   - test_unicode_and_encoding_handling: Testing Unicode support

2. test_workflow_creation.py (8 tests):
   - test_conflict_detection: Testing conflict detection logic
   - test_policy_integration: Testing policy block handling
   - test_very_long_title_handling: Testing title length limits
   - test_special_characters_in_title: Testing filename sanitization
   - test_semantic_similarity_detection: Testing similarity matching
   - test_incremental_id_generation: Testing ID generation
   - Second ADR in test_incremental_id_generation

3. test_mcp_workflow_integration.py (4 tests):
   - test_mcp_create_integration: Testing MCP request translation
   - test_mcp_approve_integration: Testing approval workflow
   - test_mcp_supersede_integration: Testing supersede workflow
   - test_end_to_end_workflow_chain: Testing analyze → create → approve

**Why This Fix:**
These tests validate workflow mechanics (error handling, ID generation, MCP integration, etc.), not decision quality. The quality gate would block them from reaching the code paths they are designed to test.

**Note:** The sample_creation_input fixture already has high-quality content and passes the quality gate without the skip_quality_gate flag.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed an issue where the Consequences section was written empty due to circular content parsing in the _generate_madr_content method.

**Problem:**
- _build_adr_structure built content with formatted sections
- _generate_madr_content tried to parse those sections back using the adr.context/adr.decision/adr.consequences properties
- These properties use ParsedContent, which re-parses the content
- This circular parsing was failing, resulting in empty sections

**Solution:**
- Simplified _generate_madr_content to use adr.content directly
- Removed the redundant parsing and rebuilding logic
- Content is now built once in _build_adr_structure and used as-is

**Testing:**
- All 187 tests passing (144 unit + 43 integration)
- test_successful_adr_creation now passes with full content

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
README audit session 3, addressing accumulated drift from implementation.

Accuracy fixes:
- Corrected MCP Tools table layer assignments (5 tools had wrong layers)
- Fixed FAQ layer references to use the correct layer names
- Terminology: "update ADR" → "supersede" (ADRs are immutable records); "creates ADR" → "proposes ADR" (adr_create proposes with status: proposed)
- Added migration awareness to the supersede examples
- Updated the policy schema to match the current Pydantic models (architecture replaces boundaries; config_enforcement with typescript/python)

Structural improvements:
- Replaced the ASCII flow diagram with a Mermaid flowchart color-coded by layer
- Restructured Quick Start: universal flow first, collapsible brownfield details via an HTML details tag
- Moved Current Capabilities next to What's Coming for coherence
- Consolidated "Writing ADRs for Constraint Extraction" + "ADR Format" into a single "How ADRs Get Their Policies" section reflecting the agent-driven two-step creation flow (quality gate + policy guidance)

Dedup:
- Removed 3 redundant sections (Example Complete Lifecycle, Example Conversations, Discovering Implicit Decisions)
- Trimmed the Layer 1 Deep Dive to unique content (supersede flow, quality gate)
- Removed the pattern-matching fallback documentation

README reduced from 989 to ~790 lines.
MCP integration tests failed because CreateADRRequest and SupersedeADRRequest didn't expose skip_quality_gate, causing the quality gate to reject minimal test inputs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Summary

Changes:
- `decision_guidance.py`: quality assessment framework with scoring and actionable feedback

Test plan:
- `make test-unit` passes
- `make test-integration` passes
- `make test-all` passes
- `uv run adr-kit mcp-server` starts successfully

🤖 Generated with Claude Code