
feat: creation workflow quality guidance system #7

Merged
kschlt merged 7 commits into main from feat/creation-workflow-guidance
Mar 22, 2026

Conversation


@kschlt kschlt commented Mar 22, 2026

Summary

  • Add comprehensive decision quality guidance system that coaches AI agents through creating well-reasoned ADRs
  • Refactor creation workflow to run quality gate before file creation, enabling a correction loop for better output
  • Simplify ADR content generation to prevent empty sections
  • Replace regex extraction with reasoning-agent promptlet for more robust workflow parsing
  • Audit and restructure README for accuracy and clarity

Changes

  • New module: decision_guidance.py — quality assessment framework with scoring and actionable feedback
  • Creation workflow: quality gate moved before file creation; simplified content generation; reasoning-agent promptlet replaces fragile regex
  • MCP models: new fields to support quality gate skip option
  • Tests: new unit/integration tests for decision guidance; streamlined creation workflow tests
  • README: comprehensive rewrite reflecting current capabilities

Test plan

  • make test-unit passes
  • make test-integration passes
  • make test-all passes
  • Manual: uv run adr-kit mcp-server starts successfully

🤖 Generated with Claude Code

@kschlt kschlt force-pushed the feat/creation-workflow-guidance branch from 487e3b5 to eb0a73d on March 22, 2026 20:49
@kschlt kschlt changed the title from "feat: decision quality guidance system for ADR creation" to "feat: creation workflow quality guidance system" Mar 22, 2026
kschlt and others added 6 commits March 22, 2026 21:50
…omptlet

Replaced ~308 lines of fragile regex-based policy extraction with a schema-driven
reasoning-agent promptlet architecture. This architectural improvement makes the
system more reliable and leverages agent reasoning capabilities instead of brittle
pattern matching.

Background:
Started by investigating hanging integration test (test_very_large_data_handling
with 600KB text causing regex backtracking). Investigation revealed that the regex
extraction approach was architecturally wrong - agents should reason about policies
from schema, not rely on pattern matching to extract them.

Architectural change:
- Removed 7 regex extraction methods (~308 lines total):
  * _suggest_policy_from_alternatives
  * _suggest_import_policies
  * _suggest_pattern_policies
  * _suggest_architecture_policies
  * _suggest_config_policies
  * _suggest_rationales
  * _normalize_library_name

- Replaced _generate_policy_guidance() with promptlet providing:
  * agent_task with role, objective, and 5 reasoning steps
  * policy_capabilities (full schema documentation)
  * example_workflow (concrete scenario showing decision → policy mapping)
  * guidance (dos/don'ts for constraint extraction)
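The promptlet structure described above might be sketched roughly as follows. This is illustrative only: the four top-level keys come from the commit message, but every value, step, and helper name inside them is an assumption, not the project's actual code.

```python
# Hedged sketch of the reasoning-agent promptlet replacing regex extraction.
# Top-level keys mirror the commit message; all contents are hypothetical.
def build_policy_promptlet() -> dict:
    return {
        "agent_task": {
            "role": "policy extraction agent",
            "objective": "derive enforceable policies from the decision text",
            "reasoning_steps": [
                "Identify the chosen technology",
                "Identify the rejected alternatives",
                "Map rejections to disallow rules",
                "Map stated requirements to must-have rules",
                "Validate the result against the policy schema",
            ],
        },
        "policy_capabilities": {"imports": ["prefer", "disallow"]},
        "example_workflow": {
            "decision": "Use FastAPI. Don't use Flask or Django.",
            "policy": {"imports": {"disallow": ["flask", "django"]}},
        },
        "guidance": {
            "do": ["quote explicit constraints from the ADR"],
            "dont": ["infer policies the decision never states"],
        },
    }

promptlet = build_policy_promptlet()
```

The point of the shape, as the commit describes it, is that the agent reasons from the schema documentation rather than having policies pattern-matched out of free text.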

This implements Task 2 (Policy Construction) of the two-step reasoning flow.
Task 1 (Decision Creation) guidance documented in new DEC backlog task.

Test updates:
- Removed 14 regex-based integration tests (TestPolicySuggestionLogic class)
- Removed 3 regex-based unit tests (TestPolicySuggestion class)
- Added 3 new promptlet-validation tests
- All 161 tests passing in 4.93s (was hanging before)
- Coverage increased from 21% to 49%

Performance:
- test_very_large_data_handling: passes in 0.40s (was hanging indefinitely)
- Full test suite: 4.93s (130 unit + 31 integration)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implement complete decision quality guidance to strengthen ADR creation
workflow. This establishes the foundation for high-quality architectural
decisions that enable effective policy extraction.

## What Changed

**New Decision Guidance Module** (`decision_guidance.py`):
- Comprehensive promptlet with ADR structure explanation
- Quality criteria (specific, actionable, complete, policy-ready, balanced)
- Good vs bad examples for database, frontend, and generic decisions
- Anti-patterns with fixes (vague, one-sided, missing context, etc.)
- Connection between Task 1 (decision quality) and Task 2 (policy extraction)
- Dos/don'ts and workflow guidance

**Enhanced Creation Workflow** (`creation.py`):
- New `_assess_decision_quality()` method with scoring system (0-100)
- Detects 6 quality dimensions:
  1. Specificity (generic terms vs specific tech names)
  2. Balance (pros AND cons documented)
  3. Context quality (explains WHY)
  4. Explicit constraints (for policy extraction)
  5. Alternatives documentation (enables disallow policies)
  6. Decision completeness
- Returns quality_feedback with:
  - Quality score and grade (A-F)
  - Issues found with severity and suggestions
  - Strengths recognized
  - Prioritized recommendations
  - Context-aware next steps
- Improved validation error messages with examples

**Enhanced MCP Tool** (`server.py`):
- Expanded `adr_create` docstring with inline guidance:
  - ADR structure (Context/Decision/Consequences/Alternatives)
  - Quality guidelines (be specific, document trade-offs, explain WHY)
  - Explicit constraint language examples
  - Response contents explanation

**Comprehensive Tests**:
- 14 unit tests for decision guidance module
- 12 integration tests for quality assessment
- All existing tests still pass (11 in test_workflow_creation.py)
- 80% code coverage for creation.py

## Why This Matters

**Foundation Enhancement**: Decision quality directly impacts:
- Policy extraction effectiveness (Task 2)
- Agent understanding of constraints
- Future decision reasoning
- Automated enforcement reliability

**Two-Step Creation Flow**:
1. Task 1 (NEW): Guide agents to write high-quality decisions
2. Task 2 (existing): Extract enforceable policies from decisions

Good Task 1 output makes Task 2 trivial. Example:
- Bad: "Use a modern framework"
- Good: "Use FastAPI. Don't use Flask or Django."
  → Enables: {'imports': {'disallow': ['flask', 'django']}}

**Agent Experience**:
- Before: Minimal guidance, vague validation errors
- After: Inline structure guide, quality scoring, actionable feedback

## Implementation Details

**Scoring System**:
- Start at 100, deduct for issues:
  - Vague terms: -15
  - One-sided consequences: -25
  - Weak context: -20
  - No explicit constraints: -15
  - Missing alternatives: -15
  - Too brief: -10
- Grades: A (90+), B (75+), C (60+), D (40+), F (<40)
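The deduct-from-100 scheme above can be sketched in a few lines. The deduction values and grade cutoffs are taken directly from this commit message; the function itself is a minimal illustration, not the project's `_assess_decision_quality()` implementation.

```python
# Deductions and cutoffs per the commit message; the function is a sketch.
DEDUCTIONS = {
    "vague_terms": 15,
    "one_sided_consequences": 25,
    "weak_context": 20,
    "no_explicit_constraints": 15,
    "missing_alternatives": 15,
    "too_brief": 10,
}

def score_decision(issues: list[str]) -> tuple[int, str]:
    """Start at 100, deduct per detected issue, clamp at 0, then grade."""
    score = max(100 - sum(DEDUCTIONS[i] for i in issues), 0)
    for grade, cutoff in [("A", 90), ("B", 75), ("C", 60), ("D", 40)]:
        if score >= cutoff:
            return score, grade
    return score, "F"

print(score_decision(["vague_terms"]))                           # → (85, 'B')
print(score_decision(["one_sided_consequences", "weak_context"]))  # → (55, 'D')
```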

**Quality Checks**:
- Pattern matching for generic terms
- Keyword detection for balance (pros AND cons)
- Regex for explicit constraints ("don't use", "must have")
- Length checks for completeness
- Alternatives presence validation
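The explicit-constraint regex check in the list above might look like the following minimal sketch. Only the phrases "don't use" and "must have" appear in the commit message; the other patterns are assumptions added for illustration.

```python
import re

# Illustrative constraint-detection patterns; the project's actual regexes
# are not shown in this PR.
CONSTRAINT_PATTERNS = [
    r"\bdon'?t use\b",
    r"\bmust (?:have|use)\b",
    r"\bnever\b",
]

def has_explicit_constraints(text: str) -> bool:
    """Return True if the decision text states an explicit constraint."""
    lowered = text.lower()
    return any(re.search(p, lowered) for p in CONSTRAINT_PATTERNS)

print(has_explicit_constraints("Use FastAPI. Don't use Flask."))  # → True
print(has_explicit_constraints("Use a modern framework."))        # → False
```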

**Feedback Structure**:
{
  "quality_score": 85,
  "grade": "B",
  "issues": [{category, severity, issue, suggestion, example_fix}],
  "strengths": ["..."],
  "recommendations": ["..."],
  "next_steps": ["..."]
}

## Testing

All tests pass (26 new + 11 existing = 37 total):
- Unit tests verify guidance structure completeness
- Integration tests validate quality assessment accuracy
- Edge cases covered (vague, one-sided, missing context, etc.)
- Existing creation workflow tests unaffected

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…tion loop

BREAKING CHANGE: Quality gate now blocks ADR creation if score < 75

Previously, the quality assessment ran AFTER creating the ADR file, which led
to file pollution when agents needed to revise low-quality decisions. The new
flow enables a clean correction loop:

1. Agent submits ADR creation request
2. Quality gate runs deterministic checks (BEFORE file I/O)
3. If score < 75: Return REQUIRES_ACTION with feedback, no file created
4. Agent revises and resubmits (correction loop)
5. Only create ADR file when quality passes threshold
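The five steps above can be sketched as a gate-before-I/O control flow. `WorkflowStatus.REQUIRES_ACTION`, `skip_quality_gate`, and the 75-point threshold come from this commit; the surrounding function and its parameters are simplified stand-ins for the real `execute()`.

```python
from enum import Enum

class WorkflowStatus(Enum):
    SUCCESS = "success"
    REQUIRES_ACTION = "requires_action"

QUALITY_THRESHOLD = 75  # B grade minimum, per the commit message

def execute(creation_input, assess, create_file):
    """Run deterministic quality checks before any file I/O."""
    if not getattr(creation_input, "skip_quality_gate", False):
        feedback = assess(creation_input)  # step 2: gate before file creation
        if feedback["quality_score"] < QUALITY_THRESHOLD:
            # Step 3: no file created; the agent revises and resubmits.
            return WorkflowStatus.REQUIRES_ACTION, feedback
    # Step 5: only reach file creation once the gate passes (or is skipped).
    path = create_file(creation_input)
    return WorkflowStatus.SUCCESS, {"path": path}
```

Because the gate runs before ID generation and file I/O, a rejected submission leaves no artifact behind, which is what makes the correction loop clean.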

## Changes

### Core Workflow (`adr_kit/workflows/creation.py`)
- Add `_quick_quality_gate()` method for pre-validation quality checks
- Refactor `execute()` to run quality gate BEFORE `_generate_adr_id()`
- Return `WorkflowStatus.REQUIRES_ACTION` when quality < threshold
- Add `skip_quality_gate` parameter to `CreationInput` for test override

### Enum Extension (`adr_kit/workflows/base.py`)
- Add `WorkflowStatus.REQUIRES_ACTION` status for quality gate failures

### Test Updates
- Update test_decision_quality_assessment.py: expect `success=False` + `REQUIRES_ACTION`
- Add `skip_quality_gate=True` to test fixtures that use minimal inputs
- Improve `sample_creation_input` fixture to be high-quality (pass gate)

## Quality Threshold
- B grade (75/100) minimum required
- Scoring: Specificity (15), Balance (25), Context (20), Constraints (15), Alternatives (15), Completeness (10)

## Backward Compatibility
- Tests can set `skip_quality_gate=True` to bypass validation
- Quality gate skipped returns placeholder feedback structure
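As a usage illustration, a test fixture opting out of the gate might look like this. `CreationInput`'s real fields aren't shown in this PR, so everything here except the `skip_quality_gate` flag is hypothetical.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the project's CreationInput model; only the
# skip_quality_gate flag is documented in this commit.
@dataclass
class CreationInput:
    title: str
    context: str = ""
    skip_quality_gate: bool = False

# Minimal content that would fail the gate, bypassed for a mechanics test:
minimal = CreationInput(title="Test ADR", skip_quality_gate=True)
print(minimal.skip_quality_gate)  # → True
```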

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added skip_quality_gate=True parameter to integration tests that use
minimal ADR content for testing error scenarios, edge cases, and
workflow integration rather than testing decision quality.

**Tests Fixed (14 failing → passing):**

1. test_comprehensive_scenarios.py (3 tests):
   - test_disk_full_simulation: Testing disk I/O errors
   - test_malformed_input_data: Testing malformed policy handling
   - test_unicode_and_encoding_handling: Testing Unicode support

2. test_workflow_creation.py (8 tests):
   - test_conflict_detection: Testing conflict detection logic
   - test_policy_integration: Testing policy block handling
   - test_very_long_title_handling: Testing title length limits
   - test_special_characters_in_title: Testing filename sanitization
   - test_semantic_similarity_detection: Testing similarity matching
   - test_incremental_id_generation: Testing ID generation
   - Second ADR in test_incremental_id_generation

3. test_mcp_workflow_integration.py (4 tests):
   - test_mcp_create_integration: Testing MCP request translation
   - test_mcp_approve_integration: Testing approval workflow
   - test_mcp_supersede_integration: Testing supersede workflow
   - test_end_to_end_workflow_chain: Testing analyze → create → approve

**Why This Fix:**

These tests are validating workflow mechanics (error handling, ID
generation, MCP integration, etc.) not decision quality. The quality
gate would block these tests from reaching the code paths they're
designed to test.

**Note:** sample_creation_input fixture already has high-quality
content and passes the quality gate without skip_quality_gate flag.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed issue where Consequences section was written empty due to
circular content parsing in _generate_madr_content method.

**Problem:**
- _build_adr_structure built content with formatted sections
- _generate_madr_content tried to parse those sections back using
  adr.context/adr.decision/adr.consequences properties
- These properties use ParsedContent which re-parses the content
- This circular parsing was failing, resulting in empty sections

**Solution:**
- Simplified _generate_madr_content to use adr.content directly
- Removed redundant parsing and rebuilding logic
- Content is now built once in _build_adr_structure and used as-is

**Testing:**
- All 187 tests passing (144 unit + 43 integration)
- test_successful_adr_creation now passes with full content

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
README audit session 3 addressing accumulated drift from implementation:

Accuracy fixes:
- Corrected MCP Tools table layer assignments (5 tools had wrong layers)
- Fixed FAQ layer references to use correct layer names
- Terminology: "update ADR" → "supersede" (ADRs are immutable records),
  "creates ADR" → "proposes ADR" (adr_create proposes with status:proposed)
- Added migration awareness to supersede examples
- Updated policy schema to match current Pydantic models (architecture
  replaces boundaries, config_enforcement with typescript/python)

Structural improvements:
- Replaced ASCII flow diagram with Mermaid flowchart color-coded by layer
- Restructured Quick Start: universal flow first, collapsible brownfield
  details via HTML details tag
- Moved Current Capabilities next to What's Coming for coherence
- Consolidated "Writing ADRs for Constraint Extraction" + "ADR Format"
  into single "How ADRs Get Their Policies" reflecting agent-driven
  two-step creation flow (quality gate + policy guidance)

Dedup:
- Removed 3 redundant sections (Example Complete Lifecycle, Example
  Conversations, Discovering Implicit Decisions)
- Trimmed Layer 1 Deep Dive to unique content (supersede flow, quality gate)
- Removed pattern-matching fallback documentation

README reduced from 989 to ~790 lines.
@kschlt kschlt force-pushed the feat/creation-workflow-guidance branch from eb0a73d to 90824f9 on March 22, 2026 20:51
MCP integration tests failed because CreateADRRequest and
SupersedeADRRequest didn't expose skip_quality_gate, causing
the quality gate to reject minimal test inputs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kschlt kschlt merged commit 08ba749 into main Mar 22, 2026
13 checks passed
@kschlt kschlt deleted the feat/creation-workflow-guidance branch March 22, 2026 21:15
