diff --git a/README.md b/README.md index aab5119..35d1b8b 100644 --- a/README.md +++ b/README.md @@ -1,28 +1,131 @@ # ADR Kit -Document architectural decisions. Enforce them automatically. +Keep AI agents architecturally consistent. [![Python Version](https://img.shields.io/badge/python-3.10%2B-blue)](https://www.python.org/downloads/) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) -[![Tests](https://img.shields.io/badge/tests-comprehensive-green)](#testing--validation) +[![Tests](https://img.shields.io/badge/tests-comprehensive-green)](#reliability--testing) > **👥 For users:** Install ADR Kit in your project → [Quick Start](#quick-start) > **🔧 For contributors:** Develop on ADR Kit itself → [Development](#development) -## The Problem +## The Concept of Architectural Decision Records -Your team decides "use React Query for all data fetching." Three months later, new code uses axios and fetch(). No one remembers the decision or why it was made. Code reviews catch some violations, but not all. +**The core idea**: When multiple people work on the same project, they need to align on significant technical choices. When the team agrees on a decision—React over Vue, PostgreSQL over MongoDB, microservices over monolith—they document it as an Architectural Decision Record: -## The Solution +- **Context**: Why was this decision needed? +- **Decision**: What did we choose? +- **Consequences**: What are the trade-offs? +- **Alternatives**: What did we reject and why? -ADR Kit makes architectural decisions enforceable: +**The team alignment mechanism**: Once a decision is recorded, everyone must either follow it or propose a new decision to be discussed with the team. ADRs also maintain a track record of how decisions evolved—when they were superseded, why they changed. -1. **Document decisions** in standard MADR format -2. **Extract policies** from decision documents automatically -3. 
**Generate lint rules** that prevent violations -4. **Integrate with AI agents** for autonomous ADR management +**How this works in practice**: New team members read the ADRs once during onboarding to understand what was agreed upon. Existing members don't read them every morning—they just know the decisions exist and can check back when needed because they've been working in the codebase for a while. -**Example**: An ADR that says `disallow: [axios]` becomes an ESLint rule that blocks axios imports with a message pointing to your ADR. +**The problem this concept solves**: Without written agreements, architectural consistency erodes. Team members make conflicting choices because they don't know what was already decided. + +**The benefit**: ADRs create explicit team agreements with historical context that survive beyond individual memory and onboard new members effectively. + +This concept has proven valuable for human teams. Now we're in the era of AI-driven development, where the problem is similar but fundamentally different in how it needs to be solved. + +## The AI-Driven Development Challenge + +**The similar problem**: Like human teams, we need architectural consistency across work sessions. Each new chat with your AI agent (Cursor, Claude Code, Copilot) is like onboarding a fresh team member with blank context. 
+ +**The key difference from human teams**: + +In a **human team**, new members: +- Read all ADRs once during onboarding +- Remember they exist from working in the codebase +- Check back when needed because they have context + +In **AI-driven development**, each chat: +- Starts with blank context (sometimes project info is preloaded, but generally it's a fresh start) +- Cannot read all ADRs like a human—that would waste valuable context window +- Would be "tired by the end of the context window" before even starting to solve the actual problem +- Needs only the relevant decisions for the current challenge, at the right point in time + +**The second problem: ADRs don't exist yet**: Most projects don't have Architectural Decision Records at all. They were never created because manual ADR maintenance was too much overhead for human teams. + +**Result without selective context loading**: +- **Monday**: "Use React Query" → Implements with React Query +- **Wednesday**: New chat, new context → Uses axios (no memory of Monday) +- **Friday**: Different conversation → Uses fetch() (different approach again) + +The ADR concept solves the alignment problem, but AI-driven development requires a different approach: decisions must be created automatically, maintained as they evolve, and loaded selectively into context only when needed for the current challenge. + +## ADR Kit: Introducing ADRs for AI-Driven Development + +ADR Kit brings the proven ADR concept to AI-driven development, adapting how it works to solve the challenges we just outlined: + +Instead of humans reading all ADRs once during onboarding, ADR Kit **selectively loads only relevant ADRs** into AI context at decision time. Instead of humans maintaining ADRs manually, **AI agents create and maintain them automatically**. The concept stays the same—session alignment on architectural decisions—but the implementation changes for the reality of AI agents. 
+ +**Works for both existing and new projects**: Whether you're working with an existing codebase (brownfield) or starting fresh (greenfield), ADR Kit adapts to your situation. Brownfield projects need initial discovery of implicit decisions and may require migration work for conflicts. Greenfield projects can create ADRs from the start, avoiding drift entirely. Both use the same three-layer mechanism. + +Here's how ADR Kit solves three interconnected problems: + +**1. ADR Lifecycle Management** - During every chat session, AI detects when a decision has architectural relevance. It checks: "Do we have an ADR for this?" If not, it proposes creating one. If yes, it loads the context to apply it—or challenges whether the decision needs to evolve. The AI manages the full lifecycle: create new ADRs, supersede outdated ones, and maintain them as your architecture evolves. This happens continuously, not just during initial setup. + +**2. Context at the Right Time** - No matter how good the AI model is, it needs the right context at the right time to make decisions. ADR Kit surfaces relevant ADRs automatically when needed—before the AI reasons about solutions. Not all ADRs are dumped into context: only the relevant ones, only when needed. This is fundamental to AI-driven development. + +**3. Enforcement with Feedback Loop** - Even when AI receives context, it can make mistakes or ignore constraints. Automated enforcement (linting, CI checks) catches violations and provides direct feedback explaining why the change violates the agreed-upon decision. This triggers a management decision: either fix the code to comply, or supersede the ADR with a new decision if it needs to evolve. The whole mechanism—create, maintain, enforce, feedback—is baked into ADR Kit. 
+ +### How This Works: Three-Layer Approach + +ADR Kit introduces ADRs to your project with three active layers: + +**Layer 1: ADR Lifecycle Management** - Continuous detection and management during chat sessions +- AI detects architectural relevance: "This decision seems architecturally significant" +- Checks: "Do we have an ADR for this already?" +- If not: Proposes creating new ADR +- If yes: Loads context to apply it, or challenges if decision needs to evolve +- Manages full lifecycle: Create, supersede outdated decisions, maintain as architecture evolves +- Quality gate: Reject vague decisions ("use a modern framework") that can't be enforced +- Works continuously: Not just initial setup, but every chat session where architectural decisions emerge + +**Layer 2: Context at the Right Time** - Surface relevant information when decisions are being made +- Task: "Implement authentication" → Automatically loads ADR-0005 (Auth0), ADR-0008 (JWT structure) +- Filters by relevance: Only 3-5 relevant ADRs, never all 50—don't blow the context window +- Timing: Before AI reasons about solutions, while it's making decisions +- Automation: ADR Kit ensures relevant context is always surfaced when needed, regardless of which AI agent you use +- Goal: Right information, right time, every time—no matter how good the model is + +**Layer 3: Enforcement with Feedback Loop** - Catch violations and trigger management +- Approved ADRs → ESLint/Ruff rules automatically generated +- Developer (or AI) violates constraint → Linter blocks with ADR reference and explanation +- Feedback triggers decision: Fix code to comply, or supersede the ADR if decision needs to evolve +- Works independent of whether context was loaded or AI made a mistake +- Completes the cycle: violations feed back into management (maintain or supersede) + +### Integration into AI-Driven Development Cycle + +ADR Kit integrates into your AI development workflow at critical decision points: + +```mermaid +graph TD + 
Request[Feature request] --> Context[adr_planning_context] + Context -->|Loads relevant ADRs| Choice[AI proposes technical choice] + Choice --> Preflight[adr_preflight] + Preflight -->|Decision exists| Implement[AI implements feature] + Preflight -->|New decision needed| Create[adr_create] + Create -->|Proposes ADR| Review[You review proposed ADR] + Review --> Approve[adr_approve] + Approve -->|Generates enforcement rules| Implement + Implement --> Linter[Linter runs] + Linter -->|No violations| Done[Done] + Linter -->|Violation detected| Feedback{Fix or evolve?} + Feedback -->|Fix code| Implement + Feedback -->|Supersede ADR| Create + + style Context fill:#90EE90 + style Preflight fill:#90EE90 + style Create fill:#90EE90 + style Approve fill:#FFE4B5 + style Linter fill:#FFE4B5 + style Feedback fill:#FFB6C1 +``` + +**The complete cycle**: Violations aren't dead ends—they feed back into ADR management. Either fix the code to comply, or supersede the decision with a new ADR. The whole mechanism—create, serve context, enforce, provide feedback—is baked into one system. ## Quick Start @@ -32,350 +135,250 @@ ADR Kit makes architectural decisions enforceable: uv tool install adr-kit ``` -### Initialize Your Project +### Setup Your Project ```bash cd your-project -adr-kit init +adr-kit init # Creates docs/adr/ directory +adr-kit setup-cursor # or setup-claude for Claude Code ``` -### Choose Your Path +This connects ADR Kit to your AI agent via MCP (Model Context Protocol). -**Path A: Greenfield (New Project)** -Start creating ADRs as you make architectural decisions. +### Start Using It -```bash -# Setup AI agent to help you -adr-kit setup-cursor # or setup-claude +Once set up, ADR Kit works in every chat session — whether your project is new or established: -# Then in Cursor/Claude Code: -# "Create an ADR for our decision to use React with TypeScript" ``` - -**Path B: Brownfield (Existing Project)** -Analyze existing codebase to discover decisions already made. 
- -```bash -# Setup AI agent -adr-kit setup-cursor # or setup-claude - -# Then in Cursor/Claude Code: -# "Analyze my project for architectural decisions that need ADRs" -# AI will detect your tech stack and propose ADRs +You: "Let's use FastAPI for the backend API" +AI: [Calls adr_preflight({choice: "fastapi"})] +AI: "No existing ADR. I'll propose one for FastAPI." +AI: [Calls adr_create()] → Proposes ADR-0001 (status: proposed) +AI: "Here's the proposed ADR-0001. Review it?" +You: "Looks good, approve it" +AI: [Calls adr_approve()] → Enforcement now active ``` -## How It Works +The AI detects architectural decisions as you work, proposes ADRs, and you review and approve. It also catches decisions made implicitly — like noticing you're importing React everywhere and suggesting to document it. + +
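Conceptually, the preflight step in this exchange is a small classification function. The sketch below is illustrative only, not the actual adr-kit implementation; the names `preflight` and `approved_policies` are hypothetical:

```python
# Illustrative sketch of the preflight classification. Not the real
# adr-kit code; `approved_policies` is a hypothetical structure keyed by ADR id.

def preflight(choice: str, approved_policies: dict[str, dict]) -> str:
    """Classify a technical choice against approved ADR policies."""
    for adr_id, policy in approved_policies.items():
        imports = policy.get("imports", {})
        if choice in imports.get("disallow", []):
            return f"BLOCKED by {adr_id}"   # an approved ADR forbids this choice
        if choice in imports.get("prefer", []):
            return f"ALLOWED per {adr_id}"  # an approved ADR endorses it
    return "REQUIRES_ADR"                   # no decision on record: propose one

policies = {"ADR-0003": {"imports": {"prefer": ["react-query"], "disallow": ["axios"]}}}
print(preflight("fastapi", policies))  # REQUIRES_ADR
```

The three outcomes map directly to what the agent does next: proceed, propose a new ADR, or stop and surface the conflicting decision.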
+Existing codebase? Start with project analysis -### Design Pattern: MCP Tools → Workflows → Internal Functions +If you have an existing project, let the AI discover the architectural decisions already baked into your code: ``` -AI Agent - ↓ -MCP Tool (6 simple interfaces) - ↓ -Workflow (multi-step automation) - ↓ -Internal Functions (automatic) - ├─ generate_adr_index() - ├─ generate_eslint_config() - ├─ generate_ruff_config() - ├─ apply_guardrails() - └─ rebuild_contract() +You: "Analyze my project for architectural decisions" +AI: [Calls adr_analyze_project()] +AI: "If I'm reading this correctly, you're using React, TypeScript, PostgreSQL, Docker" +AI: "I also found a potential conflict: PostgreSQL is used in 80% of the code, + but MySQL appears in the legacy module. Should I document both or propose + migrating to consistent PostgreSQL usage?" ``` -**Why this matters**: When you approve an ADR, one MCP tool call triggers a 9-step automation pipeline. You don't manually generate indexes or lint rules—it all happens automatically. +Review the proposed ADRs, approve the ones that accurately reflect your decisions: -### The 6 MCP Tools +``` +You: "Approve ADR-0001 through ADR-0003. For the database conflict, let's + document PostgreSQL as the standard and create a migration plan for the + legacy module." +AI: [Calls adr_approve() for each, notes the migration decision] +``` -ADR Kit exposes 6 MCP tools for AI agents. Each tool triggers a comprehensive workflow: +Now your implicit decisions are explicit, documented, and enforced. From this point forward, the workflow is the same as above — AI references these ADRs when implementing features and ADR Kit blocks violations. 
-| MCP Tool | When To Use | What It Does | -|----------|-------------|--------------| -| `adr_analyze_project` | Starting with existing codebase | Detects technologies, proposes ADRs for existing decisions | -| `adr_preflight` | Before making technical choice | Returns ALLOWED/REQUIRES_ADR/BLOCKED | -| `adr_create` | Documenting a decision | Creates ADR file with conflict detection | -| `adr_approve` | After human review | **Triggers full automation**: contract rebuild, lint rules, guardrails, indexes | -| `adr_supersede` | Replacing existing decision | Manages relationships, updates old ADR to superseded | -| `adr_planning_context` | Before implementing feature | Returns relevant ADRs, constraints, technology recommendations | +
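The discovery step above works, at its simplest, by scanning dependency manifests for well-known markers. A minimal sketch under that assumption (the helper `detect_from_package_json` is hypothetical, not the real `adr_analyze_project` internals):

```python
# Hypothetical sketch of manifest scanning: map well-known dependencies
# to the technology decision they imply. Not the real adr-kit analyzer.
import json
from pathlib import Path

MARKERS = {  # dependency name -> implied technology
    "react": "React",
    "typescript": "TypeScript",
    "vue": "Vue",
}

def detect_from_package_json(root: str) -> set[str]:
    """Return technologies implied by a project's package.json, if present."""
    manifest = Path(root, "package.json")
    if not manifest.exists():
        return set()
    data = json.loads(manifest.read_text())
    deps = {**data.get("dependencies", {}), **data.get("devDependencies", {})}
    return {tech for dep, tech in MARKERS.items() if dep in deps}
```

The same idea extends to `requirements.txt`, `pyproject.toml`, or a `Dockerfile`; each detected technology becomes a candidate ADR for you to review.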
-### The Approval Automation Pipeline +## How It Works: Technical Deep Dive -When you approve an ADR, this happens automatically: +This section explains the mechanics of each layer with concrete examples. -```mermaid -graph LR - Approve[adr_approve] --> Validate[Validate ADR] - Validate --> Status[Update status to accepted] - Status --> Contract[Rebuild constraints contract] - Contract --> Guardrails[Apply guardrails] - Guardrails --> ESLint[Generate ESLint rules] - Guardrails --> Ruff[Generate Ruff rules] - ESLint --> Index[Update indexes] - Ruff --> Index - Index --> Done[Enforcement active] +### Layer 1: ADR Lifecycle Management (Continuous Detection & Evolution) - style Contract fill:#90EE90 - style Guardrails fill:#90EE90 - style ESLint fill:#FFE4B5 - style Ruff fill:#FFE4B5 - style Index fill:#FFE4B5 -``` +**The Problem**: Architectural decisions emerge during development, but without active management they're never documented or maintained as they evolve. -**No manual steps required**. Index generation, lint rule creation, and config updates all happen automatically. +**The Solution**: AI continuously detects architectural relevance during chat sessions → Checks for existing ADRs → Proposes creating them. The creation flow is covered in [Quick Start](#quick-start) above. This section focuses on what happens after: evolution and quality control. -## Writing ADRs for Constraint Extraction +**When a decision needs to evolve** (supersede, never update): -ADR Kit can automatically extract constraints from your ADRs to provide architectural guidance to AI agents and developers. To enable this feature, your ADRs should include policy information in one of two formats: +ADRs are immutable records. When a decision needs to change, you supersede it with a new ADR — the old one stays as historical record. 
-### Approach 1: Structured Policy (Recommended) +``` +You: "Actually, we need axios for this specific external API integration" +AI: [Detects conflict with ADR-0003: Use React Query] +AI: "This conflicts with ADR-0003. ADRs are immutable records, so we have two options: + A) Find a React Query solution that works here + B) Supersede ADR-0003 with a new decision that allows axios for external APIs" +You: "Option B" +AI: [Calls adr_supersede()] → Proposes ADR-0015, marks ADR-0003 as superseded +AI: "ADR-0015 allows axios for external API integrations while keeping React Query + as default. Note: 3 existing files use React Query for external calls — + these would need to migrate to axios for consistency with the new decision." +``` -Include a `policy` block in your ADR front-matter for reliable, machine-readable constraints: +Superseding always surfaces migration implications — the AI identifies what existing code needs to change to align with the new decision. -```yaml ---- -id: "ADR-0002" -title: "Use FastAPI as Web Framework" -status: proposed -policy: - imports: - disallow: ["flask", "django", "litestar"] - prefer: ["fastapi"] - python: - disallow_imports: ["flask", "django"] - rationales: - - "FastAPI provides native async support required for I/O operations" - - "Automatic OpenAPI documentation reduces maintenance burden" ---- -``` +**Quality Gate**: ADR Kit ensures decisions are specific enough to enforce: +- ❌ "Use a modern framework" → Rejected (too vague) +- ✅ "Use React 18. Don't use Vue or Angular." → Accepted (specific, has constraints) -**Benefits**: -- ✅ Reliable, explicit constraint extraction -- ✅ Machine-readable policies for automation -- ✅ Clear separation of concerns -- ✅ Works perfectly with `adr_planning_context` and `adr_preflight` +**Continuous operation**: This layer works throughout development, not just during initial setup. Every chat session where architectural decisions emerge triggers this detection and management cycle. 
-**Policy Schema**: -```typescript -{ - imports: { - disallow: string[], // Libraries/packages to ban - prefer: string[] // Recommended alternatives - }, - python: { - disallow_imports: string[] // Python-specific module bans - }, - boundaries: { - rules: [{ forbid: string }] // Architecture rules like "ui -> database" - }, - rationales: string[] // Reasons for the policies -} -``` +### Layer 2: Context Loading (Right Information, Right Time) -### Approach 2: Pattern-Matching Language +**The Problem**: AI can't remember decisions from previous conversations. -Use specific phrases in your decision and consequences sections: +**The Solution**: Before implementing features, AI asks "what architectural constraints apply here?" -```markdown -## Decision +``` +Task: "Implement user authentication" +Domain: backend, security -Use FastAPI as the backend web framework. **Don't use Flask** or Django -as they lack native async support. **Prefer FastAPI over Flask** for -this use case. +ADR Kit returns: +✅ ADR-0005: Use Auth0 for Authentication + - Constraint: Don't implement custom auth + - Warning: ⚠️ Rate limiting required on auth endpoints -## Consequences +✅ ADR-0008: JWT Token Structure + - Constraint: Access tokens expire in 1 hour + - Constraint: Refresh tokens stored in httpOnly cookies -**Avoid** synchronous frameworks like Flask. Backend **should not use** -Django REST Framework for new services. 
+Filtered out (not relevant): +❌ ADR-0001: React Query (frontend) +❌ ADR-0012: CSS-in-JS (styling) ``` -**Recognized patterns**: -- `Don't use X` / `Avoid X` / `X is deprecated` -- `Use Y instead of X` / `Prefer Y over X` -- `Layer A should not access Layer B` - -**Benefits**: -- ✅ Natural language - reads well in documentation -- ✅ Works with existing ADRs without modification -- ✅ No schema knowledge needed +**How**: `adr_planning_context` tool filters ADRs by task relevance, surfaces warnings from ADR consequences, returns only 3-5 relevant decisions instead of flooding context with all 50 ADRs. -**Limitations**: -- ⚠️ Less reliable than structured policy -- ⚠️ May miss nuanced constraints -- ⚠️ Harder to validate programmatically +### Layer 3: Enforcement with Feedback Loop -### When to Use Which Approach +**The Problem**: Even with documentation, AI can make mistakes or forget context. -| Scenario | Recommendation | -|----------|----------------| -| **New ADRs** (AI agents) | Structured policy - most reliable | -| **Existing ADRs** (manual) | Pattern language - easier to retrofit | -| **Critical constraints** | Structured policy - no ambiguity | -| **Documentation focus** | Pattern language - more readable | -| **Best of both** | Combine both approaches | +**The Solution**: Approved ADRs become automated lint rules that provide feedback. -### Validation and Feedback +```yaml +# ADR-0003: Use React Query for data fetching +policy: + imports: + prefer: [react-query, @tanstack/react-query] + disallow: [axios] +``` -When creating ADRs through `adr_create`, ADR Kit validates policy completeness: +Automatically generates: ```json +// .eslintrc.adrs.json (auto-generated, don't edit) { - "validation_warnings": [ - "No structured policy provided and no pattern-matching language detected in content. - Constraint extraction may not work. 
Consider adding a 'policy' block or using - phrases like 'Don't use X' in your decision text.", - "Suggested policy structure: { - \"imports\": { - \"disallow\": [\"flask\"], - \"prefer\": [\"fastapi\"] + "rules": { + "no-restricted-imports": [ + "error", + { + "paths": [{ + "name": "axios", + "message": "Use React Query instead (ADR-0003)" + }] } - }" - ] + ] + } } ``` -If you receive this warning: -1. **Option A**: Add a structured `policy` block to your ADR front-matter -2. **Option B**: Update your decision/consequences text with pattern-friendly language -3. **Option C**: Accept that this ADR won't provide automated constraints (documentation only) - -### Example ADRs +**Enforcement with Feedback**: +```javascript +import axios from 'axios'; // ❌ ESLint error: Use React Query instead (ADR-0003) +``` -See [tests/fixtures/examples/](tests/fixtures/examples/) for complete examples: -- `good-adr-with-structured-policy.md` - FastAPI ADR with full policy schema -- `good-adr-with-pattern-language.md` - React Query ADR using patterns -- `bad-adr-no-policy.md` - PostgreSQL ADR without constraints (triggers warnings) +When this violation is caught, it triggers a decision: +- **Fix the code**: Change to React Query (maintain consistency) +- **Supersede the ADR**: If the decision needs to evolve — create a new ADR that replaces the old one (e.g., "Allow axios for external API integrations"), then migrate existing code to match the new decision -### Checking Constraint Extraction +The feedback loop is complete: violations aren't dead ends—they feed back into ADR management. Either fix to comply, or supersede the architectural decision. ADRs are immutable records — they're never edited, only superseded by new decisions. 
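On the Python side, the analogous enforcement can be expressed through Ruff's banned-imports settings. A hypothetical fragment for a FastAPI-over-Flask ADR (the file name and exact generated layout here are assumptions, but the section names follow Ruff's `flake8-tidy-imports` configuration):

```toml
# ruff.toml (illustrative; adr-kit's generated layout may differ)
[lint]
extend-select = ["TID251"]   # enables banned-api violations

[lint.flake8-tidy-imports.banned-api]
"flask".msg = "Use FastAPI instead (ADR-0002)"
"django".msg = "Use FastAPI instead (ADR-0002)"
```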
-Verify your ADRs have extractable constraints: +**Current Support**: ESLint (JavaScript/TypeScript), Ruff (Python) +**Future**: More linters, runtime checks, CI gates -```python -# Via Python API -from adr_kit.core.policy_extractor import PolicyExtractor -from adr_kit.core.parse import parse_adr_file +## The 6 MCP Tools -adr = parse_adr_file("docs/adr/ADR-0001-fastapi.md") -extractor = PolicyExtractor() +AI agents interact with ADR Kit through 6 tools. Each tool implements one part of the three-layer system: -if extractor.has_extractable_policy(adr): - policy = extractor.extract_policy(adr) - print(f"Disallowed: {policy.get_disallowed_imports()}") - print(f"Preferred: {policy.get_preferred_imports()}") -else: - print("⚠️ No extractable constraints found") -``` +| Tool | Layer | When AI Uses It | What It Does | +|------|-------|-----------------|--------------| +| `adr_analyze_project` | Layer 1 | Starting with existing codebase | Detects tech stack, proposes ADRs for existing decisions | +| `adr_preflight` | Layer 1 | Before making technical choice | Returns ALLOWED/REQUIRES_ADR/BLOCKED based on existing decisions | +| `adr_create` | Layer 1 | Documenting a decision | Proposes ADR file (status: proposed) with quality validation and conflict detection | +| `adr_approve` | Layer 3 | After human review | Activates enforcement: generates lint rules, updates indexes | +| `adr_supersede` | Layer 1 | Replacing existing decision | Creates new ADR, marks old one as superseded | +| `adr_planning_context` | Layer 2 | Before implementing feature | Returns relevant ADRs filtered by task + domain, includes warnings | -```bash -# Via MCP (AI agents use this automatically) -# The agent calls: adr_planning_context({ -# task_description: "Implement new API endpoint", -# domain_hints: ["backend"] -# }) -# -# Response includes extracted constraints: -# { -# "constraints": [ -# "Use fastapi (ADR-0001)", -# "Don't use flask (ADR-0001)" -# ] -# } -``` +### The Approval Automation 
Pipeline -## Example: Complete Lifecycle +When you approve an ADR, this happens automatically: -### Greenfield (New Project) +```mermaid +graph LR + Approve[adr_approve] --> Validate[Validate ADR] + Validate --> Status[Update status to accepted] + Status --> Contract[Rebuild constraints contract] + Contract --> Guardrails[Apply guardrails] + Guardrails --> ESLint[Generate ESLint rules] + Guardrails --> Ruff[Generate Ruff rules] + ESLint --> Index[Update indexes] + Ruff --> Index + Index --> Done[Enforcement active] -```bash -# 1. Initialize -adr-kit init -adr-kit setup-cursor + style Contract fill:#90EE90 + style Guardrails fill:#90EE90 + style ESLint fill:#FFE4B5 + style Ruff fill:#FFE4B5 + style Index fill:#FFE4B5 +``` -# 2. Make a decision -# In Cursor: "I want to use React Query for data fetching" +**No manual steps required**. Index generation, lint rule creation, and config updates all happen automatically. -# 3. AI calls adr_preflight({choice: "react-query"}) -# Returns: REQUIRES_ADR (no existing ADR for this) +### Decision Quality Assistance -# 4. AI calls adr_create({title: "Use React Query", ...}) -# Creates: docs/adr/ADR-0001-react-query.md (status: proposed) +ADR Kit helps you write better architectural decisions by providing guidance **before** creating ADR files: -# 5. You review the ADR -# In Cursor: "Approve ADR-0001" +**The Problem**: Vague decisions like "use a modern framework" can't be enforced. Without specificity and explicit constraints, your ADRs become documentation-only. -# 6. AI calls adr_approve({adr_id: "ADR-0001"}) -# Automatically: -# - Updates status to "accepted" -# - Rebuilds constraints contract -# - Generates ESLint rules blocking axios -# - Applies guardrails to .eslintrc.adrs.json -# - Updates adr-index.json +**The Solution**: ADR Kit evaluates decision quality and provides specific feedback: -# 7. 
Enforcement active -# Developer tries: import axios from 'axios' -# ESLint error: "Use React Query instead (ADR-0001)" ``` +❌ Vague: "Use a modern framework with good performance" + → Feedback: "Specify exact framework name and version" -### Brownfield (Existing Project) - -```bash -# 1. Initialize -cd existing-project -adr-kit init -adr-kit setup-cursor - -# 2. Analyze existing architecture -# In Cursor: "Analyze my project for architectural decisions" - -# 3. AI calls adr_analyze_project() -# Detects: React, TypeScript, Express, PostgreSQL, Docker -# Generates tech-specific analysis prompts - -# 4. AI creates ADRs for existing decisions -# ADR-0001: Use React for Frontend -# ADR-0002: Use TypeScript for Type Safety -# ADR-0003: Use PostgreSQL for Data Storage -# ADR-0004: Use Docker for Containerization +✅ Specific: "Use React 18 with TypeScript 5.0" +``` -# 5. You review all proposed ADRs -# In Cursor: "Approve ADR-0001 through ADR-0004" +**What Gets Evaluated**: +- **Specificity**: Are technologies and versions named concretely? +- **Trade-offs**: Are both pros AND cons documented? +- **Context**: Why is this decision needed right now? +- **Constraints**: Are there explicit "don't use X" policies? +- **Alternatives**: What options were rejected and why? -# 6. AI approves each one (4 automation pipelines run) -# Now your existing decisions are documented AND enforced +**User Experience**: +1. AI agent drafts an ADR based on your requirements +2. ADR Kit provides quality feedback with specific suggestions +3. Agent revises and improves the decision +4. Once quality passes, the ADR file is created -# 7. Future decisions follow greenfield workflow -``` +**How This Helps**: -## Brownfield Analysis Details +Weak ADRs can't be enforced. "Use a modern framework" doesn't tell a linter what to block. ADR Kit's quality feedback pushes toward specific, actionable decisions that can be translated into automated policies. 
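A simplified way to picture the gate: scan the decision text for vague qualifiers and for the explicit constraints a linter needs. The heuristic below is purely illustrative; the real evaluation in ADR Kit is richer than a pair of regexes:

```python
# Purely illustrative quality-gate heuristic, not ADR Kit's actual check.
import re

VAGUE = re.compile(r"\b(modern|good|best|appropriate|suitable)\b", re.IGNORECASE)
CONSTRAINT = re.compile(r"\b(don't|do not|avoid|instead of)\b", re.IGNORECASE)

def quality_feedback(decision: str) -> list[str]:
    """Return actionable issues; an empty list means the gate passes."""
    issues = []
    if VAGUE.search(decision):
        issues.append("Replace vague qualifiers with an exact technology and version.")
    if not CONSTRAINT.search(decision):
        issues.append("Add an explicit constraint (\"Don't use X\") so the decision is enforceable.")
    return issues

print(quality_feedback("Use a modern framework with good performance"))      # two issues
print(quality_feedback("Use React 18 with TypeScript 5.0. Don't use Vue."))  # []
```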
-ADR Kit includes sophisticated technology detection for brownfield projects: +## How ADRs Get Their Policies -**Detected Technologies** (20+): -- **Frontend**: React, Vue, Angular, Svelte -- **Backend**: Express.js, FastAPI, Django, Flask, Spring -- **Databases**: PostgreSQL, MySQL, MongoDB, Redis -- **Languages**: TypeScript, JavaScript, Python, Rust, Go -- **Tools**: Docker, Kubernetes +ADR Kit uses a **two-step creation flow**. You don't write ADRs or policies manually — the AI agent handles both, guided by ADR Kit: -**How It Works**: -```python -# Scans for config files -package.json → Detects React, TypeScript, Express -requirements.txt → Detects FastAPI, PostgreSQL -Dockerfile → Detects Docker +**Step 1 — Decision Quality**: When the agent calls `adr_create`, ADR Kit evaluates the decision for specificity, trade-offs, and enforceability. Vague decisions ("use a modern framework") are rejected with actionable feedback before any file is created. -# Generates technology-specific prompts -"React: Document component architecture, state management decisions" -"PostgreSQL: Document schema design, migration strategy" -"Docker: Document containerization approach" -``` +**Step 2 — Policy Construction**: ADR Kit returns a `policy_guidance` promptlet that walks the agent through mapping the decision text into structured policies. The agent reasons about what constraints to extract and constructs the policy block. -**Different prompts based on existing ADRs**: -- **0 ADRs found** → "Identify ALL architectural decisions in this project" -- **N ADRs found** → "Find decisions MISSING from existing ADR set" +The result is an ADR with both human-readable documentation and machine-readable enforcement policies — written by the AI, reviewed by you. 
-## ADR Format with Structured Policies +### ADR Format -ADRs use MADR format with policy extensions for enforcement: +ADRs use [MADR format](https://adr.github.io/madr/) with a `policy` block in the front-matter for enforcement: ```markdown --- @@ -397,7 +400,7 @@ policy: Custom data fetching is scattered across components... ## Decision -Use React Query for all data fetching. +Use React Query for all data fetching. Don't use axios directly. ## Consequences ### Positive @@ -406,27 +409,84 @@ Use React Query for all data fetching. - Additional dependency, learning curve ``` -**After approval, this automatically generates**: +After approval, this automatically generates lint rules: ```json // .eslintrc.adrs.json (auto-generated, don't edit) { "rules": { - "no-restricted-imports": [ - "error", - { - "paths": [ - { - "name": "axios", - "message": "Use React Query instead (ADR-0001)" - } - ] - } - ] + "no-restricted-imports": ["error", { + "paths": [{ + "name": "axios", + "message": "Use React Query instead (ADR-0001)" + }] + }] } } ``` +### Policy Types + +The `policy` block supports five types of enforcement: + +| Type | What It Enforces | Example | +|------|-----------------|---------| +| `imports` | Library restrictions | `disallow: [axios]`, `prefer: [react-query]` | +| `python` | Python-specific imports | `disallow_imports: [flask, django]` | +| `patterns` | Code pattern rules | Named rules with regex or structured queries, severity levels | +| `architecture` | Layer boundaries + required structure | `rule: "ui -> database"`, `action: block` | +| `config_enforcement` | Tool configuration | TypeScript tsconfig settings, Python ruff/mypy settings | +| `rationales` | Reasons for policies | `["Native async support required"]` | + +
+<details>
+<summary>Full policy schema reference</summary>
+
+```yaml
+policy:
+  imports:
+    disallow: [string]            # Libraries/packages to ban
+    prefer: [string]              # Recommended alternatives
+
+  python:
+    disallow_imports: [string]    # Python-specific module bans
+
+  patterns:
+    patterns:
+      rule_name:                  # Named pattern rules (dict)
+        description: string       # Human-readable description
+        language: string          # python, typescript, etc.
+        rule: string | object     # Regex string or structured query
+        severity: error | warning | info
+        autofix: boolean          # Whether autofix is available
+
+  architecture:
+    layer_boundaries:             # Access control between layers
+      - rule: string              # Format: "layer -> layer" (e.g., "ui -> database")
+        check: string             # Path pattern to check (glob)
+        action: block | warn      # How to enforce
+        message: string           # Custom error message
+    required_structure:           # Required files/directories
+      - path: string              # Required path (glob supported)
+        description: string       # Why this is required
+
+  config_enforcement:
+    typescript:
+      tsconfig: object            # Required tsconfig.json settings
+    python:
+      ruff: object                # Required Ruff configuration
+      mypy: object                # Required mypy configuration
+
+  rationales: [string]            # Reasons for the policies
+```
+
+</details>
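To make the schema concrete, a policy block can be expressed as a plain Python dict and sanity-checked for shape. This is an illustrative sketch only; `check_policy_shape` is a hypothetical helper, and ADR Kit performs its own, stricter validation:

```python
# Illustrative shape check for the policy block documented above.
# ADR Kit performs its own validation; this only checks top-level keys
# and the list-typed fields of the imports policy.

ALLOWED_POLICY_TYPES = {
    "imports", "python", "patterns", "architecture",
    "config_enforcement", "rationales",
}

def check_policy_shape(policy: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks valid."""
    problems = [
        f"unknown policy type: {key}"
        for key in policy
        if key not in ALLOWED_POLICY_TYPES
    ]
    imports = policy.get("imports", {})
    for field in ("disallow", "prefer"):
        if field in imports and not isinstance(imports[field], list):
            problems.append(f"imports.{field} must be a list")
    return problems

# Mirrors the React Query example earlier in this README:
policy = {
    "imports": {"disallow": ["axios"], "prefer": ["@tanstack/react-query"]},
    "rationales": ["Native async support required"],
}
assert check_policy_shape(policy) == []
assert check_policy_shape({"magic": {}}) == ["unknown policy type: magic"]
```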
+ +### Example ADRs + +See [tests/fixtures/examples/](tests/fixtures/examples/) for complete examples: +- `good-adr-with-structured-policy.md` — Full policy block in front-matter +- `bad-adr-no-policy.md` — No enforceable constraints (triggers quality gate) + ## AI Agent Integration ### Setup for Cursor IDE @@ -457,35 +517,6 @@ adr-kit setup-claude Or manually add to Claude Code MCP settings. -### Example Conversations - -**Scenario 1: Before Making a Decision** -``` -You: "I want to use PostgreSQL for this project" -AI: [calls adr_preflight({choice: "postgresql"})] -AI: "This requires an ADR. Let me help you create one..." -AI: [calls adr_create()] -AI: "Created ADR-0003 for PostgreSQL. Review and approve?" -``` - -**Scenario 2: Analyzing Existing Project** -``` -You: "What architectural decisions should we document?" -AI: [calls adr_analyze_project()] -AI: "I found: React, TypeScript, Next.js, PostgreSQL" -AI: "Creating ADRs for each decision..." -AI: [creates 4 ADRs] -AI: "Review these proposed ADRs?" -``` - -**Scenario 3: Getting Context for Task** -``` -You: "I need to implement user authentication" -AI: [calls adr_planning_context({task: "implement authentication"})] -AI: "Based on ADR-0005, you must use Auth0" -AI: "Here's the authentication pattern to follow..." 
-```
-
 ## Manual CLI Usage (Without AI)
 
 For direct usage without AI agents:
@@ -636,50 +667,77 @@ Expected output:
 - ✅ Workflow backend system: OK
 - 📡 6 MCP Tools available
 
-### Test Coverage
+### Reliability & Testing
 
-- **Unit Tests**: Individual workflow components
-- **Integration Tests**: MCP server ↔ workflow integration
-- **End-to-End Tests**: Complete lifecycle (analyze → create → approve)
-- **Error Scenarios**: Permission errors, malformed input
-- **Performance Tests**: Large projects, memory efficiency
+ADR Kit is thoroughly tested across multiple scenarios:
 
-Each workflow is testable Python code:
-```python
-workflow = CreationWorkflow(adr_dir="/test")
-result = workflow.execute(creation_input)
-assert result.success is True
-```
+- **Complete lifecycle flows**: Analyze → create → approve → enforce
+- **Error handling**: Permission issues, malformed input, conflicts
+- **Performance**: Large projects, memory efficiency
+- **MCP integration**: AI agent communication
+
+Each workflow is implemented as testable Python code with predictable, deterministic behavior.
 
 ## FAQ
 
-**Q: Do I need to use AI agents?**
-A: No. CLI works standalone. But AI agents make it much more useful—they can analyze your codebase, propose ADRs, and manage the entire lifecycle.
+**Q: I don't use AI agents. Is this useful for me?**
+A: ADR Kit works standalone for documentation and manual enforcement setup, and the CLI covers testing and CI/CD. But it's designed for the AI development workflow, which is where the real value lies.
+
+**Q: Does this replace code reviews?**
+A: No. ADR Kit catches architectural violations automatically (Layer 3), but code reviews catch logic errors, security issues, and design problems that lint rules can't detect. Think of it as an additional safety net.
 
-**Q: What if I don't use JavaScript/Python?**
-A: ADRs are still valuable for documentation. Policy enforcement is limited to supported linters (ESLint, Ruff, import-linter).
+**Q: What languages/frameworks are supported?** +A: **Layer 1 (lifecycle management)** and **Layer 2 (context loading)** are language-agnostic. **Layer 3 (enforcement)** currently supports JavaScript/TypeScript (ESLint) and Python (Ruff). Other languages require manual policy application until more linters are supported. **Q: Can I use this with existing ADRs?** -A: Yes. ADR Kit reads standard MADR format. Add the `policy:` section to enable enforcement. +A: Yes. ADR Kit reads standard MADR format. Add a `policy:` section to existing ADRs to enable automated enforcement. -**Q: Greenfield vs Brownfield—what's the difference?** -A: **Greenfield** = new project, create ADRs as you make decisions. -**Brownfield** = existing project, use `adr_analyze_project()` to find and document existing decisions. +**Q: What if my ADR doesn't map to lint rules?** +A: Not all architectural decisions can be enforced with linters. Example: "Use microservices architecture" can't become an ESLint rule. These decisions work at Layer 1 (lifecycle management) and Layer 2 (context loading) but not Layer 3 (enforcement). ADR Kit focuses on decisions that CAN be enforced: library choices, coding patterns, file structure. **Q: Does this work offline?** -A: Yes. No external API calls. Semantic search uses local models (sentence-transformers). +A: Yes. No external API calls. Semantic search uses local models (sentence-transformers). Your ADRs and policies stay on your machine. **Q: What's the difference between MCP tools and CLI commands?** -A: **MCP tools** (6) are for AI agents. **CLI commands** (20+) are for manual use, debugging, and CI/CD. Different interfaces for different purposes. +A: **MCP tools** (6) are the AI interface—how agents interact with ADR Kit. **CLI commands** (20+) are for manual operations, debugging, and CI/CD. Most users interact through AI; the CLI exists for edge cases. 
+
+## Current Capabilities
+
+**What ADR Kit Does Today:**
+
+- ✅ **Context loading**: Filters ADRs by relevance to the current task (planning_context tool)
+- ✅ **Implicit decision discovery**: Reads the codebase to discover architectural decisions already made but never documented (analyze_project tool)
+- ✅ **Conflict detection**: Analyzes whether discovered decisions are followed consistently or violated in parts of the codebase
+- ✅ **Greenfield support**: Create ADRs before or as you implement, for a strong foundation from the start
+- ✅ **Policy extraction**: Converts decision text into enforceable policies (import restrictions)
+- ✅ **ESLint/Ruff generation**: Creates lint rules from policies automatically
+- ✅ **Quality gate**: Prevents vague decisions that can't be enforced
+
+**Current Limitations:**
+
+- **Policy types**: Import restrictions work well. Pattern policies, architecture boundaries, and config enforcement are defined but not yet enforced.
+- **Language support**: JavaScript/TypeScript (ESLint) and Python (Ruff) today. Other languages require manual policy application.
+- **Enforcement**: Linter-based only. Runtime enforcement and CI gates are planned.
+
## What's Coming +**Enforcement Pipeline** (High Priority): +- Staged enforcement system (warn → block transitions) +- Complete enforcement loop with automated code scanning +- Import-linter template generation for Python projects + +**Developer Experience**: - Enhanced semantic search as primary conflict detection method -- Additional linter integrations (import-linter templates) - ADR templates for common decision types -- Static site generation for ADR documentation +- Static site generation for ADR documentation (Log4brains integration) + +**Recent Additions** (Since Feb 2026): +- Decision quality guidance system +- Expanded policy types: patterns, architecture rules, config enforcement +- AI warning extraction for task-specific guidance +- Policy suggestion engine with auto-detection -See [.agent/GAP_ANALYSIS.md](.agent/GAP_ANALYSIS.md) for detailed feature status and roadmap. +See [.agent/task-tracking.md](.agent/task-tracking.md) for detailed feature status and priority queue. ## Learn More diff --git a/adr_kit/mcp/models.py b/adr_kit/mcp/models.py index 242a0a9..27ef4b9 100644 --- a/adr_kit/mcp/models.py +++ b/adr_kit/mcp/models.py @@ -114,6 +114,9 @@ class CreateADRRequest(BaseModel): """, ) alternatives: str | None = Field(None, description="Alternative options considered") + skip_quality_gate: bool = Field( + False, description="Skip quality assessment (for testing or override)" + ) adr_dir: str = Field("docs/adr", description="ADR directory path") @@ -150,6 +153,9 @@ class SupersedeADRRequest(BaseModel): auto_approve: bool = Field( False, description="Automatically approve new ADR without human review" ) + skip_quality_gate: bool = Field( + False, description="Skip quality assessment (for testing or override)" + ) adr_dir: str = Field("docs/adr", description="ADR directory path") @@ -174,6 +180,16 @@ class PlanningContextRequest(BaseModel): adr_dir: str = Field("docs/adr", description="ADR directory path") +class DecisionGuidanceRequest(BaseModel): + 
"""Parameters for getting decision quality guidance.""" + + include_examples: bool = Field(True, description="Include good vs bad ADR examples") + focus_area: str | None = Field( + None, + description="Optional focus area (e.g., 'database', 'frontend') for tailored examples", + ) + + # Response Data Models for Tool-Specific Data diff --git a/adr_kit/mcp/server.py b/adr_kit/mcp/server.py index b24e28e..a56ba55 100644 --- a/adr_kit/mcp/server.py +++ b/adr_kit/mcp/server.py @@ -141,25 +141,56 @@ def adr_preflight(request: PreflightCheckRequest) -> dict[str, Any]: @mcp.tool() def adr_create(request: CreateADRRequest) -> dict[str, Any]: """ - Create new ADR with optional policy enforcement. + Create new ADR with quality assessment and policy guidance. WHEN TO USE: Document significant technical decisions. - RETURNS: ADR details + policy_guidance (if policies detected in content). - - Parameters: - - title, context, decision, consequences: Required ADR content - - policy (optional): Structured policy dict for enforcement - - If no policy provided, response includes policy_guidance with: - - suggestion: Auto-detected policy structure from decision text - - policy_reference: Complete documentation for all policy types - - example_usage: Code example with your ADR + suggested policy - - Use pattern-friendly language for auto-detection: - - "Don't use X" / "Prefer Y over X" → import policies - - "All X must be Y" → pattern policies - - "X must not access Y" → architecture boundaries - - "TypeScript strict mode required" → config enforcement + RETURNS: ADR details + quality_feedback + policy_guidance. + + ## ADR Structure (MADR Format) + + Your ADR should have four sections: + + 1. **Context** (WHY): The problem or opportunity that prompted this decision + - Current state and why it's insufficient + - Requirements and constraints + - Business/technical drivers + + 2. 
**Decision** (WHAT): The specific technology/pattern/approach chosen + - Explicit statement with technology names/versions + - Scope ('All new services', 'Frontend only') + - Constraints ('Don't use X', 'Must have Y') + + 3. **Consequences** (TRADE-OFFS): Both positive AND negative outcomes + - Benefits and improvements (### Positive) + - Drawbacks and limitations (### Negative) + - Risks and mitigation strategies + + 4. **Alternatives** (OPTIONAL but CRITICAL): What else was considered + - Each rejected option with specific reason + - Enables extraction of 'disallow' policies + + ## Quality Guidelines + + - **Be specific**: "Use React 18" not "use a modern framework" + - **Document trade-offs**: List BOTH pros and cons (every decision has negatives) + - **Explain WHY**: Context should justify the decision + - **State constraints explicitly**: "Don't use Flask" → enables policy extraction + - **Include alternatives**: Rejected options become 'disallow' policies + + ## Response Contents + + The response includes: + - **quality_feedback**: Assessment of decision quality with improvement suggestions + - **policy_guidance**: How to add automated enforcement (Task 2) + + ## Example + + Good decision language: + - "Use **FastAPI** for all new backend services. **Don't use Flask** or Django." + - "All FastAPI handlers must be async functions." + - "Frontend must not access database directly - use API layer." + + This enables automatic extraction of enforceable policies. 
""" try: logger.info(f"Creating ADR: {request.title}") @@ -175,6 +206,7 @@ def adr_create(request: CreateADRRequest) -> dict[str, Any]: tags=request.tags, policy=request.policy, alternatives=request.alternatives, + skip_quality_gate=request.skip_quality_gate, ) result = workflow.execute(input_data=creation_input) @@ -305,6 +337,7 @@ def adr_supersede(request: SupersedeADRRequest) -> dict[str, Any]: tags=request.new_tags, policy=request.new_policy, alternatives=request.new_alternatives, + skip_quality_gate=request.skip_quality_gate, ) supersede_input = SupersedeInput( diff --git a/adr_kit/workflows/base.py b/adr_kit/workflows/base.py index 7bec06f..8aa14e2 100644 --- a/adr_kit/workflows/base.py +++ b/adr_kit/workflows/base.py @@ -16,6 +16,9 @@ class WorkflowStatus(str, Enum): FAILED = "failed" VALIDATION_ERROR = "validation_error" CONFLICT_ERROR = "conflict_error" + REQUIRES_ACTION = ( + "requires_action" # Quality gate or other check requires user action + ) @dataclass diff --git a/adr_kit/workflows/creation.py b/adr_kit/workflows/creation.py index 225a719..5e48446 100644 --- a/adr_kit/workflows/creation.py +++ b/adr_kit/workflows/creation.py @@ -26,6 +26,7 @@ class CreationInput: tags: list[str] | None = None policy: dict[str, Any] | None = None # Structured policy block alternatives: str | None = None # Alternative options considered + skip_quality_gate: bool = False # Skip quality gate (for testing or override) @dataclass @@ -60,7 +61,7 @@ class CreationWorkflow(BaseWorkflow): def execute( self, input_data: CreationInput | None = None, **kwargs: Any ) -> WorkflowResult: - """Execute ADR creation workflow.""" + """Execute ADR creation workflow with quality gate.""" # Use positional input_data if provided, otherwise extract from kwargs if input_data is None: input_data = kwargs.get("input_data") @@ -70,15 +71,53 @@ def execute( self._start_workflow("Create ADR") try: - # Step 1: Generate ADR ID - adr_id = self._execute_step("generate_adr_id", 
self._generate_adr_id) - - # Step 2: Validate input + # Step 1: Basic validation (minimum requirements) self._execute_step( "validate_input", self._validate_creation_input, input_data ) - # Step 3: Check conflicts + # Step 2: Quality gate - run BEFORE any file operations (unless skipped) + quality_feedback = None + if not input_data.skip_quality_gate: + quality_feedback = self._execute_step( + "quality_gate", self._quick_quality_gate, input_data + ) + + # Step 3: Check quality threshold + if not quality_feedback.get("passes_threshold", True): + # Quality below threshold - BLOCK creation and return feedback + self._complete_workflow( + success=False, + message=f"Quality threshold not met (score: {quality_feedback['quality_score']}/{quality_feedback['threshold']})", + status=WorkflowStatus.REQUIRES_ACTION, + ) + self.result.data = { + "quality_feedback": quality_feedback, + "correction_prompt": ( + "Please address the quality issues identified above and resubmit. " + "Focus on high-priority issues first for maximum impact." 
+ ), + } + self.result.next_steps = quality_feedback.get("next_steps", []) + return self.result + else: + # Quality gate skipped - generate basic feedback for backward compatibility + quality_feedback = { + "quality_score": None, + "grade": None, + "passes_threshold": True, + "summary": "Quality gate skipped (skip_quality_gate=True)", + "issues": [], + "strengths": [], + "recommendations": [], + "next_steps": [], + } + + # Quality passed threshold - proceed with ADR creation + # Step 4: Generate ADR ID + adr_id = self._execute_step("generate_adr_id", self._generate_adr_id) + + # Step 5: Check conflicts related_adrs = self._execute_step( "find_related_adrs", self._find_related_adrs, input_data ) @@ -86,12 +125,12 @@ def execute( "check_conflicts", self._detect_conflicts, input_data, related_adrs ) - # Step 4: Create ADR content + # Step 6: Create ADR content adr = self._execute_step( "create_adr_content", self._build_adr_structure, adr_id, input_data ) - # Step 5: Write ADR file + # Step 7: Write ADR file (only happens if quality passed) file_path = self._execute_step( "write_adr_file", self._generate_adr_file, adr ) @@ -117,7 +156,7 @@ def execute( review_required=review_required, ) - # Generate policy suggestions if no policy was provided + # Generate policy suggestions if no policy was provided (Task 2) policy_guidance = self._generate_policy_guidance(adr, input_data) self._complete_workflow( @@ -125,7 +164,8 @@ def execute( ) self.result.data = { "creation_result": result, - "policy_guidance": policy_guidance, # New: return policy guidance to agent + "quality_feedback": quality_feedback, # Task 1: Quality gate results + "policy_guidance": policy_guidance, # Task 2: Policy construction guidance } self.result.guidance = next_steps self.result.next_steps = self._generate_next_steps_list( @@ -176,18 +216,35 @@ def _generate_adr_id(self) -> str: return f"ADR-{next_num:04d}" def _validate_creation_input(self, input_data: CreationInput) -> None: - """Validate the input 
data for ADR creation.""" + """Validate the input data for ADR creation with helpful error messages.""" if not input_data.title or len(input_data.title.strip()) < 3: - raise ValueError("Title must be at least 3 characters") + raise ValueError( + "Title must be at least 3 characters. " + "Example: 'Use PostgreSQL for Primary Database' or 'Use React 18 with TypeScript'" + ) if not input_data.context or len(input_data.context.strip()) < 10: - raise ValueError("Context must be at least 10 characters") + raise ValueError( + "Context must be at least 10 characters. " + "Context should explain WHY this decision is needed - the problem or opportunity. " + "Example: 'We need ACID transactions for financial data integrity. Current SQLite " + "setup doesn't support concurrent writes from multiple services.'" + ) if not input_data.decision or len(input_data.decision.strip()) < 5: - raise ValueError("Decision must be at least 5 characters") + raise ValueError( + "Decision must be at least 5 characters. " + "Decision should state WHAT specific technology/pattern/approach is chosen. " + "Example: 'Use PostgreSQL 15 as the primary database. Don't use MySQL or MongoDB.' " + "Be specific and include explicit constraints." + ) if not input_data.consequences or len(input_data.consequences.strip()) < 5: - raise ValueError("Consequences must be at least 5 characters") + raise ValueError( + "Consequences must be at least 5 characters. " + "Consequences should document BOTH positive and negative outcomes (trade-offs). " + "Example: '+ ACID compliance, + Rich features, - Higher resource usage, - Ops expertise required'" + ) if input_data.status and input_data.status not in [ "proposed", @@ -491,6 +548,9 @@ def _validate_policy_completeness( """Validate that ADR has extractable policy information. Returns list of warnings if policy is missing or insufficient. + + Note: This is a lightweight check. 
Policy construction guidance is provided + via the policy_guidance promptlet, which agents can use to construct policies. """ from ..core.policy_extractor import PolicyExtractor @@ -499,338 +559,14 @@ def _validate_policy_completeness( # Check if policy is extractable if not extractor.has_extractable_policy(adr): - # Analyze decision and alternatives to suggest policy - alternatives_text = creation_input.alternatives or "" - suggested = self._suggest_policy_from_alternatives( - creation_input.decision, alternatives_text + # Provide brief warning - detailed guidance is in policy_guidance promptlet + warnings.append( + "⚠️ No structured policy provided. Review the policy_guidance in the response " + "for instructions on constructing enforcement policies." ) - if suggested: - # Policy could be auto-generated from content - import json - - warnings.append( - "⚠️ No structured policy provided, but enforceable policies detected in content." - ) - warnings.append( - f"📋 Suggested policy structure:\n{json.dumps(suggested, indent=2)}" - ) - warnings.append( - "💡 To enable automatic enforcement, include a 'policy' block with this structure when creating the ADR." - ) - else: - # No detectable policy at all - warnings.append( - "⚠️ No structured policy provided and no enforceable policies detected in content." - ) - warnings.append( - "📖 Use pattern-friendly language to enable constraint extraction:\n" - " • Import restrictions: 'Don't use X', 'Prefer Y over X'\n" - " • Code patterns: 'All X must be Y', 'X must have Y'\n" - " • Architecture: 'X must not access Y', 'Required: path/to/file'\n" - " • Config: 'TypeScript strict mode required', 'Ruff must check imports'" - ) - warnings.append( - "💡 Or include a structured 'policy' block for guaranteed enforcement." - ) - return warnings - def _suggest_policy_from_alternatives( - self, decision: str, alternatives: str - ) -> dict[str, Any] | None: - """Suggest policy structure based on decision and alternatives text. 
- - This is a comprehensive policy suggestion engine that analyzes the - decision and alternatives to detect enforceable policies across all - policy types: imports, patterns, architecture, and config enforcement. - """ - # Combine decision and alternatives for comprehensive analysis - full_text = f"{decision}\n\n{alternatives}" - - suggested_policy: dict[str, Any] = {} - - # 1. Extract Import Policies - import_policy = self._suggest_import_policies(full_text, decision) - if import_policy: - suggested_policy["imports"] = import_policy - - # 2. Extract Pattern Policies - pattern_policy = self._suggest_pattern_policies(full_text) - if pattern_policy: - suggested_policy["patterns"] = pattern_policy - - # 3. Extract Architecture Policies - architecture_policy = self._suggest_architecture_policies(full_text) - if architecture_policy: - suggested_policy["architecture"] = architecture_policy - - # 4. Extract Config Enforcement Policies - config_policy = self._suggest_config_policies(full_text) - if config_policy: - suggested_policy["config_enforcement"] = config_policy - - # 5. 
Extract Rationales - rationales = self._suggest_rationales(full_text) - if rationales: - suggested_policy["rationales"] = rationales - - return suggested_policy if suggested_policy else None - - def _suggest_import_policies( - self, full_text: str, decision: str - ) -> dict[str, Any] | None: - """Suggest import/library policies from text.""" - disallow = set() - prefer = set() - - # Pattern 1: "Don't use X", "Avoid X", "Ban X", "X is deprecated" - ban_patterns = [ - r"(?i)(?:don't\s+use|avoid|ban|deprecated?)\s+([a-zA-Z0-9\-_@/.]+)", - r"(?i)no\s+longer\s+use\s+([a-zA-Z0-9\-_@/.]+)", - r"(?i)([a-zA-Z0-9\-_@/.]+)\s+is\s+deprecated", - ] - - for pattern in ban_patterns: - matches = re.findall(pattern, full_text) - for match in matches: - normalized = self._normalize_library_name(match) - if normalized: - disallow.add(normalized) - - # Pattern 2: "Use Y instead of X", "Prefer Y over X", "Replace X with Y" - preference_patterns = [ - r"(?i)use\s+([a-zA-Z0-9\-_@/.]+)\s+instead\s+of\s+([a-zA-Z0-9\-_@/.]+)", - r"(?i)prefer\s+([a-zA-Z0-9\-_@/.]+)\s+over\s+([a-zA-Z0-9\-_@/.]+)", - r"(?i)replace\s+([a-zA-Z0-9\-_@/.]+)\s+with\s+([a-zA-Z0-9\-_@/.]+)", - ] - - for pattern in preference_patterns: - matches = re.findall(pattern, full_text) - for match in matches: - if len(match) == 2: # (preferred, deprecated) - preferred, deprecated = match - preferred_norm = self._normalize_library_name(preferred) - deprecated_norm = self._normalize_library_name(deprecated) - if preferred_norm: - prefer.add(preferred_norm) - if deprecated_norm: - disallow.add(deprecated_norm) - - # Pattern 3: Extract from alternatives section - # "### Technology Name\n- Rejected" - heading_matches = re.findall( - r"(?i)###\s+([a-zA-Z0-9\-_@/. 
]+?)\n\s*-\s*Reject(?:ed)?", full_text - ) - for match in heading_matches: - first_word = match.strip().split()[0] if match.strip().split() else "" - if first_word and len(first_word) > 2: - if re.match(r"^[A-Za-z][A-Za-z0-9\-_.]*$", first_word): - normalized = self._normalize_library_name(first_word) - if normalized: - disallow.add(normalized) - - # Pattern 3b: "Rejected X and Y" format - rejected_and_pattern = r"(?i)Rejected?\s+([A-Za-z][A-Za-z0-9\-_.@/]*)" - rejected_and_matches = re.findall(rejected_and_pattern, full_text) - for match in rejected_and_matches: - normalized = self._normalize_library_name(match) - if normalized: - disallow.add(normalized) - - # Pattern 4: Extract chosen technology from decision - use_matches = re.findall(r"(?i)Use\s+([a-zA-Z0-9\-_@/.]+)", decision) - for match in use_matches: - normalized = self._normalize_library_name(match) - if normalized: - prefer.add(normalized) - - if disallow or prefer: - return { - "disallow": sorted(disallow) if disallow else None, - "prefer": sorted(prefer) if prefer else None, - } - - return None - - def _suggest_pattern_policies(self, full_text: str) -> dict[str, Any] | None: - """Suggest code pattern policies from text.""" - patterns_dict = {} - - # Pattern 1: "All X must be Y" - all_must_patterns = re.findall( - r"(?i)all\s+([a-zA-Z0-9\-_\s]+?)\s+must\s+be\s+([a-zA-Z0-9\-_\s]+)", - full_text, - ) - for _idx, (subject, requirement) in enumerate(all_must_patterns, start=1): - rule_name = f"all_{subject.strip().lower().replace(' ', '_')}_must_be_{requirement.strip().lower().replace(' ', '_')}" - patterns_dict[rule_name] = { - "description": f"All {subject.strip()} must be {requirement.strip()}", - "severity": "error", - "rule": f"{subject.strip()}.*{requirement.strip()}", # Simple regex placeholder - } - - # Pattern 2: "X must have Y" or "X must include Y" - must_have_patterns = re.findall( - r"(?i)([a-zA-Z0-9\-_\s]+?)\s+must\s+(?:have|include)\s+([a-zA-Z0-9\-_\s]+)", - full_text, - ) - for _idx, 
(subject, requirement) in enumerate(must_have_patterns, start=1): - rule_name = f"{subject.strip().lower().replace(' ', '_')}_must_have_{requirement.strip().lower().replace(' ', '_')}" - patterns_dict[rule_name] = { - "description": f"{subject.strip()} must have {requirement.strip()}", - "severity": "error", - "rule": f"{subject.strip()}.*{requirement.strip()}", - } - - # Pattern 3: "No X allowed" or "X is forbidden" - no_allowed_patterns = re.findall( - r"(?i)no\s+([a-zA-Z0-9\-_\s]+?)\s+(?:allowed|permitted)", full_text - ) - for match in no_allowed_patterns: - rule_name = f"no_{match.strip().lower().replace(' ', '_')}_allowed" - patterns_dict[rule_name] = { - "description": f"No {match.strip()} allowed", - "severity": "error", - "rule": f"(?!.*{match.strip()})", # Negative lookahead - } - - return {"patterns": patterns_dict} if patterns_dict else None - - def _suggest_architecture_policies(self, full_text: str) -> dict[str, Any] | None: - """Suggest architecture policies (boundaries + structure) from text.""" - layer_boundaries = [] - required_structure = [] - - # Pattern 1: "X must not access/call/use Y" - boundary_patterns = [ - r"(?i)([a-zA-Z0-9\-_]+)\s+must\s+not\s+(?:access|call|use|import)\s+([a-zA-Z0-9\-_]+)", - r"(?i)no\s+direct\s+access\s+from\s+([a-zA-Z0-9\-_]+)\s+to\s+([a-zA-Z0-9\-_]+)", - r"(?i)([a-zA-Z0-9\-_]+)\s+(?:cannot|should\s+not)\s+(?:access|import)\s+([a-zA-Z0-9\-_]+)", - ] - - for pattern in boundary_patterns: - matches = re.findall(pattern, full_text) - for source, target in matches: - layer_boundaries.append( - { - "rule": f"{source.strip()} -> {target.strip()}", - "action": "block", - "message": f"{source.strip()} must not access {target.strip()}", - } - ) - - # Pattern 2: "Required: path/to/file" - required_patterns = re.findall( - r"(?i)required:\s+([a-zA-Z0-9\-_/.]+)", full_text - ) - for path in required_patterns: - required_structure.append( - {"path": path.strip(), "description": f"Required: {path.strip()}"} - ) - - # Pattern 3: 
"Must have X directory/file" - must_have_structure = re.findall( - r"(?i)must\s+have\s+([a-zA-Z0-9\-_/.]+)\s+(directory|file|folder)", - full_text, - ) - for path, _ in must_have_structure: - required_structure.append( - {"path": path.strip(), "description": f"Required {path.strip()}"} - ) - - policy = {} - if layer_boundaries: - policy["layer_boundaries"] = layer_boundaries - if required_structure: - policy["required_structure"] = required_structure - - return policy if policy else None - - def _suggest_config_policies(self, full_text: str) -> dict[str, Any] | None: - """Suggest configuration enforcement policies from text.""" - config_policy = {} - - # TypeScript config patterns - ts_patterns: dict[str, dict[str, Any]] = { - r"(?i)typescript.*strict\s+mode": {"tsconfig": {"strict": True}}, - r"(?i)tsconfig.*strict.*true": {"tsconfig": {"strict": True}}, - r"(?i)enable.*noImplicitAny": { - "tsconfig": {"compilerOptions": {"noImplicitAny": True}} - }, - } - - typescript_config: dict[str, Any] = {} - for pattern, config in ts_patterns.items(): - if re.search(pattern, full_text): - typescript_config.update(config) - - if typescript_config: - config_policy["typescript"] = typescript_config - - # Python config patterns - py_patterns: dict[str, dict[str, Any]] = { - r"(?i)ruff.*check.*imports": {"ruff": {"lint": {"select": ["I"]}}}, - r"(?i)mypy.*strict": {"mypy": {"strict": True}}, - } - - python_config: dict[str, Any] = {} - for pattern, config in py_patterns.items(): - if re.search(pattern, full_text): - python_config.update(config) - - if python_config: - config_policy["python"] = python_config - - return config_policy if config_policy else None - - def _suggest_rationales(self, full_text: str) -> list[str] | None: - """Extract rationales for the policies from content.""" - rationales = set() - - # Pattern 1: "For X" or "To X" - rationale_patterns = [ - r"(?i)for\s+(performance|security|maintainability|consistency|bundle\s+size|scalability)", - 
r"(?i)to\s+(?:improve|enhance|ensure|maintain)\s+(performance|security|maintainability|consistency)", - r"(?i)(?:better|improved)\s+(performance|security|maintainability|developer\s+experience|dx)", - r"(?i)because\s+(?:of\s+)?([^.]+)", - ] - - for pattern in rationale_patterns: - matches = re.findall(pattern, full_text) - for match in matches: - rationale = match.strip().replace("_", " ").capitalize() - if len(rationale) > 5: # Filter out too-short matches - rationales.add(rationale) - - return sorted(rationales) if rationales else None - - def _normalize_library_name(self, name: str) -> str | None: - """Normalize library names using common mappings.""" - # Common library name mappings for normalization - library_mappings = { - "react-query": "@tanstack/react-query", - "react query": "@tanstack/react-query", - "tanstack query": "@tanstack/react-query", - "axios": "axios", - "fetch": "fetch", - "lodash": "lodash", - "moment": "moment", - "momentjs": "moment", - "moment.js": "moment", - "date-fns": "date-fns", - "dayjs": "dayjs", - "jquery": "jquery", - "underscore": "underscore", - "flask": "flask", - "django": "django", - "fastapi": "fastapi", - "express": "express", - } - - name_lower = name.lower().strip() - return library_mappings.get(name_lower, name if len(name) > 1 else None) - def _generate_adr_file(self, adr: ADR) -> str: """Generate the ADR file.""" # Create filename with slugified title @@ -882,27 +618,8 @@ def _generate_madr_content(self, adr: ADR) -> str: lines.append("---") lines.append("") - # MADR content - lines.append("## Context") - lines.append("") - lines.append(adr.context) - lines.append("") - - lines.append("## Decision") - lines.append("") - lines.append(adr.decision) - lines.append("") - - lines.append("## Consequences") - lines.append("") - lines.append(adr.consequences) - lines.append("") - - if adr.alternatives: - lines.append("## Alternatives") - lines.append("") - lines.append(adr.alternatives) - lines.append("") + # MADR content 
sections (already formatted in adr.content) + lines.append(adr.content) return "\n".join(lines) @@ -995,73 +712,69 @@ def _generate_policy_guidance( ) -> dict[str, Any] | None: """Generate policy guidance promptlet for agents. - This method creates actionable guidance for agents when policies - could be extracted from the ADR content but weren't provided - as structured policy in the front-matter. + This method provides a structured promptlet that guides reasoning agents + through the process of constructing enforcement policies. Rather than + using regex to extract policies from text (which is fragile and redundant), + we provide the schema and let the agent reason about how to map their + architectural decision to the available policy capabilities. + + This follows the principle: "ADR Kit provides structure, agents provide intelligence." Returns: - Policy guidance dict with suggestions, or None if policy already provided + Policy guidance dict with schema and reasoning prompts, or None if policy already provided """ # If policy was already provided, no guidance needed if adr.front_matter.policy: return { "has_policy": True, "message": "✅ Structured policy provided and validated", - "suggestion": None, } - # Analyze decision and alternatives to detect enforceable policies - alternatives_text = creation_input.alternatives or "" - suggested = self._suggest_policy_from_alternatives( - creation_input.decision, alternatives_text - ) - - if suggested: - # Enforceable policies detected - provide guidance - import json - - return { - "has_policy": False, - "detectable": True, - "message": ( - "📋 Enforceable policies detected in ADR content but no structured policy provided. " - "To enable automatic enforcement, include a 'policy' parameter when creating ADRs." 
- ), - "suggestion": suggested, - "suggestion_json": json.dumps(suggested, indent=2), - "example_usage": ( - f"adr_create(\n" - f" title='{creation_input.title}',\n" - f" context='{creation_input.context[:50]}...',\n" - f" decision='{creation_input.decision[:50]}...',\n" - f" consequences='{creation_input.consequences[:50]}...',\n" - f" policy={json.dumps(suggested)}\n" - f")" + # No policy provided - guide the agent through policy construction + return { + "has_policy": False, + "message": ( + "📋 No policy provided. To enable automated enforcement, review your " + "architectural decision and construct a policy dict using the schema below." + ), + "agent_task": { + "role": "Policy Constructor", + "objective": ( + "Analyze your architectural decision and identify enforceable constraints " + "that can be automated. Map these constraints to the policy schema capabilities." ), - "guidance": [ - "Use the suggested policy structure to enable enforcement", - "Adjust the policy dict based on your specific requirements", - "Call adr_create() again with the policy parameter", + "reasoning_steps": [ + "1. Review your decision text for enforceable rules (what you said 'yes' or 'no' to)", + "2. Identify which policy types apply (imports, patterns, architecture, config)", + "3. Map your constraints to the schema structures below", + "4. Construct a policy dict with only the relevant policy types", + "5. Call adr_create() again with the policy parameter", ], - "policy_reference": self._build_policy_reference(), - } - else: - # No enforceable policies detected - return { - "has_policy": False, - "detectable": False, - "message": ( - "⚠️ No structured policy provided and no enforceable policies detected in content. " - "Use pattern-friendly language to enable constraint extraction." + "focus": ( + "Look for explicit constraints in your decision: library choices, " + "code patterns, architectural boundaries, or configuration requirements." 
), - "guidance": [ - "Import restrictions: 'Don't use X', 'Prefer Y over X'", - "Code patterns: 'All X must be Y', 'X must have Y'", - "Architecture: 'X must not access Y', 'Required: path/to/file'", - "Config: 'TypeScript strict mode required', 'Ruff must check imports'", - ], - "suggestion": None, - } + }, + "policy_capabilities": self._build_policy_reference(), + "example_workflow": { + "scenario": "Decision says: 'Use FastAPI. Don't use Flask or Django due to lack of async support.'", + "reasoning": "This is an import restriction - FastAPI is preferred, Flask/Django are disallowed.", + "constructed_policy": { + "imports": { + "disallow": ["flask", "django"], + "prefer": ["fastapi"], + }, + "rationales": ["Native async support required for I/O operations"], + }, + "next_call": "adr_create(..., policy={...})", + }, + "guidance": [ + "Only create policies for explicit constraints in your decision", + "Don't invent constraints that weren't in your decision", + "Multiple policy types can be combined in one policy dict", + "Rationales help explain why constraints exist", + ], + } def _build_policy_reference(self) -> dict[str, Any]: """Build comprehensive policy structure reference documentation. @@ -1160,3 +873,600 @@ def _build_policy_reference(self) -> dict[str, Any]: ], }, } + + def _quick_quality_gate(self, creation_input: CreationInput) -> dict[str, Any]: + """Quick quality gate that runs BEFORE ADR file creation. + + This pre-validation check runs deterministic quality checks on the input + to ensure decision quality meets the minimum threshold BEFORE creating + any files. This enables a correction loop without file pollution. 
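The deduct-from-100 scheme this gate uses can be sketched in isolation. This is an illustrative, standalone sketch, not the method itself: the helper names (`score_to_grade`, `apply_deductions`) are hypothetical, while the threshold (75) and grade bands (A=90+, B=75+, C=60+, D=40+) match the values used further down in this method.

```python
# Standalone sketch of the quality gate's scoring scheme (names hypothetical).
# The threshold and grade bands mirror the constants used in _quick_quality_gate.

QUALITY_THRESHOLD = 75  # B grade minimum; lower scores block ADR creation


def score_to_grade(score: int) -> str:
    """Map a 0-100 quality score to a letter grade (A=90+, B=75+, C=60+, D=40+)."""
    for floor, grade in ((90, "A"), (75, "B"), (60, "C"), (40, "D")):
        if score >= floor:
            return grade
    return "F"


def apply_deductions(deductions: list[int]) -> tuple[int, str, bool]:
    """Start at a perfect 100, subtract each deduction, clamp to 0-100, and grade."""
    score = max(0, min(100, 100 - sum(deductions)))
    return score, score_to_grade(score), score >= QUALITY_THRESHOLD


# One-sided consequences (-25) plus a too-brief context (-20) drop the score
# to 55 ("D"), below the threshold, so ADR creation would be blocked.
score, grade, passes = apply_deductions([25, 20])
```

Because the gate runs before any file is written, a failing score triggers the correction loop without leaving half-finished ADR files behind.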
+ + Args: + creation_input: The input data for ADR creation + + Returns: + Quality assessment with passes_threshold boolean and feedback + """ + issues = [] + strengths = [] + score = 100 # Start perfect, deduct points for issues + QUALITY_THRESHOLD = 75 # B grade minimum (anything lower blocks creation) + + context_text = creation_input.context.lower() + decision_text = creation_input.decision.lower() + consequences_text = creation_input.consequences.lower() + + # Check 1: Specificity - detect generic/vague language + generic_terms = [ + "modern", + "good", + "best", + "framework", + "library", + "tool", + "better", + "nice", + ] + vague_count = sum( + 1 for term in generic_terms if term in decision_text or term in context_text + ) + + if vague_count >= 2: + score -= 15 + issues.append( + { + "category": "specificity", + "severity": "medium", + "issue": f"Decision uses {vague_count} generic terms ('{', '.join([t for t in generic_terms if t in decision_text or t in context_text][:3])}...')", + "suggestion": "Replace generic terms with specific technology names and versions", + "example_fix": "Instead of 'use a modern framework', write 'Use React 18 with TypeScript'", + } + ) + else: + strengths.append("Decision uses specific, concrete terminology") + + # Check 2: Balanced consequences - must have BOTH pros AND cons + positive_keywords = [ + "benefit", + "advantage", + "positive", + "improve", + "better", + "gain", + ] + negative_keywords = [ + "drawback", + "limitation", + "negative", + "cost", + "risk", + "challenge", + ] + + has_positives = any(kw in consequences_text for kw in positive_keywords) + has_negatives = any(kw in consequences_text for kw in negative_keywords) + + if not (has_positives and has_negatives): + score -= 25 + issues.append( + { + "category": "balance", + "severity": "high", + "issue": "Consequences are one-sided (only pros or only cons)", + "suggestion": "Document BOTH positive and negative consequences - every technical decision has 
trade-offs", + "example_fix": "Add '### Negative' section listing drawbacks, limitations, or risks", + "why_it_matters": "Balanced trade-off analysis enables informed decision-making", + } + ) + else: + strengths.append("Consequences show balanced trade-off analysis") + + # Check 3: Context quality - sufficient detail + context_length = len(creation_input.context) + + if context_length < 50: + score -= 20 + issues.append( + { + "category": "context", + "severity": "high", + "issue": f"Context is too brief ({context_length} characters)", + "suggestion": "Expand context to explain WHY this decision is needed: current state, requirements, drivers", + "example_fix": "Add: business requirements, technical constraints, user needs", + } + ) + elif context_length >= 150: + strengths.append("Context provides detailed problem background") + + # Check 4: Explicit constraints - policy-ready language + import re + + constraint_patterns = [ + r"\bdon[''']t\s+use\b", + r"\bmust\s+not\s+use\b", + r"\bavoid\b", + r"\bmust\s+(?:use|have|be)\b", + r"\ball\s+\w+\s+must\b", + ] + + has_explicit_constraints = any( + re.search(pattern, decision_text, re.IGNORECASE) + for pattern in constraint_patterns + ) + + if not has_explicit_constraints: + score -= 15 + issues.append( + { + "category": "policy_readiness", + "severity": "medium", + "issue": "Decision lacks explicit constraints (enables policy extraction)", + "suggestion": "Add explicit constraints using 'Don't use X', 'Must use Y', 'All Z must...'", + "example_fix": "Use FastAPI for APIs. 
**Don't use Flask** or Django.", + "why_it_matters": "Explicit constraints enable automated policy enforcement (Task 2)", + } + ) + else: + strengths.append( + "Decision includes explicit constraints ready for policy extraction" + ) + + # Check 5: Alternatives - critical for 'disallow' policies + if not creation_input.alternatives or len(creation_input.alternatives) < 20: + score -= 15 + issues.append( + { + "category": "alternatives", + "severity": "medium", + "issue": "Missing or minimal alternatives section", + "suggestion": "Document rejected alternatives with specific reasons", + "example_fix": "### MySQL\\n**Rejected**: Weaker JSON support\\n\\n### MongoDB\\n**Rejected**: Conflicts with ACID requirements", + "why_it_matters": "Alternatives section enables extraction of 'disallow' policies", + } + ) + else: + strengths.append("Alternatives documented (enables disallow policies)") + + # Check 6: Decision completeness + decision_length = len(creation_input.decision) + + if decision_length < 30: + score -= 10 + issues.append( + { + "category": "completeness", + "severity": "low", + "issue": f"Decision is very brief ({decision_length} characters)", + "suggestion": "Expand decision with: specific technology, scope, and constraints", + "example_fix": "Use PostgreSQL 15 for all application data. Deploy on AWS RDS with Multi-AZ.", + } + ) + + # Clamp score to valid range + score = max(0, min(100, score)) + + # Determine grade (A=90+, B=75+, C=60+, D=40+, F=<40) + if score >= 90: + grade = "A" + elif score >= 75: + grade = "B" + elif score >= 60: + grade = "C" + elif score >= 40: + grade = "D" + else: + grade = "F" + + passes_threshold = score >= QUALITY_THRESHOLD + + # Generate summary + if passes_threshold: + summary = f"Decision quality is acceptable (Grade {grade}, {score}/100). {len(issues)} minor improvements suggested." + else: + summary = f"Decision quality is below threshold (Grade {grade}, {score}/100). 
{len(issues)} issues must be addressed before ADR creation." + + # Generate prioritized recommendations + high_priority = [i for i in issues if i["severity"] == "high"] + medium_priority = [i for i in issues if i["severity"] == "medium"] + + recommendations = [] + if high_priority: + recommendations.append( + f"🔴 **High Priority**: Fix {len(high_priority)} critical issues first" + ) + for issue in high_priority[:2]: # Top 2 high priority + recommendations.append( + f" - {issue['category'].title()}: {issue['suggestion']}" + ) + + if medium_priority and score < QUALITY_THRESHOLD: + recommendations.append( + f"🟡 **Medium Priority**: Address {len(medium_priority)} quality issues" + ) + for issue in medium_priority[:2]: # Top 2 medium priority + recommendations.append( + f" - {issue['category'].title()}: {issue['suggestion']}" + ) + + # Next steps vary by quality score + next_steps = [] + if not passes_threshold: + next_steps.append( + "⛔ **ADR Creation Blocked**: Quality score below threshold" + ) + next_steps.append( + "📝 **Action Required**: Address the issues above and resubmit" + ) + next_steps.append( + "💡 **Tip**: Focus on high-priority issues first for maximum impact" + ) + else: + next_steps.append( + "✅ **Quality Gate Passed**: ADR will be created with this input" + ) + if issues: + next_steps.append( + f"💡 **Optional**: Consider addressing {len(issues)} suggestions for even higher quality" + ) + + return { + "quality_score": score, + "grade": grade, + "passes_threshold": passes_threshold, + "threshold": QUALITY_THRESHOLD, + "summary": summary, + "issues": issues, + "strengths": strengths, + "recommendations": recommendations, + "next_steps": next_steps, + } + + def _assess_decision_quality( + self, adr: ADR, creation_input: CreationInput + ) -> dict[str, Any]: + """Assess decision quality and provide targeted feedback. 
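The specificity heuristic used by this assessment can be illustrated standalone. A minimal sketch, assuming a substring scan over a fixed term list; the function name is hypothetical and the term list is abbreviated from the one defined in this method:

```python
# Sketch of the "specificity" check: flag generic terms appearing in the
# decision or title text (term list abbreviated; name hypothetical).

GENERIC_TERMS = ["modern", "good", "best", "framework", "library", "tool"]


def find_vague_terms(decision: str, title: str = "") -> list[str]:
    """Return the generic terms present in either the decision or the title."""
    text = f"{decision} {title}".lower()
    return [term for term in GENERIC_TERMS if term in text]


# A vague decision trips several terms; a specific one trips none.
vague = find_vague_terms("Use a modern framework", "Choose a good tool")
specific = find_vague_terms("Use React 18 with TypeScript", "Frontend stack")
```

A non-empty result maps to a "specificity" issue with a score deduction, while an empty result is recorded as a strength.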
+ + This implements Task 1 of the two-step ADR creation flow: + - Task 1 (this method): Assess decision quality and provide guidance + - Task 2 (_generate_policy_guidance): Extract enforceable policies + + The assessment identifies common quality issues and provides actionable + feedback to help agents improve their ADRs. It follows the principle: + "ADR Kit provides structure, agents provide intelligence." + + Args: + adr: The created ADR + creation_input: The input data used to create the ADR + + Returns: + Quality assessment with issues found and improvement suggestions + """ + issues = [] + strengths = [] + score = 100 # Start with perfect score, deduct for issues + + # Check 1: Specificity (are technology names specific?) + generic_terms = [ + "modern", + "good", + "best", + "framework", + "library", + "tool", + "system", + "platform", + ] + decision_lower = adr.decision.lower() + title_lower = adr.title.lower() + + vague_terms_found = [ + term + for term in generic_terms + if term in decision_lower or term in title_lower + ] + if vague_terms_found: + issues.append( + { + "category": "specificity", + "severity": "medium", + "issue": f"Decision uses generic terms: {', '.join(vague_terms_found)}", + "suggestion": ( + "Replace generic terms with specific technology names and versions. " + "Example: Instead of 'modern framework', use 'React 18' or 'FastAPI 0.104'." + ), + "example_fix": { + "bad": "Use a modern web framework", + "good": "Use React 18 with TypeScript for frontend development", + }, + } + ) + score -= 15 + else: + strengths.append("Decision is specific with clear technology choices") + + # Check 2: Balanced consequences (are there both pros AND cons?) 
+ consequences_lower = adr.consequences.lower() + has_positives = any( + word in consequences_lower + for word in [ + "benefit", + "advantage", + "positive", + "+", + "pro:", + "pros:", + "good", + "better", + "improve", + ] + ) + has_negatives = any( + word in consequences_lower + for word in [ + "drawback", + "disadvantage", + "negative", + "-", + "con:", + "cons:", + "risk", + "limitation", + "downside", + "trade-off", + "tradeoff", + ] + ) + + if not (has_positives and has_negatives): + issues.append( + { + "category": "balance", + "severity": "high", + "issue": "Consequences appear one-sided (missing pros or cons)", + "suggestion": ( + "Every technical decision has trade-offs. Document BOTH positive outcomes " + "AND negative consequences honestly. Use structure like:\n" + "### Positive\n- Benefit 1\n- Benefit 2\n\n" + "### Negative\n- Drawback 1\n- Drawback 2" + ), + "why_it_matters": ( + "Balanced consequences help future decision-makers understand when to " + "reconsider this choice. Hiding drawbacks leads to technical debt." + ), + } + ) + score -= 25 + else: + strengths.append("Consequences document both benefits and drawbacks") + + # Check 3: Context quality (does it explain WHY?) + context_length = len(adr.context.strip()) + if context_length < 50: + issues.append( + { + "category": "context", + "severity": "high", + "issue": "Context is too brief (less than 50 characters)", + "suggestion": ( + "Context should explain WHY this decision is needed. Include:\n" + "- The problem or opportunity\n" + "- Current state and why it's insufficient\n" + "- Requirements that must be met\n" + "- Constraints or limitations" + ), + "example": ( + "Good Context: 'We need ACID transactions for financial data integrity. " + "Current SQLite setup doesn't support concurrent writes from multiple services. 
" + "Requires complex queries with joins and JSON document storage.'" + ), + } + ) + score -= 20 + else: + strengths.append("Context provides sufficient detail about the problem") + + # Check 4: Explicit constraints (for policy extraction) + constraint_patterns = [ + r"\bdon[''']t\s+use\b", + r"\bavoid\b.*\b(?:using|use)\b", + r"\bmust\s+(?:not\s+)?(?:use|have|be)\b", + r"\ball\s+\w+\s+must\b", + r"\brequired?\b", + r"\bprohibited?\b", + ] + + has_explicit_constraints = any( + re.search(pattern, decision_lower, re.IGNORECASE) + for pattern in constraint_patterns + ) + + if not has_explicit_constraints: + issues.append( + { + "category": "policy_readiness", + "severity": "medium", + "issue": "Decision lacks explicit constraints for policy extraction", + "suggestion": ( + "Use explicit constraint language to enable automated policy extraction:\n" + "- 'Don't use X' / 'Avoid X'\n" + "- 'Use Y instead of X'\n" + "- 'All X must have Y'\n" + "- 'Must not access'\n" + "Example: 'Use FastAPI. Don't use Flask or Django due to lack of async support.'" + ), + "why_it_matters": ( + "Explicit constraints enable Task 2 (policy extraction) to generate " + "enforceable rules automatically. Vague language can't be automated." + ), + } + ) + score -= 15 + else: + strengths.append( + "Decision includes explicit constraints ready for policy extraction" + ) + + # Check 5: Alternatives (critical for policy extraction) + if ( + not creation_input.alternatives + or len(creation_input.alternatives.strip()) < 20 + ): + issues.append( + { + "category": "alternatives", + "severity": "medium", + "issue": "Missing or insufficient alternatives documentation", + "suggestion": ( + "Document what alternatives you considered and WHY you rejected each one. 
" + "This is CRITICAL for policy extraction - rejected alternatives often become " + "'disallow' policies.\n\n" + "Structure:\n" + "### Alternative Name\n" + "**Rejected**: Specific reason for rejection\n" + "- Pros: ...\n" + "- Cons: ...\n" + "- Why not: ..." + ), + "example": ( + "### Flask\n" + "**Rejected**: Lacks native async support.\n" + "- Pros: Lightweight, huge ecosystem\n" + "- Cons: No native async, requires Quart\n" + "- Why not: Async support is critical for our use case" + ), + "why_it_matters": ( + "Alternatives with clear rejection reasons enable extraction of 'disallow' policies. " + "Example: 'Rejected Flask' becomes {'imports': {'disallow': ['flask']}}" + ), + } + ) + score -= 15 + else: + strengths.append( + "Alternatives documented with clear rejection reasons (enables 'disallow' policies)" + ) + + # Check 6: Decision length (too short is usually vague) + decision_length = len(adr.decision.strip()) + if decision_length < 30: + issues.append( + { + "category": "completeness", + "severity": "medium", + "issue": "Decision section is very brief (less than 30 characters)", + "suggestion": ( + "Decision should clearly state:\n" + "1. What technology/pattern/approach is chosen\n" + "2. Scope of applicability ('All new services', 'Frontend only')\n" + "3. Explicit constraints ('Don't use X', 'Must have Y')\n" + "4. 
Migration path if replacing existing technology" + ), + } + ) + score -= 10 + + # Determine overall quality grade + if score >= 90: + grade = "A" + summary = "Excellent ADR - ready for policy extraction" + elif score >= 75: + grade = "B" + summary = "Good ADR - minor improvements would help" + elif score >= 60: + grade = "C" + summary = "Acceptable ADR - several areas need improvement" + elif score >= 40: + grade = "D" + summary = ( + "Weak ADR - significant improvements needed before policy extraction" + ) + else: + grade = "F" + summary = "Poor ADR - needs major revision" + + return { + "quality_score": score, + "grade": grade, + "summary": summary, + "issues": issues, + "strengths": strengths, + "recommendations": self._generate_quality_recommendations(issues), + "next_steps": self._generate_quality_next_steps(issues, score), + } + + def _generate_quality_recommendations( + self, issues: list[dict[str, Any]] + ) -> list[str]: + """Generate prioritized recommendations based on quality issues. 
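The severity-first ordering this helper implements can be sketched on its own. An illustrative sketch under the same severity labels ("high", "medium"); the helper name is hypothetical:

```python
# Sketch of severity-first prioritization: high-severity issues come first,
# then medium-severity ones, each under a priority header (name hypothetical).

def prioritize(issues: list[dict]) -> list[str]:
    """Order issue summaries by severity, high before medium."""
    if not issues:
        return ["No issues found"]
    out: list[str] = []
    high = [i for i in issues if i["severity"] == "high"]
    medium = [i for i in issues if i["severity"] == "medium"]
    if high:
        out.append(f"High Priority: address {len(high)} critical issue(s)")
        out.extend(f"  - {i['issue']}" for i in high)
    if medium:
        out.append(f"Medium Priority: improve {len(medium)} aspect(s)")
        out.extend(f"  - {i['issue']}" for i in medium)
    return out


recs = prioritize(
    [
        {"severity": "medium", "issue": "vague terms"},
        {"severity": "high", "issue": "one-sided consequences"},
    ]
)
```

Note that the output order follows severity, not the order issues were found in, so critical fixes always surface first.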
+ + Args: + issues: List of quality issues found + + Returns: + Prioritized list of actionable recommendations + """ + if not issues: + return [ + "✅ Your ADR meets quality standards", + "Consider reviewing the policy_guidance to add automated enforcement", + ] + + recommendations = [] + + # Prioritize by severity + high_severity = [issue for issue in issues if issue["severity"] == "high"] + medium_severity = [issue for issue in issues if issue["severity"] == "medium"] + + if high_severity: + recommendations.append( + f"🔴 High Priority: Address {len(high_severity)} critical quality issue(s):" + ) + for issue in high_severity: + recommendations.append(f" - {issue['issue']}") + recommendations.append(f" → {issue['suggestion']}") + + if medium_severity: + recommendations.append( + f"🟡 Medium Priority: Improve {len(medium_severity)} quality aspect(s):" + ) + for issue in medium_severity: + recommendations.append(f" - {issue['issue']}") + + return recommendations + + def _generate_quality_next_steps( + self, issues: list[dict[str, Any]], score: int + ) -> list[str]: + """Generate next steps based on quality assessment. + + Args: + issues: List of quality issues found + score: Overall quality score + + Returns: + List of recommended next steps + """ + if score >= 80: + # High quality - ready to proceed + return [ + "Your ADR is high quality and ready for review", + "Review the policy_guidance to add automated enforcement policies", + "Use adr_approve() after human review to activate the decision", + ] + elif score >= 60: + # Acceptable but could improve + return [ + "ADR is acceptable but could be strengthened", + "Consider addressing the quality issues listed above", + "You can proceed with approval or revise for better policy extraction", + ] + else: + # Needs significant improvement + return [ + "⚠️ ADR quality is below recommended threshold", + "Strongly recommend revising before approval:", + " 1. Address high-priority issues (context, balance, specificity)", + " 2. 
Add alternatives with rejection reasons (enables policy extraction)", + " 3. Use explicit constraint language ('Don't use', 'Must have')", + "After revision, create a new ADR with improved content", + ] diff --git a/adr_kit/workflows/decision_guidance.py b/adr_kit/workflows/decision_guidance.py new file mode 100644 index 0000000..0135691 --- /dev/null +++ b/adr_kit/workflows/decision_guidance.py @@ -0,0 +1,491 @@ +"""Decision Quality Guidance - Promptlets for high-quality ADR creation. + +This module provides comprehensive guidance for agents writing architectural decisions. +It follows the "ADR Kit provides structure, agents provide intelligence" principle by +offering focused promptlets that guide reasoning without prescribing exact outputs. +""" + +from typing import Any + + +def build_decision_guidance( + include_examples: bool = True, focus_area: str | None = None +) -> dict[str, Any]: + """Build comprehensive decision quality guidance promptlet for agents. + + This is Task 1 of the two-step ADR creation flow: + - Task 1 (this module): Guide agents to write high-quality decision content + - Task 2 (creation.py): Extract enforceable policies from the decision + + The guidance follows the reasoning-agent promptlet architecture pattern, + providing structure and letting the agent's intelligence fill in the details. + + Args: + include_examples: Whether to include good vs bad ADR examples + focus_area: Optional focus area for tailored examples (e.g., 'database', 'frontend') + + Returns: + Comprehensive promptlet with ADR structure, quality criteria, examples, and guidance + """ + guidance = { + "agent_task": { + "role": "Architectural Decision Documenter", + "objective": ( + "Document a significant technical decision with clarity, completeness, " + "and sufficient detail to enable automated policy extraction and future reasoning." + ), + "reasoning_steps": [ + "1. Understand the PROBLEM or OPPORTUNITY that prompted this decision (Context)", + "2. 
State the DECISION explicitly - what specific technology/pattern/approach are you choosing?", + "3. Analyze CONSEQUENCES - document both positive outcomes AND negative trade-offs", + "4. Document ALTERNATIVES - what did you consider and why did you reject each option?", + "5. Identify DECIDERS - who made or approved this choice?", + "6. Extract CONSTRAINTS - what enforceable rules emerge from this decision?", + ], + "focus": ( + "Create a decision document that is specific, actionable, complete, and " + "policy-extraction-ready. Good Task 1 output makes Task 2 (policy extraction) trivial." + ), + }, + "adr_structure": { + "overview": ( + "ADRs follow MADR (Markdown Architectural Decision Records) format with " + "four main sections. Each serves a distinct purpose in documenting architectural reasoning." + ), + "sections": { + "context": { + "purpose": "WHY this decision is needed - the problem or opportunity", + "required": True, + "what_to_include": [ + "The problem statement or opportunity being addressed", + "Current state and why it's insufficient", + "Requirements that must be met", + "Constraints or limitations to consider", + "Business or technical drivers", + ], + "what_to_avoid": [ + "Describing the solution (that's the Decision section)", + "Being too vague ('We need a database')", + "Skipping the 'why' - context must explain the need", + ], + "quality_bar": "After reading Context, someone should understand the problem without reading the Decision.", + }, + "decision": { + "purpose": "WHAT you're choosing - the specific technology, pattern, or approach", + "required": True, + "what_to_include": [ + "Explicit statement of what is being chosen", + "Specific technology names and versions if relevant", + "Explicit constraints ('Don't use X', 'Must have Y')", + "Scope of applicability ('All new services', 'Frontend only')", + ], + "what_to_avoid": [ + "Being generic ('Use a modern framework' → 'Use React 18')", + "Ambiguity about scope ('sometimes', 'maybe', 
'consider')", + "Missing explicit constraints (makes policy extraction harder)", + ], + "quality_bar": "After reading Decision, it should be crystal clear what technology/approach was chosen and what's forbidden.", + }, + "consequences": { + "purpose": "Trade-offs - both POSITIVE and NEGATIVE outcomes of this decision", + "required": True, + "what_to_include": [ + "Positive consequences (benefits, improvements)", + "Negative consequences (drawbacks, limitations)", + "Risks and how they'll be mitigated", + "Impact on team, operations, or future flexibility", + "Known pitfalls or gotchas (AI-centric warnings)", + ], + "what_to_avoid": [ + "Only listing benefits (every decision has trade-offs)", + "Generic statements ('It will work well')", + "Hiding or minimizing negative consequences", + ], + "quality_bar": "Consequences should list both pros AND cons. If you see only positives, something's missing.", + "structure_tip": "Use subsections: ### Positive, ### Negative, ### Risks, ### Mitigation", + }, + "alternatives": { + "purpose": "What ELSE did you consider and WHY did you reject each option?", + "required": False, + "importance": "CRITICAL for policy extraction - rejected alternatives often become 'disallow' policies", + "what_to_include": [ + "Each alternative considered", + "Pros and cons of each", + "Specific reason for rejection", + "Under what conditions you might reconsider", + ], + "what_to_avoid": [ + "Saying 'We considered other options' without naming them", + "Not explaining WHY each was rejected", + "Unfairly dismissing alternatives", + ], + "quality_bar": "Each alternative should have a clear rejection reason that could become a policy.", + "example_structure": "### Flask\n**Rejected**: Lacks native async support.\n- Pros: ...\n- Cons: ...\n- Why not: ...", + }, + }, + }, + "quality_criteria": { + "specific": { + "description": "Use exact technology names, not generic categories", + "good": "Use PostgreSQL 15 as the primary database", + "bad": "Use a 
SQL database", + "why_it_matters": "Specific decisions enable precise policy extraction and clear implementation guidance", + }, + "actionable": { + "description": "Team can implement this decision immediately", + "good": "Use FastAPI for all new backend services. Migrate existing Flask services opportunistically.", + "bad": "Consider using FastAPI at some point", + "why_it_matters": "Vague decisions lead to inconsistent implementation and drift", + }, + "complete": { + "description": "All required fields filled with meaningful content", + "good": "Context explains the problem, Decision states the choice, Consequences list pros AND cons, Alternatives show what was rejected", + "bad": "Context: 'We need this.' Decision: 'Use X.' Consequences: 'It's good.'", + "why_it_matters": "Incomplete ADRs don't provide enough information for future reasoning or policy extraction", + }, + "policy_ready": { + "description": "Constraints are stated explicitly for automated extraction", + "good": "Use FastAPI. **Don't use Flask** or Django due to lack of native async support.", + "bad": "FastAPI is preferred in most cases", + "why_it_matters": "Explicit constraints ('Don't use X', 'Must have Y') enable Task 2 to extract enforceable policies", + }, + "balanced": { + "description": "Documents both benefits AND drawbacks honestly", + "good": "+ Native async support, + Auto docs, - Smaller ecosystem, - Team learning curve", + "bad": "FastAPI is perfect for everything", + "why_it_matters": "Unbalanced ADRs don't help future decision-makers understand when to reconsider", + }, + }, + "anti_patterns": { + "too_vague": { + "bad": "Use a modern web framework", + "good": "Use React 18 with TypeScript for frontend development", + "fix": "Replace generic categories with specific technology names and versions", + }, + "no_trade_offs": { + "bad": "PostgreSQL is the best database. 
It has ACID compliance and great performance.", + "good": "+ ACID compliance, + Great performance, + Rich features, - Higher resource usage than SQLite, - Requires operational expertise", + "fix": "Always list both positive AND negative consequences. Every decision has trade-offs.", + }, + "missing_context": { + "bad": "Decision: Use PostgreSQL", + "good": "Context: We need ACID transactions for financial data integrity and support for concurrent writes. Decision: Use PostgreSQL.", + "fix": "Explain WHY before stating WHAT. Context must justify the decision.", + }, + "no_alternatives": { + "bad": "(No alternatives section)", + "good": "### MySQL\nRejected: Weaker JSON support and extensibility vs PostgreSQL.\n### MongoDB\nRejected: Our data is highly relational, ACID compliance is critical.", + "fix": "Document what else you considered and specific reasons for rejection. This enables 'disallow' policy extraction.", + }, + "weak_constraints": { + "bad": "FastAPI is recommended for new services", + "good": "Use FastAPI for all new services. **Don't use Flask** or Django for new development.", + "fix": "Use explicit constraint language: 'Don't use', 'Must have', 'All X must Y'. 
This enables automated policy extraction.", + }, + }, + "example_workflow": { + "description": "How Task 1 (decision quality) enables Task 2 (policy extraction)", + "scenario": "Team needs to choose a web framework for a new API service", + "bad_adr": { + "title": "Use a web framework", + "context": "We need a framework for the API", + "decision": "Use a modern framework with good performance", + "consequences": "It will work well for our needs", + "alternatives": None, + "why_bad": [ + "Too vague - 'modern framework' could mean anything", + "No specific technology named", + "Consequences are generic platitudes", + "No alternatives documented", + "No explicit constraints for policy extraction", + ], + "task_2_result": "❌ Cannot extract any policies - no specific constraints stated", + }, + "good_adr": { + "title": "Use FastAPI for API Service", + "context": ( + "New API service requires async I/O for handling 1000+ concurrent connections. " + "Need automatic OpenAPI documentation for external partners. Team has Python experience." + ), + "decision": ( + "Use **FastAPI** as the web framework for all new backend API services. " + "**Don't use Flask or Django** for new services - they lack native async support. " + "Existing Flask services can be migrated opportunistically." 
+                ),
+                "consequences": (
+                    "### Positive\n"
+                    "- Native async/await support enables 10x higher concurrent connections\n"
+                    "- Automatic OpenAPI/Swagger documentation reduces API maintenance burden\n"
+                    "- Strong typing with Pydantic catches errors at API boundaries\n"
+                    "- Modern Python features (3.10+) and excellent IDE support\n\n"
+                    "### Negative\n"
+                    "- Smaller plugin ecosystem compared to Django/Flask\n"
+                    "- Team needs training on async/await patterns\n"
+                    "- Async code can be harder to debug than synchronous code\n\n"
+                    "### Risks\n"
+                    "- Team unfamiliarity with async Python could cause subtle bugs\n\n"
+                    "### Mitigation\n"
+                    "- Provide async Python training (scheduled Q1 2026)\n"
+                    "- Create internal FastAPI template with best practices"
+                ),
+                "alternatives": (
+                    "### Flask\n"
+                    "**Rejected**: Lacks native async support.\n"
+                    "- Pros: Lightweight, huge ecosystem, team familiarity\n"
+                    "- Cons: No native async (requires Quart/ASGI), manual validation\n"
+                    "- Why not: Async support is bolt-on, not native\n\n"
+                    "### Django\n"
+                    "**Rejected**: Too heavyweight for API-only services.\n"
+                    "- Pros: Mature, batteries-included, excellent admin\n"
+                    "- Cons: Synchronous by default, opinionated structure\n"
+                    "- Why not: Don't need ORM or admin for API-only service"
+                ),
+                "why_good": [
+                    "Specific technology named (FastAPI)",
+                    "Context explains requirements (async I/O, API docs)",
+                    "Decision includes explicit constraints ('Don't use Flask or Django')",
+                    "Consequences balanced (pros AND cons, risks AND mitigation)",
+                    "Alternatives documented with clear rejection reasons",
+                    "Policy-extraction-ready language",
+                ],
+                "task_2_result": (
+                    "✅ Can extract clear policies:\n"
+                    "{'imports': {'disallow': ['flask', 'django'], 'prefer': ['fastapi']}, "
+                    "'rationales': ['Native async support required', 'Automatic API documentation reduces maintenance']}"
+                ),
+            },
+            "key_insight": (
+                "Good Task 1 output (clear constraints + rejected alternatives) "
+                "makes Task 2 (policy extraction) trivial. The agent can directly "
+                "map 'Don't use Flask' to {'imports': {'disallow': ['flask']}}."
+            ),
+        },
+        "connection_to_task_2": {
+            "overview": (
+                "Task 1 (Decision Quality) and Task 2 (Policy Construction) work together. "
+                "The quality of your decision content directly impacts how easily policies can be extracted."
+            ),
+            "how_task_1_enables_task_2": [
+                {
+                    "decision_pattern": "Use FastAPI. Don't use Flask or Django.",
+                    "extracted_policy": "{'imports': {'disallow': ['flask', 'django'], 'prefer': ['fastapi']}}",
+                    "principle": "Explicit 'Don't use X' statements become 'disallow' policies",
+                },
+                {
+                    "decision_pattern": "All FastAPI handlers must be async functions",
+                    "extracted_policy": "{'patterns': {'async_handlers': {'rule': 'async def', 'severity': 'error'}}}",
+                    "principle": "'All X must be Y' statements become pattern policies",
+                },
+                {
+                    "decision_pattern": "Frontend must not access database directly",
+                    "extracted_policy": "{'architecture': {'layer_boundaries': [{'rule': 'frontend -> database', 'action': 'block'}]}}",
+                    "principle": "'X must not access Y' becomes architecture boundary",
+                },
+                {
+                    "decision_pattern": "TypeScript strict mode required for all frontend code",
+                    "extracted_policy": "{'config_enforcement': {'typescript': {'tsconfig': {'strict': True}}}}",
+                    "principle": "Config requirements become config enforcement policies",
+                },
+            ],
+            "best_practices": [
+                "Use explicit constraint language: 'Don't use', 'Must have', 'All X must Y'",
+                "Document alternatives with clear rejection reasons (enables 'disallow' extraction)",
+                "Be specific about technology names (not 'a modern framework', but 'React 18')",
+                "State scope clearly ('All new services', 'Frontend only')",
+            ],
+        },
+        "dos_and_donts": {
+            "dos": [
+                "✅ Use specific technology names and versions",
+                "✅ Document both positive AND negative consequences",
+                "✅ Explain WHY in Context before stating WHAT in Decision",
+                "✅ List alternatives with clear rejection reasons",
+                "✅ Use explicit constraint language ('Don't use', 'Must have')",
+                "✅ Include risks and mitigation strategies",
+                "✅ State scope of applicability clearly",
+                "✅ Identify who made the decision (deciders)",
+            ],
+            "donts": [
+                "❌ Don't be vague or generic ('Use a modern framework')",
+                "❌ Don't only list benefits - every decision has trade-offs",
+                "❌ Don't skip Context - explain the problem first",
+                "❌ Don't forget Alternatives - they become 'disallow' policies",
+                "❌ Don't use weak language ('consider', 'maybe', 'sometimes')",
+                "❌ Don't hide negative consequences or risks",
+                "❌ Don't make decisions sound perfect - honest trade-offs matter",
+            ],
+        },
+        "next_steps": [
+            "1. Follow this guidance to draft your ADR content",
+            "2. Use adr_create() with your title, context, decision, consequences, and alternatives",
+            "3. Review the policy_guidance in the response to construct enforcement policies (Task 2)",
+            "4. Call adr_create() again with the policy parameter if you want automated enforcement",
+        ],
+    }
+
+    # Add examples if requested
+    if include_examples:
+        guidance["examples"] = _build_examples(focus_area)
+
+    return guidance
+
+
+def _build_examples(focus_area: str | None = None) -> dict[str, Any]:
+    """Build good vs bad ADR examples.
+
+    Args:
+        focus_area: Optional focus to tailor examples (e.g., 'database', 'frontend')
+
+    Returns:
+        Dictionary with categorized examples
+    """
+    examples = {
+        "database": {
+            "good": {
+                "title": "Use PostgreSQL for Primary Database",
+                "context": (
+                    "Application requires ACID transactions for financial data integrity. "
+                    "Need support for complex queries with joins, concurrent writes from multiple services, "
+                    "and JSON document storage for flexible user metadata. Team has SQL experience."
+                ),
+                "decision": (
+                    "Use **PostgreSQL 15** as the primary database for all application data. "
+                    "**Don't use MySQL** (weaker JSON support) or **MongoDB** (eventual consistency conflicts with financial requirements). "
+                    "Deploy on AWS RDS with Multi-AZ for high availability."
+                ),
+                "consequences": (
+                    "### Positive\n"
+                    "- ACID compliance guarantees data consistency for transactions\n"
+                    "- Rich feature set: JSON, full-text search, advanced indexing\n"
+                    "- Excellent query planner handles complex joins efficiently\n"
+                    "- Mature tooling and ecosystem\n\n"
+                    "### Negative\n"
+                    "- Higher resource usage (memory/CPU) than simpler databases\n"
+                    "- Requires operational expertise for tuning and maintenance\n"
+                    "- Vertical scaling limits (single-server architecture)\n\n"
+                    "### Risks & Mitigation\n"
+                    "- Risk: Poor indexing causes performance issues at scale\n"
+                    "- Mitigation: Use connection pooling (PgBouncer), monitor with pg_stat_statements"
+                ),
+                "alternatives": (
+                    "### MySQL\n"
+                    "**Rejected**: Weaker JSON support and extensibility compared to PostgreSQL.\n\n"
+                    "### MongoDB\n"
+                    "**Rejected**: Eventual consistency model conflicts with financial transaction requirements. "
+                    "ACID transactions added in 4.0 but less mature than PostgreSQL."
+                ),
+            },
+            "bad": {
+                "title": "Use a Database",
+                "context": "We need to store data",
+                "decision": "Use PostgreSQL",
+                "consequences": "PostgreSQL is good for data storage",
+                "alternatives": None,
+            },
+        },
+        "frontend": {
+            "good": {
+                "title": "Use React 18 with TypeScript for Frontend",
+                "context": (
+                    "Building complex interactive dashboard with real-time data updates. "
+                    "Need component reusability, strong typing to catch errors early, and excellent developer tooling. "
+                    "Team has JavaScript experience but new to TypeScript."
+                ),
+                "decision": (
+                    "Use **React 18** with **TypeScript** for all frontend development. "
+                    "**Don't use Vue or Angular** - smaller ecosystems and steeper learning curves for our use case. "
+                    "All new components must be written in TypeScript with strict mode enabled."
+                ),
+                "consequences": (
+                    "### Positive\n"
+                    "- Huge ecosystem of components and libraries\n"
+                    "- TypeScript catches errors at compile time, reducing runtime bugs\n"
+                    "- Concurrent features in React 18 improve perceived performance\n"
+                    "- Excellent IDE support and developer experience\n\n"
+                    "### Negative\n"
+                    "- TypeScript learning curve for team\n"
+                    "- More boilerplate than plain JavaScript\n"
+                    "- React hooks mental model takes time to master\n\n"
+                    "### Risks & Mitigation\n"
+                    "- Risk: Team struggles with TypeScript\n"
+                    "- Mitigation: 2-week TypeScript training, pair programming on first components"
+                ),
+                "alternatives": (
+                    "### Vue 3\n"
+                    "**Rejected**: Smaller ecosystem, less corporate backing than React.\n\n"
+                    "### Angular\n"
+                    "**Rejected**: Steep learning curve, very opinionated, our team has React experience not Angular."
+                ),
+            },
+            "bad": {
+                "title": "Use a Frontend Framework",
+                "context": "We need to build a UI",
+                "decision": "Use React because it's popular",
+                "consequences": "React will work well",
+                "alternatives": None,
+            },
+        },
+        "generic": {
+            "good": {
+                "title": "Use FastAPI for Backend API Services",
+                "context": (
+                    "Building API service for mobile app with 1000+ concurrent users. "
+                    "Need automatic API documentation for mobile team, async I/O for performance, "
+                    "and strong typing for reliability. Team knows Python."
+                ),
+                "decision": (
+                    "Use **FastAPI** for all new backend API services. "
+                    "**Don't use Flask** (no native async) or **Django** (too heavyweight for API-only). "
+                    "Existing Flask services can migrate opportunistically."
+                ),
+                "consequences": (
+                    "### Positive\n"
+                    "- Native async/await for 10x better concurrent performance\n"
+                    "- Automatic OpenAPI docs reduce coordination overhead with mobile team\n"
+                    "- Pydantic validation catches errors at API boundaries\n\n"
+                    "### Negative\n"
+                    "- Smaller ecosystem than Flask/Django\n"
+                    "- Team needs async Python training\n"
+                    "- Debugging async code is harder\n\n"
+                    "### Mitigation\n"
+                    "- Async Python training scheduled Q1 2026\n"
+                    "- Internal template with best practices"
+                ),
+                "alternatives": (
+                    "### Flask\n"
+                    "**Rejected**: No native async support, would require Quart/ASGI.\n\n"
+                    "### Django\n"
+                    "**Rejected**: Too heavyweight for API-only service, don't need ORM/admin."
+                ),
+            },
+            "bad": {
+                "title": "Use Python Web Framework",
+                "context": "Need backend framework",
+                "decision": "Use FastAPI",
+                "consequences": "FastAPI is fast and modern",
+                "alternatives": None,
+            },
+        },
+    }
+
+    # Return focused examples if specified
+    if focus_area and focus_area in examples:
+        return {
+            "focus": focus_area,
+            "good_example": examples[focus_area]["good"],
+            "bad_example": examples[focus_area]["bad"],
+            "comparison": (
+                "Notice how the good example is specific, documents trade-offs, "
+                "includes alternatives with rejection reasons, and uses explicit constraint language."
+            ),
+        }
+
+    # Return all examples
+    return {
+        "by_category": examples,
+        "comparison": (
+            "Good examples are specific, document both pros and cons, explain context thoroughly, "
+            "list alternatives with clear rejection reasons, and use explicit constraint language. "
+            "Bad examples are vague, incomplete, and don't provide enough information for policy extraction."
+        ),
+    }
diff --git a/tests/integration/test_comprehensive_scenarios.py b/tests/integration/test_comprehensive_scenarios.py
index b8b2e2d..f1c2dda 100644
--- a/tests/integration/test_comprehensive_scenarios.py
+++ b/tests/integration/test_comprehensive_scenarios.py
@@ -76,6 +76,7 @@ def test_disk_full_simulation(self, temp_adr_dir):
             context="Testing disk full scenario",
             decision="Test decision",
             consequences="Test consequences",
+            skip_quality_gate=True,  # Skip quality gate to test disk full error
         )

         # Mock write operation to raise disk full error
@@ -114,6 +115,7 @@ def test_malformed_input_data(self, temp_adr_dir):
             decision="Test decision",
             consequences="Test consequences",
             policy="invalid_policy_format",  # Should be dict, not string
+            skip_quality_gate=True,  # Skip quality gate to test malformed policy handling
         )

         result = workflow.execute(input_data=malformed_input)
@@ -226,6 +228,7 @@ def test_unicode_and_encoding_handling(self, temp_adr_dir):
             context="Unicode context: العربية русский 日本語 emoji: 🎉🔥💯",
             decision="Decision with symbols: ±∞≠≤≥∑∫∆",
             consequences="Consequences: →←↑↓⟵⟶⟷",
+            skip_quality_gate=True,  # Skip quality gate to test Unicode handling
         )

         workflow = CreationWorkflow(adr_dir=temp_adr_dir)
diff --git a/tests/integration/test_decision_quality_assessment.py b/tests/integration/test_decision_quality_assessment.py
new file mode 100644
index 0000000..9bce001
--- /dev/null
+++ b/tests/integration/test_decision_quality_assessment.py
@@ -0,0 +1,399 @@
+"""Integration tests for decision quality assessment in creation workflow."""
+
+import tempfile
+from pathlib import Path
+
+import pytest
+
+from adr_kit.workflows.creation import CreationInput, CreationWorkflow
+
+
+class TestDecisionQualityAssessment:
+    """Test quality assessment feedback in ADR creation."""
+
+    @pytest.fixture
+    def temp_adr_dir(self):
+        """Create temporary ADR directory."""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            adr_dir = Path(tmpdir) / "docs" / "adr"
+            adr_dir.mkdir(parents=True)
+            yield str(adr_dir)
+
+    def test_high_quality_adr_gets_good_score(self, temp_adr_dir):
+        """Test that high-quality ADR receives good score."""
+        # High-quality ADR with all elements
+        input_data = CreationInput(
+            title="Use PostgreSQL 15 for Primary Database",
+            context=(
+                "We need ACID transactions for financial data integrity. "
+                "Current SQLite setup doesn't support concurrent writes from multiple services. "
+                "Requires complex queries with joins and JSON document storage for flexible user metadata."
+            ),
+            decision=(
+                "Use PostgreSQL 15 as the primary database for all application data. "
+                "Don't use MySQL (weaker JSON support) or MongoDB (eventual consistency conflicts with requirements). "
+                "Deploy on AWS RDS with Multi-AZ for high availability."
+            ),
+            consequences=(
+                "### Positive\n"
+                "- ACID compliance guarantees data consistency\n"
+                "- Rich feature set: JSON, full-text search\n"
+                "- Excellent query planner\n\n"
+                "### Negative\n"
+                "- Higher resource usage than simpler databases\n"
+                "- Requires operational expertise\n"
+                "- Vertical scaling limits"
+            ),
+            alternatives=(
+                "### MySQL\n"
+                "**Rejected**: Weaker JSON support.\n\n"
+                "### MongoDB\n"
+                "**Rejected**: Eventual consistency conflicts with financial requirements."
+            ),
+            deciders=["backend-team"],
+            tags=["database"],
+        )
+
+        workflow = CreationWorkflow(adr_dir=temp_adr_dir)
+        result = workflow.execute(input_data=input_data)
+
+        assert result.success is True
+        assert "quality_feedback" in result.data
+
+        feedback = result.data["quality_feedback"]
+        assert (
+            feedback["quality_score"] >= 75
+        )  # Should be good quality (B grade or higher)
+        assert feedback["grade"] in ["A", "B"]
+        assert len(feedback["strengths"]) > 0
+
+    def test_vague_adr_gets_specificity_issue(self, temp_adr_dir):
+        """Test that vague ADR is flagged for specificity."""
+        input_data = CreationInput(
+            title="Use a Modern Framework",
+            context="We need a framework for the frontend",
+            decision="Use a modern framework with good performance",
+            consequences="The framework will work well for our needs",
+        )
+
+        workflow = CreationWorkflow(adr_dir=temp_adr_dir)
+        result = workflow.execute(input_data=input_data)
+
+        # Should be blocked due to low quality
+        assert result.success is False
+        assert result.status.value == "requires_action"
+        assert "quality_feedback" in result.data
+
+        feedback = result.data["quality_feedback"]
+        assert not feedback["passes_threshold"]
+
+        # Should flag specificity issue
+        specificity_issues = [
+            issue for issue in feedback["issues"] if issue["category"] == "specificity"
+        ]
+        assert len(specificity_issues) > 0
+
+        issue = specificity_issues[0]
+        assert "generic terms" in issue["issue"].lower()
+        assert "suggestion" in issue
+        assert "example_fix" in issue
+
+    def test_one_sided_consequences_flagged(self, temp_adr_dir):
+        """Test that one-sided consequences (only pros) are flagged."""
+        input_data = CreationInput(
+            title="Use React for Frontend",
+            context="We need a frontend framework for building interactive UIs",
+            decision="Use React 18 with TypeScript for all frontend development",
+            consequences=(
+                "React provides excellent performance and developer experience. "
+                "Large ecosystem and strong community support."
+            ),  # Only positives, no negatives
+        )
+
+        workflow = CreationWorkflow(adr_dir=temp_adr_dir)
+        result = workflow.execute(input_data=input_data)
+
+        # Should be blocked due to low quality (one-sided consequences)
+        assert result.success is False
+        assert result.status.value == "requires_action"
+        assert "quality_feedback" in result.data
+
+        feedback = result.data["quality_feedback"]
+        assert not feedback["passes_threshold"]
+
+        # Should flag balance issue
+        balance_issues = [
+            issue for issue in feedback["issues"] if issue["category"] == "balance"
+        ]
+        assert len(balance_issues) > 0
+
+        issue = balance_issues[0]
+        assert "one-sided" in issue["issue"].lower()
+        assert issue["severity"] == "high"
+
+    def test_weak_context_flagged(self, temp_adr_dir):
+        """Test that insufficient context is flagged."""
+        input_data = CreationInput(
+            title="Use PostgreSQL",
+            context="We need a database",  # Too brief
+            decision="Use PostgreSQL as the database",
+            consequences="PostgreSQL is reliable and feature-rich",
+        )
+
+        workflow = CreationWorkflow(adr_dir=temp_adr_dir)
+        result = workflow.execute(input_data=input_data)
+
+        # Should be blocked due to low quality (weak context)
+        assert result.success is False
+        assert result.status.value == "requires_action"
+        assert "quality_feedback" in result.data
+
+        feedback = result.data["quality_feedback"]
+        assert not feedback["passes_threshold"]
+
+        # Should flag context issue
+        context_issues = [
+            issue for issue in feedback["issues"] if issue["category"] == "context"
+        ]
+        assert len(context_issues) > 0
+
+        issue = context_issues[0]
+        assert "too brief" in issue["issue"].lower()
+
+    def test_missing_constraints_flagged(self, temp_adr_dir):
+        """Test that lack of explicit constraints is flagged."""
+        input_data = CreationInput(
+            title="Use FastAPI for Backend",
+            context="We need a Python web framework for our API service with async support",
+            decision="Use FastAPI for the backend API",  # No explicit "don't use" constraints
+            consequences=(
+                "### Positive\n- Good async support\n\n"
+                "### Negative\n- Smaller ecosystem"
+            ),
+        )
+
+        workflow = CreationWorkflow(adr_dir=temp_adr_dir)
+        result = workflow.execute(input_data=input_data)
+
+        # Should be blocked due to low quality (missing constraints)
+        assert result.success is False
+        assert result.status.value == "requires_action"
+        assert "quality_feedback" in result.data
+
+        feedback = result.data["quality_feedback"]
+        assert not feedback["passes_threshold"]
+
+        # Should flag policy readiness issue
+        policy_issues = [
+            issue
+            for issue in feedback["issues"]
+            if issue["category"] == "policy_readiness"
+        ]
+        assert len(policy_issues) > 0
+
+        issue = policy_issues[0]
+        assert "explicit constraints" in issue["issue"].lower()
+        assert "Don't use" in issue["suggestion"]
+
+    def test_missing_alternatives_flagged(self, temp_adr_dir):
+        """Test that missing alternatives are flagged."""
+        input_data = CreationInput(
+            title="Use React for Frontend",
+            context="We need a modern frontend framework",
+            decision="Use React 18 with TypeScript. Don't use Vue or Angular.",
+            consequences=(
+                "### Positive\n- Large ecosystem\n\n" "### Negative\n- Learning curve"
+            ),
+            alternatives=None,  # No alternatives provided
+        )
+
+        workflow = CreationWorkflow(adr_dir=temp_adr_dir)
+        result = workflow.execute(input_data=input_data)
+
+        # Should be blocked due to low quality (missing alternatives)
+        assert result.success is False
+        assert result.status.value == "requires_action"
+        assert "quality_feedback" in result.data
+
+        feedback = result.data["quality_feedback"]
+        assert not feedback["passes_threshold"]
+
+        # Should flag alternatives issue
+        alternatives_issues = [
+            issue for issue in feedback["issues"] if issue["category"] == "alternatives"
+        ]
+        assert len(alternatives_issues) > 0
+
+        issue = alternatives_issues[0]
+        assert "alternatives" in issue["issue"].lower()
+        assert "why_it_matters" in issue
+        assert "'disallow' policies" in issue["why_it_matters"]
+
+    def test_quality_recommendations_prioritized(self, temp_adr_dir):
+        """Test that recommendations are prioritized by severity."""
+        # Create ADR with multiple issues of different severities
+        input_data = CreationInput(
+            title="Use a framework",  # Vague (medium)
+            context="Need framework",  # Too brief (high)
+            decision="Use framework X",  # Vague (medium)
+            consequences="Good performance",  # One-sided (high)
+        )
+
+        workflow = CreationWorkflow(adr_dir=temp_adr_dir)
+        result = workflow.execute(input_data=input_data)
+
+        feedback = result.data["quality_feedback"]
+        recommendations = feedback["recommendations"]
+
+        # High priority issues should be mentioned first
+        assert len(recommendations) > 0
+        first_rec = recommendations[0]
+        assert "High Priority" in first_rec or "critical" in first_rec.lower()
+
+    def test_quality_next_steps_vary_by_score(self, temp_adr_dir):
+        """Test that next steps vary based on quality score."""
+        # High quality ADR
+        good_input = CreationInput(
+            title="Use PostgreSQL 15 for Primary Database",
+            context=(
+                "We need ACID transactions for financial integrity and support for "
+                "complex queries with concurrent writes from multiple services."
+            ),
+            decision=(
+                "Use PostgreSQL 15 as primary database. "
+                "Don't use MySQL or MongoDB for production data."
+            ),
+            consequences=(
+                "### Positive\n- ACID compliance\n- Rich features\n\n"
+                "### Negative\n- Higher resource usage\n- Ops complexity"
+            ),
+            alternatives="### MySQL\n**Rejected**: Weaker JSON support",
+        )
+
+        workflow = CreationWorkflow(adr_dir=temp_adr_dir)
+        result = workflow.execute(input_data=good_input)
+        feedback = result.data["quality_feedback"]
+
+        # Good quality (B grade) should suggest proceeding or improving
+        next_steps = feedback["next_steps"]
+        # Score around 75 means "acceptable but could improve" or "high quality"
+        assert len(next_steps) > 0
+        # Should mention either approval or improvement
+        combined_text = " ".join(next_steps).lower()
+        assert (
+            "approv" in combined_text
+            or "review" in combined_text
+            or "quality" in combined_text
+        )
+
+    def test_low_quality_suggests_revision(self, temp_adr_dir):
+        """Test that low quality ADR suggests revision."""
+        # Poor quality ADR (but passes minimum validation)
+        poor_input = CreationInput(
+            title="Use tool X",
+            context="We need it for the project",  # Meets 10 char minimum
+            decision="Use tool X",  # Meets 5 char minimum
+            consequences="It is good",  # Meets 5 char minimum
+        )
+
+        workflow = CreationWorkflow(adr_dir=temp_adr_dir)
+        result = workflow.execute(input_data=poor_input)
+        feedback = result.data["quality_feedback"]
+
+        # Low quality should suggest revision or improvement
+        assert feedback["quality_score"] < 70  # Below "good" threshold
+        next_steps = feedback["next_steps"]
+        combined_text = " ".join(next_steps).lower()
+        assert any(
+            keyword in combined_text
+            for keyword in ["revis", "improv", "address", "strengthen", "quality"]
+        )
+
+    def test_quality_feedback_includes_all_fields(self, temp_adr_dir):
+        """Test that quality feedback has complete structure."""
+        input_data = CreationInput(
+            title="Use FastAPI",
+            context="Need async API framework",
+            decision="Use FastAPI",
+            consequences="Good async support",
+        )
+
+        workflow = CreationWorkflow(adr_dir=temp_adr_dir)
+        result = workflow.execute(input_data=input_data)
+
+        feedback = result.data["quality_feedback"]
+
+        # Check all required fields present
+        assert "quality_score" in feedback
+        assert "grade" in feedback
+        assert "summary" in feedback
+        assert "issues" in feedback
+        assert "strengths" in feedback
+        assert "recommendations" in feedback
+        assert "next_steps" in feedback
+
+        # Score should be integer
+        assert isinstance(feedback["quality_score"], int)
+        assert 0 <= feedback["quality_score"] <= 100
+
+        # Grade should be A-F
+        assert feedback["grade"] in ["A", "B", "C", "D", "F"]
+
+    def test_explicit_constraints_recognized_as_strength(self, temp_adr_dir):
+        """Test that explicit constraints are recognized as strength."""
+        input_data = CreationInput(
+            title="Use FastAPI for Backend",
+            context="Need async API framework with automatic documentation for mobile team",
+            decision=(
+                "Use FastAPI for all new backend services. "
+                "Don't use Flask (no native async) or Django (too heavyweight). "
+                "All handlers must be async functions."
+            ),
+            consequences=(
+                "### Positive\n- Native async/await\n- Auto OpenAPI docs\n\n"
+                "### Negative\n- Smaller ecosystem\n- Team learning curve"
+            ),
+            alternatives=(
+                "### Flask\n**Rejected**: No native async.\n\n"
+                "### Django\n**Rejected**: Too heavyweight for API-only."
+            ),
+        )
+
+        workflow = CreationWorkflow(adr_dir=temp_adr_dir)
+        result = workflow.execute(input_data=input_data)
+        feedback = result.data["quality_feedback"]
+
+        # Should recognize explicit constraints as strength
+        assert any(
+            "explicit constraints" in strength.lower()
+            or "policy extraction" in strength.lower()
+            for strength in feedback["strengths"]
+        )
+
+    def test_balanced_consequences_recognized_as_strength(self, temp_adr_dir):
+        """Test that balanced consequences are recognized."""
+        input_data = CreationInput(
+            title="Use PostgreSQL",
+            context="Need database with ACID compliance and JSON support",
+            decision="Use PostgreSQL 15. Don't use MySQL or MongoDB.",
+            consequences=(
+                "Benefits: ACID compliance for transactions, rich feature set. "
+                "Drawbacks: Higher resource usage, operational complexity. "
+                "Risk: Performance issues without proper indexing."
+            ),
+            alternatives="### MySQL\n**Rejected**: Weaker JSON support",
+        )
+
+        workflow = CreationWorkflow(adr_dir=temp_adr_dir)
+        result = workflow.execute(input_data=input_data)
+        feedback = result.data["quality_feedback"]
+
+        # Should recognize balance (look for consequence-related strengths)
+        # The exact wording depends on the implementation, check that it's high quality
+        assert feedback["quality_score"] >= 75  # Good quality
+        assert len(feedback["strengths"]) > 0  # Has recognized strengths
+        # Check that consequences section was evaluated positively (not flagged for balance issue)
+        balance_issues = [
+            issue for issue in feedback["issues"] if issue["category"] == "balance"
+        ]
+        assert len(balance_issues) == 0  # No balance issues = balanced recognized
diff --git a/tests/integration/test_mcp_workflow_integration.py b/tests/integration/test_mcp_workflow_integration.py
index 9f2f91f..adb86ae 100644
--- a/tests/integration/test_mcp_workflow_integration.py
+++ b/tests/integration/test_mcp_workflow_integration.py
@@ -153,6 +153,7 @@
             tags=request.tags,
             policy=request.policy,
             alternatives=request.alternatives,
+            skip_quality_gate=True,  # Skip quality gate to test MCP integration
         )

         result = workflow.execute(input_data=creation_input)
@@ -177,6 +178,7 @@
             context="Testing approval workflow",
             decision="Use test technology",
             consequences="Test consequences",
+            skip_quality_gate=True,  # Skip quality gate to test approval workflow
         )

         creation_result = creation_workflow.execute(input_data=creation_input)
@@ -225,6 +227,7 @@
             context="Need database solution",
             decision="Use MySQL",
             consequences="Good performance",
+            skip_quality_gate=True,  # Skip quality gate to test supersede workflow
         )

         creation_result = creation_workflow.execute(input_data=original_input)
@@ -434,6 +437,7 @@ def test_end_to_end_workflow_chain(self, temp_project_dir, temp_adr_dir):
             context="Analysis revealed React usage in project",
             decision="Standardize on React for all frontend development",
             consequences="Consistent frontend architecture, team training needed",
+            skip_quality_gate=True,  # Skip quality gate to test workflow chain
         )

         create_result = create_workflow.execute(input_data=create_input)
diff --git a/tests/integration/test_workflow_creation.py b/tests/integration/test_workflow_creation.py
index 6ebc616..9c602f4 100644
--- a/tests/integration/test_workflow_creation.py
+++ b/tests/integration/test_workflow_creation.py
@@ -25,15 +25,37 @@ def temp_adr_dir(self):

     @pytest.fixture
     def sample_creation_input(self):
-        """Create sample creation input."""
+        """Create sample creation input (high quality to pass quality gate)."""
         return CreationInput(
-            title="Use PostgreSQL for primary database",
-            context="We need a reliable relational database for storing user data and application state.",
-            decision="Use PostgreSQL as our primary database management system.",
-            consequences="Better data integrity and ACID compliance, but requires more infrastructure setup than SQLite.",
+            title="Use PostgreSQL 15 for primary database",
+            context=(
+                "We need ACID transactions for financial data integrity. "
+                "Current SQLite setup doesn't support concurrent writes from multiple services. "
+                "Requires complex queries with joins and JSON document storage for flexible user metadata."
+            ),
+            decision=(
+                "Use PostgreSQL 15 as the primary database for all application data. "
+                "Don't use MySQL (weaker JSON support) or MongoDB (eventual consistency conflicts with requirements). "
+                "Deploy on AWS RDS with Multi-AZ for high availability."
+            ),
+            consequences=(
+                "### Positive\n"
+                "- ACID compliance guarantees data consistency\n"
+                "- Rich feature set: JSON, full-text search\n"
+                "- Excellent query planner\n\n"
+                "### Negative\n"
+                "- Higher resource usage than simpler databases\n"
+                "- Requires operational expertise\n"
+                "- Vertical scaling limits"
+            ),
             deciders=["backend-team", "tech-lead"],
             tags=["database", "backend", "infrastructure"],
-            alternatives="Considered MySQL, SQLite, and MongoDB as alternatives.",
+            alternatives=(
+                "### MySQL\n"
+                "**Rejected**: Weaker JSON support.\n\n"
+                "### MongoDB\n"
+                "**Rejected**: Eventual consistency conflicts with financial requirements."
+ ), ) @pytest.fixture @@ -146,6 +168,7 @@ def test_conflict_detection(self, temp_adr_dir, existing_adr): decision="Use MongoDB as our primary database.", consequences="Flexible schema but less ACID compliance.", tags=["database", "nosql"], + skip_quality_gate=True, # Skip quality gate to test conflict detection ) workflow = CreationWorkflow(adr_dir=temp_adr_dir) @@ -182,6 +205,7 @@ def test_policy_integration(self, temp_adr_dir): "rules": [{"forbid": "utils -> components"}], }, }, + skip_quality_gate=True, # Skip quality gate to test policy integration ) workflow = CreationWorkflow(adr_dir=temp_adr_dir) @@ -275,6 +299,7 @@ def test_very_long_title_handling(self, temp_adr_dir): context="Testing long title handling", decision="Use the long-named technology", consequences="Might cause file naming issues", + skip_quality_gate=True, # Skip quality gate to test long title handling ) workflow = CreationWorkflow(adr_dir=temp_adr_dir) @@ -302,6 +327,7 @@ def test_special_characters_in_title(self, temp_adr_dir): context="Testing special character handling", decision="Use technologies with special characters in names", consequences="Must handle file naming properly", + skip_quality_gate=True, # Skip quality gate to test special characters ) workflow = CreationWorkflow(adr_dir=temp_adr_dir) @@ -329,6 +355,7 @@ def test_semantic_similarity_detection(self, temp_adr_dir, existing_adr): context="We need a relational database for our data.", decision="Use MariaDB as our database solution.", consequences="Similar to MySQL with some improvements.", + skip_quality_gate=True, # Skip quality gate to test semantic similarity ) workflow = CreationWorkflow(adr_dir=temp_adr_dir) @@ -374,6 +401,7 @@ def test_incremental_id_generation(self, temp_adr_dir, sample_creation_input): context="Need caching solution", decision="Use Redis for cache", consequences="Fast caching but additional infrastructure", + skip_quality_gate=True, # Skip quality gate to test ID generation ) result2 = 
workflow.execute(input_data=second_input) @@ -386,301 +414,64 @@ def test_incremental_id_generation(self, temp_adr_dir, sample_creation_input): assert Path(creation2.file_path).exists() -class TestPolicySuggestionLogic: - """Test comprehensive policy suggestion and guidance functionality.""" +class TestPolicyGuidancePromptlet: + """Test policy guidance promptlet for reasoning agents. + + These tests verify that ADR Kit provides a proper schema and reasoning + prompts to guide agents in constructing policies, rather than using + regex extraction (which is fragile and redundant for reasoning agents). + """ @pytest.fixture def temp_adr_dir(self): """Create temporary ADR directory.""" with tempfile.TemporaryDirectory() as tmpdir: adr_dir = Path(tmpdir) / "docs" / "adr" adr_dir.mkdir(parents=True) yield str(adr_dir) - def test_import_policy_suggestion_from_alternatives(self, temp_adr_dir): - """Test import policy suggestion from alternatives section.""" + def test_policy_guidance_structure_without_policy(self, temp_adr_dir): + """Test that a policy guidance promptlet is provided when no policy is given.""" input_data = CreationInput( title="Use FastAPI for backend", context="Need async web framework for better performance", decision="Use FastAPI as our backend framework", consequences="Better async support and auto-generated docs", - alternatives=( - "### Flask\n- Rejected: Lacks native async support\n\n" - "### Django\n- Rejected: Too heavyweight for our use case" - ), ) workflow = CreationWorkflow(adr_dir=temp_adr_dir) result = workflow.execute(input_data=input_data) assert result.success is True - - # Check policy guidance was generated assert "policy_guidance" in result.data - policy_guidance = result.data["policy_guidance"] - - assert policy_guidance["detectable"] is True - assert policy_guidance["suggestion"] is not None - - # Should detect Flask and Django as disallowed - imports = 
policy_guidance["suggestion"].get("imports", {}) - disallow = imports.get("disallow", []) - assert "Flask" in disallow or "flask" in disallow - assert "Django" in disallow or "django" in disallow - - def test_import_policy_suggestion_from_decision_text(self, temp_adr_dir): - """Test import policy detection from decision text patterns.""" - input_data = CreationInput( - title="Deprecate jQuery", - context="Modern frontend needs modern tools", - decision=( - "Don't use jQuery anymore. Use vanilla JavaScript or React instead. " - "Prefer React over jQuery for new components." - ), - consequences="More maintainable code but migration effort required", - ) - - workflow = CreationWorkflow(adr_dir=temp_adr_dir) - result = workflow.execute(input_data=input_data) - - assert result.success is True - - policy_guidance = result.data["policy_guidance"] - assert policy_guidance["detectable"] is True - - suggestion = policy_guidance["suggestion"] - imports = suggestion.get("imports", {}) - - # Should detect jQuery as disallowed - disallow = imports.get("disallow", []) - assert any("jquery" in lib.lower() for lib in disallow) - - # Should detect React as preferred - prefer = imports.get("prefer", []) - assert any("react" in lib.lower() for lib in prefer) - - def test_pattern_policy_suggestion(self, temp_adr_dir): - """Test code pattern policy detection.""" - input_data = CreationInput( - title="Async handlers required", - context="Need better I/O performance", - decision=( - "All FastAPI handlers must be async. " - "Route handlers must have async def syntax for better concurrency." 
- ), - consequences="Better I/O performance with async/await", - ) - - workflow = CreationWorkflow(adr_dir=temp_adr_dir) - result = workflow.execute(input_data=input_data) - - assert result.success is True - - policy_guidance = result.data["policy_guidance"] - assert policy_guidance["detectable"] is True - - suggestion = policy_guidance["suggestion"] - - # Should detect pattern policies - assert "patterns" in suggestion - patterns = suggestion["patterns"] - assert "patterns" in patterns - assert len(patterns["patterns"]) > 0 - - # Check pattern structure - first_pattern = list(patterns["patterns"].values())[0] - assert "description" in first_pattern - assert "severity" in first_pattern - assert "rule" in first_pattern - - def test_architecture_boundary_suggestion(self, temp_adr_dir): - """Test architecture boundary policy detection.""" - input_data = CreationInput( - title="Layer boundaries", - context="Need clear architectural separation", - decision=( - "Frontend must not access database directly. " - "No direct access from frontend to database layer. " - "UI components cannot import backend modules." 
- ), - consequences="Better separation of concerns but requires API layer", - ) - - workflow = CreationWorkflow(adr_dir=temp_adr_dir) - result = workflow.execute(input_data=input_data) - - assert result.success is True - - policy_guidance = result.data["policy_guidance"] - suggestion = policy_guidance["suggestion"] - - # Should detect architecture policies - assert "architecture" in suggestion - arch = suggestion["architecture"] - assert "layer_boundaries" in arch - assert len(arch["layer_boundaries"]) > 0 - - # Check boundary structure - first_boundary = arch["layer_boundaries"][0] - assert "rule" in first_boundary - assert "action" in first_boundary - assert first_boundary["action"] in ["block", "warn"] - - def test_required_structure_suggestion(self, temp_adr_dir): - """Test required file structure detection.""" - input_data = CreationInput( - title="Required directory structure", - context="Need consistent project structure", - decision=( - "Required: src/models/*.py for all data models. " - "Must have tests/ directory for test files. " - "Projects must have docs/adr folder." 
- ), - consequences="Consistent structure across projects", - ) - - workflow = CreationWorkflow(adr_dir=temp_adr_dir) - result = workflow.execute(input_data=input_data) - - assert result.success is True - - policy_guidance = result.data["policy_guidance"] - suggestion = policy_guidance["suggestion"] - - # Should detect required structure - assert "architecture" in suggestion - arch = suggestion["architecture"] - assert "required_structure" in arch - assert len(arch["required_structure"]) > 0 - - def test_config_enforcement_typescript(self, temp_adr_dir): - """Test TypeScript config enforcement detection.""" - input_data = CreationInput( - title="TypeScript strict mode", - context="Need type safety across codebase", - decision="TypeScript strict mode required for all projects", - consequences="Better type safety but may require code updates", - ) - - workflow = CreationWorkflow(adr_dir=temp_adr_dir) - result = workflow.execute(input_data=input_data) - - assert result.success is True - - policy_guidance = result.data["policy_guidance"] - suggestion = policy_guidance["suggestion"] - - # Should detect config enforcement - assert "config_enforcement" in suggestion - config = suggestion["config_enforcement"] - assert "typescript" in config - assert "tsconfig" in config["typescript"] - - def test_config_enforcement_python(self, temp_adr_dir): - """Test Python config enforcement detection.""" - input_data = CreationInput( - title="Python tooling config", - context="Need consistent Python linting", - decision=( - "Ruff must check imports for all Python projects. " - "Mypy strict mode required." 
- ), - consequences="Better code quality but stricter checks", - ) - - workflow = CreationWorkflow(adr_dir=temp_adr_dir) - result = workflow.execute(input_data=input_data) - - assert result.success is True - - policy_guidance = result.data["policy_guidance"] - suggestion = policy_guidance["suggestion"] - # Should detect Python config - assert "config_enforcement" in suggestion - config = suggestion["config_enforcement"] - assert "python" in config - - def test_rationale_extraction(self, temp_adr_dir): - """Test extraction of rationales from decision text.""" - input_data = CreationInput( - title="Use CDN for assets", - context="Need better asset delivery", - decision=( - "Use CDN for all static assets. This is for performance and better " - "user experience. We need this to improve load times." - ), - consequences="Better performance but additional CDN costs", - ) - - workflow = CreationWorkflow(adr_dir=temp_adr_dir) - result = workflow.execute(input_data=input_data) - - assert result.success is True - - policy_guidance = result.data["policy_guidance"] - suggestion = policy_guidance["suggestion"] - - # Should extract rationales - assert "rationales" in suggestion - rationales = suggestion["rationales"] - assert len(rationales) > 0 - # Should capture performance-related rationale - assert any("performance" in r.lower() for r in rationales) - - def test_multiple_policy_types_combined(self, temp_adr_dir): - """Test detection of multiple policy types in single ADR.""" - input_data = CreationInput( - title="FastAPI with architecture boundaries", - context="Need modern backend with clear architecture", - decision=( - "Use FastAPI not Flask. All handlers must be async. " - "Frontend must not access database directly. " - "TypeScript strict mode required for frontend." 
- ), - consequences="Better architecture but more setup complexity", - ) - - workflow = CreationWorkflow(adr_dir=temp_adr_dir) - result = workflow.execute(input_data=input_data) - - assert result.success is True - - policy_guidance = result.data["policy_guidance"] - suggestion = policy_guidance["suggestion"] - - # Should detect multiple policy types - assert "imports" in suggestion - assert "patterns" in suggestion - assert "architecture" in suggestion - assert "config_enforcement" in suggestion - - def test_no_policy_detected(self, temp_adr_dir): - """Test guidance when no enforceable policies detected.""" - input_data = CreationInput( - title="General architecture discussion", - context="We discussed various options", - decision="We decided to think about this more", - consequences="More time for consideration", - ) - - workflow = CreationWorkflow(adr_dir=temp_adr_dir) - result = workflow.execute(input_data=input_data) - - assert result.success is True - - policy_guidance = result.data["policy_guidance"] - - # Should indicate no policy detected - assert policy_guidance["detectable"] is False - assert policy_guidance["suggestion"] is None - - # Should provide guidance on how to write enforceable policies - assert "guidance" in policy_guidance - assert len(policy_guidance["guidance"]) > 0 - - def test_policy_provided_no_suggestion_needed(self, temp_adr_dir): - """Test that no suggestion is made when policy already provided.""" + guidance = result.data["policy_guidance"] + + # Should indicate no policy provided + assert guidance["has_policy"] is False + + # Should provide agent task with reasoning steps + assert "agent_task" in guidance + assert "reasoning_steps" in guidance["agent_task"] + assert "objective" in guidance["agent_task"] + + # Should provide policy capabilities (schema) + assert "policy_capabilities" in guidance + capabilities = guidance["policy_capabilities"] + assert "imports" in capabilities + assert "patterns" in capabilities + assert 
"architecture" in capabilities + assert "config_enforcement" in capabilities + + # Should provide example workflow + assert "example_workflow" in guidance + example = guidance["example_workflow"] + assert "scenario" in example + assert "reasoning" in example + assert "constructed_policy" in example + + def test_policy_guidance_when_policy_provided(self, temp_adr_dir): + """Test that no guidance needed when policy already provided.""" input_data = CreationInput( title="Use FastAPI", context="Need async framework", @@ -697,62 +488,38 @@ def test_policy_provided_no_suggestion_needed(self, temp_adr_dir): assert result.success is True - policy_guidance = result.data["policy_guidance"] + guidance = result.data["policy_guidance"] # Should indicate policy already provided - assert policy_guidance["has_policy"] is True - assert policy_guidance["suggestion"] is None + assert guidance["has_policy"] is True + assert guidance["message"] == "✅ Structured policy provided and validated" - def test_policy_guidance_includes_example_usage(self, temp_adr_dir): - """Test that policy guidance includes example usage.""" + def test_policy_capabilities_schema_completeness(self, temp_adr_dir): + """Test that policy capabilities include all policy types.""" input_data = CreationInput( - title="Use React", - context="Need modern frontend", - decision="Don't use jQuery, prefer React instead", - consequences="Modern development practices", + title="Test decision", + context="Test context", + decision="Test decision text", + consequences="Test consequences", ) workflow = CreationWorkflow(adr_dir=temp_adr_dir) result = workflow.execute(input_data=input_data) - assert result.success is True - - policy_guidance = result.data["policy_guidance"] - - # Should include example usage - assert "example_usage" in policy_guidance - example = policy_guidance["example_usage"] - - # Example should show how to call adr_create with policy - assert "adr_create" in example - assert "policy=" in example - - def 
test_policy_suggestion_json_format(self, temp_adr_dir): - """Test that policy suggestion includes formatted JSON.""" - input_data = CreationInput( - title="Use TypeScript", - context="Need type safety", - decision="Use TypeScript, don't use JavaScript", - consequences="Better type safety", - ) - - workflow = CreationWorkflow(adr_dir=temp_adr_dir) - result = workflow.execute(input_data=input_data) - - assert result.success is True - - policy_guidance = result.data["policy_guidance"] - - # Should include formatted JSON - assert "suggestion_json" in policy_guidance - json_str = policy_guidance["suggestion_json"] - - # Should be valid, formatted JSON - import json - - parsed = json.loads(json_str) - assert isinstance(parsed, dict) - - -if __name__ == "__main__": - pytest.main([__file__, "-v"]) + capabilities = result.data["policy_guidance"]["policy_capabilities"] + + # Verify all 4 policy types are documented + assert "imports" in capabilities + assert "patterns" in capabilities + assert "architecture" in capabilities + assert "config_enforcement" in capabilities + + # Each should have description and example + for policy_type in [ + "imports", + "patterns", + "architecture", + "config_enforcement", + ]: + assert "description" in capabilities[policy_type] + assert "example" in capabilities[policy_type] diff --git a/tests/mcp/test_mcp_integration.py b/tests/mcp/test_mcp_integration.py index e698f24..2cc25ba 100644 --- a/tests/mcp/test_mcp_integration.py +++ b/tests/mcp/test_mcp_integration.py @@ -191,6 +191,7 @@ async def test_create_basic_adr(self, temp_adr_dir): context="We need a reliable database for user data", decision="Use PostgreSQL as our primary database", consequences="Better data integrity, more complex setup", + skip_quality_gate=True, adr_dir=temp_adr_dir, ) @@ -223,6 +224,7 @@ async def test_create_adr_with_policy(self, temp_adr_dir): decision="Use React for all frontend development", consequences="Modern UI, learning curve", policy={"imports": {"prefer": 
["react"], "disallow": ["vue", "angular"]}}, + skip_quality_gate=True, adr_dir=temp_adr_dir, ) @@ -258,6 +260,7 @@ async def test_approve_proposed_adr(self, temp_adr_dir): context="Need fast caching solution", decision="Use Redis for application caching", consequences="Better performance, additional infrastructure", + skip_quality_gate=True, adr_dir=temp_adr_dir, ) @@ -320,6 +323,7 @@ async def test_supersede_existing_adr(self, temp_adr_dir): context="Need relational database", decision="Use MySQL for data storage", consequences="Good performance, licensing concerns", + skip_quality_gate=True, adr_dir=temp_adr_dir, ) @@ -343,6 +347,7 @@ async def test_supersede_existing_adr(self, temp_adr_dir): new_decision="Migrate to PostgreSQL", new_consequences="Better licensing, migration effort", supersede_reason="MySQL licensing concerns", + skip_quality_gate=True, adr_dir=temp_adr_dir, ) @@ -438,6 +443,7 @@ async def test_complete_adr_workflow(self, temp_adr_dir, sample_project_dir): decision="Use React for all frontend components", consequences="Better user experience, steeper learning curve", tags=["frontend", "javascript"], + skip_quality_gate=True, adr_dir=temp_adr_dir, ) create_result = await client.call_tool( diff --git a/tests/unit/test_decision_guidance.py b/tests/unit/test_decision_guidance.py new file mode 100644 index 0000000..a25788e --- /dev/null +++ b/tests/unit/test_decision_guidance.py @@ -0,0 +1,226 @@ +"""Unit tests for decision quality guidance.""" + +from adr_kit.workflows.decision_guidance import _build_examples, build_decision_guidance + + +class TestDecisionGuidance: + """Test decision quality guidance generation.""" + + def test_build_decision_guidance_basic(self): + """Test basic decision guidance structure.""" + guidance = build_decision_guidance(include_examples=False) + + # Check top-level structure + assert "agent_task" in guidance + assert "adr_structure" in guidance + assert "quality_criteria" in guidance + assert "anti_patterns" in guidance + 
assert "example_workflow" in guidance + assert "connection_to_task_2" in guidance + assert "dos_and_donts" in guidance + assert "next_steps" in guidance + + def test_agent_task_structure(self): + """Test agent task definition.""" + guidance = build_decision_guidance(include_examples=False) + agent_task = guidance["agent_task"] + + assert agent_task["role"] == "Architectural Decision Documenter" + assert "objective" in agent_task + assert "reasoning_steps" in agent_task + assert len(agent_task["reasoning_steps"]) == 6 + assert "focus" in agent_task + + def test_adr_structure_sections(self): + """Test ADR structure guidance covers all sections.""" + guidance = build_decision_guidance(include_examples=False) + sections = guidance["adr_structure"]["sections"] + + # Check all required sections present + assert "context" in sections + assert "decision" in sections + assert "consequences" in sections + assert "alternatives" in sections + + # Check context section details + context = sections["context"] + assert ( + context["purpose"] + == "WHY this decision is needed - the problem or opportunity" + ) + assert context["required"] is True + assert "what_to_include" in context + assert "what_to_avoid" in context + assert "quality_bar" in context + + def test_quality_criteria_complete(self): + """Test quality criteria are comprehensive.""" + guidance = build_decision_guidance(include_examples=False) + criteria = guidance["quality_criteria"] + + # Check all quality dimensions + assert "specific" in criteria + assert "actionable" in criteria + assert "complete" in criteria + assert "policy_ready" in criteria + assert "balanced" in criteria + + # Each criterion should have structure + for criterion in criteria.values(): + assert "description" in criterion + assert "good" in criterion + assert "bad" in criterion + assert "why_it_matters" in criterion + + def test_anti_patterns_with_fixes(self): + """Test anti-patterns include fixes.""" + guidance = 
build_decision_guidance(include_examples=False) + anti_patterns = guidance["anti_patterns"] + + # Check key anti-patterns + assert "too_vague" in anti_patterns + assert "no_trade_offs" in anti_patterns + assert "missing_context" in anti_patterns + assert "no_alternatives" in anti_patterns + assert "weak_constraints" in anti_patterns + + # Each pattern should show bad → good → fix + for pattern in anti_patterns.values(): + assert "bad" in pattern + assert "good" in pattern + assert "fix" in pattern + + def test_example_workflow_shows_good_vs_bad(self): + """Test example workflow demonstrates quality difference.""" + guidance = build_decision_guidance(include_examples=False) + workflow = guidance["example_workflow"] + + assert "scenario" in workflow + assert "bad_adr" in workflow + assert "good_adr" in workflow + assert "key_insight" in workflow + + # Bad example should show problems + bad = workflow["bad_adr"] + assert "why_bad" in bad + assert "task_2_result" in bad + assert len(bad["why_bad"]) > 0 + + # Good example should show strengths + good = workflow["good_adr"] + assert "why_good" in good + assert "task_2_result" in good + assert len(good["why_good"]) > 0 + + def test_connection_to_task_2(self): + """Test guidance explains Task 1 → Task 2 connection.""" + guidance = build_decision_guidance(include_examples=False) + connection = guidance["connection_to_task_2"] + + assert "overview" in connection + assert "how_task_1_enables_task_2" in connection + assert "best_practices" in connection + + # Check policy mapping examples + examples = connection["how_task_1_enables_task_2"] + assert len(examples) >= 4 # imports, patterns, architecture, config + + for example in examples: + assert "decision_pattern" in example + assert "extracted_policy" in example + assert "principle" in example + + def test_dos_and_donts_lists(self): + """Test dos and don'ts are actionable.""" + guidance = build_decision_guidance(include_examples=False) + dos_donts = guidance["dos_and_donts"] 
+ + assert "dos" in dos_donts + assert "donts" in dos_donts + + # Should have multiple items + assert len(dos_donts["dos"]) >= 6 + assert len(dos_donts["donts"]) >= 6 + + # Dos should start with ✅ + for item in dos_donts["dos"]: + assert item.startswith("✅") + + # Don'ts should start with ❌ + for item in dos_donts["donts"]: + assert item.startswith("❌") + + def test_examples_included_by_default(self): + """Test examples are included by default.""" + guidance = build_decision_guidance(include_examples=True) + assert "examples" in guidance + + def test_examples_excluded_when_disabled(self): + """Test examples can be excluded.""" + guidance = build_decision_guidance(include_examples=False) + assert "examples" not in guidance + + def test_examples_have_categories(self): + """Test examples cover different focus areas.""" + examples = _build_examples(focus_area=None) + + assert "by_category" in examples + assert "database" in examples["by_category"] + assert "frontend" in examples["by_category"] + assert "generic" in examples["by_category"] + + # Each category should have good and bad + for category in examples["by_category"].values(): + assert "good" in category + assert "bad" in category + + def test_focused_examples_database(self): + """Test focused examples for database.""" + examples = _build_examples(focus_area="database") + + assert "focus" in examples + assert examples["focus"] == "database" + assert "good_example" in examples + assert "bad_example" in examples + assert "comparison" in examples + + # Good example should be comprehensive + good = examples["good_example"] + assert "title" in good + assert "context" in good + assert "decision" in good + assert "consequences" in good + assert "alternatives" in good + + # Should be database-related + assert ( + "database" in good["title"].lower() or "postgresql" in good["title"].lower() + ) + + def test_focused_examples_frontend(self): + """Test focused examples for frontend.""" + examples = 
_build_examples(focus_area="frontend") + + assert examples["focus"] == "frontend" + good = examples["good_example"] + + # Should be frontend-related + assert ( + "frontend" in good["title"].lower() + or "react" in good["title"].lower() + or "vue" in good["title"].lower() + ) + + def test_next_steps_provided(self): + """Test guidance includes clear next steps.""" + guidance = build_decision_guidance(include_examples=False) + + assert "next_steps" in guidance + steps = guidance["next_steps"] + + # Should have multiple steps + assert len(steps) >= 3 + + # Steps should be numbered + assert any("1." in step for step in steps) + assert any("2." in step for step in steps) diff --git a/tests/unit/test_policy_validation.py b/tests/unit/test_policy_validation.py index 18c10a1..fe50884 100644 --- a/tests/unit/test_policy_validation.py +++ b/tests/unit/test_policy_validation.py @@ -24,6 +24,7 @@ def test_creation_without_policy_returns_warning(self): decision="Use FastAPI for backend API development.", consequences="Better performance and automatic documentation.", alternatives="Rejected Flask due to lack of native async support.", + skip_quality_gate=True, # Skip for test ) result = workflow.execute(input_data=input_data) @@ -55,6 +56,7 @@ def test_creation_with_structured_policy_no_warning(self): policy={ "imports": {"disallow": ["flask"], "prefer": ["fastapi"]}, }, + skip_quality_gate=True, # Skip for test ) result = workflow.execute(input_data=input_data) @@ -80,6 +82,7 @@ def test_creation_with_pattern_language_no_warning(self): context="Need async support", decision="Use FastAPI. 
**Don't use Flask** as it lacks native async support.", consequences="**Avoid** synchronous frameworks like Flask.", + skip_quality_gate=True, # Skip for test ) result = workflow.execute(input_data=input_data) @@ -176,8 +179,8 @@ def test_policy_extractor_without_policy(self): extractor = PolicyExtractor() assert extractor.has_extractable_policy(adr) is False - def test_policy_suggestion_from_alternatives(self): - """Should suggest policy based on alternatives text.""" + def test_policy_guidance_provided_when_no_policy(self): + """Should provide policy guidance when no policy provided.""" with tempfile.TemporaryDirectory() as tmpdir: workflow = CreationWorkflow(adr_dir=tmpdir) @@ -187,25 +190,23 @@ def test_policy_suggestion_from_alternatives(self): decision="Use FastAPI as the framework", consequences="Better performance", alternatives="Rejected Flask and Django", + skip_quality_gate=True, # Skip for test ) result = workflow.execute(input_data=input_data) assert result.success - creation_result = result.data["creation_result"] - # Should have suggestions - warnings = creation_result.validation_warnings - assert len(warnings) > 0 + # Should provide policy guidance + assert "policy_guidance" in result.data + guidance = result.data["policy_guidance"] - # Check if suggestion includes Flask - suggestion_text = " ".join(warnings) - assert ( - "suggested" in suggestion_text.lower() - and "policy" in suggestion_text.lower() - ) - # Flask should be in the suggested JSON structure - assert "flask" in suggestion_text.lower() + # Should indicate no policy provided + assert guidance["has_policy"] is False + + # Should provide reasoning prompts for agents + assert "agent_task" in guidance + assert "policy_capabilities" in guidance def test_validation_backward_compatible(self): """Validation should not break existing workflows.""" @@ -218,6 +219,7 @@ def test_validation_backward_compatible(self): context="Some context here", decision="Make this decision", consequences="Some 
consequences", + skip_quality_gate=True, # Skip for test ) result = workflow.execute(input_data=input_data) @@ -409,61 +411,5 @@ def test_has_extractable_policy_with_new_types(self): assert extractor.has_extractable_policy(adr_config) is True -class TestPolicySuggestion: - """Test policy suggestion helper method.""" - - def test_suggest_from_rejected_alternatives(self): - """Should extract rejected technologies from alternatives.""" - with tempfile.TemporaryDirectory() as tmpdir: - workflow = CreationWorkflow(adr_dir=tmpdir) - - decision = "Use FastAPI" - alternatives = "Rejected: Flask\nRejected: Django" - - suggested = workflow._suggest_policy_from_alternatives( - decision, alternatives - ) - - assert suggested is not None - assert "imports" in suggested - disallow = suggested["imports"].get("disallow") - # Disallow should be None or contain Flask - assert disallow is None or any("flask" in item.lower() for item in disallow) - - def test_suggest_from_use_statement(self): - """Should extract chosen technology from 'Use X' statement.""" - with tempfile.TemporaryDirectory() as tmpdir: - workflow = CreationWorkflow(adr_dir=tmpdir) - - decision = "Use FastAPI as our framework" - alternatives = "" - - suggested = workflow._suggest_policy_from_alternatives( - decision, alternatives - ) - - assert suggested is not None - assert "imports" in suggested - prefer = suggested["imports"].get("prefer") - assert prefer is not None - # Check if FastAPI is in prefer list (case-insensitive) - assert any("fastapi" in item.lower() for item in prefer) - - def test_suggest_no_policy_when_no_patterns(self): - """Should return None when no recognizable patterns.""" - with tempfile.TemporaryDirectory() as tmpdir: - workflow = CreationWorkflow(adr_dir=tmpdir) - - decision = "We decided this approach is better" - alternatives = "We considered other options" - - suggested = workflow._suggest_policy_from_alternatives( - decision, alternatives - ) - - # No clear technology names or rejected 
patterns - assert suggested is None or len(suggested) == 0 - - if __name__ == "__main__": pytest.main([__file__, "-v"])