Commit d04e02d

Enhance risk analysis features and documentation (#14)
- Updated the risk scoring system to include detailed per-file risk tiers (SAFE, REVIEW, TEST, CRITICAL) and narratives.
- Introduced new slash commands `/pr-risk` and `/pr-fix` for AI-assisted PR reviews, providing actionable insights based on risk assessments.
- Enhanced documentation to reflect changes in risk triage, scoring formulas, and command usage.
- Added tests for risk triage formatting and evidence gating logic to ensure accurate risk reporting.

This update improves the clarity and usability of the risk analysis tools within the project.
1 parent 012e658 commit d04e02d

33 files changed

Lines changed: 3915 additions & 41 deletions

CLAUDE.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -59,7 +59,7 @@ make help # List all targets
5959
cmd/contextception/ CLI entrypoint
6060
cmd/gen-schema/ JSON schema generator
6161
internal/
62-
analyzer/ Core analysis engine (scoring, categorization, cycles)
62+
analyzer/ Core analysis engine (scoring, categorization, risk triage, cycles)
6363
change/ PR/branch diff analysis
6464
classify/ File role classification
6565
cli/ Command handlers (cobra subcommands)

CONTRIBUTING.md

Lines changed: 20 additions & 6 deletions
@@ -31,16 +31,30 @@ make lint # Run golangci-lint
3131

3232
```
3333
cmd/contextception/ CLI entrypoint
34+
cmd/gen-schema/ JSON schema generator
3435
internal/
35-
analyzer/ Core analysis engine
36+
analyzer/ Core analysis engine (scoring, categorization, risk triage)
3637
change/ PR/branch diff analysis
37-
cli/ Command handlers
38-
config/ Configuration parsing
39-
db/ SQLite database layer
38+
classify/ File role classification
39+
cli/ Command handlers (cobra subcommands)
40+
config/ Configuration parsing (per-repo + global)
41+
db/ SQLite database layer (migrations, store, search)
4042
extractor/ Language-specific extractors (python, typescript, golang, java, rust)
41-
git/ Git history signals
42-
indexer/ Incremental indexing
43+
git/ Git history signal extraction
44+
grader/ Internal quality evaluation framework
45+
history/ Historical analysis, usage tracking, and feedback storage
46+
indexer/ Incremental indexing pipeline
47+
mcpserver/ MCP server (tools, stdio transport)
48+
model/ Shared data types
4349
resolver/ Module resolution (per-language)
50+
session/ Claude Code session parser (discover, adoption)
51+
update/ Version check, self-update, install method detection
52+
validation/ Fixture-based validation framework
53+
version/ Version injection (set via ldflags)
54+
protocol/ JSON Schema specifications
55+
schema/ Go types for schema generation
56+
integrations/ MCP config examples and slash commands
57+
testdata/ Test fixtures (synthetic repos + expected outputs)
4458
```
4559

4660
## Adding a New Language

README.md

Lines changed: 63 additions & 5 deletions
@@ -172,13 +172,47 @@ $ contextception analyze-change origin/main
172172

173173
Returns a PR-level impact report with:
174174

175+
- **Per-file risk scoring:** every changed file gets a risk score (0–100) and tier (SAFE/REVIEW/TEST/CRITICAL)
176+
- **Risk triage:** files grouped by tier with human-readable risk narratives explaining why each file is flagged
175177
- **Test gaps:** changed files with no test coverage, flagged before merge
178+
- **Test suggestions:** auto-generated recommendations for high-risk untested files (which test file to create, what to test)
176179
- **Coupling detection:** pairs of changed files that depend on each other
177180
- **Hidden coupling:** co-change partners *not in your diff* that may need updating
178-
- **Per-file blast radius:** which specific changes carry the most risk
181+
- **Aggregate risk:** overall PR risk score with percentile ranking against historical baselines (after 10+ analyses)
179182
- **Aggregated must_read:** merged context across all changed files
180183

181-
Use `--ci --fail-on high` to gate PRs automatically. Results are stored in a local history database, enabling trend tracking with `contextception history`:
184+
### Risk Tiers
185+
186+
| Tier | Score | Meaning |
187+
|------|-------|---------|
188+
| **SAFE** | 0–20 | New files, well-tested utilities, low coupling |
189+
| **REVIEW** | 21–50 | Moderate risk, standard code review sufficient |
190+
| **TEST** | 51–75 | High risk, targeted testing recommended |
191+
| **CRITICAL** | 76–100 | Maximum risk, regressions likely without careful review |
192+
193+
Risk scores combine change status, structural factors (importer count, co-change frequency, fragility, mutual dependencies), and test coverage adjustments.
194+
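The tier boundaries in the table above amount to a threshold lookup. The Go sketch below is illustrative only: `tierFor` is a hypothetical name, not necessarily what `internal/analyzer` uses, and clamping out-of-range scores before the lookup is an assumption of this example.

```go
package main

import "fmt"

// tierFor maps a 0-100 risk score to the tier names from the Risk Tiers table.
// Hypothetical helper: out-of-range inputs are clamped first (an assumption).
func tierFor(score int) string {
	if score < 0 {
		score = 0
	}
	if score > 100 {
		score = 100
	}
	switch {
	case score <= 20:
		return "SAFE"
	case score <= 50:
		return "REVIEW"
	case score <= 75:
		return "TEST"
	default:
		return "CRITICAL"
	}
}

func main() {
	// Print the tier on each side of every boundary.
	for _, s := range []int{0, 20, 21, 50, 51, 75, 76, 100} {
		fmt.Printf("%3d -> %s\n", s, tierFor(s))
	}
}
```

The boundaries are inclusive on the upper end of each range, which matches the 0–20, 21–50, 51–75, 76–100 split documented in the table.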
195+
### Token-optimized output
196+
197+
Use `--compact` for a text summary optimized for LLM consumption (~60–75% fewer tokens than JSON):
198+
199+
```bash
200+
$ contextception analyze-change --compact
201+
```
202+
203+
### CI integration
204+
205+
Use `--ci --fail-on` to gate PRs automatically. A risk badge is printed to stderr:
206+
207+
```bash
208+
# Fail on high blast radius
209+
contextception analyze-change --ci --fail-on high
210+
211+
# Fail only if risk triage has CRITICAL files
212+
contextception analyze-change --ci --fail-on critical
213+
```
214+
215+
Results are stored in a local history database, enabling trend tracking with `contextception history`:
182216

183217
```bash
184218
$ contextception history hotspots # Files that repeatedly appear as hotspots
@@ -249,7 +283,20 @@ Contextception averages ~1,000 tokens per analysis vs. Repomix's full-repo outpu
249283

250284
## MCP Setup (30 seconds)
251285

252-
Make your AI agent smarter. Add to your `~/.claude.json` (Claude Code) or equivalent MCP config:
286+
Make your AI agent smarter. The `setup` command auto-detects your editor and configures everything:
287+
288+
```bash
289+
# Claude Code (MCP server + hooks + slash commands)
290+
contextception setup
291+
292+
# Cursor or Windsurf
293+
contextception setup --editor cursor
294+
contextception setup --editor windsurf
295+
```
296+
297+
Use `--dry-run` to preview changes, or `--uninstall` to reverse.
298+
299+
Or configure manually — add to your `~/.claude.json` (Claude Code) or equivalent MCP config:
253300

254301
```json
255302
{
@@ -273,11 +320,22 @@ This exposes nine tools to the AI agent:
273320
| `get_entrypoints` | Return entrypoint and foundation files for project orientation |
274321
| `get_structure` | Return directory structure with file counts and language distribution |
275322
| `get_archetypes` | Detect representative files across architectural layers |
276-
| `analyze_change` | Analyze the impact of a git diff / PR (blast radius, test gaps, coupling) |
323+
| `analyze_change` | Analyze the impact of a git diff / PR (risk scoring, triage, test gaps, coupling) |
277324
| `rate_context` | Rate how useful a previous `get_context` result was (feedback for accuracy tracking) |
278325

279326
Works with **Claude Code**, **Cursor**, **Windsurf**, and any MCP-compatible tool.
280327

328+
### Slash Commands
329+
330+
Two built-in slash commands for AI-assisted PR review (installed automatically by `contextception setup`):
331+
332+
| Command | Description |
333+
|---------|-------------|
334+
| `/pr-risk` | Run risk analysis on the current branch and present an actionable, human-friendly review |
335+
| `/pr-fix` | Analyze risk, then build an ordered plan to fix every issue found (test gaps, coupling, fragility) |
336+
337+
These work by combining contextception's deterministic risk analysis with the LLM's ability to explain and translate — contextception computes the scores, the LLM presents them in plain language. See [`integrations/`](integrations/) for setup details.
338+
281339
---
282340

283341
## Language Support
@@ -320,7 +378,7 @@ contextception session Show contextception adoption across Clau
320378
| `--mode plan\|implement\|review` | Shape output for AI workflow stage |
321379
| `--token-budget N` | Cap output to fit token limits |
322380
| `--compact` | Token-optimized text summary (~60-75% fewer tokens than JSON) |
323-
| `--ci --fail-on high\|medium` | Exit codes for CI pipelines |
381+
| `--ci --fail-on high\|medium\|critical` | Exit codes for CI pipelines |
324382
| `--cap N` | Limit must_read entries (overflow to related) |
325383
| `--no-external` | Exclude external dependencies |
326384
| `--no-update-check` | Disable automatic update version check |

docs/ARCHITECTURE.md

Lines changed: 17 additions & 5 deletions
@@ -119,11 +119,23 @@ The historical cap prevents co-change signals from overwhelming structural evide
119119
Analyzes the impact of a git diff (PR or branch):
120120

121121
1. Diffs `base..head` to find changed files
122-
2. Analyzes each changed file independently
123-
3. Detects coupling between changed files (structural edges)
124-
4. Identifies test gaps (changed files with no test coverage)
125-
5. Surfaces hidden coupling (co-change partners not in the diff)
126-
6. Aggregates blast radius across all changed files
122+
2. Analyzes each changed file independently (full per-file AnalysisOutput)
123+
3. Computes per-file risk scores (0--100) with tier classification (SAFE/REVIEW/TEST/CRITICAL)
124+
4. Detects coupling between changed files (structural edges)
125+
5. Identifies test gaps (changed files with no test coverage)
126+
6. Surfaces hidden coupling (co-change partners not in the diff)
127+
7. Aggregates blast radius and risk triage across all changed files
128+
8. Generates test suggestions for high-risk untested files
129+
130+
### Risk Scoring Engine (`internal/analyzer/risk.go`)
131+
132+
Per-file risk scoring for change analysis. Formula: `base_score + structural_risk * coverage_multiplier`, clamped to [0, 100].
133+
134+
- **Base score**: added=10 (20 with exports), modified=30, deleted=5, renamed=5
135+
- **Structural risk**: normalized importer count, co-change frequency, fragility (Ce/(Ca+Ce)), mutual deps, cycles
136+
- **Coverage adjustment**: direct tests ×0.7, dependency tests ×0.85, no tests ×1.2
137+
- **Evidence gating**: same-package siblings filtered unless they have import edges, co-change ≥2, or prefix match
138+
- **Percentile ranking**: stored in `history.sqlite` `risk_scores` table, computed after 10+ records
127139

128140
### Database Layer (`internal/db/`)
129141

docs/features.md

Lines changed: 62 additions & 3 deletions
@@ -98,6 +98,57 @@ Where Ce = files the subject imports (outdegree), Ca = files that import the sub
9898

9999
---
100100

101+
## Risk Triage
102+
103+
The `analyze-change` command assigns a per-file **risk score** (0--100) and groups files into tiers:
104+
105+
| Tier | Score Range | Meaning |
106+
|------|-------------|---------|
107+
| `SAFE` | 0--20 | New files, well-tested utilities, low coupling |
108+
| `REVIEW` | 21--50 | Moderate risk, standard code review sufficient |
109+
| `TEST` | 51--75 | High risk, targeted testing recommended |
110+
| `CRITICAL` | 76--100 | Maximum risk, regressions likely without careful review |
111+
112+
### Risk Score Formula
113+
114+
The score combines four components:
115+
116+
1. **Base score** by change status: added=10 (20 with exports), modified=30, deleted=5, renamed=5
117+
2. **Structural risk** (modified files only): normalized importer count, co-change frequency, fragility, mutual dependencies, circular dependencies
118+
3. **Coverage adjustment**: direct tests ×0.7, dependency tests ×0.85, no tests ×1.2
119+
4. **Clamp** to [0, 100]
120+
121+
### Per-file fields
122+
123+
| Field | Type | Description |
124+
|-------|------|-------------|
125+
| `risk_score` | int | Computed risk score (0--100) |
126+
| `risk_tier` | string | `SAFE`, `REVIEW`, `TEST`, or `CRITICAL` |
127+
| `risk_factors` | []string | Factors contributing to the score |
128+
| `risk_narrative` | string | Human-readable risk explanation |
129+
130+
### Report-level fields
131+
132+
| Field | Type | Description |
133+
|-------|------|-------------|
134+
| `risk_triage` | object | Files grouped by tier (critical, test, review, safe) |
135+
| `aggregate_risk.score` | int | Max per-file score across the PR |
136+
| `aggregate_risk.percentile` | int | Percentile vs. historical scores (after 10+ analyses) |
137+
| `aggregate_risk.regression_risk` | string | Summary of regression risk from critical files |
138+
| `aggregate_risk.test_coverage_ratio` | float | Ratio of changed files with direct tests |
139+
| `test_suggestions` | []object | Suggested tests for high-risk untested files |
140+
141+
### Evidence-Gated Same-Package Filtering
142+
143+
Same-package siblings (Go, Java, Rust) are only included in `must_read` if they have structural evidence:
144+
- Direct import/call edge
145+
- Co-change frequency >= 2
146+
- Filename prefix match
147+
148+
This reduces noise in large packages where most siblings are irrelevant.
149+
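The gating rule above reduces to a three-way "any evidence" check. The Go sketch below shows one way to express it. All names are hypothetical, and the docs do not define "filename prefix match" precisely, so this example assumes it means that one file's base name (without extension) is a prefix of the other's.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Sibling is a hypothetical record for a same-package file.
type Sibling struct {
	Path          string
	HasImportEdge bool // direct import/call edge to or from the subject
	CoChangeCount int  // times it changed together with the subject
}

// stem returns the base file name without its extension.
func stem(path string) string {
	base := filepath.Base(path)
	return strings.TrimSuffix(base, filepath.Ext(base))
}

// hasEvidence reports whether a sibling passes the gate.
// The prefix rule is an assumption; see the note above.
func hasEvidence(subject string, s Sibling) bool {
	if s.HasImportEdge || s.CoChangeCount >= 2 {
		return true
	}
	a, b := stem(subject), stem(s.Path)
	if a == "" || b == "" {
		return false
	}
	return strings.HasPrefix(a, b) || strings.HasPrefix(b, a)
}

// gateSiblings keeps only the siblings that have structural evidence.
func gateSiblings(subject string, siblings []Sibling) []Sibling {
	var kept []Sibling
	for _, s := range siblings {
		if hasEvidence(subject, s) {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	subject := "internal/analyzer/risk.go"
	siblings := []Sibling{
		{Path: "internal/analyzer/risk_triage.go"},                // prefix match
		{Path: "internal/analyzer/cycles.go", CoChangeCount: 1},   // no evidence
		{Path: "internal/analyzer/scoring.go", CoChangeCount: 3},  // co-change
		{Path: "internal/analyzer/model.go", HasImportEdge: true}, // import edge
	}
	for _, s := range gateSiblings(subject, siblings) {
		fmt.Println("keep:", s.Path)
	}
}
```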
150+
---
151+
101152
## Hotspot Detection
102153

103154
Identifies files that are both high-churn AND structural bottlenecks:
@@ -370,19 +421,24 @@ contextception session Show adoption across Claude Code session
370421
| `--signatures` | false | Include code signatures for must_read symbols |
371422
| `--stable-threshold` | adaptive | Indegree threshold for the stable flag |
372423
| `--ci` | false | CI mode: suppress output, exit code reflects blast radius |
373-
| `--fail-on` | high | Blast radius level that triggers non-zero exit (`high` or `medium`) |
424+
| `--fail-on` | high | Trigger non-zero exit: `high`, `medium`, or `critical` (risk triage) |
374425
| `--mode` | (none) | Workflow mode: `plan`, `implement`, or `review` |
375426
| `--token-budget` | 0 | Target token budget (auto-adjusts caps) |
376427
| `--compact` | false | Token-optimized text summary (~60-75% fewer tokens than JSON) |
377428

378429
### CI mode
379430

380-
When `--ci` is set, output is suppressed and the exit code reflects blast radius:
431+
When `--ci` is set, output is suppressed and the exit code reflects blast radius. A risk badge is also printed to stderr:
432+
433+
```
434+
contextception: main..HEAD blast_radius=high files=27
435+
RISK: 72/100 | 1 CRITICAL | 2 TEST | 5 REVIEW | 19 SAFE
436+
```
381437

382438
| Exit code | Meaning |
383439
|-----------|---------|
384440
| 0 | Blast radius below threshold |
385-
| 1 | Medium blast radius (with `--fail-on medium`) |
441+
| 1 | Medium blast radius (with `--fail-on medium`) or CRITICAL files (with `--fail-on critical`) |
386442
| 2 | High blast radius |
387443

388444
```bash
@@ -391,6 +447,9 @@ contextception analyze-change --ci --fail-on high
391447

392448
# Fail on medium or high
393449
contextception analyze-change --ci --fail-on medium
450+
451+
# Fail only if risk triage has CRITICAL files
452+
contextception analyze-change --ci --fail-on critical
394453
```
395454

396455
### Workflow modes

integrations/README.md

Lines changed: 15 additions & 1 deletion
@@ -38,7 +38,10 @@ contextception setup --editor cursor
3838
contextception setup --editor windsurf
3939
```
4040

41-
Use `--dry-run` to preview changes, or `--uninstall` to reverse. For Claude Code, this also installs hooks that remind the AI to call `get_context` before editing files.
41+
Use `--dry-run` to preview changes, or `--uninstall` to reverse. For Claude Code, this installs:
42+
- MCP server configuration
43+
- PreToolUse hooks that remind the AI to call `get_context` before editing files
44+
- `/pr-risk` and `/pr-fix` slash commands for AI-assisted PR review
4245

4346
## Manual Configuration
4447

@@ -151,6 +154,17 @@ All integrations expose the same nine tools:
151154

152155
Contextception supports repositories using: Python, TypeScript/JavaScript, Go, Java, Rust.
153156

157+
## Slash Commands
158+
159+
Two slash commands are included for AI-assisted PR review. These are installed automatically by `contextception setup` for Claude Code.
160+
161+
| Command | File | Description |
162+
|---------|------|-------------|
163+
| `/pr-risk` | [`claude-code/pr-risk.md`](claude-code/pr-risk.md) | Run risk analysis and present a human-friendly review with verdicts, test coverage, and next steps |
164+
| `/pr-fix` | [`claude-code/pr-fix.md`](claude-code/pr-fix.md) | Analyze risk, then build an ordered fix plan for every issue (test gaps, coupling, fragility) |
165+
166+
For Cursor/Windsurf, place the command files in `.cursor/rules/` or `.windsurf/rules/` respectively. For other agents, see [`pr-risk-review.md`](pr-risk-review.md) for the full prompt template.
167+
154168
## Further Reading
155169

156170
- [MCP Tutorial](../docs/mcp-tutorial.md) — step-by-step guide to adding context intelligence to any AI agent
