Skip to content

health: CRAP tier precedence, surface in human output, configurable refactor band #474

@BartWaardenburg

Description

@BartWaardenburg

The CRAP score tier system in crates/cli/src/health/scoring.rs and crates/cli/src/health_types/finding.rs has three structural rough edges that show up as user-visible inconsistency in finding outputs.

1. Tier precedence is implicit in control flow

scoring.rs:1352-1404 resolves which coverage tier applies to a given function:

  • Line 1352: Angular template inheritance context present -> EstimatedComponentInherited.
  • Line 1372: Istanbul data loaded -> try Istanbul match; if function unmatched, fall through silently to estimated tiers.
  • Line 1393: else -> StaticEstimated.

There is no explicit enum CoverageSource priority order documented in code. A reader has to follow the control flow to understand which tier wins when.

The more concrete user-impact: when Istanbul is loaded but only matches 80% of functions, the remaining 20% silently fall back to estimated tiers. The CRAP-finding action selection in report::json::build_health_finding_actions then emits different action verbs (add-tests for estimated tiers, increase-coverage for partial/high) for functions in the same file. The user sees mixed advice and cannot tell from the output that the coverage source shifted mid-file.

2. SECONDARY_REFACTOR_BAND is a magic constant

crates/cli/src/health_types/finding.rs:48:

const SECONDARY_REFACTOR_BAND: u16 = 5;

Used at finding.rs:267-273 to trigger a secondary refactor-function action on CRAP-only findings whose cyclomatic is within 5 points of the max threshold AND whose cognitive complexity is at least half the cognitive threshold. The rationale (suppress false positives on flat type-tag dispatchers per .claude/rules/cli-crate.md) lives only in CLAUDE.md, not next to the constant.

Three issues:

  • No unit test asserts the band heuristic's behavior on a fixture.
  • Not user-configurable, so teams with stricter standards ("extract if within 3 of threshold") cannot adjust.
  • Lives in health_types rather than config/health.rs alongside other thresholds.

3. Coverage tier is invisible in the human-output sections users actually read

crates/cli/src/report/human/health.rs:1102 already emits a tier-aware hint:

CRAP estimated from export references (85% direct, 40% indirect, 0% untested). Use --coverage for exact scores.

But this string is only rendered in the --file-scores section. The two sections that surface CRAP values most prominently when running bare fallow (the default combined pipeline) carry zero tier context:

  • The orientation header at crates/cli/src/report/human/health.rs:635 lists LOC, cyclomatic, MI, churn, unused deps but never says "no coverage data; CRAP scores are estimated."
  • The ● High complexity functions section header at crates/cli/src/report/human/health.rs:843 and its per-finding CRAP value rendering at line 917 ({crap:>5.1} CRAP) prints values like 650.0 CRAP with no indication of whether the score came from real Istanbul coverage or the estimated model.

The on-disk artifact already knows: coverage_source is on every finding (scoring.rs:443 sets it via CoverageTier::from_pct(estimated_coverage)), and the report-level summary.coverage_model records the source. The signal is there; the human renderer just does not surface it where users are looking.

Fix surface options:

  • (a) one-line hint above ● High complexity functions when coverage_model == Estimated: "CRAP scores estimated from export references. Pass --coverage <path> for exact scores."
  • (b) per-finding suffix on the CRAP line: 650.0 CRAP (estimated) / 12.0 CRAP (Istanbul) / 47.0 CRAP (Istanbul partial)
  • (c) extend the orientation header ■ Metrics: line at health.rs:635 with a coverage: estimated / coverage: Istanbul (854/1043 matched) segment when complexity is in the pipeline.

(a) is cheapest and high-signal; (b) is most informative per-finding; (c) is the most discoverable but also the most space-constrained.

The same gap shows up via MCP and JSON: consumers reading findings[].crap cannot route advice without also reading findings[].coverage_source. Documenting the precedence (Part 1) and surfacing it in human output (this Part 3) reinforce each other.

Scope

Part A: explicit tier precedence

  1. Replace the implicit if/else chain with an explicit enum CoverageSource { TemplateInherited, IstanbulHigh, IstanbulPartial, IstanbulMissing, StaticEstimated } and a single resolver function that returns the source on From<...>.

  2. Per file, track whether all functions resolved to the same tier or diverged. Emit a top-level finding metadata field coverage_source_consistency: "uniform" | "mixed" so downstream consumers (dashboards, AI agents) can detect the silent shift and surface a hint.

  3. Document the precedence in the resolver function's doc-comment with a one-line "why" per tier.

Part B: configurable threshold

  1. Move SECONDARY_REFACTOR_BAND to crates/config/src/config/health.rs as a documented field with a sensible default.
  2. Add a unit test that constructs a finding at the band boundary and asserts the secondary action fires.
  3. Expose via .fallowrc.json so users can tune; default unchanged.

Part C: surface the coverage tier in human output

  1. Add a single-line tier note above the ● High complexity functions section in crates/cli/src/report/human/health.rs whenever report.summary.coverage_model is Estimated (and a less prominent variant for Istanbul so consistency is signalled both ways).

  2. Optionally extend the orientation header at health.rs:635 with a coverage: estimated | Istanbul (m/t matched) segment when the complexity sub-pipeline ran.

  3. Surface the Part-A coverage_source_consistency value in the same hint when mixed, so users see "CRAP scores: Istanbul where matched, estimated for 21% of functions" instead of silent divergence within one file.

  4. Snapshot tests for the new line against fixtures with and without --coverage.

  5. Rewrite the existing tier-aware hint at crates/cli/src/report/human/health.rs:1102. Today it reads "CRAP estimated from export references (85% direct, 40% indirect, 0% untested). Use --coverage for exact scores.". Users who follow that hint type either fallow --coverage (gets unexpected argument --coverage plus a confused tip: a similar argument exists: '--tolerance') or fallow health --coverage (gets a value is required for '--coverage <PATH>' but none was supplied). The hint must spell out the subcommand context AND the required path argument:

    Run `fallow health --coverage <coverage-final.json>` for exact scores.
    

    Same edit applies to any docs page mirroring the hint (e.g. https://docs.fallow.tools/explanations/health#file-health-scores); the prose should match the CLI invocation users need to type.

  6. Add a custom clap UnknownArgument handler for top-level --coverage. Clap's default ranking suggests --tolerance because it is the alphabetically nearest top-level long arg, which is irrelevant. The handler should detect --coverage at the top level (where neither the coverage subcommand nor any --coverage flag exists) and route to a targeted hint:

    error: --coverage is a flag on the `health` subcommand, not a top-level flag.
    tip: run `fallow health --coverage <coverage-final.json>` for exact CRAP scores, or `fallow coverage setup` to configure the runtime-coverage sidecar.
    

    Two confounders justify the custom handler: coverage is also a subcommand (fallow coverage), and --coverage <PATH> is health-scoped. The handler disambiguates both cases in one message instead of letting clap fall back to distance-ranking against --tolerance.

Why this matters

  • Mixed-tier action verbs in the same file erode user trust in fallow's output. Either uniformly use one source per file (with a fallback hint), or expose the divergence as metadata.
  • Magic constants without tests + without config knobs are the canonical example of "numerology" that project memory project_eslint_overlap_not_disqualifier warns against.
  • A first-time fallow user looking at 650.0 CRAP on a complex function has no way to know that number is based on "I have no idea how well this is tested." That is precisely the kind of opaque metric that erodes trust; Part C closes the gap by making the input source visible at the point of consumption, not hidden in --file-scores.

Out of scope

  • Reworking the CRAP formula itself. Savoia & Evans (2007) is the literature anchor; fallow inherits it intentionally.
  • Coverage tier watermarks (HIGH_COVERAGE_WATERMARK = 70% at scores.rs:186). Worth surfacing as configurable separately if requested by users.

Verification

  • New unit test on tier precedence.
  • New unit test on configurable band.
  • New snapshot tests on the human-output tier hint (no-coverage and Istanbul cases).
  • Existing CRAP scoring tests continue to pass.
  • Manually inspect a fixture with mixed Istanbul match: confirm the new metadata field surfaces AND the human-output hint mentions the divergence.

Metadata

Metadata

Labels

area:cliCommand-line interface behaviorarea:healthHealth, complexity, ownership, and scoringarea:reportReport rendering and output format mappingdocumentationImprovements or additions to documentationpriority:lowDeferred cleanup, blocked dependency work, or nice-to-haverustPull requests that update rust codetype:refactorInternal cleanup without intended user-facing behavior changetype:uxHuman-facing clarity or workflow improvement
No fields configured for Maintenance.

Projects

Status
Done

Relationships

None yet

Development

No branches or pull requests

Issue actions