feat(report): add analysis_completeness field to JSON output #160

Merged
rng1995 merged 1 commit into NVIDIA:main from mimran-khan:feat/analysis-completeness-field on Jun 23, 2026
@mimran-khan

Contributor

Summary

When SkillSpector produces a "clean" scan (no findings), consumers cannot tell whether the tool analyzed everything or silently skipped components because of missing file content, LLM unavailability, or other limitations. A clean scan therefore cannot be trusted for registry gating decisions.

This PR adds an analysis_completeness section to the JSON report format that explicitly communicates scan coverage.

Example Output

{
  "analysis_completeness": {
    "total_components": 5,
    "scanned_components": 5,
    "coverage_percent": 100.0,
    "llm_analysis": "applied",
    "findings_before_filtering": 3,
    "findings_after_filtering": 1,
    "limitations": null,
    "is_complete": true
  }
}

When limitations exist:

{
  "analysis_completeness": {
    "total_components": 5,
    "scanned_components": 3,
    "coverage_percent": 60.0,
    "llm_analysis": "skipped",
    "findings_before_filtering": 2,
    "findings_after_filtering": 2,
    "limitations": [
      "2 component(s) had no content in file_cache (skipped)",
      "LLM meta-analysis unavailable: OPENAI_API_KEY not set"
    ],
    "is_complete": false
  }
}
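
The two payloads above suggest how the fields relate to one another. The following is a minimal sketch of that relationship, not the code merged in this PR. The function name `build_analysis_completeness`, its parameters, the treatment of an empty component list as 100% coverage, and the rule that `is_complete` is true exactly when no limitations were recorded are all assumptions made for illustration. Only the "LLM unavailable" message from the example is modeled; the wording used for `--no-llm` is not shown in the PR text and is left out.

```python
from typing import Optional


def build_analysis_completeness(
    component_paths: list[str],
    file_cache: dict[str, str],
    llm_skip_reason: Optional[str],
    findings_before: int,
    findings_after: int,
) -> dict:
    """Hypothetical helper that mirrors the example payloads in this PR."""
    total = len(component_paths)
    # A component counts as scanned only if the cache holds non-empty content.
    scanned = sum(1 for path in component_paths if file_cache.get(path))
    # Assumption: nothing to scan means nothing was missed, so report 100%.
    coverage = 100.0 if total == 0 else round(100.0 * scanned / total, 1)

    limitations: list[str] = []
    skipped = total - scanned
    if skipped:
        limitations.append(
            f"{skipped} component(s) had no content in file_cache (skipped)"
        )
    if llm_skip_reason is not None:
        limitations.append(f"LLM meta-analysis unavailable: {llm_skip_reason}")

    return {
        "total_components": total,
        "scanned_components": scanned,
        "coverage_percent": coverage,
        "llm_analysis": "skipped" if llm_skip_reason is not None else "applied",
        "findings_before_filtering": findings_before,
        "findings_after_filtering": findings_after,
        # Matches the examples: null when there are no limitations.
        "limitations": limitations or None,
        # Assumption: complete means no limitation was recorded.
        "is_complete": not limitations,
    }
```

Called with five component paths, a cache holding content for three of them, and `llm_skip_reason="OPENAI_API_KEY not set"`, this sketch yields the same shape as the second example payload.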

Design Decisions

  • JSON only: SARIF has its own coverage mechanisms, and terminal output is already human-readable, so only the JSON format gets this field.
  • is_complete boolean: Enables simple programmatic checks (if not report.analysis_completeness.is_complete: warn)
  • Human-readable limitations: Each string explains what was missed and why — actionable for operators
  • Non-breaking: Field is additive; existing consumers that don't check it are unaffected
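
As an illustration of the programmatic check mentioned above, a registry gate could consume the JSON report roughly as follows. This is a sketch under assumptions: the function name `evaluate_scan` is hypothetical, treating a report without `analysis_completeness` as untrusted is one possible fail-closed policy rather than behavior defined by this PR, and using `findings_after_filtering` as the "clean" signal is a simplification.

```python
import json


def evaluate_scan(report_json: str) -> tuple[bool, list[str]]:
    """Return (passed, reasons) for a JSON report. Hypothetical gate logic."""
    report = json.loads(report_json)
    completeness = report.get("analysis_completeness")
    if completeness is None:
        # Assumption: a report without the field gives no coverage signal,
        # so this gate refuses to trust it.
        return False, ["analysis_completeness missing; coverage unknown"]

    reasons: list[str] = []
    if completeness.get("is_complete") is not True:
        # Surface the human-readable limitations to the operator.
        limitations = completeness.get("limitations") or []
        reasons.extend(limitations or ["scan reported is_complete=false"])
    remaining = completeness.get("findings_after_filtering", 0)
    if remaining > 0:
        reasons.append(f"{remaining} finding(s) remain after filtering")

    return (not reasons), reasons
```

A CI step could print the returned reasons and exit non-zero when `passed` is false, so that an incomplete scan is never mistaken for a clean one.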

Testing

8 new tests covering:

  • Full coverage produces is_complete: true
  • Partial file_cache coverage reports skipped components
  • LLM unavailable noted in limitations
  • LLM disabled (--no-llm) noted
  • Filtered findings count tracked
  • Empty components edge case
  • JSON format includes field
  • SARIF format does NOT include field

Fixes #149

Adds an analysis_completeness section to the JSON report output that
communicates scan coverage and known limitations to consumers:

- total_components / scanned_components / coverage_percent
- llm_analysis status (applied/skipped)
- findings_before_filtering / findings_after_filtering
- limitations array (human-readable list of gaps)
- is_complete boolean for quick programmatic checks

This helps CI integrations and registry gates understand whether a
"clean" scan actually analyzed everything or if gaps exist that
require re-scanning with full capabilities.

Only included in JSON format output; SARIF and terminal formats are
unchanged.

Fixes NVIDIA#149
@mimran-khan mimran-khan force-pushed the feat/analysis-completeness-field branch from 2d0758f to 5f3e62b Compare June 23, 2026 09:46

@rng1995 rng1995 left a comment

Collaborator

Verdict: Approve — additive transparency metadata, no detection-logic change, well tested.

What's good

  • New _build_analysis_completeness (src/skillspector/nodes/report.py, ~L9-54) reports coverage %, scanned vs total components, LLM applied/skipped, findings before/after filtering, a limitations list, and is_complete. Emitted JSON-only (~L72-73, _format_json); SARIF/markdown/terminal unaffected. This is exactly the kind of "what was NOT analyzed" signal a fail-closed scanner should surface.
  • No change to what is detected or filtered — purely reporting.

Non-blocking

  • Touches report() / _format_json, so it will textually conflict with #142, #158, #143, and #163 (all edit report.py). Needs rebase coordination, but no logic concern.

Tests

Good: full/partial coverage, LLM unavailable vs disabled, findings-filtered note, empty-components → 100%, and presence in JSON / absence in SARIF. LGTM.

Development

Successfully merging this pull request may close these issues.

[Bug] LLM API failures silently produce zero findings with no retry or user notification
