nightshift: test-gap — 62% of source code lacks test coverage #22

@nightshift-micr

Nightshift test-gap Analysis

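Analysis of test coverage gaps across the traccia Python codebase. A module counts as covered when it has a corresponding test file; uncovered LOC is the total line count of the modules without one.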

Summary

| Metric | Value |
| --- | --- |
| Total source modules | 17 |
| Modules with test files | 5 (29%) |
| Modules without tests | 12 (71%) |
| Total LOC (source) | 6,718 |
| Uncovered LOC | 4,142 (62%) |

Coverage Matrix

| Module | Lines | Tests? | Priority | Risk |
| --- | --- | --- | --- | --- |
| pipeline.py | 1,252 | ✅ Covered | | |
| rendering.py | 931 | ❌ None | HIGH | Tree/viewer output correctness |
| parsers.py | 729 | ✅ Covered | | |
| storage.py | 518 | ❌ None | HIGH | Data persistence, file corruption |
| llm.py | 507 | ❌ None | MEDIUM | API interaction, error handling |
| family_normalizer.py | 450 | ❌ None | MEDIUM | Skill family classification |
| cli.py | 433 | ❌ None | MEDIUM | User-facing entry point |
| bootstrap.py | 391 | ❌ None | HIGH | Project initialization, dir creation |
| models.py | 297 | ✅ Covered | | |
| document_normalizer.py | 288 | ❌ None | MEDIUM | PDF/DOCX conversion |
| source_detection.py | 253 | ✅ Covered | | |
| extraction.py | 193 | ❌ None | HIGH | Evidence extraction core logic |
| pipeline_support.py | 164 | ❌ None | LOW | Helper functions |
| taxonomy.py | 124 | ❌ None | MEDIUM | Skill matching, canonicalization |
| config.py | 97 | ❌ None | LOW | Config load/save/defaults |
| utils.py | 46 | ❌ None | LOW | Pure utility functions |
| fixtures.py | 45 | ✅ Covered | | |

Priority Recommendations

HIGH Priority (critical untested logic)

  1. extraction.py (193 lines) — Core evidence extraction with action pattern matching and confidence scoring. The ACTION_PATTERNS table and EVIDENCE_BASE_CONFIDENCE map directly affect skill leveling accuracy. Tests should verify:

    • Pattern matching for each EvidenceType
    • Confidence calculation with reliability tiers
    • Edge cases: empty documents, documents with no extractable signals
    • unresolved_candidates aggregation
  2. rendering.py (931 lines) — Largest untested module. Generates tree views, Obsidian exports, and viewer bundles. Tests should verify:

    • ASCII tree rendering correctness
    • Markdown node page generation
    • Obsidian vault export structure
    • Edge cases: empty graph, single-node graph, circular references
  3. storage.py (518 lines) — Handles all file-based persistence. Tests should verify:

    • JSON round-trip (write → read → compare)
    • Atomic writes (temp file + rename pattern)
    • Concurrent access behavior
    • Corrupted file recovery
    • Empty/missing file handling
  4. bootstrap.py (391 lines) — Project initialization and directory scaffolding. Tests should verify:

    • Directory creation from config paths
    • Idempotency (running bootstrap twice produces same state)
    • Default config generation
    • Missing parent directory handling
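
As a rough illustration of the storage.py items above (JSON round-trip, temp file + rename, corrupted and missing files), the tests could look something like the sketch below. `atomic_write_json` and `read_json` are hypothetical stand-ins written for this example, not the actual storage.py API; the real function names and recovery behavior may differ. The tests take a `tmp_path` argument so they would also work with pytest's built-in fixture of that name.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def atomic_write_json(path: Path, payload: dict) -> None:
    """Hypothetical stand-in for a storage.py writer: temp file + rename."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
        os.replace(tmp_name, path)  # atomic when source and target share a filesystem
    except BaseException:
        # Remove the temp file so a failed write leaves no debris behind.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def read_json(path: Path, default: Optional[dict] = None) -> Optional[dict]:
    """Hypothetical reader: returns `default` for missing or corrupted files."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return default


def test_round_trip(tmp_path: Path) -> None:
    # Also covers a missing parent directory ("nested" does not exist yet).
    target = tmp_path / "nested" / "graph.json"
    payload = {"skills": ["python"], "level": 3}
    atomic_write_json(target, payload)
    assert read_json(target) == payload


def test_failed_write_keeps_previous_file(tmp_path: Path) -> None:
    target = tmp_path / "graph.json"
    atomic_write_json(target, {"version": 1})
    try:
        atomic_write_json(target, {"bad": object()})  # not JSON-serializable
    except TypeError:
        pass
    assert read_json(target) == {"version": 1}
    assert list(tmp_path.glob("*.tmp")) == []


def test_missing_and_corrupted_files(tmp_path: Path) -> None:
    assert read_json(tmp_path / "absent.json", default={}) == {}
    broken = tmp_path / "broken.json"
    broken.write_text("{not json", encoding="utf-8")
    assert read_json(broken, default={}) == {}
```

Concurrent access is deliberately left out of this sketch; it usually needs a decision about the intended locking behavior before a meaningful test can be written.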

MEDIUM Priority (important but less critical)

  1. llm.py (507 lines) — LLM backend abstraction. Recommend mocking the HTTP layer to test:

    • Retry logic on transient failures
    • Structured output parsing
    • Timeout handling
    • Provider switching
  2. family_normalizer.py (450 lines) — Skill family grouping logic. Tests should verify:

    • Family assignment from skill names
    • Cross-family relationship handling
    • Edge cases: orphan skills, duplicate families
  3. taxonomy.py (124 lines) — Skill name matching and canonicalization. Tests should verify:

    • Exact and fuzzy matching
    • Alias resolution
    • Unknown skill handling
  4. document_normalizer.py (288 lines) — PDF/DOCX to text conversion. Tests should verify:

    • Plain text passthrough
    • PDF extraction (with sample fixtures)
    • Encoding handling
  5. cli.py (433 lines) — CLI entry point. Recommend integration tests with:

    • --help output
    • --dry-run flag behavior
    • Config override arguments
    • Error message formatting
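
For the llm.py recommendation above (mock the HTTP layer, then test retry logic), one possible shape is sketched below. `TransientError` and `call_with_retry` are hypothetical names invented for this example; the real retry code in llm.py may be structured quite differently. The point is only that the transport and the sleep function are injected, so `unittest.mock` can stand in for both and the tests never touch the network or wait on a backoff.

```python
from unittest import mock


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, HTTP 5xx)."""


def call_with_retry(send, prompt, max_attempts=3, sleep=lambda seconds: None):
    """Hypothetical retry wrapper; `send` plays the role of the HTTP layer."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(prompt)
        except TransientError:
            if attempt == max_attempts:
                raise
            sleep(2 ** attempt)  # exponential backoff, injected so tests stay fast


def test_retries_then_succeeds():
    # Two transient failures followed by a successful response body.
    send = mock.Mock(side_effect=[TransientError(), TransientError(), '{"ok": true}'])
    sleep = mock.Mock()
    assert call_with_retry(send, "hi", sleep=sleep) == '{"ok": true}'
    assert send.call_count == 3
    assert [call.args[0] for call in sleep.call_args_list] == [2, 4]


def test_gives_up_after_max_attempts():
    send = mock.Mock(side_effect=TransientError())
    try:
        call_with_retry(send, "hi", max_attempts=2)
    except TransientError:
        pass
    else:
        raise AssertionError("expected TransientError")
    assert send.call_count == 2
```

The same pattern should extend to timeout handling: raise the timeout exception from the mocked `send` and assert on how the wrapper reacts.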

LOW Priority (simple, lower risk)

  1. config.py (97 lines) — Pydantic config models. Low risk due to Pydantic validation, but tests for load_config/dump_config_text round-trip and default values would be valuable.

  2. utils.py (46 lines) — Pure functions (slugify, short_hash, file_sha256). Easy to test, low risk, good first contribution.

  3. pipeline_support.py (164 lines) — Helper functions for the pipeline. Lower priority since pipeline.py itself is tested.

Quick Wins (recommended starting points)

  1. test_utils.py — 8 pure functions, trivial to test, no dependencies. ~15 minutes.
  2. test_config.py — Config round-trip and defaults. ~20 minutes.
  3. test_extraction.py — Evidence pattern matching with fixed test data. ~30 minutes.
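
A possible starting point for test_utils.py is sketched below. The three function names come from the utils.py entry above, but their signatures and exact behavior are assumptions: the bodies here are stand-ins so the sketch runs on its own, and in the real test file they would be replaced by imports from the project's utils module. Expected values such as the exact slug would need adjusting to whatever the real functions return.

```python
import hashlib
import re
import tempfile
from pathlib import Path


# Stand-ins so the sketch is runnable; the real signatures in utils.py may differ.
def slugify(text: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def short_hash(text: str, length: int = 8) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:length]


def file_sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def test_slugify_is_idempotent():
    slug = slugify("  Machine Learning / MLOps!  ")
    assert slug == "machine-learning-mlops"
    assert slugify(slug) == slug  # slugifying a slug should change nothing


def test_short_hash_is_deterministic():
    assert short_hash("python") == short_hash("python")
    assert short_hash("python") != short_hash("rust")
    assert len(short_hash("python")) == 8


def test_file_sha256_matches_hashlib():
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.txt"
        sample.write_bytes(b"evidence")
        assert file_sha256(sample) == hashlib.sha256(b"evidence").hexdigest()
```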

Generated by Nightshift — autonomous code quality bot
