Nightshift test-gap Analysis
Analysis of test coverage gaps across the traccia Python codebase.
Summary
| Metric | Value |
| --- | --- |
| Total source modules | 17 |
| Modules with test files | 5 (29%) |
| Modules without tests | 12 (71%) |
| Total LOC (source) | 6,718 |
| Uncovered LOC | 4,142 (62%) |
Coverage Matrix
| Module | Lines | Tests? | Priority | Risk |
| --- | --- | --- | --- | --- |
| pipeline.py | 1,252 | ✅ Covered | - | - |
| rendering.py | 931 | ❌ None | HIGH | Tree/viewer output correctness |
| parsers.py | 729 | ✅ Covered | - | - |
| storage.py | 518 | ❌ None | HIGH | Data persistence, file corruption |
| llm.py | 507 | ❌ None | MEDIUM | API interaction, error handling |
| family_normalizer.py | 450 | ❌ None | MEDIUM | Skill family classification |
| cli.py | 433 | ❌ None | MEDIUM | User-facing entry point |
| bootstrap.py | 391 | ❌ None | HIGH | Project initialization, dir creation |
| models.py | 297 | ✅ Covered | - | - |
| document_normalizer.py | 288 | ❌ None | MEDIUM | PDF/DOCX conversion |
| source_detection.py | 253 | ✅ Covered | - | - |
| extraction.py | 193 | ❌ None | HIGH | Evidence extraction core logic |
| pipeline_support.py | 164 | ❌ None | LOW | Helper functions |
| taxonomy.py | 124 | ❌ None | MEDIUM | Skill matching, canonicalization |
| config.py | 97 | ❌ None | LOW | Config load/save/defaults |
| utils.py | 46 | ❌ None | LOW | Pure utility functions |
| fixtures.py | 45 | ✅ Covered | - | - |
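
The summary figures can be rechecked from the matrix. The following is a minimal sketch, with line counts and test status transcribed from the table above; `MODULES` and `summarize` are illustrative names, not part of traccia. It suggests that "Uncovered LOC" counts all lines in modules that have no test file, rather than line-level coverage.

```python
# (lines, has_tests) per module, transcribed from the coverage matrix.
MODULES = {
    "pipeline.py": (1252, True),
    "rendering.py": (931, False),
    "parsers.py": (729, True),
    "storage.py": (518, False),
    "llm.py": (507, False),
    "family_normalizer.py": (450, False),
    "cli.py": (433, False),
    "bootstrap.py": (391, False),
    "models.py": (297, True),
    "document_normalizer.py": (288, False),
    "source_detection.py": (253, True),
    "extraction.py": (193, False),
    "pipeline_support.py": (164, False),
    "taxonomy.py": (124, False),
    "config.py": (97, False),
    "utils.py": (46, False),
    "fixtures.py": (45, True),
}


def summarize(modules):
    """Recompute the summary metrics from per-module line counts."""
    total_loc = sum(loc for loc, _ in modules.values())
    untested = {name: loc for name, (loc, tested) in modules.items() if not tested}
    uncovered_loc = sum(untested.values())
    return {
        "modules": len(modules),
        "tested": len(modules) - len(untested),
        "untested": len(untested),
        "total_loc": total_loc,
        "uncovered_loc": uncovered_loc,
        "uncovered_pct": round(100 * uncovered_loc / total_loc),
    }


summary = summarize(MODULES)
print(summary)
```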
Priority Recommendations
HIGH Priority (critical untested logic)
- extraction.py (193 lines): Core evidence extraction with action pattern matching and confidence scoring. The ACTION_PATTERNS table and EVIDENCE_BASE_CONFIDENCE map directly affect skill leveling accuracy. Tests should verify:
  - Pattern matching for each EvidenceType
  - Confidence calculation with reliability tiers
  - Edge cases: empty documents, documents with no extractable signals
  - unresolved_candidates aggregation
- rendering.py (931 lines): Largest untested module. Generates tree views, Obsidian exports, and viewer bundles. Tests should verify:
  - ASCII tree rendering correctness
  - Markdown node page generation
  - Obsidian vault export structure
  - Edge cases: empty graph, single-node graph, circular references
- storage.py (518 lines): Handles all file-based persistence. Tests should verify:
  - JSON round-trip (write → read → compare)
  - Atomic writes (temp file + rename pattern)
  - Concurrent access behavior
  - Corrupted file recovery
  - Empty/missing file handling
- bootstrap.py (391 lines): Project initialization and directory scaffolding. Tests should verify:
  - Directory creation from config paths
  - Idempotency (running bootstrap twice produces same state)
  - Default config generation
  - Missing parent directory handling
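
As an illustration of the storage.py checks listed above (JSON round-trip and the temp file + rename pattern), here is a minimal sketch. The real storage.py API is not shown in this report, so `write_json_atomic` and `read_json` are hypothetical stand-ins defined inline; a real test would import traccia's own functions instead.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, data) -> None:
    """Hypothetical stand-in: write to a temp file beside the target, then rename."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
        # Rename over the target; atomic on POSIX when both are on one filesystem.
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file so a failed write leaves no debris behind.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def read_json(path: Path):
    """Hypothetical stand-in for the storage read path."""
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def test_round_trip():
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "graph.json"
        payload = {"nodes": [{"id": "python", "level": 3}], "edges": []}
        write_json_atomic(target, payload)
        assert read_json(target) == payload


def test_failed_write_keeps_previous_file():
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "graph.json"
        write_json_atomic(target, {"version": 1})
        try:
            # object() is not JSON-serializable, so json.dump raises TypeError.
            write_json_atomic(target, {"bad": object()})
        except TypeError:
            pass
        # The earlier contents survive and no temp file is left over.
        assert read_json(target) == {"version": 1}
        assert [p.name for p in Path(tmp).iterdir()] == ["graph.json"]
```

The test functions take no fixtures, so they run under pytest or when called directly. The second test is the more valuable one: it pins down the property that makes the rename pattern worth using.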
MEDIUM Priority (important but less critical)
- llm.py (507 lines): LLM backend abstraction. Recommend mocking the HTTP layer to test:
  - Retry logic on transient failures
  - Structured output parsing
  - Timeout handling
  - Provider switching
- family_normalizer.py (450 lines): Skill family grouping logic. Tests should verify:
  - Family assignment from skill names
  - Cross-family relationship handling
  - Edge cases: orphan skills, duplicate families
- taxonomy.py (124 lines): Skill name matching and canonicalization. Tests should verify:
  - Exact and fuzzy matching
  - Alias resolution
  - Unknown skill handling
- document_normalizer.py (288 lines): PDF/DOCX to text conversion. Tests should verify:
  - Plain text passthrough
  - PDF extraction (with sample fixtures)
  - Encoding handling
- cli.py (433 lines): CLI entry point. Recommend integration tests with:
  - --help output
  - --dry-run flag behavior
  - Config override arguments
  - Error message formatting
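
For the llm.py recommendation above, mocking the transport makes retry behavior testable without network access. This is a sketch under assumptions: `TransientError` and `call_with_retry` are hypothetical names standing in for whatever llm.py actually exposes, and backoff delays are omitted so the tests run instantly.

```python
from unittest import mock


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure (timeout, HTTP 503)."""


def call_with_retry(send, payload, max_attempts=3):
    """Hypothetical retry wrapper: call `send` again after a TransientError."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(payload)
        except TransientError as exc:
            last_error = exc
    raise last_error


def test_retries_then_succeeds():
    # Two transient failures followed by a successful response.
    send = mock.Mock(
        side_effect=[TransientError("503"), TransientError("timeout"), {"skills": []}]
    )
    assert call_with_retry(send, {"prompt": "x"}) == {"skills": []}
    assert send.call_count == 3


def test_gives_up_after_max_attempts():
    # Every call fails, so the wrapper should stop after max_attempts.
    send = mock.Mock(side_effect=TransientError("503"))
    try:
        call_with_retry(send, {"prompt": "x"}, max_attempts=2)
    except TransientError:
        pass
    else:
        raise AssertionError("expected TransientError")
    assert send.call_count == 2
```

Passing the transport in as a callable keeps the mock at the boundary. If llm.py calls an HTTP client directly, `mock.patch` on that client would serve the same purpose.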
LOW Priority (simple, lower risk)
- config.py (97 lines): Pydantic config models. Low risk due to Pydantic validation, but tests for load_config/dump_config_text round-trip and default values would be valuable.
- utils.py (46 lines): Pure functions (slugify, short_hash, file_sha256). Easy to test, low risk, good first contribution.
- pipeline_support.py (164 lines): Helper functions for the pipeline. Lower priority since pipeline.py itself is tested.
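
A test for one of the utils.py helpers named above might look like the following sketch. The signature and return format of traccia's `file_sha256` are not given in this report; the stand-in below assumes it takes a path and returns a lowercase hex digest, and a real test would import the function from utils.py instead.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hypothetical stand-in; assumes the real helper returns a hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def test_file_sha256_matches_hashlib():
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.txt"
        sample.write_bytes(b"traccia")
        assert file_sha256(sample) == hashlib.sha256(b"traccia").hexdigest()


def test_file_sha256_empty_file():
    with tempfile.TemporaryDirectory() as tmp:
        empty = Path(tmp) / "empty.txt"
        empty.write_bytes(b"")
        # SHA-256 of empty input is a fixed, well-known value.
        expected = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
        assert file_sha256(empty) == expected
```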
Quick Wins (recommended starting points)
- test_utils.py: 8 pure functions, trivial to test, no dependencies. ~15 minutes.
- test_config.py: Config round-trip and defaults. ~20 minutes.
- test_extraction.py: Evidence pattern matching with fixed test data. ~30 minutes.
Generated by Nightshift — autonomous code quality bot