nightshift: test-gap — 62% of source code lacks test coverage

## Nightshift test-gap Analysis

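Analysis of test coverage gaps across the traccia Python codebase. A module counts as covered when it has a corresponding test file; line-level coverage was not measured.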

### Summary

| Metric | Value |
|--------|-------|
| Total source modules | 17 |
| Modules with test files | 5 (29%) |
| Modules without tests | 12 (71%) |
| Total LOC (source) | 6,718 |
| LOC in modules without tests | 4,142 (62%) |

### Coverage Matrix

| Module | Lines | Tests? | Priority | Risk |
|--------|-------|--------|----------|------|
| `pipeline.py` | 1,252 | ✅ Covered | — | — |
| `rendering.py` | 931 | ❌ None | **HIGH** | Tree/viewer output correctness |
| `parsers.py` | 729 | ✅ Covered | — | — |
| `storage.py` | 518 | ❌ None | **HIGH** | Data persistence, file corruption |
| `llm.py` | 507 | ❌ None | **MEDIUM** | API interaction, error handling |
| `family_normalizer.py` | 450 | ❌ None | **MEDIUM** | Skill family classification |
| `cli.py` | 433 | ❌ None | **MEDIUM** | User-facing entry point |
| `bootstrap.py` | 391 | ❌ None | **HIGH** | Project initialization, dir creation |
| `models.py` | 297 | ✅ Covered | — | — |
| `document_normalizer.py` | 288 | ❌ None | **MEDIUM** | PDF/DOCX conversion |
| `source_detection.py` | 253 | ✅ Covered | — | — |
| `extraction.py` | 193 | ❌ None | **HIGH** | Evidence extraction core logic |
| `pipeline_support.py` | 164 | ❌ None | **LOW** | Helper functions |
| `taxonomy.py` | 124 | ❌ None | **MEDIUM** | Skill matching, canonicalization |
| `config.py` | 97 | ❌ None | **LOW** | Config load/save/defaults |
| `utils.py` | 46 | ❌ None | **LOW** | Pure utility functions |
| `fixtures.py` | 45 | ✅ Covered | — | — |
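
The summary figures can be re-derived from the matrix, which is a quick consistency check whenever a row changes. The sketch below copies the rows by hand; `MATRIX` and `summarize` are illustrative names, not part of traccia or Nightshift.

```python
# (module, lines, has_tests) rows copied from the coverage matrix above.
MATRIX = [
    ("pipeline.py", 1252, True),
    ("rendering.py", 931, False),
    ("parsers.py", 729, True),
    ("storage.py", 518, False),
    ("llm.py", 507, False),
    ("family_normalizer.py", 450, False),
    ("cli.py", 433, False),
    ("bootstrap.py", 391, False),
    ("models.py", 297, True),
    ("document_normalizer.py", 288, False),
    ("source_detection.py", 253, True),
    ("extraction.py", 193, False),
    ("pipeline_support.py", 164, False),
    ("taxonomy.py", 124, False),
    ("config.py", 97, False),
    ("utils.py", 46, False),
    ("fixtures.py", 45, True),
]


def summarize(matrix):
    """Recompute the summary table from the per-module rows."""
    total_loc = sum(lines for _, lines, _ in matrix)
    untested = [lines for _, lines, has_tests in matrix if not has_tests]
    untested_loc = sum(untested)
    return {
        "modules": len(matrix),
        "untested_modules": len(untested),
        "total_loc": total_loc,
        "untested_loc": untested_loc,
        "untested_pct": round(100 * untested_loc / total_loc),
    }


summary = summarize(MATRIX)
print(summary)
```

The percentage is a share of lines living in modules with no test file, so it is an upper bound on the gap only in the sense that tested modules may still contain unexercised lines.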

### Priority Recommendations

#### HIGH Priority (critical untested logic)

1. **`extraction.py` (193 lines)** — Core evidence extraction with action pattern matching and confidence scoring. The `ACTION_PATTERNS` table and `EVIDENCE_BASE_CONFIDENCE` map directly affect skill leveling accuracy. Tests should verify:
   - Pattern matching for each `EvidenceType`
   - Confidence calculation with reliability tiers
   - Edge cases: empty documents, documents with no extractable signals
   - `unresolved_candidates` aggregation

2. **`rendering.py` (931 lines)** — Largest untested module. Generates tree views, Obsidian exports, and viewer bundles. Tests should verify:
   - ASCII tree rendering correctness
   - Markdown node page generation
   - Obsidian vault export structure
   - Edge cases: empty graph, single-node graph, circular references

3. **`storage.py` (518 lines)** — Handles all file-based persistence. Tests should verify:
   - JSON round-trip (write → read → compare)
   - Atomic writes (temp file + rename pattern)
   - Concurrent access behavior
   - Corrupted file recovery
   - Empty/missing file handling

4. **`bootstrap.py` (391 lines)** — Project initialization and directory scaffolding. Tests should verify:
   - Directory creation from config paths
   - Idempotency (running bootstrap twice produces the same state)
   - Default config generation
   - Missing parent directory handling
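
For the `storage.py` checks in item 3, a round-trip test and an atomic-write test might look like the sketch below. The helpers `write_json_atomic` and `read_json` are hypothetical stand-ins written for this example; the real functions in `storage.py` may have different names and signatures, and would be imported instead.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, payload: dict) -> None:
    """Hypothetical helper: write to a temp file in the target directory, then rename."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
        os.replace(tmp_name, path)  # atomic when source and target share a filesystem
    except BaseException:
        # Remove the partial temp file so a failed write leaves no debris.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def read_json(path: Path, default=None):
    """Hypothetical helper: return `default` when the file does not exist."""
    if not path.exists():
        return default
    return json.loads(path.read_text(encoding="utf-8"))


def test_round_trip_and_missing_file():
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "state" / "graph.json"
        assert read_json(target, default={}) == {}
        payload = {"nodes": [{"id": "python", "level": 2}], "edges": []}
        write_json_atomic(target, payload)
        assert read_json(target) == payload


def test_failed_write_keeps_previous_contents():
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "graph.json"
        write_json_atomic(target, {"version": 1})
        try:
            write_json_atomic(target, {"bad": object()})  # not JSON-serializable
        except TypeError:
            pass
        # The old file is intact and no temp file was left behind.
        assert read_json(target) == {"version": 1}
        assert [p.name for p in target.parent.iterdir()] == ["graph.json"]
```

Using `tempfile.TemporaryDirectory` keeps the tests independent of the real project layout; under pytest the built-in `tmp_path` fixture would serve the same purpose.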

#### MEDIUM Priority (important but less critical)

5. **`llm.py` (507 lines)** — LLM backend abstraction. Recommend mocking the HTTP layer to test:
   - Retry logic on transient failures
   - Structured output parsing
   - Timeout handling
   - Provider switching

6. **`family_normalizer.py` (450 lines)** — Skill family grouping logic. Tests should verify:
   - Family assignment from skill names
   - Cross-family relationship handling
   - Edge cases: orphan skills, duplicate families

7. **`taxonomy.py` (124 lines)** — Skill name matching and canonicalization. Tests should verify:
   - Exact and fuzzy matching
   - Alias resolution
   - Unknown skill handling

8. **`document_normalizer.py` (288 lines)** — PDF/DOCX to text conversion. Tests should verify:
   - Plain text passthrough
   - PDF and DOCX extraction (with sample fixtures)
   - Encoding handling

9. **`cli.py` (433 lines)** — CLI entry point. Recommend integration tests with:
   - `--help` output
   - `--dry-run` flag behavior
   - Config override arguments
   - Error message formatting
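
For the retry checks suggested under `llm.py` (item 5), the HTTP layer can be replaced with a `unittest.mock.Mock` whose `side_effect` scripts failures and successes. This is a minimal sketch: `call_with_retry` and `TransientError` are hypothetical names invented here, and the actual retry logic in `llm.py` may be structured differently.

```python
from unittest import mock


class TransientError(Exception):
    """Stand-in for a retryable failure such as a timeout or an HTTP 503."""


def call_with_retry(send, prompt, max_attempts=3, sleep=lambda seconds: None):
    """Hypothetical retry wrapper; `send` stands in for the real HTTP call."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(prompt)
        except TransientError:
            if attempt == max_attempts:
                raise
            sleep(2 ** attempt)  # backoff delay is injected so tests never wait


def test_retries_then_succeeds():
    # Two scripted failures followed by a successful response.
    send = mock.Mock(
        side_effect=[TransientError(), TransientError(), {"skills": ["python"]}]
    )
    sleep = mock.Mock()
    result = call_with_retry(send, "extract skills", sleep=sleep)
    assert result == {"skills": ["python"]}
    assert send.call_count == 3
    assert [call.args[0] for call in sleep.call_args_list] == [2, 4]


def test_gives_up_after_max_attempts():
    send = mock.Mock(side_effect=TransientError())
    try:
        call_with_retry(send, "extract skills", max_attempts=2)
    except TransientError:
        pass
    else:
        raise AssertionError("expected TransientError")
    assert send.call_count == 2
```

Injecting the sleep function is one design option; patching `time.sleep` with `mock.patch` would work as well if the production code calls it directly.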

#### LOW Priority (simple, lower risk)

10. **`config.py` (97 lines)** — Pydantic config models. Low risk due to Pydantic validation, but tests for `load_config`/`dump_config_text` round-trip and default values would be valuable.

11. **`utils.py` (46 lines)** — Pure functions (`slugify`, `short_hash`, `file_sha256`). Easy to test, low risk, good first contribution.

12. **`pipeline_support.py` (164 lines)** — Helper functions for the pipeline. Lower priority since `pipeline.py` itself is tested.
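
As an illustration of how small the `utils.py` tests (item 11) could be, the sketch below tests three functions with the names listed there. The implementations shown are stand-ins written for this example so that it runs on its own; the real signatures and behavior in `utils.py` are assumptions and should be checked before copying any expected value.

```python
import hashlib
import re
import tempfile
from pathlib import Path


# Stand-in implementations: in a real test these would be imported from utils.py.
def slugify(text: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def short_hash(text: str, length: int = 8) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:length]


def file_sha256(path: Path) -> str:
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def test_slugify():
    assert slugify("Machine Learning / NLP") == "machine-learning-nlp"
    assert slugify("") == ""


def test_short_hash_is_stable_and_short():
    assert short_hash("python") == short_hash("python")
    assert short_hash("python") != short_hash("rust")
    assert len(short_hash("python")) == 8


def test_file_sha256_of_empty_file():
    with tempfile.TemporaryDirectory() as tmp:
        empty = Path(tmp) / "empty.txt"
        empty.write_bytes(b"")
        # Well-known SHA-256 digest of zero bytes.
        assert file_sha256(empty) == (
            "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
        )
```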

### Quick Wins (recommended starting points)

1. **`test_utils.py`** — 8 pure functions, trivial to test, no dependencies. ~15 minutes.
2. **`test_config.py`** — Config round-trip and defaults. ~20 minutes.
3. **`test_extraction.py`** — Evidence pattern matching with fixed test data. ~30 minutes.
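
A table-driven layout is one way to keep `test_extraction.py` short while covering each evidence type and the empty-input edge cases. Everything in this sketch is hypothetical: `DemoEvidenceType`, `DEMO_PATTERNS`, `DEMO_BASE_CONFIDENCE`, and `extract_evidence` are invented for illustration and do not reflect the actual contents of `ACTION_PATTERNS` or `EVIDENCE_BASE_CONFIDENCE`.

```python
import re
from enum import Enum


class DemoEvidenceType(Enum):
    BUILT = "built"
    STUDIED = "studied"


# Invented pattern table and base confidences, for illustration only.
DEMO_PATTERNS = {
    DemoEvidenceType.BUILT: re.compile(r"\b(built|implemented|shipped)\b", re.I),
    DemoEvidenceType.STUDIED: re.compile(r"\b(studied|took a course)\b", re.I),
}
DEMO_BASE_CONFIDENCE = {
    DemoEvidenceType.BUILT: 0.8,
    DemoEvidenceType.STUDIED: 0.4,
}


def extract_evidence(text):
    """Return (type, confidence) pairs for every line that matches a pattern."""
    found = []
    for line in text.splitlines():
        for evidence_type, pattern in DEMO_PATTERNS.items():
            if pattern.search(line):
                found.append((evidence_type, DEMO_BASE_CONFIDENCE[evidence_type]))
    return found


# Fixed test data: (input text, expected evidence).
CASES = [
    ("Built a parser in Rust", [(DemoEvidenceType.BUILT, 0.8)]),
    ("Took a course on databases", [(DemoEvidenceType.STUDIED, 0.4)]),
    ("", []),                 # empty document
    ("Went for a walk", []),  # no extractable signal
]


def test_extraction_cases():
    for text, expected in CASES:
        assert extract_evidence(text) == expected, text
```

With pytest, the loop could be replaced by `pytest.mark.parametrize` over `CASES` so that each row reports as its own test.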

---
*Generated by [Nightshift](https://github.com/marcus/nightshift) — autonomous code quality bot*
