fix(#74): wire Transaction.confidence_score via ExtractionScoringConfig by longieirl · Pull Request #77 · longieirl/bankstatementprocessor

longieirl · 2026-03-26T16:20:12Z

Summary

Closes #74. Transaction.confidence_score was declared but always 1.0 — no pipeline code ever set it. This PR wires real values by detecting two extraction anomalies in RowPostProcessor and stamping the computed score onto each transaction row dict before it reaches Transaction.from_dict().

Changes

New: domain/models/extraction_scoring_config.py — ExtractionScoringConfig frozen dataclass with injectable penalty weights, following the ScoringConfig pattern from PR #36 (feat(#32): make ScoringConfig injectable in TemplateDetector)
New: CODE_MISSING_BALANCE warning code added to extraction_warning.py
Modified: RowPostProcessor accepts optional ExtractionScoringConfig; applies penalties and stamps confidence_score on transaction row dicts:
- DATE_PROPAGATED: −0.1 (date inferred from prior row or filename)
- MISSING_BALANCE: −0.2 (balance field absent or empty)
- Score clamped to [0.0, 1.0]
Modified: PDFTableExtractor accepts optional scoring_config and threads it to RowPostProcessor
Modified: domain/models/__init__.py exports ExtractionScoringConfig
New: tests/extraction/test_confidence_scoring.py — 19 tests covering all signal combinations
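
For reviewers who want the penalty arithmetic in one place, here is a minimal sketch of how the score could be computed and stamped. It is not the code in this PR: the field names date_propagated_penalty and missing_balance_penalty, the score_row helper, and the "balance" key are assumptions made for illustration. Only the default weights (0.1 and 0.2), the clamp to [0.0, 1.0], and ExtractionScoringConfig.default() come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExtractionScoringConfig:
    # Hypothetical field names; the PR only states the default weights.
    date_propagated_penalty: float = 0.1
    missing_balance_penalty: float = 0.2

    @classmethod
    def default(cls) -> "ExtractionScoringConfig":
        return cls()


def score_row(
    row: dict,
    date_propagated: bool,
    config: Optional[ExtractionScoringConfig] = None,
) -> dict:
    """Return a copy of row with confidence_score stamped on it (hypothetical helper)."""
    if config is None:
        config = ExtractionScoringConfig.default()

    score = 1.0
    if date_propagated:
        # DATE_PROPAGATED: date inferred from a prior row or the filename.
        score -= config.date_propagated_penalty

    balance = row.get("balance")  # "balance" key is an assumption
    if balance is None or str(balance).strip() == "":
        # MISSING_BALANCE: balance field absent or empty.
        score -= config.missing_balance_penalty

    stamped = dict(row)
    stamped["confidence_score"] = max(0.0, min(1.0, score))  # clamp to [0.0, 1.0]
    return stamped
```

With the default weights, a row that triggers both anomalies ends up at roughly 0.7, and a row with neither keeps 1.0.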

Type

Bug fix
New feature

Testing

Tests pass (coverage ≥ 91%) — 1423 passed, 5 skipped (no regressions)
Manually tested
make docker-integration passed locally (touches packages/parser-core/)

Checklist

Code follows project style
Self-reviewed
Documentation updated (if needed)
No new warnings

Introduces ExtractionScoringConfig (injectable frozen dataclass) holding penalty weights for extraction anomalies, following the ScoringConfig pattern established in PR #36.

RowPostProcessor now applies penalties and stamps confidence_score on each transaction row dict:
- DATE_PROPAGATED: -0.1 (date inferred from prior row or filename)
- MISSING_BALANCE: -0.2 (balance field absent or empty)

Score is clamped to [0.0, 1.0]. Rows with no anomalies keep 1.0.

PDFTableExtractor accepts an optional scoring_config and threads it through to RowPostProcessor; defaults to ExtractionScoringConfig.default().

Adds CODE_MISSING_BALANCE warning code to extraction_warning.py. Exports ExtractionScoringConfig from domain/models/__init__.py.

19 new tests cover: full-confidence, date-propagated, missing-balance, both penalties, clamping, injectable config, and non-transaction rows.
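
The injection path described in the commit message might look roughly like the sketch below. The class bodies are stand-ins, not the project's real implementations: the constructor signatures beyond the scoring_config keyword, the row_post_processor attribute, and the config field names are assumptions. The sketch only shows the optional argument, the fallback to ExtractionScoringConfig.default(), and the same config object being handed down to RowPostProcessor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExtractionScoringConfig:
    # Stand-in for the real class; field names are assumptions.
    date_propagated_penalty: float = 0.1
    missing_balance_penalty: float = 0.2

    @classmethod
    def default(cls) -> "ExtractionScoringConfig":
        return cls()


class RowPostProcessor:
    # Stand-in: the real class does far more than hold a config.
    def __init__(self, scoring_config: Optional[ExtractionScoringConfig] = None):
        if scoring_config is None:
            scoring_config = ExtractionScoringConfig.default()
        self.scoring_config = scoring_config


class PDFTableExtractor:
    # Stand-in: only the config threading is shown.
    def __init__(self, scoring_config: Optional[ExtractionScoringConfig] = None):
        if scoring_config is None:
            scoring_config = ExtractionScoringConfig.default()
        self.scoring_config = scoring_config
        # Pass the same config down so both layers agree on the weights.
        self.row_post_processor = RowPostProcessor(scoring_config=scoring_config)


# A caller that wants a harsher penalty for missing balances (illustrative value).
strict = ExtractionScoringConfig(missing_balance_penalty=0.5)
extractor = PDFTableExtractor(scoring_config=strict)
```

Because the dataclass is frozen, a config shared between the extractor and the post-processor cannot be mutated by either side after construction.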

github-actions bot added the bug Something isn't working label Mar 26, 2026

longieirl self-assigned this Mar 26, 2026

style: black/flake8/isort fixes for #74 changes

c2dff84

longieirl merged commit dfeb868 into main Mar 26, 2026
10 checks passed

longieirl deleted the fix/74-confidence-score branch March 26, 2026 16:31

longieirl merged 2 commits into main from fix/74-confidence-score
