feat(#32): make ScoringConfig injectable in TemplateDetector by longieirl · Pull Request #36 · longieirl/bankstatementprocessor

longieirl · 2026-03-24T23:11:03Z

- Replace global DETECTOR_WEIGHTS / MIN_CONFIDENCE_THRESHOLD refs with self._scoring.weight_for() / self._scoring.min_confidence_threshold
- Extract _run_scoring() private helper (scoring loop) from detect_template()
- Add get_detection_explanation() -> DetectionExplanation for diagnostic introspection
- Export ScoringConfig and DetectionExplanation from templates/__init__.py
- Add 4 new tests: threshold boundary pass/fail, weight ordering, tie-break reason


Testing

- Tests pass (coverage ≥ 91%)
- Manually tested

Checklist

- Code follows project style
- Self-reviewed
- Documentation updated (if needed)
- No new warnings


…tractionResult

- PDFTableExtractor.extract() now returns ExtractionResult (not 3-tuple)
- Credit card early-return path returns ExtractionResult with warnings entry
- Normal return path wraps dicts_to_transactions(rows) in ExtractionResult
- extraction_facade.extract_tables_from_pdf() return annotation -> ExtractionResult
- All 6 extraction-layer test files updated: field access replaces tuple unpacking
- ExtractionOrchestrator.extract_from_pdf() unpacks ExtractionResult, re-packs as tuple for upstream callers (Rule 3 - blocking fix)
- All mock return values in test_excluded_files_logging, test_processor, test_processor_refactored_methods, test_extraction_orchestrator updated to ExtractionResult
- 1360 tests pass, 9 skipped (all expected)

…ct_from_pdf()

- Update 4 tuple-unpack assertion sites to use ExtractionResult field access
- Add assertIsInstance(result, ExtractionResult) to key assertions
- Tests now expect extract_from_pdf() to return ExtractionResult (not tuple)

- Add ExtractionResult import from bankstatements_core.domain
- Change return annotation from tuple[list, int, str | None] to ExtractionResult
- Remove tuple unpacking; pass ExtractionResult through unchanged
- Use result.iban for IBAN logging
- Update docstring Returns section

- Add ExtractionResult and transactions_to_dicts imports
- Change return annotation from tuple[list[dict], int, dict[str,str]] to list[ExtractionResult]
- Replace tuple unpacking with result.page_count/iban/transactions field access
- Exclusion logic preserved using ExtractionResult fields
- filter_service.apply_all_filters() receives transactions_to_dicts(result.transactions)
- Append excluded results to list (page_count preserved for callers)
- Update processor._process_all_pdfs() to adapt list[ExtractionResult] -> 3-tuple
- Update test_processor test mock to use list[ExtractionResult] return value

- _process_all_pdfs() now returns list[ExtractionResult] directly (no adapter)
- run() loops over results, aggregates all_rows/pages_read/pdf_ibans via transactions_to_dicts()
- Added ExtractionResult and transactions_to_dicts imports at module level
- test_processor_refactored_methods.py updated: tuple-unpack assertions replaced with indexed list access

- group_words_by_y: 8 tests for Y-grouping with tolerance param
- assign_words_to_columns: 11 tests for strict/relaxed x-position assignment
- calculate_column_coverage: 7 tests for coverage fraction calculation

…ions

- group_words_by_y: groups words by round(top, 0); tolerance param accepted only for API compatibility
- assign_words_to_columns: relaxed/strict rightmost column boundary check
- calculate_column_coverage: returns fraction of columns with non-empty data across rows
- Fix test rounding: round(100.7, 0) is 101.0; Python's banker's rounding (round-half-to-even) only affects exact .5 ties, e.g. round(100.5, 0) is 100.0
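A minimal sketch of the three helpers, assuming pdfplumber-style word dicts (`text`, `x0`, `top`) and a hypothetical column map of name to `(x_min, x_max)`. Treating "relaxed" as "words past the rightmost column's right edge still land in that column" is an interpretation of the commit message, not confirmed project behavior.

```python
from collections import defaultdict
from typing import Any

Word = dict[str, Any]  # pdfplumber-style word, e.g. {"text": ..., "x0": ..., "top": ...}
Columns = dict[str, tuple[float, float]]  # hypothetical: column name -> (x_min, x_max)


def group_words_by_y(words: list[Word], tolerance: float = 0.0) -> dict[float, list[Word]]:
    # `tolerance` is accepted only for API compatibility; rows are keyed by round(top, 0).
    rows: dict[float, list[Word]] = defaultdict(list)
    for word in words:
        rows[round(word["top"], 0)].append(word)
    return dict(rows)


def assign_words_to_columns(words: list[Word], columns: Columns, strict: bool = True) -> dict[str, str]:
    row = {name: "" for name in columns}
    if not columns:
        return row
    rightmost = max(columns, key=lambda name: columns[name][1])
    for word in words:
        x = word["x0"]
        for name, (x_min, x_max) in columns.items():
            inside = x_min <= x < x_max
            # Relaxed mode: the rightmost column is open-ended on the right.
            overflow = not strict and name == rightmost and x >= x_min
            if inside or overflow:
                row[name] = f"{row[name]} {word['text']}".strip()
                break
    return row


def calculate_column_coverage(rows: list[dict[str, str]], column_names: list[str]) -> float:
    # Fraction of columns that hold non-empty text in at least one row.
    if not rows or not column_names:
        return 0.0
    covered = sum(1 for name in column_names if any(row.get(name, "").strip() for row in rows))
    return covered / len(column_names)
```

Because `round()` rounds half to even, a word at `top=100.5` groups into row 100.0 while one at `top=101.5` groups into row 102.0; non-tie values such as 100.7 simply round to 101.0.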

…branch

- boundary_detector.py: import group_words_by_y/assign_words_to_columns/calculate_column_coverage; add pre-filter before group_words_by_y; delete _group_words_by_y, _build_row_from_words, _calculate_column_coverage
- row_builder.py: import assign_words_to_columns/group_words_by_y; replace inline loops with word_utils calls; remove _rightmost_column/_column_names attrs
- header_detection.py: import group_words_by_y; delete _group_words_by_y_coordinate private method
- page_validation.py: import calculate_column_coverage as _calculate_column_coverage_impl; make calculate_column_coverage a thin wrapper
- content_density.py: import assign_words_to_columns; replace inline column-assignment loop
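With the private `_group_words_by_y` deleted, filtering out words above the table top has to happen before calling the shared helper. This sketch shows such a call-site pre-filter; `group_table_rows()` is a hypothetical name, and the `top >= table_top` comparison is inferred from the deleted `test_group_words_filters_above_table_top` rather than taken from the code.

```python
from collections import defaultdict
from typing import Any

Word = dict[str, Any]


def group_words_by_y(words: list[Word], tolerance: float = 0.0) -> dict[float, list[Word]]:
    # Shared helper, as described for word_utils: key each word by its rounded top coordinate.
    rows: dict[float, list[Word]] = defaultdict(list)
    for word in words:
        rows[round(word["top"], 0)].append(word)
    return dict(rows)


def group_table_rows(words: list[Word], table_top: float) -> dict[float, list[Word]]:
    # Pre-filter at the call site: drop words above the table before grouping,
    # because the shared helper knows nothing about table boundaries.
    table_words = [word for word in words if word["top"] >= table_top]
    return group_words_by_y(table_words)
```

Keeping the boundary check in the caller lets `group_words_by_y` stay a pure, reusable function for row_builder.py and header_detection.py, which have no table-top concept.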

…rd_utils

- test_boundary_detector.py: delete test_group_words_by_y, test_group_words_filters_above_table_top, test_build_row_from_words, test_calculate_column_coverage_full, test_calculate_column_coverage_partial, test_calculate_column_coverage_empty
- test_header_detection.py: delete test_group_words_by_y_coordinate
- All deleted tests now covered by test_word_utils.py (TestGroupWordsByY, TestAssignWordsToColumns, TestCalculateColumnCoverage)

… mypy

Introduces ExtractionScoringConfig (injectable frozen dataclass) holding penalty weights for extraction anomalies, following the ScoringConfig pattern established in PR #36. RowPostProcessor now applies penalties and stamps confidence_score on each transaction row dict:

- DATE_PROPAGATED: -0.1 (date inferred from prior row or filename)
- MISSING_BALANCE: -0.2 (balance field absent or empty)

Score is clamped to [0.0, 1.0]. Rows with no anomalies keep 1.0. PDFTableExtractor accepts an optional scoring_config and threads it through to RowPostProcessor; it defaults to ExtractionScoringConfig.default(). Adds CODE_MISSING_BALANCE warning code to extraction_warning.py. Exports ExtractionScoringConfig from domain/models/__init__.py. 19 new tests cover: full-confidence, date-propagated, missing-balance, both penalties, clamping, injectable config, and non-transaction rows.
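The penalty arithmetic can be sketched as below. The PR states the penalty values, the clamp to [0.0, 1.0] and the `ExtractionScoringConfig.default()` factory; the field names, the `"Balance"` key, the hypothetical `stamp_confidence()` helper and passing `date_propagated` as an explicit flag are assumptions.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ExtractionScoringConfig:
    # Penalty values from the PR; the field names are assumptions.
    date_propagated_penalty: float = 0.1
    missing_balance_penalty: float = 0.2

    @classmethod
    def default(cls) -> "ExtractionScoringConfig":
        return cls()


def stamp_confidence(
    row: dict[str, Any], config: ExtractionScoringConfig, date_propagated: bool
) -> dict[str, Any]:
    score = 1.0
    if date_propagated:  # DATE_PROPAGATED: date inferred from a prior row or the filename
        score -= config.date_propagated_penalty
    if not str(row.get("Balance") or "").strip():  # MISSING_BALANCE: absent or empty
        score -= config.missing_balance_penalty
    row["confidence_score"] = max(0.0, min(1.0, score))  # clamp to [0.0, 1.0]
    return row
```

Injecting the config rather than hard-coding the penalties lets a test or caller supply stricter or looser weights without patching module state, the same reasoning as the ScoringConfig change in PR #36.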

longieirl self-assigned this Mar 24, 2026

web-flow added 12 commits March 25, 2026 07:54

test(24-01): add failing tests for word_utils pure functions

0334de6


merge(24-01): bring word_utils TDD work into feat/32 branch

7f0427f

merge(24-01): bring word_utils TDD work into worktree-agent-a9eba7e4 branch

db0b6a7

merge(24-02): bring caller migration into feat/32 branch

69d25c1

longieirl force-pushed the feat/32-scoring-config-injectable branch from 0604547 to 69d25c1 March 25, 2026 10:49

web-flow added 4 commits March 25, 2026 10:51

style: apply black formatting to parser-core

3307b34

style: fix isort import ordering in parser-core

d9ade9b

fix: remove unused imports and variables flagged by flake8

4d60970

fix: rename loop var to avoid ExtractionResult/dict type collision in mypy

60df9ad

longieirl merged commit 09589e8 into main Mar 25, 2026
10 checks passed

longieirl deleted the feat/32-scoring-config-injectable branch March 25, 2026 11:03

longieirl mentioned this pull request Mar 26, 2026

fix(#74): wire Transaction.confidence_score via ExtractionScoringConfig #77

Merged

9 tasks

longieirl merged 17 commits into main from feat/32-scoring-config-injectable


2 participants
