Skip to content

feat: collapse redundant orchestration layers and retire pdf_table_extractor shim (RFC #19)#26

Merged
longieirl merged 8 commits intomainfrom
feat/rfc-19-collapse-orchestration-shim-clean
Mar 24, 2026
Merged

feat: collapse redundant orchestration layers and retire pdf_table_extractor shim (RFC #19)#26
longieirl merged 8 commits intomainfrom
feat/rfc-19-collapse-orchestration-shim-clean

Conversation

@longieirl
Copy link
Copy Markdown
Owner

Closes #19

Summary

  • Deleted 10 dead private methods from BankStatementProcessor (logic already owned by injected services) and inlined PDFProcessingOrchestrator._process_single_pdf (3-line passthrough with no logic)
  • Redirected all 9 production shim imports across 7 files off pdf_table_extractor.py to real facades (extraction_facade, validation_facade, row_classification_facade, column_config)
  • Annotated shim with DeprecationWarning at module import time (stacklevel=2)
  • Added tests/test_architecture.py CI guard — fails if any production code in src/ imports from the shim
  • Removed 3 test files that exclusively tested deleted dead methods; cleaned 1 stale test from test_repository_integration.py
  • Added .planning/ to .gitignore so GSD planning artifacts are never committed

Coverage: 92.36% (up from 91.1% baseline, threshold 91%)
Tests: 1302 passed, 9 skipped

Test plan

  • CI passes (all tests green, coverage ≥ 91%)
  • test_architecture.py::test_no_production_shim_imports passes (0 shim imports in src/)
  • Importing pdf_table_extractor emits DeprecationWarning
  • Full pipeline smoke test: PDF → BankStatementProcessor → output produces expected transactions

- extraction_orchestrator.py: import extract_tables_from_pdf from extraction_facade (not shim)
- pdf_extractor.py: redirect all 5 inline shim imports to validation_facade / extraction_facade
- pdf_table_extractor.py: annotate with DeprecationWarning at module import time
- pyproject.toml: suppress DeprecationWarning in filterwarnings for legitimate shim test files
- tests/test_architecture.py: new CI guard scanning src/ for shim imports; fails with descriptive violation message if any found
- facades/processing_facade.py: get_columns_config redirected to config.column_config
- services/content_density.py: classify_row_type redirected to row_classification_facade
- services/page_validation.py: classify_row_type redirected to row_classification_facade
- services/row_merger.py: classify_row_type redirected to row_classification_facade
- Full test suite: 1302 passed, 92.36% coverage
…cessor

- Remove _write_csv_with_totals and _append_totals_to_csv (zero callers)
- Remove _filter_rows, _has_row_data, _filter_empty_rows (owned by TransactionFilterService)
- Remove _is_header_row, _filter_header_rows (owned by TransactionFilterService)
- Remove _has_valid_transaction_date, _filter_invalid_date_rows (owned by TransactionFilterService)
- Remove _group_rows_by_iban (owned by TransactionProcessingOrchestrator)
- Drop unused Callable import
- Delete test_iban_grouping.py, test_empty_row_filtering.py, test_header_row_filtering.py
- Remove test_append_totals_uses_repository and appended_csv tracking from test_repository_integration.py
- Replace `result = self._process_single_pdf(pdf); rows, page_count, iban = result`
  with direct call `rows, page_count, iban = self.extraction_orchestrator.extract_from_pdf(pdf)`
- Delete the 3-line passthrough _process_single_pdf method (no logic, pure indirection)
@longieirl longieirl self-assigned this Mar 24, 2026
@longieirl longieirl merged commit cba6d3d into main Mar 24, 2026
11 checks passed
@longieirl longieirl deleted the feat/rfc-19-collapse-orchestration-shim-clean branch March 24, 2026 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RFC: Collapse redundant orchestration layers and retire the pdf_table_extractor shim

2 participants