From 3b96693b4ea259b67fe5612de55734b56f904d14 Mon Sep 17 00:00:00 2001
From: longieirl <noreply@github.com>
Date: Wed, 25 Mar 2026 17:25:56 +0000
Subject: [PATCH] docs: populate CHANGELOG and update architecture for v1.2
 (PRs #56-#58)

---
 CHANGELOG.md         | 45 ++++++++++++++++++++++++++
 docs/architecture.md | 75 +++++++++++++++++++++++++++++++++-----------
 2 files changed, 101 insertions(+), 19 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 11bddf3..fdc1b02 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,3 +6,48 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ## [Unreleased]
+
+---
+
+## [0.1.2] — 2026-03-25
+
+### Fixed
+- **#47** — `filter_service.apply_all_filters()` result was computed and logged but silently discarded. Filtered rows are now written back to `result.transactions` in `PDFProcessingOrchestrator.process_all_pdfs()`, so `filter_empty_rows`, `filter_header_rows`, and `filter_invalid_dates` are applied to every successfully extracted PDF.
+- **#52** — `BankStatementProcessorBuilder.with_duplicate_strategy()` and `.with_date_sorting()` were inert: `build()` called `ServiceRegistry.from_config()` with no services, causing the registry to create its own defaults and silently ignore the configured strategy. The builder now constructs `DuplicateDetectionService` and `TransactionSortingService` from its configured values and passes them explicitly into `ServiceRegistry.from_config()`.
+- **#55** — Credit card / no-IBAN PDFs were excluded from the `pdfs_extracted` count in processing output. `process_all_pdfs()` now returns a 3-tuple `(results, pdf_count, pages_read)`.
+
+### Changed (architecture cleanup — PRs #56, #57)
+- **#49** — `ChronologicalSortingStrategy` sorts dicts directly via `DateParserService`, removing a redundant `Transaction` round-trip.
+- **#48** — Deferred circular imports in `processor.py` removed; `service_registry`, `monthly_summary`, and `expense_analysis` import `ColumnAnalysisService`/`DateParserService` directly at module level.
+- **#50** — `TransactionClassifier._looks_like_date` delegates to `RowAnalysisService.looks_like_date`, removing a duplicate regex and fixing a subtle 1-or-2-digit day matching bug.
+- **#51** — `ProcessorFactory.create_from_config()` builds `ProcessorConfig` in one block via `BankStatementProcessorBuilder.with_processor_config()`; new config knobs now touch ≤2 files.
+
+---
+
+## [0.1.1] — 2026-03-25
+
+### Added (v1.1 — Transaction Pipeline & Word Utils)
+- **Transaction enrichment** (`source_page: int | None`, `confidence_score: float`, `extraction_warnings: list[str]`) — all three fields default correctly and survive `to_dict` / `from_dict` round-trips (#16 / Phase 21).
+- **`ExtractionResult` dataclass** (`domain/models/extraction_result.py`) — typed extraction boundary with `transactions`, `page_count`, `iban`, `source_file`, and `warnings` fields. Architecture guard test enforces placement in `domain/models/` (#16 / Phase 22).
+- **End-to-end `ExtractionResult` pipeline** — `PDFTableExtractor.extract()`, `ExtractionOrchestrator`, `PDFProcessingOrchestrator`, and `processor` all produce and consume `ExtractionResult`; zero tuple-index unpacking remains (#16 / Phase 23).
+- **`extraction/word_utils.py`** — canonical module for `group_words_by_y`, `assign_words_to_columns` (with `strict_rightmost` flag), and `calculate_column_coverage`. Five callers migrated; four private duplicate methods deleted (#21 / Phase 24).
+
+### Changed
+- **ServiceRegistry** introduced (feat/28, PR #44) — `ServiceRegistry.from_config(ProcessorConfig, Entitlements)` wires all transaction-processing services. `TransactionProcessingOrchestrator` deleted (PR #46 / issue #45).
+- **ClassifierRegistry** with explicit integer priorities added to `row_classifiers.py` (fix/29, PR #39).
+- **`recursive_scan` default** changed `False → True` in `ProcessingConfig`, `AppConfig`, `ProcessorBuilder`, and `PDFDiscoveryService`; `RECURSIVE_SCAN` env var added to `docker-compose.yml` (fix/40, PR #41).
+- **`ScoringConfig` injectable** via `BankStatementProcessorBuilder.with_scoring_config()` (feat/32, PR #36).
+
+---
+
+## [0.1.0] — 2026-03-24
+
+### Added (v1.0 — Architecture RFC)
+- **`extraction/word_utils.py`** foundation work — `RowClassifier` chain injected as shared dependency (issue #17, PR #22).
+- **`PDFTableExtractor` decomposed** into `PageHeaderAnalyser`, `RowBuilder`, and `RowPostProcessor` (issue #18, PR #23).
+- **Facade passthroughs deleted** — `content_analysis_facade.py`, `validation_facade.py`, `row_classification_facade.py` removed; service→shim circular import chain broken (issue #20, Phase 20).
+- **`pdf_table_extractor.py` shim** rewired to module-level singletons; `pdf_extractor.py` cleaned of four lazy facade imports.
+- Architecture guard test `test_facade_modules_deleted` added.
+
+### Changed
+- Credit card templates (`aib_credit_card.json`, `credit_card_default.json`) removed from open-source repo; credit card support is PAID tier only via `require_iban=False` in `Entitlements.paid_tier()`.
diff --git a/docs/architecture.md b/docs/architecture.md
index a62d90e..ef63929 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -9,8 +9,8 @@ This document describes the structure of the `bankstatementprocessor` monorepo a
 ```
 bankstatementprocessor/
 ├── packages/
-│   ├── parser-core/          PyPI: bankstatements-core
-│   └── parser-free/          PyPI: bankstatements-free
+│   ├── parser-core/          PyPI: bankstatements-core (v0.1.2)
+│   └── parser-free/          PyPI: bankstatements-free (v0.1.0)
 ├── templates/                shared IBAN-based bank templates
 └── .github/workflows/
     ├── ci.yml                lint + test both packages
@@ -22,15 +22,16 @@ bankstatementprocessor/
 
 The shared parsing library. Contains:
 
-- **`extraction/`** — PDF → rows pipeline (`pdf_extractor`, `boundary_detector`, `row_classifiers`)
-- **`services/`** — 21 single-responsibility services (duplicate detection, sorting, monthly summary, GDPR audit log, etc.)
+- **`extraction/`** — PDF → rows pipeline (`pdf_extractor`, `boundary_detector`, `row_classifiers`, `word_utils`)
+- **`services/`** — single-responsibility services (duplicate detection, sorting, filtering, monthly summary, GDPR audit log, etc.)
+- **`builders/`** — `BankStatementProcessorBuilder` fluent builder
 - **`templates/`** — template model, registry, detectors, and bundled IBAN-based bank templates
-- **`domain/`** — domain models, protocols, currency, dataframe utilities
-- **`config/`** — `AppConfig` dataclass validated from environment variables
+- **`domain/`** — domain models (`Transaction`, `ExtractionResult`), protocols, currency, converters, dataframe utilities
+- **`config/`** — `AppConfig` dataclass validated from environment variables; `ProcessorConfig` for programmatic use
 - **`patterns/`** — Strategy, Factory, Repository implementations
-- **`facades/`** — `BankStatementProcessingFacade` (main orchestrator)
+- **`facades/`** — `BankStatementProcessingFacade` (main orchestrator entry point)
 - **`entitlements.py`** — `Entitlements` frozen dataclass (`free_tier()` and `paid_tier()`)
-- **`processor.py`** — `BankStatementProcessor` (PDF extraction → dedup → sort → output)
+- **`processor.py`** — `BankStatementProcessor` (PDF extraction → filter → dedup → sort → output)
 
 This package has no dependency on any licensing code. The `paid_tier()` entitlement is defined here because it describes a feature set (`require_iban=False`), not access control — activating it requires a valid signed license issued externally.
 
@@ -47,21 +48,36 @@ The free tier processes bank statements that include an IBAN pattern. Credit car
 
 ## Processing Pipeline
 
-The core flow is the same across all distributions:
-
 ```
-app.py
+app.py / ProcessorFactory
   └── BankStatementProcessingFacade.process_with_error_handling()
-        └── BankStatementProcessor
-              ├── PDFExtractor          (page iteration)
-              │     └── BoundaryDetector
-              │     └── RowClassifiers  (Chain of Responsibility)
-              ├── DuplicateDetectionService
-              ├── SortingService
-              └── OutputService         (CSV / JSON / Excel)
+        └── BankStatementProcessor.run()
+              ├── PDFProcessingOrchestrator.process_all_pdfs()
+              │     ├── ExtractionOrchestrator.extract_from_pdf()
+              │     │     └── BankStatementProcessingFacade.extract_tables_from_pdf()
+              │     │           └── PDFTableExtractor.extract()    → ExtractionResult
+              │     │                 ├── BoundaryDetector         (word_utils)
+              │     │                 ├── RowClassifiers           (chain of responsibility)
+              │     │                 └── RowBuilder               (word_utils)
+              │     └── TransactionFilterService.apply_all_filters()
+              │           ├── filter_empty_rows
+              │           ├── filter_header_rows
+              │           └── filter_invalid_dates
+              └── ServiceRegistry.process_transaction_group()
+                    ├── EnrichmentService        (Filename, document_type, transaction_type)
+                    ├── DuplicateDetectionService
+                    ├── TransactionSortingService
+                    └── OutputService            (CSV / JSON / Excel)
 ```
 
-`AppConfig` (from environment variables) is the single source of truth for runtime configuration. Use `get_config_singleton()` to access it.
+`ExtractionResult` is the typed boundary between the extraction layer and the service layer:
+- Produced by `PDFTableExtractor.extract()` and propagated through `ExtractionOrchestrator` and `PDFProcessingOrchestrator`; the latter writes the filtered rows back to `result.transactions`
+- Fields: `transactions: list[Transaction]`, `page_count: int`, `iban: str | None`, `source_file: Path`, `warnings: list[str]`
+- `processor.run()` converts `result.transactions` to `list[dict]` via `transactions_to_dicts()` before handing off to `ServiceRegistry`
+
+`ServiceRegistry` is the wiring point for all post-extraction services. It is constructed by `BankStatementProcessorBuilder.build()` via `ServiceRegistry.from_config()`, which accepts optional injected services to override defaults — enabling custom duplicate strategies and sort orders.
+
+`AppConfig` (from environment variables) is the single source of truth for runtime configuration when running via Docker or the CLI. Use `get_config_singleton()` to access it. For programmatic use, `ProcessorConfig` is constructed directly by the builder.
 
 ---
 
@@ -112,6 +128,27 @@ The free-tier CLI always calls `free_tier()`. The premium distribution validates
 
 ---
 
+## ServiceRegistry
+
+`ServiceRegistry` centralises all transaction-processing service wiring. Unless a service is injected explicitly, it is the single construction point for `DuplicateDetectionService`, `TransactionSortingService`, and `IBANGroupingService`.
+
+```python
+# Default construction (services built from config)
+registry = ServiceRegistry.from_config(config, entitlements=entitlements)
+
+# Custom strategy injection (builder passes these in)
+registry = ServiceRegistry.from_config(
+    config,
+    entitlements=entitlements,
+    duplicate_detector=DuplicateDetectionService(my_strategy),
+    sorting_service=TransactionSortingService(my_sort_strategy),
+)
+```
+
+`BankStatementProcessorBuilder` constructs services from its configured strategies before calling `from_config()`, so `.with_duplicate_strategy()` and `.with_date_sorting()` are guaranteed to be honoured.
+
+---
+
 ## Premium Distribution
 
 A separate premium distribution extends the open-source packages with: