Summary
The services/ layer contains 23 single-responsibility classes, most under 150 lines. This is the opposite of a deep module: the interface (constructor signature + method names) is nearly as complex as the implementation. Every caller must import, instantiate, and wire services manually, passing column_names, entitlements, and strategy objects individually to each one.
This RFC proposes a ServiceRegistry (Design 3) with an internal ServiceContext frozen dataclass (borrowed from Design 1) to eliminate wiring ceremony without removing or restructuring the individual services.
Problem
Current calling pattern (orchestrators)
PDFProcessingOrchestrator.__init__ receives 8 constructor arguments and conditionally creates 3 default services:
# services/pdf_processing_orchestrator.py:47–93
def __init__(
self,
extraction_config: ExtractionConfig,
column_names: list[str],
output_dir: Path,
repository: FileSystemTransactionRepository,
entitlements: Entitlements | None = None,
pdf_discovery: IPDFDiscovery | None = None,
extraction_orchestrator: ExtractionOrchestrator | None = None,
filter_service: ITransactionFilter | None = None,
):
self.pdf_discovery = pdf_discovery or PDFDiscoveryService(entitlements=entitlements)
self.extraction_orchestrator = extraction_orchestrator or ExtractionOrchestrator(
extraction_config=extraction_config, entitlements=entitlements
)
self.filter_service = filter_service or TransactionFilterService(column_names)
TransactionProcessingOrchestrator.process_transaction_group chains 3 enrichment calls, a dedupe call, and a sort call explicitly — logic that must be re-expressed by every caller:
# services/transaction_processing_orchestrator.py:93–112
enriched = self.enrich_with_filename(transactions)
enriched = self.enrich_with_document_type(enriched)
enriched = self.classify_transaction_types(enriched, template)
unique_rows, duplicate_rows = self.duplicate_detector.detect_and_separate(enriched)
sorted_rows = self.sorting_service.sort(unique_rows)
return sorted_rows, duplicate_rows
ProcessorFactory.create_from_config (patterns/factories.py) already performs one round of wiring, but the orchestrators then repeat the pattern internally.
What this causes
column_names and entitlements are passed through at least 4 layers before reaching the service that uses them
- Adding a new service (e.g., IBAN masking, fraud flagging) requires updating constructor signatures, factory wiring, and all test doubles
- Tests mock individual services; there is no single seam to inject a test double for the entire processing chain
Proposed Interface
ServiceContext (internal, frozen)
A frozen dataclass that carries the shared dependencies once. Never exposed to callers — used only inside ServiceRegistry to wire services.
@dataclass(frozen=True)
class _ServiceContext:
column_names: list[str]
debit_columns: list[str]
credit_columns: list[str]
totals_columns: list[str] | None
entitlements: Entitlements | None
generate_monthly_summary: bool
generate_expense_analysis: bool
ServiceRegistry
class ServiceRegistry:
"""Single wiring point for all transaction processing services.
Callers use the primary methods for the 80% case.
Individual services are accessible via get_*() for the 20% case.
"""
@classmethod
def from_processor_config(cls, config: ProcessorConfig) -> "ServiceRegistry":
"""Primary factory. Replaces manual service construction in orchestrators."""
...
# --- 80% case: primary methods ---
def process_transaction_group(
self,
transactions: list[dict],
template: BankTemplate | None = None,
) -> tuple[list[dict], list[dict]]:
"""Enrich → classify → deduplicate → sort. Returns (unique, duplicates)."""
...
def group_and_process_by_iban(
self,
transactions: list[dict],
pdf_ibans: dict[str, str],
) -> dict[str, tuple[list[dict], list[dict]]]:
"""Group by IBAN, then process_transaction_group() each group."""
...
def generate_output_summary(
self,
unique_rows: list[dict],
duplicate_rows: list[dict],
df_unique: pd.DataFrame,
iban_suffix: str | None = None,
) -> dict[str, Any]:
"""Totals + monthly summary + expense analysis (honours feature flags)."""
...
# --- 20% case: escape hatches ---
def get_duplicate_detector(self) -> IDuplicateDetector: ...
def get_sorting_service(self) -> ITransactionSorting: ...
def get_iban_grouping_service(self) -> IIBANGrouping: ...
def get_monthly_summary_service(self) -> IMonthlySummary: ...
def get_transaction_filter(self) -> ITransactionFilter: ...
def get_totals_service(self) -> IColumnTotals | None: ...
def get_expense_analysis_service(self) -> IExpenseAnalysis | None: ...
Before / after at the call site
# BEFORE — processor.py, per-group processing (5 explicit calls)
enriched = self._transaction_orchestrator.enrich_with_filename(rows)
enriched = self._transaction_orchestrator.enrich_with_document_type(enriched)
enriched = self._transaction_orchestrator.classify_transaction_types(enriched, template)
unique, dupes = self._duplicate_service.detect_and_separate(enriched)
sorted_rows = self._sorting_service.sort(unique)
# AFTER — one call
unique, dupes = self._registry.process_transaction_group(rows, template)
What stays unchanged
- All 23 individual service classes remain as-is (no deletions, no renames)
ProcessorFactory.create_from_config continues to work — the registry is a layer above it
- Existing tests that mock individual services continue to pass
TransactionProcessingOrchestrator and PDFProcessingOrchestrator are not deleted immediately; they thin out to delegates, deprecated in a follow-up
Implementation steps
-
Create src/bankstatements_core/services/service_registry.py
_ServiceContext frozen dataclass (private)
ServiceRegistry class with from_processor_config() classmethod
- Primary methods:
process_transaction_group, group_and_process_by_iban, generate_output_summary
- Escape hatches:
get_duplicate_detector(), get_sorting_service(), etc.
-
Update ProcessorFactory.create_from_config to instantiate and store a ServiceRegistry on the processor, alongside the existing orchestrator construction
-
Migrate processor.py to call self._registry.process_transaction_group() instead of the explicit 5-call chain
-
Add boundary tests at the registry seam:
test_process_transaction_group_deduplicates_and_sorts
test_process_transaction_group_enriches_metadata
test_generate_output_summary_respects_feature_flags
test_service_registry_from_processor_config_wires_correctly
-
Deprecate orchestrator wiring methods (enrich_with_filename, enrich_with_document_type, classify_transaction_types) with DeprecationWarning — remove in next minor version
What this does NOT do
- Does not merge or delete the 23 services
- Does not introduce a pipeline/stage abstraction (out of scope; consider if a plugin extension point is needed later)
- Does not change output format logic or extraction pipeline
Coverage impact
Current: 92% (1288 tests). The 4 new boundary tests replace 5–6 existing orchestrator-level tests that mock individual services. Net change: roughly neutral on count, higher signal per test.
Files affected
| File |
Change |
src/bankstatements_core/services/service_registry.py |
New |
src/bankstatements_core/patterns/factories.py |
Minor: instantiate registry |
src/bankstatements_core/processor.py |
Replace 5-call chain with registry call |
src/bankstatements_core/services/transaction_processing_orchestrator.py |
Add DeprecationWarning to enrichment methods |
tests/test_service_registry.py |
New (4 boundary tests) |
Summary
The
services/layer contains 23 single-responsibility classes, most under 150 lines. This is the opposite of a deep module: the interface (constructor signature + method names) is nearly as complex as the implementation. Every caller must import, instantiate, and wire services manually, passingcolumn_names,entitlements, and strategy objects individually to each one.This RFC proposes a
ServiceRegistry(Design 3) with an internalServiceContextfrozen dataclass (borrowed from Design 1) to eliminate wiring ceremony without removing or restructuring the individual services.Problem
Current calling pattern (orchestrators)
PDFProcessingOrchestrator.__init__receives 8 constructor arguments and conditionally creates 3 default services:TransactionProcessingOrchestrator.process_transaction_groupchains 3 enrichment calls, a dedupe call, and a sort call explicitly — logic that must be re-expressed by every caller:ProcessorFactory.create_from_config(patterns/factories.py) already performs one round of wiring, but the orchestrators then repeat the pattern internally.What this causes
column_namesandentitlementsare passed through at least 4 layers before reaching the service that uses themProposed Interface
ServiceContext(internal, frozen)A frozen dataclass that carries the shared dependencies once. Never exposed to callers — used only inside
ServiceRegistryto wire services.ServiceRegistryBefore / after at the call site
What stays unchanged
ProcessorFactory.create_from_configcontinues to work — the registry is a layer above itTransactionProcessingOrchestratorandPDFProcessingOrchestratorare not deleted immediately; they thin out to delegates, deprecated in a follow-upImplementation steps
Create
src/bankstatements_core/services/service_registry.py_ServiceContextfrozen dataclass (private)ServiceRegistryclass withfrom_processor_config()classmethodprocess_transaction_group,group_and_process_by_iban,generate_output_summaryget_duplicate_detector(),get_sorting_service(), etc.Update
ProcessorFactory.create_from_configto instantiate and store aServiceRegistryon the processor, alongside the existing orchestrator constructionMigrate
processor.pyto callself._registry.process_transaction_group()instead of the explicit 5-call chainAdd boundary tests at the registry seam:
test_process_transaction_group_deduplicates_and_sortstest_process_transaction_group_enriches_metadatatest_generate_output_summary_respects_feature_flagstest_service_registry_from_processor_config_wires_correctlyDeprecate orchestrator wiring methods (
enrich_with_filename,enrich_with_document_type,classify_transaction_types) withDeprecationWarning— remove in next minor versionWhat this does NOT do
Coverage impact
Current: 92% (1288 tests). The 4 new boundary tests replace 5–6 existing orchestrator-level tests that mock individual services. Net change: roughly neutral on count, higher signal per test.
Files affected
src/bankstatements_core/services/service_registry.pysrc/bankstatements_core/patterns/factories.pysrc/bankstatements_core/processor.pysrc/bankstatements_core/services/transaction_processing_orchestrator.pyDeprecationWarningto enrichment methodstests/test_service_registry.py