Skip to content

RFC: Deepen services layer — ServiceRegistry with internal ServiceContext #28

@longieirl

Description

@longieirl

Summary

The services/ layer contains 23 single-responsibility classes, most under 150 lines. This is the opposite of a deep module: the interface (constructor signature + method names) is nearly as complex as the implementation. Every caller must import, instantiate, and wire services manually, passing column_names, entitlements, and strategy objects individually to each one.

This RFC proposes a ServiceRegistry (Design 3) with an internal ServiceContext frozen dataclass (borrowed from Design 1) to eliminate wiring ceremony without removing or restructuring the individual services.


Problem

Current calling pattern (orchestrators)

PDFProcessingOrchestrator.__init__ receives 8 constructor arguments and conditionally creates 3 default services:

# services/pdf_processing_orchestrator.py:47–93
def __init__(
    self,
    extraction_config: ExtractionConfig,
    column_names: list[str],
    output_dir: Path,
    repository: FileSystemTransactionRepository,
    entitlements: Entitlements | None = None,
    pdf_discovery: IPDFDiscovery | None = None,
    extraction_orchestrator: ExtractionOrchestrator | None = None,
    filter_service: ITransactionFilter | None = None,
):
    self.pdf_discovery = pdf_discovery or PDFDiscoveryService(entitlements=entitlements)
    self.extraction_orchestrator = extraction_orchestrator or ExtractionOrchestrator(
        extraction_config=extraction_config, entitlements=entitlements
    )
    self.filter_service = filter_service or TransactionFilterService(column_names)

TransactionProcessingOrchestrator.process_transaction_group chains 3 enrichment calls, a dedupe call, and a sort call explicitly — logic that must be re-expressed by every caller:

# services/transaction_processing_orchestrator.py:93–112
enriched = self.enrich_with_filename(transactions)
enriched = self.enrich_with_document_type(enriched)
enriched = self.classify_transaction_types(enriched, template)
unique_rows, duplicate_rows = self.duplicate_detector.detect_and_separate(enriched)
sorted_rows = self.sorting_service.sort(unique_rows)
return sorted_rows, duplicate_rows

ProcessorFactory.create_from_config (patterns/factories.py) already performs one round of wiring, but the orchestrators then repeat the pattern internally.

What this causes

  • column_names and entitlements are passed through at least 4 layers before reaching the service that uses them
  • Adding a new service (e.g., IBAN masking, fraud flagging) requires updating constructor signatures, factory wiring, and all test doubles
  • Tests mock individual services; there is no single seam to inject a test double for the entire processing chain

Proposed Interface

ServiceContext (internal, frozen)

A frozen dataclass that carries the shared dependencies once. Never exposed to callers — used only inside ServiceRegistry to wire services.

@dataclass(frozen=True)
class _ServiceContext:
    column_names: list[str]
    debit_columns: list[str]
    credit_columns: list[str]
    totals_columns: list[str] | None
    entitlements: Entitlements | None
    generate_monthly_summary: bool
    generate_expense_analysis: bool

ServiceRegistry

class ServiceRegistry:
    """Single wiring point for all transaction processing services.

    Callers use the primary methods for the 80% case.
    Individual services are accessible via get_*() for the 20% case.
    """

    @classmethod
    def from_processor_config(cls, config: ProcessorConfig) -> "ServiceRegistry":
        """Primary factory. Replaces manual service construction in orchestrators."""
        ...

    # --- 80% case: primary methods ---

    def process_transaction_group(
        self,
        transactions: list[dict],
        template: BankTemplate | None = None,
    ) -> tuple[list[dict], list[dict]]:
        """Enrich → classify → deduplicate → sort. Returns (unique, duplicates)."""
        ...

    def group_and_process_by_iban(
        self,
        transactions: list[dict],
        pdf_ibans: dict[str, str],
    ) -> dict[str, tuple[list[dict], list[dict]]]:
        """Group by IBAN, then process_transaction_group() each group."""
        ...

    def generate_output_summary(
        self,
        unique_rows: list[dict],
        duplicate_rows: list[dict],
        df_unique: pd.DataFrame,
        iban_suffix: str | None = None,
    ) -> dict[str, Any]:
        """Totals + monthly summary + expense analysis (honours feature flags)."""
        ...

    # --- 20% case: escape hatches ---

    def get_duplicate_detector(self) -> IDuplicateDetector: ...
    def get_sorting_service(self) -> ITransactionSorting: ...
    def get_iban_grouping_service(self) -> IIBANGrouping: ...
    def get_monthly_summary_service(self) -> IMonthlySummary: ...
    def get_transaction_filter(self) -> ITransactionFilter: ...
    def get_totals_service(self) -> IColumnTotals | None: ...
    def get_expense_analysis_service(self) -> IExpenseAnalysis | None: ...

Before / after at the call site

# BEFORE — processor.py, per-group processing (5 explicit calls)
enriched = self._transaction_orchestrator.enrich_with_filename(rows)
enriched = self._transaction_orchestrator.enrich_with_document_type(enriched)
enriched = self._transaction_orchestrator.classify_transaction_types(enriched, template)
unique, dupes = self._duplicate_service.detect_and_separate(enriched)
sorted_rows = self._sorting_service.sort(unique)

# AFTER — one call
unique, dupes = self._registry.process_transaction_group(rows, template)

What stays unchanged

  • All 23 individual service classes remain as-is (no deletions, no renames)
  • ProcessorFactory.create_from_config continues to work — the registry is a layer above it
  • Existing tests that mock individual services continue to pass
  • TransactionProcessingOrchestrator and PDFProcessingOrchestrator are not deleted immediately; they thin out to delegates, deprecated in a follow-up

Implementation steps

  1. Create src/bankstatements_core/services/service_registry.py

    • _ServiceContext frozen dataclass (private)
    • ServiceRegistry class with from_processor_config() classmethod
    • Primary methods: process_transaction_group, group_and_process_by_iban, generate_output_summary
    • Escape hatches: get_duplicate_detector(), get_sorting_service(), etc.
  2. Update ProcessorFactory.create_from_config to instantiate and store a ServiceRegistry on the processor, alongside the existing orchestrator construction

  3. Migrate processor.py to call self._registry.process_transaction_group() instead of the explicit 5-call chain

  4. Add boundary tests at the registry seam:

    • test_process_transaction_group_deduplicates_and_sorts
    • test_process_transaction_group_enriches_metadata
    • test_generate_output_summary_respects_feature_flags
    • test_service_registry_from_processor_config_wires_correctly
  5. Deprecate orchestrator wiring methods (enrich_with_filename, enrich_with_document_type, classify_transaction_types) with DeprecationWarning — remove in next minor version


What this does NOT do

  • Does not merge or delete the 23 services
  • Does not introduce a pipeline/stage abstraction (out of scope; consider if a plugin extension point is needed later)
  • Does not change output format logic or extraction pipeline

Coverage impact

Current: 92% (1288 tests). The 4 new boundary tests replace 5–6 existing orchestrator-level tests that mock individual services. Net change: roughly neutral on count, higher signal per test.


Files affected

File Change
src/bankstatements_core/services/service_registry.py New
src/bankstatements_core/patterns/factories.py Minor: instantiate registry
src/bankstatements_core/processor.py Replace 5-call chain with registry call
src/bankstatements_core/services/transaction_processing_orchestrator.py Add DeprecationWarning to enrichment methods
tests/test_service_registry.py New (4 boundary tests)

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions