RFC: Make template confidence scoring injectable via ScoringConfig (#32)

## Problem

`TemplateDetector` aggregates confidence scores from 7 detectors to select a bank template. The scoring policy — detector weights and minimum threshold — is hard-coded as module-level constants:

```python
DETECTOR_WEIGHTS = {
    "IBAN": 2.0,
    "ColumnHeader": 1.5,
    "Header": 1.0,
    "Filename": 0.8,
    # ... remaining detectors elided
}
MIN_CONFIDENCE_THRESHOLD = 0.6
```

This creates three friction points:

1. **No test safety net for weights/threshold.** The 18 existing tests mock entire detectors; none validates that a specific detector signal at a known confidence selects (or rejects) a template. If a weight or the threshold is retuned for a new bank format, no test will catch the resulting regression.

2. **Debugging requires reading 8 files.** To understand why template X was chosen over Y, you must read `DETECTOR_WEIGHTS`, `MIN_CONFIDENCE_THRESHOLD`, and all 7 detector implementations. There is no structured way to ask "what did each detector contribute for this PDF?"

3. **Testing threshold boundaries requires arithmetic gymnastics.** The comment in `test_detect_below_minimum_threshold` already shows this: `# Filename detector returns low confidence (0.5 * 0.8 weight = 0.4 < 0.6 threshold)`. A test author must read the module constants, do the arithmetic, and pick a magic number — which silently breaks if either constant changes.

---

## Proposed Solution

### 1. `ScoringConfig` — injectable weights + threshold

Add a frozen dataclass to `template_detector.py` that carries both constants as fields:

```python
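from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringConfig:
    weights: dict[str, float]  # detector_name -> multiplier; frozen=True does not freeze the dict itself
    min_confidence_threshold: float  # minimum weighted score required to select a template

    def __post_init__(self) -> None:
        # Validate once, at construction time.
        if not 0.0 < self.min_confidence_threshold <= 1.0:
            raise ValueError(
                f"min_confidence_threshold must be in (0, 1], got {self.min_confidence_threshold}"
            )
        for name, w in self.weights.items():
            if w < 0.0:
                raise ValueError(f"weight for detector {name!r} must be >= 0, got {w}")

    @classmethod
    def default(cls) -> "ScoringConfig":
        """Production scoring — used when no config is injected."""
        return cls(
            weights={"IBAN": 2.0, "CardNumber": 2.0, "LoanReference": 2.0,
                     "ColumnHeader": 1.5, "Header": 1.0, "Filename": 0.8, "Exclusion": 0.0},
            min_confidence_threshold=0.6,
        )

    def weight_for(self, detector_name: str) -> float:
        # Unlisted detectors get a neutral weight, matching the old DETECTOR_WEIGHTS.get(name, 1.0).
        return self.weights.get(detector_name, 1.0)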
```
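A quick usage sketch of the dataclass above, re-declared here (without `default()`) so the block runs on its own; the error-message wording is illustrative. It shows the three behaviours new tests would lean on: invalid values fail at construction, unlisted detector names fall back to a weight of 1.0, and `frozen=True` blocks field reassignment.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class ScoringConfig:
    weights: dict[str, float]
    min_confidence_threshold: float

    def __post_init__(self) -> None:
        # Validate once, at construction time.
        if not 0.0 < self.min_confidence_threshold <= 1.0:
            raise ValueError(f"threshold must be in (0, 1], got {self.min_confidence_threshold}")
        for name, w in self.weights.items():
            if w < 0.0:
                raise ValueError(f"weight for {name!r} must be >= 0, got {w}")

    def weight_for(self, detector_name: str) -> float:
        return self.weights.get(detector_name, 1.0)


# Test-local policy: only the Filename weight is pinned; everything else falls back to 1.0.
cfg = ScoringConfig(weights={"Filename": 0.8}, min_confidence_threshold=0.6)
print(cfg.weight_for("Filename"), cfg.weight_for("IBAN"))  # 0.8 1.0

# Invalid configs fail immediately instead of deep inside detection.
rejected = []
for kwargs in ({"weights": {}, "min_confidence_threshold": 0.0},
               {"weights": {"IBAN": -1.0}, "min_confidence_threshold": 0.6}):
    try:
        ScoringConfig(**kwargs)
    except ValueError as exc:
        rejected.append(str(exc))
print(len(rejected))  # 2

# frozen=True blocks field reassignment (but not mutation of the weights dict).
try:
    cfg.min_confidence_threshold = 0.1
    reassigned = True
except FrozenInstanceError:
    reassigned = False
print(reassigned)  # False
```

Because `frozen=True` only guards attribute assignment, `cfg.weights["IBAN"] = 0.0` would still succeed; tests should build a fresh config rather than mutate a shared one.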

Module-level `DETECTOR_WEIGHTS` and `MIN_CONFIDENCE_THRESHOLD` are **deleted**. `ScoringConfig.default()` is the single source of truth.

### 2. Backward-compatible constructor

```python
class TemplateDetector:
    def __init__(
        self,
        registry: TemplateRegistry,
        scoring: ScoringConfig | None = None,
    ) -> None:
        self.registry = registry
        self._scoring = scoring if scoring is not None else ScoringConfig.default()
        # detector list unchanged
```

`ExtractionOrchestrator._initialize_template_system()` calls `TemplateDetector(template_registry)` today — **zero changes required**.
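A minimal sketch of why the optional parameter keeps today's call site valid, using a trimmed `ScoringConfig` and a plain `object()` as a stand-in for `TemplateRegistry`. It also shows what module globals could not offer: two detectors with different policies coexisting without monkeypatching anything.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringConfig:
    weights: dict[str, float]
    min_confidence_threshold: float

    @classmethod
    def default(cls) -> ScoringConfig:
        # Trimmed for brevity; the real default lists every detector.
        return cls(weights={"Filename": 0.8}, min_confidence_threshold=0.6)


class TemplateDetector:
    def __init__(self, registry: object, scoring: ScoringConfig | None = None) -> None:
        self.registry = registry
        self._scoring = scoring if scoring is not None else ScoringConfig.default()


registry = object()  # stand-in for TemplateRegistry
legacy = TemplateDetector(registry)  # the orchestrator's existing call, unchanged
strict = TemplateDetector(registry, scoring=ScoringConfig({"Filename": 0.8}, 0.9))

print(legacy._scoring.min_confidence_threshold)  # 0.6
print(strict._scoring.min_confidence_threshold)  # 0.9
```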

### 3. Two mechanical substitutions in the scoring loop

```python
# Before
weight = DETECTOR_WEIGHTS.get(result.detector_name, 1.0)
...
if best_score < MIN_CONFIDENCE_THRESHOLD:

# After
weight = self._scoring.weight_for(result.detector_name)
...
if best_score < self._scoring.min_confidence_threshold:
```
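For context, here is a self-contained sketch of the scoring loop after the substitution. It assumes a `DetectionResult` shaped like the one used in the tests (`template, confidence, detector_name, metadata`, field names hypothetical) and that each template's score is the sum of `confidence * weight` over its results, as `per_template_scores` suggests. Templates are plain strings, `None` stands in for the default-template fallback, and `max()` stands in for `_break_tie()`.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ScoringConfig:
    weights: dict[str, float]
    min_confidence_threshold: float

    def weight_for(self, detector_name: str) -> float:
        return self.weights.get(detector_name, 1.0)


@dataclass
class DetectionResult:  # hypothetical field names
    template: str
    confidence: float
    detector_name: str
    metadata: dict = field(default_factory=dict)


def score(results: list[DetectionResult], scoring: ScoringConfig) -> str | None:
    totals: dict[str, float] = defaultdict(float)
    for result in results:
        weight = scoring.weight_for(result.detector_name)  # was DETECTOR_WEIGHTS.get(...)
        totals[result.template] += result.confidence * weight
    if not totals:
        return None
    # Real code breaks ties via _break_tie(); max() here just keeps the first maximum.
    best_template, best_score = max(totals.items(), key=lambda kv: kv[1])
    if best_score < scoring.min_confidence_threshold:  # was MIN_CONFIDENCE_THRESHOLD
        return None  # stand-in for the default-template fallback
    return best_template


policy = ScoringConfig({"IBAN": 2.0, "ColumnHeader": 1.5, "Filename": 0.8}, 0.6)
# aib: 0.5 * 2.0 = 1.0 beats revolut: 0.6 * 1.5 = 0.9, despite the lower raw confidence.
winner = score([DetectionResult("aib", 0.5, "IBAN"),
                DetectionResult("revolut", 0.6, "ColumnHeader")], policy)
# 0.5 * 0.8 = 0.4 < 0.6, so nothing is selected.
fallback = score([DetectionResult("revolut", 0.5, "Filename")], policy)
print(winner, fallback)  # aib None
```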

### 4. `get_detection_explanation()` — structured debugging without mocking

```python
@dataclass
class DetectionExplanation:
    selected_template_id: str
    selected_score: float
    threshold: float
    passed_threshold: bool
    per_template_scores: dict[str, float]       # template_id -> weighted total
    per_template_breakdown: dict[str, list[str]] # template_id -> ["IBAN=0.95*2.0=1.90", ...]
    tie_broken: bool
    tie_winner_reason: str | None
    used_default: bool
    default_reason: str | None

def get_detection_explanation(
    self, pdf_path: Path, first_page: Page
) -> DetectionExplanation:
    ...
```

Implementation: extract a private helper `_run_detection(pdf_path, first_page)` that returns `tuple[BankTemplate, DetectionExplanation]`. `detect_template()` discards the explanation and `get_detection_explanation()` discards the template, so each public call runs detection exactly once and neither parses the PDF twice.
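The shared-helper shape can be sketched as follows. This is a simplified stand-in: `signals` (tuples of `template_id, confidence, detector_name`) replaces running the detectors over `(pdf_path, first_page)`, the explanation is trimmed to the scoring fields, and `selected_template_id` is `str | None` only because the sketch has no default template. The `runs` counter is hypothetical and exists only to show one detection pass per public call.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DetectionExplanation:  # trimmed: tie/default fields omitted
    selected_template_id: str | None
    selected_score: float
    threshold: float
    passed_threshold: bool
    per_template_scores: dict[str, float]
    per_template_breakdown: dict[str, list[str]]


class SketchDetector:
    def __init__(self, weights: dict[str, float], threshold: float) -> None:
        self._weights = weights
        self._threshold = threshold
        self.runs = 0  # detection passes performed

    def _run_detection(self, signals: list[tuple[str, float, str]]) -> tuple[str | None, DetectionExplanation]:
        self.runs += 1
        scores: dict[str, float] = defaultdict(float)
        breakdown: dict[str, list[str]] = defaultdict(list)
        for template_id, confidence, detector in signals:
            weight = self._weights.get(detector, 1.0)
            scores[template_id] += confidence * weight
            breakdown[template_id].append(f"{detector}={confidence:.2f}*{weight:.1f}={confidence * weight:.2f}")
        best_id = max(scores, key=scores.get) if scores else None
        best = scores[best_id] if best_id is not None else 0.0
        passed = best >= self._threshold
        explanation = DetectionExplanation(best_id, best, self._threshold, passed,
                                           dict(scores), dict(breakdown))
        return (best_id if passed else None), explanation

    def detect_template(self, signals):
        template, _ = self._run_detection(signals)  # explanation discarded
        return template

    def get_detection_explanation(self, signals):
        _, explanation = self._run_detection(signals)  # template discarded
        return explanation


detector = SketchDetector({"IBAN": 2.0, "ColumnHeader": 1.5}, threshold=0.6)
signals = [("aib", 0.95, "IBAN"), ("revolut", 0.6, "ColumnHeader")]
exp = detector.get_detection_explanation(signals)
print(exp.per_template_breakdown["aib"])  # ['IBAN=0.95*2.0=1.90']
print(exp.selected_template_id, exp.passed_threshold)  # aib True
```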

`get_detection_explanation()` is **not** added to the `ITemplateDetector` protocol — it is a concrete-class-only debug/test method.
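Because protocols are structural, a method that exists only on the concrete class does not affect conformance; code typed against the protocol simply cannot reach it (a type checker would flag the call). A minimal sketch with a simplified, hypothetical `detect_template` signature:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ITemplateDetector(Protocol):  # simplified; the real protocol's signature may differ
    def detect_template(self, pdf_path, first_page): ...


class TemplateDetector:
    def detect_template(self, pdf_path, first_page):
        return "default"

    def get_detection_explanation(self, pdf_path, first_page):
        return {"selected_template_id": "default"}  # concrete-only debug/test method


print(isinstance(TemplateDetector(), ITemplateDetector))  # True
print(hasattr(ITemplateDetector, "get_detection_explanation"))  # False
```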

### 5. Exports

Add `ScoringConfig` and `DetectionExplanation` to `bankstatements_core/templates/__init__.py` `__all__`.

---

## Tests enabled by this change

**Threshold boundary without magic numbers:**
```python
def test_filename_at_threshold_selects_template(mock_registry, mock_page):
    scoring = ScoringConfig.default()  # weight=0.8, threshold=0.6
    # 0.75 * 0.8 = 0.60 >= 0.60 → should select
    with patch.object(FilenameDetector, "detect") as mock_fn, ...:
        mock_fn.return_value = [DetectionResult(revolut, 0.75, "Filename", {})]
        ...
        result = TemplateDetector(mock_registry, scoring=scoring).detect_template(...)
    assert result == revolut
```
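One caveat on this boundary example: in IEEE-754 double precision (CPython's `float`), `0.75 * 0.8` evaluates to `0.6000000000000001`, so the test passes by one ulp rather than by exact equality. With injectable scoring, a boundary test can instead choose a float-exact config and probe just below the threshold with `math.nextafter` (Python 3.9+). The `meets_threshold` helper is hypothetical; it mirrors the comparison in the scoring loop.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringConfig:
    weights: dict[str, float]
    min_confidence_threshold: float

    def weight_for(self, detector_name: str) -> float:
        return self.weights.get(detector_name, 1.0)


def meets_threshold(confidence: float, detector: str, scoring: ScoringConfig) -> bool:
    # Same comparison as the scoring loop: weighted score vs. threshold.
    return confidence * scoring.weight_for(detector) >= scoring.min_confidence_threshold


# Default-style numbers: the product lands one ulp above 0.6.
print(0.75 * 0.8)  # 0.6000000000000001

# Injected, float-exact config: with weight 1.0 the weighted score equals the confidence.
exact = ScoringConfig(weights={"Filename": 1.0}, min_confidence_threshold=0.6)
at_boundary = meets_threshold(0.6, "Filename", exact)
just_below = meets_threshold(math.nextafter(0.6, 0.0), "Filename", exact)
print(at_boundary, just_below)  # True False
```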

**Weight ordering determines winner:**
```python
def test_iban_weight_beats_higher_raw_column_header_confidence(mock_registry, mock_page):
    # aib: IBAN(0.5 * 2.0 = 1.0) vs revolut: ColumnHeader(0.6 * 1.5 = 0.9)
    # aib wins even though ColumnHeader raw confidence is higher
    scoring = ScoringConfig.default()
    ...
    assert result == aib
```

**Explanation reveals tie-break reason without reading `_break_tie()`:**
```python
def test_explanation_reports_iban_tie_break_reason(mock_registry, mock_page):
    ...
    explanation = detector.get_detection_explanation(Path("x.pdf"), mock_page)
    assert explanation.tie_broken is True
    assert explanation.tie_winner_reason == "IBAN match"
    assert explanation.per_template_scores["aib"] == pytest.approx(1.0)
    assert explanation.per_template_scores["revolut"] == pytest.approx(1.0)
```

---

## Scope

| File | Change |
|---|---|
| `templates/template_detector.py` | Add `ScoringConfig`, `DetectionExplanation`; revise `__init__`; replace 2 global refs; add `_run_detection`, `get_detection_explanation` |
| `templates/__init__.py` | Export `ScoringConfig`, `DetectionExplanation` |
| `tests/templates/test_template_detector.py` | Add 3–4 new tests; existing 18 tests untouched |
| `templates/detectors/*.py` (7 files) | **No changes** |
| `services/extraction_orchestrator.py` | **No changes** |
| `domain/protocols/services.py` | **No changes** |

---

## What this does NOT change

- The 7 detector files — they remain stateless signal producers
- `ExtractionOrchestrator` — zero call-site changes needed
- The `ITemplateDetector` protocol — `get_detection_explanation` is concrete-only
- The detector list ordering — `ExclusionDetector` stays first by convention; no ordering guard needed since the list is not injectable in this design
- Existing test mocking patterns — all 18 current `@patch` tests continue to work unchanged
