refactor(reranker-models): T-CONTEXTS partial — _RerankerRegistration by frapercan · Pull Request #77 · frapercan/PROTEA

frapercan · 2026-05-08T15:53:43Z

Acceptance criteria (master plan §24 Fase 1 — T-CONTEXTS partial)

Third incremental Parameter Object slice (after #73 KnnEnrichmentContext, #75 CafaEvalRunContext).

Changes

New private frozen dataclass _RerankerRegistration bundles the nine non-session inputs that both register endpoints (multipart and by-reference) pass to _register_model.
_register_model signature collapses 10 args → 2 (session, reg).
Both call-sites build the registration inline.
Helper is private — only the two router endpoints in this file use it. Tests touch the endpoints by HTTP, never the helper directly, so no test churn.
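A minimal sketch of the Parameter Object shape described above, not the actual PROTEA code. The field names (`name`, `version`, `artifact_uri`, `description`) and the list used as a stand-in session are assumptions for illustration; the real dataclass bundles nine inputs that this page does not enumerate.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Any, Optional


@dataclass(frozen=True)
class _RerankerRegistration:
    """Hypothetical stand-in: these four fields are illustrative only."""

    name: str
    version: str
    artifact_uri: str
    description: Optional[str] = None


def _register_model(session: Any, reg: _RerankerRegistration) -> dict:
    # The real helper presumably persists through a DB session; this sketch
    # only appends a plain record to whatever list-like object it is given.
    record = {
        "name": reg.name,
        "version": reg.version,
        "artifact_uri": reg.artifact_uri,
        "description": reg.description,
    }
    session.append(record)
    return record


# Each endpoint would build the registration inline at its call-site.
session: list = []
reg = _RerankerRegistration(name="demo", version="1", artifact_uri="s3://bucket/demo")
row = _register_model(session, reg)
```

Freezing the dataclass means an endpoint cannot mutate the registration after building it, so both call-sites hand the helper the same kind of immutable value.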

Smell budget

78 → 77 offenders. _register_model retired from params>6 (23 → 22).
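For readers unfamiliar with the params>6 bucket, here is one way such a check could be written with the standard `ast` module. This is a sketch under assumptions: the contents of `scripts/check_smells.py` are not shown on this page, so the counting rule (every declared parameter counts, including `self`, `*args` and `**kwargs`) and the names `count_params` and `offenders` are hypothetical.

```python
import ast

MAX_PARAMS = 6  # threshold implied by the "params>6" bucket name


def count_params(fn: ast.AST) -> int:
    """Count every declared parameter of a function definition node."""
    a = fn.args
    n = len(a.posonlyargs) + len(a.args) + len(a.kwonlyargs)
    n += 1 if a.vararg else 0  # *args
    n += 1 if a.kwarg else 0   # **kwargs
    return n


def offenders(source: str) -> list:
    """Return sorted (function name, parameter count) pairs above the threshold."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            n = count_params(node)
            if n > MAX_PARAMS:
                found.append((node.name, n))
    return sorted(found)


SAMPLE = '''
def before(session, a, b, c, d, e, f, g, h, i):
    pass

def after(session, reg):
    pass
'''

result = offenders(SAMPLE)
```

With this rule the 10-argument `before` is flagged and the 2-argument `after` is not, which mirrors how a refactor like this one would retire a function from the bucket.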

Test plan

poetry run ruff check protea scripts
poetry run flake8 protea/
poetry run pytest tests/ --ignore=tests/test_jobs_pg.py (1163 passed, 11 skipped)
poetry run python scripts/check_smells.py (77 known, none new)

Introduces a frozen private ``_RerankerRegistration`` dataclass bundling the nine non-session inputs that both the multipart and by-reference reranker-model endpoints feed into ``_register_model``. ``_register_model`` signature collapses 10 args → 2 (session, reg). Both call-sites build the registration inline.

Sizes:
- reranker_models.py: +20 LOC (dataclass) - 16 LOC (kwargs at call-sites) ≈ +4 LOC net
- Smell baseline: 78 → 77 (`_register_model` retired from params>6: 23 → 22)

Helper is private (underscore prefix): only the two router endpoints in this file use it. Tests touch the endpoints by HTTP, never the helper directly, so no test churn.

Local-first 5 green (ruff + flake8 + pytest 1163 + check_smells).

## Acceptance criteria (master plan §24 Fase 1, T-CONTEXTS partial)

Fourth incremental Parameter Object slice. Tackles the last remaining 16-arg offender in core (after #73, #75, #77).

## Changes

- ``protea/core/parquet_export.py``:
  - New ``@dataclass(frozen=True) ParquetExportContext`` bundling 15 per-call inputs grouped into 3 sections (source shards, dataset identity, publishing).
  - ``export_reranker_parquets`` signature collapses 16 args → 1 (``ctx``).
  - Body destructures the context once at the top so the rest of the implementation stays diff-minimal.
- ``protea/core/training_dump_helpers.py``: production caller updated.
- ``tests/test_parquet_export_boundary.py``: the only test that invokes it, updated.

## Smell budget

77 → 75 offenders. **params>6: 22 → 20** (`export_reranker_parquets` retired plus 1 knock-on improvement as the dataclass centralises type annotations).

## Test plan

- [x] `poetry run ruff check protea scripts`
- [x] `poetry run flake8 protea/`
- [x] `poetry run pytest tests/ --ignore=tests/test_jobs_pg.py` (1163 passed, 11 skipped)
- [x] `poetry run python scripts/check_smells.py` (75 known, none new)

## T-CONTEXTS progress (4 slices total)

- #73: KnnEnrichmentContext, enrich_v6_features 9 → 4 args
- #75: CafaEvalRunContext, evaluate_all_settings 18 → 3 args
- #77: _RerankerRegistration, _register_model 10 → 2 args
- this PR: ParquetExportContext, export_reranker_parquets 16 → 1 arg

Combined: 53 → 10 args across the four entry points; baseline params>6: 24 → 20 (cumulative drop -4).
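The "destructure once at the top" approach can be sketched as follows. This is an illustration only: the six fields shown are hypothetical placeholders for the 15 real inputs, chosen to mirror the three sections named above, and the function body is a stand-in that builds a plan instead of writing any Parquet file.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ParquetExportContext:
    # --- source shards (hypothetical fields) ---
    shard_dir: PurePosixPath
    shard_glob: str
    # --- dataset identity (hypothetical fields) ---
    dataset_name: str
    dataset_version: str
    # --- publishing (hypothetical fields) ---
    output_dir: PurePosixPath
    overwrite: bool = False


def export_reranker_parquets(ctx: ParquetExportContext) -> dict:
    # Destructure once so the body keeps using the old local names,
    # which keeps the diff against the pre-refactor body small.
    shard_dir, shard_glob = ctx.shard_dir, ctx.shard_glob
    dataset_name, dataset_version = ctx.dataset_name, ctx.dataset_version
    output_dir, overwrite = ctx.output_dir, ctx.overwrite

    # Stand-in for the real export: describe what would be read and written.
    return {
        "sources": str(shard_dir / shard_glob),
        "target": str(output_dir / f"{dataset_name}-{dataset_version}.parquet"),
        "overwrite": overwrite,
    }


ctx = ParquetExportContext(
    shard_dir=PurePosixPath("shards"),
    shard_glob="*.parquet",
    dataset_name="demo",
    dataset_version="v1",
    output_dir=PurePosixPath("out"),
)
plan = export_reranker_parquets(ctx)
```

The alternative of writing `ctx.field` throughout the body is arguably cleaner long-term, but it touches every line that used the old parameter names; unpacking at the top trades a few extra lines for a smaller review surface.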

## Acceptance criteria (master plan §24 Fase 1, T-CONTEXTS partial)

Fifth incremental Parameter Object slice. Tackles the remaining 16-arg offender in the core training pipeline.

## Changes

- `protea/core/training_dump_helpers.py`:
  - New ``KnnTransferContext`` frozen dataclass bundling the 12 per-call data inputs (queries, references, ontology maps, optional enrichment helpers).
  - ``_knn_transfer_and_label`` signature collapses 16 args → 5 (``session``, ``p``, ``ctx``, plus existing ``sequence_context`` / ``stream_output``).
  - Two production call sites updated (train branch + test-stream branch).
- `tests/test_knn_streaming_smoke.py`: shared test runner updated.

## Smell budget

77 → 75 offenders. **params>6: 22 → 20** (`_knn_transfer_and_label` retired plus 1 knock-on).

## Test plan

- [x] `poetry run ruff check protea scripts`
- [x] `poetry run flake8 protea/`
- [x] `poetry run pytest tests/ --ignore=tests/test_jobs_pg.py` (1163 passed, 11 skipped)
- [x] `poetry run python scripts/check_smells.py` (75 known, none new)

## T-CONTEXTS progress (5 slices total, 4 in flight pending CI)

- #73 KnnEnrichmentContext: enrich_v6_features 9 → 4 args
- #75 CafaEvalRunContext: evaluate_all_settings 18 → 3 args
- #77 _RerankerRegistration: _register_model 10 → 2 args
- #78 ParquetExportContext: export_reranker_parquets 16 → 1 arg
- this PR KnnTransferContext: _knn_transfer_and_label 16 → 5 args

Combined: 69 args → 15 across the 5 entry points.
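The combined figure can be checked directly from the per-slice numbers listed above; the snippet below only re-adds those before/after counts and assumes nothing beyond them.

```python
# (args before, args after) for each entry point, as listed in the progress table.
slices = {
    "enrich_v6_features": (9, 4),
    "evaluate_all_settings": (18, 3),
    "_register_model": (10, 2),
    "export_reranker_parquets": (16, 1),
    "_knn_transfer_and_label": (16, 5),
}

before = sum(b for b, _ in slices.values())
after = sum(a for _, a in slices.values())
print(f"{before} args -> {after} across {len(slices)} entry points")
```

Dropping the last entry gives the four-slice totals quoted in the previous progress summary (53 → 10).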

frapercan enabled auto-merge (squash) May 8, 2026 15:53

frapercan merged commit 179b2e0 into develop May 8, 2026
13 checks passed

frapercan mentioned this pull request May 8, 2026

refactor(parquet): T-CONTEXTS partial — ParquetExportContext #78

Merged

4 tasks

frapercan mentioned this pull request May 8, 2026

refactor(training): T-CONTEXTS partial — KnnTransferContext #80

Merged

4 tasks

frapercan merged 1 commit into develop from feat/t-contexts-reranker-register
