refactor(parquet): T-CONTEXTS partial — ParquetExportContext by frapercan · Pull Request #78 · frapercan/PROTEA

frapercan · 2026-05-08T16:00:34Z

Acceptance criteria (master plan §24 Fase 1 — T-CONTEXTS partial)

Fourth incremental Parameter Object slice. Tackles the last remaining 16-arg offender in core (after #73, #75, #77).

Changes

protea/core/parquet_export.py:
- New @dataclass(frozen=True) ParquetExportContext bundling 15 per-call inputs grouped into 3 sections (source shards, dataset identity, publishing).
- export_reranker_parquets signature collapses 16 args → 1 (ctx).
- Body destructures the context once at the top so the rest of the implementation stays diff-minimal.
protea/core/training_dump_helpers.py: production caller updated.
tests/test_parquet_export_boundary.py: only test that invokes it updated.
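
For readers unfamiliar with the Parameter Object pattern, the shape of the change can be sketched as follows. This is a minimal illustration, not the code from `protea/core/parquet_export.py`: the field names (`shard_paths`, `dataset_name`, `dataset_version`, `output_dir`, `publish`) and the placeholder body are assumptions, and the real context carries 15 inputs rather than 5.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ParquetExportContext:
    """Hypothetical stand-in; the field names are illustrative only."""

    # --- source shards ---
    shard_paths: tuple[Path, ...]
    # --- dataset identity ---
    dataset_name: str
    dataset_version: str
    # --- publishing ---
    output_dir: Path
    publish: bool = False


def export_reranker_parquets(ctx: ParquetExportContext) -> list[Path]:
    # Destructure once at the top so the body keeps using plain local names.
    shard_paths = ctx.shard_paths
    dataset_name = ctx.dataset_name
    dataset_version = ctx.dataset_version
    output_dir = ctx.output_dir

    # Placeholder body: only computes target paths, no parquet I/O.
    return [
        output_dir / f"{dataset_name}-{dataset_version}-{shard.stem}.parquet"
        for shard in shard_paths
    ]


# Call sites build the context inline and pass a single argument.
ctx = ParquetExportContext(
    shard_paths=(Path("shards/a.npz"), Path("shards/b.npz")),
    dataset_name="demo",
    dataset_version="v1",
    output_dir=Path("out"),
)
targets = export_reranker_parquets(ctx)
```

Because the dataclass is frozen, assigning to a field after construction raises `dataclasses.FrozenInstanceError`, so the function cannot mutate its caller's inputs by accident.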

Smell budget

77 → 75 offenders. params>6: 22 → 20 (export_reranker_parquets retired plus 1 knock-on improvement as the dataclass centralises type annotations).
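
To make the `params>6` bucket concrete, here is a rough sketch of how such a count could be computed with the standard `ast` module. It is an assumption about the general approach, not the contents of `scripts/check_smells.py`; in particular, whether the real script counts `self`, `*args`, or `**kwargs` is not stated in this PR.

```python
import ast
from typing import Union

MAX_PARAMS = 6  # threshold implied by the "params>6" bucket name

FunctionNode = Union[ast.FunctionDef, ast.AsyncFunctionDef]


def count_params(fn: FunctionNode) -> int:
    """Count every declared parameter (assumption: self and *args count too)."""
    a = fn.args
    n = len(a.posonlyargs) + len(a.args) + len(a.kwonlyargs)
    n += a.vararg is not None  # *args adds one
    n += a.kwarg is not None   # **kwargs adds one
    return n


def param_offenders(source: str) -> dict[str, int]:
    """Map function name to parameter count for functions over the threshold."""
    tree = ast.parse(source)
    return {
        node.name: count_params(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and count_params(node) > MAX_PARAMS
    }


# Toy before/after signatures, not the real ones.
before = "def export(a, b, c, d, e, f, g, h): ...\n"
after = "def export(ctx): ...\n"
offenders_before = param_offenders(before)
offenders_after = param_offenders(after)
```

With the toy inputs, the wide signature is reported and the context-based one is not.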

Test plan

poetry run ruff check protea scripts
poetry run flake8 protea/
poetry run pytest tests/ --ignore=tests/test_jobs_pg.py (1163 passed, 11 skipped)
poetry run python scripts/check_smells.py (75 known, none new)

T-CONTEXTS progress (4 slices total)

- #73 KnnEnrichmentContext: enrich_v6_features 9 → 4 args
- #75 CafaEvalRunContext: evaluate_all_settings 18 → 3 args
- #77 _RerankerRegistration: _register_model 10 → 2 args
- this PR ParquetExportContext: export_reranker_parquets 16 → 1 arg

Combined: 53 → 10 args across the four entry points; baseline params>6: 24 → 20 (cumulative drop -4).
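
The combined figure can be re-derived from the per-slice counts in the progress list. A quick check, using only the numbers reported there:

```python
# (before, after) argument counts as reported in the progress list.
slices = {
    "enrich_v6_features": (9, 4),
    "evaluate_all_settings": (18, 3),
    "_register_model": (10, 2),
    "export_reranker_parquets": (16, 1),
}

total_before = sum(before for before, _ in slices.values())
total_after = sum(after for _, after in slices.values())
print(f"{total_before} → {total_after}")  # 53 → 10
```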

Introduces a ``ParquetExportContext`` frozen dataclass in ``parquet_export.py`` bundling the 15 per-call inputs (source shards, dataset identity, publishing options) so ``export_reranker_parquets`` no longer trips flake8-bugbear's parameter ceiling. Signature collapses 16 args → 1 (``ctx``). Body destructures the dataclass once at the top so the rest of the function stays diff-minimal.

Both call-sites updated to construct the context inline:
- ``protea/core/training_dump_helpers.py`` (production dump path)
- ``tests/test_parquet_export_boundary.py`` (the only test that invokes it)

Sizes:
- parquet_export.py: +35 LOC (dataclass + destructuring header)
- Smell baseline: 77 → 75 offenders (params>6: 22 → 20: ``export_reranker_parquets`` retired plus 1 knock-on improvement)

Local-first 5 green (ruff + flake8 + pytest 1163 + check_smells).

## Acceptance criteria (master plan §24 Fase 1, T-CONTEXTS partial)

Fifth incremental Parameter Object slice. Tackles the remaining 16-arg offender in the core training pipeline.

## Changes

- `protea/core/training_dump_helpers.py`:
  - New ``KnnTransferContext`` frozen dataclass bundling the 12 per-call data inputs (queries, references, ontology maps, optional enrichment helpers).
  - ``_knn_transfer_and_label`` signature collapses 16 args → 5 (``session``, ``p``, ``ctx``, plus existing ``sequence_context`` / ``stream_output``).
  - Two production call sites updated (train branch + test-stream branch).
- `tests/test_knn_streaming_smoke.py`: shared test runner updated.

## Smell budget

77 → 75 offenders. **params>6: 22 → 20** (`_knn_transfer_and_label` retired plus 1 knock-on).

## Test plan

- [x] `poetry run ruff check protea scripts`
- [x] `poetry run flake8 protea/`
- [x] `poetry run pytest tests/ --ignore=tests/test_jobs_pg.py` (1163 passed, 11 skipped)
- [x] `poetry run python scripts/check_smells.py` (75 known, none new)

## T-CONTEXTS progress (5 slices total, 4 in flight pending CI)

- #73 KnnEnrichmentContext: enrich_v6_features 9 → 4 args
- #75 CafaEvalRunContext: evaluate_all_settings 18 → 3 args
- #77 _RerankerRegistration: _register_model 10 → 2 args
- #78 ParquetExportContext: export_reranker_parquets 16 → 1 arg
- this PR KnnTransferContext: _knn_transfer_and_label 16 → 5 args

Combined: 69 args → 15 across the 5 entry points.

frapercan enabled auto-merge (squash) May 8, 2026 16:00

frapercan merged commit 833ad66 into develop May 8, 2026
13 checks passed

frapercan mentioned this pull request May 8, 2026

refactor(training): T-CONTEXTS partial — KnnTransferContext #80

Merged

4 tasks

frapercan merged 1 commit into develop from feat/t-contexts-parquet-export-reduce
