
sync: develop ← main (post-CI rescue + F2A.6-real) #9

Merged
frapercan merged 75 commits into develop from main on May 6, 2026
Conversation

@frapercan
Owner

Summary

Fast-forward sync of develop with main after the CI rescue (PR #8) and the F2A.6-real plugin migration.

Develop's tip (c244d25 — workflow bumps) is a strict ancestor of main, so this is a content-equivalent sync with zero unique commits on develop. No conflicts expected.

50+ commits land on develop:

  • F2A.6-real GOA / QuickGO / UniProt FASTA / UniProt metadata plugin migrations
  • F2B.1-3 plugin registry API endpoints
  • D-MIGR-04 ORM forward to bio_utils
  • Doc-T3/T7/T11 documentation refresh
  • LAFA wrapper scaffolding (apps/lafa_container/*)
  • 8 ADR resolutions (D04/D06/D07/D10/D25/D27/D28/D29)
  • The CI rescue squash itself (ccecf8a)

Test plan

  • CI lint (3.12) green
  • CI test (3.12) green
  • Visual scan of file diff vs main (should be empty since it's all of main)

v0.2.0: Scoring engine, CAFA evaluation framework and UI overhaul
…1-6)

Contract-first integration with protea-reranker-lab: PROTEA produces
parquet datasets + manifest via export operation, consumes trained
booster artifacts via RerankerModel + ArtifactStore. Zero runtime
cross-imports — only protea_reranker_lab.contracts (pydantic-pure)
is shared dev-time.

- Phase 3: ArtifactStore abstraction (LocalFs default, MinIO opt-in
  via docker compose profiles: ["storage"] + [storage] extra).
  storage config under protea/infrastructure/storage/; PROTEA_STORAGE_*
  env overrides; tests/test_storage.py.
- Phase 4: ExportResearchDatasetOperation + shared parquet_export
  utility refactored out of train_reranker; operation_catalog entry;
  routes via protea.jobs queue.
- Phase 5: RerankerModel nullable artifact columns (artifact_uri,
  feature_schema_sha, embedding_config_id FK, ontology_snapshot_id FK,
  producer_version, producer_git_sha, spec_yaml); alembic migration
  c517e16da06b with named constraints; scripts/register_reranker.py
  CLI for run-dir → ORM row promotion.
- Phase 6: predict_go_terms reranker integration — strict sha-equality
  validation at batch-worker level with reranker.schema_mismatch
  fallback (never crashes inference); reranking.py module
  (load_reranker, apply_reranker, infer_active_feature_families);
  8 new tests covering coordinator validation + batch fallback paths.
- Sphinx docs full pass + ADR-007-contract-first-lab-integration.
- Thesis LaTeX: new Reranker Promotion Pipeline section in
  implementation chapter, RerankerModel subsection in data model.
- Benchmark router + web UI pages (/benchmark, /experiments),
  Grafana visitor dashboard, CAFA evaluation pipeline updates,
  ablation tooling, embedding backend verification script.
Documents the upstream bug, the semantic fix applied in
cafaeval-protea (commit cec8ccd), the regression test that gates
against future regressions, and the operational steps required to
propagate the fix to running worker-evaluations processes. Also
records the impact on the 220→230 benchmark PK cells.
…ght 0.3→0.8

- PredictGOTermsPayload and its batch variant now default
  compute_alignments, compute_taxonomy and compute_reranker_features to
  True. Prevents future PredictionSets from silently missing features
  required by alignment-aware scoring configs.
- DEFAULT_EVIDENCE_WEIGHTS["IEA"] 0.3 → 0.8: GOA history shows IEA is
  promoted to experimental codes at a higher rate than ISS/IBA/NAS, so
  the classic "electronic = weakest" hierarchy underestimates its prior
  quality. Only affects scorings that consume evidence_weight.
- Docstrings refreshed (module, evidence_primary preset, /embeddings/predict).
- Fixed 5 TestPredictBatch tests that treated _predict_batch's tuple
  return as a flat list — pre-existing bug, surfaced while adapting tests
  to the new defaults.
…ine reranker-train endpoint

- embeddings.py: POST /embeddings/predict now publishes to protea.predictions
  (the dedicated queue already served by worker-predictions-coord) instead of
  protea.jobs. Brings the API in line with the running split-queue topology.
- scoring.py: remove POST /scoring/rerankers/train together with its helpers
  (_TrainingPair, RerankerTrainRequest) and the model_to_string / reranker_train
  imports. In-PROTEA training was decoupled into protea-reranker-lab; this
  finishes the cleanup so no endpoint claims capability that no longer exists.
…verage guard

- compute_score now recognises a sixth signal `neighbor_vote_fraction`
  (already persisted on every GOPrediction). Added to DEFAULT_WEIGHTS.
- _PRESET_CONFIGS redesigned so each preset tests a discrete hypothesis:
    * embedding_only            – cosine of the winning neighbour (baseline)
    * vote_fraction             – KNN consensus (new)
    * alignment_only            – NW+SW identity without embedding (new)
    * embedding_plus_alignment  – embedding refined with alignment
                                  (renamed from alignment_weighted)
    * embedding_plus_vote       – embedding + consensus (new)
    * evidence_veto             – evidence as a pure multiplier
                                  (replaces embedding_plus_evidence; fixes
                                  the double-count with evi=0 in weights
                                  and formula=evidence_weighted)
    * composite                 – linear kitchen-sink with voting, evidence
                                  removed from the linear sum
  Dropped `evidence_primary` — its dominant signal was the evidence_code of
  the single nearest neighbour, which we know is noisy; revisit when the
  reranker exposes voter-distribution features.
- ORM docstring for FORMULA_EVIDENCE_WEIGHTED now documents the recommended
  usage (set evidence_weight=0 in weights so the multiplier is applied once).
- New `_check_signal_coverage` helper in the router: before /score.tsv and
  /metrics stream anything, it queries fill rates of the columns backing
  every active signal; if a required column has 0 rows populated, returns
  409 with an actionable detail instead of silently degrading the score.
- /score.tsv now exposes a neighbor_vote_fraction column.
- Dropped TestTrainReranker tests — leftover after the prior commit that
  removed the inline POST /scoring/rerankers/train endpoint.
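The coverage guard above can be sketched in a dependency-free form. This is a hypothetical illustration, not the actual `_check_signal_coverage` code: here `fill_counts` stands in for the per-column fill-rate query, and `SignalCoverageError` stands in for the 409 response the router returns.

```python
# Hypothetical sketch of the signal-coverage guard; names and shapes are
# assumptions, not the actual PROTEA router code. fill_counts maps each
# column backing an active signal to its number of populated rows.
class SignalCoverageError(Exception):
    """Mapped to an HTTP 409 with an actionable detail in the router."""


def check_signal_coverage(active_signals, fill_counts):
    # A signal with zero populated rows would silently degrade the score,
    # so fail loudly before /score.tsv or /metrics stream anything.
    missing = [s for s in active_signals if fill_counts.get(s, 0) == 0]
    if missing:
        raise SignalCoverageError(
            f"Signals with zero populated rows: {missing}; re-run prediction "
            "with the matching compute_* flags enabled or drop them from the "
            "scoring config."
        )
```

In the real router the fill rates come from one query per active signal column; the sketch only shows the decision logic.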
…ncher

Previously scripts/worker.py launched the stale-job reaper with
timeout_seconds=21600 (6h). With a single protea.predictions.batch
worker and 23 predict_go_terms coords dispatched in a batch, the later
coords routinely waited past 6h in the FIFO and were killed by the
reaper even though the pipeline was making progress upstream. Raising
the hard timeout to 24h (still paired with the 30-min stall window)
leaves enough headroom for a full PLM×K cross-grid to drain while
keeping the stall-based liveness check in place.

Also adds the cross-scoring launcher used for the 8-PLM × 3-K × 7
scoring-config benchmark. Dedupes dispatch by (prediction_set_id,
scoring_config_id) against both evaluation_result rows AND
queued/running run_cafa_evaluation jobs — the earlier launcher only
compared against persisted results and filled the queue with up to
20× duplicates per pair.
…YAML

- /benchmark/matrix now reads PredictionSet.limit_per_entry and includes
  it in the per-row key, so the same (embedding, stage, cell) tuple no
  longer collapses K=3/5/10 into one row. Response exposes the full K
  catalog via `ks: [3, 5, 10]` and accepts `?k=N` as filter.
- Frontend types gain `k`/`ks`; BenchmarkPage ships a K selector (chips
  next to the stage picker) defaulting to the first available K.
- benchmark.yaml refreshed for the redesigned scoring presets:
  preferred_default now starts with embedding_only (widest coverage on
  partial runs), and labels cover the seven current presets (dropped
  labels for removed `evidence_primary`, `embedding_plus_evidence`,
  `alignment_weighted`).
_write_predictions() built the pred_dict passed to compute_score() with
distance, identity_nw/sw, evidence_code and taxonomic_distance only.
The newly-added neighbor_vote_fraction signal (scoring preset
`vote_fraction`, half of `embedding_plus_vote`, 20% of `composite`)
was silently dropped — compute_score saw value=None and excluded it
from numerator and denominator. vote_fraction therefore produced zero
scores across every (protein, go_id) row, cafaeval returned empty
NK/LK/PK buckets, and the three affected presets showed no cells in
the benchmark UI.

Add `pred.neighbor_vote_fraction` to pred_dict so the scoring engine
sees the signal it already has a weight for.
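The failure mode is easy to reproduce with a toy weighted sum. This sketch assumes compute_score's None-handling semantics as described above; it is not the actual implementation:

```python
# Minimal sketch of the assumed compute_score behaviour: signals whose
# value is None drop out of both numerator and denominator, so a preset
# whose only weighted signal is absent from pred_dict scores zero.
def weighted_score(pred_dict, weights):
    num = den = 0.0
    for name, weight in weights.items():
        value = pred_dict.get(name)
        if value is None or weight == 0:
            continue  # silently skipped — the bug's failure mode
        num += weight * value
        den += weight
    return num / den if den else 0.0
```

With a `vote_fraction`-style preset, `weighted_score({"distance": 0.9}, {"neighbor_vote_fraction": 1.0})` yields 0.0 across every row; adding the key to `pred_dict` lets the existing weight take effect.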
Drop the MCP server scaffolding under protea_mcp/ — 21 files, ~1.1k LOC
total. The module was never wired into any startup or deployment path
and has no importers in protea/, scripts/, tests/, pyproject.toml, or
docs.
…decoupled training

LightGBM training moves out of PROTEA into the standalone
protea-reranker-lab repo. PROTEA now owns the KNN + feature pipeline,
dataset publishing, and inference; the lab owns training and evaluation.

Schema
* Add Dataset model (alembic c7bab0210568): immutable record of a
  published reranker dataset with train_uri / eval_uri / manifest_uri,
  content fingerprints (schema_sha, manifest_sha), dump parameters and
  producer provenance (producer_version, producer_git_sha).
* Link RerankerModel to Dataset via dataset_id (alembic e037f3ae9f58)
  and add artifact_uri / external_source / spec_yaml / feature_importance
  columns so boosters trained in the lab can be registered by reference.

Storage abstraction
* protea/infrastructure/storage/ becomes an ArtifactStore interface with
  LocalArtifactStore (file://) and MinioArtifactStore (s3://) backends,
  selected by PROTEA_STORAGE_BACKEND. get_artifact_store() is the single
  factory used by export_research_dataset and /reranker-models/import.
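The shape of the abstraction can be sketched as follows. Class and factory names mirror the text; bodies, signatures, and the `PROTEA_STORAGE_ROOT` variable are assumptions (the MinIO branch is omitted):

```python
# Hypothetical sketch of the ArtifactStore interface and its env-selected
# factory; not the real protea/infrastructure/storage code.
import os
import shutil
from abc import ABC, abstractmethod
from pathlib import Path


class ArtifactStore(ABC):
    @abstractmethod
    def put(self, local_path: str, uri: str) -> str: ...

    @abstractmethod
    def get(self, uri: str, local_path: str) -> str: ...


class LocalArtifactStore(ArtifactStore):
    """file:// backend: copies artifacts under a root directory."""

    def __init__(self, root: str):
        self.root = Path(root)

    def put(self, local_path, uri):
        dest = self.root / uri.removeprefix("file://")
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(local_path, dest)
        return uri

    def get(self, uri, local_path):
        shutil.copy(self.root / uri.removeprefix("file://"), local_path)
        return local_path


def get_artifact_store() -> ArtifactStore:
    # PROTEA_STORAGE_BACKEND selects the backend, as described above.
    backend = os.environ.get("PROTEA_STORAGE_BACKEND", "local")
    if backend == "local":
        return LocalArtifactStore(os.environ.get("PROTEA_STORAGE_ROOT", "./artifacts"))
    raise ValueError(f"unsupported backend: {backend}")
```

Keeping `get_artifact_store()` as the single factory means callers like export_research_dataset never branch on the backend themselves.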

Operations
* TrainRerankerOperation / TrainRerankerAutoOperation are unregistered
  in operation_catalog.py — they survive only as in-process helpers
  that ExportResearchDatasetOperation drives in dump_only mode.
* export_research_dataset uploads train.parquet / eval.parquet /
  manifest.json via the configured ArtifactStore and inserts a Dataset
  row keyed by output_name.
* protea.core.reranker exposes prepare_dataset / predict /
  model_from_string for inference; schema_sha validation is load-bearing.

HTTP surface
* New routers: /datasets (registry + import) and /reranker-models
  (multipart import, import-by-reference, list/get).
* app.py wires both routers + ArtifactStore startup.

Scripts
* dump_reranker_dataset.py adapted to the new pipeline.
* materialize_lab_intervals.py: new helper that creates EvaluationSet
  + QuerySet rows for every snapshot pair the lab needs.

Tests
* test_storage covers both backends.
* test_datasets_and_reranker_import_smoke exercises the end-to-end
  publish → pull → import-by-reference flow.
* test_reranker / test_train_reranker trimmed to inference-only paths.
EvaluationSet gains a groundtruth_uri column (alembic 76cafcb8d9be) that
points to a frozen parquet of the snapshotted ground-truth annotations,
materialised once and reused across run_cafa_evaluation invocations
instead of being recomputed from the live ORM each time. The parquet
lives in the configured artifact store and is content-addressed.

* generate_evaluation_set writes the parquet on creation; existing
  rows are backfilled by scripts/backfill_evaluation_groundtruth.py.
* load_evaluation_data_for_set reads the parquet via the artifact store
  when groundtruth_uri is set, falling back to the legacy ORM path.
* run_cafa_evaluation consumes the parquet directly, eliminating the
  per-run BFS over GOTermRelationship for ground truth resolution.
* load_goa_annotations adds canonical-accession filtering improvements
  measured against the lab's intervals.
* annotations router exposes the groundtruth-related endpoints and
  switches to BenchmarkConfig dependency for IA file resolution.

Worker layout
* Add a dedicated worker-evaluations process on protea.evaluations so
  long Fmax / AuPRC runs don't block the general protea.jobs queue.
* Add the missing worker-predictions-coord process (protea.predictions
  coordinator was previously not started by manage.sh).

Tests
* test_load_goa_annotations covers the canonical-accession filtering.
* test_evaluation_parquet_roundtrip locks the serialise/deserialise
  contract for ground-truth data.
* test_knn_streaming_smoke exercises the streaming KNN path that
  feeds reranker datasets.

scripts/overnight_matrix.py: new orchestrator for the 8-PLM canonical
benchmark (bootstrap + predict ×8 + eval ×8) — drives long unattended
runs against the lab's interval matrix.
Empty Python packages with zero importers anywhere in the tree:
* protea/cli/ + protea/cli/commands/ (CLI scaffolding never built)
* protea/utils/
* protea/api/schemas/
* protea/api/services/

Orphan scripts (no references in code, docs, scripts, or pyproject.toml):
* Phase A/B ablation cohort: cross_scoring_launcher,
  feed_evals_phaseA, hybrid_picker_eval, queue_evals_when_ready,
  query_eval_results, run_ablation_predictions, run_ablation_evaluations
* vast.ai deploy/sync: deploy_vast.sh, setup_vast.sh, sync_db_vast.sh
* Profilers / verifiers: profile_predict_batch, verify_embedding_backends
* Stale overnight runs: overnight_v6, overnight_v6_retry,
  overnight_v7, overnight_v8

Misc:
* Drop unused _JOBS_QUEUE constant in embeddings.py.
* .gitignore: exclude data/ (with allow-list for data/benchmarks/),
  artifacts/, results/, var/, docs/_build/, and the entire logs/ dir
  (was only excluding *.log + logs/pids/, leaving *.pid files visible).
Queue routing fixes (operations.rst, system_overview.rst):
* predict_go_terms publishes to protea.predictions, not protea.jobs.
* run_cafa_evaluation publishes to protea.evaluations.
* train_reranker / train_reranker_auto are no longer queued — they
  survive only as unregistered helpers consumed in-process by
  ExportResearchDatasetOperation in dump_only mode.
* export_research_dataset publishes to protea.training (serialised).

Lab decoupling narrative (introduction.rst, system_overview.rst,
core.rst, configuration.rst):
* Document the contract-first split — PROTEA produces the parquet
  triple + manifest; the lab consumes via the artifact store and
  registers boosters back through /reranker-models/import.
* Drop the inline LightGBM training section; replace with the export
  + import flow.

Reference fixes:
* api.rst:282 — /annotations/snapshots/{id}/ia-url is the Information
  Accretion file URL (used to weight CAFA Fmax / AuPRC), not the
  InterPro Archive.
The lab refactor introduced four behavioural changes that the existing
test suite did not yet reflect — every router and operation that touches
artifact storage, ScoringConfig joins, or persisted EmbeddingConfig
display columns produced collateral failures (54 of 1037).

Root causes addressed:

1. Process-global TTL cache in protea.api.cache
   List endpoints (configs, snapshots, prediction sets, protein stats,
   showcase summary) memoise their result for 5 minutes. Tests sharing
   the cache key returned stale data from earlier sibling tests. Added
   an autouse fixture that calls ``invalidate()`` before/after every
   test in the affected modules.
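The staleness mechanism and the fixture's effect can be sketched with a toy TTL cache. The cache and `invalidate()` here are stand-ins for `protea.api.cache`, and the fixture shape shown in the comment is the assumed pytest idiom, not the actual suite code:

```python
# Dependency-free sketch of a process-global TTL cache and why tests
# sharing a key see stale data; toy stand-in for protea.api.cache.
import time

_CACHE: dict = {}  # key -> (expires_at, value)
TTL_SECONDS = 300  # the 5-minute memoisation window from the text


def cached(key, compute):
    now = time.monotonic()
    hit = _CACHE.get(key)
    if hit and hit[0] > now:
        return hit[1]  # may be a sibling test's stale result
    value = compute()
    _CACHE[key] = (now + TTL_SECONDS, value)
    return value


def invalidate():
    _CACHE.clear()


# The autouse fixture described above would wrap every test like:
#
#   @pytest.fixture(autouse=True)
#   def _fresh_cache():
#       invalidate()
#       yield
#       invalidate()
```

Without the fixture, the second call within the TTL returns the first test's value regardless of what `compute` would now produce.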

2. ArtifactStore now reached for real
   run_cafa_evaluation, generate_evaluation_set, and the annotations
   router DELETE/download endpoints all call get_artifact_store(...)
   inline, which tries to reach MinIO at localhost:9000 in test
   environments. Added autouse / per-class fixtures that patch
   get_artifact_store + load_settings to a MagicMock so unit tests
   exercise just the orchestration path.
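The substitution pattern looks roughly like this; `upload_results` is a hypothetical stand-in for the orchestration path under test, not PROTEA code:

```python
# Dependency-free sketch of the MagicMock substitution: the unit test runs
# the orchestration path against a fake store (never reaching MinIO at
# localhost:9000) and asserts the store was consulted.
from unittest.mock import MagicMock


def upload_results(store, paths):
    # Hypothetical stand-in for run_cafa_evaluation staging artifacts
    # and calling store.put on each one.
    for p in paths:
        store.put(p, f"file://results/{p}")


def test_store_is_consulted():
    store = MagicMock()  # records calls instead of talking to MinIO
    upload_results(store, ["fmax.json", "auprc.json"])
    assert store.put.call_count == 2
```

In the real suite the MagicMock is installed via an autouse fixture patching `get_artifact_store` (and `load_settings`) on the module under test.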

3. Endpoint signatures evolved:
   - benchmark.list returns a 5th column (prediction_count via
     correlated subquery) — added the column to the production query
     and a ``prediction_count`` field in the response so the frontend
     UI keeps working (apps/web reads p.prediction_count in 4 places).
   - benchmark.matrix select returns 4-tuples
     ``(er, embedding_id, k, scoring_name)`` and a separate 7-tuple
     query for eval-set metadata — tests now wire both via
     ``_dual_execute``.
   - showcase.get returns 3-tuples ``(er, cfg, scoring_name)`` —
     tests updated to match.
   - _stage_of(result, scoring_name) — second positional arg is now
     required, "baseline" stage no longer exists (returns None for
     evaluations with neither scoring nor reranker).
   - _describe_embedding heuristics deleted — display_name, family,
     param_count are persisted EmbeddingConfig columns. The dead
     TestDescribeEmbedding class was removed; remaining tests set the
     columns explicitly on _make_cfg fixtures.

4. RunCafaEvaluationPayload no longer carries artifacts_dir
   The op now stages artifacts in a tempfile.TemporaryDirectory and
   uploads via store.put — replaced the two artifacts_dir tests with
   one that asserts the store is consulted.

Test fixtures updated:
* test_api / _make_app: added missing ``operation_registry`` state.
* test_annotations_router / test_benchmark_router / _make_app: added
  missing ``benchmark_config`` state.
* test_api_query_sets: corrected
  ``test_preserves_full_description`` — _parse_fasta unwraps
  ``sp|P12345|FOO_HUMAN`` to ``P12345`` (preserves the full header in
  the description, not the accession).

Production change (small): ``embeddings.list_prediction_sets`` regained
the prediction_count correlated subquery so the frontend prediction-set
list shows ``"<n> preds"`` again. This restores parity with the lab
refactor's intent — the test mock always expected this column.

Result: 1037 passed, 10 skipped, 0 failed.
Two modules with near-identical names lived next to each other:

* protea.core.reranker  — feature schema + predict / model_from_string
* protea.core.reranking — booster cache + load_reranker / apply_reranker

reranking.py had a single importer (predict_go_terms.py). Inlining its
public surface into reranker.py removes the naming trap (grep "reranker"
no longer turns up two unrelated files) and consolidates everything the
inference path needs into one place.

* load_reranker, apply_reranker, infer_active_feature_families and the
  process-local _BOOSTER_CACHE move verbatim into reranker.py.
* predict_go_terms.py now imports both feature and loader helpers from
  the merged module.
* reranking.py is removed.

No behaviour change. 1037/1037 tests still pass.
TrainRerankerOperation was unregistered after the lab decoupling and
contained zero ``self`` references — its 10 methods were already pure
functions wrapped in a class only so TrainRerankerAutoOperation could
share helpers. Removing the wrapper drops ~250 LOC of dead-but-loaded
code (file: 2009 → 1757 LOC).

Helpers actually used by TrainRerankerAutoOperation (now module-level
functions in the same file):
* ``_load_parent_map``
* ``_preload_all_embeddings``
* ``_build_reference_from_cache``
* ``_load_sequences``
* ``_load_taxonomy_ids``
* ``_knn_transfer_and_label``

Helpers with no production callers (deleted):
* ``_validate``               — payload + name/duplicate checks
* ``_load_go_maps``           — covered by ``_load_parent_map``
* ``_load_reference_per_aspect`` — alternative to _build_reference_from_cache
* ``_load_query_embeddings``  — Auto loads its own query embeddings
* ``summarize_payload``       — never called once the op left the registry

Inside TrainRerankerAutoOperation.execute, every
``self._single._foo(...)`` is rewritten to ``_foo(...)`` and the
``_single`` attribute is gone.

Tests adapted:
* test_train_reranker.py — kept the payload-validation tests and the
  two helpers that survived (_load_sequences / _load_taxonomy_ids);
  dropped the test classes for deleted helpers.
* test_knn_streaming_smoke.py — calls _knn_transfer_and_label
  directly instead of through op._knn_transfer_and_label.

Suite stays green at 1028 / 1028.
predict_go_terms.py was a 2383 LOC god file mixing three concerns: I/O
caching of reference embeddings, PCA artifact persistence, and the actual
predict / batch / store operation classes. The first two have nothing
to do with the operations themselves — they are reusable persistence
helpers that happen to be called from the predict path.

* protea/core/disk_cache.py — reference-pool + per-aspect index +
  annotation-CSR caches under data/ref_cache/. Exports
  ``_disk_cache_paths``, ``_aspect_index_path``,
  ``_anno_disk_cache_paths``, ``_build_anno_csr``,
  ``_load_anno_csr_from_disk``, ``_save_anno_csr_to_disk``,
  ``_csr_lookup``, ``_derive_reference_views``,
  ``_load_from_disk_cache``, ``_save_to_disk_cache``.

* protea/core/pca_cache.py — per-EmbeddingConfig PCA artifact under
  protea/artifacts/pca/{id}.npz. Exports ``_pca_state_path``,
  ``_load_pca_state``, ``_save_pca_state``, ``_load_or_fit_pca_state``.

* predict_go_terms.py imports the helpers from their new homes; no
  call sites change name or signature.

* test_predict_go_terms.py imports the disk-cache helpers directly
  from ``protea.core.disk_cache`` rather than re-exported via
  ``predict_go_terms``.

predict_go_terms.py shrinks 2383 → 2129 LOC (~10% smaller, the
remaining bulk is the three operation classes which the next step
splits separately). Suite stays green at 1028 / 1028.
…er_and_label

The function used to take 20 keyword arguments — a textbook Long
Parameter List smell that made every call site span ~15 lines and made
parameter changes risky (refactoring.guru: Bloater · Long Parameter
List → Introduce Parameter Object).

Two natural clusters extracted as small frozen dataclasses:

* ``SequenceContext`` — the four optional per-protein lookups
  (``query_sequences``, ``ref_sequences``, ``query_tax_ids``,
  ``ref_tax_ids``) used to drive alignment and taxonomy features.
* ``StreamOutput`` — the streaming-parquet I/O config
  (``output_parquet``, ``chunk_rows``) used in dump_only mode to keep
  peak memory bounded.

``pivot_go_ids`` stays as its own kwarg — it filters records by
go_id and is orthogonal to whether streaming is enabled (used by
``test_pivot_filter_drops_non_pivot_terms`` in non-streaming mode).
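The two parameter objects can be sketched as follows. Field names come from the text; defaults, types, and the unpacking idiom are assumptions:

```python
# Hypothetical sketch of the Introduce Parameter Object refactor described
# above; not the actual train_reranker code.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SequenceContext:
    """Optional per-protein lookups driving alignment/taxonomy features."""

    query_sequences: Optional[dict] = None
    ref_sequences: Optional[dict] = None
    query_tax_ids: Optional[dict] = None
    ref_tax_ids: Optional[dict] = None


@dataclass(frozen=True)
class StreamOutput:
    """Streaming-parquet I/O config for dump_only mode (bounds peak memory)."""

    output_parquet: Optional[str] = None
    chunk_rows: int = 100_000  # assumed default


def _knn_transfer_and_label(*, seq_ctx=SequenceContext(),
                            stream=StreamOutput(), pivot_go_ids=None, **rest):
    # The body unpacks the dataclasses into the same local names as before,
    # so only the signature and call sites change.
    query_sequences = seq_ctx.query_sequences
    output_parquet = stream.output_parquet
    ...
```

Freezing the dataclasses keeps the parameter objects value-like: call sites can share one instance without risking mutation inside the function.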

Result: signature shrinks 20 → 15 named parameters; the body is
unchanged (the dataclasses are unpacked into the same local names at
the top of the function). All three call sites updated.

Suite stays green at 1028 / 1028.
stage classification logic was duplicated in two routers:

* benchmark.py defined ``_RERANKER_STAGE``, ``_stage_of()`` and
  ``_stage_kind()``.
* showcase.py inlined the same conditional with a comment "Matches
  benchmark.py semantics without cross-importing" — an explicit
  acknowledgement of the duplication.

That is the textbook Dispensable · Duplicate Code smell
(refactoring.guru → Extract Method, then Move Method to a shared home).

Created ``protea/api/stages.py`` exporting
``RERANKER_STAGE`` / ``stage_of`` / ``stage_kind`` / ``StageKind``.

* benchmark.py re-imports them under the legacy private aliases
  (``_RERANKER_STAGE``, ``_stage_of``, ``_stage_kind``) so the rest
  of the file is unchanged.
* showcase.py replaces the 6-line ``if/elif/else`` ladder with a
  single ``stage = stage_of(er, scoring_name)`` call.

No behaviour change. 1028 / 1028 still green.
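A minimal sketch of the shared module, assuming the stage semantics described above (reranker wins, then scoring name, else None — no "baseline"); the constant's value and the `StageKind` buckets are assumptions:

```python
# Hypothetical sketch of protea/api/stages.py; the classification order is
# taken from the text, everything else is assumed.
from enum import Enum

RERANKER_STAGE = "reranked"  # assumed value


class StageKind(str, Enum):
    SCORED = "scored"
    RERANKED = "reranked"


def stage_of(er, scoring_name):
    """Classify an evaluation result; None when it has neither scoring
    nor reranker (the old "baseline" stage no longer exists)."""
    if getattr(er, "reranker_model_id", None) is not None:
        return RERANKER_STAGE
    if scoring_name:
        return scoring_name
    return None


def stage_kind(stage):
    """Coarse bucket for the UI; assumed mapping."""
    if stage is None:
        return None
    return StageKind.RERANKED if stage == RERANKER_STAGE else StageKind.SCORED
```

With this in place, showcase.py's six-line ladder collapses to `stage = stage_of(er, scoring_name)` and benchmark.py keeps its private aliases as re-imports.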
GO aspect strings appeared as bare literals in 30+ places — both as
single-char codes ("P"/"F"/"C", the wire format in PostgreSQL and
go-basic.obo) and as three-char CAFA codes ("BPO"/"MFO"/"CCO", what
cafaeval and the UI use). That is the textbook Bloater · Primitive
Obsession smell (refactoring.guru → Replace Type Code with Class).

This commit lands the new domain type; consumers migrate in follow-up
commits to keep each diff focused and reviewable.

* protea/core/domain/aspect.py — ``Aspect`` enum with
  ``BIOLOGICAL_PROCESS`` / ``MOLECULAR_FUNCTION`` / ``CELLULAR_COMPONENT``
  members and ``.code`` / ``.cafa`` properties for the two encodings;
  ``Aspect.from_code()`` / ``from_cafa()`` build instances from the
  legacy strings at the boundary.
* protea/core/domain/__init__.py — package marker, deliberately
  free of infrastructure imports so the module can be imported from
  anywhere in core/ without cycles.
* ASPECT_CODES / ASPECT_CAFA_CODES module constants for callers that
  iterate via tuple destructuring.
* tests/test_aspect.py — 21 cases covering the two encodings,
  parameterised over all three aspects, with explicit roundtrip and
  invalid-input assertions.

Suite: 1028 → 1049 (+21 new).
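The domain type described above can be sketched like this. Member, property, and constructor names come from the text; the concrete implementation is an assumption:

```python
# Sketch of protea/core/domain/aspect.py following the description above.
from enum import Enum


class Aspect(Enum):
    BIOLOGICAL_PROCESS = ("P", "BPO")
    MOLECULAR_FUNCTION = ("F", "MFO")
    CELLULAR_COMPONENT = ("C", "CCO")

    @property
    def code(self) -> str:
        """Single-char wire format (PostgreSQL, go-basic.obo)."""
        return self.value[0]

    @property
    def cafa(self) -> str:
        """Three-char CAFA code (cafaeval, the UI)."""
        return self.value[1]

    @classmethod
    def from_code(cls, code: str) -> "Aspect":
        for a in cls:
            if a.code == code:
                return a
        raise ValueError(f"unknown aspect code: {code!r}")

    @classmethod
    def from_cafa(cls, cafa: str) -> "Aspect":
        for a in cls:
            if a.cafa == cafa:
                return a
        raise ValueError(f"unknown CAFA code: {cafa!r}")


# Module constants for callers that iterate via tuple destructuring.
ASPECT_CODES = tuple(a.code for a in Aspect)       # ("P", "F", "C")
ASPECT_CAFA_CODES = tuple(a.cafa for a in Aspect)  # ("BPO", "MFO", "CCO")
```

`from_code()` / `from_cafa()` convert the legacy strings once at the boundary; inside core/, code passes `Aspect` members around instead of bare literals.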
Six modules used to hardcode the GO aspect tuple ("P", "F", "C") or its
CAFA equivalent ("BPO", "MFO", "CCO") inline. They now import the
canonical constants from ``protea.core.domain.aspect`` so a future
addition or rename happens in one place.

* predict_go_terms.py + train_reranker.py — use ASPECT_CODES
  (single-char wire codes).
* showcase.py + benchmark_config.py + annotations.py (download path) —
  use ASPECT_CAFA_CODES (three-char CAFA codes).
* run_cafa_evaluation.py — _NS_LABELS now derives from the enum's
  ``.cafa`` property; _NS_SHORT is the set view of ASPECT_CAFA_CODES.
* train_reranker._ASPECT_NAMES — built as a comprehension over the
  enum so the lower-cased CAFA suffixes (bpo/mfo/cco) used in model
  names stay in sync with the canonical encodings.

No behaviour change. Suite stays at 1049 / 1049.
The 200-LOC method computed five independent intermediates
(metadata collection, Anc2Vec pool, neighbor centroids, tax-voter
counters, PCA projection) and then merged them per-row. Each
intermediate is now its own private static method so the orchestrator
reads as a six-line pipeline instead of a wall of inline computation
(refactoring.guru: Bloater · Long Method → Extract Method).

New private static methods on PredictGOTermsBatchOperation:

* ``_collect_gtids_in_play`` — gather every go_term_id seen as
  candidate or neighbor annotation.
* ``_build_anc2vec_pool`` — materialise the Anc2Vec embedding matrix
  + has-emb mask + index.
* ``_compute_neighbor_centroids`` — per-(q_acc, aspect) centroid + nmat.
* ``_compute_tax_voter_counters`` — five per-(q_acc, gtid) dicts that
  feed the tax_voters_* columns.
* ``_compute_pca_projection`` — query embeddings × PCA components.

The merge loop stays inline (it has 16 dependencies; extracting it
would just trade a long method for an even longer parameter list).

No behaviour change — the new helpers are pure rearrangement of the
original code. 1049 / 1049 still green.
The 258-LOC method ran the per-aspect KNN, loaded feature-engineering
inputs, pre-computed reranker stats, and merged everything into a
prediction list. The first two phases are clean independent units;
extracting them as helpers turns the orchestrator's prologue from
50 LOC of inline detail into two readable calls
(refactoring.guru: Bloater · Long Method → Extract Method).

New private methods on PredictGOTermsBatchOperation:

* ``_run_knn_per_aspect`` — three independent KNN searches, returns
  ``(neighbors_by_aspect, all_unique_neighbors)``.
* ``_load_feature_engineering_data`` — loads sequences and taxonomy
  IDs only for the flags that are on; returns the four lookup dicts
  the per-pair feature builder consumes downstream.

The remaining merge phase (per-aspect predictions emission with shared
reranker aggregates) stays inline — its 130 LOC carry too many
mutually-aliased dicts (vote_count, k_position, vote_min_d, vote_sum_d,
go_term_freq, ref_ann_density, pair_features, seen_per_query) for
extraction to do anything but trade a long method for an even longer
parameter list.

No behaviour change. 1049 / 1049 still green.
…odule

The seven v6-feature methods on ``PredictGOTermsBatchOperation``
(``_enrich_with_v6_features``, ``_load_go_term_metadata``,
``_collect_gtids_in_play``, ``_build_anc2vec_pool``,
``_compute_neighbor_centroids``, ``_compute_tax_voter_counters``,
``_compute_pca_projection``) used no instance state, ran in sequence,
and were called from a single site in ``execute``. That is the
textbook Bloater · Large Class smell — they are cohesive enough to
become their own module (refactoring.guru → Extract Class).

* protea/core/feature_enricher.py — new module exposing two public
  symbols: ``enrich_v6_features`` (the orchestrator) and
  ``NEW_V6_FEATURE_KEYS`` (the 25 column names downstream code
  composes into the bulk-insert schema). All six stage helpers stay
  module-private.

* predict_go_terms.py — drops the seven methods and the two
  v6-related constants (``_TAX_CLOSE_RELATIONS`` and
  ``_NEW_V6_FEATURE_KEYS``); imports ``enrich_v6_features`` and
  re-imports ``NEW_V6_FEATURE_KEYS`` under the legacy private alias
  so ``_STORE_FLOAT_KEYS`` keeps working without further edits.

predict_go_terms.py shrinks 2235 → 1928 LOC (~14% smaller) and
``PredictGOTermsBatchOperation`` is down to 18 attributes. The new
module is independently testable — extending the v6 feature set in
the future no longer means surgery on the batch operation.

No behaviour change. 1049 / 1049 still green.
Pre-existing warnings outside the scope of any specific refactor are
fixed in one mechanical pass so the next contributor starts from a
clean slate (refactoring.guru: Dispensable · Comments / Dead Code +
Composing Methods · Replace Magic Number with Symbolic Constant
adjacent — all auto-fixable hygiene).

* pyproject.toml — add ``[tool.ruff.lint.per-file-ignores]`` to
  silence E402 for ``scripts/*.py``: every runner script in there
  uses the deliberate ``sys.path.insert(0, PROJECT_ROOT)`` pattern
  before its protea imports so it can be invoked as
  ``python scripts/foo.py`` without ``poetry install`` first.

* protea/api/cache.py — ``Callable`` now imported from
  ``collections.abc`` (UP035).

* protea/api/middleware/visitor_counter.py — replaces
  ``timezone.utc`` with the ``datetime.UTC`` alias (UP017).

* protea/api/routers/scoring.py +
  protea/infrastructure/orm/models/visitor_event.py +
  protea/infrastructure/storage/__init__.py — re-sorted import
  blocks (I001).

* protea/api/middleware/visitor_counter.py — drop unused f-string
  prefix (F541).

* tests/test_predict_go_terms.py — drop unused
  ``op = self._op()`` assignment in ``test_no_reranker_leaves_dicts_untouched``
  (F841); the test only needs the payload check.

* scripts/overnight_matrix.py — ruff also auto-cleaned a couple of
  redundant ``else`` branches and import order while we were here.

Result: ``poetry run ruff check protea tests scripts`` is now
green. 1049 / 1049 tests still pass.
The module was folded into ``protea.core.reranker`` in commit
``ccf8c96``; the dangling ``automodule:: protea.core.reranking`` block
made every ``make -C docs html`` build emit a stale-import warning.

The narrative the old block carried (``load_reranker`` /
``apply_reranker`` / ``infer_active_feature_families`` semantics) is
preserved as a note pointing readers at the merged module, so the
documentation page is still self-explanatory for new contributors.

Sphinx ``warning`` count drops from 5 to 4. The four remaining are
environment-level (numpy multiarray import races with autodoc when
Torch is loaded first); they don't block the build.
Hot loops in predict / train build per-row dicts straight from a
SQLAlchemy cursor; the DB driver returns each ``qualifier`` and
``evidence_code`` as a fresh Python string even though the value
space is tiny (~5-10 distinct GO evidence codes plus ``None``).
Without interning, every duplicate costs ~50 B in CPython, so a
5 M-row batch carries ~500 MB of redundant string objects.

This is the textbook Flyweight pattern (refactoring.guru): share
the immutable intrinsic state across many context objects via a
small process-local factory.

* ``protea/core/annotation_intern.py`` (new) — exposes
  ``intern_string`` backed by a setdefault-based pool. The pool
  tops out at the cardinality of the GO evidence vocabulary
  (~50 strings in practice); no LRU eviction needed.

* ``predict_go_terms._load_annotations_for`` and
  ``_load_reference_data_per_aspect`` — wrap qualifier and
  evidence_code with intern_string before stuffing them into the
  per-row dict.

* ``train_reranker._load_reference_per_aspect`` — same wrap.

* ``tests/test_annotation_intern.py`` — 8 cases.

Suite: 1057 / 1057 (was 1049, +8 new tests).
…OUP BY

The Phase-2 commit (f33ad15) added a per-row correlated subquery to
return ``prediction_count`` alongside each PredictionSet. Postgres'
planner reliably falls into a per-row index probe on the 25M-row
``go_prediction`` table, turning a 100-row list endpoint into a
~30 s/row sequential count — the /evaluation page timed out at 60 s
because it called this listing on first paint.

Fix: pre-fetch the counts in one ``GROUP BY prediction_set_id`` query
(scans the existing ``ix_go_prediction_prediction_set_id`` index in a
single pass) and merge them into the response in Python, mirroring the
``list_embedding_configs`` pattern. Wrap the whole thing in the same
5-minute ``cached`` helper so subsequent calls are free.
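A sqlite-backed sketch of the two-query shape (table and column names are assumptions drawn from the text, not the production schema):

```python
import sqlite3


def list_prediction_sets(conn: sqlite3.Connection) -> list[dict]:
    """Two fixed queries instead of a per-row correlated subquery:
    one list query plus one GROUP BY count pass, merged in Python."""
    sets = conn.execute(
        "SELECT id, name FROM prediction_set ORDER BY id"
    ).fetchall()
    counts = dict(conn.execute(
        "SELECT prediction_set_id, COUNT(*) "
        "FROM go_prediction GROUP BY prediction_set_id"
    ).fetchall())
    return [
        {"id": sid, "name": name, "prediction_count": counts.get(sid, 0)}
        for sid, name in sets
    ]
```

Sets with no predictions default to 0 via ``counts.get``, matching the endpoint behaviour described in the tests below.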

Measured before / after:

* /embeddings/prediction-sets: 60 000 ms (timeout) → 19 ms cold,
  10 ms warm

Tests:

* test_embeddings_router updated — ``_wire_list_query`` now wires
  ``session.query.side_effect = [list_query, count_query]`` so the
  two-query mock matches the new endpoint shape. The third test
  (``test_annotation_set_without_version``) drops its count assertion
  because ``count_pairs`` is empty for that case (defaults to 0).

Suite stays at 1057 / 1057.
frapercan and others added 28 commits May 5, 2026 16:35
The two store_X operations had identical 30-line implementations
of _update_parent_progress (compute_embeddings.py and
predict_go_terms.py), differing only in the operation-specific
event name passed to emit on parent SUCCEEDED transition.

Extracts to protea.core.contracts.parent_progress.update_parent_progress(
session, parent_job_id, emit, *, event_name). Both operations now
delegate. The DB-level JobEvent row is uniformly named "job.succeeded";
the operation-specific event name only flows through emit() so
downstream observers can distinguish which store closed the parent.
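A behaviour sketch of the shared helper, with duck-typed session accessors (``get_job``, ``mark_succeeded``) standing in for the real ORM queries:

```python
def update_parent_progress(session, parent_job_id, emit, *, event_name):
    """Close the parent job once its last child batch lands (sketch)."""
    parent = session.get_job(parent_job_id)
    if parent is None:
        return                                  # no parent row: stay silent
    parent.completed_batches += 1
    if parent.completed_batches < parent.total_batches:
        return                                  # not the last batch yet
    if not session.mark_succeeded(parent_job_id):
        return                                  # another worker won the race
    emit(event_name, job_id=parent_job_id)      # operation-specific name
```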

5 new tests cover: silent when no row, silent when not last batch,
SUCCEEDED transition + emit, race when succeeded returns nothing,
event_name passthrough.

Suite: 1093 passed, 10 skipped (was 1088 + 5).

Part of F0 T0.7 of master plan v3.
Replaces the inheritance-based UniProtHttpMixin (109 LOC of mixed
state and behaviour) with UniProtHttpClient, a composable class.
Operations hold a client instance via composition rather than
inheritance:

  before: class InsertProteinsOperation(UniProtHttpMixin, Operation)
  after:  class InsertProteinsOperation(Operation):
              def __init__(self):
                  self._http_client = UniProtHttpClient()

State is private to the client (.session, .requests, .retries) and
is reset via .reset() at the start of each execute(). The
extract_next_cursor utility moves to a @staticmethod since it has
no instance state. Operations call:

  self._http_client.get_with_retries(url, p, emit)
  self._http_client.extract_next_cursor(link_header)
  self._http_client.requests / .retries  (for emit fields)

Migrates two operations (insert_proteins, fetch_uniprot_metadata)
and three test files (test_core, test_insert_proteins,
test_fetch_uniprot_metadata) accordingly. test_core renames the
test class to TestUniProtHttpClient and adds test_reset_clears_counters.

Suite: 1094 passed, 10 skipped (was 1093 + 1).

Part of F0 T0.9 of master plan v3.
Systematic inventory of module-level constants and hardcoded defaults in payloads and workers. 31 entries categorised into 5 groups (QueueTuning, WorkerTuning, OperationTuning, APILimits, ResearchKnobs) plus 12 structural config-exempt entries (GAF indices, payload shape constraints, PCA dim). Flags duplication that the externalisation dedupes by construction: _ANNOTATION_CHUNK_SIZE x3, _STREAM_CHUNK_SIZE x2, _MAX_FASTA_BYTES x2.

Direct input for T-CONF.2 (externalisation to pydantic Settings) and T-CONF.3 (autogenerated living documentation).

Part of F0 T-CONF.1 of master plan v3.
Validates a running stack via /health, /health/ready, POST /jobs (ping), polling until succeeded, and the events log. Does not start or stop services (per feedback_no_restart.md). Exits in under 2 s against a healthy local stack and is sized for CI use as well (PROTEA_API_URL + PROTEA_SMOKE_TIMEOUT_S env overrides).

Validated against the live local stack: 1/5 -> 5/5 OK.

Part of F0 T0.5 of master plan v3.
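The poll-until-succeeded step could look roughly like this (the callable seam and the terminal state names are assumptions for illustration, not the script's exact API):

```python
import time


def wait_for_job(fetch_status, timeout_s: float = 60.0, poll_s: float = 1.0,
                 sleep=time.sleep) -> str:
    """Poll a status callable until the job reaches a terminal state.

    ``fetch_status`` stands in for a GET on the job endpoint; injecting
    it (and ``sleep``) keeps the loop testable without a live stack.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in ("succeeded", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s}s")
        sleep(poll_s)
```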
Introduces protea.config.tuning with:
  - QueueTuning pydantic model: publisher_max_attempts,
    publisher_base_delay, oom_max_retries, oom_base_delay,
    oom_max_delay. Defaults match the previous module-level
    constants exactly.
  - TuningSettings root model that composes per-category
    sub-models (more groups land in follow-up turns).
  - get_tuning() loader cached via lru_cache. Hierarchy:
    defaults < protea/config/system.yaml (tuning: section) <
    env vars (e.g. PROTEA_TUNING__QUEUE__PUBLISHER_MAX_ATTEMPTS=20).
  - 19 new tests covering defaults, validation, env coercion,
    yaml override, env-overrides-yaml, missing yaml section.

Migrates the 5 RabbitMQ publisher/consumer constants to read
from QueueTuning at call time:
  - publisher.py: _MAX_ATTEMPTS, _BASE_DELAY removed.
  - consumer.py: _OOM_MAX_RETRIES, _OOM_BASE_DELAY,
    _OOM_MAX_DELAY removed; replaced by qsettings reads
    inside the OOM-handler branch.

Tests: existing publisher and consumer tests pass unchanged
since defaults match prior values. test_queue.py:524 updated
to read from get_tuning() instead of the removed constant.

Suite: 1113 passed, 10 skipped (was 1094 + 19).

Skeleton for the categorisation in docs/CONFIG_INVENTORY.md.
Remaining 4 categories (WorkerTuning, OperationTuning,
APILimits, ResearchKnobs) follow the same pattern in
subsequent T-CONF.2 increments.

Part of F0 T-CONF.2 of master plan v3.
…pers

Renames protea/core/operations/train_reranker.py to
protea/core/training_dump_helpers.py and removes every literal
"train_reranker" snake-case reference from the protea/ subtree.
The helpers (TrainRerankerAutoOperation, TrainRerankerAutoPayload,
StreamOutput, _knn_transfer_and_label, _load_sequences,
_load_taxonomy_ids, _build_reference_from_cache, _preload_all_embeddings,
_load_parent_map, TrainRerankerPayload, SequenceContext) keep their
CamelCase names so existing call sites in tests and
ExportResearchDatasetOperation continue to work via the new path.

Updates:
  - module docstring: removes the "two operations" framing (both were
    unregistered) and explains the helper's surviving role.
  - event strings rebranded train_reranker_auto.* -> dump_helper.*.
  - export_research_dataset.py relay updated accordingly so consumers
    keep seeing export_research_dataset.* events on the wire.
  - constant ``name = "research_dataset_dump_helper"`` (was
    "train_reranker_auto"); the class remains unregistered.
  - comments in feature_enricher.py, parquet_export.py,
    generate_evaluation_set.py, predict_go_terms.py and
    scripts/materialize_lab_intervals.py: rephrased to "the dump helper".
  - tests/test_train_reranker.py renamed to test_training_dump_helpers.py
    (+ import path updated). Same for test_knn_streaming_smoke.py
    imports + mock target.
  - test_datasets_and_reranker_import_smoke.py asserts the new name
    is also unregistered; the historical asserts on the old names
    are gone since "train_reranker" no longer exists in the codebase.

AC verification: ``grep -rn "train_reranker" protea/`` returns 0
hits. The same grep over the whole repo (including tests/, scripts/,
docs/) is also 0 except for one-line *.md docs that document the
historical rename and stay as-is on purpose.

Suite: 1113 passed, 10 skipped (unchanged).

Part of F0 T0.6 of master plan v3.
…rPayload

Continues T0.6 (commit 527e51c) by removing TrainRerankerPayload, the
single-pair payload class that no production code referenced. Used
to live in train_reranker.py for an Operation that was retired when
LightGBM training moved to protea-reranker-lab; the class hung on
because tests in test_training_dump_helpers.py exercised it.

  - Class definition deleted.
  - Helper signature ``_knn_transfer_and_label`` simplified from
    ``p: TrainRerankerPayload | TrainRerankerAutoPayload`` to
    ``p: TrainRerankerAutoPayload``.
  - Cross-reference comments inlined directly into
    TrainRerankerAutoPayload field docstrings (KNN backend rationale,
    ancestor expansion rules, embedding PCA explanation).
  - 15 tests in TestTrainRerankerPayload removed; only the
    helper tests (_load_sequences, _load_taxonomy_ids) remain.
  - Header docstring trimmed to reflect new scope.

LOC reduction: training_dump_helpers.py from 1914 to 1860 LOC.
Suite: 1098 passed, 10 skipped (was 1113 - 15 dead payload tests).

The deeper inline of TrainRerankerAutoOperation into
ExportResearchDatasetOperation is deferred to F2 once the feature
registry is in place; doing it now would balloon
export_research_dataset by 600 LOC of execute() body for marginal gain.

Part of F0 T0.6 of master plan v3.
Second category of the externalised tuning settings. Migrates 9
hardcoded constants from the WorkerTuning rows in CONFIG_INVENTORY:

  - db_pool_size (engine.py:12)            20
  - db_pool_max_overflow (engine.py:13)    40
  - db_pool_recycle_seconds (engine.py:14) 3600
  - model_cache_max (compute_embeddings)   1
  - ref_cache_max (predict_go_terms)       1
  - reaper_main_timeout_seconds (worker)   86400 (was incorrectly
    21600 in the inventory; fixed to match scripts/worker.py)
  - reaper_default_timeout_seconds         3600
  - reaper_stall_seconds                   1800
  - api_cache_default_ttl_seconds          300.0

Behavioural:
  - infrastructure/database/engine.py: build_engine() reads pool
    settings from get_tuning().worker.
  - core/operations/compute_embeddings.py: removes _MODEL_CACHE_MAX
    constant; reads dynamically inside _get_or_load_model.
  - core/operations/predict_go_terms.py: removes _REF_CACHE_MAX
    constant; reads dynamically before evicting.
  - api/cache.py: removes _DEFAULT_TTL constant; exposes
    _default_ttl() function for callers that want the resolved
    default. The constant was never imported by anyone; it only
    appeared in __all__.
  - scripts/worker.py: reaper mode reads reaper_main_timeout_seconds
    and reaper_stall_seconds from settings, configurable via
    PROTEA_TUNING__WORKER__REAPER_MAIN_TIMEOUT_SECONDS and
    PROTEA_TUNING__WORKER__REAPER_STALL_SECONDS.

8 new tests in test_tuning.py: WorkerTuning defaults (pool,
cache, reaper), validation (pool>0, reaper>=300), TuningSettings
compose with worker, env override of db_pool_size.

Suite: 1106 passed, 10 skipped (was 1098 + 8).

Two of five categories migrated. OperationTuning, APILimits,
ResearchKnobs follow.

Part of F0 T-CONF.2 of master plan v3.
Third tuning category. Migrates 4 module-level chunk-size constants
that were duplicated across feature_enricher, knn_search,
training_dump_helpers, and predict_go_terms.

OperationTuning fields:
  - annotation_chunk_size  (10_000)  feature_enricher,
    training_dump_helpers, predict_go_terms (5 helper sites total).
  - stream_chunk_size      (2_000)   training_dump_helpers
    (_preload_all_embeddings) and predict_go_terms (_load_query_embeddings).
  - store_chunk_size       (10_000)  predict_go_terms (publishing
    predictions to protea.predictions.write).
  - numpy_query_chunk      (500)     knn_search._search_numpy chunked
    matrix multiplication (caps the n_queries x n_refs distance matrix
    peak around 1 GB for default values).

Removes 8 module-level constants from 4 files; resolves dynamically
inside the helpers via get_tuning().operation.X. Eliminates the
triplicate _ANNOTATION_CHUNK_SIZE / duplicate _STREAM_CHUNK_SIZE
that the inventory flagged in CONFIG_INVENTORY.md §C.
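How a ``numpy_query_chunk``-style cap bounds the peak can be sketched as follows (a hedged illustration of chunked distance computation; the real ``_search_numpy`` may differ):

```python
import numpy as np


def chunked_sq_dists(queries: np.ndarray, refs: np.ndarray,
                     chunk: int = 500) -> np.ndarray:
    """Squared Euclidean distances, chunked over queries.

    Only a (chunk, n_refs) block is materialised per iteration, so the
    peak intermediate stays bounded instead of n_queries x n_refs at once.
    """
    ref_sq = (refs ** 2).sum(axis=1)                 # (n_refs,)
    out = np.empty((len(queries), len(refs)))
    for start in range(0, len(queries), chunk):
        q = queries[start:start + chunk]
        q_sq = (q ** 2).sum(axis=1)[:, None]         # (chunk, 1)
        # ||q - r||^2 = ||q||^2 + ||r||^2 - 2 q.r, via one matmul per chunk
        out[start:start + chunk] = q_sq + ref_sq - 2.0 * (q @ refs.T)
    return out
```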

HTTP retry policy / timeouts in pydantic payloads
(InsertProteinsPayload, LoadGoaAnnotationsPayload, etc.) intentionally
stay where they are. Those are caller-controlled per job, not infra.

3 new tests in test_tuning.py: OperationTuning defaults, validation
floors, env override of annotation_chunk_size.

Suite: 1109 passed, 10 skipped (was 1106 + 3).

Three of five categories migrated. APILimits and ResearchKnobs follow.

Part of F0 T-CONF.2 of master plan v3.
Fourth tuning category. Migrates 4 hardcoded boundary limits from
the FastAPI router layer.

APILimits fields:
  - max_fasta_bytes      (50 MB)  duplicated as _MAX_FASTA_BYTES
    in api/routers/annotate.py and api/routers/query_sets.py;
    the externalisation dedupes by construction.
  - max_comment_length   (500)    api/routers/support.py
  - recent_limit         (20)     api/routers/support.py
  - page_limit           (100)    api/routers/support.py

Behavioural:
  - annotate.py + query_sets.py: read max_fasta_bytes from
    get_tuning().api at request time. Error message now formats
    the configured limit instead of a literal "50 MB" so an
    operator-set override is reflected back to clients.
  - support.py: the SupportCreate pydantic Field's static
    max_length= moves to a field_validator that resolves
    max_comment_length dynamically. The /support GET reads
    page_limit and recent_limit from settings.
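The static-to-dynamic move can be sketched with pydantic v2 (``_max_comment_length`` is a stand-in for the ``get_tuning()`` read; this is not the exact production code):

```python
from pydantic import BaseModel, field_validator


def _max_comment_length() -> int:
    # Stand-in for get_tuning().api.max_comment_length (assumption).
    return 500


class SupportCreate(BaseModel):
    comment: str

    @field_validator("comment")
    @classmethod
    def _enforce_comment_limit(cls, v: str) -> str:
        # Resolved at validation time, so an operator override takes
        # effect without redefining the model (unlike a static max_length=).
        limit = _max_comment_length()
        if len(v) > limit:
            raise ValueError(f"comment exceeds {limit} characters")
        return v
```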

3 new tests in test_tuning.py: APILimits defaults, validation
floors, env override of max_fasta_bytes.

Suite: 1112 passed, 10 skipped (was 1109 + 3).

Four of five categories migrated. Only ResearchKnobs remains (mostly
config-exempt: PCA dim and the N_THRESHOLDS sweep are research-side
methodology constants documented in CONFIG_INVENTORY §E).

Part of F0 T-CONF.2 of master plan v3.
Documents the four migrated TuningSettings categories (Queue, Worker, Operation, APILimits) with field/default/purpose tables, YAML and env-override examples, and config-exempt category callouts (PCA dim, N_THRESHOLDS, GAF indices). Lives inside the existing appendix/configuration.rst so the reference is a single document.

Part of F0 T-CONF.3 of master plan v3.
Adds protea-contracts, protea-method, protea-sources, protea-runners and protea-backends as develop=true path-deps under [tool.poetry.group.plugins.dependencies]. Install with poetry install --with plugins.

End-to-end discovery verified: importlib.metadata.entry_points(group='protea.sources|runners|backends') resolves correctly from inside PROTEA's venv: 3 sources (goa/quickgo/uniprot), 3 runners (baseline/knn/lightgbm), 4 backends (ankh/esm/esm3c/t5).

Suite still 1112 passed, 10 skipped.

Part of F0 T0.15 of master plan v3.
Adds .github/workflows/security.yml with two jobs:
  - pip-audit: scans installed dependencies against the OSV database.
    Non-blocking in F0 (the existing surface has 22 known CVEs, all
    in third-party transitive deps; transformers 4.48.x dominates
    with 11 CVEs that need a coordinated bump). Master plan v3
    F-OPS T-OPS.7 will flip this to fail on severity HIGH.
  - bandit: security static analysis against protea/. Runs in HIGH
    severity + HIGH confidence mode at F0 (zero findings now);
    will tighten in F-OPS.

Triggers: push, PR, and a weekly cron (Mon 06:00 UTC) so freshly
disclosed CVEs surface even when no PR has landed.

Inline fixes for the two bandit B324 findings (weak MD5 hash):
  - protea/core/reranker.py: cache key tag in _load_artifact_to_disk.
  - protea/infrastructure/orm/models/sequence/sequence.py: sequence
    dedup key.
Both pass usedforsecurity=False (Python 3.9+ flag) to declare intent;
collision resistance is irrelevant in either context (cache key tag
and dedup hash, not security primitives).

Bandit config in pyproject.toml [tool.bandit]: excludes tests/ and
the lab archeology dump script; skips B404/B603/B101 (subprocess
imports + assert usage) which are project-level acceptable.

Suite: 1112 passed, 10 skipped (unchanged).

Part of F0 T0.4 of master plan v3.
Removes the duplicated definitions of feature schema, payload classes
and ProteaPayload base from PROTEA. They now live exclusively in
``protea-contracts`` (v0.1.0). PROTEA modules re-export the names from
their original module locations so existing imports keep working;
new code should import from ``protea_contracts`` directly.

Files touched:

  - protea/core/reranker.py
    - Drops 73 lines of NUMERIC_FEATURES / EMBEDDING_PCA_DIM /
      CATEGORICAL_FEATURES / ALL_FEATURES / LABEL_COLUMN definitions.
    - Re-exports the same names from protea_contracts.
    - fit_embedding_pca remains local (it's logic, not contract).

  - protea/core/contracts/operation.py
    - Drops the 11-line ProteaPayload class definition.
    - Re-exports it from protea_contracts.
    - Drops the now-unused ``BaseModel, ConfigDict`` import.

  - protea/core/operations/predict_go_terms.py
    - Drops 119 lines of PredictGOTermsPayload /
      PredictGOTermsBatchPayload / StorePredictionsPayload classes.
    - Re-exports them from protea_contracts.
    - Drops now-unused imports (Annotated, Field, field_validator) and
      the local PositiveInt alias.

Net diff: -218 / +30 in PROTEA. Logic preserved exactly: every
existing call site (15 files imported one of these names) keeps
working through the re-exports.

Suite: 1112 passed, 10 skipped (unchanged). The protea-contracts
suite (71 passed, cov 95%) covers the moved definitions; PROTEA's
existing tests cover the integration.

Part of F1 T1.5 of master plan v3.
Pins the contract between protea_contracts (canonical) and PROTEA's re-exports / future registry. 14 tests in 4 classes:

  - TestReexportIdentity (7 tests, active): every constant PROTEA still re-exports must be the same object as protea_contracts (ALL_FEATURES, NUMERIC, CATEGORICAL, EMBEDDING_PCA_DIM, LABEL_COLUMN, ProteaPayload, the 3 predict payloads). Hard guarantee that 'from protea.core.reranker import ALL_FEATURES' will not silently diverge from 'from protea_contracts import ALL_FEATURES'.

  - TestShaConsistency (2 tests, active): compute_schema_sha produces the same digest regardless of caller path; pinned to the golden 145592ed186c so PROTEA CI fails before the booster cache invalidates.

  - TestFeatureFamilyCoverage (3 tests, active): every family member lives in ALL_FEATURES; emb_pca family size matches EMBEDDING_PCA_DIM; canonical naming.

  - TestRegistryCoversContracts (2 tests, skipped): activates automatically when F2B.1 ships protea/core/features/registry.py; asserts set(REGISTRY.names()) == set(ALL_FEATURES) and family map equality.

Suite: 1124 passed, 12 skipped (was 1112 + 12 active + 2 dormant).

Part of F1 T1.7 of master plan v3.
…ference

Two boundary validations against the canonical protea_contracts schema:

Export side (parquet_export.py): before writing train/eval parquets, compute compute_schema_sha([c for c in shard.columns if c in ALL_FEATURES]) and compare to compute_schema_sha(ALL_FEATURES). Mismatch raises ValueError with the missing/extras list, instead of silently shipping a partial dump that LightGBM training would choke on. Pure invariant check; the legacy schema_sha hash in the manifest is unchanged (T1.6 of master plan v3 owns the migration to schema_sha_v2).
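The export-side invariant can be sketched like this (the digest details of ``compute_schema_sha`` here are an assumption, not the canonical protea_contracts implementation):

```python
import hashlib


def compute_schema_sha(columns: list[str]) -> str:
    # Assumption: digest over the ordered column names; the real helper
    # may differ in encoding and digest length.
    return hashlib.sha256("\n".join(columns).encode()).hexdigest()[:12]


def check_export_columns(shard_columns: list[str], all_features: list[str]) -> None:
    """Raise instead of silently shipping a partial dump (sketch)."""
    present = [c for c in shard_columns if c in all_features]
    if compute_schema_sha(present) != compute_schema_sha(all_features):
        missing = sorted(set(all_features) - set(present))
        extras = sorted(set(shard_columns) - set(all_features))
        raise ValueError(
            f"feature schema mismatch: missing={missing} extras={extras}"
        )
```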

Inference side (predict_go_terms._apply_reranker_if_aligned): switches the import of compute_feature_schema_sha from protea_reranker_lab.contracts to protea_contracts. Functions are byte-identical so behaviour is preserved; the canonical source is now protea_contracts (single source of truth).

5 new tests in test_parquet_export_boundary.py: full columns pass, missing column in train raises, missing column in eval raises, typo feature name raises, empty eval shard skipped.

Suite: 1129 passed, 12 skipped (was 1124 + 5).

Part of F1 T1.8 of master plan v3.
Replaces the hardcoded if/elif chain in compute_embeddings._load_model
with discovery via the protea.backends entry_points group. The four
backend plugins (esm, t5, ankh, esm3c) shipped by protea-backends are
now resolved dynamically; adding a new backend is a pyproject entry
plus a class — no edits to compute_embeddings required.

Scope of this refactor:

  - Module-level _load_model now calls _resolve_backend(model_backend)
    + plugin.load_model(model_name, device, emit). The (model, tokenizer)
    return shape stays exactly the same (tokenizer is None for ESM-C,
    matching the legacy path).
  - The legacy "auto" alias maps to "esm" exactly as before.
  - Plugin discovery is cached in module-level _BACKEND_PLUGINS and
    populated on first call (lazy: avoids running entry_points scan
    at import time).
  - Plugin name attribute is asserted to match its entry_point name
    on first load. Silent drift would yield confusing "unknown
    backend" errors; we'd rather fail loud.

Out of scope (deferred to F2C):

  - _embed_batch dispatch keeps the legacy if/elif chain calling
    _embed_esm / _embed_t5 / _embed_ankh / _embed_esm3c. The
    plugin's embed_batch returns a flat (batch_size, hidden_dim)
    ndarray, while the legacy _embed_* return
    list[list[ChunkEmbedding]] with full chunk + layer + pooling
    support. The contract extension is a separate task; this commit
    only swaps the load path where the API signatures already line up.
  - Cov gate bump in protea-backends CI: deferred until an
    integration runner installs an extra and exercises the plugin's
    load_model. Bumping the gate to 25% on the strength of unit
    tests alone would just be theatre.

Tests:

  - tests/test_compute_embeddings_backend_dispatch.py: 7 new tests
    covering plugin discovery, entry_point/name parity, "auto" alias,
    unknown-backend error path, _load_model emit/delegate behaviour,
    cache identity, and re-import semantics.

Suite: PROTEA 1136 passed, 12 skipped (was 1129 / 12; +7 new).
Plugin discovery confirmed working from the PROTEA venv:

    >>> from importlib.metadata import entry_points
    >>> {ep.name for ep in entry_points(group="protea.backends")}
    {'ankh', 'esm', 'esm3c', 't5'}

Pairs with the protea-backends 011b27d commit declaring per-backend
optional dependency extras.

Part of F2A.5 of master plan v3.
Map every strategic decision in master plan v3 (2026-05-05) to a navigable
ADR stub under docs/source/adr/. Uniform format per file: status (Accepted,
Pending, Deferred or Obsolete), date, phase introduced, gate (if pending),
context (2-3 sentences), decision (1-2 sentences), consequences (1-2
bullets) and resolution.

Index reorganised into two layers:

- Implementation decisions (001-008): runtime, ORM, queue topology and
  similar choices that surfaced while building.
- Strategic decisions (D1-D30): plan-level decisions from the master plan.

Each row in the strategic table carries a status badge so the open work
is visible at a glance.

Eight gates pending human action explicitly listed: D4 API versioning
(gate F4), D6 authentication (gate F5), D7 observability stack
(gate F-OPS), D10 schema_sha v2 migration (T1.6 gate D10), D25 HPC mode
(gate F-OPS), D27 image registry (gate F-OPS), D28 secrets management
(gate F-OPS), D29 release pipeline (gate F-OPS).

Sphinx build clean: build succeeded, 4 pre-existing warnings (none from
the new files).
First Level-1 plugin migration: ``LoadGOAAnnotationsOperation``
delegates HTTP + gzip + GAF parsing to
``protea_sources.goa.GoaSource.stream``, becoming a thin persistence
adapter that owns DB filtering, GO-term resolution, dedup, and
``pg_insert``. Pairs with ``protea-sources/d1d60f6``
(``GoaSource.stream`` real implementation) and
``protea-contracts/20987a5`` (``GoaStreamPayload`` +
``GoaAnnotationRecord``).

What moved out:

  * ``_stream_gaf`` body (~30 LOC of HTTP/gzip/parsing): now a
    one-liner that constructs a typed ``GoaStreamPayload`` and yields
    from ``goa_plugin.stream``.
  * Eight ``_IDX_*`` GAF column constants: now in protea-sources.
  * ``import gzip``, ``import io``, ``import requests``: removed —
    the plugin owns the network and decode layers.

What stayed:

  * ``_load_accessions`` (canonical-accession universe).
  * ``_load_go_term_map`` (GO-id → term-id).
  * ``_store_buffer`` (dedup + pg_insert with on_conflict_do_nothing).
    Now consumes ``GoaAnnotationRecord`` via attribute access
    (``rec.accession``) instead of dict access (``rec["accession"]``).
  * ``_maybe_enqueue_atomic_eval`` (auto-eval child job).
  * Operation lifecycle, ``LoadGOAAnnotationsPayload`` validation,
    ``OperationResult`` shaping.

Tests updated, not extended:

  * ``_make_record`` test fixture now constructs ``GoaAnnotationRecord``
    instances; ``with_from=""`` becomes ``with_from=None`` (semantically
    identical, the old code converted "" → None at insert time).
  * ``TestStreamGaf`` patches now target
    ``protea_sources.goa.requests.get`` instead of the operation-local
    ``requests.get``. Assertions migrated from dict access to
    attribute access.
  * ``rec.copy()`` → ``rec.model_copy()`` (pydantic v2 deprecation).

Behavioural parity:

  * ``_store_buffer`` still does ``rec.accession.strip()`` for the
    DB-lookup field (parser preserves raw GAF columns; strip happens
    where the lookup needs it). Same observable behaviour as before.
  * Empty optional fields ("" → None) handled by the parser at the
    boundary, not by the operation. No DB-insert diff.
  * Dedup key ``(set_id, accession, go_term_id, evidence_code)``,
    ``on_conflict_do_nothing(constraint=...)`` constraint, page-level
    commit policy: all preserved verbatim.

Suite: 1136 passed, 12 skipped (= unchanged from master). The 54
``test_load_goa_annotations`` cases all pass on the new boundary.

Why Level 1 only (the design discipline):

protea-sources is a leaf C-stack package; importing
``protea.infrastructure.orm.*`` would invert the dependency
direction. Level 1 (HTTP + parsing) cuts cleanly along the
SQLAlchemy boundary; Level 2 (move the operation entirely) waits for
F2C ORM extraction. See ``~/Thesis/f2a6_real_migration_design.md``.

Pattern locked for the remaining migrations (QuickGO, UniProt
FASTA, UniProt metadata): typed ``<Name>StreamPayload`` +
``<Name>Record`` in protea-contracts, ``<Name>Source.stream`` in
protea-sources, operation refactor here.

Part of F2A.6-real migration plan (master plan v3).
…s plugin

Second Level-1 plugin migration. LoadQuickGOAnnotationsOperation
delegates HTTP + TSV streaming + ECO mapping to the protea-sources
QuickGoSource plugin, becoming a thin persistence adapter. Pairs
with protea-sources/f37dfce (real plugin) and protea-contracts/
c5433ed (typed payloads + record).

What moved out:

  * _stream_quickgo body (~70 LOC of batching, HTTP, TSV parsing):
    now a one-liner constructing a typed QuickGoStreamPayload and
    yielding from quickgo_plugin.stream.
  * _fetch_quickgo_page method: deleted entirely. Plugin owns the
    per-batch HTTP fetch.
  * _load_eco_mapping body (~13 LOC of HTTP + line parsing): now
    one call to quickgo_plugin.fetch_eco_mapping(EcoMappingPayload(
    url=...)). Operation keeps the wrapper for the empty-URL short
    circuit (returns {} when eco_mapping_url is None).
  * import io, import requests: removed from the operation module.

What stayed:

  * _load_accessions (canonical + protein accession universes).
  * _load_go_term_map (GO-id -> term-id).
  * _store_buffer (dedup + ECO map application + pg_insert with
    on_conflict_do_nothing). Now consumes QuickGoAnnotationRecord
    via attribute access (rec.accession) instead of dict access
    (rec["GENE PRODUCT ID"]).
  * Operation lifecycle, LoadQuickGOAnnotationsPayload validation
    (which keeps page_size, total_limit, commit_every_page knobs
    that are operation-side concerns and don't belong in the
    plugin payload).

Tests updated:

  * _record(...) helper builds QuickGoAnnotationRecord instances
    from kwargs; replaces the verbose dict-literal _QUICKGO_ROWS.
  * TestStoreBuffer (~9 tests) consumes records, not dicts. The
    test_empty_eco_id_becomes_none test now passes eco_id=None
    directly (parser-side empty-cell handling). The
    test_empty_accession_skipped test was renamed to
    test_unknown_accession_skipped: whitespace handling moved to
    the parser in protea-sources, so the operation only sees
    accessions that don't match valid_accessions.
  * TestLoadEcoMapping (~5 tests): patches and event names swapped
    to source.quickgo.eco_mapping_*. The empty-URL short circuit
    test stayed — operation-side behaviour, not plugin-side.
  * TestStreamQuickgo + TestExecute: patches swapped to
    protea_sources.quickgo.requests.get. Batching event name swap
    to source.quickgo.batching.
  * TestFetchQuickgoPage class deleted entirely (~135 LOC). The
    tests were exercising the parser through HTTP mocks; the
    parser is now in protea-sources where parse_quickgo_row and
    parse_quickgo_tsv have full unit tests. Net -8 tests in PROTEA,
    +9 unit tests in protea-sources for a strictly better surface.

Behavioural parity:

  * Empty cells -> None at parser boundary (matches old _store_buffer
    "or None" handling at insert time). No DB-insert diff.
  * Dedup, on_conflict_do_nothing constraint, ECO map application
    via eco_map.get(rec.eco_id, rec.eco_id): preserved verbatim.
  * Multi-batch URL construction: identical (plugin's
    gene_product_batch_size matches operation's payload field).
  * Event names changed (load_quickgo_annotations.* -> source.quickgo.*
    for plugin-emitted events). Operation-side events unchanged.
    Downstream consumers reading JobEvent rows must filter on the
    new prefix; flagged here for the operator changelog.

Suite: 1128 passed, 12 skipped (= -8 from master because the
8 redundant TestFetchQuickgoPage cases were deleted, not regressed).
The 37 test_load_quickgo_annotations cases all pass on the new
boundary.

Part of F2A.6-real migration plan, step 2 of 4. Pattern locked for
the remaining UniProt FASTA + UniProt metadata migrations.
Two fixes plus an expansion of the documented module surface.

Removes:
- The autodoc directive for protea.core.operations.train_reranker,
  orphaned since T0.6 removed the file. Sphinx no longer reports a
  ModuleNotFoundError during build.
- A broken :doc: cross-reference to a non-existent
  /refactoring/design-patterns/flyweight page in the
  protea.core.annotation_intern module docstring. Replaced with plain
  text "Flyweight-style".

Adds documentation for modules introduced or moved during F0 / F1:
- protea.core.contracts.parent_progress (T0.7 dedup helper).
- protea.core.retry (T0.3 retry middleware).
- protea.core.operation_catalog (singleton OperationRegistry
  builder).
- protea.core.training_dump_helpers (T0.6 home of helpers that
  survived the train_reranker.py deletion; reused by
  ExportResearchDatasetOperation).
- An "Internal helpers" section covering protea.core.{
  anc2vec_embeddings, annotation_intern, disk_cache,
  feature_enricher, pca_cache} for completeness.

Build verification: poetry run sphinx-build returns
"build succeeded, 5 warnings". Of those, 4 are pre-existing
environmental failures (numpy._core.multiarray import error during
autodoc of modules that import numpy, plus the cosmetic
_static directory missing). The previously-introduced train_reranker
and flyweight warnings are gone.

Doc-T3 of the documentation lane.
…utils

Replaces the inline implementations in Protein.parse_isoform and
Sequence.compute_hash with one-line forwarders to
``protea_contracts.bio_utils``. The canonical authority moves to
the contracts package so the upcoming UniProt FASTA parser in
``protea-sources`` can reuse the helpers without inverting the
C-stack dependency direction.

Files:

  * protea/infrastructure/orm/models/protein/protein.py: the
    isoform-splitting body becomes a single delegated call; module
    docstring on the wrapper explains the move so future grep on
    "parse_isoform" lands callers in the right place.
  * protea/infrastructure/orm/models/sequence/sequence.py: the
    MD5 body becomes a delegated call; the now-unused ``import
    hashlib`` is removed.

Behavioural parity preserved bit-for-bit:

  * parse_isoform("P12345") -> ("P12345", True, None) — unchanged.
  * parse_isoform("P12345-2") -> ("P12345", False, 2) — unchanged.
  * compute_hash("MKTAYIAK") -> identical 32-hex MD5 — unchanged.

Existing call sites in protea/api/routers/query_sets.py,
protea/api/routers/annotate.py,
protea/core/operations/fetch_uniprot_metadata.py,
protea/core/operations/insert_proteins.py keep working unchanged
because the public API on the ORM classes is preserved (Protein
.parse_isoform, Sequence.compute_hash). They will be migrated to
direct imports from protea_contracts as their respective files get
refactored in F2A.6-real subsequent steps.

Suite: PROTEA 1128 passed, 12 skipped (= unchanged from turn 27).
The 6 call sites in tests (test_insert_proteins, test_integration)
exercise the wrappers transparently.

Pairs with protea-contracts/18e92af which adds the canonical
``parse_isoform`` and ``compute_sequence_hash`` plus 12 unit
tests in protea_contracts/bio_utils.py.

Part of F2A.6-real migration plan (D-MIGR-04), prerequisite for
step 3 (UniProt FASTA migration).
Adds docs/source/plugin-authoring.rst as the canonical entry point
for plugin authors, and links it from the main toctree in
docs/source/index.rst.

Scope:

- Architecture overview in one paragraph (protea-core platform plus
  four sibling plugin layers).
- Table of the four layers (annotation sources, embedding backends,
  experiment runners, feature registry) with their ABC, repository
  and entry-point group.
- Decision tree for picking the right ABC depending on what the
  author wants to add.
- Anatomy of a plugin in 5 steps that apply uniformly across the
  three entry-point-driven layers, plus the in-process pattern for
  feature registry contributions.
- Pointers to the per-repo contributing guides shipped on the doc
  lane: protea-backends/docs (Doc-T1) and
  protea-contracts/docs (Doc-T2). The protea-sources and
  protea-runners guides land in Doc-T8.
- Discovery snippet (importlib.metadata.entry_points) that mirrors
  what protea-core does at startup, including the name-vs-entry-point
  sanity check.
- Schema invariants and reproducibility section linking ADR D10
  (schema_sha v2 migration) and the float16 embedding contract.
- Roadmap section pointing to upcoming master-plan v3 phases that
  affect plugin authors (F2A.7 lightgbm absorption, F2B feature
  registry wiring, F2C protea-method extraction, F9 post-defense
  granularity decision).

Build verification: poetry run sphinx-build returns
"build succeeded, 5 warnings" (same 5 pre-existing warnings as
before; the new page introduces zero warnings).

Doc-T7 of the documentation lane. Implements F7.6 of master plan v3
("Plugin author guide").
Third Level-1 plugin migration. InsertProteinsOperation delegates
HTTP retries + cursor pagination + gzip decoding + FASTA parsing to
the protea-sources UniProtSource plugin, becoming a thin
persistence adapter. Pairs with protea-sources/fadbd6b (real
UniProtSource.stream_fasta + _http.py) and protea-contracts/
f1bf7b5 (typed payload + record).

What moved out:

  * _fetch_fasta_pages body: now a one-liner constructing a typed
    UniProtFastaStreamPayload and yielding from
    uniprot_plugin.stream_fasta. Renamed to _stream_fasta to
    reflect per-record yield (D-MIGR-01).
  * _decode_response method (gzip / utf-8 wrapper): plugin owns it.
  * _parse_fasta + _parse_header methods (~70 LOC of FASTA parsing,
    OS/OX/GN regex, isoform splitting via Protein.parse_isoform):
    plugin owns it. The OS/OX/GN regex constants and isoform logic
    move with them.
  * UNIPROT_SEARCH_URL constant: now in
    UniProtFastaStreamPayload.base_url default.
  * Removed imports: gzip, re, requests, Response, BytesIO, quote,
    UniProtHttpClient (legacy PROTEA-side copy stays in
    protea/core/utils.py until step 4 deletes it as the last caller
    fetch_uniprot_metadata also migrates).
  * Removed state: self._http_client, self._total_results.
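The resulting adapter shape can be pictured as follows, with simplified stand-ins for the real contracts payload and plugin (names follow the commit text; the details are assumptions):

```python
# Illustrative shape of the thin persistence adapter after the migration;
# UniProtFastaStreamPayload and the plugin are simplified stand-ins.
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class UniProtFastaStreamPayload:
    """Stand-in for the protea_contracts payload type."""
    query: str
    base_url: str = "https://rest.uniprot.org/uniprotkb/search"


@dataclass
class StubUniProtSource:
    """Stand-in plugin: the real one owns HTTP, gzip and FASTA parsing."""
    records: list[str] = field(default_factory=list)

    def stream_fasta(self, payload: UniProtFastaStreamPayload) -> Iterator[str]:
        yield from self.records


class InsertProteinsAdapter:
    """What stays PROTEA-side: persistence around a delegated stream."""

    def __init__(self, plugin: StubUniProtSource, query: str) -> None:
        self._uniprot_plugin = plugin
        self._query = query

    def _stream_fasta(self) -> Iterator[str]:
        # the former _fetch_fasta_pages body, now a typed one-liner
        yield from self._uniprot_plugin.stream_fasta(
            UniProtFastaStreamPayload(query=self._query)
        )
```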

What stayed:

  * Operation lifecycle, InsertProteinsPayload validation, batching
    policy (page_size buffer flush), session.add_all + flush against
    Protein + Sequence tables, conservative-update logic for
    existing proteins.
  * _store_records (full upsert path) — now consumes
    UniProtProteinRecord via attribute access (rec.accession,
    rec.canonical_accession, etc.) instead of dict access.
  * _load_existing_sequences + _load_existing_proteins (DB lookup
    helpers).
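The OS/OX/GN header parsing that moved into the plugin can be pictured roughly like this; the regexes and return shape are assumptions modelled on the UniProt FASTA header convention, not the plugin's actual code:

```python
# Hypothetical sketch of UniProt FASTA header parsing; field names and
# regexes are illustrative, not the plugin's real implementation.
import re

_OS_RE = re.compile(r"OS=(.+?)(?= [A-Z]{2}=|$)")
_OX_RE = re.compile(r"OX=(\d+)")
_GN_RE = re.compile(r"GN=(\S+)")


def parse_fasta_header(header: str) -> dict:
    """Extract accession, organism, taxon id and gene name from a
    '>sp|ACC|ID ...' UniProt FASTA header line."""
    parts = header.lstrip(">").split("|")
    accession = parts[1] if len(parts) > 2 else parts[0]
    os_m = _OS_RE.search(header)
    ox_m = _OX_RE.search(header)
    gn_m = _GN_RE.search(header)
    return {
        "accession": accession,
        "organism": os_m.group(1) if os_m else None,
        "taxon_id": int(ox_m.group(1)) if ox_m else None,
        "gene": gn_m.group(1) if gn_m else None,
    }
```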

Behavioural diffs surfaced:

  * pages now counts DB-side buffer flushes, not HTTP pages. The
    HTTP-page count is the plugin's internal concern (visible via
    source.uniprot_fasta.fetch_page_done events). pages is more
    useful for monitoring DB throughput; the semantic change is
    captured in the relevant test docstrings.
  * X-Total-Results header capture (op._total_results) is removed.
    The header was nice-to-have for progress reporting and not
    load-bearing for correctness; progress totals now flow only
    when the user sets total_limit. Operator changelog flagged.
  * Plugin-emitted events use the source.uniprot_fasta.* prefix;
    operation-emitted events keep insert_proteins.* prefix.

Tests refactored:

  * TestParseFasta class deleted (~95 LOC, 11 tests). Parser is now
    in protea-sources where parse_fasta_header + parse_fasta_text
    have full unit coverage.
  * TestDecodeResponse class deleted (~25 LOC, 2 tests). Decode is
    in the plugin's _decode_response_body helper, exercised via the
    gzip stream wiring tests in protea-sources.
  * test_total_results_from_header + test_total_results_invalid
    deleted (2 tests). The operation no longer captures X-Total-
    Results.
  * TestStoreRecords: dict-literal record fixtures replaced with
    a _make_record(...) helper that builds UniProtProteinRecord
    via the bio_utils helpers (same MD5 hash, same canonical
    splitting). Two test bodies shrink ~17 LOC each.
  * TestInsertProteinsOperationExecute: patch target swap from
    ``op._http_client.session.get`` to
    ``op._uniprot_plugin._client.session.get`` across 16 sites.
    test_empty_page_continues renamed to test_empty_page_does_not_flush
    with the new pages=0 expectation. test_progress_emission_with_total
    renamed to test_progress_emission_with_total_limit; uses
    page_size=1 + total_limit=100 to force a flush + carry the
    progress total.
  * Net -16 PROTEA tests (parser+decode+total_results all moved or
    deleted), corresponding +56 in protea-sources for a strictly
    better surface.

Suite: PROTEA 1112 passed, 12 skipped (was 1128; -16 from
deletions). Ruff full + mypy strict green on touched files.

Part of F2A.6-real migration plan, step 3 (b) of 4. The legacy
UniProtHttpClient in protea/core/utils.py becomes dead code once
step 4 (UniProt metadata migration) lands; deletion deferred to
that turn.
Closes F2A.6-real with the fourth Level-1 plugin migration plus
the dead-code cleanup that was waiting on it.

Migration:

  * FetchUniProtMetadataOperation delegates HTTP retries + cursor
    pagination + gzip decoding + TSV parsing to the protea-sources
    UniProtSource.stream_metadata plugin method (added in
    protea-sources/2a6ef55). Operation becomes a thin persistence
    adapter focused on FIELD_MAP DB upsert and update_protein_core
    side effects.
  * Removed: _fetch_tsv_pages (~70 LOC of HTTP + URL construction),
    _decode_response (gzip wrapper), _parse_tsv (csv.DictReader).
    All three live in the plugin now.
  * Removed state: self._http_client, self._total_results.
  * UNIPROT_FIELDS constant kept on the operation class — the field
    list is a persistence concern (which DB columns get
    populated). Same field set passed to the plugin via
    UniProtMetadataStreamPayload.fields.
  * Imports trimmed: csv, gzip, BytesIO, StringIO, quote, requests,
    Response, UniProtHttpClient. Replaced with protea_contracts
    imports for UniProtMetadataRecord, UniProtMetadataStreamPayload,
    parse_isoform.
  * _store_rows consumes UniProtMetadataRecord via attribute access
    (rec.accession, rec.raw_fields) instead of dict access. Field
    semantics preserved bit-for-bit: same FIELD_MAP application,
    same update_protein_core conservative-update logic.
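The TSV parsing that moved out is essentially a csv.DictReader wrapper; a minimal sketch (column names illustrative, and the real plugin method also handles gzip decoding and cursor pagination):

```python
# Minimal sketch of plugin-side TSV parsing; the real stream_metadata
# method wraps this with HTTP retries, gzip decoding and pagination.
import csv
from io import StringIO


def parse_metadata_tsv(text: str) -> list[dict[str, str]]:
    """Parse one UniProt TSV page into a dict per row, keyed by header."""
    return list(csv.DictReader(StringIO(text), delimiter="\t"))
```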

Behavioural diffs (same as the FASTA migration):
  * pages now counts DB-side buffer flushes (not HTTP pages).
  * X-Total-Results header capture removed. _progress_total flows
    only when total_limit is set.

Dead-code cleanup:

  * protea/core/utils.py: deleted UniProtHttpClient (135 LOC) plus
    its _HttpPayload Protocol and the now-unused random/time/
    requests/Response imports. The file shrinks to just
    chunks() + utcnow() helpers (~13 LOC).
  * tests/test_core.py: deleted TestUniProtHttpClient class
    (~75 LOC, 9 tests) and TestFetchUniProtMetadataExecute class
    (~290 LOC, 10 tests). The first migrated to
    protea-sources/tests/test_uniprot.py::TestUniProtRetryClient
    (5 tests, retries + Retry-After + max-retries + network errors)
    plus TestExtractNextCursor (4 tests). The second migrated
    partially to test_fetch_uniprot_metadata.py (which keeps the
    14 execute-flow tests against the new plugin-based dispatch)
    and partially to protea-sources/tests/test_uniprot.py
    (TestParseMetadataTsv covers the parser directly).
  * tests/test_fetch_uniprot_metadata.py: deleted TestParseTsv
    class (4 tests, ~35 LOC) — parser now in protea-sources.
    16 patch sites swapped from op._http_client.session.get to
    op._uniprot_plugin._client.session.get.

Suite: PROTEA 1089 passed, 12 skipped (was 1112; -23 from
deletions of the legacy class + parser/decode/total_results
overlap with protea-sources). Ruff full + mypy strict green.

Net diff PROTEA: -266 / +163 = -103 LOC across the operation,
core/utils.py, test_core.py, and test_fetch_uniprot_metadata.py.

This closes F2A.6-real:

  * GOA real-migrated (turn pre-25, commits 20987a5/d1d60f6/43da412).
  * QuickGO real-migrated (turn 27, c5433ed/f37dfce/42d4dd4).
  * D-MIGR-04 prereq (turn 29, 18e92af/434b14e).
  * UniProt FASTA real-migrated (turn 32, f1bf7b5/fadbd6b/56a6d87).
  * UniProt metadata real-migrated + UniProtHttpClient deleted
    (this turn, 09f3883/2a6ef55/<this>).

protea-sources is now self-contained with respect to UniProt HTTP:
the _http.py module owns the retry client; the plugin owns parsing
and modality dispatch (FASTA vs metadata). PROTEA's only remaining
involvement is persistence.

Part of F2A.6-real migration plan, step 4 of 4. F2B (HTTP registry
endpoints) is next on the autonomous queue once doc-lane gives it
priority.
Adds three read-only HTTP endpoints listing plugins discovered via
``importlib.metadata.entry_points``, closing the F2B.1-3 block of
master plan v3 in a single coherent router (the three endpoints
share their lookup mechanism — splitting them across separate
files would be artificial).

Endpoints:

  * ``GET /backends`` — embedding backend plugins
    (``protea.backends`` group). Today: esm, t5, ankh, esm3c.
  * ``GET /sources`` — annotation source plugins
    (``protea.sources`` group). Today: goa, quickgo, uniprot.
  * ``GET /runners`` — experiment runner plugins
    (``protea.runners`` group). Today: baseline, knn, lightgbm.

Response shape:

  ```
  {
    "group": "protea.backends",
    "plugins": [
      {"name": "esm", "cls": "EsmBackend",
       "module": "protea_backends.esm:plugin", "extra": {}},
      ...
    ]
  }
  ```

The ``extra`` field carries plugin-class-specific metadata read
from the loaded instance: today only sources expose
``version``, surfaced as ``extra.version`` (e.g. ``"uniprot-goa"``,
``"quickgo-rest"``). Backends and runners get an empty ``extra``;
adding more probe-able metadata is a one-line change inside
``_discover``.

Design choices:

  * No caching. The endpoint re-scans entry_points on every call so
    a worker that's just been restarted with a newly-installed
    extra surfaces in the next request without an API restart. The
    scan is sub-millisecond on the working set of ~10 plugins.
  * No authentication. These endpoints are public-read by design —
    they list installed software, not user data.
  * Plugin loading happens here. Loading the entry_point fires the
    plugin module's import side effects but should not raise for
    any first-party plugin (the bootstrapping pattern keeps top-
    level imports cheap). If a third-party plugin's load raises,
    the caller surfaces it as a 500 — fail loud beats silently
    hiding broken installs.
  * Fixed group whitelist (``_KNOWN_GROUPS``) prevents callers
    from probing arbitrary entry_points via the same code path.

Files:

  * protea/api/routers/registry.py: new router (~140 LOC) with
    PluginInfo + PluginListResponse pydantic models, _discover +
    _list_for helpers, and the three endpoint functions.
  * protea/api/app.py: add registry_router to the import block
    and wire it via ``app.include_router(registry_router.router)``.
  * tests/test_registry_endpoints.py: 16 tests across four
    classes — TestBackendsEndpoint (5), TestSourcesEndpoint (5),
    TestRunnersEndpoint (4), TestResponseSchema (2). Tests run
    against the live entry_points discovery (the 10 plugins are
    real C-stack siblings installed via path-deps); no mocking.

Suite: PROTEA 1105 passed, 12 skipped (was 1089; +16 new), ruff
full + mypy strict green on the new files.

This closes F2B autonomous-eligible work. F2B.4
(PredictGOTermsBatchOperation extract class) stays in the
human-review queue because of reranker sensitivity.

Part of F2B of master plan v3.
Adds a runnable submit-watch-result loop to PROTEA's README,
satisfying the F7.1 acceptance criterion of master plan v3
("5 minutes to first job") that the original README did not
explicitly cover. The previous README documented Docker + From
source bring-up but stopped at "scripts/manage.sh start" without
showing the end-to-end machinery.

The new section lives between Getting started and Documentation
and shows three operations the user can run with curl + jq the
moment the stack is up:

  1. POST /jobs to enqueue a `ping` smoke-test operation, capturing
     the returned job id.
  2. GET /jobs/{id}/events to tail the structured-event stream
     until the job reaches a terminal state.
  3. GET /jobs/{id} to confirm the final status + result + any
     error code.

Plus a sub-section showing the F2B plugin-discovery endpoints
(GET /backends, /sources, /runners) that landed in turn 36 — the
runtime catalogue the user can probe to see what models /
sources / runners the running deployment ships.

The example uses `ping` rather than a real ML operation so the
quickstart doesn't depend on having sequence data loaded; the
intent is to exercise the queue + worker + DB lifecycle end-to-
end, which `ping` does in <1s. Real operations
(insert_proteins, load_goa_annotations, compute_embeddings,
predict_go_terms) are submitted the same way.

PROTEA README size: 141 LOC → 187 LOC (+46 LOC).

Suite + Sphinx build unchanged; this is doc-only.

Part of Doc-T11 of the autonomous loop. Closes the README
expansion sweep across the four C-stack repos plus PROTEA itself.
CI rescue: restore main to a green state after ~6 weeks of red

The last green CI run on `main` was 2026-03-25 (PR #7). Between
that and 2026-05-06, two latent breakages accumulated and were
exposed when the F2 phase work landed via 7db0e0d..e9ae748:

- `cafaeval-protea` declared as PEP 621 file:/// URL hardcoded to
  the original developer's machine (introduced 2026-04-21,
  commit ace4c4a).
- Five sibling path-deps (`protea-{contracts,method,sources,
  runners,backends}`) added during the F2 plugin migration; their
  internal cross-deps to protea-contracts were also path-based.
- `protea-reranker-lab` path-dep on a sibling that wasn't on
  GitHub at all.
- Pre-existing pyproject.toml + workflow gaps: `--only dev` install
  scope in lint and docs jobs (skipped main deps), missing
  sphinxcontrib-bibtex declaration, accumulated tech debt across
  ruff / flake8 / mypy that hadn't been gated for ~6 weeks.

What this PR does:

1. Replaces all path-deps with `git+https://github.com/frapercan/<repo>`
   URLs so CI runners can resolve them. The 5 C-stack siblings and
   protea-reranker-lab were pushed to GitHub as part of this work.
   Cross-sibling path-deps inside the siblings (e.g.
   protea-backends pointing at ../protea-contracts) were also
   converted; otherwise poetry's transitive resolution failed.
2. Fixes integration tests broken by the F2A.6-real migration
   (op._http_client references, dict→GoaAnnotationRecord fixture
   conversion, halfvec roundtrip tolerance).
3. Auto-fix + manual cleanup of 18 ruff errors, 10 flake8 spacing
   violations, and 15 mypy errors (mostly union-attr narrowing
   asserts and targeted type: ignore on legitimate runtime
   patterns mypy can't prove safe).
4. Fixes lint + docs workflows to use `poetry install --with dev`
   instead of `--only dev` so mypy / sphinx autodoc can resolve
   imports of pyarrow, sqlalchemy, fastapi, etc.
5. Declares sphinxcontrib-bibtex (was installed transitively in
   the local venv but missing from pyproject.toml).
6. Includes 8 ADR resolutions confirmed during the rescue session
   (D04 /v1/ versioning, D06 Authentik+oauth2-proxy, D07
   Loki+Grafana+OTel, D10 schema_sha v2, D25 HPC mode B primary,
   D27 ghcr.io, D28 sops+age, D29 semantic-release).

Local-dev trade-off: editable cross-sibling installs are lost.
Devs who want hot-reload across siblings need to do
`pip install -e ../<sibling>` after `poetry install`.

CI verification on this PR:
- lint (3.12, 2.1.0): pass (3m3s)
- test (3.12, 2.1.0): pass (3m14s)
- integration (3.12, 2.1.0): pass (4m11s)
- docs (3.12, 2.1.0): pass (2m57s)
- pip-audit, bandit, GitGuardian: pass
- codecov informative-only (not in required checks)

Local verification matched CI: 1105 unit passed, 1115 integration
passed (with --with-postgres), ruff + flake8 + mypy clean,
sphinx build succeeds with 5 pre-existing warnings.

Includes the LAFA wrapper scaffolding (`apps/lafa_container/*`)
that was sitting untracked in the working tree since the early
F-LAFA exploration; kept because it has real value as the seed
for a future functionbench.net submission.
@codecov

codecov Bot commented May 6, 2026

Codecov Report

❌ Patch coverage is 67.87587% with 735 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.71%. Comparing base (c244d25) to head (ccecf8a).

Files with missing lines Patch % Lines
protea/core/feature_enricher.py 10.09% 276 Missing ⚠️
protea/core/operations/compute_embeddings.py 46.34% 88 Missing ⚠️
protea/core/operations/export_research_dataset.py 42.85% 80 Missing ⚠️
protea/core/evaluation.py 62.08% 69 Missing ⚠️
protea/core/disk_cache.py 53.84% 36 Missing ⚠️
protea/api/middleware/visitor_counter.py 56.41% 34 Missing ⚠️
protea/core/anc2vec_embeddings.py 42.42% 19 Missing ⚠️
protea/core/operations/generate_evaluation_set.py 66.66% 19 Missing ⚠️
protea/core/operations/load_goa_annotations.py 82.10% 17 Missing ⚠️
protea/api/routers/reranker_models.py 86.99% 16 Missing ⚠️
... and 14 more
Additional details and impacted files
@@             Coverage Diff             @@
##           develop       #9      +/-   ##
===========================================
- Coverage    82.07%   73.71%   -8.36%     
===========================================
  Files           63       91      +28     
  Lines         5959    10475    +4516     
===========================================
+ Hits          4891     7722    +2831     
- Misses        1068     2753    +1685     


@frapercan frapercan merged commit 89c38f6 into develop May 6, 2026
15 of 17 checks passed