
sync: develop ← main (post-CI rescue + F2A.6-real) #9

Merged
frapercan merged 75 commits into develop from main on May 6, 2026
Conversation

@frapercan
Owner

Summary

Fast-forward sync of develop with main after the CI rescue (PR #8) and the F2A.6-real plugin migration.

Develop's tip (c244d25 — workflow bumps) is a strict ancestor of main, so this is a content-equivalent sync with zero unique commits on develop. No conflicts expected.

50+ commits land on develop:

  • F2A.6-real GOA / QuickGO / UniProt FASTA / UniProt metadata plugin migrations
  • F2B.1-3 plugin registry API endpoints
  • D-MIGR-04 ORM forward to bio_utils
  • Doc-T3/T7/T11 documentation refresh
  • LAFA wrapper scaffolding (apps/lafa_container/*)
  • 8 ADR resolutions (D04/D06/D07/D10/D25/D27/D28/D29)
  • The CI rescue squash itself (ccecf8a)

Test plan

  • CI lint (3.12) green
  • CI test (3.12) green
  • Visual scan of file diff vs main (should be empty since it's all of main)

v0.2.0: Scoring engine, CAFA evaluation framework and UI overhaul
…1-6)

Contract-first integration with protea-reranker-lab: PROTEA produces
parquet datasets + manifest via export operation, consumes trained
booster artifacts via RerankerModel + ArtifactStore. Zero runtime
cross-imports — only protea_reranker_lab.contracts (pydantic-pure)
is shared dev-time.

- Phase 3: ArtifactStore abstraction (LocalFs default, MinIO opt-in
  via docker compose profiles: ["storage"] + [storage] extra).
  storage config under protea/infrastructure/storage/; PROTEA_STORAGE_*
  env overrides; tests/test_storage.py.
- Phase 4: ExportResearchDatasetOperation + shared parquet_export
  utility refactored out of train_reranker; operation_catalog entry;
  routes via protea.jobs queue.
- Phase 5: RerankerModel nullable artifact columns (artifact_uri,
  feature_schema_sha, embedding_config_id FK, ontology_snapshot_id FK,
  producer_version, producer_git_sha, spec_yaml); alembic migration
  c517e16da06b with named constraints; scripts/register_reranker.py
  CLI for run-dir → ORM row promotion.
- Phase 6: predict_go_terms reranker integration — strict sha-equality
  validation at batch-worker level with reranker.schema_mismatch
  fallback (never crashes inference); reranking.py module
  (load_reranker, apply_reranker, infer_active_feature_families);
  8 new tests covering coordinator validation + batch fallback paths.
- Sphinx docs full pass + ADR-007-contract-first-lab-integration.
- Thesis LaTeX: new Reranker Promotion Pipeline section in
  implementation chapter, RerankerModel subsection in data model.
- Benchmark router + web UI pages (/benchmark, /experiments),
  Grafana visitor dashboard, CAFA evaluation pipeline updates,
  ablation tooling, embedding backend verification script.
Documents the upstream bug, the semantic fix applied in
cafaeval-protea (commit cec8ccd), the regression test that gates
against future regressions, and the operational steps required to
propagate the fix to running worker-evaluations processes. Also
records the impact on the 220→230 benchmark PK cells.
…ght 0.3→0.8

- PredictGOTermsPayload and its batch variant now default
  compute_alignments, compute_taxonomy and compute_reranker_features to
  True. Prevents future PredictionSets from silently missing features
  required by alignment-aware scoring configs.
- DEFAULT_EVIDENCE_WEIGHTS["IEA"] 0.3 → 0.8: GOA history shows IEA is
  promoted to experimental codes at a higher rate than ISS/IBA/NAS, so
  the classic "electronic = weakest" hierarchy underestimates its prior
  quality. Only affects scorings that consume evidence_weight.
- Docstrings refreshed (module, evidence_primary preset, /embeddings/predict).
- Fixed 5 TestPredictBatch tests that treated _predict_batch's tuple
  return as a flat list — pre-existing bug, surfaced while adapting tests
  to the new defaults.
…ine reranker-train endpoint

- embeddings.py: POST /embeddings/predict now publishes to protea.predictions
  (the dedicated queue already served by worker-predictions-coord) instead of
  protea.jobs. Brings the API in line with the running split-queue topology.
- scoring.py: remove POST /scoring/rerankers/train together with its helpers
  (_TrainingPair, RerankerTrainRequest) and the model_to_string / reranker_train
  imports. In-PROTEA training was decoupled into protea-reranker-lab; this
  finishes the cleanup so no endpoint claims capability that no longer exists.
…verage guard

- compute_score now recognises a sixth signal `neighbor_vote_fraction`
  (already persisted on every GOPrediction). Added to DEFAULT_WEIGHTS.
- _PRESET_CONFIGS redesigned so each preset tests a discrete hypothesis:
    * embedding_only            – cosine of the winning neighbour (baseline)
    * vote_fraction             – KNN consensus (new)
    * alignment_only            – NW+SW identity without embedding (new)
    * embedding_plus_alignment  – embedding refined with alignment
                                  (renamed from alignment_weighted)
    * embedding_plus_vote       – embedding + consensus (new)
    * evidence_veto             – evidence as a pure multiplier
                                  (replaces embedding_plus_evidence; fixes
                                  the double-count with evi=0 in weights
                                  and formula=evidence_weighted)
    * composite                 – linear kitchen-sink with voting, evidence
                                  removed from the linear sum
  Dropped `evidence_primary` — its dominant signal was the evidence_code of
  the single nearest neighbour, which we know is noisy; revisit when the
  reranker exposes voter-distribution features.
- ORM docstring for FORMULA_EVIDENCE_WEIGHTED now documents the recommended
  usage (set evidence_weight=0 in weights so the multiplier is applied once).
- New `_check_signal_coverage` helper in the router: before /score.tsv and
  /metrics stream anything, it queries fill rates of the columns backing
  every active signal; if a required column has 0 rows populated, returns
  409 with an actionable detail instead of silently degrading the score.
- /score.tsv now exposes a neighbor_vote_fraction column.
- Dropped TestTrainReranker tests — leftover after the prior commit that
  removed the inline POST /scoring/rerankers/train endpoint.
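The coverage guard above can be sketched in a dependency-free form. This is a hypothetical illustration, not the actual `_check_signal_coverage` code: here `fill_counts` stands in for the per-column fill-rate query, and `SignalCoverageError` stands in for the 409 response the router returns.

```python
# Hypothetical sketch of the signal-coverage guard; names and shapes are
# assumptions, not the actual PROTEA router code. fill_counts maps each
# column backing an active signal to its number of populated rows.
class SignalCoverageError(Exception):
    """Mapped to an HTTP 409 with an actionable detail in the router."""


def check_signal_coverage(active_signals, fill_counts):
    # A signal with zero populated rows would silently degrade the score,
    # so fail loudly before /score.tsv or /metrics stream anything.
    missing = [s for s in active_signals if fill_counts.get(s, 0) == 0]
    if missing:
        raise SignalCoverageError(
            f"Signals with zero populated rows: {missing}; re-run prediction "
            "with the matching compute_* flags enabled or drop them from the "
            "scoring config."
        )
```

In the real router the fill rates come from one query per active signal column; the sketch only shows the decision logic.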
…ncher

Previously scripts/worker.py launched the stale-job reaper with
timeout_seconds=21600 (6h). With a single protea.predictions.batch
worker and 23 predict_go_terms coords dispatched in a batch, the later
coords routinely waited past 6h in the FIFO and were killed by the
reaper even though the pipeline was making progress upstream. Raising
the hard timeout to 24h (still paired with the 30-min stall window)
leaves enough headroom for a full PLM×K cross-grid to drain while
keeping the stall-based liveness check in place.

Also adds the cross-scoring launcher used for the 8-PLM × 3-K × 7
scoring-config benchmark. Dedupes dispatch by (prediction_set_id,
scoring_config_id) against both evaluation_result rows AND
queued/running run_cafa_evaluation jobs — the earlier launcher only
compared against persisted results and filled the queue with up to
20× duplicates per pair.
…YAML

- /benchmark/matrix now reads PredictionSet.limit_per_entry and includes
  it in the per-row key, so the same (embedding, stage, cell) tuple no
  longer collapses K=3/5/10 into one row. Response exposes the full K
  catalog via `ks: [3, 5, 10]` and accepts `?k=N` as filter.
- Frontend types gain `k`/`ks`; BenchmarkPage ships a K selector (chips
  next to the stage picker) defaulting to the first available K.
- benchmark.yaml refreshed for the redesigned scoring presets:
  preferred_default now starts with embedding_only (widest coverage on
  partial runs), and labels cover the seven current presets (dropped
  labels for removed `evidence_primary`, `embedding_plus_evidence`,
  `alignment_weighted`).
_write_predictions() built the pred_dict passed to compute_score() with
distance, identity_nw/sw, evidence_code and taxonomic_distance only.
The newly-added neighbor_vote_fraction signal (scoring preset
`vote_fraction`, half of `embedding_plus_vote`, 20% of `composite`)
was silently dropped — compute_score saw value=None and excluded it
from numerator and denominator. vote_fraction therefore produced zero
scores across every (protein, go_id) row, cafaeval returned empty
NK/LK/PK buckets, and the three affected presets showed no cells in
the benchmark UI.

Add `pred.neighbor_vote_fraction` to pred_dict so the scoring engine
sees the signal it already has a weight for.
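The failure mode is easy to reproduce with a toy weighted sum. This sketch assumes compute_score's None-handling semantics as described above; it is not the actual implementation:

```python
# Minimal sketch of the assumed compute_score behaviour: signals whose
# value is None drop out of both numerator and denominator, so a preset
# whose only weighted signal is absent from pred_dict scores zero.
def weighted_score(pred_dict, weights):
    num = den = 0.0
    for name, weight in weights.items():
        value = pred_dict.get(name)
        if value is None or weight == 0:
            continue  # silently skipped — the bug's failure mode
        num += weight * value
        den += weight
    return num / den if den else 0.0
```

With a `vote_fraction`-style preset, `weighted_score({"distance": 0.9}, {"neighbor_vote_fraction": 1.0})` yields 0.0 across every row; adding the key to `pred_dict` lets the existing weight take effect.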
Drop the MCP server scaffolding under protea_mcp/ — 21 files, ~1.1k LOC
total. The module was never wired into any startup or deployment path
and has no importers in protea/, scripts/, tests/, pyproject.toml, or
docs.
…decoupled training

LightGBM training moves out of PROTEA into the standalone
protea-reranker-lab repo. PROTEA now owns the KNN + feature pipeline,
dataset publishing, and inference; the lab owns training and evaluation.

Schema
* Add Dataset model (alembic c7bab0210568): immutable record of a
  published reranker dataset with train_uri / eval_uri / manifest_uri,
  content fingerprints (schema_sha, manifest_sha), dump parameters and
  producer provenance (producer_version, producer_git_sha).
* Link RerankerModel to Dataset via dataset_id (alembic e037f3ae9f58)
  and add artifact_uri / external_source / spec_yaml / feature_importance
  columns so boosters trained in the lab can be registered by reference.

Storage abstraction
* protea/infrastructure/storage/ becomes an ArtifactStore interface with
  LocalArtifactStore (file://) and MinioArtifactStore (s3://) backends,
  selected by PROTEA_STORAGE_BACKEND. get_artifact_store() is the single
  factory used by export_research_dataset and /reranker-models/import.
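The shape of the abstraction can be sketched as follows. Class and factory names mirror the text; bodies, signatures, and the `PROTEA_STORAGE_ROOT` variable are assumptions (the MinIO branch is omitted):

```python
# Hypothetical sketch of the ArtifactStore interface and its env-selected
# factory; not the real protea/infrastructure/storage code.
import os
import shutil
from abc import ABC, abstractmethod
from pathlib import Path


class ArtifactStore(ABC):
    @abstractmethod
    def put(self, local_path: str, uri: str) -> str: ...

    @abstractmethod
    def get(self, uri: str, local_path: str) -> str: ...


class LocalArtifactStore(ArtifactStore):
    """file:// backend: copies artifacts under a root directory."""

    def __init__(self, root: str):
        self.root = Path(root)

    def put(self, local_path, uri):
        dest = self.root / uri.removeprefix("file://")
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(local_path, dest)
        return uri

    def get(self, uri, local_path):
        shutil.copy(self.root / uri.removeprefix("file://"), local_path)
        return local_path


def get_artifact_store() -> ArtifactStore:
    # PROTEA_STORAGE_BACKEND selects the backend, as described above.
    backend = os.environ.get("PROTEA_STORAGE_BACKEND", "local")
    if backend == "local":
        return LocalArtifactStore(os.environ.get("PROTEA_STORAGE_ROOT", "./artifacts"))
    raise ValueError(f"unsupported backend: {backend}")
```

Keeping `get_artifact_store()` as the single factory means callers like export_research_dataset never branch on the backend themselves.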

Operations
* TrainRerankerOperation / TrainRerankerAutoOperation are unregistered
  in operation_catalog.py — they survive only as in-process helpers
  that ExportResearchDatasetOperation drives in dump_only mode.
* export_research_dataset uploads train.parquet / eval.parquet /
  manifest.json via the configured ArtifactStore and inserts a Dataset
  row keyed by output_name.
* protea.core.reranker exposes prepare_dataset / predict /
  model_from_string for inference; schema_sha validation is load-bearing.

HTTP surface
* New routers: /datasets (registry + import) and /reranker-models
  (multipart import, import-by-reference, list/get).
* app.py wires both routers + ArtifactStore startup.

Scripts
* dump_reranker_dataset.py adapted to the new pipeline.
* materialize_lab_intervals.py: new helper that creates EvaluationSet
  + QuerySet rows for every snapshot pair the lab needs.

Tests
* test_storage covers both backends.
* test_datasets_and_reranker_import_smoke exercises the end-to-end
  publish → pull → import-by-reference flow.
* test_reranker / test_train_reranker trimmed to inference-only paths.
EvaluationSet gains a groundtruth_uri column (alembic 76cafcb8d9be) that
points to a frozen parquet of the snapshotted ground-truth annotations,
materialised once and reused across run_cafa_evaluation invocations
instead of being recomputed from the live ORM each time. The parquet
lives in the configured artifact store and is content-addressed.

* generate_evaluation_set writes the parquet on creation; existing
  rows are backfilled by scripts/backfill_evaluation_groundtruth.py.
* load_evaluation_data_for_set reads the parquet via the artifact store
  when groundtruth_uri is set, falling back to the legacy ORM path.
* run_cafa_evaluation consumes the parquet directly, eliminating the
  per-run BFS over GOTermRelationship for ground truth resolution.
* load_goa_annotations adds canonical-accession filtering improvements
  measured against the lab's intervals.
* annotations router exposes the groundtruth-related endpoints and
  switches to BenchmarkConfig dependency for IA file resolution.

Worker layout
* Add a dedicated worker-evaluations process on protea.evaluations so
  long Fmax / AuPRC runs don't block the general protea.jobs queue.
* Add the missing worker-predictions-coord process (protea.predictions
  coordinator was previously not started by manage.sh).

Tests
* test_load_goa_annotations covers the canonical-accession filtering.
* test_evaluation_parquet_roundtrip locks the serialise/deserialise
  contract for ground-truth data.
* test_knn_streaming_smoke exercises the streaming KNN path that
  feeds reranker datasets.

scripts/overnight_matrix.py: new orchestrator for the 8-PLM canonical
benchmark (bootstrap + predict ×8 + eval ×8) — drives long unattended
runs against the lab's interval matrix.
Empty Python packages with zero importers anywhere in the tree:
* protea/cli/ + protea/cli/commands/ (CLI scaffolding never built)
* protea/utils/
* protea/api/schemas/
* protea/api/services/

Orphan scripts (no references in code, docs, scripts, or pyproject.toml):
* Phase A/B ablation cohort: cross_scoring_launcher,
  feed_evals_phaseA, hybrid_picker_eval, queue_evals_when_ready,
  query_eval_results, run_ablation_predictions, run_ablation_evaluations
* vast.ai deploy/sync: deploy_vast.sh, setup_vast.sh, sync_db_vast.sh
* Profilers / verifiers: profile_predict_batch, verify_embedding_backends
* Stale overnight runs: overnight_v6, overnight_v6_retry,
  overnight_v7, overnight_v8

Misc:
* Drop unused _JOBS_QUEUE constant in embeddings.py.
* .gitignore: exclude data/ (with allow-list for data/benchmarks/),
  artifacts/, results/, var/, docs/_build/, and the entire logs/ dir
  (was only excluding *.log + logs/pids/, leaving *.pid files visible).
Queue routing fixes (operations.rst, system_overview.rst):
* predict_go_terms publishes to protea.predictions, not protea.jobs.
* run_cafa_evaluation publishes to protea.evaluations.
* train_reranker / train_reranker_auto are no longer queued — they
  survive only as unregistered helpers consumed in-process by
  ExportResearchDatasetOperation in dump_only mode.
* export_research_dataset publishes to protea.training (serialised).

Lab decoupling narrative (introduction.rst, system_overview.rst,
core.rst, configuration.rst):
* Document the contract-first split — PROTEA produces the parquet
  triple + manifest; the lab consumes via the artifact store and
  registers boosters back through /reranker-models/import.
* Drop the inline LightGBM training section; replace with the export
  + import flow.

Reference fixes:
* api.rst:282 — /annotations/snapshots/{id}/ia-url is the Information
  Accretion file URL (used to weight CAFA Fmax / AuPRC), not the
  InterPro Archive.
The lab refactor introduced four behavioural changes that the existing
test suite did not yet reflect — every router and operation that touches
artifact storage, ScoringConfig joins, or persisted EmbeddingConfig
display columns produced collateral failures (54 of 1037).

Root causes addressed:

1. Process-global TTL cache in protea.api.cache
   List endpoints (configs, snapshots, prediction sets, protein stats,
   showcase summary) memoise their result for 5 minutes. Tests sharing
   the cache key returned stale data from earlier sibling tests. Added
   an autouse fixture that calls ``invalidate()`` before/after every
   test in the affected modules.
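The staleness mechanism and the fixture's effect can be sketched with a toy TTL cache. The cache and `invalidate()` here are stand-ins for `protea.api.cache`, and the fixture shape shown in the comment is the assumed pytest idiom, not the actual suite code:

```python
# Dependency-free sketch of a process-global TTL cache and why tests
# sharing a key see stale data; toy stand-in for protea.api.cache.
import time

_CACHE: dict = {}  # key -> (expires_at, value)
TTL_SECONDS = 300  # the 5-minute memoisation window from the text


def cached(key, compute):
    now = time.monotonic()
    hit = _CACHE.get(key)
    if hit and hit[0] > now:
        return hit[1]  # may be a sibling test's stale result
    value = compute()
    _CACHE[key] = (now + TTL_SECONDS, value)
    return value


def invalidate():
    _CACHE.clear()


# The autouse fixture described above would wrap every test like:
#
#   @pytest.fixture(autouse=True)
#   def _fresh_cache():
#       invalidate()
#       yield
#       invalidate()
```

Without the fixture, the second call within the TTL returns the first test's value regardless of what `compute` would now produce.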

2. ArtifactStore now reached for real
   run_cafa_evaluation, generate_evaluation_set, and the annotations
   router DELETE/download endpoints all call get_artifact_store(...)
   inline, which tries to reach MinIO at localhost:9000 in test
   environments. Added autouse / per-class fixtures that patch
   get_artifact_store + load_settings to a MagicMock so unit tests
   exercise just the orchestration path.
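The substitution pattern looks roughly like this; `upload_results` is a hypothetical stand-in for the orchestration path under test, not PROTEA code:

```python
# Dependency-free sketch of the MagicMock substitution: the unit test runs
# the orchestration path against a fake store (never reaching MinIO at
# localhost:9000) and asserts the store was consulted.
from unittest.mock import MagicMock


def upload_results(store, paths):
    # Hypothetical stand-in for run_cafa_evaluation staging artifacts
    # and calling store.put on each one.
    for p in paths:
        store.put(p, f"file://results/{p}")


def test_store_is_consulted():
    store = MagicMock()  # records calls instead of talking to MinIO
    upload_results(store, ["fmax.json", "auprc.json"])
    assert store.put.call_count == 2
```

In the real suite the MagicMock is installed via an autouse fixture patching `get_artifact_store` (and `load_settings`) on the module under test.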

3. Endpoint signatures evolved:
   - benchmark.list returns a 5th column (prediction_count via
     correlated subquery) — added the column to the production query
     and a ``prediction_count`` field in the response so the frontend
     UI keeps working (apps/web reads p.prediction_count in 4 places).
   - benchmark.matrix select returns 4-tuples
     ``(er, embedding_id, k, scoring_name)`` and a separate 7-tuple
     query for eval-set metadata — tests now wire both via
     ``_dual_execute``.
   - showcase.get returns 3-tuples ``(er, cfg, scoring_name)`` —
     tests updated to match.
   - _stage_of(result, scoring_name) — second positional arg is now
     required, "baseline" stage no longer exists (returns None for
     evaluations with neither scoring nor reranker).
   - _describe_embedding heuristics deleted — display_name, family,
     param_count are persisted EmbeddingConfig columns. The dead
     TestDescribeEmbedding class was removed; remaining tests set the
     columns explicitly on _make_cfg fixtures.

4. RunCafaEvaluationPayload no longer carries artifacts_dir
   The op now stages artifacts in a tempfile.TemporaryDirectory and
   uploads via store.put — replaced the two artifacts_dir tests with
   one that asserts the store is consulted.

Test fixtures updated:
* test_api / _make_app: added missing ``operation_registry`` state.
* test_annotations_router / test_benchmark_router / _make_app: added
  missing ``benchmark_config`` state.
* test_api_query_sets: corrected
  ``test_preserves_full_description`` — _parse_fasta unwraps
  ``sp|P12345|FOO_HUMAN`` to ``P12345`` (preserves the full header in
  the description, not the accession).

Production change (small): ``embeddings.list_prediction_sets`` regained
the prediction_count correlated subquery so the frontend prediction-set
list shows ``"<n> preds"`` again. This restores parity with the lab
refactor's intent — the test mock always expected this column.

Result: 1037 passed, 10 skipped, 0 failed.
Two modules with near-identical names lived next to each other:

* protea.core.reranker  — feature schema + predict / model_from_string
* protea.core.reranking — booster cache + load_reranker / apply_reranker

reranking.py had a single importer (predict_go_terms.py). Inlining its
public surface into reranker.py removes the naming trap (grep "reranker"
no longer turns up two unrelated files) and consolidates everything the
inference path needs into one place.

* load_reranker, apply_reranker, infer_active_feature_families and the
  process-local _BOOSTER_CACHE move verbatim into reranker.py.
* predict_go_terms.py now imports both feature and loader helpers from
  the merged module.
* reranking.py is removed.

No behaviour change. 1037/1037 tests still pass.
TrainRerankerOperation was unregistered after the lab decoupling and
contained zero ``self`` references — its 10 methods were already pure
functions wrapped in a class only so TrainRerankerAutoOperation could
share helpers. Removing the wrapper drops ~250 LOC of dead-but-loaded
code (file: 2009 → 1757 LOC).

Helpers actually used by TrainRerankerAutoOperation (now module-level
functions in the same file):
* ``_load_parent_map``
* ``_preload_all_embeddings``
* ``_build_reference_from_cache``
* ``_load_sequences``
* ``_load_taxonomy_ids``
* ``_knn_transfer_and_label``

Helpers with no production callers (deleted):
* ``_validate``               — payload + name/duplicate checks
* ``_load_go_maps``           — covered by ``_load_parent_map``
* ``_load_reference_per_aspect`` — alternative to _build_reference_from_cache
* ``_load_query_embeddings``  — Auto loads its own query embeddings
* ``summarize_payload``       — never called once the op left the registry

Inside TrainRerankerAutoOperation.execute, every
``self._single._foo(...)`` is rewritten to ``_foo(...)`` and the
``_single`` attribute is gone.

Tests adapted:
* test_train_reranker.py — kept the payload-validation tests and the
  two helpers that survived (_load_sequences / _load_taxonomy_ids);
  dropped the test classes for deleted helpers.
* test_knn_streaming_smoke.py — calls _knn_transfer_and_label
  directly instead of through op._knn_transfer_and_label.

Suite stays green at 1028 / 1028.
predict_go_terms.py was a 2383 LOC god file mixing three concerns: I/O
caching of reference embeddings, PCA artifact persistence, and the actual
predict / batch / store operation classes. The first two have nothing
to do with the operations themselves — they are reusable persistence
helpers that happen to be called from the predict path.

* protea/core/disk_cache.py — reference-pool + per-aspect index +
  annotation-CSR caches under data/ref_cache/. Exports
  ``_disk_cache_paths``, ``_aspect_index_path``,
  ``_anno_disk_cache_paths``, ``_build_anno_csr``,
  ``_load_anno_csr_from_disk``, ``_save_anno_csr_to_disk``,
  ``_csr_lookup``, ``_derive_reference_views``,
  ``_load_from_disk_cache``, ``_save_to_disk_cache``.

* protea/core/pca_cache.py — per-EmbeddingConfig PCA artifact under
  protea/artifacts/pca/{id}.npz. Exports ``_pca_state_path``,
  ``_load_pca_state``, ``_save_pca_state``, ``_load_or_fit_pca_state``.

* predict_go_terms.py imports the helpers from their new homes; no
  call sites change name or signature.

* test_predict_go_terms.py imports the disk-cache helpers directly
  from ``protea.core.disk_cache`` rather than re-exported via
  ``predict_go_terms``.

predict_go_terms.py shrinks 2383 → 2129 LOC (~10% smaller, the
remaining bulk is the three operation classes which the next step
splits separately). Suite stays green at 1028 / 1028.
…er_and_label

The function used to take 20 keyword arguments — a textbook Long
Parameter List smell that made every call site span ~15 lines and made
parameter changes risky (refactoring.guru: Bloater · Long Parameter
List → Introduce Parameter Object).

Two natural clusters extracted as small frozen dataclasses:

* ``SequenceContext`` — the four optional per-protein lookups
  (``query_sequences``, ``ref_sequences``, ``query_tax_ids``,
  ``ref_tax_ids``) used to drive alignment and taxonomy features.
* ``StreamOutput`` — the streaming-parquet I/O config
  (``output_parquet``, ``chunk_rows``) used in dump_only mode to keep
  peak memory bounded.

``pivot_go_ids`` stays as its own kwarg — it filters records by
go_id and is orthogonal to whether streaming is enabled (used by
``test_pivot_filter_drops_non_pivot_terms`` in non-streaming mode).
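The two parameter objects can be sketched as follows. Field names come from the text; defaults, types, and the unpacking idiom are assumptions:

```python
# Hypothetical sketch of the Introduce Parameter Object refactor described
# above; not the actual train_reranker code.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SequenceContext:
    """Optional per-protein lookups driving alignment/taxonomy features."""

    query_sequences: Optional[dict] = None
    ref_sequences: Optional[dict] = None
    query_tax_ids: Optional[dict] = None
    ref_tax_ids: Optional[dict] = None


@dataclass(frozen=True)
class StreamOutput:
    """Streaming-parquet I/O config for dump_only mode (bounds peak memory)."""

    output_parquet: Optional[str] = None
    chunk_rows: int = 100_000  # assumed default


def _knn_transfer_and_label(*, seq_ctx=SequenceContext(),
                            stream=StreamOutput(), pivot_go_ids=None, **rest):
    # The body unpacks the dataclasses into the same local names as before,
    # so only the signature and call sites change.
    query_sequences = seq_ctx.query_sequences
    output_parquet = stream.output_parquet
    ...
```

Freezing the dataclasses keeps the parameter objects value-like: call sites can share one instance without risking mutation inside the function.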

Result: signature shrinks 20 → 15 named parameters; the body is
unchanged (the dataclasses are unpacked into the same local names at
the top of the function). All three call sites updated.

Suite stays green at 1028 / 1028.
stage classification logic was duplicated in two routers:

* benchmark.py defined ``_RERANKER_STAGE``, ``_stage_of()`` and
  ``_stage_kind()``.
* showcase.py inlined the same conditional with a comment "Matches
  benchmark.py semantics without cross-importing" — an explicit
  acknowledgement of the duplication.

That is the textbook Dispensable · Duplicate Code smell
(refactoring.guru → Extract Method, then Move Method to a shared home).

Created ``protea/api/stages.py`` exporting
``RERANKER_STAGE`` / ``stage_of`` / ``stage_kind`` / ``StageKind``.

* benchmark.py re-imports them under the legacy private aliases
  (``_RERANKER_STAGE``, ``_stage_of``, ``_stage_kind``) so the rest
  of the file is unchanged.
* showcase.py replaces the 6-line ``if/elif/else`` ladder with a
  single ``stage = stage_of(er, scoring_name)`` call.

No behaviour change. 1028 / 1028 still green.
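A minimal sketch of the shared module, assuming the stage semantics described above (reranker wins, then scoring name, else None — no "baseline"); the constant's value and the `StageKind` buckets are assumptions:

```python
# Hypothetical sketch of protea/api/stages.py; the classification order is
# taken from the text, everything else is assumed.
from enum import Enum

RERANKER_STAGE = "reranked"  # assumed value


class StageKind(str, Enum):
    SCORED = "scored"
    RERANKED = "reranked"


def stage_of(er, scoring_name):
    """Classify an evaluation result; None when it has neither scoring
    nor reranker (the old "baseline" stage no longer exists)."""
    if getattr(er, "reranker_model_id", None) is not None:
        return RERANKER_STAGE
    if scoring_name:
        return scoring_name
    return None


def stage_kind(stage):
    """Coarse bucket for the UI; assumed mapping."""
    if stage is None:
        return None
    return StageKind.RERANKED if stage == RERANKER_STAGE else StageKind.SCORED
```

With this in place, showcase.py's six-line ladder collapses to `stage = stage_of(er, scoring_name)` and benchmark.py keeps its private aliases as re-imports.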
GO aspect strings appeared as bare literals in 30+ places — both as
single-char codes ("P"/"F"/"C", the wire format in PostgreSQL and
go-basic.obo) and as three-char CAFA codes ("BPO"/"MFO"/"CCO", what
cafaeval and the UI use). That is the textbook Bloater · Primitive
Obsession smell (refactoring.guru → Replace Type Code with Class).

This commit lands the new domain type; consumers migrate in follow-up
commits to keep each diff focused and reviewable.

* protea/core/domain/aspect.py — ``Aspect`` enum with
  ``BIOLOGICAL_PROCESS`` / ``MOLECULAR_FUNCTION`` / ``CELLULAR_COMPONENT``
  members and ``.code`` / ``.cafa`` properties for the two encodings;
  ``Aspect.from_code()`` / ``from_cafa()`` build instances from the
  legacy strings at the boundary.
* protea/core/domain/__init__.py — package marker, deliberately
  free of infrastructure imports so the module can be imported from
  anywhere in core/ without cycles.
* ASPECT_CODES / ASPECT_CAFA_CODES module constants for callers that
  iterate via tuple destructuring.
* tests/test_aspect.py — 21 cases covering the two encodings,
  parameterised over all three aspects, with explicit roundtrip and
  invalid-input assertions.

Suite: 1028 → 1049 (+21 new).
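The domain type described above can be sketched like this. Member, property, and constructor names come from the text; the concrete implementation is an assumption:

```python
# Sketch of protea/core/domain/aspect.py following the description above.
from enum import Enum


class Aspect(Enum):
    BIOLOGICAL_PROCESS = ("P", "BPO")
    MOLECULAR_FUNCTION = ("F", "MFO")
    CELLULAR_COMPONENT = ("C", "CCO")

    @property
    def code(self) -> str:
        """Single-char wire format (PostgreSQL, go-basic.obo)."""
        return self.value[0]

    @property
    def cafa(self) -> str:
        """Three-char CAFA code (cafaeval, the UI)."""
        return self.value[1]

    @classmethod
    def from_code(cls, code: str) -> "Aspect":
        for a in cls:
            if a.code == code:
                return a
        raise ValueError(f"unknown aspect code: {code!r}")

    @classmethod
    def from_cafa(cls, cafa: str) -> "Aspect":
        for a in cls:
            if a.cafa == cafa:
                return a
        raise ValueError(f"unknown CAFA code: {cafa!r}")


# Module constants for callers that iterate via tuple destructuring.
ASPECT_CODES = tuple(a.code for a in Aspect)       # ("P", "F", "C")
ASPECT_CAFA_CODES = tuple(a.cafa for a in Aspect)  # ("BPO", "MFO", "CCO")
```

`from_code()` / `from_cafa()` convert the legacy strings once at the boundary; inside core/, code passes `Aspect` members around instead of bare literals.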
Six modules used to hardcode the GO aspect tuple ("P", "F", "C") or its
CAFA equivalent ("BPO", "MFO", "CCO") inline. They now import the
canonical constants from ``protea.core.domain.aspect`` so a future
addition or rename happens in one place.

* predict_go_terms.py + train_reranker.py — use ASPECT_CODES
  (single-char wire codes).
* showcase.py + benchmark_config.py + annotations.py (download path) —
  use ASPECT_CAFA_CODES (three-char CAFA codes).
* run_cafa_evaluation.py — _NS_LABELS now derives from the enum's
  ``.cafa`` property; _NS_SHORT is the set view of ASPECT_CAFA_CODES.
* train_reranker._ASPECT_NAMES — built as a comprehension over the
  enum so the lower-cased CAFA suffixes (bpo/mfo/cco) used in model
  names stay in sync with the canonical encodings.

No behaviour change. Suite stays at 1049 / 1049.
The 200-LOC method computed five independent intermediates
(metadata collection, Anc2Vec pool, neighbor centroids, tax-voter
counters, PCA projection) and then merged them per-row. Each
intermediate is now its own private static method so the orchestrator
reads as a six-line pipeline instead of a wall of inline computation
(refactoring.guru: Bloater · Long Method → Extract Method).

New private static methods on PredictGOTermsBatchOperation:

* ``_collect_gtids_in_play`` — gather every go_term_id seen as
  candidate or neighbor annotation.
* ``_build_anc2vec_pool`` — materialise the Anc2Vec embedding matrix
  + has-emb mask + index.
* ``_compute_neighbor_centroids`` — per-(q_acc, aspect) centroid + nmat.
* ``_compute_tax_voter_counters`` — five per-(q_acc, gtid) dicts that
  feed the tax_voters_* columns.
* ``_compute_pca_projection`` — query embeddings × PCA components.

The merge loop stays inline (it has 16 dependencies; extracting it
would just trade a long method for an even longer parameter list).

No behaviour change — the new helpers are pure rearrangement of the
original code. 1049 / 1049 still green.
The 258-LOC method ran the per-aspect KNN, loaded feature-engineering
inputs, pre-computed reranker stats, and merged everything into a
prediction list. The first two phases are clean independent units;
extracting them as helpers turns the orchestrator's prologue from
50 LOC of inline detail into two readable calls
(refactoring.guru: Bloater · Long Method → Extract Method).

New private methods on PredictGOTermsBatchOperation:

* ``_run_knn_per_aspect`` — three independent KNN searches, returns
  ``(neighbors_by_aspect, all_unique_neighbors)``.
* ``_load_feature_engineering_data`` — loads sequences and taxonomy
  IDs only for the flags that are on; returns the four lookup dicts
  the per-pair feature builder consumes downstream.

The remaining merge phase (per-aspect predictions emission with shared
reranker aggregates) stays inline — its 130 LOC carry too many
mutually-aliased dicts (vote_count, k_position, vote_min_d, vote_sum_d,
go_term_freq, ref_ann_density, pair_features, seen_per_query) for
extraction to do anything but trade a long method for an even longer
parameter list.

No behaviour change. 1049 / 1049 still green.
…odule

The seven v6-feature methods on ``PredictGOTermsBatchOperation``
(``_enrich_with_v6_features``, ``_load_go_term_metadata``,
``_collect_gtids_in_play``, ``_build_anc2vec_pool``,
``_compute_neighbor_centroids``, ``_compute_tax_voter_counters``,
``_compute_pca_projection``) used no instance state, ran in sequence,
and were called from a single site in ``execute``. That is the
textbook Bloater · Large Class smell — they are cohesive enough to
become their own module (refactoring.guru → Extract Class).

* protea/core/feature_enricher.py — new module exposing two public
  symbols: ``enrich_v6_features`` (the orchestrator) and
  ``NEW_V6_FEATURE_KEYS`` (the 25 column names downstream code
  composes into the bulk-insert schema). All six stage helpers stay
  module-private.

* predict_go_terms.py — drops the seven methods and the two
  v6-related constants (``_TAX_CLOSE_RELATIONS`` and
  ``_NEW_V6_FEATURE_KEYS``); imports ``enrich_v6_features`` and
  re-imports ``NEW_V6_FEATURE_KEYS`` under the legacy private alias
  so ``_STORE_FLOAT_KEYS`` keeps working without further edits.

predict_go_terms.py shrinks 2235 → 1928 LOC (~14% smaller) and
``PredictGOTermsBatchOperation`` is down to 18 attributes. The new
module is independently testable — extending the v6 feature set in
the future no longer means surgery on the batch operation.

No behaviour change. 1049 / 1049 still green.
Pre-existing warnings outside the scope of any specific refactor are
fixed in one mechanical pass so the next contributor starts from a
clean slate (refactoring.guru: Dispensable · Comments / Dead Code +
Composing Methods · Replace Magic Number with Symbolic Constant
adjacent — all auto-fixable hygiene).

* pyproject.toml — add ``[tool.ruff.lint.per-file-ignores]`` to
  silence E402 for ``scripts/*.py``: every runner script in there
  uses the deliberate ``sys.path.insert(0, PROJECT_ROOT)`` pattern
  before its protea imports so it can be invoked as
  ``python scripts/foo.py`` without ``poetry install`` first.

* protea/api/cache.py — ``Callable`` now imported from
  ``collections.abc`` (UP035).

* protea/api/middleware/visitor_counter.py — replaces
  ``timezone.utc`` with the ``datetime.UTC`` alias (UP017).

* protea/api/routers/scoring.py +
  protea/infrastructure/orm/models/visitor_event.py +
  protea/infrastructure/storage/__init__.py — re-sorted import
  blocks (I001).

* protea/api/middleware/visitor_counter.py — drop unused f-string
  prefix (F541).

* tests/test_predict_go_terms.py — drop unused
  ``op = self._op()`` assignment in ``test_no_reranker_leaves_dicts_untouched``
  (F841); the test only needs the payload check.

* scripts/overnight_matrix.py — ruff also auto-cleaned a couple of
  redundant ``else`` branches and import order while we were here.

Result: ``poetry run ruff check protea tests scripts`` is now
green. 1049 / 1049 tests still pass.
The module was folded into ``protea.core.reranker`` in commit
``ccf8c96``; the dangling ``automodule:: protea.core.reranking`` block
made every ``make -C docs html`` build emit a stale-import warning.

The narrative the old block carried (``load_reranker`` /
``apply_reranker`` / ``infer_active_feature_families`` semantics) is
preserved as a note pointing readers at the merged module, so the
documentation page is still self-explanatory for new contributors.

Sphinx ``warning`` count drops from 5 to 4. The four remaining are
environment-level (numpy multiarray import races with autodoc when
Torch is loaded first); they don't block the build.
Hot loops in predict / train build per-row dicts straight from a
SQLAlchemy cursor; the DB driver returns each ``qualifier`` and
``evidence_code`` as a fresh Python string even though the value
space is tiny (~5-10 distinct GO evidence codes plus ``None``).
Without interning, every duplicate costs ~50 B in CPython, so a
5 M-row batch carries ~500 MB of redundant string objects.

This is the textbook Flyweight pattern (refactoring.guru): share
the immutable intrinsic state across many context objects via a
small process-local factory.

* ``protea/core/annotation_intern.py`` (new) — exposes
  ``intern_string`` backed by a setdefault-based pool. The pool
  tops out at the cardinality of the GO evidence vocabulary
  (~50 strings in practice); no LRU eviction needed.

* ``predict_go_terms._load_annotations_for`` and
  ``_load_reference_data_per_aspect`` — wrap qualifier and
  evidence_code with intern_string before stuffing them into the
  per-row dict.

* ``train_reranker._load_reference_per_aspect`` — same wrap.

* ``tests/test_annotation_intern.py`` — 8 cases.

Suite: 1057 / 1057 (was 1049, +8 new tests).
…OUP BY

The Phase-2 commit (f33ad15) added a per-row correlated subquery to
return ``prediction_count`` alongside each PredictionSet. Postgres'
planner reliably falls into a per-row index probe on the 25M-row
``go_prediction`` table, turning a 100-row list endpoint into a
~30 s/row sequential count — the /evaluation page timed out at 60 s
because it called this listing on first paint.

Fix: pre-fetch the counts in one ``GROUP BY prediction_set_id`` query
(scans the existing ``ix_go_prediction_prediction_set_id`` index in a
single pass) and merge them into the response in Python, mirroring the
``list_embedding_configs`` pattern. Wrap the whole thing in the same
5-minute ``cached`` helper so subsequent calls are free.
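A sqlite-backed sketch of the two-query shape (table and column names are assumptions drawn from the text, not the production schema):

```python
import sqlite3


def list_prediction_sets(conn: sqlite3.Connection) -> list[dict]:
    """Two fixed queries instead of a per-row correlated subquery:
    one list query plus one GROUP BY count pass, merged in Python."""
    sets = conn.execute(
        "SELECT id, name FROM prediction_set ORDER BY id"
    ).fetchall()
    counts = dict(conn.execute(
        "SELECT prediction_set_id, COUNT(*) "
        "FROM go_prediction GROUP BY prediction_set_id"
    ).fetchall())
    return [
        {"id": sid, "name": name, "prediction_count": counts.get(sid, 0)}
        for sid, name in sets
    ]
```

Sets with no predictions default to 0 via ``counts.get``, matching the endpoint behaviour described in the tests below.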

Measured before / after:

* /embeddings/prediction-sets: 60 000 ms (timeout) → 19 ms cold,
  10 ms warm

Tests:

* test_embeddings_router updated — ``_wire_list_query`` now wires
  ``session.query.side_effect = [list_query, count_query]`` so the
  two-query mock matches the new endpoint shape. The third test
  (``test_annotation_set_without_version``) drops its count assertion
  because ``count_pairs`` is empty for that case (defaults to 0).

Suite stays at 1057 / 1057.
frapercan and others added 28 commits May 5, 2026 16:35
The two store_X operations had identical 30-line implementations
of _update_parent_progress (compute_embeddings.py and
predict_go_terms.py), differing only in the operation-specific
event name passed to emit on parent SUCCEEDED transition.

Extracts to protea.core.contracts.parent_progress.update_parent_progress(
session, parent_job_id, emit, *, event_name). Both operations now
delegate. The DB-level JobEvent row is uniformly named "job.succeeded";
the operation-specific event name only flows through emit() so
downstream observers can distinguish which store closed the parent.
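A behaviour sketch of the shared helper, with duck-typed session accessors (``get_job``, ``mark_succeeded``) standing in for the real ORM queries:

```python
def update_parent_progress(session, parent_job_id, emit, *, event_name):
    """Close the parent job once its last child batch lands (sketch)."""
    parent = session.get_job(parent_job_id)
    if parent is None:
        return                                  # no parent row: stay silent
    parent.completed_batches += 1
    if parent.completed_batches < parent.total_batches:
        return                                  # not the last batch yet
    if not session.mark_succeeded(parent_job_id):
        return                                  # another worker won the race
    emit(event_name, job_id=parent_job_id)      # operation-specific name
```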

5 new tests cover: silent when no row, silent when not last batch,
SUCCEEDED transition + emit, race when succeeded returns nothing,
event_name passthrough.

Suite: 1093 passed, 10 skipped (was 1088 + 5).

Part of F0 T0.7 of master plan v3.
Replaces the inheritance-based UniProtHttpMixin (109 LOC of mixed
state and behaviour) with UniProtHttpClient, a composable class.
Operations hold a client instance via composition rather than
inheritance:

  before: class InsertProteinsOperation(UniProtHttpMixin, Operation)
  after:  class InsertProteinsOperation(Operation):
              def __init__(self):
                  self._http_client = UniProtHttpClient()

State is private to the client (.session, .requests, .retries) and
is reset via .reset() at the start of each execute(). The
extract_next_cursor utility moves to a @staticmethod since it has
no instance state. Operations call:

  self._http_client.get_with_retries(url, p, emit)
  self._http_client.extract_next_cursor(link_header)
  self._http_client.requests / .retries  (for emit fields)

Migrates two operations (insert_proteins, fetch_uniprot_metadata)
and three test files (test_core, test_insert_proteins,
test_fetch_uniprot_metadata) accordingly. test_core renames the
test class to TestUniProtHttpClient and adds test_reset_clears_counters.

Suite: 1094 passed, 10 skipped (was 1093 + 1).

Part of F0 T0.9 of master plan v3.
Systematic inventory of module-level constants and hardcoded defaults in payloads and workers. 31 entries categorised into 5 groups (QueueTuning, WorkerTuning, OperationTuning, APILimits, ResearchKnobs) plus 12 structural config-exempt entries (GAF indices, payload shape constraints, PCA dim). Flags duplication that the externalisation dedupes by construction: _ANNOTATION_CHUNK_SIZE x3, _STREAM_CHUNK_SIZE x2, _MAX_FASTA_BYTES x2.

Direct input for T-CONF.2 (externalisation to pydantic Settings) and T-CONF.3 (autogenerated living documentation).

Part of F0 T-CONF.1 of master plan v3.
Validates a running stack via /health, /health/ready, POST /jobs (ping), polling until succeeded, and the events log. Does not start or stop services (per feedback_no_restart.md). Exits in under 2 s against a healthy local stack and is sized for CI use as well (PROTEA_API_URL + PROTEA_SMOKE_TIMEOUT_S env overrides).

Validated against the live local stack: 1/5 -> 5/5 OK.

Part of F0 T0.5 of master plan v3.
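The poll-until-succeeded step could look roughly like this (the callable seam and the terminal state names are assumptions for illustration, not the script's exact API):

```python
import time


def wait_for_job(fetch_status, timeout_s: float = 60.0, poll_s: float = 1.0,
                 sleep=time.sleep) -> str:
    """Poll a status callable until the job reaches a terminal state.

    ``fetch_status`` stands in for a GET on the job endpoint; injecting
    it (and ``sleep``) keeps the loop testable without a live stack.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in ("succeeded", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s}s")
        sleep(poll_s)
```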
Introduces protea.config.tuning with:
  - QueueTuning pydantic model: publisher_max_attempts,
    publisher_base_delay, oom_max_retries, oom_base_delay,
    oom_max_delay. Defaults match the previous module-level
    constants exactly.
  - TuningSettings root model that composes per-category
    sub-models (more groups land in follow-up turns).
  - get_tuning() loader cached via lru_cache. Hierarchy:
    defaults < protea/config/system.yaml (tuning: section) <
    env vars (e.g. PROTEA_TUNING__QUEUE__PUBLISHER_MAX_ATTEMPTS=20).
  - 19 new tests covering defaults, validation, env coercion,
    yaml override, env-overrides-yaml, missing yaml section.

Migrates the 5 RabbitMQ publisher/consumer constants to read
from QueueTuning at call time:
  - publisher.py: _MAX_ATTEMPTS, _BASE_DELAY removed.
  - consumer.py: _OOM_MAX_RETRIES, _OOM_BASE_DELAY,
    _OOM_MAX_DELAY removed; replaced by qsettings reads
    inside the OOM-handler branch.

Tests: existing publisher and consumer tests pass unchanged
since defaults match prior values. test_queue.py:524 updated
to read from get_tuning() instead of the removed constant.

Suite: 1113 passed, 10 skipped (was 1094 + 19).

Skeleton for the categorisation in docs/CONFIG_INVENTORY.md.
Remaining 4 categories (WorkerTuning, OperationTuning,
APILimits, ResearchKnobs) follow the same pattern in
subsequent T-CONF.2 increments.

Part of F0 T-CONF.2 of master plan v3.
…pers

Renames protea/core/operations/train_reranker.py to
protea/core/training_dump_helpers.py and removes every literal
"train_reranker" snake-case reference from the protea/ subtree.
The helpers (TrainRerankerAutoOperation, TrainRerankerAutoPayload,
StreamOutput, _knn_transfer_and_label, _load_sequences,
_load_taxonomy_ids, _build_reference_from_cache, _preload_all_embeddings,
_load_parent_map, TrainRerankerPayload, SequenceContext) keep their
CamelCase names so existing call sites in tests and
ExportResearchDatasetOperation continue to work via the new path.

Updates:
  - module docstring: removes the "two operations" framing (both were
    unregistered) and explains the helper's surviving role.
  - event strings rebranded train_reranker_auto.* -> dump_helper.*.
  - export_research_dataset.py relay updated accordingly so consumers
    keep seeing export_research_dataset.* events on the wire.
  - constant ``name = "research_dataset_dump_helper"`` (was
    "train_reranker_auto"); the class remains unregistered.
  - comments in feature_enricher.py, parquet_export.py,
    generate_evaluation_set.py, predict_go_terms.py and
    scripts/materialize_lab_intervals.py: rephrased to "the dump helper".
  - tests/test_train_reranker.py renamed to test_training_dump_helpers.py
    (+ import path updated). Same for test_knn_streaming_smoke.py
    imports + mock target.
  - test_datasets_and_reranker_import_smoke.py asserts the new name
    is also unregistered; the historical asserts on the old names
    are gone since "train_reranker" no longer exists in the codebase.

AC verification: ``grep -rn "train_reranker" protea/`` returns 0
hits. The same grep over the whole repo (including tests/, scripts/,
docs/) is also 0 except for one-line *.md docs that document the
historical rename and stay as-is on purpose.

Suite: 1113 passed, 10 skipped (unchanged).

Part of F0 T0.6 of master plan v3.
…rPayload

Continues T0.6 (commit 527e51c) by removing TrainRerankerPayload, the
single-pair payload class that no production code referenced. Used
to live in train_reranker.py for an Operation that was retired when
LightGBM training moved to protea-reranker-lab; the class hung on
because tests in test_training_dump_helpers.py exercised it.

  - Class definition deleted.
  - Helper signature ``_knn_transfer_and_label`` simplified from
    ``p: TrainRerankerPayload | TrainRerankerAutoPayload`` to
    ``p: TrainRerankerAutoPayload``.
  - Cross-reference comments inlined directly into
    TrainRerankerAutoPayload field docstrings (KNN backend rationale,
    ancestor expansion rules, embedding PCA explanation).
  - 15 tests in TestTrainRerankerPayload removed; only the
    helper tests (_load_sequences, _load_taxonomy_ids) remain.
  - Header docstring trimmed to reflect new scope.

LOC reduction: training_dump_helpers.py from 1914 to 1860 LOC.
Suite: 1098 passed, 10 skipped (was 1113 - 15 dead payload tests).

The deeper inline of TrainRerankerAutoOperation into
ExportResearchDatasetOperation is deferred to F2 once the feature
registry is in place; doing it now would balloon
export_research_dataset by 600 LOC of execute() body for marginal gain.

Part of F0 T0.6 of master plan v3.
Second category of the externalised tuning settings. Migrates 9
hardcoded constants from the WorkerTuning rows in CONFIG_INVENTORY:

  - db_pool_size (engine.py:12)            20
  - db_pool_max_overflow (engine.py:13)    40
  - db_pool_recycle_seconds (engine.py:14) 3600
  - model_cache_max (compute_embeddings)   1
  - ref_cache_max (predict_go_terms)       1
  - reaper_main_timeout_seconds (worker)   86400 (was incorrectly
    21600 in the inventory; fixed to match scripts/worker.py)
  - reaper_default_timeout_seconds         3600
  - reaper_stall_seconds                   1800
  - api_cache_default_ttl_seconds          300.0

Behavioural:
  - infrastructure/database/engine.py: build_engine() reads pool
    settings from get_tuning().worker.
  - core/operations/compute_embeddings.py: removes _MODEL_CACHE_MAX
    constant; reads dynamically inside _get_or_load_model.
  - core/operations/predict_go_terms.py: removes _REF_CACHE_MAX
    constant; reads dynamically before evicting.
  - api/cache.py: removes _DEFAULT_TTL constant; exposes
    _default_ttl() function for callers that want the resolved
    default. The constant was never imported by anyone; it only
    appeared in __all__.
  - scripts/worker.py: reaper mode reads reaper_main_timeout_seconds
    and reaper_stall_seconds from settings, configurable via
    PROTEA_TUNING__WORKER__REAPER_MAIN_TIMEOUT_SECONDS and
    PROTEA_TUNING__WORKER__REAPER_STALL_SECONDS.

8 new tests in test_tuning.py: WorkerTuning defaults (pool,
cache, reaper), validation (pool>0, reaper>=300), TuningSettings
compose with worker, env override of db_pool_size.

Suite: 1106 passed, 10 skipped (was 1098 + 8).

Two of five categories migrated. OperationTuning, APILimits,
ResearchKnobs follow.

Part of F0 T-CONF.2 of master plan v3.
Third tuning category. Migrates 4 module-level chunk-size constants
that were duplicated across feature_enricher, knn_search,
training_dump_helpers, and predict_go_terms.

OperationTuning fields:
  - annotation_chunk_size  (10_000)  feature_enricher,
    training_dump_helpers, predict_go_terms (5 helper sites total).
  - stream_chunk_size      (2_000)   training_dump_helpers
    (_preload_all_embeddings) and predict_go_terms (_load_query_embeddings).
  - store_chunk_size       (10_000)  predict_go_terms (publishing
    predictions to protea.predictions.write).
  - numpy_query_chunk      (500)     knn_search._search_numpy chunked
    matrix multiplication (caps the n_queries x n_refs distance matrix
    peak around 1 GB for default values).

Removes 8 module-level constants from 4 files; resolves dynamically
inside the helpers via get_tuning().operation.X. Eliminates the
triplicate _ANNOTATION_CHUNK_SIZE / duplicate _STREAM_CHUNK_SIZE
that the inventory flagged in CONFIG_INVENTORY.md §C.
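How a ``numpy_query_chunk``-style cap bounds the peak can be sketched as follows (a hedged illustration of chunked distance computation; the real ``_search_numpy`` may differ):

```python
import numpy as np


def chunked_sq_dists(queries: np.ndarray, refs: np.ndarray,
                     chunk: int = 500) -> np.ndarray:
    """Squared Euclidean distances, chunked over queries.

    Only a (chunk, n_refs) block is materialised per iteration, so the
    peak intermediate stays bounded instead of n_queries x n_refs at once.
    """
    ref_sq = (refs ** 2).sum(axis=1)                 # (n_refs,)
    out = np.empty((len(queries), len(refs)))
    for start in range(0, len(queries), chunk):
        q = queries[start:start + chunk]
        q_sq = (q ** 2).sum(axis=1)[:, None]         # (chunk, 1)
        # ||q - r||^2 = ||q||^2 + ||r||^2 - 2 q.r, via one matmul per chunk
        out[start:start + chunk] = q_sq + ref_sq - 2.0 * (q @ refs.T)
    return out
```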

HTTP retry policy / timeouts in pydantic payloads
(InsertProteinsPayload, LoadGoaAnnotationsPayload, etc.) intentionally
stay where they are. Those are caller-controlled per job, not infra.

3 new tests in test_tuning.py: OperationTuning defaults, validation
floors, env override of annotation_chunk_size.

Suite: 1109 passed, 10 skipped (was 1106 + 3).

Three of five categories migrated. APILimits and ResearchKnobs follow.

Part of F0 T-CONF.2 of master plan v3.
Fourth tuning category. Migrates 4 hardcoded boundary limits from
the FastAPI router layer.

APILimits fields:
  - max_fasta_bytes      (50 MB)  duplicated as _MAX_FASTA_BYTES
    in api/routers/annotate.py and api/routers/query_sets.py;
    the externalisation dedupes by construction.
  - max_comment_length   (500)    api/routers/support.py
  - recent_limit         (20)     api/routers/support.py
  - page_limit           (100)    api/routers/support.py

Behavioural:
  - annotate.py + query_sets.py: read max_fasta_bytes from
    get_tuning().api at request time. Error message now formats
    the configured limit instead of a literal "50 MB" so an
    operator-set override is reflected back to clients.
  - support.py: the SupportCreate pydantic Field's static
    max_length= moves to a field_validator that resolves
    max_comment_length dynamically. The /support GET reads
    page_limit and recent_limit from settings.
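The static-to-dynamic move can be sketched with pydantic v2 (``_max_comment_length`` is a stand-in for the ``get_tuning()`` read; this is not the exact production code):

```python
from pydantic import BaseModel, field_validator


def _max_comment_length() -> int:
    # Stand-in for get_tuning().api.max_comment_length (assumption).
    return 500


class SupportCreate(BaseModel):
    comment: str

    @field_validator("comment")
    @classmethod
    def _enforce_comment_limit(cls, v: str) -> str:
        # Resolved at validation time, so an operator override takes
        # effect without redefining the model (unlike a static max_length=).
        limit = _max_comment_length()
        if len(v) > limit:
            raise ValueError(f"comment exceeds {limit} characters")
        return v
```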

3 new tests in test_tuning.py: APILimits defaults, validation
floors, env override of max_fasta_bytes.

Suite: 1112 passed, 10 skipped (was 1109 + 3).

Four of five categories migrated. Only ResearchKnobs remains (mostly
config-exempt: PCA dim and the N_THRESHOLDS sweep are research-side
methodology constants documented in CONFIG_INVENTORY §E).

Part of F0 T-CONF.2 of master plan v3.
Documents the four migrated TuningSettings categories (Queue, Worker, Operation, APILimits) with field/default/purpose tables, YAML and env-override examples, and config-exempt category callouts (PCA dim, N_THRESHOLDS, GAF indices). Lives inside the existing appendix/configuration.rst so the reference is a single document.

Part of F0 T-CONF.3 of master plan v3.
Adds protea-contracts, protea-method, protea-sources, protea-runners and protea-backends as develop=true path-deps under [tool.poetry.group.plugins.dependencies]. Install with poetry install --with plugins.

End-to-end discovery verified: importlib.metadata.entry_points(group='protea.sources|runners|backends') resolves correctly from inside PROTEA's venv: 3 sources (goa/quickgo/uniprot), 3 runners (baseline/knn/lightgbm), 4 backends (ankh/esm/esm3c/t5).

Suite still 1112 passed, 10 skipped.

Part of F0 T0.15 of master plan v3.
Adds .github/workflows/security.yml with two jobs:
  - pip-audit: scans installed dependencies against the OSV database.
    Non-blocking in F0 (the existing surface has 22 known CVEs, all
    in third-party transitive deps; transformers 4.48.x dominates
    with 11 CVEs that need a coordinated bump). Master plan v3
    F-OPS T-OPS.7 will flip this to fail on severity HIGH.
  - bandit: security static analysis against protea/. Runs in HIGH
    severity + HIGH confidence mode at F0 (zero findings now);
    will tighten in F-OPS.

Triggers: push, PR, and a weekly cron (Mon 06:00 UTC) so freshly
disclosed CVEs surface even when no PR has landed.

Inline fixes for the two bandit B324 findings (weak MD5 hash):
  - protea/core/reranker.py: cache key tag in _load_artifact_to_disk.
  - protea/infrastructure/orm/models/sequence/sequence.py: sequence
    dedup key.
Both pass usedforsecurity=False (Python 3.9+ flag) to declare intent;
collision resistance is irrelevant in either context (cache key tag
and dedup hash, not security primitives).

Bandit config in pyproject.toml [tool.bandit]: excludes tests/ and
the lab archeology dump script; skips B404/B603/B101 (subprocess
imports + assert usage) which are project-level acceptable.

Suite: 1112 passed, 10 skipped (unchanged).

Part of F0 T0.4 of master plan v3.
Removes the duplicated definitions of feature schema, payload classes
and ProteaPayload base from PROTEA. They now live exclusively in
``protea-contracts`` (v0.1.0). PROTEA modules re-export the names from
their original module locations so existing imports keep working;
new code should import from ``protea_contracts`` directly.

Files touched:

  - protea/core/reranker.py
    - Drops 73 lines of NUMERIC_FEATURES / EMBEDDING_PCA_DIM /
      CATEGORICAL_FEATURES / ALL_FEATURES / LABEL_COLUMN definitions.
    - Re-exports the same names from protea_contracts.
    - fit_embedding_pca remains local (it's logic, not contract).

  - protea/core/contracts/operation.py
    - Drops the 11-line ProteaPayload class definition.
    - Re-exports it from protea_contracts.
    - Drops the now-unused ``BaseModel, ConfigDict`` import.

  - protea/core/operations/predict_go_terms.py
    - Drops 119 lines of PredictGOTermsPayload /
      PredictGOTermsBatchPayload / StorePredictionsPayload classes.
    - Re-exports them from protea_contracts.
    - Drops now-unused imports (Annotated, Field, field_validator) and
      the local PositiveInt alias.

Net diff: -218 / +30 in PROTEA. Logic preserved exactly: every
existing call site (15 files imported one of these names) keeps
working through the re-exports.

Suite: 1112 passed, 10 skipped (unchanged). The protea-contracts
suite (71 passed, cov 95%) covers the moved definitions; PROTEA's
existing tests cover the integration.

Part of F1 T1.5 of master plan v3.
Pins the contract between protea_contracts (canonical) and PROTEA's re-exports / future registry. 14 tests in 4 classes:

  - TestReexportIdentity (7 tests, active): every constant PROTEA still re-exports must be the same object as protea_contracts (ALL_FEATURES, NUMERIC, CATEGORICAL, EMBEDDING_PCA_DIM, LABEL_COLUMN, ProteaPayload, the 3 predict payloads). Hard guarantee that 'from protea.core.reranker import ALL_FEATURES' will not silently diverge from 'from protea_contracts import ALL_FEATURES'.

  - TestShaConsistency (2 tests, active): compute_schema_sha produces the same digest regardless of caller path; pinned to the golden 145592ed186c so PROTEA CI fails before the booster cache invalidates.

  - TestFeatureFamilyCoverage (3 tests, active): every family member lives in ALL_FEATURES; emb_pca family size matches EMBEDDING_PCA_DIM; canonical naming.

  - TestRegistryCoversContracts (2 tests, skipped): activates automatically when F2B.1 ships protea/core/features/registry.py; asserts set(REGISTRY.names()) == set(ALL_FEATURES) and family map equality.

Suite: 1124 passed, 12 skipped (was 1112 + 12 active + 2 dormant).

Part of F1 T1.7 of master plan v3.
…ference

Two boundary validations against the canonical protea_contracts schema:

Export side (parquet_export.py): before writing train/eval parquets, compute compute_schema_sha([c for c in shard.columns if c in ALL_FEATURES]) and compare to compute_schema_sha(ALL_FEATURES). Mismatch raises ValueError with the missing/extras list, instead of silently shipping a partial dump that LightGBM training would choke on. Pure invariant check; the legacy schema_sha hash in the manifest is unchanged (T1.6 of master plan v3 owns the migration to schema_sha_v2).
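The export-side invariant can be sketched like this (the digest details of ``compute_schema_sha`` here are an assumption, not the canonical protea_contracts implementation):

```python
import hashlib


def compute_schema_sha(columns: list[str]) -> str:
    # Assumption: digest over the ordered column names; the real helper
    # may differ in encoding and digest length.
    return hashlib.sha256("\n".join(columns).encode()).hexdigest()[:12]


def check_export_columns(shard_columns: list[str], all_features: list[str]) -> None:
    """Raise instead of silently shipping a partial dump (sketch)."""
    present = [c for c in shard_columns if c in all_features]
    if compute_schema_sha(present) != compute_schema_sha(all_features):
        missing = sorted(set(all_features) - set(present))
        extras = sorted(set(shard_columns) - set(all_features))
        raise ValueError(
            f"feature schema mismatch: missing={missing} extras={extras}"
        )
```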

Inference side (predict_go_terms._apply_reranker_if_aligned): switches the import of compute_feature_schema_sha from protea_reranker_lab.contracts to protea_contracts. Functions are byte-identical so behaviour is preserved; the canonical source is now protea_contracts (single source of truth).

5 new tests in test_parquet_export_boundary.py: full columns pass, missing column in train raises, missing column in eval raises, typo feature name raises, empty eval shard skipped.

Suite: 1129 passed, 12 skipped (was 1124 + 5).

Part of F1 T1.8 of master plan v3.
Replaces the hardcoded if/elif chain in compute_embeddings._load_model
with discovery via the protea.backends entry_points group. The four
backend plugins (esm, t5, ankh, esm3c) shipped by protea-backends are
now resolved dynamically; adding a new backend is a pyproject entry
plus a class — no edits to compute_embeddings required.

Scope of this refactor:

  - Module-level _load_model now calls _resolve_backend(model_backend)
    + plugin.load_model(model_name, device, emit). The (model, tokenizer)
    return shape stays exactly the same (tokenizer is None for ESM-C,
    matching the legacy path).
  - The legacy "auto" alias maps to "esm" exactly as before.
  - Plugin discovery is cached in module-level _BACKEND_PLUGINS and
    populated on first call (lazy: avoids running entry_points scan
    at import time).
  - Plugin name attribute is asserted to match its entry_point name
    on first load. Silent drift would yield confusing "unknown
    backend" errors; we'd rather fail loud.

Out of scope (deferred to F2C):

  - _embed_batch dispatch keeps the legacy if/elif chain calling
    _embed_esm / _embed_t5 / _embed_ankh / _embed_esm3c. The
    plugin's embed_batch returns a flat (batch_size, hidden_dim)
    ndarray, while the legacy _embed_* return
    list[list[ChunkEmbedding]] with full chunk + layer + pooling
    support. The contract extension is a separate task; this commit
    only swaps the load path where the API signatures already line up.
  - Cov gate bump in protea-backends CI: deferred until an
    integration runner installs an extra and exercises the plugin's
    load_model. Bumping the gate to 25% on the strength of unit
    tests alone would just be theatre.

Tests:

  - tests/test_compute_embeddings_backend_dispatch.py: 7 new tests
    covering plugin discovery, entry_point/name parity, "auto" alias,
    unknown-backend error path, _load_model emit/delegate behaviour,
    cache identity, and re-import semantics.

Suite: PROTEA 1136 passed, 12 skipped (was 1129 / 12; +7 new).
Plugin discovery confirmed working from the PROTEA venv:

    >>> from importlib.metadata import entry_points
    >>> {ep.name for ep in entry_points(group="protea.backends")}
    {'ankh', 'esm', 'esm3c', 't5'}

Pairs with the protea-backends 011b27d commit declaring per-backend
optional dependency extras.

Part of F2A.5 of master plan v3.
Map every strategic decision in master plan v3 (2026-05-05) to a navigable
ADR stub under docs/source/adr/. Uniform format per file: status (Accepted,
Pending, Deferred or Obsolete), date, phase introduced, gate (if pending),
context (2-3 sentences), decision (1-2 sentences), consequences (1-2
bullets) and resolution.

Index reorganised into two layers:

- Implementation decisions (001-008): runtime, ORM, queue topology and
  similar choices that surfaced while building.
- Strategic decisions (D1-D30): plan-level decisions from the master plan.

Each row in the strategic table carries a status badge so the open work
is visible at a glance.

Eight gates pending human action explicitly listed: D4 API versioning
(gate F4), D6 authentication (gate F5), D7 observability stack
(gate F-OPS), D10 schema_sha v2 migration (T1.6 gate D10), D25 HPC mode
(gate F-OPS), D27 image registry (gate F-OPS), D28 secrets management
(gate F-OPS), D29 release pipeline (gate F-OPS).

Sphinx build clean: build succeeded, 4 pre-existing warnings (none from
the new files).
First Level-1 plugin migration: ``LoadGOAAnnotationsOperation``
delegates HTTP + gzip + GAF parsing to
``protea_sources.goa.GoaSource.stream``, becoming a thin persistence
adapter that owns DB filtering, GO-term resolution, dedup, and
``pg_insert``. Pairs with ``protea-sources/d1d60f6``
(``GoaSource.stream`` real implementation) and
``protea-contracts/20987a5`` (``GoaStreamPayload`` +
``GoaAnnotationRecord``).

What moved out:

  * ``_stream_gaf`` body (~30 LOC of HTTP/gzip/parsing): now a
    one-liner that constructs a typed ``GoaStreamPayload`` and yields
    from ``goa_plugin.stream``.
  * Eight ``_IDX_*`` GAF column constants: now in protea-sources.
  * ``import gzip``, ``import io``, ``import requests``: removed —
    the plugin owns the network and decode layers.

What stayed:

  * ``_load_accessions`` (canonical-accession universe).
  * ``_load_go_term_map`` (GO-id → term-id).
  * ``_store_buffer`` (dedup + pg_insert with on_conflict_do_nothing).
    Now consumes ``GoaAnnotationRecord`` via attribute access
    (``rec.accession``) instead of dict access (``rec["accession"]``).
  * ``_maybe_enqueue_atomic_eval`` (auto-eval child job).
  * Operation lifecycle, ``LoadGOAAnnotationsPayload`` validation,
    ``OperationResult`` shaping.

Tests updated, not extended:

  * ``_make_record`` test fixture now constructs ``GoaAnnotationRecord``
    instances; ``with_from=""`` becomes ``with_from=None`` (semantically
    identical, the old code converted "" → None at insert time).
  * ``TestStreamGaf`` patches now target
    ``protea_sources.goa.requests.get`` instead of the operation-local
    ``requests.get``. Assertions migrated from dict access to
    attribute access.
  * ``rec.copy()`` → ``rec.model_copy()`` (pydantic v2 deprecation).

Behavioural parity:

  * ``_store_buffer`` still does ``rec.accession.strip()`` for the
    DB-lookup field (parser preserves raw GAF columns; strip happens
    where the lookup needs it). Same observable behaviour as before.
  * Empty optional fields ("" → None) handled by the parser at the
    boundary, not by the operation. No DB-insert diff.
  * Dedup key ``(set_id, accession, go_term_id, evidence_code)``,
    ``on_conflict_do_nothing(constraint=...)`` constraint, page-level
    commit policy: all preserved verbatim.

Suite: 1136 passed, 12 skipped (= unchanged from master). The 54
``test_load_goa_annotations`` cases all pass on the new boundary.

Why Level 1 only (the design discipline):

protea-sources is a leaf C-stack package; importing
``protea.infrastructure.orm.*`` would invert the dependency
direction. Level 1 (HTTP + parsing) cuts cleanly along the
SQLAlchemy boundary; Level 2 (move the operation entirely) waits for
F2C ORM extraction. See ``~/Thesis/f2a6_real_migration_design.md``.

Pattern locked for the remaining migrations (QuickGO, UniProt
FASTA, UniProt metadata): typed ``<Name>StreamPayload`` +
``<Name>Record`` in protea-contracts, ``<Name>Source.stream`` in
protea-sources, operation refactor here.

Part of F2A.6-real migration plan (master plan v3).
…s plugin

Second Level-1 plugin migration. LoadQuickGOAnnotationsOperation
delegates HTTP + TSV streaming + ECO mapping to the protea-sources
QuickGoSource plugin, becoming a thin persistence adapter. Pairs
with protea-sources/f37dfce (real plugin) and protea-contracts/
c5433ed (typed payloads + record).

What moved out:

  * _stream_quickgo body (~70 LOC of batching, HTTP, TSV parsing):
    now a one-liner constructing a typed QuickGoStreamPayload and
    yielding from quickgo_plugin.stream.
  * _fetch_quickgo_page method: deleted entirely. Plugin owns the
    per-batch HTTP fetch.
  * _load_eco_mapping body (~13 LOC of HTTP + line parsing): now
    one call to quickgo_plugin.fetch_eco_mapping(EcoMappingPayload(
    url=...)). Operation keeps the wrapper for the empty-URL short
    circuit (returns {} when eco_mapping_url is None).
  * import io, import requests: removed from the operation module.

What stayed:

  * _load_accessions (canonical + protein accession universes).
  * _load_go_term_map (GO-id -> term-id).
  * _store_buffer (dedup + ECO map application + pg_insert with
    on_conflict_do_nothing). Now consumes QuickGoAnnotationRecord
    via attribute access (rec.accession) instead of dict access
    (rec["GENE PRODUCT ID"]).
  * Operation lifecycle, LoadQuickGOAnnotationsPayload validation
    (which keeps page_size, total_limit, commit_every_page knobs
    that are operation-side concerns and don't belong in the
    plugin payload).

Tests updated:

  * _record(...) helper builds QuickGoAnnotationRecord instances
    from kwargs; replaces the verbose dict-literal _QUICKGO_ROWS.
  * TestStoreBuffer (~9 tests) consumes records, not dicts. The
    test_empty_eco_id_becomes_none test now passes eco_id=None
    directly (parser-side empty-cell handling). The
    test_empty_accession_skipped test was renamed to
    test_unknown_accession_skipped: whitespace handling moved to
    the parser in protea-sources, so the operation only sees
    accessions that don't match valid_accessions.
  * TestLoadEcoMapping (~5 tests): patches and event names swapped
    to source.quickgo.eco_mapping_*. The empty-URL short circuit
    test stayed — operation-side behaviour, not plugin-side.
  * TestStreamQuickgo + TestExecute: patches swapped to
    protea_sources.quickgo.requests.get. Batching event name swap
    to source.quickgo.batching.
  * TestFetchQuickgoPage class deleted entirely (~135 LOC). The
    tests were exercising the parser through HTTP mocks; the
    parser is now in protea-sources where parse_quickgo_row and
    parse_quickgo_tsv have full unit tests. Net -8 tests in PROTEA,
    +9 unit tests in protea-sources for a strictly better surface.

Behavioural parity:

  * Empty cells -> None at parser boundary (matches old _store_buffer
    "or None" handling at insert time). No DB-insert diff.
  * Dedup, on_conflict_do_nothing constraint, ECO map application
    via eco_map.get(rec.eco_id, rec.eco_id): preserved verbatim.
  * Multi-batch URL construction: identical (plugin's
    gene_product_batch_size matches operation's payload field).
  * Event names changed (load_quickgo_annotations.* -> source.quickgo.*
    for plugin-emitted events). Operation-side events unchanged.
    Downstream consumers reading JobEvent rows must filter on the
    new prefix; flagged here for the operator changelog.

Suite: 1128 passed, 12 skipped (= -8 from master because the
8 redundant TestFetchQuickgoPage cases were deleted, not regressed).
The 37 test_load_quickgo_annotations cases all pass on the new
boundary.

Part of F2A.6-real migration plan, step 2 of 4. Pattern locked for
the remaining UniProt FASTA + UniProt metadata migrations.
Two fixes plus an expansion of the documented module surface.

Removes:
- The autodoc directive for protea.core.operations.train_reranker,
  orphaned since T0.6 removed the file. Sphinx no longer reports a
  ModuleNotFoundError during build.
- A broken :doc: cross-reference to a non-existent
  /refactoring/design-patterns/flyweight page in the
  protea.core.annotation_intern module docstring. Replaced with plain
  text "Flyweight-style".

Adds documentation for modules introduced or moved during F0 / F1:
- protea.core.contracts.parent_progress (T0.7 dedup helper).
- protea.core.retry (T0.3 retry middleware).
- protea.core.operation_catalog (singleton OperationRegistry
  builder).
- protea.core.training_dump_helpers (T0.6 home of helpers that
  survived the train_reranker.py deletion; reused by
  ExportResearchDatasetOperation).
- An "Internal helpers" section covering protea.core.{
  anc2vec_embeddings, annotation_intern, disk_cache,
  feature_enricher, pca_cache} for completeness.

Build verification: poetry run sphinx-build returns
"build succeeded, 5 warnings". Of those, 4 are pre-existing
environmental failures (numpy._core.multiarray import error during
autodoc of modules that import numpy, plus the cosmetic
_static directory missing). The previously-introduced train_reranker
and flyweight warnings are gone.

Doc-T3 of the documentation lane.
…utils

Replaces the inline implementations in Protein.parse_isoform and
Sequence.compute_hash with one-line forwarders to
``protea_contracts.bio_utils``. The canonical authority moves to
the contracts package so the upcoming UniProt FASTA parser in
``protea-sources`` can reuse the helpers without inverting the
C-stack dependency direction.

Files:

  * protea/infrastructure/orm/models/protein/protein.py: the
    isoform-splitting body becomes a single delegated call; module
    docstring on the wrapper explains the move so future grep on
    "parse_isoform" lands callers in the right place.
  * protea/infrastructure/orm/models/sequence/sequence.py: the
    MD5 body becomes a delegated call; the now-unused ``import
    hashlib`` is removed.

Behavioural parity preserved bit-for-bit:

  * parse_isoform("P12345") -> ("P12345", True, None) — unchanged.
  * parse_isoform("P12345-2") -> ("P12345", False, 2) — unchanged.
  * compute_hash("MKTAYIAK") -> identical 32-hex MD5 — unchanged.

Existing call sites in protea/api/routers/query_sets.py,
protea/api/routers/annotate.py,
protea/core/operations/fetch_uniprot_metadata.py,
protea/core/operations/insert_proteins.py keep working unchanged
because the public API on the ORM classes is preserved (Protein
.parse_isoform, Sequence.compute_hash). They will be migrated to
direct imports from protea_contracts as their respective files get
refactored in F2A.6-real subsequent steps.

Suite: PROTEA 1128 passed, 12 skipped (= unchanged from turn 27).
The 6 call sites in tests (test_insert_proteins, test_integration)
exercise the wrappers transparently.

Pairs with protea-contracts/18e92af which adds the canonical
``parse_isoform`` and ``compute_sequence_hash`` plus 12 unit
tests in protea_contracts/bio_utils.py.

Part of F2A.6-real migration plan (D-MIGR-04), prerequisite for
step 3 (UniProt FASTA migration).
Adds docs/source/plugin-authoring.rst as the canonical entry point
for plugin authors, and links it from the main toctree in
docs/source/index.rst.

Scope:

- Architecture overview in one paragraph (protea-core platform plus
  four sibling plugin layers).
- Table of the four layers (annotation sources, embedding backends,
  experiment runners, feature registry) with their ABC, repository
  and entry-point group.
- Decision tree for picking the right ABC depending on what the
  author wants to add.
- Anatomy of a plugin in 5 steps that apply uniformly across the
  three entry-point-driven layers, plus the in-process pattern for
  feature registry contributions.
- Pointers to the per-repo contributing guides shipped on the doc
  lane: protea-backends/docs (Doc-T1) and
  protea-contracts/docs (Doc-T2). The protea-sources and
  protea-runners guides land in Doc-T8.
- Discovery snippet (importlib.metadata.entry_points) that mirrors
  what protea-core does at startup, including the name-vs-entry-point
  sanity check.
- Schema invariants and reproducibility section linking ADR D10
  (schema_sha v2 migration) and the float16 embedding contract.
- Roadmap section pointing to upcoming master-plan v3 phases that
  affect plugin authors (F2A.7 lightgbm absorption, F2B feature
  registry wiring, F2C protea-method extraction, F9 post-defense
  granularity decision).

Build verification: poetry run sphinx-build returns
"build succeeded, 5 warnings" (same 5 pre-existing warnings as
before; the new page introduces zero warnings).

Doc-T7 of the documentation lane. Implements F7.6 of master plan v3
("Plugin author guide").
Third Level-1 plugin migration. InsertProteinsOperation delegates
HTTP retries + cursor pagination + gzip decoding + FASTA parsing to
the protea-sources UniProtSource plugin, becoming a thin
persistence adapter. Pairs with protea-sources/fadbd6b (real
UniProtSource.stream_fasta + _http.py) and protea-contracts/
f1bf7b5 (typed payload + record).

What moved out:

  * _fetch_fasta_pages body: now a one-liner constructing a typed
    UniProtFastaStreamPayload and yielding from
    uniprot_plugin.stream_fasta. Renamed to _stream_fasta to
    reflect per-record yield (D-MIGR-01).
  * _decode_response method (gzip / utf-8 wrapper): plugin owns it.
  * _parse_fasta + _parse_header methods (~70 LOC of FASTA parsing,
    OS/OX/GN regex, isoform splitting via Protein.parse_isoform):
    plugin owns it. The OS/OX/GN regex constants and isoform logic
    move with them.
  * UNIPROT_SEARCH_URL constant: now in
    UniProtFastaStreamPayload.base_url default.
  * Removed imports: gzip, re, requests, Response, BytesIO, quote,
    UniProtHttpClient (legacy PROTEA-side copy stays in
    protea/core/utils.py until step 4 deletes it as the last caller
    fetch_uniprot_metadata also migrates).
  * Removed state: self._http_client, self._total_results.
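The resulting adapter shape can be pictured as follows, with simplified stand-ins for the real contracts payload and plugin (names follow the commit text; the details are assumptions):

```python
# Illustrative shape of the thin persistence adapter after the migration;
# UniProtFastaStreamPayload and the plugin are simplified stand-ins.
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class UniProtFastaStreamPayload:
    """Stand-in for the protea_contracts payload type."""
    query: str
    base_url: str = "https://rest.uniprot.org/uniprotkb/search"


@dataclass
class StubUniProtSource:
    """Stand-in plugin: the real one owns HTTP, gzip and FASTA parsing."""
    records: list[str] = field(default_factory=list)

    def stream_fasta(self, payload: UniProtFastaStreamPayload) -> Iterator[str]:
        yield from self.records


class InsertProteinsAdapter:
    """What stays PROTEA-side: persistence around a delegated stream."""

    def __init__(self, plugin: StubUniProtSource, query: str) -> None:
        self._uniprot_plugin = plugin
        self._query = query

    def _stream_fasta(self) -> Iterator[str]:
        # the former _fetch_fasta_pages body, now a typed one-liner
        yield from self._uniprot_plugin.stream_fasta(
            UniProtFastaStreamPayload(query=self._query)
        )
```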

What stayed:

  * Operation lifecycle, InsertProteinsPayload validation, batching
    policy (page_size buffer flush), session.add_all + flush against
    Protein + Sequence tables, conservative-update logic for
    existing proteins.
  * _store_records (full upsert path) — now consumes
    UniProtProteinRecord via attribute access (rec.accession,
    rec.canonical_accession, etc.) instead of dict access.
  * _load_existing_sequences + _load_existing_proteins (DB lookup
    helpers).
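The OS/OX/GN header parsing that moved into the plugin can be pictured roughly like this; the regexes and return shape are assumptions modelled on the UniProt FASTA header convention, not the plugin's actual code:

```python
# Hypothetical sketch of UniProt FASTA header parsing; field names and
# regexes are illustrative, not the plugin's real implementation.
import re

_OS_RE = re.compile(r"OS=(.+?)(?= [A-Z]{2}=|$)")
_OX_RE = re.compile(r"OX=(\d+)")
_GN_RE = re.compile(r"GN=(\S+)")


def parse_fasta_header(header: str) -> dict:
    """Extract accession, organism, taxon id and gene name from a
    '>sp|ACC|ID ...' UniProt FASTA header line."""
    parts = header.lstrip(">").split("|")
    accession = parts[1] if len(parts) > 2 else parts[0]
    os_m = _OS_RE.search(header)
    ox_m = _OX_RE.search(header)
    gn_m = _GN_RE.search(header)
    return {
        "accession": accession,
        "organism": os_m.group(1) if os_m else None,
        "taxon_id": int(ox_m.group(1)) if ox_m else None,
        "gene": gn_m.group(1) if gn_m else None,
    }
```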

Behavioural diffs surfaced:

  * pages now counts DB-side buffer flushes, not HTTP pages. The
    HTTP-page count is the plugin's internal concern (visible via
    source.uniprot_fasta.fetch_page_done events). pages is more
    useful for monitoring DB throughput; the semantic change is
    captured in the relevant test docstrings.
  * X-Total-Results header capture (op._total_results) is removed.
    The header was nice-to-have for progress reporting and not
    load-bearing for correctness; progress totals now flow only
    when the user sets total_limit. Operator changelog flagged.
  * Plugin-emitted events use the source.uniprot_fasta.* prefix;
    operation-emitted events keep insert_proteins.* prefix.

Tests refactored:

  * TestParseFasta class deleted (~95 LOC, 11 tests). Parser is now
    in protea-sources where parse_fasta_header + parse_fasta_text
    have full unit coverage.
  * TestDecodeResponse class deleted (~25 LOC, 2 tests). Decode is
    in the plugin's _decode_response_body helper, exercised via the
    gzip stream wiring tests in protea-sources.
  * test_total_results_from_header + test_total_results_invalid
    deleted (2 tests). The operation no longer captures X-Total-
    Results.
  * TestStoreRecords: dict-literal record fixtures replaced with
    a _make_record(...) helper that builds UniProtProteinRecord
    via the bio_utils helpers (same MD5 hash, same canonical
    splitting). Two test bodies shrink ~17 LOC each.
  * TestInsertProteinsOperationExecute: patch target swap from
    ``op._http_client.session.get`` to
    ``op._uniprot_plugin._client.session.get`` across 16 sites.
    test_empty_page_continues renamed to test_empty_page_does_not_flush
    with the new pages=0 expectation. test_progress_emission_with_total
    renamed to test_progress_emission_with_total_limit; uses
    page_size=1 + total_limit=100 to force a flush + carry the
    progress total.
  * Net -16 PROTEA tests (parser+decode+total_results all moved or
    deleted), corresponding +56 in protea-sources for a strictly
    better surface.

Suite: PROTEA 1112 passed, 12 skipped (was 1128; -16 from
deletions). Ruff full + mypy strict green on touched files.

Part of F2A.6-real migration plan, step 3 (b) of 4. The legacy
UniProtHttpClient in protea/core/utils.py becomes dead code once
step 4 (UniProt metadata migration) lands; deletion deferred to
that turn.
Closes F2A.6-real with the fourth Level-1 plugin migration plus
the dead-code cleanup that was waiting on it.

Migration:

  * FetchUniProtMetadataOperation delegates HTTP retries + cursor
    pagination + gzip decoding + TSV parsing to the protea-sources
    UniProtSource.stream_metadata plugin method (added in
    protea-sources/2a6ef55). Operation becomes a thin persistence
    adapter focused on FIELD_MAP DB upsert and update_protein_core
    side effects.
  * Removed: _fetch_tsv_pages (~70 LOC of HTTP + URL construction),
    _decode_response (gzip wrapper), _parse_tsv (csv.DictReader).
    All three live in the plugin now.
  * Removed state: self._http_client, self._total_results.
  * UNIPROT_FIELDS constant kept on the operation class — the field
    list is a persistence concern (which DB columns get
    populated). Same field set passed to the plugin via
    UniProtMetadataStreamPayload.fields.
  * Imports trimmed: csv, gzip, BytesIO, StringIO, quote, requests,
    Response, UniProtHttpClient. Replaced with protea_contracts
    imports for UniProtMetadataRecord, UniProtMetadataStreamPayload,
    parse_isoform.
  * _store_rows consumes UniProtMetadataRecord via attribute access
    (rec.accession, rec.raw_fields) instead of dict access. Field
    semantics preserved bit-for-bit: same FIELD_MAP application,
    same update_protein_core conservative-update logic.
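The TSV parsing that moved out is essentially a csv.DictReader wrapper; a minimal sketch (column names illustrative, and the real plugin method also handles gzip decoding and cursor pagination):

```python
# Minimal sketch of plugin-side TSV parsing; the real stream_metadata
# method wraps this with HTTP retries, gzip decoding and pagination.
import csv
from io import StringIO


def parse_metadata_tsv(text: str) -> list[dict[str, str]]:
    """Parse one UniProt TSV page into a dict per row, keyed by header."""
    return list(csv.DictReader(StringIO(text), delimiter="\t"))
```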

Behavioural diffs (same as the FASTA migration):
  * pages now counts DB-side buffer flushes (not HTTP pages).
  * X-Total-Results header capture removed. _progress_total flows
    only when total_limit is set.

Dead-code cleanup:

  * protea/core/utils.py: deleted UniProtHttpClient (135 LOC) plus
    its _HttpPayload Protocol and the now-unused random/time/
    requests/Response imports. The file shrinks to just
    chunks() + utcnow() helpers (~13 LOC).
  * tests/test_core.py: deleted TestUniProtHttpClient class
    (~75 LOC, 9 tests) and TestFetchUniProtMetadataExecute class
    (~290 LOC, 10 tests). The first migrated to
    protea-sources/tests/test_uniprot.py::TestUniProtRetryClient
    (5 tests, retries + Retry-After + max-retries + network errors)
    plus TestExtractNextCursor (4 tests). The second migrated
    partially to test_fetch_uniprot_metadata.py (which keeps the
    14 execute-flow tests against the new plugin-based dispatch)
    and partially to protea-sources/tests/test_uniprot.py
    (TestParseMetadataTsv covers the parser directly).
  * tests/test_fetch_uniprot_metadata.py: deleted TestParseTsv
    class (4 tests, ~35 LOC) — parser now in protea-sources.
    16 patch sites swapped from op._http_client.session.get to
    op._uniprot_plugin._client.session.get.

Suite: PROTEA 1089 passed, 12 skipped (was 1112; -23 from
deletions of the legacy class + parser/decode/total_results
overlap with protea-sources). Ruff full + mypy strict green.

Net diff PROTEA: -266 / +163 = -103 LOC across the operation,
core/utils.py, test_core.py, and test_fetch_uniprot_metadata.py.

This closes F2A.6-real:

  * GOA real-migrated (turn pre-25, commits 20987a5/d1d60f6/43da412).
  * QuickGO real-migrated (turn 27, c5433ed/f37dfce/42d4dd4).
  * D-MIGR-04 prereq (turn 29, 18e92af/434b14e).
  * UniProt FASTA real-migrated (turn 32, f1bf7b5/fadbd6b/56a6d87).
  * UniProt metadata real-migrated + UniProtHttpClient deleted
    (this turn, 09f3883/2a6ef55/<this>).

protea-sources is now self-contained with respect to UniProt HTTP:
the _http.py module owns the retry client; the plugin owns parsing
and modality dispatch (FASTA vs metadata). PROTEA's only remaining
involvement is persistence.

Part of F2A.6-real migration plan, step 4 of 4. F2B (HTTP registry
endpoints) is next on the autonomous queue once doc-lane gives it
priority.
Adds three read-only HTTP endpoints listing plugins discovered via
``importlib.metadata.entry_points``, closing the F2B.1-3 block of
master plan v3 in a single coherent router (the three endpoints
share their lookup mechanism — splitting them across separate
files would be artificial).

Endpoints:

  * ``GET /backends`` — embedding backend plugins
    (``protea.backends`` group). Today: esm, t5, ankh, esm3c.
  * ``GET /sources`` — annotation source plugins
    (``protea.sources`` group). Today: goa, quickgo, uniprot.
  * ``GET /runners`` — experiment runner plugins
    (``protea.runners`` group). Today: baseline, knn, lightgbm.

Response shape:

  ```
  {
    "group": "protea.backends",
    "plugins": [
      {"name": "esm", "cls": "EsmBackend",
       "module": "protea_backends.esm:plugin", "extra": {}},
      ...
    ]
  }
  ```

The ``extra`` field carries plugin-class-specific metadata read
from the loaded instance: today only sources expose
``version``, surfaced as ``extra.version`` (e.g. ``"uniprot-goa"``,
``"quickgo-rest"``). Backends and runners get an empty ``extra``;
adding more probe-able metadata is a one-line change inside
``_discover``.

Design choices:

  * No caching. The endpoint re-scans entry_points on every call so
    a worker that's just been restarted with a newly-installed
    extra surfaces in the next request without an API restart. The
    scan is sub-millisecond on the working set of ~10 plugins.
  * No authentication. These endpoints are public-read by design —
    they list installed software, not user data.
  * Plugin loading happens here. Loading the entry_point fires the
    plugin module's import side effects but should not raise for
    any first-party plugin (the bootstrapping pattern keeps top-
    level imports cheap). If a third-party plugin's load raises,
    the caller surfaces it as a 500 — fail loud beats silently
    hiding broken installs.
  * Fixed group whitelist (``_KNOWN_GROUPS``) prevents callers
    from probing arbitrary entry_points via the same code path.

Files:

  * protea/api/routers/registry.py: new router (~140 LOC) with
    PluginInfo + PluginListResponse pydantic models, _discover +
    _list_for helpers, and the three endpoint functions.
  * protea/api/app.py: add registry_router to the import block
    and wire it via ``app.include_router(registry_router.router)``.
  * tests/test_registry_endpoints.py: 16 tests across four
    classes — TestBackendsEndpoint (5), TestSourcesEndpoint (5),
    TestRunnersEndpoint (4), TestResponseSchema (2). Tests run
    against the live entry_points discovery (the 10 plugins are
    real C-stack siblings installed via path-deps); no mocking.

Suite: PROTEA 1105 passed, 12 skipped (was 1089; +16 new), ruff
full + mypy strict green on the new files.

This closes F2B autonomous-eligible work. F2B.4
(PredictGOTermsBatchOperation extract class) stays in the
human-review queue because of reranker sensitivity.

Part of F2B of master plan v3.
Adds a runnable submit-watch-result loop to PROTEA's README,
satisfying the F7.1 acceptance criterion of master plan v3
("5 minutes to first job") that the original README did not
explicitly cover. The previous README documented Docker + From
source bring-up but stopped at "scripts/manage.sh start" without
showing the end-to-end machinery.

The new section lives between Getting started and Documentation
and shows three operations the user can run with curl + jq the
moment the stack is up:

  1. POST /jobs to enqueue a `ping` smoke-test operation, capturing
     the returned job id.
  2. GET /jobs/{id}/events to tail the structured-event stream
     until the job reaches a terminal state.
  3. GET /jobs/{id} to confirm the final status + result + any
     error code.

Plus a sub-section showing the F2B plugin-discovery endpoints
(GET /backends, /sources, /runners) that landed in turn 36 — the
runtime catalogue the user can probe to see what models /
sources / runners the running deployment ships.

The example uses `ping` rather than a real ML operation so the
quickstart doesn't depend on having sequence data loaded; the
intent is to exercise the queue + worker + DB lifecycle end-to-
end, which `ping` does in <1s. Real operations
(insert_proteins, load_goa_annotations, compute_embeddings,
predict_go_terms) are submitted the same way.

PROTEA README size: 141 LOC → 187 LOC (+46 LOC).

Suite + Sphinx build unchanged; this is doc-only.

Part of Doc-T11 of the autonomous loop. Closes the README
expansion sweep across the four C-stack repos plus PROTEA itself.
CI rescue: restore main to a green state after ~6 weeks of red

The last green CI run on `main` was 2026-03-25 (PR #7). Between
that and 2026-05-06, two latent breakages accumulated and were
exposed when the F2 phase work landed via 7db0e0d..e9ae748:

- `cafaeval-protea` declared as PEP 621 file:/// URL hardcoded to
  the original developer's machine (introduced 2026-04-21,
  commit ace4c4a).
- Five sibling path-deps (`protea-{contracts,method,sources,
  runners,backends}`) added during the F2 plugin migration; their
  internal cross-deps to protea-contracts were also path-based.
- `protea-reranker-lab` path-dep on a sibling that wasn't on
  GitHub at all.
- Pre-existing pyproject.toml + workflow gaps: `--only dev` install
  scope in lint and docs jobs (skipped main deps), missing
  sphinxcontrib-bibtex declaration, accumulated tech debt across
  ruff / flake8 / mypy that hadn't been gated for ~6 weeks.

What this PR does:

1. Replaces all path-deps with `git+https://github.com/frapercan/<repo>`
   URLs so CI runners can resolve them. The 5 C-stack siblings and
   protea-reranker-lab were pushed to GitHub as part of this work.
   Cross-sibling path-deps inside the siblings (e.g.
   protea-backends pointing at ../protea-contracts) were also
   converted; otherwise poetry's transitive resolution failed.
2. Fixes integration tests broken by the F2A.6-real migration
   (op._http_client references, dict→GoaAnnotationRecord fixture
   conversion, halfvec roundtrip tolerance).
3. Auto-fix + manual cleanup of 18 ruff errors, 10 flake8 spacing
   violations, and 15 mypy errors (mostly union-attr narrowing
   asserts and targeted type: ignore on legitimate runtime
   patterns mypy can't prove safe).
4. Fixes lint + docs workflows to use `poetry install --with dev`
   instead of `--only dev` so mypy / sphinx autodoc can resolve
   imports of pyarrow, sqlalchemy, fastapi, etc.
5. Declares sphinxcontrib-bibtex (was installed transitively in
   the local venv but missing from pyproject.toml).
6. Includes 8 ADR resolutions confirmed during the rescue session
   (D04 /v1/ versioning, D06 Authentik+oauth2-proxy, D07
   Loki+Grafana+OTel, D10 schema_sha v2, D25 HPC mode B primary,
   D27 ghcr.io, D28 sops+age, D29 semantic-release).

Local-dev trade-off: editable cross-sibling installs are lost.
Devs who want hot-reload across siblings need to do
`pip install -e ../<sibling>` after `poetry install`.

CI verification on this PR:
- lint (3.12, 2.1.0): pass (3m3s)
- test (3.12, 2.1.0): pass (3m14s)
- integration (3.12, 2.1.0): pass (4m11s)
- docs (3.12, 2.1.0): pass (2m57s)
- pip-audit, bandit, GitGuardian: pass
- codecov informative-only (not in required checks)

Local verification matched CI: 1105 unit passed, 1115 integration
passed (with --with-postgres), ruff + flake8 + mypy clean,
sphinx build succeeds with 5 pre-existing warnings.

Includes the LAFA wrapper scaffolding (`apps/lafa_container/*`)
that was sitting untracked in the working tree since the early
F-LAFA exploration; kept because it has real value as the seed
for a future functionbench.net submission.
@codecov

codecov Bot commented May 6, 2026

Codecov Report

❌ Patch coverage is 67.87587% with 735 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.71%. Comparing base (c244d25) to head (ccecf8a).

Files with missing lines Patch % Lines
protea/core/feature_enricher.py 10.09% 276 Missing ⚠️
protea/core/operations/compute_embeddings.py 46.34% 88 Missing ⚠️
protea/core/operations/export_research_dataset.py 42.85% 80 Missing ⚠️
protea/core/evaluation.py 62.08% 69 Missing ⚠️
protea/core/disk_cache.py 53.84% 36 Missing ⚠️
protea/api/middleware/visitor_counter.py 56.41% 34 Missing ⚠️
protea/core/anc2vec_embeddings.py 42.42% 19 Missing ⚠️
protea/core/operations/generate_evaluation_set.py 66.66% 19 Missing ⚠️
protea/core/operations/load_goa_annotations.py 82.10% 17 Missing ⚠️
protea/api/routers/reranker_models.py 86.99% 16 Missing ⚠️
... and 14 more
Additional details and impacted files
@@             Coverage Diff             @@
##           develop       #9      +/-   ##
===========================================
- Coverage    82.07%   73.71%   -8.36%     
===========================================
  Files           63       91      +28     
  Lines         5959    10475    +4516     
===========================================
+ Hits          4891     7722    +2831     
- Misses        1068     2753    +1685     


@frapercan frapercan merged commit 89c38f6 into develop May 6, 2026
15 of 17 checks passed