v0.2.0: Scoring engine, CAFA evaluation framework and UI overhaul
Release v0.3.0
…1-6) Contract-first integration with protea-reranker-lab: PROTEA produces parquet datasets + manifest via the export operation, consumes trained booster artifacts via RerankerModel + ArtifactStore. Zero runtime cross-imports — only protea_reranker_lab.contracts (pydantic-pure) is shared dev-time.
- Phase 3: ArtifactStore abstraction (LocalFs default, MinIO opt-in via docker compose profiles: ["storage"] + [storage] extra). Storage config under protea/infrastructure/storage/; PROTEA_STORAGE_* env overrides; tests/test_storage.py.
- Phase 4: ExportResearchDatasetOperation + shared parquet_export utility refactored out of train_reranker; operation_catalog entry; routes via the protea.jobs queue.
- Phase 5: RerankerModel nullable artifact columns (artifact_uri, feature_schema_sha, embedding_config_id FK, ontology_snapshot_id FK, producer_version, producer_git_sha, spec_yaml); alembic migration c517e16da06b with named constraints; scripts/register_reranker.py CLI for run-dir → ORM row promotion.
- Phase 6: predict_go_terms reranker integration — strict sha-equality validation at the batch-worker level with a reranker.schema_mismatch fallback (never crashes inference); reranking.py module (load_reranker, apply_reranker, infer_active_feature_families); 8 new tests covering coordinator validation + batch fallback paths.
- Sphinx docs full pass + ADR-007-contract-first-lab-integration.
- Thesis LaTeX: new Reranker Promotion Pipeline section in the implementation chapter, RerankerModel subsection in the data model.
- Benchmark router + web UI pages (/benchmark, /experiments), Grafana visitor dashboard, CAFA evaluation pipeline updates, ablation tooling, embedding backend verification script.
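The Phase 6 guarantee (validate, fall back, never crash inference) reduces to a small guard. A minimal sketch, assuming a loaded booster carries the feature_schema_sha it was trained against; names and signatures here are illustrative, not PROTEA's actual API:

    import hashlib

    def compute_schema_sha(columns: list[str]) -> str:
        """Order-sensitive fingerprint of the feature columns (sketch)."""
        return hashlib.sha256("|".join(columns).encode()).hexdigest()

    def apply_reranker_safely(booster_sha: str, batch_columns: list[str],
                              rerank, emit) -> bool:
        """Re-rank only when schemas match exactly; never crash inference."""
        batch_sha = compute_schema_sha(batch_columns)
        if batch_sha != booster_sha:
            # Degrade gracefully: keep the raw KNN scores, tell observers why.
            emit("reranker.schema_mismatch", expected=booster_sha, got=batch_sha)
            return False
        rerank()
        return True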
Documents the upstream bug, the semantic fix applied in cafaeval-protea (commit cec8ccd), the regression test that gates against future regressions, and the operational steps required to propagate the fix to running worker-evaluations processes. Also records the impact on the 220→230 benchmark PK cells.
…ght 0.3→0.8
- PredictGOTermsPayload and its batch variant now default compute_alignments, compute_taxonomy and compute_reranker_features to True. Prevents future PredictionSets from silently missing features required by alignment-aware scoring configs.
- DEFAULT_EVIDENCE_WEIGHTS["IEA"] 0.3 → 0.8: GOA history shows IEA is promoted to experimental codes at a higher rate than ISS/IBA/NAS, so the classic "electronic = weakest" hierarchy underestimates its prior quality. Only affects scorings that consume evidence_weight.
- Docstrings refreshed (module, evidence_primary preset, /embeddings/predict).
- Fixed 5 TestPredictBatch tests that treated _predict_batch's tuple return as a flat list — a pre-existing bug, surfaced while adapting the tests to the new defaults.
…ine reranker-train endpoint
- embeddings.py: POST /embeddings/predict now publishes to protea.predictions (the dedicated queue already served by worker-predictions-coord) instead of protea.jobs. Brings the API in line with the running split-queue topology.
- scoring.py: remove POST /scoring/rerankers/train together with its helpers (_TrainingPair, RerankerTrainRequest) and the model_to_string / reranker_train imports. In-PROTEA training was decoupled into protea-reranker-lab; this finishes the cleanup so no endpoint claims a capability that no longer exists.
…verage guard
- compute_score now recognises a sixth signal `neighbor_vote_fraction`
(already persisted on every GOPrediction). Added to DEFAULT_WEIGHTS.
- _PRESET_CONFIGS redesigned so each preset tests a discrete hypothesis:
* embedding_only – cosine of the winning neighbour (baseline)
* vote_fraction – KNN consensus (new)
* alignment_only – NW+SW identity without embedding (new)
* embedding_plus_alignment – embedding refined with alignment
(renamed from alignment_weighted)
* embedding_plus_vote – embedding + consensus (new)
* evidence_veto – evidence as a pure multiplier
(replaces embedding_plus_evidence; fixes
the double-count with evi=0 in weights
and formula=evidence_weighted)
* composite – linear kitchen-sink with voting, evidence
removed from the linear sum
Dropped `evidence_primary` — its dominant signal was the evidence_code of
the single nearest neighbour, which we know is noisy; revisit when the
reranker exposes voter-distribution features.
- ORM docstring for FORMULA_EVIDENCE_WEIGHTED now documents the recommended
usage (set evidence_weight=0 in weights so the multiplier is applied once).
- New `_check_signal_coverage` helper in the router: before /score.tsv and
/metrics stream anything, it queries fill rates of the columns backing
every active signal; if a required column has 0 rows populated, returns
409 with an actionable detail instead of silently degrading the score.
- /score.tsv now exposes a neighbor_vote_fraction column.
- Dropped TestTrainReranker tests — leftover after the prior commit that
removed the inline POST /scoring/rerankers/train endpoint.
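The coverage guard described above is essentially one aggregate query per active signal. A minimal sketch, assuming a mapping from signal name to the ORM column that backs it; the real router code differs in detail:

    from fastapi import HTTPException
    from sqlalchemy import func

    def check_signal_coverage(session, model, prediction_set_id,
                              active_signals, column_for_signal):
        """Return 409 early instead of silently streaming degraded scores."""
        for signal in active_signals:
            col = column_for_signal[signal]
            filled = (
                session.query(func.count(col))   # counts non-NULL rows
                .filter(model.prediction_set_id == prediction_set_id)
                .scalar()
            )
            if not filled:
                raise HTTPException(
                    status_code=409,
                    detail=(f"signal '{signal}' has 0 populated rows in "
                            f"prediction set {prediction_set_id}; enable the "
                            "matching compute_* flag and re-run prediction"),
                )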
…ncher
Previously scripts/worker.py launched the stale-job reaper with timeout_seconds=21600 (6 h). With a single protea.predictions.batch worker and 23 predict_go_terms coords dispatched in a batch, the later coords routinely waited past 6 h in the FIFO and were killed by the reaper even though the pipeline was making progress upstream. Raising the hard timeout to 24 h (still paired with the 30-min stall window) leaves enough headroom for a full PLM×K cross-grid to drain while keeping the stall-based liveness check in place.
Also adds the cross-scoring launcher used for the 8-PLM × 3-K × 7 scoring-config benchmark. Dedupes dispatch by (prediction_set_id, scoring_config_id) against both evaluation_result rows AND queued/running run_cafa_evaluation jobs — the earlier launcher only compared against persisted results and filled the queue with up to 20× duplicates per pair.
…YAML
- /benchmark/matrix now reads PredictionSet.limit_per_entry and includes it in the per-row key, so the same (embedding, stage, cell) tuple no longer collapses K=3/5/10 into one row. The response exposes the full K catalog via `ks: [3, 5, 10]` and accepts `?k=N` as a filter.
- Frontend types gain `k`/`ks`; BenchmarkPage ships a K selector (chips next to the stage picker) defaulting to the first available K.
- benchmark.yaml refreshed for the redesigned scoring presets: preferred_default now starts with embedding_only (widest coverage on partial runs), and labels cover the seven current presets (dropped labels for the removed `evidence_primary`, `embedding_plus_evidence`, `alignment_weighted`).
_write_predictions() built the pred_dict passed to compute_score() with distance, identity_nw/sw, evidence_code and taxonomic_distance only. The newly-added neighbor_vote_fraction signal (scoring preset `vote_fraction`, half of `embedding_plus_vote`, 20 % of `composite`) was silently dropped — compute_score saw value=None and excluded it from numerator and denominator. vote_fraction therefore produced zero scores across every (protein, go_id) row, cafaeval returned empty NK/LK/PK buckets, and the three affected presets showed no cells in the benchmark UI. Add `pred.neighbor_vote_fraction` to pred_dict so the scoring engine sees the signal it already has a weight for.
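The fix is a single added key. A sketch of the shape, with attribute names taken from the commit text and the surrounding helpers assumed:

    def build_pred_dict(pred):
        """Every signal compute_score has a weight for must appear here;
        a missing key is scored as None and silently dropped from both
        numerator and denominator."""
        return {
            "distance": pred.distance,
            "identity_nw": pred.identity_nw,
            "identity_sw": pred.identity_sw,
            "evidence_code": pred.evidence_code,
            "taxonomic_distance": pred.taxonomic_distance,
            # The previously missing sixth signal:
            "neighbor_vote_fraction": pred.neighbor_vote_fraction,
        }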
Drop the MCP server scaffolding under protea_mcp/ — 21 files, ~1.1k LOC total. The module was never wired into any startup or deployment path and has no importers in protea/, scripts/, tests/, pyproject.toml, or docs.
…decoupled training
LightGBM training moves out of PROTEA into the standalone protea-reranker-lab repo. PROTEA now owns the KNN + feature pipeline, dataset publishing, and inference; the lab owns training and evaluation.
Schema
* Add Dataset model (alembic c7bab0210568): immutable record of a published reranker dataset with train_uri / eval_uri / manifest_uri, content fingerprints (schema_sha, manifest_sha), dump parameters and producer provenance (producer_version, producer_git_sha).
* Link RerankerModel to Dataset via dataset_id (alembic e037f3ae9f58) and add artifact_uri / external_source / spec_yaml / feature_importance columns so boosters trained in the lab can be registered by reference.
Storage abstraction
* protea/infrastructure/storage/ becomes an ArtifactStore interface with LocalArtifactStore (file://) and MinioArtifactStore (s3://) backends, selected by PROTEA_STORAGE_BACKEND. get_artifact_store() is the single factory used by export_research_dataset and /reranker-models/import.
Operations
* TrainRerankerOperation / TrainRerankerAutoOperation are unregistered in operation_catalog.py — they survive only as in-process helpers that ExportResearchDatasetOperation drives in dump_only mode.
* export_research_dataset uploads train.parquet / eval.parquet / manifest.json via the configured ArtifactStore and inserts a Dataset row keyed by output_name.
* protea.core.reranker exposes prepare_dataset / predict / model_from_string for inference; schema_sha validation is load-bearing.
HTTP surface
* New routers: /datasets (registry + import) and /reranker-models (multipart import, import-by-reference, list/get).
* app.py wires both routers + ArtifactStore startup.
Scripts
* dump_reranker_dataset.py adapted to the new pipeline.
* materialize_lab_intervals.py: new helper that creates EvaluationSet + QuerySet rows for every snapshot pair the lab needs.
Tests
* test_storage covers both backends.
* test_datasets_and_reranker_import_smoke exercises the end-to-end publish → pull → import-by-reference flow.
* test_reranker / test_train_reranker trimmed to inference-only paths.
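A minimal sketch of the ArtifactStore seam, with only the file:// backend shown; method names here are illustrative and the s3:// (MinIO) branch is omitted:

    import os
    import shutil
    from abc import ABC, abstractmethod

    class ArtifactStore(ABC):
        @abstractmethod
        def put(self, local_path: str, uri: str) -> str:
            """Upload a local file; return the canonical artifact URI."""

        @abstractmethod
        def get(self, uri: str, local_path: str) -> str:
            """Download an artifact to a local path; return that path."""

    class LocalArtifactStore(ArtifactStore):
        """file:// backend — the default, no extra services needed."""

        def put(self, local_path: str, uri: str) -> str:
            dest = uri.removeprefix("file://")
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copy2(local_path, dest)
            return uri

        def get(self, uri: str, local_path: str) -> str:
            shutil.copy2(uri.removeprefix("file://"), local_path)
            return local_path

    def get_artifact_store() -> ArtifactStore:
        """Single factory, selected by PROTEA_STORAGE_BACKEND (sketch)."""
        backend = os.environ.get("PROTEA_STORAGE_BACKEND", "local")
        if backend == "local":
            return LocalArtifactStore()
        raise ValueError(f"unknown storage backend: {backend}")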
EvaluationSet gains a groundtruth_uri column (alembic 76cafcb8d9be) that points to a frozen parquet of the snapshotted ground-truth annotations, materialised once and reused across run_cafa_evaluation invocations instead of being recomputed from the live ORM each time. The parquet lives in the configured artifact store and is content-addressed.
* generate_evaluation_set writes the parquet on creation; existing rows are backfilled by scripts/backfill_evaluation_groundtruth.py.
* load_evaluation_data_for_set reads the parquet via the artifact store when groundtruth_uri is set, falling back to the legacy ORM path.
* run_cafa_evaluation consumes the parquet directly, eliminating the per-run BFS over GOTermRelationship for ground-truth resolution.
* load_goa_annotations adds canonical-accession filtering improvements measured against the lab's intervals.
* The annotations router exposes the groundtruth-related endpoints and switches to the BenchmarkConfig dependency for IA file resolution.
Worker layout
* Add a dedicated worker-evaluations process on protea.evaluations so long Fmax / AuPRC runs don't block the general protea.jobs queue.
* Add the missing worker-predictions-coord process (the protea.predictions coordinator was previously not started by manage.sh).
Tests
* test_load_goa_annotations covers the canonical-accession filtering.
* test_evaluation_parquet_roundtrip locks the serialise/deserialise contract for ground-truth data.
* test_knn_streaming_smoke exercises the streaming KNN path that feeds reranker datasets.
scripts/overnight_matrix.py: new orchestrator for the 8-PLM canonical benchmark (bootstrap + predict ×8 + eval ×8) — drives long unattended runs against the lab's interval matrix.
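The read path is a two-branch helper. A sketch assuming pandas and the ArtifactStore get() shape from the sketch above; the real loader names differ:

    import tempfile

    import pandas as pd

    def load_groundtruth(evaluation_set, store, legacy_orm_loader):
        """Content-addressed parquet when available, live ORM otherwise."""
        if evaluation_set.groundtruth_uri:
            with tempfile.NamedTemporaryFile(suffix=".parquet") as tmp:
                store.get(evaluation_set.groundtruth_uri, tmp.name)
                return pd.read_parquet(tmp.name)
        # Legacy path: recompute ground truth from the live ORM
        # (the slow per-run BFS over GOTermRelationship).
        return legacy_orm_loader(evaluation_set)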
Empty Python packages with zero importers anywhere in the tree:
* protea/cli/ + protea/cli/commands/ (CLI scaffolding never built)
* protea/utils/
* protea/api/schemas/
* protea/api/services/
Orphan scripts (no references in code, docs, scripts, or pyproject.toml):
* Phase A/B ablation cohort: cross_scoring_launcher, feed_evals_phaseA, hybrid_picker_eval, queue_evals_when_ready, query_eval_results, run_ablation_predictions, run_ablation_evaluations
* vast.ai deploy/sync: deploy_vast.sh, setup_vast.sh, sync_db_vast.sh
* Profilers / verifiers: profile_predict_batch, verify_embedding_backends
* Stale overnight runs: overnight_v6, overnight_v6_retry, overnight_v7, overnight_v8
Misc:
* Drop the unused _JOBS_QUEUE constant in embeddings.py.
* .gitignore: exclude data/ (with an allow-list for data/benchmarks/), artifacts/, results/, var/, docs/_build/, and the entire logs/ dir (was only excluding *.log + logs/pids/, leaving *.pid files visible).
Queue routing fixes (operations.rst, system_overview.rst):
* predict_go_terms publishes to protea.predictions, not protea.jobs.
* run_cafa_evaluation publishes to protea.evaluations.
* train_reranker / train_reranker_auto are no longer queued — they
survive only as unregistered helpers consumed in-process by
ExportResearchDatasetOperation in dump_only mode.
* export_research_dataset publishes to protea.training (serialised).
Lab decoupling narrative (introduction.rst, system_overview.rst,
core.rst, configuration.rst):
* Document the contract-first split — PROTEA produces the parquet
triple + manifest; the lab consumes via the artifact store and
registers boosters back through /reranker-models/import.
* Drop the inline LightGBM training section; replace with the export
+ import flow.
Reference fixes:
* api.rst:282 — /annotations/snapshots/{id}/ia-url is the Information
Accretion file URL (used to weight CAFA Fmax / AuPRC), not the
InterPro Archive.
The lab refactor introduced four behavioural changes that the existing
test suite did not yet reflect — every router and operation that touches
artifact storage, ScoringConfig joins, or persisted EmbeddingConfig
display columns produced collateral failures (54 of 1037).
Root causes addressed:
1. Process-global TTL cache in protea.api.cache
List endpoints (configs, snapshots, prediction sets, protein stats,
showcase summary) memoise their result for 5 minutes. Tests sharing
the cache key returned stale data from earlier sibling tests. Added
an autouse fixture that calls ``invalidate()`` before/after every
test in the affected modules.
2. ArtifactStore now reached for real
run_cafa_evaluation, generate_evaluation_set, and the annotations
router DELETE/download endpoints all call get_artifact_store(...)
inline, which tries to reach MinIO at localhost:9000 in test
environments. Added autouse / per-class fixtures that patch
get_artifact_store + load_settings to a MagicMock so unit tests
exercise just the orchestration path.
3. Endpoint signatures evolved:
- benchmark.list returns a 5th column (prediction_count via
correlated subquery) — added the column to the production query
and a ``prediction_count`` field in the response so the frontend
UI keeps working (apps/web reads p.prediction_count in 4 places).
- benchmark.matrix select returns 4-tuples
``(er, embedding_id, k, scoring_name)`` and a separate 7-tuple
query for eval-set metadata — tests now wire both via
``_dual_execute``.
- showcase.get returns 3-tuples ``(er, cfg, scoring_name)`` —
tests updated to match.
- _stage_of(result, scoring_name) — second positional arg is now
required, "baseline" stage no longer exists (returns None for
evaluations with neither scoring nor reranker).
- _describe_embedding heuristics deleted — display_name, family,
param_count are persisted EmbeddingConfig columns. The dead
TestDescribeEmbedding class was removed; remaining tests set the
columns explicitly on _make_cfg fixtures.
4. RunCafaEvaluationPayload no longer carries artifacts_dir
The op now stages artifacts in a tempfile.TemporaryDirectory and
uploads via store.put — replaced the two artifacts_dir tests with
one that asserts the store is consulted.
Test fixtures updated:
* test_api / _make_app: added missing ``operation_registry`` state.
* test_annotations_router / test_benchmark_router / _make_app: added
missing ``benchmark_config`` state.
* test_api_query_sets: corrected
``test_preserves_full_description`` — _parse_fasta unwraps
``sp|P12345|FOO_HUMAN`` to ``P12345`` (preserves the full header in
the description, not the accession).
Production change (small): ``embeddings.list_prediction_sets`` regained
the prediction_count correlated subquery so the frontend prediction-set
list shows ``"<n> preds"`` again. This restores parity with the lab
refactor's intent — the test mock always expected this column.
Result: 1037 passed, 10 skipped, 0 failed.
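The cache fix from root cause 1 is a few lines of conftest plumbing. A sketch, assuming protea.api.cache.invalidate() clears the whole memo pool; the fixture name is illustrative:

    import pytest

    from protea.api import cache

    @pytest.fixture(autouse=True)
    def _fresh_api_cache():
        """Drop the process-global TTL cache around every test so sibling
        tests can't leak memoised list responses into each other."""
        cache.invalidate()
        yield
        cache.invalidate()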
Two modules with near-identical names lived next to each other:
* protea.core.reranker — feature schema + predict / model_from_string
* protea.core.reranking — booster cache + load_reranker / apply_reranker
reranking.py had a single importer (predict_go_terms.py). Inlining its public surface into reranker.py removes the naming trap (grep "reranker" no longer turns up two unrelated files) and consolidates everything the inference path needs in one place.
* load_reranker, apply_reranker, infer_active_feature_families and the process-local _BOOSTER_CACHE move verbatim into reranker.py.
* predict_go_terms.py now imports both the feature and loader helpers from the merged module.
* reranking.py is removed.
No behaviour change. 1037/1037 tests still pass.
TrainRerankerOperation was unregistered after the lab decoupling and contained zero ``self`` references — its 10 methods were already pure functions wrapped in a class only so TrainRerankerAutoOperation could share helpers. Removing the wrapper drops ~250 LOC of dead-but-loaded code (file: 2009 → 1757 LOC).
Helpers actually used by TrainRerankerAutoOperation (now module-level functions in the same file):
* ``_load_parent_map``
* ``_preload_all_embeddings``
* ``_build_reference_from_cache``
* ``_load_sequences``
* ``_load_taxonomy_ids``
* ``_knn_transfer_and_label``
Helpers with no production callers (deleted):
* ``_validate`` — payload + name/duplicate checks
* ``_load_go_maps`` — covered by ``_load_parent_map``
* ``_load_reference_per_aspect`` — alternative to _build_reference_from_cache
* ``_load_query_embeddings`` — Auto loads its own query embeddings
* ``summarize_payload`` — never called once the op left the registry
Inside TrainRerankerAutoOperation.execute, every ``self._single._foo(...)`` is rewritten to ``_foo(...)`` and the ``_single`` attribute is gone.
Tests adapted:
* test_train_reranker.py — kept the payload-validation tests and the two helpers that survived (_load_sequences / _load_taxonomy_ids); dropped the test classes for deleted helpers.
* test_knn_streaming_smoke.py — calls _knn_transfer_and_label directly instead of through op._knn_transfer_and_label.
Suite stays green at 1028 / 1028.
predict_go_terms.py was a 2383 LOC god file mixing three concerns: I/O
caching of reference embeddings, PCA artifact persistence, and the actual
predict / batch / store operation classes. The first two have nothing
to do with the operations themselves — they are reusable persistence
helpers that happen to be called from the predict path.
* protea/core/disk_cache.py — reference-pool + per-aspect index +
annotation-CSR caches under data/ref_cache/. Exports
``_disk_cache_paths``, ``_aspect_index_path``,
``_anno_disk_cache_paths``, ``_build_anno_csr``,
``_load_anno_csr_from_disk``, ``_save_anno_csr_to_disk``,
``_csr_lookup``, ``_derive_reference_views``,
``_load_from_disk_cache``, ``_save_to_disk_cache``.
* protea/core/pca_cache.py — per-EmbeddingConfig PCA artifact under
protea/artifacts/pca/{id}.npz. Exports ``_pca_state_path``,
``_load_pca_state``, ``_save_pca_state``, ``_load_or_fit_pca_state``.
* predict_go_terms.py imports the helpers from their new homes; no
call sites change name or signature.
* test_predict_go_terms.py imports the disk-cache helpers directly
from ``protea.core.disk_cache`` rather than re-exported via
``predict_go_terms``.
predict_go_terms.py shrinks 2383 → 2129 LOC (~10% smaller, the
remaining bulk is the three operation classes which the next step
splits separately). Suite stays green at 1028 / 1028.
…er_and_label
The function used to take 20 keyword arguments — a textbook Long Parameter List smell that made every call site span ~15 lines and made parameter changes risky (refactoring.guru: Bloater · Long Parameter List → Introduce Parameter Object). Two natural clusters extracted as small frozen dataclasses:
* ``SequenceContext`` — the four optional per-protein lookups (``query_sequences``, ``ref_sequences``, ``query_tax_ids``, ``ref_tax_ids``) used to drive alignment and taxonomy features.
* ``StreamOutput`` — the streaming-parquet I/O config (``output_parquet``, ``chunk_rows``) used in dump_only mode to keep peak memory bounded.
``pivot_go_ids`` stays as its own kwarg — it filters records by go_id and is orthogonal to whether streaming is enabled (used by ``test_pivot_filter_drops_non_pivot_terms`` in non-streaming mode).
Result: the signature shrinks 20 → 15 named parameters; the body is unchanged (the dataclasses are unpacked into the same local names at the top of the function). All three call sites updated. Suite stays green at 1028 / 1028.
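The two parameter objects are small enough to show whole. A sketch from the commit text; field types and the chunk_rows default are assumptions:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class SequenceContext:
        """Optional per-protein lookups driving alignment/taxonomy features."""
        query_sequences: dict[str, str] | None = None
        ref_sequences: dict[str, str] | None = None
        query_tax_ids: dict[str, int] | None = None
        ref_tax_ids: dict[str, int] | None = None

    @dataclass(frozen=True)
    class StreamOutput:
        """Streaming-parquet config used in dump_only mode to bound memory."""
        output_parquet: str | None = None
        chunk_rows: int = 2_000  # placeholder default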
Stage classification logic was duplicated in two routers:
* benchmark.py defined ``_RERANKER_STAGE``, ``_stage_of()`` and ``_stage_kind()``.
* showcase.py inlined the same conditional with a comment "Matches benchmark.py semantics without cross-importing" — an explicit acknowledgement of the duplication.
That is the textbook Dispensable · Duplicate Code smell (refactoring.guru → Extract Method, then Move Method to a shared home). Created ``protea/api/stages.py`` exporting ``RERANKER_STAGE`` / ``stage_of`` / ``stage_kind`` / ``StageKind``.
* benchmark.py re-imports them under the legacy private aliases (``_RERANKER_STAGE``, ``_stage_of``, ``_stage_kind``) so the rest of the file is unchanged.
* showcase.py replaces the 6-line ``if/elif/else`` ladder with a single ``stage = stage_of(er, scoring_name)`` call.
No behaviour change. 1028 / 1028 still green.
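A hedged sketch of the shared classifier's semantics, following the earlier test-fix notes (reranked results win, scored results report their scoring name, neither maps to None, no "baseline" stage); the actual conditions and constant values in protea/api/stages.py may differ:

    from enum import Enum

    RERANKER_STAGE = "reranked"  # illustrative constant value

    class StageKind(str, Enum):
        RERANKER = "reranker"
        SCORING = "scoring"

    def stage_of(er, scoring_name: str | None) -> str | None:
        if getattr(er, "reranker_model_id", None) is not None:
            return RERANKER_STAGE
        if scoring_name:
            return scoring_name
        return None  # neither scoring nor reranker

    def stage_kind(stage: str | None) -> StageKind | None:
        if stage is None:
            return None
        return StageKind.RERANKER if stage == RERANKER_STAGE else StageKind.SCORING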
GO aspect strings appeared as bare literals in 30+ places — both as
single-char codes ("P"/"F"/"C", the wire format in PostgreSQL and
go-basic.obo) and as three-char CAFA codes ("BPO"/"MFO"/"CCO", what
cafaeval and the UI use). That is the textbook Bloater · Primitive
Obsession smell (refactoring.guru → Replace Type Code with Class).
This commit lands the new domain type; consumers migrate in follow-up
commits to keep each diff focused and reviewable.
* protea/core/domain/aspect.py — ``Aspect`` enum with
``BIOLOGICAL_PROCESS`` / ``MOLECULAR_FUNCTION`` / ``CELLULAR_COMPONENT``
members and ``.code`` / ``.cafa`` properties for the two encodings;
``Aspect.from_code()`` / ``from_cafa()`` build instances from the
legacy strings at the boundary.
* protea/core/domain/__init__.py — package marker, deliberately
free of infrastructure imports so the module can be imported from
anywhere in core/ without cycles.
* ASPECT_CODES / ASPECT_CAFA_CODES module constants for callers that
iterate via tuple destructuring.
* tests/test_aspect.py — 21 cases covering the two encodings,
parameterised over all three aspects, with explicit roundtrip and
invalid-input assertions.
Suite: 1028 → 1049 (+21 new).
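The type reduces to a three-member enum with two encodings. A sketch from the commit text; the concrete member values are an assumption:

    from enum import Enum

    class Aspect(Enum):
        BIOLOGICAL_PROCESS = ("P", "BPO")
        MOLECULAR_FUNCTION = ("F", "MFO")
        CELLULAR_COMPONENT = ("C", "CCO")

        @property
        def code(self) -> str:
            """Single-char wire format (PostgreSQL, go-basic.obo)."""
            return self.value[0]

        @property
        def cafa(self) -> str:
            """Three-char CAFA code (cafaeval, UI)."""
            return self.value[1]

        @classmethod
        def from_code(cls, code: str) -> "Aspect":
            return _BY_CODE[code]  # KeyError on invalid input

        @classmethod
        def from_cafa(cls, cafa: str) -> "Aspect":
            return _BY_CAFA[cafa]

    _BY_CODE = {a.code: a for a in Aspect}
    _BY_CAFA = {a.cafa: a for a in Aspect}

    ASPECT_CODES = tuple(a.code for a in Aspect)       # ("P", "F", "C")
    ASPECT_CAFA_CODES = tuple(a.cafa for a in Aspect)  # ("BPO", "MFO", "CCO")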
Six modules used to hardcode the GO aspect tuple ("P", "F", "C") or its
CAFA equivalent ("BPO", "MFO", "CCO") inline. They now import the
canonical constants from ``protea.core.domain.aspect`` so a future
addition or rename happens in one place.
* predict_go_terms.py + train_reranker.py — use ASPECT_CODES
(single-char wire codes).
* showcase.py + benchmark_config.py + annotations.py (download path) —
use ASPECT_CAFA_CODES (three-char CAFA codes).
* run_cafa_evaluation.py — _NS_LABELS now derives from the enum's
``.cafa`` property; _NS_SHORT is the set view of ASPECT_CAFA_CODES.
* train_reranker._ASPECT_NAMES — built as a comprehension over the
enum so the lower-cased CAFA suffixes (bpo/mfo/cco) used in model
names stay in sync with the canonical encodings.
No behaviour change. Suite stays at 1049 / 1049.
The 200-LOC method computed five independent intermediates (metadata collection, Anc2Vec pool, neighbor centroids, tax-voter counters, PCA projection) and then merged them per-row. Each intermediate is now its own private static method so the orchestrator reads as a six-line pipeline instead of a wall of inline computation (refactoring.guru: Bloater · Long Method → Extract Method).
New private static methods on PredictGOTermsBatchOperation:
* ``_collect_gtids_in_play`` — gather every go_term_id seen as a candidate or neighbor annotation.
* ``_build_anc2vec_pool`` — materialise the Anc2Vec embedding matrix + has-emb mask + index.
* ``_compute_neighbor_centroids`` — per-(q_acc, aspect) centroid + nmat.
* ``_compute_tax_voter_counters`` — five per-(q_acc, gtid) dicts that feed the tax_voters_* columns.
* ``_compute_pca_projection`` — query embeddings × PCA components.
The merge loop stays inline (it has 16 dependencies; extracting it would just trade a long method for an even longer parameter list). No behaviour change — the new helpers are a pure rearrangement of the original code. 1049 / 1049 still green.
The 258-LOC method ran the per-aspect KNN, loaded feature-engineering inputs, pre-computed reranker stats, and merged everything into a prediction list. The first two phases are clean independent units; extracting them as helpers turns the orchestrator's prologue from 50 LOC of inline detail into two readable calls (refactoring.guru: Bloater · Long Method → Extract Method).
New private methods on PredictGOTermsBatchOperation:
* ``_run_knn_per_aspect`` — three independent KNN searches, returns ``(neighbors_by_aspect, all_unique_neighbors)``.
* ``_load_feature_engineering_data`` — loads sequences and taxonomy IDs only for the flags that are on; returns the four lookup dicts the per-pair feature builder consumes downstream.
The remaining merge phase (per-aspect prediction emission with shared reranker aggregates) stays inline — its 130 LOC carry too many mutually-aliased dicts (vote_count, k_position, vote_min_d, vote_sum_d, go_term_freq, ref_ann_density, pair_features, seen_per_query) for extraction to do anything but trade a long method for an even longer parameter list.
No behaviour change. 1049 / 1049 still green.
…odule
The seven v6-feature methods on ``PredictGOTermsBatchOperation`` (``_enrich_with_v6_features``, ``_load_go_term_metadata``, ``_collect_gtids_in_play``, ``_build_anc2vec_pool``, ``_compute_neighbor_centroids``, ``_compute_tax_voter_counters``, ``_compute_pca_projection``) used no instance state, ran in sequence, and were called from a single site in ``execute``. That is the textbook Bloater · Large Class smell — they are cohesive enough to become their own module (refactoring.guru → Extract Class).
* protea/core/feature_enricher.py — new module exposing two public symbols: ``enrich_v6_features`` (the orchestrator) and ``NEW_V6_FEATURE_KEYS`` (the 25 column names downstream code composes into the bulk-insert schema). All six stage helpers stay module-private.
* predict_go_terms.py — drops the seven methods and the two v6-related constants (``_TAX_CLOSE_RELATIONS`` and ``_NEW_V6_FEATURE_KEYS``); imports ``enrich_v6_features`` and re-imports ``NEW_V6_FEATURE_KEYS`` under the legacy private alias so ``_STORE_FLOAT_KEYS`` keeps working without further edits.
predict_go_terms.py shrinks 2235 → 1928 LOC (~14% smaller) and ``PredictGOTermsBatchOperation`` is down to 18 attributes. The new module is independently testable — extending the v6 feature set in the future no longer means surgery on the batch operation. No behaviour change. 1049 / 1049 still green.
Pre-existing warnings outside the scope of any specific refactor are fixed in one mechanical pass so the next contributor starts from a clean slate (refactoring.guru: Dispensable · Comments / Dead Code + Composing Methods · Replace Magic Number with Symbolic Constant adjacent — all auto-fixable hygiene).
* pyproject.toml — add ``[tool.ruff.lint.per-file-ignores]`` to silence E402 for ``scripts/*.py``: every runner script in there uses the deliberate ``sys.path.insert(0, PROJECT_ROOT)`` pattern before its protea imports so it can be invoked as ``python scripts/foo.py`` without ``poetry install`` first.
* protea/api/cache.py — ``Callable`` now imported from ``collections.abc`` (UP035).
* protea/api/middleware/visitor_counter.py — replaces ``timezone.utc`` with the ``datetime.UTC`` alias (UP017).
* protea/api/routers/scoring.py + protea/infrastructure/orm/models/visitor_event.py + protea/infrastructure/storage/__init__.py — re-sorted import blocks (I001).
* protea/api/middleware/visitor_counter.py — drop an unused f-string prefix (F541).
* tests/test_predict_go_terms.py — drop the unused ``op = self._op()`` assignment in ``test_no_reranker_leaves_dicts_untouched`` (F841); the test only needs the payload check.
* scripts/overnight_matrix.py — ruff also auto-cleaned a couple of redundant ``else`` branches and the import order while we were here.
Result: ``poetry run ruff check protea tests scripts`` is now green. 1049 / 1049 tests still pass.
The module was folded into ``protea.core.reranker`` in commit ``ccf8c96``; the dangling ``automodule:: protea.core.reranking`` block made every ``make -C docs html`` build emit a stale-import warning. The narrative the old block carried (``load_reranker`` / ``apply_reranker`` / ``infer_active_feature_families`` semantics) is preserved as a note pointing readers at the merged module, so the documentation page is still self-explanatory for new contributors. Sphinx ``warning`` count drops from 5 to 4. The four remaining are environment-level (numpy multiarray import races with autodoc when Torch is loaded first); they don't block the build.
Hot loops in predict / train build per-row dicts straight from a SQLAlchemy cursor; the DB driver returns each ``qualifier`` and ``evidence_code`` as a fresh Python string even though the value space is tiny (~5-10 distinct GO evidence codes plus ``None``). Without interning, every duplicate costs ~50 B in CPython, so a 5 M-row batch carries ~500 MB of redundant string objects. This is the textbook Flyweight pattern (refactoring.guru): share the immutable intrinsic state across many context objects via a small process-local factory.
* ``protea/core/annotation_intern.py`` (new) — exposes ``intern_string`` backed by a setdefault-based pool. The pool tops out at the cardinality of the GO evidence vocabulary (~50 strings in practice); no LRU eviction needed.
* ``predict_go_terms._load_annotations_for`` and ``_load_reference_data_per_aspect`` — wrap qualifier and evidence_code with intern_string before stuffing them into the per-row dict.
* ``train_reranker._load_reference_per_aspect`` — the same wrap.
* ``tests/test_annotation_intern.py`` — 8 cases.
Suite: 1057 / 1057 (was 1049, +8 new tests).
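The whole factory is a few lines. A minimal sketch of the setdefault pool; the shipped module may differ in detail:

    _POOL: dict[str, str] = {}

    def intern_string(value: str | None) -> str | None:
        """Return one shared instance per distinct value. The pool tops
        out at the GO evidence/qualifier vocabulary (~50 strings), so it
        never needs eviction."""
        if value is None:
            return None
        return _POOL.setdefault(value, value)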
…OUP BY
The Phase-2 commit (f33ad15) added a per-row correlated subquery to return ``prediction_count`` alongside each PredictionSet. Postgres' planner reliably falls into a per-row index probe on the 25M-row ``go_prediction`` table, turning a 100-row list endpoint into a ~30 s/row sequential count — the /evaluation page timed out at 60 s because it called this listing on first paint.
Fix: pre-fetch the counts in one ``GROUP BY prediction_set_id`` query (scans the existing ``ix_go_prediction_prediction_set_id`` index in a single pass) and merge them into the response in Python, mirroring the ``list_embedding_configs`` pattern. Wrap the whole thing in the same 5-minute ``cached`` helper so subsequent calls are free.
Measured before / after:
* /embeddings/prediction-sets 60 000 ms (timeout) → 19 ms cold, 10 ms warm
Tests:
* test_embeddings_router updated — ``_wire_list_query`` now wires ``session.query.side_effect = [list_query, count_query]`` so the two-query mock matches the new endpoint shape. The third test (``test_annotation_set_without_version``) drops its count assertion because ``count_pairs`` is empty for that case (defaults to 0).
Suite stays at 1057 / 1057.
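A sketch of the two-query shape, assuming PredictionSet and GOPrediction ORM models and a plain dict response; the production endpoint additionally wraps this in the 5-minute cached helper:

    from sqlalchemy import func

    def list_prediction_sets(session):
        sets = session.query(PredictionSet).all()
        # One pass over ix_go_prediction_prediction_set_id instead of a
        # correlated COUNT(*) subquery per returned row.
        count_pairs = (
            session.query(GOPrediction.prediction_set_id,
                          func.count(GOPrediction.id))
            .group_by(GOPrediction.prediction_set_id)
            .all()
        )
        counts = dict(count_pairs)
        return [
            {"id": ps.id, "name": ps.name,
             "prediction_count": counts.get(ps.id, 0)}
            for ps in sets
        ]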
The two store_X operations had identical 30-line implementations of _update_parent_progress (compute_embeddings.py and predict_go_terms.py), differing only in the operation-specific event name passed to emit on the parent SUCCEEDED transition. Extracts to protea.core.contracts.parent_progress.update_parent_progress(session, parent_job_id, emit, *, event_name). Both operations now delegate.
The DB-level JobEvent row is uniformly named "job.succeeded"; the operation-specific event name only flows through emit() so downstream observers can distinguish which store closed the parent.
5 new tests cover: silent when no row, silent when not last batch, SUCCEEDED transition + emit, race when succeeded returns nothing, event_name passthrough.
Suite: 1093 passed, 10 skipped (was 1088 + 5).
Part of F0 T0.7 of master plan v3.
Replaces the inheritance-based UniProtHttpMixin (109 LOC of mixed
state and behaviour) with UniProtHttpClient, a composable class.
Operations hold a client instance via composition rather than
inheritance:
before: class InsertProteinsOperation(UniProtHttpMixin, Operation)
after: class InsertProteinsOperation(Operation):
def __init__(self):
self._http_client = UniProtHttpClient()
State is private to the client (.session, .requests, .retries) and
is reset via .reset() at the start of each execute(). The
extract_next_cursor utility moves to a @staticmethod since it has
no instance state. Operations call:
self._http_client.get_with_retries(url, p, emit)
self._http_client.extract_next_cursor(link_header)
self._http_client.requests / .retries (for emit fields)
Migrates two operations (insert_proteins, fetch_uniprot_metadata)
and three test files (test_core, test_insert_proteins,
test_fetch_uniprot_metadata) accordingly. test_core renames the
test class to TestUniProtHttpClient and adds test_reset_clears_counters.
Suite: 1094 passed, 10 skipped (was 1093 + 1).
Part of F0 T0.9 of master plan v3.
Systematic inventory of module-level constants and hardcoded defaults in payloads and workers. 31 entries categorised into 5 groups (QueueTuning, WorkerTuning, OperationTuning, APILimits, ResearchKnobs) plus 12 structural config-exempt ones (GAF indices, payload shape constraints, PCA dim). Flags duplication that the externalisation dedupes by construction: _ANNOTATION_CHUNK_SIZE ×3, _STREAM_CHUNK_SIZE ×2, _MAX_FASTA_BYTES ×2. Direct input for T-CONF.2 (externalisation to pydantic Settings) and T-CONF.3 (auto-generated living doc). Part of F0 T-CONF.1 of master plan v3.
Validates a running stack via /health, /health/ready, POST /jobs (ping), polling until succeeded, and the events log. Does not start or stop services (per feedback_no_restart.md). Exits in <2 s against a healthy local stack and is sized for CI use too (PROTEA_API_URL + PROTEA_SMOKE_TIMEOUT_S env overrides). Validated against the live local stack: 1/5 -> 5/5 OK. Part of F0 T0.5 of master plan v3.
Introduces protea.config.tuning with:
- QueueTuning pydantic model: publisher_max_attempts,
publisher_base_delay, oom_max_retries, oom_base_delay,
oom_max_delay. Defaults match the previous module-level
constants exactly.
- TuningSettings root model that composes per-category
sub-models (more groups land in follow-up turns).
- get_tuning() loader cached via lru_cache. Hierarchy:
defaults < protea/config/system.yaml (tuning: section) <
env vars PROTEA_TUNING__QUEUE__PUBLISHER_MAX_ATTEMPTS=20.
- 19 new tests covering defaults, validation, env coercion,
yaml override, env-overrides-yaml, missing yaml section.
Migrates the 5 RabbitMQ publisher/consumer constants to read
from QueueTuning at call time:
- publisher.py: _MAX_ATTEMPTS, _BASE_DELAY removed.
- consumer.py: _OOM_MAX_RETRIES, _OOM_BASE_DELAY,
_OOM_MAX_DELAY removed; replaced by qsettings reads
inside the OOM-handler branch.
Tests: existing publisher and consumer tests pass unchanged
since defaults match prior values. test_queue.py:524 updated
to read from get_tuning() instead of the removed constant.
Suite: 1113 passed, 10 skipped (was 1094 + 19).
Skeleton for the categorisation in docs/CONFIG_INVENTORY.md.
Remaining 4 categories (WorkerTuning, OperationTuning,
APILimits, ResearchKnobs) follow the same pattern in
subsequent T-CONF.2 increments.
Part of F0 T-CONF.2 of master plan v3.
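The loader pattern lands as plain pydantic-settings. A minimal sketch assuming pydantic-settings' nested env parsing; the defaults shown are placeholders (the real values match the removed constants) and the system.yaml merge layer is omitted:

    from functools import lru_cache

    from pydantic import BaseModel
    from pydantic_settings import BaseSettings, SettingsConfigDict

    class QueueTuning(BaseModel):
        # Placeholder defaults; the shipped ones match the old constants.
        publisher_max_attempts: int = 10
        publisher_base_delay: float = 0.5
        oom_max_retries: int = 3
        oom_base_delay: float = 5.0
        oom_max_delay: float = 60.0

    class TuningSettings(BaseSettings):
        # PROTEA_TUNING__QUEUE__PUBLISHER_MAX_ATTEMPTS=20 overrides a field.
        model_config = SettingsConfigDict(env_prefix="PROTEA_TUNING__",
                                          env_nested_delimiter="__")
        queue: QueueTuning = QueueTuning()

    @lru_cache
    def get_tuning() -> TuningSettings:
        # Cached so call-time reads are cheap; tests can call
        # get_tuning.cache_clear() after mutating the environment.
        return TuningSettings()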
…pers
Renames protea/core/operations/train_reranker.py to
protea/core/training_dump_helpers.py and removes every literal
"train_reranker" snake-case reference from the protea/ subtree.
The helpers (TrainRerankerAutoOperation, TrainRerankerAutoPayload,
StreamOutput, _knn_transfer_and_label, _load_sequences,
_load_taxonomy_ids, _build_reference_from_cache, _preload_all_embeddings,
_load_parent_map, TrainRerankerPayload, SequenceContext) keep their
CamelCase names so existing call sites in tests and
ExportResearchDatasetOperation continue to work via the new path.
Updates:
- module docstring: removes the "two operations" framing (both were
unregistered) and explains the helper's surviving role.
- event strings rebranded train_reranker_auto.* -> dump_helper.*.
- export_research_dataset.py relay updated accordingly so consumers
keep seeing export_research_dataset.* events on the wire.
- constant ``name = "research_dataset_dump_helper"`` (was
"train_reranker_auto"); the class remains unregistered.
- comments in feature_enricher.py, parquet_export.py,
generate_evaluation_set.py, predict_go_terms.py and
scripts/materialize_lab_intervals.py: rephrased to "the dump helper".
- tests/test_train_reranker.py renamed to test_training_dump_helpers.py
(+ import path updated). Same for test_knn_streaming_smoke.py
imports + mock target.
- test_datasets_and_reranker_import_smoke.py asserts the new name
is also unregistered; the historical asserts on the old names
are gone since "train_reranker" no longer exists in the codebase.
AC verification: ``grep -rn "train_reranker" protea/`` returns 0
hits. The same grep over the whole repo (including tests/, scripts/,
docs/) is also 0 except for one-line *.md docs that document the
historical rename and stay as-is on purpose.
Suite: 1113 passed, 10 skipped (unchanged).
Part of F0 T0.6 of master plan v3.
…rPayload
Continues T0.6 (commit 527e51c) by removing TrainRerankerPayload, the single-pair payload class that no production code referenced. Used to live in train_reranker.py for an Operation that was retired when LightGBM training moved to protea-reranker-lab; the class hung on because tests in test_training_dump_helpers.py exercised it.
- Class definition deleted.
- Helper signature ``_knn_transfer_and_label`` simplified from ``p: TrainRerankerPayload | TrainRerankerAutoPayload`` to ``p: TrainRerankerAutoPayload``.
- Cross-reference comments inlined directly into TrainRerankerAutoPayload field docstrings (KNN backend rationale, ancestor expansion rules, embedding PCA explanation).
- 15 tests in TestTrainRerankerPayload removed; only the helper tests (_load_sequences, _load_taxonomy_ids) remain.
- Header docstring trimmed to reflect the new scope.
LOC reduction: training_dump_helpers.py from 1914 to 1860 LOC. Suite: 1098 passed, 10 skipped (was 1113 - 15 dead payload tests).
The deeper inline of TrainRerankerAutoOperation into ExportResearchDatasetOperation is deferred to F2 once the feature registry is in place; doing it now would balloon export_research_dataset by 600 LOC of execute() body for marginal gain.
Part of F0 T0.6 of master plan v3.
Second category of the externalised tuning settings. Migrates 7
hardcoded constants from the WorkerTuning row in CONFIG_INVENTORY:
- db_pool_size (engine.py:12) 20
- db_pool_max_overflow (engine.py:13) 40
- db_pool_recycle_seconds (engine.py:14) 3600
- model_cache_max (compute_embeddings) 1
- ref_cache_max (predict_go_terms) 1
- reaper_main_timeout_seconds (worker) 86400 (was incorrectly
21600 in the inventory; fixed to match scripts/worker.py)
- reaper_default_timeout_seconds 3600
- reaper_stall_seconds 1800
- api_cache_default_ttl_seconds 300.0
Behavioural:
- infrastructure/database/engine.py: build_engine() reads pool
settings from get_tuning().worker.
- core/operations/compute_embeddings.py: removes _MODEL_CACHE_MAX
constant; reads dynamically inside _get_or_load_model.
- core/operations/predict_go_terms.py: removes _REF_CACHE_MAX
constant; reads dynamically before evicting.
- api/cache.py: removes _DEFAULT_TTL constant; exposes
_default_ttl() function for callers that want the resolved
default. The constant was never imported by anyone; it only
appeared in __all__.
- scripts/worker.py: reaper mode reads reaper_main_timeout_seconds
and reaper_stall_seconds from settings, configurable via
PROTEA_TUNING__WORKER__REAPER_MAIN_TIMEOUT_SECONDS and
PROTEA_TUNING__WORKER__REAPER_STALL_SECONDS.
8 new tests in test_tuning.py: WorkerTuning defaults (pool,
cache, reaper), validation (pool>0, reaper>=300), TuningSettings
compose with worker, env override of db_pool_size.
Suite: 1106 passed, 10 skipped (was 1098 + 8).
Two of five categories migrated. OperationTuning, APILimits,
ResearchKnobs follow.
Part of F0 T-CONF.2 of master plan v3.
Third tuning category. Migrates 4 module-level chunk-size constants
that were duplicated across feature_enricher, knn_search,
training_dump_helpers, and predict_go_terms.
OperationTuning fields:
- annotation_chunk_size (10_000) feature_enricher,
training_dump_helpers, predict_go_terms (5 helper sites total).
- stream_chunk_size (2_000) training_dump_helpers
(_preload_all_embeddings) and predict_go_terms (_load_query_embeddings).
- store_chunk_size (10_000) predict_go_terms (publishing
predictions to protea.predictions.write).
- numpy_query_chunk (500) knn_search._search_numpy chunked
matrix multiplication (caps the n_queries x n_refs distance matrix
peak around 1 GB for default values).
Removes 8 module-level constants from 4 files; resolves dynamically
inside the helpers via get_tuning().operation.X. Eliminates the
triplicate _ANNOTATION_CHUNK_SIZE / duplicate _STREAM_CHUNK_SIZE
that the inventory flagged in CONFIG_INVENTORY.md §C.
HTTP retry policy / timeouts in pydantic payloads
(InsertProteinsPayload, LoadGoaAnnotationsPayload, etc.) intentionally
stay where they are. Those are caller-controlled per job, not infra.
3 new tests in test_tuning.py: OperationTuning defaults, validation
floors, env override of annotation_chunk_size.
Suite: 1109 passed, 10 skipped (was 1106 + 3).
Three of five categories migrated. APILimits and ResearchKnobs follow.
Part of F0 T-CONF.2 of master plan v3.
Fourth tuning category. Migrates 4 hardcoded boundary limits from
the FastAPI router layer.
APILimits fields:
- max_fasta_bytes (50 MB) duplicated as _MAX_FASTA_BYTES
in api/routers/annotate.py and api/routers/query_sets.py;
the externalisation dedupes by construction.
- max_comment_length (500) api/routers/support.py
- recent_limit (20) api/routers/support.py
- page_limit (100) api/routers/support.py
Behavioural:
- annotate.py + query_sets.py: read max_fasta_bytes from
get_tuning().api at request time. Error message now formats
the configured limit instead of a literal "50 MB" so an
operator-set override is reflected back to clients.
- support.py: the SupportCreate pydantic Field's static
max_length= moves to a field_validator that resolves
max_comment_length dynamically. The /support GET reads
page_limit and recent_limit from settings.
3 new tests in test_tuning.py: APILimits defaults, validation
floors, env override of max_fasta_bytes.
Suite: 1112 passed, 10 skipped (was 1109 + 3).
Four of five categories migrated. Only ResearchKnobs (mostly
config-exempt: PCA dim and N_THRESHOLDS sweep are research-side
methodology constants documented in CONFIG_INVENTORY §E) left.
Part of F0 T-CONF.2 of master plan v3.
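The support.py change above amounts to trading a static Field(max_length=...) for a runtime check. A sketch, assuming the field is named comment and that get_tuning() exposes the APILimits group under .api:

    from pydantic import BaseModel, field_validator

    from protea.config.tuning import get_tuning

    class SupportCreate(BaseModel):
        comment: str

        @field_validator("comment")
        @classmethod
        def _within_configured_limit(cls, v: str) -> str:
            # Resolved at request time, so an operator override of
            # max_comment_length takes effect without redeploying.
            limit = get_tuning().api.max_comment_length
            if len(v) > limit:
                raise ValueError(f"comment exceeds {limit} characters")
            return v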
Documents the four migrated TuningSettings categories (Queue, Worker, Operation, APILimits) with field/default/purpose tables, YAML and env-override examples, and config-exempt category callouts (PCA dim, N_THRESHOLDS, GAF indices). Lives inside the existing appendix/configuration.rst so the reference is a single document. Part of F0 T-CONF.3 of master plan v3.
Adds protea-contracts, protea-method, protea-sources, protea-runners and protea-backends as develop=true path-deps under [tool.poetry.group.plugins.dependencies]. Install with poetry install --with plugins. End-to-end discovery verified: importlib.metadata.entry_points(group='protea.sources|runners|backends') resolves correctly from inside PROTEA's venv: 3 sources (goa/quickgo/uniprot), 3 runners (baseline/knn/lightgbm), 4 backends (ankh/esm/esm3c/t5). Suite still 1112 passed, 10 skipped. Part of F0 T0.15 of master plan v3.
Adds .github/workflows/security.yml with two jobs:
- pip-audit: scans installed dependencies against the OSV database.
Non-blocking in F0 (the existing surface has 22 known CVEs, all
in third-party transitive deps; transformers 4.48.x dominates
with 11 CVEs that need a coordinated bump). Master plan v3
F-OPS T-OPS.7 will flip this to fail on severity HIGH.
- bandit: security static analysis against protea/. Runs in HIGH
severity + HIGH confidence mode at F0 (zero findings now);
will tighten in F-OPS.
Triggers: push, PR, and a weekly cron (Mon 06:00 UTC) so freshly
disclosed CVEs surface even when no PR has landed.
Inline fixes for the two bandit B324 findings (weak MD5 hash):
- protea/core/reranker.py: cache key tag in _load_artifact_to_disk.
- protea/infrastructure/orm/models/sequence/sequence.py: sequence
dedup key.
Both pass usedforsecurity=False (Python 3.9+ flag) to declare intent;
collision resistance is irrelevant in either context (cache key tag
and dedup hash, not security primitives).
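The shape of the B324 fix, as a hypothetical helper (the real call sites are the cache-key tag and the sequence dedup key named above):

    import hashlib

    def dedup_key(sequence: str) -> str:
        # MD5 is fine here: it's a dedup/cache tag, not a security
        # primitive; usedforsecurity=False declares that to bandit.
        return hashlib.md5(sequence.encode(),
                           usedforsecurity=False).hexdigest()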
Bandit config in pyproject.toml [tool.bandit]: excludes tests/ and
the lab archeology dump script; skips B404/B603/B101 (subprocess
imports + assert usage) which are project-level acceptable.
Suite: 1112 passed, 10 skipped (unchanged).
Part of F0 T0.4 of master plan v3.
Removes the duplicated definitions of feature schema, payload classes
and ProteaPayload base from PROTEA. They now live exclusively in
``protea-contracts`` (v0.1.0). PROTEA modules re-export the names from
their original module locations so existing imports keep working;
new code should import from ``protea_contracts`` directly.
Files touched:
- protea/core/reranker.py
- Drops 73 lines of NUMERIC_FEATURES / EMBEDDING_PCA_DIM /
CATEGORICAL_FEATURES / ALL_FEATURES / LABEL_COLUMN definitions.
- Re-exports the same names from protea_contracts.
- fit_embedding_pca remains local (it's logic, not contract).
- protea/core/contracts/operation.py
- Drops the 11-line ProteaPayload class definition.
- Re-exports it from protea_contracts.
- Drops the now-unused ``BaseModel, ConfigDict`` import.
- protea/core/operations/predict_go_terms.py
- Drops 119 lines of PredictGOTermsPayload /
PredictGOTermsBatchPayload / StorePredictionsPayload classes.
- Re-exports them from protea_contracts.
- Drops now-unused imports (Annotated, Field, field_validator) and
the local PositiveInt alias.
Net diff: -218 / +30 in PROTEA. Logic preserved exactly: every
existing call site (15 files imported one of these names) keeps
working through the re-exports.
Suite: 1112 passed, 10 skipped (unchanged). The protea-contracts
suite (71 passed, cov 95%) covers the moved definitions; PROTEA's
existing tests cover the integration.
Part of F1 T1.5 of master plan v3.
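The compatibility shim is a plain re-export. An excerpt-style sketch of what protea/core/reranker.py now roughly contains:

    # protea/core/reranker.py (excerpt, sketch)
    from protea_contracts import (  # noqa: F401 — re-exported on purpose
        ALL_FEATURES,
        CATEGORICAL_FEATURES,
        EMBEDDING_PCA_DIM,
        LABEL_COLUMN,
        NUMERIC_FEATURES,
    )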
Pins the contract between protea_contracts (canonical) and PROTEA's re-exports / future registry. 14 tests in 4 classes:
- TestReexportIdentity (7 tests, active): every constant PROTEA still re-exports must be the same object as protea_contracts (ALL_FEATURES, NUMERIC, CATEGORICAL, EMBEDDING_PCA_DIM, LABEL_COLUMN, ProteaPayload, the 3 predict payloads). Hard guarantee that 'from protea.core.reranker import ALL_FEATURES' will not silently diverge from 'from protea_contracts import ALL_FEATURES'.
- TestShaConsistency (2 tests, active): compute_schema_sha produces the same digest regardless of caller path; pinned to the golden 145592ed186c so PROTEA CI fails before the booster cache invalidates.
- TestFeatureFamilyCoverage (3 tests, active): every family member lives in ALL_FEATURES; emb_pca family size matches EMBEDDING_PCA_DIM; canonical naming.
- TestRegistryCoversContracts (2 tests, skipped): activates automatically when F2B.1 ships protea/core/features/registry.py; asserts set(REGISTRY.names()) == set(ALL_FEATURES) and family-map equality.
Suite: 1124 passed, 12 skipped (was 1112 + 12 active + 2 dormant).
Part of F1 T1.7 of master plan v3.
…ference
Two boundary validations against the canonical protea_contracts schema:
Export side (parquet_export.py): before writing the train/eval parquets, compute compute_schema_sha([c for c in shard.columns if c in ALL_FEATURES]) and compare to compute_schema_sha(ALL_FEATURES). Mismatch raises ValueError with the missing/extras list, instead of silently shipping a partial dump that LightGBM training would choke on. Pure invariant check; the legacy schema_sha hash in the manifest is unchanged (T1.6 of master plan v3 owns the migration to schema_sha_v2).
Inference side (predict_go_terms._apply_reranker_if_aligned): switches the import of compute_feature_schema_sha from protea_reranker_lab.contracts to protea_contracts. The functions are byte-identical so behaviour is preserved; the canonical source is now protea_contracts (single source of truth).
5 new tests in test_parquet_export_boundary.py: full columns pass, missing column in train raises, missing column in eval raises, typo feature name raises, empty eval shard skipped.
Suite: 1129 passed, 12 skipped (was 1124 + 5).
Part of F1 T1.8 of master plan v3.
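A sketch of the export-side invariant, assuming a pandas-style shard with a .columns attribute and the protea_contracts helpers named above:

    from protea_contracts import ALL_FEATURES, compute_schema_sha

    def assert_full_feature_schema(shard) -> None:
        """Raise instead of shipping a partial parquet dump."""
        present = [c for c in shard.columns if c in ALL_FEATURES]
        if compute_schema_sha(present) != compute_schema_sha(ALL_FEATURES):
            missing = sorted(set(ALL_FEATURES) - set(present))
            extras = sorted(set(shard.columns) - set(ALL_FEATURES))
            raise ValueError(
                f"feature schema mismatch: missing={missing} extras={extras}"
            )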
Replaces the hardcoded if/elif chain in compute_embeddings._load_model
with discovery via the protea.backends entry_points group. The four
backend plugins (esm, t5, ankh, esm3c) shipped by protea-backends are
now resolved dynamically; adding a new backend is a pyproject entry
plus a class — no edits to compute_embeddings required.
Scope of this refactor:
- Module-level _load_model now calls _resolve_backend(model_backend)
+ plugin.load_model(model_name, device, emit). The (model,
tokenizer) return shape stays exactly the same (tokenizer is
None for ESM-C, matching the legacy path).
- The legacy "auto" alias maps to "esm" exactly as before.
- Plugin discovery is cached in module-level _BACKEND_PLUGINS and
populated on first call (lazy: avoids running entry_points scan
at import time).
- Plugin name attribute is asserted to match its entry_point name
on first load. Silent drift would yield confusing "unknown
backend" errors; we'd rather fail loud.
Out of scope (deferred to F2C):
- _embed_batch dispatch keeps the legacy if/elif chain calling
_embed_esm / _embed_t5 / _embed_ankh / _embed_esm3c. The
plugin's embed_batch returns a flat (batch_size, hidden_dim)
ndarray, while the legacy _embed_* return list[list[
ChunkEmbedding]] with full chunk + layer + pooling support. The
contract extension is a separate task; this commit only swaps
the load path where the API signatures already line up.
- Cov gate bump in protea-backends CI: deferred until an
integration runner installs an extra and exercises the plugin's
load_model. Bumping the gate to 25% on the strength of unit
tests alone would just be theatre.
Tests:
- tests/test_compute_embeddings_backend_dispatch.py: 7 new tests
covering plugin discovery, entry_point/name parity, "auto" alias,
unknown-backend error path, _load_model emit/delegate behaviour,
cache identity, and re-import semantics.
Suite: PROTEA 1136 passed, 12 skipped (was 1129 / 12; +7 new).
Plugin discovery confirmed working from the PROTEA venv:
>>> from importlib.metadata import entry_points
>>> {ep.name for ep in entry_points(group="protea.backends")}
{'ankh', 'esm', 'esm3c', 't5'}
Pairs with the protea-backends 011b27d commit declaring per-backend
optional dependency extras.
Part of F2A.5 of master plan v3.
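A minimal sketch of the discovery path, assuming each entry point resolves to a zero-arg plugin factory; the real _load_model also threads model_name/device/emit through plugin.load_model:

    from importlib.metadata import entry_points

    _BACKEND_PLUGINS: dict[str, object] = {}

    def _resolve_backend(model_backend: str):
        if model_backend == "auto":      # legacy alias, maps to ESM
            model_backend = "esm"
        if not _BACKEND_PLUGINS:         # lazy: no scan at import time
            for ep in entry_points(group="protea.backends"):
                plugin = ep.load()()     # assumes a zero-arg factory
                # Fail loud if the plugin's name drifts from its entry point.
                assert plugin.name == ep.name, (
                    f"plugin name {plugin.name!r} != entry point {ep.name!r}"
                )
                _BACKEND_PLUGINS[ep.name] = plugin
        try:
            return _BACKEND_PLUGINS[model_backend]
        except KeyError:
            raise ValueError(
                f"unknown backend: {model_backend!r}; "
                f"known: {sorted(_BACKEND_PLUGINS)}"
            ) from None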
Map every strategic decision in master plan v3 (2026-05-05) to a navigable ADR stub under docs/source/adr/. Uniform format per file: status (Accepted, Pending, Deferred or Obsolete), date, phase introduced, gate (if pending), context (2-3 sentences), decision (1-2 sentences), consequences (1-2 bullets) and resolution.
Index reorganised into two layers:
- Implementation decisions (001-008): runtime, ORM, queue topology and similar choices that surfaced while building.
- Strategic decisions (D1-D30): plan-level decisions from the master plan. Each row in the strategic table carries a status badge so the open work is visible at a glance.
Eight gates pending human action are explicitly listed: D4 API versioning (gate F4), D6 authentication (gate F5), D7 observability stack (gate F-OPS), D10 schema_sha v2 migration (T1.6 gate D10), D25 HPC mode (gate F-OPS), D27 image registry (gate F-OPS), D28 secrets management (gate F-OPS), D29 release pipeline (gate F-OPS).
Sphinx build clean: build succeeded, 4 pre-existing warnings (none from the new files).
First Level-1 plugin migration: ``LoadGOAAnnotationsOperation``
delegates HTTP + gzip + GAF parsing to
``protea_sources.goa.GoaSource.stream``, becoming a thin persistence
adapter that owns DB filtering, GO-term resolution, dedup, and
``pg_insert``. Pairs with ``protea-sources/d1d60f6``
(``GoaSource.stream`` real implementation) and
``protea-contracts/20987a5`` (``GoaStreamPayload`` +
``GoaAnnotationRecord``).
What moved out:
* ``_stream_gaf`` body (~30 LOC of HTTP/gzip/parsing): now a
one-liner that constructs a typed ``GoaStreamPayload`` and yields
from ``goa_plugin.stream``.
* Eight ``_IDX_*`` GAF column constants: now in protea-sources.
* ``import gzip``, ``import io``, ``import requests``: removed —
the plugin owns the network and decode layers.
What stayed:
* ``_load_accessions`` (canonical-accession universe).
* ``_load_go_term_map`` (GO-id → term-id).
* ``_store_buffer`` (dedup + pg_insert with on_conflict_do_nothing).
Now consumes ``GoaAnnotationRecord`` via attribute access
(``rec.accession``) instead of dict access (``rec["accession"]``).
* ``_maybe_enqueue_atomic_eval`` (auto-eval child job).
* Operation lifecycle, ``LoadGOAAnnotationsPayload`` validation,
``OperationResult`` shaping.
Tests updated, not extended:
* ``_make_record`` test fixture now constructs ``GoaAnnotationRecord``
instances; ``with_from=""`` becomes ``with_from=None`` (semantically
identical, the old code converted "" → None at insert time).
* ``TestStreamGaf`` patches now target ``protea_sources.goa.requests.get``
  instead of the operation-local ``requests.get``. Assertions migrated
  from dict access to attribute access.
* ``rec.copy()`` → ``rec.model_copy()`` (pydantic v2 deprecation).
Behavioural parity:
* ``_store_buffer`` still does ``rec.accession.strip()`` for the
DB-lookup field (parser preserves raw GAF columns; strip happens
where the lookup needs it). Same observable behaviour as before.
* Empty optional fields ("" → None) handled by the parser at the
boundary, not by the operation. No DB-insert diff.
* Dedup key ``(set_id, accession, go_term_id, evidence_code)``,
``on_conflict_do_nothing(constraint=...)`` constraint, page-level
commit policy: all preserved verbatim.
Suite: 1136 passed, 12 skipped (= unchanged from master). The 54
``test_load_goa_annotations`` cases all pass on the new boundary.
Why Level 1 only (the design discipline):
protea-sources is a leaf C-stack package; importing
``protea.infrastructure.orm.*`` would invert the dependency
direction. Level 1 (HTTP + parsing) cuts cleanly along the
SQLAlchemy boundary; Level 2 (move the operation entirely) waits for
F2C ORM extraction. See ``~/Thesis/f2a6_real_migration_design.md``.
Pattern locked for the remaining migrations (QuickGO, UniProt
FASTA, UniProt metadata): typed ``<Name>StreamPayload`` +
``<Name>Record`` in protea-contracts, ``<Name>Source.stream`` in
protea-sources, operation refactor here.
Part of F2A.6-real migration plan (master plan v3).
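The resulting Level-1 boundary is thin enough to show whole. A sketch; the GoaStreamPayload field name and the payload attribute are assumptions:

    from protea_contracts import GoaStreamPayload
    from protea_sources.goa import GoaSource

    goa_plugin = GoaSource()

    class LoadGOAAnnotationsOperation:  # excerpt, sketch
        def _stream_gaf(self, payload):
            """One-liner boundary: typed payload in, GoaAnnotationRecord
            stream out. HTTP, gzip and GAF parsing live in the plugin."""
            yield from goa_plugin.stream(
                GoaStreamPayload(url=payload.gaf_url)  # field name assumed
            )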
…s plugin
Second Level-1 plugin migration. LoadQuickGOAnnotationsOperation
delegates HTTP + TSV streaming + ECO mapping to the protea-sources
QuickGoSource plugin, becoming a thin persistence adapter. Pairs
with protea-sources/f37dfce (real plugin) and protea-contracts/
c5433ed (typed payloads + record).
What moved out:
* _stream_quickgo body (~70 LOC of batching, HTTP, TSV parsing):
now a one-liner constructing a typed QuickGoStreamPayload and
yielding from quickgo_plugin.stream.
* _fetch_quickgo_page method: deleted entirely. Plugin owns the
per-batch HTTP fetch.
* _load_eco_mapping body (~13 LOC of HTTP + line parsing): now
one call to quickgo_plugin.fetch_eco_mapping(EcoMappingPayload(
url=...)). Operation keeps the wrapper for the empty-URL short
circuit (returns {} when eco_mapping_url is None).
* import io, import requests: removed from the operation module.
What stayed:
* _load_accessions (canonical + protein accession universes).
* _load_go_term_map (GO-id -> term-id).
* _store_buffer (dedup + ECO map application + pg_insert with
on_conflict_do_nothing). Now consumes QuickGoAnnotationRecord
via attribute access (rec.accession) instead of dict access
(rec["GENE PRODUCT ID"]).
* Operation lifecycle, LoadQuickGOAnnotationsPayload validation
(which keeps page_size, total_limit, commit_every_page knobs
that are operation-side concerns and don't belong in the
plugin payload).
Tests updated:
* _record(...) helper builds QuickGoAnnotationRecord instances
from kwargs; replaces the verbose dict-literal _QUICKGO_ROWS.
* TestStoreBuffer (~9 tests) consumes records, not dicts. The
test_empty_eco_id_becomes_none test now passes eco_id=None
directly (parser-side empty-cell handling). The
test_empty_accession_skipped test was renamed to
test_unknown_accession_skipped: whitespace handling moved to
the parser in protea-sources, so the operation only sees
accessions that don't match valid_accessions.
* TestLoadEcoMapping (~5 tests): patches and event names swapped
to source.quickgo.eco_mapping_*. The empty-URL short circuit
test stayed — operation-side behaviour, not plugin-side.
* TestStreamQuickgo + TestExecute: patches swapped to
protea_sources.quickgo.requests.get. Batching event name swap
to source.quickgo.batching.
* TestFetchQuickgoPage class deleted entirely (~135 LOC). The
tests were exercising the parser through HTTP mocks; the
parser is now in protea-sources where parse_quickgo_row and
parse_quickgo_tsv have full unit tests. Net -8 tests in PROTEA,
+9 unit tests in protea-sources for a strictly better surface.
Behavioural parity:
* Empty cells -> None at parser boundary (matches old _store_buffer
"or None" handling at insert time). No DB-insert diff.
* Dedup, on_conflict_do_nothing constraint, ECO map application
via eco_map.get(rec.eco_id, rec.eco_id): preserved verbatim.
* Multi-batch URL construction: identical (plugin's
gene_product_batch_size matches operation's payload field).
* Event names changed (load_quickgo_annotations.* -> source.quickgo.*
for plugin-emitted events). Operation-side events unchanged.
Downstream consumers reading JobEvent rows must filter on the
new prefix; flagged here for the operator changelog.
Suite: 1128 passed, 12 skipped (= -8 from master because the
8 redundant TestFetchQuickgoPage cases were deleted, not regressed).
The 37 test_load_quickgo_annotations cases all pass on the new
boundary.
Part of F2A.6-real migration plan, step 2 of 4. Pattern locked for
the remaining UniProt FASTA + UniProt metadata migrations.
Two fixes plus an expansion of the documented module surface.
Removes:
- The autodoc directive for protea.core.operations.train_reranker,
orphaned since T0.6 removed the file. Sphinx no longer reports a
ModuleNotFoundError during build.
- A broken :doc: cross-reference to a non-existent
/refactoring/design-patterns/flyweight page in the
protea.core.annotation_intern module docstring. Replaced with plain
text "Flyweight-style".
Adds documentation for modules introduced or moved during F0 / F1:
- protea.core.contracts.parent_progress (T0.7 dedup helper).
- protea.core.retry (T0.3 retry middleware).
- protea.core.operation_catalog (singleton OperationRegistry
builder).
- protea.core.training_dump_helpers (T0.6 home of helpers that
survived the train_reranker.py deletion; reused by
ExportResearchDatasetOperation).
- An "Internal helpers" section covering protea.core.{
anc2vec_embeddings, annotation_intern, disk_cache,
feature_enricher, pca_cache} for completeness.
Build verification: poetry run sphinx-build returns
"build succeeded, 5 warnings". Of those, 4 are pre-existing
environmental warnings (numpy._core.multiarray import errors during
autodoc of modules that import numpy) and the fifth is the cosmetic
missing _static directory. The previously-reported train_reranker
and flyweight warnings are gone.
Doc-T3 of the documentation lane.
…utils
Replaces the inline implementations in Protein.parse_isoform and
Sequence.compute_hash with one-line forwarders to
``protea_contracts.bio_utils``. The canonical authority moves to
the contracts package so the upcoming UniProt FASTA parser in
``protea-sources`` can reuse the helpers without inverting the
C-stack dependency direction.
Files:
* protea/infrastructure/orm/models/protein/protein.py: the
isoform-splitting body becomes a single delegated call; module
docstring on the wrapper explains the move so future grep on
"parse_isoform" lands callers in the right place.
* protea/infrastructure/orm/models/sequence/sequence.py: the
MD5 body becomes a delegated call; the now-unused ``import
hashlib`` is removed.
Behavioural parity preserved bit-for-bit:
* parse_isoform("P12345") -> ("P12345", True, None) — unchanged.
* parse_isoform("P12345-2") -> ("P12345", False, 2) — unchanged.
* compute_hash("MKTAYIAK") -> identical 32-hex MD5 — unchanged.
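From those documented pairs the canonical helpers can be
reconstructed as follows; a sketch, not the actual
protea_contracts/bio_utils.py source:
```
import hashlib
import re

_ISOFORM_RE = re.compile(r"^(?P<base>[^-]+)-(?P<iso>\d+)$")


def parse_isoform(accession):
    """Split an accession into (canonical, is_canonical, isoform_number)."""
    m = _ISOFORM_RE.match(accession)
    if m is None:
        return accession, True, None          # "P12345" -> ("P12345", True, None)
    return m.group("base"), False, int(m.group("iso"))  # "P12345-2" -> ("P12345", False, 2)


def compute_sequence_hash(sequence):
    """32-hex MD5 of the raw sequence, matching the old ORM method."""
    return hashlib.md5(sequence.encode("utf-8")).hexdigest()
```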
Existing call sites in protea/api/routers/query_sets.py,
protea/api/routers/annotate.py,
protea/core/operations/fetch_uniprot_metadata.py,
protea/core/operations/insert_proteins.py keep working unchanged
because the public API on the ORM classes is preserved
(Protein.parse_isoform, Sequence.compute_hash). They will be
migrated to direct imports from protea_contracts as their
respective files are refactored in subsequent F2A.6-real steps.
Suite: PROTEA 1128 passed, 12 skipped (= unchanged from turn 27).
The 6 callsites in tests (test_insert_proteins, test_integration)
exercise the wrappers transparently.
Pairs with protea-contracts/18e92af which adds the canonical
``parse_isoform`` and ``compute_sequence_hash`` plus 12 unit
tests in protea_contracts/bio_utils.py.
Part of F2A.6-real migration plan (D-MIGR-04), prerequisite for
step 3 (UniProt FASTA migration).
Adds docs/source/plugin-authoring.rst as the canonical entry point
for plugin authors, and links it from the main toctree in
docs/source/index.rst.
Scope:
- Architecture overview in one paragraph (protea-core platform plus
four sibling plugin layers).
- Table of the four layers (annotation sources, embedding backends,
experiment runners, feature registry) with their ABC, repository
and entry-point group.
- Decision tree for picking the right ABC depending on what the
author wants to add.
- Anatomy of a plugin in 5 steps that apply uniformly across the
three entry-point-driven layers, plus the in-process pattern for
feature registry contributions.
- Pointers to the per-repo contributing guides shipped on the doc
lane: protea-backends/docs (Doc-T1) and
protea-contracts/docs (Doc-T2). The protea-sources and
protea-runners guides land in Doc-T8.
- Discovery snippet (importlib.metadata.entry_points) that mirrors
what protea-core does at startup, including the name-vs-entry-point
sanity check (a version of it is sketched after this list).
- Schema invariants and reproducibility section linking ADR D10
(schema_sha v2 migration) and the float16 embedding contract.
- Roadmap section pointing to upcoming master-plan v3 phases that
affect plugin authors (F2A.7 lightgbm absorption, F2B feature
registry wiring, F2C protea-method extraction, F9 post-defense
granularity decision).
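The discovery snippet in the guide plausibly looks like this; the
group name is real per this changelog, while the `name`-attribute
probe is an assumption about the plugin classes:
```
from importlib.metadata import entry_points


def discover(group="protea.sources"):
    """Enumerate one entry-point group and load each plugin."""
    plugins = {}
    for ep in entry_points(group=group):
        plugin = ep.load()
        # Name-vs-entry-point sanity check: the installed entry-point
        # name should agree with the plugin's self-reported name.
        reported = getattr(plugin, "name", ep.name)
        if reported != ep.name:
            raise RuntimeError(
                f"plugin {ep.name!r} reports name {reported!r}")
        plugins[ep.name] = plugin
    return plugins
```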
Build verification: poetry run sphinx-build returns
"build succeeded, 5 warnings" (same 5 pre-existing warnings as
before; the new page introduces zero warnings).
Doc-T7 of the documentation lane. Implements F7.6 of master plan v3
("Plugin author guide").
Third Level-1 plugin migration. InsertProteinsOperation delegates
HTTP retries + cursor pagination + gzip decoding + FASTA parsing to
the protea-sources UniProtSource plugin, becoming a thin
persistence adapter. Pairs with protea-sources/fadbd6b (real
UniProtSource.stream_fasta + _http.py) and
protea-contracts/f1bf7b5 (typed payload + record).
What moved out:
* _fetch_fasta_pages body: now a one-liner constructing a typed
UniProtFastaStreamPayload and yielding from
uniprot_plugin.stream_fasta. Renamed to _stream_fasta to
reflect per-record yield (D-MIGR-01).
* _decode_response method (gzip / utf-8 wrapper): plugin owns it.
* _parse_fasta + _parse_header methods (~70 LOC of FASTA parsing,
OS/OX/GN regex, isoform splitting via Protein.parse_isoform):
plugin owns it. The OS/OX/GN regex constants and isoform logic
move with them (a header-parsing sketch follows this list).
* UNIPROT_SEARCH_URL constant: now in
UniProtFastaStreamPayload.base_url default.
* Removed imports: gzip, re, requests, Response, BytesIO, quote,
UniProtHttpClient (the legacy PROTEA-side copy stays in
protea/core/utils.py until step 4, when its last caller,
fetch_uniprot_metadata, also migrates and the class is deleted).
* Removed state: self._http_client, self._total_results.
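For orientation, a hedged reconstruction of the header-parsing piece
that moved; the regexes approximate the UniProt OS/OX/GN convention
and are not the protea-sources code:
```
import re

# Example header:
# >sp|P12345|AATM_RABIT Aspartate aminotransferase OS=Oryctolagus cuniculus OX=9986 GN=GOT2
_HEADER_RE = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry>\S+)\s+(?P<desc>.*)$")
_OS_RE = re.compile(r"\bOS=(.+?)(?=\s+(?:OX|GN|PE|SV)=|$)")
_OX_RE = re.compile(r"\bOX=(\d+)")
_GN_RE = re.compile(r"\bGN=(\S+)")


def parse_fasta_header(line):
    """Extract accession plus OS/OX/GN fields from a UniProt header."""
    m = _HEADER_RE.match(line)
    if m is None:
        raise ValueError(f"not a UniProt FASTA header: {line[:40]!r}")
    desc = m.group("desc")
    os_m, ox_m, gn_m = _OS_RE.search(desc), _OX_RE.search(desc), _GN_RE.search(desc)
    return {
        "accession": m.group("accession"),
        "organism": os_m.group(1) if os_m else None,
        "taxon_id": ox_m.group(1) if ox_m else None,
        "gene_name": gn_m.group(1) if gn_m else None,
    }
```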
What stayed:
* Operation lifecycle, InsertProteinsPayload validation, batching
policy (page_size buffer flush), session.add_all + flush against
Protein + Sequence tables, conservative-update logic for
existing proteins.
* _store_records (full upsert path) — now consumes
UniProtProteinRecord via attribute access (rec.accession,
rec.canonical_accession, etc.) instead of dict access.
* _load_existing_sequences + _load_existing_proteins (DB lookup
helpers).
Behavioural diffs surfaced:
* pages now counts DB-side buffer flushes, not HTTP pages. The
HTTP-page count is the plugin's internal concern (visible via
source.uniprot_fasta.fetch_page_done events). pages is more
useful for monitoring DB throughput; the semantic change is
captured in the relevant test docstrings.
* X-Total-Results header capture (op._total_results) is removed.
The header was nice-to-have for progress reporting and not
load-bearing for correctness; progress totals now flow only
when the user sets total_limit. Operator changelog flagged.
* Plugin-emitted events use the source.uniprot_fasta.* prefix;
operation-emitted events keep insert_proteins.* prefix.
Tests refactored:
* TestParseFasta class deleted (~95 LOC, 11 tests). Parser is now
in protea-sources where parse_fasta_header + parse_fasta_text
have full unit coverage.
* TestDecodeResponse class deleted (~25 LOC, 2 tests). Decode is
in the plugin's _decode_response_body helper, exercised via the
gzip stream wiring tests in protea-sources.
* test_total_results_from_header + test_total_results_invalid
deleted (2 tests). The operation no longer captures
X-Total-Results.
* TestStoreRecords: dict-literal record fixtures replaced with
a _make_record(...) helper that builds UniProtProteinRecord
via the bio_utils helpers (same MD5 hash, same canonical
splitting). Two test bodies shrink ~17 LOC each.
* TestInsertProteinsOperationExecute: patch target swap from
``op._http_client.session.get`` to
``op._uniprot_plugin._client.session.get`` across 16 sites.
test_empty_page_continues renamed to test_empty_page_does_not_flush
with the new pages=0 expectation. test_progress_emission_with_total
renamed to test_progress_emission_with_total_limit; uses
page_size=1 + total_limit=100 to force a flush + carry the
progress total.
* Net -16 PROTEA tests (parser+decode+total_results all moved or
deleted), corresponding +56 in protea-sources for a strictly
better surface.
Suite: PROTEA 1112 passed, 12 skipped (was 1128; -16 from
deletions). Ruff full + mypy strict green on touched files.
Part of F2A.6-real migration plan, step 3 (b) of 4. The legacy
UniProtHttpClient in protea/core/utils.py becomes dead code once
step 4 (UniProt metadata migration) lands; deletion deferred to
that turn.
Closes F2A.6-real with the fourth Level-1 plugin migration plus
the dead-code cleanup that was waiting on it.
Migration:
* FetchUniProtMetadataOperation delegates HTTP retries + cursor
pagination + gzip decoding + TSV parsing to the protea-sources
UniProtSource.stream_metadata plugin method (added in
protea-sources/2a6ef55). Operation becomes a thin persistence
adapter focused on FIELD_MAP DB upsert and update_protein_core
side effects.
* Removed: _fetch_tsv_pages (~70 LOC of HTTP + URL construction),
_decode_response (gzip wrapper), _parse_tsv (csv.DictReader).
All three live in the plugin now.
* Removed state: self._http_client, self._total_results.
* UNIPROT_FIELDS constant kept on the operation class — the field
list is a persistence concern (which DB columns get
populated). Same field set passed to the plugin via
UniProtMetadataStreamPayload.fields.
* Imports trimmed: csv, gzip, BytesIO, StringIO, quote, requests,
Response, UniProtHttpClient. Replaced with protea_contracts
imports for UniProtMetadataRecord, UniProtMetadataStreamPayload,
parse_isoform.
* _store_rows consumes UniProtMetadataRecord via attribute access
(rec.accession, rec.raw_fields) instead of dict access. Field
semantics preserved bit-for-bit: same FIELD_MAP application,
same update_protein_core conservative-update logic.
Behavioural diffs (same as the FASTA migration):
* pages now counts DB-side buffer flushes (not HTTP pages).
* X-Total-Results header capture removed. _progress_total flows
only when total_limit is set.
Dead-code cleanup:
* protea/core/utils.py: deleted UniProtHttpClient (135 LOC) plus
its _HttpPayload Protocol and the now-unused random/time/
requests/Response imports. The file shrinks to just
chunks() + utcnow() helpers (~13 LOC).
* tests/test_core.py: deleted TestUniProtHttpClient class
(~75 LOC, 9 tests) and TestFetchUniProtMetadataExecute class
(~290 LOC, 10 tests). The first migrated to
protea-sources/tests/test_uniprot.py::TestUniProtRetryClient
(5 tests, retries + Retry-After + max-retries + network errors;
that retry behaviour is sketched after this list) plus
TestExtractNextCursor (4 tests). The second migrated
partially to test_fetch_uniprot_metadata.py (which keeps the
14 execute-flow tests against the new plugin-based dispatch)
and partially to protea-sources/tests/test_uniprot.py
(TestParseMetadataTsv covers the parser directly).
* tests/test_fetch_uniprot_metadata.py: deleted TestParseTsv
class (4 tests, ~35 LOC) — parser now in protea-sources.
16 patch sites swapped from op._http_client.session.get to
op._uniprot_plugin._client.session.get.
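The migrated retry behaviour, sketched from the test names alone
(bounded retries, Retry-After honouring, network-error retry); an
approximation of what protea-sources/_http.py covers, not its
source:
```
import time

import requests


def get_with_retries(session, url, max_retries=5, backoff=1.0):
    """GET with bounded retries; honours Retry-After on 429/5xx."""
    last_exc = None
    for attempt in range(max_retries):
        try:
            resp = session.get(url, timeout=30)
        except requests.RequestException as exc:  # network errors retry too
            last_exc = exc
            time.sleep(backoff * (attempt + 1))
            continue
        if resp.status_code in (429, 500, 502, 503, 504):
            # Prefer the server-provided Retry-After (seconds) if present.
            delay = float(resp.headers.get("Retry-After",
                                           backoff * (attempt + 1)))
            time.sleep(delay)
            continue
        resp.raise_for_status()
        return resp
    raise RuntimeError(
        f"gave up on {url} after {max_retries} attempts") from last_exc
```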
Suite: PROTEA 1089 passed, 12 skipped (was 1112; -23 from
deletions of the legacy class + parser/decode/total_results
overlap with protea-sources). Ruff full + mypy strict green.
Net diff PROTEA: -266 / +163 = -103 LOC across the operation,
core/utils.py, test_core.py, and test_fetch_uniprot_metadata.py.
This closes F2A.6-real:
* GOA real-migrated (turn pre-25, commits 20987a5/d1d60f6/43da412).
* QuickGO real-migrated (turn 27, c5433ed/f37dfce/42d4dd4).
* D-MIGR-04 prereq (turn 29, 18e92af/434b14e).
* UniProt FASTA real-migrated (turn 32, f1bf7b5/fadbd6b/56a6d87).
* UniProt metadata real-migrated + UniProtHttpClient deleted
(this turn, 09f3883/2a6ef55/<this>).
protea-sources is now self-contained with respect to UniProt HTTP:
the _http.py module owns the retry client; the plugin owns parsing
and modality dispatch (FASTA vs metadata). PROTEA's only remaining
involvement is persistence.
Part of F2A.6-real migration plan, step 4 of 4. F2B (HTTP registry
endpoints) is next on the autonomous queue once doc-lane gives it
priority.
Adds three read-only HTTP endpoints listing plugins discovered via
``importlib.metadata.entry_points``, closing the F2B.1-3 block of
master plan v3 in a single coherent router (the three endpoints
share their lookup mechanism — splitting them across separate
files would be artificial).
Endpoints:
* ``GET /backends`` — embedding backend plugins
(``protea.backends`` group). Today: esm, t5, ankh, esm3c.
* ``GET /sources`` — annotation source plugins
(``protea.sources`` group). Today: goa, quickgo, uniprot.
* ``GET /runners`` — experiment runner plugins
(``protea.runners`` group). Today: baseline, knn, lightgbm.
Response shape:
```
{
  "group": "protea.backends",
  "plugins": [
    {"name": "esm", "cls": "EsmBackend",
     "module": "protea_backends.esm:plugin", "extra": {}},
    ...
  ]
}
```
The ``extra`` field carries plugin-class-specific metadata read
from the loaded instance: today only sources expose
``version``, surfaced as ``extra.version`` (e.g. ``"uniprot-goa"``,
``"quickgo-rest"``). Backends and runners get an empty ``extra``;
adding more probe-able metadata is a one-line change inside
``_discover``.
Design choices:
* No caching. The endpoint re-scans entry_points on every call so
a worker that's just been restarted with a newly-installed
extra surfaces in the next request without an API restart. The
scan is sub-millisecond on the working set of ~10 plugins.
* No authentication. These endpoints are public-read by design —
they list installed software, not user data.
* Plugin loading happens here. Loading the entry_point fires the
plugin module's import side effects but should not raise for
any first-party plugin (the bootstrapping pattern keeps
top-level imports cheap). If a third-party plugin's load raises,
the caller surfaces it as a 500 — fail loud beats silently
hiding broken installs.
* Fixed group whitelist (``_KNOWN_GROUPS``) prevents callers
from probing arbitrary entry_points via the same code path.
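Those choices condense into roughly the following shape; field names
mirror the documented response, while the helper internals are
assumptions rather than the real registry.py:
```
from importlib.metadata import entry_points

from fastapi import APIRouter, HTTPException
from pydantic import BaseModel

router = APIRouter()
_KNOWN_GROUPS = {"protea.backends", "protea.sources", "protea.runners"}


class PluginInfo(BaseModel):
    name: str
    cls: str
    module: str
    extra: dict[str, str] = {}


class PluginListResponse(BaseModel):
    group: str
    plugins: list[PluginInfo]


def _discover(group):
    if group not in _KNOWN_GROUPS:  # block probing arbitrary groups
        raise HTTPException(status_code=404, detail="unknown plugin group")
    infos = []
    for ep in entry_points(group=group):  # re-scanned on every call
        plugin = ep.load()  # may raise -> surfaces as a 500, by design
        extra = {}
        if hasattr(plugin, "version"):  # today only sources expose this
            extra["version"] = str(plugin.version)
        infos.append(PluginInfo(
            name=ep.name,
            cls=type(plugin).__name__,  # assumes entry points resolve to instances
            module=ep.value,
            extra=extra,
        ))
    return PluginListResponse(group=group, plugins=infos)


@router.get("/sources", response_model=PluginListResponse)
def list_sources():
    return _discover("protea.sources")
```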
Files:
* protea/api/routers/registry.py: new router (~140 LOC) with
PluginInfo + PluginListResponse pydantic models, _discover +
_list_for helpers, and the three endpoint functions.
* protea/api/app.py: add registry_router to the import block
and wire it via ``app.include_router(registry_router.router)``.
* tests/test_registry_endpoints.py: 16 tests across four
classes — TestBackendsEndpoint (5), TestSourcesEndpoint (5),
TestRunnersEndpoint (4), TestResponseSchema (2). Tests run
against the live entry_points discovery (the 10 plugins are
real C-stack siblings installed via path-deps); no mocking.
Suite: PROTEA 1105 passed, 12 skipped (was 1089; +16 new), ruff
full + mypy strict green on the new files.
This closes F2B autonomous-eligible work. F2B.4
(PredictGOTermsBatchOperation extract class) stays in the
human-review queue because of reranker sensitivity.
Part of F2B of master plan v3.
Adds a runnable submit-watch-result loop to PROTEA's README,
satisfying the F7.1 acceptance criterion of master plan v3
("5 minutes to first job") that the original README did not
explicitly cover. The previous README documented Docker and
from-source bring-up but stopped at "scripts/manage.sh start" without
showing the end-to-end machinery.
The new section lives between Getting started and Documentation
and shows three operations the user can run with curl + jq the
moment the stack is up:
1. POST /jobs to enqueue a `ping` smoke-test operation, capturing
the returned job id.
2. GET /jobs/{id}/events to tail the structured-event stream
until the job reaches a terminal state.
3. GET /jobs/{id} to confirm the final status + result + any
error code.
Plus a sub-section showing the F2B plugin-discovery endpoints
(GET /backends, /sources, /runners) that landed in turn 36 — the
runtime catalogue the user can probe to see what models /
sources / runners the running deployment ships.
The example uses `ping` rather than a real ML operation so the
quickstart doesn't depend on having sequence data loaded; the
intent is to exercise the queue + worker + DB lifecycle end-to-
end, which `ping` does in <1s. Real operations
(insert_proteins, load_goa_annotations, compute_embeddings,
predict_go_terms) are submitted the same way.
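The same loop, transcribed to Python for readers who prefer it over
curl + jq; the endpoint paths come from the steps above, while the
payload shape, the `id` key and the terminal-event flag are
assumptions about the API:
```
import time

import requests

BASE = "http://localhost:8000"  # assumed local bring-up address

# 1. Enqueue the `ping` smoke-test operation, capturing the job id.
job = requests.post(f"{BASE}/jobs", json={"operation": "ping"}).json()
job_id = job["id"]

# 2. Tail the structured-event stream until a terminal state appears.
while True:
    events = requests.get(f"{BASE}/jobs/{job_id}/events").json()
    if any(e.get("terminal") for e in events):
        break
    time.sleep(1)

# 3. Confirm the final status + result + any error code.
print(requests.get(f"{BASE}/jobs/{job_id}").json())
```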
PROTEA README size: 141 LOC → 187 LOC (+46 LOC).
Suite + Sphinx build unchanged; this is doc-only.
Part of Doc-T11 of the autonomous loop. Closes the README
expansion sweep across the four C-stack repos plus PROTEA itself.
CI rescue: restore main to a green state after ~6 weeks of red.
The last green CI run on `main` was 2026-03-25 (PR #7). Between that
and 2026-05-06, several latent breakages accumulated and were
exposed when the F2 phase work landed via 7db0e0d..e9ae748:
- `cafaeval-protea` declared as a PEP 621 file:/// URL hardcoded to
the original developer's machine (introduced 2026-04-21, commit
ace4c4a).
- Five sibling path-deps (`protea-{contracts,method,sources,
runners,backends}`) added during the F2 plugin migration; their
internal cross-deps to protea-contracts were also path-based.
- `protea-reranker-lab` path-dep on a sibling that wasn't on GitHub
at all.
- Pre-existing pyproject.toml + workflow gaps: `--only dev` install
scope in the lint and docs jobs (which skipped the main deps), a
missing sphinxcontrib-bibtex declaration, and accumulated ruff /
flake8 / mypy tech debt that hadn't been gated for ~6 weeks.
What this PR does:
1. Replaces all path-deps with `git+https://github.com/frapercan/<repo>`
URLs so CI runners can resolve them. The 5 C-stack siblings and
protea-reranker-lab were pushed to GitHub as part of this work.
Cross-sibling path-deps inside the siblings (e.g. protea-backends
pointing at ../protea-contracts) were also converted; otherwise
poetry's transitive resolution failed.
2. Fixes integration tests broken by the F2A.6-real migration
(op._http_client references, dict→GoaAnnotationRecord fixture
conversion, halfvec roundtrip tolerance).
3. Auto-fix + manual cleanup of 18 ruff errors, 10 flake8 spacing
violations, and 15 mypy errors (mostly union-attr narrowing asserts
and targeted type: ignore on legitimate runtime patterns mypy can't
prove safe).
4. Fixes the lint + docs workflows to use `poetry install --with dev`
instead of `--only dev` so mypy / sphinx autodoc can resolve imports
of pyarrow, sqlalchemy, fastapi, etc.
5. Declares sphinxcontrib-bibtex (it was installed transitively in
the local venv but missing from pyproject.toml).
6. Includes 8 ADR resolutions confirmed during the rescue session
(D04 /v1/ versioning, D06 Authentik+oauth2-proxy, D07
Loki+Grafana+OTel, D10 schema_sha v2, D25 HPC mode B primary, D27
ghcr.io, D28 sops+age, D29 semantic-release).
Local-dev trade-off: editable cross-sibling installs are lost. Devs
who want hot-reload across siblings need to run
`pip install -e ../<sibling>` after `poetry install`.
CI verification on this PR:
- lint (3.12, 2.1.0): pass (3m3s)
- test (3.12, 2.1.0): pass (3m14s)
- integration (3.12, 2.1.0): pass (4m11s)
- docs (3.12, 2.1.0): pass (2m57s)
- pip-audit, bandit, GitGuardian: pass
- codecov informative-only (not in required checks)
Local verification matched CI: 1105 unit passed, 1115 integration
passed (with --with-postgres), ruff + flake8 + mypy clean, sphinx
build succeeds with 5 pre-existing warnings.
Includes the LAFA wrapper scaffolding (`apps/lafa_container/*`) that
had been sitting untracked in the working tree since the early
F-LAFA exploration; kept because it has real value as the seed for
a future functionbench.net submission.
Codecov Report ❌

@@           Coverage Diff            @@
##           develop       #9   +/-   ##
===========================================
- Coverage    82.07%   73.71%   -8.36%
===========================================
  Files           63       91      +28
  Lines         5959    10475    +4516
===========================================
+ Hits          4891     7722    +2831
- Misses        1068     2753    +1685
Summary
Fast-forward sync of develop with main after the CI rescue (PR #8) and the F2A.6-real plugin migration.
Develop's tip (c244d25 — workflow bumps) is a strict ancestor of main, so this is a content-equivalent sync with zero unique commits on develop. No conflicts expected.
50+ commits land on develop:
Test plan