PEARL stands for Protein Engineering Adapter via Reinforcement Learning.
This repository explores PETase-family sequence design through remote generation/training on Tinker plus local scoring, selection, mining, and evaluation logic. It is an experimental research codebase, not a validated product.
- Active workspace map after the April 28 cleanup: REPO_MAP.md
- Technical white paper: WHITEPAPER.pdf
- Repo structure and supported surface: docs/overview.md
- Supported workflows: docs/workflows.md
- Operator notes: docs/operations.md
- Current scientific status: docs/science.md
- Manifold-construction pivot: docs/manifold_construction.md
- Phase 8 DPO pilot readout: docs/phase8_dpo_pilot_readout.md
- Phase 8 no-logits OPD packet: docs/phase8_no_logits_opd.md
- Experiment configs: configs/experiments/README.md
- Full experimental history: notes/LABNOTES.md
June 13, 2026 scoring change (BREAKING): the local ESM scorer no longer emits a 0-100 "pseudo-pLDDT" (it was misnamed and saturated). It now returns the raw ESM-2 mean per-residue pseudo-log-likelihood (PLL), and selection/gates use the calibrated natural-reference percentile (scripts/calibrate_esm_pll.py -> configs/esm_pll_calibration.<model>.json). The gate flag is now --esm-pll-gate-percentile (default 0.05); the report field raw_esm_score is now esm_pll (+ esm_pll_natural_percentile). No back-compat shim: old reports/audits will not resume. pLDDT now refers only to real structural confidence (ColabFold/ESMFold). See docs/science.md and notes/LABNOTES.md.
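The percentile gate described above can be pictured with a minimal sketch: rank a candidate's mean PLL against natural reference PLLs and keep it only if its empirical percentile reaches the gate. The function names, the in-memory reference list, and the example PLL values are assumptions for illustration, as is the gate direction (reject candidates below the given percentile of the natural distribution). The actual calibration file layout and scorer interface are defined by scripts/calibrate_esm_pll.py and may differ.

```python
from bisect import bisect_right


def natural_percentile(pll: float, natural_plls: list[float]) -> float:
    """Fraction of natural reference PLLs that are <= the candidate PLL."""
    ref = sorted(natural_plls)
    if not ref:
        raise ValueError("natural reference set is empty")
    return bisect_right(ref, pll) / len(ref)


def passes_pll_gate(pll: float, natural_plls: list[float],
                    gate_percentile: float = 0.05) -> bool:
    """Assumed gate: keep candidates at or above the gate percentile."""
    return natural_percentile(pll, natural_plls) >= gate_percentile


# Hypothetical reference values; mean per-residue log-likelihoods are negative.
reference = [-1.9, -1.7, -1.6, -1.5, -1.4, -1.3, -1.2, -1.1, -1.0, -0.9]
```

With this toy reference set, a candidate at -1.45 sits at the 0.4 percentile and passes the default 0.05 gate, while a candidate at -2.5 falls below every natural value and is rejected.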
June 13, 2026 platform note: the choice of a Kimi generator (the current Phase 8 path targets moonshotai/Kimi-K2.6, bumped from K2.5 after K2.6's 2026-04-20 release) reflects a Tinker compatibility/cost constraint, not a deliberate scientific choice: the project ran on free Tinker credits, and protein-native models are not Tinker-hostable. A general text LLM has no protein/structure prior, which is consistent with the observed mirage wall. As soon as a protein-native generator (ESM3/ProGen2/ZymCTRL) is hostable on our stack or the constraint lifts, the generator-model question should be reopened as a first-class decision. ESM3-open can also fold locally and is the intended automated structure gate before the current manual ColabFold step. The Kimi K2 line is a strong model on its own merits; the gap is the missing protein prior, not model quality. (Several older scripts still default to Kimi-K2.5; standardizing those defaults is pending.) See docs/science.md.
June 11, 2026 Phase 8 update: the Tinker custom-loss DPO path has now run beyond smoke scale. A 3k-pair natural-positive DPO pilot completed, W&B/local batch metrics show a strong training-distribution move, and the DPO runner now preserves local batch reports alongside W&B logging. First-100-batch mean DPO loss was 0.6775 versus last-100 0.3655; first-100 mean reward margin was 0.0419 versus last-100 2.7476; positive-min-margin batches rose from 6% to 87%. That strengthens DPO as a learned preference baseline.
The biological readout is still unresolved. The only completed post-DPO evaluation slice remains p12, temperature 0.8, seed 7: local proxy movement, 0 functional or family-faithful bridge hits, and a five-candidate folded subset with low pLDDT (25.61-36.27) and 0 / 5 CA-triad passes. Treat that as an underpowered warning, not a falsification of DPO-only. DPO remains a live baseline/control that needs higher-resolution, preferably shuffled or held-out, evaluation before its failure modes or yield can be estimated.
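For readers unfamiliar with the training metrics quoted in the Phase 8 update, here is a minimal sketch of the standard DPO loss and reward margin for a single preference pair. It assumes the textbook DPO formulation with summed sequence log-probabilities; the repo's Tinker custom-loss implementation, its beta value, and the exact definition of "positive-min-margin" are not shown in this README, so the helper names and the reading of that metric (share of batches whose smallest pair margin is positive) are assumptions.

```python
import math


def dpo_loss_and_margin(pol_chosen: float, pol_rejected: float,
                        ref_chosen: float, ref_rejected: float,
                        beta: float = 0.1) -> tuple[float, float]:
    """Standard DPO loss and reward margin for one pair of log-probs."""
    margin = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    # -log(sigmoid(margin)), written as a numerically stable softplus(-margin).
    loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return loss, margin


def positive_min_margin_rate(batches: list[list[float]]) -> float:
    """Assumed metric: share of batches whose smallest pair margin is > 0."""
    return sum(min(batch) > 0 for batch in batches) / len(batches)
```

Under this formulation a zero margin gives a loss of ln 2 (about 0.693), which is the neighborhood of the first-100-batch mean loss reported above; the loss falls as the margin grows.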
June 2026 heat check: the project has enough working components to continue the preference-learning path, but not enough evidence to claim the protein-design thesis is solved. The strongest direction is to keep characterizing DPO while preparing sparse OPD/multi-teacher feedback as the comparison branch: natural PETase/cutinase records as positives, generated/fold-failed artifacts and new low-confidence generated candidates as hard negatives, then compact post-train generation and structural validation before any larger library expansion.
April 29, 2026 DPO correction: Phase 7 generated/local-library sequences are no longer allowed on the chosen side of the paid-run DPO dataset. The current local Phase 8 build uses reviewed natural PETase/cutinase records as chosen positives and demotes the fold-failed Phase 7 generated panel to hard negatives.
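The chosen-side rule above can be expressed as a small invariant check when assembling pairs. The sketch below is illustrative only: the record fields (id, source, sequence), the "natural" source label, and the pairing strategy (cycling positives over the hard negatives) are hypothetical and are not taken from the Phase 8 dataset builder.

```python
from itertools import cycle


def build_dpo_pairs(prompt: str, naturals: list[dict],
                    hard_negatives: list[dict]) -> list[dict]:
    """Pair each hard negative with a natural positive, cycling positives."""
    if not naturals:
        raise ValueError("no natural positives supplied")
    for rec in naturals:
        # Enforce the rule: only reviewed natural records may be chosen.
        if rec["source"] != "natural":
            raise ValueError(f"non-natural record on chosen side: {rec['id']}")
    return [
        {"prompt": prompt, "chosen": pos["sequence"], "rejected": neg["sequence"]}
        for pos, neg in zip(cycle(naturals), hard_negatives)
    ]
```

Failing loudly at build time keeps a generated sequence from reaching the chosen side of a paid run by accident.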
April 28, 2026 cleanup note: the active workspace is now focused on Phase 8 DPO readiness. The current 10k DPO dataset lives locally in data/phase8_dpo/, its structural evidence lives in reports/analysis/phase7_local_library_v1/, and old run outputs/scripts/configs were moved to the local ignored archive at archive/2026-04-28-labyrinth-cleanup/. See REPO_MAP.md and notes/LABNOTES.md for the current map and latest scientific status.
Historical April 23, 2026 snapshot, retained for continuity:
- merged stage-b-lite mined pool:
  - 1,597,184 raw candidates
  - 179 exact-unique functional hits
  - 54 exact-unique family-faithful hits
  - 197 lineage clusters at 0.85
- best historical strict branch: strict-core-v7-repair
  - stage-A and stage-B-lite trained cleanly
  - full robustness stayed narrow:
    - p12: [0, 0, 0]
    - p24: [0, 2, 0]
    - p48: [0, 3, 1]
  - the main miss was prompt coverage breadth, with only 4 / 48 prompts hit at p48
- negative strict/repair evidence:
  - strict-core-v8-coverage failed to broaden v7, regressed at p12/p24, and lost family-faithful robustness
  - the April 21/22 v9 p12/p24 repair rescue found 79 loose high-ESM survivors but 0 strict shortlist rows and 0 retrain positives
  - local Gemma mining and historical local-exploit scans did not expose a usable passive basin
- scaffold-first manifold pivot:
  - Phase 1 built a local scaffold bank with 12,619 unique sequences, 4,893 family-manifold scaffolds, 3,769 strict-manifold scaffolds, 79 recovered v9 negatives, and 274 strict candidate positives
  - Phase 2 built and ESM-scored a 10,000-candidate same-length strict-manifold frontier; all candidates scored >=95
  - Phase 2 selection passed readiness with 230 selected strict candidates across 79 parent scaffolds, 8 lengths, and 100 two-mutants
- manifold curriculum outcomes:
  - v1: nonzero transfer but failed breadth; p12 passed with tier-2 hits [1, 2, 0], while p24 failed with [0, 1, 0]
  - v1.1: p24-only gate failed cleanly with 0 tier-2 hits and 0 raw single-motif plus geometry plus ESM candidates
  - v1.2: length-retargeted repair distillation recovered real but narrow signal: 3 functional hits, 2 family-faithful hits, and 3 / 24 prompt coverage
  - v1.3: support-prompt widening regressed to [0, 0, 1] tier-2 hits, 1 / 24 prompt coverage, and 0 family-faithful hits
- April 23 rule, superseded by the June Phase 8 preference-learning path:
  - do not launch another paid manifold v1.x replay, stage-B, p48, or broad mining tranche from this branch line
  - the manifold v2 objective panel is now built at reports/analysis/manifold_v2_objective_panel_20260424/
  - use its 2 v1.2 family-faithful hits as positive anchors and 45 v1.3 stable-only / geometry-only finalists as hard negatives
  - use its 305 v9/v1.1 drift negatives and 190 historical support positives to shape the next offline constructor
  - the first v2 offline constructor selected 64 hard-gated pre-ESM candidates across 38 parents and 8 exact lengths
  - the expanded v2 constructor scored 192 / 192 candidates above ESM 85
  - final reselection produced 34 strict/core/ESM candidates across 18 parent source keys and 14 exact lengths
  - the finalized v2 curriculum has 42 rows: 34 selected candidates plus 8 purebred anchors
  - the v2 p24/c128 diagnostic completed but failed durability with tier-2 hits [0, 1, 0], prompt coverage 1 / 24, and 0 family-faithful hits
  - then-active next branch was v2.1 bridge-weighted replay at reports/curriculum/manifold_v21_20260424/manifold_v21_bridge_curriculum.jsonl
  - v2.1 has 71 rows: 28 v2 strict-breadth anchors, 15 measured bridge replay rows, 12 support prompt anchors, 12 historical family-faithful anchors, and 4 purebred anchors
  - then-current paid scope was a tiny v2.1 stage-A train plus p24-only diagnostic gate; no stage-B, p48, or broad mining from this artifact
  - keep paid mining as a small diagnostic only if the offline v2 redesign stalls
See docs/science.md for the current research readout and primary artifact links.
The supported reusable workflows are:
- mine
- postprocess
- analyze
- build-dataset
- repair
- train
- robustness
- reranker
- manifold-construction (Phase 1 and Phase 2 selection implemented)
- preference-dpo-opd (Phase 8 DPO baseline, sparse OPD materials, and matched structural readouts)
- structure-gate (automated ESMFold/ESMAtlas fold + real 3D catalytic-triad geometry gate; pre-ColabFold)
The details and entrypoints for those workflows live in docs/workflows.md.
Versioned strict_core_* and strict_first_union wrappers now live under the archive and are exposed at their old scripts/ paths through symlinks for continuity with the historical record. They are not the supported workflow surface anymore. The supported control flow is now config-driven and library-backed through src/pearl.
python3.13 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Pinned local/dev requirements are in requirements.txt. The local baseline uses Python 3.13 because the current Tinker SDK requires Python >=3.11:
- tinker==0.21.0
- torch==2.12.0
- transformers==5.8.1
- tiktoken==0.13.0
- numpy==2.4.6
- safetensors==0.7.0
- sentencepiece==0.2.1
- rapidfuzz==3.14.5
- charset-normalizer==3.4.6
Production CUDA environments used on Nebius are separate from the local/dev baseline.
- main.py: current generation/eval engine with shared helpers now extracted into src/pearl
- src/pearl/family.py: family scoring and catalytic geometry checks
- src/pearl/esm_proxy.py: local ESM proxy scorer
- src/pearl/: reusable library surface for paths, detached jobs, reports, smoke gates, curricula, and run-record assembly
- scripts/: supported workflow entrypoints plus archived compatibility symlinks, including scripts/manifold_construction_experiment.py
- Phase 8 preference runners: scripts/run_tinker_dpo_smoke.py, scripts/run_tinker_sparse_opd_smoke.py, scripts/build_tinker_teacher_traces.py, scripts/build_sparse_opd_targets.py, scripts/phase8_paid_run_preflight.py
- Structural gate: src/pearl/structure_gate.py + scripts/run_structure_gate.py (fold + real Ser-His-Asp 3D H-bond geometry; backends esmfold/esmatlas; graded structural_score when calibrated). Calibration: scripts/calibrate_esm_pll.py (sequence PLL), scripts/calibrate_structure_gate.py (fold/triad vs natural)
- reports/: local run artifacts
- data/: prompts, records, and family datasets
The repo boundary is now explicit:
- reusable engine and shared helpers live under src/pearl
- supported workflow runners are config-driven entrypoints
- historical campaign wrappers are archived and kept only through compatibility symlinks
python scripts/run_ablation.py \
--name my-eval-run \
--model moonshotai/Kimi-K2.6 \
--variant baseline \
--prompts-path /abs/path/prompts.jsonl \
--reference-records-path /abs/path/petase_records.jsonl \
--prompt-count 24 \
--candidate-sample-count 128 \
--second-stage-top-k 16 \
--second-stage-esm-weight 0.4 \
--second-stage-motif-weight 0.3 \
--second-stage-geometry-weight 0.3 \
--second-stage-template-weight 0.05 \
--init-state-path tinker://.../weights/... \
--eval-only \
--resume \
--capture-candidate-audit \
--seed 41

python scripts/run_robustness_suite.py \
--name my-robustness \
--init-state-path tinker://.../weights/... \
--model moonshotai/Kimi-K2.6 \
--variant baseline \
--suite-sizes 12,24,48 \
--temperatures 0.8 \
--seeds 41,53,67 \
--candidate-sample-count 128 \
--second-stage-top-k 16 \
--second-stage-esm-weight 0.4 \
--second-stage-motif-weight 0.3 \
--second-stage-geometry-weight 0.3 \
--second-stage-template-weight 0.05

Use this path when remote Tinker sampling dominates wall clock and you want to decouple:
- stockpile candidate pools first
- then run H100 ESM rescoring/finalization only on completed pools
Sync the bundle to a Nebius H100 VM from your Mac:
bash scripts/sync_topoff1m_a_eval_bundle.sh <VM_IP>

Set up the VM once:
ssh -i ~/.ssh/nebius_h200 svdr@<VM_IP>
bash ~/work/tinker/scripts/setup_nebius_h100_eval_env.sh
export TINKER_API_KEY=...

Launch ultra on the VM:
export STOCKPILE_JOBS=4
export STOCKPILE_RETRIES=2
bash ~/work/tinker/scripts/launch_topoff1m_a_robustness_h100.sh ultra

Queue balanced only after ultra is actually complete:
python3 ~/work/tinker/scripts/launch_detached_job.py \
--job-name pearl-topoff1m-a-balanced-robustness-2phase-h100-queue \
--cwd ~/work/tinker \
--metadata-path ~/work/tinker/reports/logs/pearl-topoff1m-a-balanced-robustness-2phase-h100-queue.json \
--log-path ~/work/tinker/reports/logs/pearl-topoff1m-a-balanced-robustness-2phase-h100-queue.log \
--env "TINKER_API_KEY=$TINKER_API_KEY" \
--env "STOCKPILE_JOBS=$STOCKPILE_JOBS" \
--env "STOCKPILE_RETRIES=$STOCKPILE_RETRIES" \
-- bash -lc 'while [ ! -f "$HOME/work/tinker/reports/robustness/pearl-topoff1m-a-ultra-robustness-2phase-h100-p12p24p48-t08-s41s53s67/robustness_summary.json" ]; do sleep 60; done; bash ~/work/tinker/scripts/launch_topoff1m_a_robustness_h100.sh balanced'

Operational notes:
- The queue gate should watch for robustness_summary.json, not the parent PID.
- The VM venv needs sentencepiece, protobuf, and tiktoken installed or some stockpile lanes can fail during tokenizer init.
- run_robustness_two_phase.py now supports:
  - --stockpile-jobs
  - --stockpile-retries
- Kill the VM only after both of these files exist:
  - reports/robustness/pearl-topoff1m-a-ultra-robustness-2phase-h100-p12p24p48-t08-s41s53s67/robustness_summary.json
  - reports/robustness/pearl-topoff1m-a-balanced-robustness-2phase-h100-p12p24p48-t08-s41s53s67/robustness_summary.json
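The "wait for the summary files, not the PID" rule in the notes above can also be scripted. The following is a minimal sketch of a file-existence poll with an optional timeout; wait_for_files is a hypothetical helper, not part of src/pearl, and it checks only that the files exist, not that their contents are complete.

```python
import time
from pathlib import Path


def wait_for_files(paths, poll_seconds=60.0, timeout_seconds=None):
    """Block until every path exists as a file; return False on timeout."""
    deadline = None
    if timeout_seconds is not None:
        deadline = time.monotonic() + timeout_seconds
    pending = [Path(p).expanduser() for p in paths]
    while True:
        # Drop paths that have appeared since the last poll.
        pending = [p for p in pending if not p.is_file()]
        if not pending:
            return True
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

A teardown script could call wait_for_files with the two robustness_summary.json paths listed above and shut the VM down only when it returns True.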
Fold the selected survivors and check fold quality plus the real 3D Ser-His-Asp triad geometry (the automated step before any manual ColabFold):
python scripts/run_structure_gate.py \
--backend esmfold \
--input reports/ablations/<run>/candidate_audit.json \
--selected-only \
--output reports/structure_gate/<run>.json

Use --backend esmatlas for low-volume API folding with no local weights, or --sequence <SEQ> for a one-off. Gate defaults: mean pLDDT >= 70, triad H-bonds <= 3.5 Å.
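To make the gate defaults concrete, here is a minimal sketch of the two checks applied to a folded model. It assumes the usual Ser-His-Asp hydrogen-bond pairs (Ser OG to His NE2, and His ND1 to the nearer Asp carboxylate oxygen) and takes atom coordinates directly; the function names and the atom choice are assumptions, and src/pearl/structure_gate.py may measure the geometry differently.

```python
import math

Coord = tuple[float, float, float]


def triad_hbond_distances(ser_og: Coord, his_ne2: Coord, his_nd1: Coord,
                          asp_od1: Coord, asp_od2: Coord) -> tuple[float, float]:
    """Ser OG-His NE2 and His ND1-Asp (nearer OD atom) distances in angstroms."""
    ser_his = math.dist(ser_og, his_ne2)
    his_asp = min(math.dist(his_nd1, asp_od1), math.dist(his_nd1, asp_od2))
    return ser_his, his_asp


def passes_structure_gate(mean_plddt: float, distances: tuple[float, ...],
                          min_plddt: float = 70.0,
                          max_hbond: float = 3.5) -> bool:
    """Apply the documented defaults: mean pLDDT >= 70, H-bonds <= 3.5 Å."""
    return mean_plddt >= min_plddt and all(d <= max_hbond for d in distances)
```

Both conditions must hold: a model with good triad geometry but a mean pLDDT in the 25-36 range, like the folded post-DPO subset described earlier, would still fail on confidence alone.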
The standalone check_retrain_readiness.py was retired in the April 28 cleanup and now lives
only under archive/2026-04-28-labyrinth-cleanup/scripts/ (no compatibility symlink). It is
not part of the supported surface. Retrain-readiness signal is emitted directly in raft
postprocess artifacts (e.g. retrain_readiness_selected_only.json); for current pre-run
checks use scripts/phase8_paid_run_preflight.py.
python scripts/run_raft_wave.py \
--name wave1 \
--init-state-path tinker://.../weights/... \
--total-prompt-count 200 \
--shard-count 4 \
--candidate-sample-count 256 \
--second-stage-top-k 16 \
--temperature 0.8

Most runs produce:
- report.json: step-level selected output records
- summary.json: aggregate run metrics
- candidate_audit.json: full per-candidate pool (if enabled)
Robustness suites additionally produce:
- runs_manifest.json
- robustness_summary.json with durability-gate pass/fail and seed vectors
- Sequences from this repo are computational outputs only.
- ESM proxy is a lightweight stability proxy, not a structural truth model.
- Passing local gates does not imply biochemical activity or wet-lab success.
Apache License 2.0. See LICENSE.