PEARL

PEARL stands for Protein Engineering Adapter via Reinforcement Learning.

This repository explores PETase-family sequence design through remote generation/training on Tinker plus local scoring, selection, mining, and evaluation logic. It is an experimental research codebase, not a validated product.

Start Here

Active workspace map after the April 28 cleanup: REPO_MAP.md
Technical white paper: WHITEPAPER.pdf
Repo structure and supported surface: docs/overview.md
Supported workflows: docs/workflows.md
Operator notes: docs/operations.md
Current scientific status: docs/science.md
Manifold-construction pivot: docs/manifold_construction.md
Phase 8 DPO pilot readout: docs/phase8_dpo_pilot_readout.md
Phase 8 no-logits OPD packet: docs/phase8_no_logits_opd.md
Experiment configs: configs/experiments/README.md
Full experimental history: notes/LABNOTES.md

Current State

June 13, 2026 scoring change (BREAKING): the local ESM scorer no longer emits a 0-100 "pseudo-pLDDT", which was misnamed and saturated. It now returns the raw ESM-2 mean per-residue pseudo-log-likelihood (PLL), and selection and gates use the calibrated natural-reference percentile (configs/esm_pll_calibration.<model>.json, produced by scripts/calibrate_esm_pll.py). The gate flag is now --esm-pll-gate-percentile (default 0.05). The report field raw_esm_score is now esm_pll and is accompanied by esm_pll_natural_percentile. There is no back-compat shim: old reports and audits will not resume. pLDDT now refers only to real structural confidence (ColabFold/ESMFold). See docs/science.md and notes/LABNOTES.md.
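As a rough illustration of the percentile gate described above, the sketch below ranks a candidate's mean per-residue PLL against a set of natural-reference PLLs and applies the 0.05 cutoff. The function names, the reference values, and the pass direction (a candidate passes when at least 5% of natural references score at or below it) are assumptions made for illustration; the real calibration format is defined by scripts/calibrate_esm_pll.py and is not reproduced here.

```python
from bisect import bisect_right


def mean_pll(per_residue_log_probs: list[float]) -> float:
    """Mean per-residue pseudo-log-likelihood of one sequence."""
    return sum(per_residue_log_probs) / len(per_residue_log_probs)


def natural_percentile(pll: float, natural_plls: list[float]) -> float:
    """Fraction of natural-reference PLLs that are <= the candidate's PLL."""
    ordered = sorted(natural_plls)
    return bisect_right(ordered, pll) / len(ordered)


def passes_pll_gate(
    pll: float, natural_plls: list[float], gate_percentile: float = 0.05
) -> bool:
    # Assumed direction: reject candidates that fall below the bottom 5%
    # of the natural reference distribution.
    return natural_percentile(pll, natural_plls) >= gate_percentile


# Made-up reference PLLs, for illustration only (not measured values).
REFERENCE = [-1.9, -1.7, -1.6, -1.5, -1.4, -1.3, -1.2, -1.1, -1.0, -0.9]
```

Under these assumptions a candidate with PLL -1.45 sits above four of the ten reference values and passes, while a candidate at -2.5 sits below all of them and is rejected.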

June 13, 2026 platform note: the use of Kimi as the generator reflects a Tinker compatibility and cost constraint, not a deliberate scientific choice. The project ran on free Tinker credits, and protein-native models are not Tinker-hostable. The current Phase 8 path targets moonshotai/Kimi-K2.6, bumped from K2.5 after K2.6's 2026-04-20 release. A general text LLM has no protein/structure prior, which is consistent with the observed mirage wall. As soon as a protein-native generator (ESM3/ProGen2/ZymCTRL) is hostable on our stack or the constraint lifts, the generator model question should be reopened as a first-class decision. ESM3-open can also fold locally and is the intended automated structure gate before the current manual ColabFold step. The Kimi K2 line is a strong model on its own merits; the gap is the missing protein prior, not model quality. Several older scripts still default to Kimi-K2.5; standardizing those defaults is pending. See docs/science.md.

June 11, 2026 Phase 8 update: the Tinker custom-loss DPO path has now run beyond smoke scale. A 3k-pair natural-positive DPO pilot completed, W&B/local batch metrics show a strong training-distribution move, and the DPO runner now preserves local batch reports alongside W&B logging. First-100-batch mean DPO loss was 0.6775 versus last-100 0.3655; first-100 mean reward margin was 0.0419 versus last-100 2.7476; positive-min-margin batches rose from 6% to 87%. That strengthens DPO as a learned preference baseline.
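To relate the loss and reward-margin figures above, here is a minimal sketch of the standard per-pair DPO objective, loss = -log(sigmoid(margin)). It assumes the logged reward margin already includes the beta scaling; the beta default and the function names are illustrative and are not taken from the repo's DPO runner.

```python
import math


def reward_margin(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, for illustration only
) -> float:
    """Beta-scaled gap between the chosen and rejected implicit rewards."""
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    return beta * (chosen_reward - rejected_reward)


def dpo_loss(margin: float) -> float:
    """-log(sigmoid(margin)), written as a numerically stable softplus(-margin)."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

At zero margin the loss is ln 2 (about 0.693), so a first-100 mean loss close to that value is what a nearly untrained policy would show. Because the loss is convex in the margin, the mean batch loss is always at least the loss evaluated at the mean margin, so the two reported averages are not expected to map onto each other exactly.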

The biological readout is still unresolved. The only completed post-DPO evaluation slice remains p12, temperature 0.8, seed 7: local proxy movement, 0 functional or family-faithful bridge hits, and a five-candidate folded subset with low pLDDT (25.61-36.27) and 0 / 5 CA-triad passes. Treat that as an underpowered warning, not a falsification of DPO-only. DPO remains a live baseline/control that needs higher-resolution, preferably shuffled or held-out, evaluation before its failure modes or yield can be estimated.

June 2026 heat check: the project has enough working components to continue the preference-learning path, but not enough evidence to claim the protein-design thesis is solved. The strongest direction is to keep characterizing DPO while preparing sparse OPD/multi-teacher feedback as the comparison branch: natural PETase/cutinase records as positives, generated/fold-failed artifacts and new low-confidence generated candidates as hard negatives, then compact post-train generation and structural validation before any larger library expansion.

April 29, 2026 DPO correction: Phase 7 generated/local-library sequences are no longer allowed on the chosen side of the paid-run DPO dataset. The current local Phase 8 build uses reviewed natural PETase/cutinase records as chosen positives and demotes the fold-failed Phase 7 generated panel to hard negatives.
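The chosen-side rule above can be sketched as a small pair builder that refuses any non-natural record on the chosen side. The Record fields, the "natural"/"generated" source labels, and the round-robin pairing policy are hypothetical; the actual dataset schema in data/phase8_dpo/ is not shown in this README.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Record:
    sequence: str
    source: str  # hypothetical labels: "natural" or "generated"


def build_pairs(
    prompt: str, positives: list[Record], negatives: list[Record]
) -> list[dict]:
    """Pair each hard negative with a natural positive (round-robin)."""
    if not positives:
        raise ValueError("at least one natural positive is required")
    offenders = [r for r in positives if r.source != "natural"]
    if offenders:
        # Enforce the April 29 rule: generated sequences never appear as chosen.
        raise ValueError(f"{len(offenders)} non-natural record(s) on the chosen side")
    chosen = cycle(positives)
    return [
        {"prompt": prompt, "chosen": next(chosen).sequence, "rejected": neg.sequence}
        for neg in negatives
    ]
```

Failing loudly at build time, rather than filtering silently, keeps a mislabeled generated sequence from leaking into a paid run.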

April 28, 2026 cleanup note: the active workspace is now focused on Phase 8 DPO readiness. The current 10k DPO dataset lives locally in data/phase8_dpo/, its structural evidence lives in reports/analysis/phase7_local_library_v1/, and old run outputs/scripts/configs were moved to the local ignored archive at archive/2026-04-28-labyrinth-cleanup/. See REPO_MAP.md and notes/LABNOTES.md for the current map and latest scientific status.

Historical April 23, 2026 snapshot, retained for continuity:

merged stage-b-lite mined pool:
- 1,597,184 raw candidates
- 179 exact-unique functional hits
- 54 exact-unique family-faithful hits
- 197 lineage clusters at 0.85
best historical strict branch:
- strict-core-v7-repair
- stage-A and stage-B-lite trained cleanly
- full robustness stayed narrow:
  - p12: [0, 0, 0]
  - p24: [0, 2, 0]
  - p48: [0, 3, 1]
  - the main miss was prompt coverage breadth, with only 4 / 48 prompts hit at p48
negative strict/repair evidence:
- strict-core-v8-coverage failed to broaden v7, regressed at p12/p24, and lost family-faithful robustness
- the April 21/22 v9 p12/p24 repair rescue found 79 loose high-ESM survivors but 0 strict shortlist rows and 0 retrain positives
- local Gemma mining and historical local-exploit scans did not expose a usable passive basin
scaffold-first manifold pivot:
- Phase 1 built a local scaffold bank with 12,619 unique sequences, 4,893 family-manifold scaffolds, 3,769 strict-manifold scaffolds, 79 recovered v9 negatives, and 274 strict candidate positives
- Phase 2 built and ESM-scored a 10,000-candidate same-length strict-manifold frontier; all candidates scored >=95 on the since-retired 0-100 pseudo-pLDDT scale
- Phase 2 selection passed readiness with 230 selected strict candidates across 79 parent scaffolds, 8 lengths, and 100 two-mutants
manifold curriculum outcomes:
- v1: nonzero transfer but failed breadth; p12 passed with tier-2 hits [1, 2, 0], while p24 failed with [0, 1, 0]
- v1.1: p24-only gate failed cleanly with 0 tier-2 hits and 0 raw single-motif plus geometry plus ESM candidates
- v1.2: length-retargeted repair distillation recovered real but narrow signal: 3 functional hits, 2 family-faithful hits, and 3 / 24 prompt coverage
- v1.3: support-prompt widening regressed to [0, 0, 1] tier-2 hits, 1 / 24 prompt coverage, and 0 family-faithful hits
April 23 rule, superseded by the June Phase 8 preference-learning path:
- do not launch another paid manifold v1.x replay, stage-B, p48, or broad mining tranche from this branch line
- the manifold v2 objective panel is now built at reports/analysis/manifold_v2_objective_panel_20260424/
- use its 2 v1.2 family-faithful hits as positive anchors and 45 v1.3 stable-only / geometry-only finalists as hard negatives
- use its 305 v9/v1.1 drift negatives and 190 historical support positives to shape the next offline constructor
- the first v2 offline constructor selected 64 hard-gated pre-ESM candidates across 38 parents and 8 exact lengths
- the expanded v2 constructor scored 192 / 192 candidates above ESM 85 on the since-retired 0-100 pseudo-pLDDT scale
- final reselection produced 34 strict/core/ESM candidates across 18 parent source keys and 14 exact lengths
- the finalized v2 curriculum has 42 rows: 34 selected candidates plus 8 purebred anchors
- the v2 p24/c128 diagnostic completed but failed durability with tier-2 hits [0, 1, 0], prompt coverage 1 / 24, and 0 family-faithful hits
- then-active next branch was v2.1 bridge-weighted replay at reports/curriculum/manifold_v21_20260424/manifold_v21_bridge_curriculum.jsonl
- v2.1 has 71 rows: 28 v2 strict-breadth anchors, 15 measured bridge replay rows, 12 support prompt anchors, 12 historical family-faithful anchors, and 4 purebred anchors
- then-current paid scope was a tiny v2.1 stage-A train plus p24-only diagnostic gate; no stage-B, p48, or broad mining from this artifact
- keep paid mining as a small diagnostic only if the offline v2 redesign stalls

See docs/science.md for the current research readout and primary artifact links.

Supported Surface

The supported reusable workflows are:

mine
postprocess
analyze
build-dataset
repair
train
robustness
reranker
manifold-construction (Phase 1 and Phase 2 selection implemented)
preference-dpo-opd (Phase 8 DPO baseline, sparse OPD materials, and matched structural readouts)
structure-gate (automated ESMFold/ESMAtlas fold + real 3D catalytic-triad geometry gate; pre-ColabFold)

The details and entrypoints for those workflows live in docs/workflows.md.

Versioned strict_core_* and strict_first_union wrappers now live under the archive and are exposed at their old scripts/ paths through symlinks for continuity with the historical record. They are not the supported workflow surface anymore. The supported control flow is now config-driven and library-backed through src/pearl.

Installation

python3.13 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Pinned local/dev requirements are in requirements.txt. The local baseline uses Python 3.13, which satisfies the current Tinker SDK requirement of Python >=3.11:

tinker==0.21.0
torch==2.12.0
transformers==5.8.1
tiktoken==0.13.0
numpy==2.4.6
safetensors==0.7.0
sentencepiece==0.2.1
rapidfuzz==3.14.5
charset-normalizer==3.4.6
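A quick way to confirm that an environment matches these pins is to compare them against installed package metadata. The sketch below is not part of the repo; the helper names are hypothetical, and it only handles plain name==version lines.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(lines) -> dict[str, str]:
    """Parse 'name==version' lines, skipping blanks, comments, and other specs."""
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, pinned = line.partition("==")
        pins[name.strip()] = pinned.strip()
    return pins


def find_mismatches(pins: dict[str, str], lookup=version) -> dict[str, tuple]:
    """Return {name: (pinned, installed_or_None)} for every unsatisfied pin."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Example use inside the activated venv:
#   with open("requirements.txt", encoding="utf-8") as handle:
#       print(find_mismatches(parse_pins(handle)))
```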

Production CUDA environments used on Nebius are separate from the local/dev baseline.

Repo Landmarks

main.py: current generation/eval engine with shared helpers now extracted into src/pearl
src/pearl/family.py: family scoring and catalytic geometry checks
src/pearl/esm_proxy.py: local ESM proxy scorer
src/pearl/: reusable library surface for paths, detached jobs, reports, smoke gates, curricula, and run-record assembly
scripts/: supported workflow entrypoints plus archived compatibility symlinks, including scripts/manifold_construction_experiment.py
Phase 8 preference runners: scripts/run_tinker_dpo_smoke.py, scripts/run_tinker_sparse_opd_smoke.py, scripts/build_tinker_teacher_traces.py, scripts/build_sparse_opd_targets.py, scripts/phase8_paid_run_preflight.py
Structural gate: src/pearl/structure_gate.py + scripts/run_structure_gate.py (fold + real Ser-His-Asp 3D H-bond geometry; backends esmfold/esmatlas; graded structural_score when calibrated). Calibration: scripts/calibrate_esm_pll.py (sequence PLL), scripts/calibrate_structure_gate.py (fold/triad vs natural)
reports/: local run artifacts
data/: prompts, records, and family datasets

The repo boundary is now explicit:

reusable engine and shared helpers live under src/pearl
supported workflow runners are config-driven entrypoints
historical campaign wrappers are archived and kept only through compatibility symlinks

Typical Workflows

1. Reproducible eval run

python scripts/run_ablation.py \
  --name my-eval-run \
  --model moonshotai/Kimi-K2.6 \
  --variant baseline \
  --prompts-path /abs/path/prompts.jsonl \
  --reference-records-path /abs/path/petase_records.jsonl \
  --prompt-count 24 \
  --candidate-sample-count 128 \
  --second-stage-top-k 16 \
  --second-stage-esm-weight 0.4 \
  --second-stage-motif-weight 0.3 \
  --second-stage-geometry-weight 0.3 \
  --second-stage-template-weight 0.05 \
  --init-state-path tinker://.../weights/... \
  --eval-only \
  --resume \
  --capture-candidate-audit \
  --seed 41
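The four --second-stage-*-weight flags suggest a weighted rerank of the sampled pool before the top-k cut. How the engine actually combines them is not documented in this README, so the sketch below assumes a plain linear combination of component scores that have each been scaled to [0, 1]; the dictionary keys and function names are hypothetical.

```python
# Weights copied from the command above; note they sum to 1.05, not 1.0.
WEIGHTS = {"esm": 0.4, "motif": 0.3, "geometry": 0.3, "template": 0.05}


def second_stage_score(candidate: dict, weights: dict[str, float]) -> float:
    """Assumed scoring rule: weighted sum of per-component scores in [0, 1]."""
    return sum(weight * candidate[name] for name, weight in weights.items())


def top_k(candidates: list[dict], weights: dict[str, float], k: int = 16) -> list[dict]:
    """Keep the k best candidates by second-stage score (highest first)."""
    ranked = sorted(
        candidates, key=lambda c: second_stage_score(c, weights), reverse=True
    )
    return ranked[:k]


# Toy pool for illustration only; the scores are invented.
POOL = [
    {"id": "a", "esm": 0.9, "motif": 1.0, "geometry": 0.0, "template": 0.0},
    {"id": "b", "esm": 0.5, "motif": 1.0, "geometry": 1.0, "template": 1.0},
    {"id": "c", "esm": 0.2, "motif": 0.0, "geometry": 1.0, "template": 0.0},
]
```

With these toy values, candidate "b" outranks "a" even though its ESM component is lower, because motif and geometry together carry 0.6 of the weight.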

2. Durability suite (`12/24/48`)

python scripts/run_robustness_suite.py \
  --name my-robustness \
  --init-state-path tinker://.../weights/... \
  --model moonshotai/Kimi-K2.6 \
  --variant baseline \
  --suite-sizes 12,24,48 \
  --temperatures 0.8 \
  --seeds 41,53,67 \
  --candidate-sample-count 128 \
  --second-stage-top-k 16 \
  --second-stage-esm-weight 0.4 \
  --second-stage-motif-weight 0.3 \
  --second-stage-geometry-weight 0.3 \
  --second-stage-template-weight 0.05

2b. Two-phase H100 durability suite

Use this path when remote Tinker sampling dominates wall clock and you want to decouple:

stockpile candidate pools first
then run H100 ESM rescoring/finalization only on completed pools

Sync the bundle to a Nebius H100 VM from your Mac:

bash scripts/sync_topoff1m_a_eval_bundle.sh <VM_IP>

Set up the VM once:

ssh -i ~/.ssh/nebius_h200 svdr@<VM_IP>
bash ~/work/tinker/scripts/setup_nebius_h100_eval_env.sh
export TINKER_API_KEY=...

Launch ultra on the VM:

export STOCKPILE_JOBS=4
export STOCKPILE_RETRIES=2
bash ~/work/tinker/scripts/launch_topoff1m_a_robustness_h100.sh ultra

Queue balanced only after ultra is actually complete:

python3 ~/work/tinker/scripts/launch_detached_job.py \
  --job-name pearl-topoff1m-a-balanced-robustness-2phase-h100-queue \
  --cwd ~/work/tinker \
  --metadata-path ~/work/tinker/reports/logs/pearl-topoff1m-a-balanced-robustness-2phase-h100-queue.json \
  --log-path ~/work/tinker/reports/logs/pearl-topoff1m-a-balanced-robustness-2phase-h100-queue.log \
  --env "TINKER_API_KEY=$TINKER_API_KEY" \
  --env "STOCKPILE_JOBS=$STOCKPILE_JOBS" \
  --env "STOCKPILE_RETRIES=$STOCKPILE_RETRIES" \
  -- bash -lc 'while [ ! -f "$HOME/work/tinker/reports/robustness/pearl-topoff1m-a-ultra-robustness-2phase-h100-p12p24p48-t08-s41s53s67/robustness_summary.json" ]; do sleep 60; done; bash ~/work/tinker/scripts/launch_topoff1m_a_robustness_h100.sh balanced'

Operational notes:

The queue gate should watch for robustness_summary.json, not the parent PID.
The VM venv needs sentencepiece, protobuf, and tiktoken installed or some stockpile lanes can fail during tokenizer init.
run_robustness_two_phase.py now supports:
- --stockpile-jobs
- --stockpile-retries
Kill the VM only after both of these files exist:
- reports/robustness/pearl-topoff1m-a-ultra-robustness-2phase-h100-p12p24p48-t08-s41s53s67/robustness_summary.json
- reports/robustness/pearl-topoff1m-a-balanced-robustness-2phase-h100-p12p24p48-t08-s41s53s67/robustness_summary.json

2c. Structural gate on a run's shortlist

Fold the selected survivors and check fold quality plus the real 3D Ser-His-Asp triad geometry (the automated step before any manual ColabFold):

python scripts/run_structure_gate.py \
  --backend esmfold \
  --input reports/ablations/<run>/candidate_audit.json \
  --selected-only \
  --output reports/structure_gate/<run>.json

Use --backend esmatlas for low-volume API folding with no local weights, or --sequence <SEQ> for a one-off. Gate defaults: mean pLDDT >= 70, triad H-bonds <= 3.5 Å.
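To make the two default thresholds concrete, here is a small sketch of a pass/fail check on one folded model. It is not the code in src/pearl/structure_gate.py. It assumes the conventional Ser-His-Asp contacts (Ser OG to His NE2, and His ND1 to either Asp carboxylate oxygen) and takes atom coordinates in Å as plain tuples; which atoms the real gate measures is an assumption here.

```python
import math

Point = tuple[float, float, float]


def triad_hbond_distances(
    ser_og: Point, his_ne2: Point, his_nd1: Point, asp_od1: Point, asp_od2: Point
) -> tuple[float, float]:
    """Return (Ser OG-His NE2, shortest His ND1-Asp OD) distances in Å."""
    ser_his = math.dist(ser_og, his_ne2)
    his_asp = min(math.dist(his_nd1, asp_od1), math.dist(his_nd1, asp_od2))
    return ser_his, his_asp


def passes_structure_gate(
    per_residue_plddt: list[float],
    ser_og: Point,
    his_ne2: Point,
    his_nd1: Point,
    asp_od1: Point,
    asp_od2: Point,
    min_mean_plddt: float = 70.0,  # README default
    max_hbond: float = 3.5,  # README default, in Å
) -> bool:
    """Pass only if the fold is confident and both triad contacts are short."""
    mean_plddt = sum(per_residue_plddt) / len(per_residue_plddt)
    ser_his, his_asp = triad_hbond_distances(
        ser_og, his_ne2, his_nd1, asp_od1, asp_od2
    )
    return mean_plddt >= min_mean_plddt and ser_his <= max_hbond and his_asp <= max_hbond
```

Requiring both conditions means a confident fold with a broken triad fails, and so does a well-placed triad inside a low-confidence fold.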

3. Retrain readiness check (archived)

The standalone check_retrain_readiness.py was retired in the April 28 cleanup and now lives only under archive/2026-04-28-labyrinth-cleanup/scripts/ (no compatibility symlink). It is not part of the supported surface. The retrain-readiness signal is emitted directly in raft postprocess artifacts (e.g. retrain_readiness_selected_only.json). For current pre-run checks, use scripts/phase8_paid_run_preflight.py.

4. Detached mining wave

python scripts/run_raft_wave.py \
  --name wave1 \
  --init-state-path tinker://.../weights/... \
  --total-prompt-count 200 \
  --shard-count 4 \
  --candidate-sample-count 256 \
  --second-stage-top-k 16 \
  --temperature 0.8
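For sizing a wave, it helps to see how a prompt budget divides across shards. The sketch below assumes an even split with any remainder spread over the first shards; how scripts/run_raft_wave.py actually partitions prompts is not stated in this README, and the function name is hypothetical.

```python
def shard_prompt_counts(total_prompts: int, shard_count: int) -> list[int]:
    """Split a prompt budget as evenly as possible across shards."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    if total_prompts < 0:
        raise ValueError("total_prompts must be non-negative")
    base, extra = divmod(total_prompts, shard_count)
    # The first `extra` shards take one additional prompt each.
    return [base + 1 if index < extra else base for index in range(shard_count)]
```

Under that assumption the command above gives four shards of 50 prompts each, and if every prompt draws the full --candidate-sample-count of 256, the wave samples 200 x 256 = 51,200 raw candidates before second-stage selection.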

Outputs You Should Expect

Most runs produce:

report.json: step-level selected output records
summary.json: aggregate run metrics
candidate_audit.json: full per-candidate pool (if enabled)

Robustness suites additionally produce:

runs_manifest.json
robustness_summary.json with durability-gate pass/fail and seed vectors
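Seed vectors such as p48: [0, 3, 1] list hits per seed for one suite size. The sketch below summarizes vectors of that shape and applies a deliberately strict example rule. The robustness_summary.json schema and the real durability-gate rule are not documented in this README, so the function names, the output keys, and the "every seed must hit" rule are all hypothetical.

```python
def summarize_seed_vectors(vectors: dict[str, list[int]]) -> dict[str, dict]:
    """Per-suite totals and the fraction of seeds with at least one hit."""
    summary = {}
    for suite, hits in vectors.items():
        seeds_with_hits = sum(1 for count in hits if count > 0)
        summary[suite] = {
            "total_hits": sum(hits),
            "seeds_with_hits": seeds_with_hits,
            "seed_hit_rate": seeds_with_hits / len(hits),
        }
    return summary


def passes_example_gate(
    vectors: dict[str, list[int]], min_seed_hit_rate: float = 1.0
) -> bool:
    """Hypothetical rule: every suite must hit on the required share of seeds."""
    summary = summarize_seed_vectors(vectors)
    return all(row["seed_hit_rate"] >= min_seed_hit_rate for row in summary.values())


# Seed vectors quoted in the historical strict-core-v7-repair readout.
V7_REPAIR = {"p12": [0, 0, 0], "p24": [0, 2, 0], "p48": [0, 3, 1]}
```

Applied to the v7 vectors, this example rule fails at every suite size, which matches the README's description of that branch as narrow.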

Safety And Scientific Scope

Sequences from this repo are computational outputs only.
ESM proxy is a lightweight stability proxy, not a structural truth model.
Passing local gates does not imply biochemical activity or wet-lab success.

License

Apache License 2.0. See LICENSE.
