This page documents the ProteinGym zero-shot benchmarking flow: ProteinGymRunner, scoring methods, run_proteingym_zero_shot(), CLI arguments, expand_dms_ids_all, and the benchmark performance subprocess. For dataset-based probing, see Probes and training.
ProteinGym provides zero-shot variant effect prediction: score substitution or indel assays with a base model (e.g. ESM2) and compare to experimental data via Spearman correlation. Protify runs scoring for selected DMS assay IDs and optionally runs the official benchmark performance script. You can compare multiple scoring methods in one run with --compare_scoring_methods.
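To make the evaluation metric concrete, the following is a minimal pure-Python sketch of Spearman correlation between model scores and experimental DMS measurements. It is not the code used by Protify or by the official benchmark script; the function names and the toy values are invented for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Toy data (invented): model scores vs. experimental fitness for five variants.
model_scores = [-2.1, -0.3, -1.5, 0.4, -3.0]
dms_scores = [0.20, 0.90, 0.35, 0.80, 0.10]
rho = spearman(model_scores, dms_scores)
```

Because only the rank order matters, a scoring method does not need to be calibrated to the assay's units to score well on this metric.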
- When `--proteingym` is set and no datasets are passed, `main()` calls `MainProcess.run_proteingym_zero_shot()`.
- `run_proteingym_zero_shot()` reads `dms_ids`, `mode`, `model_names`, `scoring_method`, `scoring_window`, and `pg_batch_size` from `full_args`. It expands `dms_ids` (e.g. "all") via `expand_dms_ids_all()`, then instantiates `ProteinGymRunner(results_dir, repo_id="GleghornLab/ProteinGym_DMS")` and calls `runner.run(...)`.
- For each model, the runner loads the base model (masked LM), builds `ProteinGymScorer`, loads each DMS assay with `load_proteingym_dms(dms_id, mode, repo_id)`, runs scoring (substitutions or indels), and saves/merges per-assay CSVs. Optionally it runs `run_benchmark()` to compute Spearman vs reference.
- If `--compare_scoring_methods` is set (and `--proteingym`), the main entry runs `compare_scoring_methods(...)` instead of the normal zero-shot path; `run_proteingym_zero_shot()` is not called in that branch.
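The branching described in this list can be sketched as a small decision function. This is an illustration only: `plan_proteingym_run` and the branch labels are hypothetical names, not part of Protify, and the case where datasets are passed together with `--proteingym` is left open because this page does not describe it.

```python
from argparse import Namespace


def plan_proteingym_run(args, has_datasets):
    """Return (branch, scoring_method) for the documented cases; illustrative only."""
    if not args.proteingym:
        return ("not_proteingym", args.scoring_method)
    if args.compare_scoring_methods:
        # The comparison replaces the normal zero-shot path entirely.
        return ("compare_scoring_methods", None)
    if has_datasets:
        # This page does not say what happens when datasets are also passed.
        return ("not_covered_here", args.scoring_method)
    # Indels mode forces PLL regardless of the requested scoring method.
    method = "pll" if args.mode == "indels" else args.scoring_method
    return ("run_proteingym_zero_shot", method)


args = Namespace(proteingym=True, compare_scoring_methods=False,
                 mode="indels", scoring_method="masked_marginal")
branch, method = plan_proteingym_run(args, has_datasets=False)
```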
`ProteinGymRunner` is defined in scorer.py.
- Constructor: `ProteinGymRunner(results_dir, repo_id="GleghornLab/ProteinGym_DMS", device=None)`. Creates `results_dir`; uses CUDA if available.
- `run(dms_ids, model_names, mode="benchmark", scoring_method="masked_marginal", scoring_window="optimal", batch_size=32)`: For each model, loads via `get_base_model(..., masked_lm=True)`, builds `ProteinGymScorer`, then for each DMS ID loads data with `load_proteingym_dms(dms_id, mode, repo_id)`. For substitutions: `scorer.score_substitutions(..., scoring_method, scoring_window)`; for indels: `scorer.score_indels(..., scoring_window="sliding")`. Saves/merges CSVs via `_save_results()`. Returns a dict `model_name -> elapsed_time`.
- `run_benchmark(model_names, dms_ids, mode, scoring_method)`: Runs the `DMS_benchmark_performance.py` subprocess; outputs go to `results_dir/benchmark_performance`.
- `collect_spearman(results_dir, model_names)`: Static method; reads `benchmark_performance/Spearman/Summary_performance_DMS_*_Spearman.csv` and returns `{model_name: spearman}`.
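The outer loop of `run(...)` might look roughly like the sketch below. It is not the actual implementation: `run_sketch` and its injected callables (`load_model`, `load_dms`, `score`, `save`) are hypothetical stand-ins for `get_base_model`, `load_proteingym_dms`, the scorer methods, and `_save_results()`, so the loop can be exercised without a model or a download.

```python
import time


def run_sketch(dms_ids, model_names, mode, scoring_window,
               load_model, load_dms, score, save):
    """Load each model once, score every assay, and record wall time per model."""
    elapsed = {}
    for model_name in model_names:
        start = time.perf_counter()
        model = load_model(model_name)
        for dms_id in dms_ids:
            data = load_dms(dms_id, mode)
            # Indels always use sliding windows; substitutions use the requested one.
            window = "sliding" if mode == "indels" else scoring_window
            save(model_name, dms_id, score(model, data, window))
        elapsed[model_name] = time.perf_counter() - start
    return elapsed


# Stub callables (invented) that record what the loop would have saved.
saved = []
timings = run_sketch(
    dms_ids=["ASSAY_A", "ASSAY_B"],
    model_names=["ESM2-150"],
    mode="indels",
    scoring_window="optimal",
    load_model=lambda name: name,
    load_dms=lambda dms_id, mode: [dms_id],
    score=lambda model, data, window: (window, len(data)),
    save=lambda model_name, dms_id, result: saved.append((model_name, dms_id, result)),
)
```

Loading the model once per outer iteration, rather than once per assay, keeps the expensive step out of the inner loop, which matches the "for each model ... then for each DMS ID" order described above.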
For substitutions (ProteinGymScorer.score_substitutions):
| Method | Description |
|---|---|
| masked_marginal | Marginal log prob of the mutated residue in masked context (E1 uses a dedicated path). |
| mutant_marginal | Marginal at the mutated position for the mutant sequence. |
| wildtype_marginal | Marginal at the position for the wildtype sequence. |
| pll | Pseudo-log-likelihood. |
| global_log_prob | Full sequence log probability. |
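As a rough illustration of how some of these scores differ, the sketch below uses the common log-odds formulation (mutant log-probability minus wild-type log-probability at the mutated position) with a toy, context-independent probability table in place of a real masked LM. Everything here is an assumption made for illustration: `toy_log_probs`, the `#` mask symbol, and the four-letter alphabet are invented, and Protify's exact arithmetic may differ.

```python
import math

MASK = "#"  # hypothetical mask symbol for this toy example

# Toy per-position probabilities; a real masked LM would condition on context.
TOY_PROBS = [
    {"A": 0.7, "C": 0.1, "D": 0.1, "G": 0.1},
    {"A": 0.1, "C": 0.6, "D": 0.2, "G": 0.1},
    {"A": 0.25, "C": 0.25, "D": 0.25, "G": 0.25},
]


def toy_log_probs(sequence):
    """Stand-in for a masked-LM forward pass: one log-prob dict per position."""
    assert len(sequence) == len(TOY_PROBS)
    return [{aa: math.log(p) for aa, p in pos.items()} for pos in TOY_PROBS]


def masked_marginal(wt, pos, mut_aa):
    """Mask `pos` in the wild type, then compare mutant vs. wild-type residue."""
    masked = wt[:pos] + MASK + wt[pos + 1:]
    lp = toy_log_probs(masked)[pos]
    return lp[mut_aa] - lp[wt[pos]]


def wildtype_marginal(wt, pos, mut_aa):
    """Same log-odds, but read from the unmasked wild-type sequence."""
    lp = toy_log_probs(wt)[pos]
    return lp[mut_aa] - lp[wt[pos]]


def pseudo_log_likelihood(sequence):
    """Sum of log p(residue_i | rest), masking one position at a time."""
    total = 0.0
    for i, aa in enumerate(sequence):
        masked = sequence[:i] + MASK + sequence[i + 1:]
        total += toy_log_probs(masked)[i][aa]
    return total


wt = "ACD"
score = masked_marginal(wt, 1, "D")  # log(0.2) - log(0.6)
pll_delta = pseudo_log_likelihood("ADD") - pseudo_log_likelihood(wt)
```

With this toy table the methods agree because nothing depends on context; with a real model they diverge, and PLL costs one forward pass per position rather than one per variant.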
For indels (ProteinGymScorer.score_indels), only PLL over sliding windows is supported.
scoring_window: "optimal" (single window per variant when possible) or "sliding" (multiple windows, then aggregate). For indels, scoring_window is forced to "sliding" in the run.
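The difference between the two window modes can be sketched as follows, assuming the model accepts at most `max_len` residues. The helper names, `max_len`, and `stride` are hypothetical, and Protify's actual window selection and aggregation may differ.

```python
def optimal_window(seq_len, mut_pos, max_len):
    """Return one [start, end) window of at most max_len residues around mut_pos."""
    if seq_len <= max_len:
        return (0, seq_len)
    start = mut_pos - max_len // 2
    # Clamp so the window stays inside the sequence.
    start = max(0, min(start, seq_len - max_len))
    return (start, start + max_len)


def sliding_windows(seq_len, max_len, stride):
    """Return overlapping [start, end) windows that together cover the sequence."""
    if seq_len <= max_len:
        return [(0, seq_len)]
    starts = list(range(0, seq_len - max_len, stride))
    starts.append(seq_len - max_len)  # final window is flush with the end
    return [(s, s + max_len) for s in starts]


single = optimal_window(seq_len=100, mut_pos=50, max_len=20)
many = sliding_windows(seq_len=50, max_len=20, stride=10)
```

A single centered window is enough when a variant changes one position, whereas an indel shifts everything downstream of it, which is one plausible reason the whole sequence is covered with sliding windows in that mode.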
- `load_proteingym_dms(dms_id, mode, repo_id="GleghornLab/ProteinGym_DMS")` (data_loader.py) downloads `by_dms_id/{dms_id}.parquet` from HuggingFace.
- Modes: `"benchmark"` (substitutions, no indels), `"indels"`, `"singles"`, `"multiples"`.
- `expand_dms_ids_all(dms_ids, mode)` (utils.py): If any element is `"all"`, replaces with `ALL_INDEL_DMS_IDS` or `ALL_SUBSTITUTION_DMS_IDS` from dms_ids.py.
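A minimal sketch of the "all" expansion is shown below. It assumes that the whole list is replaced when "all" appears and that the indel list is chosen only for the `indels` mode; the placeholder IDs are invented, and the real lists live in dms_ids.py.

```python
# Placeholder lists (invented); the real ones in dms_ids.py are much longer.
ALL_SUBSTITUTION_DMS_IDS = ["SUB_ASSAY_1", "SUB_ASSAY_2"]
ALL_INDEL_DMS_IDS = ["INDEL_ASSAY_1"]


def expand_dms_ids_all_sketch(dms_ids, mode):
    """If "all" appears anywhere, return the full ID list for the given mode."""
    if any(dms_id == "all" for dms_id in dms_ids):
        source = ALL_INDEL_DMS_IDS if mode == "indels" else ALL_SUBSTITUTION_DMS_IDS
        return list(source)  # copy so callers cannot mutate the module-level list
    return list(dms_ids)


expanded = expand_dms_ids_all_sketch(["all"], mode="indels")
```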
| Argument | Type | Default | Description |
|---|---|---|---|
| `--proteingym` | flag | False | Enable ProteinGym zero-shot run. |
| `--dms_ids` | list | [all] | DMS assay IDs or "all". |
| `--mode` | choice | benchmark | benchmark, indels, multiples, singles. |
| `--scoring_method` | choice | masked_marginal | masked_marginal, mutant_marginal, wildtype_marginal, pll, global_log_prob. |
| `--scoring_window` | choice | optimal | optimal, sliding. |
| `--pg_batch_size` | int | 32 | Batch size for scoring. |
| `--compare_scoring_methods` | flag | False | Run scoring method comparison instead of single method. |
| `--score_only` | flag | False | Skip scoring; run benchmark on existing CSVs only. |
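For readers who want to see how these flags fit together, here is an `argparse` sketch that mirrors the table. It is a reconstruction from the documented names, types, and defaults, not Protify's actual parser, and `build_proteingym_parser` is a hypothetical name.

```python
import argparse


def build_proteingym_parser():
    """Build a parser with the ProteinGym flags listed in the table (sketch)."""
    parser = argparse.ArgumentParser(description="ProteinGym flags (sketch)")
    parser.add_argument("--proteingym", action="store_true")
    parser.add_argument("--dms_ids", nargs="+", default=["all"])
    parser.add_argument("--mode", default="benchmark",
                        choices=["benchmark", "indels", "multiples", "singles"])
    parser.add_argument("--scoring_method", default="masked_marginal",
                        choices=["masked_marginal", "mutant_marginal",
                                 "wildtype_marginal", "pll", "global_log_prob"])
    parser.add_argument("--scoring_window", default="optimal",
                        choices=["optimal", "sliding"])
    parser.add_argument("--pg_batch_size", type=int, default=32)
    parser.add_argument("--compare_scoring_methods", action="store_true")
    parser.add_argument("--score_only", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is testable.
cli_args = build_proteingym_parser().parse_args(
    ["--proteingym", "--dms_ids", "all", "--mode", "indels"]
)
```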
When --proteingym is True and --compare_scoring_methods is True, the main entry runs only the comparison and exits; otherwise it runs run_proteingym_zero_shot() (and for indels mode, scoring_method is forced to pll).
- `py -m src.protify.main --proteingym --model_names ESM2-150 --dms_ids all --mode benchmark`
- `py -m src.protify.main --proteingym --model_names ESM2-150 --dms_ids all --mode indels`
- `py -m src.protify.main --proteingym --compare_scoring_methods --model_names ESM2-150 ESM2-650 --dms_ids all`

Use `--score_only` after you have already produced the per-assay CSVs and only want to run the benchmark performance step.
- Configuration for ProteinGym CLI flags
- Models and embeddings for base model loading
- Getting started for entry points