Our code is based on TRL and PEFT for training and Model_Arithmetic for inference.
The data is stored in the zenodo_record.
conda create -n parm python=3.10
conda activate parm
cd language-model-arithmetic/
pip install -e .
cd ../peft/
pip install -e .
conda install -c nvidia cuda-compiler
cd ..
git clone https://github.com/PKU-Alignment/safe-rlhf.git
cd safe-rlhf
pip install .
cd ..
######### GFP #########
# GFP Base Model
mkdir prollama-gfp
# Put the ProLLaMA-GFP-merged folder from Zenodo inside the prollama-gfp folder
# TemBERTure model for GFP stability evaluation
git clone https://github.com/ibmm-unibe-ch/TemBERTure.git
# The MosPro GFP evaluation model is already included in the MosPro folder
pip install -r requirements.txt
pip install adapters==1.0.1
pip install biopython
cd code/data
# Put all files from the data/ folder on Zenodo into the code/data folder
python relabel.py
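Several of the steps above depend on files copied by hand from Zenodo, so a quick layout check before submitting any job can catch a missing folder early. The following is a minimal sketch, not part of the repository: `check_layout` is a hypothetical helper, and the paths in the usage comment assume that `prollama-gfp`, `TemBERTure`, and `code/data` all sit directly under the repository root, as the commands above suggest.

```shell
# Hypothetical helper (not part of this repository).
# Usage: check_layout ROOT PATH...
# Prints one line per PATH and returns non-zero if any PATH is missing under ROOT.
check_layout() {
    root="$1"
    shift
    missing=0
    for rel in "$@"; do
        if [ -e "$root/$rel" ]; then
            echo "ok $rel"
        else
            echo "MISSING $rel"
            missing=1
        fi
    done
    return "$missing"
}

# Example, run from code/data (assumes the repository root is two levels up):
# check_layout ../.. prollama-gfp/ProLLaMA-GFP-merged TemBERTure code/data
```

A non-zero return value means at least one path was not found, so the call can also be used as a guard in front of a job submission.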
The training job scripts for Slurm systems are provided in the train folder:
train
├── GFP
│ └── train_stparm_prollama_score_reg_8gpu.sh
├── train_stparm_none_score_reg.sh
└── train_stparm_prollama_score_reg_8gpu.sh
The evaluation job scripts for Slurm systems are provided in the eval folder:
eval
├── GFP
│ └── eval_all_ckpts_GFP.sh
├── eval_all_ckpts.sh
└── eval_all_ckpts_protein.sh
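On a Slurm cluster, one possible way to run a training script and then its evaluation script is to chain the two with a job dependency. The sketch below is an assumption about usage, not the authors' workflow: `submit_chain` is a hypothetical helper, it relies only on the standard `sbatch` options `--parsable` and `--dependency=afterok:<jobid>`, and it assumes the scripts can be submitted unmodified from the repository root. Partition, account, and checkpoint paths inside the scripts may still need editing for your cluster.

```shell
# Sketch only: submit a training job, then an evaluation job that waits for it.
# SBATCH can be overridden to point at a different submit command.
SBATCH="${SBATCH:-sbatch}"

# Hypothetical helper (not part of this repository).
# Usage: submit_chain TRAIN_SCRIPT EVAL_SCRIPT
submit_chain() {
    train_script="$1"
    eval_script="$2"
    # --parsable makes sbatch print only the job ID, optionally followed by ";cluster"
    train_id="$("$SBATCH" --parsable "$train_script")" || return 1
    train_id="${train_id%%;*}"
    [ -n "$train_id" ] || return 1
    # afterok: the evaluation job starts only if the training job exits successfully
    "$SBATCH" --dependency="afterok:${train_id}" "$eval_script"
}

# Example for the GFP task (paths taken from the trees above):
# submit_chain train/GFP/train_stparm_prollama_score_reg_8gpu.sh eval/GFP/eval_all_ckpts_GFP.sh
```

If training fails, Slurm leaves the dependent evaluation job pending with an unsatisfied dependency, and it can be removed with `scancel`.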