- Install Anaconda.
We recommend using Anaconda to set up the model training environment.
- Enter the PreGP directory.
cd PreGP
- Create the PreGP environment from pregp_env.yml.
conda env create -n PreGP -f pregp_env.yml
- Activate the PreGP environment.
conda activate PreGP
- Start pretraining.
bash pretrain.sh
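When these steps run in a script (for example a batch job), `conda activate` can fail in a non-interactive shell because conda's shell hook is not loaded there. `conda run` avoids this. The sketch below assumes pregp_env.yml sits in the PreGP directory and that the environment is named PreGP; `ensure_pregp_env` and `run_in_pregp` are hypothetical helper names, not part of PreGP.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of PreGP) for scripted, non-interactive setup.
# Assumes the current directory is the PreGP directory containing pregp_env.yml.

ensure_pregp_env() {
  # `conda env list` prints comment lines starting with '#', then one line
  # per environment with the environment name in the first column.
  # awk reads the whole list, so the check is safe inside a pipeline.
  if conda env list | awk '$1 == "PreGP" { found = 1 } END { exit !found }'; then
    echo "PreGP environment already exists"
  else
    conda env create -n PreGP -f pregp_env.yml
  fi
}

run_in_pregp() {
  # Run a command inside the PreGP environment without `conda activate`;
  # --no-capture-output streams the command's output while it runs.
  conda run --no-capture-output -n PreGP "$@"
}

# Example:
#   ensure_pregp_env && run_in_pregp bash pretrain.sh
```

Because the existence check runs first, rerunning the script does not try to recreate an environment that is already there.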
The key pretraining parameters are listed below; adjust them to match your data and directory layout.
| Parameter | Description |
|---|---|
| geno_path | Path to genotype data CSV file |
| pretrain_model_path | Directory to save pre-trained model checkpoints |
| run_log_path | Directory for training logs |
| vocab_path | Directory containing vocabulary files |
| checkpoint_save_path | Directory to save training checkpoints |
| checkpoint_load_file_path | Directory to load checkpoints from |
For more parameter descriptions, please refer to PARAMETER.md.
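A wrong input path often surfaces only after a job has started, so checking the inputs first can save a failed run. A minimal sketch, assuming (from the table) that geno_path must point to an existing CSV file and vocab_path to an existing directory; `check_path` is a hypothetical helper, and the paths in the example comment are placeholders for the values you set in pretrain.sh.

```bash
# Hypothetical helper (not part of PreGP): check that a configured path exists.
# Usage: check_path <parameter-name> <file|dir> <path>
# Returns 0 if the path exists with the expected kind, 1 if it is missing,
# and 2 if the kind argument is not "file" or "dir".
check_path() {
  local name=$1 kind=$2 path=$3
  case "$kind" in
    file) [ -f "$path" ] && return 0 ;;
    dir)  [ -d "$path" ] && return 0 ;;
    *)    echo "unknown kind '$kind' for $name" >&2; return 2 ;;
  esac
  echo "missing $kind for $name: $path"
  return 1
}

# Example (placeholder paths; use the values configured in pretrain.sh):
#   check_path geno_path  file data/genotype.csv &&
#   check_path vocab_path dir  vocab/ &&
#   bash pretrain.sh
```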
- Activate the PreGP environment.
conda activate PreGP
- Start finetuning.
bash finetuning.sh
The key finetuning parameters are listed below; adjust them to match your data and directory layout.
| Parameter | Description |
|---|---|
| load_model_name | Filename of the pre-trained model checkpoint to load |
| cvf_path | Path to cross-validation folds CSV file |
| phe_path | Path to phenotype data CSV file |
| pretrain_model_path | Directory containing the pre-trained model |
| fine_tuning_model_path | Directory to save fine-tuned model checkpoints |
| geno_path | Path to genotype data CSV file |
| run_log_path | Directory for training logs |
| vocab_path | Directory containing vocabulary files |
| vocab_name | Name of the vocabulary file |
| pred_save_path | Directory to save prediction results |
| unfreeze_from_layer | Layer index from which to unfreeze parameters |
For more parameter descriptions, please refer to PARAMETER.md.
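Finetuning needs an existing pre-trained checkpoint and vocabulary file. Assuming (this is not stated explicitly above) that the checkpoint is found at pretrain_model_path/load_model_name and the vocabulary at vocab_path/vocab_name, the hypothetical helper below checks both files and, if something is missing, lists the model directory so a mistyped load_model_name is easy to spot.

```bash
# Hypothetical helper (not part of PreGP): confirm finetuning inputs exist.
# Usage: check_finetune_inputs <pretrain_model_path> <load_model_name> \
#                              <vocab_path> <vocab_name>
check_finetune_inputs() {
  local model_dir=$1 model_name=$2 vocab_dir=$3 vocab_name=$4
  local status=0 f
  for f in "$model_dir/$model_name" "$vocab_dir/$vocab_name"; do
    if [ ! -f "$f" ]; then
      echo "not found: $f"
      status=1
    fi
  done
  # On failure, list the model directory to help spot a wrong file name.
  if [ "$status" -ne 0 ] && [ -d "$model_dir" ]; then
    echo "contents of $model_dir:"
    ls -1 "$model_dir"
  fi
  return "$status"
}

# Example (angle brackets are placeholders for the values in finetuning.sh):
#   check_finetune_inputs <pretrain_model_path> <load_model_name> \
#                         <vocab_path> <vocab_name> && bash finetuning.sh
```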
Our pretrained models, trained on large-scale genotype data, and their vocabularies are available on Hugging Face 🤗 at the following link.
https://huggingface.co/integer8/PreGP
- Method 1: Download the models and vocabularies directly in your browser.
Click the download button next to each file you need (model weights or vocabularies).
- Method 2: Use git clone to get the models and vocabularies.
- Install Git LFS. If you are using Ubuntu, you can install it with the following commands.
sudo apt update
sudo apt install git
sudo apt install git-lfs
git lfs install
- Execute the following command.
git clone https://huggingface.co/integer8/PreGP
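If Git LFS was not set up before cloning, the large files in the clone are small text pointer stubs rather than the actual weights. Inside the clone, `git lfs ls-files` marks each tracked file with `*` when its content is present and `-` when only the pointer is, and `git lfs pull` downloads the missing content. As a standalone check that does not need Git LFS installed, the hypothetical helper below lists files whose first line is the Git LFS pointer header; it assumes the default clone directory name PreGP.

```bash
# Hypothetical helper (not part of PreGP or Git LFS): print files under a
# cloned repository that are still Git LFS pointer stubs.
# A pointer file is a small text file whose first line is
# "version https://git-lfs.github.com/spec/v1".
find_lfs_pointers() {
  local repo_dir=${1:-PreGP}   # default: the directory created by git clone
  local f
  # Skip the .git directory; only inspect small files (< 1024 bytes),
  # since pointer stubs are tiny and real model weights are not.
  while IFS= read -r -d '' f; do
    if head -n 1 "$f" | grep -qxF 'version https://git-lfs.github.com/spec/v1'; then
      printf '%s\n' "$f"
    fi
  done < <(find "$repo_dir" -path "$repo_dir/.git" -prune -o -type f -size -1024c -print0)
}

# Example: any output means some files were not downloaded; then run
#   (cd PreGP && git lfs pull)
```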
