In silico protein optimization through genetic algorithms.
GAPO is an in silico genetic algorithm for optimizing proteins toward a desired property, such as stability or affinity. The algorithm mimics the evolutionary process: it recombines and mutates the best sequences of each generation to produce a new, more diverse population that is progressively optimized for the given objective function. Preprint DOI
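
The loop described above (rank the population, keep the best sequences, recombine them, mutate the offspring, repeat) can be sketched in a few lines of Python. This is a minimal illustration, not GAPO's implementation: the function names, the single-point crossover, the uniform random mutation, and the toy fitness function (counting alanines) are all assumptions made for the example. In GAPO the score comes from Rosetta or ESM, and `maximize` here plays the role of `--direction up`/`down`.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def crossover(a, b, rng):
    """Single-point recombination of two equal-length sequences."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(seq, positions, rate, rng):
    """With probability `rate`, replace one allowed position (0-based) with a random residue."""
    if rng.random() >= rate:
        return seq
    pos = rng.choice(positions)
    return seq[:pos] + rng.choice(AMINO_ACIDS) + seq[pos + 1:]


def evolve(seed, positions, fitness, pop_size=20, cycles=10,
           mutation_rate=0.9, maximize=True, rng=None):
    """Toy genetic algorithm; assumes pop_size >= 2 and len(seed) >= 2."""
    rng = rng or random.Random(0)
    # Start from mutated copies of the seed, keeping the seed itself.
    population = [mutate(seed, positions, 1.0, rng) for _ in range(pop_size)]
    population[0] = seed
    for _ in range(cycles):
        ranked = sorted(population, key=fitness, reverse=maximize)
        parents = ranked[: max(2, pop_size // 2)]  # the best half survives
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            child = crossover(a, b, rng)
            children.append(mutate(child, positions, mutation_rate, rng))
        population = parents + children
    return sorted(population, key=fitness, reverse=maximize)[0]


seed = "MKTLLVGG"
best = evolve(seed, positions=[2, 3, 4], fitness=lambda s: s.count("A"))
print(seed, "->", best)
```

Because the parents are carried over unchanged, the best score never gets worse from one cycle to the next, and positions outside `positions` stay identical to the seed.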
Follow these steps to set up and run the project locally.
Before you begin, ensure you have Conda installed on your system.
- If you don't have Conda, follow the installation instructions on the official website: Anaconda Installation.
- Clone the Repository

      git clone https://github.com/izzetbiophysicist/prot_eng_GA.git
      cd prot_eng_GA

- Create the Base Conda Environment. This command uses the `environment.yml` file to create a new environment named `gapo_env` with all the base dependencies.

      conda env create -f environment.yml

- Activate the New Environment

      conda activate gapo_env

- Install PyTorch (⚠️ Crucial Step). The `environment.yml` file does not install PyTorch to ensure you choose the correct version for your hardware. You must install it manually.

  - 🚀 For NVIDIA GPU Users (Highly Recommended): Visit the Official PyTorch Website. Select the settings that match your system (e.g., Conda, Python, your CUDA version) and run the generated command. It will look something like this:

        # This is an EXAMPLE command, get the correct one from the PyTorch website!
        conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

  - 💻 For CPU-Only Users: If you do not have a compatible GPU, install the CPU-only version of PyTorch with this command:

        conda install pytorch torchvision torchaudio cpuonly -c pytorch

- Install PyRosetta. Finally, install PyRosetta using its dedicated installer.

      pip install pyrosetta-installer
      python -c 'import pyrosetta_installer; pyrosetta_installer.install_pyrosetta()'

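
After the last step, it can be worth confirming that the two manually installed packages are visible from the activated environment before starting a run. The script below is a small sketch for that purpose and is not part of GAPO; `missing_modules` is a hypothetical helper name, and the only assumption is that PyTorch and PyRosetta are imported as `torch` and `pyrosetta`.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["torch", "pyrosetta"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("PyTorch and PyRosetta are importable.")
```

With PyTorch installed, `torch.cuda.is_available()` additionally reports whether the GPU build can see a CUDA device.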
GAPO is run from the command line, specifying the optimization mode (structure or sequence) and the desired parameters.
Below are the command-line parameters for each mode of operation in GAPO.
Structure mode, required parameters:

| Parameter | Description |
|---|---|
| `--pdb` | The input PDB file for the optimization. |
| `--residues_to_mut` | List of residue indices (PDB numbering) to be mutated. |

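
When several contiguous regions are targeted, the `--residues_to_mut` list gets long and is easy to mistype. One option is to generate it from inclusive ranges. The helper below is a convenience sketch, not part of GAPO, and `expand_ranges` is a hypothetical name; the ranges shown reproduce the residue list used in the structure example of this README.

```python
def expand_ranges(ranges):
    """Expand inclusive (start, end) pairs into a flat list of residue indices."""
    indices = []
    for start, end in ranges:
        if end < start:
            raise ValueError(f"invalid range: {start}-{end}")
        indices.extend(range(start, end + 1))
    return indices


# Inclusive ranges covering the regions to mutate.
cdr_ranges = [(62, 72), (88, 94), (127, 135), (186, 192), (212, 216), (257, 269)]
residues = expand_ranges(cdr_ranges)

# Print the flag in the space-separated form expected on the command line.
print("--residues_to_mut", " ".join(map(str, residues)))
```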
Structure mode, optional parameters:

| Parameter | Description | Default Value |
|---|---|---|
| `--pop_size` | Size of the population in each generation. | 50 |
| `--cycles` | Number of cycles (generations) for the genetic algorithm. | 50 |
| `--mutation_type` | Type of mutation to be used during optimization. | esm |
| `--mutation_rate` | The mutation rate applied to the population. | 0.9 |
| `--direction` | Optimization direction: up (maximize) or down (minimize). | down |
| `--apt_function` | Aptitude function to be used. | rosetta |
| `--temp` | ESM2 temperature to control the randomness of mutations. | 1.5 |
| `--output_file` | Base name for the output file. | gapo_results |
| `--cpus` | Number of CPUs to use for parallel processing. | 1 |

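
The `--temp` option is described as controlling the randomness of ESM2-proposed mutations. A common way for a temperature to do this is to divide the model's logits by it before the softmax: values below 1 concentrate probability on the top-scoring residues, values above 1 flatten the distribution. The sketch below illustrates that general technique with made-up logits. Whether GAPO applies the temperature in exactly this way is an assumption, and no ESM model is loaded here.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_residue(logits_by_residue, temperature, rng):
    """Draw one residue according to the temperature-scaled probabilities."""
    residues = list(logits_by_residue)
    probs = softmax_with_temperature(
        [logits_by_residue[r] for r in residues], temperature
    )
    return rng.choices(residues, weights=probs, k=1)[0]


# Made-up logits for three candidate residues at one position.
toy_logits = {"A": 2.0, "L": 1.0, "G": 0.0}
sharp = softmax_with_temperature(list(toy_logits.values()), 0.5)
flat = softmax_with_temperature(list(toy_logits.values()), 1.5)
print("T=0.5:", [round(p, 3) for p in sharp])
print("T=1.5:", [round(p, 3) for p in flat])
```

At the higher temperature the least likely residue receives a larger share of the probability, which is what makes the proposed mutations more varied.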
Sequence mode, required parameters:

| Parameter | Description |
|---|---|
| `--seq` | The initial amino acid sequence for optimization. |
| `--residues_to_mut` | List of indices for the residues in the sequence to be mutated. |

Sequence mode, optional parameters:

| Parameter | Description | Default Value |
|---|---|---|
| `--pop_size` | Size of the population in each generation. | 50 |
| `--cycles` | Number of cycles (generations) for the genetic algorithm. | 50 |
| `--mutation_type` | Type of mutation to be used during optimization. | esm |
| `--mutation_rate` | The mutation rate applied to the population. | 0.9 |
| `--direction` | Optimization direction: up (maximize) or down (minimize). | up |
| `--apt_function` | Aptitude function to be used. | esm |
| `--temp` | ESM2 temperature to control the randomness of mutations. | 1.5 |
| `--output_file` | Base name for the output file. | gapo_results |

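
To make the two modes and their defaults concrete, the parameters listed above can be written as an `argparse` parser with one subcommand per mode. This is a hypothetical reconstruction from the parameter tables only: the real `GA_main.py` may declare its arguments differently, and the argument types, the use of `nargs="+"`, the `choices` restriction, and the subparser layout are all assumptions.

```python
import argparse


def add_shared_options(parser, direction, apt_function):
    """Options documented for both modes; only two defaults differ per mode."""
    parser.add_argument("--residues_to_mut", type=int, nargs="+", required=True)
    parser.add_argument("--pop_size", type=int, default=50)
    parser.add_argument("--cycles", type=int, default=50)
    parser.add_argument("--mutation_type", default="esm")
    parser.add_argument("--mutation_rate", type=float, default=0.9)
    parser.add_argument("--direction", choices=["up", "down"], default=direction)
    parser.add_argument("--apt_function", default=apt_function)
    parser.add_argument("--temp", type=float, default=1.5)
    parser.add_argument("--output_file", default="gapo_results")


def build_parser():
    parser = argparse.ArgumentParser(prog="GA_main.py")
    modes = parser.add_subparsers(dest="mode", required=True)

    # Structure mode: PDB input, Rosetta score minimized by default.
    structure = modes.add_parser("structure")
    structure.add_argument("--pdb", required=True)
    add_shared_options(structure, direction="down", apt_function="rosetta")
    structure.add_argument("--cpus", type=int, default=1)

    # Sequence mode: sequence input, ESM score maximized by default.
    sequence = modes.add_parser("sequence")
    sequence.add_argument("--seq", required=True)
    add_shared_options(sequence, direction="up", apt_function="esm")
    return parser


# Shortened, illustrative input; unspecified options fall back to the defaults.
args = build_parser().parse_args(
    ["sequence", "--seq", "RKVCNGIGIGEF", "--residues_to_mut", "3", "4", "5"]
)
print(args.mode, args.direction, args.apt_function, args.residues_to_mut)
```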
This example optimizes the CDRs of an scFv based on its PDB structure, using the Rosetta score as the objective function.
    python GA_main.py structure \
      --pdb ab_trimed_relax.pdb \
      --residues_to_mut 62 63 64 65 66 67 68 69 70 71 72 88 89 90 91 92 93 94 127 128 129 130 131 132 133 134 135 186 187 188 189 190 191 192 212 213 214 215 216 257 258 259 260 261 262 263 264 265 266 267 268 269 \
      --apt_function rosetta \
      --pop_size 50 \
      --cycles 10 \
      --direction down \
      --output_file rosetta_run_01

This example takes an initial peptide sequence and evolves it to maximize its likelihood according to the ESM-2 model, mutating only the core region.
    python GA_main.py sequence \
      --seq "RKVCNGIGIGEFKDSLSINATNIKHFKNCTSISGDLHILPVAFRGDSFTHTPPLDPQELDILKTVKEITGFLLIQAW" \
      --residues_to_mut 3 4 5 6 7 8 9 \
      --apt_function esm \
      --pop_size 100 \
      --cycles 20 \
      --direction up \
      --output_file esm_run_peptide

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
