Example scripts for navigating protein sequence structure function space.

BrooksResearchGroup-UM/seq_struct_func

Guide to running sequence-structure-function pipeline

Refer to our paper in Bioinformatics for more details.

Setup

Data: (si_data/) Data necessary for running examples

  • asr_seq_annotations.xlsx
    • All enzymes, sequences, and annotations from structure-function pipeline
  • extant_msa.fasta
    • Multiple sequence alignment previously used to construct the ancestral sequence reconstructions
  • fasta/
    • Sequences from asr_seq_annotations.xlsx written in FASTA format
  • pdb_with_fad/
    • Directory containing all AlphaFold2 models with FAD cofactor
  • top_dock_pose/
    • Directory containing the lowest-energy poses from minimization in explicit protein
  • log_reg_models/
    • Pretrained statsmodels logistic regression models
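
The sequence files above are plain FASTA, so they can be read without extra dependencies. The following is a minimal sketch, not code from this repository: the function name `read_fasta` and the example records are hypothetical, and the record name is assumed to be the first whitespace-delimited token of the header line.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a {name: sequence} dict.

    Assumption: the record name is the first token after '>'.
    """
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence may span several lines
    return {key: "".join(parts) for key, parts in records.items()}


# Hypothetical records, used only to show the call.
example = ">anc1 example header\nMKT\nAAA\n>anc2\nMK-\n".splitlines()
sequences = read_fasta(example)
```

To read a real file, pass an open file handle, for example `read_fasta(open(path))`, where `path` points at extant_msa.fasta or a file in fasta/.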

Toppar: (toppar/) CHARMM Topology and Parameter files

Step 1: (consensus/) Generating consensus sequence hits library

  • script/gen_consensus_db.ipynb
    • Create database of consensus sequence hits from AlphaFold2 MSAs
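
This README does not describe how the notebook derives its consensus hits. As a generic illustration of what a column-wise consensus over an alignment means, here is a minimal sketch under stated assumptions: sequences are already aligned to equal length, gaps are written as `-`, and the most common non-gap residue in each column wins. The function name `column_consensus` is hypothetical and the logic is not taken from gen_consensus_db.ipynb.

```python
from collections import Counter


def column_consensus(aligned, gap="-"):
    """Return the majority non-gap residue for each alignment column.

    Assumption: a simple majority vote; columns that contain only gaps
    stay as a gap. Ties resolve to the residue seen first.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*aligned):
        counts = Counter(res for res in column if res != gap)
        consensus.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(consensus)


# Hypothetical three-sequence alignment.
consensus = column_consensus(["MKT-", "MRT-", "MKSA"])
```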

Step 2: (model/) Generating AlphaFold2 Structures

  • script/run_alphafold_consensus.ipynb
    • Run example protein with AlphaFold2 using consensus sequence hits

Step 3: (cofactor/) Adding FAD Cofactor

  • script/fad.ipynb
    • Add FAD cofactor into generated example protein

Step 4: (dock/) Docking Array of Ligands

  • script/fftdock.ipynb
    • Use CHARMM Fast Fourier Transform (FFT) docking to generate initial ligand poses
  • script/prot_min.ipynb
    • Refine FFT poses in explicit protein representation
  • script/cluster.ipynb
    • Cluster poses to select representative poses
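
The clustering method used in cluster.ipynb is not specified here. One common way to pick representative poses is greedy clustering by RMSD, visiting poses from lowest to highest energy; the sketch below illustrates that idea only. The names `rmsd` and `greedy_cluster`, the 2.0 Å cutoff, and the toy coordinates are all assumptions. No superposition is applied, on the assumption that all poses share the protein's coordinate frame.

```python
import math


def rmsd(a, b):
    """RMSD between two poses given as equal-length lists of (x, y, z)."""
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(squared / len(a))


def greedy_cluster(poses, energies, cutoff=2.0):
    """Greedy clustering: the lowest-energy unassigned pose seeds a cluster.

    Returns {representative_index: [member_indices]}. The cutoff (in the
    coordinate units, assumed to be angstroms) is a hypothetical default.
    """
    order = sorted(range(len(poses)), key=lambda i: energies[i])
    members = {}
    for i in order:
        for rep in members:
            if rmsd(poses[i], poses[rep]) <= cutoff:
                members[rep].append(i)
                break
        else:
            members[i] = [i]  # no nearby representative, start a new cluster
    return members


# Toy single-atom poses, for illustration only.
poses = [[(0.0, 0.0, 0.0)], [(0.5, 0.0, 0.0)], [(10.0, 0.0, 0.0)]]
energies = [-5.0, -6.0, -4.0]
clusters = greedy_cluster(poses, energies)
```

Because poses are visited in order of increasing energy, each representative is the lowest-energy member of its cluster.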

Step 5: (pred/) Prediction of Stereochemistry and Reactivity

  • script/stereo.ipynb
    • Predict stereochemistry from Boltzmann-weighted representative poses
  • script/reactivity.ipynb
    • Predict reactivity from pose features
  • script/vis_pred.ipynb
    • Visualize predicted poses
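
To make the Boltzmann weighting concrete: a pose with energy E_i receives weight exp(-E_i / kT), and the weights of poses sharing a stereochemical label are summed and normalized. The sketch below assumes energies in kcal/mol and that each representative pose already carries a label; the function name `boltzmann_label_probs`, the labels "R" and "S", and the example energies are hypothetical, and the exact features used in stereo.ipynb are not reproduced.

```python
import math
from collections import defaultdict

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def boltzmann_label_probs(energies, labels, temperature=298.15):
    """Sum normalized Boltzmann weights of poses per label.

    Assumptions: energies are in kcal/mol and every pose has one label.
    """
    kt = KB_KCAL * temperature
    e_min = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    total = sum(weights)
    probs = defaultdict(float)
    for weight, label in zip(weights, labels):
        probs[label] += weight / total
    return dict(probs)


# Hypothetical energies (kcal/mol) and labels for three representative poses.
probs = boltzmann_label_probs([-12.0, -10.0, -10.0], ["R", "S", "S"])
predicted = max(probs, key=probs.get)
```

At room temperature kT is about 0.59 kcal/mol, so a pose 2 kcal/mol below the others dominates the prediction even when it is outnumbered.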

Step 6: (msa/) Generate Multiple Sequence Alignment Localized to Binding Site

  • script/gen_msa.ipynb
    • Generate Multiple Sequence Alignment
  • script/get_bs_ss_residues.ipynb
    • Get set of binding site and second shell residues
  • script/slice_msa.ipynb
    • Restrict the MSA to columns for binding site and second shell residues
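
Slicing an alignment to a residue set requires mapping residue numbers of a reference sequence onto alignment columns, since gaps shift the two apart. The following is a minimal sketch of that mapping, assuming 1-based residue numbering over the ungapped reference and `-` as the gap character. The names `residue_columns` and `slice_msa` and the toy alignment are hypothetical, not taken from slice_msa.ipynb.

```python
def residue_columns(ref_aligned, residues, gap="-"):
    """Map 1-based residue numbers of the ungapped reference to MSA columns."""
    wanted = set(residues)
    columns = []
    position = 0
    for col, char in enumerate(ref_aligned):
        if char != gap:
            position += 1  # count only real residues of the reference
            if position in wanted:
                columns.append(col)
    return columns


def slice_msa(msa, ref_name, residues):
    """Keep only the alignment columns that hold the requested residues."""
    columns = residue_columns(msa[ref_name], residues)
    return {name: "".join(seq[c] for c in columns) for name, seq in msa.items()}


# Toy alignment; residues 2 and 4 of "ref" stand in for binding-site positions.
msa = {"ref": "M-KTA", "a": "MQKSA", "b": "M-R-A"}
sliced = slice_msa(msa, "ref", [2, 4])
```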

Step 7: (seq_func/) Training Sequence-Function Model and SHAP Analysis

  • script/run_automl.ipynb
    • Fit gradient-boosted tree and random forest models that predict stereochemistry labels from the multiple sequence alignment
  • script/shap_analysis.ipynb
    • Calculate SHAP values for residues and visualize how residues affect stereochemistry
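
Tree-based models need numeric features, so aligned sequences are typically encoded before fitting. The encoding used in run_automl.ipynb is not stated here; the sketch below shows one common choice, one-hot encoding each alignment column over the 20 amino acids plus a gap symbol. The function name `one_hot_msa` and the alphabet ordering are assumptions.

```python
def one_hot_msa(seqs, alphabet="ACDEFGHIKLMNPQRSTVWY-"):
    """One-hot encode equal-length aligned sequences.

    Each sequence becomes a flat list of len(seq) * len(alphabet) zeros
    and ones. Characters outside the alphabet are left as all zeros.
    """
    index = {aa: i for i, aa in enumerate(alphabet)}
    width = len(alphabet)
    rows = []
    for seq in seqs:
        row = [0] * (len(seq) * width)
        for pos, aa in enumerate(seq):
            j = index.get(aa)
            if j is not None:
                row[pos * width + j] = 1
        rows.append(row)
    return rows


# Two hypothetical two-column sequences, e.g. from a binding-site MSA slice.
features = one_hot_msa(["KA", "RA"])
```

With this layout, feature `pos * 21 + j` means "residue `alphabet[j]` at alignment column `pos`", which keeps per-feature attributions such as SHAP values traceable to a specific residue and position.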
