B-PPI: A Cross-Attention Model for Large-Scale Bacterial Protein-Protein Interaction Prediction

B-PPI is a specialized framework for rapid prediction of bacterial protein-protein interactions (PPIs). It was trained on B-PPI-DB, a database of positive and negative bacterial PPIs derived from STRING, and uses a cross-attention mechanism to capture residue-level relationships between protein pairs.
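
For readers unfamiliar with the term, the idea behind cross-attention can be sketched in a few lines of plain Python. This is a generic single-head scaled dot-product cross-attention without learned projections, written only for illustration; it is not B-PPI's actual architecture, whose layers, heads, and pooling are defined in the repository code. The function names and the toy embeddings are assumptions made for this sketch.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Let each residue of one protein attend over all residues of the other.

    queries:     per-residue embeddings of protein A (list of equal-length vectors)
    keys_values: per-residue embeddings of protein B (same vector length)
    Returns (attended, weights): one attended vector and one weight row per
    residue of protein A. Illustrative only; no learned projections are used.
    """
    dim = len(queries[0])
    scale = math.sqrt(dim)
    attended, weights = [], []
    for q in queries:
        # Similarity of this A-residue to every B-residue.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys_values]
        row = softmax(scores)
        # Weighted average of B-residue vectors.
        attended.append(
            [sum(w * k[i] for w, k in zip(row, keys_values)) for i in range(dim)]
        )
        weights.append(row)
    return attended, weights


# Toy embeddings (hypothetical): protein A has 2 residues, protein B has 3.
emb_a = [[1.0, 0.0], [0.0, 1.0]]
emb_b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, weights = cross_attention(emb_a, emb_b)
```

Each row of `weights` sums to 1 and indicates which residues of the second protein a given residue of the first protein attends to most strongly.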

Installation

pip install -r requirements.txt

Usage

B-PPI offers modules for prediction (inference) and fine-tuning. Running on a GPU is recommended. The details for each command are given below.

1. Prediction (Inference)

Option A: Target Specific Pairs (predict command)

Use this command to predict interactions for a specific list of protein pairs.

Input:

fasta_path: (Required) Path to a standard .fasta file containing the sequences for all proteins involved.
input_csv: (Required) Path to a .csv file defining the pairs to test. Must contain a header row with columns protein1 and protein2. The names must match the headers in the FASTA file.
model_path: (Required) Path to the pre-trained model file (.pt).
output_csv: (Required) Path where the results will be saved.
score_cutoff: (Required) Score threshold above which a protein pair is considered binding.

Command:

    python main.py predict \
      --fasta_path sample.fasta \
      --input_csv sample.csv \
      --model_path model.pt \
      --output_csv bppi_output.csv \
      --score_cutoff 0.6
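
Because the names in input_csv must match the FASTA headers, it can help to check the two files against each other before starting a run. The following is a minimal sketch of such a check. It assumes that a protein's name is the first whitespace-delimited token after ">" in the header line; how B-PPI itself parses headers is not documented here, so adjust the matching rule if your headers differ. The helper names are hypothetical and not part of B-PPI.

```python
import csv


def read_fasta_ids(fasta_path):
    """Return the set of record names found in a FASTA file.

    Assumption: the name is the first whitespace-delimited token after '>'.
    """
    ids = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:
                    ids.add(tokens[0])
    return ids


def find_missing_proteins(input_csv, fasta_path):
    """Return sorted names used in protein1/protein2 that have no FASTA record."""
    known = read_fasta_ids(fasta_path)
    missing = set()
    with open(input_csv, newline="") as handle:
        for row in csv.DictReader(handle):
            for column in ("protein1", "protein2"):
                if row[column] not in known:
                    missing.add(row[column])
    return sorted(missing)
```

Calling `find_missing_proteins("sample.csv", "sample.fasta")` returns an empty list when every name in the pair file has a matching FASTA record.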

Option B: All-vs-All Screening (predict_all command)

Use this command to predict interactions between all potential pairs of proteins contained in two FASTA files (or within a single file if the same path is provided twice).

Input:

fasta_A_path: (Required) Path to the first .fasta file.
fasta_B_path: (Required) Path to the second .fasta file.
model_path: (Required) Path to the pre-trained model file (.pt).
output_csv: (Required) Path where the results will be saved.
score_cutoff: (Required) Score threshold above which a protein pair is considered binding.

Command:

    python main.py predict_all \
      --fasta_A_path sample1.fasta \
      --fasta_B_path sample2.fasta \
      --model_path model.pt \
      --output_csv bppi_output.csv \
      --score_cutoff 0.6
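
All-vs-all screening grows with the product of the two file sizes, so a quick estimate of the number of candidate pairs can be useful before launching a large run. The sketch below counts FASTA records and multiplies the counts. It is an upper bound only: whether predict_all skips self-pairs or symmetric duplicates when the same file is passed twice is not stated here. The function names are hypothetical helpers, not part of B-PPI.

```python
def count_fasta_records(fasta_path):
    """Count records in a FASTA file by counting header lines."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def count_candidate_pairs(fasta_a_path, fasta_b_path):
    """Upper bound on the number of pairs: every A record against every B record."""
    return count_fasta_records(fasta_a_path) * count_fasta_records(fasta_b_path)
```

For example, two files with 4,000 proteins each give up to 16,000,000 candidate pairs.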

2. Fine-Tuning

If you have labeled data for your bacteria of interest (pairs known to bind vs. not bind), you can fine-tune the model on your dataset to improve accuracy. This is a two-step process.

Step 1: Extract Embeddings (prostT5_embeddings command)

Before fine-tuning, extract embeddings for your protein sequences using the ProstT5 model.

Input:

fasta_path: (Required) Path to a .fasta file containing all protein sequences used in your training/testing data.
embeddings_h5_path: (Required) Output path for the generated .h5 embeddings file.

Command:

    python main.py prostT5_embeddings \
      --fasta_path sample.fasta \
      --embeddings_h5_path sample_emb.h5
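
If your sequences live in a larger FASTA file (for example a whole proteome), you may want to embed only the proteins that actually appear in your labeled pairs. The sketch below collects the names from the pair files and writes a reduced FASTA file. It assumes that a protein's name is the first whitespace-delimited token of the FASTA header; the helper names are hypothetical and not part of B-PPI.

```python
import csv


def collect_protein_names(csv_paths):
    """Gather every name used in the protein1/protein2 columns of the given files."""
    names = set()
    for path in csv_paths:
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                names.add(row["protein1"])
                names.add(row["protein2"])
    return names


def write_fasta_subset(source_fasta, wanted, output_fasta):
    """Copy only the records whose name is in `wanted`; return the names written.

    Assumption: the name is the first whitespace-delimited token after '>'.
    """
    written = set()
    keep = False
    with open(source_fasta) as src, open(output_fasta, "w") as dst:
        for line in src:
            if line.startswith(">"):
                tokens = line[1:].split()
                name = tokens[0] if tokens else ""
                keep = name in wanted
                if keep:
                    written.add(name)
            if keep:
                # Header and sequence lines of a kept record are copied unchanged.
                dst.write(line)
    return written
```

Comparing the returned set with the collected names shows which proteins have no sequence in the source file and would therefore lack an embedding.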

Step 2: Run Fine-Tuning (finetune command)

Train the model using your labeled data and the embeddings generated in Step 1.

Input:

train_csv, val_csv, test_csv: (Required) Paths to your training, validation, and testing datasets. Each .csv file must contain a header row with the columns protein1, protein2, and label (1 for binding/positive, 0 for non-binding/negative).
embeddings_h5: (Required) Path to the .h5 file generated in Step 1.
model_to_finetune: (Required) Path to the base model (.pt) you wish to fine-tune.
model_save_path: (Required) Path where the new fine-tuned model will be saved.

Command:

    python main.py finetune \
      --train_csv train_sample_to_finetune.csv \
      --val_csv val_sample_to_finetune.csv \
      --test_csv test_sample_to_finetune.csv \
      --embeddings_h5 sample_emb.h5 \
      --model_to_finetune model.pt \
      --model_save_path model_finetuned.pt
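
If your labeled pairs are in a single file, they need to be divided into the three .csv files first. The following is one possible sketch, assuming a single input file with the columns protein1, protein2, and label. It splits each label group separately so that the positive/negative ratio is roughly preserved. The function name, the 80/10/10 proportions, and the seed are illustrative assumptions, not something B-PPI prescribes.

```python
import csv
import random

FIELDS = ["protein1", "protein2", "label"]


def split_labeled_pairs(input_csv, train_csv, val_csv, test_csv,
                        val_frac=0.1, test_frac=0.1, seed=0):
    """Write a label-stratified train/val/test split; return the row counts.

    Rows whose label is neither "0" nor "1" are ignored.
    """
    with open(input_csv, newline="") as handle:
        rows = list(csv.DictReader(handle))

    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for label in ("0", "1"):
        group = [row for row in rows if row["label"].strip() == label]
        rng.shuffle(group)
        n_val = int(len(group) * val_frac)
        n_test = int(len(group) * test_frac)
        splits["val"] += group[:n_val]
        splits["test"] += group[n_val:n_val + n_test]
        splits["train"] += group[n_val + n_test:]

    for name, path in (("train", train_csv), ("val", val_csv), ("test", test_csv)):
        with open(path, "w", newline="") as handle:
            # extrasaction="ignore" drops any extra columns present in the input.
            writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(splits[name])
    return {name: len(part) for name, part in splits.items()}
```

Note that a random split over pairs lets the same protein appear in more than one split; whether that is acceptable depends on how you intend to evaluate the fine-tuned model.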

Development

The dev/ folder contains additional modules for reproduction and benchmarking:

create_dataset.py: Script used to create B-PPI-DB from the STRING database.
train.py: The original script used to train the B-PPI model.
evaluate_bppi_db.ipynb: Notebook containing evaluation results of a 5-fold cross-validation on B-PPI-DB.
