Learning Physical Interactions to Compose Biological LLMs [Paper]
Large language models (LLMs) trained on biochemical sequences learn feature vectors that guide drug discovery through virtual screening. However, LLMs do not capture the molecular interactions that are important for predicting binding affinity and specificity. We compare a variety of methods for combining representations from distinct biological modalities to effectively represent molecular complexes. We demonstrate that learning to merge representations from the internal layers of domain-specific biological language models outperforms standard molecular interaction representations despite using significantly fewer features.
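As a rough illustration of the idea (a minimal sketch, not the paper's implementation), the snippet below learns softmax weights over the internal layers of two frozen sequence encoders, merges each modality's per-layer embeddings, and feeds the concatenated result to a small regression head. All module names, dimensions, and the head architecture are hypothetical placeholders; see the paper and scripts for the actual method.

```python
# Sketch only: learned layer-wise merging of hidden states from two frozen encoders.
import torch
import torch.nn as nn


class LayerwiseMerge(nn.Module):
    """Learned softmax weighting over the internal layers of one encoder."""

    def __init__(self, num_layers: int):
        super().__init__()
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))

    def forward(self, layer_states: torch.Tensor) -> torch.Tensor:
        # layer_states: (num_layers, batch, dim) -- mean-pooled per-layer embeddings
        weights = torch.softmax(self.layer_logits, dim=0)        # (num_layers,)
        return torch.einsum("l,lbd->bd", weights, layer_states)  # (batch, dim)


class ComplexHead(nn.Module):
    """Combine merged embeddings from two modalities and predict binding affinity."""

    def __init__(self, layers_a: int, dim_a: int, layers_b: int, dim_b: int):
        super().__init__()
        self.merge_a = LayerwiseMerge(layers_a)
        self.merge_b = LayerwiseMerge(layers_b)
        self.regressor = nn.Sequential(
            nn.Linear(dim_a + dim_b, 256), nn.ReLU(), nn.Linear(256, 1)
        )

    def forward(self, states_a: torch.Tensor, states_b: torch.Tensor) -> torch.Tensor:
        merged = torch.cat([self.merge_a(states_a), self.merge_b(states_b)], dim=-1)
        return self.regressor(merged).squeeze(-1)


if __name__ == "__main__":
    # Random stand-ins for per-layer embeddings of, e.g., a protein LM and a chemical LM.
    protein_states = torch.randn(12, 4, 320)  # (layers, batch, dim)
    ligand_states = torch.randn(6, 4, 128)
    model = ComplexHead(12, 320, 6, 128)
    print(model(protein_states, ligand_states).shape)  # torch.Size([4])
```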
Our Google Colab Notebook compares and visualizes embeddings from four multimodal representation strategies.
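For a sense of what such a comparison might look like, here is a minimal, self-contained sketch that projects embeddings from two hypothetical composition strategies into 2D with PCA and plots them side by side. The random arrays and labels are placeholders for the embeddings the notebook actually computes.

```python
# Illustrative sketch only: PCA projection of embeddings from different strategies.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
strategies = {
    "concatenation": rng.normal(size=(200, 448)),
    "learned merge": rng.normal(size=(200, 448)),
}
labels = rng.integers(0, 2, size=200)  # e.g., binder vs. non-binder (placeholder)

fig, axes = plt.subplots(1, len(strategies), figsize=(5 * len(strategies), 4))
for ax, (name, emb) in zip(np.atleast_1d(axes), strategies.items()):
    xy = PCA(n_components=2).fit_transform(emb)
    ax.scatter(xy[:, 0], xy[:, 1], c=labels, s=10, cmap="coolwarm")
    ax.set_title(name)
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
fig.tight_layout()
plt.show()
```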
You can also run each experiment and generate the corresponding plots using a standalone Python script. We recommend Python 3.10.18 and PyTorch 2.5.1:
git clone https://github.com/ShuklaGroup/BioLLMComposition
cd BioLLMComposition
python BioLLMComposition_peptide_mhc.py
python BioLLMComposition_protein_ligand.py

If you use this code, please cite:

@article{Clark2026,
title = {Learning physical interactions to compose biological large language models},
ISSN = {2399-3669},
url = {http://dx.doi.org/10.1038/s42004-025-01883-7},
DOI = {10.1038/s42004-025-01883-7},
journal = {Communications Chemistry},
publisher = {Springer Science and Business Media LLC},
author = {Clark, Joseph D. and Dean, Tanner J. and Shukla, Diwakar},
year = {2026},
month = jan
}