AbLangPDB1: Epitope-Aware Antibody Embeddings

State-of-the-art antibody embeddings that predict epitope overlap for targeted therapeutic discovery

AbLangPDB1 generates 1536-dimensional embeddings where antibodies targeting similar epitopes cluster together - enabling rapid epitope classification, antibody search, and therapeutic discovery.

Model Description

AbLangPDB1 is designed to predict epitope overlap between antibodies by generating high-quality embeddings that capture epitope-specificity information. The model uses contrastive learning on paired heavy and light chain sequences to learn representations where antibodies targeting similar epitopes cluster together in embedding space.

Architecture

Heavy Chain Seq → [AbLang Heavy] → 768-dim → |
                                              | → [Concatenate] → [Mixer Network] → 1536-dim Paired Embedding
Light Chain Seq → [AbLang Light] → 768-dim → |

The model processes heavy and light chains independently using pre-trained AbLang models, then fuses their embeddings through a custom Mixer network (6 fully connected layers) to produce a unified 1536-dimensional embedding.

Training Data

Source: 1,909 non-redundant human antibodies from Structural Antibody Database (SAbDab)
Cutoff Date: February 19, 2024
Antigen Assignment: Pfam domain-based categorization using pfam_scan
Data Splits: 80% training, 10% validation, 10% test (clone-group aware splitting)

Quick Start

Model Download

AbLangPDB1 is hosted on HuggingFace for optimal download experience:

🤗 Download from HuggingFace Hub

# Option 1: Direct download (recommended)
curl -L "https://huggingface.co/clint-holt/AbLangPDB1/resolve/main/ablangpdb_model.safetensors?download=true" -o ablangpdb_model.safetensors

# Option 2: Using huggingface_hub
pip install huggingface_hub
python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='clint-holt/AbLangPDB1', filename='ablangpdb_model.safetensors', local_dir='.')"

Get Started in 3 Steps

# 1. Clone repository and install dependencies
git clone https://github.com/your-username/AbLangPDB1.git
cd AbLangPDB1
pip install torch pandas transformers safetensors

# 2. Download model weights (738MB)
curl -L "https://huggingface.co/clint-holt/AbLangPDB1/resolve/main/ablangpdb_model.safetensors?download=true" -o ablangpdb_model.safetensors

# 3. Run inference
python quick_start_example.py

Explore examples: pdb_inference_examples.ipynb | 🔬 Run benchmarks: benchmarking/

30-Second Example

import torch
from transformers import AutoTokenizer
from ablangpaired_model import AbLangPaired, AbLangPairedConfig

# Load model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
config = AbLangPairedConfig(checkpoint_filename="ablangpdb_model.safetensors")
model = AbLangPaired(config, device).eval()

# Your antibody sequences
heavy_chain = "EVQLVESGGGLVQPGGSLRLSCAASGFNLYYYSIHWVRQAPGKGLEWVASISPYSSSTSYADSVKGRFTISADTSKNTAYLQMNSLRAEDTAVYYCARGRWYRRALDYWGQGTLVTVSS"
light_chain = "DIQMTQSPSSLSASVGDRVTITCRASQSVSSAVAWYQQKPGKAPKLLIYSASSLYSGVPSRFSGSRSGTDFTLTISSLQPEDFATYYCQQYPYYSSLITFGQGTKVEIK"

# Tokenize (add spaces between amino acids)
heavy_tokenizer = AutoTokenizer.from_pretrained("clint-holt/AbLangPDB1", subfolder="heavy_tokenizer")
light_tokenizer = AutoTokenizer.from_pretrained("clint-holt/AbLangPDB1", subfolder="light_tokenizer")

h_tokens = heavy_tokenizer(" ".join(heavy_chain), return_tensors="pt")
l_tokens = light_tokenizer(" ".join(light_chain), return_tensors="pt")

# Generate embedding
with torch.no_grad():
    embedding = model(
        h_input_ids=h_tokens['input_ids'].to(device),
        h_attention_mask=h_tokens['attention_mask'].to(device),
        l_input_ids=l_tokens['input_ids'].to(device),
        l_attention_mask=l_tokens['attention_mask'].to(device)
    )

print(f"Generated embedding shape: {embedding.shape}")  # (1, 1536)

Performance Highlights

Dataset	Metric	AbLangPDB1	Best Baseline
SAbDab	ROC-AUC	0.89	0.82
SAbDab	F1 Score	0.76	0.69
DMS	ROC-AUC	0.85	0.79
DMS	F1 Score	0.72	0.65

Comparison against ESM-2, AbLang, AntiBERTy, and other state-of-the-art models

Use Cases

Epitope Classification: Compare antibodies with unknown epitopes against reference databases
Antibody Search: Find antibodies with similar epitope specificity in large sequence databases
Therapeutic Discovery: Identify candidate antibodies targeting the same epitope as reference therapeutics
Antibody Clustering: Group antibodies by epitope similarity for analysis

📄 Publication

Contrastive Learning Enables Epitope Overlap Predictions for Targeted Antibody Discovery
Clinton M. Holt, Alexis K. Janke, Parastoo Amlashi, Parker J. Jamieson, Toma M. Marinov, Ivelin S. Georgiev
bioRxiv, 2025. https://doi.org/10.1101/2025.02.25.640114

Vanderbilt Center for Antibody Therapeutics, Vanderbilt University Medical Center

📈 Benchmarking

This repository includes comprehensive benchmarking code to evaluate AbLangPDB1 against other methods:

Available Scripts

benchmarking/calculate_metrics.py: Evaluation on SAbDab dataset with continuous epitope/antigen labels
benchmarking/calculate_metrics_dms.py: Evaluation on DMS dataset with binary epitope labels
benchmarking/get_metrics.ipynb: Interactive notebook for running benchmarks

Benchmark Datasets

SAbDab Dataset: Structural antibody database with epitope/antigen overlap labels
DMS Dataset: Deep mutational scanning data with binary epitope matching

Metrics

ROC-AUC: Area under the receiver operating characteristic curve
Average Precision: Area under the precision-recall curve
F1 Score: Harmonic mean of precision and recall

Training

Code for training is in the folder train To perform training:

cd train
python train_ablangpdb.py

Interpretability

Code for doing Captum Integrated Gradient Analysis is in the folder interpretability Look at the README as well as interpretability/DataAnalysis.ipynb to get started.

Paper Figures

See Paper_Figures.ipynb for the code that generated the paper figures. Not all data files are included to generate these figures. Email clinton.m.holt@gmail.com if there are specific data files you would like.

⚠️ Limitations

Species: Optimized for human antibodies; reduced accuracy expected for mouse BCRs
Domain Coverage: Lower performance for antibodies targeting antigen families not included in training. Also lower performance when not referencing new antibodies against antibodies in the PDB.
Custom Architecture: Requires the provided ablangpaired_model.py implementation (not compatible with standard HuggingFace model loading)

Model Weights & Resources

HuggingFace Model: clint-holt/AbLangPDB1

Direct Download:

curl -L https://huggingface.co/clint-holt/AbLangPDB1/resolve/main/ablangpdb_model.safetensors?download=true -o ablangpdb_model.safetensors

Paper: bioRxiv link

📚 Citation

If you use this model or code in your research, please cite our paper:

@article{Holt2025.02.25.640114,
    author = {Holt, Clinton M. and Janke, Alexis K. and Amlashi, Parastoo and Jamieson, Parker J. and Marinov, Toma M. and Georgiev, Ivelin S.},
    title = {Contrastive Learning Enables Epitope Overlap Predictions for Targeted Antibody Discovery},
    elocation-id = {2025.02.25.640114},
    year = {2025},
    doi = {10.1101/2025.02.25.640114},
    publisher = {Cold Spring Harbor Laboratory},
    URL = {https://www.biorxiv.org/content/early/2025/04/01/2025.02.25.640114},
    eprint = {https://www.biorxiv.org/content/early/2025/04/01/2025.02.25.640114.full.pdf},
    journal = {bioRxiv}
}

📝 License

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
benchmarking		benchmarking
interpretability		interpretability
train		train
.gitattributes		.gitattributes
CHANGES_SUMMARY.md		CHANGES_SUMMARY.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Paper_Figures.ipynb		Paper_Figures.ipynb
README.md		README.md
__init__.py		__init__.py
ablangpaired_model.py		ablangpaired_model.py
combine_embeddings.py		combine_embeddings.py
config.json		config.json
pdb_inference_examples.ipynb		pdb_inference_examples.ipynb
pdb_inference_examples_original.ipynb		pdb_inference_examples_original.ipynb
quick_start_example.py		quick_start_example.py
requirements.txt		requirements.txt
setup.py		setup.py
update_notebooks.py		update_notebooks.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AbLangPDB1: Epitope-Aware Antibody Embeddings

Model Description

Architecture

Training Data

Quick Start

Model Download

Get Started in 3 Steps

30-Second Example

Performance Highlights

Use Cases

📄 Publication

📈 Benchmarking

Available Scripts

Benchmark Datasets

Metrics

Training

Interpretability

Paper Figures

⚠️ Limitations

Model Weights & Resources

📚 Citation

📝 License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AbLangPDB1: Epitope-Aware Antibody Embeddings

Model Description

Architecture

Training Data

Quick Start

Model Download

Get Started in 3 Steps

30-Second Example

Performance Highlights

Use Cases

📄 Publication

📈 Benchmarking

Available Scripts

Benchmark Datasets

Metrics

Training

Interpretability

Paper Figures

⚠️ Limitations

Model Weights & Resources

📚 Citation

📝 License

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages