Releases: ribesstefano/PROTAC-Splitter
Initial stable release
PROTAC-Splitter Release Description
Overview
PROTAC-Splitter is an open-source machine learning framework for the automated identification and splitting of PROTAC (Proteolysis Targeting Chimera) molecules into their constituent substructures. The toolkit provides robust cheminformatics utilities, graph-based algorithms, and deep learning models for the analysis, curation, and prediction of PROTAC substructures. It is designed for researchers and practitioners in computational chemistry, drug discovery, and cheminformatics.
Key Features
- Automated PROTAC Splitting:
Accurately splits PROTAC SMILES into POI ligand, linker, and E3 ligase binder substructures using the split_protac function.
- Graph-Based Algorithms:
Includes heuristic and ML-based graph algorithms for substructure identification (protac_splitter/graphs).
- Deep Learning Models:
Supports fine-tuning and inference with transformer-based models for PROTAC splitting.
- Data Curation Utilities:
Tools for cleaning, mapping, and curating PROTAC datasets (notebooks/data_curation.ipynb).
- Evaluation Suite:
Comprehensive metrics for evaluating splitting accuracy, including chemical and graph-based metrics (protac_splitter/evaluation.py).
- Interactive Gradio App:
User-friendly web interface for visualizing and splitting PROTACs (scripts/protac_splitter_app.py and HuggingFace Space).
- Extensive Script Library:
Scripts for dataset generation, model training, prediction collection, scoring, and visualization (scripts/README.md).
- Reproducible Pipelines:
Example notebooks and scripts for end-to-end workflows (notebooks/).
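To make the idea of scoring a split concrete, here is a minimal sketch of an exact-match rate over the three substructures. It is not the metric implemented in protac_splitter/evaluation.py: the ProtacSplit class and the exact_match_rate function are hypothetical names introduced for illustration, and the comparison is on raw strings, whereas a real chemical metric would presumably canonicalize SMILES first.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ProtacSplit:
    """Hypothetical container for the three parts of a split PROTAC."""
    poi: str     # ligand for the protein of interest
    linker: str  # linker connecting the two ligands
    e3: str      # E3 ligase binder


def exact_match_rate(
    predicted: Sequence[ProtacSplit],
    reference: Sequence[ProtacSplit],
) -> float:
    """Fraction of predictions whose three parts all equal the reference.

    Assumption: plain string equality stands in for chemical equivalence.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        return 0.0
    # Dataclass equality compares poi, linker, and e3 field by field.
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(predicted)
```

The strings passed in can be any representation of the substructures; because the comparison is textual, two different SMILES spellings of the same molecule would count as a mismatch here.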
Installation
- Requires Python 3.10.8.
- Install dependencies:
pip install -r requirements.txt
pip install -r scripts/requirements.txt
- Or install directly via pip:
pip install git+https://github.com/ribesstefano/PROTAC-Splitter.git
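Since the release pins a specific Python version, it can help to check the interpreter before installing. The following is a small sketch under an assumption: it compares only the major and minor version (3.10) rather than the exact 3.10.8 patch release, and the names REQUIRED and is_supported are illustrative, not part of PROTAC-Splitter.

```python
import sys

# Assumption: matching the 3.10 series is treated as sufficient here;
# the release notes themselves name Python 3.10.8.
REQUIRED = (3, 10)


def is_supported(version_info=None) -> bool:
    """Return True when the major.minor version equals REQUIRED."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) == REQUIRED


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported():
        print(f"Python {current} is in the documented 3.10 series.")
    else:
        print(f"Python {current} differs from the documented 3.10.8.")
```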
Usage
- Python API:
Use the split_protac function to split PROTAC SMILES or DataFrames:
from protac_splitter import split_protac
ligands = split_protac("CC(C)(C)S(=O)(=O)...")
- Gradio App:
Launch the GUI with:
gradio scripts/protac_splitter_app.py
- Scripts:
See scripts/README.md for dataset generation, model training, prediction, and evaluation workflows.
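When splitting many SMILES one at a time, a thin wrapper that records failures instead of stopping on the first error may be convenient. The sketch below is a hypothetical helper, not part of the package: split_many is an invented name, the splitter is passed in as a callable so that no assumption is made about the return type of split_protac, and whatever the splitter returns is stored unchanged.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


def split_many(
    smiles_list: Iterable[str],
    splitter: Callable[[str], Any],
) -> Tuple[Dict[str, Any], List[Tuple[str, str]]]:
    """Apply `splitter` to each SMILES and collect results and failures.

    Returns a dict mapping each input SMILES to the splitter's output,
    and a list of (smiles, error message) pairs for inputs that raised.
    """
    results: Dict[str, Any] = {}
    failures: List[Tuple[str, str]] = []
    for smiles in smiles_list:
        try:
            results[smiles] = splitter(smiles)
        except Exception as exc:  # keep going; report the problem later
            failures.append((smiles, str(exc)))
    return results, failures


# Possible usage, assuming the package is installed:
#   from protac_splitter import split_protac
#   results, failures = split_many(my_smiles, split_protac)
```

Passing the splitter as an argument also makes the helper easy to exercise with a stub function before running the real model.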
Data & Models
- Curated datasets and trained models are available at Zenodo.
Documentation
- Main usage and API: README.md
- Scripts and workflows: scripts/README.md
- Example notebooks: notebooks
Contributing
Contributions are welcome! Please open an issue or submit a pull request.
License
Distributed under the MIT License. See LICENSE for details.