Initial stable release

03 Jul 14:44

PROTAC-Splitter Release Description

Overview

PROTAC-Splitter is an open-source machine learning framework for automatically identifying and splitting PROTAC (Proteolysis Targeting Chimera) molecules into their constituent substructures: the protein-of-interest (POI) ligand, the linker, and the E3 ligase binder. The toolkit provides cheminformatics utilities, graph-based algorithms, and deep learning models for analyzing and curating PROTAC data and for predicting PROTAC substructures. It is aimed at researchers and practitioners in computational chemistry, drug discovery, and cheminformatics.

Key Features

  • Automated PROTAC Splitting:
    Splits PROTAC SMILES into POI ligand, linker, and E3 ligase binder substructures using the split_protac function.
  • Graph-Based Algorithms:
    Includes heuristic and ML-based graph algorithms for substructure identification (protac_splitter/graphs).
  • Deep Learning Models:
    Supports fine-tuning and inference with transformer-based models for PROTAC splitting.
  • Data Curation Utilities:
    Tools for cleaning, mapping, and curating PROTAC datasets (notebooks/data_curation.ipynb).
  • Evaluation Suite:
    Comprehensive metrics for evaluating splitting accuracy, including chemical and graph-based metrics (protac_splitter/evaluation.py).
  • Interactive Gradio App:
    User-friendly web interface for visualizing and splitting PROTACs (scripts/protac_splitter_app.py and a Hugging Face Space).
  • Extensive Script Library:
    Scripts for dataset generation, model training, prediction collection, scoring, and visualization (scripts/README.md).
  • Reproducible Pipelines:
    Example notebooks and scripts for end-to-end workflows (notebooks/).
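To make the splitting task and its evaluation concrete, the sketch below scores predicted splits against reference splits in two ways: accuracy per substructure, and whole-PROTAC exact match (a PROTAC counts as correct only if all three parts match). It is a hypothetical, simplified illustration, not the metrics implemented in protac_splitter/evaluation.py. The role names, the one-dict-per-PROTAC layout and the split_accuracy helper are assumptions, and plain string comparison is only meaningful if every SMILES has been canonicalized first (for example with RDKit).

```python
from typing import Dict, List

# Hypothetical role names; the project's own field names may differ.
ROLES = ("poi_ligand", "linker", "e3_binder")


def split_accuracy(predicted: List[Dict[str, str]],
                   reference: List[Dict[str, str]]) -> Dict[str, float]:
    """Per-role accuracy plus whole-PROTAC exact match over paired splits.

    ASSUMPTION: all SMILES are already canonical, so string equality stands
    in for chemical identity; a real evaluation would canonicalize first.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need equally long, non-empty prediction and reference lists")
    n = len(reference)
    # Fraction of PROTACs whose fragment for each role matches the reference.
    scores = {
        role: sum(p.get(role) == r[role] for p, r in zip(predicted, reference)) / n
        for role in ROLES
    }
    # A PROTAC counts as correct only if all three substructures match.
    scores["exact_match"] = sum(
        all(p.get(role) == r[role] for role in ROLES)
        for p, r in zip(predicted, reference)
    ) / n
    return scores


# Toy placeholders standing in for canonical SMILES fragments.
reference = [
    {"poi_ligand": "A1", "linker": "L1", "e3_binder": "E1"},
    {"poi_ligand": "A2", "linker": "L2", "e3_binder": "E2"},
]
predicted = [
    {"poi_ligand": "A1", "linker": "L1", "e3_binder": "E1"},
    {"poi_ligand": "A2", "linker": "L9", "e3_binder": "E2"},
]
print(split_accuracy(predicted, reference))
# → {'poi_ligand': 1.0, 'linker': 0.5, 'e3_binder': 1.0, 'exact_match': 0.5}
```

Reporting both numbers is a deliberate choice in this sketch: per-role accuracy shows which substructure is hardest to recover, while exact match reflects whether a split is usable as a whole.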

Installation

  • Requires Python 3.10.8.
  • Install dependencies:
    pip install -r requirements.txt
    pip install -r scripts/requirements.txt
  • Or install directly via pip:
    pip install git+https://github.com/ribesstefano/PROTAC-Splitter.git
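Because the notes name a specific interpreter version, a quick check before installing can catch an environment mismatch early. The sketch below is a hypothetical helper (check_python is not part of the package) that compares the running interpreter against 3.10.8. Reporting other 3.10.x patch releases as "same-minor" rather than unsupported is an assumption of this sketch, not a statement from the project.

```python
import sys
from typing import Optional, Tuple

# Version named in the release notes; how strictly it must match is an assumption.
EXPECTED = (3, 10, 8)


def check_python(expected: Tuple[int, int, int] = EXPECTED,
                 actual: Optional[Tuple[int, int, int]] = None) -> str:
    """Classify the running (or given) interpreter version against `expected`."""
    if actual is None:
        actual = tuple(sys.version_info[:3])
    if actual == expected:
        return "exact"
    if actual[:2] == expected[:2]:
        return "same-minor"  # e.g. 3.10.x with a different patch level
    return "mismatch"


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}: {check_python()}")
```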

Usage

  • Python API:
    Use the split_protac function to split PROTAC SMILES or DataFrames.
    from protac_splitter import split_protac
    ligands = split_protac("CC(C)(C)S(=O)(=O)...")
  • Gradio App:
    Launch the GUI with:
    gradio scripts/protac_splitter_app.py
  • Scripts:
    See scripts/README.md for dataset generation, model training, prediction, and evaluation workflows.
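These notes do not specify the format that split_protac returns. Purely as an illustration, the sketch below assumes a split is a single SMILES string whose three substructures are separated by "." in a fixed order, with attachment points written as dummy atoms carrying atom-map numbers such as [*:1]. The fragment order, the role names and the helpers label_fragments, attachment_labels and is_consistent are all hypothetical; consult the project README for the actual output format.

```python
import re
from typing import Dict, Sequence, Set

# ASSUMPTION: fragment order and attachment-point convention are hypothetical,
# not confirmed by the release notes.
ASSUMED_ORDER = ("e3_binder", "linker", "poi_ligand")
ATTACHMENT = re.compile(r"\[\*:(\d+)\]")  # dummy atoms such as [*:1]


def label_fragments(split_smiles: str,
                    order: Sequence[str] = ASSUMED_ORDER) -> Dict[str, str]:
    """Map each '.'-separated fragment to a role name (hypothetical helper)."""
    fragments = split_smiles.split(".")
    if len(fragments) != len(order):
        raise ValueError(f"expected {len(order)} fragments, got {len(fragments)}")
    return dict(zip(order, fragments))


def attachment_labels(fragment: str) -> Set[int]:
    """Collect the atom-map numbers of the dummy atoms in one fragment."""
    return {int(label) for label in ATTACHMENT.findall(fragment)}


def is_consistent(parts: Dict[str, str]) -> bool:
    """Linker shares a label with each ligand; the two ligands share none."""
    e3 = attachment_labels(parts["e3_binder"])
    linker = attachment_labels(parts["linker"])
    poi = attachment_labels(parts["poi_ligand"])
    return bool(e3 & linker) and bool(poi & linker) and not (e3 & poi)


# Toy split built for illustration only (not actual split_protac output).
example = "O=C1CCC([*:2])C(=O)N1.[*:2]CCOCC[*:1].c1ccc([*:1])cc1"
parts = label_fragments(example)
print(parts["linker"], sorted(attachment_labels(parts["linker"])), is_consistent(parts))
```

The consistency check encodes the structural idea behind splitting: the linker bridges the two ligands, so it should share one attachment point with each of them while the ligands share none with each other.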

Data & Models

  • Curated datasets and trained models are available on Zenodo.

Documentation

  • Main usage and API: README.md
  • Scripts and workflows: scripts/README.md
  • Example notebooks: notebooks/

Contributing

Contributions are welcome! Please open an issue or submit a pull request.

License

Distributed under the MIT License. See LICENSE for details.