LeadOptima: Hit-to-Lead MPO Pipeline

Overview

In early-stage drug discovery, transitioning a promising High-Throughput Screening (HTS) hit into a clinical lead is rarely straightforward. Medicinal chemistry optimization often becomes a long, iterative process: tweaking a scaffold to improve target affinity frequently ruins its aqueous solubility, while fixing that solubility might inadvertently introduce toxicities. Finding the ideal analog requires balancing multiple properties simultaneously.

LeadOptima is designed to streamline this triage phase. By automating combinatorial scaffold hopping via established SMIRKS transformations, the pipeline generates chemically viable analogs from a single parent hit. It rapidly evaluates these candidates using predictive machine learning models. The core of the tool is its Multi-Parameter Optimization (MPO) desirability engine, which forces a balance between aqueous solubility and low hERG toxicity risk.

LeadOptima acts as a rigorous early-stage computational filter, identifying the candidates that succeed on both fronts and saving chemists from wasting time and resources on likely dead-end compounds.

Example Molecule: Paracetamol

Paracetamol (acetaminophen, the main ingredient in Tylenol) yields 12 analogs that the MPO ranks as highly desirable.

SMILES: CC(=O)Nc1ccc(O)cc1

[Figure: Paracetamol MPO Visualization]

Scaffold Hopping

The table below highlights a sample of the generated analogs, demonstrating the desirability function's ability to reward the maintenance of aqueous solubility while heavily penalizing structural shifts that introduce hERG liabilities.

| Analog Structure (SMILES) | Transformation Applied | Predicted $\log S$ | hERG Risk (0 - 1) | Global Desirability |
| --- | --- | --- | --- | --- |
| CC(=O)Nc1ccc(O)nc1 | Phenyl $\rightarrow$ Pyridine | -1.37 | 0.07 | 0.962 |
| CC(=O)Nc1ccc(O)c(F)c1 | Aromatic Fluorination | -1.87 | 0.09 | 0.953 |
| CC(=O)Nc1ccc(O)c(N2CCOCC2)c1 | Morpholine Appendage | -2.34 | 0.08 | 0.913 |
| CNC(=O)c1ccc(O)cc1 | Amide Reversal / Shift | -1.48 | 0.29 | 0.839 |

In the third row, the added morpholine ring pushes $\log S$ below the $-2.0$ threshold, triggering a penalty in the Solubility_Desirability. In the final row, the higher raw hERG risk (0.29) drags down the global score through the geometric mean, even though solubility remains in the fully desirable range.
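The solubility penalty can be illustrated with a short worked calculation. The README specifies only the two endpoints of the solubility scale (1.0 at $\log S \ge -2.0$, 0.0 at the lower cutoff, taken here as $\log S \le -6.0$) and that the global score is a weighted geometric mean. The linear ramp between the endpoints, the toxicity desirability $d_{\text{tox}} = 1 - \text{risk}$, equal weights, and the symbols $d_{\text{sol}}$, $d_{\text{tox}}$, $w_{\text{sol}}$, $w_{\text{tox}}$ are all assumptions made for illustration, not the pipeline's confirmed definitions.

$$
d_{\text{sol}} = \min\left(1,\ \max\left(0,\ \frac{\log S + 6.0}{4.0}\right)\right), \qquad
D = d_{\text{sol}}^{\,w_{\text{sol}}} \cdot d_{\text{tox}}^{\,w_{\text{tox}}}, \qquad w_{\text{sol}} + w_{\text{tox}} = 1
$$

For the morpholine analog ($\log S = -2.34$, hERG risk $0.08$) with $w_{\text{sol}} = w_{\text{tox}} = 0.5$:

$$
d_{\text{sol}} = \frac{-2.34 + 6.0}{4.0} = 0.915, \qquad
d_{\text{tox}} = 1 - 0.08 = 0.92, \qquad
D = \sqrt{0.915 \times 0.92} \approx 0.917
$$

Under these assumptions the result is close to, but not identical with, the tabulated 0.913. The gap may come from rounding of the displayed inputs or from a functional form that differs from this sketch.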

Architecture

The pipeline is completely modular, separating the ML inference from the mathematical optimization to allow for real-time dashboard rendering.

  • Stage 1: Models
    • RDKit: Standardizes SMILES, strips salts, and accepts only organic molecules.
    • Aqueous Solubility (SolPredict): A Random Forest regressor trained on a curated AqSolDB dataset. Utilizes RDKit 2D physical descriptors to predict $\log S$.
      • Test RMSE of $0.442$ log units (trained on AqSolDB; tested on the Delaney set of drug-like molecules)
      • $\log S \ge -2.0$ is scored as maximally desirable (1.0); $\log S \le -6.0$ as completely undesirable (0.0)
    • hERG Toxicity (ADMET-LLM): A hybrid gradient-boosting and LLM architecture. Utilizes 768-dimensional ChemBERTa embeddings passed into an XGBoost classifier.
      • ROC-AUC of $0.748$
      • 87% Recall for true hERG blockers
  • Stage 2: Combinatorial Scaffold Hopping
    • Utilizes SMIRKS strings to apply curated, chemically defensible bioisosteric replacements (e.g., aromatic fluorination, amide reversals, solubilizing piperazine appendages). The paracetamol table above shows example outputs.
  • Stage 3: MPO Desirability Engine
    • Calculates the weighted geometric mean of solubility and toxicity desirability.
  • Stage 4: Interactive Triage Dashboard
    • A local Streamlit web application uses Plotly to visualize the MPO in real time.
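The separation between Stage 1 (inference) and Stage 3 (optimization) can be sketched in plain Python: once the model predictions are cached, changing the weights only re-runs cheap arithmetic. This is a minimal sketch, not the repository's code. The function names, the linear solubility ramp between $-6.0$ and $-2.0$, the `1 - risk` toxicity desirability, and the normalization of weights are all assumptions; only the endpoints of the solubility scale and the use of a weighted geometric mean come from the description above. The input tuples are the four rows of the paracetamol table.

```python
import math

# Endpoints of the solubility scale as described in Stage 1.
LOGS_FLOOR = -6.0  # at or below: desirability 0.0
LOGS_CEIL = -2.0   # at or above: desirability 1.0


def solubility_desirability(log_s):
    """Assumed linear ramp between the two documented endpoints."""
    frac = (log_s - LOGS_FLOOR) / (LOGS_CEIL - LOGS_FLOOR)
    return min(1.0, max(0.0, frac))


def herg_desirability(risk):
    """Assumed mapping: lower predicted hERG risk is more desirable."""
    return min(1.0, max(0.0, 1.0 - risk))


def global_desirability(log_s, herg_risk, w_sol=0.5, w_tox=0.5):
    """Weighted geometric mean of the two desirabilities."""
    total = w_sol + w_tox
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    log_sum = 0.0
    for d, w in ((solubility_desirability(log_s), w_sol),
                 (herg_desirability(herg_risk), w_tox)):
        if w == 0:
            continue  # a zero-weight property does not affect the score
        if d == 0.0:
            return 0.0  # any weighted property at zero vetoes the compound
        log_sum += w * math.log(d)
    return math.exp(log_sum / total)


def rank_analogs(predictions, w_sol=0.5, w_tox=0.5):
    """Re-score cached (smiles, log_s, herg_risk) tuples; best first."""
    scored = [(smiles, global_desirability(log_s, risk, w_sol, w_tox))
              for smiles, log_s, risk in predictions]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Cached Stage 1 outputs, copied from the paracetamol table.
PREDICTIONS = [
    ("CC(=O)Nc1ccc(O)nc1", -1.37, 0.07),
    ("CC(=O)Nc1ccc(O)c(F)c1", -1.87, 0.09),
    ("CC(=O)Nc1ccc(O)c(N2CCOCC2)c1", -2.34, 0.08),
    ("CNC(=O)c1ccc(O)cc1", -1.48, 0.29),
]

if __name__ == "__main__":
    for smiles, score in rank_analogs(PREDICTIONS):
        print(f"{score:.3f}  {smiles}")
```

Calling `rank_analogs(PREDICTIONS, w_sol=1.0, w_tox=0.0)` re-ranks the same cached predictions on solubility alone, which is the kind of recalculation the dashboard's weight sliders would trigger without re-running either model.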

Quickstart

  1. Clone the repository and build the environment: This project uses Conda to manage complex C++ multi-threading dependencies (OpenMP) required by XGBoost and PyTorch on Apple Silicon.
     git clone https://github.com/kaih-b/leadoptima-mpo
     cd leadoptima-mpo
     conda env create -f environment.yml
     conda activate leadoptima
     pip install -e .
  2. Run the Unit Tests: Ensure everything is functioning correctly before launching.
     pytest -v
  3. Launch the Dashboard
     python -m streamlit run src/dashboard/app.py

Dashboard Usage

  1. Enter a valid SMILES string (e.g. c1ccccc1) into the sidebar.
  2. Click Run Analog Generator. The engine will apply the SMIRKS library, standardize the outputs, and push the analogs through the ML scoring models.
  3. Once the visualization renders, adjust the MPO Strategy Weights in the sidebar to dynamically recalculate the global scores and watch the optimal candidates shift in real time.
