LeadOptima: Hit-to-Lead MPO Pipeline

Overview

In early-stage drug discovery, transitioning a promising High-Throughput Screening (HTS) hit into a clinical lead is rarely straightforward. Medicinal chemistry optimization often becomes a long, iterative process: tweaking a scaffold to improve target affinity frequently ruins its aqueous solubility, while fixing that solubility might inadvertently introduce toxicities. Finding the ideal analog requires balancing multiple properties simultaneously.

LeadOptima is designed to streamline this triage phase. By automating combinatorial scaffold hopping via established SMIRKS transformations, the pipeline generates chemically viable analogs from a single parent hit. It rapidly evaluates these candidates using predictive machine learning models. The core of the tool is its Multi-Parameter Optimization (MPO) desirability engine, which forces a balance between aqueous solubility and low hERG toxicity risk.

LeadOptima acts as a rigorous early-stage computational filter, identifying the candidates that succeed on both fronts and saving chemists from wasting time and resources on likely dead-end compounds.

Example Molecule: Paracetamol

Running paracetamol (acetaminophen, the main ingredient in Tylenol) through the pipeline generates 12 analogs that the MPO ranks as highly desirable.

SMILES: CC(=O)Nc1ccc(O)cc1

Scaffold Hopping

The table below highlights a sample of the generated analogs, demonstrating the desirability function's ability to reward the maintenance of aqueous solubility while heavily penalizing structural shifts that introduce hERG liabilities.

Analog Structure (SMILES)	Transformation Applied	Predicted $\log S$	hERG Risk (0 - 1)	Global Desirability
`CC(=O)Nc1ccc(O)nc1`	Phenyl $\rightarrow$ Pyridine	-1.37	0.07	0.962
`CC(=O)Nc1ccc(O)c(F)c1`	Aromatic Fluorination	-1.87	0.09	0.953
`CC(=O)Nc1ccc(O)c(N2CCOCC2)c1`	Morpholine Appendage	-2.34	0.08	0.913
`CNC(=O)c1ccc(O)cc1`	Amide Reversal / Shift	-1.48	0.29	0.839

In the third row, the added morpholine ring pushes the predicted $\log S$ below the $-2.0$ threshold, triggering a penalty in the Solubility_Desirability term. In the final row, the markedly higher raw hERG risk drags down the global score through the geometric mean calculation.

Architecture

The pipeline is completely modular, separating the ML inference from the mathematical optimization to allow for real-time dashboard rendering.
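One way to picture this separation is a cache around the expensive inference step, so that changing the scoring function never re-runs the models. The sketch below is illustrative only: `predict_properties`, `rank`, and the call counter are hypothetical names, and the model call is replaced by a lookup of two rows from the Scaffold Hopping table.

```python
from functools import lru_cache
from typing import Callable

INFERENCE_CALLS = 0  # counts how often the (stand-in) models are invoked

# Stand-in predictions (log S, hERG risk) copied from the Scaffold Hopping table.
_TABLE = {
    "CC(=O)Nc1ccc(O)nc1": (-1.37, 0.07),
    "CNC(=O)c1ccc(O)cc1": (-1.48, 0.29),
}

@lru_cache(maxsize=None)
def predict_properties(smiles: str) -> tuple[float, float]:
    """Hypothetical stand-in for Stage 1 inference; cached per SMILES."""
    global INFERENCE_CALLS
    INFERENCE_CALLS += 1
    return _TABLE[smiles]

def rank(smiles_list: list[str], score_fn: Callable[[float, float], float]) -> list[str]:
    """Order analogs by score_fn(log_s, herg_risk), best first."""
    scored = [(score_fn(*predict_properties(s)), s) for s in smiles_list]
    return [s for _, s in sorted(scored, reverse=True)]

analogs = list(_TABLE)
by_toxicity = rank(analogs, lambda log_s, risk: 1.0 - risk)
by_solubility = rank(analogs, lambda log_s, risk: log_s)
```

Both rankings reuse the same cached predictions, so the counter stays at one call per analog no matter how often the scoring function changes.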

Stage 1: Models
- RDKit: Standardizes SMILES, strips salts, and accepts only organic molecules.
- Aqueous Solubility (SolPredict): A Random Forest regressor trained on a curated AqSolDB dataset. Utilizes RDKit 2D physical descriptors to predict $\log S$.
  - Test RMSE of $0.442$ log units (AqSolDB for training; Delaney, i.e. drug-like molecules, for testing)
  - $\log S \ge -2.0$ is scored as maximally desirable (1.0); $\log S \le -6.0$ as completely undesirable (0.0)
- hERG Toxicity (ADMET-LLM): A hybrid gradient-boosting and LLM architecture. Utilizes 768-dimensional ChemBERTa embeddings passed into an XGBoost classifier.
  - ROC-AUC of $0.748$
  - 87% Recall for true hERG blockers
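The Stage 1 outputs have to be mapped onto a 0 to 1 desirability scale before they can be combined. The sketch below shows one plausible mapping, under loud assumptions: the lower solubility bound is read as $-6.0 \log S$, the region between the two bounds is interpolated linearly, and hERG desirability is taken as one minus the predicted risk. None of these choices is confirmed by the pipeline's source, and the function names are hypothetical.

```python
def solubility_desirability(log_s: float, full: float = -2.0, zero: float = -6.0) -> float:
    """Map predicted log S onto [0, 1].

    Assumption: linear ramp between `zero` (score 0.0) and `full` (score 1.0);
    the real engine may use a different curve.
    """
    if log_s >= full:
        return 1.0
    if log_s <= zero:
        return 0.0
    return (log_s - zero) / (full - zero)

def herg_desirability(risk: float) -> float:
    """Assumption: desirability is the complement of the predicted hERG risk."""
    clipped = min(max(risk, 0.0), 1.0)  # keep the risk inside [0, 1]
    return 1.0 - clipped
```

Under these assumptions, any analog with $\log S$ of $-2.0$ or higher keeps a full solubility score, and the score falls steadily as solubility drops toward $-6.0$.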
Stage 2: Combinatorial Scaffold Hopping
- Utilizes SMIRKS strings to apply curated, chemically defensible bioisosteric replacements (e.g., aromatic fluorination, amide reversals, solubilizing piperazine appendages). See the Scaffold Hopping table above for examples.
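As a rough illustration of how a SMIRKS transformation produces analogs, the sketch below applies an aromatic fluorination pattern to paracetamol with RDKit. The SMIRKS string and the `apply_smirks` helper are assumptions written for this example; they are not taken from the pipeline's curated library.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Illustrative SMIRKS (an assumption): replace one aromatic C-H with C-F.
AROMATIC_FLUORINATION = "[cH:1]>>[c:1]F"

def apply_smirks(parent_smiles: str, smirks: str) -> list[str]:
    """Apply one SMIRKS transformation and return unique canonical SMILES."""
    parent = Chem.MolFromSmiles(parent_smiles)
    if parent is None:
        raise ValueError(f"Could not parse SMILES: {parent_smiles}")
    rxn = AllChem.ReactionFromSmarts(smirks)
    analogs = set()
    for products in rxn.RunReactants((parent,)):
        for product in products:
            try:
                Chem.SanitizeMol(product)  # reject chemically invalid products
            except Exception:
                continue
            analogs.add(Chem.MolToSmiles(product))  # canonical form deduplicates
    return sorted(analogs)

analogs = apply_smirks("CC(=O)Nc1ccc(O)cc1", AROMATIC_FLUORINATION)
```

The pattern matches each of the four aromatic C-H positions, but symmetry leaves only two distinct products after canonicalization, one of which corresponds to the fluorinated analog in the Scaffold Hopping table.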
Stage 3: MPO Desirability Engine
- Calculates the weighted geometric mean of solubility and toxicity desirability.
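A weighted geometric mean of two desirability scores can be sketched as follows. The function name, the default equal weights, and the handling of zero weights are assumptions made for illustration; the engine's actual weighting scheme may differ.

```python
import math

def global_desirability(d_sol: float, d_tox: float,
                        w_sol: float = 1.0, w_tox: float = 1.0) -> float:
    """Weighted geometric mean of two desirability scores in [0, 1]."""
    weights = (w_sol, w_tox)
    scores = (d_sol, d_tox)
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")
    log_sum = 0.0
    for d, w in zip(scores, weights):
        if w == 0:
            continue  # a zero-weight property does not influence the score
        if d <= 0:
            return 0.0  # any weighted property at zero vetoes the candidate
        log_sum += w * math.log(d)
    return math.exp(log_sum / sum(weights))
```

Unlike an arithmetic mean, the geometric mean cannot be rescued by one strong property: scores of 0.9 and 0.1 average to 0.5 arithmetically but yield only 0.3 here. As a rough check, a solubility desirability of 1.0 combined with a toxicity desirability of 0.93 under equal weights gives about 0.964.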
Stage 4: Interactive Triage Dashboard
- A local Streamlit web application uses Plotly to visualize the MPO in real time.

Quickstart

Clone the repository and build the environment: This project uses Conda to manage the C++ multi-threading dependencies (OpenMP) that XGBoost and PyTorch require on Apple Silicon.

git clone https://github.com/kaih-b/leadoptima-mpo
cd leadoptima-mpo
conda env create -f environment.yml
conda activate leadoptima
pip install -e .

Run the Unit Tests: Ensure everything is functioning correctly before launching.

pytest -v

Launch the Dashboard

python -m streamlit run src/dashboard/app.py

Dashboard Usage

Enter a valid SMILES string (e.g. c1ccccc1) into the sidebar.
Click Run Analog Generator. The engine will apply the SMIRKS library, standardize the outputs, and push the analogs through the ML scoring models.
Once the visualization renders, adjust the MPO Strategy Weights in the sidebar to recalculate the global scores dynamically and watch the optimal candidates shift in real time.
