In early-stage drug discovery, transitioning a promising High-Throughput Screening (HTS) hit into a clinical lead is rarely straightforward. Medicinal chemistry optimization often becomes a long, iterative process: tweaking a scaffold to improve target affinity frequently ruins its aqueous solubility, while fixing that solubility might inadvertently introduce toxicities. Finding the ideal analog requires balancing multiple properties simultaneously.
LeadOptima is designed to streamline this triage phase. By automating combinatorial scaffold hopping via established SMIRKS transformations, the pipeline generates chemically viable analogs from a single parent hit. It rapidly evaluates these candidates using predictive machine learning models. The core of the tool is its Multi-Parameter Optimization (MPO) desirability engine, which forces a balance between aqueous solubility and low hERG toxicity risk.
LeadOptima acts as a rigorous early-stage computational filter, identifying the candidates that succeed on both fronts and saving chemists from wasting time and resources on likely dead-end compounds.
As an example, running paracetamol (acetaminophen, the main ingredient in Tylenol) through the pipeline yields 12 analogs that the MPO ranks as highly desirable.
SMILES: CC(=O)Nc1ccc(O)cc1
The table below highlights a sample of the generated analogs, demonstrating the desirability function's ability to reward the maintenance of aqueous solubility while heavily penalizing structural shifts that introduce hERG liabilities.
| Analog Structure (SMILES) | Transformation Applied | Predicted $\log S$ | hERG Risk (0 - 1) | Global Desirability |
|---|---|---|---|---|
| CC(=O)Nc1ccc(O)nc1 | Phenyl $\to$ Pyridine | -1.37 | 0.07 | 0.962 |
| CC(=O)Nc1ccc(O)c(F)c1 | Aromatic Fluorination | -1.87 | 0.09 | 0.953 |
| CC(=O)Nc1ccc(O)c(N2CCOCC2)c1 | Morpholine Appendage | -2.34 | 0.08 | 0.913 |
| CNC(=O)c1ccc(O)cc1 | Amide Reversal / Shift | -1.48 | 0.29 | 0.839 |
In the third row, the addition of a morpholine ring pushes the predicted $\log S$ to -2.34, below the -2.0 threshold for maximal desirability, which lowers the Solubility_Desirability. In the final row, the raw hERG risk rises to 0.29, and the geometric mean lets that single weak component drag the global score down to 0.839.
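The arithmetic behind these two rows can be written out explicitly. The block below is a sketch under assumptions the text does not confirm: a linear solubility ramp between $\log S = -6.0$ (score 0) and $\log S = -2.0$ (score 1), a toxicity desirability equal to one minus the predicted hERG risk $r$, and equal weights. The symbols $d_{\text{sol}}$, $d_{\text{tox}}$, and $D$ are introduced here purely for illustration.

$$
d_{\text{sol}} = \min\!\left(1,\ \max\!\left(0,\ \frac{\log S + 6.0}{4.0}\right)\right), \qquad
d_{\text{tox}} = 1 - r, \qquad
D = \sqrt{d_{\text{sol}} \cdot d_{\text{tox}}}
$$

$$
\text{Morpholine row:}\quad d_{\text{sol}} = \frac{-2.34 + 6.0}{4.0} = 0.915, \quad d_{\text{tox}} = 1 - 0.08 = 0.92, \quad D = \sqrt{0.915 \times 0.92} \approx 0.917
$$

$$
\text{Amide row:}\quad d_{\text{sol}} = 1, \quad d_{\text{tox}} = 1 - 0.29 = 0.71, \quad D = \sqrt{0.71} \approx 0.843
$$

Both results land within about 0.005 of the tabulated 0.913 and 0.839. The small gap suggests that the actual desirability curves, or the unrounded model outputs, differ somewhat from this sketch.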
The pipeline is completely modular, separating the ML inference from the mathematical optimization to allow for real-time dashboard rendering.
- Stage 1: Models
  - RDKit: Standardizes SMILES, strips salts, and accepts only organic molecules.
  - Aqueous Solubility (SolPredict): A Random Forest regressor trained on a curated AqSolDB dataset. Uses RDKit 2D physical descriptors to predict $\log S$.
    - Test RMSE of $0.442$ log units (trained on AqSolDB, tested on the Delaney set of drug-like molecules)
    - $\log S \ge -2.0$ scores as maximally desirable (1.0); $\log S \le -6.0$ as completely undesirable (0.0)
  - hERG Toxicity (ADMET-LLM): A hybrid gradient-boosting and LLM architecture. Passes 768-dimensional ChemBERTa embeddings into an XGBoost classifier.
    - ROC-AUC of $0.748$
    - 87% recall for true hERG blockers
- Stage 2: Combinatorial Scaffold Hopping
  - Uses SMIRKS strings to apply curated, chemically defensible bioisosteric replacements (e.g., aromatic fluorination, amide reversals, solubilizing piperazine appendages). See below for an example.
- Stage 3: MPO Desirability Engine
  - Calculates the weighted geometric mean of the solubility and toxicity desirabilities.
- Stage 4: Interactive Triage Dashboard
  - A local Streamlit web application uses Plotly to visualize the MPO in real time.
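As a rough illustration of Stage 2, the sketch below applies a single SMIRKS transformation to paracetamol with RDKit and collects the unique products that survive sanitization. The SMIRKS string `[cH:1]>>[c:1]F` and the helper name `apply_smirks` are illustrative assumptions, not entries from LeadOptima's curated library.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Hypothetical aromatic-fluorination rule: replace one aromatic C-H with C-F.
# This string is an illustrative assumption, not taken from the project.
FLUORINATION_SMIRKS = "[cH:1]>>[c:1]F"


def apply_smirks(parent_smiles: str, smirks: str) -> list[str]:
    """Return sorted, de-duplicated canonical SMILES of valid products."""
    parent = Chem.MolFromSmiles(parent_smiles)
    if parent is None:
        raise ValueError(f"Could not parse SMILES: {parent_smiles!r}")

    reaction = AllChem.ReactionFromSmarts(smirks)
    analogs = set()
    # RunReactants yields one product tuple per match of the reactant pattern.
    for products in reaction.RunReactants((parent,)):
        for product in products:
            try:
                # Products come back unsanitized; discard chemically invalid ones.
                Chem.SanitizeMol(product)
            except Exception:
                continue
            # Canonical SMILES collapses symmetry-equivalent products.
            analogs.add(Chem.MolToSmiles(product))
    return sorted(analogs)


if __name__ == "__main__":
    for smi in apply_smirks("CC(=O)Nc1ccc(O)cc1", FLUORINATION_SMIRKS):
        print(smi)
```

For paracetamol this should yield two mono-fluorinated analogs: one fluorinated next to the hydroxyl group (equivalent to `CC(=O)Nc1ccc(O)c(F)c1`) and one next to the amide nitrogen. De-duplicating on canonical SMILES matters because the ring's symmetry makes the pattern match four positions that give only two distinct structures.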
- Clone the repository and build the environment. This project uses Conda to manage the complex C++ multi-threading dependencies (OpenMP) required by XGBoost and PyTorch on Apple Silicon.

      git clone https://github.com/kaih-b/leadoptima-mpo
      cd leadoptima-mpo
      conda env create -f environment.yml
      conda activate leadoptima
      pip install -e .

- Run the unit tests to ensure everything is functioning correctly before launching.

      pytest -v

- Launch the dashboard.

      python -m streamlit run src/dashboard/app.py

- Enter a valid SMILES string (e.g. `c1ccccc1`) into the sidebar.
- Click Run Analog Generator. The engine will apply the SMIRKS library, standardize the outputs, and push the analogs through the ML scoring models.
- Once the visualization renders, adjust the MPO Strategy Weights in the sidebar to recalculate the global scores dynamically and watch the optimal candidates shift in real time.
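To make the re-weighting step concrete, the sketch below re-scores cached predictions under different weights without re-running any model, which is the property that keeps the dashboard responsive. It is a minimal illustration under assumptions: a linear solubility ramp between $\log S = -6.0$ and $-2.0$, a toxicity desirability of one minus the hERG risk, and the hypothetical names `Prediction`, `global_desirability`, and `rank`. The input values are the $\log S$ and hERG figures from the paracetamol table above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Cached model outputs for one analog (computed once, reused on every re-weight)."""
    smiles: str
    log_s: float      # predicted aqueous solubility, log S
    herg_risk: float  # predicted hERG risk in [0, 1]


def solubility_desirability(log_s: float, low: float = -6.0, high: float = -2.0) -> float:
    # Assumed linear ramp: 0.0 at or below `low`, 1.0 at or above `high`.
    return min(1.0, max(0.0, (log_s - low) / (high - low)))


def global_desirability(p: Prediction, w_sol: float, w_tox: float) -> float:
    """Weighted geometric mean of the two desirabilities."""
    total = w_sol + w_tox
    if total <= 0:
        raise ValueError("Weights must sum to a positive number")
    d_sol = solubility_desirability(p.log_s)
    d_tox = 1.0 - p.herg_risk  # assumption: lower risk maps linearly to a higher score
    if d_sol == 0.0 or d_tox == 0.0:
        return 0.0  # any zero component vetoes the candidate
    return math.exp((w_sol * math.log(d_sol) + w_tox * math.log(d_tox)) / total)


def rank(preds, w_sol, w_tox):
    """Return SMILES ordered from most to least desirable."""
    ordered = sorted(preds, key=lambda p: global_desirability(p, w_sol, w_tox), reverse=True)
    return [p.smiles for p in ordered]


CACHED = [
    Prediction("CC(=O)Nc1ccc(O)nc1", -1.37, 0.07),
    Prediction("CC(=O)Nc1ccc(O)c(F)c1", -1.87, 0.09),
    Prediction("CC(=O)Nc1ccc(O)c(N2CCOCC2)c1", -2.34, 0.08),
    Prediction("CNC(=O)c1ccc(O)cc1", -1.48, 0.29),
]

if __name__ == "__main__":
    print(rank(CACHED, w_sol=0.5, w_tox=0.5))
    print(rank(CACHED, w_sol=0.05, w_tox=0.95))
```

With equal weights, the fluorinated analog ranks above the morpholine analog. Shifting almost all of the weight onto toxicity swaps the two under these assumed curves, because the morpholine analog's slightly lower hERG risk then outweighs its solubility penalty.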
