AI-driven clinical decision support system for interpretable HIV drug resistance prediction.
Designed as a computational dry-lab research platform integrating machine learning, genomics, and explainable AI.
This project aims to develop an explainable AI system that elucidates HIV drug resistance mechanisms and supports clinical decision-making. The system integrates ensemble machine learning models with a SHAP-based interpretability framework to provide transparent resistance assessments for both individual sequences and large-scale genomic batch processing.
- k-mer Encoding: Sequence segmentation into length 6-13 k-mers represented as frequency vectors
- Sparse Matrix Format: Efficient feature representation using scipy.sparse.csr_matrix
- Gene-specific Features: Independent k-mer dictionaries for RT (Reverse Transcriptase) and PR (Protease)
- Ensemble Learning: Random Forest + XGBoost probability averaging
- Binary Classification: Resistance/susceptibility probability prediction
- Threshold Classification: >0.75 (Highly Resistant), >0.5 (Medium Resistant), >0.2 (Low Resistant), Susceptible
- SHAP Value Computation: Feature contribution quantification using TreeExplainer
- Local Interpretation: k-mer level impact analysis for each prediction
- Visualization: Interactive charts with color-coded positive/negative impacts
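
The encoding and decision rule listed above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's implementation: the function names are hypothetical, the k-mers are assumed to come from an overlapping sliding window, and the ensemble is assumed to be an unweighted mean of the two model probabilities. Projection onto the gene-specific k-mer dictionary and conversion to a sparse matrix are omitted.

```python
from collections import Counter

# Cutoffs taken from the threshold list above; comparisons are strict (">").
LEVELS = [
    (0.75, "Highly Resistant"),
    (0.5, "Medium Resistant"),
    (0.2, "Low Resistant"),
]


def kmer_counts(sequence, k_min=6, k_max=13):
    """Count overlapping k-mers of every length from k_min to k_max.

    The sliding-window (overlapping) scheme is an assumption.
    """
    seq = sequence.upper()
    counts = Counter()
    for k in range(k_min, k_max + 1):
        # range() is empty when the sequence is shorter than k
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def ensemble_probability(rf_prob, xgb_prob):
    """Average the Random Forest and XGBoost probabilities (assumed unweighted)."""
    return (rf_prob + xgb_prob) / 2


def resistance_level(probability):
    """Map an averaged probability to a resistance label."""
    for cutoff, label in LEVELS:
        if probability > cutoff:
            return label
    return "Susceptible"
```

Because the comparisons are strict, a probability of exactly 0.75 falls into the Medium Resistant band under this reading of the thresholds.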
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│  Streamlit UI   │────│   FastAPI API    │────│   ML Models +   │
│   (Port 8501)   │    │   (Port 8000)    │    │ SHAP Explainer  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                      │                       │
         ▼                      ▼                       ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Batch FASTA   │    │  REST Endpoints  │    │ Model Registry  │
│   Processing    │    │     /predict     │    │      (v1/)      │
│ & Visualization │    │     /explain     │    │  RT/PR Models   │
│                 │    │     /health      │    │    + K-mers     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
- Resistance Drivers: Positive SHAP values indicating resistance-promoting k-mers
- Susceptibility Indicators: Negative SHAP values for drug effectiveness
- Clinical Interpretation: Actionable insights for treatment planning
- Interactive Charts: Color-coded red (positive) / blue (negative) impacts
- Drug-wise Tabs: Individual explanation views per medication
- Batch Heatmaps: Resistance probability visualization across sequences
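
As a rough sketch of how the sign convention above could be applied, the snippet below splits a list of feature impacts into resistance drivers and susceptibility indicators. The `feature`/`impact` keys mirror the `/explain` response shown in this README; the function names, the ranking by absolute impact, and the sample values are assumptions made for illustration.

```python
def split_by_sign(features, top_k=10):
    """Keep the top_k features by absolute impact, then split them by sign."""
    ranked = sorted(features, key=lambda f: abs(f["impact"]), reverse=True)[:top_k]
    drivers = [f for f in ranked if f["impact"] > 0]      # push toward resistance
    indicators = [f for f in ranked if f["impact"] < 0]   # push toward susceptibility
    return drivers, indicators


def bar_color(impact):
    """Red for positive impacts, blue otherwise (zero is treated as non-positive)."""
    return "red" if impact > 0 else "blue"


# Illustrative values only, not model output.
example = [
    {"feature": "ATGCGT", "impact": 0.42},
    {"feature": "CGATCG", "impact": -0.30},
    {"feature": "TATCGA", "impact": 0.05},
]
drivers, indicators = split_by_sign(example, top_k=2)
```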
- Python 3.12+
- Docker & Docker Compose
- 8GB+ RAM (model loading)
# Reproducible environment with Docker
docker compose up --build
# Local development environment
pip install -r requirements.txt
uvicorn app.api.main:app --reload
streamlit run ui/app.py
This system is intended for research and educational purposes only. AI predictions serve as decision-support indicators and must not replace clinical judgment.
Final treatment decisions remain the responsibility of qualified clinicians.
- Sequence Length Constraint: Sequences are pre-classified by length as RT (>400 bp) or PR (<400 bp)
- Model Generalizability: Dependency on training data subtypes
- Computational Cost: SHAP value calculation overhead
- Clinical Use: Restricted to research; outputs are decision-support indicators, not treatment directives
- Data Privacy: Sequence data anonymization
- Interpretation Responsibility: Clinical interpretation requires expert physician judgment
- Transparency: Clear communication of model limitations and uncertainties
- Fairness: Subtype bias validation and mitigation
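
The length-based pre-classification named in the first limitation might look like the sketch below. The function name is hypothetical, the `"rt"` label follows the `gene_type` value used in the API examples, and `"pr"` is assumed by analogy. The README does not say how a sequence of exactly 400 bp is handled, so this sketch rejects it rather than guessing.

```python
def classify_gene_by_length(sequence, cutoff=400):
    """Guess the gene from sequence length: longer than cutoff is RT, shorter is PR."""
    length = len(sequence.strip())
    if length > cutoff:
        return "rt"
    if length < cutoff:
        return "pr"
    # Behaviour at exactly `cutoff` bp is unspecified in the source.
    raise ValueError(f"cannot classify a sequence of exactly {cutoff} bp")
```

A rule this coarse will misroute truncated RT reads or unusually long PR inputs, which is one reason it appears under the limitations.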
@software{hiv_drug_resistance_prediction,
title={Explainable HIV Drug Resistance Prediction System},
author={Tushar Garg},
year={2025},
url={https://github.com/TusharGarg07/hiv-drug-resistance-prediction},
version={1.0.0},
doi={10.5281/zenodo.XXXXXXX}
}
- Python 3.12+
- Docker & Docker Compose (recommended)
git clone https://github.com/TusharGarg07/hiv-drug-resistance-prediction.git
cd hiv-drug-resistance-prediction
pip install -r requirements.txt
# Backend
uvicorn app.api.main:app --reload --host 0.0.0.0 --port 8000
# Frontend
streamlit run ui/app.py --server.port 8501
docker compose up --build
# Backend: http://localhost:8000/docs
# UI: http://localhost:8501
- Backend: FastAPI service with health checks
- UI: Streamlit, configured through the BACKEND_URL environment variable
- Model registry included in container builds
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"sequence": "ATGCGTATCGATCGATCGATCGATCGATCG", "gene_type": "rt"}'

curl -X POST "http://localhost:8000/explain" \
  -H "Content-Type: application/json" \
  -d '{"sequence": "ATGCGTATCGATCGATCGATCGATCGATCG", "gene_type": "rt", "top_k": 10}'

curl http://localhost:8000/health

{
  "model_version": "v1",
  "status": "success",
  "result": {
    "sequence_type": "RT",
    "predictions": [
      {
        "Drug": "AZT",
        "Resistance Level": "Highly Resistant",
        "Average Probability": 0.82,
        "Random Forest Probability": 0.79,
        "XGBoost Probability": 0.85
      }
    ]
  }
}

{
  "model_version": "v1",
  "status": "success",
  "result": [
    {
      "drug": "AZT",
      "gene_type": "rt",
      "top_features": [
        {"feature": "ATGCGT", "impact": 0.42}
      ]
    }
  ]
}

Upload FASTA files for high-throughput resistance prediction:
- File Format: .fasta, .fa, .txt
- Progress Tracking: Real-time processing status
- Results: Sequence-wise resistance table and probability heatmap
- Export: CSV/JSON for downstream analysis
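
For readers who want to reproduce the batch flow outside the UI, the following standard-library sketch parses FASTA text and writes a per-sequence CSV. It is an assumption-laden outline, not the project's code: `parse_fasta` and `results_to_csv` are hypothetical helpers, and the `length` column stands in for the resistance predictions that the real pipeline would attach to each sequence.

```python
import csv
import io


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def results_to_csv(rows):
    """Serialize per-sequence result dicts to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["sequence_id", "length"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Toy input; a real run would read an uploaded .fasta file instead.
fasta_text = ">seq1 sample\nATGCGT\nATCGAT\n>seq2\nATGC\n"
records = list(parse_fasta(fasta_text))
rows = [{"sequence_id": h.split()[0], "length": len(s)} for h, s in records]
csv_text = results_to_csv(rows)
```

Parsing lazily with a generator keeps memory use flat for large uploads, since only one record is held at a time until the caller collects them.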
hiv-drug-resistance-prediction/
├── README.md # This file
├── LICENSE # MIT License
├── CITATION.bib # Bibliography
├── CHANGELOG.md # Version history
├── environment.yml # Conda environment
├── docs/ # Research documentation
│ ├── methodology.md # Detailed methodology
│ ├── system_architecture.md # System design
│ ├── explainability_framework.md # SHAP framework
│ └── experimental_pipeline.md # Experiment workflow
├── paper/ # Publication materials
│ ├── abstract.md # Paper abstract
│ ├── methods.md # Methods section
│ ├── results.md # Results section
│ ├── limitations.md # Study limitations
│ └── future_work.md # Future directions
├── experiments/ # Experiment tracking
│ ├── README.md # Experiment overview
│ ├── experiment_v1_baseline.md # Baseline models
│ ├── experiment_v2_ensemble.md # Ensemble models
│ └── shap_analysis.md # SHAP explainability
├── notebooks/ # Research notebooks
│ ├── README.md # Notebook overview
│ ├── 01_data_analysis.ipynb # Sequence data exploration
│ ├── 02_feature_engineering.ipynb # k-mer feature analysis
│ ├── 03_model_training.ipynb # Model development
│ ├── 04_shap_analysis.ipynb # Explainability analysis
│ └── 05_batch_analysis.ipynb # Batch processing validation
├── results/ # Experimental results
│ ├── figures/ # Generated figures
│ ├── tables/ # Result tables
│ └── exports/ # Exported datasets
├── figures/ # Static figures for papers
├── config/ # System configuration
├── tests/ # Test suite
├── models/ # Trained models
│ └── v1/ # Version 1 models
├── data_sample/ # Sample datasets
├── scripts/ # Utility scripts
├── app/ # Application code
│ ├── api/ # FastAPI backend
│ ├── core/ # Core inference engine
│ ├── models/ # Model loading utilities
│ ├── preprocessing/ # Data preprocessing
│ └── services/ # Business logic
├── ui/ # Streamlit frontend
├── Dockerfile # Backend container
├── Dockerfile.ui # UI container
├── docker-compose.yml # Local orchestration
├── render.yaml # Render deployment
└── requirements.txt # Dependencies
- Multi-drug resistance prediction for combination therapy
- Clinical decision support integration
- Real-time variant tracking with sequence databases
- Advanced explainability with counterfactual analysis
- Mobile-responsive interface
- GPU acceleration for batch processing
- Prospective clinical validation studies
This project is licensed under the MIT License; see the LICENSE file for details.
Computational Biology Research Laboratory
Bioinformatics & AI Research Division
[research-contact@example.edu]
This research prototype represents current work in explainable AI for healthcare applications, suitable for academic collaboration and computational biology research environments.