🧬 Explainable HIV Drug Resistance Prediction System

AI-driven clinical decision support system for interpretable HIV drug resistance prediction.

Designed as a computational dry-lab research platform integrating machine learning, genomics, and explainable AI.

🧪 Research Objectives

To develop an explainable AI system for elucidating HIV drug resistance mechanisms and supporting clinical decision-making. This system integrates ensemble machine learning models with a SHAP-based interpretability framework to provide transparent resistance assessments for both individual sequences and large-scale genomic batch processing.

🔬 Methodology

Data Representation

k-mer Encoding: Sequences are split into k-mers of length 6 to 13 and represented as k-mer frequency vectors
Sparse Matrix Format: Efficient feature representation using scipy.sparse.csr_matrix
Gene-specific Features: Independent k-mer dictionaries for RT (Reverse Transcriptase) and PR (Protease)
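
A minimal sketch of the encoding described above, using only the standard library. It assumes overlapping k-mers taken with a stride of 1 and a fixed vocabulary built at training time; neither detail is stated in this README, and `kmer_counts`, `to_sparse_row`, and the three-entry vocabulary are hypothetical names used for illustration.

```python
from collections import Counter


def kmer_counts(sequence: str, k_min: int = 6, k_max: int = 13) -> Counter:
    """Count every overlapping k-mer of length k_min..k_max (stride 1 is an assumption)."""
    seq = sequence.upper()
    counts: Counter = Counter()
    for k in range(k_min, k_max + 1):
        for start in range(len(seq) - k + 1):
            counts[seq[start:start + k]] += 1
    return counts


def to_sparse_row(counts: Counter, vocabulary: dict[str, int]) -> tuple[list[int], list[int]]:
    """Project counts onto a fixed vocabulary and return (column_indices, values).

    K-mers that are not in the vocabulary are dropped.
    """
    pairs = sorted((vocabulary[kmer], n) for kmer, n in counts.items() if kmer in vocabulary)
    return [col for col, _ in pairs], [n for _, n in pairs]


# Hypothetical vocabulary, for illustration only.
vocab = {"ATGCGT": 0, "TCGATC": 1, "GGGGGG": 2}
cols, vals = to_sparse_row(kmer_counts("ATGCGTATCGATCGATCGATCGATCGATCG"), vocab)
```

Only non-zero entries are kept, which is the same information a row of a scipy.sparse.csr_matrix stores; a separate vocabulary per gene would correspond to the independent RT and PR dictionaries.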

Model Design

Ensemble Learning: Random Forest + XGBoost probability averaging
Binary Classification: Resistance/susceptibility probability prediction
Threshold Classification: probability > 0.75 (Highly Resistant), > 0.5 (Medium Resistant), > 0.2 (Low Resistant), otherwise Susceptible
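
The averaging and threshold rules above can be sketched as follows. This assumes an unweighted mean of the two model probabilities and strict ">" comparisons applied to that mean; the function names are hypothetical and not taken from the codebase.

```python
def ensemble_probability(rf_prob: float, xgb_prob: float) -> float:
    """Unweighted mean of the Random Forest and XGBoost probabilities (assumed weighting)."""
    return (rf_prob + xgb_prob) / 2


def resistance_level(prob: float) -> str:
    """Map a resistance probability to the four levels listed above."""
    if prob > 0.75:
        return "Highly Resistant"
    if prob > 0.5:
        return "Medium Resistant"
    if prob > 0.2:
        return "Low Resistant"
    return "Susceptible"


# Example with two illustrative model outputs.
level = resistance_level(ensemble_probability(0.79, 0.85))
```

Because the comparisons are strict, a probability of exactly 0.75 falls into Medium Resistant in this sketch.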

Explainability Framework

SHAP Value Computation: Feature contribution quantification using TreeExplainer
Local Interpretation: k-mer level impact analysis for each prediction
Visualization: Interactive charts with color-coded positive/negative impacts
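
For reference, the property that makes per-k-mer impacts interpretable is SHAP's additivity: the explained model output for one sequence equals a base value plus the sum of the feature contributions. The notation below is generic and not taken from the codebase, with M as the number of k-mer features.

```latex
f(x) = \phi_0 + \sum_{i=1}^{M} \phi_i(x)
```

Here \phi_0 is the base value (the expected model output over the background data) and \phi_i(x) is the contribution of k-mer i for sequence x. Whether f is a probability or a raw margin depends on how the explainer is configured, and how the contributions of the two ensemble members are combined is not stated in this README.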

🧬 Experimental Pipeline

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Streamlit UI  │────│   FastAPI API    │────│  ML Models +    │
│   (Port 8501)   │    │   (Port 8000)    │    │  SHAP Explainer │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│  Batch FASTA    │    │  REST Endpoints  │    │  Model Registry │
│  Processing     │    │  /predict        │    │  (v1/)          │
│  & Visualization│    │  /explain        │    │  RT/PR Models   │
│                 │    │  /health         │    │  + K-mers       │
└─────────────────┘    └──────────────────┘    └─────────────────┘

📊 Explainability Framework

k-mer Impact Analysis

Resistance Drivers: Positive SHAP values indicating resistance-promoting k-mers
Susceptibility Indicators: Negative SHAP values indicating k-mers that push the prediction toward susceptibility
Clinical Interpretation: Actionable insights for treatment planning
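
One way to turn raw SHAP values into the driver/indicator view above is to rank k-mers by absolute impact and label each by sign. This is a sketch under assumptions: ranking by absolute value and the `direction` field are illustrative choices, and every SHAP value below other than the 0.42 for ATGCGT (from the example output in this README) is made up.

```python
def top_features(shap_by_kmer: dict[str, float], top_k: int = 10) -> list[dict]:
    """Return the top_k k-mers by absolute SHAP value, labelled by direction."""
    # Zero contributions carry no direction, so they are skipped.
    nonzero = [(kmer, value) for kmer, value in shap_by_kmer.items() if value != 0]
    ranked = sorted(nonzero, key=lambda kv: abs(kv[1]), reverse=True)
    return [
        {
            "feature": kmer,
            "impact": value,
            "direction": "resistance" if value > 0 else "susceptibility",
        }
        for kmer, value in ranked[:top_k]
    ]


# Illustrative values only.
example = {"ATGCGT": 0.42, "TCGATC": -0.30, "CGATCG": 0.05, "GATCGA": 0.0}
summary = top_features(example, top_k=2)
```

Positive entries would map to the red bars and negative entries to the blue bars in the charts described below.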

Visualization System

Interactive Charts: Color-coded red (positive) / blue (negative) impacts
Drug-wise Tabs: Individual explanation views per medication
Batch Heatmaps: Resistance probability visualization across sequences

⚙️ Reproducibility

Dependencies

Python 3.12+
Docker & Docker Compose
8 GB+ RAM (for model loading)

Environment Setup

# Reproducible environment with Docker
docker compose up --build

# Local development environment
pip install -r requirements.txt
uvicorn app.api.main:app --reload
streamlit run ui/app.py

🏥 Clinical Disclaimer

This system is intended for research and educational purposes only. AI predictions serve as decision-support indicators and must not replace clinical judgment.

Final treatment decisions remain the responsibility of qualified clinicians.

⚠️ Research Use Disclaimer

Sequence Length Constraint: Sequences are pre-classified by length as RT (>400 bp) or PR (<400 bp)
Model Generalizability: Dependency on training data subtypes
Computational Cost: SHAP value calculation overhead
Clinical Use: Research use only; predictions are decision-support indicators and do not replace clinical judgment
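
The length-based pre-classification can be sketched as below. The README does not say how a sequence of exactly 400 bp is handled, so this sketch raises an error in that case; `infer_gene_type` and its lowercase return values are hypothetical names chosen to match the `gene_type` field used by the API.

```python
def infer_gene_type(sequence: str, cutoff_bp: int = 400) -> str:
    """Guess the gene from sequence length: longer than the cutoff is RT, shorter is PR."""
    length = len("".join(sequence.split()))  # ignore whitespace and line breaks
    if length > cutoff_bp:
        return "rt"
    if length < cutoff_bp:
        return "pr"
    # Behaviour at exactly the cutoff is unspecified; failing loudly is an assumption.
    raise ValueError(f"sequence length equals the {cutoff_bp} bp cutoff; gene type is ambiguous")
```

A length rule like this cannot detect truncated RT reads or concatenated PR-RT sequences, which is one reason it is listed as a limitation.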

🧭 Ethical Considerations

Data Privacy: Sequence data anonymization
Interpretation Responsibility: Clinical interpretation requires expert physician judgment
Transparency: Clear communication of model limitations and uncertainties
Fairness: Subtype bias validation and mitigation

📚 Citation

@software{hiv_drug_resistance_prediction,
  title={Explainable HIV Drug Resistance Prediction System},
  author={Tushar Garg},
  year={2025},
  url={https://github.com/TusharGarg07/hiv-drug-resistance-prediction},
  version={1.0.0},
  doi={10.5281/zenodo.XXXXXXX}
}

📦 Installation & Deployment

Prerequisites

Python 3.12+
Docker & Docker Compose (recommended)

Local Development

git clone https://github.com/TusharGarg07/hiv-drug-resistance-prediction.git
cd hiv-drug-resistance-prediction
pip install -r requirements.txt

# Backend
uvicorn app.api.main:app --reload --host 0.0.0.0 --port 8000

# Frontend
streamlit run ui/app.py --server.port 8501

Docker Deployment

docker compose up --build
# Backend: http://localhost:8000/docs
# UI: http://localhost:8501

Cloud Deployment (Render)

Backend: FastAPI service with health checks
UI: Streamlit with BACKEND_URL environment variable
Model registry included in container builds

🔌 API Documentation

Prediction Endpoint

curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"sequence": "ATGCGTATCGATCGATCGATCGATCGATCG", "gene_type": "rt"}'

Explanation Endpoint

curl -X POST "http://localhost:8000/explain" \
  -H "Content-Type: application/json" \
  -d '{"sequence": "ATGCGTATCGATCGATCGATCGATCGATCG", "gene_type": "rt", "top_k": 10}'

Health Check

curl http://localhost:8000/health
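
The same calls can be made from Python with the standard library. This is a minimal sketch that mirrors the curl commands above; `build_request` and `call` are hypothetical helper names, and the base URL assumes the local default port.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local default used in the curl examples

SEQUENCE = "ATGCGTATCGATCGATCGATCGATCGATCG"


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for one of the endpoints."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON reply (the backend must be running)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


predict_req = build_request("/predict", {"sequence": SEQUENCE, "gene_type": "rt"})
explain_req = build_request(
    "/explain", {"sequence": SEQUENCE, "gene_type": "rt", "top_k": 10}
)
```

With the backend up, `call(predict_req)` returns the parsed prediction response as a dictionary.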

📊 Example Outputs

Prediction Response

{
  "model_version": "v1",
  "status": "success",
  "result": {
    "sequence_type": "RT",
    "predictions": [
      {
        "Drug": "AZT",
        "Resistance Level": "Highly Resistant",
        "Average Probability": 0.82,
        "Random Forest Probability": 0.79,
        "XGBoost Probability": 0.85
      }
    ]
  }
}

SHAP Explanation

{
  "model_version": "v1",
  "status": "success",
  "result": [
    {
      "drug": "AZT",
      "gene_type": "rt",
      "top_features": [
        {"feature": "ATGCGT", "impact": 0.42}
      ]
    }
  ]
}
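
For downstream analysis, the prediction response shown above can be flattened into one row per drug. The sketch below assumes the response always has the field names from that example; `prediction_rows`, `rows_to_csv`, and the output column names are hypothetical.

```python
import csv
import io
import json

# The prediction response from the example above.
RESPONSE_TEXT = """
{
  "model_version": "v1",
  "status": "success",
  "result": {
    "sequence_type": "RT",
    "predictions": [
      {
        "Drug": "AZT",
        "Resistance Level": "Highly Resistant",
        "Average Probability": 0.82,
        "Random Forest Probability": 0.79,
        "XGBoost Probability": 0.85
      }
    ]
  }
}
"""

FIELDS = ["gene", "drug", "level", "probability"]


def prediction_rows(response: dict) -> list[dict]:
    """Flatten a successful prediction response into one row per drug."""
    if response.get("status") != "success":
        raise ValueError("prediction did not succeed")
    result = response["result"]
    return [
        {
            "gene": result["sequence_type"],
            "drug": p["Drug"],
            "level": p["Resistance Level"],
            "probability": p["Average Probability"],
        }
        for p in result["predictions"]
    ]


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = prediction_rows(json.loads(RESPONSE_TEXT))
csv_text = rows_to_csv(rows)
```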

🧬 Batch Genome Analysis

Upload FASTA files for high-throughput resistance prediction:

File Format: .fasta, .fa, .txt
Progress Tracking: Real-time processing status
Results: Sequence-wise resistance table and probability heatmap
Export: CSV/JSON for downstream analysis
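
A minimal FASTA reader along these lines could feed the batch pipeline. This is a sketch, not the project's parser: `parse_fasta` is a hypothetical name, and the choices to uppercase sequences and to ignore any lines before the first header are assumptions.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, joining wrapped sequence lines."""
    records: list[tuple[str, str]] = []
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


sample = ">seq1 sample\nATGC\ngtca\n\n>seq2\nTTTT\n"
records = parse_fasta(sample)
```

Each parsed sequence can then be sent to the prediction endpoint individually, which also makes per-sequence progress reporting straightforward.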

🗂️ Repository Structure

hiv-drug-resistance-prediction/
├── README.md                     # This file
├── LICENSE                       # MIT License
├── CITATION.bib                  # Bibliography
├── CHANGELOG.md                  # Version history
├── environment.yml               # Conda environment
├── docs/                         # Research documentation
│   ├── methodology.md            # Detailed methodology
│   ├── system_architecture.md    # System design
│   ├── explainability_framework.md # SHAP framework
│   └── experimental_pipeline.md  # Experiment workflow
├── paper/                        # Publication materials
│   ├── abstract.md               # Paper abstract
│   ├── methods.md                # Methods section
│   ├── results.md                # Results section
│   ├── limitations.md            # Study limitations
│   └── future_work.md            # Future directions
├── experiments/                  # Experiment tracking
│   ├── README.md                 # Experiment overview
│   ├── experiment_v1_baseline.md # Baseline models
│   ├── experiment_v2_ensemble.md # Ensemble models
│   └── shap_analysis.md          # SHAP explainability
├── notebooks/                    # Research notebooks
│   ├── README.md                 # Notebook overview
│   ├── 01_data_analysis.ipynb    # Sequence data exploration
│   ├── 02_feature_engineering.ipynb # k-mer feature analysis
│   ├── 03_model_training.ipynb   # Model development
│   ├── 04_shap_analysis.ipynb    # Explainability analysis
│   └── 05_batch_analysis.ipynb   # Batch processing validation
├── results/                      # Experimental results
│   ├── figures/                  # Generated figures
│   ├── tables/                   # Result tables
│   └── exports/                  # Exported datasets
├── figures/                      # Static figures for papers
├── config/                       # System configuration
├── tests/                        # Test suite
├── models/                       # Trained models
│   └── v1/                       # Version 1 models
├── data_sample/                  # Sample datasets
├── scripts/                      # Utility scripts
├── app/                          # Application code
│   ├── api/                      # FastAPI backend
│   ├── core/                     # Core inference engine
│   ├── models/                   # Model loading utilities
│   ├── preprocessing/            # Data preprocessing
│   └── services/                 # Business logic
├── ui/                           # Streamlit frontend
├── Dockerfile                    # Backend container
├── Dockerfile.ui                 # UI container
├── docker-compose.yml            # Local orchestration
├── render.yaml                   # Render deployment
└── requirements.txt              # Dependencies

🚀 Future Directions

Multi-drug resistance prediction for combination therapy
Clinical decision support integration
Real-time variant tracking with sequence databases
Advanced explainability with counterfactual analysis
Mobile-responsive interface
GPU acceleration for batch processing
Prospective clinical validation studies

📜 License

This project is licensed under the MIT License; see the LICENSE file for details.

👨‍🔬 Research Contact

Computational Biology Research Laboratory
Bioinformatics & AI Research Division
[research-contact@example.edu]

This research prototype represents current work in explainable AI for healthcare applications, suitable for academic collaboration and computational biology research environments.
