🫀 Clinical Risk Prediction System

A machine learning-based clinical decision support prototype that predicts cardiovascular disease risk from patient clinical data. It prioritizes high recall to minimize missed diagnoses and provides calibrated probability estimates to support decision-making.

Developed as a Bioengineering + Machine Learning project focused on clinically reliable prediction systems.

🌐 Live Demo

👉 https://clinical-risk-predictor.streamlit.app

🚨 Problem Statement

Early detection of heart disease is critical. In clinical settings, missing a positive case (false negative) can lead to severe consequences.

This project focuses on:

- Maximizing recall (reducing missed diagnoses)
- Providing reliable probability estimates for decision-making

🧠 Approach

- Dataset: UCI Heart Disease dataset (clinical tabular data)
- Model: calibrated Logistic Regression
- Threshold tuning: 0.3 (recall-focused decision boundary)
- Calibration: adjusts predicted probabilities so they better reflect observed disease rates
- Evaluation:
  - ROC-AUC
  - Precision-Recall curve
  - Confusion matrix
  - Cross-validation

⚖️ Key Design Decisions

1. Logistic Regression over Random Forest

Although Random Forest achieved near-perfect performance, it showed signs of overfitting due to dataset size.

Logistic Regression was chosen because:

- Better generalization
- Interpretability (important in healthcare)
- Supports calibrated probabilities

2. Threshold = 0.3 (instead of default 0.5)

- Improves recall
- Reduces false negatives
- Aligns with the clinical priority of early detection
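To illustrate the trade-off behind the lower threshold, here is a minimal sketch in plain Python. The labels and probabilities are made up for illustration and are not taken from the project's data; `precision_recall` is a hypothetical helper, not a function from the repository.

```python
def precision_recall(y_true, proba, threshold):
    """Return (precision, recall) for class 1 when predicting 1 if proba >= threshold."""
    tp = sum(1 for t, p in zip(y_true, proba) if t == 1 and p >= threshold)
    fp = sum(1 for t, p in zip(y_true, proba) if t == 0 and p >= threshold)
    fn = sum(1 for t, p in zip(y_true, proba) if t == 1 and p < threshold)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels (1 = disease) and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
proba = [0.92, 0.55, 0.41, 0.33, 0.45, 0.28, 0.12, 0.05]

at_default = precision_recall(y_true, proba, 0.5)  # two true cases are missed
at_tuned = precision_recall(y_true, proba, 0.3)    # no missed cases, one false alarm

print("threshold 0.5:", at_default)
print("threshold 0.3:", at_tuned)
```

On this toy input, lowering the threshold removes the false negatives at the cost of one extra false positive, which is the exchange the design decision accepts.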

3. Calibration

Used CalibratedClassifierCV to ensure:

- Probability outputs are trustworthy
- Model can support risk-based decision making
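A minimal sketch of what calibrating a logistic regression with `CalibratedClassifierCV` can look like. Several things here are assumptions rather than the project's actual setup: the data is synthetic (`make_classification`, not the UCI dataset), and the scaling step, `method="sigmoid"`, and `cv=5` are illustrative choices.

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the clinical table (NOT the UCI data; 13 features assumed).
X, y = make_classification(
    n_samples=600, n_features=13, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale the inputs, fit logistic regression, and calibrate with 5-fold CV.
base = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model = CalibratedClassifierCV(base, method="sigmoid", cv=5)
model.fit(X_train, y_train)

# Predicted probability of the positive (disease) class.
proba = model.predict_proba(X_test)[:, 1]

# Brier score: mean squared error of the probabilities (lower is better).
brier = brier_score_loss(y_test, proba)

# Reliability curve: observed positive rate vs. mean predicted probability per bin.
observed, predicted = calibration_curve(y_test, proba, n_bins=5)
print(f"Brier score: {brier:.3f}")
for obs, pred in zip(observed, predicted):
    print(f"predicted {pred:.2f} -> observed {obs:.2f}")
```

When the model is well calibrated, each bin's observed rate sits close to its mean predicted probability.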

📊 Results

- High recall (0.91) for the disease class, prioritized to reduce missed diagnoses (false negatives)
- Calibrated probabilities intended to support risk-based decision-making

Model Performance Comparison

Logistic Regression:

- ROC-AUC: 0.93
- Recall (Disease Class): 0.91
- Precision (Disease Class): 0.76
- F1-score: 0.83

Random Forest:

- ROC-AUC: 0.999
- Recall (Disease Class): 1.00
- Precision (Disease Class): 0.98
- F1-score: 0.99

Random Forest's near-perfect scores most likely reflect overfitting on a limited dataset, so Logistic Regression was selected as the final model for its better generalization, interpretability, and more reliable calibrated probabilities.

The 0.3 threshold was applied to raise recall (0.91) and align predictions with clinical priorities.
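As a quick consistency check, the reported F1-scores can be recomputed from the precision and recall values listed above, since F1 is the harmonic mean of the two. `f1_from` is a hypothetical helper name used only for this check.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the comparison above.
lr_f1 = f1_from(0.76, 0.91)
rf_f1 = f1_from(0.98, 1.00)

print(f"Logistic Regression F1: {lr_f1:.2f}")
print(f"Random Forest F1: {rf_f1:.2f}")
```

Both values round to the F1-scores reported in the comparison (0.83 and 0.99).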

🔍 Feature Insights

Key contributing features include:

- Chest pain type
- Maximum heart rate achieved
- ST depression (oldpeak)
- Number of major vessels

These align with known clinical indicators of cardiovascular disease risk.

💡 Features

- Risk stratification:
  - Low Risk (< 0.3)
  - Medium Risk (0.3 – 0.6)
  - High Risk (> 0.6)
- Human-readable clinical reasoning (heuristic-based explanations)
- Recommendation system based on risk level
- Deployed interactive Streamlit web application (public access)
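The risk bands listed above can be sketched as a small function. `risk_level` is a hypothetical name, and the actual implementation in `app.py` may differ. The boundary handling follows the ranges as written: exactly 0.3 and exactly 0.6 both fall in Medium Risk.

```python
def risk_level(probability: float) -> str:
    """Map a calibrated disease probability to a risk band."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability < 0.3:
        return "Low Risk"
    if probability <= 0.6:
        return "Medium Risk"
    return "High Risk"


for p in (0.12, 0.30, 0.60, 0.85):
    print(f"{p:.2f} -> {risk_level(p)}")
```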

🖥️ Demo

High Risk Prediction: screenshots high_input.png and high_output.png

Low Risk Prediction: screenshots low_input.png and low_output.png

⚠️ Limitations

- Dataset is relatively small
- Categorical variables are treated as ordinal; proper encoding (e.g., OneHotEncoder) may improve performance
- No external clinical validation
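To make the encoding limitation concrete, here is a minimal sketch of one-hot encoding a categorical feature with scikit-learn's `OneHotEncoder`. The chest pain codes 0 to 3 and the variable name `cp` are assumptions for illustration; the real column names and codes should be checked against the dataset.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Chest pain type as integer codes (assumed coding, one column, five patients).
cp = np.array([[0], [2], [3], [1], [2]])

# Each distinct code becomes its own 0/1 column, so no ordering is implied.
encoder = OneHotEncoder(handle_unknown="ignore")
cp_onehot = encoder.fit_transform(cp).toarray()

print(encoder.categories_[0])
print(cp_onehot)
```

With ordinal treatment, a linear model has to assume that code 3 has three times the effect of code 1. The one-hot columns let it learn a separate coefficient for each category.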

🔮 Future Improvements

- SHAP-based explainability
- Proper categorical encoding (OneHotEncoder)
- Validation on real-world clinical datasets

🛠️ Tech Stack

- Python
- scikit-learn
- pandas, numpy
- matplotlib, seaborn
- Streamlit (for deployment)
- joblib (model serialization)

📁 Project Structure

- app.py → Streamlit web app
- notebook.ipynb → model development & analysis
- model.pkl → trained model
- columns.pkl → feature order used for prediction
- requirements.txt → dependencies
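A rough sketch of how `model.pkl` and `columns.pkl` might be used together at prediction time. It assumes both files were written with joblib and that `columns.pkl` holds a list of column names. `load_artifacts`, `order_features`, and the example feature names are hypothetical and not taken from `app.py`.

```python
def load_artifacts(model_path="model.pkl", columns_path="columns.pkl"):
    """Load the trained model and the saved feature order (assumed joblib files)."""
    import joblib  # imported here so order_features has no third-party dependency

    return joblib.load(model_path), joblib.load(columns_path)


def order_features(patient: dict, columns: list) -> list:
    """Arrange a patient's values in the column order the model was trained on."""
    missing = [name for name in columns if name not in patient]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [patient[name] for name in columns]


# Hypothetical three-feature example; the real columns.pkl defines the full list.
columns = ["age", "cp", "thalach"]
patient = {"cp": 2, "age": 54, "thalach": 150}
row = order_features(patient, columns)
print(row)
```

Saving the column order next to the model protects against inputs being passed in a different order than the one used during training, which would otherwise produce wrong predictions without any error.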

🚀 How to Run

Clone the repository:

git clone https://github.com/TarunaJ2006/Clinical-Risk-Prediction-System.git
cd Clinical-Risk-Prediction-System

Install dependencies:

pip install -r requirements.txt

Run the app:

streamlit run app.py

⚠️ Disclaimer

This project is for educational purposes and is not a substitute for medical diagnosis.
