This project combines predictive analytics with patient clustering to provide comprehensive heart disease risk assessment. The system uses both Machine Learning and Deep Learning approaches to predict heart disease likelihood while segmenting patients into distinct risk profiles for personalized healthcare interventions.
- Predictive Modeling: ANN, RNN, Random Forest, Logistic Regression
- Patient Profiling: K-Means clustering for risk segmentation
- Interactive Dashboard: Real-time predictions with visualizations
- Model Comparison: Performance analysis across multiple algorithms
| Variable | Type | Description | Units / Values | Missing Values |
|---|---|---|---|---|
| `age` | Continuous | Patient age | Years | ❌ |
| `cholesterol` | Continuous | Cholesterol level | mg/dL | ❌ |
| `blood_pressure` | Continuous | Systolic BP | mmHg | ❌ |
| `chest_pain_type` | Categorical | Chest pain type | 0-3 | ❌ |
| `max_heart_rate` | Continuous | Max heart rate | BPM | ❌ |
| `target` | Binary | Heart disease presence | 0/1 | ❌ |
Dataset: 1,025 patients | Features: 13 clinical indicators
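The variable table implies a few constraints that a record can be checked against before it reaches a model. The following is a minimal sketch, not code from the repository: the `validate_record` helper is hypothetical, the "positive value" rule for continuous fields is an assumption, and `target` is treated as optional because it would only exist in labelled rows.

```python
from numbers import Real

# Field names come from the variable table; the helper itself is hypothetical.
CONTINUOUS = ("age", "cholesterol", "blood_pressure", "max_heart_rate")


def validate_record(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for name in CONTINUOUS:
        value = record.get(name)
        # bool is a subclass of int, so it is excluded explicitly.
        if not isinstance(value, Real) or isinstance(value, bool):
            problems.append(f"{name}: expected a number, got {value!r}")
        elif value <= 0:
            # ASSUMPTION: all continuous measurements are strictly positive.
            problems.append(f"{name}: expected a positive value, got {value!r}")

    code = record.get("chest_pain_type")
    if not (isinstance(code, int) and not isinstance(code, bool) and 0 <= code <= 3):
        problems.append("chest_pain_type: expected an integer code 0-3")

    # target is optional here: it is only known for labelled training rows.
    if "target" in record and record["target"] not in (0, 1):
        problems.append("target: expected 0 or 1")
    return problems


example = {
    "age": 45,
    "cholesterol": 240,
    "blood_pressure": 140,
    "chest_pain_type": 2,
    "max_heart_rate": 150,
}
print(validate_record(example))  # []
```

Returning a list of problems instead of raising on the first one lets a dashboard form report every invalid field at once.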
- ANN: Input(13) → Dense(64, ReLU) → Dense(32, ReLU) → Output(1, Sigmoid)
- RNN: Input(Sequential) → LSTM(50) → Dropout(0.2) → Dense(25, ReLU) → Output(1, Sigmoid)
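To give a sense of model size, the layer widths listed for the two networks can be turned into trainable-parameter counts. This is a rough sketch using the standard formulas for dense and LSTM layers; it does not load the project's models. The RNN input shape is not stated in this README, so `features_per_step = 1` is an assumption and the RNN total changes with it.

```python
def dense_params(n_inputs, n_units):
    """Fully connected layer: one weight per input-unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units


def lstm_params(n_inputs, n_units):
    """Standard LSTM layer: four gates, each with input weights,
    recurrent weights, and a bias vector."""
    return 4 * (n_units * (n_inputs + n_units) + n_units)


# ANN as listed: Input(13) -> Dense(64) -> Dense(32) -> Output(1).
ann_widths = [13, 64, 32, 1]
ann_layers = [dense_params(a, b) for a, b in zip(ann_widths, ann_widths[1:])]
ann_total = sum(ann_layers)

# RNN as listed: LSTM(50) -> Dropout(0.2) -> Dense(25) -> Output(1).
# ASSUMPTION: one feature per time step; the README does not give the input shape.
features_per_step = 1
rnn_layers = [
    lstm_params(features_per_step, 50),
    0,  # Dropout has no trainable parameters
    dense_params(50, 25),
    dense_params(25, 1),
]
rnn_total = sum(rnn_layers)

print(ann_layers, ann_total)  # [896, 2080, 33] 3009
print(rnn_layers, rnn_total)  # [10400, 0, 1275, 26] 11701
```

Under that assumption the LSTM layer alone holds more parameters than the whole ANN, which is worth keeping in mind with a dataset of about a thousand patients.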
K-Means: 3 clusters (Low/Medium/High Risk) with Silhouette Score validation
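The clustering step can be illustrated with a small standard-library sketch of K-Means (Lloyd's algorithm) followed by a silhouette score. This is not the code in `src/clustering.py`: the toy points are synthetic, and a deterministic farthest-first initialization is swapped in for the random or k-means++ seeding a library would normally use, so the result is reproducible. In the project's own stack, scikit-learn's `KMeans` and `silhouette_score` provide equivalents.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    pick the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm; returns one cluster index per point."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def silhouette_score(points, labels):
    """Mean silhouette coefficient (-1 to 1, higher is better).
    Requires at least two distinct clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Synthetic, well-separated toy data (not the project's dataset).
points = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (5.0, 5.0), (5.1, 4.9), (4.8, 5.2),
    (9.0, 1.0), (9.2, 1.1), (8.9, 0.8),
]
labels = kmeans(points, k=3)
score = silhouette_score(points, labels)
print(labels, round(score, 3))
```

On real patient data the features would normally be standardized first, since K-Means distances are otherwise dominated by the variables with the largest numeric range.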
```bash
# Clone repository
git clone https://github.com/Aryanwadhwa14/HEART-DISEASE.git
cd HEART-DISEASE

# Install dependencies
pip install -r requirements.txt

# Run dashboard
streamlit run app/dashboard.py
```

Requirements:

```
pandas>=1.3.0, numpy>=1.21.0, scikit-learn>=1.0.0
tensorflow>=2.6.0, streamlit>=1.2.0, matplotlib>=3.4.0
```
```bash
python src/train_models.py --models all   # Train all models
python src/train_models.py --models rnn   # Train specific model
python src/clustering.py                  # Run clustering
```

```python
from src.predictor import HeartDiseasePredictor

predictor = HeartDiseasePredictor()

# Patient record keyed by feature name
patient_data = {'age': 45, 'cholesterol': 240, 'blood_pressure': 140}
prediction = predictor.predict(patient_data)
```

| Model | Accuracy | Precision | Recall | F1-Score | ROC-AUC |
|---|---|---|---|---|---|
| Logistic Regression | 0.834 | 0.821 | 0.847 | 0.834 | 0.891 |
| Random Forest | 0.852 | 0.839 | 0.865 | 0.852 | 0.923 |
| ANN | 0.847 | 0.834 | 0.859 | 0.847 | 0.912 |
| RNN | 0.901 | 0.897 | 0.905 | 0.901 | 0.956 |
Best Performer: RNN model with 90.1% accuracy and 0.956 ROC-AUC
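As a quick sanity check on the metrics table, F1 can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below only re-derives numbers already in the table; it does not rerun any model, and the tolerance of 0.001 is an assumption that accounts for the inputs being rounded to three decimals.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the metrics table.
reported = {
    "Logistic Regression": (0.821, 0.847, 0.834),
    "Random Forest": (0.839, 0.865, 0.852),
    "ANN": (0.834, 0.859, 0.847),
    "RNN": (0.897, 0.905, 0.901),
}

for model, (precision, recall, f1_reported) in reported.items():
    f1 = f1_score(precision, recall)
    # Inputs are rounded to three decimals, so allow a small tolerance.
    consistent = abs(f1 - f1_reported) < 1e-3
    print(f"{model}: recomputed F1 = {f1:.4f}, reported = {f1_reported}, consistent = {consistent}")
```

All four rows agree with the reported F1 within that tolerance.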
```
HEART-DISEASE/
├── data/                  # Dataset files
├── src/                   # Source code
│   ├── models/            # ML/DL model implementations
│   ├── data_preprocessing.py
│   ├── clustering.py
│   └── predictor.py
├── app/                   # Streamlit dashboard
├── notebooks/             # Jupyter notebooks
├── models/                # Saved model files
└── requirements.txt
```
- Real-time Prediction: Interactive patient data input
- Risk Visualization: Probability gauges and charts
- Patient Clustering: Visual cluster analysis
- Model Comparison: Performance metrics comparison
- Data Explorer: Interactive dataset exploration
```
POST /api/predict
{
  "age": 45,
  "cholesterol": 240,
  "blood_pressure": 140
}
```

Response:

```json
{
  "prediction": 1,
  "probability": 0.78,
  "risk_level": "High"
}
```

This heart disease prediction system has been officially recognized and patented by the Harvard Innovation Labs, demonstrating its novel approach in combining ML/DL models with patient clustering for cardiovascular risk assessment.
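For the `POST /api/predict` exchange documented earlier, a client call could look roughly like the sketch below, using only the Python standard library. The base URL is hypothetical (this README does not say where the API is served), and the helper names `build_predict_request` and `parse_predict_response` are illustrative, not part of the project. The request is built but not sent, since sending needs a running server.

```python
import json
import urllib.request

# Hypothetical base URL; the README does not state the host or port.
BASE_URL = "http://localhost:8000"


def build_predict_request(patient, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for /api/predict."""
    body = json.dumps(patient).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_predict_response(raw_body):
    """Decode a response body and return the three documented fields."""
    payload = json.loads(raw_body)
    return payload["prediction"], payload["probability"], payload["risk_level"]


request = build_predict_request(
    {"age": 45, "cholesterol": 240, "blood_pressure": 140}
)

# Sending is left commented out because it requires a running server:
# with urllib.request.urlopen(request, timeout=10) as response:
#     prediction, probability, risk_level = parse_predict_response(response.read())
```

Keeping request construction separate from sending makes the payload easy to inspect and test without network access.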
- Fork the repository
- Create feature branch: `git checkout -b feature/new-feature`
- Commit changes: `git commit -m 'Add new feature'`
- Push to branch: `git push origin feature/new-feature`
- Open Pull Request
Areas for contribution: Model improvements, UI enhancements, documentation, testing
MIT License - see LICENSE file for details.