An end-to-end ML-powered web application that predicts risk for multiple diseases in real time.
- About the Project
- Supported Diseases
- Features
- Tech Stack
- How It Works
- Getting Started
- Usage
- Project Structure
- API Endpoints
- Model Performance
- Key Learnings
- Future Improvements
- Contributing
- License
- Contact
Health-Insight is a full-stack web application that integrates multiple machine learning models with a Flask backend to deliver real-time disease risk predictions directly in the browser.
Designed with production-style considerations in mind, the project addresses feature consistency between training and inference, model serialization, dynamic form generation, and robust error handling, so it can serve as a template for real-world ML deployment rather than only a demo.
⚠️ Disclaimer: This application is intended for educational and research purposes only. It is not a substitute for professional medical advice, diagnosis, or treatment. Always consult a qualified healthcare provider.
| Disease | Model File | Dataset |
|---|---|---|
| 🩸 Diabetes | diabetes.pkl | diabetes.csv |
| ❤️ Heart Disease | heart.pkl | heart.csv |
| 🫘 Kidney Disease | kidney.pkl | kidney.csv |
| 🫀 Liver Disease | liver.pkl | liver.csv |
| 🔬 Cancer | cancer.pkl | cancer.csv |
- 🤖 Individual ML model per disease — each condition has its own dedicated RandomForestClassifier
- 📋 Dynamic input forms — automatically generated based on disease-specific feature sets
- ⚡ Real-time predictions — instant inference via Flask REST backend
- 💾 Model persistence — serialized and loaded using Pickle for fast startup
- ✅ Input validation — client and server-side checks before inference
- 🔡 Categorical encoding — handles mixed data types in production
- 🧱 Modular architecture — clean separation of training, inference, and UI layers
- 📱 Responsive UI — works across desktop and mobile browsers
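As a rough illustration of the server-side validation listed above, the sketch below checks a submitted form against a per-disease field specification before any model is called. The `FIELD_SPECS` table, the `validate_inputs` helper, and all numeric ranges are assumptions made for this example; only the field names are taken from the diabetes example under API Endpoints. It is not the project's actual code.

```python
# Hypothetical field specification: name -> (min, max).
# The ranges are illustrative only and are not clinical limits.
FIELD_SPECS = {
    "diabetes": {
        "pregnancies": (0, 20),
        "glucose": (0, 300),
        "blood_pressure": (0, 200),
        "skin_thickness": (0, 100),
        "insulin": (0, 900),
        "bmi": (0, 70),
        "diabetes_pedigree": (0, 3),
        "age": (1, 120),
    },
}


def validate_inputs(disease, form):
    """Return (clean_values, errors) for a submitted form mapping."""
    specs = FIELD_SPECS.get(disease)
    if specs is None:
        return {}, {"disease": f"unsupported disease: {disease!r}"}

    clean, errors = {}, {}
    for name, (low, high) in specs.items():
        raw = form.get(name)
        if raw is None or str(raw).strip() == "":
            errors[name] = "missing value"
            continue
        try:
            value = float(raw)
        except (TypeError, ValueError):
            errors[name] = "not a number"
            continue
        # NaN fails this comparison too, so it is rejected as out of range.
        if not low <= value <= high:
            errors[name] = f"expected a value between {low} and {high}"
            continue
        clean[name] = value
    return clean, errors
```

Returning the errors as a mapping, rather than raising on the first problem, lets the form be re-rendered with a message next to every invalid field.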
| Layer | Technology |
|---|---|
| Backend | Python 3.x, Flask |
| ML Framework | Scikit-learn (RandomForestClassifier) |
| Data Handling | Pandas, NumPy |
| Serialization | Pickle |
| Frontend | HTML5, CSS3, JavaScript |
| Templating | Jinja2 |
User selects a disease
│
▼
Dynamic form rendered with disease-specific input fields
│
▼
User submits health parameters
│
▼
Flask validates & encodes inputs
│
▼
Correct .pkl model loaded for the selected disease
│
▼
RandomForestClassifier runs inference
│
▼
Prediction result displayed on result.html
- Training Phase: Each disease has a standalone training script (training/<disease>.py) that preprocesses the dataset, trains a RandomForestClassifier, and saves the model as a .pkl file.
- Inference Phase: When a user submits a form, Flask loads the corresponding .pkl model, applies the same preprocessing pipeline, and returns the prediction.
- Feature Consistency: Feature names and encoding schemes are kept consistent between training and inference to prevent silent prediction errors.
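The three phases above can be sketched in a few lines. This is a minimal sketch under stated assumptions: it trains on synthetic data rather than datasets/diabetes.csv, and it pickles the model together with its feature order in a dictionary. That bundle format and the `predict_one` helper are assumptions for illustration; the project's training scripts may store the bare model instead.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Feature order fixed at training time (names from the diabetes example).
FEATURES = ["pregnancies", "glucose", "blood_pressure", "skin_thickness",
            "insulin", "bmi", "diabetes_pedigree", "age"]

# Synthetic stand-in data so the sketch runs without the real CSV.
rng = np.random.default_rng(0)
X = rng.random((200, len(FEATURES)))
y = (X[:, 1] > 0.5).astype(int)  # toy label driven by the "glucose" column

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Training side: persist the model together with its feature order.
blob = pickle.dumps({"model": model, "features": FEATURES})

# Inference side: restore the bundle (a real app would read a .pkl file).
bundle = pickle.loads(blob)


def predict_one(bundle, inputs):
    """Order the inputs by the stored feature list, then predict."""
    missing = [name for name in bundle["features"] if name not in inputs]
    if missing:
        raise ValueError(f"missing features: {missing}")
    row = [[float(inputs[name]) for name in bundle["features"]]]
    return int(bundle["model"].predict(row)[0])
```

Because the row is built from the stored feature list rather than from the order in which form fields arrive, a reordered or incomplete form fails loudly instead of producing a silently wrong prediction.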
Ensure the following are installed on your system:
- Clone the repository:
git clone https://github.com/your-username/Health-Insight.git
cd Health-Insight
- Create and activate a virtual environment (recommended):
python -m venv venv
source venv/bin/activate # macOS/Linux
venv\Scripts\activate # Windows
- Install all dependencies:
pip install -r requirements.txt
- (Optional) Retrain the models:
python training/diabetes.py
python training/heart.py
python training/kidney.py
python training/liver.py
python training/cancer.py
Pre-trained .pkl files are included in the models/ directory, so retraining is optional.
- Start the Flask development server:
python app.py
- Open your browser and visit:
http://127.0.0.1:5000/
- Select a disease, fill in the health parameters, and click Predict to receive your risk assessment instantly.
Health-Insight/
│
├── app.py # Main Flask app — routes & inference logic
├── requirements.txt # All Python dependencies
│
├── models/ # Serialized trained ML models
│ ├── diabetes.pkl
│ ├── heart.pkl
│ ├── kidney.pkl
│ ├── liver.pkl
│ └── cancer.pkl
│
├── training/ # Standalone training scripts per disease
│ ├── diabetes.py
│ ├── heart.py
│ ├── kidney.py
│ ├── liver.py
│ └── cancer.py
│
├── datasets/ # Raw CSV datasets used for training
│ ├── diabetes.csv
│ ├── heart.csv
│ ├── kidney.csv
│ ├── liver.csv
│ └── cancer.csv
│
├── templates/ # Jinja2 HTML templates
│ ├── index.html # Landing page — disease selector
│ ├── form.html # Dynamic input form
│ └── result.html # Prediction result display
│
├── static/ # Static assets
│ ├── css/ # Stylesheets
│ └── js/ # JavaScript files
│
└── README.md
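Given the layout above, one way app.py could resolve a disease name to its file under models/ is sketched below. The `load_model` helper, the `SUPPORTED` tuple, and the caching strategy are assumptions for illustration and are not taken from the project's source.

```python
import pickle
from functools import lru_cache
from pathlib import Path

MODELS_DIR = Path("models")  # matches the directory layout above
SUPPORTED = ("diabetes", "heart", "kidney", "liver", "cancer")


@lru_cache(maxsize=None)
def load_model(disease, models_dir=MODELS_DIR):
    """Unpickle <models_dir>/<disease>.pkl once and reuse it on later calls."""
    if disease not in SUPPORTED:
        # Whitelisting also keeps arbitrary user input out of the file path.
        raise KeyError(f"unsupported disease: {disease!r}")
    with open(Path(models_dir) / f"{disease}.pkl", "rb") as fh:
        return pickle.load(fh)
```

Caching avoids re-reading the file on every request. Note that `pickle.load` can execute arbitrary code, so this approach is only appropriate for model files you produced yourself.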
| Method | Endpoint | Description |
|---|---|---|
| GET | / | Home page with disease selection |
| GET | /predict/<disease> | Load input form for selected disease |
| POST | /predict/<disease> | Submit form and return prediction result |
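The routes in the table could be wired up roughly as follows. This is a minimal sketch, not the project's app.py: it returns plain strings where the real application renders index.html, form.html, and result.html, and `run_model` is a hypothetical placeholder for the actual inference call.

```python
from flask import Flask, abort, request

app = Flask(__name__)
SUPPORTED = {"diabetes", "heart", "kidney", "liver", "cancer"}


def run_model(disease, values):
    """Hypothetical placeholder for real inference; returns a fixed label."""
    return 0


@app.route("/")
def index():
    return "disease selection page"  # the real app renders index.html


@app.route("/predict/<disease>", methods=["GET", "POST"])
def predict(disease):
    if disease not in SUPPORTED:
        abort(404)  # unknown disease name in the URL
    if request.method == "GET":
        return f"input form for {disease}"  # the real app renders form.html
    # Accept either a JSON body or ordinary HTML form fields.
    values = request.get_json(silent=True) or request.form.to_dict()
    if not values:
        abort(400)  # nothing was submitted
    label = run_model(disease, values)
    return f"prediction for {disease}: {label}"  # real app renders result.html
```

Handling GET and POST in one parameterized view keeps a single route per disease, which is the pattern the table describes.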
Example POST body for diabetes prediction:
{
"pregnancies": 2,
"glucose": 138,
"blood_pressure": 62,
"skin_thickness": 35,
"insulin": 0,
"bmi": 33.6,
"diabetes_pedigree": 0.627,
"age": 47
}
Results from training on the provided datasets. Metrics may vary with different train/test splits.
| Disease | Algorithm | Accuracy |
|---|---|---|
| Diabetes | RandomForestClassifier | ~76–80% |
| Heart Disease | RandomForestClassifier | ~82–86% |
| Kidney Disease | RandomForestClassifier | ~96–99% |
| Liver Disease | RandomForestClassifier | ~72–76% |
| Cancer | RandomForestClassifier | ~94–97% |
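Because a single train/test split can shift accuracy by a few points, k-fold cross-validation gives a steadier estimate than one split. The sketch below uses synthetic data from `make_classification` as a stand-in for the project's CSV files, so its numbers are unrelated to the table above; the hyperparameters are also assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data standing in for one of the CSV datasets.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)

# Five folds: each sample is used for testing exactly once.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f} over {len(scores)} folds")
```

Reporting the mean together with the spread across folds is one way to justify quoting a range, as the table does, rather than a single figure.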
- Feature consistency — Ensuring training feature names/order exactly match inference inputs to prevent silent errors
- Categorical encoding in production — Handling label encoding and one-hot encoding at inference time without refitting
- Real-world ML deployment — Debugging shape mismatches, missing values, and dtype inconsistencies
- Modular backend design — Separating training logic from inference for clean, maintainable code
- Flask routing patterns — Building dynamic, parameterized routes for multi-model applications
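The point about encoding without refitting can be illustrated with a mapping that is learned once at training time and only applied at inference. The helper names and the `sex` column below are hypothetical; the project may instead persist fitted scikit-learn encoders, which follow the same principle.

```python
def fit_label_mapping(values):
    """Learn a stable category -> integer mapping from training data."""
    return {category: code for code, category in enumerate(sorted(set(values)))}


def encode(value, mapping):
    """Apply a stored mapping at inference time; never refit on user input."""
    try:
        return mapping[value]
    except KeyError:
        raise ValueError(
            f"unseen category {value!r}; expected one of {sorted(mapping)}"
        ) from None


def one_hot(value, mapping):
    """One-hot variant that reuses the same stored categories."""
    index = encode(value, mapping)
    return [1 if i == index else 0 for i in range(len(mapping))]


# Training time: learn once, then persist the mapping next to the model.
sex_mapping = fit_label_mapping(["male", "female", "female", "male"])
```

Refitting an encoder on a single submitted value would assign it code 0 regardless of what that category meant during training, which is why the stored mapping is reused and unknown categories are rejected explicitly.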
- REST API endpoints with JSON responses for mobile/external integration
- User authentication and prediction history dashboard
- Model monitoring, drift detection & automated retraining pipeline
- Dockerized deployment with docker-compose
- SHAP-based explainability: show which features drove the prediction
- Confidence scores alongside binary predictions
- CI/CD pipeline with GitHub Actions
Contributions are welcome and appreciated!
- Fork the repository
- Create a feature branch: git checkout -b feature/your-feature
- Commit your changes: git commit -m "Add: your feature description"
- Push to your branch: git push origin feature/your-feature
- Open a Pull Request
Please follow PEP 8 coding standards and include docstrings for any new functions.
Distributed under the MIT License. See LICENSE for more information.
Your Name — ganesh1a0576@gmail.com
GitHub: Ganesh-a0576
Project Link: https://github.com/your-username/Health-Insight
⭐ If this project helped you, please consider giving it a star — it means a lot!
Made with ❤️ and Python