An interactive web application that predicts cancer risk using a machine learning model trained on real medical data, with 93.7% accuracy on test data
The Cancer Risk Simulator is a full-stack machine learning application that predicts cancer risk in real time from 8 key health factors. It pairs an XGBoost model (trained on 1,500 real patient records) served by a FastAPI backend with a React interface featuring live visualizations and instant feedback.
- 93.7% Accuracy - XGBoost model trained on real medical data
- Real-Time Predictions - Instant risk calculation as you adjust parameters
- Interactive Visualizations - Charts built with Recharts
- Modern UI - Glassmorphism design with Tailwind CSS
- Optimal Risk Comparison - See potential improvements with lifestyle changes
- RESTful API - FastAPI backend with automatic documentation
- 8 Comprehensive Factors: Age, Gender, BMI, Smoking Status, Genetic Risk, Physical Activity, Alcohol Intake, Cancer History
- Real-Time Updates: Predictions update automatically as inputs change (debounced at 500 ms)
- Risk Categorization: Low, Moderate, High, Very High with color-coded indicators
- Current vs Optimal Risk: Compare your risk with the best possible scenario
- Interactive Charts: Bar charts showing risk comparison
- Potential Improvement: Calculate possible risk reduction percentage
- Color-Coded Results: Visual feedback with dynamic color schemes
- XGBoost Classifier: Gradient-boosted trees with the histogram-based tree method, optimized for CPU training on an Apple M4 Mac
- ROC-AUC Score: 0.94+ on test data
- Feature Importance: Transparent model with interpretable features
- Automatic Model Loading: Serialized model (model.pkl) loaded on startup
- Glassmorphism Design: Modern, elegant UI with blur effects
- Responsive Layout: Works seamlessly on desktop and mobile
- Smooth Animations: Polished transitions and hover effects
- Error Handling: Clear error messages and loading states
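The current-vs-optimal comparison and the "potential improvement" figure can be sketched as follows. This is a minimal sketch under stated assumptions: "optimal" is taken to mean keeping the non-modifiable factors (age, gender, genetic risk, cancer history) and swapping the lifestyle factors for the hypothetical values in `OPTIMAL_LIFESTYLE`, and improvement is computed as a relative reduction. Neither the values nor the formula are confirmed by main.py or RiskCalculator.jsx, and `toy_predict` is a stand-in scorer, not the trained model.

```python
from typing import Callable, Dict, Tuple

Profile = Dict[str, float]

# HYPOTHETICAL best-case values for the modifiable (lifestyle) factors.
# Illustrative assumptions only; the app's real "optimal" values may differ.
OPTIMAL_LIFESTYLE: Profile = {
    "smoking": 0,               # non-smoker
    "bmi": 22.0,                # inside the 18.5-24.9 "normal" BMI band
    "physical_activity": 10.0,  # hours per week
    "alcohol_intake": 0.0,      # drinks per week
}


def optimal_profile(profile: Profile) -> Profile:
    """Return a copy of `profile` with only the lifestyle factors overridden.

    Age, gender, genetic_risk and cancer_history are carried over unchanged.
    """
    return {**profile, **OPTIMAL_LIFESTYLE}


def potential_improvement(current: float, optimal: float) -> float:
    """Relative risk reduction in percent (assumed definition)."""
    if current <= 0:
        return 0.0
    return max(0.0, (current - optimal) / current * 100)


def compare(profile: Profile, predict: Callable[[Profile], float]) -> Tuple[float, float, float]:
    """Return (current risk, optimal risk, improvement %) for one profile."""
    current = predict(profile)
    best = predict(optimal_profile(profile))
    return current, best, potential_improvement(current, best)


def toy_predict(p: Profile) -> float:
    # Stand-in scorer for demonstration only; NOT the XGBoost model.
    return min(1.0, 0.1 + 0.3 * p["smoking"] + 0.02 * max(0.0, p["bmi"] - 25))


profile = {"age": 55, "gender": 1, "bmi": 30.0, "smoking": 1, "genetic_risk": 1,
           "physical_activity": 2.0, "alcohol_intake": 6.0, "cancer_history": 0}
current, best, gain = compare(profile, toy_predict)
print(f"current={current:.2f} optimal={best:.2f} improvement={gain:.0f}%")
```

With the toy scorer this prints `current=0.50 optimal=0.10 improvement=80%`. An absolute difference in percentage points, `(current - optimal) * 100`, is an equally plausible reading of "risk reduction percentage"; the sketch picks the relative form.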
- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs
1. Adjust Health Metrics → 2. See Real-Time Risk Update → 3. View Optimal Comparison → 4. Analyze Improvement Potential
Backend:

| Technology | Purpose | Version |
|---|---|---|
| Python | Core Language | 3.11+ |
| FastAPI | Web Framework | 0.104.1 |
| XGBoost | ML Model | 2.0.3 |
| | ML Tools | 1.3.0+ |
| | Data Processing | 2.0.0+ |
| | Numerical Computing | 1.24.0+ |

Frontend:

| Technology | Purpose | Version |
|---|---|---|
| React | UI Library | 18.3.1 |
| Vite | Build Tool | 6.0.11 |
| Tailwind CSS | Styling | 3.4.17 |
| Recharts | Data Visualization | 2.15.0 |
| | HTTP Client | 1.7.9 |
Cancer-Risk-Simulator/
├── backend/                         # Python FastAPI backend
│   ├── main.py                      # FastAPI application & endpoints
│   ├── train_model.py               # XGBoost model training script
│   ├── model.pkl                    # Serialized trained model
│   ├── requirements.txt             # Python dependencies
│   ├── The_Cancer_data_1500_V2.csv  # Training dataset (1,500 records)
│   └── __pycache__/                 # Python cache
│
├── frontend/                        # React + Vite frontend
│   ├── src/
│   │   ├── components/
│   │   │   └── RiskCalculator.jsx   # Main component (408 lines)
│   │   ├── assets/                  # Static assets
│   │   ├── App.jsx                  # Root component
│   │   ├── main.jsx                 # Entry point
│   │   ├── App.css                  # Component styles
│   │   └── index.css                # Global styles + Tailwind
│   ├── public/                      # Public assets
│   ├── index.html                   # HTML template
│   ├── package.json                 # Node dependencies
│   ├── vite.config.js               # Vite configuration
│   ├── tailwind.config.js           # Tailwind configuration
│   ├── postcss.config.js            # PostCSS configuration
│   └── eslint.config.js             # ESLint configuration
│
└── README.md                        # This file
git clone https://github.com/yourusername/Cancer-Risk-Simulator.git
cd Cancer-Risk-Simulator

Backend setup:

1. Navigate to the backend directory
cd backend
2. Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate   # macOS/Linux; on Windows use .venv\Scripts\activate
3. Install dependencies
pip install -r requirements.txt
4. Train the model (optional; model.pkl is already included)
python train_model.py
Output:
Model loaded successfully!
Accuracy: 0.9367
ROC-AUC Score: 0.9412
5. Start the backend server
uvicorn main:app --reload --host 0.0.0.0 --port 8000
Backend running at: http://localhost:8000
API Docs: http://localhost:8000/docs
Frontend setup:

1. Navigate to the frontend directory (in a new terminal)
cd frontend
2. Install dependencies
npm install
3. Start the development server
npm run dev
Frontend running at: http://localhost:5173
1. Start Backend (Terminal 1):
cd backend
source .venv/bin/activate
uvicorn main:app --reload --port 8000
2. Start Frontend (Terminal 2):
cd frontend
npm run dev
3. Open Browser: Navigate to http://localhost:5173
4. Adjust Health Metrics: Use the sliders to input your health data
5. View Results: See the real-time risk prediction and analysis
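Besides the sliders, the backend can be called directly. A minimal standard-library sketch, assuming a JSON `POST` endpoint that accepts the eight parameters documented in the parameter table; the route `/predict` and the response format are hypothetical placeholders, so check the interactive docs at http://localhost:8000/docs for the real ones.

```python
import json
import urllib.request
from typing import Any, Dict

API_BASE = "http://localhost:8000"
PREDICT_PATH = "/predict"  # HYPOTHETICAL route name; confirm it at /docs


def build_predict_request(payload: Dict[str, Any], base: str = API_BASE) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one risk prediction."""
    return urllib.request.Request(
        base + PREDICT_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload: Dict[str, Any], timeout: float = 5.0) -> Any:
    """Send the request and decode the JSON response (its shape depends on main.py)."""
    with urllib.request.urlopen(build_predict_request(payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


example = {
    "age": 45, "gender": 0, "bmi": 24.5, "smoking": 0, "genetic_risk": 1,
    "physical_activity": 5.0, "alcohol_intake": 2.0, "cancer_history": 0,
}
# result = predict(example)  # requires the backend running on port 8000
```

Building the request separately from sending it keeps the payload easy to inspect or test without a running server.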
Frontend:
cd frontend
npm run build
npm run preview   # Preview the production build

Backend (no --reload, 4 worker processes):
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4

Model input parameters:

| Parameter | Type | Range | Description |
|---|---|---|---|
| `age` | int | 18-100 | Age in years |
| `gender` | int | 0-1 | 0=Female, 1=Male |
| `bmi` | float | 10-50 | Body Mass Index |
| `smoking` | int | 0-1 | 0=No, 1=Yes |
| `genetic_risk` | int | 0-2 | 0=Low, 1=Medium, 2=High |
| `physical_activity` | float | 0-25 | Hours per week |
| `alcohol_intake` | float | 0-10 | Drinks per week |
| `cancer_history` | int | 0-1 | 0=No, 1=Yes |
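The ranges in the table translate directly into input validation. Below is a dependency-free sketch; the `RiskInput` class and `RANGES` dict are hypothetical names, not taken from main.py, and treating both ends of each range as inclusive is an assumption. With FastAPI the same bounds would usually be declared on a Pydantic model with `Field(ge=..., le=...)`.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Tuple

# Bounds copied from the parameter table (inclusive ends are an assumption).
RANGES: Dict[str, Tuple[float, float]] = {
    "age": (18, 100),
    "gender": (0, 1),
    "bmi": (10, 50),
    "smoking": (0, 1),
    "genetic_risk": (0, 2),
    "physical_activity": (0, 25),
    "alcohol_intake": (0, 10),
    "cancer_history": (0, 1),
}


@dataclass(frozen=True)
class RiskInput:
    age: int
    gender: int               # 0 = female, 1 = male
    bmi: float
    smoking: int              # 0 = no, 1 = yes
    genetic_risk: int         # 0 = low, 1 = medium, 2 = high
    physical_activity: float  # hours per week
    alcohol_intake: float     # drinks per week
    cancer_history: int       # 0 = no, 1 = yes

    def __post_init__(self) -> None:
        # Range checks only; Python does not enforce the int/float annotations.
        for f in fields(self):
            value = getattr(self, f.name)
            low, high = RANGES[f.name]
            if not low <= value <= high:
                raise ValueError(f"{f.name}={value!r} is outside [{low}, {high}]")

    def to_features(self) -> List[float]:
        # Field order mirrors the table; that model.pkl expects this column order is assumed.
        return [float(getattr(self, f.name)) for f in fields(self)]


sample = RiskInput(age=45, gender=0, bmi=24.5, smoking=0, genetic_risk=1,
                   physical_activity=5.0, alcohol_intake=2.0, cancer_history=0)
print(sample.to_features())
```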
- Dataset: Real cancer medical records (The_Cancer_data_1500_V2.csv)
- Size: 1,500 patient records
- Features: 8 health metrics
- Target: Binary classification (Diagnosis: 0=No Cancer, 1=Cancer)
- Algorithm: XGBoost Classifier
- Tree Method: Histogram-based (optimized for CPU)
- Hyperparameters:
  - `n_estimators`: 150
  - `max_depth`: 6
  - `learning_rate`: 0.1
  - `eval_metric`: logloss
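Assuming the CSV stores one row per patient with the eight features plus a `Diagnosis` column, the features and target can be separated as below. The column headers in `COLUMN_MAP` are an assumption about The_Cancer_data_1500_V2.csv, so compare them with the file's actual header; the resulting `X` and `y` are the kind of input an XGBoost classifier with the hyperparameters above would be fit on. The demo rows are made up.

```python
import csv
import io
from typing import Iterable, List, Tuple

# ASSUMED CSV headers, mapped to the API parameter names used by the backend.
COLUMN_MAP = {
    "Age": "age",
    "Gender": "gender",
    "BMI": "bmi",
    "Smoking": "smoking",
    "GeneticRisk": "genetic_risk",
    "PhysicalActivity": "physical_activity",
    "AlcoholIntake": "alcohol_intake",
    "CancerHistory": "cancer_history",
}
FEATURE_ORDER = list(COLUMN_MAP.values())  # column order of X, as API names
TARGET = "Diagnosis"  # 0 = no cancer, 1 = cancer


def load_dataset(lines: Iterable[str]) -> Tuple[List[List[float]], List[int]]:
    """Split CSV rows into a feature matrix X (8 columns) and a binary target y."""
    X: List[List[float]] = []
    y: List[int] = []
    for row in csv.DictReader(lines):
        X.append([float(row[column]) for column in COLUMN_MAP])
        y.append(int(row[TARGET]))
    return X, y


# Real use: with open("The_Cancer_data_1500_V2.csv", newline="") as fh: X, y = load_dataset(fh)
demo_csv = io.StringIO(
    "Age,Gender,BMI,Smoking,GeneticRisk,PhysicalActivity,AlcoholIntake,CancerHistory,Diagnosis\n"
    "40,0,22.5,0,0,5.0,1.0,0,0\n"  # made-up row
    "67,1,31.2,1,2,0.5,4.0,1,1\n"  # made-up row
)
X, y = load_dataset(demo_csv)
```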
| Metric | Score |
|---|---|
| Accuracy | 93.67% |
| ROC-AUC | 0.9412 |
| Precision | ~0.92 |
| Recall | ~0.94 |
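For reference, the four metrics in the table follow from test-set labels and predicted probabilities as in this standard-library sketch. It assumes a 0.5 decision threshold for accuracy, precision and recall (the threshold used in train_model.py is not stated); ROC-AUC is threshold-free and is computed here as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. scikit-learn's `accuracy_score`, `precision_score`, `recall_score` and `roc_auc_score` compute the same quantities.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) ROC-AUC; ties count as half a win."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision and recall at `threshold`, plus ROC-AUC."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))  # true positives
    tn = pairs.count((0, 0))  # true negatives
    fp = pairs.count((0, 1))  # false positives
    fn = pairs.count((1, 0))  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "roc_auc": roc_auc(y_true, y_prob),
    }


metrics = classification_metrics([0, 0, 1, 1, 1], [0.1, 0.6, 0.4, 0.8, 0.9])
print(metrics)
```

For these toy inputs the result is accuracy 0.6, precision and recall 2/3, and ROC-AUC 5/6.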
Top factors influencing predictions:
- Genetic Risk - Highest importance
- Age - Strong predictor
- Smoking Status - Significant impact
- Cancer History - Critical factor
- BMI - Moderate importance
- Physical Activity - Protective factor
- Alcohol Intake - Risk factor
- Gender - Baseline risk difference
- Low: < 25% probability (Green)
- Moderate: 25-50% (Yellow)
- High: 50-75% (Orange)
- Very High: > 75% (Red)
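The four bands map a predicted probability to a label and color. A minimal sketch; because the listed ranges share their endpoints, the boundary handling here (25% and 50% start a new band, exactly 75% stays High) is an assumption, and `risk_category` is a hypothetical name rather than the project's own function.

```python
from typing import Tuple


def risk_category(probability: float) -> Tuple[str, str]:
    """Map a cancer probability in [0, 1] to (label, color)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {probability}")
    if probability < 0.25:
        return "Low", "green"
    if probability < 0.50:
        return "Moderate", "yellow"
    if probability <= 0.75:
        return "High", "orange"
    return "Very High", "red"  # strictly above 75%


for p in (0.10, 0.25, 0.50, 0.75, 0.90):
    print(f"{p:.0%}: {risk_category(p)}")
```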
IMPORTANT MEDICAL DISCLAIMER
This application is for educational and research purposes only. Although the model achieves 93.7% accuracy on test data, it is:
- NOT a substitute for professional medical advice
- NOT a diagnostic tool for clinical use
- NOT validated for actual medical decision-making
Always consult qualified healthcare professionals for:
- Medical diagnoses
- Treatment plans
- Health screenings
- Cancer risk assessments
The predictions are based on statistical models and should be interpreted as educational demonstrations of machine learning capabilities.