Hackathon Project · Pharmaceutical Tablet Manufacturing · Track A: Predictive Modelling
| Tool | Minimum Version | Check |
|---|---|---|
| Python | 3.10+ | python --version |
| pip | 23+ | pip --version |
| Git | Any | git --version |
```shell
# If you cloned via git:
cd "d:\CODE FILES\Projects\manufacturing-intelligence"
# Otherwise just open a terminal in the project root.

# Create venv
python -m venv venv

# Activate (Windows PowerShell)
venv\Scripts\Activate.ps1

# Activate (Windows CMD)
venv\Scripts\activate.bat

# Activate (Git Bash / WSL / macOS / Linux)
source venv/bin/activate
```

You should see a `(venv)` prefix in your terminal after activation.
```shell
pip install --upgrade pip
pip install -r requirements.txt
```

Note: TensorFlow and XGBoost are large packages — this may take 3–5 minutes on first install.
Put the two Excel files into `data/raw/`:

```
data/
└── raw/
    ├── _h_batch_process_data.xlsx      ← T001 minute-by-minute sensor log
    └── _h_batch_production_data.xlsx   ← T001–T060 batch production records
```
The `data/raw/` folder is created automatically when `config.py` is imported. Just drop the files in.
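For reference, a minimal sketch of the kind of import-time folder creation `config.py` presumably performs (the exact directory constants and variable names here are assumptions, not the project's actual code):

```python
from pathlib import Path

# Project-relative data directories (names assumed from the layout above).
PROJECT_ROOT = Path(__file__).resolve().parent
RAW_DIR = PROJECT_ROOT / "data" / "raw"
PROCESSED_DIR = PROJECT_ROOT / "data" / "processed"
SIMULATED_DIR = PROJECT_ROOT / "data" / "simulated"

# Runs at import time; exist_ok makes repeated imports harmless.
for _dir in (RAW_DIR, PROCESSED_DIR, SIMULATED_DIR):
    _dir.mkdir(parents=True, exist_ok=True)
```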
This single command runs all 7 steps end-to-end:
```shell
# Quick run (no hyperparameter tuning — ~2 min):
python src/run_pipeline.py

# With Optuna tuning for best XGBoost params (~7 min):
python src/run_pipeline.py --tune
```

| Step | Description | Output |
|---|---|---|
| 1 | Load & validate raw data | data/processed/batch_outcomes.csv |
| 2 | Simulate sensors for T002–T060 | data/simulated/simulated_sensors.csv |
| 3 | Extract phase features & merge | data/processed/merged_dataset.csv |
| 4 | Train XGBoost + RF + MLP + Stacking | models/*.pkl / models/*.keras |
| 5 | Train Isolation Forest + LSTM AE | models/isolation_forest.pkl + models/lstm_autoencoder.keras |
| 6 | Compute SHAP values | models/shap_values.pkl + reports/shap_plots/ |
| 7 | Build carbon footprint history | data/processed/carbon_history.csv |
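A minimal sketch of how a `--tune` switch like the one above is typically wired up with `argparse` (only the flag name comes from the command shown; the parser structure is illustrative):

```python
import argparse

def parse_args(argv=None):
    """Parse pipeline CLI options; --tune enables hyperparameter search."""
    parser = argparse.ArgumentParser(description="Run the training pipeline")
    parser.add_argument(
        "--tune",
        action="store_true",
        help="run an Optuna study for XGBoost hyperparameters (slower)",
    )
    return parser.parse_args(argv)

args = parse_args(["--tune"])
print(args.tune)  # → True
```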
Expected console output at the end:
```
════════════════════════════════════════════════════════════════
🏁 PIPELINE COMPLETE in ~X s
📊 MODEL PERFORMANCE SUMMARY:
   Model             | Overall R²
   ─────────────────────────────────
   ✅ XGBoost           | 0.9200
   ✅ Random Forest     | 0.8900
   ✅ Stacking Ensemble | 0.9350
════════════════════════════════════════════════════════════════
```
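Step 4's stacking ensemble combines base learners through a meta-model trained on their out-of-fold predictions. A toy single-target scikit-learn sketch of the idea (the real pipeline uses XGBoost and a Keras MLP as base models across 7 targets; here plain sklearn estimators stand in):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge

# Synthetic data: one linear and one quadratic effect plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

# Base learners' cross-validated predictions feed a Ridge meta-model.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("ridge", Ridge()),
    ],
    final_estimator=Ridge(),
)
stack.fit(X, y)
print(round(stack.score(X, y), 3))  # training R², typically high
```

The meta-model learns how much to trust each base learner, which is why the stacked R² in the summary above can exceed either base model alone.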
```shell
uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload
```

- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- Health check: http://localhost:8000/api/health
| Method | URL | Description |
|---|---|---|
| GET | `/api/health` | Model load status |
| POST | `/api/predict` | Predict quality + energy |
| POST | `/api/anomaly` | Detect energy anomalies |
| GET | `/api/explain/{batch_id}` | SHAP feature explanations |
| GET | `/api/carbon/{batch_id}` | CO₂e + adaptive target |
| GET | `/api/batches` | List all batch IDs |
| GET | `/api/carbon_history` | Full carbon history |
Open a new terminal (keep API running), then:
```shell
cd dashboard
npm install   # first run only
npm run dev
```

- Dashboard: http://localhost:3000
| Tab | Feature |
|---|---|
| 🔮 Predictions | Predict all 7 quality targets from sliders |
| ⚡ Energy Monitor | Phase-colored sensor charts + anomaly detection |
| 📊 Batch Comparison | Radar chart fingerprinting of two batches |
| 🌍 Carbon Tracker | CO₂e trend + adaptive targets + grid scenarios |
| 🎛️ What-If Optimizer | Real-time parameter explorer (<100ms) |
| 📈 Benchmark | Full model performance report (R², MAE, RMSE, MAPE) |
```shell
pytest tests/ -v
```

Note: the `test_api.py` tests can be run without the API server — they use FastAPI's `TestClient`. However, they require trained models to be present, so run the pipeline first.
```shell
jupyter notebook notebooks/
```

| Notebook | Description |
|---|---|
| `01_EDA.ipynb` | Sensor profiling, distributions, correlations |
| `02_feature_engineering.ipynb` | Simulation, phase features, LSTM sequences |
| `03_multitarget_models.ipynb` | Train all models, R² comparison charts |
| `04_anomaly_detection.ipynb` | IF + LSTM AE training, confusion matrices |
| `05_explainability.ipynb` | SHAP beeswarm, waterfall, summary |
```shell
jupyter notebook analysis/
```

| Notebook | Description |
|---|---|
| `01_data_profiling.ipynb` | Full stats, missing values, outliers, heatmap |
| `02_correlation_deep_dive.ipynb` | Pearson/Spearman/VIF multicollinearity |
| `03_phase_energy_analysis.ipynb` | Phase energy breakdown, CUSUM drift |
| `04_model_comparison.ipynb` | CV scores, residuals, timing benchmarks |
| `05_business_impact.ipynb` | ROI, carbon savings, grid scenarios |
```
models/
├── xgb_multitarget.pkl        ← XGBoost multi-output model
├── rf_multitarget.pkl         ← Random Forest model
├── mlp_model.keras            ← Keras MLP model
├── stacking_meta.pkl          ← Stacking ensemble bundle
├── isolation_forest.pkl       ← Isolation Forest (anomaly)
├── lstm_autoencoder.keras     ← LSTM Autoencoder (anomaly)
├── scaler.pkl                 ← MinMaxScaler (feature scaling)
├── shap_values.pkl            ← Pre-computed SHAP values
├── lstm_threshold.json        ← LSTM anomaly threshold
├── lstm_norm_params.json      ← LSTM normalization params
├── pipeline_summary.json      ← Full run summary
└── evaluation_results.json    ← Per-target R², MAE, RMSE, MAPE
```
```
reports/
├── shap_plots/                ← Beeswarm + waterfall PNGs
├── sensor_profile_T001.png
├── phase_energy_breakdown.png
├── correlation_heatmap.png
├── model_r2_comparison.png
├── actual_vs_predicted.png
├── business_impact.png
└── ...
```
| Problem | Fix |
|---|---|
| `ModuleNotFoundError` | Make sure venv is activated: `venv\Scripts\Activate.ps1` |
| `FileNotFoundError` on data | Place Excel files in `data/raw/` |
| API 503 error | Run the pipeline first: `python src/run_pipeline.py` |
| TensorFlow GPU warning | Ignore — CPU training works fine for this dataset size |
| `ExecutionPolicy` error on Activate | Run: `Set-ExecutionPolicy RemoteSigned -Scope CurrentUser` |
To leave the virtual environment when you're done:

```shell
deactivate
```