
🏭 AI-Driven Manufacturing Intelligence — Setup & Run Guide

Hackathon Project · Pharmaceutical Tablet Manufacturing · Track A: Predictive Modelling


⚡ Prerequisites

| Tool   | Minimum Version | Check              |
|--------|-----------------|--------------------|
| Python | 3.10+           | `python --version` |
| pip    | 23+             | `pip --version`    |
| Git    | any             | `git --version`    |

📁 Step 1 — Clone / Open the Project

```shell
# If you cloned via git:
cd "d:\CODE FILES\Projects\manufacturing-intelligence"

# Otherwise just open a terminal in the project root.
```

🐍 Step 2 — Create & Activate Virtual Environment

```shell
# Create venv
python -m venv venv

# Activate (Windows PowerShell)
venv\Scripts\Activate.ps1

# Activate (Windows CMD)
venv\Scripts\activate.bat

# Activate (Git Bash / WSL / macOS / Linux)
source venv/bin/activate
```

After activation you should see a `(venv)` prefix in your terminal prompt.


📦 Step 3 — Install Dependencies

```shell
pip install --upgrade pip
pip install -r requirements.txt
```

Note: TensorFlow and XGBoost are large packages — this may take 3–5 minutes on first install.


📂 Step 4 — Place Raw Data Files

Put the two Excel files into data/raw/:

```
data/
└── raw/
    ├── _h_batch_process_data.xlsx      ← T001 minute-by-minute sensor log
    └── _h_batch_production_data.xlsx   ← T001–T060 batch production records
```

The data/raw/ folder is already created by config.py at import time. Just drop the files in.
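Before running the pipeline, you can sanity-check that both files are in place. A minimal sketch (the helper name `check_raw_files` is illustrative and not part of the project):

```python
from pathlib import Path

# Expected raw inputs (names from the tree above)
RAW_FILES = [
    "_h_batch_process_data.xlsx",
    "_h_batch_production_data.xlsx",
]

def check_raw_files(raw_dir="data/raw"):
    """Return the list of expected raw files missing from raw_dir."""
    raw = Path(raw_dir)
    return [name for name in RAW_FILES if not (raw / name).exists()]

if __name__ == "__main__":
    missing = check_raw_files()
    if missing:
        print("Missing raw files:", ", ".join(missing))
    else:
        print("All raw files present.")
```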


🚀 Step 5 — Run the Training Pipeline

This single command runs all 7 steps end-to-end:

```shell
# Quick run (no hyperparameter tuning — ~2 min):
python src/run_pipeline.py

# With Optuna tuning for best XGBoost params (~7 min):
python src/run_pipeline.py --tune
```

What the pipeline does:

| Step | Description | Output |
|------|-------------|--------|
| 1 | Load & validate raw data | `data/processed/batch_outcomes.csv` |
| 2 | Simulate sensors for T002–T060 | `data/simulated/simulated_sensors.csv` |
| 3 | Extract phase features & merge | `data/processed/merged_dataset.csv` |
| 4 | Train XGBoost + RF + MLP + Stacking | `models/*.pkl` / `models/*.keras` |
| 5 | Train Isolation Forest + LSTM AE | `models/isolation_forest.pkl` + `models/lstm_autoencoder.keras` |
| 6 | Compute SHAP values | `models/shap_values.pkl` + `reports/shap_plots/` |
| 7 | Build carbon footprint history | `data/processed/carbon_history.csv` |

Expected console output at the end:

```
════════════════════════════════════════════════════════════════
  🏁 PIPELINE COMPLETE in ~X s

📊 MODEL PERFORMANCE SUMMARY:
  Model                |  Overall R²
  ─────────────────────────────────
  ✅ XGBoost           |     0.9200
  ✅ Random Forest     |     0.8900
  ✅ Stacking Ensemble |     0.9350
════════════════════════════════════════════════════════════════
```
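The anomaly stage of step 5 can be illustrated with a self-contained Isolation Forest sketch on synthetic energy readings. The values, feature shape, and contamination rate below are made up for illustration; the real pipeline trains on the merged dataset:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic per-minute energy readings: mostly normal, plus a few spikes
normal = rng.normal(loc=50.0, scale=2.0, size=(200, 1))
spikes = rng.normal(loc=90.0, scale=5.0, size=(5, 1))
X = np.vstack([normal, spikes])

# contamination ≈ expected anomaly fraction (an assumption for this demo)
model = IsolationForest(contamination=0.03, random_state=42)
labels = model.fit_predict(X)  # +1 = normal, -1 = anomaly

print("Flagged anomalies:", int((labels == -1).sum()))
```

The injected spikes score far from the bulk of the data, so they end up in the flagged set.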

🌐 Step 6 — Start the REST API

```shell
uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload
```

Available Endpoints:

| Method | URL | Description |
|--------|-----|-------------|
| GET | `/api/health` | Model load status |
| POST | `/api/predict` | Predict quality + energy |
| POST | `/api/anomaly` | Detect energy anomalies |
| GET | `/api/explain/{batch_id}` | SHAP feature explanations |
| GET | `/api/carbon/{batch_id}` | CO₂e + adaptive target |
| GET | `/api/batches` | List all batch IDs |
| GET | `/api/carbon_history` | Full carbon history |
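From Python, the endpoints can be exercised with the standard library; a hedged sketch (the payload field names are placeholders, so match them to the feature set your model was trained on):

```python
import json
from urllib.request import Request, urlopen

BASE = "http://localhost:8000"

def build_predict_request(params):
    """Build a POST request for /api/predict with a JSON body."""
    body = json.dumps(params).encode("utf-8")
    return Request(
        f"{BASE}/api/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    # Field names below are illustrative — use your model's real features.
    req = build_predict_request({"spray_rate": 28.0, "inlet_temp": 60.0})
    with urlopen(req, timeout=5) as resp:
        print(json.load(resp))
```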

📊 Step 7 — Launch the Dashboard

Open a new terminal (keep API running), then:

```shell
cd dashboard
npm install        # first run only
npm run dev
```

Dashboard Tabs:

| Tab | Feature |
|-----|---------|
| 🔮 Predictions | Predict all 7 quality targets from sliders |
| ⚡ Energy Monitor | Phase-colored sensor charts + anomaly detection |
| 📊 Batch Comparison | Radar-chart fingerprinting of two batches |
| 🌍 Carbon Tracker | CO₂e trend + adaptive targets + grid scenarios |
| 🎛️ What-If Optimizer | Real-time parameter explorer (<100 ms) |
| 📈 Benchmark | Full model performance report (R², MAE, RMSE, MAPE) |

🧪 Step 8 — Run Tests

```shell
pytest tests/ -v
```

Note: The `test_api.py` suite runs without a live API server (it uses FastAPI's `TestClient`), but it does require trained models to be present, so run the pipeline first.


📓 Step 9 — Explore Notebooks

Project Notebooks (in order):

```shell
jupyter notebook notebooks/
```

| Notebook | Description |
|----------|-------------|
| `01_EDA.ipynb` | Sensor profiling, distributions, correlations |
| `02_feature_engineering.ipynb` | Simulation, phase features, LSTM sequences |
| `03_multitarget_models.ipynb` | Train all models, R² comparison charts |
| `04_anomaly_detection.ipynb` | IF + LSTM AE training, confusion matrices |
| `05_explainability.ipynb` | SHAP beeswarm, waterfall, summary |

Analysis Notebooks (deep-dive):

```shell
jupyter notebook analysis/
```

| Notebook | Description |
|----------|-------------|
| `01_data_profiling.ipynb` | Full stats, missing values, outliers, heatmap |
| `02_correlation_deep_dive.ipynb` | Pearson/Spearman/VIF multicollinearity |
| `03_phase_energy_analysis.ipynb` | Phase energy breakdown, CUSUM drift |
| `04_model_comparison.ipynb` | CV scores, residuals, timing benchmarks |
| `05_business_impact.ipynb` | ROI, carbon savings, grid scenarios |

📁 Output Files (after pipeline run)

```
models/
├── xgb_multitarget.pkl           ← XGBoost multi-output model
├── rf_multitarget.pkl            ← Random Forest model
├── mlp_model.keras               ← Keras MLP model
├── stacking_meta.pkl             ← Stacking ensemble bundle
├── isolation_forest.pkl          ← Isolation Forest (anomaly)
├── lstm_autoencoder.keras        ← LSTM Autoencoder (anomaly)
├── scaler.pkl                    ← MinMaxScaler (feature scaling)
├── shap_values.pkl               ← Pre-computed SHAP values
├── lstm_threshold.json           ← LSTM anomaly threshold
├── lstm_norm_params.json         ← LSTM normalization params
├── pipeline_summary.json         ← Full run summary
└── evaluation_results.json       ← Per-target R², MAE, RMSE, MAPE
```

```
reports/
├── shap_plots/                   ← Beeswarm + waterfall PNGs
├── sensor_profile_T001.png
├── phase_energy_breakdown.png
├── correlation_heatmap.png
├── model_r2_comparison.png
├── actual_vs_predicted.png
├── business_impact.png
└── ...
```
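The JSON artifacts are easy to inspect from Python. A hedged sketch for printing a metrics summary from `models/evaluation_results.json` (the schema assumed here, per-target dicts with `r2` and `mae` keys, is an assumption; adjust to the file's actual layout):

```python
import json
from pathlib import Path

def summarize_metrics(results):
    """Format per-target metrics into aligned report lines."""
    lines = [f"{'target':<20} {'R2':>8} {'MAE':>8}"]
    for target, m in results.items():
        lines.append(f"{target:<20} {m['r2']:>8.3f} {m['mae']:>8.3f}")
    return "\n".join(lines)

if __name__ == "__main__":
    path = Path("models/evaluation_results.json")
    if path.exists():
        print(summarize_metrics(json.loads(path.read_text())))
    else:
        print("models/evaluation_results.json not found — run the pipeline first.")
```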

⚠️ Troubleshooting

| Problem | Fix |
|---------|-----|
| `ModuleNotFoundError` | Make sure the venv is activated: `venv\Scripts\Activate.ps1` |
| `FileNotFoundError` on data | Place the Excel files in `data/raw/` |
| API 503 error | Run the pipeline first: `python src/run_pipeline.py` |
| TensorFlow GPU warning | Safe to ignore; CPU training works fine for this dataset size |
| ExecutionPolicy error on activate | Run: `Set-ExecutionPolicy RemoteSigned -Scope CurrentUser` |

🛑 Deactivate venv

```shell
deactivate
```