Time series forecasting on the Rossmann Store Sales dataset
LightGBM MAPE: 9.47% vs Prophet baseline 15.48% (39% relative improvement)
Retail chains need accurate sales forecasts to optimize inventory, staffing, and promotions. This project builds a forecasting system for 1,115 Rossmann stores that predicts daily sales 6 weeks ahead and compares a statistical baseline (Prophet) with a machine-learning model (LightGBM).
| Resource | Link |
|---|---|
| Streamlit dashboard | sales-forecasting-efro.streamlit.app |
| GitHub repo | github.com/Efrrowini/sales-forecasting |
| Model | MAPE (%) | RMSE | MAE |
|---|---|---|---|
| Prophet (baseline) | 15.48 | 753 | 667 |
| LightGBM | 9.47 | 902 | 626 |
LightGBM achieves a 39% lower MAPE than Prophet (9.47% vs 15.48%) by learning store-level patterns through lag and rolling features. Its MAE is also lower (626 vs 667), although its RMSE is higher (902 vs 753).
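The three metrics in the table, and the 39% figure, can be reproduced with a few lines of NumPy. This is a minimal sketch rather than the project's evaluation code: the function names are hypothetical, and skipping zero-sales days in MAPE (to avoid division by zero on closed-store days) is an assumption.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = y_true != 0  # assumption: zero-sales days are excluded
    return float(np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])) * 100)


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))


def relative_improvement(baseline, model):
    """Percentage reduction of an error metric relative to the baseline."""
    return (baseline - model) / baseline * 100


# MAPE values taken from the results table above.
print(round(relative_improvement(15.48, 9.47), 1))  # 38.8
```

RMSE squares each error before averaging, so a handful of large misses on high-volume days can raise it even when the typical percentage error falls. That is one way a model can rank better on MAPE and MAE but worse on RMSE.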
- Promotions drive a 38.8% sales uplift (EUR 5,930 → EUR 8,229)
- Clear Christmas spikes every December, with sales peaking at 2x baseline
- Monday is consistently the weakest day, Saturday the strongest
- Store type B has the highest median sales despite fewer locations
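The promotion uplift above is a ratio of group means. A rough sketch of the calculation with pandas follows. `Promo` and `Sales` are column names from the Kaggle training file, `promo_uplift_pct` is a hypothetical helper, and the toy rows are made up only so that the group means match the two reported averages. Any filtering the project applied (for example dropping closed days) is not shown.

```python
import pandas as pd


def promo_uplift_pct(df: pd.DataFrame) -> float:
    """Percentage increase in mean sales on promo days vs non-promo days."""
    means = df.groupby("Promo")["Sales"].mean()
    return float((means.loc[1] - means.loc[0]) / means.loc[0] * 100)


# Toy rows whose group means equal the reported averages (not real data).
toy = pd.DataFrame(
    {
        "Promo": [0, 0, 1, 1],
        "Sales": [5930.0, 5930.0, 8229.0, 8229.0],
    }
)
print(round(promo_uplift_pct(toy), 1))  # 38.8
```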
25+ features engineered from raw date and sales data:
- Lag features — sales 7, 14, 21, 28 days ago per store
- Rolling statistics — 7/14/30 day rolling mean and std
- Calendar features — year, month, week, quarter, is_weekend, is_month_start, is_month_end
- Store metadata — store type, assortment, competition distance
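The lag, rolling, and calendar features listed above could be built along these lines. This is a sketch under assumptions, not the contents of `src/features.py`: the `Store`, `Date`, and `Sales` columns follow the Kaggle file, the output column names are hypothetical, and shifting by one row before taking rolling statistics is a choice made here to keep the current day's sales out of its own features.

```python
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add per-store lag, rolling, and calendar features (illustrative only).

    Expects a datetime64 `Date` column and one row per store per day.
    """
    df = df.sort_values(["Store", "Date"]).reset_index(drop=True)
    by_store = df.groupby("Store")["Sales"]

    # Lag features: sales 7, 14, 21, 28 rows (days) earlier within each store.
    for lag in (7, 14, 21, 28):
        df[f"sales_lag_{lag}"] = by_store.shift(lag)

    # Rolling mean/std over previous days only (shifted by one row).
    prev_by_store = by_store.shift(1).groupby(df["Store"])
    for window in (7, 14, 30):
        df[f"sales_roll_mean_{window}"] = prev_by_store.transform(
            lambda s: s.rolling(window).mean()
        )
        df[f"sales_roll_std_{window}"] = prev_by_store.transform(
            lambda s: s.rolling(window).std()
        )

    # Calendar features derived from the date.
    date = df["Date"].dt
    df["year"] = date.year
    df["month"] = date.month
    df["week"] = date.isocalendar().week.astype(int)
    df["quarter"] = date.quarter
    df["is_weekend"] = date.dayofweek >= 5
    df["is_month_start"] = date.is_month_start
    df["is_month_end"] = date.is_month_end
    return df
```

Grouping by store before shifting keeps one store's history from leaking into another's features. With a 6-week horizon, lags shorter than the horizon are not observed at prediction time unless forecasts are fed back recursively. How the project handles this is not stated here.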
| Layer | Tools |
|---|---|
| Data | pandas, numpy |
| Modelling | Prophet, LightGBM |
| Experiment tracking | MLflow |
| Frontend | Streamlit, Plotly |
| Deployment | Streamlit Cloud |
git clone https://github.com/Efrrowini/sales-forecasting
cd sales-forecasting
python -m venv venv
# Windows; on macOS/Linux use: source venv/bin/activate
venv\Scripts\activate
pip install -r requirements.txt
# Build features
python -m src.features
# Train LightGBM
python -m src.train_lgbm
# Run dashboard
streamlit run app/streamlit_app.py
Rossmann Store Sales: 1,115 German drugstores, 2.5 years of daily sales data. Available on Kaggle.
Built by Efro | Presidency University Bangalore | Data Science Portfolio
