A framework for time series forecasting of business process models. This project combines process model forecasting techniques with advanced machine learning and deep learning models to predict process behavior.
This project provides tools for:
- Preprocessing event logs
- Generating daily process models (Directly-Follows Graphs, DFGs)
- Converting process models to time series data
- Training and evaluating multiple forecasting models
Project structure:
.
├── event_logs/ # Raw and processed event logs
├── data/ # Daily DFGs and activity mapping
├── dataset/ # Processed time series datasets
├── lib/ # Library files
├── preprocess.py # Event log preprocessing
├── gen_daily_dfg.py # Generates daily DFG matrices
├── dfg_to_time_series.py # Converts DFGs to DF time series
└── multiDFpred.py # Main prediction script
Features:
Event Log Preprocessing
- Filters infrequent process variants
- Trims event logs to specified time ranges
- Adds artificial start/end activities
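The three preprocessing steps above can be sketched on an in-memory log as follows. This is a minimal illustration, not the code in preprocess.py (which works on .xes files): the tuple-based event format, the frequency threshold `min_share`, the rule that trimming keeps only cases lying fully inside the time range, and the `__START__`/`__END__` labels are all assumptions chosen for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical labels for the artificial start/end activities (not taken from preprocess.py).
START, END = "__START__", "__END__"

def group_traces(events):
    """Group (case_id, activity, timestamp) events by case, sorted by timestamp."""
    traces = {}
    for case_id, activity, ts in events:
        traces.setdefault(case_id, []).append((activity, ts))
    return {cid: sorted(evts, key=lambda e: e[1]) for cid, evts in traces.items()}

def filter_variants(traces, min_share):
    """Drop cases whose variant (activity sequence) occurs in less than min_share of all cases."""
    variant = {cid: tuple(a for a, _ in evts) for cid, evts in traces.items()}
    counts = Counter(variant.values())
    return {cid: evts for cid, evts in traces.items()
            if counts[variant[cid]] / len(traces) >= min_share}

def trim(traces, start, end):
    """Assumed trimming rule: keep only cases that both start and end inside [start, end]."""
    return {cid: evts for cid, evts in traces.items()
            if start <= evts[0][1] and evts[-1][1] <= end}

def add_start_end(traces):
    """Wrap every case in an artificial start and end event (reusing its first/last timestamp)."""
    return {cid: [(START, evts[0][1])] + evts + [(END, evts[-1][1])]
            for cid, evts in traces.items()}

# Toy log: variant A->B occurs in 2 of 3 cases, A->C in 1 of 3.
events = [
    ("c1", "A", datetime(2024, 1, 1, 9)), ("c1", "B", datetime(2024, 1, 1, 10)),
    ("c2", "A", datetime(2024, 1, 2, 9)), ("c2", "B", datetime(2024, 1, 2, 11)),
    ("c3", "A", datetime(2024, 1, 3, 9)), ("c3", "C", datetime(2024, 1, 3, 10)),
]
log = filter_variants(group_traces(events), min_share=0.5)  # keeps c1 and c2
log = trim(log, datetime(2024, 1, 1), datetime(2024, 1, 1, 23, 59))  # keeps c1
print(add_start_end(log))
```

Filtering runs before trimming here only to keep the example short; the order used by preprocess.py is not stated in this README.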
DF Time Series Generation
- Extracts daily DFGs using an activity mapping
- Converts the daily DFGs into directly-follows (DF) time series
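A minimal sketch of how daily DFGs and DF time series relate, assuming traces are stored as sorted (activity, timestamp) lists, that a directly-follows pair is counted on the day its second event occurs, and that days without events are filled with zeros. The helper names are hypothetical; gen_daily_dfg.py and dfg_to_time_series.py may make different choices.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

def activity_mapping(traces):
    """Map every activity to an integer index (alphabetical order)."""
    return {a: i for i, a in enumerate(sorted({a for evts in traces.values() for a, _ in evts}))}

def daily_dfgs(traces, mapping):
    """Count directly-follows pairs per day, dating each pair by its second event."""
    n = len(mapping)
    dfgs = defaultdict(lambda: [[0] * n for _ in range(n)])
    for evts in traces.values():
        for (a, _), (b, ts_b) in zip(evts, evts[1:]):
            dfgs[ts_b.date()][mapping[a]][mapping[b]] += 1
    return dict(dfgs)

def dfgs_to_time_series(dfgs, mapping, first_day, last_day):
    """Return one daily count series per (a, b) pair; days without events count as 0."""
    n_days = (last_day - first_day).days + 1
    days = [first_day + timedelta(days=k) for k in range(n_days)]
    empty = [[0] * len(mapping)] * len(mapping)  # read-only all-zero matrix
    return {(a, b): [dfgs.get(d, empty)[i][j] for d in days]
            for a, i in mapping.items() for b, j in mapping.items()}

traces = {
    "c1": [("A", datetime(2024, 1, 1, 9)), ("B", datetime(2024, 1, 1, 10))],
    "c2": [("A", datetime(2024, 1, 1, 23)), ("B", datetime(2024, 1, 3, 8))],
}
mapping = activity_mapping(traces)  # {'A': 0, 'B': 1}
series = dfgs_to_time_series(daily_dfgs(traces, mapping), mapping, date(2024, 1, 1), date(2024, 1, 3))
print(series[("A", "B")])  # → [1, 0, 1]
```

Each entry of `series` is one univariate DF time series; stacking all of them gives the multivariate input the forecasting models work on.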
Multiple Forecasting Models
- Baseline models:
  - Persistence
  - Mean
- Advanced models:
  - Linear Regression
  - XGBoost
  - LightGBM
  - RNN/LSTM/GRU
  - N-BEATS
  - N-HiTS
  - TCN
  - Transformer
  - TFT
  - DLinear
  - NLinear
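The two baselines are simple enough to state in a few lines. This sketch assumes each DF series is forecast independently and that the Mean baseline averages the last 32 days (the input sequence length listed in this README) rather than the whole history; both points are assumptions, and the function names are hypothetical.

```python
def persistence_forecast(history, horizon):
    """Persistence (naive) baseline: repeat the last observed value."""
    return [history[-1]] * horizon

def mean_forecast(history, horizon):
    """Mean baseline: repeat the average of the observed window."""
    return [sum(history) / len(history)] * horizon

def forecast_all(series_by_pair, model, input_len=32, horizon=16):
    """Apply a univariate baseline independently to every DF series, using its last input_len days."""
    return {pair: model(values[-input_len:], horizon) for pair, values in series_by_pair.items()}

window = {("A", "B"): [2, 4, 6], ("B", "C"): [1, 1, 4]}
print(forecast_all(window, persistence_forecast, horizon=2))  # → {('A', 'B'): [6, 6], ('B', 'C'): [4, 4]}
print(forecast_all(window, mean_forecast, horizon=2))         # → {('A', 'B'): [4.0, 4.0], ('B', 'C'): [2.0, 2.0]}
```

Any advanced model that cannot beat these two on MAE and RMSE adds little, which is why baselines are reported alongside them.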
Requirements:
- Python 3.9
- PyTorch
- Darts
- PM4Py
- NumPy
- Pandas
- Matplotlib
- Scikit-learn
- XGBoost
- LightGBM
- h5py
Usage:
- Preprocess event logs:
  python preprocess.py
- Generate daily DFGs:
  python gen_daily_dfg.py
- Convert DFGs to DF time series:
  python dfg_to_time_series.py
- Run predictions:
  python multiDFpred.py --dataset <dataset_name>
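The steps build on each other (processed logs, then daily DFGs, then DF time series, then predictions), so they are meant to run in this order. The hypothetical helper below only prints the commands unless `dry_run=False`; it assumes the first three scripts take no arguments (as shown above) and uses `BPI2019_1` purely as an example dataset name, so check multiDFpred.py for the values `--dataset` actually accepts.

```python
import subprocess
import sys

# Script order as listed in this README; only the last step receives the dataset name.
PIPELINE = ["preprocess.py", "gen_daily_dfg.py", "dfg_to_time_series.py", "multiDFpred.py"]

def pipeline_commands(dataset_name, python=sys.executable):
    """Build the command line for each pipeline step."""
    commands = [[python, script] for script in PIPELINE[:-1]]
    commands.append([python, PIPELINE[-1], "--dataset", dataset_name])
    return commands

def run_pipeline(dataset_name, dry_run=True):
    """Print each command and, unless dry_run, run it; check=True stops at the first failing step."""
    for cmd in pipeline_commands(dataset_name):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)

run_pipeline("BPI2019_1")  # dry run: prints the four commands without executing them
```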
Data preparation:
- Download the event logs from the following sources:
  - BPI2019_1: BPI Challenge 2019
  - Hospital_Billing: Hospital Billing Event Log
  - RTFMP: Road Traffic Fine Management Process
- Create the required directory structure:
  mkdir -p event_logs/original
- Place the downloaded .xes files in the event_logs/original directory.
- After running python preprocess.py, the processed event logs will be saved in the ./event_logs/processed directory.
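The directory setup can also be done from Python. This sketch (the function name is hypothetical) mirrors `mkdir -p event_logs/original` and lists the .xes files already placed there, which is a quick check before running preprocess.py.

```python
import tempfile
from pathlib import Path

def prepare_event_log_dirs(root="."):
    """Create <root>/event_logs/original (like `mkdir -p`) and return the .xes files found in it."""
    original = Path(root) / "event_logs" / "original"
    original.mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in original.glob("*.xes"))

# Example against a throwaway directory instead of the project root.
with tempfile.TemporaryDirectory() as tmp:
    print(prepare_event_log_dirs(tmp))  # → []
```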
The framework uses the following parameters:
- Input sequence length: 32 days
- Output sequence length: 16 days
- Training/test split: 80/20

Evaluation metrics:
- Mean Absolute Error (MAE)
- Root Mean Square Error (RMSE)
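A sketch of how the listed parameters and metrics fit together, assuming the 80/20 split is made chronologically on each daily series (no shuffling) and that samples are sliding windows of 32 input days followed by 16 target days. Helper names are hypothetical, and how multiDFpred.py aggregates MAE and RMSE across DF series is not stated here.

```python
import math

INPUT_LEN, OUTPUT_LEN, TRAIN_SHARE = 32, 16, 0.8  # parameters listed in this README

def chronological_split(series, train_share=TRAIN_SHARE):
    """Split a series in time order (no shuffling): first 80% train, last 20% test."""
    cut = int(len(series) * train_share)
    return series[:cut], series[cut:]

def sliding_windows(series, input_len=INPUT_LEN, output_len=OUTPUT_LEN):
    """Build (input, target) pairs: input_len consecutive days, then the next output_len days."""
    n_windows = len(series) - input_len - output_len + 1
    return [(series[s:s + input_len], series[s + input_len:s + input_len + output_len])
            for s in range(max(n_windows, 0))]

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

daily_counts = list(range(100))  # toy DF series covering 100 days
train, test = chronological_split(daily_counts)
print(len(train), len(test), len(sliding_windows(train)))  # → 80 20 33
print(mae([1, 2, 3], [1, 2, 6]), rmse([0, 0, 0, 0], [2, 2, 2, 2]))  # → 1.0 2.0
```

In this sketch a test part shorter than 48 days holds no complete window; how the actual scripts handle that case is not described in this README.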
For the complete PMF benchmark repository, including additional resources, documentation, and examples, please visit: PMF-Benchmark
Reference:
De Smedt, J., Yeshchenko, A., Polyvyanyy, A., De Weerdt, J., & Mendling, J. (2023). Process model forecasting and change exploration using time series analysis of event sequence data. Data & Knowledge Engineering, 145, 102145.
If you find this repository helpful for your work, please consider citing our paper:
@inproceedings{yu2024multivariate,
title={Multivariate Approaches for Process Model Forecasting},
author={Yu, Yongbo and Peeperkorn, Jari and De Smedt, Johannes and De Weerdt, Jochen},
booktitle={International Conference on Process Mining},
pages={279--292},
year={2024},
organization={Springer}
}