Process Model Forecasting Benchmark Framework
A comprehensive framework for preprocessing event logs, extracting directly-follows (DF) relations, converting them to time series, and benchmarking various forecasting models with statistical and entropic relevance evaluation.
This framework enables researchers and practitioners to:
- Process Event Logs: Automated preprocessing of process event logs (XES format)
- Generate Time Series: Extract and transform directly-follows relations into time series data
- Benchmark Models: Comprehensive evaluation across multiple forecasting approaches
- Evaluate with Process-Aware Metrics: Advanced entropic relevance (ER) evaluation
- Track Experiments: Integration with Weights & Biases for monitoring and visualization
Supported models:

- Baseline Models: Persistence, Naive Mean, Naive Seasonal, Naive Drift, Naive Moving Average
- Statistical Models: AR, ARIMA, SES, Prophet
- Machine Learning Models: Linear Regression, Random Forest, XGBoost, LightGBM
- Deep Learning Models: RNN, LSTM, GRU, DeepAR, N-HiTS, Transformer, TCN, DLinear, NLinear
- Foundation Models: TimeGPT
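The baseline models are simple reference forecasters that any learned model should beat. A minimal numpy sketch of three of them (persistence, naive seasonal, naive drift), assuming a univariate series of daily counts and a short horizon; function names are illustrative, not the framework's API:

```python
import numpy as np

def persistence(history: np.ndarray, horizon: int) -> np.ndarray:
    """Repeat the last observed value for every future step."""
    return np.full(horizon, history[-1], dtype=float)

def naive_seasonal(history: np.ndarray, horizon: int, season: int = 7) -> np.ndarray:
    """Repeat the last full season (e.g. week) of observations."""
    last_season = history[-season:]
    return np.array([last_season[i % season] for i in range(horizon)], dtype=float)

def naive_drift(history: np.ndarray, horizon: int) -> np.ndarray:
    """Extrapolate the straight line through the first and last observations."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + slope * np.arange(1, horizon + 1)

# Two weeks of synthetic daily DF-relation counts with a weekly pattern
series = np.array([10, 12, 9, 11, 30, 5, 4, 11, 13, 10, 12, 31, 6, 5], dtype=float)
print(persistence(series, 3))     # [5. 5. 5.]
print(naive_seasonal(series, 3))  # [11. 13. 10.]
```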
Supported datasets:

- BPI2017: BPI Challenge 2017
- BPI2019_1: BPI Challenge 2019 (3-way matching, invoice before GR)
- Sepsis: Sepsis event log
- Hospital_Billing: Hospital billing process data
Evaluation metrics:

- Traditional: MAE, RMSE
- Process-Aware: Entropic Relevance (ER)
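The traditional metrics take only a few lines; a sketch, assuming forecasts and actuals are flat numpy arrays of per-relation counts:

```python
import numpy as np

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute error: average magnitude of the forecast errors."""
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean squared error: penalizes large errors more heavily than MAE."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

y_true = np.array([10.0, 12.0, 9.0])
y_pred = np.array([11.0, 10.0, 9.0])
print(mae(y_true, y_pred))   # 1.0
print(rmse(y_true, y_pred))  # ~1.29
```

Entropic relevance, by contrast, is computed on the process model implied by the forecast rather than on the raw numbers, which is why it needs the dedicated evaluation script described below.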
Requirements:

- Python 3.11 or higher
- Git
```bash
# Clone the repository
git clone https://github.com/YongboYu/PMF_Benchmark.git
cd PMF_Benchmark

# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -e .

# Optional: Initialize Weights & Biases
wandb login
```

Before running any experiments, you need to download the benchmark datasets. The framework provides an automated download script that handles dataset acquisition and preprocessing:
```bash
# Download all available datasets
python download_data.py

# Download a specific dataset
python download_data.py dataset=BPI2017
python download_data.py dataset=BPI2019_1
python download_data.py dataset=Hospital_Billing
python download_data.py dataset=sepsis

# Force re-download of existing datasets
python download_data.py dataset=BPI2017 force=true
```

The script will:
- Download datasets from their official sources
- Extract compressed files automatically
- Apply dataset-specific filtering (e.g., BPI2019 category filtering)
- Save processed datasets to the `data/raw/` directory
- Preprocess Event Logs

```bash
python scripts/preprocess_logs.py --dataset BPI2017
```

- Train Models

```bash
# Train a specific model
python train.py --dataset BPI2017 --model_group statistical --model prophet --horizon 7

# Train all models in a category
python train.py --dataset BPI2017 --model_group deep_learning --horizon 7

# Train all models
python train.py --dataset BPI2017 --model_group all --horizon 7
```

- Calculate Entropic Relevance

```bash
python run_er_evaluation.py --dataset BPI2017 --horizon 7
```

Available arguments:

- Datasets: `BPI2017`, `BPI2019_1`, `Hospital_Billing`, `sepsis`
- Model Groups: `baseline`, `statistical`, `regression`, `deep_learning`, `foundation`
- Horizons: `7`, `28` (days)
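A full benchmark run is the cross product of these arguments. A small, hypothetical helper that enumerates the corresponding `train.py` invocations (the values mirror the argument lists above):

```python
from itertools import product

datasets = ["BPI2017", "BPI2019_1", "Hospital_Billing", "sepsis"]
model_groups = ["baseline", "statistical", "regression", "deep_learning", "foundation"]
horizons = [7, 28]

def training_commands():
    """Yield one train.py command line per (dataset, model group, horizon) combination."""
    for ds, mg, h in product(datasets, model_groups, horizons):
        yield f"python train.py --dataset {ds} --model_group {mg} --horizon {h}"

commands = list(training_commands())
print(len(commands))  # 4 datasets * 5 model groups * 2 horizons = 40
print(commands[0])
```

Feeding these lines to a shell (or a job scheduler) reproduces the whole grid without hand-typing each command.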
The framework follows a structured data processing pipeline:
- Event Log Preprocessing: Filter infrequent variants, trim time periods, add artificial start/end events
- DF Relation Extraction: Extract directly-follows relations with timestamps from processed logs
- Time Series Generation: Aggregate relations by time windows (daily frequency)
- Model Training: Train models with hyperparameter optimization using Optuna
- Evaluation: Calculate traditional and process-aware metrics including Entropic Relevance
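The DF extraction and time-series generation steps can be sketched with pandas, assuming a preprocessed log as a DataFrame with `case_id`, `activity`, and `timestamp` columns (the actual framework works on XES logs via PM4Py; the column names here are illustrative):

```python
import pandas as pd

# Synthetic event log with two cases
log = pd.DataFrame({
    "case_id":   ["c1", "c1", "c1", "c2", "c2"],
    "activity":  ["A", "B", "C", "A", "C"],
    "timestamp": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 10:00", "2024-01-02 08:00",
        "2024-01-01 11:00", "2024-01-02 12:00",
    ]),
})

# DF extraction: pair each event with its successor within the same case
log = log.sort_values(["case_id", "timestamp"])
df_rel = log.copy()
df_rel["next_activity"] = df_rel.groupby("case_id")["activity"].shift(-1)
df_rel = df_rel.dropna(subset=["next_activity"])
df_rel["relation"] = df_rel["activity"] + ">" + df_rel["next_activity"]

# Time-series generation: count each relation per day (dated by the source event)
daily = (df_rel
         .groupby([pd.Grouper(key="timestamp", freq="D"), "relation"])
         .size()
         .unstack(fill_value=0))
print(daily)
```

Each column of `daily` is then one univariate time series handed to the forecasting models.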
The framework uses YAML-based configuration with hierarchical organization:
- `configs/base_config.yaml`: Core settings (paths, data splits, transformations)
- `configs/dataset/`: Dataset-specific parameters
- `configs/model_configs/`: Model-specific hyperparameters
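Hierarchical configuration resolves by overlaying the more specific files on top of the base settings. A simplified sketch of that merge in plain Python, with hypothetical config keys:

```python
def merge(base: dict, override: dict) -> dict:
    """Recursively overlay `override` onto `base`: nested dicts merge, scalars replace."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out

# Hypothetical contents of base_config.yaml and a dataset-specific file
base = {"data": {"freq": "D", "split": 0.8}, "seed": 42}
dataset_cfg = {"data": {"split": 0.7}}

cfg = merge(base, dataset_cfg)
print(cfg)  # {'data': {'freq': 'D', 'split': 0.7}, 'seed': 42}
```

The dataset file only states what differs (`split`), while everything else (`freq`, `seed`) is inherited from the base config.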
Example model training with a custom config:

```bash
python train.py --config configs/custom_config.yaml --dataset BPI2017 --horizon 7
```

The Weights & Biases integration provides:

- Automatic logging of metrics, hyperparameters, and predictions
- Real-time training monitoring
- Experiment comparison and visualization
- Model artifact storage
- Results stored in `results/results_log.json`
- Detailed metrics in `results/evaluation/`
- Model predictions in `results/predictions/`
- Darts: Core forecasting library with unified interface for multiple model types
- PM4Py: Process mining and event log processing
- Optuna: Hyperparameter optimization with automated parameter space search
- Weights & Biases: Experiment tracking and visualization
- PyTorch: Deep learning backend
If you find this repository helpful for your work, please consider citing our paper:
This project is licensed under the MIT License - see the LICENSE file for details.