@@ -8,8 +8,8 @@ executed activities in the index.
 For example, for an ongoing case `A-B-F-T-W-S-G-T-D`, after building the 5-gram index, the state would be computed
 by searching in the index with the sequence `[W, S, G, T, D]`.
 
-This approach has been submitted as a publication to ICPM 2024 under the title "Efficient State Computation for Log
-Animation and Short-Term Simulation Using N-Gram Indexing", by David Chapela-Campa and Marlon Dumas.
+This approach has been submitted for publication to VLDB 2025 under the title "Efficient Online Computation of Business
+Process State From Trace Prefixes via N-Gram Indexing", by David Chapela-Campa and Marlon Dumas.
 
 ## Requirements
 
@@ -47,7 +47,7 @@ ongoing_state = n_gram_index.get_best_marking_state_for(n_gram)
 
 ## Evaluation Reproducibility
 
-The scripts with a name starting with `icpm24_` under folder `tests/` contain the necessary code to reproduce the
+The scripts with a name starting with `vldb25_` under folder `tests/` contain the necessary code to reproduce the
 evaluation presented in the publication.
 Most of them are only necessary to preprocess the original datasets.
 This data is already available in this [Zenodo repository](doi.org/10.5281/zenodo.11409897).
@@ -56,32 +56,35 @@ there.
 
 ### Dependencies
 
-The evaluation scripts depend on two versions of PM4PY.
+The evaluation scripts depend on two versions of PM4PY:
 
-1. To discover the process models and measure their fitness, uncomment the line `pm4py = "2.7.11.9"` in the
-   file `pyproject.toml` and run `poetry install`. This is unnecessary if you downloaded the input files from Zenodo (
-   see above).
-2. For the other scripts where the prefix-alignment technique is used, the requirement is a package with a PM4PY fork
-   implemented by Daniel
-   Schuster ([repo](https://github.com/fit-daniel-schuster/online_process_monitoring_using_incremental_state-space_expansion_an_exact_algorithm/)).
+1. To run the script `vldb25_compute_states.py`, which uses the prefix-alignment technique, you need a
+   package with a PM4PY fork implemented by Daniel Schuster
+   ([repo](https://github.com/fit-daniel-schuster/online_process_monitoring_using_incremental_state-space_expansion_an_exact_algorithm/)).
    Download the project from the corresponding repository and specify its path in the `pyproject.toml` file in the
    line `pm4py = {path = "../schuster-prefix-alignments"}`, then, run `poetry install`.
+2. For all other scripts, PM4PY version 2.7.11.9 is used. Uncomment the line `pm4py = "2.7.11.9"` in the file
+   `pyproject.toml` and run `poetry install`.
 
 ### Synthetic Evaluation
 
-1. Adapt the log routes in the file `icpm24_compute_states.py` as said in the comments, and the IDs passed by the `main`
-   function to `compute_current_states()` so the executed datasets are the synthetic ones.
-2. Run the script, obtaining the results with the computed states and runtimes for each proposal in the
-   folder `outputs`.
-3. Move this files to the folder `results`.
-4. Run the script `icpm24_exact_state_accuracy.py`, obtaining the accuracy results in the folder `outputs`.
+1. Install the project with the PM4PY version specified in point 1 of the Dependencies section (see above).
+2. Comment out the lines in the `main()` function in `vldb25_compute_states.py` that run the state computation for real-life
+   logs, leaving only the calls to function `compute_current_states()` for the synthetic datasets.
+3. Run the script, obtaining the results with the computed states and runtimes (as well as the reachability graphs) for each
+   proposal in the folder `outputs`.
+4. Reinstall the project with the PM4PY version specified in point 2 of the Dependencies section (see above).
+5. Run the script `vldb25_compute_states_token_replay.py`, adding the token-based replay results to the previous result
+   files.
+6. Move these files to the folder `results`.
+7. Run the script `vldb25_exact_state_accuracy.py`, obtaining the accuracy results in the folder `outputs`.
 
 ### Real-life Evaluation
 
-1. Adapt the log routes in the file `icpm24_compute_states.py` as said in the comments, and the IDs passed by the `main`
-   function to `compute_current_states()` so the executed datasets are the real-life ones.
-2. Run the script, obtaining the results with the computed states and runtimes for each proposal in the
-   folder `outputs`.
-3. Move this files to the folder `results`.
-4. Run the script `icpm24_next_activity_accuracy.py`, obtaining the accuracy results in the folder `outputs`.
-
+1. Install the project with the PM4PY version specified in point 1 of the Dependencies section (see above).
+2. Comment out the lines in the `main()` function in `vldb25_compute_states.py` that run the state computation for synthetic
+   logs, leaving only the calls to function `compute_current_states()` for the real-life datasets.
+3. Run the script, obtaining the results with the computed states and runtimes for each proposal in the folder
+   `outputs`.
+4. Move these files to the folder `results`.
+5. Run the script `vldb25_next_activity_accuracy.py`, obtaining the accuracy results in the folder `outputs`.
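The README above describes the core lookup: the state of an ongoing case is found by using its last n executed activities as the search key of an n-gram index. A minimal Python sketch of that lookup step follows. It is an illustration under stated assumptions, not the repository's implementation. `ToyNGramIndex` and `last_n_gram` are hypothetical names. The state is assumed to be a marking represented as a frozenset of place names, and index entries are added by hand rather than derived from a process model. Prefixes shorter than n are assumed to be looked up whole.

```python
from typing import Dict, FrozenSet, Optional, Sequence, Tuple

# Hypothetical types for this sketch: an n-gram is a tuple of activity labels,
# and a state is a marking given as a frozenset of place names.
NGram = Tuple[str, ...]
Marking = FrozenSet[str]


def last_n_gram(prefix: Sequence[str], n: int) -> NGram:
    """Return the last n executed activities of an ongoing case.

    Assumption of this sketch: if the prefix has fewer than n activities,
    the whole prefix is returned.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return tuple(prefix[-n:])


class ToyNGramIndex:
    """Hypothetical minimal index mapping n-grams to the state they lead to."""

    def __init__(self, n: int) -> None:
        self.n = n
        self._index: Dict[NGram, Marking] = {}

    def add(self, n_gram: Sequence[str], state: Marking) -> None:
        # In the real approach the entries come from the process model;
        # here they are supplied manually for illustration.
        self._index[tuple(n_gram)] = state

    def state_for_prefix(self, prefix: Sequence[str]) -> Optional[Marking]:
        # Only the last n activities of the prefix are used as the search key.
        return self._index.get(last_n_gram(prefix, self.n))


if __name__ == "__main__":
    index = ToyNGramIndex(n=5)
    index.add(["W", "S", "G", "T", "D"], frozenset({"p_after_D"}))
    case = "A-B-F-T-W-S-G-T-D".split("-")
    print(last_n_gram(case, 5))          # ('W', 'S', 'G', 'T', 'D')
    print(index.state_for_prefix(case))  # frozenset({'p_after_D'})
```

The README shows the project's own lookup as `n_gram_index.get_best_marking_state_for(n_gram)`. How that method builds the index and resolves n-grams that are missing or map to several markings is outside the scope of this sketch.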
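The Dependencies list in the diff reduces to a simple rule. `vldb25_compute_states.py` needs Daniel Schuster's PM4PY fork, and every other `vldb25_` script needs PM4PY 2.7.11.9. The hypothetical pre-flight check below encodes that rule. It is not part of the repository. It assumes the fork installs under the distribution name `pm4py`, which the `pyproject.toml` line `pm4py = {path = ...}` suggests. Because the fork's version string is not stated, the check for that case only confirms that some `pm4py` is installed.

```python
from importlib import metadata
from pathlib import PurePath
from typing import Optional

# Hypothetical constants derived from the Dependencies list in the README.
PINNED_PM4PY = "2.7.11.9"
SCHUSTER_FORK = "schuster-fork"
PREFIX_ALIGNMENT_SCRIPT = "vldb25_compute_states.py"


def required_pm4py(script: str) -> str:
    """Return the PM4PY setup a script needs: the fork or the pinned version."""
    if PurePath(script).name == PREFIX_ALIGNMENT_SCRIPT:
        return SCHUSTER_FORK
    return PINNED_PM4PY


def installed_pm4py_version() -> Optional[str]:
    """Return the installed pm4py version, or None if it is not installed."""
    try:
        return metadata.version("pm4py")
    except metadata.PackageNotFoundError:
        return None


def check_setup(script: str) -> bool:
    """Report whether the current environment matches what the script needs."""
    required = required_pm4py(script)
    installed = installed_pm4py_version()
    if required == SCHUSTER_FORK:
        # The fork's version string is unknown, so only presence is checked.
        return installed is not None
    return installed == required


if __name__ == "__main__":
    for name in ("vldb25_compute_states.py", "vldb25_exact_state_accuracy.py"):
        print(name, required_pm4py(name), check_setup(name))
```

Running such a check before each step of the synthetic evaluation would catch a forgotten reinstall between steps that use the fork and steps that use the pinned release.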