
Commit e0f5c84

Update README.md with reproducibility instructions
1 parent: 822db4a

1 file changed

Lines changed: 26 additions & 23 deletions


README.md

@@ -8,8 +8,8 @@ executed activities in the index.
 For example, for an ongoing case `A-B-F-T-W-S-G-T-D`, after building the 5-gram index, the state would be computed
 by searching in the index with the sequence `[W, S, G, T, D]`.
 
-This approach has been submitted as a publication to ICPM 2024 under the title "Efficient State Computation for Log
-Animation and Short-Term Simulation Using N-Gram Indexing", by David Chapela-Campa and Marlon Dumas.
+This approach has been submitted as a publication to VLDB 2025 under the title "Efficient Online Computation of Business
+Process State From Trace Prefixes via N-Gram Indexing", by David Chapela-Campa and Marlon Dumas.
 
 ## Requirements
 
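The README's example takes the last n activities of an ongoing case and uses them as the search key in the n-gram index. The lookup step can be sketched as follows. This is a minimal illustration and not the repository's `n_gram_index` object or its `get_best_marking_state_for` method. It assumes, hypothetically, that the index is a plain `dict` from n-gram tuples to states and that a state is a `frozenset` of marked place names. When a prefix is shorter than n, this sketch uses the whole prefix. The README does not say how the actual tool handles that case.

```python
from typing import Dict, FrozenSet, Optional, Sequence, Tuple

# Hypothetical types: a state (marking) as a set of marked place names,
# and an n-gram index as a dict keyed by tuples of activity labels.
Marking = FrozenSet[str]
NGramIndex = Dict[Tuple[str, ...], Marking]


def last_n_gram(prefix: Sequence[str], n: int) -> Tuple[str, ...]:
    """Return the last n activities of an ongoing case (all of them if it has fewer)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return tuple(prefix[-n:])


def lookup_state(index: NGramIndex, prefix: Sequence[str], n: int) -> Optional[Marking]:
    """Search the index with the trailing n-gram; return None if it is not indexed."""
    return index.get(last_n_gram(prefix, n))


if __name__ == "__main__":
    case = "A-B-F-T-W-S-G-T-D".split("-")
    # Toy index with one hand-written entry; a real index would be built beforehand.
    index: NGramIndex = {("W", "S", "G", "T", "D"): frozenset({"p_end"})}
    print(last_n_gram(case, 5))          # ('W', 'S', 'G', 'T', 'D')
    print(lookup_state(index, case, 5))  # frozenset({'p_end'})
```

A dict keyed by tuples gives constant-time average lookup per query. This is the property that makes indexing by n-gram attractive compared with replaying the whole prefix.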
@@ -47,7 +47,7 @@ ongoing_state = n_gram_index.get_best_marking_state_for(n_gram)
 
 ## Evaluation Reproducibility
 
-The scripts with a name starting with `icpm24_` under folder `tests/` contain the necessary code to reproduce the
+The scripts with a name starting with `vldb25_` under folder `tests/` contain the necessary code to reproduce the
 evaluation presented in the publication.
 Most of them are only necessary to preprocess the original datasets.
 This data is already available in this [Zenodo repository](doi.org/10.5281/zenodo.11409897).
@@ -56,32 +56,35 @@ there.
 
 ### Dependencies
 
-The evaluation scripts depend on two versions of PM4PY.
+The evaluation scripts depend on two versions of PM4PY:
 
-1. To discover the process models and measure their fitness, uncomment the line `pm4py = "2.7.11.9"` in the
-file `pyproject.toml` and run `poetry install`. This is unnecessary if you downloaded the input files from Zenodo (
-see above).
-2. For the other scripts where the prefix-alignment technique is used, the requirement is a package with a PM4PY fork
-implemented by Daniel
-Schuster ([repo](https://github.com/fit-daniel-schuster/online_process_monitoring_using_incremental_state-space_expansion_an_exact_algorithm/)).
+1. To run the script `vldb25_compute_states.py` where the prefix-alignment technique is used, the requirement is a
+package with a PM4PY fork implemented by Daniel Schuster
+([repo](https://github.com/fit-daniel-schuster/online_process_monitoring_using_incremental_state-space_expansion_an_exact_algorithm/)).
 Download the project from the corresponding repository and specify its path in the `pyproject.toml` file in the
 line `pm4py = {path = "../schuster-prefix-alignments"}`, then, run `poetry install`.
+2. For all the other scripts, the used PM4PY version is 2.7.11.9. Uncomment the line `pm4py = "2.7.11.9"` in the file
+`pyproject.toml` and run `poetry install`.
 
 ### Synthetic Evaluation
 
-1. Adapt the log routes in the file `icpm24_compute_states.py` as said in the comments, and the IDs passed by the `main`
-function to `compute_current_states()` so the executed datasets are the synthetic ones.
-2. Run the script, obtaining the results with the computed states and runtimes for each proposal in the
-folder `outputs`.
-3. Move this files to the folder `results`.
-4. Run the script `icpm24_exact_state_accuracy.py`, obtaining the accuracy results in the folder `outputs`.
+1. Install the project with the PM4PY version specified in point 1 (see above).
+2. Comment the lines in the `main()` function in `vldb25_compute_states.py` that run the state computation for real-life
+logs, leaving only the calls to function `compute_current_states()` for the synthetic datasets.
+3. Run the script, obtaining the results with the computed states and runtimes (also the reachability graphs) for each
+proposal in the folder `outputs`.
+4. Reinstall the project with the PM4PY version specified in point 2 (see above).
+5. Run the script `vldb25_compute_states_token_replay.py`, adding the token-based replay results to the previous result
+files.
+6. Move these files to the folder `results`.
+7. Run the script `vldb25_exact_state_accuracy.py`, obtaining the accuracy results in the folder `outputs`.
 
 ### Real-life Evaluation
 
-1. Adapt the log routes in the file `icpm24_compute_states.py` as said in the comments, and the IDs passed by the `main`
-function to `compute_current_states()` so the executed datasets are the real-life ones.
-2. Run the script, obtaining the results with the computed states and runtimes for each proposal in the
-folder `outputs`.
-3. Move this files to the folder `results`.
-4. Run the script `icpm24_next_activity_accuracy.py`, obtaining the accuracy results in the folder `outputs`.
-
+1. Install the project with the PM4PY version specified in point 1 (see above).
+2. Comment the lines in the `main()` function in `vldb25_compute_states.py` that run the state computation for synthetic
+logs, leaving only the calls to function `compute_current_states()` for the real-life datasets.
+3. Run the script, obtaining the results with the computed states and runtimes for each proposal in the folder
+`outputs`.
+4. Move this files to the folder `results`.
+5. Run the script `vldb25_next_activity_accuracy.py`, obtaining the accuracy results in the folder `outputs`.
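The updated instructions switch between two PM4PY installations: Schuster's fork for `vldb25_compute_states.py` and version 2.7.11.9 for every other script. Because of this, it can help to check which installation is active before running a script. The helper below is hypothetical and not part of the repository. It reads the installed `pm4py` version through the standard-library `importlib.metadata` module and compares it with the pinned release. It only compares version strings, so it cannot tell the fork apart from 2.7.11.9 if both report the same version.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Version used by all scripts except vldb25_compute_states.py, per the README.
STANDARD_PM4PY = "2.7.11.9"


def installed_version(distribution: str = "pm4py") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


def version_matches(installed: Optional[str], expected: str) -> bool:
    """True only when a version is installed and equals the expected string exactly."""
    return installed is not None and installed == expected


if __name__ == "__main__":
    found = installed_version("pm4py")
    if version_matches(found, STANDARD_PM4PY):
        print(f"pm4py {found} matches the version for scripts other than vldb25_compute_states.py")
    else:
        print(f"pm4py {found!r} is installed; expected {STANDARD_PM4PY} for scripts "
              "other than vldb25_compute_states.py")
```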

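Both evaluation procedures include a manual step that moves the state-computation results from `outputs` to `results` before the accuracy script runs. A minimal sketch of that step, assuming both folders are relative to the directory where the scripts are run (the README does not say) and that every file in `outputs` should be moved. Pass a narrower glob `pattern` if only some of the files are result files. The function name `move_outputs` is hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def move_outputs(src: Path = Path("outputs"), dst: Path = Path("results"),
                 pattern: str = "*") -> List[Path]:
    """Move the regular files in src that match pattern into dst; return their new paths."""
    dst.mkdir(parents=True, exist_ok=True)
    moved: List[Path] = []
    for item in sorted(src.glob(pattern)):
        if item.is_file():
            target = dst / item.name
            # shutil.move falls back to copy-and-delete across file systems.
            shutil.move(str(item), str(target))
            moved.append(target)
    return moved


if __name__ == "__main__":
    # Demo in a throwaway directory so no real outputs are touched.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "outputs").mkdir()
        (base / "outputs" / "states.csv").write_text("case,state\n")
        print([p.name for p in move_outputs(base / "outputs", base / "results")])  # ['states.csv']
```

Files that already exist in `results` with the same name may be overwritten, depending on the operating system's rename semantics.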