Description
When importing an OCEL 2.0 SQLite file using ocpa.objects.log.importer.ocel2.sqlite.factory, the event-to-object relationships are not populated in the final event_df.
While the script successfully calculates the relationships in the aggregated_data DataFrame, these values are lost during the merge into event_df. As a result, the columns for object types in the resulting log contain only empty sets for every event.
Reproduction Steps
- Download the standard OCEL 2.0 P2P log from [ocel-standard.org](https://ocel-standard.org/event-logs/simulations/p2p/).
- Run the following script (adapted from example_scripts/ocel2/example_analysis.py):
from ocpa.objects.log.importer.ocel2.sqlite import factory as ocel_import_factory

# Load the P2P log
ocel = ocel_import_factory.apply("event_logs/ocel2-p2p.sqlite")

# Inspecting the log dataframe directly shows empty sets for objects
# Example: Checking the 'material' column
if 'material' in ocel.log.log.columns:
    print("Sample of 'material' column:")
    non_empty = ocel.log.log[ocel.log.log['material'].apply(lambda x: len(x) > 0 if hasattr(x, "__len__") else False)]
    print(non_empty['material'].head())
    # Check if all are empty
    is_empty = ocel.log.log['material'].apply(lambda x: len(x) == 0).all()
    print(f"\nAre all material sets empty? {is_empty}")
Observed Behavior:
All rows contain empty sets for the material column:
Sample of 'material' column:
Series([], Name: material, dtype: object)
Are all material sets empty? True
Expected Behavior (obtained using proposed fix below):
The object columns in ocel.log.log should be populated with the object IDs derived from the SQLite event_object table. For the material column:
Sample of 'material' column:
event_id
event:3 {material:3, material:2, material:0, material:1}
event:31 {material:8, material:10, material:7, material:9}
event:55 {material:17, material:16, material:18}
event:95 {material:13}
event:137 {material:32, material:33}
Name: material, dtype: object
Are all material sets empty? False
Root Cause Analysis
The issue is located in ocpa/objects/log/importer/ocel2/sqlite/versions/import_ocel2_sqlite.py within the apply() function.
The code uses event_df.update(aggregated_data[object_type]).
event_df uses a default RangeIndex (0, 1, 2...).
aggregated_data uses ocel_event_id (the actual Event ID string/int) as its Index.
Because pd.DataFrame.update() aligns on the Index, and the indices do not match, no updates occur, leaving the initialized empty sets unchanged.
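The index mismatch can be reproduced without ocpa. The following is a minimal sketch on toy data, not the importer's actual code: the two frames only imitate the shapes described above (a RangeIndex on one side, event IDs as index labels on the other), and the sample values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for event_df: default RangeIndex (0, 1, 2),
# object column initialised to empty sets.
event_df = pd.DataFrame({
    "event_id": ["event:1", "event:2", "event:3"],
    "material": [set(), set(), set()],
})

# Toy stand-in for aggregated_data["material"]: indexed by event ID.
aggregated = pd.Series(
    [{"material:0", "material:1"}, {"material:2"}],
    index=pd.Index(["event:1", "event:3"], name="event_id"),
    name="material",
)

# update() aligns on index labels. The labels 0, 1, 2 never match
# "event:1" or "event:3", so nothing is written.
event_df.update(aggregated)

still_empty = event_df["material"].apply(lambda s: len(s) == 0).all()
print(f"Are all material sets empty? {still_empty}")
```

Under these assumptions, the check should report that every set is still empty, which matches the observed behavior.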
Proposed Fix
To ensure update() works correctly, event_df must temporarily use event_id as its index.
Location: ocpa/objects/log/importer/ocel2/sqlite/versions/import_ocel2_sqlite.py
# ... (previous code)

# Merge this aggregated data into event_df for each object type
# FIX START: Set index to event_id to align with aggregated_data
event_df.set_index('event_id', inplace=True)

for object_type in aggregated_data.columns:
    # Update the event_df with the aggregated sets for each object type
    event_df.update(aggregated_data[object_type])

# FIX END: Reset index to restore original structure
event_df.reset_index(inplace=True)

# Close the connection to the database
connection.close()
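The same idea can be checked in isolation on toy data. This is a sketch only: `merge_object_sets` is a hypothetical helper name, not part of ocpa, and the sample frames are invented to mirror the shapes involved. It uses the non-inplace form of `set_index` / `reset_index`, which is equivalent for this purpose.

```python
import pandas as pd


def merge_object_sets(event_df, aggregated_data, key="event_id"):
    """Hypothetical helper: align on `key`, apply update(), restore a RangeIndex."""
    indexed = event_df.set_index(key)
    for object_type in aggregated_data.columns:
        # Both sides are now labelled by event ID, so update() finds matches.
        indexed.update(aggregated_data[object_type])
    return indexed.reset_index()


# Toy stand-ins for the importer's frames (made-up values).
event_df = pd.DataFrame({
    "event_id": ["event:1", "event:2", "event:3"],
    "material": [set(), set(), set()],
})
aggregated_data = pd.DataFrame(
    {"material": [{"material:0"}, {"material:1", "material:2"}]},
    index=pd.Index(["event:1", "event:3"], name="event_id"),
)

result = merge_object_sets(event_df, aggregated_data)
print(result)
```

Events without related objects (here `event:2`) keep their initial empty set, because `update()` skips positions where the incoming value is missing.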
Additional Note: Secondary Error
To get the import to run at all, I first had to work around the issue previously reported in Issue #19.
In ocpa/objects/log/converter/versions/df_to_ocel.py, the line:
logging.debug(_sample_dict(3, objects))
throws an error during execution. I had to comment this out to reach the object relationship bug described above.