[Bug] OCEL2 SQLite Import: Object relationships lost due to DataFrame Index mismatch #25

@olekuhlmann

Description

When importing an OCEL 2.0 SQLite file using ocpa.objects.log.importer.ocel2.sqlite.factory, the relationship linking events to objects fails to populate in the final event_df.

The importer correctly computes the relationships in the aggregated_data DataFrame, but these values are lost when they are merged into event_df. As a result, the object-type columns in the resulting log contain only empty sets for every event.

Reproduction Steps

  1. Download the standard OCEL 2.0 P2P log from [ocel-standard.org](https://ocel-standard.org/event-logs/simulations/p2p/).
  2. Run the following script (adapted from example_scripts/ocel2/example_analysis.py):
    from ocpa.objects.log.importer.ocel2.sqlite import factory as ocel_import_factory

    # Load the P2P log
    ocel = ocel_import_factory.apply("event_logs/ocel2-p2p.sqlite")

    # Inspecting the log dataframe directly shows empty sets for objects
    # Example: checking the 'material' column
    df = ocel.log.log
    if 'material' in df.columns:
        print("Sample of 'material' column:")

        # Flag rows whose 'material' value is a non-empty collection
        has_objects = df['material'].apply(lambda x: hasattr(x, "__len__") and len(x) > 0)
        print(df.loc[has_objects, 'material'].head())

        # Check if all are empty
        is_empty = not has_objects.any()
        print(f"\nAre all material sets empty? {is_empty}")

Observed Behavior:
All rows contain empty sets for the material column:

Sample of 'material' column:
Series([], Name: material, dtype: object)

Are all material sets empty? True

Expected Behavior (obtained using proposed fix below):
The object columns in ocel.log.log should be populated with the object IDs derived from the SQLite event_object table. For the material column:

Sample of 'material' column:
event_id
event:3       {material:3, material:2, material:0, material:1}
event:31     {material:8, material:10, material:7, material:9}
event:55               {material:17, material:16, material:18}
event:95                                         {material:13}
event:137                           {material:32, material:33}
Name: material, dtype: object

Are all material sets empty? False

Root Cause Analysis

The issue is located in ocpa/objects/log/importer/ocel2/sqlite/versions/import_ocel2_sqlite.py within the apply() function.

The code uses event_df.update(aggregated_data[object_type]).

  • event_df uses a default RangeIndex (0, 1, 2...).
  • aggregated_data uses ocel_event_id (the actual Event ID string/int) as its Index.

Because pd.DataFrame.update() aligns on the Index, and the indices do not match, no updates occur, leaving the initialized empty sets unchanged.
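
The mismatch can be reproduced in isolation with plain pandas. The snippet below is a minimal sketch, not the importer's actual code: the two small DataFrames are hypothetical stand-ins for event_df and aggregated_data, and the event and object IDs are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for event_df: default RangeIndex, object column
# pre-filled with empty sets.
event_df = pd.DataFrame({
    "event_id": ["event:1", "event:2", "event:3"],
    "material": [set(), set(), set()],
})

# Hypothetical stand-in for aggregated_data: indexed by the event ID.
aggregated_data = pd.DataFrame(
    {"material": [{"material:0"}, {"material:1", "material:2"}]},
    index=pd.Index(["event:1", "event:3"], name="ocel_event_id"),
)

# Case 1: indices do not match (0, 1, 2 vs. event ID strings).
# update() finds no overlapping labels, so nothing is written.
broken = event_df.copy()
broken.update(aggregated_data["material"])

# Case 2: align on the event ID first, update, then restore the column.
fixed = event_df.set_index("event_id")
fixed.update(aggregated_data["material"])
fixed = fixed.reset_index()

print(broken["material"].tolist())
print(fixed["material"].tolist())
```

In the first case every set stays empty; in the second, events that appear in aggregated_data receive their object sets, while events without related objects keep the initial empty set.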

Proposed Fix

To ensure update() works correctly, event_df must temporarily use event_id as its index.

Location: ocpa/objects/log/importer/ocel2/sqlite/versions/import_ocel2_sqlite.py

    # ... (previous code)

    # Merge this aggregated data into event_df for each object type
    
    # FIX START: Set index to event_id to align with aggregated_data
    event_df.set_index('event_id', inplace=True) 
    
    for object_type in aggregated_data.columns:
        # Update the event_df with the aggregated sets for each object type
        event_df.update(aggregated_data[object_type])
        
    # FIX END: Reset index to restore original structure
    event_df.reset_index(inplace=True) 

    # Close the connection to the database
    connection.close()

Additional Note: Secondary Error

While getting the import to run at all, I also hit the error previously reported in Issue #19.

In ocpa/objects/log/converter/versions/df_to_ocel.py, the line:

logging.debug(_sample_dict(3, objects))

throws an error during execution. I had to comment this out to reach the object relationship bug described above.
