Merged
3 changes: 2 additions & 1 deletion eval_protocol/data_loader/models.py
@@ -111,7 +111,7 @@ def _process_variant(self, result: DataLoaderResult) -> DataLoaderResult:

     def _apply_metadata(self, result: DataLoaderResult, original_count: int, processed_count: int) -> None:
         """Apply metadata to all rows in the result."""
-        for row in result.rows:
+        for idx, row in enumerate(result.rows):
             if row.input_metadata.dataset_info is None:
                 row.input_metadata.dataset_info = {}

@@ -126,3 +126,4 @@ def _apply_metadata(self, result: DataLoaderResult, original_count: int, process
             # Apply row counts
             row.input_metadata.dataset_info["data_loader_num_rows"] = original_count
             row.input_metadata.dataset_info["data_loader_num_rows_after_preprocessing"] = processed_count
+            row.input_metadata.dataset_info["data_loader_row_idx"] = idx

Bug: Row index added after preprocessing instead of before

data_loader_row_idx is taken from enumerating result.rows after preprocessing, but the PR aims to record the original example index. When preprocess_fn filters rows, the surviving rows are renumbered (e.g., original rows 0, 2, 4 get indices 0, 1, 2), so their original positions are lost. To capture the original indices, enumerate the rows before preprocessing in _process_variant and carry each index through the preprocessing step.
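
A minimal sketch of the suggested ordering, not the project's actual code: `Row`, `tag_original_indices`, and `load` are hypothetical stand-ins for the real row type and for `_process_variant` / `_apply_metadata`. It also assumes the preprocessing function returns the same row objects it received (filtered or reordered) rather than fresh copies; otherwise the index would have to be copied across explicitly.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Row:
    # Hypothetical stand-in for the real row type; only dataset_info is modeled.
    value: int
    dataset_info: Dict[str, int] = field(default_factory=dict)


def tag_original_indices(rows: List[Row]) -> None:
    """Record each row's position before any preprocessing runs."""
    for idx, row in enumerate(rows):
        row.dataset_info["data_loader_row_idx"] = idx


def load(
    rows: List[Row],
    preprocess_fn: Optional[Callable[[List[Row]], List[Row]]] = None,
) -> List[Row]:
    original_count = len(rows)
    # Tag first, so filtering cannot renumber the rows.
    tag_original_indices(rows)
    processed = preprocess_fn(rows) if preprocess_fn else rows
    # Row counts can still be applied after preprocessing.
    for row in processed:
        row.dataset_info["data_loader_num_rows"] = original_count
        row.dataset_info["data_loader_num_rows_after_preprocessing"] = len(processed)
    return processed


# Keep only even values: original rows 0, 2, 4 survive.
rows = [Row(v) for v in range(5)]
kept = load(rows, lambda rs: [r for r in rs if r.value % 2 == 0])
kept_indices = [r.dataset_info["data_loader_row_idx"] for r in kept]
print(kept_indices)
```

With this ordering the surviving rows keep indices 0, 2, 4 instead of being renumbered 0, 1, 2.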
