Skip to content

Core: Fix possible position delete spec violation#16939

Closed
grantatspothero wants to merge 2 commits into
apache:mainfrom
grantatspothero:gn/reproduceInvalidPositionDeletePartition
Closed

Core: Fix possible position delete spec violation#16939
grantatspothero wants to merge 2 commits into
apache:mainfrom
grantatspothero:gn/reproduceInvalidPositionDeletePartition

Conversation

@grantatspothero

@grantatspothero grantatspothero commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Per iceberg spec: https://iceberg.apache.org/spec/#scan-planning

A position delete file must be applied to a data file when all of the following are true:
The data file's file_path is equal to the delete file's referenced_data_file if it is non-null
The data file's data sequence number is less than or equal to the delete file's data sequence number
The data file's partition (both spec and partition values) is equal [4] to the delete file's partition
There is no deletion vector that must be applied to the data file (when added, such a vector must contain all deletes from existing position delete files)

But see the test below, a delete file can be written with an invalid partition value that does not match the referenced data file partition value. Per the spec, since all the conditions are not met I was expecting scan planning to not match the invalid delete file to the data file but it appears scan planning does match the delete file to the data file.

Is this a bug in the scan planning implementation? A bug in the spec? It was definitely surprising behavior.

I was expecting delete file commits which reference data files to validate that the delete file partition matches the data file partition.

Why this matters:
If one wishes to do partition pruning via DeleteFileIndex#filterPartitions, it is not safe to filter out partitions based on delete file partition because scan planning ignores the delete file partition value when referenced_data_file is not null.

@github-actions github-actions Bot added the core label Jun 23, 2026
@grantatspothero grantatspothero changed the title Reproduce possible spec error Core: Fix possible spec error Jun 23, 2026
@grantatspothero grantatspothero changed the title Core: Fix possible spec error Core: Fix possible position delete spec violation Jun 23, 2026
@RussellSpitzer

Copy link
Copy Markdown
Member

I mean this sounds technically correct. But a writer who created such a delete file is probably also violating the spec by writing a delete reference to file it couldn't apply to based on partition boundaries.

I assume the reason the check isn't there is because it would be expensive at "addDelete" time to look up the file entry and figure out what it's partition tuple is.

@grantatspothero

Copy link
Copy Markdown
Contributor Author

Thanks for confirming.

Yes the options to fix the behavior are:

  1. Commit time validation: Fail fast during commit if an invalid delete file partition is detected. This is the best solution, but as you mentioned is expensive to do at commit time since it requires reading every manifest in the worst case.
  2. Scan time filtering: If a referenced data file partition does not match the delete file partition, then do not associate the delete with the data file during the scan. This is less ideal because invalid commits are still allowed, but it enforces the spec is followed. It should be relatively cheap to implement because at scan time both the delete and data file partition value exist in memory.

While (1) comes with tradeoffs, (2) seems like a pure improvement over the existing state since it adds spec-correctness with no additional cost.

Partition filter expressions over incorrect delete file partition
values results in incorrect queries.
@grantatspothero

Copy link
Copy Markdown
Contributor Author

I pushed up another test case showing how the current spec-violating scan planning implementation can cause inconsistent behavior when partition filtering is utilized.

If option (2) were implemented, scan planning would be consistent and invalid delete files with incorrect partition values would always be filtered out.

@RussellSpitzer

Copy link
Copy Markdown
Member

I wouldn't filter them out, I would crash. IMHO that delete file shows the metadata is corrupted.

@grantatspothero

grantatspothero commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

Crashing is also an option.

The risk: crashing when detecting corrupted metadata would be a backwards incompatible change that would brick table scans against existing iceberg tables.

I thought filtering out during the scan and logging an error would be safer before potentially breaking existing tables. But I'm fine with either approach.

@RussellSpitzer

Copy link
Copy Markdown
Member

Counterpoint, those tables are broken. They have deletes in them that shouldn't possibly exist. Who do we think is producing these files and who would be protecting with this fix?

@grantatspothero

Copy link
Copy Markdown
Contributor Author

That is fair, I can go ahead with crashing table scans upon detecting invalid metadata.

Thanks for the feedback.

@grantatspothero

grantatspothero commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

Closing this PR, was used to reproduce the issue.

Fix is here: #16957

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants