Core: Fix possible position delete spec violation#16939
Core: Fix possible position delete spec violation#16939grantatspothero wants to merge 2 commits into
Conversation
|
I mean this sounds technically correct. But a writer who created such a delete file is probably also violating the spec by writing a delete reference to file it couldn't apply to based on partition boundaries. I assume the reason the check isn't there is because it would be expensive at "addDelete" time to look up the file entry and figure out what it's partition tuple is. |
|
Thanks for confirming. Yes the options to fix the behavior are:
While (1) comes with tradeoffs, (2) seems like a pure improvement over the existing state since it adds spec-correctness with no additional cost. |
Partition filter expressions over incorrect delete file partition values results in incorrect queries.
|
I pushed up another test case showing how the current spec-violating scan planning implementation can cause inconsistent behavior when partition filtering is utilized. If option (2) were implemented, scan planning would be consistent and invalid delete files with incorrect partition values would always be filtered out. |
|
I wouldn't filter them out, I would crash. IMHO that delete file shows the metadata is corrupted. |
|
Crashing is also an option. The risk: crashing when detecting corrupted metadata would be a backwards incompatible change that would brick table scans against existing iceberg tables. I thought filtering out during the scan and logging an error would be safer before potentially breaking existing tables. But I'm fine with either approach. |
|
Counterpoint, those tables are broken. They have deletes in them that shouldn't possibly exist. Who do we think is producing these files and who would be protecting with this fix? |
|
That is fair, I can go ahead with crashing table scans upon detecting invalid metadata. Thanks for the feedback. |
|
Closing this PR, was used to reproduce the issue. Fix is here: #16957 |
Per iceberg spec: https://iceberg.apache.org/spec/#scan-planning
But see the test below, a delete file can be written with an invalid partition value that does not match the referenced data file partition value. Per the spec, since all the conditions are not met I was expecting scan planning to not match the invalid delete file to the data file but it appears scan planning does match the delete file to the data file.
Is this a bug in the scan planning implementation? A bug in the spec? It was definitely surprising behavior.
I was expecting delete file commits which reference data files to validate that the delete file partition matches the data file partition.
Why this matters:
If one wishes to do partition pruning via
DeleteFileIndex#filterPartitions, it is not safe to filter out partitions based on delete file partition because scan planning ignores the delete file partition value when referenced_data_file is not null.