Skip to content

fix(table): validate pos delete position chunks#1337

Open
fallintoplace wants to merge 1 commit into
apache:mainfrom
fallintoplace:fix/pos-delete-pos-type-check
Open

fix(table): validate pos delete position chunks#1337
fallintoplace wants to merge 1 commit into
apache:mainfrom
fallintoplace:fix/pos-delete-pos-type-check

Conversation

@fallintoplace

Copy link
Copy Markdown
Contributor

Summary

  • replace unchecked positional-delete pos casts with a shared validated collector
  • return ErrInvalidSchema when a delete file exposes a non-int64 pos chunk
  • add regression coverage for malformed pos chunk types

Why

The Arrow scanner assumed every positional-delete pos chunk was *array.Int64 and cast it directly in both scan paths. If a malformed delete file carried the wrong Arrow type, the scanner panicked instead of returning a schema error.

Testing

  • go test ./table -run 'Test(GroupPosDeletesByFilePathSupportsStringLayouts|GroupPosDeletesByFilePathRejectsUnsupportedFilePathLayout|CollectPosDeletePositionsRejectsUnsupportedPosType|ProcessPositionalDeletesAcrossBatches)$' -count=1
  • go test ./table -count=1

@fallintoplace fallintoplace requested a review from zeroshade as a code owner June 27, 2026 17:09

@tanmayrauth tanmayrauth left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice catch turning the blind pos cast into a clean schema error. Just one nit and a question, details inline.

Comment thread table/arrow_scanner.go
func collectPosDeletePositions(positionalDeletes positionDeletes) (set[int64], error) {
deletes := set[int64]{}
for _, chunk := range positionalDeletes {
if chunk == nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the nil-chunk guard meant purely as defensiveness? groupPosDeletesByFilePath only ever stores retained, non-nil chunks into the map, so this branch shouldn't be reachable from the scan path — just want to make sure I'm not missing a caller that can pass a nil chunk in.

Comment thread table/arrow_scanner.go
return nil, fmt.Errorf("%w: unsupported pos column type %s in position delete file",
iceberg.ErrInvalidSchema, chunk.DataType())
}
for _, arr := range chunk.Chunks() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inner arr.(*array.Int64) assertion is unreachable here — once the outer chunk.DataType().ID() == arrow.INT64 check at line 401 passes, every chunk of that column is already *array.Int64, so the
!ok branch and its separate "unsupported pos chunk array type" message can't fire. Not a problem to leave as defensive code, but you could drop the , ok check (or fold the two error messages into one) to
keep it tight.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants