Skip to content

table: add ability to add DataFile directly#710

Closed
rockwotj wants to merge 1 commit intoapache:mainfrom
rockwotj:main
Closed

table: add ability to add DataFile directly#710
rockwotj wants to merge 1 commit intoapache:mainfrom
rockwotj:main

Conversation

@rockwotj
Copy link

@rockwotj rockwotj commented Feb 3, 2026

Otherwise you're forced to arrow directly, or you must pay the price to
re-read parts of the data file to compute stats/etc.

Otherwise you're forced to arrow directly, or you must pay the price to
re-read parts of the data file to compute stats/etc.
Copy link
Member

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from the below comment, this needs tests 😄

return t.apply(updates, reqs)
}

// AddDataFiles adds new data files to a new snapshot.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to allow this, then there's a couple things we need to do:

  1. We need to validate that the DataFiles being added: they need to have correct Partition spec IDs, valid partition data (if relevant), etc. anything we can verify without having to actually open and read the file.
  2. Add to the docstring so that it is made crystal clear that it is the responsibility of the caller to ensure that the DataFile object being added is valid and that calling this function incorrectly can make the entire snapshot invalid. If we're going to allow a user to shoot themselves in the foot, let's at least warn them that it's a possibility.

@subkanthi
Copy link
Contributor

seems like similar work is started here #723

@agaddis02
Copy link

yeah, same issue / fix being applied here, so happy to merge into one: #723

@zeroshade
Copy link
Member

This one has conflicts that need to be addressed. @agaddis02 Does #723 take into account the comment I made in #710 (comment)? We can determine which of the two PRs to prefer

@agaddis02
Copy link

agaddis02 commented Feb 14, 2026

This one has conflicts that need to be addressed. @agaddis02 Does #723 take into account the comment I made in #710 (comment)? We can determine which of the two PRs to prefer

@zeroshade It does not, I can update it though based on those notes today, I'll ping here when it's updated so y'all can choose.

@rockwotj
Copy link
Author

Thanks folks @agaddis02 I'll close this then and will look forward to your PR 😃

@rockwotj rockwotj closed this Feb 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants