table: add ability to add DataFile directly#710
Conversation
Otherwise you're forced to arrow directly, or you must pay the price to re-read parts of the data file to compute stats/etc.
zeroshade
left a comment
There was a problem hiding this comment.
Aside from the below comment, this needs tests 😄
| return t.apply(updates, reqs) | ||
| } | ||
|
|
||
| // AddDataFiles adds new data files to a new snapshot. |
There was a problem hiding this comment.
If we're going to allow this, then there's a couple things we need to do:
- We need to validate that the DataFiles being added: they need to have correct Partition spec IDs, valid partition data (if relevant), etc. anything we can verify without having to actually open and read the file.
- Add to the docstring so that it is made crystal clear that it is the responsibility of the caller to ensure that the
DataFileobject being added is valid and that calling this function incorrectly can make the entire snapshot invalid. If we're going to allow a user to shoot themselves in the foot, let's at least warn them that it's a possibility.
|
seems like similar work is started here #723 |
|
yeah, same issue / fix being applied here, so happy to merge into one: #723 |
|
This one has conflicts that need to be addressed. @agaddis02 Does #723 take into account the comment I made in #710 (comment)? We can determine which of the two PRs to prefer |
@zeroshade It does not, I can update it though based on those notes today, I'll ping here when it's updated so y'all can choose. |
|
Thanks folks @agaddis02 I'll close this then and will look forward to your PR 😃 |
Otherwise you're forced to arrow directly, or you must pay the price to
re-read parts of the data file to compute stats/etc.