Core: Rebase replace transaction onto refreshed metadata on concurrent commit#16975
Open
Kurtiscwright wants to merge 3 commits into
Open
Core: Rebase replace transaction onto refreshed metadata on concurrent commit#16975Kurtiscwright wants to merge 3 commits into
Kurtiscwright wants to merge 3 commits into
Conversation
…t commit When a concurrent commit lands after a replace transaction is staged, commitReplaceTransaction advanced base to the refreshed metadata to satisfy the optimistic lock but committed the stale staged metadata. This dropped the concurrent commit's snapshots from history and reused its sequence number, violating sequence-number monotonicity. A replace is last-writer-wins on the current schema, spec, and data, but it is not a drop-and-recreate: the table must keep its history. Rebuild the replacement on the refreshed base and replay the staged updates so the concurrent commit's history is preserved and the replacement snapshot's sequence number is re-derived from the refreshed base. Add TableMetadata.buildReplacementPreservingIds, which keeps the staged schema's field IDs instead of re-deriving them by name, so a concurrent schema change cannot shift the IDs the staged data files were written against. Add tests covering the rebase, history preservation, and field-id stability. Update the two server-side conflict tests in CatalogTests, which previously asserted a spurious last-assigned-id requirement failure, to assert the replace succeeds as last-writer-wins.
Co-authored-by: Sreesh Maheshwar <maheshwarsreesh@gmail.com>
44eb85c to
f19a18c
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #16232
Rebased on PR #16975
Replacing a table replaces its schema, partitioning, and data, but it keeps the
table's history instead of deleting the table and making a new one. The old code
didn't do this when another commit landed first. While a replace was being
prepared, a second commit could update the table. The replace then reloaded the
latest table state to pass the catalog's conflict check, but still wrote out the
changes it had prepared at the start. The second commit's snapshots disappeared
from history, and if that commit had changed the schema, the replace could end
up pointing columns at IDs that no longer matched the data files.
Now, when the table has changed, the replace is re-applied on top of the latest
table state: the new schema, partitioning, sort order, location, and properties
are kept, but they sit on top of the other commit's snapshots, and the replace's
own new data is added after it. If the table no longer exists when the replace
commits (for example, it was dropped in the meantime), there is nothing to build
on, so the replace just creates the table fresh.
Add TableMetadata.buildReplacementPreservingIds. It keeps the field IDs the
replace schema already has, instead of matching columns up by name and handing
out new IDs. Matching by name can move a column to a different ID than its data
files were written with, which makes that data unreadable.
Two design notes:
The replace gets its own re-apply path rather than reusing
applyUpdates, because a replace's schema and partitioning live in the prepared
metadata, not in the list of pending updates, so replaying only those updates
would drop the replacement and leave the old schema in place.
It uses buildReplacementPreservingIds rather than buildReplacement
so a concurrent schema change can't shift the replace's field IDs away
from its data files.
Add tests for the rebuilt replace, kept history, stable field IDs, and the
dropped-table case. Two REST-only tests used to expect a replace with an
unchanged schema or partitioning to fail on a spurious ID check; update them to
expect it to succeed.