Skip to content

Core: Rebase replace transaction onto refreshed metadata on concurrent commit#16975

Open
Kurtiscwright wants to merge 3 commits into
apache:mainfrom
Kurtiscwright:replace-seq-reuse-repro
Open

Core: Rebase replace transaction onto refreshed metadata on concurrent commit#16975
Kurtiscwright wants to merge 3 commits into
apache:mainfrom
Kurtiscwright:replace-seq-reuse-repro

Conversation

@Kurtiscwright

@Kurtiscwright Kurtiscwright commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Fixes #16232
Rebased on PR #16975

Replacing a table replaces its schema, partitioning, and data, but it keeps the
table's history instead of deleting the table and making a new one. The old code
didn't do this when another commit landed first. While a replace was being
prepared, a second commit could update the table. The replace then reloaded the
latest table state to pass the catalog's conflict check, but still wrote out the
changes it had prepared at the start. The second commit's snapshots disappeared
from history, and if that commit had changed the schema, the replace could end
up pointing columns at IDs that no longer matched the data files.

Now, when the table has changed, the replace is re-applied on top of the latest
table state: the new schema, partitioning, sort order, location, and properties
are kept, but they sit on top of the other commit's snapshots, and the replace's
own new data is added after it. If the table no longer exists when the replace
commits (for example, it was dropped in the meantime), there is nothing to build
on, so the replace just creates the table fresh.

Add TableMetadata.buildReplacementPreservingIds. It keeps the field IDs the
replace schema already has, instead of matching columns up by name and handing
out new IDs. Matching by name can move a column to a different ID than its data
files were written with, which makes that data unreadable.

Two design notes:

  1. The replace gets its own re-apply path rather than reusing
    applyUpdates, because a replace's schema and partitioning live in the prepared
    metadata, not in the list of pending updates, so replaying only those updates
    would drop the replacement and leave the old schema in place.

  2. It uses buildReplacementPreservingIds rather than buildReplacement
    so a concurrent schema change can't shift the replace's field IDs away
    from its data files.

Add tests for the rebuilt replace, kept history, stable field IDs, and the
dropped-table case. Two REST-only tests used to expect a replace with an
unchanged schema or partitioning to fail on a spurious ID check; update them to
expect it to succeed.

Kurtis Wright and others added 2 commits June 26, 2026 17:10
…t commit

When a concurrent commit lands after a replace transaction is staged,
commitReplaceTransaction advanced base to the refreshed metadata to satisfy the
optimistic lock but committed the stale staged metadata. This dropped the
concurrent commit's snapshots from history and reused its sequence number,
violating sequence-number monotonicity. A replace is last-writer-wins on the
current schema, spec, and data, but it is not a drop-and-recreate: the table
must keep its history.

Rebuild the replacement on the refreshed base and replay the staged updates so
the concurrent commit's history is preserved and the replacement snapshot's
sequence number is re-derived from the refreshed base. Add
TableMetadata.buildReplacementPreservingIds, which keeps the staged schema's
field IDs instead of re-deriving them by name, so a concurrent schema change
cannot shift the IDs the staged data files were written against.

Add tests covering the rebase, history preservation, and field-id stability.
Update the two server-side conflict tests in CatalogTests, which previously
asserted a spurious last-assigned-id requirement failure, to assert the replace
succeeds as last-writer-wins.
Co-authored-by: Sreesh Maheshwar <maheshwarsreesh@gmail.com>
@Kurtiscwright Kurtiscwright force-pushed the replace-seq-reuse-repro branch from 44eb85c to f19a18c Compare June 26, 2026 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Core: REPLACE TABLE transaction silently overwrites concurrent committed changes

1 participant