Skip to content

feat(transaction): add OverwriteAction with CoW delete support#76

Merged
gbrgr merged 4 commits into
mainfrom
gb/overwrite-action
May 28, 2026
Merged

feat(transaction): add OverwriteAction with CoW delete support#76
gbrgr merged 4 commits into
mainfrom
gb/overwrite-action

Conversation

@gbrgr
Copy link
Copy Markdown
Collaborator

@gbrgr gbrgr commented May 28, 2026

Summary

Changes

  • crates/iceberg/src/transaction/overwrite.rs (new) — OverwriteAction builder: add_data_files(), delete_data_files(), commit()
  • crates/iceberg/src/transaction/mod.rs — exposes Transaction::overwrite()
  • crates/iceberg/src/spec/manifest/writer.rsManifestWriter::add_deleted_entry() for writing deleted manifest entries
  • crates/iceberg/src/transaction/snapshot.rsSnapshotProducer::snapshot_id() getter
  • crates/iceberg/src/spec/snapshot_summary.rs — panic fix (our addition on top of the cherry-pick)

Motivation

Needed to support an overwrite/replace-all-files use case in RustyIceberg.jl, where a Julia caller can atomically replace all existing Parquet files in a table with a new set.

Test plan

  • cargo check -p iceberg passes
  • Existing unit tests in crates/iceberg pass
  • New tests in overwrite.rs cover: empty-file errors, snapshot properties, partition validation, manifest structure, end-to-end catalog commit with deletions

🤖 Generated with Claude Code

glitchy and others added 3 commits May 28, 2026 07:56
truncate_table_summary() returns Result but the call site used .unwrap(),
causing a panic on any overwrite of a non-empty table if the previous
snapshot summary has unparseable property values.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The SnapshotProducer precondition check blocked snapshots with only
deleted files and no added files or snapshot properties. A delete-only
Overwrite snapshot is valid per the Iceberg spec — the existing manifests
are rewritten with the target entries marked as ManifestStatus::Deleted.

Relax the check to also allow the case when deleted_data_files is
non-empty, so callers can clear a table without simultaneously adding
new files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@gbrgr gbrgr requested a review from robertbuessow May 28, 2026 07:21
Copy link
Copy Markdown

@robertbuessow robertbuessow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't say I understand the changes but there seem to be no tests? Does it make sense to add some later?

@gbrgr
Copy link
Copy Markdown
Collaborator Author

gbrgr commented May 28, 2026

Can't say I understand the changes but there seem to be no tests? Does it make sense to add some later?

There are tests in overwrite.rs

@gbrgr gbrgr merged commit e173637 into main May 28, 2026
20 checks passed
@gbrgr gbrgr deleted the gb/overwrite-action branch May 28, 2026 14:09
@hall-alex
Copy link
Copy Markdown

Had Claude do a review. Feel free to ignore -- pasting just in case you find anything useful when browsing over it.


Review of #76 — OverwriteAction cherry-pick

Cherry-pick fidelity vs apache/iceberg-rust#2185: byte-identical for overwrite.rs, snapshot.rs, append.rs. manifest/writer.rs differs only cosmetically — no semantic divergence. The two follow-up commits (b2fdde4c panic → ?, 50b335f3 allow delete-only) sit cleanly on top.

On tests: 5 tests do exist in crates/iceberg/src/transaction/overwrite.rs (inside mod tests, easy to miss when skimming the diff) and all pass: cargo test -p iceberg --lib transaction::overwrite::tests. The upstream reviewer's "no tests" comment was incorrect.


1. Partition-spec evolution — rewrite_manifest always uses the default spec

overwrite.rs:227 hardcodes table.metadata().default_partition_spec() for every rewritten manifest, but manifest_file.partition_spec_id may identify an older spec — the data-file partition tuples were encoded under that one. For tables that have evolved partitioning, rewriting under the wrong spec corrupts partition metadata. This also exists upstream; worth raising there.

Claude Code prompt:
In crates/iceberg/src/transaction/overwrite.rs, rewrite_manifest passes table.metadata().default_partition_spec() into ManifestWriterBuilder. This is wrong for tables with partition-spec evolution — the rewritten manifest should use the spec the original manifest was written under. Change it to look up table.metadata().partition_spec_by_id(manifest_file.partition_spec_id) (returning an error if missing). Add a unit test that creates a table, fast-appends a file under spec_id=0, evolves the spec, then runs OverwriteAction deleting the original file, and asserts the rewritten manifest reports the original partition_spec_id. Then prepare a one-paragraph upstream report against apache/iceberg-rust#2185.

2. Latent panic in update_snapshot_summaries, newly reachable via delete-only overwrite

In transaction/snapshot.rs around line 407, previous_snapshot is looked up via table_metadata.snapshot_by_id(self.snapshot_id) — but self.snapshot_id is the new (uncommitted) snapshot, so this always returns None. previous_summary therefore becomes None, the truncate_table_summary arm at snapshot_summary.rs:350 never fires, and update_totals runs with previous_total = 0. For a delete-only overwrite (now allowed by 50b335f3), removed > added then underflows u64 → panic. The b2fdde4c unwrap-→-? change is correct in spirit but addresses a path that's never taken; the actual panic is downstream. The existing test_overwrite_with_deleted_files happens to add 1 and delete 1 (net 0), masking the issue.

Claude Code prompt:
In crates/iceberg/src/transaction/snapshot.rs around line 407, the lookup table_metadata.snapshot_by_id(self.snapshot_id) always returns None because self.snapshot_id is the new (uncommitted) snapshot's id. The intent is to fetch the parent of the new snapshot — which, before commit, is just table.metadata().current_snapshot(). Replace the chain accordingly and drop the now-dead .and_then(parent_snapshot_id) hop. Then add a test in transaction/overwrite.rs that fast-appends 3 files, then runs overwrite().delete_data_files(vec![one_file]) with no adds, and asserts the commit succeeds AND the snapshot summary reports total-data-files=2, deleted-data-files=1. Without the fix this test should panic via u64 underflow — verify both with and without.

3. Delete-only overwrite on a table with no current snapshot

With no current snapshot, existing_manifest returns Ok(vec![]), validate_added_data_files is empty, and manifest_file()'s precondition passes because deleted_data_files is non-empty (newly enabled by 50b335f3). Result: a snapshot with an empty manifest list but deleted-data-files=N in the summary. No panic, but the snapshot is a misleading no-op.

Claude Code prompt:
In crates/iceberg/src/transaction/overwrite.rs, OverwriteAction::commit lets deletes "succeed" against a table with no current snapshot — the resulting snapshot has an empty manifest list but the summary claims N deletes happened. Add an early validation in OverwriteAction::commit (before constructing the producer) that returns ErrorKind::PreconditionFailed if deleted_data_files is non-empty and table.metadata().current_snapshot() is None. Add a test in the same file's mod tests that uses make_v2_minimal_table() (no prior snapshot), calls tx.overwrite().delete_data_files(vec![data_file]), and asserts the commit returns an error.

4. Asymmetry: subsequent overwrite drops the delete-only manifest that FastAppend preserves

The cherry-pick deliberately widened FastAppendOperation::existing_manifest to include has_deleted_files() so a delete-only manifest survives a later fast-append (the new test asserts this). But OverwriteOperation::existing_manifest still filters with has_added_files() || has_existing_files() in both branches — the early-return at ~line 172 and the continue at ~line 179. After overwrite A produces a delete-only manifest, overwrite B drops it. Live-data semantics aren't broken, but time-travel auditability of the most recent delete is lost on overwrite chains, and it's inconsistent with FastAppend.

Claude Code prompt:
In crates/iceberg/src/transaction/overwrite.rs, OverwriteOperation::existing_manifest filters out delete-only manifests (matches !has_added_files() && !has_existing_files()), while the sibling FastAppendOperation::existing_manifest was just widened in this PR to include has_deleted_files(). Mirror the same condition in both code paths of OverwriteOperation::existing_manifest (the empty-deletes early return at ~line 172, and the continue at ~line 179). Add a test that does: append → overwrite (delete A, add B) → overwrite (delete B, add C), and assert the final snapshot's manifest list still contains the originally-deleted-A entry.

5. Test gap: "manifest has no matching deletes" passthrough

overwrite.rs:197 (result.push(manifest_file.clone())) is never hit by current tests — the prior snapshot in test_overwrite_with_deleted_files has exactly one manifest containing the deleted file, so every iteration goes through rewrite_manifest. A two-manifest fixture would cover the passthrough.

Claude Code prompt:
In crates/iceberg/src/transaction/overwrite.rs's mod tests, add test_overwrite_unaffected_manifest_passthrough. Use new_memory_catalog + make_v3_minimal_table_in_catalog. Fast-append file A, commit; fast-append file B, commit (now 2 manifests). Then overwrite().delete_data_files(vec![A]), commit. Assert the resulting snapshot's manifest list contains (a) a rewritten manifest where A is Deleted, and (b) the original manifest containing B passed through unchanged — compare manifest_file.manifest_path against the pre-overwrite manifest's path. This proves the passthrough branch is exercised.

gbrgr added a commit to RelationalAI/RustyIceberg.jl that referenced this pull request May 29, 2026
## Summary

Adds atomic overwrite snapshot support to RustyIceberg.jl, enabling
callers to replace all (or a subset of) existing Parquet files with a
new set in a single Iceberg `Operation::Overwrite` snapshot.

**Depends on**: RelationalAI/iceberg-rust#76 (cherry-pick of upstream
[apache/iceberg-rust#2185](apache/iceberg-rust#2185),
which adds `OverwriteAction` to `iceberg-rust`).

## Changes

### FFI (`iceberg_rust_ffi/src/transaction.rs`)
- `IcebergOverwriteAction` — accumulates added + deleted `DataFile`
lists
- `iceberg_overwrite_action_new` / `_free`
- `iceberg_overwrite_action_add_data_files` — move new files into action
- `iceberg_overwrite_action_delete_data_files` — move files-to-delete
into action
- `iceberg_overwrite_action_apply` — calls
`Transaction::overwrite().apply()`
- `iceberg_table_list_data_files` — async walk of manifest list to
collect all live `DataFile` records from the current snapshot

### Julia bindings (`src/transaction.jl`)
- `OverwriteAction` struct + constructor / `free_overwrite_action!`
- `add_data_files(action, files)` / `delete_data_files(action, files)`
- `apply(action, tx)` / `with_overwrite(f, tx)` convenience helper
- `list_data_files(table) -> DataFiles`
- All new symbols exported from `RustyIceberg`

### Tests (`test/overwrite_tests.jl`)
Self-contained, no Docker — all tests use `mktempdir` +
`catalog_create_memory`:
- OverwriteAction lifecycle (new / free / double-free)
- `list_data_files` on empty table
- `list_data_files` after append
- Overwrite replaces **all** existing files
- Overwrite deletes only **explicitly listed** files; others survive
intact
- Overwrite add-only (no deletes) produces a new snapshot
- Two sequential overwrites converge correctly
- Error handling: freed action, null DataFiles, committed (consumed)
transaction

## Usage

```julia
# Replace all existing files atomically
old_files = list_data_files(table)
new_files = RustyIceberg.with_data_file_writer(table) do w
    write(w, new_data)
end
updated_table = with_transaction(table, catalog) do tx
    with_overwrite(tx) do action
        add_data_files(action, new_files)
        delete_data_files(action, old_files)
    end
end
```

## Test plan
- [ ] `make run-containers && make test` passes (all overwrite testsets
green)
- [ ] Existing test suite unaffected (27875 pre-existing tests still
pass)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Richard Gankema <richardgankema@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants