feat(transaction): add OverwriteAction with CoW delete support #76
truncate_table_summary() returns a Result, but the call site used .unwrap(), which panics on any overwrite of a non-empty table if the previous snapshot summary has unparseable property values.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
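The `.unwrap()` to `?` change described in the commit above can be illustrated with a minimal, self-contained sketch. The helper names `parse_total_records` and `previous_total` are hypothetical stand-ins, not the functions in `snapshot_summary.rs`; only the error-propagation pattern is the point.

```rust
use std::collections::HashMap;
use std::num::ParseIntError;

// Hypothetical stand-in for a summary helper that parses a numeric property.
// A missing key is treated as zero; a malformed value is an error.
fn parse_total_records(summary: &HashMap<String, String>) -> Result<u64, ParseIntError> {
    match summary.get("total-records") {
        Some(raw) => raw.parse::<u64>(),
        None => Ok(0),
    }
}

// Before the fix, the call site would have looked like
// `parse_total_records(summary).unwrap()`, which panics on a malformed value.
// With `?`, the error is returned to the caller instead.
fn previous_total(summary: &HashMap<String, String>) -> Result<u64, ParseIntError> {
    let total = parse_total_records(summary)?;
    Ok(total)
}
```

With this shape, a bad property value surfaces as an `Err` that the commit path can report, rather than aborting the process.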
The SnapshotProducer precondition check blocked snapshots that had only deleted files and no added files or snapshot properties. A delete-only Overwrite snapshot is valid per the Iceberg spec: the existing manifests are rewritten with the target entries marked as ManifestStatus::Deleted. Relax the check so that it also passes when deleted_data_files is non-empty, which lets callers clear a table without simultaneously adding new files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
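A rough sketch of the relaxed precondition, under the assumption that the producer tracks three collections. The `PendingSnapshot` struct and `validate` function are hypothetical and simplified; the real `SnapshotProducer` fields and error type may differ.

```rust
use std::collections::HashMap;

// Hypothetical, simplified view of what a snapshot producer has accumulated.
struct PendingSnapshot {
    added_data_files: Vec<String>,
    deleted_data_files: Vec<String>,
    snapshot_properties: HashMap<String, String>,
}

// Rejects a snapshot only when it would change nothing at all.
// The earlier check ignored `deleted_data_files`, so a delete-only
// overwrite was rejected even though it is a meaningful change.
fn validate(pending: &PendingSnapshot) -> Result<(), String> {
    if pending.added_data_files.is_empty()
        && pending.snapshot_properties.is_empty()
        && pending.deleted_data_files.is_empty()
    {
        return Err("no files to add or delete and no snapshot properties set".to_string());
    }
    Ok(())
}
```

A delete-only snapshot now passes validation, while a completely empty one is still rejected.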
robertbuessow
left a comment
Can't say I understand the changes but there seem to be no tests? Does it make sense to add some later?
There are tests in |
Had Claude do a review. Feel free to ignore -- pasting just in case you find anything useful when browsing over it.

Review of #76: OverwriteAction cherry-pick

Cherry-pick fidelity vs
1. Partition-spec evolution —
## Summary

Adds atomic overwrite snapshot support to RustyIceberg.jl, enabling callers to replace all (or a subset of) existing Parquet files with a new set in a single Iceberg `Operation::Overwrite` snapshot.

**Depends on**: RelationalAI/iceberg-rust#76 (cherry-pick of upstream [apache/iceberg-rust#2185](apache/iceberg-rust#2185), which adds `OverwriteAction` to `iceberg-rust`).

## Changes

### FFI (`iceberg_rust_ffi/src/transaction.rs`)

- `IcebergOverwriteAction`: accumulates added + deleted `DataFile` lists
- `iceberg_overwrite_action_new` / `_free`
- `iceberg_overwrite_action_add_data_files`: move new files into action
- `iceberg_overwrite_action_delete_data_files`: move files-to-delete into action
- `iceberg_overwrite_action_apply`: calls `Transaction::overwrite().apply()`
- `iceberg_table_list_data_files`: async walk of manifest list to collect all live `DataFile` records from the current snapshot

### Julia bindings (`src/transaction.jl`)

- `OverwriteAction` struct + constructor / `free_overwrite_action!`
- `add_data_files(action, files)` / `delete_data_files(action, files)`
- `apply(action, tx)` / `with_overwrite(f, tx)` convenience helper
- `list_data_files(table) -> DataFiles`
- All new symbols exported from `RustyIceberg`

### Tests (`test/overwrite_tests.jl`)

Self-contained, no Docker: all tests use `mktempdir` + `catalog_create_memory`:

- OverwriteAction lifecycle (new / free / double-free)
- `list_data_files` on empty table
- `list_data_files` after append
- Overwrite replaces **all** existing files
- Overwrite deletes only **explicitly listed** files; others survive intact
- Overwrite add-only (no deletes) produces a new snapshot
- Two sequential overwrites converge correctly
- Error handling: freed action, null DataFiles, committed (consumed) transaction

## Usage

```julia
# Replace all existing files atomically
old_files = list_data_files(table)
new_files = RustyIceberg.with_data_file_writer(table) do w
    write(w, new_data)
end
updated_table = with_transaction(table, catalog) do tx
    with_overwrite(tx) do action
        add_data_files(action, new_files)
        delete_data_files(action, old_files)
    end
end
```

## Test plan

- [ ] `make run-containers && make test` passes (all overwrite testsets green)
- [ ] Existing test suite unaffected (27875 pre-existing tests still pass)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Richard Gankema <richardgankema@gmail.com>
Summary
- Cherry-picks `add OverwriteAction with CoW delete support` from upstream apache/iceberg-rust#2185
- Adds `OverwriteAction` to `Transaction`, enabling atomic replacement of data files with an `Operation::Overwrite` snapshot
- Fixes a bug in `update_snapshot_summaries` where `.unwrap()` on `truncate_table_summary` would panic on overwrite of a non-empty table (`.unwrap()` → `?`)

Changes
- `crates/iceberg/src/transaction/overwrite.rs` (new): `OverwriteAction` builder with `add_data_files()`, `delete_data_files()`, `commit()`
- `crates/iceberg/src/transaction/mod.rs`: exposes `Transaction::overwrite()`
- `crates/iceberg/src/spec/manifest/writer.rs`: `ManifestWriter::add_deleted_entry()` for writing deleted manifest entries
- `crates/iceberg/src/transaction/snapshot.rs`: `SnapshotProducer::snapshot_id()` getter
- `crates/iceberg/src/spec/snapshot_summary.rs`: panic fix (our addition on top of the cherry-pick)

Motivation
Needed to support an overwrite/replace-all-files use case in RustyIceberg.jl, where a Julia caller can atomically replace all existing Parquet files in a table with a new set.
Test plan
- `cargo check -p iceberg` passes
- Tests in `crates/iceberg` pass
- Tests in `overwrite.rs` cover: empty-file errors, snapshot properties, partition validation, manifest structure, end-to-end catalog commit with deletions

🤖 Generated with Claude Code