
feat(table): add support for merge-on-read delete #721

Open
alexandre-normand wants to merge 1 commit into apache:main from alexandre-normand:alex.normand/merge-on-read-delete

Conversation


@alexandre-normand alexandre-normand commented Feb 12, 2026


This adds support for merge-on-read deletes. It offers an alternative to copy-on-write by generating position delete files instead of rewriting existing data files.

I'm not very confident in the elegance of my solution since I'm still new to the internals of iceberg-go, but the high-level approach is:

  • Reuse the classification code from the existing delete implementation to split affected files into fully dropped files vs. files with partial deletes.
  • Reuse the Arrow scanning facilities to filter records from the data files with partial deletes and emit position delete records with file path and position.
    • This is done by reusing the pipeline code and making the first pipeline stage one that enriches the RecordBatch with the file path and position, before the original position is lost to filtering.
    • After filtering, the RecordBatch is projected to the position delete schema (i.e. the original schema fields are dropped).
  • Once we have the filtered PositionDelete records that need to be emitted, we reuse the record-to-file writing code to generate the position delete files.
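The enrich-filter-project flow above can be sketched with plain slices standing in for Arrow RecordBatches. The names `row`, `posDelete`, and `emitPositionDeletes` below are hypothetical stand-ins; the real implementation works on `arrow.Record` values through the pipeline's stages:

```go
package main

import "fmt"

// row stands in for a single record read from a data file.
type row struct{ value int }

// posDelete mirrors the position-delete schema: (file_path, pos).
type posDelete struct {
	FilePath string
	Pos      int64
}

// emitPositionDeletes tags each record with its original position
// *before* filtering, so the position survives when non-matching rows
// are shed, then projects the survivors down to (file_path, pos).
// matches reports whether the delete predicate hits a row.
func emitPositionDeletes(filePath string, rows []row, matches func(row) bool) []posDelete {
	var deletes []posDelete
	for i, r := range rows {
		// Stage 1: enrich with file path and original position i.
		// Stage 2: filter — keep only rows the delete predicate hits.
		if matches(r) {
			// Stage 3: project to the position-delete schema.
			deletes = append(deletes, posDelete{FilePath: filePath, Pos: int64(i)})
		}
	}
	return deletes
}

func main() {
	rows := []row{{1}, {2}, {3}, {4}}
	dels := emitPositionDeletes("s3://bucket/data/f1.parquet", rows,
		func(r row) bool { return r.value%2 == 0 })
	fmt.Println(dels)
}
```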

Testing

Integration tests were added to exercise the partitioned and unpartitioned paths. The data is chosen so that the tests actually produce a position delete file, rather than taking the quick path that drops an entire file because all of its records are deleted.

Indirect fixes

While working on this change and adding tests for partitioned table deletions, I realized that the manifest evaluation was not built correctly when the filter touched a field that is part of a partition spec. It needs to use code similar to the scan path: build projections and a manifest evaluator per partition spec id. This is fixed in this PR, but it technically also applies to the copy-on-write and overwrite paths, so the fix goes beyond the scope of merge-on-read.
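The per-spec-id evaluator construction described here can be sketched as a small cache. `evaluator` and `buildEvaluator` below are hypothetical stand-ins for the real filter projection and ManifestEvaluator construction:

```go
package main

import "fmt"

// evaluator decides whether a manifest's partition summaries may
// contain rows matching the row filter.
type evaluator func(partitionValue int) bool

// buildEvaluator stands in for projecting the row filter onto a
// partition spec and building a manifest evaluator for it.
func buildEvaluator(specID int) evaluator {
	return func(v int) bool { return v%(specID+1) == 0 }
}

// evaluatorFor builds one evaluator per partition spec id and caches
// it, mirroring what the scan path does when evaluating manifests.
func evaluatorFor(cache map[int]evaluator, specID int) evaluator {
	if e, ok := cache[specID]; ok {
		return e
	}
	e := buildEvaluator(specID)
	cache[specID] = e
	return e
}

func main() {
	cache := map[int]evaluator{}
	fmt.Println(evaluatorFor(cache, 1)(4)) // prints "true": 4 % 2 == 0
}
```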

@alexandre-normand alexandre-normand force-pushed the alex.normand/merge-on-read-delete branch 5 times, most recently from 5079248 to 114fc57 Compare February 12, 2026 23:42
```diff
 var PositionalDeleteSchema = NewSchema(0,
 	NestedField{ID: 2147483546, Type: PrimitiveTypes.String, Name: "file_path", Required: true},
-	NestedField{ID: 2147483545, Type: PrimitiveTypes.Int32, Name: "pos", Required: true},
+	NestedField{ID: 2147483545, Type: PrimitiveTypes.Int64, Name: "pos", Required: true},
```

The spec says that pos is a long so I updated this to be Int64 rather than Int32.

@alexandre-normand alexandre-normand force-pushed the alex.normand/merge-on-read-delete branch 5 times, most recently from c9e30c4 to a7a7ce6 Compare February 15, 2026 06:45
@alexandre-normand alexandre-normand marked this pull request as ready for review February 15, 2026 06:46
@alexandre-normand alexandre-normand force-pushed the alex.normand/merge-on-read-delete branch from a7a7ce6 to 6d41651 Compare February 16, 2026 19:35

@laskoviymishka laskoviymishka left a comment


Overall, this looks correct (I mostly compared it with Iceberg Java), but I think a couple of things need attention:

  • panic/recover is used in normal write-path error handling;
  • position-delete fanout needs additional tests;
  • missing focused local unit tests for invariant-heavy code (enrichRecordsWithPosDeleteFields, position-delete fanout).

Regarding the API change: the ToDataFile change is internal (package scope), but the ManifestWriter.ToManifestFile signature change is public and should be explicitly called out in the compatibility/release notes.

I would advocate for not changing this API and instead adding a new one with the extra argument, but this decision, in my opinion, is not critical.


```go
currentSpec, err := meta.CurrentSpec()
if err != nil || currentSpec == nil {
	panic(fmt.Errorf("%w: cannot write files without a current spec", err))
```

Can we avoid panic/recover for expected failures here?
CurrentSpec() and schema conversion errors are regular error cases, and returning an error iterator directly would be clearer and safer than panicking. Also, the recover block only handles string and error; any other panic type could produce yield(nil, nil), which masks the failure.

```go
// enrichRecordsWithPosDeleteFields enriches a RecordBatch with the columns declared in
// PositionalDeleteArrowSchema so that, during the pipeline filtering stages that shed
// filtered-out records, we still have a way to preserve the original position of those records.
func enrichRecordsWithPosDeleteFields(ctx context.Context, filePath iceberg.DataFile) recProcessFn {
```

Please add focused unit tests for enrichRecordsWithPosDeleteFields:

  • monotonically increasing pos across multiple input batches
  • exact appended schema/column ordering
  • empty batch behavior
  • failure mode if required fields are missing from PositionalDeleteArrowSchema.
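The first invariant on that list (monotonically increasing pos across batches) can be isolated in a tiny helper; `nextPositions` is a hypothetical function, since the real code computes the pos column on Arrow batches:

```go
package main

import "fmt"

// nextPositions computes the pos column for one batch, carrying a
// running offset across batches so positions stay monotonically
// increasing over the whole data file.
func nextPositions(startPos int64, batchLen int) ([]int64, int64) {
	positions := make([]int64, batchLen)
	for i := range positions {
		positions[i] = startPos + int64(i)
	}
	return positions, startPos + int64(batchLen)
}

func main() {
	var pos int64
	p1, pos := nextPositions(pos, 3)
	p2, pos := nextPositions(pos, 2)
	fmt.Println(p1, p2, pos) // prints "[0 1 2] [3 4] 5"
}
```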


```diff
-func (w *ManifestWriter) ToManifestFile(location string, length int64) (ManifestFile, error) {
+func (w *ManifestWriter) ToManifestFile(location string, length int64, content ManifestContent) (ManifestFile, error) {
```

I think the table/internal.DataFileStatistics.ToDataFile signature change is acceptable (it's internal, so who cares?), but ManifestWriter.ToManifestFile is externally visible and now requires a new content argument.

This is a backward-incompatible change, so two questions:

  1. Is it really needed?
  2. If it is needed, we need to state it in the changelog.
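One way to preserve compatibility, as the review suggests: keep the old signature as a wrapper that defaults to data content, and put the extra argument on a new method. A toy sketch with simplified types (`ToManifestFileWithContent` is a hypothetical name, not the PR's actual API):

```go
package main

import "fmt"

// ManifestContent distinguishes data manifests from delete manifests.
type ManifestContent int

const (
	ManifestContentData ManifestContent = iota
	ManifestContentDeletes
)

// ManifestFile and ManifestWriter are drastically simplified stand-ins.
type ManifestFile struct {
	Location string
	Length   int64
	Content  ManifestContent
}

type ManifestWriter struct{}

// New method carries the extra content argument.
func (w *ManifestWriter) ToManifestFileWithContent(location string, length int64, content ManifestContent) (ManifestFile, error) {
	return ManifestFile{Location: location, Length: length, Content: content}, nil
}

// Old signature preserved for existing callers; defaults to data content.
func (w *ManifestWriter) ToManifestFile(location string, length int64) (ManifestFile, error) {
	return w.ToManifestFileWithContent(location, length, ManifestContentData)
}

func main() {
	var w ManifestWriter
	mf, _ := w.ToManifestFile("m1.avro", 123)
	fmt.Println(mf.Content == ManifestContentData) // prints "true"
}
```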

```go
// positionDeletePartitionedFanoutWriter distributes Arrow position delete records across
// multiple partitions based on a partition specification, writing data to separate delete
// files for each partition using a fanout pattern with configurable parallelism.
type positionDeletePartitionedFanoutWriter struct {
```

Could we add unit tests for positionDeletePartitionedFanoutWriter covering:

  • mixed file_path values in one batch (assert/handle explicitly)
  • missing partitionDataByFilePath entry (error instead of silent nil)
  • empty batch behavior and cancellation path.

Also, consider avoiding defer record.Release() inside the loop to prevent delayed releases in long runs.
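The defer-in-loop concern can be illustrated with a toy reference-counted record; `record` and `processAll` below are hypothetical stand-ins for Arrow's Retain/Release semantics:

```go
package main

import "fmt"

// record mimics an Arrow record with manual reference counting.
type record struct{ released *int }

func (r record) Release() { *r.released++ }

// processAll releases each record as soon as it is consumed. A defer
// inside the loop would instead pin every record until the function
// returns, delaying releases over a long run.
func processAll(records []record) {
	for _, rec := range records {
		// ... partition and write rec ...
		rec.Release()
	}
}

func main() {
	released := 0
	processAll([]record{{&released}, {&released}, {&released}})
	fmt.Println(released) // prints "3"
}
```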
