Core: Fix concurrent mutation of deleteFiles during parallel manifest filtering by liucao-dd · Pull Request #16984 · apache/iceberg

liucao-dd · 2026-06-28T00:30:17Z

Summary

Fix a race in ManifestFilterManager where parallel manifest filtering concurrently mutates the shared deleteFiles set, corrupting its backing store.
Make duplicateDeleteCount thread-safe with AtomicInteger.
Add a high-contention regression test for concurrent adds to deleteFiles during a full overwrite across many manifests.

Problem

ManifestFilterManager.filterManifests() filters manifests in parallel via Tasks.range(...).executeWith(workerPoolSupplier.get()). During expression-based deletes, each worker thread calls deleteFiles.add(fileCopy) on a shared instance-level set backed by WrapperSet / LinkedHashSet, which is not thread-safe.

Under heavy concurrent adds (large overwrites spanning many manifests), the backing LinkedHashMap can corrupt. Observed symptoms include ClassCastException: LinkedHashMap$Entry cannot be cast to HashMap$TreeNode, StackOverflowError, and hangs.

duplicateDeleteCount is also incremented from worker threads with a non-atomic ++, which can lose increments.

Fix

Wrap deleteFiles with Collections.synchronizedSet(newFileSet()) so concurrent add/contains calls from worker threads are safe, and convert duplicateDeleteCount to AtomicInteger.

Collections.synchronizedSet synchronizes individual method calls. Iteration is not auto-synchronized, so iterating deleteFiles must remain after the parallel filtering phase (current behavior) or be guarded by synchronized (deleteFiles). This invariant is documented at the field.

Why `Collections.synchronizedSet`

The fix is intentionally minimal and confined to ManifestFilterManager, while preserving the existing semantics of deleteFiles exactly:

Location-based matching. newFileSet() returns a DataFileSet / DeleteFileSet, which match files by file.location() via WrapperSet. A raw concurrent set keyed on DataFile / DeleteFile would use object identity because the file classes do not override equals/hashCode. That would break the explicit deleteFile(...) path: the caller-supplied file instance added to deleteFiles would not match the separate file instance read back from a manifest during filtering.
Insertion order. WrapperSet documents that it maintains insertion order. It is a shared api/ type used outside this path, so re-backing it with a non-ordered concurrent map to fix one caller would silently change that contract for every consumer.

Test plan

./gradlew :iceberg-core:test --tests org.apache.iceberg.TestOverwrite
./gradlew :iceberg-core:test --tests org.apache.iceberg.TestDeleteFiles
./gradlew :iceberg-core:test --tests org.apache.iceberg.TestOverwriteWithValidation
./gradlew :iceberg-core:spotlessApply :iceberg-core:spotlessJavaCheck

testFullOverwriteAcrossManyManifests disables manifest merging and appends 16 manifests of 64 files each, using file locations engineered to share one String.hashCode() so that DataFileSet wrapper hashes all collide into a single bucket. A full overwrite (alwaysTrue()) then forces many worker threads to add to deleteFiles concurrently, concentrating contention in that bucket. Without the fix this reproduces the corruption (ClassCastException / StackOverflowError / hang); with the fix it passes consistently.

AI Disclosure
LLM is used to research codebase standards and draft the regression test that actually fails without the fix, and drafting the PR description.

Seems like since I opened the issue yesterday there are already 2 open PRs attempted to close the issue, hence adding some thoughts:

A deferred-merge alternative, where worker threads collect deleted files into a concurrent structure and the caller merges them into deleteFiles after the parallel phase, is also behavior-preserving: expression-deleted files added to deleteFiles during filtering are not needed by other workers to decide expression matches, and the actual post-phase consumers (filesToBeDeleted() and required-delete validation) run after Tasks.range(...) joins. It is a reasonable design with a lock-free hot path that is has higher throughput potential, but more invasive. This is available in #16980

My PR prefers the synchronized wrapper for its minimal change and exact preservation of WrapperSet semantics; a deferred merge is a sensible alternative if we have proof on the lock contention.

In object-store-backed deployments, each manifest is read from S3/GCS, so millisecond-scale I/O per manifest dominates both the microseconds of CPU and the nanoseconds of lock. So i would expect the lock to be genuinely negligible here. If the env is local FS, warm cache, manifests of many tiny entries, high core count, then there is merit to the deferred-merge approach.

Deferring the decision to the community and maintainers.

…nager for large table overwrite

github-actions Bot added the core label Jun 28, 2026

Fix concurrent add race condition for deleteFiles of ManifestFilterMa…

9d16b52

…nager for large table overwrite

liucao-dd force-pushed the fix-concurrent-add-race-condition-ManifestFilterManager branch from c17686d to 9d16b52 Compare June 28, 2026 00:40

liucao-dd mentioned this pull request Jun 28, 2026

Core: Fix concurrent modification of deleteFiles in ManifestFilterManager #16980

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Core: Fix concurrent mutation of deleteFiles during parallel manifest filtering#16984

Core: Fix concurrent mutation of deleteFiles during parallel manifest filtering#16984
liucao-dd wants to merge 1 commit into
apache:mainfrom
liucao-dd:fix-concurrent-add-race-condition-ManifestFilterManager

liucao-dd commented Jun 28, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

liucao-dd commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Fix

Why Collections.synchronizedSet

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

liucao-dd commented Jun 28, 2026 •

edited

Loading

Why `Collections.synchronizedSet`