
Backport ParallelIterable memory fixes (#9402, #10691, #10978) to 1.5.2#245

Merged
jiang95-dev merged 3 commits into linkedin:openhouse-1.5.2 from cbb330:chbush/backport-parallel-iterable-1.5.2
Jun 8, 2026


cbb330 (Collaborator) commented Jun 4, 2026

Type: Backport / cherry-pick

Backports upstream Apache Iceberg ParallelIterable memory fixes/optimizations (from releases 1.6.0 and 1.6.1) into the openhouse-1.5.2 line.

Motivation: this is a prerequisite for switching the Trino fork from org.apache.iceberg (1.6.1) to com.linkedin.iceberg (1.5.2). Without these fixes, ParallelIterable can grow ParallelIterator.queue without bound and can keep filling it after the iterator is closed, causing driver memory pressure and a leak. That would regress Trino behavior on the 1.5.2 connector.

Cherry-picked commits

Each commit was produced with git cherry-pick -x (original authorship + (cherry picked from commit <sha>) provenance preserved) and applied in upstream chronological order.

PR-number note (discoverability): the canonical fixes landed on apache main as #10691 and #10978. This backport cherry-picked apache's 1.6.x release-branch equivalents (#10787, #10979), which are content-identical to the main PRs. Both numbers are listed below so the fix is findable by either reference.

| Order | Apache fix (main) | Cherry-picked from (1.6.x) | Upstream commit | Description |
|---|---|---|---|---|
| 1 | apache/iceberg#9402 | (same commit; in 1.6.0) | d3cb1b696 | Fix memory leak: queue keeps being populated after iterator close |
| 2 | apache/iceberg#10691 | apache/iceberg#10787 | e18a2fe10 | Limit memory consumption by yielding tasks when the queue is full |
| 3 | apache/iceberg#10978 | apache/iceberg#10979 | ed53c6d32 | Drop the queue low water mark (reduces manifest I/O for LIMIT queries) |

Order matters: applying #9402 first matches upstream history and lets the #10691/#10787 rewrite apply without conflict.

Conflicts

None. All three cherry-picks applied cleanly. TestParallelIterable was already on JUnit 5 + Awaitility in 1.5.2.x, so no test adaptation was required. The cherry-picked changes are byte-for-byte identical to upstream; the only addition is a prepended "Backport of apache/iceberg#NNNN to openhouse-1.5.2." line in each commit message.

Verification

./gradlew :iceberg-core:test --tests org.apache.iceberg.util.TestParallelIterable passes (JDK 17; Gradle 8.1.1 requires JDK <= 17).

The resulting core/src/main/java/org/apache/iceberg/util/ParallelIterable.java is identical to apache's post-#10978 state (bcb32818d) except for non-functional annotations (@VisibleForTesting, @SuppressWarnings) and a test-only queueSize() helper.

Note for reviewers

Compiling ParallelIterable.java produces an Error Prone FutureReturnValueIgnored warning in the new yielding code. The same warning occurs upstream; it is non-fatal and the build succeeds.

github-actions bot added the CORE label Jun 4, 2026
Heltman and others added 3 commits June 3, 2026 17:32
Core: Fix ParallelIterable memory leak where queue continues to be populated even after iterator close (#9402)

(cherry picked from commit d3cb1b6)
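The #9402 commit title describes worker tasks that keep adding to the shared queue after the consumer has closed the iterator. As a minimal sketch of the general technique, assuming a simplified model (a plain ExecutorService, a ConcurrentLinkedQueue, and a volatile closed flag), the hypothetical class below re-checks the flag before every element and clears the buffer on close. CloseAwareProducers and its methods are illustrative names, and this is not the upstream diff.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of the idea behind #9402; not Iceberg's ParallelIterable code.
class CloseAwareProducers<T> implements AutoCloseable {
  private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
  private final List<Future<?>> tasks = new ArrayList<>();
  private volatile boolean closed = false;

  CloseAwareProducers(ExecutorService pool, List<? extends Iterable<T>> inputs) {
    for (Iterable<T> input : inputs) {
      tasks.add(pool.submit(() -> {
        Iterator<T> it = input.iterator();
        // Re-check `closed` before every element; without this check a task
        // keeps filling the queue after the consumer has gone away.
        while (!closed && it.hasNext()) {
          queue.add(it.next());
        }
      }));
    }
  }

  T poll() {
    return queue.poll();
  }

  int queueSize() {
    return queue.size();
  }

  @Override
  public void close() {
    closed = true;
    for (Future<?> task : tasks) {
      task.cancel(true); // interruption alone would not stop the loop above
    }
    queue.clear(); // release elements that were already buffered
  }

  /** Returns true if both producers stop after close(), even though their inputs never end. */
  static boolean producersStopAfterClose() {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    Iterable<Integer> endless = () -> new Iterator<Integer>() {
      private int counter = 0;

      @Override
      public boolean hasNext() {
        return true;
      }

      @Override
      public Integer next() {
        return counter++;
      }
    };
    CloseAwareProducers<Integer> producers = new CloseAwareProducers<>(pool, List.of(endless, endless));
    while (producers.poll() == null) {
      Thread.onSpinWait(); // wait until the producers have started
    }
    producers.close();
    pool.shutdown();
    boolean stopped;
    try {
      stopped = pool.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    // A producer that passed its check just before close() may add one last element.
    return stopped && producers.queueSize() <= 2;
  }
}
```

Because the inputs never end, the pool can only terminate if each task observes the flag; cancel(true) by itself would not help here, since the loop never checks for interruption.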
Core: Limit ParallelIterable memory consumption by yielding in tasks (backport #10691) (#10787)

ParallelIterable schedules 2 * WORKER_THREAD_POOL_SIZE tasks for
processing input iterables. This defaults to 2 * # CPU cores.  When one
or some of the input iterables are considerable in size and the
ParallelIterable consumer is not quick enough, this could result in
unbounded allocation inside `ParallelIterator.queue`. This commit bounds
the queue. When queue is full, the tasks yield and get removed from the
executor. They are resumed when consumer catches up.

(cherry picked from commit 7831a8d)

Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>
(cherry picked from commit e18a2fe10214f5f3ffa0a317a28af8b2a619817a)
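The commit message above describes bounding the queue and having tasks yield when it is full, then resuming them once the consumer catches up. Below is a minimal sketch of that scheduling idea under simplifying assumptions: the class YieldingParallelDrain, its maxQueueSize parameter, and the single spinning consumer loop in drainAll() are hypothetical, and the sketch is not Iceberg's ParallelIterable implementation. A yielded task keeps its iterator, so resubmitting it resumes where it stopped.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a bounded queue with yielding producer tasks; not Iceberg's code.
class YieldingParallelDrain<T> {
  private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
  private final AtomicInteger size = new AtomicInteger();    // approximate queue size
  private final AtomicInteger maxSeen = new AtomicInteger(); // largest size any producer reached
  private final int maxQueueSize;
  private final ExecutorService pool;
  private final List<CompletableFuture<Optional<Task>>> running = new ArrayList<>();
  private final ArrayDeque<Task> yielded = new ArrayDeque<>(); // used by the consumer thread only

  /** Keeps its iterator between runs, so a yielded task resumes where it stopped. */
  private final class Task {
    private final Iterator<T> input;

    Task(Iterator<T> input) {
      this.input = input;
    }

    /** Returns this task if it yielded because the queue was full, or empty if its input is exhausted. */
    Optional<Task> run() {
      while (input.hasNext()) {
        if (size.get() >= maxQueueSize) {
          return Optional.of(this); // yield: free the worker thread instead of growing the queue
        }
        queue.add(input.next());
        maxSeen.accumulateAndGet(size.incrementAndGet(), Math::max);
      }
      return Optional.empty();
    }
  }

  YieldingParallelDrain(ExecutorService pool, List<? extends Iterable<T>> inputs, int maxQueueSize) {
    this.pool = pool;
    this.maxQueueSize = maxQueueSize;
    for (Iterable<T> input : inputs) {
      submit(new Task(input.iterator()));
    }
  }

  private void submit(Task task) {
    running.add(CompletableFuture.supplyAsync(task::run, pool));
  }

  /** Consumes every element; yielded tasks are resubmitted only when the queue has room. */
  List<T> drainAll() {
    List<T> out = new ArrayList<>();
    while (!running.isEmpty() || !yielded.isEmpty() || !queue.isEmpty()) {
      // Collect finished runs; the ones that yielded wait here instead of occupying a thread.
      Iterator<CompletableFuture<Optional<Task>>> it = running.iterator();
      while (it.hasNext()) {
        CompletableFuture<Optional<Task>> future = it.next();
        if (future.isDone()) {
          future.join().ifPresent(yielded::add);
          it.remove();
        }
      }
      // Resume yielded tasks once the consumer has caught up.
      while (!yielded.isEmpty() && size.get() < maxQueueSize) {
        submit(yielded.poll());
      }
      T item = queue.poll();
      if (item != null) {
        size.decrementAndGet();
        out.add(item);
      } else {
        Thread.onSpinWait(); // the sketch spins instead of blocking to stay short
      }
    }
    return out;
  }

  int maxObservedSize() {
    return maxSeen.get();
  }
}
```

Because the size check and the add are not atomic, the counter can reach at most maxQueueSize - 1 plus the number of concurrently running tasks, so the bound is approximate rather than exact; maxObservedSize() makes that overshoot visible.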
Drop ParallelIterable's queue low water mark (#10979)

As part of the change in commit
7831a8d, queue low water mark was
introduced. However, it resulted in increased number of manifests being
read when planning LIMIT queries in Trino Iceberg connector. To avoid
increased I/O, back out the change for now.

(cherry picked from commit ed53c6d326cb7efef2d41e26ef001e1d7b17fd78)
jiang95-dev merged commit 198cc01 into linkedin:openhouse-1.5.2 Jun 8, 2026
23 checks passed
cbb330 changed the title from "Backport ParallelIterable memory fixes (#9402, #10787, #10979) to 1.5.2" to "Backport ParallelIterable memory fixes (#9402, #10691, #10978) to 1.5.2" on Jun 18, 2026