Bug description
Expected: Reading from / deleting from a large Delta table that has deletion vectors (DVs) completes within a bounded, reasonable memory footprint. Vanilla Spark runs Delta's own "huge table" DV tests fine with a 1 GB test heap (-Xmx1024m).
Actual: Under the Gluten Velox bundle, the same reads grow the JVM's native (off-heap) memory monotonically until the kernel/cgroup OOM-kills the process. On Delta's synthetic 2-billion-row DV table the forked test JVM climbs to ~13 GB RSS even though its JVM heap is only -Xmx2G, i.e. ~11 GB is native (Velox), not heap. The growth tracks the duration of a single DV read over the huge table, which points at unbounded native materialization on the DV / metadata-row-index read path rather than normal query working set.
Concretely, two Delta tests reproduce it (suite org.apache.spark.sql.delta.deletionvectors.DeletionVectorsSuite):
huge table: read from tables of 2B rows with existing DV of many zeros
huge table: delete a small number of rows from tables of 2B rows with DVs
Both operate on the suite's 2B-row table5. The read test alone grew the fork from ~5.9 GB to ~13.3 GB over ~13 minutes before the OOM-kill.
Likely area: native row-index materialization on the DV read path. Delta DV reads use the metadata row index (spark.databricks.delta.deletionVectors.useMetadataRowIndex, default true), and Gluten offloads that path to Velox (apache/gluten #12269 only falls back DML DV scans when useMetadataRowIndex=false, so the default read path stays native). A maintainer with Velox memory-tracking context should confirm the exact allocation site and whether it can be bounded/spilled.
Gluten version
main branch
Spark version
spark-4.0.x (actually Spark 4.1.0 -- Delta 4.2.0's default; the form has no 4.1 option)
Spark configurations
From the Delta-on-Gluten test harness (patched DeltaSQLCommandTest):
(The forked test JVM heap is -Xmx2G; off-heap is capped at 2g, yet native RSS still reaches ~13 GB -- the allocation appears untracked / not honoring the off-heap cap.)
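The harness settings themselves are not reproduced in this report. As a rough illustration of what a 2g off-heap cap with the Gluten plugin looks like, here is a minimal sketch. Only `spark.plugins` and the 2g cap come from this report; the use of `spark.memory.offHeap.enabled` / `spark.memory.offHeap.size` to express that cap, and the helper name `to_submit_args`, are assumptions and may differ from the patched DeltaSQLCommandTest.

```python
# Assumed settings: only spark.plugins and the 2g off-heap cap are stated in
# the report; the harness's exact configuration is not shown there.
ASSUMED_CONF = {
    "spark.plugins": "org.apache.gluten.GlutenPlugin",
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "2g",
}


def to_submit_args(conf):
    """Render a conf mapping as repeated `--conf key=value` arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(ASSUMED_CONF)))
```

If the native allocations were tracked against this cap, the fork's RSS would be expected to stay near heap plus off-heap (roughly 4 GB plus JVM overhead) rather than reach ~13 GB.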
System information
CI runner: ubuntu-22.04 host, ~16 GB RAM, container apache/gluten:centos-9-jdk17. Not run via dev/info.sh (observed in CI).
Relevant logs
Evidence from the Delta Spark UT (Gluten) pipeline, run 28071158711, shard 2 (job 83108337324). Per-minute memory profiler during the "read from tables of 2B rows with existing DV of many zeros" test (p1289 = forked test JVM with -Xmx2G; p382 = sbt launcher):
MEM cgroup=12.53G JVMs=[2664M(p382) 5869M(p1289)]
MEM cgroup=13.70G JVMs=[2664M(p382) 7777M(p1289)]
MEM cgroup=13.97G JVMs=[2664M(p382) 8431M(p1289)]
MEM cgroup=14.32G JVMs=[2623M(p382) 11629M(p1289)]
MEM cgroup=14.77G JVMs=[1815M(p382) 13122M(p1289)] <- fork ~13.1G RSS, heap only 2G
MEM cgroup=14.91G JVMs=[1879M(p382) 13303M(p1289)]
Warning: Unable to read from client ... <- fork OOM-killed here
MEM cgroup=1.92G JVMs=[1902M(p382)] <- fork gone; cgroup drops ~13G
After the kernel killed the fork, sbt wedged on the dead fork (no hs_err, no heap dump -- the signature of a kernel/cgroup OOM-kill rather than a JVM OOM), and a hang watchdog had to kill the shard after ~16 minutes of silence.
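The growth in the profiler lines above can be quantified with a short script. This is a sketch that assumes the `MEM cgroup=... JVMs=[...]` layout shown in the log and treats the `M` figures as megabytes as printed; the helper name `fork_rss_mb` is illustrative only.

```python
import re

LOG = """\
MEM cgroup=12.53G JVMs=[2664M(p382) 5869M(p1289)]
MEM cgroup=13.70G JVMs=[2664M(p382) 7777M(p1289)]
MEM cgroup=13.97G JVMs=[2664M(p382) 8431M(p1289)]
MEM cgroup=14.32G JVMs=[2623M(p382) 11629M(p1289)]
MEM cgroup=14.77G JVMs=[1815M(p382) 13122M(p1289)]
MEM cgroup=14.91G JVMs=[1879M(p382) 13303M(p1289)]
MEM cgroup=1.92G JVMs=[1902M(p382)]
"""

LINE = re.compile(r"MEM cgroup=([\d.]+)G JVMs=\[(.*?)\]")
PROC = re.compile(r"(\d+)M\(p(\d+)\)")


def fork_rss_mb(log, pid):
    """Return the RSS samples (MB, as printed) reported for `pid`, in log order."""
    samples = []
    for line in LINE.finditer(log):
        for rss, proc_id in PROC.findall(line.group(2)):
            if int(proc_id) == pid:
                samples.append(int(rss))
    return samples


samples = fork_rss_mb(LOG, 1289)
growth_mb = samples[-1] - samples[0]      # 13303 - 5869 = 7434
beyond_heap_mb = samples[-1] - 2048       # RSS above the -Xmx2G heap: 11255
monotonic = all(b > a for a, b in zip(samples, samples[1:]))

if __name__ == "__main__":
    print(f"growth={growth_mb}M beyond_heap={beyond_heap_mb}M monotonic={monotonic}")
```

The fork's RSS rises on every sample, and the final reading is about 11 GB above the 2 GB heap, which matches the estimate in the bug description.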
Reproduction
Run delta-io/delta v4.2.0 DeletionVectorsSuite with the Gluten plugin enabled (spark.plugins=org.apache.gluten.GlutenPlugin), e.g. the two "huge table ... 2B rows ... DV" tests above.
Equivalent minimal repro: with Gluten Velox enabled, run a count/sum scan over a Delta table of billions of rows that carries deletion vectors; watch native RSS grow without bound.
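To watch RSS while the scan runs, something along these lines may help. It is a Linux-only sketch that reads the `VmRSS` field from `/proc/<pid>/status`; the function names and the sampling interval are illustrative, and this is not the profiler used in the CI pipeline.

```python
import re
import time
from pathlib import Path


def parse_vm_rss_kb(status_text):
    """Extract VmRSS (kB, as reported by the kernel) from /proc/<pid>/status text."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def watch_rss(pid, interval_s=60.0, samples=5):
    """Print the RSS of `pid` every `interval_s` seconds (Linux only)."""
    for _ in range(samples):
        text = Path(f"/proc/{pid}/status").read_text()
        rss_kb = parse_vm_rss_kb(text)
        shown = "n/a" if rss_kb is None else f"{rss_kb // 1024}M"
        print(f"pid={pid} rss={shown}")
        time.sleep(interval_s)
```

Calling `watch_rss(<pid of the Spark JVM>)` from a separate shell during the scan gives a per-minute trace comparable to the one in the logs section.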
Impact / workaround
Makes large-table DV reads unusable under Gluten Velox (native memory blows up and the process is OOM-killed).
In the Delta CI pipeline (apache/gluten PR #12278, "[VL][Delta] Delta CI pipeline") these two tests are force-failed in setup-delta.sh to keep the shard from OOM-hanging. That workaround should be removed once this is fixed.
This was written with the assistance of AI tooling.
Backend
VL (Velox)