[VL] Native memory OOMs the test/executor JVM when reading large (billions-of-rows) Delta tables with deletion vectors #12387

### Backend

VL (Velox)

### Bug description

**Expected:** Reading from, or deleting rows from, a large Delta table that has deletion vectors (DVs) completes within a bounded, reasonable memory footprint. Vanilla Spark runs Delta's own "huge table" DV tests fine with a 1 GB test heap (`-Xmx1024m`).

**Actual:** Under the Gluten Velox bundle, the same reads grow the JVM's **native** (off-heap) memory monotonically until the kernel/cgroup OOM killer terminates the process. On Delta's synthetic 2-billion-row DV table, the forked test JVM climbs to ~13 GB RSS even though its heap is capped at `-Xmx2G`, so ~11 GB is native (Velox) memory rather than heap. The growth occurs within a single DV read over the huge table, which points at unbounded native materialization on the DV / metadata-row-index read path rather than the normal query working set.

Concretely, two Delta tests reproduce it (suite `org.apache.spark.sql.delta.deletionvectors.DeletionVectorsSuite`):
- `huge table: read from tables of 2B rows with existing DV of many zeros`
- `huge table: delete a small number of rows from tables of 2B rows with DVs`

Both operate on the suite's 2B-row `table5`. The read test alone grew the fork's RSS from ~5.9 GB to ~13.3 GB over ~13 minutes before the OOM-kill.

Likely area: native row-index materialization on the DV read path. Delta DV reads use the metadata row index (`spark.databricks.delta.deletionVectors.useMetadataRowIndex`, default true), and Gluten offloads that path to Velox: apache/gluten #12269 falls DML DV scans back to vanilla Spark only when `useMetadataRowIndex=false`, so the default read path stays native. A maintainer familiar with Velox memory tracking should confirm the exact allocation site and whether it can be bounded or spilled.

## Gluten version
main branch

## Spark version
Spark 4.1.0 (Delta 4.2.0's default). Listed as spark-4.0.x in the form, which has no 4.1 option.

## Spark configurations

From the Delta-on-Gluten test harness (patched `DeltaSQLCommandTest`):

    spark.plugins                    = org.apache.gluten.GlutenPlugin
    spark.shuffle.manager            = org.apache.spark.shuffle.sort.ColumnarShuffleManager
    spark.memory.offHeap.enabled     = true
    spark.memory.offHeap.size        = 2g
    spark.gluten.sql.columnar.backend.velox... (default bundle config)
    Delta 4.2.0, Scala 2.13, JDK 17

The forked test JVM's heap is capped at 2 GB (`-Xmx2G`) and its off-heap memory at 2g, yet its RSS still reaches ~13 GB, far above the combined 4 GB. The native allocation appears to be untracked by, or not honoring, the off-heap cap.
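
As a quick arithmetic check of the paragraph above, the following Python sketch subtracts the configured limits from the fork's last RSS sample in the profiler log (13303M for p1289). It assumes the profiler's "M" means MiB and that heap and Spark-tracked off-heap usage stay within their limits; the function and constant names are illustrative only.

```python
# Back-of-the-envelope memory accounting for the forked test JVM (p1289),
# using only figures reported in this issue.
# Assumption: the profiler's "M" is MiB, so -Xmx2G and offHeap.size=2g
# are each 2048 MiB.
HEAP_MAX_MIB = 2 * 1024      # -Xmx2G
OFFHEAP_CAP_MIB = 2 * 1024   # spark.memory.offHeap.size = 2g


def beyond_limits_mib(rss_mib: int,
                      heap_mib: int = HEAP_MAX_MIB,
                      offheap_mib: int = OFFHEAP_CAP_MIB) -> int:
    """RSS left after subtracting the configured heap and off-heap limits.

    If heap and tracked off-heap usage stay within their limits, the result
    is a lower bound on memory that neither limit accounts for. It still
    includes ordinary JVM overhead (metaspace, code cache, thread stacks),
    so not all of it is necessarily Velox.
    """
    return rss_mib - heap_mib - offheap_mib


peak_rss_mib = 13303  # last p1289 sample before the OOM-kill
print("outside the heap:", beyond_limits_mib(peak_rss_mib, offheap_mib=0), "MiB")
print("outside heap + off-heap cap:", beyond_limits_mib(peak_rss_mib), "MiB")
```

With these inputs it reports about 11 GiB outside the heap and about 9 GiB beyond both limits combined.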

## System information
CI runner: ubuntu-22.04 host, ~16 GB RAM, container apache/gluten:centos-9-jdk17. Not run via dev/info.sh (observed in CI).

## Relevant logs

Evidence from the Delta Spark UT (Gluten) pipeline, run 28071158711, shard 2 (job 83108337324). Output of the per-minute memory profiler during the "read from tables of 2B rows with existing DV of many zeros" test, showing the cgroup total and each JVM's RSS (p1289 = forked test JVM with -Xmx2G; p382 = sbt launcher):

    MEM cgroup=12.53G JVMs=[2664M(p382) 5869M(p1289)]
    MEM cgroup=13.70G JVMs=[2664M(p382) 7777M(p1289)]
    MEM cgroup=13.97G JVMs=[2664M(p382) 8431M(p1289)]
    MEM cgroup=14.32G JVMs=[2623M(p382) 11629M(p1289)]
    MEM cgroup=14.77G JVMs=[1815M(p382) 13122M(p1289)]   <- fork ~13.1G RSS, heap only 2G
    MEM cgroup=14.91G JVMs=[1879M(p382) 13303M(p1289)]
    Warning: Unable to read from client ...                 <- fork OOM-killed here
    MEM cgroup=1.92G  JVMs=[1902M(p382)]                    <- fork gone; cgroup drops ~13G

After the kernel killed the fork, sbt hung waiting on the dead fork. There was no hs_err file and no heap dump, which is the signature of a kernel/cgroup OOM-kill rather than a JVM `OutOfMemoryError`. A hang watchdog finally killed the shard after ~16 minutes of silence.
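
The profiler excerpt above can be checked mechanically. Below is a minimal Python sketch that extracts the fork's RSS series and confirms it only grows. It assumes the profiler always prints lines in the `MEM cgroup=<n>G JVMs=[<rss>M(p<pid>) ...]` shape shown in the excerpt; the profiler's own implementation is not part of this issue, and `rss_series` is a hypothetical helper.

```python
import re

# Matches profiler lines such as:
#   MEM cgroup=14.77G JVMs=[1815M(p382) 13122M(p1289)]
# \s+ tolerates the double space seen in the last sample.
MEM_LINE = re.compile(r"MEM cgroup=(?P<cgroup>[\d.]+)G\s+JVMs=\[(?P<jvms>[^\]]*)\]")
JVM_ENTRY = re.compile(r"(?P<mb>\d+)M\(p(?P<pid>\d+)\)")


def rss_series(log_text: str, pid: str) -> list[int]:
    """Return the RSS samples (in the profiler's M units) for one pid, in log order."""
    samples = []
    for line in log_text.splitlines():
        m = MEM_LINE.search(line)
        if not m:
            continue  # skip warnings and other non-profiler lines
        for entry in JVM_ENTRY.finditer(m.group("jvms")):
            if entry.group("pid") == pid:
                samples.append(int(entry.group("mb")))
    return samples


LOG = """\
MEM cgroup=12.53G JVMs=[2664M(p382) 5869M(p1289)]
MEM cgroup=13.70G JVMs=[2664M(p382) 7777M(p1289)]
MEM cgroup=13.97G JVMs=[2664M(p382) 8431M(p1289)]
MEM cgroup=14.32G JVMs=[2623M(p382) 11629M(p1289)]
MEM cgroup=14.77G JVMs=[1815M(p382) 13122M(p1289)]
MEM cgroup=14.91G JVMs=[1879M(p382) 13303M(p1289)]
Warning: Unable to read from client ...
MEM cgroup=1.92G  JVMs=[1902M(p382)]
"""

fork = rss_series(LOG, "1289")
growth = fork[-1] - fork[0]
monotonic = all(b >= a for a, b in zip(fork, fork[1:]))
print(fork, growth, monotonic)
```

The fork (p1289) disappears from the final sample after the OOM-kill, so its series ends at the last reading taken while it was alive.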

## Reproduction
1. Build the Gluten Velox bundle (Spark 4.1 + Scala 2.13 + JDK 17, Delta profile).
2. Run delta-io/delta v4.2.0 `DeletionVectorsSuite` with the Gluten plugin enabled (`spark.plugins=org.apache.gluten.GlutenPlugin`), e.g. the two "huge table ... 2B rows ... DV" tests above.
   - Equivalent minimal repro: with Gluten Velox enabled, run a count/sum scan over a Delta table of billions of rows that carries deletion vectors; watch native RSS grow without bound.
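
To watch native RSS grow during a local repro, here is a Linux-only Python sketch that samples a process's resident set size from `/proc/<pid>/status` at a fixed interval. It is similar in spirit to the per-minute profiler in the CI logs, whose actual implementation is not shown here. `watch` and its parameters are hypothetical.

```python
import os
import time


def parse_vmrss_kb(status_text: str) -> int:
    """Extract VmRSS (in kB) from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Linux format: "VmRSS:\t  123456 kB"
            return int(line.split()[1])
    raise ValueError("VmRSS not found (zombie process or not Linux?)")


def sample_rss_mib(pid: int) -> int:
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kb(f.read()) // 1024


def watch(pid: int, interval_s: float = 60.0, samples: int = 5) -> list[int]:
    """Sample a process's RSS in MiB; stops early if the process is gone."""
    series = []
    for i in range(samples):
        if i:
            time.sleep(interval_s)  # wait only between samples
        try:
            series.append(sample_rss_mib(pid))
        except (FileNotFoundError, ValueError):
            break  # process exited (e.g. OOM-killed)
    return series


if __name__ == "__main__":
    print(watch(os.getpid(), interval_s=0, samples=2))
```

Pointed at the forked test JVM's pid, a steadily rising series well above the `-Xmx` value indicates growth outside the heap; an early stop marks the point where the process disappeared.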

## Impact / workaround
- Makes large-table DV reads unusable under Gluten Velox (native memory blows up and the process is OOM-killed).
- In the Delta CI pipeline (apache/gluten PR #12278) these two tests are force-failed in `setup-delta.sh` to keep the shard from OOM-hanging. That workaround should be removed once this is fixed.


This was written with the assistance of AI tooling.

### Relevant logs

https://github.com/apache/gluten/actions/runs/28071158711/job/83108337324
