[VL] Native Delta DV bitmap aggregator aborts on a Long.MAX_VALUE sentinel row index during MERGE with deletion vectors (intermittent VeloxRuntimeError) #12377
Expected: A Delta MERGE INTO that writes deletion vectors (DVs) completes successfully, exactly as it does on vanilla Spark + Delta.
Actual: Under the Gluten Velox bundle the MERGE intermittently aborts with a native VeloxRuntimeError (INVALID_STATE) raised by Gluten's Delta DV bitmap aggregator:
Delta RoaringBitmapArray row index 9223372036854775807 exceeds max representable value 9223372030412324864
9223372036854775807 is exactly Long.MAX_VALUE (2^63 - 1). The target table in the failing test is tiny (a handful of rows), so this is not a real row index -- it is a sentinel / placeholder value that is leaking into the DV-write aggregation.
The aggregation that builds the per-file DV (PartialAggregation, function addSafe) packs each matched target row's index into a RoaringBitmapArray. RoaringBitmapArray::addSafe enforces value <= kMaxRepresentableValue (= 0x7ffffffe80000000 = 9223372030412324864, which the code comments say mirrors Delta JVM's RoaringBitmapArray.MAX_REPRESENTABLE_VALUE). The upper 32 bits of Long.MAX_VALUE are 0x7fffffff, one 2^32 block above the maximum high key 0x7ffffffe, so the check fails and the whole stage aborts.
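The arithmetic behind that ceiling can be checked at compile time. The sketch below rebuilds the constant from the two key values quoted in this report and compares Long.MAX_VALUE against it. The names kLongMax and highKey are illustrative helpers introduced here, not identifiers from Gluten.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Key values as quoted in this report from RoaringBitmapArray.h.
constexpr uint64_t kMaxHighKey = 0x7ffffffeULL;
constexpr uint64_t kMaxLowKeyForMaxHighKey = 0x80000000ULL;
constexpr uint64_t kMaxRepresentableValue =
    (kMaxHighKey << 32) | kMaxLowKeyForMaxHighKey;

// Java's Long.MAX_VALUE (2^63 - 1). Illustrative name, not from Gluten.
constexpr int64_t kLongMax = std::numeric_limits<int64_t>::max();

// Illustrative helper: the upper 32 bits of a non-negative row index.
constexpr uint64_t highKey(int64_t value) {
  return static_cast<uint64_t>(value) >> 32;
}

static_assert(kMaxRepresentableValue == 0x7ffffffe80000000ULL,
              "hex form quoted in the report");
static_assert(kMaxRepresentableValue == 9223372030412324864ULL,
              "decimal form from the error message");
static_assert(kLongMax == 9223372036854775807LL,
              "the offending row index");
static_assert(highKey(kLongMax) == kMaxHighKey + 1,
              "sentinel sits one high-key block above the ceiling");
static_assert(static_cast<uint64_t>(kLongMax) > kMaxRepresentableValue,
              "so value <= kMaxRepresentableValue cannot hold");
```

If the translation unit compiles, every number in the error message is consistent with the constants quoted from the header.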
This is flaky / non-deterministic. The byte-for-byte identical bundle passed this test in one CI run and failed it in the next (see Logs). So whether the sentinel reaches the aggregator depends on runtime plan / scan / scheduling (split boundaries, batch composition, task distribution), not on a source change. It reproduces in the suite MergeIntoExtendedSyntaxSQLPathBasedDVsPredPushOnSuite.
Open question for a maintainer with Velox + Delta DV-write context: Delta's own JVM RoaringBitmapArray uses the same MAX_REPRESENTABLE_VALUE, so vanilla Delta would reject Long.MAX_VALUE too. Since vanilla Delta passes this MERGE, it must either never produce the sentinel on the DV-write branch or filter it out before the bitmap is built. That suggests the real defect is upstream of the aggregator -- Gluten's native row-index materialization / DV-write plan is emitting (and not filtering) a Long.MAX_VALUE placeholder that vanilla Delta would have excluded. The addSafe check is just where it surfaces. Two possible fix directions:
1. Stop the sentinel at the source (mirror Delta's filter so placeholder rows never reach the DV aggregation), or
2. Make the aggregator skip the sentinel the same way it skips NULLs -- but only if that matches Delta's documented semantics (silently dropping a genuinely out-of-range index would corrupt the DV, so option 1 is preferred unless the sentinel is a contract).
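Either direction comes down to dropping exactly one value while leaving every other out-of-range index to fail loudly. A minimal sketch of that filter follows, assuming Long.MAX_VALUE really is a placeholder (which is the open question above). The names kPlaceholderRowIndex and dropPlaceholders are hypothetical and do not exist in Gluten or Delta.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// ASSUMPTION: Long.MAX_VALUE is a placeholder row index, not real data.
constexpr int64_t kPlaceholderRowIndex = std::numeric_limits<int64_t>::max();

// Hypothetical filter: drops NULLs and the exact placeholder value only.
// Any other value is passed through untouched, so a genuinely
// out-of-range index would still reach the bitmap's range check and
// abort instead of being silently lost.
std::vector<int64_t> dropPlaceholders(
    const std::vector<std::optional<int64_t>>& rowIndexes) {
  std::vector<int64_t> kept;
  kept.reserve(rowIndexes.size());
  for (const auto& index : rowIndexes) {
    if (!index.has_value() || *index == kPlaceholderRowIndex) {
      continue;
    }
    kept.push_back(*index);
  }
  return kept;
}
```

Matching on equality rather than on "anything above the ceiling" is the design point: it keeps the existing range check as a safety net against DV corruption.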
This was written with the assistance of AI tooling.
Gluten version
main branch
Spark version
spark-4.0.x (actually Spark 4.1.0 -- Delta 4.2.0's default; the form has no 4.1 option)
Spark configurations
From the Delta-on-Gluten test harness (patched DeltaSQLCommandTest):
System information
CI runner: ubuntu-22.04 host, ~16 GB RAM, container apache/gluten:centos-9-jdk17. Not run via dev/info.sh (observed in CI).
Relevant logs
Delta Spark UT (Gluten) pipeline, apache/gluten run 28198677737, shard 1 (job 83536282846). The prior, byte-for-byte identical run 28148323203 passed the same test (shard 1: 230 expected failures, 0 regressions) -- demonstrating the intermittency.
extended syntax - update + conditional insert - isPartitioned: true *** FAILED ***
org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0 in stage 1028.0 failed 1 times, most recent failure:
Lost task 0.0 in stage 1028.0 (TID 843):
org.apache.gluten.exception.GlutenException: ... Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: (9223372036854775807 vs. 9223372030412324864)
Delta RoaringBitmapArray row index 9223372036854775807
exceeds max representable value 9223372030412324864
Retriable: False
Expression: value <= kMaxRepresentableValue
Context: Operator: PartialAggregation[9] 9
Function: addSafe
File: /work/cpp/velox/compute/delta/RoaringBitmapArray.cpp
Line: 92
...
at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native Method)
at org.apache.spark.shuffle.ColumnarShuffleWriter.internalWrite(ColumnarShuffleWriter.scala:135)
at org.apache.spark.shuffle.ColumnarShuffleWriter.write(ColumnarShuffleWriter.scala:316)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:111)
Reproduction
Run delta-io/delta v4.2.0 with the Gluten plugin enabled (spark.plugins=org.apache.gluten.GlutenPlugin), suite MergeIntoExtendedSyntaxSQLPathBasedDVsPredPushOnSuite, test "extended syntax - update + conditional insert - isPartitioned: true".
Because it is intermittent, it may take several runs (or concurrent test forks / CPU contention) to surface. Equivalent minimal repro: a MERGE INTO with an UPDATE action plus a conditional INSERT into a partitioned Delta table that has deletion vectors enabled, with predicate pushdown on.
CI Link: https://github.com/apache/gluten/actions/runs/28198677737/job/83536282846?pr=12371#step:9:2828
Impact / workaround
Intermittently fails any MERGE-with-DV workload, and makes the Delta-on-Gluten CI gate flaky (apache/gluten PR [VL][Delta] Delta CI pipeline #12278): the test is not in the known-failures baseline (it usually passes), so a run that hits the sentinel is reported as a regression and turns the gate red.
No good baseline workaround: because the failure is flaky, adding it to known-failures.txt would instead make the gate red on every run where it passes (the pipeline runs with DELTA_FAIL_ON_FIXED=true). A proper fix (or a dedicated flaky-quarantine list in the gate) is needed.
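To make the gate problem concrete, here is a sketch of how a baseline check with a separate flaky-quarantine list could classify a single test result. The real gate script is not shown in this report, so everything here (the Verdict enum, classify, and the quarantine set) is a hypothetical model. Only the known-failures baseline and the fail-on-fixed behavior come from the description above.

```cpp
#include <cassert>
#include <set>
#include <string>

enum class Verdict { Ok, Regression, UnexpectedlyFixed, QuarantinedFlaky };

// Hypothetical gate logic, not the actual pipeline code.
// knownFailures   : tests expected to fail (the baseline).
// flakyQuarantine : tests allowed to either pass or fail (assumed new list).
// failOnFixed     : models DELTA_FAIL_ON_FIXED=true.
Verdict classify(const std::string& test,
                 bool passed,
                 const std::set<std::string>& knownFailures,
                 const std::set<std::string>& flakyQuarantine,
                 bool failOnFixed) {
  if (flakyQuarantine.count(test) > 0) {
    // Either outcome is tolerated; a failure is reported but not gating.
    return passed ? Verdict::Ok : Verdict::QuarantinedFlaky;
  }
  if (knownFailures.count(test) > 0) {
    // A baseline entry that passes turns the gate red when failOnFixed is set.
    return (passed && failOnFixed) ? Verdict::UnexpectedlyFixed : Verdict::Ok;
  }
  return passed ? Verdict::Ok : Verdict::Regression;
}
```

Under this model the flaky test yields Regression when it is in neither list and fails, and UnexpectedlyFixed when it is in the baseline and passes, which is the dilemma described above.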
Backend
VL (Velox)
(In the suite name, ...DVsPredPushOn... = deletion vectors on, predicate pushdown on.)
Root cause analysis
cpp/velox/operators/functions/delta/DeltaBitmapAggregator.cc:63-69 (addInput returns early only when !value.has_value())
cpp/velox/operators/functions/delta/DeltaBitmapAggregator.cc:43-46 (addRowIndex checks only value >= 0, then calls bitmap.addSafe)
cpp/velox/compute/delta/RoaringBitmapArray.cpp:91-98 (addSafe, VELOX_CHECK_LE(value, kMaxRepresentableValue, ...))
cpp/velox/compute/delta/RoaringBitmapArray.h:51-56 (kMaxHighKey = 0x7ffffffe, kMaxLowKeyForMaxHighKey = 0x80000000, kMaxRepresentableValue = (kMaxHighKey << 32) | kMaxLowKeyForMaxHighKey; comment: "Matches Delta JVM RoaringBitmapArray.MAX_REPRESENTABLE_VALUE")
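The guard sequence in those file references can be modelled in a few lines to show why Long.MAX_VALUE gets through the first two checks and fails only at the third. This is a simplified stand-in, not the Gluten source: BitmapSketch replaces RoaringBitmapArray with a std::set, std::runtime_error stands in for VELOX_CHECK_LE, and throwing on a negative index is an assumption, since the report says only that value >= 0 is checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <set>
#include <stdexcept>
#include <string>

constexpr uint64_t kMaxRepresentableValue = 0x7ffffffe80000000ULL;

// Simplified stand-in for the bitmap (not the real data structure).
class BitmapSketch {
 public:
  void addSafe(uint64_t value) {
    if (value > kMaxRepresentableValue) {  // models VELOX_CHECK_LE
      throw std::runtime_error(
          "row index " + std::to_string(value) +
          " exceeds max representable value " +
          std::to_string(kMaxRepresentableValue));
    }
    values_.insert(value);
  }
  std::size_t size() const { return values_.size(); }

 private:
  std::set<uint64_t> values_;
};

// Models addRowIndex: only the sign is checked before addSafe.
void addRowIndex(BitmapSketch& bitmap, int64_t value) {
  if (value < 0) {  // ASSUMPTION: the real handling of negatives is not shown
    throw std::invalid_argument("negative row index");
  }
  bitmap.addSafe(static_cast<uint64_t>(value));
}

// Models addInput: NULL is the only input that is skipped.
void addInput(BitmapSketch& bitmap, std::optional<int64_t> value) {
  if (!value.has_value()) {
    return;
  }
  addRowIndex(bitmap, *value);
}
```

Feeding std::numeric_limits<int64_t>::max() to addInput passes the NULL check and the sign check, then throws from addSafe, matching the Function: addSafe frame in the log.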