Delta pipeline fix tests #12380
Closed
felipepessoto wants to merge 3 commits into
…es baseline

Run delta-io/delta's `spark` ScalaTest suite against a Gluten Velox bundle in CI, and gate the results against a committed baseline. This keeps the many expected Delta-on-Gluten failures manageable and lets them be fixed incrementally, without letting currently passing tests silently regress.

What it adds (.github/workflows/util/delta-spark-ut/):

- delta_spark_ut.yml: builds the native library and the Gluten bundle, then runs the Delta spark suite sharded by suite into 4 shards x 4 forked test JVMs (~16-way parallelism), and gates each shard against the baseline.
- compare-test-results.py: the gate. Per shard, regressions (failed tests that are not in the baseline) fail the build; baselined tests that now pass are flagged so the baseline can be tightened. It also supports seed/aggregate modes.
- known-failures.txt: the committed baseline of expected failures.
- setup-delta.sh: clones Delta, injects the Gluten bundle, patches DeltaSQLCommandTest, and force-fails the two DeletionVectorsSuite 2B-row tests, whose native row-index materialization OOM-kills the runner and hangs the shard.
- README.md: explains how the pipeline, the gating, and the baseline refresh work.

The workflow also includes a hang watchdog that thread-dumps and kills a wedged fork, and it tunes the per-fork heap (2G) and off-heap memory (2G) to fit the ~16G runner.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
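The per-shard gate described for compare-test-results.py reduces to two set operations. The Python sketch below is hypothetical and is not the script's actual code: it assumes each test is identified by a plain string such as `SuiteA: t1`, and that the baseline file holds one such id per line with `#` comment lines. The names `parse_baseline`, `gate_shard`, `GateResult`, and `exit_code` are illustrative.

```python
# Hypothetical sketch of a per-shard baseline gate; not the actual
# compare-test-results.py. Test ids are assumed to be plain strings.
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    regressions: frozenset    # failed now but not baselined: fail the build
    newly_passing: frozenset  # baselined but passed now: tighten the baseline


def parse_baseline(text: str) -> set:
    """Assumed format: one test id per line; blank lines and '#' lines ignored."""
    return {
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    }


def gate_shard(baseline: set, failed: set, passed: set) -> GateResult:
    """Compare one shard's outcomes with the committed baseline."""
    # Baselined tests that belong to other shards never appear in `passed`,
    # so this shard does not misreport them as newly passing.
    return GateResult(
        regressions=frozenset(failed - baseline),
        newly_passing=frozenset(baseline & passed),
    )


def exit_code(result: GateResult) -> int:
    """Non-zero only for regressions; newly passing tests are just reported."""
    return 1 if result.regressions else 0


if __name__ == "__main__":
    baseline = parse_baseline("# expected failures\nSuiteA: t1\nSuiteA: t2\n")
    result = gate_shard(baseline, failed={"SuiteA: t1", "SuiteB: t3"},
                        passed={"SuiteA: t2", "SuiteB: t4"})
    print(sorted(result.regressions), sorted(result.newly_passing), exit_code(result))
    # ['SuiteB: t3'] ['SuiteA: t2'] 1
```

Because newly passing tests only produce a report, a fix that makes a baselined test pass keeps the build green while still prompting a baseline refresh. A new failure outside the baseline is the only outcome that fails the shard.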
6953c7f to 987abe4
62a9d53 to 08c146a
730c6ef to 8f7f17b
Velox has no Arrow representation for VariantType, so the native columnar write path (which converts the incoming rows to Velox batches via RowToVeloxColumnarExec.toArrowSchema) throws `UnsupportedOperationException: Unsupported data type: variant` at runtime. This broke every Delta write whose schema contains a variant column (INSERT, UPDATE, MERGE, OPTIMIZE/auto-compact, and checkpoint-driven rewrites), because GlutenOptimisticTransaction.writeFiles always offloaded the write to the native writer: the now-removed code path built the Velox plan unconditionally.

Guard GlutenOptimisticTransaction.writeFiles: if the input schema contains a variant at any nesting level, delegate to super.writeFiles (the vanilla Delta write path) instead of offloading. Non-variant writes are unaffected. The check matches by type name, so it stays source-compatible across Spark versions.

Adds GlutenDeltaVariantWriteSuite, covering top-level, struct-nested, and UPDATE variant writes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
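The guard amounts to a recursive walk over the write schema that returns true at the first type named `variant`. The Python sketch below is a hypothetical illustration, not the Scala code in GlutenOptimisticTransaction: `DataType`, `contains_variant`, and `choose_write_path` are stand-ins, and the variant type name is assumed to be the string `variant`, as in the error message quoted above.

```python
# Hypothetical sketch of the variant guard; DataType is a minimal stand-in
# for Spark's schema types, not Spark's API.
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataType:
    type_name: str                         # e.g. "long", "struct", "variant"
    children: Tuple["DataType", ...] = ()  # struct fields, array element, map key/value


def contains_variant(dt: DataType) -> bool:
    """True if a variant appears at any nesting level, matched by type name."""
    if dt.type_name == "variant":
        return True
    return any(contains_variant(child) for child in dt.children)


def choose_write_path(schema: DataType) -> str:
    """Fall back to the vanilla Delta write for variant schemas, else offload."""
    return "vanilla" if contains_variant(schema) else "native"


if __name__ == "__main__":
    nested = DataType("struct", (
        DataType("long"),
        DataType("struct", (DataType("string"), DataType("variant"))),
    ))
    flat = DataType("struct", (DataType("long"), DataType("string")))
    print(choose_write_path(nested), choose_write_path(flat))  # vanilla native
```

Comparing name strings instead of referencing a variant class mirrors the stated reason for the real check: the code never needs a VariantType symbol, so it compiles against Spark versions that lack one.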
8f7f17b to d9291ba
What changes are proposed in this pull request?
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?