From 987abe45e3e0ef3cfdc54eb5a1f4d264131ce383 Mon Sep 17 00:00:00 2001
From: Felipe Fujiy Pessoto
Date: Thu, 25 Jun 2026 20:20:58 +0000
Subject: [PATCH 1/2] [GLUTEN][CI] Add Delta Spark UT pipeline gated against a known-failures baseline

Run delta-io/delta's `spark` ScalaTest suite against a Gluten Velox bundle in CI
and gate the results against a committed baseline so the many expected
Delta-on-Gluten failures stay manageable and can be fixed incrementally without
letting currently-passing tests silently regress.

What it adds (.github/workflows/util/delta-spark-ut/):
- delta_spark_ut.yml: builds the native lib + Gluten bundle, then runs the
  Delta spark suite sharded by suite into 4 shards x 4 forked test JVMs
  (~16-way), and gates each shard against the baseline.
- compare-test-results.py: the gate. Per shard, regressions (failed not in the
  baseline) fail the build; newly-passing baselined tests are flagged so the
  baseline can be tightened. Also supports seed/aggregate modes.
- known-failures.txt: the committed baseline of expected failures.
- setup-delta.sh: clones Delta, injects the Gluten bundle, patches
  DeltaSQLCommandTest, and force-fails the two DeletionVectorsSuite 2B-row
  tests whose native row-index materialization OOM-kills the runner and hangs
  the shard.
- README.md: how the pipeline, gating and baseline-refresh work.

The workflow also carries a hang watchdog that thread-dumps and kills a wedged
fork, and tunes the per-fork heap (2G) and off-heap (2G) to fit the ~16G runner.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .github/workflows/delta_spark_ut.yml          | 657 ++++++++++++
 .../workflows/util/delta-spark-ut/README.md   | 112 ++
 .../delta-spark-ut/compare-test-results.py    | 467 +++++++++
 .../util/delta-spark-ut/known-failures.txt    | 977 ++++++++++++++++++
 .../util/delta-spark-ut/setup-delta.sh        | 177 ++++
 5 files changed, 2390 insertions(+)
 create mode 100644 .github/workflows/delta_spark_ut.yml
 create mode 100644 .github/workflows/util/delta-spark-ut/README.md
 create mode 100644 .github/workflows/util/delta-spark-ut/compare-test-results.py
 create mode 100644 .github/workflows/util/delta-spark-ut/known-failures.txt
 create mode 100755 .github/workflows/util/delta-spark-ut/setup-delta.sh

diff --git a/.github/workflows/delta_spark_ut.yml b/.github/workflows/delta_spark_ut.yml
new file mode 100644
index 00000000000..5ad2e66b489
--- /dev/null
+++ b/.github/workflows/delta_spark_ut.yml
@@ -0,0 +1,657 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Runs Delta Lake's `spark` sbt module unit tests against a Gluten Velox bundle
+# that is built from the source in this repository. The pipeline:
+#
+#   1. Builds the Velox/Gluten native libraries (centos-7 + vcpkg, x86_64).
+#   2. Builds the Gluten Java/Scala jars and assembles the
+#      `gluten-velox-bundle-spark_-linux_amd64-.jar`
+#      fat jar for Spark 4.1 + Scala 2.13 + Java 17 with the Delta profile.
+#   3. Clones delta-io/delta at the requested release tag (default `v4.2.0`),
+#      drops the bundle jar into `spark-unified/lib/` only (NOT `spark/lib/`
+#      -- see setup-delta.sh for the unmanagedJars scoping rationale),
+#      patches Delta's `DeltaSQLCommandTest` to register the Gluten plugin,
+#      and runs `sbt spark/test` sharded across the matrix.
+#
+# Limited to Velox + x86 to keep the matrix simple, per the pipeline's purpose
+# of validating Gluten changes against the latest Delta release.
+
+name: Delta Spark UT (Gluten)
+
+on:
+  workflow_dispatch:
+    inputs:
+      delta_ref:
+        description: 'delta-io/delta git ref (tag/branch/SHA) to test against'
+        required: true
+        default: 'v4.2.0'
+      spark_version:
+        description: 'Delta `-DsparkVersion` value (must match the Gluten -P profile below)'
+        required: true
+        default: '4.1'
+      test_parallelism:
+        description: 'Forked test JVMs per shard (TEST_PARALLELISM_COUNT)'
+        required: true
+        default: '4'
+      update_baseline:
+        description: 'Seed/refresh the known-failures baseline instead of enforcing it'
+        type: boolean
+        required: false
+        default: false
+      fail_on_fixed:
+        description: 'Fail when a baseline test now passes (keeps the baseline honest)'
+        type: boolean
+        required: false
+        default: true
+  pull_request:
+    paths:
+      - '.github/workflows/delta_spark_ut.yml'
+      - '.github/workflows/util/delta-spark-ut/**'
+      - 'gluten-delta/**'
+      - 'backends-velox/src-delta40/**/DeltaSQLCommandTest.scala'
+
+env:
+  ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION: true
+  MVN_CMD: 'build/mvn -ntp'
+  CCACHE_DIR: "${{ github.workspace }}/.ccache"
+  # Gluten profile / bundle naming for the build-gluten-bundle and
+  # delta-spark-test jobs. Spark 4.1 + Scala 2.13 + JDK 17 matches Delta v4.2.0's
+  # default Spark version (4.1.0) from project/CrossSparkVersions.scala.
+  GLUTEN_SPARK_PROFILE: 'spark-4.1'
+  GLUTEN_SCALA_PROFILE: 'scala-2.13'
+  GLUTEN_JAVA_PROFILE: 'java-17'
+  GLUTEN_BUNDLE_SPARK_VERSION: '4.1'
+  GLUTEN_BUNDLE_SCALA_VERSION: '2.13'
+  # Default values used when the workflow is triggered by pull_request
+  # (where `inputs.*` is empty). Keep these in sync with the workflow_dispatch
+  # defaults above.
+  DELTA_REF_DEFAULT: 'v4.2.0'
+  DELTA_SPARK_VERSION_DEFAULT: '4.1'
+  DELTA_TEST_PARALLELISM_DEFAULT: '4'
+  # Default mode for pull_request runs (where inputs.* is empty): enforce the
+  # committed baseline and fail when a baseline test starts passing. Override
+  # via the workflow_dispatch inputs above.
+  DELTA_UPDATE_BASELINE_DEFAULT: 'false'
+  DELTA_FAIL_ON_FIXED_DEFAULT: 'true'
+  DELTA_SCALA_VERSION: '2.13.16'
+  # Number of shards in the delta-spark-test matrix. Must equal the length of
+  # the `shard` matrix below.
+  #
+  # EXPERIMENT: 4 shards x TEST_PARALLELISM_COUNT=4 (vs production 16 shards x 1).
+  # Both give ~16-way parallelism, but this packs it into 4 runner jobs (4 forks
+  # each) instead of 16 single-fork jobs -- fewer concurrent runners for the same
+  # throughput. Sharding is by SUITE; total work (~1250 shard-minutes) is fixed.
+  # RISK: each forked test JVM uses ~4G (2G heap + 2G off-heap), so 4 forks atop
+  # the sbt launcher push the ~16G runner to its limit and may OOM on heavy suites
+  # -- which is why production uses TEST_PARALLELISM_COUNT=1. Measuring whether it
+  # fits now that the worst memory hog (DeletionVectorsSuite 2B-row) is force-failed.
+  DELTA_NUM_SHARDS: '4'
+
+concurrency:
+  group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
+  cancel-in-progress: true
+
+jobs:
+  build-native-lib-centos-7:
+    runs-on: ubuntu-22.04
+    steps:
+      - uses: actions/checkout@v4
+      - name: Get Ccache
+        uses: actions/cache/restore@v4
+        with:
+          path: '${{ env.CCACHE_DIR }}'
+          key: ccache-delta-spark-ut-centos7-release-default-${{github.sha}}
+          restore-keys: |
+            ccache-delta-spark-ut-centos7-release-default
+            ccache-centos7-release-default
+      - name: Build Gluten native libraries
+        run: |
+          docker run -v $GITHUB_WORKSPACE:/work -w /work apache/gluten:vcpkg-centos-7-gcc13 bash -c "
+            set -e
+            yum install tzdata -y
+            df -a
+            cd /work
+            export CCACHE_DIR=/work/.ccache
+            export CCACHE_MAXSIZE=1G
+            mkdir -p /work/.ccache
+            ccache -sz
+            bash dev/ci-velox-buildstatic-centos-7.sh
+            ccache -s
+            mkdir -p /work/.m2/repository/org/apache/arrow/
+            cp -r /root/.m2/repository/org/apache/arrow/* /work/.m2/repository/org/apache/arrow/
+          "
+      - name: Save Ccache
+        if: always()
+        uses: actions/cache/save@v4
+        with:
+          path: '${{ env.CCACHE_DIR }}'
+          key: ccache-delta-spark-ut-centos7-release-default-${{github.sha}}
+      - uses: actions/upload-artifact@v4
+        with:
+          name: delta-spark-ut-native-lib-centos-7-${{github.sha}}
+          path: ./cpp/build/
+          if-no-files-found: error
+      - uses: actions/upload-artifact@v4
+        with:
+          name: delta-spark-ut-arrow-jars-centos-7-${{github.sha}}
+          path: .m2/repository/org/apache/arrow/
+          if-no-files-found: error
+
+  build-gluten-bundle:
+    needs: build-native-lib-centos-7
+    runs-on: ubuntu-22.04
+    container: apache/gluten:centos-9-jdk17
+    steps:
+      - uses: actions/checkout@v4
+      - name: Download native artifacts
+        uses: actions/download-artifact@v4
+        with:
+          name: delta-spark-ut-native-lib-centos-7-${{github.sha}}
+          path: ./cpp/build/
+      - name: Download Arrow jars
+        uses: actions/download-artifact@v4
+        with:
+          name: delta-spark-ut-arrow-jars-centos-7-${{github.sha}}
+          path: /root/.m2/repository/org/apache/arrow/
+      - name: Cache Maven repository
+        uses: actions/cache@v4
+        with:
+          path: /root/.m2/repository
+          key: m2-delta-spark-ut-bundle-${{ env.GLUTEN_SPARK_PROFILE }}-${{ env.GLUTEN_SCALA_PROFILE }}-${{ hashFiles('pom.xml', '**/pom.xml') }}
+          restore-keys: |
+            m2-delta-spark-ut-bundle-${{ env.GLUTEN_SPARK_PROFILE }}-${{ env.GLUTEN_SCALA_PROFILE }}-
+            m2-delta-spark-ut-bundle-
+      - name: Build Gluten Velox + Delta bundle
+        run: |
+          set -euo pipefail
+          yum install -y java-17-openjdk-devel
+          export JAVA_HOME=/usr/lib/jvm/java-17-openjdk
+          export PATH=$JAVA_HOME/bin:$PATH
+          java -version
+          cd "$GITHUB_WORKSPACE"
+          # `install` (not `package`) so the gluten-delta artifact is in the local
+          # m2 repo before the `package/` shaded jar is built. `-Dmaven.compiler.release=17`
+          # overrides any user settings.xml that may pin release=1.8 for Java 17 builds.
+          $MVN_CMD clean install \
+            -P${{ env.GLUTEN_SPARK_PROFILE }} \
+            -P${{ env.GLUTEN_SCALA_PROFILE }} \
+            -P${{ env.GLUTEN_JAVA_PROFILE }} \
+            -Pbackends-velox -Pdelta \
+            -DskipTests -Dmaven.compiler.release=17
+      - name: Stage bundle jar
+        run: |
+          set -euo pipefail
+          mkdir -p bundle-out
+          # Match the renamed fat jar produced by package/pom.xml's copy-fat-jar
+          # exec. The version part may bump (e.g. 1.7.0-SNAPSHOT -> 1.8.0-SNAPSHOT),
+          # so glob the version suffix.
+          jar=$(ls package/target/gluten-velox-bundle-spark${{ env.GLUTEN_BUNDLE_SPARK_VERSION }}_${{ env.GLUTEN_BUNDLE_SCALA_VERSION }}-linux_amd64-*.jar | head -n 1)
+          if [ -z "$jar" ] || [ ! -f "$jar" ]; then
+            echo "ERROR: Could not find Gluten bundle jar under package/target/" >&2
+            ls -la package/target/ || true
+            exit 1
+          fi
+          cp "$jar" bundle-out/
+          ls -lh bundle-out/
+      - uses: actions/upload-artifact@v4
+        with:
+          name: delta-spark-ut-gluten-bundle-${{github.sha}}
+          path: bundle-out/gluten-velox-bundle-spark*_*-linux_amd64-*.jar
+          if-no-files-found: error
+
+  delta-spark-test:
+    needs: build-gluten-bundle
+    runs-on: ubuntu-22.04
+    container: apache/gluten:centos-9-jdk17
+    # EXPERIMENT (4 shards x 4 forks): back to 350 -- per-shard suites now run
+    # 4-at-a-time, so each shard should finish well under the cap again.
+    timeout-minutes: 350
+    strategy:
+      fail-fast: false
+      matrix:
+        # Length of this list MUST equal env.DELTA_NUM_SHARDS.
+        shard: [0, 1, 2, 3]
+    env:
+      # Mirror Delta's spark_test.yaml env vars used by run-tests.py /
+      # TestParallelization.scala.
+      SHARD_ID: ${{ matrix.shard }}
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Resolve workflow inputs
+        id: resolve
+        run: |
+          set -euo pipefail
+          delta_ref='${{ github.event.inputs.delta_ref }}'
+          spark_version='${{ github.event.inputs.spark_version }}'
+          test_parallelism='${{ github.event.inputs.test_parallelism }}'
+          update_baseline='${{ github.event.inputs.update_baseline }}'
+          fail_on_fixed='${{ github.event.inputs.fail_on_fixed }}'
+          : "${delta_ref:=${DELTA_REF_DEFAULT}}"
+          : "${spark_version:=${DELTA_SPARK_VERSION_DEFAULT}}"
+          : "${test_parallelism:=${DELTA_TEST_PARALLELISM_DEFAULT}}"
+          : "${update_baseline:=${DELTA_UPDATE_BASELINE_DEFAULT}}"
+          : "${fail_on_fixed:=${DELTA_FAIL_ON_FIXED_DEFAULT}}"
+          {
+            echo "delta_ref=${delta_ref}"
+            echo "spark_version=${spark_version}"
+            echo "test_parallelism=${test_parallelism}"
+            echo "update_baseline=${update_baseline}"
+            echo "fail_on_fixed=${fail_on_fixed}"
+          } | tee -a "$GITHUB_OUTPUT"
+
+      - name: Download Gluten bundle jar
+        uses: actions/download-artifact@v4
+        with:
+          name: delta-spark-ut-gluten-bundle-${{github.sha}}
+          path: gluten-bundle
+
+      - name: Install minimal tools
+        run: |
+          set -euo pipefail
+          # apache/gluten:centos-9-jdk17 already has java-17, git, tar, a POSIX
+          # shell, and curl-minimal (which provides the `curl` command sbt's
+          # launcher needs). Install the rest of what Delta's build/sbt and the
+          # tests may need. We deliberately do NOT install the full `curl`
+          # package -- it conflicts with the pre-installed curl-minimal.
+          yum install -y java-17-openjdk-devel which findutils gzip python3
+          export JAVA_HOME=/usr/lib/jvm/java-17-openjdk
+          export PATH=$JAVA_HOME/bin:$PATH
+          java -version
+          git --version
+          curl --version | head -n 1
+
+      - name: Cache sbt / Ivy / Coursier
+        uses: actions/cache@v4
+        with:
+          path: |
+            /root/.sbt
+            /root/.ivy2
+            /root/.cache/coursier
+          # Intentionally NOT keyed by ${{ matrix.shard }} -- all shards share
+          # the same dependency tree, so a single shared cache (with parallel
+          # save races resolved by GH on a first-write-wins basis) gives the
+          # best storage / hit-rate tradeoff.
+          key: delta-spark-ut-sbt-${{ steps.resolve.outputs.delta_ref }}-${{ steps.resolve.outputs.spark_version }}-${{ env.DELTA_SCALA_VERSION }}
+          restore-keys: |
+            delta-spark-ut-sbt-${{ steps.resolve.outputs.delta_ref }}-${{ steps.resolve.outputs.spark_version }}-
+            delta-spark-ut-sbt-${{ steps.resolve.outputs.delta_ref }}-
+
+      - name: Clone and patch Delta
+        run: |
+          set -euo pipefail
+          GLUTEN_JAR=$(ls "$GITHUB_WORKSPACE"/gluten-bundle/gluten-velox-bundle-spark*_*-linux_amd64-*.jar | head -n 1)
+          echo "Using Gluten bundle: $GLUTEN_JAR"
+          bash "$GITHUB_WORKSPACE/.github/workflows/util/delta-spark-ut/setup-delta.sh" \
+            "${{ steps.resolve.outputs.delta_ref }}" \
+            "$GITHUB_WORKSPACE/delta" \
+            "$GLUTEN_JAR" \
+            "$GITHUB_WORKSPACE"
+
+      - name: Run Delta spark module tests (shard ${{ matrix.shard }} / ${{ env.DELTA_NUM_SHARDS }})
+        env:
+          NUM_SHARDS: ${{ env.DELTA_NUM_SHARDS }}
+          TEST_PARALLELISM_COUNT: ${{ steps.resolve.outputs.test_parallelism }}
+          # Required by Delta to enable testing-only code paths
+          # (see delta build.sbt: "Test / envVars += DELTA_TESTING -> 1").
+          DELTA_TESTING: '1'
+          # JDK 17 + Gluten/Arrow/Netty requires extra --add-opens and the
+          # `io.netty.tryReflectionSetAccessible` system property; otherwise
+          # the forked test JVM fails with
+          #   java.lang.UnsupportedOperationException: sun.misc.Unsafe or
+          #   java.nio.DirectByteBuffer.<init>(long, int) not available
+          # as soon as Gluten's bundled Arrow allocator initializes Netty
+          # direct buffers. Delta's own `Test / javaOptions` (see
+          # project/CrossSparkVersions.scala `java17TestSettings`) sets the
+          # base add-opens but NOT the Netty property -- Delta's own tests
+          # don't load Arrow/Netty buffers in a way that triggers it.
+          #
+          # Use JAVA_TOOL_OPTIONS so the flags propagate to BOTH the sbt
+          # launcher JVM and the forked test JVM (sbt forks tests and the
+          # child inherits the parent's env). The set below mirrors
+          # `extraJavaTestArgs` from Gluten's own root pom.xml (the
+          # canonical Gluten test JVM flag set).
+          #
+          # NOTE: we deliberately do NOT put `-Xmx` here. JAVA_TOOL_OPTIONS
+          # is processed BEFORE the JVM command line, so Delta's explicit
+          # `-Xmx1024m` (set in build.sbt `Test / javaOptions`) would still
+          # win (last `-Xmx` wins). The forked-test-JVM heap is bumped via
+          # an sbt `set spark / Test / javaOptions ++= ...` command below,
+          # which APPENDS to Delta's own seq -- so our `-Xmx` lands AFTER
+          # `-Xmx1024m` and wins.
+          JAVA_TOOL_OPTIONS: >-
+            -XX:+IgnoreUnrecognizedVMOptions
+            --add-opens=java.base/java.lang=ALL-UNNAMED
+            --add-opens=java.base/java.lang.invoke=ALL-UNNAMED
+            --add-opens=java.base/java.lang.reflect=ALL-UNNAMED
+            --add-opens=java.base/java.io=ALL-UNNAMED
+            --add-opens=java.base/java.net=ALL-UNNAMED
+            --add-opens=java.base/java.nio=ALL-UNNAMED
+            --add-opens=java.base/java.util=ALL-UNNAMED
+            --add-opens=java.base/java.util.concurrent=ALL-UNNAMED
+            --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
+            --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED
+            --add-opens=java.base/sun.nio.ch=ALL-UNNAMED
+            --add-opens=java.base/sun.nio.cs=ALL-UNNAMED
+            --add-opens=java.base/sun.security.action=ALL-UNNAMED
+            --add-opens=java.base/sun.util.calendar=ALL-UNNAMED
+            -Djdk.reflect.useDirectMethodHandle=false
+            -Dio.netty.tryReflectionSetAccessible=true
+            -Dfile.encoding=UTF-8
+        run: |
+          set -euo pipefail
+          export JAVA_HOME=/usr/lib/jvm/java-17-openjdk
+          export PATH=$JAVA_HOME/bin:$PATH
+          cd "$GITHUB_WORKSPACE/delta"
+          chmod +x build/sbt
+          # Only run the unified `spark` sbt project, NOT `sparkGroup/test` --
+          # `sparkGroup` aggregates many other projects (sparkV2, contribs,
+          # sharing, connect*, ...) that are out of scope for this pipeline.
+          #
+          # JVM heap layout (16 GB ubuntu-22.04 runner, TEST_PARALLELISM=1):
+          #   * sbt launcher JVM: -J-Xmx4G, BUT made to RETURN idle memory (see the
+          #     G1 periodic-GC flags below). The per-minute MEM profiler in the
+          #     watchdog (run #18) DISPROVED the old "launcher RSS is well under 4G"
+          #     assumption: the launcher grew to a ROCK-STEADY 5.3G (RSS) during the
+          #     test-compile and then HELD it for the entire run -- ~3.8G of pure
+          #     idle waste during the (long) test phase, where it only relays the
+          #     fork's test events. That fixed 5.3G + the fork's native spike in a
+          #     heavy suite is what pushed the cgroup to the ~16G OOM-kill. G1 does
+          #     NOT uncommit on its own here because the idle launcher never GCs.
+          #     FIX (behaviour-neutral -- touches NO Gluten/Spark runtime config, so
+          #     it cannot pollute the measured pass/fail signal): keep -Xmx4G for the
+          #     compile headroom but force a periodic GC every 10s when idle
+          #     (G1PeriodicGCInterval) with the system-load gate disabled
+          #     (G1PeriodicGCSystemLoadThreshold=0, since the busy fork would
+          #     otherwise suppress it) as a full STW collection
+          #     (-G1PeriodicGCInvokesConcurrent) that uncommits down to a tight free
+          #     ratio (Min/MaxHeapFreeRatio 5/15, JEP 346) above a low -Xms512m
+          #     floor. The idle launcher then drops from ~5.3G back to ~1-2G during
+          #     the test phase, cutting the cgroup peak by ~3.8G (~15.9G -> ~12G) --
+          #     real headroom under the OOM threshold -- with zero compile-OOM risk.
+          #   * Forked test JVM: -Xmx2G via the `set ... Test / javaOptions` command
+          #     below. Delta v4.2.0 caps its test fork at `-Xmx1024m` in build.sbt;
+          #     Gluten OFFLOADS data to Velox off-heap (capped at 2g via
+          #     spark.memory.offHeap.size in the patched DeltaSQLCommandTest), so the
+          #     fork's JVM HEAP need is modest -- 2G is generous.
+          #     HISTORY: briefly bumped to 8G then 4G to absorb a DV+CDC merge suite's
+          #     giant RoaringBitmapArray heap allocation, but that was a Gluten bug
+          #     (garbage native _metadata.row_index) FIXED UPSTREAM by #12269 -- so the
+          #     large heap is no longer needed. Worse, on the ~16G runner the cgroup
+          #     memory.peak hit 15.97G with a 4G fork heap and the kernel OOM-killed
+          #     the fork mid-shard (sudden death, no hs_err / no heap dump), which
+          #     wedged sbt's main process forever in ScalaTestRunner.done -> Thread
+          #     .join (the chronic "shard 2 hang"). The JVM heaps -- not off-heap --
+          #     drive that peak, so 2G fork heap (+ the unchanged 4G sbt launcher,
+          #     which needs its heap to COMPILE the tests) brings the peak to ~13G,
+          #     leaving real headroom. The `++=` appends to Delta's own Test/javaOptions
+          #     seq so our `-Xmx2G` comes AFTER `-Xmx1024m` and wins (last `-Xmx`
+          #     wins). Keep heap-dump-on-OOM so a genuine >2G heap OOM is analyzable.
+          # `-u target/test-reports` enables ScalaTest's JUnit XML reporter so
+          # every suite writes per-test results. Delta itself only configures
+          # the console reporter (-oDF), so without this we'd have no machine-
+          # readable results to gate on. The path is relative to the forked
+          # test JVM's working dir (Test / baseDirectory = spark/), i.e.
+          # delta/spark/target/test-reports/TEST-*.xml.
+          #
+          # We deliberately do NOT let an sbt non-zero exit (which fires on the
+          # MANY expected Delta-on-Gluten failures) fail this step directly.
+          # Instead the known-failures gate below decides pass/fail: the build
+          # is green when the only failures are ones already recorded in the
+          # baseline, and red on a genuine regression.
+          set +e
+          # --- hang watchdog ---------------------------------------------------
+          # Shard 2 (and occasionally others) hangs indefinitely after a suite's
+          # last test with no further output. ScalaTest's failAfter only wraps
+          # individual test BODIES, so a wedge in suite teardown/afterAll -- or in
+          # a non-interruptible native Velox/JNI call that ignores
+          # Thread.interrupt() -- has no timeout and stalls until the 350-min job
+          # limit with zero diagnostics. This watchdog dumps the forked test JVM's
+          # threads (to the job log, and to a file for the artifact) once the test
+          # output has been silent for too long, so the deadlock is diagnosable.
+          SBT_LOG="/tmp/sbt-spark-test-shard-${{ matrix.shard }}.log"
+          : > "$SBT_LOG"
+          rm -f /tmp/sbt-done
+          (
+            # CRITICAL: the step shell runs with `bash -eo pipefail`, which the
+            # subshell inherits. Without `set +e` here, ANY non-zero command --
+            # e.g. fork detection finding no match, or `kill`/`jps` returning
+            # non-zero -- silently kills this watchdog. That errexit kill (plus a
+            # /proc detection miss) is why the watchdog captured ZERO dumps in
+            # runs #12 and #13. A diagnostic must never abort on a failed probe.
+            set +e +o pipefail
+            JSTACK="${JAVA_HOME}/bin/jstack"
+            JPS="${JAVA_HOME}/bin/jps"
+            silent_limit=900   # 15 min with no new test output => treat as hung
+            dumps=0
+            fork_pids() {
+              # The sbt test fork's main class is sbt.ForkMain. Prefer jps (reads
+              # the main class from hsperfdata, robust to sbt's @argfile launch);
+              # fall back to scanning /proc cmdline + @argfile.
+              "$JPS" -l 2>/dev/null | awk '/sbt\.ForkMain/ {print $1}'
+              local p cl arg
+              for p in /proc/[0-9]*; do
+                [ "$(cat "$p/comm" 2>/dev/null)" = "java" ] || continue
+                cl="$(tr '\0' ' ' < "$p/cmdline" 2>/dev/null)"
+                case "$cl" in *sbt.ForkMain*) echo "${p##*/}"; continue ;; esac
+                arg="$(printf '%s' "$cl" | tr ' ' '\n' | sed -n 's/^@//p' | head -1)"
+                [ -n "$arg" ] && [ -f "$arg" ] && grep -qa 'sbt\.ForkMain' "$arg" 2>/dev/null \
+                  && echo "${p##*/}"
+              done
+            }
+            all_java_pids() {
+              "$JPS" -q 2>/dev/null
+              local p
+              for p in /proc/[0-9]*; do
+                [ "$(cat "$p/comm" 2>/dev/null)" = "java" ] && echo "${p##*/}"
+              done
+            }
+            echo "HANG WATCHDOG armed: dumps the test JVM after ${silent_limit}s of output silence"
+            hb=0
+            while [ ! -f /tmp/sbt-done ]; do
+              sleep 60
+              [ -f "$SBT_LOG" ] || continue
+              now=$(date +%s)
+              mtime=$(stat -c %Y "$SBT_LOG" 2>/dev/null || echo "$now")
+              silent=$(( now - mtime ))
+              # Per-minute memory profile: heap tuning proved the ~16G OOM peak is
+              # NATIVE-driven, so log which JVM (sbt launcher vs fork) actually grows
+              # toward it -- the last lines before a hang reveal the real hog to cut.
+              # Read /proc directly (no `ps` dependency in the minimal container).
+              memnow=$(awk '{printf "%.2fG",$1/1073741824}' /sys/fs/cgroup/memory.current 2>/dev/null)
+              jvmrss=""
+              for mp in $(all_java_pids 2>/dev/null | sort -un); do
+                r=$(awk '/^VmRSS:/{print $2}' "/proc/$mp/status" 2>/dev/null)
+                [ -n "$r" ] && jvmrss="$jvmrss $(( r / 1024 ))M(p$mp)"
+              done
+              echo "MEM cgroup=${memnow} JVMs=[${jvmrss# }]"
+              hb=$(( hb + 1 ))
+              # Heartbeat every ~5 min so we can SEE the watchdog is alive (and how
+              # long the test has been silent) without waiting for a hang.
+              [ $(( hb % 5 )) -eq 0 ] && echo "HANG WATCHDOG: alive; last test output ${silent}s ago"
+              if [ "$silent" -ge "$silent_limit" ] && [ "$dumps" -lt 3 ]; then
+                dumps=$(( dumps + 1 ))
+                pids="$(fork_pids | sort -un)"
+                # Safety net: if the fork JVM cannot be pinpointed, dump EVERY JVM.
+                [ -n "$pids" ] || pids="$(all_java_pids | sort -un)"
+                echo "::group::HANG WATCHDOG: test output silent ${silent}s -- thread dump #${dumps} (pids:$(printf ' %s' $pids))"
+                [ -n "$pids" ] || echo "HANG WATCHDOG: no java process found to dump"
+                for pid in $pids; do
+                  # SIGQUIT makes the JVM print a full thread dump to its OWN stderr,
+                  # which sbt relays into the test log via the SAME stream as test
+                  # output -- so it lands in the job log even when a separately
+                  # spawned jstack child's output would be buffered/lost. Also write
+                  # jstack to a file for the per-shard artifact.
+                  echo "----- SIGQUIT + jstack pid ${pid} -----"
+                  kill -QUIT "$pid" 2>/dev/null || echo "HANG WATCHDOG: kill -QUIT failed for pid ${pid}"
+                  timeout 120 "$JSTACK" -l "$pid" > "/tmp/threaddump-shard-${{ matrix.shard }}-${dumps}-${pid}.txt" 2>&1 \
+                    || echo "HANG WATCHDOG: jstack failed/timed out for pid ${pid}"
+                done
+                echo "::endgroup::"
+                # The dump is now captured (job log via SIGQUIT + artifact via
+                # jstack file). A hung JVM otherwise stalls the whole shard until
+                # the 350-min job timeout AND keeps the job log frozen so the dump
+                # never becomes reachable. So KILL the wedged JVM(s): the suite
+                # fails fast (acceptable -- errors are expected; only an
+                # unrecoverable hang blocks CI), the job proceeds/ends, and the log
+                # + artifacts flush. Give SIGQUIT a moment to print first.
+                sleep 20
+                echo "HANG WATCHDOG: killing wedged JVM(s) to unblock the shard: $(printf '%s ' $pids)"
+                for pid in $pids; do kill -KILL "$pid" 2>/dev/null; done
+              fi
+            done
+          ) &
+          WATCHDOG_PID=$!
+
+          ./build/sbt \
+            -DsparkVersion=${{ steps.resolve.outputs.spark_version }} \
+            -v \
+            -J-XX:+UseG1GC -J-Xms512m -J-Xmx4G \
+            -J-XX:G1PeriodicGCInterval=10000 \
+            -J-XX:G1PeriodicGCSystemLoadThreshold=0 \
+            -J-XX:-G1PeriodicGCInvokesConcurrent \
+            -J-XX:MinHeapFreeRatio=5 -J-XX:MaxHeapFreeRatio=15 \
+            "++ ${DELTA_SCALA_VERSION}" \
+            'set spark / Test / javaOptions ++= Seq("-Xmx2G", "-XX:+HeapDumpOnOutOfMemoryError", "-XX:HeapDumpPath=/tmp/")' \
+            'set spark / Test / testOptions += Tests.Argument(TestFrameworks.ScalaTest, "-u", "target/test-reports")' \
+            "spark/test" 2>&1 | tee "$SBT_LOG"
+          SBT_EXIT=${PIPESTATUS[0]}
+          touch /tmp/sbt-done
+          kill "$WATCHDOG_PID" 2>/dev/null || true
+          set -e
+          echo "sbt spark/test exited with ${SBT_EXIT}"
+
+          # Memory forensics: a sudden forked-JVM death with no hs_err and no heap
+          # dump is almost always a kernel/cgroup OOM-kill (Velox off-heap + JVM
+          # heap exceeding the ~16G runner). Surface the cgroup peak + oom_kill
+          # count so we can confirm/measure it (cgroup v2 paths; best-effort).
+          ( echo "=== cgroup memory forensics (exit ${SBT_EXIT}) ==="
+            for f in /sys/fs/cgroup/memory.peak /sys/fs/cgroup/memory.max \
+                     /sys/fs/cgroup/memory.current /sys/fs/cgroup/memory.events; do
+              [ -r "$f" ] && { echo "--- $f ---"; cat "$f"; }
+            done ) || true
+
+          # A compile/launch failure leaves no reports at all. In that case the
+          # gate would see zero failures and pass spuriously, so fail loudly.
+          REPORT_COUNT=$(find . -path '*/target/test-reports/*.xml' 2>/dev/null | wc -l || true)
+          echo "Found ${REPORT_COUNT} JUnit XML report file(s)."
+          if [ "${REPORT_COUNT}" -eq 0 ]; then
+            echo "::error::sbt produced no test reports (exit ${SBT_EXIT}) -- likely a compile or launch failure, not test failures."
+            exit 1
+          fi
+
+          # update_baseline=true -> SEED mode (record failures, never fail) so the
+          # baseline can be (re)generated. Otherwise ENFORCE against the baseline.
+          GATE_MODE=enforce
+          if [ "${{ steps.resolve.outputs.update_baseline }}" = "true" ]; then
+            GATE_MODE=seed
+          fi
+          mkdir -p "$GITHUB_WORKSPACE/gate-out"
+          python3 "$GITHUB_WORKSPACE/.github/workflows/util/delta-spark-ut/compare-test-results.py" \
+            --mode "${GATE_MODE}" \
+            --reports-dir "$GITHUB_WORKSPACE/delta" \
+            --known-failures "$GITHUB_WORKSPACE/.github/workflows/util/delta-spark-ut/known-failures.txt" \
+            --failures-out "$GITHUB_WORKSPACE/gate-out/failures-shard-${{ matrix.shard }}.txt" \
+            --ran-out "$GITHUB_WORKSPACE/gate-out/ran-shard-${{ matrix.shard }}.txt" \
+            --fail-on-fixed "${{ steps.resolve.outputs.fail_on_fixed }}"
+
+      - name: Compress heap dumps (if any)
+        if: ${{ failure() }}
+        run: |
+          set -euo pipefail
+          if compgen -G "/tmp/*.hprof" > /dev/null; then
+            echo "Found heap dump(s); compressing..."
+            ls -lh /tmp/*.hprof
+            # gzip is single-threaded and slow on multi-GB heaps but is
+            # always present in the centos image. Heap dumps compress ~10x.
+            gzip -1 /tmp/*.hprof
+            ls -lh /tmp/*.hprof.gz
+          else
+            echo "No heap dumps found in /tmp/."
+          fi
+
+      - name: Upload per-shard gate lists
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: delta-spark-ut-gate-lists-shard-${{ matrix.shard }}
+          path: gate-out/*.txt
+          if-no-files-found: warn
+
+      - name: Upload test reports
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: delta-spark-ut-reports-shard-${{ matrix.shard }}
+          path: |
+            delta/**/target/test-reports/**/*.xml
+            delta/**/target/surefire-reports/**/*.xml
+          if-no-files-found: warn
+
+      - name: Upload hang watchdog thread dumps
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: delta-spark-ut-threaddumps-shard-${{ matrix.shard }}
+          path: /tmp/threaddump-shard-${{ matrix.shard }}-*.txt
+          if-no-files-found: ignore
+
+      - name: Upload JVM crash logs and other failure artifacts
+        if: ${{ failure() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: delta-spark-ut-failure-logs-shard-${{ matrix.shard }}
+          path: |
+            delta/**/target/*.log
+            delta/**/hs_err_pid*.log
+            delta/**/core.*
+            /tmp/*.hprof
+            /tmp/*.hprof.gz
+          if-no-files-found: ignore
+
+  # Merges every shard's failure/ran lists into a single, sorted, ready-to-commit
+  # known-failures.txt and reports global regressions / now-passing / stale
+  # entries. Runs even when some shards went red (if: always()) so the refreshed
+  # baseline artifact is always available -- this is what you download and commit
+  # to bootstrap or refresh the baseline (see util/delta-spark-ut/README.md).
+ delta-spark-aggregate: + needs: delta-spark-test + if: always() + runs-on: ubuntu-22.04 + steps: + - uses: actions/checkout@v4 + - name: Download per-shard gate lists + uses: actions/download-artifact@v4 + continue-on-error: true + with: + pattern: delta-spark-ut-gate-lists-shard-* + path: gate-lists + merge-multiple: true + - name: Aggregate known failures + run: | + set -euo pipefail + python3 .github/workflows/util/delta-spark-ut/compare-test-results.py \ + --mode aggregate \ + --inputs-dir gate-lists \ + --known-failures .github/workflows/util/delta-spark-ut/known-failures.txt \ + --baseline-out aggregated/known-failures.txt + - name: Upload refreshed baseline + if: always() + uses: actions/upload-artifact@v4 + with: + name: delta-spark-ut-known-failures + path: aggregated/known-failures.txt + if-no-files-found: warn diff --git a/.github/workflows/util/delta-spark-ut/README.md b/.github/workflows/util/delta-spark-ut/README.md new file mode 100644 index 00000000000..ea2cc6af190 --- /dev/null +++ b/.github/workflows/util/delta-spark-ut/README.md @@ -0,0 +1,112 @@ + + +# Delta Spark UT (Gluten) — managing expected failures + +Running delta-io/delta's `spark` ScalaTest suite against the Gluten Velox +bundle produces **many expected failures**: Gluten does not yet offload every +Delta code path, and falls back or behaves differently in places. If CI simply +went red on any failure, the signal would be useless and we could never tell a +*new* breakage from the hundreds of already-known ones. + +To make this manageable we keep a **baseline of known failures** and gate each +run against it. The build is green when the only failing tests are ones already +recorded in the baseline; it goes red the moment a **previously-passing test +starts failing** (a regression). + +## Files + +| File | Purpose | +|---|---| +| `known-failures.txt` | Committed baseline: the tests currently expected to fail. One `#` per line. 
| +| `compare-test-results.py` | Parses the JUnit XML from `sbt spark/test` and gates / seeds / aggregates against the baseline. Standard-library only. | +| `setup-delta.sh` | Clones Delta, drops in the Gluten bundle, and patches `DeltaSQLCommandTest`. | + +## How the gate works + +Each test shard: + +1. Runs `sbt spark/test` with ScalaTest's JUnit XML reporter enabled + (`-u target/test-reports`), so every suite writes per-test results. (Delta + itself only configures the console reporter, so the workflow injects this.) +2. Runs `compare-test-results.py --mode enforce`, which classifies every test: + - **regression** — failed, but not in the baseline → **fails the shard**. + - **expected** — failed and in the baseline → ignored. + - **now-passing** — in the baseline but passed this run → fails the shard + (so the baseline is kept honest), unless `fail_on_fixed=false`. + +A final `aggregate` job merges every shard's results into a single, sorted, +ready-to-commit `known-failures.txt` artifact and reports **stale** baseline +entries (tests no longer present in any shard, e.g. after a Delta version bump). + +Because Delta shards **by suite**, every suite (and therefore every test) runs +in exactly one shard, so per-shard enforcement sees complete suites and never +double-counts. + +## Bootstrapping the baseline (first time) + +While `known-failures.txt` has no entries the gate auto-runs in **seed mode** +(it never fails — it only records failures). To create the initial baseline: + +1. Trigger **Actions → Delta Spark UT (Gluten) → Run workflow** with + `update_baseline = true`. +2. When it finishes, download the **`delta-spark-ut-known-failures`** artifact. +3. Replace `known-failures.txt` with the file from that artifact and commit it. + +From the next run onward the gate enforces the baseline. + +## Day-to-day: fixing tests incrementally + +- **You fixed Gluten and some Delta tests now pass.** CI will flag them as + *now-passing*. 
Delete those lines from `known-failures.txt` in your PR. That + is the whole point — the baseline only ever shrinks as coverage improves. +- **You intentionally added a new expected failure** (e.g. a Delta path Gluten + can't offload yet). Add the exact `Suite#test` line(s) the gate prints under + *Regressions* to `known-failures.txt`, ideally with a comment explaining why. +- **A genuine regression.** Fix it; do **not** add it to the baseline. + +The error log prints copy-pasteable `Suite#test` lines for both regressions and +now-passing tests, and each run's job summary shows the full breakdown. + +## Regenerating / refreshing the whole baseline + +After a Delta version bump or a large Gluten change, regenerate from scratch the +same way as bootstrapping: run the workflow with `update_baseline=true`, download +the `delta-spark-ut-known-failures` artifact, and commit it. The aggregate job +also lists **stale** entries you can prune. + +## Caveats + +- **Flaky tests.** A flaky test that usually passes will be flagged as a + regression when it flakes; one that usually fails (and is in the baseline) + may be flagged as now-passing when it happens to pass. Re-run, or set + `fail_on_fixed=false` for that run, and keep genuinely flaky tests out of the + enforced set. +- **Known failures still execute** (and fail) — they are gated *after* the run, + not skipped — so they still consume CI time. This keeps us decoupled from + Delta's sources; skipping them at runtime would require patching Delta. 
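Both caveats follow from how the gate classifies a shard's results, which comes down to set arithmetic over `Suite#test` keys. The snippet below is a minimal sketch of that classification, not the script itself: the `classify` helper and the sample suite names are hypothetical, and entries are modelled as plain strings rather than the `(suite, test)` tuples the script uses.

```python
def classify(passed, failed, baseline):
    """Split one shard's results against the baseline.

    All arguments are sets of "Suite#test" strings (a simplification
    assumed for this sketch).
    """
    regressions = failed - baseline   # failed and not listed: fails the shard
    expected = failed & baseline      # failed and listed: ignored
    now_passing = baseline & passed   # listed but passed: tighten the baseline
    return regressions, expected, now_passing


# Hypothetical sample data for one shard.
passed = {"a.FooSuite#reads", "a.BarSuite#merge"}
failed = {"a.FooSuite#writes", "a.BazSuite#vacuum"}
baseline = {"a.BarSuite#merge", "a.BazSuite#vacuum", "a.OldSuite#gone"}

regressions, expected, now_passing = classify(passed, failed, baseline)
print(sorted(regressions))
print(sorted(expected))
print(sorted(now_passing))
```

A flaky baselined test that happens to pass lands in `now_passing`, and a flaky unlisted test that happens to fail lands in `regressions`. An entry such as `a.OldSuite#gone`, which did not run in this shard, is in neither set; only the aggregate job can report it as stale.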
+ +## Running the comparison locally + +```bash +# after an sbt spark/test run that wrote delta/**/target/test-reports/*.xml +python3 .github/workflows/util/delta-spark-ut/compare-test-results.py \ + --mode enforce \ + --reports-dir delta \ + --known-failures .github/workflows/util/delta-spark-ut/known-failures.txt \ + --failures-out /tmp/failures.txt --ran-out /tmp/ran.txt +``` diff --git a/.github/workflows/util/delta-spark-ut/compare-test-results.py b/.github/workflows/util/delta-spark-ut/compare-test-results.py new file mode 100644 index 00000000000..bed6d18712e --- /dev/null +++ b/.github/workflows/util/delta-spark-ut/compare-test-results.py @@ -0,0 +1,467 @@ +#!/usr/bin/env python3 +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Gate / seed / aggregate the Delta-on-Gluten unit test results. + +Running delta-io/delta's ScalaTest suite against the Gluten Velox bundle +produces many *expected* failures (Gluten does not yet support every Delta +code path). To keep the red/green signal meaningful while we fix those +failures incrementally, we maintain a committed baseline of known failing +tests (``known-failures.txt``) and compare each CI run against it. 
+ +This script has three modes: + +``enforce`` (default, per shard) + Parse the JUnit XML produced by ``sbt spark/test`` (ScalaTest ``-u`` + reporter) and compare against the baseline: + + * regression -- a test that FAILED but is NOT in the baseline. These + fail the build: a previously-passing test just started failing. + * expected -- a test that failed and IS in the baseline. Ignored. + * fixed -- a baseline test that now PASSES. By default these also + fail the build (``--fail-on-fixed true``) so the baseline stays honest + and contributors remove entries as they fix them. + + If the baseline is empty (not yet bootstrapped) the mode automatically + degrades to ``seed`` so the first run is never spuriously red. + +``seed`` (bootstrap / ``update_baseline``) + Never fails. Just writes the current shard's failing tests so the baseline + can be (re)generated from a real run. + +``aggregate`` (final job) + Merge every shard's ``--failures-out`` / ``--ran-out`` file into a single, + sorted, ready-to-commit ``known-failures.txt`` and report stale baseline + entries (tests no longer present in any shard). + +Baseline file format (``known-failures.txt``):: + + # comment lines start with '#' + <suite>#<test> + +The suite is always a JVM class name (dot-separated, never starts with '#'), +so a line whose first non-space character is '#' is unambiguously a comment, +and the FIRST '#' after the suite separates suite from the (possibly +'#'-containing) test name. + +Only the Python standard library is used so the script runs in the bare +centos image used by the Delta UT pipeline with no ``pip install``. +""" + +import argparse +import glob +import os +import sys +import xml.etree.ElementTree as ET + +# Synthetic "test name" recorded when a whole suite aborts (e.g. beforeAll +# throws) so that the JUnit XML reports a suite-level error with no per-test
+# <testcase>. Without this, a suite that used to pass but now aborts entirely +# would record zero failing testcases and the regression would be missed. +SUITE_ABORTED = "" + +SEP = "#" + + +def eprint(*args, **kwargs): + print(*args, file=sys.stderr, **kwargs) + + +# --------------------------------------------------------------------------- # +# Baseline (known-failures.txt) parsing / formatting +# --------------------------------------------------------------------------- # +def format_entry(suite, test): + return "{}{}{}".format(suite, SEP, test) + + +def parse_entry(line): + """Parse a 'suite#test' line into (suite, test) or return None for blanks/comments.""" + stripped = line.strip() + if not stripped or stripped.startswith("#"): + return None + idx = stripped.find(SEP) + if idx < 0: + # No separator: treat the whole line as a suite-level entry. + return (stripped, SUITE_ABORTED) + return (stripped[:idx], stripped[idx + len(SEP) :]) + + +def load_entries(path): + """Load a set of (suite, test) tuples from a baseline/shard-list file.""" + entries = set() + if not path or not os.path.exists(path): + return entries + with open(path, "r", encoding="utf-8") as fh: + for line in fh: + parsed = parse_entry(line) + if parsed is not None: + entries.add(parsed) + return entries + + +def write_entries(path, entries, header=None): + """Write a sorted set of (suite, test) tuples to a file.""" + os.makedirs(os.path.dirname(os.path.abspath(path)) or ".", exist_ok=True) + with open(path, "w", encoding="utf-8") as fh: + if header: + for hl in header.splitlines(): + fh.write(hl.rstrip() + "\n") + for suite, test in sorted(entries): + # Defensive: collapse any stray newlines so each entry stays on one line.
+ safe_test = test.replace("\r", " ").replace("\n", " ") + fh.write(format_entry(suite, safe_test) + "\n") + + +# --------------------------------------------------------------------------- # +# JUnit XML parsing +# --------------------------------------------------------------------------- # +def _iter_testsuites(root): + """Yield every <testsuite> element regardless of whether the file root is + <testsuites> (wrapper) or a single <testsuite>.""" + tag = root.tag.split("}")[-1] # strip any namespace + if tag == "testsuites": + for child in root: + if child.tag.split("}")[-1] == "testsuite": + yield child + elif tag == "testsuite": + yield root + + +def _child_local_tags(elem): + return {c.tag.split("}")[-1] for c in elem} + + +def parse_reports(reports_dir): + """Walk reports_dir for JUnit XML and classify every test. + + Returns (passed, failed, skipped) sets of (suite, test) tuples. A test is + 'failed' if its <testcase> has a <failure> or <error> child, 'skipped' if + it has a <skipped> child, otherwise 'passed'. Suite-level aborts (a + <testsuite> reporting errors/failures with no failing <testcase>) are + recorded as a synthetic (suite, SUITE_ABORTED) failure. + """ + passed, failed, skipped = set(), set(), set() + + xml_files = [] + # ScalaTest's -u reporter and Maven surefire both write `TEST-<suite>.xml` + # under a `target/.../*-reports/` dir. Restrict the secondary glob to + # `target/` so we never parse Delta's own XML *test resources* (which live + # under src/test/resources and are not reports). The <testsuites>/<testsuite>-root guard + # below is a final safety net.
+ for pattern in ("**/TEST-*.xml", "**/target/**/*.xml"): + xml_files.extend(glob.glob(os.path.join(reports_dir, pattern), recursive=True)) + xml_files = sorted(set(xml_files)) + + parsed_any = False + for xml_file in xml_files: + try: + tree = ET.parse(xml_file) + except ET.ParseError as exc: + eprint("WARNING: could not parse {}: {}".format(xml_file, exc)) + continue + root = tree.getroot() + root_tag = root.tag.split("}")[-1] + if root_tag not in ("testsuites", "testsuite"): + continue # not a JUnit report + + for ts in _iter_testsuites(root): + parsed_any = True + suite_name = ts.get("name") or "" + suite_has_failing_tc = False + for tc in ts: + if tc.tag.split("}")[-1] != "testcase": + continue + suite = tc.get("classname") or suite_name + name = tc.get("name") or "" + key = (suite, name) + tags = _child_local_tags(tc) + if "failure" in tags or "error" in tags: + failed.add(key) + suite_has_failing_tc = True + elif "skipped" in tags: + skipped.add(key) + else: + passed.add(key) + + # Suite-level abort: counters say something failed but no testcase + # carried the failure (the suite blew up in beforeAll/constructor). + # Record a + # synthetic entry so the regression is visible. + try: + errors = int(ts.get("errors", "0") or "0") + failures = int(ts.get("failures", "0") or "0") + except ValueError: + errors = failures = 0 + if (errors + failures) > 0 and not suite_has_failing_tc: + failed.add((suite_name, SUITE_ABORTED)) + + if not parsed_any: + eprint( + "WARNING: no JUnit <testsuite> elements found under {}".format(reports_dir) + ) + + # A test can't be both passed and failed; failure wins. Skipped only counts + # if the test was not otherwise seen (e.g. retried).
+ passed -= failed + skipped -= failed + skipped -= passed + return passed, failed, skipped + + +# --------------------------------------------------------------------------- # +# Reporting helpers +# --------------------------------------------------------------------------- # +def _summary_sink(): + """Return a writer that mirrors to GITHUB_STEP_SUMMARY when available.""" + path = os.environ.get("GITHUB_STEP_SUMMARY") + handle = open(path, "a", encoding="utf-8") if path else None + + def write(line=""): + print(line) + if handle: + handle.write(line + "\n") + + return write, handle + + +def _print_block(write, title, entries, limit=50): + write("") + write("### {} ({})".format(title, len(entries))) + if not entries: + return + write("") + write("```") + for i, (suite, test) in enumerate(sorted(entries)): + if i >= limit: + write("... and {} more".format(len(entries) - limit)) + break + write(format_entry(suite, test)) + write("```") + + +# --------------------------------------------------------------------------- # +# Modes +# --------------------------------------------------------------------------- # +def run_enforce(args): + baseline = load_entries(args.known_failures) + passed, failed, skipped = parse_reports(args.reports_dir) + + # Always emit this shard's artifacts for the aggregation job. + if args.failures_out: + write_entries(args.failures_out, failed) + if args.ran_out: + write_entries(args.ran_out, passed | failed) + + write, handle = _summary_sink() + try: + seeding = args.mode == "seed" or not baseline + if seeding and args.mode != "seed": + write( + "> NOTE: baseline `{}` is empty -- running in SEED mode " + "(no failures will be enforced). 
Bootstrap the baseline from " + "the aggregated artifact, commit it, then enforcement begins.".format( + args.known_failures + ) + ) + + write( + "## Delta-on-Gluten test gate -- shard {}".format( + os.environ.get("SHARD_ID", "?") + ) + ) + write("") + write("| Category | Count |") + write("|---|---:|") + write("| Ran (pass+fail) | {} |".format(len(passed) + len(failed))) + write("| Passed | {} |".format(len(passed))) + write("| Failed | {} |".format(len(failed))) + write("| Skipped | {} |".format(len(skipped))) + write("| Baseline (known failures) | {} |".format(len(baseline))) + + if seeding: + write("") + write( + "Seed mode: recorded {} failing test(s) for this shard. " + "Nothing enforced.".format(len(failed)) + ) + return 0 + + regressions = failed - baseline + fixed = baseline & passed + expected = failed & baseline + + write("") + write("| Gate result | Count |") + write("|---|---:|") + write("| Expected failures (in baseline) | {} |".format(len(expected))) + write("| **Regressions (new failures)** | {} |".format(len(regressions))) + write("| Now-passing (remove from baseline) | {} |".format(len(fixed))) + + _print_block( + write, "Regressions -- new failures NOT in the baseline", regressions + ) + if regressions: + write("") + write( + "These tests were not previously known to fail. Either fix " + "the regression, or (if it is a genuinely new expected " + "failure) add the lines above to `known-failures.txt`." 
+ ) + + if args.fail_on_fixed: + _print_block( + write, "Now-passing -- delete these lines from the baseline", fixed + ) + + exit_code = 0 + if regressions: + for suite, test in sorted(regressions): + eprint("::error::REGRESSION {}".format(format_entry(suite, test))) + exit_code = 1 + if args.fail_on_fixed and fixed: + for suite, test in sorted(fixed): + eprint( + "::error::NOW-PASSING (remove from baseline) {}".format( + format_entry(suite, test) + ) + ) + exit_code = 1 + + if exit_code == 0: + write("") + write("All failures are expected (in the baseline). Gate passed.") + return exit_code + finally: + if handle: + handle.close() + + +def run_aggregate(args): + failure_files = sorted( + glob.glob(os.path.join(args.inputs_dir, "**", "failures-*.txt"), recursive=True) + ) + ran_files = sorted( + glob.glob(os.path.join(args.inputs_dir, "**", "ran-*.txt"), recursive=True) + ) + + union_failed = set() + for f in failure_files: + union_failed |= load_entries(f) + union_ran = set() + for f in ran_files: + union_ran |= load_entries(f) + + header = ( + "# Known Delta-on-Gluten unit test failures.\n" + "#\n" + "# Auto-generated by compare-test-results.py --mode aggregate.\n" + "# Format: <suite>#<test>\n" + "# Lines starting with '#' are comments.\n" + "#\n" + "# Regenerate by running the 'Delta Spark UT (Gluten)' workflow with\n" + "# update_baseline=true and committing the produced artifact.\n" + ) + if args.baseline_out: + write_entries(args.baseline_out, union_failed, header=header) + + write, handle = _summary_sink() + try: + write("## Delta-on-Gluten aggregated results") + write("") + write("| Metric | Count |") + write("|---|---:|") + write("| Shards with failure lists | {} |".format(len(failure_files))) + write("| Distinct failing tests | {} |".format(len(union_failed))) + write("| Distinct tests run | {} |".format(len(union_ran))) + + exit_code = 0 + if args.known_failures and os.path.exists(args.known_failures): + baseline = load_entries(args.known_failures) + if baseline: +
regressions = union_failed - baseline + fixed = baseline & (union_ran - union_failed) + stale = baseline - union_ran + write("| Baseline entries | {} |".format(len(baseline))) + write("| Regressions (global) | {} |".format(len(regressions))) + write("| Now-passing (global) | {} |".format(len(fixed))) + write("| Stale (not seen this run) | {} |".format(len(stale))) + _print_block(write, "Regressions (global)", regressions) + _print_block(write, "Now-passing (global)", fixed) + _print_block(write, "Stale baseline entries (suite/test gone)", stale) + if args.fail_on_regression and regressions: + exit_code = 1 + return exit_code + finally: + if handle: + handle.close() + + +# --------------------------------------------------------------------------- # +# CLI +# --------------------------------------------------------------------------- # +def str2bool(value): + return str(value).strip().lower() in ("1", "true", "yes", "y", "on") + + +def main(argv=None): + parser = argparse.ArgumentParser( + description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter + ) + parser.add_argument( + "--mode", choices=("enforce", "seed", "aggregate"), default="enforce" + ) + parser.add_argument( + "--known-failures", help="Path to the committed known-failures.txt baseline." + ) + parser.add_argument( + "--reports-dir", help="Root dir to search for JUnit XML (enforce/seed)." + ) + parser.add_argument( + "--failures-out", help="Write this shard's failing tests here (enforce/seed)." + ) + parser.add_argument( + "--ran-out", help="Write this shard's run tests (pass+fail) here." + ) + parser.add_argument( + "--fail-on-fixed", + type=str2bool, + default=True, + help="Fail when a baseline test now passes (default true).", + ) + parser.add_argument( + "--inputs-dir", help="Dir with per-shard failures-*/ran-* files (aggregate)." + ) + parser.add_argument( + "--baseline-out", help="Write the merged baseline here (aggregate)." 
+ ) + parser.add_argument( + "--fail-on-regression", + type=str2bool, + default=False, + help="In aggregate mode, fail if global regressions exist.", + ) + args = parser.parse_args(argv) + + if args.mode in ("enforce", "seed"): + if not args.reports_dir: + parser.error("--reports-dir is required for --mode {}".format(args.mode)) + return run_enforce(args) + return run_aggregate(args) + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/.github/workflows/util/delta-spark-ut/known-failures.txt b/.github/workflows/util/delta-spark-ut/known-failures.txt new file mode 100644 index 00000000000..ae7b300084b --- /dev/null +++ b/.github/workflows/util/delta-spark-ut/known-failures.txt @@ -0,0 +1,977 @@ +# Known Delta-on-Gluten unit test failures. +# +# Baseline of delta-io/delta `spark` ScalaTest tests EXPECTED to fail under the +# Gluten Velox bundle. The Delta Spark UT (Gluten) workflow enforces this list: +# a failing test NOT listed here is a regression (fails CI); a listed test that +# now passes should be removed. Format: <suite>#<test>. +# Lines starting with '#' are comments. See README.md in this directory. +# +# --------------------------------------------------------------------------- +# Full 16-shard baseline. Originally seeded from 15 of 16 shards (run +# 27490052632). Shard 2 used to hang/OOM-crash on DeletionVectorsSuite's 2B-row +# DV tests; those two tests are now force-failed in setup-delta.sh, so shard 2 +# runs to completion and contributes 69 failures. 963 known failures total. +# --------------------------------------------------------------------------- +io.delta.sql.DeltaExtensionAndCatalogSuite#activate Delta SQL parser using SQL conf +io.delta.sql.DeltaExtensionAndCatalogSuite#activate Delta SQL parser using withExtensions +io.delta.sql.JavaDeltaSparkSessionExtensionSuite#testSQLConf +io.delta.tables.DeltaTableHadoopOptionsSuite#delete - with filesystem options +io.delta.tables.DeltaTableHadoopOptionsSuite#details - with filesystem options.
+io.delta.tables.DeltaTableHadoopOptionsSuite#forPath - with filesystem options +io.delta.tables.DeltaTableHadoopOptionsSuite#forPath error out without filesystem options passed in. +io.delta.tables.DeltaTableHadoopOptionsSuite#forPath with unsupported options +io.delta.tables.DeltaTableHadoopOptionsSuite#forPath: as/alias/toDF with filesystem options. +io.delta.tables.DeltaTableHadoopOptionsSuite#generate - with filesystem options +io.delta.tables.DeltaTableHadoopOptionsSuite#history - with filesystem options +io.delta.tables.DeltaTableHadoopOptionsSuite#merge - with filesystem options +io.delta.tables.DeltaTableHadoopOptionsSuite#optimize - with filesystem options +io.delta.tables.DeltaTableHadoopOptionsSuite#restoreTable - with filesystem options +io.delta.tables.DeltaTableHadoopOptionsSuite#update - with filesystem options +io.delta.tables.DeltaTableHadoopOptionsSuite#updateExpr - with filesystem options +io.delta.tables.DeltaTableHadoopOptionsSuite#vacuum - with filesystem options +org.apache.spark.sql.delta.AutoCompactExecutionIdColumnMappingSuite#auto-compact-enabled-conf: auto compact should kick in when enabled - session config - column mapping id mode +org.apache.spark.sql.delta.AutoCompactExecutionIdColumnMappingSuite#auto-compact-enabled-property: auto compact should kick in when enabled - table config - column mapping id mode +org.apache.spark.sql.delta.AutoCompactExecutionIdColumnMappingSuite#auto-compact-enabled-property: auto compact should not kick in when session config is off - column mapping id mode +org.apache.spark.sql.delta.AutoCompactExecutionIdColumnMappingSuite#variant auto compact kicks in when enabled - session config - column mapping id mode +org.apache.spark.sql.delta.AutoCompactExecutionIdColumnMappingSuite#variant auto compact kicks in when enabled - table config - column mapping id mode +org.apache.spark.sql.delta.AutoCompactExecutionNameColumnMappingSuite#auto-compact-enabled-conf: auto compact should kick in when enabled - session 
config - column mapping name mode +org.apache.spark.sql.delta.AutoCompactExecutionNameColumnMappingSuite#auto-compact-enabled-property: auto compact should kick in when enabled - table config - column mapping name mode +org.apache.spark.sql.delta.AutoCompactExecutionNameColumnMappingSuite#auto-compact-enabled-property: auto compact should not kick in when session config is off - column mapping name mode +org.apache.spark.sql.delta.AutoCompactExecutionNameColumnMappingSuite#variant auto compact kicks in when enabled - session config - column mapping name mode +org.apache.spark.sql.delta.AutoCompactExecutionNameColumnMappingSuite#variant auto compact kicks in when enabled - table config - column mapping name mode +org.apache.spark.sql.delta.AutoCompactExecutionSuite#auto-compact-enabled-conf: auto compact should kick in when enabled - session config +org.apache.spark.sql.delta.AutoCompactExecutionSuite#auto-compact-enabled-property: auto compact should kick in when enabled - table config +org.apache.spark.sql.delta.AutoCompactExecutionSuite#auto-compact-enabled-property: auto compact should not kick in when session config is off +org.apache.spark.sql.delta.AutoCompactExecutionSuite#variant auto compact kicks in when enabled - session config +org.apache.spark.sql.delta.AutoCompactExecutionSuite#variant auto compact kicks in when enabled - table config +org.apache.spark.sql.delta.CheckpointsSuite#DML with DVs corrupts variant stats when collectVariantDataSkippingStats is disabled +org.apache.spark.sql.delta.CheckpointsSuite#DML with DVs preserves nested variant stats when collectVariantDataSkippingStats is enabled +org.apache.spark.sql.delta.CheckpointsSuite#DML with DVs preserves variant and struct stats when collectVariantDataSkippingStats is enabled +org.apache.spark.sql.delta.CheckpointsSuite#DML with DVs preserves variant stats when collectVariantDataSkippingStats is enabled +org.apache.spark.sql.delta.CheckpointsSuite#SC-86940: writing a GCS checkpoint should 
happen in a new thread +org.apache.spark.sql.delta.CheckpointsWithCatalogOwnedBatch100Suite#DML with DVs corrupts variant stats when collectVariantDataSkippingStats is disabled +org.apache.spark.sql.delta.CheckpointsWithCatalogOwnedBatch100Suite#DML with DVs preserves nested variant stats when collectVariantDataSkippingStats is enabled +org.apache.spark.sql.delta.CheckpointsWithCatalogOwnedBatch100Suite#DML with DVs preserves variant and struct stats when collectVariantDataSkippingStats is enabled +org.apache.spark.sql.delta.CheckpointsWithCatalogOwnedBatch100Suite#DML with DVs preserves variant stats when collectVariantDataSkippingStats is enabled +org.apache.spark.sql.delta.CheckpointsWithCatalogOwnedBatch100Suite#SC-86940: writing a GCS checkpoint should happen in a new thread +org.apache.spark.sql.delta.CheckpointsWithCatalogOwnedBatch1Suite#DML with DVs corrupts variant stats when collectVariantDataSkippingStats is disabled +org.apache.spark.sql.delta.CheckpointsWithCatalogOwnedBatch1Suite#DML with DVs preserves nested variant stats when collectVariantDataSkippingStats is enabled +org.apache.spark.sql.delta.CheckpointsWithCatalogOwnedBatch1Suite#DML with DVs preserves variant and struct stats when collectVariantDataSkippingStats is enabled +org.apache.spark.sql.delta.CheckpointsWithCatalogOwnedBatch1Suite#DML with DVs preserves variant stats when collectVariantDataSkippingStats is enabled +org.apache.spark.sql.delta.CheckpointsWithCatalogOwnedBatch1Suite#SC-86940: writing a GCS checkpoint should happen in a new thread +org.apache.spark.sql.delta.CheckpointsWithCatalogOwnedBatch2Suite#DML with DVs corrupts variant stats when collectVariantDataSkippingStats is disabled +org.apache.spark.sql.delta.CheckpointsWithCatalogOwnedBatch2Suite#DML with DVs preserves nested variant stats when collectVariantDataSkippingStats is enabled +org.apache.spark.sql.delta.CheckpointsWithCatalogOwnedBatch2Suite#DML with DVs preserves variant and struct stats when 
collectVariantDataSkippingStats is enabled +org.apache.spark.sql.delta.CheckpointsWithCatalogOwnedBatch2Suite#DML with DVs preserves variant stats when collectVariantDataSkippingStats is enabled +org.apache.spark.sql.delta.CheckpointsWithCatalogOwnedBatch2Suite#SC-86940: writing a GCS checkpoint should happen in a new thread +org.apache.spark.sql.delta.CloneTableSQLSuite#shallow clone across file systems +org.apache.spark.sql.delta.CloneTableSQLWithCatalogOwnedBatch100Suite#shallow clone across file systems +org.apache.spark.sql.delta.CloneTableSQLWithCatalogOwnedBatch1Suite#shallow clone across file systems +org.apache.spark.sql.delta.CloneTableSQLWithCatalogOwnedBatch2Suite#shallow clone across file systems +org.apache.spark.sql.delta.CloneTableScalaDeletionVectorSuite#Cloning table with persistent DVs and absolute parquet paths +org.apache.spark.sql.delta.CloneTableScalaDeletionVectorSuite#Shallow clone round-trip with DVs +org.apache.spark.sql.delta.CloneTableScalaDeletionVectorSuite#shallow clone across file systems +org.apache.spark.sql.delta.CloneTableScalaSuite#shallow clone across file systems +org.apache.spark.sql.delta.ConvertToDeltaSQLSuite#external tables use correct path scheme +org.apache.spark.sql.delta.ConvertToDeltaScalaSuite#external tables use correct path scheme +org.apache.spark.sql.delta.DeleteMetricsSuite#delete-metrics: delete one row per file - Partitioned = false, cdfEnabled = false +org.apache.spark.sql.delta.DeleteMetricsSuite#delete-metrics: delete one row per file - Partitioned = false, cdfEnabled = true +org.apache.spark.sql.delta.DeltaAllFilesInCrcSuite#test all-files-in-crc verification failure also triggers and logs incremental-commit verification result +org.apache.spark.sql.delta.DeltaAlterTableByNameIdColumnMappingSuite#CHANGE COLUMN - case insensitive - column mapping id mode +org.apache.spark.sql.delta.DeltaAlterTableByNameIdColumnMappingSuite#CHANGE COLUMN - move to first (nested) - column mapping id mode 
+org.apache.spark.sql.delta.DeltaAlterTableByNameNameColumnMappingSuite#CHANGE COLUMN - case insensitive - column mapping name mode +org.apache.spark.sql.delta.DeltaAlterTableByNameNameColumnMappingSuite#CHANGE COLUMN - move to first (nested) - column mapping name mode +org.apache.spark.sql.delta.DeltaArbitraryColumnNameSuite#create table +org.apache.spark.sql.delta.DeltaCDCStreamDeletionVectorSuite#cdc streams with noop merge +org.apache.spark.sql.delta.DeltaCDCStreamSuite#cdc streams with noop merge +org.apache.spark.sql.delta.DeltaCDCStreamWithCatalogManagedBatch100Suite#cdc streams with noop merge +org.apache.spark.sql.delta.DeltaCDCStreamWithCatalogManagedBatch1Suite#cdc streams with noop merge +org.apache.spark.sql.delta.DeltaCDCStreamWithCatalogManagedBatch2Suite#cdc streams with noop merge +org.apache.spark.sql.delta.DeltaColumnMappingSuite#add nested column in schema on new protocol +org.apache.spark.sql.delta.DeltaColumnMappingSuite#alter column order in schema on new protocol +org.apache.spark.sql.delta.DeltaColumnMappingSuite#explicit id matching +org.apache.spark.sql.delta.DeltaColumnMappingSuite#id and name mode should write field_id in parquet schema +org.apache.spark.sql.delta.DeltaColumnMappingSuite#try modifying restricted max id property should fail +org.apache.spark.sql.delta.DeltaDataFrameHadoopOptionsSuite#SC-86916: Delta log cache should respect options +org.apache.spark.sql.delta.DeltaDataFrameHadoopOptionsSuite#SC-86916: checkpoint should pick up Hadoop file system options +org.apache.spark.sql.delta.DeltaDataFrameHadoopOptionsSuite#SC-86916: invalidateCache should invalidate all DeltaLogs of the given path +org.apache.spark.sql.delta.DeltaDataFrameHadoopOptionsSuite#SC-86916: read/write Delta paths using DataFrame should pick up Hadoop file system options +org.apache.spark.sql.delta.DeltaDataFrameHadoopOptionsSuite#all operations should propagate Hadoop file system options 
+org.apache.spark.sql.delta.DeltaDataFrameHadoopOptionsSuite#operations without Hadoop options should fail for fake:// filesystem +org.apache.spark.sql.delta.DeltaFastDropFeatureSuite#Vacuum does not delete deletion vector files.generateDVTombstones: false +org.apache.spark.sql.delta.DeltaFastDropFeatureSuite#We do not create redundant DV tombstones after cloning isShallowClone: true +org.apache.spark.sql.delta.DeltaGenerateSymlinkManifestSuite#incremental manifest: failure to generate manifest throws exception +org.apache.spark.sql.delta.DeltaGenerateSymlinkManifestSuite#special partition column values +org.apache.spark.sql.delta.DeltaHistoryManagerSuite#data skipping still works with time travel +org.apache.spark.sql.delta.DeltaHistoryManagerWithCatalogOwnedBatch100Suite#data skipping still works with time travel +org.apache.spark.sql.delta.DeltaHistoryManagerWithCatalogOwnedBatch1Suite#data skipping still works with time travel +org.apache.spark.sql.delta.DeltaHistoryManagerWithCatalogOwnedBatch2Suite#data skipping still works with time travel +org.apache.spark.sql.delta.DeltaInsertIntoDataFrameByPathSuite#insertInto: timestamp partition values with different precisions +org.apache.spark.sql.delta.DeltaInsertIntoDataFrameSuite#insertInto: timestamp partition values with different precisions +org.apache.spark.sql.delta.DeltaInsertIntoSQLByPathSuite#insertInto: timestamp partition values with different precisions +org.apache.spark.sql.delta.DeltaInsertIntoSQLSuite#insertInto: timestamp partition values with different precisions +org.apache.spark.sql.delta.DeltaLimitPushDownV1Suite#Works with union +org.apache.spark.sql.delta.DeltaLimitPushDownV1Suite#limit larger than total +org.apache.spark.sql.delta.DeltaLimitPushDownV1Suite#limit push-down flag +org.apache.spark.sql.delta.DeltaLimitPushDownV1Suite#no filter or projection +org.apache.spark.sql.delta.DeltaLimitPushDownV1Suite#with non-partition filter +org.apache.spark.sql.delta.DeltaLimitPushDownV1Suite#with 
partition filter only +org.apache.spark.sql.delta.DeltaLimitPushDownV1Suite#with projection only +org.apache.spark.sql.delta.DeltaLimitPushDownWithCatalogOwnedBatch100Suite#Works with union +org.apache.spark.sql.delta.DeltaLimitPushDownWithCatalogOwnedBatch100Suite#limit larger than total +org.apache.spark.sql.delta.DeltaLimitPushDownWithCatalogOwnedBatch100Suite#limit push-down flag +org.apache.spark.sql.delta.DeltaLimitPushDownWithCatalogOwnedBatch100Suite#no filter or projection +org.apache.spark.sql.delta.DeltaLimitPushDownWithCatalogOwnedBatch100Suite#with non-partition filter +org.apache.spark.sql.delta.DeltaLimitPushDownWithCatalogOwnedBatch100Suite#with partition filter only +org.apache.spark.sql.delta.DeltaLimitPushDownWithCatalogOwnedBatch100Suite#with projection only +org.apache.spark.sql.delta.DeltaLimitPushDownWithCatalogOwnedBatch1Suite#Works with union +org.apache.spark.sql.delta.DeltaLimitPushDownWithCatalogOwnedBatch1Suite#limit larger than total +org.apache.spark.sql.delta.DeltaLimitPushDownWithCatalogOwnedBatch1Suite#limit push-down flag +org.apache.spark.sql.delta.DeltaLimitPushDownWithCatalogOwnedBatch1Suite#no filter or projection +org.apache.spark.sql.delta.DeltaLimitPushDownWithCatalogOwnedBatch1Suite#with non-partition filter +org.apache.spark.sql.delta.DeltaLimitPushDownWithCatalogOwnedBatch1Suite#with partition filter only +org.apache.spark.sql.delta.DeltaLimitPushDownWithCatalogOwnedBatch1Suite#with projection only +org.apache.spark.sql.delta.DeltaLimitPushDownWithCatalogOwnedBatch2Suite#Works with union +org.apache.spark.sql.delta.DeltaLimitPushDownWithCatalogOwnedBatch2Suite#limit larger than total +org.apache.spark.sql.delta.DeltaLimitPushDownWithCatalogOwnedBatch2Suite#limit push-down flag +org.apache.spark.sql.delta.DeltaLimitPushDownWithCatalogOwnedBatch2Suite#no filter or projection +org.apache.spark.sql.delta.DeltaLimitPushDownWithCatalogOwnedBatch2Suite#with non-partition filter 
+org.apache.spark.sql.delta.DeltaLimitPushDownWithCatalogOwnedBatch2Suite#with partition filter only +org.apache.spark.sql.delta.DeltaLimitPushDownWithCatalogOwnedBatch2Suite#with projection only +org.apache.spark.sql.delta.DeltaLiteVacuumSuite#vacuum for cdc - delete tombstones +org.apache.spark.sql.delta.DeltaLiteVacuumSuite#vacuum for cdc - update/merge +org.apache.spark.sql.delta.DeltaNameColumnMappingSuite#query with predicates should skip partitions - column mapping name mode +org.apache.spark.sql.delta.DeltaParquetFileFormatSuite#isDeletionVectorsEnabled=false, read DV metadata columns: with isRowDeletedCol=false, with rowIndexCol=false, with vectorized Parquet reader=false, with readColumnarBatchAsRows=true +org.apache.spark.sql.delta.DeltaParquetFileFormatSuite#isDeletionVectorsEnabled=false, read DV metadata columns: with isRowDeletedCol=false, with rowIndexCol=false, with vectorized Parquet reader=true, with readColumnarBatchAsRows=false +org.apache.spark.sql.delta.DeltaParquetFileFormatSuite#isDeletionVectorsEnabled=false, read DV metadata columns: with isRowDeletedCol=false, with rowIndexCol=false, with vectorized Parquet reader=true, with readColumnarBatchAsRows=true +org.apache.spark.sql.delta.DeltaParquetFileFormatSuite#isDeletionVectorsEnabled=false, read DV metadata columns: with isRowDeletedCol=false, with rowIndexCol=true, with vectorized Parquet reader=false, with readColumnarBatchAsRows=true +org.apache.spark.sql.delta.DeltaParquetFileFormatSuite#isDeletionVectorsEnabled=false, read DV metadata columns: with isRowDeletedCol=false, with rowIndexCol=true, with vectorized Parquet reader=true, with readColumnarBatchAsRows=false +org.apache.spark.sql.delta.DeltaParquetFileFormatSuite#isDeletionVectorsEnabled=false, read DV metadata columns: with isRowDeletedCol=false, with rowIndexCol=true, with vectorized Parquet reader=true, with readColumnarBatchAsRows=true +org.apache.spark.sql.delta.DeltaParquetFileFormatSuite#isDeletionVectorsEnabled=false, 
read DV metadata columns: with isRowDeletedCol=true, with rowIndexCol=false, with vectorized Parquet reader=false, with readColumnarBatchAsRows=true +org.apache.spark.sql.delta.DeltaParquetFileFormatSuite#isDeletionVectorsEnabled=false, read DV metadata columns: with isRowDeletedCol=true, with rowIndexCol=false, with vectorized Parquet reader=true, with readColumnarBatchAsRows=false +org.apache.spark.sql.delta.DeltaParquetFileFormatSuite#isDeletionVectorsEnabled=false, read DV metadata columns: with isRowDeletedCol=true, with rowIndexCol=false, with vectorized Parquet reader=true, with readColumnarBatchAsRows=true +org.apache.spark.sql.delta.DeltaParquetFileFormatSuite#isDeletionVectorsEnabled=false, read DV metadata columns: with isRowDeletedCol=true, with rowIndexCol=true, with vectorized Parquet reader=false, with readColumnarBatchAsRows=true +org.apache.spark.sql.delta.DeltaParquetFileFormatSuite#isDeletionVectorsEnabled=false, read DV metadata columns: with isRowDeletedCol=true, with rowIndexCol=true, with vectorized Parquet reader=true, with readColumnarBatchAsRows=false +org.apache.spark.sql.delta.DeltaParquetFileFormatSuite#isDeletionVectorsEnabled=false, read DV metadata columns: with isRowDeletedCol=true, with rowIndexCol=true, with vectorized Parquet reader=true, with readColumnarBatchAsRows=true +org.apache.spark.sql.delta.DeltaParquetFileFormatSuite#isDeletionVectorsEnabled=true, read DV metadata columns: with isRowDeletedCol=true, with rowIndexCol=false, with vectorized Parquet reader=false, with readColumnarBatchAsRows=true +org.apache.spark.sql.delta.DeltaParquetFileFormatSuite#isDeletionVectorsEnabled=true, read DV metadata columns: with isRowDeletedCol=true, with rowIndexCol=false, with vectorized Parquet reader=true, with readColumnarBatchAsRows=false +org.apache.spark.sql.delta.DeltaParquetFileFormatSuite#isDeletionVectorsEnabled=true, read DV metadata columns: with isRowDeletedCol=true, with rowIndexCol=false, with vectorized Parquet 
reader=true, with readColumnarBatchAsRows=true +org.apache.spark.sql.delta.DeltaParquetFileFormatSuite#isDeletionVectorsEnabled=true, read DV metadata columns: with isRowDeletedCol=true, with rowIndexCol=true, with vectorized Parquet reader=false, with readColumnarBatchAsRows=true +org.apache.spark.sql.delta.DeltaParquetFileFormatSuite#isDeletionVectorsEnabled=true, read DV metadata columns: with isRowDeletedCol=true, with rowIndexCol=true, with vectorized Parquet reader=true, with readColumnarBatchAsRows=false +org.apache.spark.sql.delta.DeltaParquetFileFormatSuite#isDeletionVectorsEnabled=true, read DV metadata columns: with isRowDeletedCol=true, with rowIndexCol=true, with vectorized Parquet reader=true, with readColumnarBatchAsRows=true +org.apache.spark.sql.delta.DeltaParquetFileFormatWithPredicatePushdownSuite#read DV metadata columns: with rowIndexFilterType=IF_CONTAINED, with vectorized Parquet reader=false, with readColumnarBatchAsRows=true +org.apache.spark.sql.delta.DeltaParquetFileFormatWithPredicatePushdownSuite#read DV metadata columns: with rowIndexFilterType=IF_CONTAINED, with vectorized Parquet reader=true, with readColumnarBatchAsRows=false +org.apache.spark.sql.delta.DeltaParquetFileFormatWithPredicatePushdownSuite#read DV metadata columns: with rowIndexFilterType=IF_CONTAINED, with vectorized Parquet reader=true, with readColumnarBatchAsRows=true +org.apache.spark.sql.delta.DeltaParquetFileFormatWithPredicatePushdownSuite#read DV metadata columns: with rowIndexFilterType=IF_NOT_CONTAINED, with vectorized Parquet reader=false, with readColumnarBatchAsRows=true +org.apache.spark.sql.delta.DeltaParquetFileFormatWithPredicatePushdownSuite#read DV metadata columns: with rowIndexFilterType=IF_NOT_CONTAINED, with vectorized Parquet reader=true, with readColumnarBatchAsRows=false +org.apache.spark.sql.delta.DeltaParquetFileFormatWithPredicatePushdownSuite#read DV metadata columns: with rowIndexFilterType=IF_NOT_CONTAINED, with vectorized Parquet 
reader=true, with readColumnarBatchAsRows=true +org.apache.spark.sql.delta.DeltaSinkIdColumnMappingSuite#partitioned writing and batch reading - column mapping id mode +org.apache.spark.sql.delta.DeltaSinkNameColumnMappingSuite#partitioned writing and batch reading - column mapping name mode +org.apache.spark.sql.delta.DeltaSuite#SC-8810: skip deleted file +org.apache.spark.sql.delta.DeltaSuite#SC-8810: skipping deleted file still throws on corrupted file +org.apache.spark.sql.delta.DeltaSuite#all operations with special characters in path +org.apache.spark.sql.delta.DeltaSuite#deleted files cause failure by default +org.apache.spark.sql.delta.DeltaSuite#invalid replaceWhere +org.apache.spark.sql.delta.DeltaSuite#query with predicates should skip partitions +org.apache.spark.sql.delta.DeltaSuite#replaceArbitrary should enforce proper usage of backtick +org.apache.spark.sql.delta.DeltaTableCreationSuite#Default column values: CONVERT TO DELTA keeps EXISTS_DEFAULT +org.apache.spark.sql.delta.DeltaUpdateCatalogSuite#convert to delta with partitioning change +org.apache.spark.sql.delta.DeltaUpdateCatalogSuite#partitioned convert to delta with schema change +org.apache.spark.sql.delta.DeltaVacuumSuite#vacuum for cdc - delete tombstones +org.apache.spark.sql.delta.DeltaVacuumSuite#vacuum for cdc - update/merge +org.apache.spark.sql.delta.DeltaVariantShreddingSuite#Infer schema for Delta table +org.apache.spark.sql.delta.DeltaVariantSuite#DISABLE_VARIANT_TABLE_FEATURE_FOR_SPARK_40 - config disabled does not block +org.apache.spark.sql.delta.DeltaVariantSuite#DISABLE_VARIANT_TABLE_FEATURE_FOR_SPARK_40 - no-op on Spark 4.1+ +org.apache.spark.sql.delta.DeltaVariantSuite#Existing table with variant type can enable CDF +org.apache.spark.sql.delta.DeltaVariantSuite#Table with variant type can use CDF +org.apache.spark.sql.delta.DeltaVariantSuite#Variant can be used as a source for generated columns +org.apache.spark.sql.delta.DeltaVariantSuite#Variant can have default value set 
+org.apache.spark.sql.delta.DeltaVariantSuite#Variant cannot be created as a generated column +org.apache.spark.sql.delta.DeltaVariantSuite#Variant respects Delta table CHECK constraints +org.apache.spark.sql.delta.DeltaVariantSuite#Variant respects Delta table IS NOT NULL constraints +org.apache.spark.sql.delta.DeltaVariantSuite#Zorder is not supported for Variant +org.apache.spark.sql.delta.DeltaVariantSuite#column mapping works - id - false +org.apache.spark.sql.delta.DeltaVariantSuite#column mapping works - id - true +org.apache.spark.sql.delta.DeltaVariantSuite#column mapping works - name - false +org.apache.spark.sql.delta.DeltaVariantSuite#column mapping works - name - true +org.apache.spark.sql.delta.DeltaVariantSuite#optimize variant +org.apache.spark.sql.delta.DeltaVariantSuite#shallow cloning table with variant +org.apache.spark.sql.delta.DeltaVariantSuite#streaming variant delta table +org.apache.spark.sql.delta.DeltaVariantSuite#time travel with variant column works +org.apache.spark.sql.delta.DeltaVariantSuite#variant works with schema evolution for INSERT +org.apache.spark.sql.delta.DeltaVariantSuite#variant works with schema evolution for MERGE +org.apache.spark.sql.delta.DeltaWithCatalogOwnedBatch100Suite#SC-8810: skip deleted file +org.apache.spark.sql.delta.DeltaWithCatalogOwnedBatch100Suite#SC-8810: skipping deleted file still throws on corrupted file +org.apache.spark.sql.delta.DeltaWithCatalogOwnedBatch100Suite#deleted files cause failure by default +org.apache.spark.sql.delta.DeltaWithCatalogOwnedBatch100Suite#invalid replaceWhere +org.apache.spark.sql.delta.DeltaWithCatalogOwnedBatch100Suite#query with predicates should skip partitions +org.apache.spark.sql.delta.DeltaWithCatalogOwnedBatch100Suite#replaceArbitrary should enforce proper usage of backtick +org.apache.spark.sql.delta.DeltaWithCatalogOwnedBatch1Suite#SC-8810: skip deleted file +org.apache.spark.sql.delta.DeltaWithCatalogOwnedBatch1Suite#SC-8810: skipping deleted file still 
throws on corrupted file +org.apache.spark.sql.delta.DeltaWithCatalogOwnedBatch1Suite#deleted files cause failure by default +org.apache.spark.sql.delta.DeltaWithCatalogOwnedBatch1Suite#invalid replaceWhere +org.apache.spark.sql.delta.DeltaWithCatalogOwnedBatch1Suite#query with predicates should skip partitions +org.apache.spark.sql.delta.DeltaWithCatalogOwnedBatch1Suite#replaceArbitrary should enforce proper usage of backtick +org.apache.spark.sql.delta.DeltaWithCatalogOwnedBatch2Suite#SC-8810: skip deleted file +org.apache.spark.sql.delta.DeltaWithCatalogOwnedBatch2Suite#SC-8810: skipping deleted file still throws on corrupted file +org.apache.spark.sql.delta.DeltaWithCatalogOwnedBatch2Suite#deleted files cause failure by default +org.apache.spark.sql.delta.DeltaWithCatalogOwnedBatch2Suite#invalid replaceWhere +org.apache.spark.sql.delta.DeltaWithCatalogOwnedBatch2Suite#query with predicates should skip partitions +org.apache.spark.sql.delta.DeltaWithCatalogOwnedBatch2Suite#replaceArbitrary should enforce proper usage of backtick +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only delete all rows - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only delete all rows - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only delete all rows - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only delete all rows 
- Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only with condition on delete and insert with no matching rows - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only with condition on delete and insert with no matching rows - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only with condition on delete and insert with no matching rows - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only with condition on delete and insert with no matching rows - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only with duplicates - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only with duplicates - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only with duplicates - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only with duplicates - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only with skipping - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only with skipping - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only with skipping - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only with skipping - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only without join - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only without join - 
Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only without join - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only without join - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only without join with source with 1 row - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only without join with source with 1 row - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only without join with source with 1 row - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: delete-only without join with source with 1 row - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: insert-only with update/delete with unsatisfied conditions - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: insert-only with update/delete with unsatisfied conditions - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: insert-only with update/delete with unsatisfied conditions - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: insert-only with update/delete with unsatisfied conditions - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: match-only - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: match-only - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: match-only - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: match-only - Partitioned = true, CDF = true 
+org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: match-only with skipping - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: match-only with skipping - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: match-only with skipping - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: match-only with skipping - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: match-only with update/delete with unsatisfied conditions - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: match-only with update/delete with unsatisfied conditions - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: match-only with update/delete with unsatisfied conditions - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: match-only with update/delete with unsatisfied conditions - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: not matched by source update only - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: not matched by source update only - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: not matched by source update only - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: not matched by source update only - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: replace target with source - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: replace target with source - Partitioned = false, CDF = true 
+org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: replace target with source - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: replace target with source - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: update/delete/insert with some unsatisfied conditions - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: update/delete/insert with some unsatisfied conditions - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: update/delete/insert with some unsatisfied conditions - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: update/delete/insert with some unsatisfied conditions - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: upsert - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: upsert - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: upsert - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: upsert - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: upsert and delete with conditions - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: upsert and delete with conditions - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: upsert and delete with conditions - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#merge-metrics: upsert and delete with conditions - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistorySuite#operation metrics - merge 
+org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only delete all rows - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only delete all rows - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only delete all rows - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only delete all rows - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only with condition on delete and insert with no matching rows - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only with condition on delete and insert with no matching rows - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only with condition on delete and insert with no matching rows - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only with condition on delete and insert with no matching rows - Partitioned = true, CDF = true 
+org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only with duplicates - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only with duplicates - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only with duplicates - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only with duplicates - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only with skipping - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only with skipping - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only with skipping - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only with skipping - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only without join - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only without join - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only without join - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only without join - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only without join with source with 1 row - Partitioned = 
false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only without join with source with 1 row - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only without join with source with 1 row - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: delete-only without join with source with 1 row - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: insert-only with update/delete with unsatisfied conditions - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: insert-only with update/delete with unsatisfied conditions - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: insert-only with update/delete with unsatisfied conditions - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: insert-only with update/delete with unsatisfied conditions - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: match-only - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: match-only - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: match-only - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: match-only - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: match-only with skipping - Partitioned = false, CDF = false 
+org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: match-only with skipping - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: match-only with skipping - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: match-only with skipping - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: match-only with update/delete with unsatisfied conditions - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: match-only with update/delete with unsatisfied conditions - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: match-only with update/delete with unsatisfied conditions - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: match-only with update/delete with unsatisfied conditions - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: not matched by source update only - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: not matched by source update only - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: not matched by source update only - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: not matched by source update only - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: replace target with source - Partitioned = false, CDF = false 
+org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: replace target with source - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: replace target with source - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: replace target with source - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: update/delete/insert with some unsatisfied conditions - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: update/delete/insert with some unsatisfied conditions - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: update/delete/insert with some unsatisfied conditions - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: update/delete/insert with some unsatisfied conditions - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: upsert - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: upsert - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: upsert - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: upsert - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: upsert and delete with conditions - Partitioned = false, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: upsert and delete with 
conditions - Partitioned = false, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: upsert and delete with conditions - Partitioned = true, CDF = false +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#merge-metrics: upsert and delete with conditions - Partitioned = true, CDF = true +org.apache.spark.sql.delta.DescribeDeltaHistoryWithCatalogOwnedBatch100Suite#operation metrics - merge +org.apache.spark.sql.delta.FeatureEnablementConcurrencySuite#Disable Deletion Vectors feature - withUnset: false +org.apache.spark.sql.delta.FeatureEnablementConcurrencySuite#Disable Deletion Vectors feature - withUnset: true +org.apache.spark.sql.delta.FeatureEnablementConcurrencySuite#Disable row tracking feature - withUnset: false +org.apache.spark.sql.delta.FeatureEnablementConcurrencySuite#Disable row tracking feature - withUnset: true +org.apache.spark.sql.delta.FeatureEnablementConcurrencySuite#Enable column mapping feature - txnInterleaved: true +org.apache.spark.sql.delta.FeatureEnablementConcurrencySuite#Enable deletion vectors feature +org.apache.spark.sql.delta.FeatureEnablementConcurrencySuite#Enable row tracking feature concurrent txn: delete +org.apache.spark.sql.delta.FeatureEnablementConcurrencySuite#Removing column mapping mode produces conflict - startMode: id +org.apache.spark.sql.delta.FeatureEnablementConcurrencySuite#Removing column mapping mode produces conflict - startMode: name +org.apache.spark.sql.delta.FileSizeHistogramSuite#check CommitStats with deletes +org.apache.spark.sql.delta.FileSizeHistogramSuite#histogram is re-calculated when files are removed +org.apache.spark.sql.delta.GeneratedColumnSuite#update_generated_column_with_incorrect_value +org.apache.spark.sql.delta.GeneratedColumnSuite#update_source_and_generated_columns_with_incorrect_value +org.apache.spark.sql.delta.HDFSLogStoreSuite#No AbstractFileSystem - end to end test using data frame 
+org.apache.spark.sql.delta.HiveConvertToDeltaSuite#Convert a partitioned parquet table with partition schema autofill +org.apache.spark.sql.delta.HiveConvertToDeltaSuite#can convert a partition-like table path +org.apache.spark.sql.delta.HiveConvertToDeltaSuite#can convert table with partition overwrite +org.apache.spark.sql.delta.HiveConvertToDeltaSuite#catalog partition values contain special characters +org.apache.spark.sql.delta.HiveConvertToDeltaSuite#convert a Hive based external parquet table +org.apache.spark.sql.delta.HiveConvertToDeltaSuite#convert a Hive based parquet table +org.apache.spark.sql.delta.HiveConvertToDeltaSuite#convert a delta table where metadata does not reflect that the table is already converted should update the metadata +org.apache.spark.sql.delta.HiveConvertToDeltaSuite#convert a parquet path to delta while database called parquet exists +org.apache.spark.sql.delta.HiveConvertToDeltaSuite#convert a parquet table to delta with database name as parquet +org.apache.spark.sql.delta.HiveConvertToDeltaSuite#convert a parquet table using table name +org.apache.spark.sql.delta.HiveConvertToDeltaSuite#convert a parquet table with catalog schema - false +org.apache.spark.sql.delta.HiveConvertToDeltaSuite#convert a parquet table with catalog schema - true +org.apache.spark.sql.delta.HiveConvertToDeltaSuite#convert an external parquet table +org.apache.spark.sql.delta.HiveConvertToDeltaSuite#convert partitioned parquet table with catalog partitions - false +org.apache.spark.sql.delta.HiveConvertToDeltaSuite#convert partitioned parquet table with catalog partitions - true +org.apache.spark.sql.delta.HiveConvertToDeltaSuite#convert to delta using table name without database name +org.apache.spark.sql.delta.HiveConvertToDeltaSuite#convert two external tables pointing to same underlying files with differing table properties should error if conf enabled otherwise merge properties +org.apache.spark.sql.delta.HiveConvertToDeltaSuite#convert with 
statistics +org.apache.spark.sql.delta.HiveConvertToDeltaSuite#convert without statistics +org.apache.spark.sql.delta.HiveConvertToDeltaSuite#negative case: convert parquet path to delta when there is a database called parquet but no table or path exists +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: ARRAY, targetType: ARRAY followAnsiEnabled: false, ansiEnabled: false, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: ARRAY, targetType: ARRAY followAnsiEnabled: false, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: ARRAY, targetType: ARRAY followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: ARRAY, targetType: ARRAY followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: LEGACY +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: BIGINT, targetType: DECIMAL(7,2) followAnsiEnabled: false, ansiEnabled: false, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: BIGINT, targetType: DECIMAL(7,2) followAnsiEnabled: false, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: BIGINT, targetType: DECIMAL(7,2) followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = 
s.value sourceType: BIGINT, targetType: DECIMAL(7,2) followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: LEGACY +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: BIGINT, targetType: INT followAnsiEnabled: false, ansiEnabled: false, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: BIGINT, targetType: INT followAnsiEnabled: false, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: BIGINT, targetType: INT followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: BIGINT, targetType: INT followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: LEGACY +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: DECIMAL(3,1), targetType: DECIMAL(3,2) followAnsiEnabled: false, ansiEnabled: false, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: DECIMAL(3,1), targetType: DECIMAL(3,2) followAnsiEnabled: false, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: DECIMAL(3,1), targetType: DECIMAL(3,2) followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: DECIMAL(3,1), targetType: DECIMAL(3,2) followAnsiEnabled: true, ansiEnabled: true, 
storeAssignmentPolicy: LEGACY +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: DOUBLE, targetType: BIGINT followAnsiEnabled: false, ansiEnabled: false, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: DOUBLE, targetType: BIGINT followAnsiEnabled: false, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: DOUBLE, targetType: BIGINT followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: DOUBLE, targetType: BIGINT followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: LEGACY +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: INT, targetType: SMALLINT followAnsiEnabled: false, ansiEnabled: false, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: INT, targetType: SMALLINT followAnsiEnabled: false, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: INT, targetType: SMALLINT followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: INT, targetType: SMALLINT followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: LEGACY +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value 
sourceType: INT, targetType: TINYINT followAnsiEnabled: false, ansiEnabled: false, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: INT, targetType: TINYINT followAnsiEnabled: false, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: INT, targetType: TINYINT followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: INT, targetType: TINYINT followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: LEGACY +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: MAP, targetType: MAP followAnsiEnabled: false, ansiEnabled: false, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: MAP, targetType: MAP followAnsiEnabled: false, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: MAP, targetType: MAP followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: MAP, targetType: MAP followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: LEGACY +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: STRING, targetType: INT followAnsiEnabled: false, ansiEnabled: false, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in 
WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: STRING, targetType: INT followAnsiEnabled: false, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: STRING, targetType: INT followAnsiEnabled: false, ansiEnabled: true, storeAssignmentPolicy: LEGACY +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: STRING, targetType: INT followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: STRING, targetType: INT followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: LEGACY +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: Struct, targetType: Struct followAnsiEnabled: false, ansiEnabled: false, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: Struct, targetType: Struct followAnsiEnabled: false, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: Struct, targetType: Struct followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitMergeCastingSuite#MERGE overflow in WHEN MATCHED THEN UPDATE SET t.value = s.value sourceType: Struct, targetType: Struct followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: LEGACY +org.apache.spark.sql.delta.ImplicitStreamingMergeCastingSuite#Streaming MERGE overflow sourceType: ARRAY, targetType: ARRAY followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: ANSI 
+org.apache.spark.sql.delta.ImplicitStreamingMergeCastingSuite#Streaming MERGE overflow sourceType: ARRAY, targetType: ARRAY followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: LEGACY +org.apache.spark.sql.delta.ImplicitStreamingMergeCastingSuite#Streaming MERGE overflow sourceType: BIGINT, targetType: DECIMAL(7,2) followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitStreamingMergeCastingSuite#Streaming MERGE overflow sourceType: BIGINT, targetType: DECIMAL(7,2) followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: LEGACY +org.apache.spark.sql.delta.ImplicitStreamingMergeCastingSuite#Streaming MERGE overflow sourceType: BIGINT, targetType: INT followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitStreamingMergeCastingSuite#Streaming MERGE overflow sourceType: BIGINT, targetType: INT followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: LEGACY +org.apache.spark.sql.delta.ImplicitStreamingMergeCastingSuite#Streaming MERGE overflow sourceType: DECIMAL(3,1), targetType: DECIMAL(3,2) followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitStreamingMergeCastingSuite#Streaming MERGE overflow sourceType: DECIMAL(3,1), targetType: DECIMAL(3,2) followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: LEGACY +org.apache.spark.sql.delta.ImplicitStreamingMergeCastingSuite#Streaming MERGE overflow sourceType: DOUBLE, targetType: BIGINT followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitStreamingMergeCastingSuite#Streaming MERGE overflow sourceType: DOUBLE, targetType: BIGINT followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: LEGACY +org.apache.spark.sql.delta.ImplicitStreamingMergeCastingSuite#Streaming MERGE overflow sourceType: INT, targetType: SMALLINT followAnsiEnabled: true, ansiEnabled: true, 
storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitStreamingMergeCastingSuite#Streaming MERGE overflow sourceType: INT, targetType: SMALLINT followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: LEGACY +org.apache.spark.sql.delta.ImplicitStreamingMergeCastingSuite#Streaming MERGE overflow sourceType: INT, targetType: TINYINT followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitStreamingMergeCastingSuite#Streaming MERGE overflow sourceType: INT, targetType: TINYINT followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: LEGACY +org.apache.spark.sql.delta.ImplicitStreamingMergeCastingSuite#Streaming MERGE overflow sourceType: MAP, targetType: MAP followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitStreamingMergeCastingSuite#Streaming MERGE overflow sourceType: MAP, targetType: MAP followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: LEGACY +org.apache.spark.sql.delta.ImplicitStreamingMergeCastingSuite#Streaming MERGE overflow sourceType: STRING, targetType: INT followAnsiEnabled: false, ansiEnabled: true, storeAssignmentPolicy: LEGACY +org.apache.spark.sql.delta.ImplicitStreamingMergeCastingSuite#Streaming MERGE overflow sourceType: STRING, targetType: INT followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitStreamingMergeCastingSuite#Streaming MERGE overflow sourceType: STRING, targetType: INT followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: LEGACY +org.apache.spark.sql.delta.ImplicitStreamingMergeCastingSuite#Streaming MERGE overflow sourceType: Struct, targetType: Struct followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: ANSI +org.apache.spark.sql.delta.ImplicitStreamingMergeCastingSuite#Streaming MERGE overflow sourceType: Struct, targetType: Struct followAnsiEnabled: true, ansiEnabled: true, storeAssignmentPolicy: 
LEGACY +org.apache.spark.sql.delta.PublicHDFSLogStoreSuite#No AbstractFileSystem - end to end test using data frame +org.apache.spark.sql.delta.RestoreTableSQLSuite#cdf + RESTORE +org.apache.spark.sql.delta.RestoreTableSQLSuite#restore command output metrics +org.apache.spark.sql.delta.RestoreTableSQLSuite#restore operation metrics in Delta table history +org.apache.spark.sql.delta.RestoreTableSQLWithCatalogOwnedBatch100Suite#restore command output metrics +org.apache.spark.sql.delta.RestoreTableSQLWithCatalogOwnedBatch100Suite#restore operation metrics in Delta table history +org.apache.spark.sql.delta.RestoreTableSQLWithCatalogOwnedBatch1Suite#restore command output metrics +org.apache.spark.sql.delta.RestoreTableSQLWithCatalogOwnedBatch1Suite#restore operation metrics in Delta table history +org.apache.spark.sql.delta.RestoreTableSQLWithCatalogOwnedBatch2Suite#restore command output metrics +org.apache.spark.sql.delta.RestoreTableSQLWithCatalogOwnedBatch2Suite#restore operation metrics in Delta table history +org.apache.spark.sql.delta.RestoreTableScalaDeletionVectorSuite#restore command output metrics +org.apache.spark.sql.delta.RestoreTableScalaDeletionVectorSuite#restore operation metrics in Delta table history +org.apache.spark.sql.delta.RestoreTableScalaSuite#cdf + RESTORE +org.apache.spark.sql.delta.RestoreTableScalaSuite#restore command output metrics +org.apache.spark.sql.delta.RestoreTableScalaSuite#restore operation metrics in Delta table history +org.apache.spark.sql.delta.RestoreTableScalaWithCatalogOwnedBatch100Suite#restore command output metrics +org.apache.spark.sql.delta.RestoreTableScalaWithCatalogOwnedBatch100Suite#restore operation metrics in Delta table history +org.apache.spark.sql.delta.RestoreTableScalaWithCatalogOwnedBatch1Suite#restore command output metrics +org.apache.spark.sql.delta.RestoreTableScalaWithCatalogOwnedBatch1Suite#restore operation metrics in Delta table history 
+org.apache.spark.sql.delta.RestoreTableScalaWithCatalogOwnedBatch2Suite#restore command output metrics +org.apache.spark.sql.delta.RestoreTableScalaWithCatalogOwnedBatch2Suite#restore operation metrics in Delta table history +org.apache.spark.sql.delta.SnapshotManagementSuite#recover from a corrupt checkpoint: previous checkpoint doesn't exist +org.apache.spark.sql.delta.SnapshotManagementSuite#should not recover when both the current and previous checkpoints are broken +org.apache.spark.sql.delta.SnapshotManagementWithCoordinatedCommitsBatch100Suite#recover from a corrupt checkpoint: previous checkpoint doesn't exist +org.apache.spark.sql.delta.SnapshotManagementWithCoordinatedCommitsBatch100Suite#should not recover when both the current and previous checkpoints are broken +org.apache.spark.sql.delta.SnapshotManagementWithCoordinatedCommitsBatch1Suite#recover from a corrupt checkpoint: previous checkpoint doesn't exist +org.apache.spark.sql.delta.SnapshotManagementWithCoordinatedCommitsBatch1Suite#should not recover when both the current and previous checkpoints are broken +org.apache.spark.sql.delta.SnapshotManagementWithCoordinatedCommitsBatch2Suite#recover from a corrupt checkpoint: previous checkpoint doesn't exist +org.apache.spark.sql.delta.SnapshotManagementWithCoordinatedCommitsBatch2Suite#should not recover when both the current and previous checkpoints are broken +org.apache.spark.sql.delta.UpdateMetricsSuite#update-metrics: update one row per file - Partitioned = false, cdfEnabled = false +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsDVSuite#DELETE - Scenario 1 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsDVSuite#DELETE - Scenario 2 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsDVSuite#DELETE - Scenario 3 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsDVSuite#DELETE - Scenario 4 
+org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsDVSuite#DELETE - Scenario 5 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsDVSuite#DELETE - Scenario 6 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsDVSuite#DELETE - Scenario 7 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsDVSuite#OPTIMIZE - Scenario 1 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsDVSuite#OPTIMIZE - Scenario 2 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsDVSuite#OPTIMIZE - Scenario 3 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsDVSuite#OPTIMIZE - Scenario 4 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsDVSuite#OPTIMIZE - Scenario 5 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsDVSuite#OPTIMIZE - Scenario 6 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsDVSuite#OPTIMIZE - Scenario 7 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsDVSuite#UPDATE - Scenario 1 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsDVSuite#UPDATE - Scenario 2 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsDVSuite#UPDATE - Scenario 3 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsDVSuite#UPDATE - Scenario 4 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsDVSuite#UPDATE - Scenario 5 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsDVSuite#UPDATE - Scenario 6 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsDVSuite#UPDATE - Scenario 7 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsSuite#DELETE - Scenario 1 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsSuite#DELETE - Scenario 2 
+org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsSuite#DELETE - Scenario 3 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsSuite#DELETE - Scenario 4 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsSuite#DELETE - Scenario 5 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsSuite#DELETE - Scenario 6 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsSuite#DELETE - Scenario 7 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsSuite#OPTIMIZE - Scenario 1 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsSuite#OPTIMIZE - Scenario 2 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsSuite#OPTIMIZE - Scenario 3 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsSuite#OPTIMIZE - Scenario 4 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsSuite#OPTIMIZE - Scenario 5 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsSuite#OPTIMIZE - Scenario 6 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsSuite#OPTIMIZE - Scenario 7 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsSuite#UPDATE - Scenario 1 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsSuite#UPDATE - Scenario 2 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsSuite#UPDATE - Scenario 3 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsSuite#UPDATE - Scenario 4 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsSuite#UPDATE - Scenario 5 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsSuite#UPDATE - Scenario 6 +org.apache.spark.sql.delta.commands.backfill.RowTrackingBackfillConflictsSuite#UPDATE - Scenario 7 +org.apache.spark.sql.delta.concurrency.TransactionExecutionObserverSuite#Phase 
Locking - delete command +org.apache.spark.sql.delta.coordinatedcommits.CoordinatedCommitsSuite#Incomplete backfills are handled properly by next commit after CC to FS conversion +org.apache.spark.sql.delta.deletionvectors.DeletionVectorsSuite#DELETE with DVs with column mapping mode=id +org.apache.spark.sql.delta.deletionvectors.DeletionVectorsSuite#huge table: delete a small number of rows from tables of 2B rows with DVs +org.apache.spark.sql.delta.deletionvectors.DeletionVectorsSuite#huge table: read from tables of 2B rows with existing DV of many zeros +org.apache.spark.sql.delta.deletionvectors.DeletionVectorsSuite#variant types DELETE with DVs with column mapping mode=id +org.apache.spark.sql.delta.deletionvectors.DeletionVectorsSuite#variant types DELETE with DVs with column mapping mode=name +org.apache.spark.sql.delta.deletionvectors.DeletionVectorsWithPredicatePushdownSuite#(It is not a test it is a sbt.testing.SuiteSelector) +org.apache.spark.sql.delta.deletionvectors.DeletionVectorsWithPredicatePushdownSuite# +org.apache.spark.sql.delta.generatedsuites.DeleteBaseSQLNameBasedSuite#Variant type +org.apache.spark.sql.delta.generatedsuites.DeleteBaseSQLPathBasedCDCOnSuite#Variant type +org.apache.spark.sql.delta.generatedsuites.DeleteBaseSQLPathBasedDVPredPushOffSuite#Variant type +org.apache.spark.sql.delta.generatedsuites.DeleteBaseSQLPathBasedDVPredPushOnSuite#Variant type +org.apache.spark.sql.delta.generatedsuites.DeleteBaseSQLPathBasedSuite#Variant type +org.apache.spark.sql.delta.generatedsuites.DeleteBaseScalaSuite#Variant type +org.apache.spark.sql.delta.generatedsuites.DeleteTempViewSQLNameBasedSuite#test delete on temp view - nontrivial projection - Dataset TempView +org.apache.spark.sql.delta.generatedsuites.DeleteTempViewSQLNameBasedSuite#test delete on temp view - nontrivial projection - SQL TempView +org.apache.spark.sql.delta.generatedsuites.DeleteTempViewSQLPathBasedCDCOnSuite#test delete on temp view - nontrivial projection - Dataset 
TempView +org.apache.spark.sql.delta.generatedsuites.DeleteTempViewSQLPathBasedCDCOnSuite#test delete on temp view - nontrivial projection - SQL TempView +org.apache.spark.sql.delta.generatedsuites.DeleteTempViewSQLPathBasedDVPredPushOffSuite#test delete on temp view - nontrivial projection - Dataset TempView +org.apache.spark.sql.delta.generatedsuites.DeleteTempViewSQLPathBasedDVPredPushOffSuite#test delete on temp view - nontrivial projection - SQL TempView +org.apache.spark.sql.delta.generatedsuites.DeleteTempViewSQLPathBasedDVPredPushOnSuite#test delete on temp view - nontrivial projection - Dataset TempView +org.apache.spark.sql.delta.generatedsuites.DeleteTempViewSQLPathBasedDVPredPushOnSuite#test delete on temp view - nontrivial projection - SQL TempView +org.apache.spark.sql.delta.generatedsuites.DeleteTempViewSQLPathBasedSuite#test delete on temp view - nontrivial projection - Dataset TempView +org.apache.spark.sql.delta.generatedsuites.DeleteTempViewSQLPathBasedSuite#test delete on temp view - nontrivial projection - SQL TempView +org.apache.spark.sql.delta.generatedsuites.MergeCDCSQLPathBasedCDCOnSuite#merge CDC - all conditions failed for all rows +org.apache.spark.sql.delta.generatedsuites.MergeIntoDVsSQLPathBasedCDCOnDVsPredPushOffSuite#Merge with DVs metrics - Incremental Updates +org.apache.spark.sql.delta.generatedsuites.MergeIntoDVsSQLPathBasedCDCOnDVsPredPushOffSuite#Merge with DVs metrics - delete entire file +org.apache.spark.sql.delta.generatedsuites.MergeIntoDVsSQLPathBasedCDCOnDVsPredPushOffSuite#Verify error is produced when paths are not joined correctly +org.apache.spark.sql.delta.generatedsuites.MergeIntoDVsSQLPathBasedCDCOnDVsPredPushOnSuite#Merge with DVs metrics - Incremental Updates +org.apache.spark.sql.delta.generatedsuites.MergeIntoDVsSQLPathBasedCDCOnDVsPredPushOnSuite#Merge with DVs metrics - delete entire file +org.apache.spark.sql.delta.generatedsuites.MergeIntoDVsSQLPathBasedCDCOnDVsPredPushOnSuite#Verify error is produced 
when paths are not joined correctly +org.apache.spark.sql.delta.generatedsuites.MergeIntoDVsSQLPathBasedDVsPredPushOffSuite#Merge with DVs metrics - Incremental Updates +org.apache.spark.sql.delta.generatedsuites.MergeIntoDVsSQLPathBasedDVsPredPushOffSuite#Merge with DVs metrics - delete entire file +org.apache.spark.sql.delta.generatedsuites.MergeIntoDVsSQLPathBasedDVsPredPushOffSuite#Verify error is produced when paths are not joined correctly +org.apache.spark.sql.delta.generatedsuites.MergeIntoDVsSQLPathBasedDVsPredPushOnSuite#Merge with DVs metrics - Incremental Updates +org.apache.spark.sql.delta.generatedsuites.MergeIntoDVsSQLPathBasedDVsPredPushOnSuite#Merge with DVs metrics - delete entire file +org.apache.spark.sql.delta.generatedsuites.MergeIntoDVsSQLPathBasedDVsPredPushOnSuite#Verify error is produced when paths are not joined correctly +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedMapStructEvolutionNullnessSQLNameBasedSuite#schema evolution - nested map-of-struct - non-null source leaves, non-null target leaves, UPDATE * +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedMapStructEvolutionNullnessSQLNameBasedSuite#schema evolution - nested map-of-struct - non-null source leaves, non-null target leaves, UPDATE t.col = s.col +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedMapStructEvolutionNullnessSQLNameBasedSuite#schema evolution - nested map-of-struct - non-null source leaves, null source nested map, UPDATE * +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedMapStructEvolutionNullnessSQLNameBasedSuite#schema evolution - nested map-of-struct - non-null source leaves, null source nested map, UPDATE t.col = s.col +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedMapStructEvolutionNullnessSQLNameBasedSuite#schema evolution - nested map-of-struct - non-null source leaves, null target col, UPDATE * +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedMapStructEvolutionNullnessSQLNameBasedSuite#schema 
evolution - nested map-of-struct - non-null source leaves, null target col, UPDATE t.col = s.col +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedMapStructEvolutionNullnessSQLNameBasedSuite#schema evolution - nested map-of-struct - non-null source leaves, null target leaves, UPDATE * +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedMapStructEvolutionNullnessSQLNameBasedSuite#schema evolution - nested map-of-struct - non-null source leaves, null target leaves, UPDATE t.col = s.col +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedMapStructEvolutionNullnessSQLNameBasedSuite#schema evolution - nested map-of-struct - non-null source leaves, null target nested struct, UPDATE * +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedMapStructEvolutionNullnessSQLNameBasedSuite#schema evolution - nested map-of-struct - non-null source leaves, null target nested struct, UPDATE t.col = s.col +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionInsertSQLNameBasedSuite#schema evolution - struct in different order +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionInsertSQLNameBasedSuite#schema evolution - struct in different order - with evolution disabled +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionInsertSQLPathBasedCDCOnDVsPredPushOffSuite#schema evolution - struct in different order +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionInsertSQLPathBasedCDCOnDVsPredPushOffSuite#schema evolution - struct in different order - with evolution disabled +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionInsertSQLPathBasedCDCOnDVsPredPushOnSuite#schema evolution - struct in different order +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionInsertSQLPathBasedCDCOnDVsPredPushOnSuite#schema evolution - struct in different order - with evolution disabled 
+org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionInsertSQLPathBasedCDCOnSuite#schema evolution - struct in different order +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionInsertSQLPathBasedCDCOnSuite#schema evolution - struct in different order - with evolution disabled +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionInsertSQLPathBasedSuite#schema evolution - struct in different order +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionInsertSQLPathBasedSuite#schema evolution - struct in different order - with evolution disabled +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionInsertScalaSuite#schema evolution - struct in different order +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionInsertScalaSuite#schema evolution - struct in different order - with evolution disabled +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionNullnessSQLNameBasedSuite#schema evolution - nested struct - non-null source leaves, non-null target leaves, UPDATE * +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionNullnessSQLNameBasedSuite#schema evolution - nested struct - non-null source leaves, non-null target leaves, UPDATE t.col = s.col +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionNullnessSQLNameBasedSuite#schema evolution - nested struct - non-null source leaves, null target col, UPDATE * +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionNullnessSQLNameBasedSuite#schema evolution - nested struct - non-null source leaves, null target col, UPDATE t.col = s.col +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionNullnessSQLNameBasedSuite#schema evolution - nested struct - non-null source leaves, null target leaves, UPDATE * 
+org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionNullnessSQLNameBasedSuite#schema evolution - nested struct - non-null source leaves, null target leaves, UPDATE t.col = s.col +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionNullnessSQLNameBasedSuite#schema evolution - nested struct - non-null source leaves, null target nested struct, UPDATE * +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionNullnessSQLNameBasedSuite#schema evolution - nested struct - non-null source leaves, null target nested struct, UPDATE t.col = s.col +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionUpdateOnlySQLNameBasedSuite#schema evolution - extra nested column in source - update, isPartitioned=false +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionUpdateOnlySQLNameBasedSuite#schema evolution - extra nested column in source - update, isPartitioned=true +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionUpdateOnlySQLNameBasedSuite#schema evolution - extra nested column in source - update, partition on unused column +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionUpdateOnlySQLPathBasedCDCOnDVsPredPushOffSuite#schema evolution - extra nested column in source - update, isPartitioned=false +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionUpdateOnlySQLPathBasedCDCOnDVsPredPushOffSuite#schema evolution - extra nested column in source - update, isPartitioned=true +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionUpdateOnlySQLPathBasedCDCOnDVsPredPushOffSuite#schema evolution - extra nested column in source - update, partition on unused column +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionUpdateOnlySQLPathBasedCDCOnDVsPredPushOnSuite#schema evolution - extra nested column in source - update, isPartitioned=false 
+org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionUpdateOnlySQLPathBasedCDCOnDVsPredPushOnSuite#schema evolution - extra nested column in source - update, isPartitioned=true +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionUpdateOnlySQLPathBasedCDCOnDVsPredPushOnSuite#schema evolution - extra nested column in source - update, partition on unused column +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionUpdateOnlySQLPathBasedCDCOnSuite#schema evolution - extra nested column in source - update, isPartitioned=false +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionUpdateOnlySQLPathBasedCDCOnSuite#schema evolution - extra nested column in source - update, isPartitioned=true +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionUpdateOnlySQLPathBasedCDCOnSuite#schema evolution - extra nested column in source - update, partition on unused column +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionUpdateOnlySQLPathBasedSuite#schema evolution - extra nested column in source - update, isPartitioned=false +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionUpdateOnlySQLPathBasedSuite#schema evolution - extra nested column in source - update, isPartitioned=true +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionUpdateOnlySQLPathBasedSuite#schema evolution - extra nested column in source - update, partition on unused column +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionUpdateOnlyScalaSuite#schema evolution - extra nested column in source - update, isPartitioned=false +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionUpdateOnlyScalaSuite#schema evolution - extra nested column in source - update, isPartitioned=true +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructEvolutionUpdateOnlyScalaSuite#schema evolution - extra nested column 
in source - update, partition on unused column +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructInMapEvolutionSQLNameBasedSuite#schema evolution - new source column in map struct key +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructInMapEvolutionSQLNameBasedSuite#schema evolution - source nested map struct key contains less columns than target +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructInMapEvolutionSQLPathBasedCDCOnDVsPredPushOffSuite#schema evolution - new source column in map struct key +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructInMapEvolutionSQLPathBasedCDCOnDVsPredPushOffSuite#schema evolution - source nested map struct key contains less columns than target +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructInMapEvolutionSQLPathBasedCDCOnDVsPredPushOnSuite#schema evolution - new source column in map struct key +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructInMapEvolutionSQLPathBasedCDCOnDVsPredPushOnSuite#schema evolution - source nested map struct key contains less columns than target +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructInMapEvolutionSQLPathBasedCDCOnSuite#schema evolution - new source column in map struct key +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructInMapEvolutionSQLPathBasedCDCOnSuite#schema evolution - source nested map struct key contains less columns than target +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructInMapEvolutionSQLPathBasedDVsPredPushOffSuite#schema evolution - source nested map struct key contains less columns than target +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructInMapEvolutionSQLPathBasedDVsPredPushOnSuite#schema evolution - source nested map struct key contains less columns than target +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructInMapEvolutionSQLPathBasedSuite#schema evolution - new source column in map struct key 
+org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructInMapEvolutionSQLPathBasedSuite#schema evolution - source nested map struct key contains less columns than target +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructInMapEvolutionScalaSuite#schema evolution - new source column in map struct key +org.apache.spark.sql.delta.generatedsuites.MergeIntoNestedStructInMapEvolutionScalaSuite#schema evolution - source nested map struct key contains less columns than target +org.apache.spark.sql.delta.generatedsuites.MergeIntoNotMatchedBySourceCDCPart2SQLNameBasedSuite#not matched by source - all 3 clauses - no changes - isPartitioned: false - cdcEnabled: true +org.apache.spark.sql.delta.generatedsuites.MergeIntoNotMatchedBySourceCDCPart2SQLNameBasedSuite#not matched by source - all 3 clauses - no changes - isPartitioned: true - cdcEnabled: true +org.apache.spark.sql.delta.generatedsuites.MergeIntoNotMatchedBySourceCDCPart2SQLPathBasedCDCOnSuite#not matched by source - all 3 clauses - no changes - isPartitioned: false - cdcEnabled: true +org.apache.spark.sql.delta.generatedsuites.MergeIntoNotMatchedBySourceCDCPart2SQLPathBasedCDCOnSuite#not matched by source - all 3 clauses - no changes - isPartitioned: true - cdcEnabled: true +org.apache.spark.sql.delta.generatedsuites.MergeIntoNotMatchedBySourceCDCPart2SQLPathBasedSuite#not matched by source - all 3 clauses - no changes - isPartitioned: false - cdcEnabled: true +org.apache.spark.sql.delta.generatedsuites.MergeIntoNotMatchedBySourceCDCPart2SQLPathBasedSuite#not matched by source - all 3 clauses - no changes - isPartitioned: true - cdcEnabled: true +org.apache.spark.sql.delta.generatedsuites.MergeIntoNotMatchedBySourceCDCPart2ScalaSuite#not matched by source - all 3 clauses - no changes - isPartitioned: false - cdcEnabled: true +org.apache.spark.sql.delta.generatedsuites.MergeIntoNotMatchedBySourceCDCPart2ScalaSuite#not matched by source - all 3 clauses - no changes - isPartitioned: true - 
cdcEnabled: true +org.apache.spark.sql.delta.generatedsuites.MergeIntoSQLSQLNameBasedSuite#CTE as a source in MERGE +org.apache.spark.sql.delta.generatedsuites.MergeIntoSQLSQLPathBasedCDCOnDVsPredPushOffSuite#CTE as a source in MERGE +org.apache.spark.sql.delta.generatedsuites.MergeIntoSQLSQLPathBasedCDCOnDVsPredPushOnSuite#CTE as a source in MERGE +org.apache.spark.sql.delta.generatedsuites.MergeIntoSQLSQLPathBasedCDCOnSuite#CTE as a source in MERGE +org.apache.spark.sql.delta.generatedsuites.MergeIntoSQLSQLPathBasedDVsPredPushOffSuite#CTE as a source in MERGE +org.apache.spark.sql.delta.generatedsuites.MergeIntoSQLSQLPathBasedDVsPredPushOnSuite#CTE as a source in MERGE +org.apache.spark.sql.delta.generatedsuites.MergeIntoSQLSQLPathBasedSuite#CTE as a source in MERGE +org.apache.spark.sql.delta.generatedsuites.MergeIntoSchemaEvolutionBaseNewColumnSQLNameBasedSuite#schema evolution - extra nested column in source - update - single target partition +org.apache.spark.sql.delta.generatedsuites.MergeIntoSchemaEvolutionBaseNewColumnSQLPathBasedCDCOnDVsPredPushOffSuite#schema evolution - extra nested column in source - update - single target partition +org.apache.spark.sql.delta.generatedsuites.MergeIntoSchemaEvolutionBaseNewColumnSQLPathBasedCDCOnDVsPredPushOnSuite#schema evolution - extra nested column in source - update - single target partition +org.apache.spark.sql.delta.generatedsuites.MergeIntoSchemaEvolutionBaseNewColumnSQLPathBasedCDCOnSuite#schema evolution - extra nested column in source - update - single target partition +org.apache.spark.sql.delta.generatedsuites.MergeIntoSchemaEvolutionBaseNewColumnSQLPathBasedSuite#schema evolution - extra nested column in source - update - single target partition +org.apache.spark.sql.delta.generatedsuites.MergeIntoSchemaEvolutionBaseNewColumnScalaSuite#schema evolution - extra nested column in source - update - single target partition +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLNameBasedSuite#UDT 
Data Types - simple and nested +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLNameBasedSuite#Variant type +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLNameBasedSuite#data skipping - target-only condition +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLNameBasedSuite#data skipping with matched predicates - with insert clause +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLNameBasedSuite#insert only merge - target data skipping +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLNameBasedSuite#merge with repartition - insert only merge +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedCDCOnDVsPredPushOffSuite#UDT Data Types - simple and nested +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedCDCOnDVsPredPushOffSuite#Variant type +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedCDCOnDVsPredPushOffSuite#data skipping with matched predicates - with insert clause +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedCDCOnDVsPredPushOffSuite#insert only merge - target data skipping +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedCDCOnDVsPredPushOffSuite#merge with repartition - insert only merge +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedCDCOnDVsPredPushOffSuite#merge with repartition - partition on multiple columns +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedCDCOnDVsPredPushOnSuite#UDT Data Types - simple and nested +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedCDCOnDVsPredPushOnSuite#Variant type +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedCDCOnDVsPredPushOnSuite#data skipping - target-only condition 
+org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedCDCOnDVsPredPushOnSuite#data skipping with matched predicates - with insert clause +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedCDCOnDVsPredPushOnSuite#insert only merge - target data skipping +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedCDCOnDVsPredPushOnSuite#merge with repartition - insert only merge +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedCDCOnDVsPredPushOnSuite#merge with repartition - partition on multiple columns +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedCDCOnSuite#UDT Data Types - simple and nested +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedCDCOnSuite#Variant type +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedCDCOnSuite#data skipping - target-only condition +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedCDCOnSuite#data skipping with matched predicates - with insert clause +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedCDCOnSuite#insert only merge - target data skipping +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedCDCOnSuite#merge with repartition - insert only merge +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedDVsPredPushOffSuite#UDT Data Types - simple and nested +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedDVsPredPushOffSuite#Variant type +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedDVsPredPushOffSuite#data skipping with matched predicates - with insert clause +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedDVsPredPushOffSuite#insert only merge - target data skipping 
+org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedDVsPredPushOffSuite#merge with repartition - insert only merge +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedDVsPredPushOffSuite#merge with repartition - partition on multiple columns +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedDVsPredPushOnSuite#UDT Data Types - simple and nested +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedDVsPredPushOnSuite#Variant type +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedDVsPredPushOnSuite#data skipping - target-only condition +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedDVsPredPushOnSuite#data skipping with matched predicates - with insert clause +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedDVsPredPushOnSuite#insert only merge - target data skipping +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedDVsPredPushOnSuite#merge with repartition - insert only merge +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedDVsPredPushOnSuite#merge with repartition - partition on multiple columns +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedSuite#UDT Data Types - simple and nested +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedSuite#Variant type +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedSuite#data skipping - target-only condition +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedSuite#data skipping with matched predicates - with insert clause +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedSuite#insert only merge - target data skipping +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscSQLPathBasedSuite#merge with repartition - insert only merge 
+org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscScalaSuite#UDT Data Types - simple and nested +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscScalaSuite#Variant type +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscScalaSuite#data skipping - target-only condition +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscScalaSuite#data skipping with matched predicates - with insert clause +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscScalaSuite#insert only merge - target data skipping +org.apache.spark.sql.delta.generatedsuites.MergeIntoSuiteBaseMiscScalaSuite#merge with repartition - insert only merge +org.apache.spark.sql.delta.generatedsuites.UpdateBaseMiscSQLNameBasedSuite#Variant type +org.apache.spark.sql.delta.generatedsuites.UpdateBaseMiscSQLPathBasedCDCOnDVSuite#Variant type +org.apache.spark.sql.delta.generatedsuites.UpdateBaseMiscSQLPathBasedCDCOnRowTrackingOffSuite#Variant type +org.apache.spark.sql.delta.generatedsuites.UpdateBaseMiscSQLPathBasedCDCOnRowTrackingOnSuite#Variant type +org.apache.spark.sql.delta.generatedsuites.UpdateBaseMiscSQLPathBasedCDCOnSuite#Variant type +org.apache.spark.sql.delta.generatedsuites.UpdateBaseMiscSQLPathBasedDVPredPushOffSuite#Variant type +org.apache.spark.sql.delta.generatedsuites.UpdateBaseMiscSQLPathBasedDVPredPushOnSuite#Variant type +org.apache.spark.sql.delta.generatedsuites.UpdateBaseMiscSQLPathBasedRowTrackingOffSuite#Variant type +org.apache.spark.sql.delta.generatedsuites.UpdateBaseMiscSQLPathBasedRowTrackingOnSuite#Variant type +org.apache.spark.sql.delta.generatedsuites.UpdateBaseMiscSQLPathBasedSuite#Variant type +org.apache.spark.sql.delta.generatedsuites.UpdateBaseMiscScalaSuite#Variant type +org.apache.spark.sql.delta.generatedsuites.UpdateBaseTempViewSQLNameBasedSuite#test update on temp view - nontrivial projection - Dataset TempView +org.apache.spark.sql.delta.generatedsuites.UpdateBaseTempViewSQLNameBasedSuite#test 
update on temp view - nontrivial projection - SQL TempView +org.apache.spark.sql.delta.generatedsuites.UpdateBaseTempViewSQLPathBasedCDCOnDVSuite#test update on temp view - nontrivial projection - Dataset TempView +org.apache.spark.sql.delta.generatedsuites.UpdateBaseTempViewSQLPathBasedCDCOnDVSuite#test update on temp view - nontrivial projection - SQL TempView +org.apache.spark.sql.delta.generatedsuites.UpdateBaseTempViewSQLPathBasedCDCOnRowTrackingOffSuite#test update on temp view - nontrivial projection - Dataset TempView +org.apache.spark.sql.delta.generatedsuites.UpdateBaseTempViewSQLPathBasedCDCOnRowTrackingOffSuite#test update on temp view - nontrivial projection - SQL TempView +org.apache.spark.sql.delta.generatedsuites.UpdateBaseTempViewSQLPathBasedCDCOnSuite#test update on temp view - nontrivial projection - Dataset TempView +org.apache.spark.sql.delta.generatedsuites.UpdateBaseTempViewSQLPathBasedCDCOnSuite#test update on temp view - nontrivial projection - SQL TempView +org.apache.spark.sql.delta.generatedsuites.UpdateBaseTempViewSQLPathBasedDVPredPushOffSuite#test update on temp view - nontrivial projection - Dataset TempView +org.apache.spark.sql.delta.generatedsuites.UpdateBaseTempViewSQLPathBasedDVPredPushOffSuite#test update on temp view - nontrivial projection - SQL TempView +org.apache.spark.sql.delta.generatedsuites.UpdateBaseTempViewSQLPathBasedDVPredPushOnSuite#test update on temp view - nontrivial projection - Dataset TempView +org.apache.spark.sql.delta.generatedsuites.UpdateBaseTempViewSQLPathBasedDVPredPushOnSuite#test update on temp view - nontrivial projection - SQL TempView +org.apache.spark.sql.delta.generatedsuites.UpdateBaseTempViewSQLPathBasedRowTrackingOffSuite#test update on temp view - nontrivial projection - Dataset TempView +org.apache.spark.sql.delta.generatedsuites.UpdateBaseTempViewSQLPathBasedRowTrackingOffSuite#test update on temp view - nontrivial projection - SQL TempView 
+org.apache.spark.sql.delta.generatedsuites.UpdateBaseTempViewSQLPathBasedSuite#test update on temp view - nontrivial projection - Dataset TempView +org.apache.spark.sql.delta.generatedsuites.UpdateBaseTempViewSQLPathBasedSuite#test update on temp view - nontrivial projection - SQL TempView +org.apache.spark.sql.delta.optimize.OptimizeCompactionSQLSuite#optimize - multiple jobs start executing at once +org.apache.spark.sql.delta.optimize.OptimizeCompactionScalaSuite#optimize - multiple jobs start executing at once +org.apache.spark.sql.delta.optimize.OptimizeConflictSuite#conflict handling between Optimize and Business Txn +org.apache.spark.sql.delta.optimize.OptimizeMetricsSuite#optimize ZOrderBy operation metrics in Delta table history +org.apache.spark.sql.delta.optimize.OptimizeMetricsSuite#optimize metrics on idempotent operations +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#DateFormatPartitionExpr(day,yyyy-MM-dd) from timestamp +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#DateFormatPartitionExpr(day,yyyy-MM-dd) from timestamp nested +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#DateFormatPartitionExpr(hour,yyyy-MM-dd-HH) +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#DateFormatPartitionExpr(hour,yyyy-MM-dd-HH) nested +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#DateFormatPartitionExpr(month,yyyy-MM) from cast(date) +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#DateFormatPartitionExpr(month,yyyy-MM) from cast(date) nested +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#DateFormatPartitionExpr(month,yyyy-MM) from timestamp +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#DateFormatPartitionExpr(month,yyyy-MM) from timestamp nested +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#DatePartitionExpr(date) from cast(date) +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#DatePartitionExpr(date) from cast(date) nested 
+org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#DatePartitionExpr(date) from cast(timestamp) +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#DatePartitionExpr(date) from cast(timestamp) nested +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#IdentityPartitionExpr(part) +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#IdentityPartitionExpr(part) nested +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#IdentityPartitionExpr(part1) escaped field names +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#SubstringPartitionExpr(my.substr,1,3) +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#SubstringPartitionExpr(my.substr,1,3) nested +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#SubstringPartitionExpr(substr,0,3) +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#SubstringPartitionExpr(substr,0,3) nested +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#SubstringPartitionExpr(substr,1,3) +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#SubstringPartitionExpr(substr,1,3) deeply nested +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#SubstringPartitionExpr(substr,1,3) nested +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#SubstringPartitionExpr(substr,2,3) +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#SubstringPartitionExpr(substr,2,3) nested +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#TimestampTruncPartitionExpr(DD,eventTimeTrunc) from date_trunc(cast(date)) +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#TimestampTruncPartitionExpr(DD,eventTimeTrunc) from date_trunc(cast(date)) nested +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#TimestampTruncPartitionExpr(YEAR,eventTimeTrunc) from date_trunc(timestamp) +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#TimestampTruncPartitionExpr(YEAR,eventTimeTrunc) from date_trunc(timestamp) 
nested +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#TruncDatePartitionExpr(date,month) +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#TruncDatePartitionExpr(date,month) nested +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#TruncDatePartitionExpr(date,quarter) +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#TruncDatePartitionExpr(date,quarter) nested +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#TruncDatePartitionExpr(date,year) +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#TruncDatePartitionExpr(date,year) nested +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#YearMonthDayHourPartitionExpr(year,month,day,hour) +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#YearMonthDayHourPartitionExpr(year,month,day,hour) nested +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#YearMonthDayPartitionExpr(year,month,day) +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#YearMonthDayPartitionExpr(year,month,day) nested +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#YearMonthPartitionExpr(year,month) +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#YearMonthPartitionExpr(year,month) nested +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#YearPartitionExpr(year) +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#YearPartitionExpr(year) from year(cast(date)) +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#YearPartitionExpr(year) from year(cast(date)) nested +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#YearPartitionExpr(year) from year(date) +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#YearPartitionExpr(year) from year(date) nested +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#YearPartitionExpr(year) nested +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#end-to-end optimizable partition expression 
+org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#end-to-end test of behaviors of write/read null on partition column +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#five digits year in a date_format yyyy-MM partition column +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#five digits year in a date_format yyyy-MM-dd-HH partition column +org.apache.spark.sql.delta.perf.OptimizeGeneratedColumnSuite#substring on multibyte characters +org.apache.spark.sql.delta.rowid.ConflictCheckerRowIdSuite#Re-added files keep their row IDs after conflict with txn not updating high watermark +org.apache.spark.sql.delta.rowid.ConflictCheckerRowIdSuite#concurrent transactions do not assign overlapping row IDs +org.apache.spark.sql.delta.rowid.ConflictCheckerRowIdSuite#re-added files keep their row ids +org.apache.spark.sql.delta.rowid.RowIdSuite#Filter by base Row IDs +org.apache.spark.sql.delta.rowid.RowIdSuite#Filter by base Row IDs in subquery +org.apache.spark.sql.delta.rowid.RowIdSuite#No dictionary filtering on _metadata.row_id +org.apache.spark.sql.delta.rowid.RowIdSuite#No row-group skipping on _metadata.row_id +org.apache.spark.sql.delta.rowid.RowIdSuite#missing base row ids and default row commit versions +org.apache.spark.sql.delta.rowid.RowIdSuite#row ids can be read back +org.apache.spark.sql.delta.rowid.RowTrackingRemovalConcurrencySuite#Interleaved delete right after protocol downgrade should abort due to protocol change +org.apache.spark.sql.delta.rowid.RowTrackingRemovalConcurrencySuite#Interleaved update right after protocol downgrade should abort due to protocol change +org.apache.spark.sql.delta.rowid.RowTrackingRemovalConcurrencySuite#Single Unbackfill batch interleaves delete +org.apache.spark.sql.delta.rowid.RowTrackingRemovalConcurrencySuite#Single Unbackfill batch interleaves update +org.apache.spark.sql.delta.rowid.RowTrackingRemovalConcurrencyWithoutDVsSuite#Interleaved delete right after protocol downgrade should abort 
due to protocol change +org.apache.spark.sql.delta.rowid.RowTrackingRemovalConcurrencyWithoutDVsSuite#Interleaved update right after protocol downgrade should abort due to protocol change +org.apache.spark.sql.delta.rowid.RowTrackingRemovalConcurrencyWithoutDVsSuite#Single Unbackfill batch interleaves delete +org.apache.spark.sql.delta.rowid.RowTrackingRemovalConcurrencyWithoutDVsSuite#Single Unbackfill batch interleaves update +org.apache.spark.sql.delta.rowtracking.RowTrackingReadWriteSuite#write and read table with all-null materialized columns +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1JsonCheckpointV2Suite#Data skipping handles aliasing for _metadata fields +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1JsonCheckpointV2Suite#Data skipping handles aliasing for _metadata fields - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1JsonCheckpointV2Suite#Test file pruning metrics with data skipping +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1JsonCheckpointV2Suite#Test file pruning metrics with data skipping - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1JsonCheckpointV2Suite#data skipping flags +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1JsonCheckpointV2Suite#data skipping flags - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1JsonCheckpointV2Suite#data skipping on TIMESTAMP_NTZ +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1JsonCheckpointV2Suite#data skipping on TIMESTAMP_NTZ - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1JsonCheckpointV2Suite#data skipping on TIMESTAMP_NTZ near Long.MaxValue +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1JsonCheckpointV2Suite#data skipping on TIMESTAMP_NTZ near Long.MaxValue - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1JsonCheckpointV2Suite#data skipping on TIMESTAMP_NTZ with 
Long.MaxValue +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1JsonCheckpointV2Suite#data skipping on TIMESTAMP_NTZ with Long.MaxValue - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1JsonCheckpointV2Suite#data skipping shouldn't use expressions involving a subquery +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1JsonCheckpointV2Suite#data skipping shouldn't use expressions involving a subquery - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1JsonCheckpointV2Suite#data skipping stats before and after optimize +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1JsonCheckpointV2Suite#data skipping stats before and after optimize - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1JsonCheckpointV2Suite#loading data from Delta to parquet should skip data +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1JsonCheckpointV2Suite#loading data from Delta to parquet should skip data - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1JsonCheckpointV2Suite#support case insensitivity for partitioning filters +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1JsonCheckpointV2Suite#support case insensitivity for partitioning filters - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#Test file pruning metrics with data skipping - column mapping name mode +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#Test file pruning metrics with data skipping - column mapping name mode - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping by stats - double nested, single 1 - column mapping name mode +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping by stats - double nested, single 1 - column mapping name mode - old behavior 
with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping by stats - indexed column names - backtick escapes work as expected - column mapping name mode +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping by stats - indexed column names - backtick escapes work as expected - column mapping name mode - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping by stats - indexed column names - index only a subset of leaf columns - column mapping name mode +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping by stats - indexed column names - index only a subset of leaf columns - column mapping name mode - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping by stats - indexed column names - naming a nested column allows nested complex types - column mapping name mode +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping by stats - indexed column names - naming a nested column allows nested complex types - column mapping name mode - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping by stats - indexed column names - naming a nested column indexes all leaf fields of that column - column mapping name mode +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping by stats - indexed column names - naming a nested column indexes all leaf fields of that column - column mapping name mode - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping by stats - nested schema - # indexed column = 3 - column mapping name mode +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping 
by stats - nested schema - # indexed column = 3 - column mapping name mode - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping by stats - nested schema - # indexed column = 6 - column mapping name mode +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping by stats - nested schema - # indexed column = 6 - column mapping name mode - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping by stats - nested schema - # indexed column = 9 - column mapping name mode +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping by stats - nested schema - # indexed column = 9 - column mapping name mode - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping by stats - nested, single 1 - column mapping name mode +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping by stats - nested, single 1 - column mapping name mode - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping by stats - starts with, nested - column mapping name mode +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping by stats - starts with, nested - column mapping name mode - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping flags - column mapping name mode +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping flags - column mapping name mode - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping on TIMESTAMP - column mapping name mode +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data 
skipping on TIMESTAMP - column mapping name mode - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping on TIMESTAMP_NTZ - column mapping name mode +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping on TIMESTAMP_NTZ - column mapping name mode - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping on TIMESTAMP_NTZ near Long.MaxValue - column mapping name mode +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping on TIMESTAMP_NTZ near Long.MaxValue - column mapping name mode - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping on TIMESTAMP_NTZ with Long.MaxValue - column mapping name mode +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping on TIMESTAMP_NTZ with Long.MaxValue - column mapping name mode - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping shouldn't use expressions involving a subquery - column mapping name mode +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping shouldn't use expressions involving a subquery - column mapping name mode - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping stats before and after optimize - column mapping name mode +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping stats before and after optimize - column mapping name mode - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping with a different DataFrame schema order and nested columns - column mapping name mode 
+org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#data skipping with missing columns in DataFrame - column mapping name mode +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#loading data from Delta to parquet should skip data - column mapping name mode +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#loading data from Delta to parquet should skip data - column mapping name mode - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#support case insensitivity for partitioning filters - column mapping name mode +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1NameColumnMappingSuite#support case insensitivity for partitioning filters - column mapping name mode - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1ParquetCheckpointV2Suite#Data skipping handles aliasing for _metadata fields +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1ParquetCheckpointV2Suite#Data skipping handles aliasing for _metadata fields - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1ParquetCheckpointV2Suite#Test file pruning metrics with data skipping +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1ParquetCheckpointV2Suite#Test file pruning metrics with data skipping - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1ParquetCheckpointV2Suite#data skipping flags +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1ParquetCheckpointV2Suite#data skipping flags - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1ParquetCheckpointV2Suite#data skipping on TIMESTAMP_NTZ +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1ParquetCheckpointV2Suite#data skipping on TIMESTAMP_NTZ - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1ParquetCheckpointV2Suite#data 
skipping on TIMESTAMP_NTZ near Long.MaxValue +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1ParquetCheckpointV2Suite#data skipping on TIMESTAMP_NTZ near Long.MaxValue - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1ParquetCheckpointV2Suite#data skipping on TIMESTAMP_NTZ with Long.MaxValue +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1ParquetCheckpointV2Suite#data skipping on TIMESTAMP_NTZ with Long.MaxValue - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1ParquetCheckpointV2Suite#data skipping shouldn't use expressions involving a subquery +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1ParquetCheckpointV2Suite#data skipping shouldn't use expressions involving a subquery - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1ParquetCheckpointV2Suite#data skipping stats before and after optimize +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1ParquetCheckpointV2Suite#data skipping stats before and after optimize - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1ParquetCheckpointV2Suite#loading data from Delta to parquet should skip data +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1ParquetCheckpointV2Suite#loading data from Delta to parquet should skip data - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1ParquetCheckpointV2Suite#support case insensitivity for partitioning filters +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1ParquetCheckpointV2Suite#support case insensitivity for partitioning filters - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1Suite#Data skipping handles aliasing for _metadata fields +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1Suite#Data skipping handles aliasing for _metadata fields - old behavior with DataFrame schema 
+org.apache.spark.sql.delta.stats.DataSkippingDeltaV1Suite#Test file pruning metrics with data skipping +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1Suite#Test file pruning metrics with data skipping - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1Suite#data skipping flags +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1Suite#data skipping flags - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1Suite#data skipping on TIMESTAMP_NTZ +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1Suite#data skipping on TIMESTAMP_NTZ - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1Suite#data skipping on TIMESTAMP_NTZ near Long.MaxValue +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1Suite#data skipping on TIMESTAMP_NTZ near Long.MaxValue - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1Suite#data skipping on TIMESTAMP_NTZ with Long.MaxValue +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1Suite#data skipping on TIMESTAMP_NTZ with Long.MaxValue - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1Suite#data skipping shouldn't use expressions involving a subquery +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1Suite#data skipping shouldn't use expressions involving a subquery - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1Suite#data skipping stats before and after optimize +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1Suite#data skipping stats before and after optimize - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1Suite#loading data from Delta to parquet should skip data +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1Suite#loading data from Delta to parquet should skip data - old behavior with DataFrame schema 
+org.apache.spark.sql.delta.stats.DataSkippingDeltaV1Suite#support case insensitivity for partitioning filters +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1Suite#support case insensitivity for partitioning filters - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch100Suite#Data skipping handles aliasing for _metadata fields +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch100Suite#Data skipping handles aliasing for _metadata fields - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch100Suite#Test file pruning metrics with data skipping +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch100Suite#Test file pruning metrics with data skipping - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch100Suite#data skipping flags +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch100Suite#data skipping flags - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch100Suite#data skipping on TIMESTAMP_NTZ +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch100Suite#data skipping on TIMESTAMP_NTZ - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch100Suite#data skipping on TIMESTAMP_NTZ near Long.MaxValue +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch100Suite#data skipping on TIMESTAMP_NTZ near Long.MaxValue - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch100Suite#data skipping on TIMESTAMP_NTZ with Long.MaxValue +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch100Suite#data skipping on TIMESTAMP_NTZ with Long.MaxValue - old behavior with DataFrame schema 
+org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch100Suite#data skipping shouldn't use expressions involving a subquery +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch100Suite#data skipping shouldn't use expressions involving a subquery - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch100Suite#loading data from Delta to parquet should skip data +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch100Suite#loading data from Delta to parquet should skip data - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch100Suite#support case insensitivity for partitioning filters +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch100Suite#support case insensitivity for partitioning filters - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch1Suite#Data skipping handles aliasing for _metadata fields +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch1Suite#Data skipping handles aliasing for _metadata fields - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch1Suite#Test file pruning metrics with data skipping +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch1Suite#Test file pruning metrics with data skipping - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch1Suite#data skipping flags +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch1Suite#data skipping flags - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch1Suite#data skipping on TIMESTAMP_NTZ +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch1Suite#data skipping on TIMESTAMP_NTZ - old 
behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch1Suite#data skipping on TIMESTAMP_NTZ near Long.MaxValue +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch1Suite#data skipping on TIMESTAMP_NTZ near Long.MaxValue - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch1Suite#data skipping on TIMESTAMP_NTZ with Long.MaxValue +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch1Suite#data skipping on TIMESTAMP_NTZ with Long.MaxValue - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch1Suite#data skipping shouldn't use expressions involving a subquery +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch1Suite#data skipping shouldn't use expressions involving a subquery - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch1Suite#loading data from Delta to parquet should skip data +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch1Suite#loading data from Delta to parquet should skip data - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch1Suite#support case insensitivity for partitioning filters +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch1Suite#support case insensitivity for partitioning filters - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch2Suite#Data skipping handles aliasing for _metadata fields +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch2Suite#Data skipping handles aliasing for _metadata fields - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch2Suite#Test file pruning metrics with data skipping 
+org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch2Suite#Test file pruning metrics with data skipping - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch2Suite#data skipping flags +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch2Suite#data skipping flags - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch2Suite#data skipping on TIMESTAMP_NTZ +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch2Suite#data skipping on TIMESTAMP_NTZ - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch2Suite#data skipping on TIMESTAMP_NTZ near Long.MaxValue +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch2Suite#data skipping on TIMESTAMP_NTZ near Long.MaxValue - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch2Suite#data skipping on TIMESTAMP_NTZ with Long.MaxValue +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch2Suite#data skipping on TIMESTAMP_NTZ with Long.MaxValue - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch2Suite#data skipping shouldn't use expressions involving a subquery +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch2Suite#data skipping shouldn't use expressions involving a subquery - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch2Suite#loading data from Delta to parquet should skip data +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch2Suite#loading data from Delta to parquet should skip data - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch2Suite#support case insensitivity for 
partitioning filters +org.apache.spark.sql.delta.stats.DataSkippingDeltaV1WithCatalogOwnedBatch2Suite#support case insensitivity for partitioning filters - old behavior with DataFrame schema +org.apache.spark.sql.delta.stats.PartitionLikeDataSkippingColumnMappingSuite#partition-like data skipping for expression COALESCE: COALESCE(TO_DATE(S.b), c) = '1976-07-03' - column mapping id mode +org.apache.spark.sql.delta.stats.StatsCollectionSuite#gather stats +org.apache.spark.sql.delta.stats.StatsCollectionSuite#recompute stats multiple columns and files +org.apache.spark.sql.delta.stats.StatsCollectionSuite#recompute variant stats +org.apache.spark.sql.delta.typewidening.TypeWideningAlterTableSuite#type widening BIGINT -> DECIMAL(20,0), partitioned=true +org.apache.spark.sql.delta.typewidening.TypeWideningAlterTableSuite#type widening DATE -> TIMESTAMP_NTZ, partitioned=false +org.apache.spark.sql.delta.typewidening.TypeWideningAlterTableSuite#type widening DATE -> TIMESTAMP_NTZ, partitioned=true +org.apache.spark.sql.delta.typewidening.TypeWideningAlterTableSuite#type widening DECIMAL(9,2) -> DECIMAL(19,3), partitioned=true +org.apache.spark.sql.delta.typewidening.TypeWideningAlterTableSuite#type widening FLOAT -> DOUBLE, partitioned=true +org.apache.spark.sql.delta.typewidening.TypeWideningAlterTableSuite#type widening INT -> DOUBLE, partitioned=true +org.apache.spark.sql.delta.typewidening.TypeWideningAlterTableSuite#type widening with user-defined type in table +org.apache.spark.sql.delta.typewidening.TypeWideningAlterTableSuite#unsupported type changes DOUBLE -> FLOAT, partitioned=true +org.apache.spark.sql.delta.typewidening.TypeWideningAlterTableSuite#unsupported type changes TIMESTAMP_NTZ -> DATE, partitioned=false +org.apache.spark.sql.delta.typewidening.TypeWideningInsertSchemaEvolutionBasicSuite#INSERT - always automatic type widening DATE -> TIMESTAMP_NTZ +org.apache.spark.sql.delta.typewidening.TypeWideningInsertSchemaEvolutionBasicSuite#INSERT - automatic 
type widening DATE -> TIMESTAMP_NTZ +org.apache.spark.sql.delta.typewidening.TypeWideningInsertSchemaEvolutionBasicSuite#INSERT - unsupported automatic type widening TIMESTAMP_NTZ -> DATE +org.apache.spark.sql.delta.typewidening.TypeWideningMergeIntoSchemaEvolutionSuite#MERGE - automatic type widening DATE -> TIMESTAMP_NTZ +org.apache.spark.sql.delta.typewidening.TypeWideningMergeIntoSchemaEvolutionSuite#MERGE - unsupported automatic type widening TIMESTAMP_NTZ -> DATE +org.apache.spark.sql.delta.typewidening.TypeWideningTableFeatureAdvancedSuite#drop feature after type change DATE -> TIMESTAMP_NTZ +org.apache.spark.sql.delta.util.BitmapAggregatorE2ESuite#DataFrame bitmap groupBy aggregate no duplicates - Native +org.apache.spark.sql.delta.util.BitmapAggregatorE2ESuite#DataFrame bitmap groupBy aggregate no duplicates - Portable +org.apache.spark.sql.delta.util.BitmapAggregatorE2ESuite#DataFrame bitmap groupBy aggregate no duplicates - invalid Int ids - Native +org.apache.spark.sql.delta.util.BitmapAggregatorE2ESuite#DataFrame bitmap groupBy aggregate no duplicates - invalid Int ids - Portable +org.apache.spark.sql.delta.util.BitmapAggregatorE2ESuite#DataFrame bitmap groupBy aggregate no duplicates - invalid unsigned Int ids - Native +org.apache.spark.sql.delta.util.BitmapAggregatorE2ESuite#DataFrame bitmap groupBy aggregate no duplicates - invalid unsigned Int ids - Portable +org.apache.spark.sql.delta.util.BitmapAggregatorE2ESuite#DataFrame bitmap groupBy aggregate with duplicates - Native +org.apache.spark.sql.delta.util.BitmapAggregatorE2ESuite#DataFrame bitmap groupBy aggregate with duplicates - Portable +org.apache.spark.sql.delta.util.BitmapAggregatorE2ESuite#DataFrame bitmap groupBy aggregate with duplicates - invalid Int ids - Native +org.apache.spark.sql.delta.util.BitmapAggregatorE2ESuite#DataFrame bitmap groupBy aggregate with duplicates - invalid Int ids - Portable +org.apache.spark.sql.delta.util.BitmapAggregatorE2ESuite#DataFrame bitmap groupBy 
aggregate with duplicates - invalid unsigned Int ids - Native +org.apache.spark.sql.delta.util.BitmapAggregatorE2ESuite#DataFrame bitmap groupBy aggregate with duplicates - invalid unsigned Int ids - Portable diff --git a/.github/workflows/util/delta-spark-ut/setup-delta.sh b/.github/workflows/util/delta-spark-ut/setup-delta.sh new file mode 100755 index 00000000000..8da1b660ad7 --- /dev/null +++ b/.github/workflows/util/delta-spark-ut/setup-delta.sh @@ -0,0 +1,177 @@ +#!/usr/bin/env bash + +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# +# Prepares a delta-io/delta clone for running its `spark` module tests with the +# Gluten (Velox) bundle jar on the classpath. +# +# Usage: +# setup-delta.sh <delta_ref> <delta_dir> <gluten_bundle_jar> <gluten_repo_root> +# +# Arguments: +# delta_ref - git ref (tag/branch/sha) to check out (e.g. v4.2.0) +# delta_dir - destination directory for the Delta clone +# gluten_bundle_jar - path to the gluten-velox-bundle fat jar +# gluten_repo_root - path to the Gluten repository root (used to locate +# backends-velox/src-delta40/.../DeltaSQLCommandTest.scala) +# + +set -euo pipefail + +if [ "$#" -ne 4 ]; then + echo "Usage: $0 <delta_ref> <delta_dir> <gluten_bundle_jar> <gluten_repo_root>" >&2 + exit 1 +fi + +DELTA_REF="$1" +DELTA_DIR="$2" +GLUTEN_BUNDLE_JAR="$3" +GLUTEN_ROOT="$4" + +if [ ! 
-f "$GLUTEN_BUNDLE_JAR" ]; then + echo "Gluten bundle jar not found: $GLUTEN_BUNDLE_JAR" >&2 + exit 1 +fi + +# Reuse the existing DeltaSQLCommandTest from Gluten's backends-velox module +# rather than maintaining a separate copy. This file is compiled as part of the +# unified `spark` project's Test scope, which has the Gluten bundle on its +# classpath (via spark-unified/lib/), so the typed GlutenConfig / VeloxDeltaConfig +# imports resolve correctly. +PATCH_SOURCE="$GLUTEN_ROOT/backends-velox/src-delta40/test/scala/org/apache/spark/sql/delta/test/DeltaSQLCommandTest.scala" +if [ ! -f "$PATCH_SOURCE" ]; then + echo "Gluten DeltaSQLCommandTest not found: $PATCH_SOURCE" >&2 + exit 1 +fi + +echo "::group::Cloning delta-io/delta @ ${DELTA_REF}" +# Shallow clone the requested tag/branch. Fall back to full clone when the ref is a SHA. +if ! git clone --depth 1 --branch "$DELTA_REF" https://github.com/delta-io/delta.git "$DELTA_DIR"; then + echo "Shallow clone of ref '${DELTA_REF}' failed, falling back to full clone." + rm -rf "$DELTA_DIR" + git clone https://github.com/delta-io/delta.git "$DELTA_DIR" + git -C "$DELTA_DIR" checkout "$DELTA_REF" +fi +git -C "$DELTA_DIR" --no-pager log -1 --oneline +echo "::endgroup::" + +echo "::group::Injecting Gluten bundle jar onto the spark project's TEST classpath" +# The Gluten bundle jar must be on the spark project's TEST runtime classpath +# (so DeltaSQLCommandTest can load org.apache.gluten.GlutenPlugin by name) but +# NOT on the COMPILE classpath of `sparkV1`, which is the project that holds +# Delta's main sources. The bundle's transitive contents include extra symbols +# under `org.apache.spark.sql` that collide with Delta's main sources -- e.g. +# MergeOutputGeneration.scala imports both `org.apache.spark.sql._` and +# `org.apache.spark.sql.delta.ClassicColumnConversions._`, and would then fail +# with `reference to expression is ambiguous`. +# +# sbt auto-scans `/lib` via `unmanagedBase`. 
Two relevant +# projects in Delta v4.2.0 have a `lib/` baseDirectory: +# - sparkV1: `project in file("spark")` -> spark/lib +# - spark : `project in file("spark-unified")` -> spark-unified/lib +# unmanagedJars are project-scoped (NOT inherited by dependents), so dropping +# the bundle into spark-unified/lib/ adds it to the unified `spark` project's +# Compile *and* Test classpaths -- but NOT to sparkV1's. That's exactly what +# we want: +# * sparkV1/Compile sees ONLY Delta's regular deps -> Delta main compiles. +# * spark/Test/fullClasspath sees the bundle -> tests load GlutenPlugin. +# (Verified empirically: with bundle only in spark-unified/lib/, sbt's +# `show sparkV1/Compile/dependencyClasspath` excludes the bundle and +# `show spark/Test/fullClasspath` includes it.) +# +# We deliberately do NOT also drop the bundle into spark/lib/, which is what +# caused the previous compile failure: spark/lib/ is sparkV1's unmanagedBase, +# and putting the bundle there would re-introduce the ambiguity errors. +SPARK_UNIFIED_LIB="$DELTA_DIR/spark-unified/lib" +mkdir -p "$SPARK_UNIFIED_LIB" +cp "$GLUTEN_BUNDLE_JAR" "$SPARK_UNIFIED_LIB/gluten-velox-bundle.jar" +ls -lh "$SPARK_UNIFIED_LIB" +echo "::endgroup::" + +echo "::group::Patching DeltaSQLCommandTest to enable Gluten plugin" +TARGET="$DELTA_DIR/spark/src/test/scala/org/apache/spark/sql/delta/test/DeltaSQLCommandTest.scala" +if [ ! -f "$TARGET" ]; then + echo "Expected file not found in Delta clone: $TARGET" >&2 + echo "The Delta directory layout for ref '${DELTA_REF}' may have changed." >&2 + exit 1 +fi +cp "$PATCH_SOURCE" "$TARGET" +echo "Patched $TARGET" +echo "--- diff vs. upstream ---" +git -C "$DELTA_DIR" --no-pager diff -- "spark/src/test/scala/org/apache/spark/sql/delta/test/DeltaSQLCommandTest.scala" || true +echo "::endgroup::" + +echo "::group::Force-failing memory-hog DeletionVectorsSuite 2B-row tests" +# Two DeletionVectorsSuite tests read from / delete from a 2-billion-row table. 
+# Under the Gluten Velox bundle they balloon the forked test JVM to ~13G of +# NATIVE memory (row-index materialization) and the kernel/cgroup OOM-kills it. +# The dead fork then wedges sbt, hanging the whole shard until the workflow's +# hang-watchdog dumps threads and kills it (~16 min wasted, and every suite +# QUEUED AFTER it in that fork is skipped) -- see delta_spark_ut.yml. +# +# Rather than silently `ignore` these (easy to forget), we make them FAIL FAST +# with a clear message: the gap stays visible in the test reports / baseline +# until the native memory blow-up is fixed, at which point this patch should be +# removed. NOTE: making the suite complete also un-skips the rest of the shard's +# suite queue, so the known-failures baseline must be refreshed after this. +DVS="$DELTA_DIR/spark/src/test/scala/org/apache/spark/sql/delta/deletionvectors/DeletionVectorsSuite.scala" +if [ ! -f "$DVS" ]; then + echo "Expected file not found in Delta clone: $DVS" >&2 + echo "The Delta directory layout for ref '${DELTA_REF}' may have changed." >&2 + exit 1 +fi +# Inject `fail(...)` as the first statement of each test body (the line ending +# in `) {`). Delta sets no -Xfatal-warnings / dead-code warning, so the now- +# unreachable original body compiles fine. Keep each injected line <100 chars: +# Delta's scalastyle enforces a 100-char line length on test sources. The full +# rationale lives in this comment, so the in-test message stays terse. 
+sed -i 's#huge table: read from tables of 2B rows with existing DV of many zeros") {#&\n fail("[Gluten CI] Force-failed: 2B-row DV read OOMs the test JVM; see setup-delta.sh")#' "$DVS" +sed -i 's#number of rows from tables of 2B rows with DVs") {#&\n fail("[Gluten CI] Force-failed: 2B-row DV delete OOMs the test JVM; see setup-delta.sh")#' "$DVS" +INJECTED=$(grep -c "Gluten CI] Force-failed" "$DVS" || true) +if [ "$INJECTED" -ne 2 ]; then + echo "ERROR: expected to force-fail 2 DeletionVectorsSuite tests but injected ${INJECTED}." >&2 + echo "Their test names likely changed in Delta ref '${DELTA_REF}'; update setup-delta.sh." >&2 + exit 1 +fi +echo "Force-failed 2 DeletionVectorsSuite 2B-row tests (read + delete)." +git -C "$DELTA_DIR" --no-pager diff -- "spark/src/test/scala/org/apache/spark/sql/delta/deletionvectors/DeletionVectorsSuite.scala" || true +echo "::endgroup::" + +echo "::group::Disabling Delta scalastyle HeaderMatchesChecker" +# Our reused DeltaSQLCommandTest carries Gluten's ASF-only license header, which +# does not match Delta's HeaderMatchesChecker regex (the regex expects either a +# Delta copyright block, or the ASF header followed by a Spark-modifications +# block and the Delta copyright block). HeaderMatchesChecker is a file-level +# checker that does NOT honor `// scalastyle:off` directives, so we instead +# disable it globally in Delta's shared scalastyle-config.xml. The config is +# applied via `ThisBuild / scalastyleConfig` in project/Checkstyle.scala, so a +# single edit covers every sbt sub-project. +SCALASTYLE_CONFIG="$DELTA_DIR/scalastyle-config.xml" +if [ ! -f "$SCALASTYLE_CONFIG" ]; then + echo "Expected scalastyle config not found: $SCALASTYLE_CONFIG" >&2 + exit 1 +fi +sed -i \ + 's|||' \ + "$SCALASTYLE_CONFIG" +if ! 
grep -q '' "$SCALASTYLE_CONFIG"; then + echo "Failed to disable HeaderMatchesChecker in $SCALASTYLE_CONFIG" >&2 + grep -n 'HeaderMatchesChecker' "$SCALASTYLE_CONFIG" >&2 || true + exit 1 +fi +echo "Disabled HeaderMatchesChecker in $SCALASTYLE_CONFIG" +echo "::endgroup::" From 85d25addc096acef30a2f42c69317a1d53f9ea61 Mon Sep 17 00:00:00 2001 From: Felipe Fujiy Pessoto Date: Sat, 27 Jun 2026 01:29:13 +0000 Subject: [PATCH 2/2] [CI] Patch ScanReportHelper to recognize Gluten's offloaded scan Delta's data-skipping and limit-push-down tests capture a ScanReport per query via ScanReportHelper.collectScans, which matches the concrete Spark case class FileSourceScanExec. Under the Gluten bundle the file scan is offloaded to DeltaScanTransformer -- a sibling that implements the same FileSourceScanLike interface but is not FileSourceScanExec. So the match never fires: collectScans returns Nil, and `val Seq(r1) = getScanReport {..}` throws `scala.MatchError: List()` in ~56 DataSkipping*/DeltaLimitPushDown* tests. Gluten preserves Delta's PreparedDeltaFileIndex on the offloaded scan, so the data skipping itself is correct and identical to vanilla -- only the test's observation breaks. The upstream fix in Delta #7104 widens collectScans to match the FileSourceScanLike interface, which both the vanilla and Gluten scans implement (behavior-preserving for vanilla). It is already merged but lands after the Delta ref this workflow builds against (v4.2.0), so setup-delta.sh cherry-picks it onto the checkout instead of carrying our own edit. Because the Delta clone is shallow, fetch the fix commit at depth 2 (commit and parent) so cherry-pick can compute the parent->fix diff against the checkout; a depth-1 fetch grafts the parent away and would turn the cherry-pick into an add of the whole tree. cherry-pick -n stages the change without needing a committer identity, matching the other working-tree patches in this script. 
Once the pinned DELTA_REF advances to include the commit, the cherry-pick
is a clean no-op and the block can be deleted outright.

After this lands and CI confirms which tests now pass, remove their
entries from known-failures.txt.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .../util/delta-spark-ut/setup-delta.sh | 36 +++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/.github/workflows/util/delta-spark-ut/setup-delta.sh b/.github/workflows/util/delta-spark-ut/setup-delta.sh
index 8da1b660ad7..8a60a9aa4c6 100755
--- a/.github/workflows/util/delta-spark-ut/setup-delta.sh
+++ b/.github/workflows/util/delta-spark-ut/setup-delta.sh
@@ -151,6 +151,42 @@ echo "Force-failed 2 DeletionVectorsSuite 2B-row tests (read + delete)."
 git -C "$DELTA_DIR" --no-pager diff -- "spark/src/test/scala/org/apache/spark/sql/delta/deletionvectors/DeletionVectorsSuite.scala" || true
 echo "::endgroup::"
 
+echo "::group::Cherry-picking upstream ScanReportHelper fix (delta-io/delta#7104)"
+# Delta's data-skipping / limit-push-down tests capture a ScanReport per query via
+# ScanReportHelper.collectScans, which matches the concrete `FileSourceScanExec`
+# case class. Under Gluten the file scan is offloaded to DeltaScanTransformer, a
+# sibling that implements the same `FileSourceScanLike` interface, so the match
+# never fires: collectScans returns Nil and `val Seq(r1) = getScanReport {..}`
+# throws `scala.MatchError: List()` in ~56 DataSkipping* / DeltaLimitPushDown* tests.
+#
+# delta-io/delta#7104 fixes this upstream by widening collectScans to match the
+# `FileSourceScanLike` interface, which both the vanilla and Gluten scans
+# implement (behavior-preserving for vanilla). It is already merged but lands
+# after the Delta ref this workflow builds against (v4.2.0), so cherry-pick it
+# onto the checkout rather than maintaining our own edit.
+#
+# Fetch depth 2 so the fix commit AND its parent are present: cherry-pick needs
+# the parent to compute the parent->fix diff against our shallow checkout (a
+# depth-1 fetch grafts the parent away and turns the cherry-pick into an add of
+# the whole tree, conflicting on every file). `cherry-pick -n` stages the change
+# without requiring a committer identity, matching the other working-tree patches
+# above. Once the pinned DELTA_REF advances to include this commit the cherry-pick
+# becomes a clean no-op, so this whole block can simply be deleted.
+SCAN_REPORT_FIX_SHA="46bd45d57eadd7e528002a0ae7bd36ce5a456eca"
+SCAN_REPORT_HELPER="$DELTA_DIR/spark/src/test/scala/org/apache/spark/sql/delta/test/ScanReportHelper.scala"
+git -C "$DELTA_DIR" fetch --depth 2 origin "$SCAN_REPORT_FIX_SHA"
+git -C "$DELTA_DIR" cherry-pick -n "$SCAN_REPORT_FIX_SHA"
+if ! grep -q 'case fs: FileSourceScanLike => Seq(fs)' "$SCAN_REPORT_HELPER"; then
+  echo "ERROR: cherry-pick of ${SCAN_REPORT_FIX_SHA} did not produce the expected" >&2
+  echo "FileSourceScanLike match in ScanReportHelper; its structure may have changed" >&2
+  echo "in Delta ref '${DELTA_REF}'. Update setup-delta.sh." >&2
+  exit 1
+fi
+echo "Cherry-picked delta-io/delta#7104 (collect scans by FileSourceScanLike)."
+git -C "$DELTA_DIR" --no-pager diff --cached -- \
+  "spark/src/test/scala/org/apache/spark/sql/delta/test/ScanReportHelper.scala" || true
+echo "::endgroup::"
+
 echo "::group::Disabling Delta scalastyle HeaderMatchesChecker"
 # Our reused DeltaSQLCommandTest carries Gluten's ASF-only license header, which
 # does not match Delta's HeaderMatchesChecker regex (the regex expects either a