[GLUTEN-12013][VL] Fix bloom-filter bytes corruption on whole-stage AQE fallback #12151

Open
brijrajk wants to merge 2 commits into apache:main from brijrajk:fix/12013-bloom-filter-stage-fallback

@brijrajk brijrajk commented May 27, 2026

Contributor

What changes are proposed in this pull request?

Fixes #12013, and the SPARK-54336 regression that the issue's query shape exposes.

Background

BloomFilterMightContainJointRewriteRule rewrites a bloom-filter producer
(bloom_filter_agg) and its consumer (might_contain) to their Velox variants so Velox
evaluates them natively. It is a Rule[LogicalPlan] registered via injectOptimizerRule,
which lands in Spark's Operator Optimization batch. Running there ensures the
substitution is baked into the originalPlan snapshot that ExpandFallbackPolicy
captures when it promotes an individual-stage fallback to a whole-stage AQE fallback, so
both sides keep the same serialized byte format even if a stage reverts to JVM execution.
A format mismatch between the two sides is what caused the original GLUTEN-12013 crash
(java.io.IOException: Unexpected Bloom filter version number).

The SPARK-54336 crash

Spark 4.0.2 / 4.1 added BloomFilterAggregateQuerySuite."SPARK-54336":

SELECT
  (SELECT first(might_contain((SELECT bloom_filter_agg(col) FROM t), 0L)) FROM t)
FROM t

The might_contain value here is a literal (0L), not a column. The earlier revision
wrapped only the outer might_contain in VeloxBloomFilterMightContain (which
expects version=1 bytes) but left the inner bloom_filter_agg vanilla. Vanilla
bloom_filter_agg has no Substrait mapping, so it runs in the JVM and emits version=0
bytes, which the Velox might-contain then fails to deserialize:

Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: (1 vs. 0)
Expression: kBloomFilterV1 == version
Function: merge

Fix

Keep the producer and consumer on the same byte format — rewrite them together, or
not at all:

  • might_contain(ScalarSubquery(...), col) (plain column value): rewrite both the
    inner aggregate and the outer might-contain to their Velox forms (version=1). This is the
    user-facing filter path protected across whole-stage AQE fallback (GLUTEN-12013).
  • might_contain(ScalarSubquery(...), <non-column>) (e.g. a literal, as in
    SPARK-54336): leave both vanilla (version=0). This also preserves vanilla's
    NULL-on-empty-input semantics, so might_contain((SELECT bloom_filter_agg(x) FROM empty), v) still returns NULL rather than false.

Standalone BloomFilterAggregate (e.g. DataFrame.stat.bloomFilter()) is never matched,
so its collected bytes stay in Spark-native format.
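
The both-or-neither rule described above can be sketched with a toy expression tree. This is a simplified model, not the code in BloomFilterMightContainJointRewriteRule.scala: the case classes below (Column, Literal, BloomFilterAgg, ScalarSubquery, MightContain) and the `velox` flag are hypothetical stand-ins for the Catalyst and Gluten expression types, chosen only to show the matching logic.

```scala
// Toy model only: hypothetical stand-ins, not Spark's or Gluten's expression classes.
object JointRewriteSketch {
  sealed trait Expr
  final case class Column(name: String) extends Expr
  final case class Literal(value: Long) extends Expr
  // Producer: the aggregate inside the scalar subquery. `velox` marks the byte format it emits.
  final case class BloomFilterAgg(child: Expr, velox: Boolean = false) extends Expr
  final case class ScalarSubquery(agg: BloomFilterAgg) extends Expr
  // Consumer: probes the filter produced by `filter` with `value`.
  final case class MightContain(filter: Expr, value: Expr, velox: Boolean = false) extends Expr

  def rewrite(e: Expr): Expr = e match {
    // Column-valued probe over a subquery: flip producer and consumer together.
    case MightContain(ScalarSubquery(agg), col: Column, false) =>
      MightContain(ScalarSubquery(agg.copy(velox = true)), col, velox = true)
    // Anything else (literal value, standalone aggregate, ...) is left untouched.
    case other => other
  }

  // Invariant the rule is meant to maintain: both sides agree on the byte format.
  def consistent(e: Expr): Boolean = e match {
    case MightContain(ScalarSubquery(agg), _, velox) => agg.velox == velox
    case _ => true
  }
}
```

In this model a standalone BloomFilterAgg never matches the first case, which mirrors why DataFrame.stat.bloomFilter() is excluded without any call-site check.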

Because this rule runs in the Operator Optimization batch — before
InjectRuntimeFilter and MergeScalarSubqueries — it never observes DPP/runtime-filter
might_contain expressions (which hash the key with xxhash64) or
ScalarSubqueryReference nodes (created by subquery merging). Those are produced
downstream and are unaffected by this rule, so no special handling is needed for them.

Files changed

  • BloomFilterMightContainJointRewriteRule.scala — column-valued might_contain rewrites
    both sides to Velox; a non-column value leaves both vanilla.
  • VeloxRuleApi.scala — registers the rule via injectOptimizerRule (replacing the
    earlier injectPreTransform/Rule[SparkPlan] + fallback patcher approach).
  • CallerInfo.scala (gluten-core) — removes the now-unused isBloomFilterStatFunction()
    helper (and inBloomFilterStatFunctionCall); the rule keys off the expression pattern,
    not the call site, so a standalone BloomFilterAggregate is excluded inherently.
  • GlutenBloomFilterFallbackSuite.scala (gluten-ut/test) — regression tests for the
    whole-stage-fallback and literal-value (SPARK-54336) scenarios, plus the
    DataFrame.stat.bloomFilter() and native bloom filter disabled guards.
  • TPC-DS plan-stability golden files: gluten-ut/spark40 and gluten-ut/spark41
    under tpcds-plan-stability/gluten-approved-plans-{v1_4,v2_7,modified}/. Because the
    rule now runs in the Operator Optimization batch, the optimized plans of the TPC-DS
    queries that use runtime bloom filters (q2, q10, q16, q24a, q24b, q32, q37, q40, q59, q64, q69, q80, q82, q85, q92, q94, q95, v2.7 q10a/q64/q80a, and the two modified
    variants) now carry velox_bloom_filter_agg / velox_might_contain instead of the
    vanilla forms. The explain.txt/simplified.txt goldens are regenerated accordingly.
    Operator structure is unchanged for non-bloom queries, and no TPC-H goldens change.

The final patch is intentionally split into two commits:

  1. the functional fix + regression coverage
  2. the TPC-DS golden-file regeneration

How was this patch tested?

Verified against the Velox backend on Spark 4.0:

| Suite | Result |
| --- | --- |
| GlutenBloomFilterAggregateQuerySuiteCGOff (incl. SPARK-54336 and might_contain on bloom_filter_agg with empty input) | passed (SPARK-54336 previously crashed) |
| GlutenBloomFilterFallbackSuite (gluten-ut/test) | passed |
| TPC-DS plan-stability suites (gluten-ut/spark40 + gluten-ut/spark41) | passed after regenerating the bloom-affected goldens (SPARK_GENERATE_GOLDEN_FILES=1) |

GlutenBloomFilterFallbackSuite tests, guarded with
requireBloomFilterAggMightContainJointFallback():

  1. only the filter stage falls back (threshold=2) — bloom_filter_agg runs natively
    (Velox bytes), the filter stage falls back; asserts velox_might_contain is in the
    optimized plan and the query succeeds.
  2. both stages fall back (threshold=1) — both sides execute via JNI in JVM row-mode,
    producing/consuming Velox-format bytes consistently.
  3. DataFrame.stat.bloomFilter() — standalone aggregate stays vanilla; readFrom()
    succeeds.
  4. native bloom filter disabled — early-exit path; plan stays vanilla.
  5. SPARK-54336 (new): might_contain((SELECT bloom_filter_agg(col) FROM t), 0L);
    both sides stay vanilla and the query returns the correct result without the version
    mismatch.

Was this patch authored or co-authored using generative AI tooling?

Yes. Claude Code was used as an AI assistant during development.

@github-actions github-actions Bot added CORE works for Gluten Core VELOX labels May 27, 2026
@brijrajk brijrajk force-pushed the fix/12013-bloom-filter-stage-fallback branch 2 times, most recently from 4a56662 to 9bf19dc Compare May 27, 2026 11:38
@github-actions

Run Gluten Clickhouse CI on x86

Copilot AI left a comment

Contributor

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@philo-he

Member

Gentle ping for a maintainer review. The link-referenced-issues CI check that initially failed has since re-run successfully — all checks are green.

Also re-raising: could a maintainer remove the CORE label? The three changed files are all Velox-backend-specific (backends-velox/ and gluten-ut/spark40/) — no common core code is touched, so VELOX label only is correct.

@brijrajk, thanks for the PR. Could you rebase the code to see if the CI failures go away?

@brijrajk brijrajk force-pushed the fix/12013-bloom-filter-stage-fallback branch from 9bf19dc to 009a9a8 Compare June 11, 2026 05:38
@brijrajk

Contributor Author

Done — rebased onto current main and force-pushed. Fresh CI triggered.

@brijrajk brijrajk force-pushed the fix/12013-bloom-filter-stage-fallback branch from 009a9a8 to 3148dbe Compare June 11, 2026 05:50
@philo-he philo-he requested a review from Copilot June 11, 2026 16:30
@philo-he philo-he removed the CORE works for Gluten Core label Jun 11, 2026

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment on lines +44 to +48

  override def apply(plan: SparkPlan): SparkPlan = {
    if (!BackendsApiManager.getSettings.requireBloomFilterAggMightContainJointFallback()) {
      return plan
    }
    plan match {

Comment on lines +173 to +177

  val df = spark.sql(sqlString)
  // Must not throw java.io.IOException: Unexpected Bloom filter version number (16777217)
  df.collect
  // All 200003 rows match the bloom filter built from the same data.
  assert(df.count() == 200003L)

@philo-he

Member

@brijrajk, could you first check if Copilot's comments make sense?

@github-actions github-actions Bot added the CORE works for Gluten Core label Jun 11, 2026
@brijrajk

Contributor Author

Thanks for flagging this, @philo-he!

Both of Copilot's comments were valid:

1. Patcher active when native bloom filter is disabled

When spark.gluten.sql.native.bloomFilter=false, Stage 0 falls back to Spark and produces Spark-format bytes. The joint-fallback rule still wraps Stage 1 in a FallbackNode, so the patcher was incorrectly rewriting it to VeloxBloomFilterMightContain — which would cause the same IOException the patcher was introduced to fix, just from the opposite trigger.

Added a second guard: if (!GlutenConfig.get.enableNativeBloomFilter) return plan. This mirrors the existing guard already in BloomFilterMightContainJointRewriteRule.

2. df.collect + df.count() runs the query twice

Combined into assert(df.collect().length == 200003L) — single execution, same failure signal if the IOException is thrown.

@philo-he

Member

@brijrajk, thanks for the update. Could you check if my following understanding is correct?

Besides the spark.gluten.sql.native.bloomFilter=false setting, which makes the bloom filter fall back in stage 0, there's another case: the fallback policy can also cause it to fall back. In that case, if we rely solely on checking that config, could it lead to an incompatibility issue in stage 1?

@brijrajk

Contributor Author

@philo-he You are absolutely right. We confirmed it with a test case.

How threshold and cost work

ExpandFallbackPolicy counts the number of columnar↔row conversion boundaries inside a stage. If that count (cost) meets COLUMNAR_WHOLESTAGE_FALLBACK_THRESHOLD, the entire stage is wrapped in a FallbackNode and runs as plain Spark.

| Scenario | Threshold | Stage 0 cost | Stage 1 cost | Outcome |
| --- | --- | --- | --- | --- |
| Original fix (PR as-is) | 2 | 1 → native ✓ | 2 → whole-stage fallback | Stage 0 Velox bytes, Stage 1 JVM: patcher correct |
| Your scenario | 1 | 1 → whole-stage fallback | ≥ 1 → whole-stage fallback | Stage 0 Spark bytes, Stage 1 JVM: patcher misfires |
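
The comparison behind the table above can be written as a one-line predicate. This is a minimal sketch, assuming "meets the threshold" means cost >= threshold (which is what the table rows imply); `fallsBackWholeStage` is a hypothetical helper, and the real ExpandFallbackPolicy may apply further conditions that are not modelled here.

```scala
object FallbackThresholdSketch {
  // Hypothetical helper: true when a stage with `transitionCost` columnar<->row
  // conversion boundaries would be promoted to a whole-stage fallback.
  // Assumption: the stage falls back as soon as the cost reaches the threshold.
  def fallsBackWholeStage(transitionCost: Int, threshold: Int): Boolean =
    transitionCost >= threshold
}
```

With threshold=2, a Stage 0 cost of 1 stays native while a Stage 1 cost of 2 falls back; with threshold=1, a Stage 0 cost of 1 already falls back, which is the case the config-only guard missed.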

Test case confirming the failure

testGluten(
  "Test bloom_filter_agg whole-stage fallback when both stages fall back",
  Issue12013) {
  ...
  if (BackendsApiManager.getSettings.requireBloomFilterAggMightContainJointFallback()) {
    // threshold=1: Stage 0's inherent transition cost of 1 meets the threshold, so
    // ExpandFallbackPolicy promotes Stage 0 to a whole-stage fallback as well.
    // Stage 0 runs as Spark and produces Spark-format bytes. Stage 1 also falls back.
    // The patcher must NOT rewrite BloomFilterMightContain -> VeloxBloomFilterMightContain
    // in this case.
    withSQLConf(
      GlutenConfig.COLUMNAR_FILTER_ENABLED.key -> "false",
      GlutenConfig.COLUMNAR_WHOLESTAGE_FALLBACK_THRESHOLD.key -> "1",
      SQLConf.ANSI_ENABLED.key -> "false"
    ) {
      val df = spark.sql(sqlString)
      assert(df.collect().length == 200003L)
    }
  }
}

Output

- Gluten - Test bloom_filter_agg whole-stage fallback when both stages fall back *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7.0 failed 1 times,
  most recent failure: Lost task 0.0 in stage 7.0: org.apache.gluten.exception.GlutenException:
  Exception: VeloxUserError
  Error Source: USER
  Error Code: INVALID_ARGUMENT
  Reason: (1 vs. 0)
  Retriable: False
  Expression: kBloomFilterV1 == version
  Function: mayContain
  File: velox/common/base/BloomFilter.h
  Line: 70

    at org.apache.gluten.utils.VeloxBloomFilterJniWrapper.mightContainLongOnSerializedBloom(Native Method)
    at org.apache.gluten.utils.VeloxBloomFilter.mightContainLongOnSerializedBloom(VeloxBloomFilter.java:163)
    ...

Tests: succeeded 1, failed 1

kBloomFilterV1 == version failing with (1 vs. 0) is the exact version-byte mismatch: Velox's reader expected its own format (1) but got Spark's format (0).
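
The two error messages quoted in this thread, Unexpected Bloom filter version number (16777217) on the JVM side and (1 vs. 0) on the Velox side, are both what one would see if the two formats disagree about how wide the version field is. The sketch below assumes (this is not confirmed anywhere in this PR) that one reader takes a 4-byte big-endian int and the other a single leading byte. The header byte arrays are hypothetical values chosen to reproduce those numbers, not dumps of real serialized filters.

```scala
import java.nio.ByteBuffer

object VersionHeaderSketch {
  // Reader A (assumed): treats the first four bytes as a big-endian int version.
  def versionAsBigEndianInt(bytes: Array[Byte]): Int = ByteBuffer.wrap(bytes).getInt()

  // Reader B (assumed): treats only the first byte as the version.
  def versionAsSingleByte(bytes: Array[Byte]): Int = bytes(0).toInt

  // Hypothetical headers, for illustration only.
  val intVersionHeader: Array[Byte] = Array[Byte](0, 0, 0, 1)  // int 1 written big-endian
  val byteVersionHeader: Array[Byte] = Array[Byte](1, 0, 0, 1) // version byte 1, then other data
}
```

Reading `byteVersionHeader` as a big-endian int gives 0x01000001 = 16777217, and reading `intVersionHeader` as a single byte gives 0. Under these assumptions each reader rejects the other side's bytes at the version check, before any filter bits are touched.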

Proposed fix

The root cause is that enableNativeBloomFilter answers "is native bloom filter on in config?" but the right question is "did Stage 0 actually run natively?" The fix is to make the guard structural: inside patchBloomFilterMightContain, before rewriting, inspect the physical plan referenced by bloomFilterExpression. If Stage 0's plan is itself a FallbackNode, it will produce Spark-format bytes and Stage 1 must be left with the vanilla BloomFilterMightContain.

Do you see any concerns with this approach, or is there a cleaner way you would handle it?

@brijrajk brijrajk force-pushed the fix/12013-bloom-filter-stage-fallback branch 3 times, most recently from 25c7fd9 to 2727774 Compare June 19, 2026 04:23

@zhztheplayer zhztheplayer left a comment

Member

@brijrajk thanks.

* This rule runs as a second fallback-policy pass, after `ExpandFallbackPolicy`, so it only acts
* when the plan is already wrapped in a `FallbackNode`.
*/
case class BloomFilterMightContainFallbackPatcher() extends Rule[SparkPlan] {

Member

I don't recall why BloomFilterMightContainJointRewriteRule was made a physical rule, but can you try turning it to a logical rule anyway? So such a patcher rule can be avoided?

Contributor Author

Done. BloomFilterMightContainJointRewriteRule is now a Rule[LogicalPlan] registered via injectOptimizerRule, modelled after CollectRewriteRule. The patcher is gone. Running as an optimizer rule ensures both substitutions (BloomFilterAggregate → VeloxBloomFilterAggregate and BloomFilterMightContain → VeloxBloomFilterMightContain) are captured in the originalPlan snapshot before ExpandFallbackPolicy takes it, so the byte format stays consistent regardless of which stages fall back. This also fixes the threshold=1 case where Stage 0 itself falls back (the patcher would incorrectly rewrite the filter side while Stage 0 was producing Spark-format bytes).
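
Why the ordering relative to the originalPlan snapshot matters can be shown with a toy model. This is a sketch under stated assumptions, not Gluten's plan types: `Plan` only records which byte format each side uses, and `afterConsumerFallback` is a hypothetical function modelling a fallback in which the consumer stage is re-planned from the snapshot while the producer keeps what was actually executed.

```scala
object SnapshotOrderSketch {
  // Toy plan: records only the byte format of each side. Not a Spark or Gluten type.
  final case class Plan(producerVelox: Boolean, consumerVelox: Boolean) {
    def consistent: Boolean = producerVelox == consumerVelox
  }

  // The joint rewrite puts both sides on the Velox format.
  def jointRewrite(p: Plan): Plan = Plan(producerVelox = true, consumerVelox = true)

  // Consumer stage falls back: it is rebuilt from the snapshot, while the
  // producer stage keeps whatever the executed plan used.
  def afterConsumerFallback(executed: Plan, snapshot: Plan): Plan =
    Plan(executed.producerVelox, snapshot.consumerVelox)

  val vanilla: Plan = Plan(producerVelox = false, consumerVelox = false)

  // Rewrite applied after the snapshot was taken (the old physical-rule ordering).
  val lateRewrite: Plan =
    afterConsumerFallback(executed = jointRewrite(vanilla), snapshot = vanilla)

  // Rewrite applied before the snapshot (the optimizer-rule ordering).
  val earlyRewrite: Plan = {
    val rewritten = jointRewrite(vanilla)
    afterConsumerFallback(executed = rewritten, snapshot = rewritten)
  }
}
```

In this model `lateRewrite` ends up with a Velox producer and a vanilla consumer, the mismatch the patcher was trying to repair, whereas `earlyRewrite` stays consistent without any patching step.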

@brijrajk brijrajk force-pushed the fix/12013-bloom-filter-stage-fallback branch 2 times, most recently from f64edd1 to cac891f Compare June 20, 2026 02:38

@rdtr rdtr left a comment

could CallerInfo.isBloomFilterStatFunction() and inBloomFilterStatFunctionCall() be removed now with this PR?

// from the original vanilla Spark plan which contains BloomFilterMightContain (not the Velox
// variant). If Stage 0 (bloom_filter_agg subquery) already ran natively it produced Velox-
// format bytes, which BloomFilterImpl.readFrom() cannot deserialize. BloomFilterMightContain-
// FallbackPatcher patches the fallback plan to use VeloxBloomFilterMightContain so Stage 1

I think Patcher is now gone so this comment is outdated?

Contributor Author

Fixed — updated the comment to describe the optimizer rule approach instead.

@brijrajk brijrajk force-pushed the fix/12013-bloom-filter-stage-fallback branch from cac891f to 59c6c50 Compare June 20, 2026 02:53
@brijrajk brijrajk force-pushed the fix/12013-bloom-filter-stage-fallback branch from 299f4f8 to 9d096a3 Compare June 26, 2026 03:55
@brijrajk brijrajk force-pushed the fix/12013-bloom-filter-stage-fallback branch from c06593c to aca2904 Compare June 26, 2026 08:10
@brijrajk brijrajk force-pushed the fix/12013-bloom-filter-stage-fallback branch from 1fe11de to 51185c7 Compare June 26, 2026 14:01
@brijrajk brijrajk force-pushed the fix/12013-bloom-filter-stage-fallback branch from 51185c7 to 04f38a3 Compare June 26, 2026 18:59
@brijrajk brijrajk force-pushed the fix/12013-bloom-filter-stage-fallback branch from 04f38a3 to c34c4e6 Compare June 26, 2026 19:39

@brijrajk

brijrajk commented Jun 27, 2026

Contributor Author

SPARK-54336 fix — literal-value might_contain

Pushed a single focused commit.

The earlier revision wrapped only the outer might_contain in VeloxBloomFilterMightContain whenever its value was not a plain column, while leaving the inner bloom_filter_agg vanilla. For the upstream BloomFilterAggregateQuerySuite."SPARK-54336" query — might_contain((SELECT bloom_filter_agg(col) FROM t), 0L), where the value is a literal — the inner vanilla aggregate has no Substrait mapping, so it runs in the JVM and emits version=0 bytes that the Velox might-contain (version=1) cannot deserialize:

Error Code: INVALID_ARGUMENT  Reason: (1 vs. 0)
Expression: kBloomFilterV1 == version   Function: merge

Fix: keep the producer and consumer on the same byte format.

  • column-valued might_contain → rewrite both the inner aggregate and the outer might-contain to Velox (the GLUTEN-12013 whole-stage-fallback case, unchanged);
  • non-column (literal) value → leave both vanilla, which also preserves vanilla's NULL-on-empty-input semantics.

Correction to an earlier note of mine on DPP / TPC-DS. This rule is registered via injectOptimizerRule, which lands in Spark's Operator Optimization batch — before InjectRuntimeFilter and MergeScalarSubqueries. It therefore never observes DPP/runtime-filter might_contain expressions (xxhash64(col) values) or ScalarSubqueryReference nodes; those are produced downstream and are unaffected by this rule. In the failing CI run the DPP might_contain shows up as plain vanilla BloomFilterMightContain in the logs (Not supported to map ... might_contain(Subquery, xxhash64(...))), confirming the rule does not touch it.

Verified locally (Velox backend, Spark 4.0):

  • GlutenBloomFilterAggregateQuerySuiteCGOff: SPARK-54336 ✅ and might_contain on bloom_filter_agg with empty input ✅ (SPARK-54336 previously crashed with kBloomFilterV1 == version).
  • GlutenBloomFilterFallbackSuite — all pass, including a new regression test mirroring the SPARK-54336 query.

@brijrajk brijrajk force-pushed the fix/12013-bloom-filter-stage-fallback branch from 8a1ef81 to cd58f6f Compare June 27, 2026 12:42
@brijrajk brijrajk force-pushed the fix/12013-bloom-filter-stage-fallback branch from cd58f6f to bd52842 Compare June 27, 2026 13:45
@brijrajk brijrajk force-pushed the fix/12013-bloom-filter-stage-fallback branch from bd52842 to e2c7eb4 Compare June 28, 2026 04:07

@brijrajk

brijrajk commented Jun 28, 2026

Contributor Author

TPC-DS golden regeneration

Following @zhztheplayer's suggestion to make BloomFilterMightContainJointRewriteRule a logical rule, it now runs in the Operator Optimization batch. That changes the optimized plans of the bloom-filter-using TPC-DS queries, so the checked-in gluten-approved-plans-* goldens were regenerated to match. Operator structure for non-bloom queries is unchanged.

spark-test-spark40 / spark-test-spark41 are green with these regenerated goldens, which are folded into the PR's second commit.

@brijrajk brijrajk force-pushed the fix/12013-bloom-filter-stage-fallback branch from e2c7eb4 to 9d85377 Compare June 28, 2026 17:17

brijrajk and others added 2 commits June 28, 2026 23:00
…QE fallback (incl. SPARK-54336)

BloomFilterMightContainJointRewriteRule keeps a bloom-filter producer
(bloom_filter_agg) and its consumer (might_contain) on the same serialized byte
format, so they never end up mismatched (Velox version=1 vs Spark version=0) when
AQE promotes an individual-stage fallback to a whole-stage fallback -- the original
GLUTEN-12013 crash (java.io.IOException: Unexpected Bloom filter version number).

The rule runs as a Rule[LogicalPlan] via injectOptimizerRule (Operator Optimization
batch), so the substitution is captured in the originalPlan snapshot that
ExpandFallbackPolicy uses when promoting a stage fallback to a whole-stage AQE
fallback -- both sides stay consistent regardless of which stages fall back to JVM
execution.

  - might_contain(ScalarSubquery(...), col) with a plain column value: rewrite both
    the inner aggregate and the outer might-contain to their Velox forms.
  - might_contain(ScalarSubquery(...), <non-column>) -- e.g. a literal, as in
    SPARK-54336: might_contain((SELECT bloom_filter_agg(col) FROM t), 0L): leave both
    vanilla (version=0), which also preserves vanilla's NULL-on-empty-input
    semantics. Rewriting only the outer side caused kBloomFilterV1 == version (1 vs. 0).
  - Standalone BloomFilterAggregate (e.g. DataFrame.stat.bloomFilter()) is never
    matched, so its bytes stay Spark-native (fixes GlutenDataFrameStatSuite).

Because the rule runs before InjectRuntimeFilter and MergeScalarSubqueries,
DPP/runtime-filter might_contain expressions and ScalarSubqueryReference nodes are
never observed here, so no special handling is needed for them.

Adds GlutenBloomFilterFallbackSuite (gluten-ut/test) covering: only-filter-stage
fallback (threshold=2), both-stages fallback (threshold=1),
DataFrame.stat.bloomFilter() standalone usage, native-bloom-filter-disabled
early-exit, and the SPARK-54336 literal-value query. Also removes the now-unused
CallerInfo.isBloomFilterStatFunction() helper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… 4.0/4.1

BloomFilterMightContainJointRewriteRule now runs as a logical optimizer rule and rewrites BloomFilterAggregate/BloomFilterMightContain to their Velox forms (velox_bloom_filter_agg / velox_might_contain) for the affected TPC-DS queries. Regenerate the spark40 and spark41 plan-stability golden files accordingly. Operator structure (simplified.txt) is unchanged for non-bloom queries.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@brijrajk brijrajk force-pushed the fix/12013-bloom-filter-stage-fallback branch from 9d85377 to 932ff73 Compare June 28, 2026 17:30

@brijrajk

Contributor Author

@zhztheplayer — quick update: CI is now fully green on this PR ✅ (62/62 checks pass; approved & mergeable).

The earlier spark-test-spark40 / spark-test-spark41 reds were the TPC-DS plan-stability golden regeneration — moving BloomFilterMightContainJointRewriteRule into the Operator Optimization batch (your suggestion) changes those plans, and the regenerated goldens are now included. The PR is squashed into two logical commits (the fix incl. SPARK-54336, and the regenerated goldens). Thanks again for the logical-rule pointer — it is what made the fix clean.

Labels

CORE works for Gluten Core VELOX

Development

Successfully merging this pull request may close these issues.

Fail to read the native bloom_filter when the stage fallback to java

5 participants