Skip to content

[SPARK-56032][SQL][FOLLOWUP] Skip FilterExec CSE codegen when the only common subexpression is a leaf#56604

Open
cloud-fan wants to merge 2 commits into
apache:masterfrom
cloud-fan:filterexec-cse-leaf-gate
Open

[SPARK-56032][SQL][FOLLOWUP] Skip FilterExec CSE codegen when the only common subexpression is a leaf#56604
cloud-fan wants to merge 2 commits into
apache:masterfrom
cloud-fan:filterexec-cse-leaf-gate

Conversation

@cloud-fan

@cloud-fan cloud-fan commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

A follow-up of #54862 (which introduced subexpression elimination (CSE) in FilterExec
whole-stage codegen) and #56209 (which gated the CSE path on whether otherPreds contain a
common subexpression).

The #56209 gate takes the CSE path whenever otherPreds contain a common subexpression, where
"common subexpression" is anything EquivalentExpressions counts more than once -- which includes
cheap expressions such as bare columns. c BETWEEN lo AND hi lowers to c >= lo AND c <= hi, so
any BETWEEN (or a column referenced in several conjuncts) makes that column a common subexpression.

Taking the CSE path then emits the eager inputVarsEvalCode prologue, which evaluates every
column referenced by otherPreds at the top of the per-row loop. Caching a cheap load gains nothing
-- the non-CSE path already loads each column lazily into a variable on demand -- so when the only
common subexpressions are cheap, the prologue is pure overhead that defeats short-circuiting.

This PR requires a non-cheap common subexpression (per CollapseProject.isCheap) before taking
the CSE path. Filters with a genuine repeated computation (e.g. a + b) are unaffected and still
benefit from CSE.

CollapseProject.isCheap is the canonical "cheap to recompute" predicate (attributes, foldables,
Alias/ExtractValue of cheap children). The gate runs on predicates already bound for codegen, so
this PR also teaches isCheap that BoundReference -- the codegen-bound form of an Attribute, an
equally cheap slot read -- is cheap. This lets the gate reuse the single EquivalentExpressions
analysis it already shares with the CSE codegen, rather than re-analyzing the predicates.

Why are the changes needed?

TPC-DS q28 filters as
ss_quantity BETWEEN ... AND (ss_list_price BETWEEN ... OR ss_coupon_amt BETWEEN ... OR ss_wholesale_cost BETWEEN ...).
Its only repeated expressions are the bare columns, so the #56209 gate wrongly took the CSE path and
eagerly decoded the high-precision decimal columns on every row -- including rows the cheap
ss_quantity integer predicate would have rejected -- allocating a BigInteger/BigDecimal per
decoded decimal. On a 3TB run this showed up as a ~40% slowdown on q28, which this change removes
(the filter falls back to the lazy, short-circuiting path).

Does this PR introduce any user-facing change?

No. This is a codegen-only change; query results are unchanged.

How was this patch tested?

New unit test in WholeStageCodegenSuite asserting that, for the q28 BETWEEN shape (whose only
common subexpressions are cheap leaf columns), CSE-enabled generated code is identical to
CSE-disabled code -- i.e. it falls back to the lazy, short-circuiting non-CSE path. The existing
FilterExec CSE tests, which use genuine non-cheap common subexpressions, still exercise the CSE
path and pass.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (Claude Code)

…y common subexpression is a leaf

### What changes were proposed in this pull request?

A follow-up of the FilterExec whole-stage-codegen subexpression elimination (CSE) work.
The CSE path is taken whenever otherPreds contain a common subexpression, where
"common subexpression" is anything EquivalentExpressions counts more than once -- which
includes bare leaf columns. `c BETWEEN lo AND hi` lowers to `c >= lo AND c <= hi`, so any
BETWEEN (or a column used in several conjuncts) makes its column a common subexpression.

Taking the CSE path then emits the eager inputVarsEvalCode prologue, which decodes every
otherPred-referenced column at the top of the row loop. Caching a bare column load gains
nothing (the non-CSE path already loads columns lazily), so this is pure overhead and
defeats short-circuiting. This PR requires a non-leaf common subexpression before taking
the CSE path.

### Why are the changes needed?

TPC-DS q28 filters as `ss_quantity BETWEEN ... AND (ss_list_price BETWEEN ... OR
ss_coupon_amt BETWEEN ... OR ss_wholesale_cost BETWEEN ...)`. Its only repeated
expressions are the bare columns, so the gate wrongly took the CSE path and eagerly
decoded the high-precision decimal columns on every row -- including rows the cheap
ss_quantity integer predicate would reject -- allocating a BigInteger/BigDecimal per row.
A 3TB q28 run showed ~40% slowdown that this change removes.

### Does this PR introduce any user-facing change?

No. Codegen-only change; query results are unchanged.

### How was this patch tested?

New unit test in WholeStageCodegenSuite asserting that, for the q28 BETWEEN shape (only
leaf common subexpressions), CSE-enabled codegen is identical to CSE-disabled codegen
(i.e. it falls back to the lazy, short-circuiting path). The existing FilterExec CSE tests,
which use genuine non-leaf common subexpressions, still exercise the CSE path and pass.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (Claude Code)

Co-authored-by: Isaac
…sCheap

Use the canonical CollapseProject.isCheap to decide whether a common subexpression is
worth eliminating, instead of an ad-hoc leaf check. A repeated expression that is cheap to
recompute (a bare column, a foldable, an Alias/ExtractValue of cheap children) gains nothing
from CSE, so requiring a non-cheap common subexpression avoids the eager prologue for those.

CollapseProject.isCheap matches Attribute; the gate runs on bound predicates, so teach
isCheap that BoundReference (the codegen-bound form of Attribute) is equally cheap. This
keeps the single EquivalentExpressions analysis shared by the gate and the CSE codegen.

Co-authored-by: Isaac
@cloud-fan

Copy link
Copy Markdown
Contributor Author

@LuciferYang @yaooqinn

@LuciferYang LuciferYang left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we backport this to 4.2?

val df = spark.createDataFrame(data, schema)
// The only repeated expressions are the bare columns q, p1, p2 (each referenced by the two
// halves of its BETWEEN). No non-leaf expression is shared.
val filtered = df.where(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me verify the validity of this test.

@cloud-fan

Copy link
Copy Markdown
Contributor Author

Yea this is a 4.2 perf regression, cc @huaxingao

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants