[SPARK-56032][SQL][FOLLOWUP] Skip FilterExec CSE codegen when the only common subexpression is a leaf#56604
Open
cloud-fan wants to merge 2 commits into
Open
[SPARK-56032][SQL][FOLLOWUP] Skip FilterExec CSE codegen when the only common subexpression is a leaf#56604cloud-fan wants to merge 2 commits into
cloud-fan wants to merge 2 commits into
Conversation
…y common subexpression is a leaf ### What changes were proposed in this pull request? A follow-up of the FilterExec whole-stage-codegen subexpression elimination (CSE) work. The CSE path is taken whenever otherPreds contain a common subexpression, where "common subexpression" is anything EquivalentExpressions counts more than once -- which includes bare leaf columns. `c BETWEEN lo AND hi` lowers to `c >= lo AND c <= hi`, so any BETWEEN (or a column used in several conjuncts) makes its column a common subexpression. Taking the CSE path then emits the eager inputVarsEvalCode prologue, which decodes every otherPred-referenced column at the top of the row loop. Caching a bare column load gains nothing (the non-CSE path already loads columns lazily), so this is pure overhead and defeats short-circuiting. This PR requires a non-leaf common subexpression before taking the CSE path. ### Why are the changes needed? TPC-DS q28 filters as `ss_quantity BETWEEN ... AND (ss_list_price BETWEEN ... OR ss_coupon_amt BETWEEN ... OR ss_wholesale_cost BETWEEN ...)`. Its only repeated expressions are the bare columns, so the gate wrongly took the CSE path and eagerly decoded the high-precision decimal columns on every row -- including rows the cheap ss_quantity integer predicate would reject -- allocating a BigInteger/BigDecimal per row. A 3TB q28 run showed ~40% slowdown that this change removes. ### Does this PR introduce any user-facing change? No. Codegen-only change; query results are unchanged. ### How was this patch tested? New unit test in WholeStageCodegenSuite asserting that, for the q28 BETWEEN shape (only leaf common subexpressions), CSE-enabled codegen is identical to CSE-disabled codegen (i.e. it falls back to the lazy, short-circuiting path). The existing FilterExec CSE tests, which use genuine non-leaf common subexpressions, still exercise the CSE path and pass. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude (Claude Code) Co-authored-by: Isaac
…sCheap Use the canonical CollapseProject.isCheap to decide whether a common subexpression is worth eliminating, instead of an ad-hoc leaf check. A repeated expression that is cheap to recompute (a bare column, a foldable, an Alias/ExtractValue of cheap children) gains nothing from CSE, so requiring a non-cheap common subexpression avoids the eager prologue for those. CollapseProject.isCheap matches Attribute; the gate runs on bound predicates, so teach isCheap that BoundReference (the codegen-bound form of Attribute) is equally cheap. This keeps the single EquivalentExpressions analysis shared by the gate and the CSE codegen. Co-authored-by: Isaac
Contributor
Author
LuciferYang
approved these changes
Jun 19, 2026
LuciferYang
left a comment
Contributor
There was a problem hiding this comment.
should we backport this to 4.2?
LuciferYang
reviewed
Jun 19, 2026
| val df = spark.createDataFrame(data, schema) | ||
| // The only repeated expressions are the bare columns q, p1, p2 (each referenced by the two | ||
| // halves of its BETWEEN). No non-leaf expression is shared. | ||
| val filtered = df.where( |
Contributor
There was a problem hiding this comment.
Let me verify the validity of this test.
Contributor
Author
|
Yea this is a 4.2 perf regression, cc @huaxingao |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
A follow-up of #54862 (which introduced subexpression elimination (CSE) in
FilterExecwhole-stage codegen) and #56209 (which gated the CSE path on whether
otherPredscontain acommon subexpression).
The #56209 gate takes the CSE path whenever
otherPredscontain a common subexpression, where"common subexpression" is anything
EquivalentExpressionscounts more than once -- which includescheap expressions such as bare columns.
c BETWEEN lo AND hilowers toc >= lo AND c <= hi, soany
BETWEEN(or a column referenced in several conjuncts) makes that column a common subexpression.Taking the CSE path then emits the eager
inputVarsEvalCodeprologue, which evaluates everycolumn referenced by
otherPredsat the top of the per-row loop. Caching a cheap load gains nothing-- the non-CSE path already loads each column lazily into a variable on demand -- so when the only
common subexpressions are cheap, the prologue is pure overhead that defeats short-circuiting.
This PR requires a non-cheap common subexpression (per
CollapseProject.isCheap) before takingthe CSE path. Filters with a genuine repeated computation (e.g.
a + b) are unaffected and stillbenefit from CSE.
CollapseProject.isCheapis the canonical "cheap to recompute" predicate (attributes, foldables,Alias/ExtractValueof cheap children). The gate runs on predicates already bound for codegen, sothis PR also teaches
isCheapthatBoundReference-- the codegen-bound form of anAttribute, anequally cheap slot read -- is cheap. This lets the gate reuse the single
EquivalentExpressionsanalysis it already shares with the CSE codegen, rather than re-analyzing the predicates.
Why are the changes needed?
TPC-DS q28 filters as
ss_quantity BETWEEN ... AND (ss_list_price BETWEEN ... OR ss_coupon_amt BETWEEN ... OR ss_wholesale_cost BETWEEN ...).Its only repeated expressions are the bare columns, so the #56209 gate wrongly took the CSE path and
eagerly decoded the high-precision decimal columns on every row -- including rows the cheap
ss_quantityinteger predicate would have rejected -- allocating aBigInteger/BigDecimalperdecoded decimal. On a 3TB run this showed up as a ~40% slowdown on q28, which this change removes
(the filter falls back to the lazy, short-circuiting path).
Does this PR introduce any user-facing change?
No. This is a codegen-only change; query results are unchanged.
How was this patch tested?
New unit test in
WholeStageCodegenSuiteasserting that, for the q28BETWEENshape (whose onlycommon subexpressions are cheap leaf columns), CSE-enabled generated code is identical to
CSE-disabled code -- i.e. it falls back to the lazy, short-circuiting non-CSE path. The existing
FilterExecCSE tests, which use genuine non-cheap common subexpressions, still exercise the CSEpath and pass.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude (Claude Code)