[SPARK-52818][SQL] Fix MergeSubplans creating nested WithCTE with cross-scope CTE references #55338

Open
cloud-fan wants to merge 1 commit into apache:master from cloud-fan:fix-cte-merge-scalar-subqueries

@cloud-fan commented Apr 14, 2026

What changes were proposed in this pull request?

When a non-deterministic CTE (e.g. using monotonically_increasing_id()) is referenced in scalar subqueries and the result is displayed with .show() (which adds a Limit), MergeSubplans can create an outer WithCTE whose CTE defs reference CTE defs from an inner WithCTE. This causes ReplaceCTERefWithRepartition to crash with NoSuchElementException because it processes the outer CTE defs before the inner CTE defs have been added to the map.
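The ordering problem described above can be illustrated with a small, self-contained sketch. This is a toy model, not Spark's code: `Def`, `resolveInOrder`, and the ids are hypothetical names introduced only for illustration. It assumes definitions are registered in a map one by one and that each reference is looked up at the moment its owning definition is processed.

```scala
import scala.collection.mutable

object CteOrderingSketch {
  // Hypothetical toy model, not Spark's classes: a CTE definition has an id
  // and the ids of the other CTE definitions it references.
  final case class Def(id: Long, refs: Seq[Long])

  // Processes definitions in the given order. Each reference is looked up in
  // the map of definitions seen so far; `seen(ref)` throws
  // NoSuchElementException when that id has not been registered yet.
  def resolveInOrder(defs: Seq[Def]): Map[Long, Seq[Long]] = {
    val seen = mutable.LinkedHashMap.empty[Long, Seq[Long]]
    defs.foreach { d =>
      d.refs.foreach(ref => seen(ref)) // fails for ids not yet in the map
      seen(d.id) = d.refs
    }
    seen.toMap
  }

  def main(args: Array[String]): Unit = {
    val inner = Def(id = 0L, refs = Nil)     // stands in for an inner CTE def
    val outer = Def(id = 1L, refs = Seq(0L)) // outer def referencing the inner one

    // Inner first: every reference is already registered.
    println(resolveInOrder(Seq(inner, outer)).keySet)

    // Outer first: the reference to id 0 is looked up before it is registered.
    println(scala.util.Try(resolveInOrder(Seq(outer, inner))).isFailure)
  }
}
```

In this sketch the same two definitions succeed or fail depending only on processing order, which is the shape of the failure reported for the nested `WithCTE` case.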

The root cause: MergeSubplans.apply checks case _: WithCTE => plan to skip plans with CTEs, but this only matches when WithCTE is the top-level node. When .show() wraps the plan in GlobalLimit(LocalLimit(WithCTE(...))), the top-level node is GlobalLimit, so MergeSubplans runs and creates a new outer WithCTE around the existing inner one — producing nested WithCTE nodes with cross-scope references.

Repro:

WITH cte_base AS (
  SELECT recid, givenname, surname,
    concat('t_', cast(monotonically_increasing_id() AS string)) AS __unique_id
  FROM t
),
cte_tf_givenname AS (
  SELECT givenname,
    CAST(COUNT(*) AS DOUBLE) / (SELECT COUNT(givenname) FROM cte_base) AS tf
  FROM cte_base WHERE givenname IS NOT NULL GROUP BY givenname
),
cte_tf_surname AS (
  SELECT surname,
    CAST(COUNT(*) AS DOUBLE) / (SELECT COUNT(surname) FROM cte_base) AS tf
  FROM cte_base WHERE surname IS NOT NULL GROUP BY surname
)
SELECT cte_base.givenname, cte_base.surname,
  cte_tf_givenname.tf, cte_tf_surname.tf
FROM cte_base
LEFT JOIN cte_tf_givenname ON cte_base.givenname = cte_tf_givenname.givenname
LEFT JOIN cte_tf_surname ON cte_base.surname = cte_tf_surname.surname

Calling .show() on this query crashes with NoSuchElementException. Calling .collect() works because no Limit wrapper is added.

The fix has two parts:

  1. MergeSubplans: Replace the guard case _: WithCTE => plan with case _ if plan.containsPattern(CTE) => plan, so that plans containing WithCTE anywhere in the tree are skipped, not just plans with WithCTE at the top level.
  2. ReplaceCTERefWithRepartition: Improve the error message.
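The difference between the two guards in part 1 can be sketched with a toy plan tree. This is a minimal illustration under stated assumptions: the node classes below are hypothetical stand-ins that only borrow names from the description, and a plain recursive walk stands in for `containsPattern(CTE)`, whose real implementation is not shown in this PR description.

```scala
object TopLevelGuardSketch {
  // Hypothetical toy plan nodes. The names echo the description above,
  // but these are not Spark's classes.
  sealed trait Plan { def children: Seq[Plan] }
  case object Scan extends Plan { val children: Seq[Plan] = Nil }
  final case class WithCTE(child: Plan) extends Plan { val children: Seq[Plan] = Seq(child) }
  final case class LocalLimit(child: Plan) extends Plan { val children: Seq[Plan] = Seq(child) }
  final case class GlobalLimit(child: Plan) extends Plan { val children: Seq[Plan] = Seq(child) }

  // Root-only guard: matches only when the top-level node is WithCTE.
  def skipIfRootIsCte(plan: Plan): Boolean = plan match {
    case _: WithCTE => true
    case _          => false
  }

  // Tree-wide guard: true when WithCTE occurs anywhere in the plan.
  // The recursive walk is an assumption standing in for containsPattern(CTE).
  def skipIfAnyCte(plan: Plan): Boolean =
    plan.isInstanceOf[WithCTE] || plan.children.exists(skipIfAnyCte)

  def main(args: Array[String]): Unit = {
    val collected = WithCTE(Scan)                          // no Limit wrapper
    val shown     = GlobalLimit(LocalLimit(WithCTE(Scan))) // Limit wrapper on top

    println(skipIfRootIsCte(collected)) // true
    println(skipIfRootIsCte(shown))     // false: the rule would still run
    println(skipIfAnyCte(shown))        // true: the rule is skipped
  }
}
```

With the root-only guard, the wrapped plan slips through because its top-level node is `GlobalLimit`; the tree-wide guard treats both shapes the same way.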

Why are the changes needed?

Bug fix. Queries with non-deterministic CTEs referenced in scalar subqueries crash with java.util.NoSuchElementException: key not found when using .show().

Does this PR introduce any user-facing change?

Yes, queries that previously crashed now work correctly.

How was this patch tested?

New tests added:

  • CTEInlineSuite: end-to-end test with the exact repro query using .show()
  • InlineCTESuite: unit test verifying ReplaceCTERefWithRepartition handles orphaned CTERelationRef gracefully

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

@cloud-fan force-pushed the fix-cte-merge-scalar-subqueries branch 3 times, most recently from 615846e to 7d915e1, April 14, 2026 13:04
…ss-scope CTE references

### What changes were proposed in this pull request?

When a non-deterministic CTE (e.g. using `monotonically_increasing_id()`) is referenced in scalar subqueries and the result is displayed with `.show()` (which adds a `Limit`), `MergeSubplans` can create an outer `WithCTE` whose CTE defs reference CTE defs from an inner `WithCTE`. This causes `ReplaceCTERefWithRepartition` to crash with `NoSuchElementException` because it processes the outer CTE defs before the inner CTE defs have been added to the map.

The root cause: `MergeSubplans.apply` checks `case _: WithCTE => plan` to skip plans with CTEs, but this only matches when `WithCTE` is the **top-level** node. When `.show()` wraps the plan in `GlobalLimit(LocalLimit(WithCTE(...)))`, the top-level node is `GlobalLimit`, so `MergeSubplans` runs and creates a new outer `WithCTE` around the existing inner one — producing nested `WithCTE` nodes with cross-scope references.

The fix has two parts:
1. **`MergeSubplans`**: Change `case _: WithCTE => plan` to `case _ if plan.containsPattern(CTE) => plan` to skip plans that contain `WithCTE` **anywhere**, not just at the top level.
2. **`ReplaceCTERefWithRepartition`**: Add a defensive guard `if cteMap.contains(ref.cteId)` so that orphaned `CTERelationRef` nodes don't crash with `NoSuchElementException`.
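The defensive guard in part 2 follows a common pattern: check map membership before looking a key up. The sketch below is a toy illustration only. `Ref`, `replaceUnguarded`, and `replaceGuarded` are hypothetical names, the map stores a label in place of a real replacement plan, and what Spark does with a reference that fails the guard is not stated here, so the `Left` branch is an assumption.

```scala
object GuardedLookupSketch {
  // Hypothetical toy types, not Spark's: a reference carries only a CTE id,
  // and the map stores a label standing in for the replacement plan.
  final case class Ref(cteId: Long)

  // Unguarded: `cteMap(ref.cteId)` throws NoSuchElementException for an id
  // that is missing from the map.
  def replaceUnguarded(ref: Ref, cteMap: Map[Long, String]): String =
    cteMap(ref.cteId)

  // Guarded: substitute only when the id is known; otherwise return the
  // reference untouched (Left) instead of throwing.
  def replaceGuarded(ref: Ref, cteMap: Map[Long, String]): Either[Ref, String] =
    if (cteMap.contains(ref.cteId)) Right(cteMap(ref.cteId)) else Left(ref)

  def main(args: Array[String]): Unit = {
    val cteMap = Map(0L -> "repartitioned-cte-0")

    println(replaceGuarded(Ref(0L), cteMap)) // known id: replaced
    println(replaceGuarded(Ref(7L), cteMap)) // orphaned id: left as-is
    println(scala.util.Try(replaceUnguarded(Ref(7L), cteMap)).isFailure)
  }
}
```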

### Why are the changes needed?

Bug fix. The query crashes with `java.util.NoSuchElementException: key not found` in `ReplaceCTERefWithRepartition`.

### Does this PR introduce _any_ user-facing change?

Yes, queries that previously crashed now work correctly.

### How was this patch tested?

New tests in `CTEInlineSuite` and `InlineCTESuite`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code
@cloud-fan force-pushed the fix-cte-merge-scalar-subqueries branch from 7d915e1 to abd393e, April 14, 2026 13:11
@cloud-fan

cc @peter-toth

@peter-toth left a comment

LGTM, thanks for the fix.

3 participants