[GLUTEN-10992][VL] Fix MatchError for KeyGroupedPartitioning in native shuffle #12335

Open
brijrajk wants to merge 1 commit into apache:main from brijrajk:fix/10992-keygrouped-partitioning-fallback

@brijrajk brijrajk commented Jun 22, 2026

Contributor

What changes were proposed in this pull request?

When Spark 4.0's V2 bucketing shuffle (spark.sql.sources.v2.bucketing.shuffle.enabled=true) is used in a join where only one side reports partitioning, Spark generates a ShuffleExchangeExec with KeyGroupedPartitioning as its output partitioning.

The default case _ => in VeloxSparkPlanExecApi.genColumnarShuffleExchange created a ColumnarShuffleExchangeExec for this node without validation. When the query executed, ExecUtil.genShuffleDependency crashed with a scala.MatchError because its pattern match had no case for KeyGroupedPartitioning.
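For readers unfamiliar with the failure mode, here is a minimal sketch of how a match without a covering case surfaces as scala.MatchError at runtime. The types below are simplified stand-ins written for this illustration, not the Spark classes, and MatchErrorRepro and describe are hypothetical names.

```scala
// Stand-in types for illustration only; these are NOT the Spark classes.
trait Partitioning
final case class HashPartitioning(numPartitions: Int) extends Partitioning
final case class KeyGroupedPartitioning(numPartitions: Int) extends Partitioning

object MatchErrorRepro {
  // The trait is open (not sealed), so the compiler does not warn that
  // KeyGroupedPartitioning is unhandled. The gap only shows up at runtime.
  def describe(p: Partitioning): String = p match {
    case HashPartitioning(n) => s"hash($n)"
  }

  def main(args: Array[String]): Unit = {
    println(describe(HashPartitioning(4)))
    // The unhandled type raises scala.MatchError instead of returning.
    val outcome =
      try describe(KeyGroupedPartitioning(4))
      catch { case e: MatchError => s"MatchError: ${e.getMessage}" }
    println(outcome)
  }
}
```

The error message carries only the unmatched value and its class, which is why it reads as cryptic when it appears deep inside a query execution.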

Changes:

  • VeloxSparkPlanExecApi.genColumnarShuffleExchange: add an explicit case _: KeyGroupedPartitioning => before the default that adds a fallback tag and returns the vanilla ShuffleExchangeExec. This prevents a ColumnarShuffleExchangeExec from being created for an unsupported partitioning type.
  • ExecUtil.genShuffleDependency: add an explicit wildcard case other => that throws GlutenNotSupportException instead of the cryptic scala.MatchError, as a defensive guard for any future unknown partitioning types.
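The two changes above can be sketched as follows. This is a simplified model under stated assumptions, not the actual patch: every type here (including NotSupportedException, which stands in for GlutenNotSupportException, and the VanillaShuffle and ColumnarShuffle result types) is hypothetical, and the real methods take Spark plan nodes rather than a bare partitioning.

```scala
// Stand-in types for illustration only; these are NOT the Spark or Gluten classes.
trait Partitioning
final case class HashPartitioning(numPartitions: Int) extends Partitioning
final case class KeyGroupedPartitioning(numPartitions: Int) extends Partitioning
final case class SomeFuturePartitioning(numPartitions: Int) extends Partitioning

sealed trait ShufflePlan
final case class VanillaShuffle(p: Partitioning, fallbackReason: String) extends ShufflePlan
final case class ColumnarShuffle(p: Partitioning) extends ShufflePlan

// Hypothetical stand-in for GlutenNotSupportException.
final class NotSupportedException(msg: String) extends RuntimeException(msg)

object ShuffleSketch {
  // Planning side: keep the vanilla shuffle (with a recorded reason) for a
  // partitioning the native path cannot handle, before the default case runs.
  def genColumnarShuffleExchange(p: Partitioning): ShufflePlan = p match {
    case _: KeyGroupedPartitioning =>
      VanillaShuffle(p, "KeyGroupedPartitioning is not supported by native shuffle")
    case _ =>
      ColumnarShuffle(p)
  }

  // Execution side: the trailing case turns an unhandled type into a
  // descriptive exception instead of a scala.MatchError.
  def genShuffleDependency(p: Partitioning): String = p match {
    case HashPartitioning(n) => s"hash($n)"
    case other =>
      throw new NotSupportedException(
        s"Partitioning $other is not supported by native shuffle")
  }
}
```

In this model the first guard is what keeps the query running, while the second only improves the error for partitioning types that slip past planning.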

How was this patch tested?

The existing testGluten("SPARK-41471: shuffle one side: only one side reports partitioning") tests in GlutenKeyGroupedPartitioningSuite (both spark40 and spark41) reproduce the crash exactly. They set V2_BUCKETING_SHUFFLE_ENABLED=true with only one bucketed side, which produces a ShuffleExchangeExec with KeyGroupedPartitioning output, and then call checkAnswer. With this fix, these tests pass without a MatchError.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (https://claude.ai/code)

Related issue: #10992

@liuneng1994

Contributor

Can you please add some tests for KeyGroupedPartitioning?

Copilot AI left a comment

Contributor

Pull request overview

Fixes a Spark 4.0 native-shuffle crash when Spark produces a ShuffleExchangeExec with KeyGroupedPartitioning (e.g., with V2 bucketing shuffle enabled and only one join side reporting partitioning). The change prevents Gluten from creating a native ColumnarShuffleExchangeExec for an unsupported partitioning type and replaces a runtime scala.MatchError with a clearer Gluten exception.

Changes:

  • Add an explicit KeyGroupedPartitioning fallback in VeloxSparkPlanExecApi.genColumnarShuffleExchange to return vanilla ShuffleExchangeExec (tagged for fallback) instead of creating ColumnarShuffleExchangeExec.
  • Add a defensive default case other => in ExecUtil.genShuffleDependency to throw GlutenNotSupportException rather than a scala.MatchError for unknown partitioning types.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| backends-velox/src/main/scala/org/apache/spark/sql/execution/utils/ExecUtil.scala | Adds a default match case to fail fast with GlutenNotSupportException instead of MatchError for unsupported/unknown partitioning types. |
| backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxSparkPlanExecApi.scala | Adds an explicit KeyGroupedPartitioning fallback path to avoid creating native columnar shuffle for an unsupported partitioning. |

Comment on lines +176 to +178
      case other =>
        throw new GlutenNotSupportException(
          s"Partitioning ${other.getClass.getSimpleName} is not supported by native shuffle")
…e shuffle

When Spark 4.0's V2 bucketing shuffle (spark.sql.sources.v2.bucketing.shuffle.enabled=true)
is used in a join where only one side reports partitioning, Spark generates a
ShuffleExchangeExec with KeyGroupedPartitioning as its output. The default
case in VeloxSparkPlanExecApi.genColumnarShuffleExchange created a
ColumnarShuffleExchangeExec for this node, which then crashed with a
scala.MatchError in ExecUtil.genShuffleDependency because KeyGroupedPartitioning
was not handled in the native partitioning match.

Fix by adding an explicit KeyGroupedPartitioning case to genColumnarShuffleExchange
that marks the shuffle for fallback to vanilla Spark. Also harden
ExecUtil.genShuffleDependency with an explicit wildcard that throws
GlutenNotSupportException instead of a cryptic MatchError for any future
unknown partitioning types. The exception now embeds the full partitioning
toString (expressions, numPartitions) to aid debugging.

Add a dedicated GlutenKeyGroupedPartitioningSuite test (spark40 and spark41)
that asserts the KeyGroupedPartitioning shuffle falls back to a vanilla
ShuffleExchangeExec and is never offloaded to ColumnarShuffleExchangeExec.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@brijrajk brijrajk force-pushed the fix/10992-keygrouped-partitioning-fallback branch from 761cfc7 to 37d4051 Compare June 27, 2026 13:25
@github-actions github-actions Bot added the CORE works for Gluten Core label Jun 27, 2026
@brijrajk

Contributor Author

@liuneng1994 thanks for the review. Both comments are addressed in the latest push:

  • Tests: added a dedicated GlutenKeyGroupedPartitioningSuite test (spark40 and spark41) that asserts the KeyGroupedPartitioning shuffle falls back to a vanilla ShuffleExchangeExec and is never offloaded to ColumnarShuffleExchangeExec, then checks the join result. This complements the existing SPARK-41471 one-side tests by verifying the fallback path specifically rather than just the shuffle count.
  • Error message (Copilot suggestion): the GlutenNotSupportException in ExecUtil.genShuffleDependency now embeds the full partitioning toString (expressions, numPartitions) instead of only the class name.
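The shape of the fallback assertion described in the first bullet can be sketched as below. This is only an illustration under stated assumptions: the node types are simplified stand-ins rather than Spark's plan classes, the partitioning is reduced to a string, and FallbackCheck and collectNodes are hypothetical names (the real suite would inspect the executed plan of the join query).

```scala
// Stand-in plan nodes for illustration only; these are NOT Spark's plan classes.
sealed trait Node { def children: Seq[Node] }
final case class Scan(table: String) extends Node {
  val children: Seq[Node] = Nil
}
final case class VanillaShuffle(partitioning: String, child: Node) extends Node {
  val children: Seq[Node] = Seq(child)
}
final case class ColumnarShuffle(partitioning: String, child: Node) extends Node {
  val children: Seq[Node] = Seq(child)
}
final case class Join(left: Node, right: Node) extends Node {
  val children: Seq[Node] = Seq(left, right)
}

object FallbackCheck {
  // Pre-order traversal that keeps every node the partial function accepts.
  def collectNodes[A](plan: Node)(pf: PartialFunction[Node, A]): Seq[A] =
    pf.lift(plan).toSeq ++ plan.children.flatMap(c => collectNodes(c)(pf))

  def main(args: Array[String]): Unit = {
    // One join side reports partitioning; the other side is shuffled to match it.
    val plan = Join(
      Scan("items"),
      VanillaShuffle("KeyGroupedPartitioning", Scan("purchases")))

    val vanilla = collectNodes(plan) {
      case s: VanillaShuffle if s.partitioning == "KeyGroupedPartitioning" => s
    }
    val columnar = collectNodes(plan) { case s: ColumnarShuffle => s }

    assert(vanilla.size == 1, "expected exactly one vanilla KeyGrouped shuffle")
    assert(columnar.isEmpty, "KeyGrouped shuffle must not be offloaded")
  }
}
```

Asserting on the node types, and not only on the shuffle count, is what distinguishes a fallback from an offloaded shuffle that happens to produce the same number of exchanges.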

Labels

CORE works for Gluten Core VELOX
