Commit f28c056
committed
[SPARK-56241][SQL] Derive outputOrdering from KeyedPartitioning key expressions
### What changes were proposed in this pull request?
Within a `KeyedPartitioning` partition, all rows share the same key value, so
the key expressions are trivially sorted (ascending) within each partition.
This PR makes two plan nodes expose that structural guarantee via
`outputOrdering`:
- **`DataSourceV2ScanExecBase`**: when `outputPartitioning` is a
`KeyedPartitioning` and the source reports no ordering via
`SupportsReportOrdering`, derive one ascending `SortOrder` per key
expression. When the source does report ordering, it is returned as-is.
- **`GroupPartitionsExec`**:
- *Non-coalescing* (every group has ≤ 1 input partition): pass through
`child.outputOrdering` unchanged.
- *Coalescing without reducers*: re-derive ordering from the output
`KeyedPartitioning` key expressions; a join may embed multiple
`KeyedPartitioning`s with different expressions — expose equivalences
via `sameOrderExpressions`.
- *Coalescing with reducers*: fall back to `super.outputOrdering` (empty),
because merged partitions share only the reduced key.
### Why are the changes needed?
Before this change, `outputOrdering` on both nodes returned an empty sequence
(unless `SupportsReportOrdering` was implemented), even though the within-
partition ordering was structurally guaranteed by the partitioning itself.
As a result, `EnsureRequirements` would insert a redundant `SortExec` before
`SortMergeJoin` inputs that are already in key order.
### Does this PR introduce _any_ user-facing change?
Yes. Queries involving storage-partitioned joins (v2 bucketing) no longer add
a redundant `SortExec` before `SortMergeJoin` when the join keys match the
partition keys, reducing CPU and memory overhead.
### How was this patch tested?
- New unit test class `GroupPartitionsExecSuite` covering all four
`outputOrdering` branches (non-coalescing, coalescing without reducers with
single and multi-key, join `sameOrderExpressions`, coalescing with reducers).
- New SQL integration tests in `KeyGroupedPartitioningSuite` (SPARK-56241):
- Scan with `KeyedPartitioning` reports key-derived `outputOrdering`.
- Non-coalescing `GroupPartitionsExec` (non-identical key sets) passes
through child ordering — no pre-join `SortExec`.
- Coalescing `GroupPartitionsExec` derives ordering from key expressions —
no pre-join `SortExec`.
- Updated expected output in `DataSourceV2Suite` for the case where a source
is partitioned by a key with no reported ordering — groupBy on the partition
key no longer requires a sort.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Sonnet 4.61 parent 042440d commit f28c056
5 files changed
Lines changed: 261 additions & 11 deletions
File tree
- sql/core/src
- main/scala/org/apache/spark/sql/execution/datasources/v2
- test/scala/org/apache/spark/sql
- connector
- execution/datasources/v2
Lines changed: 17 additions & 5 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
19 | 19 | | |
20 | 20 | | |
21 | 21 | | |
22 | | - | |
| 22 | + | |
23 | 23 | | |
24 | 24 | | |
25 | 25 | | |
| |||
104 | 104 | | |
105 | 105 | | |
106 | 106 | | |
107 | | - | |
108 | | - | |
109 | | - | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
110 | 112 | | |
111 | | - | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
112 | 124 | | |
113 | 125 | | |
114 | 126 | | |
| |||
Lines changed: 22 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
184 | 184 | | |
185 | 185 | | |
186 | 186 | | |
187 | | - | |
188 | 187 | | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
189 | 191 | | |
190 | 192 | | |
191 | | - | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
192 | 212 | | |
193 | 213 | | |
194 | 214 | | |
| |||
Lines changed: 3 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
313 | 313 | | |
314 | 314 | | |
315 | 315 | | |
316 | | - | |
317 | | - | |
| 316 | + | |
| 317 | + | |
| 318 | + | |
318 | 319 | | |
319 | 320 | | |
320 | 321 | | |
| |||
Lines changed: 113 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
22 | 22 | | |
23 | 23 | | |
24 | 24 | | |
25 | | - | |
| 25 | + | |
26 | 26 | | |
27 | 27 | | |
28 | 28 | | |
29 | 29 | | |
30 | 30 | | |
31 | 31 | | |
32 | | - | |
| 32 | + | |
33 | 33 | | |
34 | 34 | | |
35 | 35 | | |
| |||
3568 | 3568 | | |
3569 | 3569 | | |
3570 | 3570 | | |
| 3571 | + | |
| 3572 | + | |
| 3573 | + | |
| 3574 | + | |
| 3575 | + | |
| 3576 | + | |
| 3577 | + | |
| 3578 | + | |
| 3579 | + | |
| 3580 | + | |
| 3581 | + | |
| 3582 | + | |
| 3583 | + | |
| 3584 | + | |
| 3585 | + | |
| 3586 | + | |
| 3587 | + | |
| 3588 | + | |
| 3589 | + | |
| 3590 | + | |
| 3591 | + | |
| 3592 | + | |
| 3593 | + | |
| 3594 | + | |
| 3595 | + | |
| 3596 | + | |
| 3597 | + | |
| 3598 | + | |
| 3599 | + | |
| 3600 | + | |
| 3601 | + | |
| 3602 | + | |
| 3603 | + | |
| 3604 | + | |
| 3605 | + | |
| 3606 | + | |
| 3607 | + | |
| 3608 | + | |
| 3609 | + | |
| 3610 | + | |
| 3611 | + | |
| 3612 | + | |
| 3613 | + | |
| 3614 | + | |
| 3615 | + | |
| 3616 | + | |
| 3617 | + | |
| 3618 | + | |
| 3619 | + | |
| 3620 | + | |
| 3621 | + | |
| 3622 | + | |
| 3623 | + | |
| 3624 | + | |
| 3625 | + | |
| 3626 | + | |
| 3627 | + | |
| 3628 | + | |
| 3629 | + | |
| 3630 | + | |
| 3631 | + | |
| 3632 | + | |
| 3633 | + | |
| 3634 | + | |
| 3635 | + | |
| 3636 | + | |
| 3637 | + | |
| 3638 | + | |
| 3639 | + | |
| 3640 | + | |
| 3641 | + | |
| 3642 | + | |
| 3643 | + | |
| 3644 | + | |
| 3645 | + | |
| 3646 | + | |
| 3647 | + | |
| 3648 | + | |
| 3649 | + | |
| 3650 | + | |
| 3651 | + | |
| 3652 | + | |
| 3653 | + | |
| 3654 | + | |
| 3655 | + | |
| 3656 | + | |
| 3657 | + | |
| 3658 | + | |
| 3659 | + | |
| 3660 | + | |
| 3661 | + | |
| 3662 | + | |
| 3663 | + | |
| 3664 | + | |
| 3665 | + | |
| 3666 | + | |
| 3667 | + | |
| 3668 | + | |
| 3669 | + | |
| 3670 | + | |
| 3671 | + | |
| 3672 | + | |
| 3673 | + | |
| 3674 | + | |
| 3675 | + | |
| 3676 | + | |
| 3677 | + | |
| 3678 | + | |
| 3679 | + | |
| 3680 | + | |
| 3681 | + | |
3571 | 3682 | | |
Lines changed: 106 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
0 commit comments