Skip to content

Push distinct-by-primary-key operators below map operators in plans#3860

Open
alecgrieser wants to merge 1 commit intoFoundationDB:mainfrom
alecgrieser:03858-push-distinct-below-map
Open

Push distinct-by-primary-key operators below map operators in plans#3860
alecgrieser wants to merge 1 commit intoFoundationDB:mainfrom
alecgrieser:03858-push-distinct-below-map

Conversation

@alecgrieser
Copy link
Collaborator

This adds a new planner rule to push a distinct-by-primary-key operator below a map operator. This can be relevant for plans like an update plan when there are version columns (now broken out into its own yamsql file) as the new logic added in 4.9.0.1 (see: #3809) can result in additional maps being inserted into plans in such a way that the distinct was forced to happen after. This also had some knock on effects like resulting in preferring an index scan over a covering scan if there were some predicates that could be pushed below a fetch (but also some that couldn't).

This resolves #3858.

@alecgrieser alecgrieser added the enhancement New feature or request label Jan 22, 2026
@alecgrieser alecgrieser requested a review from normen662 January 23, 2026 10:51
@alecgrieser alecgrieser force-pushed the 03858-push-distinct-below-map branch from c8aca9d to 73a0dd0 Compare February 11, 2026 17:08
This adds a new planner rule to push a distinct-by-primary-key operator below a map operator. This can be relevant for plans like an update plan when there are version columns (now broken out into its own yamsql file) as the new logic added in 4.9.0.1 (see: FoundationDB#3809) can result in additional maps being inserted into plans in such a way that the distinct was forced to happen after. This also had some knock on effects like resulting in preferring an index scan over a covering scan if there were some predicates that could be pushed below a fetch (but also some that couldn't).

This resolves FoundationDB#3858.
@alecgrieser alecgrieser force-pushed the 03858-push-distinct-below-map branch from 73a0dd0 to b3c5820 Compare February 24, 2026 11:08
@github-actions
Copy link

📊 Metrics Diff Analysis Report

Summary

  • New queries: 11
  • Dropped queries: 0
  • Plan changed + metrics changed: 6
  • Plan unchanged + metrics changed: 9
ℹ️ About this analysis

This automated analysis compares query planner metrics between the base branch and this PR. It categorizes changes into:

  • New queries: Queries added in this PR
  • Dropped queries: Queries removed in this PR. These should be reviewed to ensure we are not losing coverage.
  • Plan changed + metrics changed: The query plan has changed along with planner metrics.
  • Metrics only changed: Same plan but different metrics

The last category in particular may indicate planner regressions that should be investigated.

New Queries

Count of new queries by file:

  • yaml-tests/src/test/resources/update-with-versions.metrics.yaml: 11

Plan and Metrics Changed

These queries experienced both plan and metrics changes. This generally indicates that there was some planner change
that means the planning for this query may be substantially different. Some amount of query plan metrics change is expected,
but the reviewer should still validate that these changes are not excessive.

Total: 6 queries

Statistical Summary (Plan and Metrics Changed)

task_count:

  • Average change: +25.8
  • Average regression: +25.8
  • Median change: +6
  • Median regression: +6
  • Standard deviation: 45.2
  • Standard deviation of regressions: 45.2
  • Range: +5 to +127
  • Range of regressions: +5 to +127
  • Queries changed: 6
  • Queries regressed: 6

transform_count:

  • Average change: +18.0
  • Average regression: +18.0
  • Median change: +18
  • Median regression: +18
  • Standard deviation: 0.0
  • Standard deviation of regressions: 0.0
  • Range: +18 to +18
  • Range of regressions: +18 to +18
  • Queries changed: 1
  • Queries regressed: 1

transform_yield_count:

  • Average change: +5.0
  • Average regression: +5.0
  • Median change: +5
  • Median regression: +5
  • Standard deviation: 0.0
  • Standard deviation of regressions: 0.0
  • Range: +5 to +5
  • Range of regressions: +5 to +5
  • Queries changed: 1
  • Queries regressed: 1

insert_new_count:

  • Average change: +12.0
  • Average regression: +12.0
  • Median change: +12
  • Median regression: +12
  • Standard deviation: 0.0
  • Standard deviation of regressions: 0.0
  • Range: +12 to +12
  • Range of regressions: +12 to +12
  • Queries changed: 1
  • Queries regressed: 1

insert_reused_count:

  • Average change: +1.0
  • Average regression: +1.0
  • Median change: +1
  • Median regression: +1
  • Standard deviation: 0.0
  • Standard deviation of regressions: 0.0
  • Range: +1 to +1
  • Range of regressions: +1 to +1
  • Queries changed: 1
  • Queries regressed: 1

Significant Regressions (Plan and Metrics Changed)

There was 1 outlier detected. Outlier queries have a significant regression in at least one field. Statistically, this represents either an increase of more than two standard deviations above the mean or a large absolute increase (e.g., 100).

  • yaml-tests/src/test/resources/versions-tests.metrics.yaml:2: EXPLAIN UPDATE t3 SET col2 = col2 + 1 WHERE col1 = 'a' RETURNING "new".*
    • old explain: COVERING(T3_VERSION_WITH_COL1 <,> -> [COL1: VALUE:[0], ID: KEY:[2]]) | FILTER _.COL1 EQUALS promote(@c12 AS STRING) | FETCH | MAP (_.ID AS ID, _.COL1 AS COL1, _.COL2 AS COL2) | DISTINCT BY PK | UPDATE T3 | MAP ((_.new).ID AS ID, (_.new).COL1 AS COL1, (_.new).COL2 AS COL2)
    • new explain: COVERING(T3_VERSION_WITH_COL1 <,> -> [COL1: VALUE:[0], ID: KEY:[2]]) | DISTINCT BY PK | FILTER _.COL1 EQUALS promote(@c12 AS STRING) | FETCH | MAP (_.ID AS ID, _.COL1 AS COL1, _.COL2 AS COL2) | UPDATE T3 | MAP ((_.new).ID AS ID, (_.new).COL1 AS COL1, (_.new).COL2 AS COL2)
    • task_count: 552 -> 679 (+127)
    • transform_count: 135 -> 153 (+18)
    • transform_yield_count: 36 -> 41 (+5)
    • insert_new_count: 61 -> 73 (+12)
    • insert_reused_count: 3 -> 4 (+1)

Minor Changes (Plan and Metrics Changed)

In addition, there were 5 queries with minor changes.

Only Metrics Changed

These queries experienced only metrics changes without any plan changes. If these metrics have substantially changed,
then a planner change has been made which affects planner performance but does not correlate with any new outcomes,
which could indicate a regression.

Total: 9 queries

Statistical Summary (Only Metrics Changed)

task_count:

  • Average change: +3.9
  • Average regression: +3.9
  • Median change: +2
  • Median regression: +2
  • Standard deviation: 3.8
  • Standard deviation of regressions: 3.8
  • Range: +1 to +11
  • Range of regressions: +1 to +11
  • Queries changed: 9
  • Queries regressed: 9

There were no queries with significant regressions detected.

Minor Changes (Only Metrics Changed)

In addition, there were 9 queries with minor changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Planner should be able to push distinct operators below a map

1 participant