SNOW-3266495: Defer HAVING, ORDER BY, LIMIT clauses in SCOS compatibility mode#4132
sfc-gh-joshi wants to merge 3 commits into main
Conversation
src/snowflake/snowpark/dataframe.py
self._pending_having = None
self._pending_order_by = None
self._pending_limit = None
I'm curious how multiple filters would affect the plan generation:
df1 = df.groupBy(...).agg(...)
df2 = df1.filter(...).limit(...).filter(...)
df3 = df1.filter(...).filter(...).limit(...)
df4 = df1.limit(...).filter(...).filter(...)
- do df2, df3, df4 output the same query?
- what's the behavior in spark and do we align with spark behavior after your code change?
That's a good point. It looks like in spark, filter is not commutative across a sequence of df.filter(...).orderBy(...).limit(...).filter(...) (the final call will see a deterministic subset of rows based on the prior order/limit). I'll need to do some more testing to see what this means for SQL generation, and whether the cases you mentioned have similar problems.
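The non-commutativity of filter and limit can be illustrated with a toy model in plain Python (hypothetical data, lists standing in for DataFrame rows; not Snowpark code):

```python
# Toy illustration: filter and limit do not commute.
# Rows stand in for a DataFrame; limit(n) takes the first n rows.
rows = [1, 2, 3, 4, 5, 6]

def keep_even(rs):
    return [r for r in rs if r % 2 == 0]

def limit(rs, n):
    return rs[:n]

filter_then_limit = limit(keep_even(rows), 2)   # -> [2, 4]
limit_then_filter = keep_even(limit(rows, 2))   # -> [2]
```

Because the two orderings yield different rows, a filter that appears after a limit cannot simply be folded into the same query block as the limit.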
@sfc-gh-aling I added some test cases covering this behavior, and checked the output against spark. The changes are:
- Operations that occur after a LIMIT now always produce a new sub-query, since FILTER -> LIMIT and LIMIT -> FILTER are not equivalent.
- Consecutive filter operations are now conjoined into a single HAVING clause. I don't think SQL has the short-circuiting evaluation behavior that imperative languages do, so I believe this should always be equivalent. The Spark `explain` plans I looked at also combined filter clauses into a single operator.
- Consecutive ordering operations are now combined into a single ORDER BY, with the last ordering clause appearing first in the SQL.
Most of these cases were previously broken in SCOS, as the only sequence of operations that would have produced valid SQL was `df.groupby(...).agg(...).filter(...).orderBy(...).limit(...)`, where the operations appeared in the same order required by SQL.
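The two combination rules can be sanity-checked with a toy Python model (hypothetical data; lists of tuples standing in for rows, `sorted` standing in for ORDER BY):

```python
rows = [(1, "b"), (2, "a"), (2, "b"), (3, "a")]

p1 = lambda r: r[0] >= 2          # first filter predicate
p2 = lambda r: r[1] == "a"        # second filter predicate

# filter(p1) followed by filter(p2) ...
chained = [r for r in [r for r in rows if p1(r)] if p2(r)]
# ... equals a single filter on the conjunction (p1 AND p2)
conjoined = [r for r in rows if p1(r) and p2(r)]
assert chained == conjoined

# orderBy(a) followed by orderBy(b): with a stable sort, sorting by a
# and then by b equals a single sort on (b, a) -- last key first,
# matching "the last ordering clause appearing first in the SQL".
by_a_then_b = sorted(sorted(rows, key=lambda r: r[0]), key=lambda r: r[1])
combined = sorted(rows, key=lambda r: (r[1], r[0]))
assert by_a_then_b == combined
```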
nice, thanks for checking the behavior!
re 1: does `orderBy` also produce a subquery like `limit` does?
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-3266495
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
Snowflake SQL requires the HAVING, ORDER BY, and LIMIT clauses of a GROUP BY statement to appear in that particular order, for example: `SELECT c, COUNT(*) FROM t GROUP BY c HAVING COUNT(*) > 1 ORDER BY c LIMIT 10`. Re-arranging any of these clauses produces a syntax error.
Currently, when `_is_snowpark_connect_compatible_mode` is set, the order in which these clauses are added depends on the order in which they are specified by the user. That is, `df.groupBy(...).agg(...).sort(...).filter(...)` would emit ORDER BY before HAVING, resulting in a compilation error.

This PR defers the generation of these clauses for GROUP BY aggregations. Instead of appending each clause to the analyzer tree when a method is called, the new `DataFrame._build_post_agg_df` method ensures the clauses are added in the correct order.
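A minimal sketch of the deferral idea (hypothetical class and method names, not the actual implementation in this PR): pending clauses are recorded in whatever order the user calls them, and only rendered in the SQL-mandated order when the query is built.

```python
class PostAggState:
    """Toy model of deferring HAVING / ORDER BY / LIMIT emission."""

    def __init__(self, base_sql):
        self.base_sql = base_sql
        self.having = []      # predicates, conjoined into one HAVING
        self.order_by = []    # ordering keys, last call first
        self.limit = None

    def filter(self, predicate):
        self.having.append(predicate)
        return self

    def sort(self, *keys):
        self.order_by = list(keys) + self.order_by
        return self

    def take(self, n):
        self.limit = n
        return self

    def to_sql(self):
        # Render clauses in the order Snowflake SQL requires,
        # regardless of the order the methods were called in.
        sql = self.base_sql
        if self.having:
            sql += " HAVING " + " AND ".join(self.having)
        if self.order_by:
            sql += " ORDER BY " + ", ".join(self.order_by)
        if self.limit is not None:
            sql += f" LIMIT {self.limit}"
        return sql

# Clauses specified "out of order" still render in the legal order.
q = PostAggState("SELECT c, COUNT(*) FROM t GROUP BY c")
q.sort("c").filter("COUNT(*) > 1").take(10)
print(q.to_sql())
```

Here `sort(...).filter(...).take(...)` is called with ORDER BY first, yet the generated SQL places HAVING before ORDER BY before LIMIT.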