Support Dataframe.offset function #203
@sfc-gh-bli @sfc-gh-rysun any chance I could get a review on this PR? Let me know if any adjustments are needed and I can make the changes. Thank you.
sfc-gh-bli
left a comment
The results of a Snowflake SQL query are unordered unless the query explicitly uses "order by".
For example, in your test,
getSession().sql("select * from values(1), (2), (3), (4), (5) as t(a)").offset(3)
is not guaranteed to return 4 and 5; it can return any two of the numbers 1, 2, 3, 4, 5.
At the same time, `select col1 from table1 order by col1 offset 3` and `select * from (select col1 from table1 order by col1) offset 3` are different: the first one is ordered, but the second one is unordered.
Snowpark has an optimization for Sort + Limit that generates flattened SQL to preserve the order.
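To make the ordering concern concrete, here is a minimal plain-Java sketch. It does not use Snowpark or Snowflake at all: a shuffled list stands in for an unordered result set, and the class `OffsetSketch` and its two helpers are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class OffsetSketch {
    // Hypothetical helper: sort first, then skip n rows,
    // loosely mirroring "order by col1 offset n".
    static List<Integer> orderedOffset(List<Integer> rows, int n) {
        return rows.stream().sorted().skip(n).collect(Collectors.toList());
    }

    // Hypothetical helper: skip n rows in whatever order they arrive,
    // loosely mirroring "offset n" with no ordering applied.
    static List<Integer> unorderedOffset(List<Integer> rows, int n) {
        return rows.stream().skip(n).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>(List.of(1, 2, 3, 4, 5));
        // Assumption: shuffling models the engine returning rows in arbitrary order.
        Collections.shuffle(rows);

        System.out.println(orderedOffset(rows, 3));   // always [4, 5]
        System.out.println(unorderedOffset(rows, 3)); // any two of the five values
    }
}
```

Only the sorted variant gives a stable answer, which is why a test for offset would presumably need an explicit sort applied in the same flattened query.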
Please answer these questions before submitting your pull requests. Thanks!
What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-875095: Add `offset` param to `DataFrame.limit()` #44
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
This allows using the OFFSET clause through the DataFrame fluent API instead of calling sql().
Pre-review checklist
(For Snowflake employees)