[SPARK-57103][FOLLOWUP][SQL][TESTS] Add test coverage for MIN/MAX over nanosecond-precision timestamp types#56592
[SPARK-57103][FOLLOWUP][SQL][TESTS] Add test coverage for MIN/MAX over nanosecond-precision timestamp types#56592stevomitric wants to merge 3 commits into
Conversation
MaxGekk
left a comment
There was a problem hiding this comment.
0 blocking, 1 non-blocking, 1 nit.
Clean, well-targeted tests-only PR. I verified the production claims: Max/Min gate on orderability only and dataType = child.dataType, and findWiderDateTimeType genuinely has no nanos arm, so mixed-precision is correctly out of scope (SPARK-57454). Test data is meaningful (sub-microsecond-distinct values would catch micros-truncation; the group-by test asserts no-collapse). Both findings are about test placement / a stale suite doc, neither blocks merge.
Design / architecture (1)
- TimestampNanosFunctionsSuiteBase.scala:243: MIN/MAX aggregate tests live in a functions-scoped suite, whereas the cited TimeType analogue (SPARK-52626/52660) placed its aggregate tests in
DataFrameAggregateSuite— see inline
Nits: 1 minor item (stale class Scaladoc, folded into the same inline comment).
| } | ||
| } | ||
|
|
||
| // ===== MIN / MAX aggregates over nanosecond-precision timestamps (SPARK-56822) ===== |
There was a problem hiding this comment.
These are MIN/MAX aggregate tests, but they land in a suite whose class Scaladoc (lines 28-33) scopes it to "the hour, minute and second functions". The TimeType precedent this PR explicitly mirrors put its analogous aggregate tests in DataFrameAggregateSuite (SPARK-52626 group-by at :3713, SPARK-52660 codegen-split max/min at :3724) — the dedicated aggregate suite. Consider co-locating these there (or in a dedicated nanos-aggregate suite) so aggregate-over-temporal-type coverage stays discoverable in one place.
There's a legitimate pull toward keeping them here — this suite gives the ANSI on/off matrix, the nanos preview-flag session, and the fixed session timezone for free — so this is a judgment call, not a defect.
Minor, regardless of where they end up: the class Scaladoc is now stale. It still says the suite tests only hour/minute/second and that "Each test exercises both the SQL path (selectExpr) and the Scala Column API (functions.hour / minute / second)" — no longer true now that aggregate tests are present (several use only selectExpr). Please update it (or move the tests).
There was a problem hiding this comment.
Kept them here for the free ANSI on/off matrix, preview-flag session, and fixed timezone. Updated the class Scaladoc.
…N/MAX aggregate tests Addresses review feedback on apache#56592: the class Scaladoc for TimestampNanosFunctionsSuiteBase still described only the hour/minute/second functions and claimed every test exercises both the SQL and Column API paths, no longer accurate now that the MIN/MAX aggregate tests are present. Rewrote it to cover the datetime functions and the MIN/MAX aggregates and to note the ANSI on/off subclasses. Tests-only; no production change. Co-authored-by: Isaac
|
cc @MaxGekk PTAL. |
|
+1, LGTM. Merging to master/4.x. |
|
@stevomitric SPARK-57502 is wrong. Please, set correct one. |
…AX nanos tests
The MIN/MAX-over-nanos test names and golden-file comments were tagged with a
placeholder ticket, SPARK-57502, which is in fact an unrelated INFRA issue
("Fix npm vulnerabilities in ui-test and dev"). Rename the references to
SPARK-57103 ("Add ordering, compare, and hash for nanosecond timestamp types"),
the sub-task of SPARK-56822 that enabled MIN/MAX over the nanosecond types.
Tests-only; no production or behavior change.
Co-authored-by: Isaac
You are right. I set the SPARK-57103 which introduced comparisons which enabled the min/max expressions. |
|
@stevomitric Could you resolve conflicts, please. |
What changes were proposed in this pull request?
This PR adds end-to-end, Catalyst-level, and golden-file test coverage for the
MINandMAXaggregates (and the closely relatedmin_by/max_by/greatest/least) over the nanosecond-precision timestamp typesTIMESTAMP_NTZ(p)/TIMESTAMP_LTZ(p)(p ∈ [7, 9]), part of the nanosecond timestamp preview (SPARK-56822).No production code is changed.
Max/Minare type-agnosticDeclarativeAggregates whose only type gate isTypeUtils.checkForOrderingExpr, which already returnsTypeCheckSuccessfor the nanosecond types because they are orderableAtomicTypesComparablewere wired in SPARK-57103, the physical value is UnsafeRow-mutable (SPARK-56981), andCodeGenerator.genCompalready has a dedicatedcase _: AnyTimestampNanoType => compareToarm. SincedataType = child.dataType, the result preserves the input precision. This mirrors the TimeType precedent (SPARK-52626 / SPARK-52660), where becoming orderable madeMIN/MAXwork and the remaining work was test coverage.Added tests:
TimestampNanosFunctionsSuiteBase(runs under ANSI on and off): precision/type preservation; sub-microsecond ordering on both the codegen and interpreted paths (two values sharing the sameepochMicrosbut differing within the microsecond); all-NULL and empty input → NULL;GROUP BYa nanosecond key whose sub-microsecond remainder must keep distinct keys from collapsing;min_by/max_by/greatest/leastover same-precision columns; and equivalence with the microsecond path when the sub-microsecond digits are zero.ExpressionTypeCheckingSuite: assertsMax/MincheckInputDataTypes()succeeds forTimestampNTZNanosType/TimestampLTZNanosTypeand that the result type preserves the precision.timestamp-ntz-nanos.sql/timestamp-ltz-nanos.sql:MAX/MINandGROUP BYqueries, with regeneratedresults/andanalyzer-results/golden files.Mixed-precision inputs (e.g.
greatest(ts_p7, ts_p9), aUNIONof two precisions, or comparing an aggregate against a micros literal) route throughfindWiderDateTimeType, which has no nanosecond arm yet, so they are intentionally out of scope here and tracked by SPARK-57454; every column in these tests is strictly same-precision.Why are the changes needed?
MIN/MAXover nanosecond timestamps are a core, expected operation, but the nanosecond preview had no aggregate test coverage.Does this PR introduce any user-facing change?
No. Tests only;
How was this patch tested?
New tests in this PR.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor