Skip to content

[SPARK-57103][FOLLOWUP][SQL][TESTS] Add test coverage for MIN/MAX over nanosecond-precision timestamp types#56592

Open
stevomitric wants to merge 3 commits into
apache:masterfrom
stevomitric:stevomitric/min-max
Open

[SPARK-57103][FOLLOWUP][SQL][TESTS] Add test coverage for MIN/MAX over nanosecond-precision timestamp types#56592
stevomitric wants to merge 3 commits into
apache:masterfrom
stevomitric:stevomitric/min-max

Conversation

@stevomitric

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

This PR adds end-to-end, Catalyst-level, and golden-file test coverage for the MIN and MAX aggregates (and the closely related min_by / max_by / greatest / least) over the nanosecond-precision timestamp types TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p) (p ∈ [7, 9]), part of the nanosecond timestamp preview (SPARK-56822).

No production code is changed. Max/Min are type-agnostic DeclarativeAggregates whose only type gate is TypeUtils.checkForOrderingExpr, which already returns TypeCheckSuccess for the nanosecond types because they are orderable AtomicTypes Comparable were wired in SPARK-57103, the physical value is UnsafeRow-mutable (SPARK-56981), and CodeGenerator.genComp already has a dedicated case _: AnyTimestampNanoType => compareTo arm. Since dataType = child.dataType, the result preserves the input precision. This mirrors the TimeType precedent (SPARK-52626 / SPARK-52660), where becoming orderable made MIN/MAX work and the remaining work was test coverage.

Added tests:

  • TimestampNanosFunctionsSuiteBase (runs under ANSI on and off): precision/type preservation; sub-microsecond ordering on both the codegen and interpreted paths (two values sharing the same epochMicros but differing within the microsecond); all-NULL and empty input → NULL; GROUP BY a nanosecond key whose sub-microsecond remainder must keep distinct keys from collapsing; min_by/max_by/greatest/least over same-precision columns; and equivalence with the microsecond path when the sub-microsecond digits are zero.
  • ExpressionTypeCheckingSuite: asserts Max/Min checkInputDataTypes() succeeds for TimestampNTZNanosType / TimestampLTZNanosType and that the result type preserves the precision.
  • timestamp-ntz-nanos.sql / timestamp-ltz-nanos.sql: MAX/MIN and GROUP BY queries, with regenerated results/ and analyzer-results/ golden files.

Mixed-precision inputs (e.g. greatest(ts_p7, ts_p9), a UNION of two precisions, or comparing an aggregate against a micros literal) route through findWiderDateTimeType, which has no nanosecond arm yet, so they are intentionally out of scope here and tracked by SPARK-57454; every column in these tests is strictly same-precision.

Why are the changes needed?

MIN/MAX over nanosecond timestamps are a core, expected operation, but the nanosecond preview had no aggregate test coverage.

Does this PR introduce any user-facing change?

No. Tests only;

How was this patch tested?

New tests in this PR.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor

@MaxGekk MaxGekk left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 blocking, 1 non-blocking, 1 nit.
Clean, well-targeted tests-only PR. I verified the production claims: Max/Min gate on orderability only and dataType = child.dataType, and findWiderDateTimeType genuinely has no nanos arm, so mixed-precision is correctly out of scope (SPARK-57454). Test data is meaningful (sub-microsecond-distinct values would catch micros-truncation; the group-by test asserts no-collapse). Both findings are about test placement / a stale suite doc, neither blocks merge.

Design / architecture (1)

  • TimestampNanosFunctionsSuiteBase.scala:243: MIN/MAX aggregate tests live in a functions-scoped suite, whereas the cited TimeType analogue (SPARK-52626/52660) placed its aggregate tests in DataFrameAggregateSuite — see inline

Nits: 1 minor item (stale class Scaladoc, folded into the same inline comment).

}
}

// ===== MIN / MAX aggregates over nanosecond-precision timestamps (SPARK-56822) =====

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are MIN/MAX aggregate tests, but they land in a suite whose class Scaladoc (lines 28-33) scopes it to "the hour, minute and second functions". The TimeType precedent this PR explicitly mirrors put its analogous aggregate tests in DataFrameAggregateSuite (SPARK-52626 group-by at :3713, SPARK-52660 codegen-split max/min at :3724) — the dedicated aggregate suite. Consider co-locating these there (or in a dedicated nanos-aggregate suite) so aggregate-over-temporal-type coverage stays discoverable in one place.

There's a legitimate pull toward keeping them here — this suite gives the ANSI on/off matrix, the nanos preview-flag session, and the fixed session timezone for free — so this is a judgment call, not a defect.

Minor, regardless of where they end up: the class Scaladoc is now stale. It still says the suite tests only hour/minute/second and that "Each test exercises both the SQL path (selectExpr) and the Scala Column API (functions.hour / minute / second)" — no longer true now that aggregate tests are present (several use only selectExpr). Please update it (or move the tests).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kept them here for the free ANSI on/off matrix, preview-flag session, and fixed timezone. Updated the class Scaladoc.

…N/MAX aggregate tests

Addresses review feedback on apache#56592: the class Scaladoc for
TimestampNanosFunctionsSuiteBase still described only the hour/minute/second
functions and claimed every test exercises both the SQL and Column API paths,
no longer accurate now that the MIN/MAX aggregate tests are present. Rewrote it
to cover the datetime functions and the MIN/MAX aggregates and to note the ANSI
on/off subclasses. Tests-only; no production change.

Co-authored-by: Isaac
@stevomitric

Copy link
Copy Markdown
Contributor Author

cc @MaxGekk PTAL.

@MaxGekk

MaxGekk commented Jun 18, 2026

Copy link
Copy Markdown
Member

+1, LGTM. Merging to master/4.x.
Thank you, @stevomitric.

@MaxGekk

MaxGekk commented Jun 18, 2026

Copy link
Copy Markdown
Member

@stevomitric SPARK-57502 is wrong. Please, set correct one.

@stevomitric stevomitric changed the title [SPARK-57502][SQL][TESTS] Add test coverage for MIN/MAX over nanosecond-precision timestamp types [SPARK-57103][FOLLOWUP][SQL][TESTS] Add test coverage for MIN/MAX over nanosecond-precision timestamp types Jun 18, 2026
…AX nanos tests

The MIN/MAX-over-nanos test names and golden-file comments were tagged with a
placeholder ticket, SPARK-57502, which is in fact an unrelated INFRA issue
("Fix npm vulnerabilities in ui-test and dev"). Rename the references to
SPARK-57103 ("Add ordering, compare, and hash for nanosecond timestamp types"),
the sub-task of SPARK-56822 that enabled MIN/MAX over the nanosecond types.
Tests-only; no production or behavior change.

Co-authored-by: Isaac
@stevomitric

Copy link
Copy Markdown
Contributor Author

@stevomitric SPARK-57502 is wrong. Please, set correct one.

You are right. I set the SPARK-57103 which introduced comparisons which enabled the min/max expressions.

@MaxGekk

MaxGekk commented Jun 18, 2026

Copy link
Copy Markdown
Member

@stevomitric Could you resolve conflicts, please.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants