
[SPARK-57527][SQL] Add the unix_nanos function returning nanoseconds since the epoch for timestamps #56602

Open: MaxGekk wants to merge 1 commit into apache:master from MaxGekk:unix_nanos


MaxGekk commented Jun 18, 2026

What changes were proposed in this pull request?

This PR adds a new built-in function unix_nanos(expr) that returns the number of nanoseconds since 1970-01-01 00:00:00 UTC for a nanosecond-precision timestamp.

Concretely:

  • Adds a UnixNanos expression in datetimeExpressions.scala that accepts only the nanosecond-precision timestamp types TIMESTAMP_LTZ(p) / TIMESTAMP_NTZ(p) (p in [7, 9], i.e. AnyTimestampNanoType) and returns a lossless DECIMAL(21, 0).
  • Computes epochMicros * 1000 + nanosWithinMicro via BigInteger in both the interpreted (eval) and codegen (doGenCode) paths. A BIGINT return type was rejected because epochMicros * 1000 overflows 64 bits across the full [0001..9999] calendar range; DECIMAL(21, 0) is wide enough for every value (~2.5e20 max) and stays lossless.
  • Registers unix_nanos in FunctionRegistry and adds the Scala functions.unix_nanos.
  • Adds catalyst unit tests (interpreted + codegen), Scala/SQL end-to-end tests, and SQL golden-file coverage for TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p).
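The arithmetic in the second bullet can be sketched outside Spark with plain JDK types. This is a minimal illustration, not the PR's `UnixNanos` expression: the object `UnixNanosSketch`, its method `unixNanos`, and the constant `MaxEpochMicros` are hypothetical names, and the sketch assumes `nanosWithinMicro` is the sub-microsecond remainder in `[0, 999]`.

```scala
import java.math.BigInteger

// Hypothetical helper, not the PR's UnixNanos expression. It only mirrors the
// arithmetic epochMicros * 1000 + nanosWithinMicro described in the bullets above.
object UnixNanosSketch {
  private val NanosPerMicro = BigInteger.valueOf(1000L)

  // 9999-12-31 23:59:59.999999 UTC expressed as microseconds since the epoch.
  val MaxEpochMicros: Long = 253402300799999999L

  // epochMicros: microseconds since 1970-01-01 00:00:00 UTC (may be negative).
  // nanosWithinMicro: assumed here to be the remaining nanoseconds, in [0, 999].
  def unixNanos(epochMicros: Long, nanosWithinMicro: Int): BigInteger = {
    require(nanosWithinMicro >= 0 && nanosWithinMicro <= 999,
      s"nanosWithinMicro out of range: $nanosWithinMicro")
    // Widen to BigInteger before multiplying so the product cannot wrap around.
    BigInteger.valueOf(epochMicros)
      .multiply(NanosPerMicro)
      .add(BigInteger.valueOf(nanosWithinMicro.toLong))
  }
}
```

With these assumptions, `UnixNanosSketch.unixNanos(UnixNanosSketch.MaxEpochMicros, 999)` yields a 21-digit value, which is why a 64-bit product is not enough: `Math.multiplyExact(UnixNanosSketch.MaxEpochMicros, 1000L)` throws `ArithmeticException` for the same input.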

The microsecond TimestampType input and the PySpark / Spark Connect / R surfaces are out of scope here and tracked as follow-ups; unix_nanos is recorded in the PySpark function-parity allowlist in the meantime.

Why are the changes needed?

Part of the SPARK-56822 umbrella (timestamps with nanosecond precision). Spark already has unix_seconds / unix_millis / unix_micros but lacks a nanosecond counterpart. unix_nanos fills that gap and is the natural inverse of nanosecond timestamp construction.

Does this PR introduce any user-facing change?

Yes. A new unix_nanos(timeExp) function is available in SQL and the Scala API. It accepts TIMESTAMP_LTZ(p) / TIMESTAMP_NTZ(p) and returns DECIMAL(21, 0). This is a change only within the unreleased nanosecond-timestamp preview.

Example:

SELECT unix_nanos(TIMESTAMP_NTZ '2008-12-25 15:30:00.123456789');
-- 1230219000123456789
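The expected value in the example can be cross-checked without Spark by using `java.time`. This is a rough sketch under one assumption: the `TIMESTAMP_NTZ` wall-clock value is interpreted as if it were in UTC. The object `UnixNanosExampleCheck` and its method `epochNanos` are hypothetical names introduced for illustration only.

```scala
import java.math.BigInteger
import java.time.{LocalDateTime, ZoneOffset}

// Hypothetical cross-check, independent of Spark: recompute the example's
// result from the literal using only JDK classes.
object UnixNanosExampleCheck {
  private val NanosPerSecond = BigInteger.valueOf(1000000000L)

  // isoLocal: an ISO-8601 local date-time such as "2008-12-25T15:30:00.123456789".
  // Assumption: the wall-clock value is treated as UTC when mapping to the epoch.
  def epochNanos(isoLocal: String): BigInteger = {
    val ts = LocalDateTime.parse(isoLocal)
    // Whole seconds since the epoch, then the nanosecond-of-second fraction.
    BigInteger.valueOf(ts.toEpochSecond(ZoneOffset.UTC))
      .multiply(NanosPerSecond)
      .add(BigInteger.valueOf(ts.getNano.toLong))
  }
}
```

Calling `UnixNanosExampleCheck.epochNanos("2008-12-25T15:30:00.123456789")` gives `1230219000123456789`, matching the SQL example: 1230219000 whole seconds followed by the 123456789 nanosecond fraction.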

How was this patch tested?

  • build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite'
  • build/sbt 'sql/testOnly org.apache.spark.sql.TimestampNanosFunctionsAnsiOnSuite org.apache.spark.sql.TimestampNanosFunctionsAnsiOffSuite'
  • build/sbt 'sql/testOnly org.apache.spark.sql.expressions.ExpressionInfoSuite org.apache.spark.sql.ExpressionsSchemaSuite'
  • SPARK_GENERATE_GOLDEN_FILES=1 build/sbt 'sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z "nanos"'
  • ./dev/scalastyle

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor
