Consistent time handling #2586

@VeckoTheGecko

At the outset of Parcels v4 development, it was initially thought that it would be possible to handle different datetime types on the fly, allowing users to use datetime objects (e.g., cftime, np.datetime64) within their kernels where appropriate. With this in mind, the TimeInterval class was created to help determine bounds for simulations, among other things.

However, this approach turned out to have poor interoperability with NumPy, among other shortcomings.

In #2405 we reverted to working with float datetimes. While exploring #2583 further, I realised that we need better support for how datetimes are handled in Parcels: how they are propagated into kernels, and how they are reflected in the ParticleFile output.


Flow data

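NetCDF and Zarr data formats store datetimes on disk as numeric values, often integers (see the NetCDF time coordinate docs). These values are interpreted through a units attribute (e.g., "days since 1970-01-01") and an optional calendar attribute. These datetimes are primarily used for the coordinates in datasets rather than the data variables. Xarray can optionally decode them from numeric values to datetime objects when ingesting them into xr.Dataset objects (see the Xarray docs). The resulting objects are np.datetime64 (typically for standard calendars) or cftime objects (typically for non-standard calendars such as noleap or 360_day).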
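The units attribute is what turns the stored numbers into dates. The sketch below decodes such values using only the standard library, as a simplified stand-in for the decoding Xarray performs. It makes three assumptions:

- Units have the form "<unit> since <ISO date>".
- Only seconds, minutes, hours and days are supported.
- The calendar is proleptic Gregorian. This is what Python's datetime uses, and it agrees with CF "standard" for dates after 1582-10-15.

Non-standard calendars need cftime.

```python
from datetime import datetime, timedelta

# Length of each supported CF time unit, in seconds.
_UNIT_SECONDS = {"seconds": 1.0, "minutes": 60.0, "hours": 3600.0, "days": 86400.0}


def decode_cf_time(values, units):
    """Decode numeric CF times into Python datetimes.

    Minimal sketch: proleptic Gregorian calendar only, and the origin must be
    an ISO-format string accepted by datetime.fromisoformat.
    """
    unit, _, origin_str = units.partition(" since ")
    unit = unit.strip().lower()
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unsupported unit: {unit!r}")
    origin = datetime.fromisoformat(origin_str.strip())
    scale = _UNIT_SECONDS[unit]
    # Each stored number is an offset from the origin, in the given unit.
    return [origin + timedelta(seconds=v * scale) for v in values]
```

For example, decode_cf_time([0, 1, 59], "days since 2000-01-01") gives 2000-01-01, 2000-01-02 and 2000-02-29, since 2000 is a leap year in this calendar.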

Kernels

Users working with time information in their kernels will need to compute calendar dates themselves from the time origin, the calendar, and the numeric time of the simulation.
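The calendar matters for this computation. The sketch below uses a hypothetical helper, noleap_date, which is not part of Parcels. It assumes the simulation time is in seconds since a time origin at origin_year-01-01T00:00:00 in a "noleap" (365_day) calendar. It then compares the result with the same offset in the proleptic Gregorian calendar.

```python
from datetime import datetime, timedelta

SECONDS_PER_DAY = 86400
# Month lengths in a calendar without leap years.
NOLEAP_MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def noleap_date(time_s, origin_year):
    """Hypothetical helper: (year, month, day) for a noleap calendar.

    Assumes the time origin is origin_year-01-01T00:00:00.
    """
    days = int(time_s // SECONDS_PER_DAY)  # whole days since the origin
    year = origin_year + days // 365       # every noleap year has 365 days
    day_of_year = days % 365               # 0-based day within that year
    month = 1
    for length in NOLEAP_MONTH_LENGTHS:
        if day_of_year < length:
            break
        day_of_year -= length
        month += 1
    return year, month, day_of_year + 1


t = 59 * SECONDS_PER_DAY  # same numeric simulation time in both calendars
print(noleap_date(t, 2000))                    # (2000, 3, 1)
print(datetime(2000, 1, 1) + timedelta(seconds=t))  # 2000-02-29 00:00:00
```

The same numeric time maps to different dates depending on the calendar. This is why kernel code needs the calendar as well as the time origin.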

Trajectory output

When serialising trajectory data, pyarrow offers date/time types (i.e., pyarrow.date64 and pyarrow.timestamp) but has no support for cftime objects. Pyarrow does, however, allow storing dictionary metadata at the column (field) level. This lets us store CF-time metadata alongside the column, similar to what is done for Zarr and NetCDF. Users would need to be aware of this CF-time metadata in the Parquet file during analysis to fully contextualise their data. They can do this by using cftime alongside their favourite dataframe/tabular data processing library.

I see two options:

  1. opt for float numeric output for time variables (irrespective of calendar), attaching the appropriate CF-time metadata
  2. opt for pyarrow's datetime types where appropriate (for select CF-time calendars), and fall back to float numeric output in other cases

I think (1) is preferable for its simpler implementation and consistent behaviour across calendars.
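Option (1) can be sketched with the standard library alone. Time is written out as float seconds since an origin, together with a CF-style metadata dict describing how to decode it. The function names and metadata keys below are illustrative assumptions, not the Parcels implementation.

The metadata is a plain str-to-str mapping of the kind pyarrow accepts as column-level metadata, e.g. via pyarrow.field(name, pyarrow.float64(), metadata=...). Decoding here only covers the proleptic Gregorian calendar; other calendars would go through cftime.

```python
from datetime import datetime, timedelta


def encode_time_column(times, origin, calendar="proleptic_gregorian"):
    """Sketch of option (1): float seconds since origin plus CF-time metadata."""
    values = [(t - origin).total_seconds() for t in times]
    metadata = {
        "units": f"seconds since {origin.isoformat(sep=' ')}",
        "calendar": calendar,
    }
    return values, metadata


def decode_time_column(values, metadata):
    """Reverse of encode_time_column, proleptic Gregorian calendar only."""
    if metadata["calendar"] != "proleptic_gregorian":
        raise NotImplementedError("non-standard calendars need cftime")
    unit, _, origin_str = metadata["units"].partition(" since ")
    if unit != "seconds":
        raise ValueError(f"unexpected unit: {unit!r}")
    origin = datetime.fromisoformat(origin_str)
    return [origin + timedelta(seconds=v) for v in values]


origin = datetime(2000, 1, 1)
values, meta = encode_time_column([datetime(2000, 1, 1, 6), datetime(2000, 1, 2)], origin)
print(values)  # [21600.0, 86400.0]
print(meta)    # {'units': 'seconds since 2000-01-01 00:00:00', 'calendar': 'proleptic_gregorian'}
```

The column itself stays a plain float regardless of calendar. Only the metadata changes, which is the consistency argument for (1).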


Changes:

  • Implement (1) from above
  • See if we can simplify our internals (e.g., by enforcing that time dimensions in the datasets are numeric and aligned to the same time origin)
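The second change involves making time dimensions numeric and aligned to a common origin. Conceptually, this means re-expressing each dataset's encoded times as float seconds since one shared origin. The helper below is hypothetical and not part of Parcels. It assumes CF-style units strings, a handful of supported units, and the proleptic Gregorian calendar.

```python
from datetime import datetime

# Length of each supported CF time unit, in seconds.
_UNIT_SECONDS = {"seconds": 1.0, "minutes": 60.0, "hours": 3600.0, "days": 86400.0}


def to_common_origin(values, units, common_origin):
    """Hypothetical helper: CF numeric times -> float seconds since common_origin."""
    unit, _, origin_str = units.partition(" since ")
    scale = _UNIT_SECONDS[unit.strip().lower()]
    origin = datetime.fromisoformat(origin_str.strip())
    # Shift by the gap between this dataset's origin and the shared origin.
    offset = (origin - common_origin).total_seconds()
    return [v * scale + offset for v in values]


common = datetime(2000, 1, 1)
print(to_common_origin([0, 6], "hours since 2000-01-01", common))  # [0.0, 21600.0]
print(to_common_origin([3653], "days since 1990-01-01", common))   # [86400.0]
```

Once every time axis is float seconds since the same origin, comparing and interpolating times across datasets reduces to plain float arithmetic. Calendars that Python's datetime cannot represent would still need cftime to compute the offset.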

Metadata

Assignees: No one assigned
Type: No type
Projects: Status Done
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests