At the outset of Parcels v4 development, it was first thought that it would be possible to handle different datetimes on the fly, allowing users to use datetime objects (e.g., cftime, np.datetime64) within their kernels where appropriate. With this in mind, the TimeInterval class was created to help determine bounds for simulations etc.
This approach, however, turned out to have poor interoperability with numpy, among other shortcomings.
In #2405 we reverted to working with float datetimes. Upon further exploring #2583, I realised that we need better support for how datetimes are handled in Parcels, how they are propagated into kernels, and how they are reflected in the ParticleFile output.
Flow data
NetCDF and Zarr data formats store datetimes as numeric values on disk (netcdf Time Coordinate docs). These datetimes (which are primarily used for the coordinates in datasets, as opposed to the data variables) are optionally decoded from numeric to datetime objects when being ingested by Xarray into xr.Dataset objects (Xarray docs). These datetime objects are np.datetime64 or cftime objects.
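To make the decoding step concrete, here is a minimal stdlib sketch of what CF-style decoding does for the standard calendar (the `decode_cf_time` helper and its unit table are illustrative, not Parcels or Xarray API; non-standard calendars such as `noleap` need cftime):

```python
from datetime import datetime, timedelta

def decode_cf_time(values, units):
    """Decode CF-style numeric times into datetimes (standard calendar only).

    `units` is a CF string such as "hours since 2000-01-01".
    Hypothetical helper for illustration; cftime/xarray handle the general case.
    """
    step, _, origin = units.partition(" since ")
    epoch = datetime.fromisoformat(origin.strip())
    scale = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}[step.strip()]
    return [epoch + timedelta(seconds=v * scale) for v in values]

# decode_cf_time([0, 6, 12], "hours since 2000-01-01")
# -> [2000-01-01 00:00, 2000-01-01 06:00, 2000-01-01 12:00]
```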
Kernels
Users working with time information will need to calculate timing themselves given the time origin, calendar, and numeric time of the simulation.
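As a sketch of what this looks like from the user's side (the names `TIME_ORIGIN` and `month_of` are hypothetical, and the origin value is assumed for illustration; the real origin and calendar would come from the dataset's metadata):

```python
from datetime import datetime, timedelta

# Assumed time origin for this example; in practice this comes from the
# dataset's CF metadata (units/calendar of the time coordinate).
TIME_ORIGIN = datetime(2000, 1, 1)

def month_of(time_seconds):
    """Recover the calendar month from the numeric simulation time
    (seconds since the time origin, standard calendar assumed)."""
    return (TIME_ORIGIN + timedelta(seconds=time_seconds)).month
```

Kernels would do this arithmetic themselves rather than receiving datetime objects.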
Trajectory output
When serialising out trajectory data, there are date types available (e.g., pyarrow.date64 and pyarrow.timestamp), but no support for cftime objects. PyArrow does, however, allow storing dictionary metadata at the column level, letting us store CF-time metadata alongside the column (similar to what is done for Zarr and NetCDF). Users would need to be aware of this CF-time metadata in the Parquet metadata during analysis to fully contextualise their data (which they can do by using cftime alongside their favourite dataframe/tabular data processing library).
I see two options:

1. opt for float numeric output for time variables (irrespective of calendar), attaching appropriate metadata according to CF-time
2. opt for pyarrow's datetime objects where appropriate (for select cftime calendars) and float numeric output for other cases
I think (1) is preferred for its simpler implementation and consistent behaviour across calendars.
Changes:
- Implement (1) from above
- See if we can simplify our internals (e.g., by enforcing that time dimensions in the datasets are numeric and aligned to the same time origin)
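The second point could look something like this stdlib sketch (the `rebase_times` helper is hypothetical and assumes the standard calendar; cftime would be needed for non-standard ones):

```python
from datetime import datetime

def rebase_times(values, units, target_origin):
    """Re-express CF numeric times as seconds since `target_origin`,
    so all datasets in a simulation share one time origin.
    Standard calendar assumed; illustrative helper only."""
    step, _, origin = units.partition(" since ")
    scale = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}[step.strip()]
    offset = (datetime.fromisoformat(origin.strip()) - target_origin).total_seconds()
    return [v * scale + offset for v in values]

# rebase_times([0, 24], "hours since 2000-01-02", datetime(2000, 1, 1))
# -> [86400.0, 172800.0]
```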