Migrating the intermediate processing of Slocum glider data to parquet involves several steps.
Is there a particular storage pattern or design desired for the parquet data structures?
REF: https://arrow.apache.org/docs/python/parquet.html#parquet-file-writing-options
- Enforce version 2.4? 2.6?
- Ensure the structure is queryable by time for speedy subsetting?
- If enforcing 2.6, timestamp units become less of an issue, since nanosecond timestamps can be stored without being coerced to a coarser unit
- Partitioning? Glider ID, deployment ID, processing mode (real-time vs. delayed), QC level (Level 0, 1, ...)
Have to nail down a potential dbdreader issue first.
Migration steps:
- dbdreader reproduces similar results to slocum binaries (Feature: optionally include first record in data payload, smerckel/dbdreader#18)
- convertDbds.sh with dbdreader/parquet
- parquet (just using tables for now)