utilization of parquet for intermediate storage #26

Migrating to parquet for intermediate processing of Slocum glider data involves several steps:

 * [x] ensure `dbdreader` reproduces similar results to slocum binaries (https://github.com/smerckel/dbdreader/issues/18)
 * [ ] replacement of `convertDbds.sh` with `dbdreader`/`parquet`
 * [x] desired storage pattern for `parquet` (just using tables for now)
 
Is there a particular storage pattern or design desired for the `parquet` data structures?
REF: https://arrow.apache.org/docs/python/parquet.html#parquet-file-writing-options

 * Enforce version 2.4? 2.6?
 * Ensure the structure is queryable by time for speedy subsetting? 
 * If enforcing 2.6, timestamp units become less of an issue
 * Partitioning? Glider ID, Deployment ID, Process method (rt vs. delayed), QC'd (Level 0, 1, ...)
    * local file system, or plug directly into duckdb (https://duckdb.org/docs/guides/python/filesystems.html), or allow both

Have to nail down a potential dbdreader issue first.

