Base implementation of Parquet file writing #2583
VeckoTheGecko merged 35 commits into Parcels-code:main from
Conversation
Force-pushed from 1d26f4a to 91ce496
@erikvansebille let's table some of these questions for our meeting tomorrow (mainly around datetime serialization in the Parquet file)
Covered by test_write_dtypes_pfile
Force-pushed from e06a618 to 60ceef6
Remove temporary test_cftime.py file
This function is now independent of the time_interval as time is now stored as float
Remove nested key - save on root instead
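The commits above describe storing time as a plain float, with calendar information saved at the file root rather than under a nested key. A hedged sketch of how a reader could turn such floats back into datetimes (the `epoch` default is hypothetical; in practice it would come from the root metadata):

```python
import numpy as np

def floats_to_datetimes(seconds, epoch="2000-01-01"):
    # 'epoch' stands in for the reference time saved in the file's
    # root metadata; illustration only, not the PR's actual API.
    base = np.datetime64(epoch, "ns")
    offsets = (np.asarray(seconds, dtype=float) * 1e9).astype("timedelta64[ns]")
    return base + offsets

times = floats_to_datetimes([0.0, 90.0])
```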
@erikvansebille I moved
(we'll likely get doc failures on this PR - which is to be expected since they haven't been updated)
erikvansebille left a comment
Very nice work. And as I'm going through updating all the documentation, it seems to work smoothly too!
A few small comments below
@@ -187,7 +154,7 @@ def test_variable_written_once():
@pytest.mark.skip(reason="Pending ParticleFile refactor; see issue #2386")
Does this test work again? If not out of the box, perhaps we should discuss what we actually expect from output of a looped pset.execute...
This test was introduced in 4b28ddb. I'm ok cutting tests that are no longer valuable (or that tested specific implementation bugs from the old codebase). I don't really understand this test - do you think it's still valuable?
perhaps we should discuss what we actually expect from output of a looped pset.execute...
Agreed. Should we open a tracking issue? Whatever we decide on that front we can add separate tests
Yes, I think it is a valuable test (as it tests what happens in the particle file when pset.execute() is run in a for-loop). But perhaps better for a later PR
And perhaps rename it from test_pset_repeated_release_delayed_adding_deleting, since I find that name quite confusing
Force-pushed from 61430c9 to 6695187
Force-pushed from a5bdd31 to 3c52647
* Update .gitignore
* Disable zarr writing
* Fix parquet writing
* Remove test_vriable_write_double
* Fix all "uses_old_zarr" tests
* Remove test_variable_write_double (covered by test_write_dtypes_pfile)
* Fixing tests
* More test fixing
* Fix last tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Remove old fixtures
* Fix pre-commit errors
* Cleanup (this mark was only introduced during refactoring)
* Add pandas and pyarrow as explicit dependencies
* Add assert_cftime_like_particlefile (remove temporary test_cftime.py file)
* MAINT: Cleanup create_particle_data (this function is now independent of the time_interval as time is now stored as float)
* Add cftime metadata serialization
* Add np.timedelta64 support
* Fix assert_cftime_like_particlefile (remove nested key - save on root instead)
* Move imports
* Fixing tests
* Fix test_time_is_age test
* Refactor assert_cftime_like_particlefile
* Self-review feedback
* Fix test_particle_schema
* Make read_particlefile public
* Add docstring to read_particlefile
* Updating Argo tutorial to use parquet
* Updating tutorial_nemo to use parquet output
* Update tutorial_diffusion to use parquet
* Update tutorial_output to use parquet
* Review feedback
* Update migration guide
* Remove obs_written
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Update migration guide
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Revert from extra_metadata to metadata
* Fix test_pfile_array_remove_particles
* Fix numpy warning #2583 (comment)
* Using polars in tutorial_output
* Update explanation_concepts.md
* Update tutorial_croco to use parquet
* Using polars for read_particlefile
* Update tutorial_quickstart to use parquet
* Update tutorial_output to use polars
* Update v4-migration.md
* Fix using polars in tutorial_output
* Fixing read_parquet to use polars
* Update tutorial_delaystart to use parquet
* Update tutorial_dt_integrators to use parquet
* Fixing parcels.read_particlefile for timedelta time
* Update tutorial_interaction to use parquet
* Update tutorial_manipulating_field_data to use parquet
* Update tutorial_mitgcm to use parquet
* Update tutorial_nestedgrids to use parquet
* Update tutorial_sampling to use parquet (and remove to_write="once" section)
* Removing old attributes from particlefile.repr
* Using more intuitive variable names for polars subsetting
* Fixing repr of particleset
* Update tutorial_Argofloats.ipynb
* Update tutorial_quickstart to use parquet
* Update tutorial_croco_3D.ipynb
* Using polars in tutorial_diffusion
* Use polars in tutorial_nemo
* Use parquet in explanation_kernelloop
* Update policies.md
* Fixing unit tests to use polars in parcels.read_particlefile
* Doc fix: docs/user_guide/examples/tutorial_dt_integrators.ipynb
* Fixing plotting in tutorial_nestedgrids
* Fixing colour range of nemo 3D plot (now that each trajectory is plotted by itself, the colour range has to be set for all)
* Using start location for last plot in quickstart
* Fixing assert statement in tutorial_quickstart
* Adding comment on cftime
* Update tutorial_quickstart.md
* Update docs/getting_started/tutorial_quickstart.md

Co-authored-by: Nick Hodgskin <36369090+VeckoTheGecko@users.noreply.github.com>
Co-authored-by: Vecko <36369090+VeckoTheGecko@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Description
This PR introduces Parquet file writing to Parcels.
I still need to work on:
- parcels.read_particlefile(...) to streamline ingestion into cftime objects
- the test_particlefile.py file - are there tests that are no longer needed? What would be the best testing approach here?

Posting as draft for initial feedback
Checklist
- Pull request targets the correct branch (main for normal development, v3-support for v3 support)

AI Disclosure