Skip to content

Base implementation of Parquet file writing#2583

Merged
VeckoTheGecko merged 35 commits intoParcels-code:mainfrom
VeckoTheGecko:push-zlxyoyvlpoqm
Apr 24, 2026
Merged

Base implementation of Parquet file writing#2583
VeckoTheGecko merged 35 commits intoParcels-code:mainfrom
VeckoTheGecko:push-zlxyoyvlpoqm

Conversation

@VeckoTheGecko
Copy link
Copy Markdown
Contributor

@VeckoTheGecko VeckoTheGecko commented Apr 20, 2026

Description

This PR introduces Parquet file writing to Parcels.

I still need to work on:

  • How to work with cftime output in the Parquet (how does this work with our internal model of time in Parcels? How should it work?)
  • Reviewing the test_particlefile.py file - are there tests that are no longer needed? What would be the best testing approach here?
    • Will leave for a future PR
  • Update documentation
    • Will leave for a future PR

Posting as draft for initial feedback

Checklist

AI Disclosure

  • This PR contains AI-generated content.
    • I have tested any AI-generated content in my PR.
    • I take responsibility for any AI-generated content in my PR.
    • Describe how you used it (e.g., by pasting your prompt): Just to help with learning the PyArrow API (i.e., used it to create an example script - which I then used as an entry to exploring the docs on pyarrow)

@VeckoTheGecko VeckoTheGecko changed the title Add parquet file writing Base implementation of Parquet file writing Apr 20, 2026
@VeckoTheGecko
Copy link
Copy Markdown
Contributor Author

@erikvansebille let's table some of these questions for our meeting tomorrow (mainly around datetime serialization in the Parquet file)

@VeckoTheGecko VeckoTheGecko marked this pull request as ready for review April 23, 2026 12:48
@VeckoTheGecko
Copy link
Copy Markdown
Contributor Author

@erikvansebille I moved read_particlefile to the global scope (i.e.,parcels.read_particlefile()) as well as added a docstring.

@VeckoTheGecko
Copy link
Copy Markdown
Contributor Author

(we'll likely get doc failures on this PR - which is to be expected since they haven't been updated)

Copy link
Copy Markdown
Member

@erikvansebille erikvansebille left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work. And as I'm going through updating all the documentation, it seems to work smoothly too!

A few small comments below

Comment thread src/parcels/_core/particlefile.py
Comment thread src/parcels/_core/particlefile.py
Comment thread src/parcels/_core/particlefile.py
Comment thread src/parcels/_core/particlefile.py Outdated
Comment thread src/parcels/_core/particlefile.py Outdated
Comment thread src/parcels/_core/particlefile.py Outdated
Comment thread src/parcels/_core/particlefile.py Outdated
Comment thread tests-v3/test_advection.py
Comment thread tests/test_particlefile.py Outdated
Comment thread tests/test_particlefile.py Outdated
@@ -187,7 +154,7 @@ def test_variable_written_once():
@pytest.mark.skip(reason="Pending ParticleFile refactor; see issue #2386")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this test work again? If not out of the box, perhaps we should discuss what we actually expect from output of a looped pset.execute...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was introduced in 4b28ddb . I'm ok cutting tests that are no longer valuable (/tested specific implementation bugs from the old codebase). I don't really understand this test - do you think its still valuable?

perhaps we should discuss what we actually expect from output of a looped pset.execute...

Agreed. Should we open a tracking issue? Whatever we decide on that front we can add separate tests

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it is a valuable test (as it tests what happens in the particle file when pset.execute() is run in a for-loop). But perhaps better for a later PR

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And perhaps rename from test_pset_repeated_release_delayed_adding_deleting - since I find that quite confusing

Comment thread src/parcels/_core/particle.py
Comment thread src/parcels/_core/particlefile.py
@VeckoTheGecko VeckoTheGecko merged commit 439ea29 into Parcels-code:main Apr 24, 2026
14 of 15 checks passed
@github-project-automation github-project-automation Bot moved this from Backlog to Done in Parcels development Apr 24, 2026
@VeckoTheGecko VeckoTheGecko deleted the push-zlxyoyvlpoqm branch April 24, 2026 13:06
erikvansebille added a commit that referenced this pull request May 4, 2026
* Update .gitignore

* Disable zarr writing

* Fix parquet writing

* Remove test_vriable_write_double

* Fix all "uses_old_zarr" tests

* Remove test_variable_write_double

Covered by test_write_dtypes_pfile

* Fixing tests

* More test fixing

* Fix last tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove old fixtures

* Fix pre-commit errors

* Cleanup

This mark was only introduced during refactoring

* Add pandas and pyarrow as explicit dependencies

* Add assert_cftime_like_particlefile


Remove temporary test_cftime.py file

* MAINT: Cleanup create_particle_data

This function is now independent of the time_interval as time is now stored as float

* Add cftime metadata serialization

* Add np.timedelta64 support

* Fix assert_cftime_like_particlefile

Remove nested key - save on root instead

* Move imports

* Fixing tests

* Fix test_time_is_age test

* Refactor assert_cftime_like_particlefile

* Self-review feedback

* Fix test_particle_schema

* Make read_particlefile public

* Add docstring to read_particlefile

* Updarting Argo tutorial to use parquet

* Updating tutorial_nemo to use parquet output

* Update tutorial_diffusion to use parquet

* Update tutorial_output to use parquet

* Review feedback

* Update migration guide

* Remove obs_written

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update migration guide

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert from extra_metadata to metadata

* Fix test_pfile_array_remove_particles

* Fix numpy warning

#2583 (comment)

* Using polars in tutorial_output

* Update explanation_concepts.md

* update tutorial_croco to use parquet

* using polars for read_particlefile

* Update tutorial_quickstart to use parquet

* Update tutorial_output to use polars

* Update v4-migration.md

* Fix using polars in tutorial_output

* Fixing read_parquet to use polars

* Update tutorial_delaystart to use parquet

* Update tutorial_dt_integrators to use parquet

* Fixing parcels.read_particlefile for timedelta time

* Update tutorial_interaction to use parquet

* Update tutorial_manipulating_field_data to use parquet

* Update tutorial_mitgcm to use parquet

* Updsate tutorial_nestedgrids to use parquet

* Update tutorial_sampling to use parquet (and remove to_write="once" section)

* Removing old attributes from particlefile.repr

* Using more intuitive variable names for polars subsetting

* Fixing repr of particleset

* Update tutorial_Argofloats.ipynb

* Update tutorial_quickstart to use parquet

* Update tutorial_croco_3D.ipynb

* Using polars in tutorial_diffusion

* Use polars in tutorial_nemo

* Use parquet in explanation_kernelloop

* Update policies.md

* Fixing unit tests to use polars in parcels.read_particlefile

* Doc fix: docs/user_guide/examples/tutorial_dt_integrators.ipynb

* Fixing plotting in tutorial_nestedgrids

* Fixing colour range of nemo 3D plot

Now that each trajectory is plotted by itself, colour range has to be set for all

* Using start location for last plot in quickstart

* Fixing assert statement in tutorial_quickstart

* Adding comment on cftime

* Update tutorial_quickstart.md

* Update docs/getting_started/tutorial_quickstart.md

Co-authored-by: Nick Hodgskin <36369090+VeckoTheGecko@users.noreply.github.com>

---------

Co-authored-by: Vecko <36369090+VeckoTheGecko@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Consistent time handling

2 participants