Adding both an xarray.dataset and a dict-of-numpy-arrays for the particle-data by erikvansebille · Pull Request #2097 · Parcels-code/Parcels

erikvansebille · 2025-07-21T07:08:14Z

This PR builds on #2094, following @VeckoTheGecko's suggestion at Parcels-code/parcels-benchmarks#1 (comment) to keep the ParticleData in both an xarray.Dataset structure (ParticleSet._ds) and a dict-of-numpy-arrays (ParticleSet._data).

This new branch has the same performance as #2094 (see Parcels-code/parcels-benchmarks#1 (comment)), but the advantage is that users can also access the data as an xarray.Dataset.

This will be useful e.g. in the __repr__ (to be implemented) and in the new implementation of ParticleFile.
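As a rough illustration of the idea (a minimal sketch, not the code in this branch): xarray wraps NumPy arrays without copying them, so a Dataset and a dict can point at the same buffers. The variable names below are assumptions that only stand in for ParticleSet._ds and ParticleSet._data.

```python
import numpy as np
import xarray as xr

npart = 10
lon = np.linspace(0, 1, npart, dtype=np.float32)
lat = np.linspace(1, 0, npart, dtype=np.float32)

# Dataset view (stand-in for ParticleSet._ds); the NumPy arrays are wrapped, not copied.
ds = xr.Dataset(
    {"lon": ("trajectory", lon), "lat": ("trajectory", lat)},
    coords={"trajectory": np.arange(npart)},
)

# Dict view (stand-in for ParticleSet._data) pointing at the same buffers.
data = {name: ds[name].data for name in ds.data_vars}

# Both views share memory for every variable.
shared = all(np.shares_memory(data[name], ds[name].data) for name in data)

# A write through the dict is therefore visible through the Dataset.
data["lon"][0] = 0.25
seen_in_ds = float(ds["lon"][0])
```

The kernel loop can then work on the plain arrays in the dict, while users inspect the same values through the Dataset, as long as both keep referring to the same arrays.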

Chose the correct base branch (particledata_as_dict)
Added tests


VeckoTheGecko · 2025-07-21T11:42:54Z

I think a difficulty with this is that once a new array is created (e.g., from the deletion of particles), there will be drift between these two data structures. I don't think this is currently working as intended.

Here's a test:

def test_pset_remove_indices(fieldset):
    npart = 10
    lon_start = np.linspace(0, 1, npart)
    lat_start = np.linspace(1, 0, npart)
    pset = ParticleSet(fieldset, lon=np.linspace(0, 1, npart), lat=np.linspace(1, 0, npart))
    assert len(pset._ds.lon) == len(pset._data["lon"]) == npart

    pset.remove_indices([0])
    assert len(pset._ds.lon) == len(pset._data["lon"]) == npart - 1

________________________________________________________________ test_pset_remove_indices _________________________________________________________________

fieldset = <parcels.fieldset.FieldSet object at 0x3249d7f20>

    def test_pset_remove_indices(fieldset):
        npart = 10
        lon_start = np.linspace(0, 1, npart)
        lat_start = np.linspace(1, 0, npart)
        pset = ParticleSet(fieldset, lon=np.linspace(0, 1, npart), lat=np.linspace(1, 0, npart))
        assert len(pset._ds.lon) == len(pset._data["lon"]) == npart
    
        pset.remove_indices([0])
>       assert len(pset._ds.lon) == len(pset._data["lon"]) == npart - 1
E       AssertionError: assert 10 == 9
E        +  where 10 = len(<xarray.DataArray 'lon' (trajectory: 10)> Size: 40B\narray([0.        , 0.11111111, 0.22222222, 0.33333334, 0.44444445,..., 0.8888889 , 1.        ],\n      dtype=float32)\nCoordinates:\n  * trajectory  (trajectory) int64 80B 0 1 2 3 4 5 6 7 8 9)
E        +    where <xarray.DataArray 'lon' (trajectory: 10)> Size: 40B\narray([0.        , 0.11111111, 0.22222222, 0.33333334, 0.44444445,..., 0.8888889 , 1.        ],\n      dtype=float32)\nCoordinates:\n  * trajectory  (trajectory) int64 80B 0 1 2 3 4 5 6 7 8 9 = <xarray.Dataset> Size: 640B\nDimensions:         (trajectory: 10, ngrid: 1)\nCoordinates:\n  * trajectory      (trajector... datetime64[ns] 80B NaT NaT NaT ... NaT NaT NaT\nAttributes:\n    ngrid:    1\n    ptype:    ParticleType(pclass=Particle).lon
E        +      where <xarray.Dataset> Size: 640B\nDimensions:         (trajectory: 10, ngrid: 1)\nCoordinates:\n  * trajectory      (trajector... datetime64[ns] 80B NaT NaT NaT ... NaT NaT NaT\nAttributes:\n    ngrid:    1\n    ptype:    ParticleType(pclass=Particle) = <[KeyError('pclass') raised in repr()] ParticleSet object at 0x3249d4080>._ds
E        +  and   9 = len(array([0.11111111, 0.22222222, 0.33333334, 0.44444445, 0.5555556 ,\n       0.6666667 , 0.7777778 , 0.8888889 , 1.        ], dtype=float32))

tests/v4/test_particleset_execute.py:142: AssertionError
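The same effect can be reproduced outside Parcels with NumPy alone. This is a minimal sketch in which two plain dicts (hypothetical stand-ins for ParticleSet._data and for the arrays wrapped by ParticleSet._ds) start out referencing one buffer:

```python
import numpy as np

lon = np.linspace(0, 1, 10, dtype=np.float32)
view_a = {"lon": lon}  # stand-in for ParticleSet._data
view_b = {"lon": lon}  # stand-in for the arrays held by ParticleSet._ds

# In-place writes are visible through both references.
view_a["lon"][0] = 0.5
in_place_synced = bool(view_b["lon"][0] == np.float32(0.5))

# np.delete returns a new array; only the reference that is reassigned sees the change.
view_a["lon"] = np.delete(view_a["lon"], [0])
lengths = (len(view_a["lon"]), len(view_b["lon"]))
still_shared = np.shares_memory(view_a["lon"], view_b["lon"])
```

After the deletion the two references hold arrays of length 9 and 10 and no longer share memory, which mirrors the `assert 10 == 9` failure above.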

erikvansebille · 2025-07-21T11:52:18Z

Ah, good catch! I hadn't realised that deleting a particle would create a new dataset, indeed. And so would adding two ParticleSets.

We could change the code to also update the dict-of-numpy-arrays whenever we change the dataset; or is that too prone to errors?
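A minimal sketch of what that could look like, assuming a hypothetical container class (DualParticleData and its methods are illustrative names, not part of Parcels) that rebuilds the dict after every operation that replaces the Dataset:

```python
import numpy as np
import xarray as xr


class DualParticleData:
    """Hypothetical container for illustration; not the Parcels implementation."""

    def __init__(self, ds):
        self.ds = ds
        self._refresh()

    def _refresh(self):
        # Rebuild the dict so that it points at the arrays of the current Dataset.
        self.data = {name: self.ds[name].data for name in self.ds.data_vars}

    def remove_indices(self, indices):
        n = self.ds.sizes["trajectory"]
        keep = np.setdiff1d(np.arange(n), indices)
        # Indexing returns a new Dataset, so the dict has to be refreshed afterwards.
        self.ds = self.ds.isel(trajectory=keep)
        self._refresh()


npart = 10
pdata = DualParticleData(
    xr.Dataset({"lon": ("trajectory", np.linspace(0, 1, npart, dtype=np.float32))})
)
pdata.remove_indices([0])
lengths = (len(pdata.ds.lon), len(pdata.data["lon"]))
```

The catch is that every code path that replaces the Dataset (deleting particles, adding two ParticleSets, and so on) has to remember to call the refresh step, which is where the risk of errors comes from.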


erikvansebille · 2025-07-21T12:12:29Z

I just added the test above to the unit test suite, so that CI fails and we don't accidentally merge this PR.

erikvansebille · 2025-07-22T12:55:13Z

Closing this PR, as the combination of a dict-of-numpy-arrays and an xarray Dataset does not seem robust enough (see #2097 (comment)).

Adding both a datstruct and a dict for the particledata

86e0b54

Following @VeckoTheGecko's suggestion at Parcels-code/parcels-benchmarks#1 (comment)

github-project-automation bot added this to Parcels development Jul 21, 2025

github-project-automation bot moved this to Backlog in Parcels development Jul 21, 2025

Adding one extra assert to check dataset and dict equivalence

486765d

erikvansebille mentioned this pull request Jul 21, 2025

Basic benchmark script for a very simple kernel loop Parcels-code/parcels-benchmarks#1

Draft

Fixing bug in attrgetr for dataset

80d9749

Adding failing deletion test

e70b9f9

Adding @VeckoTheGecko's failing test showing that the dict-of-numpys does not track the xarray dataset anymore after a deletion

erikvansebille closed this Jul 22, 2025

github-project-automation bot moved this from Backlog to Done in Parcels development Jul 22, 2025

erikvansebille deleted the particledata_as_datastruct_and_dict branch January 21, 2026 07:11

erikvansebille wants to merge 4 commits into particledata_as_dict from particledata_as_datastruct_and_dict