Feature/zarr#675
Conversation
|
Tests are failing because zarr version 3 requires Python version 3.11 onward |
|
Hi @Oisin-M, thank you very much for this development. I had a quick look at the code and it seems to me that the "zarr" source is not really needed. |
|
For datasets on the local filesystem, yes. But the source can also access URLs and other locations supported through xarray. The url source doesn't handle zarrs properly at the moment because they are directories and can't be cached easily. We have to look into the dataset first to know which files are on the server. And I'd argue that most of the time we wouldn't want to eagerly download and cache the dataset anyway, because the on-demand, chunk-based access that xarray supports for remote zarrs is really useful. We can create an exception in the url source, but other sources then need that too. The question "is this location a zarr dataset?" is not easy to answer by the zarr reader's detection and will depend on the source. Starting with In some aspects, zarr is an API (see, e.g., kerchunk and virtualizarr, as far as I understand these) and so matches the source paradigm of ek-data as well as the reader. |
@chpolste, thank you for the explanation. I agree, this should be a source, just like e.g. "opendap" . |
sandorkertesz
left a comment
There was a problem hiding this comment.
Thank you for adding this new source! Once it is in sync with the latest develop it can be merged.
|
@sandorkertesz synced with develop now. Just to flag
If you think it's better, can remove from the |
|
@Oisin-M unfortunately we cannot afford disabling all the tests due to this dependency. So I suggest you remove the zarr dependency from |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #675 +/- ##
===========================================
- Coverage 90.98% 90.92% -0.06%
===========================================
Files 167 168 +1
Lines 12663 12679 +16
Branches 613 614 +1
===========================================
+ Hits 11521 11529 +8
- Misses 960 967 +7
- Partials 182 183 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Hi @sandorkertesz, thanks for taking a look. I have now made zarr optional and also made the zarr test automatically skip if it can't find the zarr package, so tests are all passing |
|
@Oisin-M , I have a question about the source name. What was the reason of choosing "xarray_zarr" and not simply "zarr"? E.g. the "opendap" source is also using xarray to open the data but it does not have "xarray" in the name. |
|
Hi @sandorkertesz, this was to leave room for having a reader based on the zarr API directly |
|
@Oisin-M, for simplicity, I suggest that the name should be just "zarr". If you want to add a zarr API based reader in the future it could be controlled by adding a key e.g. "reader" to "from_source". E.g. from_source("zarr", ..., reader="xarray")
from_source("zarr", ..., reader="zarr")What do you think? |
|
Hi @sandorkertesz, I'd be happy to move towards that as long as we return objects with the same methods for both readers |
|
Well that is the catch. Not sure what |
Implements a simple
zarrsource for reading zarr files (only version 2 at the moment)