Add lazy loading option for the FDB source by sandorkertesz · Pull Request #677 · ecmwf/earthkit-data

sandorkertesz · 2025-04-11T20:02:40Z

This PR implements lazy loading for GRIB data read from FDB. The idea is that when lazy=True is used:

from earthkit.data import from_source

request = {
    "class": "od",
    "expver": "0001",
    "stream": "oper",
    "date": [20240603, 20240604],
    "time": [0, 1200],
    "domain": "g",
    "type": "fc",
    "levtype": "pl",
    "levelist": [500, 700],
    "step": [0, 6],
    "param": [130, 157],
}

# `config` is assumed to be defined elsewhere
ds = from_source("fdb", request, config=config, stream=False, lazy=True)

from_source does not execute the retrieval but returns a Virtual FieldList. This involves the following steps:

The request for each available field is determined using pyfdb.list().
One Virtual Field is created for each request. These fields form a Virtual FieldList.
The first field is retrieved as a reference, while the other fields only contain their requests.
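
The three steps above can be sketched in plain Python. This is only an illustration under stated assumptions: `VirtualField`, `build_virtual_fieldlist` and `fake_retrieve` are hypothetical names, not the earthkit-data implementation, and the per-field requests are hard-coded where the PR obtains them from pyfdb.list().

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VirtualField:
    """Hypothetical stand-in for a virtual field: it holds only its request."""

    request: dict
    reference: Optional[dict] = None  # metadata of the single retrieved field


def build_virtual_fieldlist(field_requests, retrieve):
    """Create one VirtualField per request; retrieve only the first field."""
    if not field_requests:
        return []
    # Only the first request is actually retrieved, to serve as a reference.
    reference = retrieve(field_requests[0])
    return [VirtualField(request=r, reference=reference) for r in field_requests]


# Toy retrieval function (an assumption) that records what was fetched.
fetched = []


def fake_retrieve(request):
    fetched.append(request)
    return {"gridType": "regular_ll", **request}


# Hard-coded here; the PR determines these with pyfdb.list().
field_requests = [
    {"param": 130, "levelist": 500, "step": 0},
    {"param": 130, "levelist": 700, "step": 0},
    {"param": 157, "levelist": 500, "step": 0},
]

fields = build_virtual_fieldlist(field_requests, fake_retrieve)
print(len(fields), len(fetched))  # 3 1
```

Three virtual fields are created, but only one retrieval is made.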

Metadata queries on the Virtual FieldList or its fields do not trigger any data retrieval; they are answered from the requests where possible. For metadata keys not present in the request, the metadata from the reference field is used.
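
A minimal sketch of this lookup order, assuming the request and the reference metadata are both plain dictionaries. `lookup_metadata` is a hypothetical helper for illustration and not part of earthkit-data.

```python
def lookup_metadata(key, request, reference_metadata):
    """Return a metadata value without retrieving any data."""
    if key in request:
        # Keys that are part of the request are answered from the request.
        return request[key]
    # Any other key is taken from the single reference field.
    return reference_metadata[key]


# Request of one virtual field and the metadata of the reference field.
request = {"param": 157, "levelist": 700, "step": 6}
reference_metadata = {
    "param": 130,
    "levelist": 500,
    "step": 0,
    "gridType": "regular_ll",
}

print(lookup_metadata("levelist", request, reference_metadata))  # 700
print(lookup_metadata("gridType", request, reference_metadata))  # regular_ll
```

The fallback is only meaningful when all fields share the reference field's values for the keys that are not in the request, such as the grid description.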

# No retrieval is needed for these calls
ds1 = ds.sel(param="t", levelist=500)
ds[1].metadata("param")
xr_ds = ds.to_xarray()

Any call requiring the data values of a field triggers the actual retrieval, and the resulting GRIB field is loaded into memory for that virtual field.

# This will trigger the data retrieval if data is not yet retrieved
ds[1].to_numpy()
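
The on-demand behaviour can be illustrated with a small self-contained sketch. `LazyField` and `fake_retrieve` are hypothetical, and `values()` stands in for a value accessor such as `to_numpy()`. As described in the Notes, the retrieved message is not cached, so every value access retrieves again.

```python
class LazyField:
    """Hypothetical virtual field that fetches its data only on demand."""

    def __init__(self, request, retrieve):
        self.request = request
        self._retrieve = retrieve  # callable performing the actual retrieval

    def metadata(self, key):
        # Answered from the request: no retrieval happens here.
        return self.request[key]

    def values(self):
        # The retrieved message lives only inside this call; it is not cached.
        message = self._retrieve(self.request)
        return message["values"]


# Toy retrieval function (an assumption) that counts how often it is called.
calls = []


def fake_retrieve(request):
    calls.append(request)
    return {"values": [1.0, 2.0, 3.0]}


f = LazyField({"param": 130, "levelist": 500}, fake_retrieve)
f.metadata("param")
print(len(calls))  # 0
f.values()
f.values()
print(len(calls))  # 2
```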

Notes

Retrievals are performed per field; grouping several fields into one retrieval may be more efficient.
Each Virtual Field retrieves its corresponding GRIB field into memory temporarily when the values are accessed. This GRIB field is not cached and is deleted when it goes out of scope (as soon as the values are returned from the Virtual Field). This behaviour should be made configurable, with options to cache the field in memory or on disk.
We could determine the individual field requests by splitting the original request instead of using pyfdb.list().
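
Splitting the original request, as suggested in the last note, could look roughly like the sketch below. `split_request` is a hypothetical helper, not an earthkit-data or pyfdb function; it expands every list-valued key into a Cartesian product of single-valued requests.

```python
from itertools import product


def split_request(request):
    """Expand a multi-valued request into one request per field."""
    keys = list(request)
    # Wrap scalar values so that every key contributes a list of choices.
    choices = [v if isinstance(v, (list, tuple)) else [v] for v in request.values()]
    return [dict(zip(keys, combo)) for combo in product(*choices)]


request = {
    "class": "od",
    "expver": "0001",
    "stream": "oper",
    "date": [20240603, 20240604],
    "time": [0, 1200],
    "domain": "g",
    "type": "fc",
    "levtype": "pl",
    "levelist": [500, 700],
    "step": [0, 6],
    "param": [130, 157],
}

field_requests = split_request(request)
print(len(field_requests))  # 32
```

The example request expands to 2 × 2 × 2 × 2 × 2 = 32 single-field requests. Unlike a listing, a plain split cannot tell which of these fields are actually available, so it would presumably need some handling of missing fields.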

codecov-commenter · 2025-04-11T20:11:20Z

Codecov Report

❌ Patch coverage is 32.50000% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.82%. Comparing base (4c937d9) to head (9c5a256).
⚠️ Report is 352 commits behind head on develop.

Files with missing lines	Patch %	Lines
tests/lazy/test_lazy_fdb.py	32.50%	27 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop     #677      +/-   ##
===========================================
- Coverage    91.01%   90.82%   -0.19%     
===========================================
  Files          162      163       +1     
  Lines        12391    12431      +40     
  Branches       609      609              
===========================================
+ Hits         11278    11291      +13     
- Misses         932      959      +27     
  Partials       181      181

sandorkertesz added 3 commits April 7, 2025 13:34

Virtual field

54ac5be

Merge branch 'develop' into feature/virtual-field

7e91efe

Implement virtual fields

779fe61

sandorkertesz added 8 commits April 22, 2025 11:22

Add test

36ed2cb

Merge branch 'develop' into feature/virtual-field

9839fa0

Merge branch 'develop' into feature/virtual-field

f446a2c

Merge branch 'develop' into feature/virtual-field

c006ed1

Merge branch 'develop' into feature/virtual-field

19ddf7e

Merge branch 'develop' into feature/virtual-field

587d225

Implement virtual fields

0792b71

Implement virtual fields

9c5a256

sandorkertesz changed the title ~~WIP: virtual fields~~ Virtual fields for FDB May 6, 2025

sandorkertesz marked this pull request as ready for review May 6, 2025 17:01

sandorkertesz merged commit 4a3a919 into develop May 6, 2025
124 of 126 checks passed

sandorkertesz deleted the feature/virtual-field branch May 6, 2025 18:06

sandorkertesz changed the title ~~Virtual fields for FDB~~ Add lazy loading option for the FDB source May 6, 2025

sandorkertesz mentioned this pull request May 14, 2025

feature: experimental gribjump source #689

Merged

3 tasks

sandorkertesz merged 11 commits into develop from feature/virtual-field
