Add lazy loading option for the FDB source #677

Merged: sandorkertesz merged 11 commits into develop from feature/virtual-field on May 6, 2025

sandorkertesz (Collaborator) commented Apr 11, 2025

This PR implements lazy loading for GRIB data read from FDB. The idea is that when lazy=True is used:

request = {
    "class": "od",
    "expver": "0001",
    "stream": "oper",
    "date": [20240603, 20240604],
    "time": [0, 1200],
    "domain": "g",
    "type": "fc",
    "levtype": "pl",
    "levelist": [500, 700],
    "step": [0, 6],
    "param": [130, 157],
}

ds = from_source("fdb", request, config=config, stream=False, lazy=True)

from_source does not execute the retrieval but returns a Virtual FieldList. This involves the following steps:

  • The request for each available field is determined using pyfdb.list()
  • One Virtual Field is created for each request. These fields form a Virtual FieldList
  • The first field is retrieved as a reference, while the other fields only contain their requests
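The steps above can be sketched roughly as follows. This is only an illustration and not the code in this PR: `VirtualField`, `build_virtual_fieldlist` and `fake_retrieve` are hypothetical names, and the list of per-field requests is passed in directly instead of being obtained from pyfdb.list().

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Optional


@dataclass
class VirtualField:
    """Hypothetical stand-in: a field that knows its request but holds no data."""

    request: Dict[str, Any]
    reference: Optional[Any] = None  # reference field shared by all virtual fields


def build_virtual_fieldlist(
    requests: Iterable[Dict[str, Any]],
    retrieve: Callable[[Dict[str, Any]], Any],
) -> List[VirtualField]:
    """Create one VirtualField per request; only the first request is retrieved."""
    requests = list(requests)
    if not requests:
        return []
    reference = retrieve(requests[0])  # the single eager retrieval
    return [VirtualField(request=r, reference=reference) for r in requests]


# Fake retrieval that records how often it is called (assumption: stands in for FDB)
retrieved = []


def fake_retrieve(request):
    retrieved.append(request)
    return {"gridType": "regular_ll"}


per_field_requests = [
    {"param": 130, "levelist": 500},
    {"param": 130, "levelist": 700},
    {"param": 157, "levelist": 500},
]
fields = build_virtual_fieldlist(per_field_requests, fake_retrieve)
print(len(fields), len(retrieved))  # 3 1
```

Three virtual fields are created, but the retrieval function runs only once, for the reference field.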

Metadata queries on the Virtual FieldList or its fields do not trigger any data retrieval; they are answered from the requests instead. For metadata keys not present in the request, the metadata of the reference field is used.

# No retrieval is needed for these calls
ds1 = ds.sel(param="t", levelist=500)
ds[1].metadata("param")
xr_ds = ds.to_xarray()
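The lookup order described above (request first, reference field second) could look something like this. It is a minimal sketch with a hypothetical helper, `virtual_metadata`, and plain dictionaries standing in for the request and the reference field's metadata. It does not cover the mapping between parameter IDs such as 130 and short names such as "t".

```python
def virtual_metadata(key, request, reference_metadata):
    """Return a metadata value without touching the field data.

    Keys present in the field's own request win; everything else falls
    back to the metadata of the (already retrieved) reference field.
    """
    if key in request:
        return request[key]
    return reference_metadata[key]


# Hypothetical request of one virtual field and metadata of the reference field
request = {"param": 130, "levelist": 500, "step": 6}
reference_metadata = {"param": 157, "levelist": 700, "step": 0, "gridType": "regular_ll"}

print(virtual_metadata("param", request, reference_metadata))     # 130
print(virtual_metadata("gridType", request, reference_metadata))  # regular_ll
```

The fallback is only valid for keys that are the same for every field, which is presumably why the reference field is used just for keys missing from the request.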

Any call that requires the data values of a field triggers the actual retrieval, and the resulting GRIB field is loaded into memory for that virtual field.

# This will trigger the data retrieval if data is not yet retrieved
ds[1].to_numpy()
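A rough sketch of retrieval on access, assuming a hypothetical `LazyField` class whose `values()` method plays the role of to_numpy(). The `cache` flag is not part of this PR; it only illustrates the kind of option mentioned in the notes below.

```python
class LazyField:
    """Hypothetical virtual field: stores a request and a retrieval callable."""

    def __init__(self, request, retrieve, cache=False):
        self.request = request
        self._retrieve = retrieve  # callable: request -> list of values
        self._cache = cache
        self._values = None

    def values(self):
        """Retrieve the data on access; keep it only when caching is enabled."""
        if self._values is not None:
            return self._values
        values = self._retrieve(self.request)
        if self._cache:
            self._values = values
        return values


calls = []


def fake_retrieve(request):
    # Stands in for the FDB retrieval and records each call
    calls.append(request)
    return [1.0, 2.0, 3.0]


field = LazyField({"param": 130}, fake_retrieve)
print(len(calls))  # 0: creating the field retrieves nothing
field.values()
field.values()
print(len(calls))  # 2: without caching, every access retrieves again

cached = LazyField({"param": 130}, fake_retrieve, cache=True)
cached.values()
cached.values()
print(len(calls))  # 3: with caching, only the first access retrieves
```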

Notes

  • retrievals are performed per field, although grouping several fields into one retrieval may be more efficient
  • each Virtual Field retrieves its corresponding GRIB field into memory temporarily when the values are accessed. This GRIB field is not cached and gets deleted when going out of scope (as soon as the values are returned from the Virtual Field). This behaviour should be made configurable, with options to cache the field in memory or on disk.
  • we could determine the individual field requests by splitting the original request instead of using pyfdb.list()
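The last note, splitting the original request rather than listing the FDB, could be sketched as a Cartesian product over the multi-valued keys. `split_request` is a hypothetical helper. Unlike pyfdb.list(), this approach assumes every combination actually exists in the FDB, which may not hold.

```python
from itertools import product


def split_request(request):
    """Expand a multi-valued request into one single-valued request per field."""
    keys = list(request)
    # Treat scalar values as one-element lists so product() handles them uniformly
    value_lists = [v if isinstance(v, (list, tuple)) else [v] for v in request.values()]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


request = {
    "class": "od",
    "expver": "0001",
    "stream": "oper",
    "date": [20240603, 20240604],
    "time": [0, 1200],
    "domain": "g",
    "type": "fc",
    "levtype": "pl",
    "levelist": [500, 700],
    "step": [0, 6],
    "param": [130, 157],
}

field_requests = split_request(request)
print(len(field_requests))  # 32 (five keys with two values each)
print(field_requests[0]["date"], field_requests[0]["param"])  # 20240603 130
```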

codecov-commenter commented Apr 11, 2025

Codecov Report

❌ Patch coverage is 32.50000% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.82%. Comparing base (4c937d9) to head (9c5a256).
⚠️ Report is 352 commits behind head on develop.

Files with missing lines        Patch %   Lines
tests/lazy/test_lazy_fdb.py     32.50%    27 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #677      +/-   ##
===========================================
- Coverage    91.01%   90.82%   -0.19%     
===========================================
  Files          162      163       +1     
  Lines        12391    12431      +40     
  Branches       609      609              
===========================================
+ Hits         11278    11291      +13     
- Misses         932      959      +27     
  Partials       181      181              

sandorkertesz changed the title from "WIP: virtual fields" to "Virtual fields for FDB" on May 6, 2025
sandorkertesz marked this pull request as ready for review on May 6, 2025, 17:01
sandorkertesz merged commit 4a3a919 into develop on May 6, 2025
124 of 126 checks passed
sandorkertesz deleted the feature/virtual-field branch on May 6, 2025, 18:06
sandorkertesz changed the title from "Virtual fields for FDB" to "Add lazy loading option for the FDB source" on May 6, 2025