Pre-slice the input datasets spatially in the concurrent Dataset for efficiency by dfulu · Pull Request #410 · openclimatefix/ocf-data-sampler

dfulu · 2026-03-31T13:07:17Z

Pull Request

Description

The PVNetConcurrentDataset class loads the entire spatial crop before slicing out windows for each location. This is fine when the spatial extent of the input dataset roughly matches the spread of the locations. However, I've recently gained a use case where the input dataset is spatially much larger than the spread of locations, e.g. the dataset covers Europe but I'm only interested in the UK.

This PR adds a spatial slice to the __init__() of PVNetConcurrentDataset so that the input datasets are reduced to cover only the spatial area needed to create the window slices around all the locations. This reduces the amount of unnecessary data loaded for each sample when the input datasets are much wider than required.

Also:

Removed references to padding, since padding itself was removed a while ago.
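
The pre-slice described above can be illustrated with a minimal NumPy sketch. This is not the code in this PR: `nearest_index`, `preslice_bounds`, `window`, `half_width_px` and `buffer_px` are hypothetical names, and the sketch assumes 1-D coordinate arrays, nearest-pixel centring, and locations that sit far enough from the array edges for a full window to fit. The one-pixel `buffer_px` margin is likewise an assumption made for illustration.

```python
import numpy as np


def nearest_index(coords, values):
    """Index of the nearest coordinate for each value (hypothetical helper)."""
    coords = np.asarray(coords, dtype=float)
    values = np.atleast_1d(np.asarray(values, dtype=float))
    return np.abs(coords[None, :] - values[:, None]).argmin(axis=1)


def preslice_bounds(coords, loc_values, half_width_px, buffer_px=1):
    """Index range [start, stop) along one axis covering every location's window."""
    idx = nearest_index(coords, loc_values)
    pad = half_width_px + buffer_px  # window half-width plus an assumed safety margin
    start = max(int(idx.min()) - pad, 0)
    stop = min(int(idx.max()) + pad, len(coords))
    return start, stop


def window(data, x_coords, y_coords, x, y, half_width_px):
    """Square window of side 2 * half_width_px centred on the nearest pixel."""
    ix = int(nearest_index(x_coords, x)[0])
    iy = int(nearest_index(y_coords, y)[0])
    h = half_width_px
    return data[iy - h : iy + h, ix - h : ix + h]


# Toy "wide" dataset: 100 rows (y) by 200 columns (x)
x_coords = np.arange(200.0)
y_coords = np.arange(100.0)
data = np.arange(100 * 200, dtype=float).reshape(100, 200)

# Locations clustered in a small part of the domain
loc_x = [50.2, 60.7, 55.0]
loc_y = [20.4, 30.9, 25.5]
half_width_px = 4

# Pre-slice once, up front, to the area the windows can ever touch
x0, x1 = preslice_bounds(x_coords, loc_x, half_width_px)
y0, y1 = preslice_bounds(y_coords, loc_y, half_width_px)
small = data[y0:y1, x0:x1]
small_x, small_y = x_coords[x0:x1], y_coords[y0:y1]

# Per-location windows taken from the pre-sliced data match those from the full data
for x, y in zip(loc_x, loc_y):
    full_win = window(data, x_coords, y_coords, x, y, half_width_px)
    small_win = window(small, small_x, small_y, x, y, half_width_px)
    assert np.array_equal(full_win, small_win)
```

In this toy setup, each sample only ever needs the much smaller `small` array, while every per-location window is unchanged.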

How Has This Been Tested?

I've run some checks locally and the PVNetConcurrentDataset class is already tested in:

tests/torch_datasets/test_pvnet_dataset.py::test_pvnet_concurrent_dataset

Checklist:

My code follows OCF's coding style guidelines
I have performed a self-review of my own code
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
I have checked my code and corrected any misspellings

ocf_data_sampler/select/select_spatial_slice.py

ocf_data_sampler/torch_datasets/utils/spatial_slice_for_dataset.py

felix-e-h-p

Looking all good, just a couple of comments.

Also, what about a test that fails without the buffer via select_spatial_slice_pixels_multiple?

dfulu · 2026-04-02T09:20:58Z

Thanks for the review @felix-e-h-p

> Also, what about a test that fails without the buffer via select_spatial_slice_pixels_multiple?

I'm not sure I understand. Do you mean we'd use select_spatial_slice_pixels_multiple() with a window size other than 2 so the buffer isn't applied?

felix-e-h-p · 2026-04-02T09:59:15Z

> Thanks for the review @felix-e-h-p
>
> > Also, what about a test that fails without the buffer via select_spatial_slice_pixels_multiple?
>
> I'm not sure I understand. Do you mean we'd use select_spatial_slice_pixels_multiple() with a window size other than 2 so the buffer isn't applied?

Ah was thinking more a test that uses window size of 2 and just shows that removing the buffer would cause a failure.

dfulu · 2026-04-02T10:35:20Z

> Ah was thinking more a test that uses window size of 2 and just shows that removing the buffer would cause a failure.

To do that we'd probably need to make the buffer configurable, and then we'd simply be testing that the version where the buffer isn't the default value is incorrect. That doesn't feel like best practice to me

dfulu added 3 commits March 31, 2026 12:59

Pre-slice the input datasets spatially in the concurrent Dataset for …

aa98e3b

…efficiency

lint

29d1553

lint

2f5073f

dfulu requested a review from felix-e-h-p April 1, 2026 14:34

felix-e-h-p reviewed Apr 1, 2026

ocf_data_sampler/select/select_spatial_slice.py (outdated, resolved)

felix-e-h-p reviewed Apr 1, 2026

ocf_data_sampler/torch_datasets/utils/spatial_slice_for_dataset.py (resolved)

felix-e-h-p requested changes Apr 1, 2026

Remove reference to padding

3f417d6

felix-e-h-p approved these changes Apr 2, 2026

dfulu merged commit ac62fca into main Apr 2, 2026
6 checks passed

dfulu deleted the concurrent_preslice branch April 2, 2026 13:03

dfulu merged 4 commits into main from concurrent_preslice
