Skip to content

Pre-slice the input datasets spatially in the concurrent Dataset for efficiency#410

Merged
dfulu merged 4 commits intomainfrom
concurrent_preslice
Apr 2, 2026
Merged

Pre-slice the input datasets spatially in the concurrent Dataset for efficiency#410
dfulu merged 4 commits intomainfrom
concurrent_preslice

Conversation

@dfulu
Copy link
Copy Markdown
Member

@dfulu dfulu commented Mar 31, 2026

Pull Request

Description

The PVNetConcurrentDatasetclass loads the entire spatial crop before slicing out windows for each location. This is fine when the input dataset roughly matches the spatial extent of the spread of location. However, I've recently gained a use-case where the input dataset is spatially much larger than the spread of locations. e.g. the dataset covers europe but I'm only interested in the UK.

This PR adds a spatial slice into the __init__() of PVNetConcurrentDataset so that the input datasets are reduced cover only the spatial area needed to create the window slices around all the locations. This reduces the amount of unnecessary data loaded for each sample when the input datasets are much wider than required.

Also:

  • Removed references to padding since that was removed a while ago

How Has This Been Tested?

I've run some checks locally and the PVNetConcurrentDataset class is already tested in:

tests/torch_datasets/test_pvnet_dataset.py::test_pvnet_concurrent_dataset

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@dfulu dfulu requested a review from felix-e-h-p April 1, 2026 14:34
Copy link
Copy Markdown
Contributor

@felix-e-h-p felix-e-h-p left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking all good, just a couple of comments.

Also, what about a test that fails without the buffer via select_spatial_slice_pixels_multiple?

@dfulu
Copy link
Copy Markdown
Member Author

dfulu commented Apr 2, 2026

Thanks for the review @felix-e-h-p

Also, what about a test that fails without the buffer via select_spatial_slice_pixels_multiple?

I'm not sure I understand. Do you mean we'd use select_spatial_slice_pixels_multiple() with a window size other than 2 so the buffer isn't applied?

@felix-e-h-p
Copy link
Copy Markdown
Contributor

Thanks for the review @felix-e-h-p

Also, what about a test that fails without the buffer via select_spatial_slice_pixels_multiple?

I'm not sure I understand. Do you mean we'd use select_spatial_slice_pixels_multiple() with a window size other than 2 so the buffer isn't applied?

Ah was thinking more a test that uses window size of 2 and just shows that removing the buffer would cause a failure.

@dfulu
Copy link
Copy Markdown
Member Author

dfulu commented Apr 2, 2026

Ah was thinking more a test that uses window size of 2 and just shows that removing the buffer would cause a failure.

To do that we'd probably need to make the buffer configurable, and then we'd simply be testing that the version where the buffer isn't the default value is incorrect. That doesn't feel like best practice to me

@dfulu dfulu merged commit ac62fca into main Apr 2, 2026
6 checks passed
@dfulu dfulu deleted the concurrent_preslice branch April 2, 2026 13:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants