Pre-slice the input datasets spatially in the concurrent Dataset for efficiency #410
felix-e-h-p
left a comment
Looking all good, just a couple of comments.
Also, what about a test that fails without the buffer via select_spatial_slice_pixels_multiple?
Thanks for the review @felix-e-h-p
I'm not sure I understand. Do you mean we'd use

Ah, I was thinking more of a test that uses a window size of 2 and just shows that removing the buffer would cause a failure.

To do that we'd probably need to make the buffer configurable, and then we'd simply be testing that the version where the buffer isn't the default value is incorrect. That doesn't feel like best practice to me.
Pull Request
Description
The `PVNetConcurrentDataset` class loads the entire spatial crop before slicing out windows for each location. This is fine when the input dataset roughly matches the spatial extent of the spread of locations. However, I've recently gained a use case where the input dataset is spatially much larger than the spread of locations, e.g. the dataset covers Europe but I'm only interested in the UK.

This PR adds a spatial slice into the `__init__()` of `PVNetConcurrentDataset` so that the input datasets are reduced to cover only the spatial area needed to create the window slices around all the locations. This reduces the amount of unnecessary data loaded for each sample when the input datasets are much wider than required.

Also:
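
For readers who want a concrete picture of the pre-slicing idea, here is a minimal, library-free sketch. It is not the code in this PR: the names `PixelBounds`, `bounding_pixel_slice` and `crop`, the pixel-index representation of locations, and the default buffer of one pixel are all assumptions made for illustration. It computes a single bounding box that contains the window around every location, plus a small safety margin, so the large grid only needs to be cropped once up front.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelBounds:
    """Half-open pixel index ranges [x_start, x_stop) and [y_start, y_stop)."""
    x_start: int
    x_stop: int
    y_start: int
    y_stop: int


def bounding_pixel_slice(centres, window_size, grid_shape, buffer=1):
    """Return one bounding box covering the windows around all centres.

    centres:     list of (x_index, y_index) pixel centres (hypothetical input)
    window_size: (width_x, width_y) of each per-location window, in pixels
    grid_shape:  (n_x, n_y) size of the full input grid
    buffer:      extra pixels kept on every side as a safety margin (assumed value)
    """
    n_x, n_y = grid_shape
    # Half-widths, rounded down, plus the safety margin.
    half_x = window_size[0] // 2 + buffer
    half_y = window_size[1] // 2 + buffer
    xs = [x for x, _ in centres]
    ys = [y for _, y in centres]
    # Clamp to the grid so the bounds are always valid slice indices.
    return PixelBounds(
        x_start=max(min(xs) - half_x, 0),
        x_stop=min(max(xs) + half_x + 1, n_x),
        y_start=max(min(ys) - half_y, 0),
        y_stop=min(max(ys) + half_y + 1, n_y),
    )


def crop(grid, bounds):
    """Crop a grid stored as a list of rows (y) of columns (x)."""
    rows = grid[bounds.y_start:bounds.y_stop]
    return [row[bounds.x_start:bounds.x_stop] for row in rows]


# A 100 x 100 grid where each cell records its own (x, y) index.
grid = [[(x, y) for x in range(100)] for y in range(100)]
bounds = bounding_pixel_slice(
    centres=[(40, 50), (45, 55)], window_size=(4, 4), grid_shape=(100, 100)
)
small = crop(grid, bounds)  # only this reduced grid is windowed per location
```

Per-location windows would then be cut from `small` rather than from the full grid, with each centre index shifted by `bounds.x_start` and `bounds.y_start`.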
How Has This Been Tested?
I've run some checks locally and the `PVNetConcurrentDataset` class is already tested in:

Checklist: