Skip to content

Add optional clip#406

Merged
dfulu merged 5 commits intomainfrom
add_optional_clip
Mar 26, 2026
Merged

Add optional clip#406
dfulu merged 5 commits intomainfrom
add_optional_clip

Conversation

@dfulu
Copy link
Copy Markdown
Member

@dfulu dfulu commented Mar 23, 2026

Pull Request

Description

This PR introduces an optional clip for the NWP and satellite data which is controlled from the data config. The clip is applied before the data is standardised so the clip values are more physically interpretable. Clipping before applying the normalisation means the clip range is not independent of the mean and std values used.

The clip is optional and backwards compatible since it defaults to no clip.

Also in this PR (sorry):

  • Updates to allow this to work with the new satellite dataset format
  • Remove an unused config file in the tests

How Has This Been Tested?

I've run the data-sampler locally to make sure the results look sensible. Basically, I set up a config and whilst looking at ECMWF temperature I made sure the maxes and mins of the samples were as expected when I included/excluded the clipping

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

"time": "time_utc",
},
)
for old_name, new_name in list(rename_dict.items()):
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit is to allow for the new satellite data where the channel dimension is already called channel

ds = ds.transpose("time_utc", "channel", "x_geostationary", "y_geostationary")

data_array = get_xr_data_array_from_xr_dataset(ds)
da = get_xr_data_array_from_xr_dataset(ds)
Copy link
Copy Markdown
Member Author

@dfulu dfulu Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed data_array --> da to make the style here more consistent

channel_mins = self.clip_min_dict["sat"]
channel_maxs = self.clip_max_dict["sat"]
dataset_dict["sat"].data = (
(dataset_dict["sat"].data.clip(channel_mins, channel_maxs) - channel_means)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This applies a different clip value per channel

@felix-e-h-p felix-e-h-p self-requested a review March 24, 2026 14:59
Copy link
Copy Markdown
Contributor

@felix-e-h-p felix-e-h-p left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great - noticed something that may potentially be a silent issue RE clip_min/max for NWP and Sat.

@dfulu dfulu requested a review from felix-e-h-p March 26, 2026 10:00

means_list.append(norm_conf.mean)
stds_list.append(norm_conf.std)
clip_min_list.append(-np.inf if norm_conf.clip_min is None else norm_conf.clip_min)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cheers!

@dfulu dfulu merged commit 3b4377f into main Mar 26, 2026
6 checks passed
@dfulu dfulu deleted the add_optional_clip branch March 26, 2026 10:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants