Clean up Data handeling methods in adf_dataset.py and fix AMWG table formatting#427
Clean up Data handeling methods in adf_dataset.py and fix AMWG table formatting#427justin-richling wants to merge 8 commits into
adf_dataset.py and fix AMWG table formatting#427Conversation
Remove the dependency of the baseline case name since there will only be one baseline
This will now grab any scale factors and new units for time series files
Now if these methods are called it will apply the scale factors/offset/units and time bounds fixes (for time series)
Make `new_unit` not Tex formatting (mostly for AMWG table viewing) but move it to mpl args for plotting
|
Hi @justin-richling -- Here's another analysis thanks to Claude. Looks like this one needs a little more attention, but the proposed changes are good. PR #427 Complete Analysis & ReviewPR: Clean up Data handling methods in Executive SummaryPR #427 is a significant refactoring that improves data loading architecture and fixes AMWG table formatting issues. The work reorganizes how test and reference data are loaded, creating consistent wrapper methods for different data types. However, the PR is stale (6+ months old), marked as draft, has merge conflicts with the current main branch, and needs updates to pass CI. 1. MERGE CONFLICT ANALYSISConflict Found: YES ❌File with Conflicts: Location: Lines 164-186 Nature of Conflict:
Root Cause: Conflict Block: # PR Version adds:
def load_climo_dataset(self, case, variablename):
"""Return Dataset for test climo of field"""
fils = self.get_climo_file(case, variablename)
if not fils:
warnings.warn(...)
return None
return self.load_dataset(fils)
# vs. Main simplified to:
fils = sorted(model_cl_loc.glob(...))
return filsSeverity: Medium - Resolvable but requires careful consideration of design intent 2. CHANGES OVERVIEWlib/adf_dataset.py (+112, -104)Purpose: Refactor data loading methods for consistency and clarity Key Changes:
Assessment: ✅ Good refactoring - Creates consistent API, reduces code duplication, improves maintainability lib/adf_variable_defaults.yaml (+59, -482)Purpose: Configuration cleanup and modernization Changes:
Assessment: ✅ Beneficial cleanup - Removes dead code, reduces config file size, modernizes unit notation scripts/analysis/amwg_table.py (+7, -5)Changes:
Assessment: ✅ Proper adaptation - Uses new data loading interface consistently Plotting Scripts (adf_histogram.py, aod_latlon.py, global_latlon_map.py, polar_map.py, zonal_mean.py)Changes: Removed Assessment: ✅ Consistent API - Reflects the API cleanup in adf_dataset.py 3. RELATED ISSUESCloses Issues:
Note: Current status of these issues should be verified 4. CODE QUALITY ASSESSMENTStrengths ✅
Weaknesses
|
| File | Type | Risk Level | Notes |
|---|---|---|---|
| lib/adf_dataset.py | Core | 🟡 Medium | Has merge conflict, needs reconciliation with recent fixes |
| lib/adf_variable_defaults.yaml | Config | 🟢 Low | Straightforward cleanup, no conflicts |
| scripts/analysis/amwg_table.py | Script | 🟢 Low | Simple API update |
| adf_histogram.py | Script | 🟢 Low | Simple API update |
| aod_latlon.py | Script | 🟢 Low | Simple API update |
| global_latlon_map.py | Script | 🟢 Low | Simple API update |
| polar_map.py | Script | 🟢 Low | Simple API update |
| zonal_mean.py | Script | 🟢 Low | Simple API update |
This PR will clean up and organize the calls to read files, datasets, and data arrays depending on the file type, ie time series, climo, regridded climo.
The main changes here are to:
adf_dataset.pyto gather new unit infoupdate_unitand call that inload_da. This could be useful as an external method, similarly toget_value_convertersamwg_table.pyto implement these changesImportant changes:
adf_dataset.py. That was the issue with the tables being improperly formatted, see AMWG tables are not getting proper data cleaning #423Now you can load data sets and arrays generically any where in the ADF by calling
load_dataset(fils)and/orload_da(<list of files>, <variable name>, <case name>)or through the three file types:
load_reference_timeseries_dataset/load_timeseries_datasetload_reference_climo_dataset/load_climo_datasetload_reference_regrid_dataset/load_regrid_datasetload_<file type>_datasetwill inherently callload_datasetandload_<file type>_dawill inherently callload_dacalling
load_datasethas a flag for time series files so that it can check the time bounds to ensure proper time alignment. This is only an issue for using older CAM/CESM files but this won't modify anything if the newer files are used.This change will now force the ADF to figure out the modifications at the loading data array stage
Potential issues:
user must make sure of the scale/factor, units if using any premade file; using files that aren't created via ADF from the history files level.
Remedy: The user just has to make sure there are no values for these arguments in the config yaml file if their datasets have been already modified, or use even more generic xarray load via
load_datasetfromadf_utilsNot likely but a definite issue for edge case scenarios.
Closes #423 and #424