
Enable zppy E3SM Diagnostics workflow for EAMxx climo filenames#830

Draft
zhangshixuan1987 wants to merge 1 commit into
mainfrom
zppy-e3smdiag


@zhangshixuan1987
Collaborator

Add optional climo and climo_diurnal filename keys so E3SM diagnostics can use raw climatology files whose filename prefix differs from the active case or reference-case name.

Update the generated e3sm_diags script to distinguish the source file prefix from the desired local link prefix. The workflow now matches source files using clim_fkey or clim_diurnal_fkey, then renames the local symlinks to the expected case-based or reference-case-based prefix for downstream diagnostics.

When these filename keys are not provided, they fall back to the case or reference name, preserving the previous behavior.
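The match-then-rename step described above can be sketched roughly as follows. This is only an illustration, not the code in the generated e3sm_diags script: the function name `link_with_prefix`, its parameters, and the assumption that climatology files are named `<prefix>_*.nc` are all hypothetical.

```python
from pathlib import Path


def link_with_prefix(src_dir, dst_dir, source_prefix, link_prefix):
    """Symlink files named '<source_prefix>_*.nc' from src_dir into dst_dir.

    Each link name swaps source_prefix for link_prefix, so downstream code
    can keep looking for '<link_prefix>_*.nc'. Hypothetical helper.
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    created = []
    for src in sorted(Path(src_dir).iterdir()):
        name = src.name
        # Match on the source file prefix, not on the case name.
        if not (name.startswith(source_prefix + "_") and name.endswith(".nc")):
            continue
        # Keep everything after the prefix (season, years, suffix) unchanged.
        link = dst / (link_prefix + name[len(source_prefix):])
        if link.is_symlink() or link.exists():
            link.unlink()
        link.symlink_to(src.resolve())
        created.append(link.name)
    return created
```

With `source_prefix="1ma_ne30pg2"` and `link_prefix` set to the case name, a file such as `1ma_ne30pg2_ANN_199501_200412_climo.nc` would be linked locally under the case-based name. Passing the same value for both prefixes reproduces the previous behavior.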

Summary

Objectives:

  • Add optional clim_fkey and clim_diurnal_fkey configuration options for climatology and diurnal climatology files.
  • Add corresponding reference-case filename-key handling for model-vs-model diagnostics.
  • Allow E3SM Diagnostics to use raw climo and climo_diurnal files whose filename prefix differs from case or ref_name.
  • Rename local symlinks to the expected case-based or reference-case-based prefix so downstream diagnostics logic remains unchanged.
  • Preserve previous behavior when the new filename keys are unset, empty, or identical to the case/reference name.
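The last objective (unset, empty, or identical keys keep the old behavior) amounts to a small fallback rule. A minimal sketch, assuming the key arrives as a string or `None`; `resolve_fkey` and `needs_rename` are hypothetical names, not functions from this pull request:

```python
def resolve_fkey(fkey, fallback_name):
    """Return the prefix used to match source files.

    None, empty, or whitespace-only keys fall back to fallback_name
    (the case name, or ref_name for the reference side).
    """
    if fkey is None:
        return fallback_name
    fkey = str(fkey).strip()
    return fkey if fkey else fallback_name


def needs_rename(fkey, fallback_name):
    """True only when the resolved key differs from the expected prefix."""
    return resolve_fkey(fkey, fallback_name) != fallback_name
```

For example, `resolve_fkey("", "my_case")` gives `"my_case"`, so no renaming happens, while `resolve_fkey("1ma_ne30pg2", "my_case")` gives `"1ma_ne30pg2"` and the local links need the case-based prefix.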

Issue resolution:

  • Closes #<ISSUE_NUMBER_HERE>

Select one: This pull request is...

  • a bug fix: increment the patch version
  • a small improvement: increment the minor version
  • a new feature: increment the minor version
  • an incompatible (non-backwards compatible) API change: increment the major version

Please fill out either the "Small Change" or "Big Change" section (the latter includes the numbered subsections), and delete the other.

Small Change

  • To merge, I will use "Squash and merge". That is, this change should be a single commit.
  • Logic: I have visually inspected the entire pull request myself.
  • Pre-commit checks: All the pre-commit checks have passed.

Big Change

  • To merge, I will use "Create a merge commit". That is, this change is large enough to require multiple units of work (i.e., it should be multiple commits).

1. Does this do what we want it to do?

Required:

  • Product Management: I have confirmed with the stakeholders that the objectives above are correct and complete.
  • Testing: I have added or modified at least one "min-case" configuration file to test this change. Every objective above is represented in at least one cfg.
  • Testing: I have considered likely and/or severe edge cases and have included them in testing.

If applicable:

  • Testing: this pull request introduces an important feature or bug fix that we must test often. I have updated the weekly-test configuration files, not just a "min-case" one.
  • Testing: this pull request adds at least one new possible parameter to the cfg. I have tested using this parameter with and without any other parameter that may interact with it.

2. Are the implementation details accurate & efficient?

Required:

  • Logic: I have visually inspected the entire pull request myself.
  • Logic: I have left GitHub comments highlighting important pieces of code logic. I have had these code blocks reviewed by at least one other team member.

If applicable:

  • Dependencies: This pull request introduces a new dependency. I have discussed this requirement with at least one other team member. The dependency is noted in zppy/conda, not just an import statement.

3. Is this well documented?

Required:

  • Documentation: by looking at the docs, a new user could easily understand the functionality introduced by this pull request.

4. Is this code clean?

Required:

  • Readability: The code is as simple as possible and well-commented, such that a new team member could understand what's happening.
  • Pre-commit checks: All the pre-commit checks have passed.

If applicable:

  • Software architecture: I have discussed relevant trade-offs in design decisions with at least one other team member. It is unlikely that this pull request will increase tech debt.

@zhangshixuan1987
Collaborator Author

zhangshixuan1987 commented May 15, 2026

Configuration file used for the test:


# Directions to run:
# 1. Update <output>, <www>, <environment_commands_secondary> below.
# 2. Run with `zppy -c examples/post.v3.LR.amip.0101.cfg`.
# Directions to create stand-alone test data for zppy-interfaces:
# 3. Once the jobs finish, `cd <output>/post/scripts`.
# 4. Run `grep -n "Running a zi-pcmdi command" pcmdi_diags*.o*` to find the pcmdi_diags commands.
# 5. Then, you can run those lines stand-alone.
[default]
input = /pscratch/sd/z/zhan391/e3smv4_project/ne256pg2_ne256pg2.F20TR-SCREAMv1.July-1.spanc800.2xauto.acc150.n0032.test2.1
output = /pscratch/sd/z/zhan391/e3smv4_project/ne256pg2_ne256pg2.F20TR-SCREAMv1.July-1.spanc800.2xauto.acc150.n0032.test2.1
case = ne256pg2_ne256pg2.F20TR-SCREAMv1.July-1.spanc800.2xauto.acc150.n0032.test2.1
www = /global/cfs/cdirs/e3sm/www/zhan391/eamxx-pcmdi

partition = "debug"
account = "e3sm"
#account = "priority"
campaign = "water_cycle"
debug = False
environment_commands = "source /global/common/software/e3sm/anaconda_envs/load_latest_e3sm_unified_pm-cpu.sh"

[climo]
active = True
walltime = "2:00:00"
years = "1995:2004:10",
# Another example of `years`:
# years = "1985:2014:30", "1985:2014:15"

  [[ atm_monthly_180x360_aave ]]
  # The following e3sm_diags sets require it:
  # "lat_lon", "zonal_mean_xy", "zonal_mean_2d", "polar", "cosp_histogram", "meridional_mean_2d", "annual_cycle_zonal_mean", "zonal_mean_2d_stratosphere", "aerosol_aeronet", "aerosol_budget"
  input_component = "eamxx"
  #cmip_plevdata = "/lcrc/group/e3sm/diagnostics/e3sm_to_cmip_data/grids/vrt_remap_plev19.nc"
  case = "1ma_ne30pg2"
  input_files = "AVERAGE.nmonths_x1"
  frequency = "monthly"
  #input_subdir = "archive/atm/hist"
  input_subdir = "run/"
  mapping_file = /global/cfs/cdirs/e3sm/diagnostics/maps/map_ne30pg2_to_cmip6_180x360_aave.20200201.nc


  [[ atm_monthly_diurnal_8xdaily_180x360_aave ]]
  # The following e3sm_diags sets require it:
  # "diurnal_cycle"
  input_component = "eamxx"
  #cmip_plevdata = "/lcrc/group/e3sm/diagnostics/e3sm_to_cmip_data/grids/vrt_remap_plev19.nc"
  case = "3ha_ne30pg2"
  input_files = "AVERAGE.nhours_x3"
  #input_subdir = "archive/atm/hist"
  input_subdir = "run/"
  frequency = "diurnal_8xdaily"
  mapping_file = /global/cfs/cdirs/e3sm/diagnostics/maps/map_ne30pg2_to_cmip6_180x360_aave.20200201.nc
  vars = "precip_liq_surf_mass_flux,precip_ice_surf_mass_flux"

  [[ land_monthly_climo ]]
  active = True
  # This subtask is a dependency for the e3sm_diags task's lnd_monthly_mvm_lnd subtask.
  # The following e3sm_diags sets require it:
  # "lat_lon_land",
  input_component = "elm"
  # Note: if `case` is not specified here, the default from [default] is used.
  frequency = "monthly"
  input_files = "elm.h0"
  #input_subdir = archive/lnd/hist
  input_subdir = "run/"
  vars = "" # Setting this as "" will tell zppy to use ALL variables

  [[ land_monthly_180x360_traave ]]
  active = True
  input_component = "elm"
  # Note: if `case` is not specified here, the default from [default] is used.
  frequency = "monthly"
  input_files = "elm.h0"
  #input_subdir = "archive/lnd/hist"
  input_subdir = "run/"
  mapping_file = "/global/cfs/cdirs/e3sm/diagnostics/maps/map_ne256pg2_to_cmip6_180x360_traave.20250301.nc"
  vars = ""
  
[ts]
active = True
walltime = "00:10:00"
years = "1995:2004:5"
ts_num_years=5

  [[ atm_2d_monthly_180x360_aave ]]
  active = True
  input_component = "eamxx"
  #input_subdir = "archive/atm/hist"
  input_subdir = "run/"
  case = "1ma_ne30pg2"
  input_files = "AVERAGE.nmonths_x1"
  frequency = "monthly"
  mapping_file = /global/cfs/cdirs/e3sm/diagnostics/maps/map_ne30pg2_to_cmip6_180x360_aave.20200201.nc
  # list for pcmdi diag, note: PHIS,hyam,hybm,hyai,hybi need to be included to process the 3D fields
  vars="ps,surf_radiative_T,SeaLevelPressure,IceWaterPath,qv_2m,precip_liq_surf_mass_flux,precip_ice_surf_mass_flux,omega_at_500hPa,omega_at_700hPa,omega_at_850hPa,T_mid_at_700hPa,T_2m,surface_upward_latent_heat_flux,surf_sens_flux,z_mid_at_700hPa,wind_speed_10m,surf_evap,U_at_10m_above_surface,V_at_10m_above_surface,LW_clrsky_flux_dn_at_model_bot,LW_clrsky_flux_up_at_model_top,LW_flux_dn_at_model_bot,LW_flux_up_at_model_bot,LW_flux_up_at_model_top,SW_clrsky_flux_dn_at_model_bot,SW_clrsky_flux_dn_at_model_top,SW_clrsky_flux_up_at_model_bot,SW_clrsky_flux_up_at_model_top,SW_flux_dn_at_model_bot,SW_flux_dn_at_model_top,SW_flux_up_at_model_bot,SW_flux_up_at_model_top,ShortwaveCloudForcing,LongwaveCloudForcing,isccp_cldtot"
  extra_vars= "area,landfrac,ocnfrac"


  [[ atm_3d_monthly_180x360_aave ]]
  active = True
  input_component = "eamxx"
  #input_subdir = "archive/atm/hist"
  input_subdir = "run/"
  case = "1ma_ne30pg2"
  input_files = "AVERAGE.nmonths_x1"
  frequency = "monthly"
  mapping_file = /global/cfs/cdirs/e3sm/diagnostics/maps/map_ne30pg2_to_cmip6_180x360_aave.20200201.nc
  # list for pcmdi diag, note: PHIS,hyam,hybm,hyai,hybi need to be included to process the 3D fields
  vars="U,V,T_mid,z_mid,omega,RelativeHumidity,p_mid,qv"
  extra_vars= "ps,hyai,hyam,hybi,hybm,area,landfrac,ocnfrac"


  [[ atm_daily_180x360_aave ]]
  active = True
  # This subtask is a dependency for the e3sm_diags task's atm_monthly_180x360 subtask.
  # The following e3sm_diags sets require it:
  # "tropical_subseasonal", "precip_pdf"
  input_component = "eamxx"
  case = "1da_ne30pg2"
  input_files = "AVERAGE.ndays_x1"
  frequency = "daily"
  #input_subdir = "archive/atm/hist"
  input_subdir = "run/"
  mapping_file = /global/cfs/cdirs/e3sm/diagnostics/maps/map_ne30pg2_to_cmip6_180x360_aave.20200201.nc
  # Needed for Wheeler Kiladis
  vars = "LW_flux_up_at_model_top,precip_liq_surf_mass_flux,precip_ice_surf_mass_flux,U_at_850hPa"


  [[ atm_monthly_glb ]]
  active = True
  # This subtask is a dependency for the global_time_series task.
  input_component = "eam"
  #input_subdir = "archive/atm/hist"
  input_subdir = "run/"
  case = "1ma_ne30pg2"
  input_files = "AVERAGE.nmonths_x1"
  frequency = "monthly"
  mapping_file = "glb"
  vars="ps,surf_radiative_T,SeaLevelPressure,IceWaterPath,qv_2m,precip_liq_surf_mass_flux,precip_ice_surf_mass_flux"
  #vars="omega_at_500hPa,omega_at_700hPa,omega_at_850hPa,T_mid_at_700hPa,T_2m,surface_upward_latent_heat_flux,surf_sens_flux,z_mid_at_700hPa,wind_speed_10m,surf_evap,U_at_10m_above_surface,V_at_10m_above_surface,LW_clrsky_flux_dn_at_model_bot,LW_clrsky_flux_up_at_model_top,LW_flux_dn_at_model_bot,LW_flux_up_at_model_bot,LW_flux_up_at_model_top,SW_clrsky_flux_dn_at_model_bot,SW_clrsky_flux_dn_at_model_top,SW_clrsky_flux_up_at_model_bot,SW_clrsky_flux_up_at_model_top,SW_flux_dn_at_model_bot,SW_flux_dn_at_model_top,SW_flux_up_at_model_bot,SW_flux_up_at_model_top,ShortwaveCloudForcing,LongwaveCloudForcing,isccp_cldtot"

  [[ land_monthly ]]
  active = True
  # This subtask is a dependency for the e3sm_to_cmip task's land_monthly subtask.
  input_component = "elm"
  frequency = "monthly"
  input_files = "elm.h0"
  #input_subdir = "archive/lnd/hist"
  input_subdir = "run/"
  mapping_file = "/global/cfs/cdirs/e3sm/diagnostics/maps/map_ne256pg2_to_cmip6_180x360_traave.20250301.nc"
  # Variables:
  #vars = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILICE,SOILLIQ,SOILWATER_10CM,TSA,TSOI,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"
  vars = "SOILWATER_10CM"
  extra_vars = "landfrac"


  [[ lnd_monthly_glb ]]
  active = True
  # This subtask is a dependency for the global_time_series task.
  input_component = "elm"
  frequency = "monthly"
  input_files = "elm.h0"
  #input_subdir = "archive/lnd/hist"
  input_subdir = "run/"
  mapping_file = "glb"
  job_nbr = 50 # This reduces parallel processes in ncclimo time-series splitting for memory management.
  #vars = "" # This will tell zppy to use all available variables.
  vars = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO"


  [[ land_monthly_energy ]]
  active = True
  input_component = "elm"
  frequency = "monthly"
  input_files = "elm.h0"
  #input_subdir = "archive/lnd/hist"
  input_subdir = "run/"
  mapping_file = ""
  vars = "EFLX_LH_TOT,FIRA,FLDS,FSA,FSDS,FSRND,FSRVD,FSDSND,FSDSVD,FSH,TSA"

  [[ rof_monthly ]]
  active = True
  # The following e3sm_diags sets require it:
  # "streamflow"
  input_component = "mosart"
  frequency = "monthly"
  input_files = "mosart.h0"
  #input_subdir = "archive/lnd/hist"
  input_subdir = "run/"
  mapping_file = ""
  # Variables:
  vars = "RIVER_DISCHARGE_OVER_LAND_LIQ"
  extra_vars = 'areatotal2'
  

[e3sm_to_cmip]
active = True
frequency = "monthly"
ts_grid = "180x360_aave"
ts_num_years=5
walltime = "00:10:00"
years = "1995:2004:5"
environment_commands = "source /global/common/software/e3sm/anaconda_envs/load_latest_e3sm_unified_pm-cpu.sh; conda activate zi-pcmdi-diags"

  [[ atm_2d_monthly_180x360_aave ]]
  input_component = "eamxx"
  #cmip_plevdata = "/lcrc/group/e3sm/diagnostics/e3sm_to_cmip_data/grids/vrt_remap_plev19.nc"
  case = "1ma_ne30pg2"
  input_files = "AVERAGE.nmonths_x1"
  ts_subsection = "atm_2d_monthly_180x360_aave"
  vars="ps,surf_radiative_T,SeaLevelPressure,IceWaterPath,qv_2m,precip_liq_surf_mass_flux,precip_ice_surf_mass_flux,omega_at_500hPa,omega_at_700hPa,omega_at_850hPa,T_mid_at_700hPa,T_2m,surface_upward_latent_heat_flux,surf_sens_flux,z_mid_at_700hPa,wind_speed_10m,surf_evap,U_at_10m_above_surface,V_at_10m_above_surface,LW_clrsky_flux_dn_at_model_bot,LW_clrsky_flux_up_at_model_top,LW_flux_dn_at_model_bot,LW_flux_up_at_model_bot,LW_flux_up_at_model_top,SW_clrsky_flux_dn_at_model_bot,SW_clrsky_flux_dn_at_model_top,SW_clrsky_flux_up_at_model_bot,SW_clrsky_flux_up_at_model_top,SW_flux_dn_at_model_bot,SW_flux_dn_at_model_top,SW_flux_up_at_model_bot,SW_flux_up_at_model_top,ShortwaveCloudForcing,LongwaveCloudForcing,isccp_cldtot"
  cmip_vars = "pr,cltisccp,evspsbl,hfls,hfss,huss,ps,psl,rlds,rldscs,rlus,rlut,rlutcs,rsds,rsdscs,rsdt,rsus,rsuscs,rtmt,uas,vas,sfcWind,tas,ts"
  
  
  [[ atm_3d_monthly_180x360_aave ]]
  input_component = "eamxx"
  #cmip_plevdata = "/lcrc/group/e3sm/diagnostics/e3sm_to_cmip_data/grids/vrt_remap_plev19.nc"
  case = "1ma_ne30pg2"
  input_files = "AVERAGE.nmonths_x1"
  interp_vars = "U,V,T_mid,z_mid,omega,RelativeHumidity,p_mid,qv"
  ts_subsection = "atm_3d_monthly_180x360_aave"
  vars="U,V,T_mid,z_mid,omega,RelativeHumidity,p_mid,qv"
  cmip_vars = "ta,ua,va,zg"


  [[ land_monthly ]]
  active = True
  # This subtask is a dependency for the ilamb task.
  # This subtask depends on the ts task's land_monthly subtask.
  # Notice this subtask name matches a subtask in the `ts` task.
  # If it did not, then the `ts_land_subsection` parameter would be required here to tell zppy which subtask to use.
  ts_grid = "180x360_traave"
  input_component = "elm"
  ts_land_subsection = "land_monthly"
  frequency = "monthly"
  input_files = "elm.h0"
  cmip_vars = "mrsos"
  
[tc_analysis]
active = True
walltime = "02:00:00" # Example elapsed time: 3296 seconds (55 minutes)
years = "1995:2004:10",
input_component = "eamxx"
case = "6ha_ne30pg2"
input_files = "AVERAGE.nhours_x6"
#input_subdir = "archive/atm/hist"
input_subdir = "run/"
ts_grid = "ne30pg2"
# Note: Users must provide the variable list in the fixed sequence required by
# this TempestExtremes cyclone-detection workflow:
# SLP,T@200hPa,T@500hPa,U@model_bottom,V@model_bottom,U@850hPa,V@850hPa
# For example:
# EAM: tc_vars="PSL,T200,T500,UBOT,VBOT,U850,V850"
# EAMxx: tc_vars="SeaLevelPressure,T_mid_at_200hPa,T_mid_at_500hPa,U_at_model_bot,V_at_model_bot,U_at_850hPa,V_at_850hPa"
tc_vars = "SeaLevelPressure,T_mid_at_200hPa,T_mid_at_500hPa,U_at_model_bot,V_at_model_bot,U_at_850hPa,V_at_850hPa"


[e3sm_diags]
active = True
multiprocessing = True
num_workers = 8
ref_final_yr = 2004
ref_start_yr = 1995
ts_num_years = 5
walltime = "4:00:00"
years = "1995:2004:10",
environment_commands = "source /global/common/software/e3sm/anaconda_envs/load_latest_e3sm_unified_pm-cpu.sh"

  [[ atm_monthly_180x360_aave ]]
  # `e3sm_diags` is largely driven by which e3sm_diags sets are requested:
  climo_subsection = "atm_monthly_180x360_aave"
  climo_diurnal_subsection = "atm_monthly_diurnal_8xdaily_180x360_aave"
  climo_diurnal_frequency = "diurnal_8xdaily"
  ts_subsection = "atm_2d_monthly_180x360_aave"
  ts_daily_subsection = "atm_daily_180x360_aave"
  grid = '180x360_aave'
  clim_fkey = "1ma_ne30pg2"
  clim_diurnal_fkey = "3ha_ne30pg2"
  #sets="lat_lon","zonal_mean_xy","zonal_mean_2d","polar","cosp_histogram","meridional_mean_2d","annual_cycle_zonal_mean","enso_diags","qbo","diurnal_cycle","zonal_mean_2d_stratosphere","aerosol_aeronet","mp_partition","tropical_subseasonal","precip_pdf","tc_analysis","streamflow",
  sets="lat_lon","zonal_mean_xy","zonal_mean_2d","zonal_mean_2d_stratosphere","polar","diurnal_cycle","meridional_mean_2d","annual_cycle_zonal_mean","tropical_subseasonal","precip_pdf"
  short_name = 'e3sm.amip.EAMXX.test2_1'


  [[ lnd_monthly_mvm_lnd ]]
  # Depends on the climo task's land_monthly_climo subtask.
  sets = "lat_lon_land",
  climo_subsection = "land_monthly_climo"
  # Other parameters:
  diff_title = "Difference"
  grid = 'native'
  # The reference_data_path should point to pre-computed climatology files from an ncclimo/zppy run
  reference_data_path = "/pscratch/sd/z/zhan391/e3smv4_project/20250906.wcycl1850.ne120pg2_r025_RRSwISC6to18E3r5.test6.1.chrysalis/post/lnd/native/clim"
  ref_name = "20250906.wcycl1850.ne120pg2_r025_RRSwISC6to18E3r5.test6.1.chrysalis"
  ref_final_yr = 105
  ref_start_yr = 96
  ref_years = "96-105",
  run_type = "model_vs_model"
  short_name = "e3sm.amip.EAMXX.test2_1"
  short_ref_name = "v3.HR.piControl-test6.1"
  swap_test_ref = False
  tag = "model_vs_model"
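The `tc_analysis` note in the configuration above says `tc_vars` must follow a fixed sequence. One way to catch an incomplete list early is to pair each entry with its expected role, sketched below. `TC_FIELD_ORDER` and `map_tc_vars` are hypothetical names used for illustration; the check only counts entries and cannot tell whether two variables were swapped.

```python
# Roles in the order given by the tc_analysis comment in the cfg.
TC_FIELD_ORDER = (
    "SLP",
    "T@200hPa",
    "T@500hPa",
    "U@model_bottom",
    "V@model_bottom",
    "U@850hPa",
    "V@850hPa",
)


def map_tc_vars(tc_vars):
    """Pair each comma-separated variable name with its positional role."""
    names = [v.strip() for v in tc_vars.split(",")]
    if len(names) != len(TC_FIELD_ORDER) or not all(names):
        raise ValueError(
            f"tc_vars needs {len(TC_FIELD_ORDER)} non-empty names "
            f"in the order {', '.join(TC_FIELD_ORDER)}; got {names}"
        )
    return dict(zip(TC_FIELD_ORDER, names))
```

Called with the EAMxx list from the cfg, this maps `SLP` to `SeaLevelPressure` and `V@850hPa` to `V_at_850hPa`; a shorter list raises `ValueError`.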


Note that the workflow here also depends on the following pull requests:

  1. Enable EAMxx CMORization, E3SM Diags and add ts-level vertical regrid #827
  2. Improve TC analysis workflow robustness for both EAM and EAMxx #828

TEST directory on Perlmutter:

  • /pscratch/sd/z/zhan391/e3smv4_project/ne256pg2_ne256pg2.F20TR-SCREAMv1.July-1.spanc800.2xauto.acc150.n0032.test2.1/post/scripts

E3SM-DIAG webpage:
