Add Viewer support#9
Conversation
forsyth2
left a comment
There was a problem hiding this comment.
@xylar I'm trying to port over E3SM-Project/zppy#616 to post-refactored code. It's actually an excellent example of why it was a good idea to pull the code out into a separate package in the first place: there are three packages that my editor is showing as unfound, meaning pre-refactor I must have just been picking them up from Unified.
Is every Unified dependency included in https://github.com/E3SM-Project/e3sm-unified/blob/main/recipes/e3sm-unified/meta.yaml? I'm wondering if some of these packages are inside others. (I see output_viewer but not bs4 or distutils).
| @@ -1,7 +1,10 @@ | |||
| # Script to plot some global atmosphere and ocean time series | |||
| import csv | |||
| import distutils.dir_util | |||
There was a problem hiding this comment.
Could not be resolved
There was a problem hiding this comment.
You should not be using distutils anymore. Try to find a python 3.13 compatible alternative to this function.
There was a problem hiding this comment.
There was a problem hiding this comment.
I've asked @altheaden if she would be willing to look for the preferred replacement for this function. Hopefully, she can come up with something.
| from output_viewer.build import build_page, build_viewer | ||
| from output_viewer.index import ( | ||
| OutputFile, | ||
| OutputGroup, | ||
| OutputIndex, | ||
| OutputPage, | ||
| OutputRow, | ||
| ) | ||
| from output_viewer.utils import rechmod |
There was a problem hiding this comment.
output_viewer could not be resolved.
No, that only shows direct dependencies. There are also dependencies-of-dependencies and so on. There isn't a great way to know what those are or what their constraints are. |
|
Is bs4 a different implementation of beautiful_soup? If so, what a mess! |
|
|
Glad that this work is proving its value! |
forsyth2
left a comment
There was a problem hiding this comment.
I now have a viewer index pointing to both the atm and lnd viewers! (Images below).
However, I have 5 remaining issues I need some help on (described in this review's comments).
- @chengzhuzhang -- Issues 2,3
- @tomvothecoder -- Issues 1,2,4,5
- @xylar -- Issue 4 (and maybe some of the others too)
Thanks!
| # since the action is run on a branch in detached head state. | ||
| # This is the equivalent of running "pre-commit run --all-files" locally. | ||
| # If you commit with the `--no-verify` flag, this check may fail. | ||
| # TODO: this doesn't seem to run when I run `git commit` locally. I always have to run `pre-commit run --all-files` manually. |
There was a problem hiding this comment.
Issue 1: When I run git commit it doesn't actually do the pre-commit checks. I don't get the passed messages. If I run pre-commit run --all-files afterward, there are in fact changes made, which I then need to amend in to the commit if I didn't think to run pre-commit run --all-files before committing. I'm not sure why it's not working.
There was a problem hiding this comment.
@tomvothecoder Not sure if you've had a chance to look at this yet, but I'm not quite sure how to fix this.
There was a problem hiding this comment.
Did you run pre-commit install beforehand?
There was a problem hiding this comment.
Ah, that fixes this, thanks!
| header = True | ||
| # TODO: how do we make sure the csv is actually accessible???? | ||
| # The current directory is where we ran the code from, which is not necessarily where the csv is. | ||
| csv_path = INCLUSIONS_DIR |
There was a problem hiding this comment.
Issue 4a: See Issue 4 comment.
| annual_average_dataset_for_var: xarray.core.dataset.Dataset | ||
| if metric == Metric.AVERAGE: | ||
| annual_average_dataset_for_var: xarray.core.dataset.Dataset = ( | ||
| self.f.temporal.group_average(var, "year") | ||
| annual_average_dataset_for_var = self.f.temporal.group_average( | ||
| var, "year" | ||
| ) | ||
| data_array = annual_average_dataset_for_var.data_vars[var] | ||
| elif metric == Metric.TOTAL: | ||
| annual_average_dataset_for_var = self.f.temporal.group_average( | ||
| var, "year" | ||
| ) | ||
| data_array = annual_average_dataset_for_var.data_vars[var] | ||
| # import pprint | ||
| # pprint.pprint( | ||
| # f"annual_average_dataset_for_var attributes={annual_average_dataset_for_var.attrs}" | ||
| # ) | ||
| # pprint.pprint(f"data_array attributes={data_array.attrs}") | ||
| # data_array *= area*landfrac | ||
| # TODO: Determine how to get area and landfrac |
There was a problem hiding this comment.
Issue 2: I know we're aiming for data_array *= area*landfrac for the Metric.TOTAL calculation, but I'm still a little confused about how to extract those. Something like the following?
self.f.temporal.group_average(var, "area")
self.f.temporal.group_average(var, "landfrac")
But those are known scalars, not computed averages, right?
There was a problem hiding this comment.
Use xarray to extract area or landfrac -- specify variable with [] for indexing. In annual_average_dataset_for_var.data_vars[var], instead of var use "area" or "landfrac"
| for rgn in parameters.regions: | ||
| run(parameters, requested_variables, rgn) | ||
| plots_per_page = parameters.nrows * parameters.ncols | ||
| # TODO: Is this how we want to determine when to make a viewer or should we have a `make_viewer` parameter in the cfg? |
There was a problem hiding this comment.
Issue 3: this is a design decision question. I guess it would be best to have a specific make_viewer parameter rather than relying on a user specifying they want 1 row and 1 column in the parameters. It's possible a user would want single plot images, but not a Viewer. (Although, the inverse doesn't work; we need 1x1 images to use in the Viewer). Once #3 is implemented too, that will make parameter specification less clunky.
There was a problem hiding this comment.
Upon further thought, I think I should just go ahead and add the make_viewer parameter. It's clearer to understand, and allows for the case of wanting single plot images but not a Viewer.
| @@ -0,0 +1,42 @@ | |||
| from enum import Enum | |||
|
|
|||
| # TODO: how to determine this automatically? | |||
There was a problem hiding this comment.
Issue 4: There are two cases where we need to access non-python files. (See the Issue 4a and 4b comments). The code can't seem to access these files without me hard-coding a path.
There was a problem hiding this comment.
@forsyth2, I think you need a manifest.in.
https://github.com/E3SM-Project/polaris/blob/a44d4b8b308067287c007d2b17176d827fa39c9d/MANIFEST.in
@altheaden and I worked hard to find an alternative with luck.
There was a problem hiding this comment.
Interesting, good to know, thanks!
| # import sys | ||
| # logger.debug(f"sys.prefix: {sys.prefix}, ls sys.prefix: {os.listdir(sys.prefix)}") | ||
| # TODO: figure out install_path | ||
| install_path: str = INCLUSIONS_DIR | ||
| path: str = os.path.join(install_path, "index_template.html") |
There was a problem hiding this comment.
Issue 4b: See Issue 4 comment.
E3SM Diags uses sys.prefix (INSTALL_PATH = os.path.join(sys.prefix, "share/e3sm_diags/") in e3sm_diags/__init__.py, but there's no zppy-interfaces in my {sys.prefix}/share dir)
| @@ -0,0 +1,126 @@ | |||
| import os | |||
There was a problem hiding this comment.
@zhangshixuan1987 For E3SM-Project/zppy#647, this Viewer file may be useful for generating Viewers for PCMDI Diags. (This will be in the main branch of zppy-interfaces after this PR merges).
There was a problem hiding this comment.
@xylar I tried adding MANIFEST.in in the latest commit (b9d16ca), but I'm still getting FileNotFoundError: [Errno 2] No such file or directory.
I just need to do the pip install . step, right? Or does this need to be loaded into conda? https://github.com/E3SM-Project/polaris/pull/64/files appears to literally just add MANIFEST.in and nothing else.
| recursive-include zppy-interfaces zppy_interfaces/global_time_series/zppy_land_fields.csv | ||
| recursive-include zppy-interfaces zppy_interfaces/global_time_series/index_template.html |
There was a problem hiding this comment.
I tried zppy-interfaces, zppy_interfaces, and zi-global-time-series here.
| recursive-include zppy-interfaces zppy_interfaces/global_time_series/zppy_land_fields.csv | ||
| recursive-include zppy-interfaces zppy_interfaces/global_time_series/index_template.html |
There was a problem hiding this comment.
This command is for recursively including all files of a given type in the package:
| recursive-include zppy-interfaces zppy_interfaces/global_time_series/zppy_land_fields.csv | |
| recursive-include zppy-interfaces zppy_interfaces/global_time_series/index_template.html | |
| recursive-include zppy_interfaces *.csv | |
| recursive-include zppy_interfaces *.html |
This is not the syntax to use for including individual files, which I would not recommend. You should not have cvs or html files in the python package that you don't want to include in the conda package.
There was a problem hiding this comment.
@xylar Hmm I'm still getting the same error even with this change.
There was a problem hiding this comment.
@forsyth2, I see. I think the problem is 2-fold and I hadn't realized that. I think we have solved the first problem. Can you verify that, after calling pip install ., you can find the place where zppy-interfaces is installed and that both the csv and html file are included there along with the python files? I think that will be the case with the MANIFEST.in as we have it now.
The next problem is that you are looking for files in a location that e3sm_diags would move them to, but this is not how zppy-interfaces works. You need to use importlib-resources to find the files within the zppy-interfaces package. I will try to make the relevant suggested changes.
xylar
left a comment
There was a problem hiding this comment.
Here is how I would handle finding the csv and html files in the package. INCLUSIONS_DIR should not be needed.
| from bs4 import BeautifulSoup | ||
|
|
||
| from zppy_interfaces.global_time_series.coupled_global_utils import ( | ||
| INCLUSIONS_DIR, |
There was a problem hiding this comment.
| INCLUSIONS_DIR, |
| td.append(a) | ||
| row_obj.append(td) | ||
|
|
||
| path: str = os.path.join(INCLUSIONS_DIR, "index_template.html") |
There was a problem hiding this comment.
| path: str = os.path.join(INCLUSIONS_DIR, "index_template.html") | |
| path: str = str(imp_res.files("zppy_interfaces.global_time_series") / | |
| "index_template.html") |
| # Relies on MANIFEST.in to include files | ||
| INCLUSIONS_DIR = "zppy_interfaces/global_time_series" | ||
|
|
There was a problem hiding this comment.
| # Relies on MANIFEST.in to include files | |
| INCLUSIONS_DIR = "zppy_interfaces/global_time_series" |
|
|
||
| from zppy_interfaces.global_time_series.coupled_global_plotting import make_plot_pdfs | ||
| from zppy_interfaces.global_time_series.coupled_global_utils import ( | ||
| INCLUSIONS_DIR, |
There was a problem hiding this comment.
| INCLUSIONS_DIR, |
| def construct_land_variables(requested_vars: List[str]) -> List[Variable]: | ||
| var_list: List[Variable] = [] | ||
| header = True | ||
| with open(f"{INCLUSIONS_DIR}/zppy_land_fields.csv", newline="") as csv_file: |
There was a problem hiding this comment.
| with open(f"{INCLUSIONS_DIR}/zppy_land_fields.csv", newline="") as csv_file: | |
| path: csv_filename = str(imp_res.files("zppy_interfaces.global_time_series") / | |
| "zppy_land_fields.html") | |
| with open(csv_filename, newline="") as csv_file: |
|
@xylar Thanks, those changes work! |
|
Wonderful! Glad I could help. |
| # ['AR', 'time_bounds', 'CWDC', 'FSH', 'GPP', 'H2OSNO', 'HR', 'LAISHA', 'LAISUN', 'NBP', 'QINTR', 'QOVER', 'QRUNOFF', 'QSOIL', 'QVEGE', 'QVEGT', 'RH2M', 'SOIL1C', 'SOIL2C', 'SOIL3C', 'SOIL4C', 'SOILWATER_10CM', 'TOTLITC', 'TOTVEGC', 'TSA', 'WOOD_HARVESTC', 'lon_bnds', 'lat_bnds'] | ||
| # TODO: looks like we don't actually have area or landfrac in the dataset |
There was a problem hiding this comment.
It looked like we didn't have area or landfrac in our test input data (see the results of this logger line).
I tried adding the extra_vars = "area,landfrac" line below in the zppy cfg, to produce a new set of test input data, but zppy-interfaces is still showing that the key "area" doesn't exist: [ERROR]: coupled_global.py(set_var:184) >> "No variable named 'area'.
[[ lnd_monthly_glb ]]
extra_vars = "area,landfrac"
frequency = "monthly"
input_files = "elm.h0"
input_subdir = "archive/lnd/hist"
mapping_file = "glb"
vars = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"
years = "1985:1995:5",
These variables exist in the zppy input though:
cd /lcrc/group/e3sm2/ac.wlin/E3SMv3/v3.LR.historical_0051/archive/lnd/hist
ncdump -h v3.LR.historical_0051.elm.h0.1850-01.nc | grep "float area"
# float area(lat, lon) ;
ncdump -h v3.LR.historical_0051.elm.h0.1850-01.nc | grep "float landfrac"
# float landfrac(lat, lon) ;Maybe they need to be added to vars rather than extra_vars.
There was a problem hiding this comment.
@forsyth2 do you have a copy of the global time series files that ncclimo generated? I thought both should be fixed variables included there, if not we should try include both as extra variables.
There was a problem hiding this comment.
do you have a copy of the global time series files that ncclimo generated
Yeah, that's the input data I'm testing on.
$ cd /lcrc/group/e3sm/ac.forsyth2/zi-test-input-data/post_until_20241203/lnd/glb/ts/monthly/5yr
$ ls
AR_198501_198912.nc H2OSNO_199001_199412.nc QINTR_198501_198912.nc QVEGE_199001_199412.nc SOIL3C_198501_198912.nc TOTVEGC_199001_199412.nc
AR_199001_199412.nc HR_198501_198912.nc QINTR_199001_199412.nc QVEGT_198501_198912.nc SOIL3C_199001_199412.nc TSA_198501_198912.nc
CWDC_198501_198912.nc HR_199001_199412.nc QOVER_198501_198912.nc QVEGT_199001_199412.nc SOIL4C_198501_198912.nc TSA_199001_199412.nc
CWDC_199001_199412.nc LAISHA_198501_198912.nc QOVER_199001_199412.nc RH2M_198501_198912.nc SOIL4C_199001_199412.nc WOOD_HARVESTC_198501_198912.nc
FSH_198501_198912.nc LAISHA_199001_199412.nc QRUNOFF_198501_198912.nc RH2M_199001_199412.nc SOILWATER_10CM_198501_198912.nc WOOD_HARVESTC_199001_199412.nc
FSH_199001_199412.nc LAISUN_198501_198912.nc QRUNOFF_199001_199412.nc SOIL1C_198501_198912.nc SOILWATER_10CM_199001_199412.nc
GPP_198501_198912.nc LAISUN_199001_199412.nc QSOIL_198501_198912.nc SOIL1C_199001_199412.nc TOTLITC_198501_198912.nc
GPP_199001_199412.nc NBP_198501_198912.nc QSOIL_199001_199412.nc SOIL2C_198501_198912.nc TOTLITC_199001_199412.nc
H2OSNO_198501_198912.nc NBP_199001_199412.nc QVEGE_198501_198912.nc SOIL2C_199001_199412.nc TOTVEGC_198501_198912.nc
$ cd /lcrc/group/e3sm/ac.forsyth2/zi-test-input-data/post_20241203_area_landfrac_as_extra_vars/lnd/glb/ts/monthly/5yr
$ ls
AR_198501_198912.nc H2OSNO_199001_199412.nc QINTR_198501_198912.nc QVEGE_199001_199412.nc SOIL3C_198501_198912.nc TOTVEGC_199001_199412.nc
AR_199001_199412.nc HR_198501_198912.nc QINTR_199001_199412.nc QVEGT_198501_198912.nc SOIL3C_199001_199412.nc TSA_198501_198912.nc
CWDC_198501_198912.nc HR_199001_199412.nc QOVER_198501_198912.nc QVEGT_199001_199412.nc SOIL4C_198501_198912.nc TSA_199001_199412.nc
CWDC_199001_199412.nc LAISHA_198501_198912.nc QOVER_199001_199412.nc RH2M_198501_198912.nc SOIL4C_199001_199412.nc WOOD_HARVESTC_198501_198912.nc
FSH_198501_198912.nc LAISHA_199001_199412.nc QRUNOFF_198501_198912.nc RH2M_199001_199412.nc SOILWATER_10CM_198501_198912.nc WOOD_HARVESTC_199001_199412.nc
FSH_199001_199412.nc LAISUN_198501_198912.nc QRUNOFF_199001_199412.nc SOIL1C_198501_198912.nc SOILWATER_10CM_199001_199412.nc
GPP_198501_198912.nc LAISUN_199001_199412.nc QSOIL_198501_198912.nc SOIL1C_199001_199412.nc TOTLITC_198501_198912.nc
GPP_199001_199412.nc NBP_198501_198912.nc QSOIL_199001_199412.nc SOIL2C_198501_198912.nc TOTLITC_199001_199412.nc
H2OSNO_198501_198912.nc NBP_199001_199412.nc QVEGE_198501_198912.nc SOIL2C_199001_199412.nc TOTVEGC_198501_198912.nc
$ cd /lcrc/group/e3sm/ac.forsyth2/zi-test-input-data/post_20241203_area_landfrac_as_vars
# no lnd directory
$ cd scripts
$ grep -in "error" *.o*
#ts_lnd_monthly_glb_1985-1989-0005.o639437#:15:ncrcat: ERROR no variables fit criteria for processing
#ts_lnd_monthly_glb_1985-1989-0005.o639437#:17:ncrcat: ERROR no variables fit criteria for processing
#ts_lnd_monthly_glb_1985-1989-0005.o639437#:19:ncclimo: ERROR Failed to split. cmd_sbs[0] failed. Debug this:
ts_lnd_monthly_glb_1985-1989-0005.o639437:15:ncrcat: ERROR no variables fit criteria for processing
ts_lnd_monthly_glb_1985-1989-0005.o639437:17:ncrcat: ERROR no variables fit criteria for processing
ts_lnd_monthly_glb_1985-1989-0005.o639437:19:ncclimo: ERROR Failed to split. cmd_sbs[0] failed. Debug this:
ts_lnd_monthly_glb_1990-1994-0005.o639438:15:ncrcat: ERROR no variables fit criteria for processing
ts_lnd_monthly_glb_1990-1994-0005.o639438:17:ncrcat: ERROR no variables fit criteria for processing
ts_lnd_monthly_glb_1990-1994-0005.o639438:19:ncclimo: ERROR Failed to split. cmd_sbs[0] failed. Debug this:
There was a problem hiding this comment.
yep. You are right, neither area and landfrac in global mean datasets. I think it makes sense to try if both can be added to vars.
There was a problem hiding this comment.
But adding to vars doesn't work either -- that's that third ls block above.
From ts_lnd_monthly_glb_1985-1989-0005.o639437:
ncrcat: ERROR no variables fit criteria for processing
ncrcat: HINT Extraction list must contain a record variable to concatenate. A record variable is a variable defined with a record dimension. Often the record dimension\
, aka unlimited dimension, refers to time. To change an existing dimension from a fixed to a record dimensions see http://nco.sf.net/nco.html#mk_rec_dmn or to add a ne\
w record dimension to all variables see http://nco.sf.net/nco.html#ncecat_rnm
ncrcat: ERROR no variables fit criteria for processing
ncrcat: HINT Extraction list must contain a record variable to concatenate. A record variable is a variable defined with a record dimension. Often the record dimension\
, aka unlimited dimension, refers to time. To change an existing dimension from a fixed to a record dimensions see http://nco.sf.net/nco.html#mk_rec_dmn or to add a ne\
w record dimension to all variables see http://nco.sf.net/nco.html#ncecat_rnm
ncclimo: ERROR Failed to split. cmd_sbs[0] failed. Debug this:
That is, area, and landfrac clearly are included in the original nc files but I just can't seem to extract them either through extra_vars or vars.
There was a problem hiding this comment.
@forsyth2 thanks for testing both. It looks like extra_vars does work with generic variable splitting (e.x. land_monthly task) but not when --glb flag is on (e.x. land_monthly_global). I think there are a few possibilities for moving forward:
- include a [land_monthly] task to get both
areaandlandfrac, and use both in downstream. - based on @czender 's comment , there is a new version of
ncclimothat supports global integral scaling. To do so, we need additional logic to filter through variables in the time-series task. - we can ask @czender if it is possible to add
extra-varssupport in--glbmode, to keepareaandlandfracin the global time-series files. In the mean time, I think you could tentatively use the first approach to get values of two variables..
| logger.error(f"area.shape={area.shape}") # area.shape=(180, 360) | ||
| logger.error( | ||
| f"landfrac.shape={landfrac.shape}" | ||
| ) # landfrac.shape=(180, 360) | ||
| logger.error( | ||
| f"data_array.shape={data_array.shape}" | ||
| ) # data_array.shape=(10, 3) | ||
| # e: dimensions cannot change for in-place operations |
There was a problem hiding this comment.
@chengzhuzhang Ok, using the separate land_monthly non-global ts task (option 1 above), I can get area and landfrac, but now I'm having issues with the dimensions. I'm not sure how to take (180,360) to (10,3).
There was a problem hiding this comment.
Ah, ok I think I have a reasonable dimensionality reduction:
if metric == Metric.TOTAL:
logger.debug(
f"self.extra_var_dataset.keys()={list(self.extra_var_dataset.keys())}"
)
area: xarray.core.dataarray.DataArray = self.extra_var_dataset["area"]
landfrac: xarray.core.dataarray.DataArray = self.extra_var_dataset["landfrac"]
# area.shape() = (180, 360)
total_area = area.sum() # Sum over all dimensions
# landfrac.shape=(180, 360)
average_landfrac = landfrac.mean() # Mean over all dimensions
total_land_area = total_area * average_landfrac
# data_array.shape = (number of years, number of regions)
data_array *= total_land_area
logger.info(
"for Metric.TOTAL, data_array has been scaled by total land area"
)
There was a problem hiding this comment.
Hello @forsyth2 and @chengzhuzhang, I'm trying to understand the change(s) you would like in ncclimo. As you know, area and landfrac can be output in every default (non-spatially averaged) timeseries by requesting them with the --var_xtr=area,landfrac. However, as @forsyth2 has discovered, this behavior changes when requesting global/regional-average timeseries. Currently the spatial-mean timeseries output includes only the geophysical field of interest, no matter what was requested with --var_xtr.
I would be happy to change that behavior so that in spatial-averaged timeseries files ...
- Always contain the full (i.e., non-spatially averaged) variables requested with
--var_xtrOR - Always contain the full
areaandlandfracvariables OR - Behave as currently by default, and contain the full
areaandlandfracvariables when an additional switch is provided - Behave as currently by default, and contain the full variables requested with
--var_xtrwhen an additional switch is provided.
These changes would be relatively straightforward to implement. Keep in mind that the current behavior was implemented so that spatial-mean timeseries files are orders of magnitude smaller than their non-spatial-average counterparts. Adding full fields to spatial average timeseries will significantly inflate their size. However, area and landfrac are time-constant variables so the resulting filesizes would still be much smaller than full timeseries files if only these variables are added.
Also happy to discuss other options, like providing a sane way for zppy to provide the desired scale factors to ncclimo which would then apply them to the fields during spatial-average timeseries generation. Feedback welcome.
There was a problem hiding this comment.
@forsyth2 The "dimensionality reduction" you mention above triggers some red flags with me. In particular this line
total_land_area = total_area * average_landfrac is questionable because area and landfrac can vary indepently so I would expect the total land area to be defined by total_land_area = (area*landfrac).sum. If I am right, the former method would produce small errors that might not be noticed (e.g., 1-2%) until/unless carefully examined. Or I could be totally off base because I'm not sure how you're using the variables in the code :) In any case, you might try both methods and then compare against a known-to-be-correct answer.
There was a problem hiding this comment.
I personally would say (1) or (4) since it allows users to set whatever extra variable they want rather than hard-coding in area or landfrac.
I appear to have a workaround here working now without these changes in ncclimo, but it is a bit clunky.
There was a problem hiding this comment.
@czender I was thinking the same, to have the product area_times_landfrac would be easiest for zppy in this use case.
There was a problem hiding this comment.
and perhaps making the new additon default would be fine, cause the new variables is small.
There was a problem hiding this comment.
OK I will add a field called something likearea_times_landfrac to the output of all spatially-averaged timeseries. I'm headed to AGU on Friday and will return 12/16. Hopefully finish and release this in NCO 5.3.0 sometime that week.
There was a problem hiding this comment.
Thanks for your help, Charlie! Have a nice trip to DC.
forsyth2
left a comment
There was a problem hiding this comment.
@chengzhuzhang I think this is getting pretty close to done. I just have a few comments, and then I'll ask the Land team to review further. I also want to update the unit tests too.
Results are visible at https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zi-test-webdir/global_time_series_1985-1995_results_viewers/ (note: that's not a stable link, meaning it will update with every time I run updated code). How does that look?
Corresponding changes on the zppy side are at https://github.com/E3SM-Project/zppy/pull/648/files
| # since the action is run on a branch in detached head state. | ||
| # This is the equivalent of running "pre-commit run --all-files" locally. | ||
| # If you commit with the `--no-verify` flag, this check may fail. | ||
| # TODO: this doesn't seem to run when I run `git commit` locally. I always have to run `pre-commit run --all-files` manually. |
There was a problem hiding this comment.
@tomvothecoder Not sure if you've had a chance to look at this yet, but I'm not quite sure how to fix this.
| if "area_times_landfrac" in keys: | ||
| total_land_area = self.extra_var_dataset["area_times_landfrac"] | ||
| else: | ||
| area: xarray.core.dataarray.DataArray = self.extra_var_dataset[ | ||
| "area" | ||
| ] | ||
| landfrac: xarray.core.dataarray.DataArray = self.extra_var_dataset[ | ||
| "landfrac" | ||
| ] | ||
| # area.shape() = (180, 360) | ||
| # landfrac.shape() =(180, 360) | ||
| total_land_area = (area * landfrac).sum() # Sum over all dimensions | ||
| # data_array.shape = (number of years, number of regions) | ||
| data_array *= total_land_area |
There was a problem hiding this comment.
@chengzhuzhang This is how I'm handling the total_land_area scaling now.
There was a problem hiding this comment.
@forsyth2 it is okay to use this logic for now for demonstration. The scaling is only good for global area, but not for N./S. hemisphere, which we can update later after ncclimo includes the scaling factor.
There was a problem hiding this comment.
I think it is possible to get N_land_area and S_land_area, but sub-select with lat >= 0 or <= 0 for both area and landfrac.
There was a problem hiding this comment.
I tried updating to the following but I'm still getting some blank plots (strangely it's inconsistent if it's the glb or n or s plot that's empty).
if metric == Metric.TOTAL:
keys = list(self.extra_var_dataset.keys())
logger.debug(f"self.extra_var_dataset.keys()={keys}")
if "area_times_landfrac" in keys:
total_land_area = self.extra_var_dataset["area_times_landfrac"]
else:
area: xarray.core.dataarray.DataArray = self.extra_var_dataset[
"area"
]
landfrac: xarray.core.dataarray.DataArray = self.extra_var_dataset[
"landfrac"
]
# area.shape() = (180, 360)
# landfrac.shape() = (180, 360)
total_land_area = (area * landfrac).sum() # Sum over all dimensions
# Account for hemispheric plots:
north_area = area.where(area.lat >= 0)
south_area = area.where(area.lat <= 0)
north_landfrac = landfrac.where(landfrac.lat >= 0)
south_landfrac = landfrac.where(landfrac.lat <= 0)
north_land_area = (north_area * north_landfrac).sum()
south_land_area = (south_area * south_landfrac).sum()
# data_array.shape = (number of years, number of regions)
# We want to keep those dimensions, but with these values:
# (glb*total_land_area, n*north_land_area, s*south_land_area)
data_array[:, 0] *= total_land_area
data_array[:, 1] *= north_land_area
data_array[:, 2] *= south_land_area
| if extra_vars_directory: | ||
| # If an extra_vars_directory is provided, we'll use that to open a new dataset | ||
| self.extra_var_dataset = xcdat.open_mfdataset( | ||
| f"{extra_vars_directory}*.nc", | ||
| center_times=True, | ||
| ) | ||
| else: | ||
| # Otherwise, we'll use the same dataset. | ||
| self.extra_var_dataset = self.dataset |
There was a problem hiding this comment.
@chengzhuzhang Until ncclimo is updated to allow extra vars in glb runs, we can just specify an alternative path (i.e., to the [land_monthly] output).
This looks good! I would suggest to update |
forsyth2
left a comment
There was a problem hiding this comment.
@thorntonpe @BunnyVon (and maybe also @dmricciuto @wlin7) This is ready for review.
For background context, we recently decided to split zppy into two packages: zppy itself will do its original purpose of orchestrating workflows while this new package zppy-interfaces will handle the "last mile" stretches of glue code to plot or otherwise post-process data. (I.e. zppy should not be doing any post-processing itself, only coordinating other post-processing tools.)
I have test results at https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zi-test-webdir/global_time_series_1985-1995_results_viewers/.
How to run your own tests
I think you also wanted to run some of your own test cases before we merge this PR. To do that:
# Set up zppy-interfaces
cd ~
git clone git@github.com:E3SM-Project/zppy-interfaces.git
cd zppy-interfaces
git fetch origin issue-601-viewers
git checkout -b test_land_viewers origin/issue-601-viewers
conda clean --all --y
conda env create -f conda/dev.yml -n my_zi_env
conda activate my_zi_env
pip install .
# Set up zppy
cd ~
git clone git@github.com:E3SM-Project/zppy.git
cd zppy
git fetch origin issue-601-viewers
git checkout -b test_land_viewers origin/issue-601-viewers
conda clean --all --y
conda env create -f conda/dev.yml -n my_zppy_env
conda activate my_zppy_env
pip install .
# Create a cfg
cd ~
emacs test_land_viewers.cfg
zppy -c test_land_viewers.cfgSample cfg
Below is an example cfg that you could modify with your own input data and settings. (In particular, these parameters will likely need changing: case, input, output, www, experiment_name, figstr)
[default]
case = "v3.LR.historical_0051"
constraint = ""
dry_run = "False"
environment_commands = ""
fail_on_dependency_skip = True
guess_path_parameters = False
guess_section_parameters = False
input = /lcrc/group/e3sm2/ac.wlin/E3SMv3/v3.LR.historical_0051
input_subdir = archive/atm/hist
mapping_file = "map_ne30pg2_to_cmip6_180x360_aave.20200201.nc"
output = "/lcrc/group/e3sm/ac.forsyth2/zppy_min_case_global_time_series_comprehensive_v3_setup_only_output/unique_id/v3.LR.historical_0051"
partition = "debug"
qos = "regular"
www = "/lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_min_case_global_time_series_comprehensive_v3_setup_only_www/unique_id"
years = "1985:1989:2",
[ts]
active = True
e3sm_to_cmip_environment_commands = ""
walltime = "00:30:00"
[[ land_monthly ]]
extra_vars = "area,landfrac"
frequency = "monthly"
input_files = "elm.h0"
input_subdir = "archive/lnd/hist"
mapping_file = "map_r05_to_cmip6_180x360_aave.20231110.nc"
years = "1985:1995:5",
vars = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"
[[ atm_monthly_glb ]]
# Note global average won't work for 3D variables.
frequency = "monthly"
input_files = "eam.h0"
input_subdir = "archive/atm/hist"
mapping_file = "glb"
years = "1985:1995:5",
[[ lnd_monthly_glb ]]
frequency = "monthly"
input_files = "elm.h0"
input_subdir = "archive/lnd/hist"
mapping_file = "glb"
vars = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"
years = "1985:1995:5",
[global_time_series]
active = True
climo_years = "1985-1989", "1990-1995",
environment_commands = "source <INSERT PATH TO CONDA>/conda.sh; conda activate my_zi_env"
experiment_name = "v3.LR.historical_0051"
figstr = "v3.LR.historical_0051"
make_viewer = True
num_cols = 1
num_rows = 1
plots_lnd = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"
ts_num_years = 5
ts_years = "1985-1989", "1985-1995",
walltime = "00:30:00"
years = "1985-1995",
Reviewing code changes
I'm not sure how relevant the code changes themselves are to you, but I've marked some important areas as part of this review. Notably, the bulk of the changes are in this PR, but there are also a few changes on the zppy side: https://github.com/E3SM-Project/zppy/pull/648/files
| # Non-derived variables | ||
| annual_average_dataset_for_var: xarray.core.dataset.Dataset = ( | ||
| self.dataset.temporal.group_average(var, "year") | ||
| ) | ||
| data_array = annual_average_dataset_for_var.data_vars[var] | ||
| if metric == Metric.TOTAL: | ||
| keys = list(self.extra_var_dataset.keys()) | ||
| logger.debug(f"self.extra_var_dataset.keys()={keys}") | ||
| if "area_times_landfrac" in keys: | ||
| total_land_area = self.extra_var_dataset["area_times_landfrac"] | ||
| else: | ||
| area: xarray.core.dataarray.DataArray = self.extra_var_dataset[ | ||
| "area" | ||
| ] | ||
| landfrac: xarray.core.dataarray.DataArray = self.extra_var_dataset[ | ||
| "landfrac" | ||
| ] | ||
| # area.shape() = (180, 360) | ||
| # landfrac.shape() = (180, 360) | ||
| total_land_area = (area * landfrac).sum() # Sum over all dimensions | ||
|
|
||
| # Account for hemispheric plots: | ||
| north_area = area.where(area.lat >= 0) | ||
| south_area = area.where(area.lat <= 0) | ||
| north_landfrac = landfrac.where(landfrac.lat >= 0) | ||
| south_landfrac = landfrac.where(landfrac.lat <= 0) | ||
| north_land_area = (north_area * north_landfrac).sum() | ||
| south_land_area = (south_area * south_landfrac).sum() | ||
|
|
||
| # data_array.shape = (number of years, number of regions) | ||
| # We want to keep those dimensions, but with these values: | ||
| # (glb*total_land_area, n*north_land_area, s*south_land_area) | ||
| data_array[:, 0] *= total_land_area | ||
| data_array[:, 1] *= north_land_area | ||
| data_array[:, 2] *= south_land_area | ||
| units = data_array.units | ||
| # `units` will be "1" if it's a dimensionless quantity | ||
| if (units != "1") and (original_units != "") and original_units != units: | ||
| raise ValueError( | ||
| f"Units don't match up: Have {units} but expected {original_units}. This renders the supplied scale_factor ({scale_factor}) unusable." | ||
| ) | ||
| if (scale_factor != 1) and (final_units != ""): | ||
| data_array *= scale_factor | ||
| units = final_units | ||
| return data_array, units |
There was a problem hiding this comment.
This is the code block where we do average or total calculations.
| valid_vars, | ||
| invalid_vars, | ||
| rgn, | ||
| extra_vars_dict=exp["land"].replace("glb", "180x360_aave"), |
There was a problem hiding this comment.
Right now, ncclimo can't generate area and landfrac for globally averaged datasets. So, we're currently relying on having a separate land_monthly task defined in the cfg to set up these variables.
|
@forsyth2 and @chengzhuzhang, While Hence I suggest that The latest It is unclear whether you want Feedback welcome. Charlie |
|
Thank you for the update, Charlie @czender |
|
@czender Thanks, I like the
My personal preference would be to enable that functionality. It's nice to have the flexibility.
I'm working on Chrysalis, so I'll try |
|
@czender I get the following regardless if I set |
|
@forsyth2 That's because I haven't built NCO on Chrysalis (as opposed to Perlmutter) in a long time. I will try to do so today and ping you then. |
|
@forsyth2 Please try again on Chrysalis (and/or Perlmutter). I have rebuilt the latest snapshot there. BTW, the latest snapshot always includes |
|
@czender Hmm now I'm getting That's with this cfg snippet: Alternatively the error in #9 (comment) appears when using: What's the proper usage?
The one change in
That seems reasonable to me. |
|
@forsyth2 Sorry for the problems. The intended proper usage is your last row, don't add any switches at all, and everything you might want will be in the output files in global timeseries mode. As for why you get shared library errors when invoking |
|
Thanks @czender! To put everything in one place, I just ran cd /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_global_time_series_comprehensive_v3_setup_only_output/zppy_601_viewers_updated_nco_no_extra_vars/v3.LR.historical_0051/post/scripts
grep -v "OK" *status
# ts_lnd_monthly_glb_1985-1989-0005.status:ERROR (2)
# ts_lnd_monthly_glb_1990-1994-0005.status:ERROR (2)
cat ts_lnd_monthly_glb_19*.o*That gives: (For reference, E3SM-Project/zppy@ee60560 includes the changes for this) |
|
And the cfg is |
|
You have given me all the information I need. I think. I altered the |
|
@forsyth2 Did those latest |
|
@czender Thanks, I've gotten a lot further, but still seeing blank plots on the zppy-interfaces side. However, that could be an implementation problem on my end*. I'm still debugging. It does seem like *For reference, the latest relevant logic is: |
|
@forsyth2 Your logic looks good. Not sure why plots are blank. I will hold-off releasing until this works in zppy. However, these lines will double-count gridcells centered on the equator so that north+south != total: To avoid this NCO defines |
|
@czender I'm still running into issues with properly scaling the data by the valid area. I do feel like the problem is on the zppy-interfaces side, but obviously can't fully confirm that yet. I'm going to have to look into this more next week. As far as the release schedule, mainly we just need the new NCO by the Jan. 15 release candidate deadline for E3SM Unified, for testing. |
|
Mostly for my reference: zppy steps
zppy-interfaces steps
mv /lcrc/group/e3sm/ac.forsyth2/zi-test-input-data/post/ /lcrc/group/e3sm/ac.forsyth2/zi-test-input-data/post_until_20241220
mv /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_global_time_series_comprehensive_v3_setup_only_output/zppy_601_viewers_updated_nco_no_extra_vars_v3/v3.LR.historical_0051/post/ /lcrc/group/e3sm/ac.forsyth2/zi-test-input-data/post/
Plots are always showing up blank, even after deleting cache. |
|
Turns out it may have been a cache problem after all. I'm seeing plots now on https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zi-test-webdir/global_time_series_1985-1995_results_viewers/table_lnd/index.html. Maybe I needed to delete more cached data?? Or use a different browser? |
|
@czender As far as I can tell, the NCO fix is in fact working. The cached empty plot issue was very unfortunate; it turns out the NCO fixes were working on Friday after all. |
|
FYI NCO 5.3.0 has been released and is in my personal directories on Perlmutter, Chrsalis, and Acme1. |
d09c1a1 to
dbcd011
Compare
dbcd011 to
f26e7ab
Compare
|
An example of our new commit paradigm, discussed on E3SM-Project/zstash#355: Rebase stepsgit checkout issue-601-viewers
# This part is unnecessary, but I like to create a backup branch in case I mess something up
git checkout -b issue-601-viewers-until-20250114
git checkout issue-601-viewers
git log
# We have 19 commits here
# We can probably group them as follows:
# 1. Add Viewer support
# 2. Working land viewer
# 3. Atm viewer exists but is not linked to
# 4. Index points to atm and lnd viewers
# 5. Misc updates
# 6. Refactored coupled_global
# 7. Misc updates
# 8. MANIFEST.in not working
# 9. Updates
# 10. Attempt adding extra_vars
# 11. Working Total plots
# 12. Improved Total plotting
# 13. Add make_viewer parameter
# 14. Only show non empty viewers
# 15. Address comments
# 16. Fix unit tests
# 17. Clean up code
# 18. Attempt using new NCO
# 19. TOTAL plots working
# So, now we can rebase accordingly
git rebase -i 87c9e54c13afa470b879ae4672e5ecfb06cbb514That gives: Which we'll change to: git log
# Now, we're down to 9 commits
# Let's give these commits better names
git rebase -i 87c9e54c13afa470b879ae4672e5ecfb06cbb514We have: Let's do: Rename those 4 to: GitHub tells us we're expecting a merge conflict: git fetch upstream
git rebase upstream/mainWe have: Change to: git grep -n "<<<" tests
# No more diffs
git add tests/integration/global_time_series/cases_global_time_series.py
git rebase --continue
git push -f upstream issue-601-viewers |
|
E3SM-Project/zppy#654 removed in |



Issue resolution
zppyPR: Add Viewer support zppy#648Select one: This pull request is...
1. Does this do what we want it to do?
Objectives:
global_time_series, similar to those of E3SM Diagsglobal_time_seriesplotsRequired:
2. Are the implementation details accurate & efficient?
Required:
If applicable:
zppy-interfaces/conda, not just animportstatement.beautifulsoup4,lxml,output_viewer3. Is this well documented?
Required:
4. Is this code clean?
Required:
If applicable: