Skip to content

Fix 3D variable exclusion for global time series to prevent partial string matches#779

Draft
Copilot wants to merge 5 commits into
mainfrom
copilot/fix-glb-atmosphere-error
Draft

Fix 3D variable exclusion for global time series to prevent partial string matches#779
Copilot wants to merge 5 commits into
mainfrom
copilot/fix-glb-atmosphere-error

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Feb 10, 2026

Summary

Objectives:

  • Fix variable exclusion logic that incorrectly matched partial variable names, causing OMEGA500,U,U10OMEGA50010
  • Replace bash string substitution with sed pattern matching for complete token matching
  • Add test coverage for variable exclusion edge cases

Select one: This pull request is...

  • a bug fix: increment the patch version
  • a small improvement: increment the minor version
  • a new feature: increment the minor version
  • an incompatible (non-backwards compatible) API change: increment the major version

Small Change

  • To merge, I will use "Squash and merge". That is, this change should be a single commit.
  • Logic: I have visually inspected the entire pull request myself.
  • Pre-commit checks: All the pre-commits checks have passed.

Problem

When mapping_file = "glb", the template excluded 3D variables (U, U10, V) using ${vars//,U}, which removed all occurrences of the substring ,U. With input T,OMEGA500,U,U10,V, this produced T,OMEGA50010,V instead of T,OMEGA500.

Solution

zppy/templates/ts.bash: Replace string substitution with sed pattern matching:

# Old (buggy)
vars=${vars//,U}

# New (correct)
vars=$(echo "$vars" | sed 's/\(^\|,\)U10\(,\|$\)/\1/g; s/\(^\|,\)U\(,\|$\)/\1/g; s/\(^\|,\)V\(,\|$\)/\1/g; s/,,*/,/g; s/^,//; s/,$//')

Pattern \(^\|,\)VAR\(,\|$\) matches complete tokens, not substrings.

tests/test_ts_template.py: Added unit tests covering:

  • Variables at start/middle/end positions
  • Variables containing 'U' substring (USTAR, DUST, OMEGA500)
  • Empty and single-variable edge cases
Original prompt

This section details on the original issue you should resolve

<issue_title>[Bug]: Error reported by excluding U for glb atmsphere variable</issue_title>
<issue_description>### What happened?

This error is reported by Mark Z.: My config file contains a list of vars that I want create global average. I can confirm these are the requested ones in my settings file (ts_atm_monthly_glb_1979-2022-0044.settings). But in the output file, it is hitting an error because it is trying to regrid a variable that I didn’t ask for (which does not exist). Namely, I am trying to regrid OMEGA500, but for some reason the ts_atm_monthly_glb_1979-2022-0044.o44894299 file shows that I am asking for OMEGA50010.

More specifically, it seems like it took my variable request, which includes
...T,OMEGA500,U,U10,V,ICEFRAC…
And sort of concatenated / dropped out the ",U,U" portion of it to make:
...T,OMEGA50010,V,ICEFRAC...

I think this problem is caused by the following lines:

{% if mapping_file == 'glb' -%}
vars={{ vars }}
# https://unix.stackexchange.com/questions/237297/the-fastest-way-to-remove-a-string-in-a-variable
# https://stackoverflow.com/questions/26457052/remove-a-substring-from-a-bash-variable
# Remove U, since it is a 3D variable and thus will not work with rgn_avg
vars=${vars//,U}
{%- else %}
vars={{ vars }}
{%- endif %}

I think at some point we can consider removing these lines, or make it more precisely to excluding 3d vars. note that V is also a 3d variable provided by the users. With the new NCO update from here #704, the user can generate global time-series for all 2d vars when var = "", (NCO takes care of fitering out multi-level fields).

What machine were you running on?

Perlmutter

Environment

zppy v3.0.0

What command did you run?

N/A

Copy your cfg file

N/A

What jobs are failing?

What stack trace are you encountering?

```</issue_description>

## Comments on the Issue (you are @copilot in this section)

<comments>
<comment_new><author>@forsyth2</author><body>
> With the new NCO update from here https://github.com/E3SM-Project/zppy/issues/704, the user can generate global time-series for all 2d vars when var = "", (NCO takes care of [filtering] out multi-level fields).

That is, as you've noted, with E3SM-Project/zppy#717 (the fix for E3SM-Project/zppy#704), setting `var =""` is equivalent to setting all vars _except_ the 3D ones. 

> sort of concatenated / dropped out the ",U,U" portion
> 
> I think at some point we can consider removing these lines, or make it more precisely to excluding 3d vars.


We'd need a thorough list of 3D vars to exclude. But yes, I see if we had something like "remove `U`, `U10`, `V`", then the variable list fed to `ncclimo` would have been correctly formed. 

So, I suppose the available options are:
1. Change `vars=${vars//,U} ` to `shopt -s extglob; vars=${vars//,@(U|U10|V)}`, `vars=${vars//,U@(10|)}; vars=${vars//,V}`, or `vars=$(echo "$vars" | sed 's/,U10\|,U\|,V//g')` (and expanding that list to include any additional 3D variables).
2. Remove `vars=${vars//,U} ` and have `ncclimo` fail when it hits the explicitly requested 3D variable.
3. Have Charlie update NCO to ignore explicitly requesting a 3D variable. [Here](https://github.com/E3SM-Project/zppy/issues/704#issuecomment-2779366045), Charlie says:
> If there is interest, I can make `ncclimo` ignore bad user choices (i.e., multi-level fields), though that can be a slippery slope</body></comment_new>
</comments>

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Copilot AI and others added 4 commits February 10, 2026 20:17
Replace simple string substitution with sed pattern matching to correctly
exclude 3D variables (U, U10, V) without matching partial variable names
like OMEGA500. This fixes the bug where OMEGA500,U,U10 was incorrectly
becoming OMEGA50010.

Co-authored-by: chengzhuzhang <13056557+chengzhuzhang@users.noreply.github.com>
Added comprehensive tests to verify that the sed-based variable exclusion
correctly handles edge cases like OMEGA500,U,U10 without creating invalid
variable names.

Co-authored-by: chengzhuzhang <13056557+chengzhuzhang@users.noreply.github.com>
- Remove unused tempfile import
- Remove unused vars_with_sed variable
- Add assertion to verify non-glb behavior

Co-authored-by: chengzhuzhang <13056557+chengzhuzhang@users.noreply.github.com>
- Remove duplicate mapping_file parameter
- Keep test focused on non-glb behavior without area_nm param

Co-authored-by: chengzhuzhang <13056557+chengzhuzhang@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix error caused by excluding U for glb atmosphere variable Fix 3D variable exclusion for global time series to prevent partial string matches Feb 10, 2026
Copilot AI requested a review from chengzhuzhang February 10, 2026 20:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Error reported by excluding U for glb atmsphere variable

2 participants