Skip to content

AI attempt at changing eamxx build system to only build used physicss#8320

Open
jgfouca wants to merge 4 commits into
masterfrom
jgfouca/eamxx_cime_build_used_phys_only
Open

AI attempt at changing eamxx build system to only build used physicss#8320
jgfouca wants to merge 4 commits into
masterfrom
jgfouca/eamxx_cime_build_used_phys_only

Conversation

@jgfouca
Copy link
Copy Markdown
Member

@jgfouca jgfouca commented Apr 24, 2026

This could potentially be quite a speed up when it comes to building lots of eamxx builds during nightlies.

[BFB]

@jgfouca jgfouca self-assigned this Apr 24, 2026
@jgfouca jgfouca added BFB PR leaves answers BFB EAMxx C++ based E3SM atmosphere model (aka SCREAM) labels Apr 24, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the EAMxx CMake build to conditionally build/link only the physics
packages that are active in a given CIME case, aiming to reduce build time for
nightly workflows while preserving BFB behavior.

Changes:

  • Add CIME-only logic in physics/CMakeLists.txt to query atm_procs_list and
    skip inactive physics subdirectories.
  • Update mct_coupling/CMakeLists.txt to link only physics libraries whose
    CMake targets were actually built.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
components/eamxx/src/physics/CMakeLists.txt Query active processes in CIME builds and conditionally add physics subdirectories.
components/eamxx/src/mct_coupling/CMakeLists.txt Make physics linking robust to conditionally-built physics targets.

eamxx_add_physics_subdirectory(spa)
eamxx_add_physics_subdirectory(spc)
eamxx_add_physics_subdirectory(nudging)
eamxx_add_physics_subdirectory(mam mam4_aci mam4_optics mam4_wetscav mam4_drydep mam4_srf_online_emiss mam4_aero_microphys mam4_constituent_fluxes)
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mam is now added via eamxx_add_physics_subdirectory(...) unconditionally. This bypasses the existing SCREAM_ENABLE_MAM option (defined in components/eamxx/CMakeLists.txt) and will force MAM (and the mam4xx external) to build even when SCREAM_ENABLE_MAM=OFF (e.g., some machine configs explicitly disable it). Wrap the mam subdirectory addition in an if (SCREAM_ENABLE_MAM) guard (and keep skipping it when the option is OFF), independent of the active-procs query result.

Suggested change
eamxx_add_physics_subdirectory(mam mam4_aci mam4_optics mam4_wetscav mam4_drydep mam4_srf_online_emiss mam4_aero_microphys mam4_constituent_fluxes)
if (SCREAM_ENABLE_MAM)
eamxx_add_physics_subdirectory(mam mam4_aci mam4_optics mam4_wetscav mam4_drydep mam4_srf_online_emiss mam4_aero_microphys mam4_constituent_fluxes)
endif()

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think mam atm physics processes will be on if SCREAM_ENABLE_MAM is off?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this PR removes SCREAM_ENABLE_MAM in one of the changes.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@odiazib , it doesn't eliminate it entirely. It just standardizes the check if mam is on in this file (by looking at active physics).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming here that somewhere in our config/buildnml system, there is code that checks if SCREAM_ENABLE_MAM is set and, if so, makes sure that mam stuff does not appear in atm_procs lists.

@jgfouca jgfouca requested a review from odiazib April 24, 2026 20:20
add_subdirectory(gw)

eamxx_add_physics_subdirectory(shoc)
eamxx_add_physics_subdirectory(cld_fraction cld_fraction cld_frac_net)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that could be a bit of problem here is that this implementation couples atm_proc names to this file. The AI wanted to do matching, like "mam*" to catch all mam4 atm_procs. I rejected this at first because I thought the proc names would match the subdir names exactly. This turned out to not be true for cld_fraction and mam, so it just listed them all.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that may be a big problem if a certain feature ever gets used. Namely, users could have 2+ instances of, say, p3, with arbitrary (and different) names:

   some_proc_group:
     atm_procs_list: [my_name, my_other_name]
     my_name:
        type: p3
        <some_params>
     my_other_name:
        type: p3
        <some_different_params>

Someone asked me about this possibility very recently.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m working on an LDRD where we’re using a chemistry solver. I’m planning to use different chemistry for the troposphere and the stratosphere. The interfaces for both options are almost identical, with one difference that I can control using a namelist parameter. To make this approach more flexible, I want to use the same interface and select the behavior using different namelist parameters.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the current impl, they'd have to add the proc name here. Is that acceptable? Should we add a new field to atm_procs that specifies which physics subdir it uses?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, adding the code src directory to the input yaml file is definitely not right.

@jgfouca jgfouca requested a review from rljacob April 24, 2026 20:37
list(APPEND SCREAM_LIBS zm)
endif()

if (SCREAM_ENABLE_MAM)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we use SCREAM_ENABLE_MAM to turn off MAM. If we remove it, we will always have to build with MAM.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@odiazib , I don't think so. As long as mam4 processes are not in the atm_proc list, it won't get built. All physics now work this way.

nudging
tms
zm
mam
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we’re not using MAM, will this produce a build error?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the TARGET check below will prevent it from being added if it doesn't exist

eamxx_add_physics_subdirectory(cosp)
eamxx_add_physics_subdirectory(tms)
eamxx_add_physics_subdirectory(iop_forcing)
eamxx_add_physics_subdirectory(zm)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess mam is missing here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mam was not one of the subdirs protected by the DOUBLE_PREC check.

add_subdirectory(gw)

eamxx_add_physics_subdirectory(shoc)
eamxx_add_physics_subdirectory(cld_fraction cld_fraction cld_frac_net)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m working on an LDRD where we’re using a chemistry solver. I’m planning to use different chemistry for the troposphere and the stratosphere. The interfaces for both options are almost identical, with one difference that I can control using a namelist parameter. To make this approach more flexible, I want to use the same interface and select the behavior using different namelist parameters.

eamxx_add_physics_subdirectory(spa)
eamxx_add_physics_subdirectory(spc)
eamxx_add_physics_subdirectory(nudging)
eamxx_add_physics_subdirectory(mam mam4_aci mam4_optics mam4_wetscav mam4_drydep mam4_srf_online_emiss mam4_aero_microphys mam4_constituent_fluxes)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this PR removes SCREAM_ENABLE_MAM in one of the changes.

@rljacob
Copy link
Copy Markdown
Member

rljacob commented Apr 25, 2026

Wouldn't a better way to speed up the build be to build everything in one executable and use runtime configs to decide what to execute? Using the same executable in each test.

@jgfouca
Copy link
Copy Markdown
Member Author

jgfouca commented Apr 27, 2026

@rljacob , that has been a stretch goal for E3SM for years, but we've never tackled it. We currently have the somewhat hacky approach of allowing developers to explicitly set shared to True for certain python test suites. Guaranteeing build compatibility across cases is a tough problem that goes beyond eamxx.

@rljacobuc
Copy link
Copy Markdown

@jgfouca yes but eamxx should be fully runtime configurable and so an all F-case eamxx suite should be able easily use one executable.

@jgfouca
Copy link
Copy Markdown
Member Author

jgfouca commented Apr 27, 2026

@rljacob , that's true. I have not really explored using shared much for eamxx. This PR would make it impossible anyway. The complaint that triggered this PR was a user who created an eamxx case and then got a ZM build error but they were not even using ZM, so sharing vs. not sharing wasn't the issue.

@rljacob
Copy link
Copy Markdown
Member

rljacob commented Apr 27, 2026

(switching logins) This PR seems like an over-reaction to that case. Just make the ZM build more robust. Runtime configurability is where E3SM needs to go so we shouldn't be making that harder.

@jgfouca
Copy link
Copy Markdown
Member Author

jgfouca commented Apr 27, 2026

This PR seems like an over-reaction to that case

@rljacob , you could be right. Ultimately, it's up to the team to decide what they want, so I will need all the reviewers to weigh in here.

@jgfouca
Copy link
Copy Markdown
Member Author

jgfouca commented Apr 27, 2026

@rljacob , I should also note that the physics builds can be pretty expensive, especially on GPU machines, due to the size of the kernels. For nightlies, this could be motivation to explore shared more, but for individual developers working on a case they just made, they may get irritated that they are waiting 5-10 minutes for ZM to build when they aren't even using it.

@jgfouca jgfouca requested review from tcclevenger April 27, 2026 16:04
@mahf708
Copy link
Copy Markdown
Contributor

mahf708 commented Apr 28, 2026

As people who ran/run eamxx a lot and know a bit of eamxx dev, I think both @brhillman and I only advocate for a simple build-time flag for stuff like ZM (so any additional package in physics or elsewhere should have an associated build flag). A nice-to-have is making that flag FALSE by default for now. I think that simply is it for the issue encountered recently (and btw, not for the first time, but not everyone reports stuff)

I am not against this PR in principle and I am happy to review it carefully and offer more opinions (which I will do in a sec). I do (pretty strongly) agree with @jgfouca that we do NOT want to build stuff we don't need; so I disagree with taking the easy way out of just building everything and inflating e3sm.exe with random stuff. If eamxx becomes like the insane dumpster that's cam/eam where all sort of random stuff is built, then we are definitively cooked. Since we don't provide E3SM as a prebuilt package/executable, build time is a serious waste of time and increasing it is bad engineering.

Copy link
Copy Markdown
Contributor

@mahf708 mahf708 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main issue with this PR is that it contradicts the long-established ways of E3SM (whether those are right or wrong is a separate issue). In particular, E3SM is cornered in this progression due to historical reasons or maybe due to CIME design choices, idk

E3SM/run_e3sm.template.sh

Lines 114 to 136 in b481000

# Fetch code from Github
fetch_code
# Create case
create_newcase
# Custom PE layout
custom_pelayout
# Setup
case_setup
# Build
case_build
# Configure runtime options
runtime_options
# Copy script into case_script directory for provenance
copy_script
# Submit
case_submit

In this template run script (in the root of the repo no less), the user is expected to set/change runtime options AFTER the case is set up and built. In EAMxx, atm_procs_list is a runtime concern, and thus, inferring stuff from it for build time is counterintuitive.

An alternative approach is to basically add more complex logic in atm_procs_list so that if, after building, the user is requesting a package that's not built, it would print an error saying the executable wasn't built with that package, and instructing the user to --force-proc-not-built to set it anyway, but that force flag would reset the env run BUILD_COMPLETE flag, which would hopefully be enough to make the user rebuild the case. I think this alternative approach will require a fair bit of engineering, and it would still be a hassle and doesn't quite solve the core issue of runtime vs build-time concerns.

@mahf708
Copy link
Copy Markdown
Contributor

mahf708 commented Apr 28, 2026

To clarify, this is the type of control we want users to get comfortable with:

In EAM days, one would need to request COSP explicitly like so (in case setup phase):

E3SM/run_e3sm.template.sh

Lines 391 to 397 in b481000

# Build with COSP, except for a data atmosphere (datm)
if [ `./xmlquery --value COMP_ATM` == "datm" ]; then
echo $'\nThe specified configuration uses a data atmosphere, so cannot activate COSP simulator\n'
else
echo $'\nConfiguring E3SM to use the COSP simulator\n'
./xmlchange --id CAM_CONFIG_OPTS --append --val='-cosp'
fi

In SCREAMv0 days, this was the mechanism in the compset setup:

<!ENTITY scream_phys_defaults "-shoc_sgs -microphys p3 -chem spa -rad rrtmgp">

I think we just need these robust build-flags to follow this type of mod... ./xmlchange SCREAM_CMAKE_OPTIONS="SCREAM_NP 4 SCREAM_NUM_VERTICAL_LEV 128 SCREAM_NUM_TRACERS 10" (either via key-value pairs like below or another format you come up with)

key default other options
micro p3 none
macro shoc none
dconv none zm
... ... ...

The rest can be left to the user

@rljacob
Copy link
Copy Markdown
Member

rljacob commented Apr 28, 2026

Build vs. runtime complexity should probably be its own discussion. As long as its possible to compile everything (and that could be the default someday) I'm ok with this.

Copy link
Copy Markdown
Contributor

@bartgol bartgol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am leaning more and more toward being against runtime decision of what to build.

Imho, the only acceptable solution is to decide what to build based on compset, and set some cmake ENABLE-like vars in config_compset.xml. Injecting atmquery into our CMakeLists.txt (which is indeed the only way to achieve the runtime-option-check goal) is an antipattern, and is adding lots of complexity to the cmake code (and case workflow).

Besides disabling ZM and MAM (which can be done based on compset, now), there is very little to gain: tms is the only one that we sometimes turn on/off, but it's really 3 files total. I feel like shoc and p3 are the largest physics libs to build, and we are not really running without them.

My preference is to let power users set cmake vars before case.build:

./xmlchange --append SCREAM_CMAKE_OPTIONS="SCREAM_ENABLE_ZM OFF"

and then simply check those vars in our cmake:

option (SCREAM_ENABLE_ZM "Whether we need to build ZM" OFF)
option (SCREAM_ENABLE_MAM "Whether we need to build ZM" OFF)
...
if (SCREAM_ENABLE_ZM)
  add_subdirectory (zm)
endif()
if (SCREAM_ENABLE_MAM)
  add_subdirectory (mam)
endif()
...

These vars can have defaults based on compset, but power users can change this before case.build if they want to.
And even then, I am not really fond of the strategy: atm_procs_list is a runtime param, so one should be able to change it without rebuilding. Build-what-you-use is effectively changing this into a build-time param...

I'm putting "request changes" to signal that my vote is to not go this route. I'm not the only voice though, so if the consensus is to go this route, you can obviously override me.

@AaronDonahue
Copy link
Copy Markdown
Contributor

We discussed this during today's Dev call, for those that were not on the call I thought I would try to summarize what we decided. Those that were on the call feel free to correct me or provide more detail if I misstate something.

We decided that at the very least we should have two compsets for SCREAM/EAMxx. One that just builds the "supported" configuration and the other that builds everything that can be built. The latter is essentially what we have now. This allows us to at least have faster builds for only those processes that we have tested and made part of the default configuration. For the concerns regarding testing and having a single executable that is shared, we can use the "build everything" compset. Noting that we don't do that now, but may want to in the future.

For production runs we chatted about the mechanism already in CIME to force a user to rebuild when they make a change that should require it. For example, when changing the PE-Layout, ./case.submit gets blocked and the user is prompted to rebuild before it will be unlocked. We can do something similar for EAMxx when the user adds a process that has not been built yet.

Note, the way this would work is for ./atmchange commands that add a process to the process list, we may need to rebuild the code if that process wasn't compiled yet. We can leverage a capability already in CIME that locks down ./case.submit until the code is rebuilt, this happens for example when someone changes the PE-Layout. We can have buildnml do something similar.

This does bring up the question of what happens when someone is doing a fresh experiment with processes the deviate from the default? If we issue the ./atmchange commands before the ./case.build command in the case script then we should be safe. We would need to update the submission script template to call all atmchange commands earlier, and train those who don't use the template to do the same in their workflow.

@jgfouca is going to look at these details and come up with a plan. Ultimately, we did decide that building only what we need would be the way to go.

Although, several folks were not on the call this morning, so I am open to further discussion. Ultimately, I think we can come up with a clean plan that still provides flexibility but keeps our builds lean.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

BFB PR leaves answers BFB EAMxx C++ based E3SM atmosphere model (aka SCREAM)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants