AI attempt at changing eamxx build system to only build used physicss#8320
AI attempt at changing eamxx build system to only build used physicss#8320jgfouca wants to merge 4 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Updates the EAMxx CMake build to conditionally build/link only the physics
packages that are active in a given CIME case, aiming to reduce build time for
nightly workflows while preserving BFB behavior.
Changes:
- Add CIME-only logic in
physics/CMakeLists.txtto queryatm_procs_listand
skip inactive physics subdirectories. - Update
mct_coupling/CMakeLists.txtto link only physics libraries whose
CMake targets were actually built.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| components/eamxx/src/physics/CMakeLists.txt | Query active processes in CIME builds and conditionally add physics subdirectories. |
| components/eamxx/src/mct_coupling/CMakeLists.txt | Make physics linking robust to conditionally-built physics targets. |
| eamxx_add_physics_subdirectory(spa) | ||
| eamxx_add_physics_subdirectory(spc) | ||
| eamxx_add_physics_subdirectory(nudging) | ||
| eamxx_add_physics_subdirectory(mam mam4_aci mam4_optics mam4_wetscav mam4_drydep mam4_srf_online_emiss mam4_aero_microphys mam4_constituent_fluxes) |
There was a problem hiding this comment.
mam is now added via eamxx_add_physics_subdirectory(...) unconditionally. This bypasses the existing SCREAM_ENABLE_MAM option (defined in components/eamxx/CMakeLists.txt) and will force MAM (and the mam4xx external) to build even when SCREAM_ENABLE_MAM=OFF (e.g., some machine configs explicitly disable it). Wrap the mam subdirectory addition in an if (SCREAM_ENABLE_MAM) guard (and keep skipping it when the option is OFF), independent of the active-procs query result.
| eamxx_add_physics_subdirectory(mam mam4_aci mam4_optics mam4_wetscav mam4_drydep mam4_srf_online_emiss mam4_aero_microphys mam4_constituent_fluxes) | |
| if (SCREAM_ENABLE_MAM) | |
| eamxx_add_physics_subdirectory(mam mam4_aci mam4_optics mam4_wetscav mam4_drydep mam4_srf_online_emiss mam4_aero_microphys mam4_constituent_fluxes) | |
| endif() |
There was a problem hiding this comment.
I don't think mam atm physics processes will be on if SCREAM_ENABLE_MAM is off?
There was a problem hiding this comment.
Note that this PR removes SCREAM_ENABLE_MAM in one of the changes.
There was a problem hiding this comment.
@odiazib , it doesn't eliminate it entirely. It just standardizes the check if mam is on in this file (by looking at active physics).
There was a problem hiding this comment.
I am assuming here that somewhere in our config/buildnml system, there is code that checks if SCREAM_ENABLE_MAM is set and, if so, makes sure that mam stuff does not appear in atm_procs lists.
| add_subdirectory(gw) | ||
|
|
||
| eamxx_add_physics_subdirectory(shoc) | ||
| eamxx_add_physics_subdirectory(cld_fraction cld_fraction cld_frac_net) |
There was a problem hiding this comment.
One thing that could be a bit of problem here is that this implementation couples atm_proc names to this file. The AI wanted to do matching, like "mam*" to catch all mam4 atm_procs. I rejected this at first because I thought the proc names would match the subdir names exactly. This turned out to not be true for cld_fraction and mam, so it just listed them all.
There was a problem hiding this comment.
Oh, that may be a big problem if a certain feature ever gets used. Namely, users could have 2+ instances of, say, p3, with arbitrary (and different) names:
some_proc_group:
atm_procs_list: [my_name, my_other_name]
my_name:
type: p3
<some_params>
my_other_name:
type: p3
<some_different_params>Someone asked me about this possibility very recently.
There was a problem hiding this comment.
I’m working on an LDRD where we’re using a chemistry solver. I’m planning to use different chemistry for the troposphere and the stratosphere. The interfaces for both options are almost identical, with one difference that I can control using a namelist parameter. To make this approach more flexible, I want to use the same interface and select the behavior using different namelist parameters.
There was a problem hiding this comment.
With the current impl, they'd have to add the proc name here. Is that acceptable? Should we add a new field to atm_procs that specifies which physics subdir it uses?
There was a problem hiding this comment.
No, adding the code src directory to the input yaml file is definitely not right.
| list(APPEND SCREAM_LIBS zm) | ||
| endif() | ||
|
|
||
| if (SCREAM_ENABLE_MAM) |
There was a problem hiding this comment.
Note that we use SCREAM_ENABLE_MAM to turn off MAM. If we remove it, we will always have to build with MAM.
There was a problem hiding this comment.
@odiazib , I don't think so. As long as mam4 processes are not in the atm_proc list, it won't get built. All physics now work this way.
| nudging | ||
| tms | ||
| zm | ||
| mam |
There was a problem hiding this comment.
If we’re not using MAM, will this produce a build error?
There was a problem hiding this comment.
No, the TARGET check below will prevent it from being added if it doesn't exist
| eamxx_add_physics_subdirectory(cosp) | ||
| eamxx_add_physics_subdirectory(tms) | ||
| eamxx_add_physics_subdirectory(iop_forcing) | ||
| eamxx_add_physics_subdirectory(zm) |
There was a problem hiding this comment.
I guess mam is missing here.
There was a problem hiding this comment.
Mam was not one of the subdirs protected by the DOUBLE_PREC check.
| add_subdirectory(gw) | ||
|
|
||
| eamxx_add_physics_subdirectory(shoc) | ||
| eamxx_add_physics_subdirectory(cld_fraction cld_fraction cld_frac_net) |
There was a problem hiding this comment.
I’m working on an LDRD where we’re using a chemistry solver. I’m planning to use different chemistry for the troposphere and the stratosphere. The interfaces for both options are almost identical, with one difference that I can control using a namelist parameter. To make this approach more flexible, I want to use the same interface and select the behavior using different namelist parameters.
| eamxx_add_physics_subdirectory(spa) | ||
| eamxx_add_physics_subdirectory(spc) | ||
| eamxx_add_physics_subdirectory(nudging) | ||
| eamxx_add_physics_subdirectory(mam mam4_aci mam4_optics mam4_wetscav mam4_drydep mam4_srf_online_emiss mam4_aero_microphys mam4_constituent_fluxes) |
There was a problem hiding this comment.
Note that this PR removes SCREAM_ENABLE_MAM in one of the changes.
|
Wouldn't a better way to speed up the build be to build everything in one executable and use runtime configs to decide what to execute? Using the same executable in each test. |
|
@rljacob , that has been a stretch goal for E3SM for years, but we've never tackled it. We currently have the somewhat hacky approach of allowing developers to explicitly set |
|
@jgfouca yes but eamxx should be fully runtime configurable and so an all F-case eamxx suite should be able easily use one executable. |
|
@rljacob , that's true. I have not really explored using |
|
(switching logins) This PR seems like an over-reaction to that case. Just make the ZM build more robust. Runtime configurability is where E3SM needs to go so we shouldn't be making that harder. |
@rljacob , you could be right. Ultimately, it's up to the team to decide what they want, so I will need all the reviewers to weigh in here. |
|
@rljacob , I should also note that the physics builds can be pretty expensive, especially on GPU machines, due to the size of the kernels. For nightlies, this could be motivation to explore |
|
As people who ran/run eamxx a lot and know a bit of eamxx dev, I think both @brhillman and I only advocate for a simple build-time flag for stuff like ZM (so any additional package in physics or elsewhere should have an associated build flag). A nice-to-have is making that flag FALSE by default for now. I think that simply is it for the issue encountered recently (and btw, not for the first time, but not everyone reports stuff) I am not against this PR in principle and I am happy to review it carefully and offer more opinions (which I will do in a sec). I do (pretty strongly) agree with @jgfouca that we do NOT want to build stuff we don't need; so I disagree with taking the easy way out of just building everything and inflating e3sm.exe with random stuff. If eamxx becomes like the insane dumpster that's cam/eam where all sort of random stuff is built, then we are definitively cooked. Since we don't provide E3SM as a prebuilt package/executable, build time is a serious waste of time and increasing it is bad engineering. |
There was a problem hiding this comment.
The main issue with this PR is that it contradicts the long-established ways of E3SM (whether those are right or wrong is a separate issue). In particular, E3SM is cornered in this progression due to historical reasons or maybe due to CIME design choices, idk
Lines 114 to 136 in b481000
In this template run script (in the root of the repo no less), the user is expected to set/change runtime options AFTER the case is set up and built. In EAMxx, atm_procs_list is a runtime concern, and thus, inferring stuff from it for build time is counterintuitive.
An alternative approach is to basically add more complex logic in atm_procs_list so that if, after building, the user is requesting a package that's not built, it would print an error saying the executable wasn't built with that package, and instructing the user to --force-proc-not-built to set it anyway, but that force flag would reset the env run BUILD_COMPLETE flag, which would hopefully be enough to make the user rebuild the case. I think this alternative approach will require a fair bit of engineering, and it would still be a hassle and doesn't quite solve the core issue of runtime vs build-time concerns.
|
To clarify, this is the type of control we want users to get comfortable with: In EAM days, one would need to request COSP explicitly like so (in case setup phase): Lines 391 to 397 in b481000 In SCREAMv0 days, this was the mechanism in the compset setup: I think we just need these robust build-flags to follow this type of mod...
The rest can be left to the user |
|
Build vs. runtime complexity should probably be its own discussion. As long as its possible to compile everything (and that could be the default someday) I'm ok with this. |
There was a problem hiding this comment.
I am leaning more and more toward being against runtime decision of what to build.
Imho, the only acceptable solution is to decide what to build based on compset, and set some cmake ENABLE-like vars in config_compset.xml. Injecting atmquery into our CMakeLists.txt (which is indeed the only way to achieve the runtime-option-check goal) is an antipattern, and is adding lots of complexity to the cmake code (and case workflow).
Besides disabling ZM and MAM (which can be done based on compset, now), there is very little to gain: tms is the only one that we sometimes turn on/off, but it's really 3 files total. I feel like shoc and p3 are the largest physics libs to build, and we are not really running without them.
My preference is to let power users set cmake vars before case.build:
./xmlchange --append SCREAM_CMAKE_OPTIONS="SCREAM_ENABLE_ZM OFF"and then simply check those vars in our cmake:
option (SCREAM_ENABLE_ZM "Whether we need to build ZM" OFF)
option (SCREAM_ENABLE_MAM "Whether we need to build ZM" OFF)
...
if (SCREAM_ENABLE_ZM)
add_subdirectory (zm)
endif()
if (SCREAM_ENABLE_MAM)
add_subdirectory (mam)
endif()
...These vars can have defaults based on compset, but power users can change this before case.build if they want to.
And even then, I am not really fond of the strategy: atm_procs_list is a runtime param, so one should be able to change it without rebuilding. Build-what-you-use is effectively changing this into a build-time param...
I'm putting "request changes" to signal that my vote is to not go this route. I'm not the only voice though, so if the consensus is to go this route, you can obviously override me.
|
We discussed this during today's Dev call, for those that were not on the call I thought I would try to summarize what we decided. Those that were on the call feel free to correct me or provide more detail if I misstate something. We decided that at the very least we should have two compsets for SCREAM/EAMxx. One that just builds the "supported" configuration and the other that builds everything that can be built. The latter is essentially what we have now. This allows us to at least have faster builds for only those processes that we have tested and made part of the default configuration. For the concerns regarding testing and having a single executable that is shared, we can use the "build everything" compset. Noting that we don't do that now, but may want to in the future. For production runs we chatted about the mechanism already in CIME to force a user to rebuild when they make a change that should require it. For example, when changing the PE-Layout, Note, the way this would work is for This does bring up the question of what happens when someone is doing a fresh experiment with processes the deviate from the default? If we issue the @jgfouca is going to look at these details and come up with a plan. Ultimately, we did decide that building only what we need would be the way to go. Although, several folks were not on the call this morning, so I am open to further discussion. Ultimately, I think we can come up with a clean plan that still provides flexibility but keeps our builds lean. |
This could potentially be quite a speed up when it comes to building lots of eamxx builds during nightlies.
[BFB]