
Execute GEOS via calling gcm_run.j + rename marine suites -- part 1, 2, & 3#677

Merged
Dooruk merged 79 commits into develop from feature/exec_geos_direct_part1
Mar 6, 2026

Conversation

@Dooruk
Collaborator

@Dooruk Dooruk commented Dec 15, 2025

This is ready to go in. CI-workflows need to be modified to handle new suite names like so:

GEOS-ESM/CI-workflows#32

Main change 1:

SOCA (marine) suites are renamed:

3dvar -> 3dvar_marine
3dvar_cycle -> 3dvar_marine_cycle
3dfgat_cycle -> 3dfgat_marine_cycle
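
For anyone updating scripts or CI configuration that still use the old names, the renames above can be sketched as a lookup table. This is only an illustration: `SUITE_RENAMES` and `new_suite_name` are hypothetical names and are not part of SWELL or CI-workflows.

```python
# Old-to-new suite names, copied from the list above.
# SUITE_RENAMES and new_suite_name are hypothetical helpers for illustration.
SUITE_RENAMES = {
    "3dvar": "3dvar_marine",
    "3dvar_cycle": "3dvar_marine_cycle",
    "3dfgat_cycle": "3dfgat_marine_cycle",
}


def new_suite_name(name: str) -> str:
    """Return the renamed suite, or the name unchanged if it was not renamed."""
    return SUITE_RENAMES.get(name, name)
```

Suites that were not renamed, such as 3dfgat_coupled_cycle, pass through unchanged.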

Main change 2:

Cylc calls gcm_run.j directly in flow.cylc.

With this new approach, SWELL can point to an existing GEOS experiment folder (the experiment.yaml key for that is geos_homdir), and the forecast folder is now located under the experiment's GEOSgcm/forecast directory. It is possible to hotstart. The forecast directory is no longer erased, so MAPL history outputs can accumulate there. I updated the docs a bit but might add more on GEOSgcm execution.

For those who stumbled upon this PR, more details on change 2 below:

The main thing happening here is that Cylc (flow.cylc) now calls gcm_run.j directly. To facilitate this, a forecast directory is created under {swell_exp_dir}/GEOSgcm/forecast. This forecast folder is a replica of a GEOS experiment folder, with only a few changes to where HOMDIR and EXPDIR are defined. Model execution happens under {swell_exp_dir}/GEOSgcm/forecast/scratch, similar to typical GEOS model runs.
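
The directory layout described above can be sketched in a few lines of Python. The function name `geos_forecast_paths` and the dictionary keys are assumptions made for illustration. Placing gcm_run.j directly inside the forecast folder follows the flow.cylc snippet in this description, not a confirmed SWELL API.

```python
from pathlib import Path


def geos_forecast_paths(swell_exp_dir: str) -> dict:
    """Build the forecast-related paths under a SWELL experiment directory.

    Hypothetical helper: only the directory names come from the PR text.
    """
    forecast = Path(swell_exp_dir) / "GEOSgcm" / "forecast"
    return {
        "forecast": forecast,               # replica of a GEOS experiment folder
        "scratch": forecast / "scratch",    # where the model actually executes
        "gcm_run": forecast / "gcm_run.j",  # script that Cylc calls directly
    }
```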

Why was this change necessary:

  • We want SWELL to be less involved in GEOS model execution task(s). The previous method required lots of file manipulation in the forecast directory (in particular for the boundary condition files, AKA /RC files). That creates incompatibilities when running or testing different GEOSgcm versions. With multiple products and update frequencies, staying compatible across versions is an important requirement.
  • With Cylc templating, the forecast directory can't be updated in flow.cylc if it is templated in a time-dependent way.
  • subprocess simply couldn't run GEOSv12 on Milan nodes. I tried many combinations, but it never got past the initialization stage.
  • Defining sufficient nodes, MPI layouts, etc. is handled by gcm_run.j. If users make a mistake when requesting SLURM nodes, GEOS tries submitting hundreds of instances to compensate for the lack of compute resources, and then NCCS will yell at you.
  • (long-term relevance) The gcm_run.j and gcm_setup.j scripts are being, or will be, modernized. This work is underway but might take a long time (especially for gcm_run.j).

⚠️ This means that to use a gcm_run.j in SWELL, some parts must be erased or commented out. Alternatively, my idea is that gcm_run.j could have conditional sections keyed on a flag, say SWELL_active, so that it skips those sections, which are mainly postprocessing anyway.
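
As a rough sketch of the "commented out" option, the Python below comments out every line between two marker lines in a script's text. The markers `# SWELL_SKIP_BEGIN` and `# SWELL_SKIP_END` and the function `comment_out_block` are hypothetical. They do not exist in gcm_run.j, and this is not necessarily how PrepCoupledGeosRundir performs its edits.

```python
def comment_out_block(text: str, begin: str, end: str) -> str:
    """Comment out all lines between the begin and end marker lines.

    The marker lines themselves are kept unchanged, and lines that are
    already comments are left as they are.
    """
    out = []
    inside = False
    for line in text.splitlines():
        if begin in line:
            inside = True
            out.append(line)
        elif end in line:
            inside = False
            out.append(line)
        elif inside and not line.lstrip().startswith("#"):
            out.append("#" + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

A conditional section inside gcm_run.j itself would avoid editing the script at all, which is the appeal of the SWELL_active idea.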

More details in the comment below: #677 (comment)

Finally, a little primer on gcm_run.j

Let's consider gcm_run.j in 4 stages:

  1. SLURM & node assignment
  2. Preprocessing
  3. Execution
  4. Postprocessing

In the current implementation, SWELL handles stages 2 and 3 via Python and subprocess, and stage 1 is assumed to be set properly by the user, which caused trouble with NCCS. For DA purposes, stage 4 (postprocessing) is explicitly handled by SWELL, but that is not the focus of this PR.

In this proposed implementation, the main difference is that we rely on gcm_run.j for stages 2 and 3: PrepCoupledGeosRundir makes surgical edits at a few locations, and Cylc runs gcm_run.j directly (which doesn't capture a failed exit status):

    [[RunGeos]]
        script = "{{experiment_path}}/forecast/gcm_run.j"
        platform = {{platform}}
        [[[job]]]
            shell = /bin/csh
        [[[directives]]]
        {%- for key, value in scheduling["RunGeos"]["directives"]["all"].items() %}
            --{{key}} = {{value}}
        {%- endfor %}

I created the 3dfgat_coupled_cycle suite for testing; it should work by default if anyone has time to check it out.

Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Comment thread src/swell/tasks/base/task_base.py
Comment thread src/swell/tasks/generate_b_climatology.py Outdated
Comment thread src/swell/tasks/get_coupled_geos_restart.py
Comment thread src/swell/tasks/get_coupled_geos_restart.py Outdated
Comment thread src/swell/tasks/link_coupled_geos_output.py Outdated
Comment thread src/swell/tasks/move_erase_da_restart.py Outdated
Comment thread src/swell/tasks/link_coupled_geos_output.py
Comment thread src/swell/tasks/prepare_analysis.py Outdated
Comment thread src/swell/utilities/file_system_operations.py Outdated
Comment thread src/swell/utilities/file_system_operations.py Outdated
Comment thread src/swell/utilities/question_defaults.py
@Dooruk Dooruk changed the title from "Execute GEOS via calling gcm_run.j -- part 1" to "Execute GEOS via calling gcm_run.j + rename marine suites -- part 1, 2, & 3" on Mar 2, 2026
@Dooruk
Collaborator Author

Dooruk commented Mar 2, 2026

This is ready to go in and/or final testing. CI-workflows need to be modified to handle new suite names though.

@jeromebarre
Contributor

jeromebarre commented Mar 4, 2026

@Dooruk For the sake of clarity and simplicity, would it be possible to split the PR in two, one for Main change 1 and one for Main change 2? They are two distinct feature improvements. That would also speed up and facilitate the reviews. Thanks a lot!

@Dooruk
Collaborator Author

Dooruk commented Mar 5, 2026

> @Dooruk For the sake of clarity and simplicity, would it be possible to split the PR in two, one for Main change 1 and one for Main change 2? They are two distinct feature improvements. That would also speed up and facilitate the reviews. Thanks a lot!

I already did this. I'm running out of time before I go on FMLA so I won't be able to do that again. Main change 1 is just renaming suites anyway.

I created this PR after your similar comment; it was only part 1 for a couple of months or so. @mer-a-o reviewed it and decided she will implement the Skylab approach to execute GEOS for compo, and we will collaborate on running NWP afterwards. This is not the be-all and end-all in terms of executing GEOS, but it is needed urgently for the marine group to run some experiments while I'm gone.

@Dooruk
Collaborator Author

Dooruk commented Mar 6, 2026

Once GEOS-ESM/CI-workflows#32 goes in, this is good to go in, but I need approvals.

@mer-a-o
Contributor

mer-a-o commented Mar 6, 2026

I was able to run the 3dvar_marine_cycle experiment successfully. I reviewed parts of this PR and believe my comments on those sections have been addressed. It would have been preferable to separate the renaming and documentation updates into a different PR, but given @Dooruk’s availability, I think it’s fine to merge this one. We can always go back and fix any issues we encounter along the way.

mer-a-o
mer-a-o previously approved these changes Mar 6, 2026
Collaborator

@mranst mranst left a comment

Lots of changes, I appreciate the documentation upgrades. I don't fully know the intricacies of the change to gcm_run.j, but from what I can see, this looks great. Only one thing sticks out to me, which is the eva comparison task for observations uses the suite name, so if you could change the names of these files to reflect the new marine names, I can give an approval:

src/swell/suites/compare/eva/comparison_observations-3dfgat_cycle-geos_marine.yaml' 'src/swell/suites/compare/eva/comparison_observations-3dfgat_cycle-geos_marine.yaml

@mranst mranst self-requested a review March 6, 2026 21:42
Collaborator

@mranst mranst left a comment

Looks good, thanks!

@Dooruk
Collaborator Author

Dooruk commented Mar 6, 2026

Can't keep up with the pace of changes in develop :) Thank you all for the reviews.

@Dooruk Dooruk merged commit a9debe0 into develop Mar 6, 2026
2 checks passed

8 participants