HECC NAS platforms, take da-utils out, again #96

Closed
Dooruk wants to merge 27 commits into develop from feature/nas/new_environment

@Dooruk
Collaborator

@Dooruk Dooruk commented Nov 12, 2025

Introduces aitken and pleiades (this one will be decommissioned soon).

Takes JCSDA out of the default build.yaml. Unfortunately, we couldn't figure this out.

da-utils can't be installed with iodaconv, so for now let's take it out.

@Dooruk Dooruk requested review from mathomp4 and mranst November 12, 2025 15:59
* Implement matrix strategy for coding norms

* same for clone

* fix syntax
@Dooruk
Collaborator Author

Dooruk commented Nov 12, 2025

There is a CI-workflow test for cloning cmake and oops. It uses JCSDA (public repo) and doesn't need .git-credentials.

If I take the JCSDA (public) repo out of the default build.yaml it fails, because the runner doesn't have access to the private repo.

https://github.com/GEOS-ESM/jedi_bundle/actions/runs/19304740441/job/55208905177?pr=96

[Screenshot: 2025-11-12, 11:40:32 AM]

So I'm confused how our Tier2 tests are running...

@mathomp4
Member

> Introduces aitken and pleiades (this one will get decommissioned soon). Takes JCSDA out from default build.yaml.

Well, sort of. The machine pleiades will go away, but I think the pfe login nodes will still exist.

I believe you can only get to the Rome and Cascade Lake nodes on Aitken from the pfe nodes just like you can only get to the Milan nodes from afe.

> @mathomp4, is srun valid on NAS for GNU?

Nope! NAS is PBS, so you need to use qsub to submit to batch. And I don't think there is an exact srun equivalent for PBS. That is, I'm not sure you can qsub a bare command.
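For reference, a minimal sketch of wrapping a command in a PBS job script so that it can go through qsub. The `pbs_script` helper is hypothetical (it is not part of jedi_bundle), and the `select` values (128 cores, `model=rom_ait` for an Aitken Rome node) are assumptions to check against the NAS documentation before use.

```python
def pbs_script(command, *, nodes=1, ncpus=128, model="rom_ait",
               walltime="01:00:00", queue=None):
    """Return the text of a PBS job script that runs `command`.

    All resource values are placeholders (assumptions), not values taken
    from this PR.
    """
    lines = [
        "#!/bin/bash",
        # NAS-style resource request: node count, cores per node, node model.
        f"#PBS -l select={nodes}:ncpus={ncpus}:model={model}",
        f"#PBS -l walltime={walltime}",
    ]
    if queue:
        lines.append(f"#PBS -q {queue}")
    # PBS starts jobs in the home directory; return to the submit directory.
    lines += ["", 'cd "$PBS_O_WORKDIR"', command, ""]
    return "\n".join(lines)
```

Writing the returned text to a file and passing that file to `qsub` submits the job.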

@mranst
Collaborator

mranst commented Nov 12, 2025

> So I'm confused how our Tier2 tests are running...

I believe it's because the swell tier2 tests are run under the gmao_ci user on discover, which does have git credentials, whereas these are run somewhere else

@mathomp4
Member

> So I'm confused how our Tier2 tests are running...
>
> I believe it's because the swell tier2 tests are run under the gmao_ci user on discover, which does have git credentials, whereas these are run somewhere else

We might be able to add a GitHub secret for this? Maybe have the gmao_ci credential used in GitHub actions?

@Dooruk
Collaborator Author

Dooruk commented Nov 12, 2025

> We might be able to add a GitHub secret for this? Maybe have the gmao_ci credential used in GitHub actions?

This would be good. Do we need SI admin privileges?

Having both repos in the build.yaml is problematic. It should be either JCSDA-internal or JCSDA; at some point they may merge anyway.

@mathomp4
Member

I pushed up a change that I think works:

remote: Repository not found.
fatal: repository 'https://github.com/JCSDA-internal/jedi-cmake/' not found
remote: Repository not found.
fatal: repository 'https://github.com/JCSDA-internal/oops/' not found
   _          _ _ _                     _ _        
  (_) ___  __| (_) |__  _   _ _ __   __| | | ___   Jedi Bundle Build System
  | |/ _ \/ _` | | '_ \| | | | '_ \ / _` | |/ _ \  Version 1.0.40
  | |  __/ (_| | | |_) | |_| | | | | (_| | |  __/  NASA Global Modeling and
 _/ |\___|\__,_|_|_.__/ \__,_|_| |_|\__,_|_|\___|       Assimilation Office
|__/       https://geos-esm.github.io/jedi_bundle
INFO JediBundle: Gathering repository information...
INFO JediBundle: Repository clone summary:
INFO JediBundle: -------------------------
INFO JediBundle: Branch         of jedicmake will be cloned from                                   
INFO JediBundle: Branch develop of oops      will be cloned from https://github.com/NOAA-EMC/oops  
INFO JediBundle: Tag    1.3.2   of gsibec    will be cloned from https://github.com/GEOS-ESM/GSIbec
INFO JediBundle: -------------------------
INFO JediBundle: Starting parallel cloning of 3 repositories
INFO JediBundle: Cloning 'oops'
INFO JediBundle: Cloning 'gsibec'
INFO JediBundle: Skipping explicit clone of 'jedicmake' since it's usually a module. If it's not a module it will be cloned at configure time.

I added the gmao_ci .git-credentials in this repo as a couple of secrets.

It's a bit confusing. This:

remote: Repository not found.
fatal: repository 'https://github.com/JCSDA-internal/jedi-cmake/' not found
remote: Repository not found.
fatal: repository 'https://github.com/JCSDA-internal/oops/' not found

seems bad but this:

INFO JediBundle: Starting parallel cloning of 3 repositories
INFO JediBundle: Cloning 'oops'
INFO JediBundle: Cloning 'gsibec'

and then:

Run ls -l jedi_bundle/oops/CMakeLists.txt
-rw-r--r-- 1 runner runner 4431 Nov 12 20:21 jedi_bundle/oops/CMakeLists.txt

seem good?
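For the secrets approach mentioned above, a minimal sketch of turning a username and token into a `.git-credentials` file on a CI runner. The function names are hypothetical and this is not the actual workflow change pushed to this PR; the line format is the one read by git's `store` credential helper.

```python
from pathlib import Path
from urllib.parse import quote


def credential_line(user: str, token: str, host: str = "github.com") -> str:
    """Return one line in the format read by git's `store` credential helper."""
    # Percent-encode both parts so '@' or ':' inside a value cannot break the URL.
    return f"https://{quote(user, safe='')}:{quote(token, safe='')}@{host}"


def write_git_credentials(path: Path, user: str, token: str) -> None:
    """Write the credentials file and restrict it to the owner."""
    path.write_text(credential_line(user, token) + "\n")
    path.chmod(0o600)


# Hypothetical usage in a workflow step, with the two secrets exposed as
# environment variables (names are assumptions):
#   write_git_credentials(Path.home() / ".git-credentials",
#                         os.environ["GIT_CI_USER"], os.environ["GIT_CI_TOKEN"])
```

The file is only consulted once `git config --global credential.helper store` has been run on the runner.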

@Dooruk
Collaborator Author

Dooruk commented Nov 12, 2025

Well, our clone logic is failing us again.. Looks like JCSDA-internal is being skipped and instead we are getting oops from NOAA-EMC.

INFO JediBundle: Branch of jedicmake will be cloned from
INFO JediBundle: Branch develop of oops will be cloned from https://github.com/NOAA-EMC/oops
INFO JediBundle: Tag 1.3.2 of gsibec will be cloned from https://github.com/GEOS-ESM/GSIbec

@mathomp4, we floated the idea of using mepo for cloning here, which might be a safer bet at this point:

#83
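One way a fallback like this could be made louder, sketched under assumptions: probe each organization with `git ls-remote --exit-code`, and treat "remote unreachable" differently from "branch not present". The names `classify`, `probe`, and `pick_org` are hypothetical, and this is not jedi_bundle's actual clone logic.

```python
import subprocess

FOUND, MISSING, ERROR = "found", "missing", "error"


def classify(returncode: int) -> str:
    """Map a `git ls-remote --exit-code` status to an outcome."""
    if returncode == 0:
        return FOUND    # at least one matching ref
    if returncode == 2:
        return MISSING  # remote reachable, but no matching ref
    return ERROR        # e.g. repository not found or authentication failure


def probe(url: str, branch: str) -> str:
    """Ask the remote whether a head matching `branch` exists."""
    result = subprocess.run(
        ["git", "ls-remote", "--exit-code", "--heads", url, branch],
        capture_output=True, text=True,
    )
    return classify(result.returncode)


def pick_org(repo, branch, orgs, probe_fn=probe):
    """Return the first URL that has the branch, in priority order.

    An unreachable remote raises instead of silently falling through to
    the next organization.
    """
    for org in orgs:
        url = f"https://github.com/{org}/{repo}"
        status = probe_fn(url, branch)
        if status == FOUND:
            return url
        if status == ERROR:
            raise RuntimeError(f"cannot reach {url}; check credentials")
    return None
```

With this shape, a "Repository not found" response for JCSDA-internal would stop the run rather than quietly selecting the next organization in the list.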

@mathomp4
Member

@Dooruk Even mepo would have the same issue. It's still running git clone underneath! But I'm going to try a few more things. This should be possible!

@mathomp4
Member

@Dooruk Okay. Update. I can duplicate what the CI is now seeing. It sees:

  jedi_bundle Clone src/jedi_bundle/config/build.yaml
  shell: /usr/bin/bash -e {0}
  env:
    pythonLocation: /opt/hostedtoolcache/Python/3.10.19/x64
    PKG_CONFIG_PATH: /opt/hostedtoolcache/Python/3.10.19/x64/lib/pkgconfig
    Python_ROOT_DIR: /opt/hostedtoolcache/Python/3.10.19/x64
    Python2_ROOT_DIR: /opt/hostedtoolcache/Python/3.10.19/x64
    Python3_ROOT_DIR: /opt/hostedtoolcache/Python/3.10.19/x64
    LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.19/x64/lib

   _          _ _ _                     _ _        
  (_) ___  __| (_) |__  _   _ _ __   __| | | ___   Jedi Bundle Build System
  | |/ _ \/ _` | | '_ \| | | | '_ \ / _` | |/ _ \  Version 1.0.40
  | |  __/ (_| | | |_) | |_| | | | | (_| | |  __/  NASA Global Modeling and
 _/ |\___|\__,_|_|_.__/ \__,_|_| |_|\__,_|_|\___|       Assimilation Office
|__/       https://geos-esm.github.io/jedi_bundle

INFO JediBundle: Gathering repository information...
ABORT JediBundle: No matching branch for 'jedicmake' was found in any organizations. ABORTING... 

and on discover, I see:

> PYTHONPATH=/discover/nobackup/mathomp4/JediBundle/try-install/lib/python3.13/site-packages ../try-install/bin/jedi_bundle Clone src/jedi_bundle/config/build.yaml

   _          _ _ _                     _ _
  (_) ___  __| (_) |__  _   _ _ __   __| | | ___   Jedi Bundle Build System
  | |/ _ \/ _` | | '_ \| | | | '_ \ / _` | |/ _ \  Version 1.0.40
  | |  __/ (_| | | |_) | |_| | | | | (_| | |  __/  NASA Global Modeling and
 _/ |\___|\__,_|_|_.__/ \__,_|_| |_|\__,_|_|\___|       Assimilation Office
|__/       https://geos-esm.github.io/jedi_bundle

INFO JediBundle: Gathering repository information...
ABORT JediBundle: No matching branch for 'jedicmake' was found in any organizations. ABORTING...

It's like something is missing(?) with jedicmake?

@Dooruk
Collaborator Author

Dooruk commented Nov 13, 2025

@mathomp4 are you saying you can't clone jedicmake from JCSDA-internal?

I was able to do it with this branch just now, with these modules loaded:

module purge
module use /discover/swdev/jcsda/spack-stack/scu17/modulefiles

module use /gpfsm/dswdev/jcsda/spack-stack/scu17/spack-stack-1.9.0/envs/ue-intel-2021.10.0/install/modulefiles/Core
module load stack-intel/2021.10.0
module load stack-intel-oneapi-mpi/2021.10.0
module load stack-python/3.11.7

module load git-lfs/3.4.0
module load py-pip/23.1.2

@mathomp4
Member

@Dooruk One issue might be that it seems to be jedi-cmake:

https://github.com/JCSDA-internal/jedi-cmake

@Dooruk
Collaborator Author

Dooruk commented Nov 14, 2025

> @Dooruk One issue might be that it seems to be jedi-cmake:
>
> https://github.com/JCSDA-internal/jedi-cmake

Good thought, they are defined here properly:

# Jedi cmake
- jedicmake:
    repo_url_name: jedi-cmake
    default_branch: develop
    cmakelists: 'include( jedicmake/cmake/Functions/git_functions.cmake )'
    recursive: true
    tag: False

Seems like a git-credential issue. I tried two GPT suggestions but couldn't figure it out, and I can't test more without blowing up your inboxes with notifications.

Is this the gmao_ci credential? It should have access to JCSDA-internal.

@mer-a-o

mer-a-o commented Nov 24, 2025

Sorry if this is not relevant: jedi-cmake is also available as part of the spack-stack modules. So is it possible to skip building jedi-cmake as part of jedi_bundle?

@Dooruk
Collaborator Author

Dooruk commented Dec 1, 2025

Turns out we may end up having to use a combination of JCSDA and JCSDA-internal. See below:

https://github.com/JCSDA-internal/crtm/issues/456#issuecomment-3590446987

So while it is odd that we were not able to resolve the CI-workflow issue, it is becoming more imperative to use mepo for the cloning task, so we can be more explicit about which repos we want from JCSDA's public and private repos. I think @mathomp4 is close on the JEDI side to merging some necessary changes for that to work?

@mathomp4
Member

mathomp4 commented Dec 1, 2025

> Turns out we may end up having to use a combination of JCSDA and JCSDA-internal. See below:
>
> JCSDA-internal/crtm#456 (comment)
>
> So while it is odd that we were not able to resolve the CI-workflow issue, it is becoming more imperative to use mepo for the cloning task, so we can be more explicit about which repos we want from JCSDA's public and private repos. I think @mathomp4 is close on the JEDI side to merging some necessary changes for that to work?

Note that using a mix is fine in any case (mepo or not). It's just bad to use JCSDA-internal for CI work. I just cannot figure out how to get that to work...but it should be possible!
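To illustrate what "explicit" could look like when mixing organizations, here is a minimal sketch that pins each component to one organization, with no search across organizations. This is not mepo's manifest format, and the `Pin` type, the `PINS` table, and the organization chosen for each component are illustrative assumptions only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pin:
    org: str            # GitHub organization to clone from
    repo_url_name: str  # repository name as it appears in the URL
    ref: str            # branch or tag


# Illustrative values only; the real choice per component is a project decision.
PINS = {
    "jedicmake": Pin("JCSDA", "jedi-cmake", "develop"),
    "oops": Pin("JCSDA-internal", "oops", "develop"),
}


def clone_url(pin: Pin) -> str:
    """Build the HTTPS clone URL for a pinned component."""
    return f"https://github.com/{pin.org}/{pin.repo_url_name}"


def resolve(name: str, pins: dict) -> Pin:
    """Look up a component; an unpinned component is an error, never a search."""
    if name not in pins:
        raise KeyError(f"no explicit pin for '{name}'")
    return pins[name]
```

The design choice is that a missing or inaccessible repository fails the run immediately instead of resolving to another organization.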

@Dooruk Dooruk changed the title HECC NAS platforms HECC NAS platforms, take da-utils out, again Dec 9, 2025
@Dooruk Dooruk marked this pull request as ready for review December 9, 2025 22:53
@Dooruk
Collaborator Author

Dooruk commented Dec 9, 2025

I had to make some changes to this PR and it deviated from its original scope. I hope to use mepo for cloning in the near future.

It shows two checks are pending; hope they go through soon.

Lots of hoping...

@Dooruk Dooruk closed this Dec 10, 2025

4 participants