Use new apptainer singularity install to avoid transient errors #63

Merged: jo-basevi merged 4 commits into main from 26-Use-Apptainer-Module on Apr 17, 2026
@jo-basevi (Collaborator) commented Mar 17, 2026

NCI have recently installed an Apptainer-based container engine on Gadi, which uses a different driver for mounting container images. In most cases, switching will just require swapping module load singularity with module load apptainer. This will hopefully fix the transient errors (see #26); so far no errors have surfaced in my local testing. It would be good to get this change into at least payu/dev so it can be tested further.

Ben Menadue also picked up that the short-circuit to detect if running inside a container in the launcher scripts doesn't correctly detect Apptainer containers. They suggested a more reliable way would be to inspect that process's status directly:

$ cat /proc/self/status | grep NoNewPrivs | awk '{print $2}'
0

This will return 0 if running outside a container or 1 if inside. (Or more precisely, 0 if launching a container will work and 1 if it won't.)
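The check suggested above could be wrapped in a small helper along these lines. This is only a sketch: the function names no_new_privs and in_container are hypothetical and not taken from the launcher scripts, and the optional file argument exists purely so the helper can be tried against a sample file.

```shell
#!/bin/sh
# Hypothetical helper: print the NoNewPrivs value from a status file.
# Defaults to /proc/self/status; the flag is inherited by child processes,
# so reading it from the awk child reflects the calling shell as well.
no_new_privs() {
    awk '/^NoNewPrivs:/ {print $2}' "${1:-/proc/self/status}"
}

# Hypothetical helper: succeeds (exit status 0) when NoNewPrivs is 1,
# i.e. when launching a container from here is not expected to work.
in_container() {
    [ "$(no_new_privs "$@")" = "1" ]
}
```

A launcher could then short-circuit with something like `if in_container; then exec "$@"; fi`, although how the real scripts are structured is an assumption here.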

So far in my tests, there hasn't been any "FATAL: container creation failed" using apptainer, so we could maybe also remove that retry logic when launching the container?

@atteggiani (Collaborator) previously approved these changes Apr 16, 2026 and left a comment:

This looks pretty good @jo-basevi!

Only one comment about Ben's suggestion on detecting if the script is running inside a container:

I'm not familiar with the NoNewPrivs process status, but reading online I see it can give false positives (i.e. NoNewPrivs != 0 can be set by other things unrelated to Apptainer).
Therefore, I'm wondering whether it might be simpler to check for the existence of the APPTAINER_CONTAINER env variable that gets set by default within the container. This might also give false positives, but it seems like a simpler approach.
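For comparison, the environment-variable check could look roughly like this. It is a sketch only: the function name in_apptainer_env is hypothetical, and it relies solely on APPTAINER_CONTAINER being set to a non-empty value inside the container, as described above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when APPTAINER_CONTAINER
# is set to a non-empty value in the current environment.
in_apptainer_env() {
    [ -n "${APPTAINER_CONTAINER:-}" ]
}
```

Because any process can set or unset this variable, the check can be fooled in either direction, which is the false-positive trade-off mentioned above.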

Comment thread modules/common_v3 Outdated
@jo-basevi (Collaborator, Author) commented:

This PR is currently untested, but it's more changes than I would like to make to infrastructure that is going to be replaced soon.

@jo-basevi (Collaborator, Author) commented:

I've tested the launcher.sh script changes, but I haven't tested the build script changes. Those simply replace common_v3 with ${COMMON_MODULEFILE} and define COMMON_MODULEFILE in install_config.sh, so I think it'll be fine.

@atteggiani I am so excited for the staging directory for testing environment changes in the new infra.

@atteggiani atteggiani self-requested a review April 17, 2026 00:42

@atteggiani (Collaborator) left a comment:

The changes look good.

Even if you haven't tested it thoroughly, I also think it should be fine based on the changes.

Yes, the STAGING environment in the new infrastructure will hopefully make testing easier and quicker, and keeping each env version separate (with its own modulefiles, for example) makes these kinds of changes simpler too.
Hopefully it will be less of a pain to manage all the environments :)

@jo-basevi jo-basevi merged commit 508fcce into main Apr 17, 2026
3 checks passed