Use new apptainer singularity install to avoid transient errors #63
This looks pretty good @jo-basevi!
Only one comment about Ben's suggestion on detecting if the script is running inside a container:
I'm not familiar with the NoNewPrivs process status, but reading online I see it can give false-positives (i.e. NoNewPrivs != 0 can be set by other things unrelated to Apptainer).
Therefore, I wonder whether it might be simpler to check for the existence of the APPTAINER_CONTAINER environment variable, which is set by default within the container. This might also give false positives, but it seems like a simpler approach.
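For illustration, the environment-variable check could look something like the sketch below. It is not taken from the launcher scripts: the function name `in_apptainer_env` is hypothetical, and the only assumption carried over from the comment above is that APPTAINER_CONTAINER is set inside the container.

```shell
#!/usr/bin/env bash
# Sketch only: in_apptainer_env is a hypothetical helper name, not part of
# the existing launcher scripts.
# Succeeds (exit status 0) when APPTAINER_CONTAINER is set and non-empty.
in_apptainer_env() {
    [ -n "${APPTAINER_CONTAINER:-}" ]
}

# Example short-circuit: only launch a container when not already inside one.
if in_apptainer_env; then
    echo "already inside a container, skipping launch"
else
    echo "outside a container, launch would go here"
fi
```

Because any process can export this variable, the check can still be fooled, which matches the false-positive caveat above.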
This PR is currently untested, but it makes more changes than I would like to an infrastructure that is going to be replaced soon.
I've tested the @atteggiani I am so excited for the staging directory for testing environment changes in the new infra.
atteggiani
left a comment
The changes look good.
Even if you haven't tested it thoroughly I also think it should be fine based on the changes.
Yes, the STAGING environment in the new infrastructure will hopefully make testing easier and quicker, and keeping each env version separate (with its own modulefiles, for example) also makes these kinds of changes simpler.
Hopefully it will be less of a pain to manage all the environments :)
NCI have recently installed an Apptainer-based container engine on Gadi, which uses a different driver for mounting container images. In most cases it will require just swapping `module load singularity` with `module load apptainer`. This will hopefully fix the transient errors (see #26) and so far no errors have surfaced in my local testing. It could be good to get this change into at least `payu/dev` so it can be further tested.
Ben Menadue also picked up that the short-circuit to detect if running inside a container in the launcher scripts doesn't correctly detect Apptainer containers. They suggested a more reliable way would be to inspect that process's status directly:
This will return 0 if running outside a container or 1 if inside. (Or more precisely, 0 if launching a container will work and 1 if it won't.)
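The exact command suggested is not reproduced here, so the following is only a rough sketch of one way to read that flag, assuming a Linux kernel that exposes the `NoNewPrivs` field in `/proc/self/status`. The helper name `no_new_privs` and its optional file argument are hypothetical and exist only to make the sketch easy to try out.

```shell
#!/usr/bin/env bash
# Sketch only: no_new_privs is a hypothetical helper, not the command from
# the PR. It prints the value of the NoNewPrivs field from a status file
# (default: /proc/self/status). The flag is inherited by child processes,
# so reading it from the awk child reflects the calling shell's value.
no_new_privs() {
    awk '/^NoNewPrivs:/ {print $2}' "${1:-/proc/self/status}"
}

# Example use, guarded so it does nothing on systems without /proc.
if [ -r /proc/self/status ]; then
    if [ "$(no_new_privs)" = "1" ]; then
        echo "NoNewPrivs is set: launching a container will not work here"
    else
        echo "NoNewPrivs is not set: launching a container should work"
    fi
fi
```

As noted in the review, other tools can also set this flag, so a value of 1 says that a container launch will not work rather than proving the process is inside Apptainer.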
So far in my tests, there hasn't been any "FATAL: container creation failed" using apptainer, so we could maybe also remove that retry logic when launching the container?