feat: customize R generation to include GPUs #3325
google-oss-prow[bot] merged 3 commits into kubeflow:master from
Conversation
Pull request overview
Updates the Flux runtime plugin to generate a Flux resource file (flux R encode) based on derived per-node CPU/GPU topology and introduces a new in-memory EmptyDir volume intended for shared memory.
Changes:
- Parameterize `templates/entrypoint.sh` so `flux R encode` can receive a computed resource spec (cores/GPU ranges).
- Refactor Flux entrypoint generation to build explicit Flux flags (`-N`/`-n` and optional `-g`) and derive an `Rspec` string for resource encoding.
- Add a new `shared-memory` volume constant and include a memory-backed EmptyDir volume in the Flux view volumes.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pkg/runtime/framework/plugins/flux/templates/entrypoint.sh | Accepts a formatted resource spec argument for flux R encode. |
| pkg/runtime/framework/plugins/flux/flux.go | Builds Rspec/flags for Flux execution and adds a memory-backed EmptyDir volume to the pod spec. |
| pkg/constants/constants.go | Introduces FluxMemoryVolumeName constant for the new shared-memory volume. |
```go
memoryVolumeAC := corev1ac.Volume().
	WithName(constants.FluxMemoryVolumeName).
	WithEmptyDir(corev1ac.EmptyDirVolumeSource().
		WithMedium(corev1.StorageMediumMemory))
fluxVolumeAC := corev1ac.Volume().
```
```go
} else {
	tasks = fmt.Sprintf("-N %d -n %d", nodes, *info.RuntimePolicy.MLPolicySource.Flux.NumProcPerNode*nodes)
	tasks = *info.RuntimePolicy.MLPolicySource.Flux.NumProcPerNode
}
```
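The flag construction above can be sketched as a small standalone function. This is an illustrative sketch, not the plugin's actual helper: the function name and the `gpusPerNode` parameter are assumptions, and `-g` is shown as the optional per-task GPU flag mentioned in the review overview.

```go
package main

import "fmt"

// buildFluxFlags derives explicit Flux submission flags from per-node
// counts: -N for nodes, -n for total tasks (procs per node times nodes),
// and an optional -g when GPUs are requested. Hypothetical helper for
// illustration only.
func buildFluxFlags(nodes, numProcPerNode, gpusPerNode int32) string {
	flags := fmt.Sprintf("-N %d -n %d", nodes, numProcPerNode*nodes)
	if gpusPerNode > 0 {
		flags += fmt.Sprintf(" -g %d", gpusPerNode)
	}
	return flags
}

func main() {
	// 4 nodes, 8 procs per node, 2 GPUs per task.
	fmt.Println(buildFluxFlags(4, 8, 2))
}
```

With 4 nodes and 8 procs per node this yields `-N 4 -n 32 -g 2`, matching the `-N`/`-n` pattern in the diff above.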
I think we should enforce kubebuilder validation for numProcPerNode >= 1 here: https://github.com/converged-computing/trainer/blob/764a765a8f6a462bc84234a3f84270fe720acf09/pkg/apis/trainer/v1alpha1/trainjob_types.go#L266
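For context, such a kubebuilder validation is expressed as a marker comment on the API field. The sketch below is hypothetical: the struct name and field shape are stand-ins for the real type in `pkg/apis/trainer/v1alpha1/trainjob_types.go`, shown only to illustrate where a `Minimum=1` marker would go.

```go
package main

import "fmt"

// FluxMLPolicySource is a hypothetical stand-in for the real API type.
type FluxMLPolicySource struct {
	// NumProcPerNode is the number of processes per node.
	// A marker like the following would make the API server reject
	// values below 1 at admission time:
	// +kubebuilder:validation:Minimum=1
	// +optional
	NumProcPerNode *int32 `json:"numProcPerNode,omitempty"`
}

func main() {
	n := int32(4)
	p := FluxMLPolicySource{NumProcPerNode: &n}
	fmt.Println(*p.NumProcPerNode)
}
```

The marker is consumed by controller-gen when generating the CRD schema; the Go code itself carries no runtime check.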
@vsoch Do you want to add validation for numProcPerNode in the followup PR?
I think I had it before (in the first PR) but was asked to remove it because it’s part of the API as an annotation?
What we do need to think about is a case that we have for the Flux Operator (that may not be dealt with well here). Often there is a desire to specify the number of nodes but just say "discover and use all of the cores that are found", and then you would do something like:

flux submit -N 16 --exclusive

Note that I don't have any -n specified. For the Flux Operator, I allowed this case when the user set tasks to 0. Is there any way we can support something similar? Is the only way some special env var that flags and triggers the condition (regardless of what the -n is), since the validation of >= 1 is in the spec?
> I think I had it before (in the first PR) but was asked to remove it because it’s part of the API as an annotation?
Please open a dedicated PR to introduce this kubebuilder validation; we need to ensure that numProcPerNode >= 1.
> Note that I don't have any -n specified. For the Flux Operator, I allowed this case when the user set tasks to 0. Is there any way we can support something similar?
If Flux can dynamically discover all available devices on the node, we can rely on this functionality when numProcPerNode is omitted.
Excellent! I did not know that was an option. Thank you!
```go
// Path for Flux curve path
FluxCurveVolumePath = "/curve"

// Ensure MPI has full memory of the host
```
```diff
 # Generate host resources
 hosts=$(cat ${configroot}/etc/flux/system/hostlist)
-flux R encode --hosts=${hosts} --local > /tmp/R
+flux R encode --hosts=${hosts} %s > /tmp/R
```
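The `%s` placeholder above is filled in by the Go plugin with the computed resource spec. A minimal sketch of how that string might be built follows; the function name is hypothetical, and `--cores`/`--gpus` are assumed to be the `flux R encode` ID-range options the spec maps onto.

```go
package main

import "fmt"

// buildRspec formats a resource spec fragment for `flux R encode`,
// assuming zero-based core and GPU ID ranges derived from per-node
// counts. Illustrative sketch, not the plugin's actual code.
func buildRspec(coresPerNode, gpusPerNode int) string {
	spec := fmt.Sprintf("--cores=0-%d", coresPerNode-1)
	if gpusPerNode > 0 {
		spec += fmt.Sprintf(" --gpus=0-%d", gpusPerNode-1)
	}
	return spec
}

func main() {
	// A node with 16 cores and 4 GPUs.
	fmt.Println(buildRspec(16, 4))
}
```

Substituted into the template, this would produce a command like `flux R encode --hosts=${hosts} --cores=0-15 --gpus=0-3 > /tmp/R`, sidestepping hwloc-based GPU detection as the PR description explains.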
@andreyvelich I'm going for a quick run and will test with GPU when I am back! Would you like an example added to the examples/flux directory that uses GPUs? My plan is to test on AWS, LAMMPS with GPU.

Sure, let's add another example in the Flux subdirectory.
Flux detection of GPUs depends on hwloc plugins, and a newer version. To get around any edge cases where the dependencies are missing, we can easily generate the R from the expected resource spec (cores and gpus). We can also add a shared memory mount to ensure the job MPI gets all available shared memory of the host. The current default is 64M (automatic from container runtime) and it can have implications for MPI performance. Signed-off-by: vsoch <vsoch@users.noreply.github.com>
Also update volume to be added to container. This still is pending testing on AWS! Signed-off-by: vsoch <vsoch@users.noreply.github.com>
Signed-off-by: vsoch <vsoch@users.noreply.github.com>
@andreyvelich can you give me any insight as to why that is failing? Is it a flaky test or something else?
Yes, we have some flaky e2e tests: #3366
/retest
Don't merge this yet - with the force push I am not convinced it is still working. I need to test again locally.
I think my container is different and that's the issue. I'm going to add interactive mode to this PR. It's going to be essential for helping users, I know we will eventually need it, and I need it now, so it makes sense.
Oh, I think I'm being dumb - I installed the current deployed master kubeflow image (which has flux now!) and not my PR image here. Derp! Trying again.
Yep, that was it - still works great! Phew.
@andreyvelich this is ready for review and eventual merge - apologies for my confusing posts! I was testing locally with the wrong container image (I had updated it back to kubeflow for the PR here, and needed to use my custom build). I just finished the demo for the Kubecon booth, and I'll be releasing the longer variant (about 12 minutes) that does an intro and two examples - LAMMPS with the elastic fabric adapter and LAMMPS with GPU, both on AWS! The second requires the update here. It's a fun and engaging short presentation that introduces Flux (2.5 minutes) followed by the demos. I'm going to wait for the release (and hopefully this paired merge) before posting that. Very excited! 🥳
Let’s get this merged today! 🎉
Sure, happy to unblock. @vsoch can you address this issue in the followup PR: #3325 (comment)
[APPROVAL NOTIFIER] This PR is APPROVED

This pull request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
Absolutely - I can address what you'd like for a followup! I'd like to add interactive mode for easy learning and debugging, so I will most definitely follow up soon.

Thank you! 🙏

What this PR does / why we need it:
Flux detection of GPUs depends on hwloc plugins, and a newer version. To get around any edge cases where the dependencies are missing, we can easily generate the R from the expected resource spec (cores and gpus). We can also add a shared memory mount to ensure the job MPI gets all available shared memory of the host. The current default is 64M (automatic from container runtime) and it can have implications for MPI performance.
This will close #3321
I would like to test this with GPUs on AWS today before we do any kind of merge. Thank you!
Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):

Fixes #3321
Checklist: