Adjust workspace sizing to allow 256-byte alignment of workspace offset pointers. #109
Merged
romerojosh merged 4 commits into main on Mar 12, 2026
…et pointers. Signed-off-by: Josh Romero <joshr@nvidia.com>
106f992 to 2b77672
… bytes. Signed-off-by: Josh Romero <joshr@nvidia.com>
/build
🚀 Build workflow triggered!
✅ Build workflow passed!
cuDecomp uses the user-provided workspace allocation as staging areas for send and receive data, for either an alltoall (transposes) or pairwise send/recv (halo exchange). The receive-side portion of the workspace sits at an offset from the base workspace location, and that offset is determined by the send-side pencil size or the send-side halo size.
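As a rough illustration of the layout described above, the following sketch splits one workspace into a send area followed directly by a receive area. The names `StagingAreas`, `split_workspace`, and `is_aligned` are hypothetical and are not taken from the cuDecomp source, and sizes are assumed to be in bytes for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical type, not part of the cuDecomp API: the two staging areas
// carved out of a single user-provided workspace.
struct StagingAreas {
  unsigned char* send;  // starts at the workspace base
  unsigned char* recv;  // starts right after the send area
};

// Assumed layout: the receive area begins at base + send-side size, so its
// alignment depends entirely on the send-side size.
inline StagingAreas split_workspace(unsigned char* workspace, std::size_t send_bytes) {
  assert(workspace != nullptr);
  return {workspace, workspace + send_bytes};
}

// Returns true when ptr is a multiple of alignment (alignment must be nonzero).
inline bool is_aligned(const void* ptr, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```

With a 256-byte-aligned base and a send-side size that is not a multiple of 256 bytes, `send` is aligned but `recv` generally is not.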
It is sometimes beneficial to provide libraries with input pointers whose alignment matches that of a standard cudaMalloc call (256 bytes), but the current pattern used in cuDecomp does not enforce this alignment on the offset used for the receive side of the workspace.
This PR applies 256-byte alignment to the workspace offset locations used during the transpose and halo exchange operations. To enable this, a marginal increase in the workspace sizing was required.
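One way to picture the change is to round the receive offset up to the next multiple of 256 bytes and grow the workspace by the resulting padding. This is a minimal sketch under that assumption, not the code from this PR: `align_up`, `recv_offset_bytes`, and `padded_workspace_bytes` are hypothetical names, and sizes are assumed to be in bytes.

```cpp
#include <cassert>
#include <cstddef>

// Alignment figure taken from the PR description.
constexpr std::size_t kAlignment = 256;

// Rounds n up to the next multiple of alignment.
// Assumes alignment is a nonzero power of two.
inline std::size_t align_up(std::size_t n, std::size_t alignment = kAlignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (n + alignment - 1) & ~(alignment - 1);
}

// Hypothetical: byte offset of the receive area once the send-side size
// has been padded to the alignment boundary.
inline std::size_t recv_offset_bytes(std::size_t send_bytes) {
  return align_up(send_bytes);
}

// Hypothetical sizing rule: the padded send area plus the receive area.
// The padding adds at most kAlignment - 1 bytes over the unpadded total.
inline std::size_t padded_workspace_bytes(std::size_t send_bytes, std::size_t recv_bytes) {
  return recv_offset_bytes(send_bytes) + recv_bytes;
}
```

In this sketch the extra space per aligned offset is bounded by 255 bytes, which is consistent with the increase in workspace sizing being marginal; the exact amount added by the PR is not stated in the description.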