
Adjust workspace sizing to allow 256-byte alignment of workspace offset pointers. #109

Merged
romerojosh merged 4 commits into main from workspace_pad
Mar 12, 2026


@romerojosh
Collaborator

cuDecomp uses the user-provided workspace allocation as a staging area for the send and receive data of either an alltoall (transposes) or a pairwise send/recv (halo exchange). The receive-side portion of the workspace sits at an offset from the base workspace address, and that offset is determined by the send-side pencil size or the send-side halo size.

It is sometimes beneficial to pass libraries input pointers with the same alignment that a standard cudaMalloc call provides (256 bytes), but the current pattern used in cuDecomp does not enforce this alignment on the offset used for the receive side of the workspace.

This PR applies 256-byte alignment to the workspace offset locations used during the transpose and halo exchange operations. To enable this, a marginal increase in the workspace sizing was required.
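
For readers who want a concrete picture, here is a minimal sketch of the idea, not the code from this PR. It assumes a two-region layout with the send staging area at the workspace base and the receive staging area at the send size rounded up to 256 bytes. The names `align_up`, `WorkspaceLayout`, `make_layout`, `is_aligned`, and `recv_pointer` are hypothetical and are not part of the cuDecomp API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Alignment to enforce on workspace offsets (256 bytes, per the PR description).
constexpr std::size_t kWorkspaceAlign = 256;

// Round a byte count up to the next multiple of `align`.
// `align` must be a power of two for the bit mask to be valid.
constexpr std::size_t align_up(std::size_t bytes, std::size_t align = kWorkspaceAlign) {
  return (bytes + align - 1) & ~(align - 1);
}

// Hypothetical layout: send staging at offset 0, receive staging at an aligned offset.
struct WorkspaceLayout {
  std::size_t recv_offset;  // byte offset of the receive region from the workspace base
  std::size_t total_bytes;  // bytes the caller must allocate for the whole workspace
};

// Compute the padded layout. Compared with an unpadded send_bytes + recv_bytes,
// the total grows by at most (kWorkspaceAlign - 1) bytes in this sketch.
constexpr WorkspaceLayout make_layout(std::size_t send_bytes, std::size_t recv_bytes) {
  const std::size_t recv_offset = align_up(send_bytes);
  return {recv_offset, recv_offset + recv_bytes};
}

// True if `p` is a multiple of `align` bytes.
inline bool is_aligned(const void* p, std::size_t align = kWorkspaceAlign) {
  return reinterpret_cast<std::uintptr_t>(p) % align == 0;
}

// Receive-side pointer for a workspace whose base is already 256-byte aligned
// (as a cudaMalloc allocation would be).
inline void* recv_pointer(void* base, const WorkspaceLayout& layout) {
  assert(is_aligned(base));
  return static_cast<char*>(base) + layout.recv_offset;
}
```

With a 1000-byte send region, an unpadded layout would place the receive region at byte 1000, which is not a multiple of 256. `make_layout(1000, 500)` instead places it at byte 1024 and asks for 1524 bytes in total, which illustrates why a small increase in workspace size is needed.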

…et pointers.

Signed-off-by: Josh Romero <joshr@nvidia.com>
Signed-off-by: Josh Romero <joshr@nvidia.com>
… bytes.

Signed-off-by: Josh Romero <joshr@nvidia.com>
Signed-off-by: Josh Romero <joshr@nvidia.com>
@romerojosh
Collaborator Author

/build

@github-actions

🚀 Build workflow triggered! View run

@github-actions

✅ Build workflow passed! View run

@romerojosh romerojosh merged commit 8e4a7e0 into main Mar 12, 2026
4 checks passed
@romerojosh romerojosh deleted the workspace_pad branch March 23, 2026 21:50