Add new NVSHMEM transpose communication backend with SM-based P2P copies.#114

Merged
romerojosh merged 14 commits into main from nvshmem_sm_backend
Mar 17, 2026

@romerojosh
Collaborator

This PR introduces a new transpose communication backend, CUDECOMP_TRANSPOSE_COMM_NVSHMEM_SM, that performs intra-group (NVLink-connected) P2P transfers with SM-driven copies instead of copy-engine (CE) driven transfers.

The existing CUDECOMP_TRANSPOSE_COMM_NVSHMEM backend issues all intra-group puts via the NVSHMEM host API nvshmemx_putmem_on_stream, which dispatches through the CEs. For large transfers, CE-driven transfers often achieve higher bandwidth than SM-driven ones, which is why the existing backend was designed that way. However, as per-peer transfer sizes shrink, as in strong scaling or extreme weak scaling scenarios, SM-driven transfers become more competitive. In addition, GPUs on systems with NVSwitch connectivity (e.g., DGX systems, MNNVL systems) can typically run only a single CE-driven P2P transfer at a time, which becomes a problem when individual transfers are too small to saturate NVLink on their own. SM-driven transfers can more effectively run NVLink transfers to multiple peers concurrently, providing better total NVLink utilization in these scenarios.
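To make the trade-off above concrete, here is a rough latency-bandwidth sketch in plain C++. It is not part of this PR and does not reflect how cuDecomp or NVSHMEM schedule transfers internally. The `LinkModel` struct, the function names, and every parameter are hypothetical, and the model rests on two simplifying assumptions: CE-driven transfers to peers run strictly one after another, each paying a fixed startup cost, while SM-driven transfers to all peers overlap and share a somewhat lower aggregate bandwidth.

```cpp
#include <cstddef>

// Toy model only: all fields are hypothetical knobs, not measured values
// from cuDecomp, NVSHMEM, or any real system.
struct LinkModel {
  double ce_latency_us;      // assumed fixed startup cost per CE-driven transfer
  double ce_bandwidth_gbps;  // assumed sustained CE bandwidth, in GB/s
  double sm_latency_us;      // assumed fixed startup cost of one SM-driven copy
  double sm_bandwidth_gbps;  // assumed aggregate SM bandwidth across peers, in GB/s
};

// 1 GB/s equals 1e3 bytes per microsecond.
inline double bytes_per_us(double gbps) { return gbps * 1e3; }

// Assumption: CE-driven transfers to different peers are serialized,
// so every peer pays the startup cost plus its own transfer time.
inline double ce_time_us(const LinkModel& m, int num_peers, double bytes_per_peer) {
  const double per_transfer =
      m.ce_latency_us + bytes_per_peer / bytes_per_us(m.ce_bandwidth_gbps);
  return num_peers * per_transfer;
}

// Assumption: SM-driven transfers to all peers overlap, so the startup
// cost is paid once and the total payload shares the aggregate bandwidth.
inline double sm_time_us(const LinkModel& m, int num_peers, double bytes_per_peer) {
  return m.sm_latency_us +
         num_peers * bytes_per_peer / bytes_per_us(m.sm_bandwidth_gbps);
}

// True when the toy model predicts the SM-driven path finishes first.
inline bool sm_is_faster(const LinkModel& m, int num_peers, double bytes_per_peer) {
  return sm_time_us(m, num_peers, bytes_per_peer) <
         ce_time_us(m, num_peers, bytes_per_peer);
}
```

With made-up inputs such as `LinkModel{5.0, 400.0, 5.0, 300.0}` and 7 peers, the model favors the SM-driven path for 16 KiB per peer and the CE-driven path for 256 MiB per peer, matching the qualitative argument above. Where the crossover actually falls on real hardware has to be measured.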

Historically, the NCCL backend of cuDecomp has served the role of the "SM-driven" alltoall backend to fall back on when the CE-driven NVSHMEM backends were suboptimal. This PR provides a new alternative SM-driven backend for cuDecomp to make use of.

Signed-off-by: Josh Romero <joshr@nvidia.com>
@romerojosh
Collaborator Author

/build

@github-actions

🚀 Build workflow triggered! View run

@github-actions

✅ Build workflow passed! View run

Signed-off-by: Josh Romero <joshr@nvidia.com>
@romerojosh romerojosh merged commit 90a3bc1 into main Mar 17, 2026
4 checks passed
@romerojosh romerojosh deleted the nvshmem_sm_backend branch March 23, 2026 21:50