This fork enables the GDA backend (device-initiated RDMA over the NIC) to work with Mojo in thread-per-GPU mode, without OpenMPI or UCX dependencies.
To build for Mojo and MAX, run the build_rocshmem.sh script. Once it completes, it asks whether you want to upload the result to S3; make sure you are logged in with aws sso login first. You can then update common.MODULE.bazel in the modular monorepo with the hash and URL the script prints. If you want to test locally instead, answer "n" at the S3 upload prompt and the script will print a hash and a local URL for common.MODULE.bazel.
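A typical run of the steps above looks like this; the prompt wording is approximate, but the commands are the ones named above:

```bash
# Log in first so the script can upload the built archive to S3.
aws sso login

# Build the rocSHMEM host library and device bitcode for Mojo/MAX.
./build_rocshmem.sh
# At the final prompt, answer "y" to upload to S3 (or "n" for a local URL),
# then copy the printed hash and URL into common.MODULE.bazel in the modular monorepo.
```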
- Support one GPU per thread instead of one GPU per process, enabling integration with MAX.
- Each thread can now manage its own GPU with thread-local state for device IDs, initialization counters, and device state registration.
- Separate host shared library librocshmem_host.so and device bitcode librocshmem_device.bc, plus a function to initialize device state into constant memory. Required for integration with Mojo (a quick check of these artifacts follows this list).
- Multi-node TCP bootstrap: create unique IDs from an IP address and port for RDMA key exchange across nodes, without an MPI or process-launcher dependency.
- Support for the GDA backend with all RDMA drivers (ionic, mlx5, bnxt), prioritizing ionic.
- Fixed the symmetric heap allocating pinned host memory and then copying to device memory; it now allocates directly in device memory.
- Set sensible default environment variables to reduce Mojo-side configuration.
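As a quick sanity check of the split host/device artifacts, you can list them after a build; the install directory here is a placeholder, the real location is reported by build_rocshmem.sh:

```bash
# Placeholder install path; use the location printed by build_rocshmem.sh.
ls -lh install/lib/librocshmem_host.so install/lib/librocshmem_device.bc
```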
A convenience script at projects/rocshmem/tests/build_and_run_multi_node_test.sh builds rocSHMEM with the GDA backend and ionic driver, builds the test, and passes its arguments through to the test binary.
First, find node 0's IP address on the eno0 network management interface used for the TCP bootstrap, e.g.:
ip addr show eno0
9: eno0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
inet 10.24.8.110/21 brd 10.24.15.255 scope global eno0
We'd use 10.24.8.110.
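To grab that address non-interactively (e.g. from a launch script), a one-liner works, assuming the management interface is eno0 as above:

```bash
# Print just the IPv4 address of eno0.
ip -4 -o addr show eno0 | awk '{print $4}' | cut -d/ -f1
```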
Replace <NODE0_IP> with the eno0 IP below:
On node 0:
./build_and_run_multi_node_test.sh --run --node 0 --total-nodes 2 --gpus-per-node 8 --server <NODE0_IP> --port 12345
On node 1:
./build_and_run_multi_node_test.sh --run --node 1 --total-nodes 2 --gpus-per-node 8 --server <NODE0_IP> --port 12345
What Happens:
- Node 0 creates a unique ID from the server IP/port and validation data
- Node 1 creates an identical unique ID
- Both nodes then launch 8 threads (one per GPU)
- All 16 PEs (8 per node) initialize rocSHMEM with an identical unique ID
- The TCP bootstrap process shares RDMA keys etc. between the nodes and threads
- Each PE performs a put operation to the next PE in a ring pattern
- Data flows across all GPUs on both nodes via RDMA
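Both nodes can also be driven from a single machine, for example over SSH. In this sketch the node0/node1 hostnames, the checkout path, and the IP are placeholders, not anything defined by the test script:

```bash
# Hypothetical two-node launcher; hostnames, TEST_DIR, and NODE0_IP are assumptions.
NODE0_IP=10.24.8.110
TEST_DIR=rocm-systems/projects/rocshmem/tests
for NODE in 0 1; do
  ssh "node${NODE}" "cd ${TEST_DIR} && ./build_and_run_multi_node_test.sh --run --node ${NODE} --total-nodes 2 --gpus-per-node 8 --server ${NODE0_IP} --port 12345" &
done
wait
```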
To rebuild and run the multi-node test with debug information, add the --debug flag on both the server and client, e.g.:
./build_and_run_multi_node_test.sh --debug --run --node 0 --total-nodes 2 --gpus-per-node 8 --server <NODE0_IP> --port 12345

Welcome to the ROCm Systems super-repo. This repository consolidates multiple ROCm systems projects into a single repository to streamline development, CI, and integration. The first set of projects focuses on requirements for building PyTorch.
This table provides the current status of the migration of specific ROCm systems projects as well as a pointer to their current CI health.
Key:
- Completed: Fully migrated and integrated. This super-repo should be considered the source of truth for this project. The old repo may still be used for release activities.
- In Progress: Ongoing migration, tests, or integration. Please refrain from submitting new pull requests to the project's individual repo; develop on the super-repo instead.
- Pending: Not yet started or in the early planning stages. The individual repo should be considered the source of truth for this project.
| Component | Tentative Date |
|---|---|
*Remaining schedule to be determined.
Note: TheRock CI performs multi-component testing on top of builds leveraging the TheRock build system.
Project names have been standardized to match the casing and punctuation of released packages. This removes inconsistent camel-casing and underscores used in legacy repositories.
The repository is organized as follows:
projects/
amdsmi/
aqlprofile/
clr/
hip/
hipother/
hip-tests/
rccl/
rdc/
rocm-core/
rocminfo/
rocmsmilib/
rocprofiler/
rocprofiler-compute/
rocprofiler-register/
rocprofiler-sdk/
rocprofiler-systems/
rocrruntime/
rocshmem/
roctracer/
- Each folder under projects/ corresponds to a ROCm systems project that was previously maintained in a standalone GitHub repository and released as distinct packages.
- Each folder under shared/ contains code that existed in its own repository and is used as a dependency by multiple projects, but does not produce its own distinct packages in previous ROCm releases.
- Enable unified build and test workflows across ROCm libraries.
- Facilitate shared tooling, CI, and contributor experience.
- Improve integration, visibility, and collaboration across ROCm library teams.
To begin contributing or building, see the CONTRIBUTING.md guide. It includes setup instructions, sparse-checkout configuration, development workflow, and pull request guidelines.
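As a rough sketch of the sparse-checkout workflow for working on a single project (the clone URL is a placeholder; CONTRIBUTING.md has the authoritative steps):

```bash
# Clone metadata only, then check out just one project directory.
git clone --filter=blob:none --no-checkout <repo-url> rocm-systems
cd rocm-systems
git sparse-checkout set projects/rocshmem
git checkout
```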
This super-repo contains multiple subprojects, each of which retains the license under which it was originally published.
Refer to the LICENSE, LICENSE.md, or LICENSE.txt file within each projects/ or shared/ directory for specific license terms.
Refer to the header notice in individual files outside projects/ or shared/ folders for their specific license terms.
Note: The root of this repository does not define a unified license across all components.
- Start a discussion
- Open an issue
We're happy to help!