This fork enables the GDA backend (device-initiated RDMA over the NIC) to work with Mojo in thread-per-GPU mode, without OpenMPI or UCX dependencies.
To build for Mojo and MAX, run the build_rocshmem.sh script. Once it completes, it asks whether you want to upload the result to S3; make sure you are logged in with aws sso login first. You can then update common.MODULE.bazel in the modular monorepo with the hash and URL the script prints. If you want to test locally instead, answer "n" at the S3 upload prompt and the script will print a hash and a local URL for common.MODULE.bazel.
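A typical run of the steps above looks like this; the prompt wording is approximate, but the commands are the ones named above:

```bash
# Log in first so the script can upload the built archive to S3.
aws sso login

# Build the rocSHMEM host library and device bitcode for Mojo/MAX.
./build_rocshmem.sh
# At the final prompt, answer "y" to upload to S3 (or "n" for a local URL),
# then copy the printed hash and URL into common.MODULE.bazel in the modular monorepo.
```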
- Support one GPU per thread instead of one GPU per process, enabling integration with MAX.
- Each thread can now manage its own GPU with thread-local state for device IDs, initialization counters, and device state registration.
- Separate host shared library librocshmem_host.so and device bitcode librocshmem_device.bc, plus a function to initialize device state into constant memory. Required for integration with Mojo (a quick check of these artifacts follows this list).
- Multi-node TCP bootstrap: create unique IDs from an IP address and port for RDMA key exchange across nodes, without an MPI or process-launcher dependency.
- Support for the GDA backend with all RDMA drivers (ionic, mlx5, bnxt), prioritizing ionic.
- Fixed the symmetric heap allocating pinned host memory and then copying to device memory; it now allocates directly in device memory.
- Set sensible default environment variables to reduce Mojo-side configuration.
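As a quick sanity check of the split host/device artifacts, you can list them after a build; the install directory here is a placeholder, the real location is reported by build_rocshmem.sh:

```bash
# Placeholder install path; use the location printed by build_rocshmem.sh.
ls -lh install/lib/librocshmem_host.so install/lib/librocshmem_device.bc
```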
A convenience script at projects/rocshmem/tests/build_and_run_multi_node_test.sh builds rocSHMEM with the GDA backend and ionic driver, builds the test, and passes its arguments through to the test binary.
First, find node 0's IP address on the eno0 network management interface used for the TCP bootstrap, e.g.:
ip addr show eno0
9: eno0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
inet 10.24.8.110/21 brd 10.24.15.255 scope global eno0
We'd use 10.24.8.110.
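To grab that address non-interactively (e.g. from a launch script), a one-liner works, assuming the management interface is eno0 as above:

```bash
# Print just the IPv4 address of eno0.
ip -4 -o addr show eno0 | awk '{print $4}' | cut -d/ -f1
```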
Replace <NODE0_IP> with the eno0 IP below:
On node 0:
./build_and_run_multi_node_test.sh --run --node 0 --total-nodes 2 --gpus-per-node 8 --server <NODE0_IP> --port 12345
On node 1:
./build_and_run_multi_node_test.sh --run --node 1 --total-nodes 2 --gpus-per-node 8 --server <NODE0_IP> --port 12345
What Happens:
- Node 0 creates a unique ID from the server IP/port and validation data
- Node 1 creates an identical unique ID
- Both nodes then launch 8 threads (one per GPU)
- All 16 PEs (8 per node) initialize rocSHMEM with an identical unique ID
- The TCP bootstrap process shares RDMA keys etc. between the nodes and threads
- Each PE performs a put operation to the next PE in a ring pattern
- Data flows across all GPUs on both nodes via RDMA
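Both nodes can also be driven from a single machine, for example over SSH. In this sketch the node0/node1 hostnames, the checkout path, and the IP are placeholders, not anything defined by the test script:

```bash
# Hypothetical two-node launcher; hostnames, TEST_DIR, and NODE0_IP are assumptions.
NODE0_IP=10.24.8.110
TEST_DIR=rocm-systems/projects/rocshmem/tests
for NODE in 0 1; do
  ssh "node${NODE}" "cd ${TEST_DIR} && ./build_and_run_multi_node_test.sh --run --node ${NODE} --total-nodes 2 --gpus-per-node 8 --server ${NODE0_IP} --port 12345" &
done
wait
```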
To rebuild and run the multi-node test with debug information, add the --debug flag on both the server and client, e.g.:
./build_and_run_multi_node_test.sh --debug --run --node 0 --total-nodes 2 --gpus-per-node 8 --server <NODE0_IP> --port 12345

Welcome to the ROCm Systems super-repo. This repository consolidates multiple ROCm systems projects into a single repository to streamline development, CI, and integration. The first set of projects focuses on requirements for building PyTorch.
This table provides the current status of the migration of specific ROCm systems projects as well as a pointer to their current CI health.
Key:
- Completed: Fully migrated and integrated. This super-repo should be considered the source of truth for this project. The old repo may still be used for release activities.
- In Progress: Ongoing migration, tests, or integration. Please refrain from submitting new pull requests to the project's individual repo; develop on the super-repo instead.
- Pending: Not yet started or in the early planning stages. The individual repo should be considered the source of truth for this project.
| Component | Tentative Date |
|---|---|
*Remaining schedule to be determined.
Note: TheRock CI performs multi-component testing on top of builds leveraging the TheRock build system.
Project names have been standardized to match the casing and punctuation of released packages. This removes inconsistent camel-casing and underscores used in legacy repositories.
The repository is organized as follows:
projects/
amdsmi/
aqlprofile/
clr/
hip/
hipother/
hip-tests/
rccl/
rdc/
rocm-core/
rocminfo/
rocmsmilib/
rocprofiler/
rocprofiler-compute/
rocprofiler-register/
rocprofiler-sdk/
rocprofiler-systems/
rocrruntime/
rocshmem/
roctracer/
- Each folder under projects/ corresponds to a ROCm systems project that was previously maintained in a standalone GitHub repository and released as distinct packages.
- Each folder under shared/ contains code that existed in its own repository and is used as a dependency by multiple projects, but does not produce its own distinct packages in previous ROCm releases.
- Enable unified build and test workflows across ROCm libraries.
- Facilitate shared tooling, CI, and contributor experience.
- Improve integration, visibility, and collaboration across ROCm library teams.
To begin contributing or building, see the CONTRIBUTING.md guide. It includes setup instructions, sparse-checkout configuration, development workflow, and pull request guidelines.
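As a rough sketch of the sparse-checkout workflow for working on a single project (the clone URL is a placeholder; CONTRIBUTING.md has the authoritative steps):

```bash
# Clone metadata only, then check out just one project directory.
git clone --filter=blob:none --no-checkout <repo-url> rocm-systems
cd rocm-systems
git sparse-checkout set projects/rocshmem
git checkout
```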
This super-repo contains multiple subprojects, each of which retains the license under which it was originally published.
Refer to the LICENSE, LICENSE.md, or LICENSE.txt file within each projects/ or shared/ directory for specific license terms.
Refer to the header notice in individual files outside projects/ or shared/ folders for their specific license terms.
Note: The root of this repository does not define a unified license across all components.
- Start a discussion
- Open an issue
We're happy to help!