[DistInf] Enable RDMA over Ionic AINICs for MoRI EP disaggregated inference by raviguptaamd · Pull Request #147 · ROCm/MAD

raviguptaamd · 2026-04-16T04:12:18Z

Summary

- Enable MoRI IO KV cache transfer over Ionic RDMA NICs on clusters where public IPs are not routable between compute nodes
- Mount host RDMA libraries (libionic.so, libibverbs, librdmacm, and their provider directories) into the container so MORI IO can discover Ionic NICs
- Set VLLM_HOST_IP to each node's overlay IP so the MoRIIO control plane (ZMQ handshake, block allocation notifications, proxy registration) routes through the routable overlay network instead of unreachable public IPs
- Pass the MORI RDMA env vars (MORI_IB_GID_INDEX, MORI_RDMA_DEVICES, MORI_IO_LOG_LEVEL) through from the launcher into the container
- Switch from Docker to Podman for rootless container execution
- Use --overlap on srun commands so job steps can share the allocation's resources instead of blocking
- Prefer 10.x.x.x overlay IPs for MASTER_ADDR and inter-node communication
- Check MODEL_DIR before the standard paths when resolving the model path
- Add PYTHONUNBUFFERED=1 for real-time Python log output
- Add launch_mori_1p1d.sh convenience launcher for 1P/1D benchmarks
- Update the Dockerfile to install MORI from a pinned commit on main
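A minimal Python sketch of how a launcher might assemble the podman run arguments for the container changes listed above: read-only mounts of the host RDMA libraries, VLLM_HOST_IP set to the overlay IP, PYTHONUNBUFFERED=1, and pass-through of the MORI RDMA variables only when the launcher has set them. The host library paths, --network=host, --device=/dev/infiniband, the image name, and the helper name build_podman_args are assumptions for illustration, not taken from the PR's scripts.

```python
from typing import Mapping, Sequence

# ASSUMPTION: hypothetical host paths. The real locations depend on the host
# distro and the Ionic driver package; they are not taken from the PR.
RDMA_HOST_PATHS = (
    "/usr/lib/x86_64-linux-gnu/libionic.so",
    "/usr/lib/x86_64-linux-gnu/libibverbs.so.1",
    "/usr/lib/x86_64-linux-gnu/librdmacm.so.1",
    "/usr/lib/x86_64-linux-gnu/libibverbs",  # verbs provider directory
    "/etc/libibverbs.d",                     # provider driver config files
)

# MORI RDMA variables the launcher forwards into the container.
MORI_PASSTHROUGH = ("MORI_IB_GID_INDEX", "MORI_RDMA_DEVICES", "MORI_IO_LOG_LEVEL")


def build_podman_args(image: str, overlay_ip: str,
                      env: Mapping[str, str],
                      rdma_paths: Sequence[str] = RDMA_HOST_PATHS) -> list[str]:
    """Return a hypothetical `podman run` argv exposing host RDMA libs and env."""
    # ASSUMPTION: host networking and the /dev/infiniband device nodes are
    # needed so the container sees the node's IPs and the RDMA devices.
    args = ["podman", "run", "--rm", "--network=host",
            "--device=/dev/infiniband"]
    # Bind-mount each host RDMA library/provider path read-only at the same path.
    for path in rdma_paths:
        args += ["-v", f"{path}:{path}:ro"]
    # Route the MoRIIO control plane over the overlay network.
    args += ["-e", f"VLLM_HOST_IP={overlay_ip}"]
    # Disable Python output buffering so logs appear in real time.
    args += ["-e", "PYTHONUNBUFFERED=1"]
    # Forward MORI RDMA settings only when the launcher has set them.
    for name in MORI_PASSTHROUGH:
        if name in env:
            args += ["-e", f"{name}={env[name]}"]
    args.append(image)
    return args
```

For example, build_podman_args("example/vllm-mori:latest", "10.0.0.5", {"MORI_IB_GID_INDEX": "1"}) forwards VLLM_HOST_IP=10.0.0.5, PYTHONUNBUFFERED=1 and MORI_IB_GID_INDEX=1, and simply omits the MORI variables the launcher left unset rather than passing empty values.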

Problem

On clusters with Ionic AINICs (back-end RDMA network) and Broadcom NICs (front-end overlay network), the MoRIIO connector's get_ip() returns the public IP, which is not routable between compute nodes. As a result, the decode node cannot send block allocation notifications back to the prefill node, producing a circular deadlock in which both sides hang indefinitely waiting for the KV transfer.

Solution

- Set VLLM_HOST_IP on each node to its overlay IP (10.x.x.x); get_ip() checks this env var first
- Mount host Ionic RDMA libraries into the container so mori::io::RdmaManager can discover Ionic NICs
- Pass MORI_IB_GID_INDEX=1 to select the correct RoCE v2 GID for Ionic
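A simplified Python sketch of the two address ideas in the Solution: pick the node's 10.x.x.x overlay address from its candidate addresses, and let an explicitly set VLLM_HOST_IP take precedence over whatever auto-detection would return. This is not vLLM's get_ip(); pick_overlay_ip and resolve_host_ip are hypothetical helpers, and the source of the candidate list (for example, the output of hostname -I) is an assumption.

```python
import ipaddress
from typing import Iterable, Mapping, Optional

# Overlay network from the PR description (10.x.x.x).
OVERLAY_NET = ipaddress.ip_network("10.0.0.0/8")


def pick_overlay_ip(addresses: Iterable[str]) -> Optional[str]:
    """Return the first IPv4 address inside 10.0.0.0/8, or None if absent."""
    for addr in addresses:
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            continue  # skip malformed entries
        if ip.version == 4 and ip in OVERLAY_NET:
            return addr
    return None


def resolve_host_ip(env: Mapping[str, str], fallback_ip: str) -> str:
    """Hypothetical lookup order: a non-empty VLLM_HOST_IP wins over fallback."""
    host_ip = env.get("VLLM_HOST_IP", "")
    return host_ip if host_ip else fallback_ip
```

With candidates ["203.0.113.7", "10.2.3.4"], pick_overlay_ip returns "10.2.3.4"; exporting that value as VLLM_HOST_IP means the public address that auto-detection would otherwise report is never used for the control plane.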

Test Plan

- DeepSeek-V3 1P/1D on 2x MI355X nodes (8 GPUs each) with Ionic AINICs
- Full benchmark suite: ISL/OSL 1024/1024, 8192/1024, 1024/8192
- Concurrency sweep: 8, 16, 32, 64, 128, 256, 512
- All requests successful, 0 failures across all configurations
- RDMA over Ionic confirmed via RdmaBackend logs (nic=ionic)
- MoRIIO handshake, KV transfer, and write worker all functional
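Assuming the concurrency sweep is applied to every ISL/OSL pair, the test plan is the cross product of three length pairs and seven concurrency levels. A short Python sketch that enumerates it (benchmark_matrix is a hypothetical helper, not part of the PR's launchers):

```python
from itertools import product

# Values copied from the test plan.
ISL_OSL = [(1024, 1024), (8192, 1024), (1024, 8192)]
CONCURRENCY = [8, 16, 32, 64, 128, 256, 512]


def benchmark_matrix():
    """Yield one (isl, osl, concurrency) tuple per benchmark run."""
    for (isl, osl), conc in product(ISL_OSL, CONCURRENCY):
        yield isl, osl, conc


runs = list(benchmark_matrix())
print(len(runs))  # 3 ISL/OSL pairs x 7 concurrency levels = 21
```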

Made with Cursor


…Ionic AINIC

Extends the Ionic AINIC RDMA support to multi-node disaggregated inference with 2 Prefill + 2 Decode nodes (DP=16). Key changes:
- Remove the 1P/1D restriction from run_xPyD_models.slurm and vllm_disagg_mori_ep.sh to allow xP>1 / yD>1 topologies
- Add --ulimit memlock=-1:-1 to podman for the large RDMA memory registrations (>32 GB) required by MoRI IO
- Pass NCCL_IB_HCA, NCCL_IB_GID_INDEX, NCCL_NET_GDR_LEVEL, NCCL_CROSS_NIC, and MORI_SOCKET_IFNAME into containers for proper multi-node RCCL and MoRI bootstrap over Ionic AINICs
- Add apply_moriio_2pd_patches.sh for runtime vLLM patches (PR vllm-project/vllm#39276) that fix engine_id collisions and improve MoRIIO robustness in multi-node DP configurations
- Restrict --kv-transfer-config to master nodes only (child nodes join via --headless and participate in EP all-to-all)
- Add launch_mori_2p2d.sh example launcher for 2P/2D benchmarks

Tested on the AAC MI355X cluster with Ionic RDMA NICs, achieving balanced RDMA traffic across all 4 nodes and 1,344 tok/s total throughput on DeepSeek-V3-5layer.

Made-with: Cursor
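The 2P/2D changes in the second commit can be sketched as per-node argument selection. Assuming one data-parallel rank per GPU on 8-GPU MI355X nodes, two nodes per role give the stated DP=16. The helper names, the NodeRole type, and treating node rank 0 as the master are hypothetical; the MoRIIO --kv-transfer-config JSON is passed as an opaque string because its contents are not shown here.

```python
from dataclasses import dataclass
from typing import Mapping

# Variables the second commit passes into containers for RCCL/MoRI bootstrap.
BOOTSTRAP_VARS = ("NCCL_IB_HCA", "NCCL_IB_GID_INDEX", "NCCL_NET_GDR_LEVEL",
                  "NCCL_CROSS_NIC", "MORI_SOCKET_IFNAME")


@dataclass
class NodeRole:
    role: str       # "prefill" or "decode"
    node_rank: int  # ASSUMPTION: rank 0 is the master node for this role


def dp_size(nodes_per_role: int, gpus_per_node: int = 8) -> int:
    """Data-parallel size for one role, assuming one DP rank per GPU."""
    return nodes_per_role * gpus_per_node


def vllm_extra_args(node: NodeRole, kv_transfer_config_json: str) -> list[str]:
    """Master nodes get the KV-transfer config; child nodes run headless."""
    if node.node_rank == 0:
        return ["--kv-transfer-config", kv_transfer_config_json]
    return ["--headless"]


def podman_rdma_flags(env: Mapping[str, str]) -> list[str]:
    """Unlimited locked memory plus whichever bootstrap variables are set."""
    flags = ["--ulimit", "memlock=-1:-1"]
    for name in BOOTSTRAP_VARS:
        if name in env:
            flags += ["-e", f"{name}={env[name]}"]
    return flags
```

Keeping --kv-transfer-config off the child nodes mirrors the commit's description: only the master registers as the KV-transfer endpoint, while --headless children contribute their GPUs to the EP all-to-all.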

Ravi Gupta added 2 commits April 16, 2026 04:11

raviguptaamd wants to merge 2 commits into ROCm:develop from raviguptaamd:feat/ionic-rdma-mori-ep
