3D world generation is a rapidly growing area of research, and we want to bring popular projects in this space to the ROCm ecosystem. One such project is Matrix-3D, a framework that generates an explorable 3D world from a text or image prompt by combining conditional video generation with panoramic 3D reconstruction and representing the resulting scene with 3D Gaussian Splatting (3DGS). See the Matrix-3D tech report for full details.
In this blog, we describe how we deployed Matrix-3D on AMD Instinct™ MI250 and MI300 GPUs. With a series of targeted modifications and optimizations, we made the framework both more efficient and more portable: end-to-end generation time for a single world drops from 2887 s to 1306 s on one MI250 and from 972 s to 482 s on one MI300.
- [Kernel optimization]: 🔥Replacing rendering kernels with more portable Triton kernels, with help from the kernel-writing agent GEAK, without sacrificing performance.
- [Faster 3DGS fitting]: 🔥Replacing the original rasterization backend with gsplat for better efficiency and portability.
- [Pipeline-level optimization]: 🔥Refactoring the pipeline to reduce repeated model loading, I/O overhead, and recomputation, while also accelerating depth-map merging.
- [Reproducible setup]: 🔥Providing step-by-step instructions for running Matrix-3D on AMD GPUs.
- [End-to-end results]: 🔥Showing the speedup of the optimized version over the original implementation on AMD GPUs.
We show both image-to-image and text-to-image results below.
| Prompt | Panoramic Video | 3D Scene |
|---|---|---|
| [input image] | [video] | [3D scene] |
| "an impressionistic winter landscape" | [video] | [3D scene] |
The table and figure below show the end-to-end latency. Overall, the optimized version reduces latency by 54% on the MI250 GPU and 50% on the MI300 GPU.
| GPU | Original (s) | w/ gsplat (s) | w/ solver opt. (s) | w/ I/O opt. (s) | Total Reduction |
|---|---|---|---|---|---|
| MI250 | 2887 | 2527 | 1406 | 1306 | 54% |
| MI300 | 972 | 853 | 507 | 482 | 50% |
End-to-end latency comparison between the original and optimized pipelines on MI250 and MI300.
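As a quick check on the numbers above, the total reduction and the equivalent speedup can be recomputed from the "Original" and final columns of the table. The helper name `summarize` is ours, purely for illustration. Computed this way, the MI250 reduction comes out at 54.8%, so the 54% in the table appears to be rounded down.

```shell
#!/usr/bin/env bash
# summarize BEFORE AFTER
# Prints the relative latency reduction and the speedup factor
# for two end-to-end timings given in seconds.
summarize() {
  awk -v before="$1" -v after="$2" 'BEGIN {
    printf "%.1f%% reduction, %.2fx speedup\n",
           (before - after) / before * 100, before / after
  }'
}

summarize 2887 1306   # MI250: 54.8% reduction, 2.21x speedup
summarize 972 482     # MI300: 50.4% reduction, 2.02x speedup
```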
For ROCm GPUs, we suggest using a prebuilt Docker image from rocm/pytorch, for example rocm/pytorch:rocm7.2_ubuntu22.04_py3.10_pytorch_release_2.9.1.
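One possible way to start such a container is sketched below. The device and group flags are the ones commonly used to expose AMD GPUs to Docker. The mounted directory, the shared-memory size, and the `EXECUTE` dry-run switch are our own assumptions rather than part of the project's instructions, so adjust them to your system. By default the script only prints the command; set `EXECUTE=1` to actually launch the container.

```shell
#!/usr/bin/env bash
# Sketch: launch a ROCm PyTorch container with GPU access.
IMAGE="rocm/pytorch:rocm7.2_ubuntu22.04_py3.10_pytorch_release_2.9.1"
HOST_DIR="${HOST_DIR:-$PWD}"   # assumption: host directory to mount into the container
EXECUTE="${EXECUTE:-0}"        # assumption: 0 = print the command, 1 = run it

start_container() {
  runner="echo"
  if [ "$EXECUTE" = "1" ]; then
    runner=""
  fi
  # /dev/kfd and /dev/dri expose the AMD GPUs; the video group grants access to them.
  $runner docker run -it --rm \
    --device=/dev/kfd --device=/dev/dri \
    --group-add video \
    --ipc=host --shm-size 16G \
    -v "$HOST_DIR":/workspace -w /workspace \
    "$IMAGE"
}

start_container
```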
After starting the container, clone our project and run:
bash scripts/install_m3d.sh
All the dependencies will be installed automatically.
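Before launching a generation run, it can help to confirm that the GPUs are visible inside the container. The check below is a minimal sketch, not part of the project's scripts: the function name `check_rocm_env` is hypothetical, and it assumes `rocm-smi` and a ROCm build of PyTorch are available in the image.

```shell
#!/usr/bin/env bash
# Sketch: report whether the ROCm tools and PyTorch can see the GPUs.
check_rocm_env() {
  if command -v rocm-smi >/dev/null 2>&1; then
    rocm-smi                      # lists the detected AMD GPUs
  else
    echo "rocm-smi not found on PATH"
  fi

  if command -v python3 >/dev/null 2>&1; then
    # On ROCm builds of PyTorch, torch.version.hip is set and the
    # torch.cuda API reports the AMD GPUs.
    python3 -c 'import torch; print("torch", torch.__version__, "hip", torch.version.hip, "gpu visible:", torch.cuda.is_available())' \
      || echo "PyTorch check failed (is torch installed?)"
  else
    echo "python3 not found on PATH"
  fi
}

check_rocm_env
```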
For text prompts, run:
bash scripts/run_m3d_t2i.sh
For image prompts, run:
bash scripts/run_m3d_i2i.sh
We are grateful for the excellent work of:





