AMD-AGI/m3d_rocm


Efficient and Portable 3D Explorable World Generation on AMD GPUs


🔆 Introduction

3D world generation is a rapidly growing area of research, and we want to bring popular projects in this space to the ROCm ecosystem. One such project is Matrix-3D, a framework that generates an explorable 3D world from a text or image prompt by combining conditional video generation with panoramic 3D reconstruction and representing the resulting scene with 3D Gaussian Splatting (3DGS). See their tech report for full details.

In this blog, we describe how we deployed Matrix-3D on AMD Instinct™ MI250 and MI300 GPUs. With a series of targeted modifications and optimizations, we made the framework both more efficient and more portable: end-to-end generation time for a single world drops from 2887 s to 1306 s on one MI250 and from 972 s to 482 s on one MI300.

📝 What This Project Covers

  • [Kernel optimization]: 🔥Replacing rendering kernels with more portable Triton kernels, with help from the kernel-writing agent GEAK, without sacrificing performance.
  • [Faster 3DGS fitting]: 🔥Replacing the original rasterization backend with gsplat for better efficiency and portability.
  • [Pipeline-level optimization]: 🔥Refactoring the pipeline to reduce repeated model loading, I/O overhead, and recomputation, while also accelerating depth-map merging.
  • [Reproducible setup]: 🔥Providing step-by-step instructions for running Matrix-3D on AMD GPUs.
  • [End-to-end results]: 🔥Showing the speedup of the optimized version over the original implementation on AMD GPUs.

🎬 Examples

We show both image-to-image and text-to-image results below.

| Prompt | Panoramic Video | 3D Scene |
| --- | --- | --- |
| "an impressionistic winter landscape" | (video not shown) | (3D scene not shown) |

The table and figure below report the end-to-end latency after each optimization step. Overall, the optimized version reduces latency by 54% on the MI250 GPU and by 50% on the MI300 GPU.

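| GPU | Original (s) | w/ gsplat (s) | w/ solver opt. (s) | w/ I/O opt. (s) | Total reduction |
| --- | --- | --- | --- | --- | --- |
| MI250 | 2887 | 2527 | 1406 | 1306 | 54% |
| MI300 | 972 | 853 | 507 | 482 | 50% |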

End-to-end latency comparison between the original and optimized pipelines on MI250 and MI300.
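The reduction figures can be rechecked from the first and last latency columns of the table above. The sketch below is only illustrative arithmetic; `reduction_pct` and `speedup` are hypothetical helper names, not part of the repository.

```bash
# Percentage reduction in latency: (before - after) / before * 100.
reduction_pct() {
  LC_ALL=C awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f", (b - a) / b * 100 }'
}

# Speedup factor: before / after.
speedup() {
  LC_ALL=C awk -v b="$1" -v a="$2" 'BEGIN { printf "%.2f", b / a }'
}

# Original and fully optimized latencies (seconds) from the table above.
echo "MI250: $(reduction_pct 2887 1306)% lower, $(speedup 2887 1306)x faster"
echo "MI300: $(reduction_pct 972 482)% lower, $(speedup 972 482)x faster"
```

This prints 54.8% and 2.21x for MI250, and 50.4% and 2.02x for MI300, so the quoted whole-number percentages appear to be rounded down.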


Installation

For ROCm GPUs, we suggest using a prebuilt Docker image from rocm/pytorch, for example rocm/pytorch:rocm7.2_ubuntu22.04_py3.10_pytorch_release_2.9.1.
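One possible way to start such a container is sketched below. Only the image tag comes from the suggestion above. The device and group flags are the ones commonly used for ROCm containers, while the shared-memory size, the mounted directory, and the `launch_rocm_container` helper are assumptions to adapt to your host. By default the script only prints the command so that it can be reviewed first.

```bash
# Image tag taken from the suggestion above; everything else is an assumption.
IMAGE="rocm/pytorch:rocm7.2_ubuntu22.04_py3.10_pytorch_release_2.9.1"

launch_rocm_container() {
  # /dev/kfd and /dev/dri expose the AMD GPUs to the container;
  # the video group grants access to those device nodes.
  # The current directory is mounted at /workspace (assumed layout).
  set -- run -it --rm \
    --device=/dev/kfd \
    --device=/dev/dri \
    --group-add video \
    --ipc=host \
    --shm-size 16G \
    -v "$PWD:/workspace" \
    -w /workspace \
    "$IMAGE"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "docker $*"      # print the command for review only
  else
    docker "$@"           # actually start the container
  fi
}

launch_rocm_container
```

Set `DRY_RUN=0` in the environment to launch the container instead of printing the command.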

After starting the container, clone this repository and run the following from the repository root:

bash scripts/install_m3d.sh

All the dependencies will be installed automatically.

Usage

For text prompts, run:

bash scripts/run_m3d_t2i.sh

For image prompts, run:

bash scripts/run_m3d_i2i.sh
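To compare a local run against the latencies reported earlier, one option is to wrap a launch script in a small wall-clock timer. This is a minimal sketch, not necessarily how the reported numbers were measured, and `time_cmd` is a hypothetical helper that is not part of the repository.

```bash
# Run a command and report its wall-clock time in whole seconds on stderr.
time_cmd() {
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  echo "elapsed: $((end - start))s (exit status $status)" >&2
  return "$status"
}

# Harmless demonstration; inside the container, from the repository root,
# replace it with e.g.:  time_cmd bash scripts/run_m3d_t2i.sh
time_cmd sleep 1
```

The helper returns the wrapped command's exit status, so a failed run is not mistaken for a fast one.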

Acknowledgements

We are grateful for the excellent work of:

About

This project is an optimized version of Matrix-3D with better compatibility with the ROCm ecosystem.
