CuMesh_ROCm is a ROCm/HIP port of JeffreyXiang/CuMesh, a GPU-accelerated library for high-performance 3D geometry processing directly within the PyTorch ecosystem.
The original CuMesh was CUDA-only. This fork converts all CUDA APIs to ROCm/HIP, enabling execution on AMD GPUs.
- GPU-Accelerated Mesh Operations: Topology queries, simplification, hole filling, cleaning — on AMD GPUs via ROCm
- Remeshing: Narrow-band UDF + Dual Contouring
- UV Unwrapping: GPU chart clustering + xatlas packing
- cuBVH: Ray tracing and signed/unsigned distance queries (converted from cubvh)
| GPU | Status | Notes |
|---|---|---|
| NVIDIA (CUDA) | ✅ | Use original CuMesh |
| AMD RDNA3 (gfx11xx) | ✅ | Tested, stable |
| AMD RDNA4 (gfx1201) | ⚠️ | Works below 500K elements. Larger meshes crash due to rocPRIM bug #776 |
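
Given the RDNA4 limitation in the table above, a caller may want to check the mesh size before dispatching GPU work. The following is a minimal sketch and not part of CuMesh_ROCm: `RDNA4_ELEMENT_LIMIT` and `exceeds_rdna4_limit` are hypothetical names, and treating the larger of the vertex and face counts as the "element" count is an assumption, since the table does not define the term. The architecture string is assumed to look like the one PyTorch ROCm builds report (for example via `torch.cuda.get_device_properties(0).gcnArchName`), which may carry feature suffixes.

```python
RDNA4_ELEMENT_LIMIT = 500_000  # threshold quoted in the compatibility table


def exceeds_rdna4_limit(arch_name: str, num_vertices: int, num_faces: int) -> bool:
    """Return True when a mesh is likely to hit the gfx1201 large-mesh crash."""
    # Arch strings may carry feature suffixes such as "gfx1201:xnack-".
    base_arch = arch_name.split(":")[0]
    if base_arch != "gfx1201":
        return False
    # Assumption: the larger of the two counts is the relevant "element" count.
    return max(num_vertices, num_faces) >= RDNA4_ELEMENT_LIMIT
```

A caller could use the result to fall back to a CPU path or to split the mesh before processing.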
- Python >= 3.10
- PyTorch >= 2.4 (with ROCm support)
- ROCm >= 7.2
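
The requirements above can be checked before building. The sketch below is one possible pre-flight check, not a script shipped with this repository; `parse_version`, `meets_minimum`, and `check_environment` are hypothetical helper names. It relies on `torch.version.hip`, which is `None` on PyTorch builds without ROCm support. It does not verify the ROCm version itself.

```python
import re
import sys


def parse_version(text: str) -> tuple:
    """Extract the leading numeric components, e.g. '2.4.1+rocm6.0' -> (2, 4, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found: str, minimum: str) -> bool:
    """Compare two version strings component by component."""
    return parse_version(found) >= parse_version(minimum)


def check_environment() -> list:
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python >= 3.10 is required")
    try:
        import torch
    except ImportError:
        return problems + ["PyTorch is not installed"]
    if not meets_minimum(torch.__version__, "2.4"):
        problems.append(f"PyTorch >= 2.4 is required, found {torch.__version__}")
    # torch.version.hip is None when PyTorch was built without ROCm.
    if getattr(torch.version, "hip", None) is None:
        problems.append("PyTorch was not built with ROCm support")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```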
git clone https://github.com/ptj0225/CuMesh_ROCm.git --recursive
cd CuMesh_ROCm
pip install -e . --no-build-isolation

For a specific GPU architecture:
export GPU_ARCHS="gfx1201" # default: native
pip install -e . --no-build-isolation

| Branch | Description |
|---|---|
| main | HIP conversion via hipcub (CUB compatibility layer) |
| rocprim-direct | HIP conversion via rocPRIM direct calls (no hipcub dependency) |
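
The `GPU_ARCHS` variable from the install step defaults to `native`. How the build script consumes it is not documented here, so the following is only a sketch of one plausible mapping onto the compiler's `--offload-arch` option. The helper name `offload_arch_flags` and the support for `;` or `,` as separators between multiple architectures are assumptions.

```python
import os


def offload_arch_flags(env=None) -> list:
    """Translate GPU_ARCHS into a list of --offload-arch compiler flags."""
    env = os.environ if env is None else env
    raw = env.get("GPU_ARCHS", "native")  # "native" mirrors the documented default
    # Assumption: several architectures may be separated by ';' or ','.
    archs = [arch.strip() for arch in raw.replace(",", ";").split(";") if arch.strip()]
    if not archs:
        archs = ["native"]
    return [f"--offload-arch={arch}" for arch in archs]
```

For example, `offload_arch_flags({"GPU_ARCHS": "gfx1201"})` yields a single flag targeting RDNA4, while an unset variable leaves architecture detection to the compiler.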
The conversion from CUDA to ROCm/HIP was done using:
- hipify-perl: Automated CUDA → HIP API translation (~95% automated)
- Manual fixes: cuda::std→rocprim::tuple, half-precision flags, namespace corrections
- cubvh: De-submoduled and fully converted to HIP (including thrust::cuda::par→thrust::hip::par)
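
Manual fixes of this kind can be scripted as a pass that runs after hipify-perl. The sketch below is illustrative and is not the tooling used for this port; `FIXUPS` and `apply_fixups` are hypothetical names. The `thrust::cuda::par` rule comes from the list above, while reading "cuda::std→rocprim::tuple" as a `cuda::std::tuple` rename is an assumption.

```python
import re

# (pattern, replacement) pairs applied in order to hipified source text.
FIXUPS = [
    # Word boundaries keep longer identifiers such as thrust::cuda::par_nosync untouched.
    (re.compile(r"\bthrust::cuda::par\b"), "thrust::hip::par"),
    # Assumption: the tuple fix replaces cuda::std::tuple with rocprim::tuple.
    (re.compile(r"\bcuda::std::tuple\b"), "rocprim::tuple"),
]


def apply_fixups(source: str) -> str:
    """Apply every textual fix-up to a source string and return the result."""
    for pattern, replacement in FIXUPS:
        source = pattern.sub(replacement, source)
    return source
```

Plain text substitution cannot handle semantic differences such as half-precision compiler flags, so those still need manual review.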
- gfx1201 (RDNA4) crash at >500K elements: Due to wavefront=32 not being fully supported in rocPRIM. See ROCm/rocPRIM#776
- Dual contour memory fault: A hashmap miss can cause out-of-bounds access. Fixed with a bounds check in simple_dual_contour.cu
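
The pattern behind the dual contour fix can be illustrated outside the kernel. This Python stand-in does not reproduce the code in simple_dual_contour.cu; `MISSING`, `lookup_vertex`, and `gather_cell_vertices` are hypothetical names, and a dict plays the role of the GPU hashmap. The point is that a lookup miss returns a sentinel, and every index is validated before it is used to read the vertex buffer.

```python
MISSING = -1  # sentinel returned when a key is absent from the hashmap


def lookup_vertex(table: dict, key) -> int:
    """Return the stored vertex index for key, or MISSING on a miss."""
    return table.get(key, MISSING)


def gather_cell_vertices(table: dict, keys, vertices: list) -> list:
    """Collect vertices for the given keys, skipping misses and invalid indices."""
    gathered = []
    for key in keys:
        index = lookup_vertex(table, key)
        # Bounds check: in a GPU kernel an unchecked sentinel would read out of bounds;
        # in Python, -1 would silently return the last element instead.
        if index < 0 or index >= len(vertices):
            continue
        gathered.append(vertices[index])
    return gathered
```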
Same as upstream CuMesh. See examples directory.
- JeffreyXiang/CuMesh — original CUDA implementation
- cubvh — CUDA BVH toolkit
- xatlas — UV parameterization library
- pamo — GPU parallel edge collapse reference