Minimal viable custom MPI for mesh decomposition #148
Merged
jacob-moore22 merged 52 commits into main on Dec 11, 2025
This reverts commit 9217adb.
End-to-end mesh decomposition example plus MPI communication plan overhaul
Context / Why
Highlights
New mesh decomposition example (`examples/mesh_decomp/`)
- […] (`mesh_decomp.cpp`, `decomp_utils.h`, `mesh.h`, `state.h`, `mesh_io.h`) that: […]
- […] (`install_ptscotch.sh`) so the example can clone/build Scotch/PT-Scotch in-place under `examples/mesh_decomp/lib/`.
- […] `examples/CMakeLists.txt` (behind `MPI+KOKKOS`) and introduces a local `CMakeLists.txt` that wires in the freshly built PT-Scotch archives with whole-archive linking to guarantee symbol resolution.
- […] `mesh_inputs.h`, enabling future CLI/JSON integration.
- `.gitignore` now drops `examples/mesh_decomp/lib/*` to keep the vendored PT-Scotch artifacts out of version control.

Communication plan rewrite (`src/include/communication_plan.h`)
- […] `CommunicationPlan` struct: communicator handles, neighbor rank vectors, ragged send/recv index tables, counts, and displacements.
- […] (`verify_graph_communicator`, `verify_send_recv`) to sanity-check topology and send/recv metadata, which is critical while chasing nodal comm bugs.
- […] (`DCArrayKokkos`, `DRaggedRightArrayKokkos`) so the plan stays consistent across host/device.
- […] (`communication_plan_old.h`) for staged migration.

MPI array container refactor (`src/include/mpi_types.h`)
- […] `MPICArrayKokkos` on top of the new communication plan:
- […] (`mpi_type_map<>`) replaces ad-hoc specialization calls.
- […] (`MPI_Neighbor_alltoallv`).
- […] `mpi_types_old.h` for comparison/testing.
- […] (`mapped_mpi_types.h` no longer exposed) and clarifies the data-oriented ownership of host/device views.

Testing
- […] `cmake --build <build_dir> --target mesh_decomp`
- […] `examples/mesh_decomp/install_ptscotch.sh` or letting CMake drive it.
- […] `mpirun -n <ranks> mesh_decomp`
- […] `vtk/Fierro.*.vtu` outputs per rank.
- […] `test_kokkos_for`, `mtestkokkos`, etc., if those touch `MPICArrayKokkos`.

Follow-ups / Open Questions
- […] `mesh_inputs.h` into a true input parser so we can flip between generated and file-based meshes via CLI.
- […] `CommunicationPlan` buffers with `MPI_Neighbor_alltoallv_init` to avoid per-step setup once the nodal exchange stabilizes.
- […] (`HAVE_GPU_AWARE_MPI`) now that the communication plan explicitly stages host buffers; there may be opportunities to skip host staging entirely.
- `install_ptscotch.sh` should pin Scotch revisions and add checksum verification for reproducibility.

Reference
- […] `mesh_decomp`
- […] `origin/main`
- […] `examples/mesh_decomp/{CMakeLists.txt,decomp_utils.h,mesh.h,mesh_io.h,mesh_decomp.cpp,mesh_inputs.h,state.h,install_ptscotch.sh}`
- […] `examples/CMakeLists.txt`
- […] `src/include/{communication_plan.h,communication_plan_old.h,mpi_types.h,mpi_types_old.h,mapped_mpi_types.h}`
- […] `.gitignore`, `scripts/build-matar.sh`
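
For reviewers less familiar with the count/displacement bookkeeping mentioned under the communication plan rewrite, here is a minimal host-only sketch of how a ragged send index table can be turned into per-neighbor counts, displacements, and a packed send buffer. It is not the code in `communication_plan.h`: `PlanSketch`, `build_plan_sketch`, `pack_send_buffer`, and `displacements_consistent` are hypothetical names, and plain `std::vector` stands in for the Kokkos dual-view containers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, host-only stand-in for a communication plan; this is NOT
// the CommunicationPlan struct from this PR.
struct PlanSketch {
    std::vector<int> neighbor_ranks;  // one entry per neighbor rank
    std::vector<int> send_counts;     // number of values sent to each neighbor
    std::vector<int> send_displs;     // start of each neighbor's block in the packed buffer
    std::vector<int> flat_send_idx;   // ragged index table, flattened row by row
};

// Build counts and displacements from a ragged table, where
// send_idx[n] lists the local indices whose values go to neighbor n.
inline PlanSketch build_plan_sketch(const std::vector<int>& neighbor_ranks,
                                    const std::vector<std::vector<int>>& send_idx)
{
    assert(neighbor_ranks.size() == send_idx.size());
    PlanSketch plan;
    plan.neighbor_ranks = neighbor_ranks;
    int offset = 0;
    for (const auto& row : send_idx) {
        plan.send_counts.push_back(static_cast<int>(row.size()));
        plan.send_displs.push_back(offset);  // exclusive prefix sum of the counts
        plan.flat_send_idx.insert(plan.flat_send_idx.end(), row.begin(), row.end());
        offset += static_cast<int>(row.size());
    }
    return plan;
}

// Gather local values into one contiguous send buffer, ordered by neighbor.
inline std::vector<double> pack_send_buffer(const PlanSketch& plan,
                                            const std::vector<double>& local_values)
{
    std::vector<double> buffer;
    buffer.reserve(plan.flat_send_idx.size());
    for (int idx : plan.flat_send_idx) {
        assert(idx >= 0 && static_cast<std::size_t>(idx) < local_values.size());
        buffer.push_back(local_values[idx]);
    }
    return buffer;
}

// A consistency check in the spirit of a send/recv verification pass:
// displacements must equal the exclusive prefix sum of the counts.
inline bool displacements_consistent(const std::vector<int>& counts,
                                     const std::vector<int>& displs)
{
    if (counts.size() != displs.size()) return false;
    int expected = 0;
    for (std::size_t n = 0; n < counts.size(); ++n) {
        if (counts[n] < 0 || displs[n] != expected) return false;
        expected += counts[n];
    }
    return true;
}
```

The packed buffer together with `send_counts` and `send_displs` has the shape that the send side of `MPI_Neighbor_alltoallv` takes (one count and one displacement per neighbor, in units of the send datatype). The receive side would presumably be built the same way from the recv index table.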