@tristanpoland

This pull request adds an opt-in GPU acceleration feature, backed by WGPU, to the single-precision and double-precision versions of the 2D and 3D crates, together with the new dependencies it requires. It also adds benchmarking binaries and supporting infrastructure for comparing CPU and GPU performance, both in the rapier3d crate and in the examples.

GPU Acceleration Feature and Dependencies

  • Added a new optional feature gpu-acceleration to rapier2d, rapier2d-f64, rapier3d, and rapier3d-f64, which enables GPU compute support via the WGPU backend. This includes new optional dependencies on wgpu, bytemuck, and pollster. [1] [2] [3] [4] [5] [6] [7] [8]

Benchmarking Infrastructure

  • Added a new benchmark target gpu_benchmarks to rapier3d, which benchmarks CPU vs GPU performance for physics operations and prints a summary table. rapier2d gets a matching bench entry, but its benches/gpu_benchmarks.rs is currently an empty placeholder. Both are built only when the gpu-acceleration feature is enabled. [1] [2] [3]
  • Added a new example binary gpu_benchmark in examples3d, providing a similar CPU vs GPU benchmark for user experimentation. [1] [2]

Benchmark Implementation

  • The new benchmarks create test scenes with varying numbers of rigid bodies, run both CPU and GPU integration steps, and compare timings, including transfer overheads and actual GPU compute. Results are summarized in an easy-to-read table. [1] [2]
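
The timing loop described above can be sketched on the CPU side as follows. This is a minimal illustration, not the PR's benchmark code: `Body`, `make_scene`, `cpu_step`, `time_mean`, and `table_row` are hypothetical names, the scene layout and gravity constant are assumptions, and the GPU column is left as an `Option` because the GPU path needs a WGPU device.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for one rigid body's state; not a Rapier type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Body {
    pub position: [f32; 3],
    pub velocity: [f32; 3],
}

/// Builds a test scene with `n` bodies spaced along the x axis.
pub fn make_scene(n: usize) -> Vec<Body> {
    (0..n)
        .map(|i| Body { position: [i as f32, 10.0, 0.0], velocity: [0.0; 3] })
        .collect()
}

/// CPU baseline: one integration step under constant gravity (assumed value).
pub fn cpu_step(bodies: &mut [Body], dt: f32) {
    const GRAVITY: [f32; 3] = [0.0, -9.81, 0.0];
    for b in bodies.iter_mut() {
        for k in 0..3 {
            b.velocity[k] += GRAVITY[k] * dt;
            b.position[k] += b.velocity[k] * dt;
        }
    }
}

/// Runs `step` `iters` times and returns the mean duration per call.
/// `iters` must be non-zero.
pub fn time_mean<F: FnMut()>(iters: u32, mut step: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        step();
    }
    start.elapsed() / iters
}

/// Formats one row of a summary table: body count, CPU ms, GPU ms.
/// `gpu` is `None` when no GPU timing is available.
pub fn table_row(n: usize, cpu: Duration, gpu: Option<Duration>) -> String {
    let gpu_col = match gpu {
        Some(d) => format!("{:.3}", d.as_secs_f64() * 1e3),
        None => "n/a".to_string(),
    };
    format!("{:>8} | {:>10.3} | {:>10}", n, cpu.as_secs_f64() * 1e3, gpu_col)
}
```

A GPU measurement would slot in as a second `time_mean` call around the upload, dispatch, and readback steps, with each phase timed separately if transfer overhead is to be reported apart from compute.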

Development and Testing Improvements

  • Added criterion as a development dependency (for benchmarking with HTML reports) and configured the new GPU benchmarks to run only when the gpu-acceleration feature is enabled. [1] [2]

These changes lay the groundwork for further GPU compute integration and provide both automated and example-driven ways to measure performance gains.

Introduce an opt-in GPU acceleration feature (`gpu-acceleration`) across 2D/3D and f64 crates by adding optional dependencies (wgpu, bytemuck, pollster) to their Cargo.toml files. Add a new src/gpu module with: GpuContext (device/adapter initialization and checks), BufferManager / RigidBodyGpuBuffer (SoA GPU buffers and upload_rigid_bodies), and GpuComputePipeline (compute pipeline and binding helpers). Export the gpu module from lib.rs under the feature flag. Includes basic tests and helpers, enabling future GPU compute integration while keeping it opt-in.
Add Criterion dependency (with html_reports) and bench entries to crates/rapier2d/Cargo.toml and crates/rapier3d/Cargo.toml, gated by the "gpu-acceleration" feature. Add new GPU benchmark suites: crates/rapier3d/benches/gpu_benchmarks.rs (full Criterion benchmarks comparing CPU vs GPU across multiple scales: buffer upload, buffer allocation, CPU iteration baseline, roundtrip, and critical-scale comparisons; skips gracefully if GPU is unavailable). Also add an empty placeholder benches/gpu_benchmarks.rs for rapier2d. These changes enable running GPU performance tests and reporting HTML results when the feature and GPU are available.
Implement BufferManager::download_rigid_bodies to read back positions and velocities via staging buffers and buffer-to-buffer copies (with mapping and device.poll). Fix component packing for angular velocities and torques (store value in z component instead of x). Refactor a small inv_mass local var. Update GPU benchmarks to create the GPU buffer once per benchmark input (reuse across iterations) and add a gpu_roundtrip benchmark that uploads then downloads to measure roundtrip cost. Add bench_output.txt capturing benchmark output and compile warnings.
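
The packing fix mentioned in this commit can be illustrated with a small sketch. It assumes a scalar (2D) angular quantity is stored in a four-float slot so the buffer layout matches the 3D case; the slot width and the names `pack_angular_2d` and `unpack_angular_2d` are hypothetical and not taken from the PR.

```rust
/// Packs a scalar angular quantity (2D angular velocity or torque) into a
/// four-float slot. The value goes in the z component, since a 2D rotation
/// is a rotation about the z axis. The 4-wide slot is an assumption.
pub fn pack_angular_2d(value: f32) -> [f32; 4] {
    [0.0, 0.0, value, 0.0]
}

/// Reads the scalar back from the z component.
pub fn unpack_angular_2d(slot: [f32; 4]) -> f32 {
    slot[2]
}
```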
Introduce a GPU integration kernel and end-to-end benchmarks, and update the GPU API to use shared device/queue handles.

- Add a new GpuIntegrator (src/gpu/integrator.rs) and WGSL integration shader (src/gpu/shaders/integration.wgsl) implementing a symplectic Euler compute kernel.
- Update GPU device/context and BufferManager to use Arc<wgpu::Device>/Arc<wgpu::Queue> for shared ownership (breaking change: GpuContext.device/queue types and BufferManager::new signature changed).
- Export the new integrator from src/gpu/mod.rs.
- Replace the old criterion bench suite with a standalone benchmarking binary (examples3d/gpu_benchmark.rs), add a bin entry in examples3d/Cargo.toml and enable bench harness in crates/rapier3d/Cargo.toml.
- Rename bench_output.txt to phase2_results.txt and update contents with runtime warnings and a sample run showing a GPU initialization panic.

These changes implement Phase 2 GPU compute support and provide a concrete benchmark binary. Note that callers must be updated for the new Arc-based device/queue API.
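
For reference, the symplectic (semi-implicit) Euler update named above can be sketched on the CPU in Rust. This covers only the linear part of the update and is not the WGSL shader from the PR; `State` and `symplectic_euler_step` are hypothetical names, and using `inv_mass == 0.0` to mean a body that does not accelerate is an assumption.

```rust
/// Hypothetical per-body linear state; not a Rapier type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct State {
    pub position: [f32; 3],
    pub velocity: [f32; 3],
}

/// One symplectic Euler step: update the velocity from the force first,
/// then advance the position using the *updated* velocity.
/// (Explicit Euler would advance the position with the old velocity.)
pub fn symplectic_euler_step(s: &mut State, force: [f32; 3], inv_mass: f32, dt: f32) {
    for k in 0..3 {
        s.velocity[k] += force[k] * inv_mass * dt;
        s.position[k] += s.velocity[k] * dt;
    }
}
```

Starting from rest at the origin with a force of -10 along y, unit inverse mass, and `dt = 1`, one step gives a y velocity of -10 and a y position of -10. Explicit Euler would leave the position at 0 after the same step.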
Introduce GpuResidentState to manage rigid body state permanently on the GPU. Adds src/gpu/resident_state.rs (PhysX-style resident state) and exposes it from src/gpu/mod.rs. The module implements dirty tracking (added/removed/modified), a handle→index mapping, incremental sync_to_gpu that uploads only deltas, readback_for_rendering for minimal visualization data, buffer resizing, and a small unit test for the DirtyTracker. Uses BufferManager/RigidBodyGpuBuffer for GPU uploads; notes TODOs for partial uploads and GPU-side buffer copying to avoid CPU roundtrips.
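
The dirty-tracking idea described in this commit can be sketched as follows. This is an illustrative reconstruction under assumptions, not the contents of src/gpu/resident_state.rs: the `Handle` alias, the `Delta` struct, and all method names are hypothetical, and the folding rules (a modification of a newly added body is absorbed into the add; removing a body that was added in the same frame cancels both) are one possible policy.

```rust
use std::collections::HashSet;

/// Hypothetical body handle; the real handle type is not shown in the PR text.
pub type Handle = u32;

/// Changes accumulated since the last sync, sorted for deterministic uploads.
#[derive(Debug, Default, PartialEq)]
pub struct Delta {
    pub added: Vec<Handle>,
    pub removed: Vec<Handle>,
    pub modified: Vec<Handle>,
}

#[derive(Default)]
pub struct DirtyTracker {
    added: HashSet<Handle>,
    removed: HashSet<Handle>,
    modified: HashSet<Handle>,
}

impl DirtyTracker {
    pub fn mark_added(&mut self, h: Handle) {
        self.added.insert(h);
    }

    /// A body added this frame is uploaded in full anyway, so a later
    /// modification does not need to be tracked separately.
    pub fn mark_modified(&mut self, h: Handle) {
        if !self.added.contains(&h) {
            self.modified.insert(h);
        }
    }

    /// Removing a body that never reached the GPU just cancels the add.
    pub fn mark_removed(&mut self, h: Handle) {
        self.modified.remove(&h);
        if !self.added.remove(&h) {
            self.removed.insert(h);
        }
    }

    pub fn is_clean(&self) -> bool {
        self.added.is_empty() && self.removed.is_empty() && self.modified.is_empty()
    }

    /// Returns the pending changes and resets the tracker.
    pub fn take_delta(&mut self) -> Delta {
        fn drain_sorted(set: &mut HashSet<Handle>) -> Vec<Handle> {
            let mut v: Vec<Handle> = std::mem::take(set).into_iter().collect();
            v.sort_unstable();
            v
        }
        Delta {
            added: drain_sorted(&mut self.added),
            removed: drain_sorted(&mut self.removed),
            modified: drain_sorted(&mut self.modified),
        }
    }
}
```

An incremental sync would call `take_delta` once per frame and upload only the listed bodies, which is what keeps the per-frame transfer proportional to the number of changes rather than the scene size.
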
@tristanpoland marked this pull request as draft on February 10, 2026, 21:36
@tristanpoland
Author

This was meant to be a PR internal to my fork, my apologies 🤦
