TLAS and MultitypeSet by SimonDanisch · Pull Request #14 · JuliaGeometry/Raycore.jl

SimonDanisch · 2026-03-02T11:23:01Z

This adds:

- GPU two-level acceleration (TLAS/BLAS): instanced BVH with per-instance transforms, TLAS/StaticTLAS split (mutable for construction, immutable isbits for kernel traversal), Adapt.jl for CPU→GPU transfer
- MultiTypeSet: GPU-safe heterogeneous collection with compile-time type-stable dispatch via with_index, enabling multiple material/texture types without dynamic dispatch on the GPU
- GPU utilities: @get/@set SoA macros, for_unrolled/map_unrolled/reduce_unrolled for compile-time loop unrolling, FastClosure for GPU-safe closures
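
To illustrate the kind of compile-time dispatch the MultiTypeSet item describes, here is a minimal sketch. It is not Raycore's implementation: `with_index_sketch`, `Lambert`, `Mirror`, and `shade` are hypothetical names, and a plain tuple stands in for the set. The generated function emits one branch per slot, so each call to `f` sees a concrete element type.

```julia
# Hypothetical material types; Raycore's real types and with_index may differ.
struct Lambert
    albedo::Float32
end
struct Mirror
    reflectance::Float32
end

shade(m::Lambert) = m.albedo
shade(m::Mirror) = 2f0 * m.reflectance

# Emit one `if` branch per tuple slot so `f` is always called on a concrete type.
@generated function with_index_sketch(f, items::Tuple, type_idx::UInt32)
    body = Expr(:block)
    for i in 1:length(items.parameters)
        push!(body.args, :(if type_idx === UInt32($i); return f(items[$i]); end))
    end
    push!(body.args, :(return f(items[1])))  # fallback: first slot
    return body
end

materials = (Lambert(0.5f0), Mirror(0.75f0))
with_index_sketch(shade, materials, UInt32(2))  # dispatches to shade(::Mirror)
```

An out-of-range index falls back to the first slot in this sketch; whether the real function does the same is an assumption.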

…12) SetKey.type_idx was changed from UInt8 to UInt32 for LLVM/SPIR-V compatibility, but the @generated with_index function still compared against UInt8 literals. Since Julia's === checks both value and type, UInt32(1) === UInt8(1) is always false, causing all branches to fall through to the default (first material). This made every object render with the same material.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
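
The pitfall described above can be reproduced in isolation. The two selector functions below are hypothetical stand-ins for a single generated branch, not code from the PR.

```julia
# `===` (egality) requires identical type and value; `==` compares numeric value.
mixed_egal   = UInt32(1) === UInt8(1)    # false: types differ
mixed_equal  = UInt32(1) == UInt8(1)     # true: values are compared after promotion
same_egal    = UInt32(1) === UInt32(1)   # true: same type and value

# Hypothetical branch selectors mirroring the bug and the fix.
select_broken(type_idx::UInt32) = type_idx === UInt8(2)  ? :second : :first
select_fixed(type_idx::UInt32)  = type_idx === UInt32(2) ? :second : :first

select_broken(UInt32(2))  # :first, the branch can never match
select_fixed(UInt32(2))   # :second
```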

On Metal, device pointers (Core.LLVMPtr) stored inside GPU buffers cannot be reliably dereferenced by kernels. The inline data (root_aabb) reads correctly, but following embedded pointers to per-BLAS node/primitive arrays returns zeros.

Replace the pointer-based BLAS architecture in StaticTLAS with:
- BLASDescriptor: lightweight struct with nodes_offset, primitives_offset, root_aabb
- Flat concatenated arrays (all_blas_nodes, all_blas_prims) built from per-BLAS GPU arrays
- Offset-based indexing in closest_hit/any_hit traversal

Management kernels (update_tlas_leaf_aabbs_kernel!, etc.) still use blas_array but only read root_aabb (inline, unaffected).

Verified: CPU and Metal produce identical results (mean pixel ~0.327 on 3-sphere test scene).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
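
A CPU-only sketch of the offset-based layout may help here. The names `BLASDescriptorSketch`, `flatten_blas`, and `blas_node` are hypothetical, `root_aabb` is omitted, and the 0-based offset convention is an assumption rather than something the commit states.

```julia
# Hypothetical descriptor: offsets into flat arrays instead of embedded pointers.
struct BLASDescriptorSketch
    nodes_offset::UInt32       # assumed 0-based offset into the flat node array
    primitives_offset::UInt32  # assumed 0-based offset into the flat primitive array
end

# Concatenate per-BLAS arrays and record where each BLAS starts.
function flatten_blas(per_blas_nodes::Vector{Vector{T}},
                      per_blas_prims::Vector{Vector{P}}) where {T,P}
    descs = BLASDescriptorSketch[]
    all_nodes = T[]
    all_prims = P[]
    for (nodes, prims) in zip(per_blas_nodes, per_blas_prims)
        push!(descs, BLASDescriptorSketch(length(all_nodes), length(all_prims)))
        append!(all_nodes, nodes)
        append!(all_prims, prims)
    end
    return descs, all_nodes, all_prims
end

# Traversal-side lookup: local 1-based index plus the BLAS offset.
blas_node(all_nodes, desc, local_idx) = all_nodes[desc.nodes_offset + local_idx]

descs, all_nodes, all_prims = flatten_blas([[10, 11], [20, 21, 22]], [[1.0], [2.0, 3.0]])
blas_node(all_nodes, descs[2], 1)  # first node of the second BLAS
```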

- Move _build_flat_blas_arrays! from adapt_structure to sync! so data is ready before any kernel dispatch.
- Add KA.synchronize after AK.sortperm in BLAS/TLAS build to ensure sort temp buffers aren't freed while GPU is still using them.
- Replace merge_sort_by_key! with sortperm in TLAS topology (avoids 64-bit Int payload corruption on Lava).
- Add _REBUILD_KEEPALIVE and synchronize in multitypeset rebuild_static!.
- Remove underscore-prefixed function names in RaycoreLavaExt.
- Add instanced BVH test.

The shadow GC-keepalive bag with ordering-assumed indexing (nodes at [2(i-1)+1], prims at +2) is replaced with a typed struct per BLAS. Field ownership on the TLAS now transitively pins each BLAS's backing nodes and primitives buffers; the isbits device pointers stored in blas_array remain valid as long as the TLAS does.

- struct BLASArrays { nodes, primitives } introduced.
- TLAS.blas_storage::Vector{BLASArrays} replaces gpu_blas_arrays.
- _to_isbits_blas takes the typed storage vector.
- _build_flat_blas_arrays!, eltype(::TLAS), compaction path, and the per-BLAS update path all read storage[i].nodes / .primitives directly instead of doing arithmetic on a flat bag.

Also includes:
- HWTLAS.sync! compacts deleted handles instead of letting orphan instance/BLAS entries grow unbounded across re-push! cycles.
- Delete the _REBUILD_KEEPALIVE global in multitypeset.jl; the caller already owns the old static via the compute-graph edge that holds it.
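
The difference between the flat bag and the typed storage can be sketched as follows. `BLASArraysSketch` and the sample data are hypothetical; only the indexing arithmetic is taken from the message above.

```julia
# Hypothetical typed pair of buffers, one per BLAS.
struct BLASArraysSketch{N<:AbstractVector,P<:AbstractVector}
    nodes::N
    primitives::P
end

nodes1, prims1 = [1, 2], [0.1]
nodes2, prims2 = [3, 4, 5], [0.2, 0.3]

# Old layout: untyped bag, position implies meaning.
bag = Any[nodes1, prims1, nodes2, prims2]
old_nodes(bag, i) = bag[2(i - 1) + 1]
old_prims(bag, i) = bag[2(i - 1) + 2]

# New layout: the field name carries the meaning and the element type is concrete.
storage = [BLASArraysSketch(nodes1, prims1), BLASArraysSketch(nodes2, prims2)]

old_nodes(bag, 2) === storage[2].nodes        # both refer to nodes2
old_prims(bag, 2) === storage[2].primitives   # both refer to prims2
```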

Old semantics: `instance_id` was an auto-incremented user tag, managed by the `TLAS.next_instance_id` counter. Nothing downstream consumed it; shader code looked up material via per-triangle metadata, forcing one BLAS per material (meshscatter BLAS explosion).

New semantics: `instance_id == 0` means "inherit from the triangle's per-face metadata"; any nonzero value is forwarded verbatim through closest_hit / any_hit as the 5th return element and interpreted by the caller (Hikari uses it as a `medium_interface_idx` override). This matches Vulkan's `gl_InstanceCustomIndexEXT` and lets one BLAS carry N instances with distinct materials.

- Drop `TLAS.next_instance_id` field.
- `push!(tlas, mesh, transform; instance_id=UInt32(0))`: single-instance.
- `push!(tlas, mesh, transforms; instance_ids=nothing)`: N-instance, `instance_ids::Vector{UInt32}` gives per-instance overrides.
- Shared helpers `_build_and_append_blas!` and `_append_instances_with_handle!` extracted from the two push! methods.
- `update!(tlas, handle, new_geometry)`: the non-Hikari fallback path stops embedding `first_desc.instance_id` into triangle metadata (that was nonsense under the new semantics anyway).
- HWTLAS: `push!` accepts `instance_id` / `instance_ids` kwargs, `instance_custom_indices` now stores the override directly; compact path no longer remaps them (they're scene-level, not BLAS-level).
- Tests updated: the `instance_ids` assertions now pass explicit overrides and check they round-trip.
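
On the caller side, the "zero means inherit" rule reduces to a single select. The helper name `resolve_interface_idx` is hypothetical; how Hikari actually consumes the value is not shown in this PR.

```julia
# instance_id == 0 inherits the per-face value; any nonzero value overrides it.
resolve_interface_idx(instance_id::UInt32, per_face_idx::UInt32) =
    instance_id == UInt32(0) ? per_face_idx : instance_id

resolve_interface_idx(UInt32(0), UInt32(4))  # inherits the per-face index
resolve_interface_idx(UInt32(9), UInt32(4))  # the instance override wins
```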

Traversal already tracks `current_instance` as the 1-based position in `tlas.instances`. Returning that index directly gives callers single-source-of-truth access to the whole `InstanceDescriptor` (transforms, interface override, flags) without any of it needing to be duplicated into the 5-tuple.

Before: `(hit, tri, t, bary, inst.instance_id)`, only the override.
After: `(hit, tri, t, bary, inst_idx)`, look up whatever you need via `tlas.instances[inst_idx]`.

Miss returns `UInt32(0)` (not INVALID_NODE, which was node-index typed and confusing for an instance slot).

Tests updated to check the array-index semantics. The earlier switch to explicit `instance_ids` for the identification test is rolled back: default pushes now give 1, 2, 3 as array positions.
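
A caller consuming the new 5-tuple might look like the sketch below. `InstanceDescriptorSketch` and `hit_instance` are hypothetical, and the tuples are hand-written stand-ins for what closest_hit returns.

```julia
# Hypothetical, reduced descriptor; the real InstanceDescriptor has more fields.
struct InstanceDescriptorSketch
    instance_id::UInt32   # interface override; 0 means "inherit"
    flags::UInt32
end

# Return the descriptor for a hit, or `nothing` on a miss (inst_idx == 0).
function hit_instance(instances, result)
    hit, tri, t, bary, inst_idx = result
    inst_idx == 0 && return nothing
    return instances[inst_idx]
end

instances = [InstanceDescriptorSketch(0, 0), InstanceDescriptorSketch(7, 1)]
hit_result  = (true, 42, 1.5f0, (0.2f0, 0.3f0), UInt32(2))
miss_result = (false, 0, Inf32, (0f0, 0f0), UInt32(0))

hit_instance(instances, hit_result)   # the second descriptor
hit_instance(instances, miss_result)  # nothing
```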

…ves triangle lookup

Brings the HW wavefront in line with the new SW semantics:
- `RTHitResult._pad1` replaced with `instance_id::UInt32` (carries `gl_InstanceID`).
- HWTLAS.sync! precomputes `per_instance_tri_offsets` keyed by gl_InstanceID (= `blas_offsets[instance_blas_indices[i]]`). The backend no longer has to remap `instance_custom_index` through blas_offsets; the lookup is a single indexed load.
- `instance_custom_index` now passes through untouched: it carries the interface override (`InstanceDescriptor.instance_id`).
- `build_hw_tlas` extension-stub signature: new required kwarg `per_instance_tri_offsets`.
- New intrinsic `rt_instance_id` exposed (maps to gl_InstanceID).

The Lava extension:
- `build_hw_tlas` uploads `per_instance_tri_offsets` as `off_gpu` (replacing the old per-BLAS `blas_offsets`), passes it into `Lava.HardwareAccel`.
- `PrecomputedHitsAccel.closest_hit` uses `result.instance_id` for the triangle lookup and returns `result.instance_custom_index` as the 5th tuple element (the override).
- `rt_instance_id(::LavaHWAdapted)` routes to `lava_rt_instance_id()`.
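
The precomputation named above is a gather of per-BLAS offsets into per-instance order. The sketch below uses made-up data, and `global_triangle` is a hypothetical helper; treating the instance id as 0-based (hence the `+ 1`) is an assumption about how the lookup is indexed.

```julia
blas_offsets = UInt32[0, 100, 250]     # first triangle of each BLAS in a flat array
instance_blas_indices = [1, 3, 3, 2]   # which BLAS each instance references

# One gather at sync time, so the hit path needs a single indexed load.
per_instance_tri_offsets = blas_offsets[instance_blas_indices]

# Hypothetical lookup: instance_id assumed 0-based, so shift by one for Julia arrays.
global_triangle(offsets, instance_id, primitive_idx) =
    offsets[instance_id + 1] + primitive_idx

global_triangle(per_instance_tri_offsets, 1, 5)  # instance 1 uses BLAS 3
```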

Design for the next release cut: move HWTLAS + HW RT orchestration from Raycore into Lava, delete the RaycoreLavaExt + Raycore-side stubs, drop the CPU fence in sync! in favor of Lava's timeline-gated deferred free, and restructure tests + docs around the new division (Raycore owns SW TLAS + abstract accel contract; Lava owns HWTLAS).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Julia 1.12 requires a function to exist in a module before it can be extended from another module. Add a no-method stub so Lava can define Raycore.push_instances!(::HWTLAS, ...) in hwtlas.jl.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
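
The stub pattern looks roughly like this. The module and type names (`RaycoreSketch`, `LavaSketch`, `HWTLASSketch`) are hypothetical stand-ins for the real packages.

```julia
module RaycoreSketch
    # Declares the generic function with zero methods so other modules can extend it.
    function push_instances! end
end

module LavaSketch
    import ..RaycoreSketch

    struct HWTLASSketch
        instances::Vector{Int}
    end

    # Extend the owner's function by qualifying its name.
    RaycoreSketch.push_instances!(t::HWTLASSketch, xs) = (append!(t.instances, xs); t)
end

t = LavaSketch.HWTLASSketch(Int[])
RaycoreSketch.push_instances!(t, [1, 2])
```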

The stub added in P2.1 (commit 5686896) is no longer needed -- Lava's push_instances! has been folded into Base.push! overloads in P3.4b. Nothing extends Raycore.push_instances! anymore. P3.4b cleanup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Stub for the cross-backend accessor that returns the GPU instance buffer behind an InstanceBatch handle. Concrete implementation lives in Lava for HWTLAS. P3.4c of the GPU rigid-body pipeline.

- Single user-facing mutation API on Raycore.TLAS: update_transform! / update_transforms! + sync! as the sole commit boundary; refit_tlas! / update_instance_transform[s]! demoted to internal helpers; the global _PENDING_TLAS_TRANSFORMS objectid-keyed cache is gone (writes go directly to tlas.instances).
- Cross-backend Adapt.adapt(::TLAS) errors loudly when backends mismatch instead of silently returning a static_tlas with arrays on the wrong device.
- static_tlas drains its flat-BLAS arrays when the TLAS empties.
- New stress suite (test_tlas_stress.jl) covering: 5000-instance refit loop with per-frame trace, 2000-instance rebuild loop, interleaved update+trace tight loops, 500-iter mesh grow/shrink with raytracing per iter, hard memory bounds with WeakRefs, use-after-free attempts.
- test_mesh_update.jl + test_abstract_accel_contract.jl added; existing bounds tightened from ~25x slack to exact equality per iteration.
- CI matrix: [cpu, lavapipe]. Lavapipe job apt-installs mesa-vulkan-drivers and forces VK_DRIVER_FILES; runtests.jl gates Lava-using suites behind a backend-selection layer (test_backend(), test_is_lavapipe(), test_has_hw_rt(), @lavapipe_broken). Lavapipe currently early-bails the kernel-using suites with a placeholder broken-test pending the upstream alignment-hint fix in Lava's SPIR-V emitter; cpu suite runs with KA.CPU().
- Lava added to [sources] (github.com/SimonDanisch/Lava.jl) so Pkg.test can resolve it on a fresh runner; Lava = "0.1" added to [compat] for Aqua.deps_compat.
- Demos (wavefront_dynamic / lego / particles) migrated off the removed refit_tlas! / update_transform_at! APIs to the handle-based update_*! + sync! flow.
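
The "mutate, then commit at sync!" flow from the first item can be sketched with a dirty flag. Everything here is hypothetical: `TLASSketch` uses plain `Float32` values in place of transforms and a copied vector in place of the static TLAS.

```julia
# Hypothetical stand-in: transforms are plain numbers, the snapshot plays static_tlas.
mutable struct TLASSketch
    transforms::Vector{Float32}
    dirty::Bool
    static_snapshot::Vector{Float32}
end
TLASSketch(ts) = TLASSketch(ts, true, Float32[])

# Writes go directly to the instance data and only mark the structure dirty.
function update_transform_sketch!(tlas::TLASSketch, idx, t)
    tlas.transforms[idx] = t
    tlas.dirty = true
    return tlas
end

# sync! is the only place where the traversal-side snapshot changes.
function sync_sketch!(tlas::TLASSketch)
    tlas.dirty || return tlas
    tlas.static_snapshot = copy(tlas.transforms)
    tlas.dirty = false
    return tlas
end

tlas = TLASSketch(Float32[1, 2])
sync_sketch!(tlas)
update_transform_sketch!(tlas, 2, 5f0)
before_sync = copy(tlas.static_snapshot)  # still the old values
sync_sketch!(tlas)
after_sync = copy(tlas.static_snapshot)   # now reflects the update
```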

…e page

`hw_acceleration_content.md` is embedded by BonitoBook into the registered `hw_acceleration.md` page. Both files had `# Hardware-Accelerated Ray Tracing` as their H1, which Documenter sees as two pages with the same slug, breaking `@ref` resolution and erroring the doc build with:

    Cannot resolve @ref for md"[Hardware-Accelerated Ray Tracing](@ref)" in docs/src/index.md.
    - Header with slug 'Hardware-Accelerated-Ray-Tracing' is not unique.

Renamed the content file's H1 to `# Hardware Ray Tracing with Lava` to match the pattern of the other content/page pairs (which all use distinct H1s).

Verified locally with `julia --project=docs/ docs/make.jl` (clean build).

Workflow-level `concurrency:` can't reference `matrix.*` — GitHub rejected the YAML with a parse-time error, which manifested as a 0-second-failed run with `log not found` and no jobs in the API. Moving the concurrency block inside `jobs.test` makes `matrix.backend` visible. Verified locally with `python3 -c "import yaml; …"`.

… init

Lava transitively depends on GLFW, whose `__init__` unconditionally calls `glfwInit()` at module load time. On `ubuntu-latest` (headless) this fails:

    GLFW.GLFWError(65550, "X11: Failed to open display :0")
    ErrorException("glfwInit failed")

Pkg.test precompiles all test deps before running tests, and precompilation finishes by loading the module, which triggers __init__. So even the cpu matrix entry, which never runs `using Lava` at runtime, can't get past Lava's precompile step.

Standard Makie/GLFW.jl CI fix: apt-install `xvfb`, then prefix the runtest with `xvfb-run --auto-servernum`. julia-actions/julia-runtest@v1 supports `prefix:` for exactly this.

Removed the obsolete `DISPLAY: ':0'` env (no X server was actually running on that DISPLAY).

The previous concurrency block kept `cancel-in-progress: false` for every trigger, so PR pushes serialized behind earlier (often-stuck) builds — the user observed a 17m+ in-flight run blocking newer PR commits. Per-ref grouping (`pages-${{ github.ref }}`) so master and PRs no longer share a queue at all. `cancel-in-progress` is true for PR refs (latest commit wins, matches ci.yml) and false for master/tag refs (production GitHub Pages deploys complete uninterrupted).

* hw_acceleration_content.md rewritten end-to-end without Hikari/RayMakie: builds a Raycore.TLAS and Lava.HWTLAS from the same scene, traces primary rays through both (SW via Raycore.closest_hit in a @kernel, HW via Lava.trace_closest_hits!), shows side-by-side depth heatmaps and a per-frame timing line. Pixel agreement verified locally (0/49152 hit-mask disagreement, max abs depth diff ~1e-5).
* docs/Project.toml: add Lava as a doc dep + [sources] entry pointing at the GitHub URL (matches the test job's pattern, resolves on a standalone CI checkout).
* viewfactors_content.md: GeometryBasics 0.5 dropped meta(...); vertex normals now go through GeometryBasics.mesh(...; normal=...).
* RaycoreMakieExt.jl: arrows! is deprecated for 3-D inputs; ray plot recipe now uses arrows3d! so the bvh_hit_tests tutorial renders.
* bvh_hit_tests_content.md: trailing newline fix only.

Same fix the test job got in 039ba13: GLFW (transitive dep of Lava) calls glfwInit() at module __init__ which needs an X display. ubuntu-latest is headless, so without xvfb the docs runner crashes the moment Pkg.precompile loads Lava — exactly the failure on the latest Documenter run. Also install mesa lavapipe so Vulkan device creation succeeds inside the hw_acceleration tutorial cell (it builds a Lava.HWTLAS and calls trace_closest_hits!); otherwise vkCreateInstance has no driver to pick.

The hw_acceleration tutorial was the only page that needed Lava on the docs CI, and Lava transitively pulls in GLFW, whose __init__ calls glfwInit() and crashes on the headless ubuntu runner. Until BonitoBook ships export_rich_markdown we just pre-render the one tutorial offline and ship the rendered output.

* docs/src/hw_acceleration.md: replaces the BonitoBook InlineBook wrapper. Cell `(editor=…)` annotations stripped so the code blocks render as plain syntax-highlighted Julia, `md"…"` outputs replaced with their literal markdown (concrete numbers from a local run), and the SW vs HW depth heatmap embedded as a static PNG.
* docs/src/assets/hw_acceleration_compare.png: the rendered figure.
* docs/src/hw_acceleration_content.md: removed (merged into the page).
* docs/src/index.md: update the @ref crosslink to match the new H1.
* docs/Project.toml: drop Lava (and its [sources] entry); no doc cell evaluates Lava code anymore.
* .github/workflows/Documenter.yml: revert the xvfb + lavapipe scaffolding added in bf835f5; without Lava in docs deps the runner doesn't need them.

… tests

src/instanced-bvh.jl
* `TLAS` and `BLASArrays` switch from `::Any` to bounded abstract field types (`AbstractVector{InstanceDescriptor}`, `AbstractVector{BVHNode2}`, `Union{Nothing, StaticTLAS}`, etc.). Element types are now concrete on `tlas.instances[i]` / `tlas.nodes[i]` / `_flat_blas_descs[i]`, so CPU-side helpers (`get_instance`, `compact_instances!`, the Makie ext) stop boxing per element. The container itself stays abstract because `KA.allocate` returns backend-specific concrete types we don't want to fix at struct-definition time and `sync!` may reallocate to a different container across mutations. `blas_array` and the three `_flat_*` fields stay `Union{Nothing, AbstractVector}` because the concrete Triangle metadata type isn't known until the first push!.
* `n_instances(tlas::TLAS)` no longer counts handles that have been `delete!`d but not yet compacted by `sync!`. Splits the method between `TLAS` (aware of `deleted_handles`) and `StaticTLAS` (no pending state).

Project.toml
* `julia = "1.10"` → `"1.12"` since 1.12 is what we test and ship on.
* `[sources] Lava` pinned to a rev so a Raycore commit reproduces with the same Lava build. Bumping it is now a reviewed action.
* Drop `BenchmarkTools` from `[extras]`: the only consumer was the type-stability section of test_unrolled.jl that we just removed.

test/
* Delete `test_type_stability.jl` (475 LoC). The whole file was `@test_opt_alloc` against `@allocated == 0`, which 1.12's allocator accounting reports flakily. These checks have been bitrotten for a while; not worth porting on the path to release.
* Strip the type-stability tail of `test_unrolled.jl` (179 LoC) for the same reason: the BenchmarkTools `tune!` path errors on 1.12 closures in a way the test wasn't designed for. Functional tests of the unrolled helpers remain.
* `runtests.jl` reflects the deletions; the previously empty `Type Stability` testset and the silently-disabled `Unrolled` include are gone.
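
The `n_instances` split described above can be sketched with two methods. The types below are hypothetical and store bare integer handles; how Raycore maps handles to instances internally is an assumption here.

```julia
# Hypothetical immutable snapshot: no pending deletions to account for.
struct StaticTLASSketch
    instance_handles::Vector{Int}
end

# Hypothetical mutable side: deletions are recorded until the next sync!.
struct MutableTLASSketch
    instance_handles::Vector{Int}
    deleted_handles::Set{Int}
end

n_instances_sketch(t::StaticTLASSketch) = length(t.instance_handles)
n_instances_sketch(t::MutableTLASSketch) =
    count(h -> !(h in t.deleted_handles), t.instance_handles)

pending = MutableTLASSketch([1, 2, 3, 4], Set([2, 4]))
n_instances_sketch(pending)                         # live instances only
n_instances_sketch(StaticTLASSketch([1, 2, 3, 4]))  # everything in the snapshot
```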

SimonDanisch and others added 30 commits December 22, 2025 19:23

add better camera

3f2e5c5

get things working

7360498

fixes tests and docs

cb3ca12

bvh4 experiment

b3312da

bvh4

cd1701c

Merge branch 'sd/gpu-instanced-bvh' of https://github.com/JuliaGeometry/Raycore.jl into sd/gpu-instanced-bvh

ad827a0

fixes

4d56d99

unrolling and gpu tools

3d579cd

refactor our unroll strategy

058e724

getindex unrolled

286c7a2

implement multitype vec

0203b6d

api refactor

a4a3651

refactor

7494706

fixes

09d5d76

add mapreduce

6dc4e3a

renaming and fixes

59d3498

add deref for array for more uniform handling

819d1a4

add comment

2c5cb2c

small fixes

12b2ccb

improve updating support

ab4184c

fix empty blas?

23e0465

change for per triangle meta

70b2ea3

allow submesh materials

0bdbb63

refactor and cleanup

875706a

fix setkey for OpenCL

3e6b3c8

use less depth

f132fbe

Merge branch 'sd/multitype-vec' of https://github.com/JuliaGeometry/Raycore.jl into sd/multitype-vec

7c0123f

polish for release

46133d2

SimonDanisch and others added 28 commits April 6, 2026 16:55

cleaner api

17f62ae

actually mark dirty

701d58d

merge with mark dirty fix

ee98343

memory cleanup and tests

31a7f63

merges

eb797a7

fix refit update

0b9d531

Merge branch 'sd/multitype-vec' of https://github.com/JuliaGeometry/Raycore.jl into sd/multitype-vec

84eb954

feat(rt): instance_buffer(tlas, handle) accessor declaration

92a6a55

Merge branch 'sd/multitype-vec' of https://github.com/JuliaGeometry/Raycore.jl into sd/multitype-vec

25dd45b

instance transform cleanup

8c4afd2

backward compat for mat4f

9454d95

SimonDanisch merged commit 006e2a2 into master May 8, 2026
4 checks passed

SimonDanisch deleted the sd/multitype-vec branch May 8, 2026 20:24

SimonDanisch merged 70 commits into master from sd/multitype-vec

2 participants
