
TLAS and MultitypeSet #14

Merged
SimonDanisch merged 70 commits into master from sd/multitype-vec on May 8, 2026
Conversation

@SimonDanisch
Member

This adds:

  • GPU two-level acceleration (TLAS/BLAS): Instanced BVH with per-instance transforms, TLAS/StaticTLAS split (mutable for construction, immutable isbits for kernel traversal), Adapt.jl for CPU→GPU transfer
  • MultiTypeSet: GPU-safe heterogeneous collection with compile-time type-stable dispatch via with_index, enabling multiple material/texture types without dynamic dispatch on the GPU
  • GPU utilities: @get/@set SoA macros, for_unrolled/map_unrolled/reduce_unrolled for compile-time loop unrolling, FastClosure for GPU-safe closures

SimonDanisch and others added 30 commits December 22, 2025 19:23
…12)

SetKey.type_idx was changed from UInt8 to UInt32 for LLVM/SPIR-V
compatibility, but the @generated with_index function still compared
against UInt8 literals. Since Julia's === checks both value and type,
UInt32(1) === UInt8(1) is always false, causing all branches to fall
through to the default (first material). This made every object render
with the same material.
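
The failure mode described here can be reproduced in a few lines. This is a minimal sketch, not Raycore's actual `with_index`: `materials`, `pick_buggy`, and `pick_fixed` are hypothetical names that only mimic a branch chain comparing a `UInt32` index against typed literals.

```julia
# Hypothetical stand-ins for illustration; not Raycore's generated code.
materials = ("diffuse", "glass", "metal")

# Buggy: the index is UInt32 but the literals are UInt8, so === never matches.
function pick_buggy(type_idx::UInt32)
    type_idx === UInt8(1) && return materials[1]
    type_idx === UInt8(2) && return materials[2]
    type_idx === UInt8(3) && return materials[3]
    return materials[1]  # default branch: first material
end

# Fixed: the literals share the index type, so === compares like with like.
function pick_fixed(type_idx::UInt32)
    type_idx === UInt32(1) && return materials[1]
    type_idx === UInt32(2) && return materials[2]
    type_idx === UInt32(3) && return materials[3]
    return materials[1]
end

pick_buggy(UInt32(3))  # falls through to "diffuse"
pick_fixed(UInt32(3))  # "metal"
```

Note that `UInt32(1) == UInt8(1)` is `true`; only the identity check `===` distinguishes the two types.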

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

On Metal, device pointers (Core.LLVMPtr) stored inside GPU buffers
cannot be reliably dereferenced by kernels. The inline data (root_aabb)
reads correctly, but following embedded pointers to per-BLAS node/primitive
arrays returns zeros.

Replace the pointer-based BLAS architecture in StaticTLAS with:
- BLASDescriptor: lightweight struct with nodes_offset, primitives_offset, root_aabb
- Flat concatenated arrays (all_blas_nodes, all_blas_prims) built from per-BLAS GPU arrays
- Offset-based indexing in closest_hit/any_hit traversal

Management kernels (update_tlas_leaf_aabbs_kernel!, etc.) still use
blas_array but only read root_aabb (inline, unaffected).
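
The offset-based layout can be sketched on the CPU with plain vectors. This is an illustrative simplification under stated assumptions: `BLASDescriptorSketch`, `build_flat`, `node_at`, and `prim_at` are hypothetical names, `root_aabb` is omitted, and the offsets are assumed to be zero-based starts that are added to a 1-based local index.

```julia
# Hypothetical simplification of the descriptor: offsets only, root_aabb omitted.
struct BLASDescriptorSketch
    nodes_offset::UInt32       # zero-based start of this BLAS in the flat node array
    primitives_offset::UInt32  # zero-based start in the flat primitive array
end

# Concatenate per-BLAS arrays and record where each one starts.
function build_flat(per_blas_nodes::Vector{Vector{N}},
                    per_blas_prims::Vector{Vector{P}}) where {N,P}
    all_nodes, all_prims = N[], P[]
    descs = BLASDescriptorSketch[]
    for (nodes, prims) in zip(per_blas_nodes, per_blas_prims)
        push!(descs, BLASDescriptorSketch(length(all_nodes), length(all_prims)))
        append!(all_nodes, nodes)
        append!(all_prims, prims)
    end
    return descs, all_nodes, all_prims
end

# A local 1-based index i inside a BLAS becomes offset + i in the flat array.
node_at(all_nodes, desc, i) = all_nodes[desc.nodes_offset + i]
prim_at(all_prims, desc, i) = all_prims[desc.primitives_offset + i]

descs, all_nodes, all_prims = build_flat([[:a, :b], [:c]], [[1, 2, 3], [4]])
node_at(all_nodes, descs[2], 1)  # first node of the second BLAS
```

Because the descriptor holds only integers, it stays isbits and no pointer has to be followed inside the kernel.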

Verified: CPU and Metal produce identical results (mean pixel ~0.327
on 3-sphere test scene).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SimonDanisch and others added 28 commits April 6, 2026 16:55
- Move _build_flat_blas_arrays! from adapt_structure to sync! so data is
  ready before any kernel dispatch.

- Add KA.synchronize after AK.sortperm in BLAS/TLAS build to ensure sort
  temp buffers aren't freed while GPU is still using them.

- Replace merge_sort_by_key! with sortperm in TLAS topology (avoids 64-bit
  Int payload corruption on Lava).

- Add _REBUILD_KEEPALIVE and synchronize in multitypeset rebuild_static!.

- Remove underscore-prefixed function names in RaycoreLavaExt.

- Add instanced BVH test.

Replace the shadow GC-keepalive bag, which relied on ordering-assumed
indexing (nodes at [2(i-1)+1], prims at [2(i-1)+2]), with a typed struct
per BLAS.  Field ownership on the TLAS now transitively pins each BLAS's
backing nodes and primitives buffers; the isbits device pointers stored in
blas_array remain valid for as long as the TLAS is alive.

- struct BLASArrays { nodes, primitives } introduced.
- TLAS.blas_storage::Vector{BLASArrays} replaces gpu_blas_arrays.
- _to_isbits_blas takes the typed storage vector.
- _build_flat_blas_arrays!, eltype(::TLAS), compaction path, and the
  per-BLAS update path all read storage[i].nodes / .primitives directly
  instead of doing arithmetic on a flat bag.

Also includes:
- HWTLAS.sync! compacts deleted handles instead of letting orphan
  instance/BLAS entries grow unbounded across re-push! cycles.
- Delete the _REBUILD_KEEPALIVE global in multitypeset.jl; the caller
  already owns the old static via the compute-graph edge that holds it.

Old semantics: `instance_id` was an auto-incremented user tag, managed by
the `TLAS.next_instance_id` counter.  Nothing downstream consumed it;
shader code looked up material via per-triangle metadata, forcing one
BLAS per material (meshscatter BLAS explosion).

New semantics: `instance_id == 0` means "inherit from the triangle's
per-face metadata"; any nonzero value is forwarded verbatim through
closest_hit / any_hit as the 5th return element and interpreted by the
caller (Hikari uses it as a `medium_interface_idx` override).  This
matches Vulkan's `gl_InstanceCustomIndexEXT` and lets one BLAS carry N
instances with distinct materials.
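
The "zero means inherit" convention can be written as a one-line resolution rule. This is a sketch of how a caller might interpret the value; `resolve_interface` is a hypothetical helper, not a function from Raycore or Hikari.

```julia
# Hypothetical helper; the source only describes the convention.
# instance_id == 0 inherits the per-face value, any nonzero value overrides it.
resolve_interface(instance_id::UInt32, face_idx::UInt32) =
    instance_id == 0 ? face_idx : instance_id

# One shared BLAS (same per-face value) carrying three instances.
face_idx = UInt32(4)
instance_ids = UInt32[0, 9, 12]
resolved = [resolve_interface(id, face_idx) for id in instance_ids]
```

Here the first instance inherits index 4 from the face metadata, while the other two use their own overrides, so one BLAS serves all three.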

- Drop `TLAS.next_instance_id` field.
- `push!(tlas, mesh, transform; instance_id=UInt32(0))` — single-instance.
- `push!(tlas, mesh, transforms; instance_ids=nothing)` — N-instance,
  `instance_ids::Vector{UInt32}` gives per-instance overrides.
- Shared helpers `_build_and_append_blas!` and
  `_append_instances_with_handle!` extracted from the two push! methods.
- `update!(tlas, handle, new_geometry)`: the non-Hikari fallback path
  stops embedding `first_desc.instance_id` into triangle metadata (that
  was nonsense under the new semantics anyway).
- HWTLAS: `push!` accepts `instance_id` / `instance_ids` kwargs,
  `instance_custom_indices` now stores the override directly; compact
  path no longer remaps them (they're scene-level, not BLAS-level).
- Tests updated: the `instance_ids` assertions now pass explicit
  overrides and check they round-trip.

Traversal already tracks `current_instance` as the 1-based position in
`tlas.instances`.  Returning that index directly gives callers single-
source-of-truth access to the whole `InstanceDescriptor` — transforms,
interface override, flags — without any of it needing to be duplicated
into the 5-tuple.

Before: `(hit, tri, t, bary, inst.instance_id)` — only the override.
After:  `(hit, tri, t, bary, inst_idx)` — look up whatever you need via
                                          `tlas.instances[inst_idx]`.

Miss returns `UInt32(0)` (not INVALID_NODE, which was node-index typed
and confusing for an instance slot).
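
A caller-side lookup under the new return convention might look like the following. This is a sketch with hypothetical names (`InstanceDescriptorSketch`, `instance_for_hit`); the field set is an assumption, and only the "index into `instances`, `UInt32(0)` on miss" behavior comes from the description above.

```julia
# Hypothetical reduced descriptor; the real InstanceDescriptor has more fields.
struct InstanceDescriptorSketch
    instance_id::UInt32  # interface override, 0 = inherit
    flags::UInt32
end

# inst_idx is the 1-based position in the instance array; 0 signals a miss.
instance_for_hit(instances, inst_idx::UInt32) =
    inst_idx == 0 ? nothing : instances[inst_idx]

instances = [InstanceDescriptorSketch(0, 0), InstanceDescriptorSketch(7, 1)]
hit = instance_for_hit(instances, UInt32(2))   # full descriptor of instance 2
miss = instance_for_hit(instances, UInt32(0))  # nothing
```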

Tests updated to check the array-index semantics.  The earlier switch
to explicit `instance_ids` for the identification test is rolled back —
default pushes now give 1, 2, 3 as array positions.
…ves triangle lookup

Brings the HW wavefront in line with the new SW semantics:

- `RTHitResult._pad1` replaced with `instance_id::UInt32` (carries
  `gl_InstanceID`).
- HWTLAS.sync! precomputes `per_instance_tri_offsets` keyed by
  gl_InstanceID (= `blas_offsets[instance_blas_indices[i]]`).  The
  backend no longer has to remap `instance_custom_index` through
  blas_offsets; the lookup is a single indexed load.
- `instance_custom_index` now passes through untouched — it carries the
  interface override (`InstanceDescriptor.instance_id`).
- `build_hw_tlas` extension-stub signature: new required kwarg
  `per_instance_tri_offsets`.
- New intrinsic `rt_instance_id` exposed (maps to gl_InstanceID).
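
The precomputation in `sync!` amounts to a gather. The sketch below uses made-up sizes and assumes `gl_InstanceID` is zero-based while the Julia arrays are 1-based; `tri_offset` is a hypothetical accessor.

```julia
blas_offsets = UInt32[0, 12, 40]      # triangle offset of each BLAS (illustrative values)
instance_blas_indices = [1, 3, 3, 2]  # 1-based BLAS index used by each instance

# One gather at sync time replaces the per-hit remap through blas_offsets.
per_instance_tri_offsets = blas_offsets[instance_blas_indices]

# Assumption: gl_InstanceID is zero-based, hence the + 1 for Julia indexing.
tri_offset(gl_instance_id::Integer) = per_instance_tri_offsets[gl_instance_id + 1]

tri_offset(1)  # offset for the second instance
```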

The Lava extension:
- `build_hw_tlas` uploads `per_instance_tri_offsets` as `off_gpu`
  (replacing the old per-BLAS `blas_offsets`), passes it into
  `Lava.HardwareAccel`.
- `PrecomputedHitsAccel.closest_hit` uses `result.instance_id` for the
  triangle lookup and returns `result.instance_custom_index` as the 5th
  tuple element (the override).
- `rt_instance_id(::LavaHWAdapted)` routes to `lava_rt_instance_id()`.

Design for the next release cut: move HWTLAS + HW RT orchestration from
Raycore into Lava, delete the RaycoreLavaExt + Raycore-side stubs, drop
the CPU fence in sync! in favor of Lava's timeline-gated deferred free,
and restructure tests + docs around the new division (Raycore owns SW
TLAS + abstract accel contract; Lava owns HWTLAS).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Julia 1.12 requires a function to exist in a module before it can be
extended from another module. Add a no-method stub so Lava can define
Raycore.push_instances!(::HWTLAS, ...) in hwtlas.jl.
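
The stub pattern looks roughly like this. `RaycoreSketch`, `LavaSketch`, and `HWTLASSketch` are hypothetical stand-ins for the real modules and type, and the method body is invented for illustration.

```julia
module RaycoreSketch
    # No-method stub: creates the generic function without defining any method.
    function push_instances! end
end

module LavaSketch
    import ..RaycoreSketch

    struct HWTLASSketch
        instances::Vector{Int}
    end

    # Extending the owner module's function works because the stub exists there.
    RaycoreSketch.push_instances!(t::HWTLASSketch, xs) = append!(t.instances, xs)
end

t = LavaSketch.HWTLASSketch(Int[])
RaycoreSketch.push_instances!(t, [1, 2])
```

Without the `function push_instances! end` line, the qualified method definition in the second module would have no function to attach to.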

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The stub added in P2.1 (commit 5686896) is no longer needed --
Lava's push_instances! has been folded into Base.push! overloads
in P3.4b. Nothing extends Raycore.push_instances! anymore.

P3.4b cleanup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Stub for the cross-backend accessor that returns the GPU instance buffer
behind an InstanceBatch handle. Concrete implementation lives in Lava
for HWTLAS.

P3.4c of the GPU rigid-body pipeline.

- Single user-facing mutation API on Raycore.TLAS: update_transform! /
  update_transforms! + sync! as the sole commit boundary; refit_tlas! /
  update_instance_transform[s]! demoted to internal helpers; the global
  _PENDING_TLAS_TRANSFORMS objectid-keyed cache is gone (writes go
  directly to tlas.instances).
- Cross-backend Adapt.adapt(::TLAS) errors loudly when backends mismatch
  instead of silently returning a static_tlas with arrays on the wrong
  device.
- Static_tlas drains its flat-BLAS arrays when the TLAS empties.
- New stress suite (test_tlas_stress.jl) covering: 5000-instance refit
  loop with per-frame trace, 2000-instance rebuild loop, interleaved
  update+trace tight loops, 500-iter mesh grow/shrink with raytracing
  per iter, hard memory bounds with WeakRefs, use-after-free attempts.
- test_mesh_update.jl + test_abstract_accel_contract.jl added; existing
  bounds tightened from ~25x slack to exact equality per iteration.
- CI matrix: [cpu, lavapipe]. Lavapipe job apt-installs
  mesa-vulkan-drivers and forces VK_DRIVER_FILES; runtests.jl gates Lava-
  using suites behind a backend-selection layer (test_backend(),
  test_is_lavapipe(), test_has_hw_rt(), @lavapipe_broken). Lavapipe
  currently early-bails the kernel-using suites with a placeholder
  broken-test pending the upstream alignment-hint fix in Lava's SPIR-V
  emitter; cpu suite runs with KA.CPU().
- Lava added to [sources] (github.com/SimonDanisch/Lava.jl) so Pkg.test
  can resolve it on a fresh runner; Lava = "0.1" added to [compat] for
  Aqua.deps_compat.
- Demos (wavefront_dynamic / lego / particles) migrated off the removed
  refit_tlas! / update_transform_at! APIs to the handle-based update_*!
  + sync! flow.
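
The "write directly, commit at `sync!`" flow from the first item of this list can be illustrated with a toy model. Everything here is an assumption made for illustration: `TLASSketch` and its fields are hypothetical, transforms are reduced to plain numbers, and the signatures of `update_transform!` and `sync!` are not taken from Raycore.

```julia
# Toy model: a mutable side that accepts writes, and a snapshot that readers see.
mutable struct TLASSketch
    instances::Vector{Float64}  # stand-in for per-instance transforms
    committed::Vector{Float64}  # stand-in for the static snapshot kernels read
    dirty::Bool
end
TLASSketch(xs::Vector{Float64}) = TLASSketch(copy(xs), copy(xs), false)

# Writes go straight into `instances`; no side cache is involved.
function update_transform!(t::TLASSketch, handle::Int, x::Float64)
    t.instances[handle] = x
    t.dirty = true
    return t
end

# sync! is the only point where pending writes become visible to readers.
function sync!(t::TLASSketch)
    t.dirty || return t
    t.committed = copy(t.instances)
    t.dirty = false
    return t
end

tlas = TLASSketch([1.0, 2.0])
update_transform!(tlas, 2, 5.0)
before = copy(tlas.committed)  # still the old snapshot
sync!(tlas)
after = copy(tlas.committed)   # now reflects the update
```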
…e page

`hw_acceleration_content.md` is embedded by BonitoBook into the registered
`hw_acceleration.md` page. Both files had `# Hardware-Accelerated Ray Tracing`
as their H1, which Documenter sees as two pages with the same slug, breaking
`@ref` resolution and erroring the doc build with:

  Cannot resolve @ref for md"[Hardware-Accelerated Ray Tracing](@ref)" in
  docs/src/index.md.
  - Header with slug 'Hardware-Accelerated-Ray-Tracing' is not unique.

Renamed the content file's H1 to `# Hardware Ray Tracing with Lava` to match
the pattern of the other content/page pairs (which all use distinct H1s).
Verified locally with `julia --project=docs/ docs/make.jl` — clean build.
Workflow-level `concurrency:` can't reference `matrix.*` — GitHub
rejected the YAML with a parse-time error, which manifested as a
0-second-failed run with `log not found` and no jobs in the API.
Moving the concurrency block inside `jobs.test` makes `matrix.backend`
visible.  Verified locally with `python3 -c "import yaml; …"`.
… init

Lava transitively depends on GLFW, whose `__init__` unconditionally calls
`glfwInit()` at module load time.  On `ubuntu-latest` (headless) this
fails:

  GLFW.GLFWError(65550, "X11: Failed to open display :0")
  ErrorException("glfwInit failed")

Pkg.test precompiles all test deps before running tests, and
precompilation finishes by loading the module — which triggers __init__.
So even the cpu matrix entry, which never runs `using Lava` at runtime, can't
get past Lava's precompile step.

Standard Makie/GLFW.jl CI fix: apt-install `xvfb`, then prefix the
runtest with `xvfb-run --auto-servernum`.  julia-actions/julia-runtest@v1
supports `prefix:` for exactly this.  Removed the obsolete
`DISPLAY: ':0'` env (no X server was actually running on that DISPLAY).

The previous concurrency block kept `cancel-in-progress: false` for every
trigger, so PR pushes serialized behind earlier (often-stuck) builds —
the user observed a 17m+ in-flight run blocking newer PR commits.

Per-ref grouping (`pages-${{ github.ref }}`) so master and PRs no longer
share a queue at all.  `cancel-in-progress` is true for PR refs (latest
commit wins, matches ci.yml) and false for master/tag refs (production
GitHub Pages deploys complete uninterrupted).

* hw_acceleration_content.md rewritten end-to-end without Hikari/RayMakie:
  builds a Raycore.TLAS and Lava.HWTLAS from the same scene, traces primary
  rays through both (SW via Raycore.closest_hit in a @kernel, HW via
  Lava.trace_closest_hits!), shows side-by-side depth heatmaps and a
  per-frame timing line.  Pixel agreement verified locally (0/49152
  hit-mask disagreement, max abs depth diff ~1e-5).
* docs/Project.toml: add Lava as a doc dep + [sources] entry pointing at
  the GitHub URL (matches the test job's pattern, resolves on a
  standalone CI checkout).
* viewfactors_content.md: GeometryBasics 0.5 dropped meta(...); vertex
  normals now go through GeometryBasics.mesh(...; normal=...).
* RaycoreMakieExt.jl: arrows! is deprecated for 3-D inputs; ray plot
  recipe now uses arrows3d! so the bvh_hit_tests tutorial renders.
* bvh_hit_tests_content.md: trailing newline fix only.

Same fix the test job got in 039ba13: GLFW (transitive dep of Lava) calls
glfwInit() at module __init__ which needs an X display.  ubuntu-latest is
headless, so without xvfb the docs runner crashes the moment Pkg.precompile
loads Lava — exactly the failure on the latest Documenter run.

Also install mesa lavapipe so Vulkan device creation succeeds inside the
hw_acceleration tutorial cell (it builds a Lava.HWTLAS and calls
trace_closest_hits!); otherwise vkCreateInstance has no driver to pick.

The hw_acceleration tutorial was the only page that needed Lava on the
docs CI, and Lava transitively pulls in GLFW, whose __init__ calls
glfwInit() and crashes on the headless ubuntu runner.  Until BonitoBook
ships export_rich_markdown we just pre-render the one tutorial offline
and ship the rendered output.

* docs/src/hw_acceleration.md: replaces the BonitoBook InlineBook wrapper.
  Cell `(editor=…)` annotations stripped so the code blocks render as
  plain syntax-highlighted Julia, `md"…"` outputs replaced with their
  literal markdown (concrete numbers from a local run), and the SW vs HW
  depth heatmap embedded as a static PNG.
* docs/src/assets/hw_acceleration_compare.png: the rendered figure.
* docs/src/hw_acceleration_content.md: removed (merged into the page).
* docs/src/index.md: update the @ref crosslink to match the new H1.
* docs/Project.toml: drop Lava (and its [sources] entry) — no doc cell
  evaluates Lava code anymore.
* .github/workflows/Documenter.yml: revert the xvfb + lavapipe scaffolding
  added in bf835f5; without Lava in docs deps the runner doesn't need them.
… tests

src/instanced-bvh.jl
  * `TLAS` and `BLASArrays` switch from `::Any` to bounded abstract field
    types (`AbstractVector{InstanceDescriptor}`, `AbstractVector{BVHNode2}`,
    `Union{Nothing, StaticTLAS}`, etc.).  Element types are now concrete on
    `tlas.instances[i]` / `tlas.nodes[i]` / `_flat_blas_descs[i]`, so CPU-
    side helpers (`get_instance`, `compact_instances!`, the Makie ext) stop
    boxing per element.  The container itself stays abstract because
    `KA.allocate` returns backend-specific concrete types we don't want to
    fix at struct-definition time and `sync!` may reallocate to a different
    container across mutations.  `blas_array` and the three `_flat_*`
    fields stay `Union{Nothing, AbstractVector}` because the concrete
    Triangle metadata type isn't known until the first push!.

  * `n_instances(tlas::TLAS)` no longer counts handles that have been
    `delete!`d but not yet compacted by `sync!`.  Splits the method between
    `TLAS` (aware of `deleted_handles`) and `StaticTLAS` (no pending state).
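
Both changes to `src/instanced-bvh.jl` can be illustrated together. This is a sketch under assumptions: `InstanceSketch` and `TLASFieldsSketch` are hypothetical types, and counting by a per-instance `handle` field is an invented way to model "deleted but not yet compacted".

```julia
struct InstanceSketch
    handle::Int  # hypothetical: which push! handle this instance belongs to
end

mutable struct TLASFieldsSketch
    # Bounded abstract container: the backend array type may vary,
    # but the element type is fixed and concrete.
    instances::AbstractVector{InstanceSketch}
    deleted_handles::Set{Int}
end

# Count only instances whose handle has not been deleted.
n_instances(t::TLASFieldsSketch) =
    count(inst -> !(inst.handle in t.deleted_handles), t.instances)

tlas_fields = TLASFieldsSketch(
    [InstanceSketch(1), InstanceSketch(1), InstanceSketch(2)], Set([1]))
n_instances(tlas_fields)                          # live instances only
eltype(fieldtype(TLASFieldsSketch, :instances))   # InstanceSketch
```

With an `::Any` field the compiler knows nothing about `t.instances[i]`; with the bounded field the element type is known even though the container type is not.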

Project.toml
  * `julia = "1.10"` → `"1.12"` since 1.12 is what we test and ship on.
  * `[sources] Lava` pinned to a rev so a Raycore commit reproduces with
    the same Lava build.  Bumping it is now a reviewed action.
  * Drop `BenchmarkTools` from `[extras]` — the only consumer was the
    type-stability section of test_unrolled.jl that we just removed.

test/
  * Delete `test_type_stability.jl` (475 LoC).  The whole file was
    `@test_opt_alloc` checks against `@allocated == 0`, which are flaky
    under 1.12's allocator accounting.  These checks have been bitrotting
    for a while and are not worth porting on the path to release.
  * Strip the type-stability tail of `test_unrolled.jl` (179 LoC) for the
    same reason — the BenchmarkTools `tune!` path errors on 1.12 closures
    in a way the test wasn't designed for.  Functional tests of the
    unrolled helpers remain.
  * `runtests.jl` reflects the deletions; the previously empty `Type
    Stability` testset and the silently-disabled `Unrolled` include are
    gone.
@SimonDanisch SimonDanisch merged commit 006e2a2 into master May 8, 2026
4 checks passed
@SimonDanisch SimonDanisch deleted the sd/multitype-vec branch May 8, 2026 20:24
2 participants