TLAS and MultitypeSet#14
Merged
Merged
Conversation
…ry/Raycore.jl into sd/gpu-instanced-bvh
…aycore.jl into sd/multitype-vec
…12) SetKey.type_idx was changed from UInt8 to UInt32 for LLVM/SPIR-V compatibility, but the @generated with_index function still compared against UInt8 literals. Since Julia's === checks both value and type, UInt32(1) === UInt8(1) is always false, causing all branches to fall through to the default (first material). This made every object render with the same material. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
On Metal, device pointers (Core.LLVMPtr) stored inside GPU buffers cannot be reliably dereferenced by kernels. The inline data (root_aabb) reads correctly, but following embedded pointers to per-BLAS node/primitive arrays returns zeros. Replace the pointer-based BLAS architecture in StaticTLAS with: - BLASDescriptor: lightweight struct with nodes_offset, primitives_offset, root_aabb - Flat concatenated arrays (all_blas_nodes, all_blas_prims) built from per-BLAS GPU arrays - Offset-based indexing in closest_hit/any_hit traversal Management kernels (update_tlas_leaf_aabbs_kernel!, etc.) still use blas_array but only read root_aabb (inline, unaffected). Verified: CPU and Metal produce identical results (mean pixel ~0.327 on 3-sphere test scene). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move _build_flat_blas_arrays! from adapt_structure to sync! so data is ready before any kernel dispatch. - Add KA.synchronize after AK.sortperm in BLAS/TLAS build to ensure sort temp buffers aren't freed while GPU is still using them. - Replace merge_sort_by_key! with sortperm in TLAS topology (avoids 64-bit Int payload corruption on Lava). - Add _REBUILD_KEEPALIVE and synchronize in multitypeset rebuild_static!. - Remove underscore-prefixed function names in RaycoreLavaExt. - Add instanced BVH test.
Shadow GC-keepalive bag with ordering-assumed indexing (nodes at
[2(i-1)+1], prims at +2) replaced with a typed struct per BLAS. Field
ownership on the TLAS now transitively pins each BLAS's backing nodes
and primitives buffers; the isbits device pointers stored in blas_array
remain valid as long as the TLAS does.
- struct BLASArrays { nodes, primitives } introduced.
- TLAS.blas_storage::Vector{BLASArrays} replaces gpu_blas_arrays.
- _to_isbits_blas takes the typed storage vector.
- _build_flat_blas_arrays!, eltype(::TLAS), compaction path, and the
per-BLAS update path all read storage[i].nodes / .primitives directly
instead of doing arithmetic on a flat bag.
Also includes:
- HWTLAS.sync! compacts deleted handles instead of letting orphan
instance/BLAS entries grow unbounded across re-push! cycles.
- Delete the _REBUILD_KEEPALIVE global in multitypeset.jl; the caller
already owns the old static via the compute-graph edge that holds it.
Old semantics: `instance_id` was an auto-incremented user tag, managed by
the `TLAS.next_instance_id` counter. Nothing downstream consumed it;
shader code looked up material via per-triangle metadata, forcing one
BLAS per material (meshscatter BLAS explosion).
New semantics: `instance_id == 0` means "inherit from the triangle's
per-face metadata"; any nonzero value is forwarded verbatim through
closest_hit / any_hit as the 5th return element and interpreted by the
caller (Hikari uses it as a `medium_interface_idx` override). This
matches Vulkan's `gl_InstanceCustomIndexEXT` and lets one BLAS carry N
instances with distinct materials.
- Drop `TLAS.next_instance_id` field.
- `push!(tlas, mesh, transform; instance_id=UInt32(0))` — single-instance.
- `push!(tlas, mesh, transforms; instance_ids=nothing)` — N-instance,
`instance_ids::Vector{UInt32}` gives per-instance overrides.
- Shared helpers `_build_and_append_blas!` and
`_append_instances_with_handle!` extracted from the two push! methods.
- `update!(tlas, handle, new_geometry)`: the non-Hikari fallback path
stops embedding `first_desc.instance_id` into triangle metadata (that
was nonsense under the new semantics anyway).
- HWTLAS: `push!` accepts `instance_id` / `instance_ids` kwargs,
`instance_custom_indices` now stores the override directly; compact
path no longer remaps them (they're scene-level, not BLAS-level).
- Tests updated: the `instance_ids` assertions now pass explicit
overrides and check they round-trip.
Traversal already tracks `current_instance` as the 1-based position in
`tlas.instances`. Returning that index directly gives callers single-
source-of-truth access to the whole `InstanceDescriptor` — transforms,
interface override, flags — without any of it needing to be duplicated
into the 5-tuple.
Before: `(hit, tri, t, bary, inst.instance_id)` — only the override.
After: `(hit, tri, t, bary, inst_idx)` — look up whatever you need via
`tlas.instances[inst_idx]`.
Miss returns `UInt32(0)` (not INVALID_NODE, which was node-index typed
and confusing for an instance slot).
Tests updated to check the array-index semantics. The earlier switch
to explicit `instance_ids` for the identification test is rolled back —
default pushes now give 1, 2, 3 as array positions.
…ves triangle lookup Brings the HW wavefront in line with the new SW semantics: - `RTHitResult._pad1` replaced with `instance_id::UInt32` (carries `gl_InstanceID`). - HWTLAS.sync! precomputes `per_instance_tri_offsets` keyed by gl_InstanceID (= `blas_offsets[instance_blas_indices[i]]`). The backend no longer has to remap `instance_custom_index` through blas_offsets; the lookup is a single indexed load. - `instance_custom_index` now passes through untouched — it carries the interface override (`InstanceDescriptor.instance_id`). - `build_hw_tlas` extension-stub signature: new required kwarg `per_instance_tri_offsets`. - New intrinsic `rt_instance_id` exposed (maps to gl_InstanceID). The Lava extension: - `build_hw_tlas` uploads `per_instance_tri_offsets` as `off_gpu` (replacing the old per-BLAS `blas_offsets`), passes it into `Lava.HardwareAccel`. - `PrecomputedHitsAccel.closest_hit` uses `result.instance_id` for the triangle lookup and returns `result.instance_custom_index` as the 5th tuple element (the override). - `rt_instance_id(::LavaHWAdapted)` routes to `lava_rt_instance_id()`.
…aycore.jl into sd/multitype-vec
Design for the next release cut: move HWTLAS + HW RT orchestration from Raycore into Lava, delete the RaycoreLavaExt + Raycore-side stubs, drop the CPU fence in sync! in favor of Lava's timeline-gated deferred free, and restructure tests + docs around the new division (Raycore owns SW TLAS + abstract accel contract; Lava owns HWTLAS). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Julia 1.12 requires a function to exist in a module before it can be extended from another module. Add a no-method stub so Lava can define Raycore.push_instances!(::HWTLAS, ...) in hwtlas.jl. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The stub added in P2.1 (commit 5686896) is no longer needed -- Lava's push_instances! has been folded into Base.push! overloads in P3.4b. Nothing extends Raycore.push_instances! anymore. P3.4b cleanup. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Stub for the cross-backend accessor that returns the GPU instance buffer behind an InstanceBatch handle. Concrete implementation lives in Lava for HWTLAS. P3.4c of the GPU rigid-body pipeline.
- Single user-facing mutation API on Raycore.TLAS: update_transform! / update_transforms! + sync! as the sole commit boundary; refit_tlas! / update_instance_transform[s]! demoted to internal helpers; the global _PENDING_TLAS_TRANSFORMS objectid-keyed cache is gone (writes go directly to tlas.instances). - Cross-backend Adapt.adapt(::TLAS) errors loudly when backends mismatch instead of silently returning a static_tlas with arrays on the wrong device. - Static_tlas drains its flat-BLAS arrays when the TLAS empties. - New stress suite (test_tlas_stress.jl) covering: 5000-instance refit loop with per-frame trace, 2000-instance rebuild loop, interleaved update+trace tight loops, 500-iter mesh grow/shrink with raytracing per iter, hard memory bounds with WeakRefs, use-after-free attempts. - test_mesh_update.jl + test_abstract_accel_contract.jl added; existing bounds tightened from ~25x slack to exact equality per iteration. - CI matrix: [cpu, lavapipe]. Lavapipe job apt-installs mesa-vulkan-drivers and forces VK_DRIVER_FILES; runtests.jl gates Lava- using suites behind a backend-selection layer (test_backend(), test_is_lavapipe(), test_has_hw_rt(), @lavapipe_broken). Lavapipe currently early-bails the kernel-using suites with a placeholder broken-test pending the upstream alignment-hint fix in Lava's SPIR-V emitter; cpu suite runs with KA.CPU(). - Lava added to [sources] (github.com/SimonDanisch/Lava.jl) so Pkg.test can resolve it on a fresh runner; Lava = "0.1" added to [compat] for Aqua.deps_compat. - Demos (wavefront_dynamic / lego / particles) migrated off the removed refit_tlas! / update_transform_at! APIs to the handle-based update_*! + sync! flow.
…aycore.jl into sd/multitype-vec
…e page `hw_acceleration_content.md` is embedded by BonitoBook into the registered `hw_acceleration.md` page. Both files had `# Hardware-Accelerated Ray Tracing` as their H1, which Documenter sees as two pages with the same slug, breaking `@ref` resolution and erroring the doc build with: Cannot resolve @ref for md"[Hardware-Accelerated Ray Tracing](@ref)" in docs/src/index.md. - Header with slug 'Hardware-Accelerated-Ray-Tracing' is not unique. Renamed the content file's H1 to `# Hardware Ray Tracing with Lava` to match the pattern of the other content/page pairs (which all use distinct H1s). Verified locally with `julia --project=docs/ docs/make.jl` — clean build.
Workflow-level `concurrency:` can't reference `matrix.*` — GitHub rejected the YAML with a parse-time error, which manifested as a 0-second-failed run with `log not found` and no jobs in the API. Moving the concurrency block inside `jobs.test` makes `matrix.backend` visible. Verified locally with `python3 -c "import yaml; …"`.
… init
Lava transitively depends on GLFW, whose `__init__` unconditionally calls
`glfwInit()` at module load time. On `ubuntu-latest` (headless) this
fails:
GLFW.GLFWError(65550, "X11: Failed to open display :0")
ErrorException("glfwInit failed")
Pkg.test precompiles all test deps before running tests, and
precompilation finishes by loading the module — which triggers __init__.
So even the cpu matrix entry, which never `using Lava` at runtime, can't
get past Lava's precompile step.
Standard Makie/GLFW.jl CI fix: apt-install `xvfb`, then prefix the
runtest with `xvfb-run --auto-servernum`. julia-actions/julia-runtest@v1
supports `prefix:` for exactly this. Removed the obsolete
`DISPLAY: ':0'` env (no X server was actually running on that DISPLAY).
The previous concurrency block kept `cancel-in-progress: false` for every
trigger, so PR pushes serialized behind earlier (often-stuck) builds —
the user observed a 17m+ in-flight run blocking newer PR commits.
Per-ref grouping (`pages-${{ github.ref }}`) so master and PRs no longer
share a queue at all. `cancel-in-progress` is true for PR refs (latest
commit wins, matches ci.yml) and false for master/tag refs (production
GitHub Pages deploys complete uninterrupted).
* hw_acceleration_content.md rewritten end-to-end without Hikari/RayMakie: builds a Raycore.TLAS and Lava.HWTLAS from the same scene, traces primary rays through both (SW via Raycore.closest_hit in a @kernel, HW via Lava.trace_closest_hits!), shows side-by-side depth heatmaps and a per-frame timing line. Pixel agreement verified locally (0/49152 hit-mask disagreement, max abs depth diff ~1e-5). * docs/Project.toml: add Lava as a doc dep + [sources] entry pointing at the GitHub URL (matches the test job's pattern, resolves on a standalone CI checkout). * viewfactors_content.md: GeometryBasics 0.5 dropped meta(...); vertex normals now go through GeometryBasics.mesh(...; normal=...). * RaycoreMakieExt.jl: arrows! is deprecated for 3-D inputs; ray plot recipe now uses arrows3d! so the bvh_hit_tests tutorial renders. * bvh_hit_tests_content.md: trailing newline fix only.
Same fix the test job got in 039ba13: GLFW (transitive dep of Lava) calls glfwInit() at module __init__ which needs an X display. ubuntu-latest is headless, so without xvfb the docs runner crashes the moment Pkg.precompile loads Lava — exactly the failure on the latest Documenter run. Also install mesa lavapipe so Vulkan device creation succeeds inside the hw_acceleration tutorial cell (it builds a Lava.HWTLAS and calls trace_closest_hits!); otherwise vkCreateInstance has no driver to pick.
The hw_acceleration tutorial was the only page that needed Lava on the docs CI, and Lava transitively pulls in GLFW, whose __init__ calls glfwInit() and crashes on the headless ubuntu runner. Until BonitoBook ships export_rich_markdown we just pre-render the one tutorial offline and ship the rendered output. * docs/src/hw_acceleration.md: replaces the BonitoBook InlineBook wrapper. Cell `(editor=…)` annotations stripped so the code blocks render as plain syntax-highlighted Julia, `md"…"` outputs replaced with their literal markdown (concrete numbers from a local run), and the SW vs HW depth heatmap embedded as a static PNG. * docs/src/assets/hw_acceleration_compare.png: the rendered figure. * docs/src/hw_acceleration_content.md: removed (merged into the page). * docs/src/index.md: update the @ref crosslink to match the new H1. * docs/Project.toml: drop Lava (and its [sources] entry) — no doc cell evaluates Lava code anymore. * .github/workflows/Documenter.yml: revert the xvfb + lavapipe scaffolding added in bf835f5; without Lava in docs deps the runner doesn't need them.
… tests
src/instanced-bvh.jl
* `TLAS` and `BLASArrays` switch from `::Any` to bounded abstract field
types (`AbstractVector{InstanceDescriptor}`, `AbstractVector{BVHNode2}`,
`Union{Nothing, StaticTLAS}`, etc.). Element types are now concrete on
`tlas.instances[i]` / `tlas.nodes[i]` / `_flat_blas_descs[i]`, so CPU-
side helpers (`get_instance`, `compact_instances!`, the Makie ext) stop
boxing per element. The container itself stays abstract because
`KA.allocate` returns backend-specific concrete types we don't want to
fix at struct-definition time and `sync!` may reallocate to a different
container across mutations. `blas_array` and the three `_flat_*`
fields stay `Union{Nothing, AbstractVector}` because the concrete
Triangle metadata type isn't known until the first push!.
* `n_instances(tlas::TLAS)` no longer counts handles that have been
`delete!`d but not yet compacted by `sync!`. Splits the method between
`TLAS` (aware of `deleted_handles`) and `StaticTLAS` (no pending state).
Project.toml
* `julia = "1.10"` → `"1.12"` since 1.12 is what we test and ship on.
* `[sources] Lava` pinned to a rev so a Raycore commit reproduces with
the same Lava build. Bumping it is now a reviewed action.
* Drop `BenchmarkTools` from `[extras]` — the only consumer was the
type-stability section of test_unrolled.jl that we just removed.
test/
* Delete `test_type_stability.jl` (475 LoC). The whole file was
`@test_opt_alloc` against `@allocated == 0`, which 1.12's allocator
accounting reports flaky. These checks have been bitrotten for a
while; not worth porting on the path to release.
* Strip the type-stability tail of `test_unrolled.jl` (179 LoC) for the
same reason — the BenchmarkTools `tune!` path errors on 1.12 closures
in a way the test wasn't designed for. Functional tests of the
unrolled helpers remain.
* `runtests.jl` reflects the deletions; the previously empty `Type
Stability` testset and the silently-disabled `Unrolled` include are
gone.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This adds: