Summary
On Julia 1.12 + CUDACore (latest), the GPU tau-leaping kernel fails at compile time with InvalidIRError: unsupported call to an unknown function (call to jl_f_throw_methoderror) originating from count_rand → randexp → rand → rng_native_52.
This PR replaces pois_rand(PoissonRandom.PassthroughRNG(), λ) with pois_rand(λ) in ext/JumpProcessesKernelAbstractionsExt.jl (one call site).
Root cause
GPUCompiler's @device_override is implemented via an overlay method table whose entries outrank the regular method table irrespective of method specificity. So although PoissonRandom defines a specific Random.randexp(::PassthroughRNG) = randexp(), CUDACore's @device_override Random.randexp(rng::AbstractRNG) wins on device. The override is therefore entered with rng::PassthroughRNG, and its body calls Random.rand(rng, UInt52Raw()). That dispatches to rand(::AbstractRNG, ::SamplerTrivial{UInt52Raw{UInt64}}) which calls rng_native_52(rng). PassthroughRNG <: Random.AbstractRNG directly — not via RandomNumbers.AbstractRNG, the only thing that carries an rng_native_52 fallback — so rng_native_52(::PassthroughRNG) is statically a MethodError. GPUCompiler refuses to lower the unreachable-but-not-provably-so throw.
I verified this on Julia 1.12.6 by reproducing the error on the CPU:
julia> Random.rand(PoissonRandom.PassthroughRNG(), Random.UInt52Raw())
ERROR: MethodError: no method matching rng_native_52(::PassthroughRNG)
Stacktrace:
[1] rand(r::PassthroughRNG, ::Random.SamplerTrivial{Random.UInt52Raw{UInt64}, UInt64})
@ Random Random/src/generation.jl:114
[2] rand(rng::PassthroughRNG, X::Random.UInt52Raw{UInt64})
@ Random Random/src/Random.jl:255
Why this fix works
pois_rand(λ) dispatches to pois_rand(Random.default_rng(), λ). On device, Random.default_rng() is overridden by CUDACore to return Philox2x32(). Philox2x32 <: RandomNumbers.AbstractRNG{UInt64}, which provides Random.rng_native_52(::RandomNumbers.AbstractRNG) = UInt64 (RandomNumbers.jl/src/common.jl:16). The chain rand(Philox2x32, UInt52Raw()) → _rand52(rng, UInt64) → rand(rng, UInt64) then resolves to CUDACore's explicit Random.rand(::Philox2x32{R}, ::Type{UInt64}) method.
Semantically equivalent to the previous code: in both cases the device-side RNG state is the same Philox2x32 with per-warp state — the PassthroughRNG was always just an indirection to rand()/randexp()/randn() which themselves go through default_rng() on device.
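The dispatch difference described above can be sketched on the CPU with two toy RNG types. This is an illustrative sketch only: `ToyRNG` and `FixedRNG` are hypothetical names that appear nowhere in PoissonRandom, RandomNumbers, or CUDACore, and the behaviour assumes a Julia version where `Random` defines no generic `rng_native_52(::AbstractRNG)` fallback, as in the 1.12.6 reproduction above.

```julia
using Random

# Hypothetical stand-in for PassthroughRNG: subtypes Random.AbstractRNG
# directly and defines no rng_native_52 method.
struct ToyRNG <: Random.AbstractRNG end

# Hypothetical stand-in for a Philox-style RNG: declares UInt64 as its
# native 52-bit source and provides an explicit UInt64 sampler.
struct FixedRNG <: Random.AbstractRNG end
Random.rng_native_52(::FixedRNG) = UInt64
Random.rand(::FixedRNG, ::Random.SamplerType{UInt64}) = 0x0123456789abcdef

# Without rng_native_52, the UInt52Raw sampler chain ends in a MethodError.
toy_fails = try
    rand(ToyRNG(), Random.UInt52Raw())
    false
catch err
    err isa MethodError
end

# With rng_native_52 = UInt64, the chain goes through rand(rng, UInt64).
fixed_value = rand(FixedRNG(), Random.UInt52Raw())
```

On the CPU the `MethodError` is only raised at run time. On device the same unresolved call is what GPUCompiler rejects at compile time.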
Test plan
GPU Tests workflow passes on this PR (the existing test/gpu/regular_jumps.jl SIR/SEIR + SimpleTauLeaping + EnsembleGPUKernel(CUDABackend()) test exercises this exact code path and is currently the regression on master).
Run Tests (CPU) stays green — code path is on the GPU extension only, no CPU paths touched.
Notes
This is the smallest possible fix. The broader PoissonRandom PassthroughRNG design issue (its Random.rand/randexp/randn family takes no Type arg and so cannot satisfy the standard Sampler→rng_native_52 chain) is out of scope here and arguably belongs upstream in PoissonRandom.
Update — CI revealed a second, independent GPU issue. Pushing a workaround.
After the first commit fixed the MethodError on rng_native_52(::PassthroughRNG), the kernel compile got further and now fails with:
Reason: unsupported call to an unknown function (call to julia.get_pgcstack)
Stacktrace:
[1] randexp @ CUDACore/src/device/random.jl:345
...
[1] pois_rand @ PoissonRandom.jl:137
[3] macro expansion @ JumpProcessesKernelAbstractionsExt.jl:139
Line 345 is result = randexp_unlikely(rng, idx, x) inside CUDACore's @device_override Random.randexp(::AbstractRNG). JuliaGPU/CUDA.jl#3086 (in v6.1.0) replaced the original ziggurat recursion with a while true loop calling @noinline randexp_unlikely and threading Union{Float64, Nothing} for retry. On Julia 1.12 this lowers to a real function call requiring GC-frame setup → get_pgcstack, which GPUCompiler refuses. That's an upstream CUDA.jl bug; flagging here, not opening an issue against CUDA.jl without explicit go-ahead from @ChrisRackauckas.
JumpProcesses-side workaround in the second commit: drop pois_rand from the kernel and inline a count-method Poisson sampler that uses only Random.rand(rng, Float64) and log(). No randexp, no @noinline retry helpers, so the bad code path is never compiled. Random.rand(rng, Float64) on device dispatches through RandomNumbers.jl's rand(::AbstractRNG{UInt64}, ::Type{Float64}) → rand(rng, UInt52()) → CUDACore's explicit Random.rand(::Philox2x32, ::Type{UInt64}) — fully inlinable.
Cost: O(λ) iterations per Poisson draw. Acceptable for SimpleTauLeaping's small-λ regime (τ is chosen so jump counts per step stay small). If a high-λ use case shows up later, we can add a normal-approximation fast path for λ ≳ 30.
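A count-method sampler of this kind can be sketched as follows. This is a minimal CPU-side sketch, not the code in the second commit: the function name `poisson_count_sketch` is hypothetical, and the use of `1 - rand(...)` to keep the argument of `log` away from zero is an assumption of this sketch. It accumulates exponential inter-arrival times, each drawn as `-log(U)`, and counts how many fit below λ.

```julia
using Random

# Hypothetical helper name; the kernel's actual inlined code may differ.
# Counts unit-rate exponential arrivals that land in [0, λ); that count
# is Poisson(λ) distributed. Cost is O(λ) iterations per draw.
function poisson_count_sketch(rng::AbstractRNG, λ::Float64)
    n = 0
    # rand(rng, Float64) lies in [0, 1), so 1 - u lies in (0, 1]
    # and log never receives zero.
    c = -log(1.0 - rand(rng, Float64))
    while c < λ
        n += 1
        c -= log(1.0 - rand(rng, Float64))
    end
    return n
end

# CPU usage example with a seeded RNG.
rng = MersenneTwister(1234)
draws = [poisson_count_sketch(rng, 3.0) for _ in 1:100_000]
sample_mean = sum(draws) / length(draws)
```

Only `rand(rng, Float64)` and `log` appear in the loop, which matches the constraint described above: no `randexp` and no `@noinline` retry helper. The sample mean should sit close to λ for a large number of draws.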
Closes #588.
The two frames in the CPU reproduction above are exactly frames [1] and [2] from the failing GPU run (actions/runs/25629456752).
🤖 Generated with Claude Code