perf(spectra): fuse two-pass tullio reductions in sum_valid! and PAspectra! #29

Merged
Beforerr merged 1 commit into main from push-nnxwxprsoymo on May 18, 2026

Conversation

@Beforerr
Member

Replaces the two @tullio passes in each kernel with a single fused loop, reading S only once per element. Drops the dead sum_valid_fast! variant. On a full day of ELFIN-A EPDEF data: directional_energy_spectra 1.11 ms → 690 μs (1.6×); PAspectra 334 μs → 170 μs (2.0×). Reordered summation produces 1-ULP Float32 drift vs the old output at a few indices.
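
To illustrate what fusing two passes means here, the following is a minimal sketch and not the package's actual `sum_valid!`. It assumes a NaN-aware column reduction that needs both a sum and a count of valid entries; the function names `sum_and_count_two_pass` and `sum_and_count_fused` are hypothetical. The two-pass form walks `S` twice, while the fused form loads each element once and updates both accumulators.

```julia
# Hypothetical NaN-aware column reduction; not the package's actual sum_valid!.

# Two-pass form: S is traversed once for the sums and again for the valid counts.
function sum_and_count_two_pass(S::AbstractMatrix{T}) where {T<:AbstractFloat}
    total = zeros(T, size(S, 2))
    nvalid = zeros(Int, size(S, 2))
    for j in axes(S, 2), i in axes(S, 1)
        v = S[i, j]
        total[j] += ifelse(isnan(v), zero(T), v)
    end
    for j in axes(S, 2), i in axes(S, 1)
        nvalid[j] += !isnan(S[i, j])
    end
    return total, nvalid
end

# Fused form: each element of S is loaded once and feeds both accumulators.
function sum_and_count_fused(S::AbstractMatrix{T}) where {T<:AbstractFloat}
    total = zeros(T, size(S, 2))
    nvalid = zeros(Int, size(S, 2))
    for j in axes(S, 2)
        acc = zero(T)
        n = 0
        for i in axes(S, 1)
            v = S[i, j]
            valid = !isnan(v)
            acc += ifelse(valid, v, zero(T))
            n += valid
        end
        total[j] = acc
        nvalid[j] = n
    end
    return total, nvalid
end

S = Float32[1 NaN; 2 3; NaN 4]
@assert sum_and_count_fused(S) == sum_and_count_two_pass(S)
```

In this sketch both versions add the elements of each column in the same order, so their results are bit-identical. A fused kernel that replaces a macro-generated reduction may visit elements in a different order, which is where small floating-point differences can come from.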

@codecov

codecov Bot commented May 18, 2026

Codecov Report

❌ Patch coverage is 95.58824% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/spectra.jl | 95.58% | 3 Missing ⚠️ |

@github-actions
Contributor

github-actions Bot commented May 18, 2026

Benchmark Results (Julia v1)

Time benchmarks

| Benchmark | main | ff5c045... | main / ff5c045... |
|---|---|---|---|
| BiKappaPDF/pdf | 0.111 ± 0.01 μs | 0.08 ± 0.01 μs | 1.39 ± 0.21 |
| BiKappaPDF/rand | 0.09 ± 0.01 μs | 0.09 ± 0.01 μs | 1 ± 0.16 |
| BiKappaPDF/rand_1k | 0.0673 ± 0.0015 ms | 0.0674 ± 0.001 ms | 0.999 ± 0.027 |
| BiMaxwellianPDF/pdf | 0.05 ± 0.01 μs | 0.05 ± 0.01 μs | 1 ± 0.28 |
| BiMaxwellianPDF/rand | 0.06 ± 0 μs | 0.06 ± 0 μs | 1 ± 0 |
| BiMaxwellianPDF/rand_1k | 29.6 ± 0.6 μs | 30.2 ± 1.3 μs | 0.978 ± 0.046 |
| KappaPDF/pdf | 0.1 ± 0 μs | 0.07 ± 0.01 μs | 1.43 ± 0.2 |
| KappaPDF/rand | 0.11 ± 0.01 μs | 0.11 ± 0.01 μs | 1 ± 0.13 |
| KappaPDF/rand_1k | 0.0793 ± 0.0016 ms | 0.0788 ± 0.0018 ms | 1.01 ± 0.031 |
| MaxwellianPDF/pdf | 0.041 ± 0.01 μs | 0.041 ± 0.01 μs | 1 ± 0.34 |
| MaxwellianPDF/rand | 0.04 ± 0 μs | 0.04 ± 0 μs | 1 ± 0 |
| MaxwellianPDF/rand_1k | 15.7 ± 0.57 μs | 15.2 ± 0.47 μs | 1.03 ± 0.049 |
| spectra/PAspectra | 0.702 ± 0.013 ms | 0.339 ± 0.013 ms | 2.07 ± 0.09 |
| spectra/directional_energy_spectra | 1.72 ± 0.022 ms | 0.593 ± 0.015 ms | 2.9 ± 0.084 |
| tmoments/no_magf | 19.1 ± 0.084 ms | 19.1 ± 0.074 ms | 0.998 ± 0.0058 |
| tmoments/with_magf | 19.1 ± 0.078 ms | 19.1 ± 0.089 ms | 0.999 ± 0.0062 |
| time_to_load | 0.366 ± 0.00094 s | 0.288 ± 0.00048 s | 1.27 ± 0.0039 |

Memory benchmarks

| Benchmark | main | ff5c045... | main / ff5c045... |
|---|---|---|---|
| BiKappaPDF/pdf | 0 allocs: 0 B | 0 allocs: 0 B | |
| BiKappaPDF/rand | 1 allocs: 32 B | 1 allocs: 32 B | 1 |
| BiKappaPDF/rand_1k | 1 k allocs: 0.0382 MB | 1 k allocs: 0.0382 MB | 1 |
| BiMaxwellianPDF/pdf | 0 allocs: 0 B | 0 allocs: 0 B | |
| BiMaxwellianPDF/rand | 1 allocs: 32 B | 1 allocs: 32 B | 1 |
| BiMaxwellianPDF/rand_1k | 1 k allocs: 0.0382 MB | 1 k allocs: 0.0382 MB | 1 |
| KappaPDF/pdf | 0 allocs: 0 B | 0 allocs: 0 B | |
| KappaPDF/rand | 3 allocs: 0.109 kB | 3 allocs: 0.109 kB | 1 |
| KappaPDF/rand_1k | 3 k allocs: 0.115 MB | 3 k allocs: 0.115 MB | 1 |
| MaxwellianPDF/pdf | 0 allocs: 0 B | 0 allocs: 0 B | |
| MaxwellianPDF/rand | 1 allocs: 32 B | 1 allocs: 32 B | 1 |
| MaxwellianPDF/rand_1k | 1 k allocs: 0.0382 MB | 1 k allocs: 0.0382 MB | 1 |
| spectra/PAspectra | 18 allocs: 0.397 MB | 18 allocs: 0.397 MB | 1 |
| spectra/directional_energy_spectra | 16 allocs: 0.423 MB | 12 allocs: 0.423 MB | 1 |
| tmoments/no_magf | 2.55 k allocs: 0.175 MB | 2.55 k allocs: 0.175 MB | 1 |
| tmoments/with_magf | 2.76 k allocs: 0.211 MB | 2.76 k allocs: 0.211 MB | 1 |
| time_to_load | 0.145 k allocs: 11 kB | 0.145 k allocs: 11 kB | 1 |

@Beforerr force-pushed the push-nnxwxprsoymo branch 3 times, most recently from 460533f to b395027 on May 18, 2026 04:11
…ullio

Replaces all @tullio/@sum passes in sum_valid!, PAspectra!, and directional_energy_spectra with explicit fused loops that read S only once per element. The four sum_valid! calls in directional_energy_spectra (omni/para/anti/perp) now share a single pass over S; dΩ and the three direction masks are precomputed once per t into Bumper-allocated scratch.
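
The idea of several reductions sharing one pass over `S` can be sketched as below. This is an illustration under assumptions, not the code in `src/spectra.jl`: the layout `S[angle, energy]` for a single time step, the weight vector `w` standing in for dΩ, the Bool masks, and the name `directional_sums` are all hypothetical, and plain arrays are used in place of Bumper-allocated scratch.

```julia
# Hypothetical layout for one time step: S[angle, energy], w[angle] is a
# solid-angle weight, and para/anti/perp are Bool masks over the angle axis.
function directional_sums(S::AbstractMatrix{T}, w, para, anti, perp) where {T<:AbstractFloat}
    nE = size(S, 2)
    omni_out = zeros(T, nE)
    para_out = zeros(T, nE)
    anti_out = zeros(T, nE)
    perp_out = zeros(T, nE)
    for e in axes(S, 2)
        s_omni = s_para = s_anti = s_perp = zero(T)
        for k in axes(S, 1)
            v = S[k, e]
            isnan(v) && continue      # skip invalid samples
            x = v * w[k]              # weighted value, computed once per element
            s_omni += x               # all four accumulators reuse the same load
            s_para += ifelse(para[k], x, zero(T))
            s_anti += ifelse(anti[k], x, zero(T))
            s_perp += ifelse(perp[k], x, zero(T))
        end
        omni_out[e] = s_omni
        para_out[e] = s_para
        anti_out[e] = s_anti
        perp_out[e] = s_perp
    end
    return (omni = omni_out, para = para_out, anti = anti_out, perp = perp_out)
end

S = Float32[1 2; 3 NaN; 5 6]          # 3 angle bins × 2 energy channels
w = Float32[1, 2, 1]
r = directional_sums(S, w, [true, false, false], [false, false, true], [false, true, false])
@assert r.omni == Float32[12, 8]
```

Four separate masked reductions would read every element of `S` four times; in this sketch the element and its weight are read once and the masks only select which accumulators receive the value.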

Drops the Tullio dependency entirely; it was previously the slowest package to precompile in the dependency tree.

On a full day of ELFIN-A EPDEF data (10 × 16 × 1731 Float32):
- directional_energy_spectra: 1.10 ms → 345 μs (3.2×)
- PAspectra:                  379 μs → 172 μs (2.2×)
- Package import time:        0.40 s → 0.31 s

Reordered summation produces 1-ULP Float32 drift vs the previous output at a few indices; downstream tests use ≈ tolerance.
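
As a small standalone illustration of why reordering a sum can change a Float32 result, and why a 1-ULP difference passes a `≈` comparison but not `==`, consider the sketch below. The values are chosen for clarity and are not taken from the ELFIN data; the first example deliberately exaggerates the effect well beyond 1 ULP.

```julia
# Float32 addition is not associative, so summation order can change the result.
a, b, c = 1.0f8, 1.0f0, -1.0f8
left  = (a + b) + c   # b is absorbed: adjacent Float32 values near 1e8 are 8 apart
right = (a + c) + b   # the large terms cancel first, so b survives
@assert left == 0.0f0
@assert right == 1.0f0

# A 1-ULP difference fails exact equality but passes the default isapprox tolerance.
x = 1.0f0
y = nextfloat(x)      # the next representable Float32 above x
@assert y - x == eps(Float32)
@assert x != y
@assert x ≈ y
```

The default relative tolerance of `isapprox` for Float32 is `sqrt(eps(Float32))`, which is far larger than one ULP, so tests written with `≈` are insensitive to drift of this size.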
@Beforerr force-pushed the push-nnxwxprsoymo branch from b395027 to ff5c045 on May 18, 2026 04:12
@Beforerr merged commit 74c19bd into main on May 18, 2026
7 checks passed
@Beforerr deleted the push-nnxwxprsoymo branch on May 18, 2026 04:16