Benchmark improvements #1396
- Move per-row interpretation notes out of the GitHub workflow comment
body into `benchmarks/README.md`, leaving the comment focused on
numbers; add the closing `<!-- benchmark-report -->` sentinel.
- Mark rows whose `t(logdensity)` is below ~100 ns with `*` so noisy
ratios are flagged in place, and add a short footnote explaining
what `*` means.
- Parenthesize default `Type` parameter syntax in benchmark models
(`(::Type{T})=Vector{Float64}`) for parser compatibility.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
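The `*` rule in the second bullet is a simple threshold check on each row's primal time. Below is a minimal Julia sketch, assuming each row carries its `t(logdensity)` in nanoseconds. `NOISE_THRESHOLD_NS` and `mark_noisy` are hypothetical names and are not taken from the workflow script. The cutoff is only "~100 ns" in the original, so the exact value is also an assumption.

```julia
# Hypothetical cutoff (assumption): rows faster than this are flagged as noisy.
const NOISE_THRESHOLD_NS = 100.0

# Append " *" to a row label when its primal time is below the cutoff,
# so the row's ratios are flagged in place in the report table.
function mark_noisy(label::AbstractString, t_logdensity_ns::Real)
    return t_logdensity_ns < NOISE_THRESHOLD_NS ? string(label, " *") : String(label)
end

# Illustrative inputs only, not measured values.
rows = [("tiny model", 4.0), ("large model", 2.5e5)]
marked = [mark_noisy(name, t) for (name, t) in rows]
```

Flagging the row label leaves the numbers themselves untouched, which keeps the comment "focused on numbers" while the footnote explains the marker.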
Use `setup = deepcopy($params)` so each Chairmarks sample starts from a fresh input buffer instead of reusing the same vector across calls. Matches Mooncake's bench harness. Setup runs before the timed window, so the copy itself is excluded from measurements.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
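The fresh-buffer idea depends on the harness running setup before each sample and timing only the call that follows. The sketch below is a hand-rolled stand-in written in plain Julia. It is not Chairmarks' API, and `bench_fresh` is a hypothetical name. It uses a mutating workload to show what goes wrong when the same vector is reused across samples.

```julia
# Hypothetical harness (not Chairmarks' API): each sample gets its own
# deep copy of the input, made before the timer starts.
function bench_fresh(f, params; samples::Int = 100)
    times = Vector{UInt64}(undef, samples)
    for i in 1:samples
        x = deepcopy(params)      # setup: runs outside the timed window
        t0 = time_ns()
        f(x)                      # only this call is timed
        times[i] = time_ns() - t0
    end
    return minimum(times)         # fastest sample, in nanoseconds
end

# A mutating workload: without the copy, every sample after the first
# would see a buffer already modified by earlier samples.
params = [1.0, 2.0, 3.0]
seen = Float64[]
bench_fresh(x -> (push!(seen, x[1]); x[1] += 1.0), params; samples = 5)
```

One design point worth checking against the real harness: in BenchmarkTools, setup runs once per sample rather than once per evaluation. If a sample performs several evaluations, those evaluations share one copy. The sketch avoids this by timing a single call per sample.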
DynamicPPL.jl documentation for PR #1396 is available at:
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1396   +/-   ##
=======================================
  Coverage   82.30%   82.30%
=======================================
  Files          50       50
  Lines        3543     3543
=======================================
  Hits         2916     2916
  Misses        627      627
Benchmarks @ 97456ec
Performance
Performance Ratio: Rows marked
Main @ 90a74c3
Environment
Julia Version 1.11.9
Commit 53a02c0720c (2026-02-06 00:27 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)
sunxd3
left a comment
Looks good as far as I can tell, with one thing to double-check.
Fresh-buffer benchmark setup
In `DynamicPPL.TestUtils.AD.run_ad`, use `setup = deepcopy($params)` for both the primal and gradient `@be` calls so each Chairmarks sample starts from a fresh input buffer instead of reusing the same vector across samples. This matches Mooncake's bench harness; setup runs outside the timed window, so the copy isn't measured.
Plus some minor tweaks.
This appears to have a noticeable impact on log-density evaluation for tiny models, first noticed in #1363: e.g., `logdensity` for `simple assume observe` (`linked = true`) improves from ~20 ns to ~4 ns. In practice, overheads of a few nanoseconds won't matter at all, so the improvement here is only for perfectionists.
EDIT: likely the `LogDensityAt` -> `logdensity_internal` change in #1363 is what reduced `logdensity`'s overhead by 20 ns.