Benchmark improvements #1396

Merged
yebai merged 3 commits into main from benchmark-improvements on May 19, 2026

Member

@yebai yebai commented May 18, 2026

Fresh-buffer benchmark setup

  • In DynamicPPL.TestUtils.AD.run_ad, use setup = deepcopy($params) for both the primal and gradient @be calls so each Chairmarks sample starts from a fresh input buffer instead of reusing the same vector across samples. Matches Mooncake's bench harness; setup runs outside the timed window, so the copy isn't measured.
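The fresh-buffer idea can be sketched without Chairmarks at all. The snippet below is a hand-rolled, standard-library-only illustration, not the code in `run_ad` and not how Chairmarks is implemented; `toy_logdensity` and `sample_times` are hypothetical names invented for this example. It shows the two properties described above: every sample gets its own copy of the input, and the copy is made before the timer starts.

```julia
# Hand-rolled illustration only; this is neither Chairmarks' nor DynamicPPL's code.

# Hypothetical stand-in for a log-density evaluation.
toy_logdensity(x) = -0.5 * sum(abs2, x)

# Collect `nsamples` timings of `f`, giving each sample its own copy of `params`.
function sample_times(f, params; nsamples = 100)
    times = Vector{UInt64}(undef, nsamples)
    acc = 0.0                        # keep the results live so the call is not elided
    for i in 1:nsamples
        buf = deepcopy(params)       # setup: runs outside the timed window
        t0 = time_ns()
        acc += f(buf)                # only this call is timed
        times[i] = time_ns() - t0
    end
    return (times = times, acc = acc)
end

params = randn(1_000)
reference = copy(params)             # used to check that `params` is never mutated
result = sample_times(toy_logdensity, params)
println("fastest sample: ", minimum(result.times), " ns")
```

Timing single calls with `time_ns()` is far too coarse for nanosecond-scale functions, so this only demonstrates the structure of the setup step, not a usable measurement method.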

Plus some minor tweaks.

This appears to have a noticeable impact on log-density evaluation for tiny models, first noticed in #1363: e.g., the logdensity time for Simple assume observe (linked = true) improves from ~20 ns to ~4 ns. In practice, overheads of a few nanoseconds won't matter at all, so the improvement here is only for perfectionists.

EDIT: the change from LogDensityAt to logdensity_internal in #1363 is the likely source of the 20 ns reduction in logdensity's overhead.

yebai and others added 2 commits May 18, 2026 15:34
- Move per-row interpretation notes out of the GitHub workflow comment
  body into `benchmarks/README.md`, leaving the comment focused on
  numbers; add the closing `<!-- benchmark-report -->` sentinel.
- Mark rows whose `t(logdensity)` is below ~100 ns with `*` so noisy
  ratios are flagged in place, and add a short footnote explaining
  what `*` means.
- Parenthesize default `Type` parameter syntax in benchmark models
  (`(::Type{T})=Vector{Float64}`) for parser compatibility.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Use `setup = deepcopy($params)` so each Chairmarks sample starts from a
fresh input buffer instead of reusing the same vector across calls.
Matches Mooncake's bench harness. Setup runs before the timed window,
so the copy itself is excluded from measurements.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yebai yebai requested a review from sunxd3 May 18, 2026 14:44
@github-actions
Contributor

DynamicPPL.jl documentation for PR #1396 is available at:
https://TuringLang.github.io/DynamicPPL.jl/previews/PR1396/


codecov Bot commented May 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.30%. Comparing base (90a74c3) to head (97456ec).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1396   +/-   ##
=======================================
  Coverage   82.30%   82.30%           
=======================================
  Files          50       50           
  Lines        3543     3543           
=======================================
  Hits         2916     2916           
  Misses        627      627           

Contributor

github-actions Bot commented May 18, 2026

Benchmarks @ 97456ec

Performance

Performance Ratio:
Ratio of the time to compute the gradient to the time to compute the log-density.
Warning: results are very approximate! See benchmark notes for more context.

===================================================================================================
                                               eval                       gradient                 
                                            ----------  -------------------------------------------
Model                        dim    linked      primal     FwdDiff    RvsDiff    Mooncake    Enzyme
---------------------------------------------------------------------------------------------------
Simple assume observe*         1     false     5.87 ns       10.47    1200.34       29.01      6.31
Simple assume observe*         1      true     24.2 ns        2.50     323.68        7.03      0.62
Smorgasbord                  201     false     6.39 μs       69.19     126.55        6.36      8.90
Smorgasbord                  201      true     8.82 μs       62.60     124.93        5.39      5.93
Loop univariate 1k          1000     false     19.3 μs      972.11     282.24        7.36      6.14
Loop univariate 1k          1000      true     21.0 μs     1409.72     264.43        6.94      5.79
Multivariate 1k             1000     false     22.5 μs      377.58      80.88        8.98      2.59
Multivariate 1k             1000      true     19.9 μs      280.74      67.13       10.89      2.46
Loop univariate 10k        10000     false    189.0 μs    11080.76     297.10        7.28      5.83
Loop univariate 10k        10000      true    206.0 μs    11127.88     281.18        7.23      5.64
Multivariate 10k           10000     false    202.0 μs     5032.72      89.22       10.99      2.13
Multivariate 10k           10000      true    203.0 μs     5306.03      90.37       11.01      2.16
Dynamic                       15     false     1.42 μs         err      41.80       13.89     10.79
Dynamic                       10      true     1.93 μs        1.91      57.01       17.27     18.37
Submodel*                      1     false      7.1 ns        8.22    1026.07       24.14      4.98
Submodel*                      1      true     7.42 ns        8.20    1081.32       22.97      5.00
LDA                           12      true     22.2 μs        0.45       1.97       34.83       err
===================================================================================================

Rows marked * have t(logdensity) below about 100 ns; their ratios can be dominated by timer floor, fixed overhead, and run-to-run variation. For those rows, raw t(grad) is more meaningful than t(grad)/t(logdensity).
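As a worked example of why the raw gradient time is easier to read for starred rows, the sketch below multiplies the primal time by the ratio to recover an implied t(grad). It uses the Mooncake cells of the Simple assume observe (linked = true) row from this table and from the Main @ 90a74c3 table; it assumes each ratio is exactly t(grad)/t(logdensity) as the table header states, and the results inherit the rounding of the printed values.

```julia
# Values copied from the "Simple assume observe", linked = true, Mooncake cells.
rows = [
    (label = "this PR (97456ec)", primal_ns = 24.2, ratio = 7.03),
    (label = "main (90a74c3)",    primal_ns = 18.6, ratio = 16.46),
]

# Implied gradient time: t(grad) = ratio * t(logdensity).
grad_ns = [r.primal_ns * r.ratio for r in rows]

for (r, g) in zip(rows, grad_ns)
    println(r.label, ": t(grad) ≈ ", round(g; digits = 1), " ns")
end
```

Under those assumptions the implied gradient time is about 170 ns on this PR and about 306 ns on main. The ratio more than halves (16.46 to 7.03) even though the primal time in the denominator went up (18.6 ns to 24.2 ns), which is the kind of sensitivity the footnote warns about.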

Main @ 90a74c3
==================================================================================================
                                              eval                       gradient                 
                                           ----------  -------------------------------------------
Model                       dim    linked      primal     FwdDiff    RvsDiff    Mooncake    Enzyme
--------------------------------------------------------------------------------------------------
Simple assume observe         1     false     5.19 ns       10.56    1146.17       31.66     10.82
Simple assume observe         1      true     18.6 ns        2.87     443.41       16.46      3.40
Smorgasbord                 201     false     8.94 μs       57.80      90.96        8.51      6.85
Smorgasbord                 201      true     19.9 μs       30.03      52.26        4.27      2.88
Loop univariate 1k         1000     false     51.1 μs      383.49     106.69        3.66      2.77
Loop univariate 1k         1000      true     50.6 μs      517.45     105.61        3.83      2.84
Multivariate 1k            1000     false     44.5 μs      238.08      37.92        4.98      1.93
Multivariate 1k            1000      true     44.3 μs      231.94      38.61        5.64      1.89
Loop univariate 10k       10000     false    227.0 μs    11755.35     244.13        6.30      5.54
Loop univariate 10k       10000      true    240.0 μs    13140.37     228.16        6.29      5.29
Multivariate 10k          10000     false    257.0 μs     6982.18      67.97       10.59      1.85
Multivariate 10k          10000      true    253.0 μs     6730.33      69.64       10.39      2.17
Dynamic                      15     false     2.43 μs         err      37.39       12.31     10.29
Dynamic                      10      true     3.64 μs        1.79      50.33       11.25     17.49
Submodel                      1     false     5.24 ns       20.50    1812.57       60.90     10.06
Submodel                      1      true     5.24 ns       22.08    2269.81       54.39     11.67
LDA                          12      true     40.8 μs        0.55       1.88       21.94       err
==================================================================================================
Environment
Julia Version 1.11.9
Commit 53a02c0720c (2026-02-06 00:27 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

Member

@sunxd3 sunxd3 left a comment

looks good as far as I can tell with one thing to double check

Comment thread src/test_utils/ad.jl
@yebai yebai added this pull request to the merge queue May 19, 2026
Merged via the queue into main with commit d2052a1 May 19, 2026
22 checks passed
@yebai yebai deleted the benchmark-improvements branch May 19, 2026 11:53