Benchmark improvements by yebai · Pull Request #1396 · TuringLang/DynamicPPL.jl

yebai · 2026-05-18T14:42:14Z

Fresh-buffer benchmark setup

In DynamicPPL.TestUtils.AD.run_ad, use setup = deepcopy($params) for both the primal and gradient @be calls so each Chairmarks sample starts from a fresh input buffer instead of reusing the same vector across samples. Matches Mooncake's bench harness; setup runs outside the timed window, so the copy isn't measured.
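To make the fresh-buffer idea concrete, here is a minimal sketch in plain Julia. It does not use Chairmarks or `run_ad`; the helper `time_with_fresh_input` and the workload `scale!` are hypothetical names made up for illustration. It only shows the pattern of copying the input before each sample and starting the clock after the copy.

```julia
# Hypothetical helper for illustration; not part of DynamicPPL or Chairmarks.
# Times `f` on a fresh copy of `params` for every sample.
function time_with_fresh_input(f, params; samples=100)
    times = Vector{UInt64}(undef, samples)
    for i in 1:samples
        x = deepcopy(params)         # setup: runs before the timed window
        t0 = time_ns()               # timed window starts here
        f(x)
        times[i] = time_ns() - t0    # timed window ends here
    end
    return times
end

# A workload that mutates its input, so reusing one buffer across samples
# would feed each sample a different vector.
scale!(x) = (x .*= 2; sum(x))

params = [1.0, 2.0, 3.0]
times = time_with_fresh_input(scale!, params; samples=10)
```

Because every sample works on its own copy, `params` is unchanged after the run, and the `deepcopy` cost never enters `times`.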

Plus some minor tweaks.

~~This appears to have a noticeable impact on log-density evaluation for tiny models, first noticed in #1363~~: e.g., for simple assume observe (linked = true), logdensity improves from ~20 ns to ~4 ns. In practice, these tiny nanosecond overheads won't matter at all, so the improvement here is only for perfectionists.

EDIT: LogDensityAt -> logdensity_internal likely reduced logdensity's overhead by 20 ns in #1363.

- Move per-row interpretation notes out of the GitHub workflow comment body into `benchmarks/README.md`, leaving the comment focused on numbers; add the closing `` sentinel.
- Mark rows whose `t(logdensity)` is below ~100 ns with `*` so noisy ratios are flagged in place, and add a short footnote explaining what `*` means.
- Parenthesize default `Type` parameter syntax in benchmark models (`(::Type{T})=Vector{Float64}`) for parser compatibility.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Use `setup = deepcopy($params)` so each Chairmarks sample starts from a fresh input buffer instead of reusing the same vector across calls. Matches Mooncake's bench harness. Setup runs before the timed window, so the copy itself is excluded from measurements.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-18T14:48:10Z

DynamicPPL.jl documentation for PR #1396 is available at:
https://TuringLang.github.io/DynamicPPL.jl/previews/PR1396/

codecov · 2026-05-18T15:03:43Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.30%. Comparing base (90a74c3) to head (97456ec).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1396   +/-   ##
=======================================
  Coverage   82.30%   82.30%           
=======================================
  Files          50       50           
  Lines        3543     3543           
=======================================
  Hits         2916     2916           
  Misses        627      627

github-actions · 2026-05-18T15:05:31Z

Benchmarks @ `97456ec`

Performance

Performance Ratio:
Ratio of time to compute gradient and time to compute log-density.
Warning: results are very approximate! See benchmark notes for more context.

===================================================================================================
                                               eval                       gradient                 
                                            ----------  -------------------------------------------
Model                        dim    linked      primal     FwdDiff    RvsDiff    Mooncake    Enzyme
---------------------------------------------------------------------------------------------------
Simple assume observe*         1     false     5.87 ns       10.47    1200.34       29.01      6.31
Simple assume observe*         1      true     24.2 ns        2.50     323.68        7.03      0.62
Smorgasbord                  201     false     6.39 μs       69.19     126.55        6.36      8.90
Smorgasbord                  201      true     8.82 μs       62.60     124.93        5.39      5.93
Loop univariate 1k          1000     false     19.3 μs      972.11     282.24        7.36      6.14
Loop univariate 1k          1000      true     21.0 μs     1409.72     264.43        6.94      5.79
Multivariate 1k             1000     false     22.5 μs      377.58      80.88        8.98      2.59
Multivariate 1k             1000      true     19.9 μs      280.74      67.13       10.89      2.46
Loop univariate 10k        10000     false    189.0 μs    11080.76     297.10        7.28      5.83
Loop univariate 10k        10000      true    206.0 μs    11127.88     281.18        7.23      5.64
Multivariate 10k           10000     false    202.0 μs     5032.72      89.22       10.99      2.13
Multivariate 10k           10000      true    203.0 μs     5306.03      90.37       11.01      2.16
Dynamic                       15     false     1.42 μs         err      41.80       13.89     10.79
Dynamic                       10      true     1.93 μs        1.91      57.01       17.27     18.37
Submodel*                      1     false      7.1 ns        8.22    1026.07       24.14      4.98
Submodel*                      1      true     7.42 ns        8.20    1081.32       22.97      5.00
LDA                           12      true     22.2 μs        0.45       1.97       34.83       err
===================================================================================================

Rows marked * have t(logdensity) below about 100 ns; their ratios can be dominated by timer floor, fixed overhead, and run-to-run variation. For those rows, raw t(grad) is more meaningful than t(grad)/t(logdensity).
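As a rough sketch of the flagging rule in the footnote above, the snippet below computes t(grad)/t(logdensity) and appends `*` to the model name when the primal time is under the threshold. The function `ratio_row`, the constant `NOISY_PRIMAL_NS`, and the timings are hypothetical and chosen for illustration; they are not taken from the benchmark script or from the tables here, and 100 ns stands in for the footnote's "about 100 ns".

```julia
# Assumed threshold, standing in for the footnote's "about 100 ns".
const NOISY_PRIMAL_NS = 100.0

# Hypothetical helper: returns the (possibly starred) label and the
# gradient-to-primal time ratio for one benchmark row.
function ratio_row(model::AbstractString, t_primal_ns::Real, t_grad_ns::Real)
    label = t_primal_ns < NOISY_PRIMAL_NS ? model * "*" : String(model)
    return (model = label, ratio = t_grad_ns / t_primal_ns)
end

# Illustrative timings only, not measurements.
tiny = ratio_row("tiny model", 5.0, 50.0)          # primal below threshold
large = ratio_row("large model", 2000.0, 8000.0)   # primal above threshold
```

With a 5 ns denominator, a few nanoseconds of timer noise move the ratio a lot, which is why the starred rows are better read through raw t(grad).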

Main @ 90a74c3

==================================================================================================
                                              eval                       gradient                 
                                           ----------  -------------------------------------------
Model                       dim    linked      primal     FwdDiff    RvsDiff    Mooncake    Enzyme
--------------------------------------------------------------------------------------------------
Simple assume observe         1     false     5.19 ns       10.56    1146.17       31.66     10.82
Simple assume observe         1      true     18.6 ns        2.87     443.41       16.46      3.40
Smorgasbord                 201     false     8.94 μs       57.80      90.96        8.51      6.85
Smorgasbord                 201      true     19.9 μs       30.03      52.26        4.27      2.88
Loop univariate 1k         1000     false     51.1 μs      383.49     106.69        3.66      2.77
Loop univariate 1k         1000      true     50.6 μs      517.45     105.61        3.83      2.84
Multivariate 1k            1000     false     44.5 μs      238.08      37.92        4.98      1.93
Multivariate 1k            1000      true     44.3 μs      231.94      38.61        5.64      1.89
Loop univariate 10k       10000     false    227.0 μs    11755.35     244.13        6.30      5.54
Loop univariate 10k       10000      true    240.0 μs    13140.37     228.16        6.29      5.29
Multivariate 10k          10000     false    257.0 μs     6982.18      67.97       10.59      1.85
Multivariate 10k          10000      true    253.0 μs     6730.33      69.64       10.39      2.17
Dynamic                      15     false     2.43 μs         err      37.39       12.31     10.29
Dynamic                      10      true     3.64 μs        1.79      50.33       11.25     17.49
Submodel                      1     false     5.24 ns       20.50    1812.57       60.90     10.06
Submodel                      1      true     5.24 ns       22.08    2269.81       54.39     11.67
LDA                          12      true     40.8 μs        0.55       1.88       21.94       err
==================================================================================================

Environment

Julia Version 1.11.9
Commit 53a02c0720c (2026-02-06 00:27 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

sunxd3

looks good as far as I can tell with one thing to double check

yebai and others added 2 commits May 18, 2026 15:34

github-actions Bot assigned yebai May 18, 2026

yebai requested a review from sunxd3 May 18, 2026 14:44

Merge branch 'main' into benchmark-improvements

97456ec

sunxd3 approved these changes May 18, 2026

Comment thread src/test_utils/ad.jl

yebai added this pull request to the merge queue May 19, 2026

Merged via the queue into main with commit d2052a1 May 19, 2026
22 checks passed

yebai deleted the benchmark-improvements branch May 19, 2026 11:53

Benchmark improvements #1396: yebai merged 3 commits into main from benchmark-improvements
