
Minor improvements to benchmarks#1386

Merged
yebai merged 7 commits into main from benchmarks
May 5, 2026

Member

@yebai yebai commented May 5, 2026

No description provided.

yebai and others added 3 commits May 5, 2026 11:32
Bring DifferentiationInterface into the benchmarks env and adopt the
flatter markdown layout (no <details> wrapper, no "Gist:" prefix).
Released AbstractPPL/Bijectors are used instead of the fork-branch
sources from the source branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pairs with the prior commit's benchmarks.jl markdown changes — the new
workflow benches PR head and main side-by-side and wraps main's table
in <details> on the CI side.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the PrettyTables benchmark report with a manual text formatter
modeled on posteriordb-bench: top/bottom `=` rules, centered `eval`
and `gradient` banners, dashed subgroup underlines, and a stub of
Model/dim/linked columns. Keep the current pivoted data shape, with a
shared `primal` column and backend ratio columns labelled FwdDiff,
RvsDiff, Mooncake, and Enzyme.

While there, simplify the renderer by formatting rows once up front and
using a single backend key/label table as the source of truth. Update
the PR comment caption to explain that `primal` is shared
`t(logdensity)` and the backend columns are `t(grad)/t(logdensity)`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
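
The layout described in the commit above can be sketched roughly as follows. This is a minimal Python illustration, not the code in benchmarks.jl: the names `BACKENDS`, `ROWS`, `pivot`, `fmt_time`, and `render`, the column widths, and the rule that the first successful timing becomes the shared `primal` are all assumptions made for the example. The sample numbers are copied from the PR-head table further down the thread.

```python
# Backend key/label pairs: the single source of truth for column order and headers.
BACKENDS = [("forwarddiff", "FwdDiff"), ("reversediff", "RvsDiff"),
            ("mooncake", "Mooncake"), ("enzyme", "Enzyme")]

# Hypothetical long-form input: (model, dim, linked, backend, primal_seconds, ratio).
# None marks a backend that errored.
ROWS = [
    ("Smorgasbord", 201, False, "forwarddiff", 8.94e-6, 55.89),
    ("Smorgasbord", 201, False, "reversediff", 8.94e-6, 89.44),
    ("Smorgasbord", 201, False, "mooncake", 8.94e-6, 7.79),
    ("Smorgasbord", 201, False, "enzyme", 8.94e-6, 6.21),
    ("Dynamic", 15, False, "forwarddiff", None, None),
    ("Dynamic", 15, False, "reversediff", 2.42e-6, 30.93),
    ("Dynamic", 15, False, "mooncake", 2.42e-6, 11.94),
    ("Dynamic", 15, False, "enzyme", 2.42e-6, 8.83),
]


def pivot(rows):
    """Collapse long-form rows into one wide entry per (model, dim, linked)."""
    wide = {}
    for model, dim, linked, backend, primal, ratio in rows:
        entry = wide.setdefault((model, dim, linked), {"primal": None, "ratios": {}})
        if entry["primal"] is None:
            # Assumption: the first successful timing serves as the shared primal.
            entry["primal"] = primal
        entry["ratios"][backend] = ratio
    return wide


def fmt_time(seconds):
    """Format a duration with three significant digits and a readable unit."""
    for unit, scale in (("s", 1.0), ("ms", 1e-3), ("μs", 1e-6)):
        if seconds >= scale:
            return f"{seconds / scale:.3g} {unit}"
    return f"{seconds / 1e-9:.3g} ns"


def render(wide):
    """Render the wide table with `=` rules, centered banners, and dashed underlines."""
    # Format every cell once up front, then lay the strings out.
    body = []
    for (model, dim, linked), entry in wide.items():
        primal = "err" if entry["primal"] is None else fmt_time(entry["primal"])
        ratios = ["err" if entry["ratios"].get(key) is None
                  else f"{entry['ratios'][key]:.2f}" for key, _ in BACKENDS]
        body.append(f"{model:<24}{dim:>7}{str(linked).lower():>10}{primal:>12}"
                    + "".join(f"{r:>11}" for r in ratios))
    stub = f"{'Model':<24}{'dim':>7}{'linked':>10}"
    eval_w, grad_w = 12, 11 * len(BACKENDS)
    pad = " " * len(stub)
    header = stub + f"{'primal':>{eval_w}}" + "".join(f"{label:>11}" for _, label in BACKENDS)
    rule = "=" * len(header)
    banner = pad + "eval".center(eval_w) + "gradient".center(grad_w)
    underline = pad + "  " + "-" * (eval_w - 2) + "  " + "-" * (grad_w - 2)
    return "\n".join([rule, banner, underline, header, "-" * len(header), *body, rule])


print(render(pivot(ROWS)))
```

Driving both the header and the cell order from one `BACKENDS` list means adding or renaming a backend touches a single place, which is the point of the "single source of truth" wording in the commit message.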
@yebai yebai marked this pull request as ready for review May 5, 2026 17:34
Contributor

github-actions Bot commented May 5, 2026

DynamicPPL.jl documentation for PR #1386 is available at:
https://TuringLang.github.io/DynamicPPL.jl/previews/PR1386/


codecov Bot commented May 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.26%. Comparing base (2691e7c) to head (8f48885).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1386   +/-   ##
=======================================
  Coverage   82.26%   82.26%           
=======================================
  Files          50       50           
  Lines        3535     3535           
=======================================
  Hits         2908     2908           
  Misses        627      627           


yebai and others added 3 commits May 5, 2026 21:03
Restructure the comment so the table comes first, followed by a
single paragraph explaining what each column means and how to read
the AD backend ratios. Update the surrounding workflow text:

- "## Benchmark Report" + separate PR head/Main lines collapsed into
  a single "## Benchmarks @ <sha>" heading.
- Foldout summaries shortened to "Main @ <sha>" and "Environment".
- Comparison hint ("compare against `main`") only appears when the
  baseline foldout is actually available.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
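
The comment structure described in these commits could be assembled along the following lines. This is a hedged Python sketch rather than the actual workflow code: `build_comment` and all of its parameters are hypothetical names, and the exact wording of the comparison hint is assumed. Only the heading format, the two foldout summaries, and the rule that the hint appears only when a baseline exists come from the commit message.

```python
FENCE = "`" * 3  # Markdown code fence, built here to avoid nesting fences


def build_comment(head_sha, head_table, caption,
                  main_sha=None, main_table=None, environment=None):
    """Assemble the PR comment: heading, table, caption, then optional foldouts."""
    has_baseline = main_sha is not None and main_table is not None

    # The comparison hint is appended only when the baseline foldout exists.
    if has_baseline:
        caption = caption + " Compare against `main` below to spot regressions."

    # Table first, explanatory paragraph second.
    parts = [f"## Benchmarks @ {head_sha}", "", FENCE, head_table, FENCE, "", caption]

    if has_baseline:
        parts += ["", f"<details><summary>Main @ {main_sha}</summary>", "",
                  FENCE, main_table, FENCE, "", "</details>"]
    if environment is not None:
        parts += ["", "<details><summary>Environment</summary>", "",
                  FENCE, environment, FENCE, "", "</details>"]
    return "\n".join(parts)


print(build_comment("8f48885", "(head table)", "Lower is better throughout.",
                    main_sha="2691e7c", main_table="(main table)"))
```

Calling `build_comment` without `main_sha` and `main_table` yields a comment with no baseline foldout and no comparison hint, so the caption never points at a table that is not there.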
Contributor

github-actions Bot commented May 5, 2026

Benchmarks @ 8f48885

==================================================================================================
                                              eval                       gradient                 
                                           ----------  -------------------------------------------
Model                       dim    linked      primal     FwdDiff    RvsDiff    Mooncake    Enzyme
--------------------------------------------------------------------------------------------------
Simple assume observe         1     false     5.19 ns       10.72    1106.88       27.94     12.00
Simple assume observe         1      true     18.7 ns        5.94     469.84       16.41      3.37
Smorgasbord                 201     false     8.94 μs       55.89      89.44        7.79      6.21
Smorgasbord                 201      true     19.2 μs       29.44      55.16        3.72      2.86
Loop univariate 1k         1000     false     50.6 μs      367.10     103.98        3.54      2.72
Loop univariate 1k         1000      true     50.9 μs      513.91     110.49        3.35      2.61
Multivariate 1k            1000     false     42.5 μs      239.07      39.29        4.83      1.75
Multivariate 1k            1000      true     42.8 μs      243.80      39.35        5.41      1.85
Loop univariate 10k       10000     false    215.0 μs    11496.26     257.59        7.03      6.18
Loop univariate 10k       10000      true    226.0 μs    11508.06     258.79        6.27      5.81
Multivariate 10k          10000     false    247.0 μs     7057.37      70.30        9.65      1.83
Multivariate 10k          10000      true    244.0 μs     7019.44      72.24        9.42      1.93
Dynamic                      15     false     2.42 μs         err      30.93       11.94      8.83
Dynamic                      10      true     3.34 μs        2.07      41.36       10.51     17.50
Submodel                      1     false     5.49 ns       19.66    1415.59       53.79     11.66
Submodel                      1      true      5.2 ns       21.85    1496.48       46.69     12.49
LDA                          12      true     28.6 μs        0.62       2.03       23.36       err
==================================================================================================

Each row times one of DynamicPPL's reference models on this PR's head. dim is the parameter count; linked is true when parameters have been mapped to unconstrained space. primal is t(logdensity), the wall-clock time for one log-density evaluation, shared across backends. The AD (automatic differentiation) backend columns express gradient time as a multiple of t(logdensity): a value of 10 means computing the gradient takes 10× as long as the log-density. Lower is better throughout; err means the backend errored on that model. Compare against main below to spot regressions.
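
Because the backend columns are multiples of t(logdensity), a ratio only becomes an absolute gradient time once it is multiplied by the log-density time from the same row. The short Python sketch below works this through for one row (Smorgasbord, unlinked, Mooncake) using the figures from the head table above and the main table below. The helper name `gradient_time` is hypothetical, and the two timings come from separate CI runs, so the result illustrates the arithmetic and should not be read as a measured regression.

```python
def gradient_time(primal_seconds, ratio):
    """Absolute gradient time = t(logdensity) * (t(grad) / t(logdensity))."""
    return primal_seconds * ratio


# Smorgasbord, linked = false, Mooncake
head = gradient_time(8.94e-6, 7.79)  # PR head: primal 8.94 μs, ratio 7.79
main = gradient_time(6.25e-6, 6.40)  # main:    primal 6.25 μs, ratio 6.40

print(f"head: {head * 1e6:.1f} μs, main: {main * 1e6:.1f} μs, "
      f"head/main: {head / main:.2f}")
# prints "head: 69.6 μs, main: 40.0 μs, head/main: 1.74"
```

Comparing the ratio columns alone can mislead when the log-density time itself differs between the two runs, as it does here (8.94 μs against 6.25 μs), so multiplying back to absolute times is the safer way to compare head against main.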

Main @ 2691e7c

Gist: Smorgasbord

┌─────────────┬─────┬─────────────┬────────┬───────────────┬───────────────────────┐
│       Model │ Dim │  AD Backend │ Linked │ t(logdensity) │ t(grad)/t(logdensity) │
├─────────────┼─────┼─────────────┼────────┼───────────────┼───────────────────────┤
│ Smorgasbord │ 201 │ forwarddiff │  false │       6.21 μs │                 75.72 │
│ Smorgasbord │ 201 │ reversediff │  false │        6.3 μs │                127.41 │
│ Smorgasbord │ 201 │    mooncake │  false │       6.25 μs │                  6.40 │
│ Smorgasbord │ 201 │      enzyme │  false │       6.34 μs │                  6.92 │
│ Smorgasbord │ 201 │ forwarddiff │   true │        8.9 μs │                 65.07 │
│ Smorgasbord │ 201 │ reversediff │   true │       8.67 μs │                123.23 │
│ Smorgasbord │ 201 │    mooncake │   true │       8.92 μs │                  5.20 │
│ Smorgasbord │ 201 │      enzyme │   true │       8.89 μs │                  4.65 │
└─────────────┴─────┴─────────────┴────────┴───────────────┴───────────────────────┘
Full table (68 rows)
┌───────────────────────┬───────┬─────────────┬────────┬───────────────┬───────────────────────┐
│                 Model │   Dim │  AD Backend │ Linked │ t(logdensity) │ t(grad)/t(logdensity) │
├───────────────────────┼───────┼─────────────┼────────┼───────────────┼───────────────────────┤
│ Simple assume observe │     1 │ forwarddiff │  false │       6.28 ns │                 10.28 │
│ Simple assume observe │     1 │ reversediff │  false │       6.28 ns │               1058.79 │
│ Simple assume observe │     1 │    mooncake │  false │       6.28 ns │                 30.61 │
│ Simple assume observe │     1 │      enzyme │  false │       6.28 ns │                  6.20 │
│ Simple assume observe │     1 │ forwarddiff │   true │       21.5 ns │                  2.99 │
│ Simple assume observe │     1 │ reversediff │   true │       21.5 ns │                336.25 │
│ Simple assume observe │     1 │    mooncake │   true │       21.5 ns │                  9.21 │
│ Simple assume observe │     1 │      enzyme │   true │       21.4 ns │                  1.85 │
│           Smorgasbord │   201 │ forwarddiff │  false │       6.21 μs │                 75.72 │
│           Smorgasbord │   201 │ reversediff │  false │        6.3 μs │                127.41 │
│           Smorgasbord │   201 │    mooncake │  false │       6.25 μs │                  6.40 │
│           Smorgasbord │   201 │      enzyme │  false │       6.34 μs │                  6.92 │
│           Smorgasbord │   201 │ forwarddiff │   true │        8.9 μs │                 65.07 │
│           Smorgasbord │   201 │ reversediff │   true │       8.67 μs │                123.23 │
│           Smorgasbord │   201 │    mooncake │   true │       8.92 μs │                  5.20 │
│           Smorgasbord │   201 │      enzyme │   true │       8.89 μs │                  4.65 │
│    Loop univariate 1k │  1000 │ forwarddiff │  false │       18.8 μs │               1078.72 │
│    Loop univariate 1k │  1000 │ reversediff │  false │       18.9 μs │                284.18 │
│    Loop univariate 1k │  1000 │    mooncake │  false │       19.4 μs │                  8.40 │
│    Loop univariate 1k │  1000 │      enzyme │  false │       18.7 μs │                  6.90 │
│    Loop univariate 1k │  1000 │ forwarddiff │   true │       20.6 μs │               1445.64 │
│    Loop univariate 1k │  1000 │ reversediff │   true │       21.1 μs │                261.48 │
│    Loop univariate 1k │  1000 │    mooncake │   true │       20.7 μs │                  7.87 │
│    Loop univariate 1k │  1000 │      enzyme │   true │       20.3 μs │                  6.45 │
│       Multivariate 1k │  1000 │ forwarddiff │  false │       26.0 μs │                307.79 │
│       Multivariate 1k │  1000 │ reversediff │  false │       26.7 μs │                 63.65 │
│       Multivariate 1k │  1000 │    mooncake │  false │       29.3 μs │                  7.63 │
│       Multivariate 1k │  1000 │      enzyme │  false │       25.2 μs │                  1.98 │
│       Multivariate 1k │  1000 │ forwarddiff │   true │       25.3 μs │                268.24 │
│       Multivariate 1k │  1000 │ reversediff │   true │       25.6 μs │                 66.95 │
│       Multivariate 1k │  1000 │    mooncake │   true │       24.5 μs │                  9.06 │
│       Multivariate 1k │  1000 │      enzyme │   true │       23.7 μs │                  2.04 │
│   Loop univariate 10k │ 10000 │ forwarddiff │  false │      176.0 μs │              14179.52 │
│   Loop univariate 10k │ 10000 │ reversediff │  false │      178.0 μs │                331.97 │
│   Loop univariate 10k │ 10000 │    mooncake │  false │      178.0 μs │                  9.25 │
│   Loop univariate 10k │ 10000 │      enzyme │  false │      178.0 μs │                  7.06 │
│   Loop univariate 10k │ 10000 │ forwarddiff │   true │      199.0 μs │              13193.49 │
│   Loop univariate 10k │ 10000 │ reversediff │   true │      220.0 μs │                266.32 │
│   Loop univariate 10k │ 10000 │    mooncake │   true │      198.0 μs │                  8.25 │
│   Loop univariate 10k │ 10000 │      enzyme │   true │      198.0 μs │                  6.36 │
│      Multivariate 10k │ 10000 │ forwarddiff │  false │      220.0 μs │               4257.93 │
│      Multivariate 10k │ 10000 │ reversediff │  false │      221.0 μs │                 80.71 │
│      Multivariate 10k │ 10000 │    mooncake │  false │      220.0 μs │                  9.82 │
│      Multivariate 10k │ 10000 │      enzyme │  false │      220.0 μs │                  1.82 │
│      Multivariate 10k │ 10000 │ forwarddiff │   true │      217.0 μs │               4347.01 │
│      Multivariate 10k │ 10000 │ reversediff │   true │      217.0 μs │                 80.54 │
│      Multivariate 10k │ 10000 │    mooncake │   true │      218.0 μs │                  9.93 │
│      Multivariate 10k │ 10000 │      enzyme │   true │      218.0 μs │                  1.83 │
│               Dynamic │    15 │ forwarddiff │  false │           err │                   err │
│               Dynamic │    15 │ reversediff │  false │       1.43 μs │                 43.30 │
│               Dynamic │    15 │    mooncake │  false │       1.47 μs │                 11.70 │
│               Dynamic │    15 │      enzyme │  false │       1.42 μs │                 10.73 │
│               Dynamic │    10 │ forwarddiff │   true │       1.93 μs │                  1.94 │
│               Dynamic │    10 │ reversediff │   true │       1.97 μs │                 54.83 │
│               Dynamic │    10 │    mooncake │   true │       1.98 μs │                  9.89 │
│               Dynamic │    10 │      enzyme │   true │       1.99 μs │                 17.24 │
│              Submodel │     1 │ forwarddiff │  false │       6.29 ns │                 10.34 │
│              Submodel │     1 │ reversediff │  false │       6.29 ns │               1233.34 │
│              Submodel │     1 │    mooncake │  false │       6.28 ns │                 30.73 │
│              Submodel │     1 │      enzyme │  false │       6.29 ns │                  6.26 │
│              Submodel │     1 │ forwarddiff │   true │       5.98 ns │                 10.77 │
│              Submodel │     1 │ reversediff │   true │        6.3 ns │               1327.82 │
│              Submodel │     1 │    mooncake │   true │       6.28 ns │                 30.80 │
│              Submodel │     1 │      enzyme │   true │       6.29 ns │                  6.21 │
│                   LDA │    12 │ forwarddiff │   true │       22.5 μs │                  0.49 │
│                   LDA │    12 │ reversediff │   true │       23.8 μs │                  1.85 │
│                   LDA │    12 │    mooncake │   true │       23.4 μs │                 29.87 │
│                   LDA │    12 │      enzyme │   true │           err │                   err │
└───────────────────────┴───────┴─────────────┴────────┴───────────────┴───────────────────────┘
Environment
Julia Version 1.11.9
Commit 53a02c0720c (2026-02-06 00:27 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, icelake-server)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

@TuringLang TuringLang deleted a comment from github-actions Bot May 5, 2026
@yebai yebai added this pull request to the merge queue May 5, 2026
@yebai yebai removed this pull request from the merge queue due to a manual request May 5, 2026
@yebai yebai merged commit 4dfc048 into main May 5, 2026
22 checks passed
@yebai yebai deleted the benchmarks branch May 5, 2026 21:03
benchmarks/results.md
benchmarks/version_info.txt

benchmark-main:

This seems to kind of revert the previous PR, but now the comparison ratios have been removed? I'm quite confused about this churn.


2 participants