Use AbstractPPL AD interface#1363
Open
yebai wants to merge 53 commits into
Open
Conversation
- remove DifferentiationInterface from Project.toml deps and compat - replace DI gradient preparation/evaluation with AbstractPPL.prepare and AbstractPPL.value_and_gradient - reuse LogDensityAt and closures without local AbstractPPL.prepare piracy - guard the Mooncake precompile workload until AbstractPPLMooncakeExt is loaded - pin AbstractPPL to the evaluator-interface branch in [sources] for this environment Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
# Conflicts: # Project.toml # src/logdensityfunction.jl
2340748 to
e1cac2f
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1363 +/- ##
==========================================
- Coverage 82.30% 81.55% -0.75%
==========================================
Files 50 50
Lines 3543 3535 -8
==========================================
- Hits 2916 2883 -33
- Misses 627 652 +25 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Fix #1364 TODO - [x] use `main` as base, as otherwise benchmarking would fail, need to change back to #1364 before merging. - ~~Julia 1.10’s Pkg does not reliably apply the root [sources] override during build/test, so min CI fails.~~ This change switches `LogDensityFunction` AD preparation to pass a structural `LogDensityAt` problem directly into `AbstractPPL.prepare(...)` instead of wrapping it in backend-specific anonymous closures. It also removes the old `_use_closure` machinery, since the structural problem object now provides the stable one-argument evaluator shape that AD backends need. The accompanying test coverage adds a focused `ReverseDiff` check to confirm that `AutoReverseDiff(; compile=true)` still retains a compiled tape and can be reused across repeated `logdensity_and_gradient` calls. Benchmarks ``` +----------------------+------------+--------------+--------+--------------+ | Backend | PR grad μs | main grad μs | Ratio | Summary | +----------------------+------------+--------------+--------+--------------+ | ForwardDiff | 0.322 | 0.322 | 1.001x | 0.1% slower | | Forward Mooncake | 0.900 | 2.601 | 0.346x | 2.89x faster | | Forward Enzyme | 0.789 | 1.051 | 0.751x | 24.9% faster | | ReverseDiff | 14.239 | 15.225 | 0.935x | 6.5% faster | | ReverseDiff compiled | 5.355 | 5.872 | 0.912x | 8.8% faster | | Reverse Mooncake | 0.892 | 0.928 | 0.961x | 3.9% faster | | Reverse Enzyme | 1.253 | 1.268 | 0.988x | 1.2% faster | +----------------------+------------+--------------+--------+--------------+ ``` --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor
|
DynamicPPL.jl documentation for PR #1363 is available at: |
Update LogDensityFunction gradient evaluation to match the AbstractPPL evaluator interface and make the floattypes environment resolve that source branch explicitly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Drops the base-vs-head comparison entirely. The benchmark workflow now runs once on the PR head, on a pinned `ubuntu-22.04` runner, and reports absolute log-density times plus gradient/log-density ratios in the posted comment. Output schema follows Mooncake's bench harness; readers compare against recent main-branch comments to spot regressions. Noise reduction in `run_ad`: per-sample incremental GC teardown and a full GC before each measurement keep accumulated garbage from triggering mid-sample collections that inflate individual samples. Adds a `benchmark_seconds` knob for tightening the median estimate. Also removes the synthetic reference timing that normalised eval times against a non-DPPL function. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Inline `Models.jl` and `DynamicPPLBenchmarks.jl` into `benchmarks.jl` and convert `benchmarks/Project.toml` from a package to a flat environment, mirroring Mooncake.jl's `bench/run_benchmarks.jl` layout. Also: take dim from `length(r.params)` (run_ad already constructed the LDF) so models are no longer evaluated twice on the success path; switch results to NamedTuples so `print_results` reads `r.name`/`r.dim`/...; extract `transform_strategy(islinked)` helper; drop unused imports. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts: # benchmarks/Project.toml # benchmarks/src/DynamicPPLBenchmarks.jl
Run a full cross-product of the 9 model configs × 4 AD backends ×
{linked, unlinked} = 72 rows, ordered model → linked → backend so each
model's eight rows are adjacent for side-by-side inspection.
`:reversediff_compiled` is excluded because compiled tapes are
input-dependent and silently produce wrong gradients on
parameter-dependent control flow (see CLAUDE.md).
Per-row logging mirrors Mooncake's `bench/run_benchmarks.jl`: an
`(i / N, name, (linked = …))` header, the backend on its own line,
then `t(logdensity)` / `t(grad)` formatted with units. `model_dimension`
is now defensive (returns `missing` on init failures) and the table
formats `missing` dims as `err`, so combos that crash during dimension
lookup still produce a well-formed row instead of derailing the run.
Also: add a `setup` stage to `run_ad`'s Chairmarks pipeline that
deep-copies `params` per sample, matching Mooncake's harness — setup
runs before the timed window, so the copy is excluded from
measurements. Widen `combos` to a typed `Tuple{...}[]` so it accepts
models with non-default contexts (e.g. the `condition`-wrapped
`loop_univariate`/`multivariate` rows).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…apsible full table LDA's discrete `Categorical` RVs make `linked = false` ill-defined for gradient-based AD, so all four backends previously errored on those rows. Skip them at combination time, leaving 68 rows. In markdown mode, emit a `### Gist: Smorgasbord` block with just that model's eight rows (Smorgasbord covers the broadest set of DPPL features, so it is the most informative single row band), then put the full 68-row table inside `<details><summary>` so it is collapsed by default in GitHub PR comments. Plain (non-markdown) output is unchanged. Drop the now-redundant `### Benchmark Results` heading from the workflow body since the script emits its own. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Required as a direct dep so the benchmarks project resolves cleanly without a manual `Pkg.resolve()` after `Pkg.instantiate()`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
976a8eb to
49b1837
Compare
The Benchmarking comment now reads top-down as: SHA in the heading, Smorgasbord gist as a level-3 section, explanatory paragraph, full table as a level-3 section, and computer info as a foldable <details> with an inline <b>-styled summary (avoids markdown inside <details>, which renders inconsistently). The two integration jobs added to CI.yml were still on setup-julia@v2; bumped to @V3 for consistency with the main job. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Splits Benchmarking.yml into three jobs (benchmark-pr, benchmark-main, post-comment). The PR-comment now shows PR-head numbers up top and main's numbers in a foldout below, with a column legend and labelled SHAs. If benchmark-main fails (e.g. transitionally before this branch lands on main, since main's bench script does not yet support markdown mode), post-comment still posts PR-head numbers and notes the main job result inline. Workflow hardening: concurrency group cancels superseded PR runs, 60min timeout per bench job, explicit pull-requests: write permission so fork PRs can post comments. Body assembly moved out of the YAML literal block scalar into a shell heredoc using `body-path:` (peter-evans pattern), avoiding env-var multi-line interpolation. Bench script no longer prints the obsolete "compare against main-branch PR run" paragraph -- main is now in the same comment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts: # .github/workflows/Benchmarking.yml # benchmarks/Project.toml # benchmarks/benchmarks.jl # src/test_utils/ad.jl
The 68-row table dominates the PR comment vertically. Wrap it in a <details><summary> block so it is collapsed by default, leaving the Smorgasbord gist as the at-a-glance view. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Long-form table (one row per (model, linked, backend)) becomes a pivoted table where each (Model, Dim, Linked) row spans all four AD backends as columns. 68 rows collapse to 17, the gist/full-table split goes away (single unified table), and the gist constant + foldout wrapper in benchmarks/benchmarks.jl are dropped. `t(logdensity)` does not depend on the AD backend, so the four primal samples per group are noise around a common value — take the minimum as the most stable estimate. Per-backend `err` cells render independently of the primal column. Workflow comment restructure to match: SHA goes inline in the H2 title, "PR head"/"Main" preamble lines drop, main-branch results live under their own H3 with a "Click to see" foldout (and a fallback H3 note if benchmark-main failed), Computer Information is its own H3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Registered Bijectors 0.15.17+ caps AbstractPPL below 0.15, so the main package and the integration/floattypes/docs envs need the replace-di-with-abstractppl branch to resolve alongside the AbstractPPL evaluator-interface branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The [sources] override for Bijectors requires Bijectors to be listed in [deps]; Pkg rejects sources for packages not present in deps or extras. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
Benchmarks @ 3bdeb74PerformancePerformance Ratio: Rows marked Main @ d2052a1EnvironmentJulia Version 1.11.9 Commit 53a02c0720c (2026-02-06 00:27 UTC) Build Info: Official https://julialang.org/ release Platform Info: OS: Linux (x86_64-linux-gnu) CPU: 4 × AMD EPYC 7763 64-Core Processor WORD_SIZE: 64 LLVM: libLLVM-16.0.6 (ORCJIT, znver3) Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores) |
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AbstractPPL's `prepare(::AbstractADType, ...)` method lives in its DifferentiationInterface extension, so test envs that exercise the AD path with ForwardDiff/Enzyme need DI present and imported. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Some backends (notably Enzyme) return gradients as non-Vector types (e.g. Enzyme.TupleArray). ADResult expects Vector, so collect() before storing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reading from `problem`'s fields keeps the AD prep target in sync if `LogDensityAt`'s shape ever changes, instead of duplicating the 5-tuple. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This comment was marked as spam.
This comment was marked as spam.
Pass the LogDensityFunction state as a constant `context` tuple to `AbstractPPL.prepare(adtype, logdensity_internal, x; ...)` instead of building a `LogDensityAt` problem object and routing through the Mooncake-only `raw_gradient_target` keyword. This restores a single prep entry point that both the DI and Mooncake AbstractPPL extensions can handle. Rename `logdensity_at` to `logdensity_internal` to reflect that it is the implementation behind `LogDensityProblems.logdensity(ldf, params)` rather than a user-facing wrapper. Keep `logdensity_at` as a const alias so existing references still resolve. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merged
Switch the `evaluator-interface` revision overrides in every Project.toml to `main` now that the evaluator API has landed there. Also drop redundant parens in a `@model` Type default in benchmarks/benchmarks.jl. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the LogDensityAt struct with a deprecation shim that emits `Base.depwarn` and returns an `AbstractPPL.Evaluators.VectorEvaluator` wrapping a closure over `logdensity_internal`. The new path is the sanctioned one now that AD prep flows through `AbstractPPL.prepare`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the closure-over-state form with the new `context::Tuple` kwarg on `AbstractPPL.prepare`, which threads constants through to the problem function. Equivalent semantics, no closure construction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pass `check_dims=false` to match the pre-deprecation `LogDensityAt`'s unchecked call behavior; callers that relied on it shouldn't suddenly hit a `DimensionMismatch`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move test/integration/{enzyme,reversediff}/ to test/ext/{DynamicPPLEnzymeCoreExt,DynamicPPLReverseDiffExt}/ and convert the MarginalLogDensities split (subdir env + sibling script) to a single test/ext/DynamicPPLMarginalLogDensitiesExt/{Project.toml,main.jl}.
Each integration env now follows the same layout. CI.yml updated accordingly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove the `AbstractPPL = {url = ..., rev = "main"}` overrides from all
eight Project.toml files. Compat is already at 0.15, so the resolver
picks up the registered release.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pull Bot
pushed a commit
to Stars1233/DynamicPPL.jl
that referenced
this pull request
May 19, 2026
Fresh-buffer benchmark setup - In `DynamicPPL.TestUtils.AD.run_ad`, use `setup = deepcopy($params)` for both the primal and gradient `@be ` calls so each Chairmarks sample starts from a fresh input buffer instead of reusing the same vector across samples. Matches Mooncake's bench harness; setup runs outside the timed window, so the copy isn't measured. Plus some minor tweaks. ~~This appears to have a noticeable impact on log-density evaluation for tiny models, first noticed in TuringLang#1363~~: e.g., `simple assume observe(linked = true)` logdensity improves from ~20 ns to ~4 ns. In practise, these tiny nanosecond overheads won't matter at all, so the improvement here is only for perfectionists. EDIT: likely `LogDensityAt` -> `logdensity_internal` reduced logdensity's overhead by 20 ns in TuringLang#1363 --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Migrates DynamicPPL's AD plumbing to the new
AbstractPPL.prepare/value_and_gradient!!interface introduced in AbstractPPL 0.15, retiring DynamicPPL's direct dependency on DifferentiationInterface.The
LogDensityFunctionconstructor now hands AD preparation toAbstractPPL.prepare(adtype, logdensity_internal, x; context=..., check_dims=false)andLogDensityProblems.logdensity_and_gradientsimply forwards toAbstractPPL.value_and_gradient!!. The internal_use_closureheuristic that picked betweenBase.Fix1closures andDI.Constantarguments per backend is removed — that decision now lives in AbstractPPL.logdensity_atis renamed tologdensity_internal(with a const alias for compatibility), andLogDensityAtbecomes a deprecation shim that returns anAbstractPPL.Evaluators.VectorEvaluator.Notes
mainexcept for tiny models, where this branch beatsmainfor logdensity evaluation while gradient remains comparable.Replacing #1354.
Tests currently fail due to #1364Tiny primal time drop in `Simple assume observe`
The apparent AD regression in the tiny linked row is mostly caused by a much faster
primal denominator, not by slower raw gradients.
Reported row:
So Mooncake is slightly faster in raw time for this row. The ratio looks worse because
t(logdensity)became tiny.Root cause
On Julia 1.11, old main's optimised primal still contains a call boundary:
Current branch inlines through the same path, so the one-dimensional linked model reduces
to loading
params[1]and evaluating two scalar Normal logpdfs.Local Julia 1.11.9 checks:
Adding
@inlineonly toLogDensityProblems.logdensitydid not materially change oldmain. The important boundary is the assume/init path.