Add tagdepth fast path to tag comparison #807
Fixes SciML/OrdinaryDiffEq.jl#3381 and supersedes SciML/OrdinaryDiffEq.jl#3587. Also fixes NonlinearSolve.jl master and supersedes SciML/NonlinearSolve.jl#932.
Supersedes #724 and is a better solution to #714.

The crux of the issue is that ForwardDiff.jl's tagging system is somewhat designed around each tag being used only once: the function is created, the derivative function is called, and the tag for that derivative is set from the type of the function being differentiated, so it is unique. This works with nested differentiation because you usually call the inner function first, before the outer function, or only ever call the combination, so the tag ordering is set correctly.

Mixing tagging with precompilation then leads to this issue: it's possible for the outer tag to be precompiled before the inner tag. This reverses the tag ordering, and the type promotion mechanism then gets confused because it is tied to the tag ordering. This seems pretty fundamental, because the ordering is a useful property (it's the core property used to prevent perturbation confusion), but it means that the interaction between nested differentiation and precompilation ends up producing odd bugs.

I tried working around this downstream (SciML/OrdinaryDiffEq.jl#3587) but it was very nasty. Basically, you had to make sure dual numbers never automatically converted Float64s, because sometimes a Float64 could be converted to the inner dual type instead of the outer one, and the normal conversion (first to the inner type, then wrapped in the outer type) would not happen because it required the outer tag to postdate the inner tag.

But this really shows that the bug only manifests with nested types. And if you have nested types, you know you don't have perturbation confusion when one tag is nested deeper than the other, because they don't have the same number of partials. So when the tag depths differ, you can use an alternative tag ordering, since you have already proven that perturbations can't be confused. In that case, you can order the tags by nesting depth alone (the fast path returns `d1 < d2`). I added that and, poof, tag nesting worked out in these cases with precompilation. So I think this captures the true crux of the problem and solves it at its core.
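The ordering problem described above can be sketched in a small toy model. This is a minimal sketch under stated assumptions: `ToyTag{F,V}` and `ToyDual{T,V,N}` are hypothetical stand-ins for `ForwardDiff.Tag` and `ForwardDiff.Dual`, and `toycount` is a hypothetical analogue of `tagcount` that numbers tags in the order they are first seen. It simulates a precompile race by numbering the deeper tag first, then compares a count-only ordering with the depth-first ordering from this PR.

```julia
# Hypothetical stand-ins for ForwardDiff.Tag and ForwardDiff.Dual (not the real types).
struct ToyTag{F,V} end
struct ToyDual{T,V,N} end

# Same shape as the PR's tagdepth: a tag or dual adds one level and recurses into V.
toydepth(::Type) = 0
toydepth(::Type{<:ToyDual{T,V,N}}) where {T,V,N} = 1 + toydepth(V)
toydepth(::Type{<:ToyTag{F,V}}) where {F,V} = 1 + toydepth(V)

# Hypothetical tagcount analogue: a tag's number is fixed the first time it is seen.
const COUNTS = Dict{Any,Int}()
toycount(T) = get!(COUNTS, T, length(COUNTS))

# Count-only comparison (as on master) and depth-first comparison (as in this PR).
count_only(T1, T2) = toycount(T1) < toycount(T2)
function with_depth(T1, T2)
    d1, d2 = toydepth(T1), toydepth(T2)
    d1 != d2 && return d1 < d2
    return toycount(T1) < toycount(T2)
end

# NestedTag's V is a dual carrying BaseTag, so NestedTag is one level deeper.
const BaseTag = ToyTag{:f, Float64}                         # depth 1
const NestedTag = ToyTag{:g, ToyDual{BaseTag, Float64, 1}}  # depth 2

# Simulated precompile race: NestedTag gets numbered before BaseTag.
toycount(NestedTag)  # 0
toycount(BaseTag)    # 1

master_order = count_only(BaseTag, NestedTag)  # false: inverted by the race
pr_order = with_depth(BaseTag, NestedTag)      # true: depth 1 < depth 2 decides
println((master_order, pr_order))              # prints (false, true)
```

In the un-raced order (BaseTag numbered first) both comparisons would return `true`; the depth check makes the answer independent of numbering order whenever the depths differ.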
Codecov Report: ✅ All modified and coverable lines are covered by tests.

| | master | #807 | +/- |
|---|---|---|---|
| Coverage | 90.75% | 90.80% | +0.05% |
| Files | 11 | 11 | |
| Lines | 1071 | 1077 | +6 |
| Hits | 972 | 978 | +6 |
| Misses | 99 | 99 | |
@inline tagdepth(::Type) = 0
@inline tagdepth(::Type{<:Dual{T,V,N}}) where {T,V,N} = 1 + tagdepth(V)
@inline tagdepth(::Type{<:Tag{F,V}}) where {F,V} = 1 + tagdepth(V)
Why not just define this function on `::Type` and `::Type{<:Dual}`? Why `::Type{<:Tag}` as well?
Because when you nest, it's a tag of duals.
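To see why the `Tag` method matters, here is a minimal sketch on hypothetical stand-in types `ToyTag{F,V}` and `ToyDual{T,V,N}` (not ForwardDiff's own). Without a method for tags, every tag falls through to the `::Type` fallback and gets depth 0, so two nested tags are indistinguishable; with it, a tag whose `V` is a dual of another tag comes out one level deeper.

```julia
# Hypothetical stand-ins for ForwardDiff.Tag and ForwardDiff.Dual (not the real types).
struct ToyTag{F,V} end
struct ToyDual{T,V,N} end

# Variant without a Tag method: every tag hits the ::Type fallback.
depth_without_tag(::Type) = 0
depth_without_tag(::Type{<:ToyDual{T,V,N}}) where {T,V,N} = 1 + depth_without_tag(V)

# Variant with the Tag method, as in the PR: a tag's depth comes from its V.
depth_with_tag(::Type) = 0
depth_with_tag(::Type{<:ToyDual{T,V,N}}) where {T,V,N} = 1 + depth_with_tag(V)
depth_with_tag(::Type{<:ToyTag{F,V}}) where {F,V} = 1 + depth_with_tag(V)

# Nested case: T2's V is a dual tagged with T1, i.e. "a tag of duals".
const T1 = ToyTag{:f, Float64}
const T2 = ToyTag{:g, ToyDual{T1, Float64, 1}}

println((depth_without_tag(T1), depth_without_tag(T2)))  # (0, 0): no way to tell them apart
println((depth_with_tag(T1), depth_with_tag(T2)))        # (1, 2)
```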
d1 = tagdepth(Tag{F1,V1})
d2 = tagdepth(Tag{F2,V2})
d1 != d2 && return d1 < d2
You can define Tags basically arbitrarily, so why would this be safe?
If the tags are defined according to the interfaces in the package, T is the type being differentiated. If T is the type being differentiated, then this guarantees tag ordering by differentiation hierarchy. We previously had PRs about documenting this closed on the grounds that it is non-public API, so since this package always obeys that invariant internally and it is purposefully non-public, enforcing it would be non-breaking.
d1 = tagdepth(Tag{F1,V1})
d2 = tagdepth(Tag{F2,V2})
d1 != d2 && return d1 < d2
return tagcount(Tag{F1,V1}) < tagcount(Tag{F2,V2})
It seems one could still run into the same precompilation-caused problems here, e.g., if V1 === V2 (e.g. both Float64)?
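A minimal sketch of the `V1 === V2` case on hypothetical toy types (`ToyTag`, `ToyDual`, and `toycount` are illustrative stand-ins, not ForwardDiff's `Tag`, `Dual`, or `tagcount`). When both tags seed a `Float64`, their depths tie, the fast path never fires, and the result depends only on which tag a given session numbered first.

```julia
# Hypothetical stand-ins for ForwardDiff.Tag and ForwardDiff.Dual (not the real types).
struct ToyTag{F,V} end
struct ToyDual{T,V,N} end

toydepth(::Type) = 0
toydepth(::Type{<:ToyDual{T,V,N}}) where {T,V,N} = 1 + toydepth(V)
toydepth(::Type{<:ToyTag{F,V}}) where {F,V} = 1 + toydepth(V)

# Hypothetical tagcount analogue; `counts` stands for one session's numbering.
toycount(counts, T) = get!(counts, T, length(counts))

# The PR's comparison shape: depth first, then the first-seen count.
function with_depth(counts, T1, T2)
    d1, d2 = toydepth(T1), toydepth(T2)
    d1 != d2 && return d1 < d2
    return toycount(counts, T1) < toycount(counts, T2)
end

# Both tags seed a Float64, so V1 === V2 === Float64 and the depths tie.
const TagA = ToyTag{:a, Float64}
const TagB = ToyTag{:b, Float64}

# Two sessions that first see the tags in opposite orders.
a_first = Dict{Any,Int}(); toycount(a_first, TagA); toycount(a_first, TagB)
b_first = Dict{Any,Int}(); toycount(b_first, TagB); toycount(b_first, TagA)

println(toydepth(TagA) == toydepth(TagB))                                    # true
println((with_depth(a_first, TagA, TagB), with_depth(b_first, TagA, TagB)))  # (true, false)
```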
Can you build an example? None of the earlier examples can hit that, because in them it's not Float64 in both cases but a Dual of Float64, and that tag nesting is exactly the part you missed earlier.
@devmotion From your comments, I think you missed the core part of this. By design, the tag is built from `f` and the eltype (`V`). When nested, that eltype is itself a tagged Dual. In a nested differentiation context this establishes its own natural ordering for the duals, since there the natural action is to promote the inner dual to the outer dual by appending zero partials. As a safety check, I can also add a proof that `V1` nests `V2` exactly, i.e. that the stripped `V1` matches `V2`. In that case the definition is very clear, and this is the actual case triggered by the precompilation issue.
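One possible reading of the proposed safety check, sketched on hypothetical toy types: only trust depth when the deeper tag's `V` is exactly a dual tagged with the shallower tag, wrapping the shallower tag's own `V`. The helpers `tag_value_type`, `dual_tag`, `strip_dual`, and `nests_exactly` are made up for illustration; this is an interpretation of "the stripped `V1` matches `V2`", not code from the PR.

```julia
# Hypothetical stand-ins for ForwardDiff.Tag and ForwardDiff.Dual (not the real types).
struct ToyTag{F,V} end
struct ToyDual{T,V,N} end

# Value type V of a tag.
tag_value_type(::Type{ToyTag{F,V}}) where {F,V} = V

# Tag of the outermost dual layer, or `nothing` if the type is not a dual.
dual_tag(::Type{ToyDual{T,V,N}}) where {T,V,N} = T
dual_tag(::Type) = nothing

# One dual layer peeled off; non-dual types are returned unchanged.
strip_dual(::Type{ToyDual{T,V,N}}) where {T,V,N} = V
strip_dual(::Type{V}) where {V} = V

# Hypothetical check: deep == ToyTag{F, ToyDual{shallow, V, N}} where shallow == ToyTag{G, V}.
function nests_exactly(deep, shallow)
    Vd = tag_value_type(deep)
    return dual_tag(Vd) === shallow && strip_dual(Vd) === tag_value_type(shallow)
end

const BaseTag = ToyTag{:f, Float64}
const NestedTag = ToyTag{:g, ToyDual{BaseTag, Float64, 1}}
const OtherTag = ToyTag{:h, ToyDual{ToyTag{:k, Float64}, Float64, 1}}  # depth 2, unrelated

println(nests_exactly(NestedTag, BaseTag))  # true
println(nests_exactly(OtherTag, BaseTag))   # false: wraps a different tag
println(nests_exactly(BaseTag, NestedTag))  # false: BaseTag's V is not a dual
```

A depth comparison guarded this way would only take the shortcut for the `NestedTag`/`BaseTag` pair and fall back to the count for `OtherTag`.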
Following my earlier review comments, I had a gut feeling that the depth-based fast path solves the symptom in some common cases but isn't really fixing the root cause, and I worried it might even produce incorrect results in cases where it triggers. To check, I challenged Claude to construct problematic examples for this PR. Two emerged; both reproduce empirically against this branch on Julia 1.12.
TL;DR
The mechanism behind (1):
Example 1 (regression): 3-level nested derivative with constant inner seed
using ForwardDiff
inner_deriv(d) = ForwardDiff.derivative(y -> y^2 * d, 1.0) # = 2*d
middle_grad(v) = ForwardDiff.gradient(v) do u
sum(inner_deriv(ui) * ui for ui in u) # = sum(2*ui^2)
end # grad = [4*u1, 4*u2]
outer_fn(x) = sum(middle_grad([x, 2x])) # = 12x
ForwardDiff.derivative(outer_fn, 0.5) # analytic: 12.0
Tags involved at the failing multiplication
The PR's fast path sees
A simpler variant also throws on this PR and works on master:
inner2(d) = ForwardDiff.derivative(y -> y * d, 1.0) # = d
mid2(v) = ForwardDiff.gradient(u -> sum(inner2.(u) .* u), v) # grad of sum(ui^2) = [2*u1, 2*u2]
out2(x) = sum(mid2([x, x])) # = 4x
ForwardDiff.derivative(out2, 0.5) # analytic: 4.0
Example 2 (same depth, still broken): inverted tagcount with V=Float64 on both sides
using ForwardDiff
const Tag = ForwardDiff.Tag
struct OuterF end
struct InnerF
x_dual::ForwardDiff.Dual{Tag{OuterF, Float64}, Float64, 1}
end
(c::InnerF)(y) = sin(c.x_dual * y)
(::OuterF)(x_dual::ForwardDiff.Dual{Tag{OuterF, Float64}, Float64, 1}) =
ForwardDiff.derivative(InnerF(x_dual), 1.0)
# Force tagcount to be evaluated in INVERTED order (simulating a precompile race):
ForwardDiff.tagcount(Tag{InnerF, Float64}) # returns 0
ForwardDiff.tagcount(Tag{OuterF, Float64}) # returns 1
# Analytic: d/dx (d/dy sin(x*y)|_{y=1}) at x=0.5 = cos(0.5) - 0.5*sin(0.5) ≈ 0.6378697925882713
ForwardDiff.derivative(OuterF(), 0.5)
Both tags have
I ran into this one over the past two days, and was surprised to see how fresh these comments were. In my case, a process simulation engine differentiates a solver residual with ForwardDiff, the residual calls a thermodynamics library (Clapeyron) that runs its own nested ForwardDiff internally, and a precompile workload baked an inverted
I also reproduced @devmotion's regression independently: @devmotion's 3-level / constant-inner-seed example returns
The cause is the one already noted in this thread.
@inline tagdepth(::Type) = 0
@inline tagdepth(::Type{<:Dual{T,V,N}}) where {T,V,N} = 1 + tagdepth(V)
@inline tagdepth(::Type{<:Tag{F,V}}) where {F,V} = 1 + tagdepth(V)
# Does Tag `target` genuinely appear in X's value-type / tag chain?
# Deliberately does NOT recurse closure type params F.
@inline contains_tag(::Type, target) = false
@inline contains_tag(::Type{<:Tag{F,V}}, target) where {F,V} = contains_tag(V, target)
@inline contains_tag(::Type{<:Dual{T,V,N}}, target) where {T,V,N} =
(T === target) || contains_tag(T, target) || contains_tag(V, target)
@inline function ≺(::Type{Tag{F1,V1}}, ::Type{Tag{F2,V2}}) where {F1,V1,F2,V2}
T1 = Tag{F1,V1}
T2 = Tag{F2,V2}
d1 = tagdepth(T1)
d2 = tagdepth(T2)
if d1 != d2
genuinely_nested = d1 > d2 ? contains_tag(T1, T2) : contains_tag(T2, T1)
genuinely_nested && return d1 < d2 # depth wins ONLY if nesting is real
end
return tagcount(T1) < tagcount(T2)
end
Verified: the inverted-
Hope this is helpful.
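The mechanism behind Example 1 can be mimicked on hypothetical toy types. There, the innermost `derivative` seeds a `Float64`, so its tag has depth 1, while the closure type `F` it is tagged with captures a dual of the middle (depth 2) tag. Depth alone then puts the innermost tag below the middle one. The `contains_tag` guard proposed above sees that the middle tag's `V` chain does not contain the innermost tag and falls back to the count. The closure is modeled by a made-up `ToyClosure{C}`, and the counts are assumed to be assigned in call order (outer, middle, innermost); this is a sketch of the mechanism, not ForwardDiff code.

```julia
# Hypothetical stand-ins for ForwardDiff.Tag, ForwardDiff.Dual, and a closure type.
struct ToyTag{F,V} end
struct ToyDual{T,V,N} end
struct ToyClosure{C} end  # stands for a closure that captured a value of type C

toydepth(::Type) = 0
toydepth(::Type{<:ToyDual{T,V,N}}) where {T,V,N} = 1 + toydepth(V)
toydepth(::Type{<:ToyTag{F,V}}) where {F,V} = 1 + toydepth(V)

# Same structure as the proposed contains_tag: follow V and dual tags, never F.
contains_tag(::Type, target) = false
contains_tag(::Type{<:ToyTag{F,V}}, target) where {F,V} = contains_tag(V, target)
contains_tag(::Type{<:ToyDual{T,V,N}}, target) where {T,V,N} =
    (T === target) || contains_tag(T, target) || contains_tag(V, target)

# Hypothetical tagcount analogue, numbered in first-seen order.
const COUNT = Dict{Any,Int}()
toycount(T) = get!(COUNT, T, length(COUNT))

function depth_only(T1, T2)  # this PR's fast path
    d1, d2 = toydepth(T1), toydepth(T2)
    d1 != d2 && return d1 < d2
    return toycount(T1) < toycount(T2)
end

function guarded(T1, T2)  # depth wins only if the nesting is real
    d1, d2 = toydepth(T1), toydepth(T2)
    if d1 != d2
        nested = d1 > d2 ? contains_tag(T1, T2) : contains_tag(T2, T1)
        nested && return d1 < d2
    end
    return toycount(T1) < toycount(T2)
end

const OuterT = ToyTag{:outer, Float64}                      # depth 1
const MidT = ToyTag{:mid, ToyDual{OuterT, Float64, 1}}      # depth 2
const InnerT = ToyTag{ToyClosure{ToyDual{MidT, ToyDual{OuterT, Float64, 1}, 2}}, Float64}  # depth 1

foreach(toycount, (OuterT, MidT, InnerT))  # numbered 0, 1, 2

println(depth_only(MidT, InnerT))  # false: depth 2 vs 1 ranks MidT above InnerT
println(guarded(MidT, InnerT))     # true: MidT's V chain lacks InnerT, so counts decide
```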
I asked Claude to dig into this thread and try five candidate approaches against a unified test matrix. Reporting back since the comparison is concrete.
Test harness (8 cases)
Single Julia 1.11 / ForwardDiff master process; isolated env per approach with
using ForwardDiff, OrdinaryDiffEqRosenbrock, SciMLBase, ADTypes
const results = NamedTuple{(:case, :status, :detail), Tuple{String, String, String}}[]
function runtest(body, name)
try
r = body()
push!(results, (case = name, status = "OK", detail = string(r)[1:min(80, end)]))
catch e
push!(results, (case = name, status = "FAIL",
detail = "$(typeof(e).name.name): $(sprint(showerror, e)[1:min(180, end)])"))
end
end
# ===== Cases 1-6: SciML/OrdinaryDiffEq.jl#3587 deterministic matrix =====
# Outer ForwardDiff layer over a Rosenbrock23 ODE solve, with the outer Dual
# carried in via `p_dual` (and optionally `u0_dual`). Varies specialization
# mode {Full, Auto, No} × u0 eltype {Float64, Dual}.
function ode!(du, u, p, t)
du[1] = -p[1] * u[1]
du[2] = -u[1] - p[2] * u[2]
return nothing
end
const TestTag = ForwardDiff.Tag{:NestedForwardDiffOuter, Float64}
const p_dual = ForwardDiff.Dual{TestTag, Float64, 2}[
ForwardDiff.Dual{TestTag}(1.5, ForwardDiff.Partials{2, Float64}((1.0, 0.0))),
ForwardDiff.Dual{TestTag}(2.0, ForwardDiff.Partials{2, Float64}((0.0, 1.0))),
]
const u0_dual = [ForwardDiff.Dual{TestTag}(1.0, ForwardDiff.Partials{2, Float64}((0.0, 0.0))) for _ in 1:2]
for spec in (SciMLBase.FullSpecialize, SciMLBase.AutoSpecialize, SciMLBase.NoSpecialize)
for (label, u0) in (("Float64u0", [1.0, 1.0]), ("Dualu0", u0_dual))
runtest("#3587.$(spec).$(label)") do
ode_f = ODEFunction{true, spec}(ode!)
prob = ODEProblem(ode_f, u0, (0.0, 1.0), p_dual)
sol = solve(prob, Rosenbrock23(autodiff = AutoForwardDiff(chunksize = 2));
reltol = 1.0e-8, abstol = 1.0e-8)
SciMLBase.successful_retcode(sol) && all(u -> all(isfinite, u), sol.u) ? "ok" : "bad_retcode"
end
end
end
# ===== Case 7: @devmotion's example 1 — 3-level nested derivative with =====
# inner Float64 seed. Reproduces the regression introduced by #807's
# depth-only fast path.
runtest("devmotion.ex1.simple") do
inner2(d) = ForwardDiff.derivative(y -> y * d, 1.0)
mid2(v) = ForwardDiff.gradient(u -> sum(inner2.(u) .* u), v)
out2(x) = sum(mid2([x, x]))
g = ForwardDiff.derivative(out2, 0.5)
isapprox(g, 4.0; atol = 1.0e-10) ? "ok (g=$g)" : "wrong (g=$g, want 4.0)"
end
# ===== Case 8: @devmotion's example 2 — same-depth (both V=Float64) tags =====
# with tagcount forced into inverted order (simulating a precompile race
# that baked the inner tag's literal before the outer tag's).
struct OuterF_TestSuite end
struct InnerF_TestSuite
x_dual::ForwardDiff.Dual{ForwardDiff.Tag{OuterF_TestSuite, Float64}, Float64, 1}
end
(c::InnerF_TestSuite)(y) = sin(c.x_dual * y)
(::OuterF_TestSuite)(x_dual::ForwardDiff.Dual{ForwardDiff.Tag{OuterF_TestSuite, Float64}, Float64, 1}) =
ForwardDiff.derivative(InnerF_TestSuite(x_dual), 1.0)
runtest("devmotion.ex2.inverted_tagcount") do
ForwardDiff.tagcount(ForwardDiff.Tag{InnerF_TestSuite, Float64}) # → 0
ForwardDiff.tagcount(ForwardDiff.Tag{OuterF_TestSuite, Float64}) # → 1 (inverted)
g = ForwardDiff.derivative(OuterF_TestSuite(), 0.5)
expected = cos(0.5) - 0.5 * sin(0.5)
isapprox(g, expected; atol = 1.0e-10) ? "ok (g=$g)" : "wrong (g=$g, want $expected)"
end
Summary table
Per-case results
Approach A (OrdinaryDiffEq-side): skip