Guard reverse-mode Store adjoint against NULL shadow pointers; preser…#2839
Open
minansys wants to merge 3 commits into
Open
Guard reverse-mode Store adjoint against NULL shadow pointers; preser…#2839minansys wants to merge 3 commits into
minansys wants to merge 3 commits into
Conversation
90471b8 to
afa4e47
Compare
…ve scratch pointers in srcConstant memcpy
Two related crashes / silent-miscompilations surfaced when AD-ing CFD code
(Ansys Fluent / Cromwell wall-function momentum kernel) with packed-state
structs, an outer "is-this-frame-moving?" guard, and inactive frame-velocity
fields whose shadow pointers can legitimately be NULL on some control-flow
paths.
1) Enzyme/AdjointGenerator.h -- visitCommonStore reverse path
Reverse-mode Store adjoint was unconditionally doing
dif1 = load(shadow_ptr);
setPtrDiffe(shadow_ptr, 0);
addToDiffe(orig_val, dif1, ...);
on the shadow pointer, with no NULL check. When the upstream code
produced an undef/null shadow for an inactive packed-state field that the
outer guard had previously protected, the reverse pass dereferenced it
and SIGSEGV'd inside the load-zero-fadd emitted for the gradient
accumulation. The fix wraps that triple in a runtime `if (shadow_ptr !=
null)` guard, splitting the reverse BB into <bb>_nnactive / <bb>_nnmerge
and letting the existing merge-block plumbing close the diamond.
2) Enzyme/GradientUtils.cpp -- SubTransferHelper srcConstant branch
When the memcpy/memset source is a constant (mode != ForwardModeSplit),
upstream lowered the shadow-side transfer to a `memset(shadow_dst, 0,
len)`. That is wrong for shadow allocas that visitCommonStore has
already populated with valid scratch-pointer entries: the memset
silently zeros those pointers and downstream loads read NULL, leading
to either gradient drops or the crash above.
The fix replaces the srcConstant memset with an unconditional memcpy
from the *primal* source into the shadow destination, with the proper
inttoptr casts, optional GEP-by-offset, and MaybeAlign on both
operands. This preserves any scratch-pointer payload that the reverse
pass needs.
The dead helper `ShadowDstHasPointerLoadInRange` and a leftover
`if (true || ...)` gate from an earlier defense-in-depth attempt are
also removed in this commit; both are subsumed by the unconditional
memcpy.
Regression test
test/Enzyme/ReverseMode/packed_state_outer_guard_undef_tape.ll
reproduces the original crash class: packed-state struct, outer frame-
motion guard, optional-pointer join leaving tape fields path-dependent.
Before the fix the IR FileCheck fails (or opt crashes); after the fix
it produces well-formed reverse-mode IR with the NULL-shadow diamond.
Validation
- Stand-alone .ll repro: opt -enzyme exits cleanly, FileCheck passes.
- End-to-end CFD case (Cromwell wall-function adjoint, runcrmadcpu
-dp): previously SIGSEGV'd inside the visitStoreInst reverse path;
now exits 0 and emits the full reverse-mode hash output. HashDump
comparison confirms the first 146271 lines are byte-identical to a
prior-known-good prefix and 67262 additional rule blocks are
produced past the v6 truncation point -- i.e. no silent gradient
regression.
afa4e47 to
42316a6
Compare
78953bc to
6f44b96
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
#2834
Two related crashes / silent-miscompilations surfaced when with packed-state structs, an outer "is-this-frame-moving?" guard, and inactive frame-velocity fields whose shadow pointers can legitimately be NULL on some control-flow paths.
Enzyme/AdjointGenerator.h -- visitCommonStore reverse path
Reverse-mode Store adjoint was unconditionally doing
dif1 = load(shadow_ptr);
setPtrDiffe(shadow_ptr, 0);
addToDiffe(orig_val, dif1, ...);
on the shadow pointer, with no NULL check. When the upstream code
produced an undef/null shadow for an inactive packed-state field that the
outer guard had previously protected, the reverse pass dereferenced it
and SIGSEGV'd inside the load-zero-fadd emitted for the gradient
accumulation. The fix wraps that triple in a runtime
if (shadow_ptr != null)guard, splitting the reverse BB into _nnactive / _nnmergeand letting the existing merge-block plumbing close the diamond.
Enzyme/GradientUtils.cpp -- SubTransferHelper srcConstant branch
When the memcpy/memset source is a constant (mode != ForwardModeSplit),
upstream lowered the shadow-side transfer to a
memset(shadow_dst, 0, len). That is wrong for shadow allocas that visitCommonStore hasalready populated with valid scratch-pointer entries: the memset
silently zeros those pointers and downstream loads read NULL, leading
to either gradient drops or the crash above.
The fix replaces the srcConstant memset with an unconditional memcpy
from the primal source into the shadow destination, with the proper
inttoptr casts, optional GEP-by-offset, and MaybeAlign on both
operands. This preserves any scratch-pointer payload that the reverse
pass needs.
The dead helper
ShadowDstHasPointerLoadInRangeand a leftoverif (true || ...)gate from an earlier defense-in-depth attempt arealso removed in this commit; both are subsumed by the unconditional
memcpy.
Regression test
test/Enzyme/ReverseMode/packed_state_outer_guard_undef_tape.ll
reproduces the original crash class: packed-state struct, outer frame-
motion guard, optional-pointer join leaving tape fields path-dependent.
Before the fix the IR FileCheck fails (or opt crashes); after the fix
it produces well-formed reverse-mode IR with the NULL-shadow diamond.
Validation