Skip to content

Guard reverse-mode Store adjoint against NULL shadow pointers; preser…#2839

Open
minansys wants to merge 3 commits into
EnzymeAD:mainfrom
minansys:fix/reverse-store-null-shadow-guard
Open

Guard reverse-mode Store adjoint against NULL shadow pointers; preser…#2839
minansys wants to merge 3 commits into
EnzymeAD:mainfrom
minansys:fix/reverse-store-null-shadow-guard

Conversation

@minansys
Copy link
Copy Markdown
Collaborator

@minansys minansys commented May 22, 2026

#2834

Two related crashes / silent-miscompilations surfaced when with packed-state structs, an outer "is-this-frame-moving?" guard, and inactive frame-velocity fields whose shadow pointers can legitimately be NULL on some control-flow paths.

  1. Enzyme/AdjointGenerator.h -- visitCommonStore reverse path
    Reverse-mode Store adjoint was unconditionally doing
    dif1 = load(shadow_ptr);
    setPtrDiffe(shadow_ptr, 0);
    addToDiffe(orig_val, dif1, ...);
    on the shadow pointer, with no NULL check. When the upstream code
    produced an undef/null shadow for an inactive packed-state field that the
    outer guard had previously protected, the reverse pass dereferenced it
    and SIGSEGV'd inside the load-zero-fadd emitted for the gradient
    accumulation. The fix wraps that triple in a runtime if (shadow_ptr != null) guard, splitting the reverse BB into _nnactive / _nnmerge
    and letting the existing merge-block plumbing close the diamond.

  2. Enzyme/GradientUtils.cpp -- SubTransferHelper srcConstant branch
    When the memcpy/memset source is a constant (mode != ForwardModeSplit),
    upstream lowered the shadow-side transfer to a memset(shadow_dst, 0, len). That is wrong for shadow allocas that visitCommonStore has
    already populated with valid scratch-pointer entries: the memset
    silently zeros those pointers and downstream loads read NULL, leading
    to either gradient drops or the crash above.

    The fix replaces the srcConstant memset with an unconditional memcpy
    from the primal source into the shadow destination, with the proper
    inttoptr casts, optional GEP-by-offset, and MaybeAlign on both
    operands. This preserves any scratch-pointer payload that the reverse
    pass needs.

    The dead helper ShadowDstHasPointerLoadInRange and a leftover
    if (true || ...) gate from an earlier defense-in-depth attempt are
    also removed in this commit; both are subsumed by the unconditional
    memcpy.

Regression test
test/Enzyme/ReverseMode/packed_state_outer_guard_undef_tape.ll
reproduces the original crash class: packed-state struct, outer frame-
motion guard, optional-pointer join leaving tape fields path-dependent.
Before the fix the IR FileCheck fails (or opt crashes); after the fix
it produces well-formed reverse-mode IR with the NULL-shadow diamond.

Validation

  • Stand-alone .ll repro: opt -enzyme exits cleanly, FileCheck passes.
  • End-to-end CFD case (Cromwell wall-function adjoint, runcrmadcpu -dp): previously SIGSEGV'd inside the visitStoreInst reverse path; now exits 0 and emits the full reverse-mode hash output. HashDump comparison confirms the first 146271 lines are byte-identical to a prior-known-good prefix and 67262 additional rule blocks are produced past the v6 truncation point -- i.e. no silent gradient regression.

@minansys minansys force-pushed the fix/reverse-store-null-shadow-guard branch 2 times, most recently from 90471b8 to afa4e47 Compare May 22, 2026 15:15
…ve scratch pointers in srcConstant memcpy

Two related crashes / silent-miscompilations surfaced when AD-ing CFD code
(Ansys Fluent / Cromwell wall-function momentum kernel) with packed-state
structs, an outer "is-this-frame-moving?" guard, and inactive frame-velocity
fields whose shadow pointers can legitimately be NULL on some control-flow
paths.

1) Enzyme/AdjointGenerator.h -- visitCommonStore reverse path
   Reverse-mode Store adjoint was unconditionally doing
       dif1 = load(shadow_ptr);
       setPtrDiffe(shadow_ptr, 0);
       addToDiffe(orig_val, dif1, ...);
   on the shadow pointer, with no NULL check. When the upstream code
   produced an undef/null shadow for an inactive packed-state field that the
   outer guard had previously protected, the reverse pass dereferenced it
   and SIGSEGV'd inside the load-zero-fadd emitted for the gradient
   accumulation. The fix wraps that triple in a runtime `if (shadow_ptr !=
   null)` guard, splitting the reverse BB into <bb>_nnactive / <bb>_nnmerge
   and letting the existing merge-block plumbing close the diamond.

2) Enzyme/GradientUtils.cpp -- SubTransferHelper srcConstant branch
   When the memcpy/memset source is a constant (mode != ForwardModeSplit),
   upstream lowered the shadow-side transfer to a `memset(shadow_dst, 0,
   len)`. That is wrong for shadow allocas that visitCommonStore has
   already populated with valid scratch-pointer entries: the memset
   silently zeros those pointers and downstream loads read NULL, leading
   to either gradient drops or the crash above.

   The fix replaces the srcConstant memset with an unconditional memcpy
   from the *primal* source into the shadow destination, with the proper
   inttoptr casts, optional GEP-by-offset, and MaybeAlign on both
   operands. This preserves any scratch-pointer payload that the reverse
   pass needs.

   The dead helper `ShadowDstHasPointerLoadInRange` and a leftover
   `if (true || ...)` gate from an earlier defense-in-depth attempt are
   also removed in this commit; both are subsumed by the unconditional
   memcpy.

Regression test
   test/Enzyme/ReverseMode/packed_state_outer_guard_undef_tape.ll
   reproduces the original crash class: packed-state struct, outer frame-
   motion guard, optional-pointer join leaving tape fields path-dependent.
   Before the fix the IR FileCheck fails (or opt crashes); after the fix
   it produces well-formed reverse-mode IR with the NULL-shadow diamond.

Validation
   - Stand-alone .ll repro: opt -enzyme exits cleanly, FileCheck passes.
   - End-to-end CFD case (Cromwell wall-function adjoint, runcrmadcpu
     -dp): previously SIGSEGV'd inside the visitStoreInst reverse path;
     now exits 0 and emits the full reverse-mode hash output. HashDump
     comparison confirms the first 146271 lines are byte-identical to a
     prior-known-good prefix and 67262 additional rule blocks are
     produced past the v6 truncation point -- i.e. no silent gradient
     regression.
@minansys minansys force-pushed the fix/reverse-store-null-shadow-guard branch from afa4e47 to 42316a6 Compare May 22, 2026 16:36
@minansys minansys force-pushed the fix/reverse-store-null-shadow-guard branch from 78953bc to 6f44b96 Compare May 22, 2026 17:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant