fix: add pre-loop event init and tail event drain for back-edge sync …#444
fix: add pre-loop event init and tail event drain for back-edge sync …#444zhangstevenunity wants to merge 1 commit intomainfrom
Conversation
…428) The generated C++ Cube kernel hangs with --enable-insert-sync because: 1. Back-edge wait_flag ops hoisted to loop heads fire on iteration 0 against uninitialized event registers. This patch adds a pre-loop set_flag for each back-edge wait to initialize the event before the first iteration. 2. The tail sync (ptoas_auto_sync_tail with kBarrierAll) only emits pipe_barrier(PIPE_ALL) which does not drain pending event flag registers. This patch adds explicit wait_flag calls at the function tail to drain all back-edge event dependencies. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request implements back-edge synchronization handling to resolve issue #428, ensuring event registers are initialized before loops and drained at the function tail. Review feedback identifies a logic error in the event draining process, redundant code for setting insertion points, and opportunities to use more idiomatic C++ for comparison and deduplication.
| for (auto *sync : syncIR_.back()->pipeAfter) { | ||
| if (sync->uselessSync || sync->eventIds.empty()) | ||
| continue; | ||
| if (sync->isSyncWaitType()) { |
There was a problem hiding this comment.
There appears to be a logic error here. To "drain" pending events at the function tail, the compiler should identify producer operations (set_flag) that have not been matched by a consumer (wait_flag). If a wait_flag is already present in pipeAfter, the event is already being drained. The comment on line 403 correctly states this should scan for set_flag ops, but the code checks for isSyncWaitType(). This should be changed to isSyncSetType() to correctly identify pending signals that require an explicit drain.
| if (sync->isSyncWaitType()) { | |
| if (sync->isSyncSetType()) { |
| EmitTailEventDrain(rewriter, ret); | ||
|
|
||
| // Re-set insertion point before ret (after any drain ops we just emitted) | ||
| rewriter.setInsertionPoint(ret); |
There was a problem hiding this comment.
| bool operator<(const EventKey &o) const { | ||
| if (src != o.src) return static_cast<unsigned>(src) < static_cast<unsigned>(o.src); | ||
| if (dst != o.dst) return static_cast<unsigned>(dst) < static_cast<unsigned>(o.dst); | ||
| return eventId < o.eventId; | ||
| } |
There was a problem hiding this comment.
| auto addUnique = [&](PipelineType src, PipelineType dst, int eid) { | ||
| EventKey key{src, dst, eid}; | ||
| for (auto &existing : drainEvents) { | ||
| if (existing == key) | ||
| return; | ||
| } | ||
| drainEvents.push_back(key); | ||
| }; |
There was a problem hiding this comment.
Instead of a manual linear search for uniqueness in a lambda, you can simply collect all events and then use std::unique after the sorting step at line 421. This is more idiomatic and efficient as the number of events grows.
auto addUnique = [&](PipelineType src, PipelineType dst, int eid) {
drainEvents.push_back({src, dst, eid});
};| return; | ||
|
|
||
| // Sort for deterministic output. | ||
| llvm::sort(drainEvents); |
There was a problem hiding this comment.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ff9be95a47
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| for (auto *waitSync : pair.second) { | ||
| if (waitSync->uselessSync || waitSync->eventIds.empty()) | ||
| continue; | ||
| addUnique(waitSync->GetActualSrcPipe(), waitSync->GetActualDstPipe(), | ||
| waitSync->eventIds[0]); |
There was a problem hiding this comment.
Guard tail event drains by control-flow reachability
This unconditionally adds every collected loop-head event to the tail drain set, so AppendAutoSyncTailBarrierIfNeeded emits wait_flag for those events on all return paths. If a loop is conditionally skipped (e.g., loop inside an scf.if false branch), its pre-loop set_flag is never executed, but the function-tail drain wait still runs and can block on an uninitialized event, reintroducing hangs. The drain list needs to be filtered to events guaranteed to be initialized on the current return path (or emitted under matching control flow).
Useful? React with 👍 / 👎.
| if (!sync->isSyncWaitType()) | ||
| continue; | ||
| if (sync->eventIds.empty()) | ||
| continue; |
There was a problem hiding this comment.
Restrict pre-loop init to backward-loop waits
This treats all waits hoisted to LOOP_BEGIN as back-edge waits and emits synthetic pre-loop set_flag for each, but MoveSyncState::PlanMoveOutWaitSync hoists any wait whose producer is outside the loop (not only loop-carried/back-edge cases). Initializing those ordinary waits here can satisfy them before the real producer runs and weaken required ordering. Filter candidates with an explicit back-edge marker (for example GetForEndIndex().has_value() / reallocated-loop-head-tail metadata) before adding them to preLoopInitWaits_.
Useful? React with 👍 / 👎.
…(#428)
The generated C++ Cube kernel hangs with --enable-insert-sync because:
Back-edge wait_flag ops hoisted to loop heads fire on iteration 0 against uninitialized event registers. This patch adds a pre-loop set_flag for each back-edge wait to initialize the event before the first iteration.
The tail sync (ptoas_auto_sync_tail with kBarrierAll) only emits pipe_barrier(PIPE_ALL) which does not drain pending event flag registers. This patch adds explicit wait_flag calls at the function tail to drain all back-edge event dependencies.