
fix: add pre-loop event init and tail event drain for back-edge sync … #444

Closed
zhangstevenunity wants to merge 1 commit into main from fix/issue-428-sync-hang

Conversation

@zhangstevenunity
Collaborator

…(#428)

The generated C++ Cube kernel hangs with --enable-insert-sync because:

  1. Back-edge wait_flag ops hoisted to loop heads fire on iteration 0 against uninitialized event registers. This patch adds a pre-loop set_flag for each back-edge wait to initialize the event before the first iteration.

  2. The tail sync (ptoas_auto_sync_tail with kBarrierAll) only emits pipe_barrier(PIPE_ALL) which does not drain pending event flag registers. This patch adds explicit wait_flag calls at the function tail to drain all back-edge event dependencies.
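The failure mode can be sketched with a toy single-threaded model, assuming Ascend-style set_flag/wait_flag semantics where a wait consumes a previously set flag (FlagModel, runLoop, and the event numbering below are illustrative, not the pass's actual API):

```cpp
#include <cassert>
#include <map>

// Toy model of event flag registers (illustrative only; real kernels use
// set_flag/wait_flag intrinsics between hardware pipes).
struct FlagModel {
  std::map<int, bool> flag;
  void set(int eid) { flag[eid] = true; }
  // wait consumes the flag; returns false to model a hang if it was never set.
  bool wait(int eid) {
    if (!flag[eid]) return false;
    flag[eid] = false;
    return true;
  }
};

// A loop whose back-edge wait_flag was hoisted to the loop head. Without the
// pre-loop set_flag, iteration 0 waits on an uninitialized event and hangs.
bool runLoop(bool preLoopInit, int iters) {
  FlagModel m;
  if (preLoopInit) m.set(0);       // fix 1: initialize before iteration 0
  for (int i = 0; i < iters; ++i) {
    if (!m.wait(0)) return false;  // hoisted back-edge wait_flag
    m.set(0);                      // back-edge set_flag at the loop tail
  }
  return m.wait(0);                // fix 2: drain the pending tail event
}
```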

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@gemini-code-assist (bot) left a comment


Code Review

This pull request implements back-edge synchronization handling to resolve issue #428, ensuring event registers are initialized before loops and drained at the function tail. Review feedback identifies a logic error in the event draining process, redundant code for setting insertion points, and opportunities to use more idiomatic C++ for comparison and deduplication.

  for (auto *sync : syncIR_.back()->pipeAfter) {
    if (sync->uselessSync || sync->eventIds.empty())
      continue;
    if (sync->isSyncWaitType()) {

high

There appears to be a logic error here. To "drain" pending events at the function tail, the compiler should identify producer operations (set_flag) that have not been matched by a consumer (wait_flag). If a wait_flag is already present in pipeAfter, the event is already being drained. The comment on line 403 correctly states this should scan for set_flag ops, but the code checks for isSyncWaitType(). This should be changed to isSyncSetType() to correctly identify pending signals that require an explicit drain.

Suggested change
-    if (sync->isSyncWaitType()) {
+    if (sync->isSyncSetType()) {

EmitTailEventDrain(rewriter, ret);

// Re-set insertion point before ret (after any drain ops we just emitted)
rewriter.setInsertionPoint(ret);

medium

This call to setInsertionPoint is redundant. The insertion point is already set to ret (before the return operation) at line 318. Creating operations via the rewriter in EmitTailEventDrain maintains the insertion point before ret, so there is no need to reset it here.

Comment on lines +372 to +376
  bool operator<(const EventKey &o) const {
    if (src != o.src) return static_cast<unsigned>(src) < static_cast<unsigned>(o.src);
    if (dst != o.dst) return static_cast<unsigned>(dst) < static_cast<unsigned>(o.dst);
    return eventId < o.eventId;
  }

medium

The comparison operator can be simplified using std::tie for better readability and maintainability. This avoids manual field-by-field comparison logic.

    bool operator<(const EventKey &o) const {
      return std::tie(src, dst, eventId) < std::tie(o.src, o.dst, o.eventId);
    }
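For reference, a minimal standalone version of the suggested operator (field types simplified to int here; the real struct uses PipelineType enums for src and dst):

```cpp
#include <tuple>

// Simplified stand-in for EventKey: std::tie builds tuples of references and
// compares them lexicographically, matching the manual field-by-field logic.
struct EventKey {
  int src, dst, eventId;
  bool operator<(const EventKey &o) const {
    return std::tie(src, dst, eventId) < std::tie(o.src, o.dst, o.eventId);
  }
};
```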

Comment on lines +383 to +390
  auto addUnique = [&](PipelineType src, PipelineType dst, int eid) {
    EventKey key{src, dst, eid};
    for (auto &existing : drainEvents) {
      if (existing == key)
        return;
    }
    drainEvents.push_back(key);
  };

medium

Instead of a manual linear search for uniqueness in a lambda, you can simply collect all events and then use std::unique after the sorting step at line 421. This is more idiomatic and efficient as the number of events grows.

  auto addUnique = [&](PipelineType src, PipelineType dst, int eid) {
    drainEvents.push_back({src, dst, eid});
  };

return;

// Sort for deterministic output.
llvm::sort(drainEvents);

medium

If the addUnique logic is simplified to just push_back, you should perform the deduplication here after sorting.

Suggested change
-  llvm::sort(drainEvents);
+  llvm::sort(drainEvents);
+  drainEvents.erase(std::unique(drainEvents.begin(), drainEvents.end()), drainEvents.end());
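A minimal sketch of the sort-then-unique pattern the suggestion relies on (plain ints stand in for EventKey, and std::sort for llvm::sort):

```cpp
#include <algorithm>
#include <vector>

// Sort-then-unique: std::unique only collapses *adjacent* duplicates, so the
// vector must be sorted first; erase trims the leftover tail elements.
std::vector<int> sortedUnique(std::vector<int> v) {
  std::sort(v.begin(), v.end());
  v.erase(std::unique(v.begin(), v.end()), v.end());
  return v;
}
```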


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ff9be95a47


Comment on lines +395 to +399
  for (auto *waitSync : pair.second) {
    if (waitSync->uselessSync || waitSync->eventIds.empty())
      continue;
    addUnique(waitSync->GetActualSrcPipe(), waitSync->GetActualDstPipe(),
              waitSync->eventIds[0]);

P1: Guard tail event drains by control-flow reachability

This unconditionally adds every collected loop-head event to the tail drain set, so AppendAutoSyncTailBarrierIfNeeded emits wait_flag for those events on all return paths. If a loop is conditionally skipped (e.g., loop inside an scf.if false branch), its pre-loop set_flag is never executed, but the function-tail drain wait still runs and can block on an uninitialized event, reintroducing hangs. The drain list needs to be filtered to events guaranteed to be initialized on the current return path (or emitted under matching control flow).
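The hazard reduces to a toy reachability check (illustrative only, not the pass's API): the tail wait is safe only on paths where the pre-loop set_flag actually executed.

```cpp
// If the loop sits under a branch that is not taken, its pre-loop set_flag
// never runs, so an unconditional tail wait_flag would block forever.
bool tailDrainSafe(bool loopReachable) {
  bool flag = false;  // event register starts uninitialized
  if (loopReachable) {
    flag = true;      // pre-loop set_flag
    // ... loop body with back-edge wait_flag / set_flag ...
  }
  return flag;        // tail wait_flag succeeds only if the flag was set
}
```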


Comment on lines +521 to +524
  if (!sync->isSyncWaitType())
    continue;
  if (sync->eventIds.empty())
    continue;

P2: Restrict pre-loop init to back-edge waits

This treats all waits hoisted to LOOP_BEGIN as back-edge waits and emits synthetic pre-loop set_flag for each, but MoveSyncState::PlanMoveOutWaitSync hoists any wait whose producer is outside the loop (not only loop-carried/back-edge cases). Initializing those ordinary waits here can satisfy them before the real producer runs and weaken required ordering. Filter candidates with an explicit back-edge marker (for example GetForEndIndex().has_value() / reallocated-loop-head-tail metadata) before adding them to preLoopInitWaits_.
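A sketch of the suggested filtering, assuming a back-edge marker such as an optional loop-end index (SyncOp, forEndIndex, and collectBackEdgeWaits are hypothetical names, not the pass's real API):

```cpp
#include <optional>
#include <vector>

// Hypothetical sync record: only waits carrying a back-edge marker (here an
// optional loop-end index) should receive a synthetic pre-loop set_flag.
struct SyncOp {
  bool isWait = false;
  std::optional<int> forEndIndex;  // set only for loop-carried back-edge waits
};

std::vector<const SyncOp *> collectBackEdgeWaits(const std::vector<SyncOp> &ops) {
  std::vector<const SyncOp *> out;
  for (const auto &op : ops)
    if (op.isWait && op.forEndIndex.has_value())  // skip ordinary hoisted waits
      out.push_back(&op);
  return out;
}
```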

