3 changes: 2 additions & 1 deletion include/PTO/Transforms/InsertSync/SyncEventIdAllocation.h
@@ -54,7 +54,7 @@ class SyncEventIdAllocation {
void SetEventId(SyncOperation *sync);

SmallVector<bool> GetEventPool(const SyncOperation *sync, size_t eventIdNum);
- int ScopePair(const SyncOperation *s);
+ int ScopePair(const SyncOperation *s) const;
void FindUseEventID(unsigned int begin, unsigned int end,
const SyncOperation *s, SmallVector<bool> &eventId);

@@ -91,6 +91,7 @@ class SyncEventIdAllocation {
SyncOperation *FindWidenSync(const SyncOperation *setSync,
const SyncOperation *waitSync);
void ClearEventId(const SyncOperation *sync);
+ bool scopePairHasLoopCarriedSync(int scopePair) const;

SmallVector<int>
GetAvailableEventId(SyncOperation *sync,
30 changes: 28 additions & 2 deletions lib/PTO/Transforms/InsertSync/SyncEventIdAllocation.cpp
@@ -253,7 +253,7 @@ SmallVector<bool> SyncEventIdAllocation::GetEventPool(const SyncOperation *sync,
return eventIdPool;
}

- int SyncEventIdAllocation::ScopePair(const SyncOperation *s) {
+ int SyncEventIdAllocation::ScopePair(const SyncOperation *s) const {
if (s->GetType() == SyncOperation::TYPE::SYNC_BLOCK_SET ||
s->GetType() == SyncOperation::TYPE::SYNC_BLOCK_WAIT) {
return 0;
@@ -480,11 +480,37 @@ void SyncEventIdAllocation::WidenEventId(SyncOps syncVector) {
bool canWiden = TryWidenByOtherSync(sync);
if (!canWiden) {
int scopePair = ScopePair(sync);
- reallocatedPipePair.insert(scopePair);
+ // Loop-carried syncs need a fully initialized head/tail schedule.
+ // Reallocating an entire scope that already contains back-edge pairs can
+ // rewrite those safe preheat/drain edges into mismatched waits.
+ if (!scopePairHasLoopCarriedSync(scopePair))
+ reallocatedPipePair.insert(scopePair);
Comment on lines +486 to +487

[P2] Reallocate non-loop syncs even when the scope has loop-carried deps

This guard skips reallocatedPipePair for the entire (src, dst) scope as soon as any loop-carried sync exists in that scope. Unrelated non-loop SET_EVENT syncs in the same pipe pair can therefore no longer take the reallocation path when they run out of IDs. Instead, they fall through to ChangeNoEventIdSyncToPipeAll and are downgraded to PIPE_ALL, which serializes execution globally and can cause major performance regressions for mixed loop/non-loop kernels that previously stayed on event-based sync.
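
To make the concern concrete, here is a toy model of the guard only, not the pass itself. `ToySync` and `scopeEligibleForReallocation` are hypothetical names, `loopCarried` stands in for `GetForEndIndex().has_value()`, and the fall-through to ChangeNoEventIdSyncToPipeAll described above is not modeled.

```cpp
#include <algorithm>
#include <cassert>  // only needed if you add checks like the ones in the note below
#include <vector>

// Hypothetical, simplified record of one sync; not the real SyncOperation.
struct ToySync {
  int scopePair;     // stand-in for the value returned by ScopePair(sync)
  bool loopCarried;  // stand-in for sync->GetForEndIndex().has_value()
};

// Mirrors the shape of the guard in the diff: a scope pair is eligible for
// reallocation only if no sync in that scope pair is loop-carried.
bool scopeEligibleForReallocation(int scopePair,
                                  const std::vector<ToySync> &syncs) {
  return std::none_of(syncs.begin(), syncs.end(),
                      [scopePair](const ToySync &s) {
                        return s.scopePair == scopePair && s.loopCarried;
                      });
}
```

With syncs `{1, true}`, `{1, false}`, `{2, false}`, scope pair 1 is ineligible even though one of its two syncs is not loop-carried, while scope pair 2 stays eligible. That is the mixed loop/non-loop case the comment is pointing at.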

}
}
}
}

+ bool SyncEventIdAllocation::scopePairHasLoopCarriedSync(int scopePair) const {
+   for (auto &element : syncIR_) {
+     for (auto *sync : element->pipeBefore) {
+       if (!sync || sync->uselessSync)
+         continue;
+       if (!sync->GetForEndIndex().has_value())
+         continue;
+       if (ScopePair(sync) == scopePair)
+         return true;
+     }
+     for (auto *sync : element->pipeAfter) {
+       if (!sync || sync->uselessSync)
+         continue;
+       if (!sync->GetForEndIndex().has_value())
+         continue;
+       if (ScopePair(sync) == scopePair)
+         return true;
+     }
+   }
+   return false;
+ }
Comment on lines +493 to +513

medium

The scopePairHasLoopCarriedSync function iterates over the entire syncIR_ structure, resulting in $O(N)$ complexity. Since this function is called within WidenEventId, which is itself invoked in a loop over syncIR_ in Allocate, the overall complexity of the allocation pass becomes $O(N^2)$. For large kernels with many operations, this could lead to significant compilation overhead.

Consider pre-calculating the set of scope pairs that contain loop-carried syncs once at the beginning of the Allocate method (after the initial allocation step) and storing it in a local set or a member variable to allow $O(1)$ lookups during the widening phase.
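
A minimal sketch of that suggestion, using stand-in types rather than the real pass: `MockSync`, `MockElement`, and `collectLoopCarriedScopePairs` are hypothetical names, and the `scopePair` field replaces the call to `ScopePair(sync)`. It assumes that which syncs count as loop-carried or useless does not change during widening; if that assumption does not hold in the pass, the set would have to be refreshed.

```cpp
#include <cassert>  // only needed if you add checks like the ones in the note below
#include <optional>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for SyncOperation; only the fields the scan reads.
struct MockSync {
  int scopePair = 0;                    // stand-in for ScopePair(sync)
  bool uselessSync = false;
  std::optional<unsigned> forEndIndex;  // stand-in for GetForEndIndex()
};

// Hypothetical stand-in for one syncIR_ element.
struct MockElement {
  std::vector<MockSync *> pipeBefore;
  std::vector<MockSync *> pipeAfter;
};

// One linear pass that records every scope pair owning a live loop-carried
// sync, applying the same filters as scopePairHasLoopCarriedSync.
std::unordered_set<int>
collectLoopCarriedScopePairs(const std::vector<MockElement> &syncIR) {
  std::unordered_set<int> result;
  auto visit = [&result](const std::vector<MockSync *> &syncs) {
    for (const MockSync *sync : syncs) {
      if (!sync || sync->uselessSync)
        continue;
      if (!sync->forEndIndex.has_value())
        continue;
      result.insert(sync->scopePair);
    }
  };
  for (const MockElement &element : syncIR) {
    visit(element.pipeBefore);
    visit(element.pipeAfter);
  }
  return result;
}
```

The widening step could then replace the per-call scan with a lookup such as `pairs.count(scopePair) == 0`, which is constant time on average for a hash set. In the real code base an LLVM container might be a more natural fit than `std::unordered_set`; that choice is left open here.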


void SyncEventIdAllocation::clearAllocatedEventId() {
// Remove generated BackwardSync