[codex] fix Cube kernel hang in generated C++#443
[codex] fix Cube kernel hang in generated C++#443zhangstevenunity wants to merge 1 commit intomainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request prevents the reallocation of scope pairs containing loop-carried syncs during the event ID widening phase, ensuring that necessary head/tail schedules for loops remain intact. It also adds a regression test for issue #428. Feedback suggests optimizing the complexity of the loop-carried sync check by pre-calculating the affected scope pairs to avoid potential performance overhead in large kernels.
| bool SyncEventIdAllocation::scopePairHasLoopCarriedSync(int scopePair) const { | ||
| for (auto &element : syncIR_) { | ||
| for (auto *sync : element->pipeBefore) { | ||
| if (!sync || sync->uselessSync) | ||
| continue; | ||
| if (!sync->GetForEndIndex().has_value()) | ||
| continue; | ||
| if (ScopePair(sync) == scopePair) | ||
| return true; | ||
| } | ||
| for (auto *sync : element->pipeAfter) { | ||
| if (!sync || sync->uselessSync) | ||
| continue; | ||
| if (!sync->GetForEndIndex().has_value()) | ||
| continue; | ||
| if (ScopePair(sync) == scopePair) | ||
| return true; | ||
| } | ||
| } | ||
| return false; | ||
| } |
There was a problem hiding this comment.
The scopePairHasLoopCarriedSync function iterates over the entire syncIR_ structure, resulting in WidenEventId, which is itself invoked in a loop over syncIR_ in Allocate, the overall complexity of the allocation pass becomes
Consider pre-calculating the set of scope pairs that contain loop-carried syncs once at the beginning of the Allocate method (after the initial allocation step) and storing it in a local set or a member variable to allow
|
/run a5 test/basic/issue428_cube_sync_regression.pto |
A5 板测失败
失败用例
|
A5 板测失败详情:PR #443issue428_cube_sync_regression
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7198be31a6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (!scopePairHasLoopCarriedSync(scopePair)) | ||
| reallocatedPipePair.insert(scopePair); |
There was a problem hiding this comment.
Reallocate non-loop syncs even when scope has loop-carried deps
This guard skips reallocatedPipePair for the entire (src,dst) scope as soon as any loop-carried sync exists in that scope, so unrelated non-loop SET_EVENT syncs in the same pipe pair can no longer use the reallocation path when they run out of IDs. In that case they fall through to ChangeNoEventIdSyncToPipeAll and are downgraded to PIPE_ALL, which serializes execution globally and can cause major performance regressions for mixed loop/non-loop kernels that previously stayed on event-based sync.
Useful? React with 👍 / 👎.
|
/run a3 test/basic/issue428_cube_sync_regression.pto |
Summary
Root cause
WidenEventIdwas reallocating whole scope pairs even when they contained loop-carried/back-edge sync operations. That rewrite could drop the pre-loopset_flag/ tail drain pairing forPIPE_M -> PIPE_MTE1, which left the generated Cube kernel with loop-head waits that were never primed.Validation
ptoaswith-DPTO_ENABLE_PYTHON_BINDING=OFF -DBUILD_TESTING=OFFtest/basic/issue428_cube_sync_regression.ptotest/basic/insert_sync_level3_enable.ptotest/basic/tinsert_a3_pipe_selection.ptotest/basic/tinsert_a5_pipe_selection.ptotest/basic/tmov_acc_mat_pipe_selection.ptotest/basic/tpush_tpop_frontend_lowering_a3.ptowith--check-prefix=SYNC-A3Closes #428