[codex] fix Cube kernel hang in generated C++ by zhangstevenunity · Pull Request #443 · hw-native-sys/PTOAS

zhangstevenunity · 2026-04-04T05:13:00Z

Summary

preserve the original event-id schedule for scope pairs that contain loop-carried syncs
add a regression test for the Cube kernel from issue #428 ("[Bug] generated C++ Cube kernel hangs with version >= 0.18")
keep the safe preheat and drain waits that prevent the generated C++ kernel from hanging

Root cause

WidenEventId was reallocating whole scope pairs even when they contained loop-carried/back-edge sync operations. That rewrite could drop the pre-loop set_flag / tail drain pairing for PIPE_M -> PIPE_MTE1, which left the generated Cube kernel with loop-head waits that were never primed.
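The hang described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `FlagModel`, `runLoop`, and the counter semantics are hypothetical stand-ins and do not reproduce the real `set_flag` / `wait_flag` hardware behaviour or the generated kernel. It shows why a loop-head wait needs a pre-loop set to prime it and a tail wait to drain the last set.

```cpp
#include <map>

// Toy model (assumption): each event ID is a counter. set increments it,
// wait consumes one count. Real hardware would stall on an empty flag;
// here waitFlag reports that case by returning false.
class FlagModel {
public:
  void setFlag(int eventId) { ++pending_[eventId]; }

  bool waitFlag(int eventId) {
    int &count = pending_[eventId];
    if (count == 0)
      return false; // would hang: nothing primed this wait
    --count;
    return true;
  }

  // True when every set has been matched by a wait.
  bool drained() const {
    for (const auto &entry : pending_)
      if (entry.second != 0)
        return false;
    return true;
  }

private:
  std::map<int, int> pending_;
};

// Models a loop whose head waits on the flag set at the tail of the previous
// iteration. With `preheat`, a pre-loop set primes the first wait and a
// post-loop wait drains the final set.
bool runLoop(bool preheat, int iterations) {
  FlagModel flags;
  const int kEvent = 0; // hypothetical event ID
  if (preheat)
    flags.setFlag(kEvent); // pre-loop set
  for (int i = 0; i < iterations; ++i) {
    if (!flags.waitFlag(kEvent))
      return false;        // loop-head wait was never primed
    flags.setFlag(kEvent); // loop-tail set for the next iteration
  }
  bool ok = true;
  if (preheat)
    ok = flags.waitFlag(kEvent); // tail drain
  return ok && flags.drained();
}
```

In this model `runLoop(true, n)` completes for any `n`, while `runLoop(false, n)` with `n >= 1` stops at the first loop-head wait, which mirrors the unprimed wait described in the root cause.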

Validation

build ptoas with -DPTO_ENABLE_PYTHON_BINDING=OFF -DBUILD_TESTING=OFF
run FileCheck on test/basic/issue428_cube_sync_regression.pto
run FileCheck on test/basic/insert_sync_level3_enable.pto
run FileCheck on test/basic/tinsert_a3_pipe_selection.pto
run FileCheck on test/basic/tinsert_a5_pipe_selection.pto
run FileCheck on test/basic/tmov_acc_mat_pipe_selection.pto
run FileCheck on test/basic/tpush_tpop_frontend_lowering_a3.pto with --check-prefix=SYNC-A3

Closes #428

gemini-code-assist

Code Review

This pull request prevents the reallocation of scope pairs containing loop-carried syncs during the event ID widening phase, ensuring that necessary head/tail schedules for loops remain intact. It also adds a regression test for issue #428. Feedback suggests optimizing the complexity of the loop-carried sync check by pre-calculating the affected scope pairs to avoid potential performance overhead in large kernels.

gemini-code-assist · 2026-04-04T05:14:09Z

lib/PTO/Transforms/InsertSync/SyncEventIdAllocation.cpp

+bool SyncEventIdAllocation::scopePairHasLoopCarriedSync(int scopePair) const {
+  for (auto &element : syncIR_) {
+    for (auto *sync : element->pipeBefore) {
+      if (!sync || sync->uselessSync)
+        continue;
+      if (!sync->GetForEndIndex().has_value())
+        continue;
+      if (ScopePair(sync) == scopePair)
+        return true;
+    }
+    for (auto *sync : element->pipeAfter) {
+      if (!sync || sync->uselessSync)
+        continue;
+      if (!sync->GetForEndIndex().has_value())
+        continue;
+      if (ScopePair(sync) == scopePair)
+        return true;
+    }
+  }
+  return false;
+}


The scopePairHasLoopCarriedSync function iterates over the entire syncIR_ structure, resulting in $O(N)$ complexity. Since this function is called within WidenEventId, which is itself invoked in a loop over syncIR_ in Allocate, the overall complexity of the allocation pass becomes $O(N^2)$. For large kernels with many operations, this could lead to significant compilation overhead.

Consider pre-calculating the set of scope pairs that contain loop-carried syncs once at the beginning of the Allocate method (after the initial allocation step) and storing it in a local set or a member variable to allow $O(1)$ lookups during the widening phase.
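A minimal sketch of that pre-calculation, assuming simplified stand-in types: `SyncOp`, `SyncElement`, and `collectLoopCarriedScopePairs` are hypothetical names, and the `scopePair` field stands in for the result of `ScopePair(sync)`. It is not the pass's actual code. The filtering mirrors the checks in the diff above.

```cpp
#include <optional>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified stand-ins for the pass's data structures.
struct SyncOp {
  bool uselessSync = false;
  std::optional<int> forEndIndex; // set only for loop-carried syncs
  int scopePair = 0;              // stand-in for ScopePair(sync)
};

struct SyncElement {
  std::vector<const SyncOp *> pipeBefore;
  std::vector<const SyncOp *> pipeAfter;
};

// Single pass over the IR: collect every scope pair that contains a live
// loop-carried sync, so later queries are average O(1) set lookups.
std::unordered_set<int>
collectLoopCarriedScopePairs(const std::vector<SyncElement> &syncIR) {
  std::unordered_set<int> result;
  auto visit = [&result](const std::vector<const SyncOp *> &syncs) {
    for (const SyncOp *sync : syncs) {
      if (!sync || sync->uselessSync)
        continue;
      if (sync->forEndIndex.has_value())
        result.insert(sync->scopePair);
    }
  };
  for (const SyncElement &element : syncIR) {
    visit(element.pipeBefore);
    visit(element.pipeAfter);
  }
  return result;
}
```

The widening phase could then test `loopCarried.count(scopePair) != 0` instead of rescanning the IR on every call. The set would need rebuilding if an earlier step changes `uselessSync` or the scope-pair assignment.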

zhangstevenunity · 2026-04-04T12:27:46Z

/run a5 test/basic/issue428_cube_sync_regression.pto

reedhecre · 2026-04-04T12:30:04Z

A5 board test failed

Trigger: manual
Source commit: c48bd1a067eb
Result summary: OK 0 / FAIL 1 / SKIP 0
Log: /root/ptoas-board-monitor-a5/logs/20260404_202813_manual_pr443.log
Manual command: /run a5 test/basic/issue428_cube_sync_regression.pto
Triggered by: zhangstevenunity
Specified test case: test/basic/issue428_cube_sync_regression.pto
Triggering comment: [codex] fix Cube kernel hang in generated C++ #443 (comment)
Failed stage: board-validation / exit=1

Failed test cases

issue428_cube_sync_regression (run, exit=2)

reedhecre · 2026-04-04T12:30:06Z

A5 board test failure details: PR #443

issue428_cube_sync_regression

stage=run info=exit=2

/tmp/ptoas-board-monitor-a5/runs/20260404_202813_manual_pr443/npu_validation/basic/issue428_cube_sync_regression/issue428_cube_sync_regression_kernel.cpp:145:12: error: the ranges of 1st parameter must be [0, 1], [4, 5]
  set_flag(PIPE_M, PIPE_MTE1, EVENT_ID2);
           ^
/tmp/ptoas-board-monitor-a5/runs/20260404_202813_manual_pr443/npu_validation/basic/issue428_cube_sync_regression/issue428_cube_sync_regression_kernel.cpp:145:20: error: the ranges of 2nd parameter must be [0, 1], [4, 5]
  set_flag(PIPE_M, PIPE_MTE1, EVENT_ID2);
                   ^
/tmp/ptoas-board-monitor-a5/runs/20260404_202813_manual_pr443/npu_validation/basic/issue428_cube_sync_regression/issue428_cube_sync_regression_kernel.cpp:146:12: error: the ranges of 1st parameter must be [0, 1], [4, 5]
  set_flag(PIPE_M, PIPE_MTE1, EVENT_ID3);
           ^
/tmp/ptoas-board-monitor-a5/runs/20260404_202813_manual_pr443/npu_validation/basic/issue428_cube_sync_regression/issue428_cube_sync_regression_kernel.cpp:146:20: error: the ranges of 2nd parameter must be [0, 1], [4, 5]
  set_flag(PIPE_M, PIPE_MTE1, EVENT_ID3);
                   ^
/tmp/ptoas-board-monitor-a5/runs/20260404_202813_manual_pr443/npu_validation/basic/issue428_cube_sync_regression/issue428_cube_sync_regression_kernel.cpp:147:12: error: the ranges of 1st parameter must be [0, 1], [4, 5]
  set_flag(PIPE_FIX, PIPE_M, EVENT_ID2);
           ^
/tmp/ptoas-board-monitor-a5/runs/20260404_202813_manual_pr443/npu_validation/basic/issue428_cube_sync_regression/issue428_cube_sync_regression_kernel.cpp:147:22: error: the ranges of 2nd parameter must be [0, 1], [4, 5]
  set_flag(PIPE_FIX, PIPE_M, EVENT_ID2);
                     ^
/tmp/ptoas-board-monitor-a5/runs/20260404_202813_manual_pr443/npu_validation/basic/issue428_cube_sync_regression/issue428_cube_sync_regression_kernel.cpp:148:12: error: the ranges of 1st parameter must be [0, 1], [4, 5]
  set_flag(PIPE_FIX, PIPE_M, EVENT_ID6);
           ^
/tmp/ptoas-board-monitor-a5/runs/20260404_202813_manual_pr443/npu_validation/basic/issue428_cube_sync_regression/issue428_cube_sync_regression_kernel.cpp:148:22: error: the ranges of 2nd parameter must be [0, 1], [4, 5]
  set_flag(PIPE_FIX, PIPE_M, EVENT_ID6);
                     ^
/tmp/ptoas-board-monitor-a5/runs/20260404_202813_manual_pr443/npu_validation/basic/issue428_cube_sync_regression/issue428_cube_sync_regression_kernel.cpp:150:23: error: the ranges of 2nd parameter must be [0, 1], [4, 5]
  set_flag(PIPE_MTE2, PIPE_MTE1, EVENT_ID0);
                      ^
/tmp/ptoas-board-monitor-a5/runs/20260404_202813_manual_pr443/npu_validation/basic/issue428_cube_sync_regression/issue428_cube_sync_regression_kernel.cpp:151:24: error: the ranges of 2nd parameter must be [0, 1], [4, 5]
  wait_flag(PIPE_MTE2, PIPE_MTE1, EVENT_ID0);
                       ^
/tmp/ptoas-board-monitor-a5/runs/20260404_202813_manual_pr443/npu_validation/basic/issue428_cube_sync_regression/issue428_cube_sync_regression_kernel.cpp:154:12: error: the ranges of 1st parameter must be [0, 1], [4, 5]
  set_flag(PIPE_MTE1, PIPE_M, EVENT_ID0);
           ^
/tmp/ptoas-board-monitor-a5/runs/20260404_202813_manual_pr443/npu_validation/basic/issue428_cube_sync_regression/issue428_cube_sync_regression_kernel.cpp:154:23: error: the ranges of 2nd parameter must be [0, 1], [4, 5]
  set_flag(PIPE_MTE1, PIPE_M, EVENT_ID0);
                      ^
/tmp/ptoas-board-monitor-a5/runs/20260404_202813_manual_pr443/npu_validation/basic/issue428_cube_sync_regression/issue428_cube_sync_regression_kernel.cpp:155:13: error: the ranges of 1st parameter must be [0, 1], [4, 5]
  wait_flag(PIPE_MTE1, PIPE_M, EVENT_ID0);
            ^
/tmp/ptoas-board-monitor-a5/runs/20260404_202813_manual_pr443/npu_validation/basic/issue428_cube_sync_regression/issue428_cube_sync_regression_kernel.cpp:155:24: error: the ranges of 2nd parameter must be [0, 1], [4, 5]
  wait_flag(PIPE_MTE1, PIPE_M, EVENT_ID0);
                       ^
/tmp/ptoas-board-monitor-a5/runs/20260404_202813_manual_pr443/npu_validation/basic/issue428_cube_sync_regression/issue428_cube_sync_regression_kernel.cpp:157:12: error: the ranges of 1st parameter must be [0, 1], [4, 5]
  set_flag(PIPE_M, PIPE_FIX, EVENT_ID0);
           ^
/tmp/ptoas-board-monitor-a5/runs/20260404_202813_manual_pr443/npu_validation/basic/issue428_cube_sync_regression/issue428_cube_sync_regression_kernel.cpp:157:20: error: the ranges of 2nd parameter must be [0, 1], [4, 5]
  set_flag(PIPE_M, PIPE_FIX, EVENT_ID0);
                   ^
/tmp/ptoas-board-monitor-a5/runs/20260404_202813_manual_pr443/npu_validation/basic/issue428_cube_sync_regression/issue428_cube_sync_regression_kernel.cpp:158:12: error: the ranges of 1st parameter must be [0, 1], [4, 5]
  set_flag(PIPE_M, PIPE_MTE1, EVENT_ID0);
           ^
/tmp/ptoas-board-monitor-a5/runs/20260404_202813_manual_pr443/npu_validation/basic/issue428_cube_sync_regression/issue428_cube_sync_regression_kernel.cpp:158:20: error: the ranges of 2nd parameter must be [0, 1], [4, 5]
  set_flag(PIPE_M, PIPE_MTE1, EVENT_ID0);
                   ^
/tmp/ptoas-board-monitor-a5/runs/20260404_202813_manual_pr443/npu_validation/basic/issue428_cube_sync_regression/issue428_cube_sync_regression_kernel.cpp:159:13: error: the ranges of 1st parameter must be [0, 1], [4, 5]
  wait_flag(PIPE_M, PIPE_FIX, EVENT_ID0);
            ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
gmake[2]: *** [CMakeFiles/issue428_cube_sync_regression_kernel.dir/build.make:76: CMakeFiles/issue428_cube_sync_regression_kernel.dir/issue428_cube_sync_regression_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/issue428_cube_sync_regression_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-04-04 20:30:00] ERROR: testcase failed (exit 2): issue428_cube_sync_regression
[2026-04-04 20:30:00] === SUMMARY ===
[2026-04-04 20:30:00] OK=0 FAIL=1 SKIP=0
[2026-04-04 20:30:00] RESULTS_TSV=/tmp/ptoas-board-monitor-a5/runs/20260404_202813_manual_pr443/remote_npu_validation_results.tsv

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7198be31a6

chatgpt-codex-connector · 2026-04-04T12:31:25Z

lib/PTO/Transforms/InsertSync/SyncEventIdAllocation.cpp

+        if (!scopePairHasLoopCarriedSync(scopePair))
+          reallocatedPipePair.insert(scopePair);


Reallocate non-loop syncs even when scope has loop-carried deps

This guard skips reallocatedPipePair for the entire (src,dst) scope as soon as any loop-carried sync exists in that scope, so unrelated non-loop SET_EVENT syncs in the same pipe pair can no longer use the reallocation path when they run out of IDs. In that case they fall through to ChangeNoEventIdSyncToPipeAll and are downgraded to PIPE_ALL, which serializes execution globally and can cause major performance regressions for mixed loop/non-loop kernels that previously stayed on event-based sync.
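The difference between the two guard granularities can be sketched as a small decision model. Everything here is an illustrative assumption: `SyncInfo`, `Action`, `scopeLevelGuard`, and `perSyncGuard` are hypothetical names, and whether the pass can safely reallocate only part of a scope pair is not established by this thread.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical model of a sync op in one (src, dst) scope pair.
struct SyncInfo {
  bool loopCarried = false;
};

// What happens to a sync that ran out of event IDs.
enum class Action { Reallocate, DowngradeToPipeAll, KeepOriginalSchedule };

// Scope-level guard, as the comment describes the current change: a single
// loop-carried sync blocks reallocation for the whole scope pair, so a
// non-loop sync that needs an ID falls through to PIPE_ALL.
Action scopeLevelGuard(const std::vector<SyncInfo> &scope) {
  bool anyLoopCarried =
      std::any_of(scope.begin(), scope.end(),
                  [](const SyncInfo &sync) { return sync.loopCarried; });
  return anyLoopCarried ? Action::DowngradeToPipeAll : Action::Reallocate;
}

// Per-sync guard, the alternative the comment title points at: only the
// loop-carried sync keeps its original schedule, while non-loop syncs in
// the same scope pair stay eligible for reallocation.
Action perSyncGuard(const SyncInfo &sync) {
  return sync.loopCarried ? Action::KeepOriginalSchedule : Action::Reallocate;
}
```

For a scope holding one loop-carried and one non-loop sync, the scope-level model yields `DowngradeToPipeAll` for the non-loop sync, while the per-sync model yields `Reallocate`.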

zhangstevenunity · 2026-04-05T02:48:08Z

/run a3 test/basic/issue428_cube_sync_regression.pto

fix(sync): preserve loop-carried event schedule

7198be3

gemini-code-assist bot reviewed Apr 4, 2026

zhangstevenunity marked this pull request as ready for review April 4, 2026 12:27

chatgpt-codex-connector bot reviewed Apr 4, 2026

zhangstevenunity wants to merge 1 commit into main from codex/fix-issue-428


