[A5][Sync] remove identity tmov before insert-sync #420
TaoTao-real wants to merge 16 commits into hw-native-sys:main
Code Review
This pull request introduces a new optimization pass, PTORemoveIdentityTMovPass, which removes identity pto.tmov operations where the source and destination are the same SSA value. This pass is specifically gated for the A5 architecture and is integrated into the ptoas tool to run before pto-insert-sync to prevent unnecessary synchronization edges. The review feedback suggests optimizing the pass implementation by performing the erasure directly within the walk callback, eliminating the need for an intermediate SmallVector and a secondary loop.
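To make the behavior described above concrete, here is a minimal, self-contained sketch of the same idea on a toy IR. It is not the PR's implementation: `ToyOp`, `isIdentityTMov`, and `removeIdentityTMov` are hypothetical names, value ids are plain integers, and the arch gate is modeled as a string comparison purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Toy stand-in for an SSA operation; NOT the real MLIR/PTO data structures.
struct ToyOp {
  std::string name;           // e.g. "tmov", "tadd"
  std::vector<int> operands;  // value ids; for "tmov" this is {src, dst}
  std::optional<int> result;  // value id produced by the op, if any
};

// True for a "tmov" whose src and dst are the same value id.
inline bool isIdentityTMov(const ToyOp &op) {
  // In this toy model every tmov carries exactly two operands.
  assert(op.name != "tmov" || op.operands.size() == 2);
  return op.name == "tmov" && op.operands[0] == op.operands[1];
}

// Erases identity tmovs when arch is "a5" and returns how many were erased.
// Any other arch leaves the op list untouched.
inline int removeIdentityTMov(std::vector<ToyOp> &ops,
                              const std::string &arch) {
  if (arch != "a5")
    return 0;
  int erased = 0;
  for (std::size_t i = 0; i < ops.size();) {
    if (!isIdentityTMov(ops[i])) {
      ++i;
      continue;
    }
    if (ops[i].result) {
      // Rewire every use of the tmov result to dst before erasing the op.
      const int oldId = *ops[i].result;
      const int newId = ops[i].operands[1];
      for (ToyOp &user : ops)
        std::replace(user.operands.begin(), user.operands.end(), oldId, newId);
    }
    ops.erase(ops.begin() + static_cast<std::ptrdiff_t>(i));
    ++erased;
  }
  return erased;
}
```

In this model a later sync-insertion step would simply never see the erased op, which is the effect the pass is meant to have ahead of pto-insert-sync.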
    SmallVector<TMovOp> toErase;
    funcOp.walk([&](TMovOp op) {
      if (canEraseIdentityTMov(op))
        toErase.push_back(op);
    });

    for (TMovOp op : toErase) {
      Value result = op.getResult();
      if (result && !result.use_empty())
        result.replaceAllUsesWith(op.getDst());
      op.erase();
    }
The current implementation first collects all TMovOps to be erased into a SmallVector and then iterates over this vector to perform the erasure. This two-step process can be simplified and made more efficient. You can perform the erasure directly within the walk callback. Since TMovOp has no regions, it's safe to erase it during the walk, which avoids the need for intermediate storage and a second loop.
    funcOp.walk([&](TMovOp op) {
      if (canEraseIdentityTMov(op)) {
        Value result = op.getResult();
        if (result && !result.use_empty()) {
          result.replaceAllUsesWith(op.getDst());
        }
        op.erase();
      }
    });
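The single-traversal suggestion relies on the traversal having already stepped past an element before the callback erases it. My understanding is that MLIR's default post-order walk tolerates erasing the op being visited for this reason, but that is worth confirming against the MLIR documentation. The sketch below shows the same "advance first, then erase" pattern on a plain `std::list`; `ToyMove` and `eraseIdentityMovesInPlace` are hypothetical names, not part of the PR.

```cpp
#include <list>
#include <string>

// Toy stand-in for a move op: copies from `src` to `dst`.
struct ToyMove {
  std::string src;
  std::string dst;
};

// Erases identity moves (src == dst) in one traversal, with no side buffer.
// Returns the number of erased elements.
inline int eraseIdentityMovesInPlace(std::list<ToyMove> &moves) {
  int erased = 0;
  for (auto it = moves.begin(); it != moves.end();) {
    // Advance before any erasure so `it` never refers to a removed node.
    auto current = it++;
    if (current->src == current->dst) {
      moves.erase(current);
      ++erased;
    }
  }
  return erased;
}
```

Compared with the collect-then-erase version, this drops the intermediate container and the second loop. It is only safe because erasing a `std::list` node leaves iterators to other nodes valid.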
/run A5 identity_tmov_autosync_a5_only
A5 board test passed

/run a5 test/basic/identity_tmov_autosync_a5_only.pto
A5 board test passed

/run A5 identity_tmov_if_else_alias_a5
A5 board test failed
Log tail

/run A5 identity_tmov_if_else_alias_a5 --pto-level=level3
A5 board test passed

/run A5 issue828_softmax_rescale_incore_1_a5 --pto-level=level3
A5 board test failed
Failed cases
A5 board test failure details: PR #420 issue828_softmax_rescale_incore_1_a5

/run A5 issue828_softmax_rescale_incore_1_a5 --pto-level=level3
A5 board test failed
Failed cases
A5 board test failure details: PR #420 issue828_softmax_rescale_incore_1_a5

/run A5 identity_tmov_if_else_alias_a5 --pto-level=level3
A5 board test passed

/run A5 issue828_softmax_rescale_incore_1_a5 --pto-level=level3
A5 board test passed

/run A5 issue828_softmax_rescale_incore_1_a5 --pto-level=level3
A5 board test failure details: PR #420 issue828_diag_else_3tmov_only_a5
A5 board test passed

/run A5 issue828_diag_if_identity_only_a5 --pto-level=level3 --disable-identity-tmov-cleanup
A5 board test failed
Failed cases
A5 board test failure details: PR #420 issue828_diag_if_identity_only_a5

/run a5 issue828_softmax_rescale_incore_1_a5_if_aligned issue828_softmax_rescale_incore_1_a5_else_aligned --pto-level=level3
A5 board test failed
Failed cases
A5 board test failure details: PR #420 issue828_softmax_rescale_incore_1_a5_else_aligned

/run A5 issue828_diag_else_trace_print_a5 --pto-level=level3
A5 board test failed
Failed cases
A5 board test failure details: PR #420 issue828_diag_else_trace_print_a5

/run A5 issue828_diag_else_trace_print_a5 --pto-level=level3
A5 board test passed

/run A5 issue828_diag_else_3tmov_only_a5 --pto-level=level3
A5 board test failed
Failed cases
A5 board test failure details: PR #420 issue828_diag_else_3tmov_only_a5

/run a5 issue828_diag_else_3tmov_only_a5 issue828_diag_else_3tmov_vrow1_a5 issue828_diag_else_3tmov_rowmajor_a5 --pto-level=level3
A5 board test failed
Failed cases
A5 board test failure details: PR #420 issue828_diag_else_3tmov_only_a5

/run a5 issue828_diag_else_3tmov_vrow1_a5 --pto-level=level3
A5 board test passed

/run a5 issue828_diag_else_3tmov_rowmajor_a5 --pto-level=level3
A5 board test passed

/run a5 issue828_diag_else_3tmov_vrow1_a5 --pto-level=level3
A5 board test passed

Summary
- [...] (pto-remove-identity-tmov) that erases pto.tmov ins(%x) outs(%x) before auto-sync runs.
- [...] ptoas directly before PTOInsertSync when --enable-insert-sync is enabled.
- [...] test/basic/identity_tmov_autosync_a5_only.pto.

Motivation
- [...] tmov being treated as a real producer/consumer by auto-sync, which can create spurious sync edges around a hardware no-op move.

Design
- [...] PTORemoveIdentityTMovPass (func pass).
- [...] pto.target_arch == "a5".
- [...] tmov (src == dst SSA value).
- [...] dst, rewires uses to dst before erase.
- [...] PTOResolveReservedBuffers -> PTORemoveIdentityTMov -> PTOInsertSync.

Testing
- ninja -C /Users/lishengtao/Documents/PTO/_codex_worktrees/ptoas_identity_tmov_a5/build ptoas
- ptoas --pto-arch=a5 --enable-insert-sync test/basic/identity_tmov_autosync_a5_only.pto | FileCheck ... --check-prefix=A5
- ptoas --pto-arch=a3 --enable-insert-sync test/basic/identity_tmov_autosync_a5_only.pto | FileCheck ... --check-prefix=A3
- ptoas --pto-arch=a5 --enable-insert-sync test/basic/tmov_acc_mat_pipe_selection.pto | FileCheck ...

Risk / Rollback