[A5][Sync] remove identity tmov before insert-sync #420

Open
TaoTao-real wants to merge 16 commits into hw-native-sys:main from TaoTao-real:codex/a5-identity-tmov-cleanup-zsu

@TaoTao-real
Contributor

Summary

  • Add an A5-only cleanup pass (pto-remove-identity-tmov) that erases pto.tmov ins(%x) outs(%x) before auto-sync runs.
  • Wire the pass in ptoas directly before PTOInsertSync when --enable-insert-sync is enabled.
  • Add regression test test/basic/identity_tmov_autosync_a5_only.pto.

Motivation

  • Fixes the A5 hang risk caused by identity tmov being treated as a real producer/consumer by auto-sync, which can create spurious sync edges around a hardware no-op move.
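To make the spurious-edge concern concrete, here is a minimal sketch in plain C++. It is not MLIR and not the PTOInsertSync algorithm. It assumes a toy dependency model in which every pair of ops on different pipes that touch the same buffer, with at least one write, costs one sync edge. `ToyAccess`, `countSyncEdges`, and the pipe names `LOAD`, `MOVE`, and `VEC` are hypothetical names introduced only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy op: which pipe runs it and which buffers it reads/writes.
struct ToyAccess {
  std::string pipe;
  std::vector<int> reads;
  std::vector<int> writes;
};

static bool touches(const std::vector<int> &bufs, int buf) {
  return std::find(bufs.begin(), bufs.end(), buf) != bufs.end();
}

// True if `later` has a RAW, WAW, or WAR hazard against `earlier`.
static bool conflicts(const ToyAccess &earlier, const ToyAccess &later) {
  for (int w : earlier.writes)
    if (touches(later.reads, w) || touches(later.writes, w))
      return true;  // RAW or WAW
  for (int r : earlier.reads)
    if (touches(later.writes, r))
      return true;  // WAR
  return false;
}

// Counts conflicting pairs on different pipes; in this toy model each such
// pair stands for one sync edge.
std::size_t countSyncEdges(const std::vector<ToyAccess> &ops) {
  std::size_t edges = 0;
  for (std::size_t i = 0; i < ops.size(); ++i)
    for (std::size_t j = i + 1; j < ops.size(); ++j)
      if (ops[i].pipe != ops[j].pipe && conflicts(ops[i], ops[j]))
        ++edges;
  return edges;
}

// A producer and a consumer of buffer 0, with and without an identity move
// (reads and writes buffer 0) placed on a third, made-up pipe.
void exampleSpuriousEdges() {
  std::vector<ToyAccess> without = {{"LOAD", {}, {0}}, {"VEC", {0}, {1}}};
  std::vector<ToyAccess> with = {
      {"LOAD", {}, {0}}, {"MOVE", {0}, {0}}, {"VEC", {0}, {1}}};
  assert(countSyncEdges(without) == 1);
  assert(countSyncEdges(with) == 3);  // two extra edges around the no-op move
}
```

In this model the identity move adds two edges that protect no real data movement, which is the kind of edge the motivation above describes.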

Design

  • New pass: PTORemoveIdentityTMovPass (func pass).
  • Gated by module attribute pto.target_arch == "a5".
  • Removes only tmov ops that are provably identity moves (src and dst are the same SSA value).
  • If the optional result is used and is type-compatible with dst, its uses are rewired to dst before the op is erased.
  • Pipeline placement: PTOResolveReservedBuffers -> PTORemoveIdentityTMov -> PTOInsertSync.
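The design bullets above can be sketched on a toy IR. This is an illustration in plain C++ under stated assumptions, not the code of PTORemoveIdentityTMovPass: `ToyOp`, `ValueId`, `kNoValue`, `isIdentityTMov`, and `removeIdentityTMov` are hypothetical names, SSA values are modelled as integers, the arch gate is a plain string argument rather than a module attribute, and the type-compatibility check is omitted because the toy values carry no types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using ValueId = int;          // toy stand-in for an SSA value
const ValueId kNoValue = -1;  // marks "op has no result"

struct ToyOp {
  std::string name;               // e.g. "tmov"
  std::vector<ValueId> operands;  // for "tmov": {src, dst}
  ValueId result;                 // kNoValue if the op has no result
};

// Identity only when src and dst are literally the same value.
bool isIdentityTMov(const ToyOp &op) {
  return op.name == "tmov" && op.operands.size() == 2 &&
         op.operands[0] == op.operands[1];
}

// Erases identity moves when arch is "a5"; returns how many were erased.
std::size_t removeIdentityTMov(std::vector<ToyOp> &ops,
                               const std::string &arch) {
  if (arch != "a5")
    return 0;  // other targets are left untouched

  // Step 1: decide what to erase on the unmodified op list.
  std::vector<bool> isErased(ops.size());
  for (std::size_t i = 0; i < ops.size(); ++i)
    isErased[i] = isIdentityTMov(ops[i]);

  // Step 2: rewire uses of each erased op's result to its dst.
  for (std::size_t i = 0; i < ops.size(); ++i) {
    if (!isErased[i] || ops[i].result == kNoValue)
      continue;
    assert(ops[i].operands.size() == 2);
    const ValueId result = ops[i].result;
    const ValueId dst = ops[i].operands[1];
    for (ToyOp &user : ops)
      std::replace(user.operands.begin(), user.operands.end(), result, dst);
  }

  // Step 3: drop the erased ops.
  std::vector<ToyOp> kept;
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (!isErased[i])
      kept.push_back(ops[i]);
  const std::size_t erased = ops.size() - kept.size();
  ops.swap(kept);
  return erased;
}
```

For example, with ops `{tmov(1,1) -> 2, tadd(2,3) -> 4, tmov(1,5)}` and arch `"a5"`, the first op is erased and `tadd` ends up reading value 1, while the non-identity `tmov(1,5)` is kept. With arch `"a3"` nothing changes.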

Testing

  • Built: ninja -C /Users/lishengtao/Documents/PTO/_codex_worktrees/ptoas_identity_tmov_a5/build ptoas
  • New targeted checks:
    • ptoas --pto-arch=a5 --enable-insert-sync test/basic/identity_tmov_autosync_a5_only.pto | FileCheck ... --check-prefix=A5
    • ptoas --pto-arch=a3 --enable-insert-sync test/basic/identity_tmov_autosync_a5_only.pto | FileCheck ... --check-prefix=A3
  • Extra guard:
    • ptoas --pto-arch=a5 --enable-insert-sync test/basic/tmov_acc_mat_pipe_selection.pto | FileCheck ...

Risk / Rollback

  • Risk is low: behavior change is strictly A5-scoped and only for syntactic identity moves.
  • Rollback is straightforward: revert this PR.

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a new optimization pass, PTORemoveIdentityTMovPass, which removes identity pto.tmov operations where the source and destination are the same SSA value. This pass is specifically gated for the A5 architecture and is integrated into the ptoas tool to run before pto-insert-sync to prevent unnecessary synchronization edges. The review feedback suggests optimizing the pass implementation by performing the erasure directly within the walk callback, eliminating the need for an intermediate SmallVector and a secondary loop.

Comment on lines +56 to +67
    SmallVector<TMovOp> toErase;
    funcOp.walk([&](TMovOp op) {
      if (canEraseIdentityTMov(op))
        toErase.push_back(op);
    });

    for (TMovOp op : toErase) {
      Value result = op.getResult();
      if (result && !result.use_empty())
        result.replaceAllUsesWith(op.getDst());
      op.erase();
    }

medium

The current implementation first collects all TMovOps to be erased into a SmallVector and then iterates over this vector to perform the erasure. This two-step process can be simplified and made more efficient. You can perform the erasure directly within the walk callback. Since TMovOp has no regions, it's safe to erase it during the walk, which avoids the need for intermediate storage and a second loop.

    funcOp.walk([&](TMovOp op) {
      if (canEraseIdentityTMov(op)) {
        Value result = op.getResult();
        if (result && !result.use_empty()) {
          result.replaceAllUsesWith(op.getDst());
        }
        op.erase();
      }
    });
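As an analogy for the two shapes discussed in this review comment, the sketch below compares collect-then-erase with erase-during-traversal on a `std::list`. It is plain C++ and does not use MLIR. `OpList`, `isErasable`, `eraseTwoStep`, and `eraseEarlyInc` are hypothetical names, and negative integers stand in for identity moves. Erasing inside the loop is safe here only because the iterator is advanced before the current node is erased. Whether the same holds for a given `walk` depends on its traversal order, so this is an illustration of the general pattern rather than a statement about the pass.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

using OpList = std::list<int>;  // toy "block"; each int stands for one op

// Hypothetical predicate: negative entries play the role of identity moves.
bool isErasable(int op) { return op < 0; }

// Two-step form: remember what to erase, then erase in a second loop.
std::size_t eraseTwoStep(OpList &ops) {
  std::vector<OpList::iterator> toErase;
  for (auto it = ops.begin(); it != ops.end(); ++it)
    if (isErasable(*it))
      toErase.push_back(it);
  // std::list::erase invalidates only the iterator to the erased node,
  // so the remaining saved iterators stay valid.
  for (auto it : toErase)
    ops.erase(it);
  return toErase.size();
}

// One-step form: advance first, then erase the node just left behind.
std::size_t eraseEarlyInc(OpList &ops) {
  std::size_t erased = 0;
  for (auto it = ops.begin(); it != ops.end();) {
    auto cur = it++;  // move on before `cur` can be invalidated
    if (isErasable(*cur)) {
      ops.erase(cur);
      ++erased;
    }
  }
  return erased;
}

// Both forms erase the same elements and leave the same list behind.
void exampleBothAgree() {
  OpList a = {1, -1, 2, -2, -3};
  OpList b = a;
  assert(eraseTwoStep(a) == 3);
  assert(eraseEarlyInc(b) == 3);
  assert(a == b);
}
```

The one-step form saves the temporary vector. The two-step form keeps the erase decision separate from mutation, which can be easier to reason about when the predicate inspects neighbouring ops.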

@TaoTao-real
Contributor Author

/run A5 identity_tmov_autosync_a5_only

@reedhecre

A5 board test passed

  • Trigger: manual
  • Source commit: b0feca8ad536
  • Result summary: OK 0 / FAIL 0 / SKIP 0
  • Log: /root/ptoas-board-monitor-a5/logs/20260402_103706_manual_pr420.log
  • Results TSV: /root/ptoas-board-monitor-a5/logs/20260402_103706_manual_pr420.tsv
  • Manual command: /run a5 identity_tmov_autosync_a5_only
  • Triggered by: TaoTao-real
  • Specified cases: identity_tmov_autosync_a5_only
  • Triggering comment: [A5][Sync] remove identity tmov before insert-sync #420 (comment)

@HecreReed
Collaborator

/run a5 test/basic/identity_tmov_autosync_a5_only.pto

@reedhecre

A5 board test passed

  • Trigger: manual
  • Source commit: b0feca8ad536
  • Result summary: OK 1 / FAIL 0 / SKIP 0
  • Log: /root/ptoas-board-monitor-a5/logs/20260402_111605_manual_pr420.log
  • Results TSV: /root/ptoas-board-monitor-a5/logs/20260402_111605_manual_pr420.tsv
  • Manual command: /run a5 test/basic/identity_tmov_autosync_a5_only.pto
  • Triggered by: HecreReed
  • Specified cases: test/basic/identity_tmov_autosync_a5_only.pto
  • Triggering comment: [A5][Sync] remove identity tmov before insert-sync #420 (comment)

@TaoTao-real
Contributor Author

/run A5 identity_tmov_if_else_alias_a5

@reedhecre

A5 board test failed

  • Trigger: manual
  • Source commit: e5f23477006d
  • Result summary: OK 0 / FAIL 0 / SKIP 0
  • Log: /root/ptoas-board-monitor-a5/logs/20260402_162105_manual_pr420.log
  • Manual command: /run a5 identity_tmov_if_else_alias_a5
  • Triggered by: TaoTao-real
  • Specified cases: identity_tmov_if_else_alias_a5
  • Triggering comment: [A5][Sync] remove identity tmov before insert-sync #420 (comment)
  • Failed stage: emit-basic-pto-cases / exit=1

Log tail

02_162105_manual_pr420/repo/build/tools/ptoas/ptoas --pto-arch a5 --enable-insert-sync /tmp/ptoas-board-monitor-a5/runs/20260402_162105_manual_pr420/repo/test/basic/identity_tmov_if_else_alias_a5.pto -o /tmp/ptoas-board-monitor-a5/runs/20260402_162105_manual_pr420/payload/test/samples/Basic/identity_tmov_if_else_alias_a5-pto.cpp
loc("/tmp/ptoas-board-monitor-a5/runs/20260402_162105_manual_pr420/repo/test/basic/identity_tmov_if_else_alias_a5.pto":10:13): error: unexpected 'addr' operand: only supported when --pto-level=level3
loc("/tmp/ptoas-board-monitor-a5/runs/20260402_162105_manual_pr420/repo/test/basic/identity_tmov_if_else_alias_a5.pto":12:13): error: unexpected 'addr' operand: only supported when --pto-level=level3
loc("/tmp/ptoas-board-monitor-a5/runs/20260402_162105_manual_pr420/repo/test/basic/identity_tmov_if_else_alias_a5.pto":17:20): error: unexpected 'addr' operand: only supported when --pto-level=level3
loc("/tmp/ptoas-board-monitor-a5/runs/20260402_162105_manual_pr420/repo/test/basic/identity_tmov_if_else_alias_a5.pto":22:20): error: unexpected 'addr' operand: only supported when --pto-level=level3
loc("/tmp/ptoas-board-monitor-a5/runs/20260402_162105_manual_pr420/repo/test/basic/identity_tmov_if_else_alias_a5.pto":28:20): error: unexpected 'addr' operand: only supported when --pto-level=level3
loc("/tmp/ptoas-board-monitor-a5/runs/20260402_162105_manual_pr420/repo/test/basic/identity_tmov_if_else_alias_a5.pto":33:20): error: unexpected 'addr' operand: only supported when --pto-level=level3
loc("/tmp/ptoas-board-monitor-a5/runs/20260402_162105_manual_pr420/repo/test/basic/identity_tmov_if_else_alias_a5.pto":35:20): error: unexpected 'addr' operand: only supported when --pto-level=level3
===== END STAGE emit-basic-pto-cases rc=1 @ 2026-04-02 16:22:42 =====

@TaoTao-real
Contributor Author

/run A5 identity_tmov_if_else_alias_a5 --pto-level=level3

@reedhecre

A5 board test passed

  • Trigger: manual
  • Source commit: 418b304d1b99
  • Result summary: OK 1 / FAIL 0 / SKIP 0
  • Log: /root/ptoas-board-monitor-a5/logs/20260402_162606_manual_pr420.log
  • Results TSV: /root/ptoas-board-monitor-a5/logs/20260402_162606_manual_pr420.tsv
  • Manual command: /run a5 identity_tmov_if_else_alias_a5 --pto-level=level3
  • Triggered by: TaoTao-real
  • Specified cases: identity_tmov_if_else_alias_a5
  • Triggering comment: [A5][Sync] remove identity tmov before insert-sync #420 (comment)

@TaoTao-real
Contributor Author

/run A5 issue828_softmax_rescale_incore_1_a5 --pto-level=level3

@reedhecre

A5 board test failed

  • Trigger: manual
  • Source commit: 418b304d1b99
  • Result summary: OK 0 / FAIL 1 / SKIP 0
  • Log: /root/ptoas-board-monitor-a5/logs/20260402_163306_manual_pr420.log
  • Manual command: /run a5 issue828_softmax_rescale_incore_1_a5 --pto-level=level3
  • Triggered by: TaoTao-real
  • Specified cases: issue828_softmax_rescale_incore_1_a5
  • Triggering comment: [A5][Sync] remove identity tmov before insert-sync #420 (comment)
  • Failed stage: board-validation / exit=1

Failed cases

  • issue828_softmax_rescale_incore_1_a5 (run, exit=1)

@reedhecre

A5 board test failure details: PR #420

issue828_softmax_rescale_incore_1_a5

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507035 (/tmp/ptoas-board-monitor-a5/runs/20260402_163306_manual_pr420/npu_validation/Basic/issue828_softmax_rescale_incore_1_a5/main.cpp:124)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 1763624] 2026-04-02-16:34:52.028.740 (EZ9999):  The error from device(chipId:0, dieId:0), serial number is 112, there is an aivec error exception, core id is 0, error code = 340, dump info: pc start: 0x100040800000, current: 0x1000408002e4, sc error info: 0xffffffffffff, su error info: 0xe7ffd23d1fdc0017,0x4240141410009bfd, mte error info: 0xfdd7e7ce0007fffb, vec error info: 0x4080063c0031009d, cube error info: 0, l1 error info: 0, aic error mask: 0x395856, para base: 0x100040200000, mte error: 0.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:580]
        TraceBack (most recent call last):
       The extend info: errcode:(340) errorStr: The instruction access UB address is not aligned. subErrType: 0x4.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:583]
       Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1728]
       AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [DFX_INFO]Aicore kernel execute failed, device_id=1, stream_id=62, report_stream_id=62, task_id=0, flip_num=0, fault kernel_name=_Z24softmax_rescale_incore_1PfS_S_S_S_S_i, fault kernel info ext=_Z24softmax_rescale_incore_1PfS_S_S_S_S_i, program id=0, hash=17691714609164894166.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       rtStreamSynchronize execution failed, reason=vector core exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-04-02 16:35:24] ERROR: testcase failed (exit 1): issue828_softmax_rescale_incore_1_a5
[2026-04-02 16:35:24] === SUMMARY ===
[2026-04-02 16:35:24] OK=0 FAIL=1 SKIP=0
[2026-04-02 16:35:24] RESULTS_TSV=/tmp/ptoas-board-monitor-a5/runs/20260402_163306_manual_pr420/remote_npu_validation_results.tsv

@TaoTao-real
Contributor Author

/run A5 issue828_softmax_rescale_incore_1_a5 --pto-level=level3

@reedhecre

A5 board test failed

  • Trigger: manual
  • Source commit: 418b304d1b99
  • Result summary: OK 0 / FAIL 1 / SKIP 0
  • Log: /root/ptoas-board-monitor-a5/logs/20260402_163706_manual_pr420.log
  • Manual command: /run a5 issue828_softmax_rescale_incore_1_a5 --pto-level=level3
  • Triggered by: TaoTao-real
  • Specified cases: issue828_softmax_rescale_incore_1_a5
  • Triggering comment: [A5][Sync] remove identity tmov before insert-sync #420 (comment)
  • Failed stage: board-validation / exit=1

Failed cases

  • issue828_softmax_rescale_incore_1_a5 (run, exit=1)

@reedhecre

A5 board test failure details: PR #420

issue828_softmax_rescale_incore_1_a5

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507035 (/tmp/ptoas-board-monitor-a5/runs/20260402_163706_manual_pr420/npu_validation/Basic/issue828_softmax_rescale_incore_1_a5/main.cpp:124)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 1766972] 2026-04-02-16:38:54.193.452 (EZ9999):  The error from device(chipId:0, dieId:0), serial number is 113, there is an aivec error exception, core id is 0, error code = 340, dump info: pc start: 0x100040800000, current: 0x1000408002e4, sc error info: 0xffffffffffff, su error info: 0xe7ffd23d1fdc0017,0x4240141410009bfd, mte error info: 0xfdd7e7ce0007fffb, vec error info: 0x4080063c0031009d, cube error info: 0, l1 error info: 0, aic error mask: 0x395856, para base: 0x100040200000, mte error: 0.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:580]
        TraceBack (most recent call last):
       The extend info: errcode:(340) errorStr: The instruction access UB address is not aligned. subErrType: 0x4.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:583]
       Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1728]
       AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [DFX_INFO]Aicore kernel execute failed, device_id=1, stream_id=62, report_stream_id=62, task_id=0, flip_num=0, fault kernel_name=_Z24softmax_rescale_incore_1PfS_S_S_S_S_i, fault kernel info ext=_Z24softmax_rescale_incore_1PfS_S_S_S_S_i, program id=0, hash=17691714609164894166.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       rtStreamSynchronize execution failed, reason=vector core exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-04-02 16:39:24] ERROR: testcase failed (exit 1): issue828_softmax_rescale_incore_1_a5
[2026-04-02 16:39:24] === SUMMARY ===
[2026-04-02 16:39:24] OK=0 FAIL=1 SKIP=0
[2026-04-02 16:39:24] RESULTS_TSV=/tmp/ptoas-board-monitor-a5/runs/20260402_163706_manual_pr420/remote_npu_validation_results.tsv

@TaoTao-real
Contributor Author

/run A5 identity_tmov_if_else_alias_a5 --pto-level=level3

@reedhecre

A5 board test passed

  • Trigger: manual
  • Source commit: 418b304d1b99
  • Result summary: OK 1 / FAIL 0 / SKIP 0
  • Log: /root/ptoas-board-monitor-a5/logs/20260402_164108_manual_pr420.log
  • Results TSV: /root/ptoas-board-monitor-a5/logs/20260402_164108_manual_pr420.tsv
  • Manual command: /run a5 identity_tmov_if_else_alias_a5 --pto-level=level3
  • Triggered by: TaoTao-real
  • Specified cases: identity_tmov_if_else_alias_a5
  • Triggering comment: [A5][Sync] remove identity tmov before insert-sync #420 (comment)

@TaoTao-real
Contributor Author

/run A5 issue828_softmax_rescale_incore_1_a5 --pto-level=level3

@reedhecre

A5 board test passed

  • Trigger: manual
  • Source commit: 0ecc165d131b
  • Result summary: OK 1 / FAIL 0 / SKIP 0
  • Log: /root/ptoas-board-monitor-a5/logs/20260402_165005_manual_pr420.log
  • Results TSV: /root/ptoas-board-monitor-a5/logs/20260402_165005_manual_pr420.tsv
  • Manual command: /run a5 issue828_softmax_rescale_incore_1_a5 --pto-level=level3
  • Triggered by: TaoTao-real
  • Specified cases: issue828_softmax_rescale_incore_1_a5
  • Triggering comment: [A5][Sync] remove identity tmov before insert-sync #420 (comment)

@TaoTao-real
Contributor Author

/run A5 issue828_softmax_rescale_incore_1_a5 --pto-level=level3

@reedhecre

A5 board test failure details: PR #420

issue828_diag_else_3tmov_only_a5

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507035 (/tmp/ptoas-board-monitor-a5/runs/20260402_203806_manual_pr420/npu_validation/Basic/issue828_diag_else_3tmov_only_a5/main.cpp:123)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 1824142] 2026-04-02-20:39:52.156.635 (EZ9999):  The error from device(chipId:0, dieId:0), serial number is 121, there is an aivec error exception, core id is 0, error code = 340, dump info: pc start: 0x100040800000, current: 0x100040800218, sc error info: 0xffffffffffff, su error info: 0xe7ffd23d1fdc0017,0x4240141410009bfd, mte error info: 0xfdd7e7ce0007fffb, vec error info: 0x4080031000310046, cube error info: 0, l1 error info: 0, aic error mask: 0x395856, para base: 0x100040200000, mte error: 0.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:580]
        TraceBack (most recent call last):
       The extend info: errcode:(340) errorStr: The instruction access UB address is not aligned. subErrType: 0x4.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:583]
       Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1728]
       AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [DFX_INFO]Aicore kernel execute failed, device_id=1, stream_id=62, report_stream_id=62, task_id=0, flip_num=0, fault kernel_name=_Z32issue828_diag_else_3tmov_only_a5PfS_S_S_S_S_, fault kernel info ext=_Z32issue828_diag_else_3tmov_only_a5PfS_S_S_S_S_, program id=0, hash=17014948628165233438.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       rtStreamSynchronize execution failed, reason=vector core exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-04-02 20:40:27] ERROR: testcase failed (exit 1): issue828_diag_else_3tmov_only_a5
[2026-04-02 20:40:27] === SUMMARY ===
[2026-04-02 20:40:27] OK=0 FAIL=1 SKIP=0
[2026-04-02 20:40:27] RESULTS_TSV=/tmp/ptoas-board-monitor-a5/runs/20260402_203806_manual_pr420/remote_npu_validation_results.tsv

@reedhecre

A5 board test passed

  • Trigger: manual
  • Source commit: bf38ccc2717a
  • Result summary: OK 1 / FAIL 0 / SKIP 0
  • Log: /root/ptoas-board-monitor-a5/logs/20260402_204107_manual_pr420.log
  • Results TSV: /root/ptoas-board-monitor-a5/logs/20260402_204107_manual_pr420.tsv
  • Manual command: /run a5 issue828_diag_else_no_tmov_a5 --pto-level=level3
  • Triggered by: TaoTao-real
  • Specified cases: issue828_diag_else_no_tmov_a5
  • PTOAS arguments: --pto-level=level3
  • Triggering comment: [A5][Sync] remove identity tmov before insert-sync #420 (comment)

@TaoTao-real
Contributor Author

/run A5 issue828_diag_if_identity_only_a5 --pto-level=level3 --disable-identity-tmov-cleanup

@reedhecre

A5 board test failed

  • Trigger: manual
  • Source commit: bf38ccc2717a
  • Result summary: OK 0 / FAIL 1 / SKIP 0
  • Log: /root/ptoas-board-monitor-a5/logs/20260402_204905_manual_pr420.log
  • Manual command: /run a5 issue828_diag_if_identity_only_a5 --pto-level=level3 --disable-identity-tmov-cleanup
  • Triggered by: TaoTao-real
  • Specified cases: issue828_diag_if_identity_only_a5
  • PTOAS arguments: --pto-level=level3 --disable-identity-tmov-cleanup
  • Triggering comment: [A5][Sync] remove identity tmov before insert-sync #420 (comment)
  • Failed stage: board-validation / exit=1

Failed cases

  • issue828_diag_if_identity_only_a5 (run, exit=1)

@reedhecre

A5 board test failure details: PR #420

issue828_diag_if_identity_only_a5

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507035 (/tmp/ptoas-board-monitor-a5/runs/20260402_204905_manual_pr420/npu_validation/Basic/issue828_diag_if_identity_only_a5/main.cpp:123)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 1831229] 2026-04-02-20:50:53.107.521 (EZ9999):  The error from device(chipId:0, dieId:0), serial number is 122, there is an aivec error exception, core id is 0, error code = 340, dump info: pc start: 0x100040800000, current: 0x1000408001ac, sc error info: 0xffffffffffff, su error info: 0xe7ffd23d1fdc0017,0x4240141410009bfd, mte error info: 0xfdd7e7ce0007fffb, vec error info: 0x408003100031004f, cube error info: 0, l1 error info: 0, aic error mask: 0x395856, para base: 0x100040200000, mte error: 0.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:580]
        TraceBack (most recent call last):
       The extend info: errcode:(340) errorStr: The instruction access UB address is not aligned. subErrType: 0x4.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:583]
       Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1728]
       AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [DFX_INFO]Aicore kernel execute failed, device_id=1, stream_id=62, report_stream_id=62, task_id=0, flip_num=0, fault kernel_name=_Z33issue828_diag_if_identity_only_a5PfS_S_S_S_S_, fault kernel info ext=_Z33issue828_diag_if_identity_only_a5PfS_S_S_S_S_, program id=0, hash=8092653859669450896.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       rtStreamSynchronize execution failed, reason=vector core exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-04-02 20:51:24] ERROR: testcase failed (exit 1): issue828_diag_if_identity_only_a5
[2026-04-02 20:51:24] === SUMMARY ===
[2026-04-02 20:51:24] OK=0 FAIL=1 SKIP=0
[2026-04-02 20:51:24] RESULTS_TSV=/tmp/ptoas-board-monitor-a5/runs/20260402_204905_manual_pr420/remote_npu_validation_results.tsv

@TaoTao-real
Contributor Author

/run a5 issue828_softmax_rescale_incore_1_a5_if_aligned issue828_softmax_rescale_incore_1_a5_else_aligned --pto-level=level3

@reedhecre

A5 board test failed

  • Trigger: manual
  • Source commit: 0167b49eae40
  • Result summary: OK 1 / FAIL 1 / SKIP 0
  • Log: /root/ptoas-board-monitor-a5/logs/20260402_210309_manual_pr420.log
  • Manual command: /run a5 issue828_softmax_rescale_incore_1_a5_if_aligned issue828_softmax_rescale_incore_1_a5_else_aligned --pto-level=level3
  • Triggered by: TaoTao-real
  • Specified cases: issue828_softmax_rescale_incore_1_a5_if_aligned,issue828_softmax_rescale_incore_1_a5_else_aligned
  • PTOAS arguments: --pto-level=level3
  • Triggering comment: [A5][Sync] remove identity tmov before insert-sync #420 (comment)
  • Failed stage: board-validation / exit=1

Failed cases

  • issue828_softmax_rescale_incore_1_a5_else_aligned (run, exit=1)

@reedhecre

A5 board test failure details: PR #420

issue828_softmax_rescale_incore_1_a5_else_aligned

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507035 (/tmp/ptoas-board-monitor-a5/runs/20260402_210309_manual_pr420/npu_validation/Basic/issue828_softmax_rescale_incore_1_a5_else_aligned/main.cpp:124)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 1835823] 2026-04-02-21:06:11.133.415 (EZ9999):  The error from device(chipId:0, dieId:0), serial number is 123, there is an aivec error exception, core id is 0, error code = 340, dump info: pc start: 0x100040800000, current: 0x100040800318, sc error info: 0xffffffffffff, su error info: 0xe7ffd23d1fdc0017,0x4240141410009bfd, mte error info: 0xfdd7e7ce0007fffb, vec error info: 0x40800644003100aa, cube error info: 0, l1 error info: 0, aic error mask: 0x395856, para base: 0x100040200000, mte error: 0.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:580]
        TraceBack (most recent call last):
       The extend info: errcode:(340) errorStr: The instruction access UB address is not aligned. subErrType: 0x4.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:583]
       Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1728]
       AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [DFX_INFO]Aicore kernel execute failed, device_id=1, stream_id=62, report_stream_id=62, task_id=0, flip_num=0, fault kernel_name=_Z24softmax_rescale_incore_1PfS_S_S_S_S_i, fault kernel info ext=_Z24softmax_rescale_incore_1PfS_S_S_S_S_i, program id=0, hash=13467545225720795163.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       rtStreamSynchronize execution failed, reason=vector core exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-04-02 21:06:46] ERROR: testcase failed (exit 1): issue828_softmax_rescale_incore_1_a5_else_aligned
[2026-04-02 21:06:46] === SUMMARY ===
[2026-04-02 21:06:46] OK=1 FAIL=1 SKIP=0
[2026-04-02 21:06:46] RESULTS_TSV=/tmp/ptoas-board-monitor-a5/runs/20260402_210309_manual_pr420/remote_npu_validation_results.tsv

@TaoTao-real
Contributor Author

/run A5 issue828_diag_else_trace_print_a5 --pto-level=level3

@reedhecre

A5 board test failed

  • Trigger: manual
  • Source commit: 9509eb37876e
  • Result summary: OK 0 / FAIL 1 / SKIP 0
  • Log: /root/ptoas-board-monitor-a5/logs/20260403_090106_manual_pr420.log
  • Manual command: /run a5 issue828_diag_else_trace_print_a5 --pto-level=level3
  • Triggered by: TaoTao-real
  • Specified cases: issue828_diag_else_trace_print_a5
  • PTOAS arguments: --pto-level=level3
  • Triggering comment: [A5][Sync] remove identity tmov before insert-sync #420 (comment)
  • Failed stage: board-validation / exit=1

Failed cases

  • issue828_diag_else_trace_print_a5 (run, exit=2)

@reedhecre

A5 board test failure details: PR #420

issue828_diag_else_trace_print_a5

stage=run info=exit=2

/usr/local/Ascend/cann-9.0.0/include/pto/common/pto_instr.hpp:320:5: error: use of undeclared identifier 'TPRINT_IMPL'
    MAP_INSTR_IMPL(TPRINT, src);
    ^
/usr/local/Ascend/cann-9.0.0/include/pto/common/pto_instr.hpp:20:34: note: expanded from macro 'MAP_INSTR_IMPL'
#define MAP_INSTR_IMPL(API, ...) API##_IMPL(__VA_ARGS__)
                                 ^
<scratch space>:79:1: note: expanded from here
TPRINT_IMPL
^
/tmp/ptoas-board-monitor-a5/runs/20260403_090106_manual_pr420/npu_validation/Basic/issue828_diag_else_trace_print_a5/issue828_diag_else_trace_print_a5_kernel.cpp:161:5: note: in instantiation of function template specialization 'pto::TPRINT<pto::Tile<pto::TileType::Vec, float, 1, 16, pto::BLayout::RowMajor, 1, 16, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>>' requested here
    TPRINT(v54);
    ^
1 error generated.
gmake[2]: *** [CMakeFiles/issue828_diag_else_trace_print_a5_kernel.dir/build.make:76: CMakeFiles/issue828_diag_else_trace_print_a5_kernel.dir/issue828_diag_else_trace_print_a5_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/issue828_diag_else_trace_print_a5_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-04-03 09:02:48] ERROR: testcase failed (exit 2): issue828_diag_else_trace_print_a5
[2026-04-03 09:02:48] === SUMMARY ===
[2026-04-03 09:02:48] OK=0 FAIL=1 SKIP=0
[2026-04-03 09:02:48] RESULTS_TSV=/tmp/ptoas-board-monitor-a5/runs/20260403_090106_manual_pr420/remote_npu_validation_results.tsv

@TaoTao-real
Contributor Author

/run A5 issue828_diag_else_trace_print_a5 --pto-level=level3

@reedhecre

A5 board test passed

  • Trigger: manual
  • Source commit: 573426ac91e8
  • Result summary: OK 1 / FAIL 0 / SKIP 0
  • Log: /root/ptoas-board-monitor-a5/logs/20260403_094405_manual_pr420.log
  • Results TSV: /root/ptoas-board-monitor-a5/logs/20260403_094405_manual_pr420.tsv
  • Manual command: /run a5 issue828_diag_else_trace_print_a5 --pto-level=level3
  • Triggered by: TaoTao-real
  • Specified cases: issue828_diag_else_trace_print_a5
  • PTOAS arguments: --pto-level=level3
  • Triggering comment: [A5][Sync] remove identity tmov before insert-sync #420 (comment)

@TaoTao-real
Contributor Author

/run A5 issue828_diag_else_3tmov_only_a5 --pto-level=level3

@reedhecre

A5 board test failed

  • Trigger: manual
  • Source commit: 95222f7bae9f
  • Result summary: OK 0 / FAIL 1 / SKIP 0
  • Log: /root/ptoas-board-monitor-a5/logs/20260403_110706_manual_pr420.log
  • Manual command: /run a5 issue828_diag_else_3tmov_only_a5 --pto-level=level3
  • Triggered by: TaoTao-real
  • Specified cases: issue828_diag_else_3tmov_only_a5
  • PTOAS arguments: --pto-level=level3
  • Triggering comment: [A5][Sync] remove identity tmov before insert-sync #420 (comment)
  • Failed stage: board-validation / exit=1

Failed cases

  • issue828_diag_else_3tmov_only_a5 (run, exit=1)

@reedhecre

A5 board test failure details: PR #420

issue828_diag_else_3tmov_only_a5

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507035 (/tmp/ptoas-board-monitor-a5/runs/20260403_110706_manual_pr420/npu_validation/Basic/issue828_diag_else_3tmov_only_a5/main.cpp:123)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 1888219] 2026-04-03-11:08:52.983.270 (EZ9999):  The error from device(chipId:0, dieId:0), serial number is 126, there is an aivec error exception, core id is 0, error code = 340, dump info: pc start: 0x100040800000, current: 0x100040800134, sc error info: 0xffffffffffff, su error info: 0xe7ffd23d1fdc0017,0x4240141410009bfd, mte error info: 0xfdd7e7ce0007fffb, vec error info: 0x4080031000310047, cube error info: 0, l1 error info: 0, aic error mask: 0x395856, para base: 0x100040200000, mte error: 0.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:580]
        TraceBack (most recent call last):
       The extend info: errcode:(340) errorStr: The instruction access UB address is not aligned. subErrType: 0x4.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:583]
       Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1728]
       AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [DFX_INFO]Aicore kernel execute failed, device_id=1, stream_id=62, report_stream_id=62, task_id=0, flip_num=0, fault kernel_name=_Z32issue828_diag_else_3tmov_only_a5PfS_S_S_S_S_, fault kernel info ext=_Z32issue828_diag_else_3tmov_only_a5PfS_S_S_S_S_, program id=0, hash=4910107971283643994.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       rtStreamSynchronize execution failed, reason=vector core exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-04-03 11:09:28] ERROR: testcase failed (exit 1): issue828_diag_else_3tmov_only_a5
[2026-04-03 11:09:28] === SUMMARY ===
[2026-04-03 11:09:28] OK=0 FAIL=1 SKIP=0
[2026-04-03 11:09:28] RESULTS_TSV=/tmp/ptoas-board-monitor-a5/runs/20260403_110706_manual_pr420/remote_npu_validation_results.tsv

@TaoTao-real
Contributor Author

/run a5 issue828_diag_else_3tmov_only_a5 issue828_diag_else_3tmov_vrow1_a5 issue828_diag_else_3tmov_rowmajor_a5 --pto-level=level3

@reedhecre

A5 board test failed

  • Trigger: manual
  • Source commit: 95222f7bae9f
  • Result summary: OK 0 / FAIL 1 / SKIP 0
  • Log: /root/ptoas-board-monitor-a5/logs/20260403_154106_manual_pr420.log
  • Manual command: /run a5 issue828_diag_else_3tmov_only_a5 issue828_diag_else_3tmov_vrow1_a5 issue828_diag_else_3tmov_rowmajor_a5 --pto-level=level3
  • Triggered by: TaoTao-real
  • Specified cases: issue828_diag_else_3tmov_only_a5,issue828_diag_else_3tmov_vrow1_a5,issue828_diag_else_3tmov_rowmajor_a5
  • PTOAS arguments: --pto-level=level3
  • Triggering comment: [A5][Sync] remove identity tmov before insert-sync #420 (comment)
  • Failed stage: board-validation / exit=1

Failed cases

  • issue828_diag_else_3tmov_only_a5 (run, exit=1)

@reedhecre

A5 board test failure details: PR #420

issue828_diag_else_3tmov_only_a5

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507035 (/tmp/ptoas-board-monitor-a5/runs/20260403_154106_manual_pr420/npu_validation/Basic/issue828_diag_else_3tmov_only_a5/main.cpp:123)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 1917450] 2026-04-03-15:42:53.051.176 (EZ9999):  The error from device(chipId:0, dieId:0), serial number is 129, there is an aivec error exception, core id is 0, error code = 340, dump info: pc start: 0x100040800000, current: 0x100040800134, sc error info: 0xffffffffffff, su error info: 0xe7ffd23d1fdc0017,0x4240141410009bfd, mte error info: 0xfdd7e7ce0007fffb, vec error info: 0x4080031000310047, cube error info: 0, l1 error info: 0, aic error mask: 0x395856, para base: 0x100040200000, mte error: 0.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:580]
        TraceBack (most recent call last):
       The extend info: errcode:(340) errorStr: The instruction access UB address is not aligned. subErrType: 0x4.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:583]
       Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1728]
       AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [DFX_INFO]Aicore kernel execute failed, device_id=1, stream_id=62, report_stream_id=62, task_id=0, flip_num=0, fault kernel_name=_Z32issue828_diag_else_3tmov_only_a5PfS_S_S_S_S_, fault kernel info ext=_Z32issue828_diag_else_3tmov_only_a5PfS_S_S_S_S_, program id=0, hash=4910107971283643994.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       rtStreamSynchronize execution failed, reason=vector core exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-04-03 15:43:26] ERROR: testcase failed (exit 1): issue828_diag_else_3tmov_only_a5
[2026-04-03 15:43:26] === SUMMARY ===
[2026-04-03 15:43:26] OK=0 FAIL=1 SKIP=0
[2026-04-03 15:43:26] RESULTS_TSV=/tmp/ptoas-board-monitor-a5/runs/20260403_154106_manual_pr420/remote_npu_validation_results.tsv

@TaoTao-real
Contributor Author

/run a5 issue828_diag_else_3tmov_vrow1_a5 --pto-level=level3

@reedhecre

A5 board test passed

  • Trigger: manual
  • Source commit: 95222f7bae9f
  • Result summary: OK 0 / FAIL 0 / SKIP 0
  • Log: /root/ptoas-board-monitor-a5/logs/20260403_154807_manual_pr420.log
  • Results TSV: /root/ptoas-board-monitor-a5/logs/20260403_154807_manual_pr420.tsv
  • Manual command: /run a5 issue828_diag_else_3tmov_vrow1_a5 --pto-level=level3
  • Triggered by: TaoTao-real
  • Specified cases: issue828_diag_else_3tmov_vrow1_a5
  • PTOAS arguments: --pto-level=level3
  • Triggering comment: [A5][Sync] remove identity tmov before insert-sync #420 (comment)

@TaoTao-real
Contributor Author

/run a5 issue828_diag_else_3tmov_rowmajor_a5 --pto-level=level3

@reedhecre

A5 board test passed

  • Trigger: manual
  • Source commit: 95222f7bae9f
  • Result summary: OK 0 / FAIL 0 / SKIP 0
  • Log: /root/ptoas-board-monitor-a5/logs/20260403_155317_manual_pr420.log
  • Results TSV: /root/ptoas-board-monitor-a5/logs/20260403_155317_manual_pr420.tsv
  • Manual command: /run a5 issue828_diag_else_3tmov_rowmajor_a5 --pto-level=level3
  • Triggered by: TaoTao-real
  • Specified cases: issue828_diag_else_3tmov_rowmajor_a5
  • PTOAS arguments: --pto-level=level3
  • Triggering comment: [A5][Sync] remove identity tmov before insert-sync #420 (comment)

@TaoTao-real
Contributor Author

/run a5 issue828_diag_else_3tmov_vrow1_a5 --pto-level=level3

@reedhecre

A5 board test passed

  • Trigger: manual
  • Source commit: 0c703f5ebcb0
  • Result summary: OK 1 / FAIL 0 / SKIP 0
  • Log: /root/ptoas-board-monitor-a5/logs/20260403_160214_manual_pr420.log
  • Results TSV: /root/ptoas-board-monitor-a5/logs/20260403_160214_manual_pr420.tsv
  • Manual command: /run a5 issue828_diag_else_3tmov_vrow1_a5 --pto-level=level3
  • Triggered by: TaoTao-real
  • Specified cases: issue828_diag_else_3tmov_vrow1_a5
  • PTOAS arguments: --pto-level=level3
  • Triggering comment: [A5][Sync] remove identity tmov before insert-sync #420 (comment)

Labels

None yet

Projects

Status: Todo

4 participants