Skip to content

feat: add PTO op coverage for remaining math/expand ops#408

Merged
zhangstevenunity merged 5 commits intohw-native-sys:mainfrom
HecreReed:feature/add-pto-op-coverage
Apr 3, 2026
Merged

feat: add PTO op coverage for remaining math/expand ops#408
zhangstevenunity merged 5 commits intohw-native-sys:mainfrom
HecreReed:feature/add-pto-op-coverage

Conversation

@HecreReed
Copy link
Copy Markdown
Collaborator

Summary

  • add IR defs, verifiers, memory effects, and EmitC lowering for tfmod/tfmods, tcolprod/trowprod, and the new col/row expand ops
  • add positive and negative Python samples for the new ops and extend the runop row-reduction guard for rowprod
  • validate the new sample directories with runop.sh --enablebc on both default arch and --pto-arch a5

Validation

  • cmake --build build --parallel 8 --target install
  • PTOAS_BIN=$PWD/build/tools/ptoas/ptoas PTOBC_BIN=$PWD/build/tools/ptobc/ptobc PYTHON_BIN=/Users/laoda/miniforge3/envs/ptoas-py39/bin/python PYTHONPATH=$PWD/install:/Users/laoda/llvm-workspace/llvm-project/llvm/build-shared/tools/mlir/python_packages/mlir_core bash test/samples/runop.sh --enablebc -t Fmod
  • same runop.sh --enablebc coverage for Fmods Colprod Colexpandadd Colexpandexpdif Rowprod Rowexpandmax Rowexpandmin Rowexpandexpdif
  • repeat the same sample coverage with PTOAS_FLAGS="--pto-arch a5"

@reedhecre
Copy link
Copy Markdown

reedhecre commented Apr 1, 2026

Codex Review

该评论由 review 机器人自动更新。

  • PR: feat: add PTO op coverage for remaining math/expand ops #408 feat: add PTO op coverage for remaining math/expand ops
  • Author: HecreReed
  • Base/Head: main / feature/add-pto-op-coverage
  • Head SHA: f8f3b083b30f
  • Trigger: PR 有新提交
  • Generated At: 2026-04-01T02:59:48Z
  • Previous Head SHA: 2a7d6f111d19
  • Status: completed

Summary

发现 3 个问题:新 row-expand verifier 放宽了操作数/shape 合约,A2/A3 的 tmp 版 row-expand 新 op 不能做 PTO-BC v0 编码,且 trowprod 仍缺少默认的 ptobc v0 覆盖。

Findings

  1. P2 新 row-expand 共享 verifier 只看 valid_shape,放过了错误的操作数角色和静态 shape lib/PTO/IR/PTO.cpp:6039

lib/PTO/IR/PTO.cpp 中新的 verifyTRowExpandReduceLikeOp 通过 src0MatchesDst/src1MatchesDst 仅按 valid_shape 选择“完整 tile”操作数,并且没有再对完整输入或 tmp 调用 verifyTileBufSameShapeAndElem/verifyTileBufSameValidShape。结果是两类无效 IR 会被接受:1) 对 trowexpandexpdif 而言,src1 作为完整 tile、src0 作为按行广播向量也能通过,但 lowering 仍固定按 TROWEXPANDEXPDIF(dst, src0, src1) 发射,语义会被反转;2) 完整输入或 tmp 的静态 tile shape 与 dst 不一致时也能通过验证。这样的 IR 会一路进入 codegen,最终要么算错,要么在后端/运行时报错。

  1. P2 trowexpandexpdif/max/min 的 4 操作数 tmp 形式无法编码到 PTO-BC v0 include/PTO/IR/PTOOps.td:3991

这三个新 op 在 IR 合约里都声明了 Optional tmp(见 include/PTO/IR/PTOOps.td),但 tools/ptobc/generated/ptobc_opcodes_v0.h:111-113 仍把它们登记成 operand_mode=0x00、num_operands=3 的固定三操作数 op。tools/ptobc/src/mlir_encode.cpp:429-430 会对 0x00 模式做严格 operand count 检查,所以任何合法的 A2/A3 4 操作数形式都会在 ptobc encode 阶段直接失败。因为名字已经命中 opcode 表,编码器也不会再退回 generic escape。

  1. P3 trowprod 新增后仍没有默认的 ptobc v0 opcode 支持 include/PTO/IR/PTOOps.td:4174

PR 新增了 pto.trowprod,但 tools/ptobc/generated/ptobc_opcodes_v0.h 里没有任何 pto.trowprod 条目。按照 tools/ptobc/src/mlir_encode.cpp:485-486,默认 ptobc encode 会把这种 op 当成“not in v0 opcode table”并报错,只有显式打开 PTOBC_ALLOW_GENERIC=1 才能走 generic escape。runop.sh 恰好开启了这个开关,所以 CI 不会暴露问题,但默认的 BC 兼容性仍然缺失。

@HecreReed HecreReed marked this pull request as ready for review April 1, 2026 02:49
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f8f3b083b3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +4002 to +4005
PTODpsType:$src0,
PTODpsType:$src1,
Optional<PTODpsType>:$tmp,
PTODpsType:$dst
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Align ptobc arity with optional tmp row-expand ops

These new row-expand ops are defined with an optional tmp operand (so A2/A3 can legally build 4-operand forms), but the v0 ptobc schema still encodes them as fixed 3-operand ops and the encoder enforces exact operand counts. As a result, valid IR such as pto.trowexpandmax/min/expdif with tmp will fail ptobc encode with an operand-count mismatch, breaking the bytecode workflow for supported A2/A3 forms.

Useful? React with 👍 / 👎.

Comment on lines +79 to +81
except subprocess.CalledProcessError:
pass
return git_output("diff-tree", "--no-commit-id", "--name-only", "--diff-filter=ACMR", "-r", head_sha)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve full changed-file set when git diff fails

When git diff base..head fails, the fallback uses git diff-tree -r head, which only reports files from the tip commit. In shallow or partially-fetched checkouts where the base SHA is missing, this silently drops files changed in earlier commits of the push, so the license-header check can pass even though required headers are missing in skipped files.

Useful? React with 👍 / 👎.

@zhangstevenunity zhangstevenunity merged commit 05918e6 into hw-native-sys:main Apr 3, 2026
11 checks passed
@github-project-automation github-project-automation bot moved this from Todo to Done in pto project Apr 3, 2026
@reedhecre
Copy link
Copy Markdown

A3 板测失败

  • 触发方式:merged
  • 源码提交:05918e6275d8
  • 结果汇总:OK 179 / FAIL 1 / SKIP 0
  • 日志:/tmp/ptoas-board-monitor/logs/20260403_170604_merged_pr408.log
  • 失败阶段:board-validation / exit=1

失败用例

  • rsqrt (run, exit=2)

@reedhecre
Copy link
Copy Markdown

A3 板测失败详情:PR #408

rsqrt

stage=run info=exit=2

/tmp/ptoas-board-monitor/runs/20260403_170604_merged_pr408/payload/pto-isa/include/pto/common/event.hpp:141:13: error: no member named 'Wait' in 'pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>'
    (events.Wait(), ...);
     ~~~~~~ ^
/tmp/ptoas-board-monitor/runs/20260403_170604_merged_pr408/payload/pto-isa/include/pto/common/pto_instr.hpp:54:5: note: in instantiation of function template specialization 'pto::WaitAllEvents<pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>>' requested here
    WaitAllEvents(events...);
    ^
/tmp/ptoas-board-monitor/runs/20260403_170604_merged_pr408/payload/pto-isa/include/pto/common/pto_instr.hpp:1220:5: note: in instantiation of function template specialization 'pto::TSYNC<pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>>' requested here
    TSYNC(events...);
    ^
/tmp/ptoas-board-monitor/runs/20260403_170604_merged_pr408/npu_validation/Rsqrt/rsqrt/rsqrt_kernel.cpp:96:3: note: in instantiation of function template specialization 'pto::TRSQRT<pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>>' requested here
  TRSQRT(v12, v11, v13);
  ^
1 error generated.
gmake[2]: *** [CMakeFiles/rsqrt_kernel.dir/build.make:76: CMakeFiles/rsqrt_kernel.dir/rsqrt_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/rsqrt_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-04-03 17:15:53] ERROR: testcase failed (exit 2): rsqrt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants