[InsertSync] scalar autosync + addptr alias-through (scalar-only scope) #206

Open
TaoTao-real wants to merge 6 commits into hw-native-sys:main from TaoTao-real:codex/load-store-scalar-pipe-s-sync

Conversation


@TaoTao-real TaoTao-real commented Mar 6, 2026

Summary

  • make pto.load_scalar / pto.store_scalar participate in InsertSync as PIPE_S ops
  • add pto.addptr alias-through in InsertSync translator, including constant-offset byte folding and conservative handling for dynamic offsets
  • keep scalar sync behavior aligned with AscendNPU-IR semantics:
    • skip PIPE_S -> PIPE_S local sync insertion
    • keep scalar cross-pipe dependencies as normal set_flag/wait_flag sync (no forced PIPE_ALL fallback)
  • add scalar sync regressions and runop structural checks:
    • test_inject_sync_scalar_cross_pipe.py
    • test_inject_sync_scalar_intra_pipe_barrier.py

Scope

This PR is intentionally scalar-sync only.
Layout-inference changes (e.g. rowmin/rowsum/rowmax related fixes) are explicitly excluded from this PR.

Motivation

InsertSync previously could miss synchronization around scalar load/store paths and pto.addptr alias chains, risking under-synchronization for dependencies that involve scalar memory accesses.

Design

  • LoadScalarOp / StoreScalarOp
    • add OpPipeInterface
    • return PIPE_S in getPipe()
  • InsertSync analysis
    • same-pipe PIPE_S is skipped (in-order scalar pipeline)
    • scalar cross-pipe dependencies are synchronized via set_flag/wait_flag
  • alias propagation
    • add pto.addptr to alias chain updates
    • support constant addptr offsets in byte-range analysis
    • dynamic addptr offsets use conservative alias modeling (prefer over-sync to under-sync)

Testing

Local regression:

source /Users/lishengtao/Workspace/PTO/env.sh
export PTO_PYTHON_ROOT=/Users/lishengtao/Documents/PTO/PTOAS_upstream_main_main/build-main/python
pto_export_runtime
PTOAS_BIN=/Users/lishengtao/Documents/PTO/PTOAS_pr206_scalar_only/build-main/tools/ptoas/ptoas \
  bash test/samples/runop.sh -t Sync

Result: pass (A5-only sample skipped as expected).

A3 remote board validation (latest, commit 6c8c44e):

  • Host: 101.245.68.6 (DEVICE_ID=10, SOC_VERSION=Ascend910)
  • Flow: CI-equivalent test/npu_validation/scripts/run_remote_npu_validation.sh
  • Result: OK=19, FAIL=0, SKIP=1
  • Key scalar cases:
    • test_inject_sync_scalar_cross_pipe: OK
    • test_inject_sync_scalar_intra_pipe_barrier: OK

Risk / Rollback

  • Risk: sync coverage for scalar ops expands compared with the previous behavior, in which scalar memory ops were untracked
  • Mitigation: same-pipe PIPE_S remains skipped; cross-pipe behavior follows existing set/wait model and has local + remote regression coverage
  • Rollback: revert this PR (changes are localized to InsertSync + Sync tests)

@TaoTao-real
Contributor Author

Added commit 269dedb for the long-term row-reduction layout fix and validated on A3 board before PR submission.

A3 remote validation (CI script flow):

  • Script: test/npu_validation/scripts/run_remote_npu_validation.sh
  • Env: STAGE=run, RUN_MODE=npu, SOC_VERSION=Ascend910, DEVICE_ID=2
  • Run only: rowmin,rowsum,rowmax

Result:

  • rowmin: OK
  • rowsum: OK
  • rowmax: OK
  • Summary: OK=3 FAIL=0

Extra note: additionally running tensor_view_layout_dn fails with a pto-isa static assert (DN GlobalTensor with RowMajor MatTile) on A3; that case is orthogonal to the row-reduction fix above.

@TaoTao-real TaoTao-real changed the title [InsertSync] support scalar PIPE_S autosync and addptr alias-through [WIP] [InsertSync] support scalar PIPE_S autosync and addptr alias-through Mar 6, 2026
@TaoTao-real TaoTao-real force-pushed the codex/load-store-scalar-pipe-s-sync branch from 269dedb to 40cee3b Compare March 6, 2026 08:30
@TaoTao-real TaoTao-real changed the title [WIP] [InsertSync] support scalar PIPE_S autosync and addptr alias-through [InsertSync] scalar autosync + addptr alias-through (scalar-only scope) Mar 6, 2026
@zhangstevenunity
Collaborator

@codex


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6c8c44e6cf


Comment on lines +553 to +557
if (hasUnknownAliasRange) {
  // Dynamic pointer arithmetic cannot be modeled precisely here.
  // Keep root/scope aliasing, but drop concrete range info conservatively.
  newInfo->baseAddresses.clear();
  newInfo->allocateSize = 0;

P1: Keep unknown alias ranges conservative for local memory

Clearing baseAddresses and forcing allocateSize = 0 for unknown alias ranges is not conservative once that value flows into a later alias op with a known offset but unknown size (for example memref.reinterpret_cast): the later op repopulates baseAddresses but keeps allocateSize==0, and MemoryDependentAnalyzer::MemAlias then treats local-memory ranges as non-overlapping when both address lists are non-empty, which can drop real dependencies and skip required sync insertion. Please preserve an explicit “unknown range” state across descendants (or make local allocateSize==0 behave like GM unknown-size overlap).


@zhangstevenunity
Collaborator

@codex


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3d668c6fde


Comment on lines +548 to +551
if (info.first == -1) {
  hasUnknownAliasRange = true;
} else {
  deltaOffset = info.first;

P1: Treat negative addptr offsets as unknown alias ranges

getStaticOffsetAndSize can return negative byte deltas for pto.addptr, but this path only treats -1 as “unknown” and otherwise applies the delta to baseAddresses (a uint64_t). For valid IR with negative element offsets (the addptr verifier does not forbid them), subtracting from a low base can underflow and wrap to a huge unsigned address, making two aliasing accesses look disjoint. In that scenario InsertSync can miss a real dependency and skip required synchronization, so negative deltas should be handled conservatively (e.g., mark unknownRange) or with checked signed range math before updating addresses.

