[InsertSync] scalar autosync + addptr alias-through (scalar-only scope)#206
[InsertSync] scalar autosync + addptr alias-through (scalar-only scope)#206TaoTao-real wants to merge 6 commits intohw-native-sys:mainfrom
Conversation
|
Added commit A3 remote validation (CI script flow):
Result:
Extra note: when additionally running |
269dedb to
40cee3b
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6c8c44e6cf
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (hasUnknownAliasRange) { | ||
| // Dynamic pointer arithmetic cannot be modeled precisely here. | ||
| // Keep root/scope aliasing, but drop concrete range info conservatively. | ||
| newInfo->baseAddresses.clear(); | ||
| newInfo->allocateSize = 0; |
There was a problem hiding this comment.
Keep unknown alias ranges conservative for local memory
Clearing baseAddresses and forcing allocateSize = 0 for unknown alias ranges is not conservative once that value flows into a later alias op with a known offset but unknown size (for example memref.reinterpret_cast): the later op repopulates baseAddresses but keeps allocateSize==0, and MemoryDependentAnalyzer::MemAlias then treats local-memory ranges as non-overlapping when both address lists are non-empty, which can drop real dependencies and skip required sync insertion. Please preserve an explicit “unknown range” state across descendants (or make local allocateSize==0 behave like GM unknown-size overlap).
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3d668c6fde
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (info.first == -1) { | ||
| hasUnknownAliasRange = true; | ||
| } else { | ||
| deltaOffset = info.first; |
There was a problem hiding this comment.
Treat negative addptr offsets as unknown alias ranges
getStaticOffsetAndSize can return negative byte deltas for pto.addptr, but this path only treats -1 as “unknown” and otherwise applies the delta to baseAddresses (a uint64_t). For valid IR with negative element offsets (the addptr verifier does not forbid them), subtracting from a low base can underflow and wrap to a huge unsigned address, making two aliasing accesses look disjoint. In that scenario InsertSync can miss a real dependency and skip required synchronization, so negative deltas should be handled conservatively (e.g., mark unknownRange) or with checked signed range math before updating addresses.
Useful? React with 👍 / 👎.
Summary
pto.load_scalar/pto.store_scalarparticipate in InsertSync asPIPE_Sopspto.addptralias-through in InsertSync translator, including constant-offset byte folding and conservative handling for dynamic offsetsPIPE_S -> PIPE_Slocal sync insertionset_flag/wait_flagsync (no forcedPIPE_ALLfallback)test_inject_sync_scalar_cross_pipe.pytest_inject_sync_scalar_intra_pipe_barrier.pyScope
This PR is intentionally scalar-sync only.
Layout-inference changes (e.g. rowmin/rowsum/rowmax related fixes) are explicitly excluded from this PR.
Motivation
InsertSync previously could miss synchronization around scalar load/store paths and
addptralias chains, which risks under-synchronization in scalar-memory-involved dependencies.Design
LoadScalarOp/StoreScalarOpOpPipeInterfacePIPE_SingetPipe()PIPE_Sis skipped (in-order scalar pipeline)set_flag/wait_flagpto.addptrto alias chain updatesaddptroffsets in byte-range analysisaddptroffsets use conservative alias modeling (prefer over-sync to under-sync)Testing
Local regression:
Result: pass (A5-only sample skipped as expected).
A3 remote board validation (latest, commit
6c8c44e):101.245.68.6(DEVICE_ID=10,SOC_VERSION=Ascend910)test/npu_validation/scripts/run_remote_npu_validation.shOK=19, FAIL=0, SKIP=1test_inject_sync_scalar_cross_pipe:OKtest_inject_sync_scalar_intra_pipe_barrier:OKRisk / Rollback
PIPE_Sremains skipped; cross-pipe behavior follows existing set/wait model and has local + remote regression coverage