Perf: replace read_reg with poll_reg in COND polling loops #428
Open
chenshengxin2026 wants to merge 1 commit into hw-native-sys:main from
Code Review
This pull request introduces a new poll_reg function and a poll_acquire_barrier macro to optimize register polling across the a2a3 and a5 platforms. The poll_reg function allows for low-overhead volatile reads within hot loops by omitting memory barriers, while the poll_acquire_barrier ensures proper memory synchronization once the polling condition is satisfied. These primitives have been integrated into several executor components to improve performance. I have no further feedback to provide as no review comments were present.
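The pattern the review describes (cheap reads inside the loop, one ordering point after it) has a close analogue in portable C11 atomics: relaxed loads while spinning, then a single `atomic_thread_fence(memory_order_acquire)` once the flag is observed. The sketch below shows only that analogue. It assumes two threads sharing ordinary memory. The C11 memory model does not describe MMIO device registers, so this is not how `poll_reg()` or `poll_acquire_barrier()` are implemented. The names `payload`, `cond_flag`, `producer`, `consumer_wait` and `run_demo` are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;          /* ordinary ("Normal") memory written by the producer */
static atomic_int cond_flag; /* hypothetical stand-in for the COND register */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;                                               /* write results first */
    atomic_store_explicit(&cond_flag, 1, memory_order_release); /* then publish "done" */
    return NULL;
}

static int consumer_wait(void)
{
    /* Hot loop: relaxed loads only, no fence per iteration (analogue of poll_reg). */
    while (atomic_load_explicit(&cond_flag, memory_order_relaxed) == 0) {
    }
    /* One acquire fence once the flag is seen (analogue of poll_acquire_barrier). */
    atomic_thread_fence(memory_order_acquire);
    return payload;
}

static int run_demo(void)
{
    pthread_t t;
    payload = 0;
    atomic_store_explicit(&cond_flag, 0, memory_order_relaxed);
    if (pthread_create(&t, NULL, producer, NULL) != 0)
        return -1;
    int v = consumer_wait();
    pthread_join(t, NULL);
    return v;
}
```

The single fence is enough because the relaxed load that exits the loop reads the value written by the release store. The acquire fence therefore synchronizes with that store, which makes the earlier write to `payload` visible. Iterations that still read 0 need no ordering at all.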
Force-pushed from c79b5a4 to 07401c3
Add poll_reg(), a barrier-free volatile read, for use in hot spin-wait loops that poll the AICore COND register. Add poll_acquire_barrier() (dmb ish on ARM64, compiler barrier on x86_64), inserted once on the cold path when the awaited condition is detected.

- platform (a2a3, a5): add the poll_reg() declaration and implementation; add the poll_acquire_barrier() macro to memory_barrier.h
- runtimes (host_build_graph and tensormap_and_ringbuffer on a2a3 and a5; aicpu_build_graph on a2a3): replace read_reg() with poll_reg() for the COND register reads inside the polling loop; insert poll_acquire_barrier() at each completion branch before accessing Normal memory

The barrier cost is now O(1) per task completion instead of O(iterations), which eliminates the dmb overhead on every iteration of the "not-yet-done" hot path.
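The following is a minimal sketch of the two primitives and the loop shape described in the commit message. It assumes a 32-bit memory-mapped COND register and a caller-supplied "done" value. The real prototypes in memory_barrier.h and the platform sources are not shown in this PR. The `poll_reg()` signature, the `wait_for_cond()` helper, and the fallback branch for other architectures are therefore hypothetical.

```c
#include <stdint.h>

/* Hypothetical signature: the PR does not show the real poll_reg() prototype. */
static inline uint32_t poll_reg(const volatile uint32_t *reg)
{
    /* The volatile access forces a fresh load on every call; no barrier is issued. */
    return *reg;
}

#if defined(__aarch64__)
/* ARM64: dmb ish, as stated in the PR. */
#define poll_acquire_barrier() __asm__ __volatile__("dmb ish" ::: "memory")
#elif defined(__x86_64__)
/* x86_64: loads are not reordered with later loads or stores, so a compiler barrier suffices. */
#define poll_acquire_barrier() __asm__ __volatile__("" ::: "memory")
#else
/* Hypothetical fallback for this sketch only (GCC/Clang builtin). */
#define poll_acquire_barrier() __atomic_thread_fence(__ATOMIC_ACQUIRE)
#endif

/* Hypothetical helper: spin until the COND register reads done_value. */
static uint32_t wait_for_cond(const volatile uint32_t *cond_reg, uint32_t done_value)
{
    uint32_t v;
    /* Hot path: barrier-free reads while the task is not done yet. */
    while ((v = poll_reg(cond_reg)) != done_value) {
    }
    /* Cold path: one barrier before touching Normal memory the task wrote. */
    poll_acquire_barrier();
    return v;
}
```

The volatile read alone still stops the compiler from hoisting the load out of the loop. Moving the barrier out of the loop is what makes its cost O(1) per completion.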
Force-pushed from 07401c3 to 694bb60
Summary
- Add poll_reg(), a barrier-free volatile read, for hot COND register polling loops in the AICPU executors
- Add a poll_acquire_barrier() macro (dmb ish on ARM64, compiler barrier on x86_64), inserted once on the completion path after the awaited condition is detected
- Replace read_reg() with poll_reg() at all COND polling sites across all 5 runtimes (a2a3: aicpu_build_graph, host_build_graph, tensormap_and_ringbuffer; a5: host_build_graph, tensormap_and_ringbuffer)

The barrier cost is now O(1) per task completion instead of O(poll iterations), which eliminates the dmb overhead on every iteration of the "not-yet-done" hot path.

Testing
a2a3sim: 13/13, a5sim: 2/2
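The summary's O(1)-versus-O(poll iterations) claim can be illustrated with a small single-threaded simulation that counts barriers instead of issuing them. It assumes, as the PR implies, that `read_reg()` issues a barrier on every call. The simulated register (`sim_cond`) and all other names here are hypothetical. The count says nothing about actual cycle costs on hardware.

```c
#include <stdint.h>

/* Counters for an instrumented, single-threaded simulation (hypothetical). */
static unsigned barriers = 0;
static unsigned polls = 0;
static unsigned done_after = 0; /* COND reads "done" starting at the done_after-th read */

static void barrier(void) { barriers++; }

/* Simulated COND register: returns 1 once the task has "completed". */
static uint32_t sim_cond(void) { return ++polls >= done_after ? 1u : 0u; }

/* Old shape (assumed): a read_reg()-style read that issues a barrier on every call. */
static uint32_t sim_read_reg(void)
{
    uint32_t v = sim_cond();
    barrier();
    return v;
}

/* New shape: a poll_reg()-style barrier-free read. */
static uint32_t sim_poll_reg(void) { return sim_cond(); }

static void reset(unsigned n)
{
    barriers = 0;
    polls = 0;
    done_after = n;
}

/* Barriers paid when the task completes on the n-th poll, old loop. */
static unsigned barriers_old(unsigned n)
{
    reset(n);
    while (sim_read_reg() == 0u) {
    }
    return barriers;
}

/* Barriers paid when the task completes on the n-th poll, new loop. */
static unsigned barriers_new(unsigned n)
{
    reset(n);
    while (sim_poll_reg() == 0u) {
    }
    barrier(); /* stands in for poll_acquire_barrier() on the completion path */
    return barriers;
}
```

For a task that completes on the 1000th poll, `barriers_old(1000)` returns 1000 while `barriers_new(1000)` returns 1.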