Fix: two ring-buffer allocator defects in pto_ring_buffer.h #431
Open
chenshengxin2026 wants to merge 1 commit into hw-native-sys:main from
Code Review
This pull request modifies the ring buffer and dependency pool logic across several files, adjusting allocation boundary checks and index calculations. The feedback identifies a significant risk regarding the use of signed 32-bit integers for monotonic counters; specifically, signed overflow is undefined behavior and the modulo operator can return negative results for negative operands, potentially leading to out-of-bounds memory access. It is recommended to use unsigned arithmetic or explicit casting to ensure safe index generation.
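Here is a small standalone sketch that makes the reviewer's point about negative operands concrete. The helpers `signed_index` and `unsigned_index` are hypothetical and do not come from `pto_ring_buffer.h`. C++ integer division truncates toward zero, so `%` on a negative `int32_t` gives a negative remainder. Converting to `uint32_t` first is well defined (it wraps modulo 2^32) and keeps the result in `[0, capacity)`. The sketch does not reproduce the overflow itself, because incrementing a signed counter past `INT32_MAX` is undefined behavior. It simply starts from a value that is already negative.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: map a signed counter to a slot with plain signed `%`.
// C++ truncates toward zero, so a negative counter yields a negative result
// (e.g. -3 % 8 == -3), which would index outside the buffer.
int32_t signed_index(int32_t counter, int32_t capacity) {
    assert(capacity > 0);
    return counter % capacity;
}

// Hypothetical helper: convert to uint32_t before taking the remainder.
// The signed-to-unsigned conversion is well defined (it wraps modulo 2^32),
// so the result is always in [0, capacity).
int32_t unsigned_index(int32_t counter, int32_t capacity) {
    assert(capacity > 0);
    return static_cast<int32_t>(static_cast<uint32_t>(counter) %
                                static_cast<uint32_t>(capacity));
}
```

For example, with `capacity = 8` and a counter of `-3`, `signed_index` returns `-3`, which is an out-of-bounds index if used directly. `unsigned_index` returns `5`. For non-negative counters the two helpers agree.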
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_ring_buffer.h (outdated, resolved)
src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_ring_buffer.h (outdated, resolved)
b8ef4e7 to f6f97f3
Bug 1: heap wrap-around. Change the strict `>` to `>=` in `try_bump_heap`. When `tail == alloc_size`, exactly `alloc_size` bytes are available at `[0, alloc_size)`. The old condition incorrectly rejected this case, so the allocator spun until it deadlocked. Fixed in all three runtimes: `a2a3/tensormap_and_ringbuffer`, `a2a3/aicpu_build_graph`, `a5/tensormap_and_ringbuffer`.

Bug 2: DepListPool sentinel collision. Fix the overflow check and the index formula. `top % capacity` returned 0 whenever `top` was a multiple of `capacity`. This handed out `&entries_[0]` (the NULL sentinel) and corrupted dep-list chain termination. Fix: use an unsigned-safe cast in the index formula, `static_cast<int32_t>((static_cast<uint32_t>(top) - 1) % (capacity - 1)) + 1`, so the index always stays in `[1, capacity-1]` and signed-overflow UB is avoided. Also tighten the overflow check to `used >= capacity - 1` to match the reduced usable range. Applied to all three runtimes.

Additionally:
- Add copyright headers to the three `pto_ring_buffer.h` files (a pre-existing omission; required by the check-headers hook).
- Add `--extra-arg=--std=c++17` to the pre-commit clang-tidy config. This fixes the `'atomic' file not found` error caused by the missing compilation database.
- Add `NOLINT(bugprone-easily-swappable-parameters)` to three pre-existing function signatures in headers included by `aicpu_build_graph` (`pto_runtime2_types.h`, `pto_submit_types.h`, `tensor.h`).
- Apply clang-format to all modified files.

Fixes hw-native-sys#429
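One way to check the Bug 1 boundary is to compare both predicates against a brute-force count of free bytes. This is a sketch under stated assumptions, not the runtime's code. `wrap_fits_old`, `wrap_fits_new`, `wrap_fits_reference`, and `agrees_with_reference` are hypothetical names. The model assumes, as the commit message describes, that once the producer wraps to offset 0, the bytes `[0, tail)` are free and everything from `tail` onward is still live.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical predicates for the wrap decision in try_bump_heap. After the
// producer wraps to offset 0, the free bytes are [0, tail), where `tail` is
// the offset of the oldest live byte.
bool wrap_fits_old(uint64_t tail, uint64_t alloc_size) { return tail > alloc_size; }   // pre-fix
bool wrap_fits_new(uint64_t tail, uint64_t alloc_size) { return tail >= alloc_size; }  // post-fix

// Reference answer: mark [tail, heap_size) as live byte by byte, count the
// contiguous free bytes starting at offset 0, and compare with alloc_size.
bool wrap_fits_reference(uint64_t heap_size, uint64_t tail, uint64_t alloc_size) {
    assert(tail <= heap_size);
    std::vector<bool> live(heap_size, false);
    for (uint64_t i = tail; i < heap_size; ++i) live[i] = true;
    uint64_t free_prefix = 0;
    while (free_prefix < heap_size && !live[free_prefix]) ++free_prefix;
    return free_prefix >= alloc_size;
}

// True if `pred` matches the reference for every tail in [0, heap_size]
// and every alloc_size in [1, heap_size].
template <typename Pred>
bool agrees_with_reference(uint64_t heap_size, Pred pred) {
    for (uint64_t tail = 0; tail <= heap_size; ++tail) {
        for (uint64_t n = 1; n <= heap_size; ++n) {
            if (pred(tail, n) != wrap_fits_reference(heap_size, tail, n)) return false;
        }
    }
    return true;
}
```

With a 16-byte heap, `agrees_with_reference(16, wrap_fits_new)` holds for every `tail` and `alloc_size`. `wrap_fits_old` disagrees exactly when `tail == alloc_size`, which is the exact-fit case the fix addresses.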
f6f97f3 to 2f4e574
Summary
- Heap wrap-around: `try_bump_heap` used `tail > alloc_size` (strict greater-than), which caused a deadlock when `tail == alloc_size`. Exactly enough space exists at `[0, alloc_size)`, but the condition incorrectly rejected it. Fixed to `tail >= alloc_size`.
- DepListPool sentinel collision: `alloc()` computed `idx = top % capacity`, which returns 0 (the NULL sentinel slot) when `top` is a multiple of `capacity`. Fixed to `idx = ((top - 1) % (capacity - 1)) + 1` so the index always stays in `[1, capacity-1]`. The overflow check was tightened from `used >= capacity` to `used >= capacity - 1` to match the reduced usable range.
- Both fixes are applied to all three runtimes: `a2a3/tensormap_and_ringbuffer`, `a2a3/aicpu_build_graph`, `a5/tensormap_and_ringbuffer`.

Testing
- `pytest tests -m "not requires_hardware"`
- `./ci.sh -p a2a3sim`
- `ctest -R test_ring_buffer` (once test infrastructure is available)

Fixes #429
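The pending `ctest -R test_ring_buffer` check might exercise the Bug 2 fix along the following lines. This is a hypothetical sketch. `slot_old`, `slot_new`, `pool_full`, and `window_distinct` are illustrative names, not functions from the runtime. `slot_new` copies the index formula from the commit message. The model assumes that slot 0 is the NULL sentinel and that `top` is a positive monotonic counter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: slot 0 of the pool is the NULL sentinel that terminates
// dependency-list chains, so only slots [1, capacity - 1] may be handed out.

// Pre-fix mapping: returns 0 (the sentinel) whenever top is a multiple of capacity.
int32_t slot_old(int32_t top, int32_t capacity) { return top % capacity; }

// Post-fix mapping, copied from the commit message: the subtraction and the
// remainder are done in uint32_t, and the +1 shifts the result into [1, capacity - 1].
int32_t slot_new(int32_t top, int32_t capacity) {
    assert(capacity >= 2);
    return static_cast<int32_t>((static_cast<uint32_t>(top) - 1) % (capacity - 1)) + 1;
}

// Post-fix overflow check: with slot 0 reserved, at most capacity - 1 entries
// can be live, so the pool is full once `used` reaches capacity - 1.
bool pool_full(int32_t used, int32_t capacity) { return used >= capacity - 1; }

// Checks that capacity - 1 consecutive values of top, starting at start_top,
// map to distinct non-sentinel slots. Assumes start_top >= 1 and that
// start_top + capacity does not overflow int32_t.
bool window_distinct(int32_t start_top, int32_t capacity) {
    std::vector<bool> seen(static_cast<std::size_t>(capacity), false);
    for (int32_t top = start_top; top < start_top + capacity - 1; ++top) {
        const int32_t s = slot_new(top, capacity);
        if (s < 1 || s >= capacity || seen[s]) return false;
        seen[s] = true;
    }
    return true;
}
```

With `capacity = 8`, `slot_old(16, 8)` returns the sentinel slot 0. `slot_new` instead cycles through slots 1 to 7 (`slot_new(16, 8) == 2`), and `pool_full` reports the pool as full once 7 entries are in use.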