Convert thread storage from std::array to stable_vector with dynamic growth#21
Convert thread storage from std::array to stable_vector with dynamic growth#21anujshuk-amd wants to merge 23 commits intotimemfrom
Conversation
Fixed a bug in processing command line parameters
…ction not being called when a default is defined (#3)
Invalid RPATH used for libunwind properties. Failing RPM generations and rpath check on RHEL 10. --------- Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Update submodule to ROCm/GOTCHA@c04cc3f (v1.0.8 +). Now points to the ROCm fork of GOTCHA
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
* Exclude perf_event_uncore events from evaluation * Change1: Optimization and cleaner code fixes
Fallback to use `patchelf`, if `chrpath` is not present. Change some logging from "WARNING"s to "AUTHOR_WARNING"s. Signed-off-by: David Galiffi <David.Galiffi@amd.com>
…g BEFORE keyword (#19)
…owth - Replace fixed std::array with std::vector for dynamic thread storage - Add ensure_capacity() with double-checked locking pattern - Use std::atomic<size_t> for lock-free capacity reads - Add bounds checking in get_storage operation - Implement geometric growth (2x doubling) when capacity exceeded - Initial capacity set to TIMEMORY_MAX_THREADS, grows as needed
8d9b5e6 to
696a160
Compare
1bb036a to
2624144
Compare
sputhala-amd
left a comment
There was a problem hiding this comment.
May I suggest you to see if the following approach sounds feasible:
-
Current approach has
dynamic_storage_base<StorageType, N>+set_storage<T> : basewithget()forwarding toget_storage().
Instead can we have one typedynamic_storage<StorageType>that owns all behavior.set_storage<T>andget_storage<T>only forward to a single static storage instance forstorage<T>*. No base class; the storage type is a single template that encapsulates capacity, growth, and access. -
Current approach has
capacity()takes the mutex and returnsm_chunks.size() * ChunkSize. Soget_storage()(idx)doesif (_idx >= get_capacity())and pays a lock on every call.
Instead keep anstd::atomic<size_t> m_capacity, updated under the mutex whenever chunks are added.capacity()becomesm_capacity.load(std::memory_order_acquire)with no lock. -
In current approach every
at()/operator[]takes the mutex to read the chunk pointer, so concurrent reads serialize.
Instead usestd::shared_mutex: writers (growth,reserve,ensure_capacity) takestd::unique_lock, readers (at(),operator[]when only reading the chunk pointer) takestd::shared_lock. Multiple readers can run in parallel.
| } | ||
|
|
||
| protected: | ||
| static storage_array_t& get_storage() |
There was a problem hiding this comment.
dynamic_storage_base::get_storage() returns a static instance, while set_storage<T>::get() below returned a different static _v.
| static storage_array_t& get() | ||
| { | ||
| static storage_array_t _v = { nullptr }; | ||
| static storage_array_t _v(max_threads, nullptr); |
There was a problem hiding this comment.
Should return get_storage(), so there is a single static instance. All set_storage and get_storage logic uses that same instance.
| } | ||
|
|
||
| std::vector<chunk_ptr> m_chunks{}; | ||
| size_t m_size{0}; |
There was a problem hiding this comment.
m_size should be std::atomic<size_t> to ensure thread safety.
| m_chunks.push_back(std::move(chunk)); | ||
| } | ||
|
|
||
| std::vector<chunk_ptr> m_chunks{}; |
There was a problem hiding this comment.
Usage of m_chunks should guarded with locks, because multiple threads could be calling ensure_capacity and add_chunk at the same time.
| { | ||
| std::lock_guard<std::mutex> lock(m_mutex); | ||
| while(idx >= capacity()) | ||
| add_chunk(); |
There was a problem hiding this comment.
After ensure_capacity(idx), at()/operator[] did (*m_chunks[idx/ChunkSize])[idx%ChunkSize] while another thread could be calling add_chunk() and modifying m_chunks.
| size_t size() const { return m_size; } | ||
| size_t capacity() const { return m_chunks.size() * ChunkSize; } | ||
|
|
||
| void reserve(size_t new_cap) |
There was a problem hiding this comment.
reserve here is also not thread safe, should be guarded with lock.
|
The PR commit history is difficult to follow because it includes unrelated changes. |
Later will move/remove this
Used by : ROCm/rocm-systems#2542
Motivation
These updates ensure that storage can grow as needed and that concurrent accesses are safely managed.
Technical Details
This pull request refactors the storage management logic in
source/timemory/operations/types.hppto improve thread safety and scalability. The main changes are the switch from a fixed-size array to a dynamically resizing vector for storage, the introduction of atomic and mutex-based synchronization for concurrent access, and the use of a configurable thread limit.Thread safety and dynamic storage management
set_storagefrom using a fixed-sizestd::arrayto a dynamically resizingstd::vector, allowing for more flexible and scalable storage of thread-local data. ([source/timemory/operations/types.hppL816-R872](https://github.com/ROCm/timemory/pull/21/files#diff-e60a89742e16350cc3b36bbb562d4c77c8b1b7e34bd8a2011b6f81cf13fb771eL816-R872))std::atomic,std::mutex) to ensure safe concurrent resizing and access of the storage vector. ([[1]](https://github.com/ROCm/timemory/pull/21/files#diff-e60a89742e16350cc3b36bbb562d4c77c8b1b7e34bd8a2011b6f81cf13fb771eR42-R45),[[2]](https://github.com/ROCm/timemory/pull/21/files#diff-e60a89742e16350cc3b36bbb562d4c77c8b1b7e34bd8a2011b6f81cf13fb771eL816-R872))ensure_capacitymethod inset_storageto grow the storage vector geometrically (doubling size) when a new thread index exceeds current capacity, ensuring efficient scaling and minimizing lock contention. ([source/timemory/operations/types.hppL816-R872](https://github.com/ROCm/timemory/pull/21/files#diff-e60a89742e16350cc3b36bbb562d4c77c8b1b7e34bd8a2011b6f81cf13fb771eL816-R872))4096with theTIMEMORY_MAX_THREADSmacro for greater flexibility. ([source/timemory/operations/types.hppL816-R872](https://github.com/ROCm/timemory/pull/21/files#diff-e60a89742e16350cc3b36bbb562d4c77c8b1b7e34bd8a2011b6f81cf13fb771eL816-R872))Safe access and read logic
get_storageto use atomic capacity checks before accessing the storage vector, returningnullptrfor out-of-bounds thread indices to prevent unsafe reads. ([source/timemory/operations/types.hppR905-R907](https://github.com/ROCm/timemory/pull/21/files#diff-e60a89742e16350cc3b36bbb562d4c77c8b1b7e34bd8a2011b6f81cf13fb771eR905-R907))Test Plan
To be tested in CI/CD or rocm-system
Test Result
TBA
Submission Checklist