
Convert thread storage from std::array to stable_vector with dynamic growth#21

Draft
anujshuk-amd wants to merge 23 commits into timem from users/anujshuk/dynamic-thread-storage_ds

Conversation


@anujshuk-amd anujshuk-amd commented Jan 8, 2026

Note: this code will be moved/removed later.
Used by: ROCm/rocm-systems#2542

Motivation

These updates ensure that storage can grow as needed and that concurrent accesses are safely managed.

Technical Details

This pull request refactors the storage management logic in source/timemory/operations/types.hpp to improve thread safety and scalability. The main changes are the switch from a fixed-size array to a dynamically resizing vector for storage, the introduction of atomic and mutex-based synchronization for concurrent access, and the use of a configurable thread limit.

Thread safety and dynamic storage management

  • Changed set_storage from using a fixed-size std::array to a dynamically resizing std::vector, allowing for more flexible and scalable storage of thread-local data. ([source/timemory/operations/types.hppL816-R872](https://github.com/ROCm/timemory/pull/21/files#diff-e60a89742e16350cc3b36bbb562d4c77c8b1b7e34bd8a2011b6f81cf13fb771eL816-R872))
  • Added atomic and mutex-based synchronization (std::atomic, std::mutex) to ensure safe concurrent resizing and access of the storage vector. ([[1]](https://github.com/ROCm/timemory/pull/21/files#diff-e60a89742e16350cc3b36bbb562d4c77c8b1b7e34bd8a2011b6f81cf13fb771eR42-R45), [[2]](https://github.com/ROCm/timemory/pull/21/files#diff-e60a89742e16350cc3b36bbb562d4c77c8b1b7e34bd8a2011b6f81cf13fb771eL816-R872))
  • Implemented an ensure_capacity method in set_storage to grow the storage vector geometrically (doubling size) when a new thread index exceeds current capacity, ensuring efficient scaling and minimizing lock contention. ([source/timemory/operations/types.hppL816-R872](https://github.com/ROCm/timemory/pull/21/files#diff-e60a89742e16350cc3b36bbb562d4c77c8b1b7e34bd8a2011b6f81cf13fb771eL816-R872))
  • Updated thread limit configuration by replacing the hardcoded 4096 with the TIMEMORY_MAX_THREADS macro for greater flexibility. ([source/timemory/operations/types.hppL816-R872](https://github.com/ROCm/timemory/pull/21/files#diff-e60a89742e16350cc3b36bbb562d4c77c8b1b7e34bd8a2011b6f81cf13fb771eL816-R872))

Safe access and read logic

  • Modified get_storage to use atomic capacity checks before accessing the storage vector, returning nullptr for out-of-bounds thread indices to prevent unsafe reads. ([source/timemory/operations/types.hppR905-R907](https://github.com/ROCm/timemory/pull/21/files#diff-e60a89742e16350cc3b36bbb562d4c77c8b1b7e34bd8a2011b6f81cf13fb771eR905-R907))
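Taken together, the changes above amount to a double-checked-locking growth path plus a lock-free bounds check on reads. A minimal, hypothetical sketch of that pattern (class name, slot type, and initial capacity are illustrative, not the PR's actual code):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical sketch of the pattern described above; names and the
// initial capacity are illustrative, not the PR's actual code.
class thread_slot_table
{
public:
    // Lock-free bounds check: out-of-range indices return nullptr.
    void* get(std::size_t idx) const
    {
        if(idx >= m_capacity.load(std::memory_order_acquire)) return nullptr;
        return m_slots[idx];
    }

    void set(std::size_t idx, void* ptr)
    {
        ensure_capacity(idx);
        m_slots[idx] = ptr;
    }

private:
    void ensure_capacity(std::size_t idx)
    {
        // Fast path: capacity already sufficient, no lock taken.
        if(idx < m_capacity.load(std::memory_order_acquire)) return;
        // Slow path: re-check under the lock (double-checked locking).
        std::lock_guard<std::mutex> lk{ m_mutex };
        std::size_t cap = m_capacity.load(std::memory_order_relaxed);
        if(idx < cap) return;
        std::size_t new_cap = (cap == 0) ? 64 : cap;
        while(new_cap <= idx) new_cap *= 2;  // geometric (2x) growth
        m_slots.resize(new_cap, nullptr);
        m_capacity.store(new_cap, std::memory_order_release);
    }

    std::atomic<std::size_t> m_capacity{ 0 };
    std::mutex               m_mutex;
    std::vector<void*>       m_slots{};
};
```

One caveat this sketch makes visible: a flat `std::vector` relocates its buffer on resize, so a reader racing with growth could observe a stale buffer. That limitation is presumably what motivated the later switch from `std::vector` to a chunked stable_vector, whose element addresses never move.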

Test Plan

To be tested in CI/CD or rocm-systems.

Test Result

TBA

Submission Checklist

ajanicijamd and others added 18 commits January 31, 2025 11:39
Fixed a bug in processing command line parameters
…ction not being called when a default is defined (#3)
Invalid RPATH used for libunwind properties. Failing RPM generations and rpath check on RHEL 10.

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Update submodule to ROCm/GOTCHA@c04cc3f (v1.0.8 +).
Now points to the ROCm fork of GOTCHA
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
* Exclude perf_event_uncore events from evaluation

* Change1: Optimization and cleaner code fixes
Fallback to use `patchelf`, if `chrpath` is not present.
Change some logging from "WARNING"s to "AUTHOR_WARNING"s.

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
…owth

- Replace fixed std::array with std::vector for dynamic thread storage
- Add ensure_capacity() with double-checked locking pattern
- Use std::atomic<size_t> for lock-free capacity reads
- Add bounds checking in get_storage operation
- Implement geometric growth (2x doubling) when capacity exceeded
- Initial capacity set to TIMEMORY_MAX_THREADS, grows as needed
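As a rough illustration of why a chunked "stable_vector" (per the revised PR title) helps here: elements live in fixed-size heap-allocated chunks, so growth appends chunks without moving existing elements, and pointers handed out earlier stay valid. A hypothetical single-threaded sketch (locking omitted; `ChunkSize` and the names are illustrative, not the PR's actual code):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical sketch of a chunked stable_vector; growth and access would
// still need the synchronization discussed elsewhere in this PR.
template <typename T, std::size_t ChunkSize = 64>
class stable_vector
{
public:
    T& at(std::size_t idx)
    {
        // Append chunks until idx is covered; existing chunks never move,
        // so previously returned element addresses remain valid.
        while(idx >= m_chunks.size() * ChunkSize)
            m_chunks.emplace_back(std::make_unique<std::array<T, ChunkSize>>());
        return (*m_chunks[idx / ChunkSize])[idx % ChunkSize];
    }

private:
    std::vector<std::unique_ptr<std::array<T, ChunkSize>>> m_chunks{};
};
```

Growing only ever appends to `m_chunks` (which may reallocate the vector of pointers, but not the chunks themselves), which is what makes per-element references stable across growth.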
@anujshuk-amd anujshuk-amd force-pushed the users/anujshuk/dynamic-thread-storage_ds branch 2 times, most recently from 8d9b5e6 to 696a160 Compare January 14, 2026 09:16
@anujshuk-amd anujshuk-amd force-pushed the users/anujshuk/dynamic-thread-storage_ds branch from 1bb036a to 2624144 Compare January 19, 2026 16:59
@anujshuk-amd anujshuk-amd changed the title Convert thread storage from std::array to std::vector with dynamic growth Convert thread storage from std::array to stable_vector with dynamic growth Jan 29, 2026

@sputhala-amd sputhala-amd left a comment


May I suggest checking whether the following approach is feasible:

  1. The current approach has dynamic_storage_base<StorageType, N> + set_storage<T> : base, with get() forwarding to get_storage().
    Instead, could we have one type, dynamic_storage<StorageType>, that owns all behavior? set_storage<T> and get_storage<T> would only forward to a single static storage instance for storage<T>*. No base class; the storage type is a single template that encapsulates capacity, growth, and access.

  2. In the current approach, capacity() takes the mutex and returns m_chunks.size() * ChunkSize, so get_storage()(idx) does if (_idx >= get_capacity()) and pays a lock on every call.
    Instead, keep an std::atomic<size_t> m_capacity, updated under the mutex whenever chunks are added. capacity() becomes m_capacity.load(std::memory_order_acquire) with no lock.

  3. In the current approach, every at() / operator[] takes the mutex to read the chunk pointer, so concurrent reads serialize.
    Instead, use std::shared_mutex: writers (growth, reserve, ensure_capacity) take std::unique_lock; readers (at(), operator[] when only reading the chunk pointer) take std::shared_lock. Multiple readers can run in parallel.
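The three suggestions above can be combined into a single owning template. A hypothetical sketch (requires C++17 for std::shared_mutex; names and ChunkSize are illustrative, not a definitive implementation):

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical sketch combining the reviewer's suggestions: one owning
// template, an atomic capacity for lock-free bounds checks, and a
// shared_mutex so concurrent readers do not serialize.
template <typename T, std::size_t ChunkSize = 64>
class dynamic_storage
{
public:
    // Suggestion 2: lock-free capacity read.
    std::size_t capacity() const { return m_capacity.load(std::memory_order_acquire); }

    // Suggestion 3: readers take a shared lock and run in parallel.
    T* get(std::size_t idx)
    {
        if(idx >= capacity()) return nullptr;
        std::shared_lock<std::shared_mutex> rd{ m_mutex };
        return &(*m_chunks[idx / ChunkSize])[idx % ChunkSize];
    }

    // Writers (growth) take an exclusive lock; capacity is published
    // with a release store only after the chunks exist.
    void ensure_capacity(std::size_t idx)
    {
        if(idx < capacity()) return;  // fast path, no lock
        std::unique_lock<std::shared_mutex> wr{ m_mutex };
        while(idx >= m_chunks.size() * ChunkSize)
            m_chunks.emplace_back(std::make_unique<std::array<T, ChunkSize>>());
        m_capacity.store(m_chunks.size() * ChunkSize, std::memory_order_release);
    }

private:
    std::shared_mutex                                      m_mutex;
    std::atomic<std::size_t>                               m_capacity{ 0 };
    std::vector<std::unique_ptr<std::array<T, ChunkSize>>> m_chunks{};
};
```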

```cpp
}

protected:
static storage_array_t& get_storage()
```


dynamic_storage_base::get_storage() returns a static instance, while set_storage<T>::get() below returns a different static _v.

```diff
 static storage_array_t& get()
 {
-    static storage_array_t _v = { nullptr };
+    static storage_array_t _v(max_threads, nullptr);
```


Should return get_storage(), so there is a single static instance. All set_storage and get_storage logic uses that same instance.

```cpp
}

std::vector<chunk_ptr> m_chunks{};
size_t m_size{0};
```


m_size should be std::atomic<size_t> to ensure thread safety.

```cpp
    m_chunks.push_back(std::move(chunk));
}

std::vector<chunk_ptr> m_chunks{};
```


Usage of m_chunks should be guarded with locks, because multiple threads could call ensure_capacity and add_chunk at the same time.

```cpp
{
    std::lock_guard<std::mutex> lock(m_mutex);
    while(idx >= capacity())
        add_chunk();
```


After ensure_capacity(idx), at()/operator[] does (*m_chunks[idx/ChunkSize])[idx%ChunkSize] while another thread could be calling add_chunk() and modifying m_chunks.

```cpp
size_t size() const { return m_size; }
size_t capacity() const { return m_chunks.size() * ChunkSize; }

void reserve(size_t new_cap)
```


reserve() here is also not thread-safe; it should be guarded with a lock.

@sputhala-amd

The PR commit history is difficult to follow because it includes unrelated changes.
Could you please create a fresh branch from rocprofiler-systems and open a new PR containing only the intended changes? This will make the review much easier.
