
Convert thread storage from std::array to stable_vector with dynamic growth#21

Draft
anujshuk-amd wants to merge 23 commits into timem from users/anujshuk/dynamic-thread-storage_ds

Conversation


@anujshuk-amd anujshuk-amd commented Jan 8, 2026

Note: this code will be moved/removed later.
Used by: ROCm/rocm-systems#2542

Motivation

These updates ensure that storage can grow as needed and that concurrent accesses are safely managed.

Technical Details

This pull request refactors the storage management logic in source/timemory/operations/types.hpp to improve thread safety and scalability. The main changes are the switch from a fixed-size array to a dynamically resizing vector for storage, the introduction of atomic and mutex-based synchronization for concurrent access, and the use of a configurable thread limit.

Thread safety and dynamic storage management

  • Changed set_storage from using a fixed-size std::array to a dynamically resizing std::vector, allowing for more flexible and scalable storage of thread-local data. ([source/timemory/operations/types.hppL816-R872](https://github.com/ROCm/timemory/pull/21/files#diff-e60a89742e16350cc3b36bbb562d4c77c8b1b7e34bd8a2011b6f81cf13fb771eL816-R872))
  • Added atomic and mutex-based synchronization (std::atomic, std::mutex) to ensure safe concurrent resizing and access of the storage vector. ([[1]](https://github.com/ROCm/timemory/pull/21/files#diff-e60a89742e16350cc3b36bbb562d4c77c8b1b7e34bd8a2011b6f81cf13fb771eR42-R45), [[2]](https://github.com/ROCm/timemory/pull/21/files#diff-e60a89742e16350cc3b36bbb562d4c77c8b1b7e34bd8a2011b6f81cf13fb771eL816-R872))
  • Implemented an ensure_capacity method in set_storage to grow the storage vector geometrically (doubling size) when a new thread index exceeds current capacity, ensuring efficient scaling and minimizing lock contention. ([source/timemory/operations/types.hppL816-R872](https://github.com/ROCm/timemory/pull/21/files#diff-e60a89742e16350cc3b36bbb562d4c77c8b1b7e34bd8a2011b6f81cf13fb771eL816-R872))
  • Updated thread limit configuration by replacing the hardcoded 4096 with the TIMEMORY_MAX_THREADS macro for greater flexibility. ([source/timemory/operations/types.hppL816-R872](https://github.com/ROCm/timemory/pull/21/files#diff-e60a89742e16350cc3b36bbb562d4c77c8b1b7e34bd8a2011b6f81cf13fb771eL816-R872))

Safe access and read logic

  • Modified get_storage to use atomic capacity checks before accessing the storage vector, returning nullptr for out-of-bounds thread indices to prevent unsafe reads. ([source/timemory/operations/types.hppR905-R907](https://github.com/ROCm/timemory/pull/21/files#diff-e60a89742e16350cc3b36bbb562d4c77c8b1b7e34bd8a2011b6f81cf13fb771eR905-R907))
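Taken together, the changes above amount to a double-checked-locking growth path plus a lock-free bounds check on reads. A minimal, hypothetical sketch of that pattern (class name, slot type, and initial capacity are illustrative, not the PR's actual code):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical sketch of the pattern described above; names and the
// initial capacity are illustrative, not the PR's actual code.
class thread_slot_table
{
public:
    // Lock-free bounds check: out-of-range indices return nullptr.
    void* get(std::size_t idx) const
    {
        if(idx >= m_capacity.load(std::memory_order_acquire)) return nullptr;
        return m_slots[idx];
    }

    void set(std::size_t idx, void* ptr)
    {
        ensure_capacity(idx);
        m_slots[idx] = ptr;
    }

private:
    void ensure_capacity(std::size_t idx)
    {
        // Fast path: capacity already sufficient, no lock taken.
        if(idx < m_capacity.load(std::memory_order_acquire)) return;
        // Slow path: re-check under the lock (double-checked locking).
        std::lock_guard<std::mutex> lk{ m_mutex };
        std::size_t cap = m_capacity.load(std::memory_order_relaxed);
        if(idx < cap) return;
        std::size_t new_cap = (cap == 0) ? 64 : cap;
        while(new_cap <= idx) new_cap *= 2;  // geometric (2x) growth
        m_slots.resize(new_cap, nullptr);
        m_capacity.store(new_cap, std::memory_order_release);
    }

    std::atomic<std::size_t> m_capacity{ 0 };
    std::mutex               m_mutex;
    std::vector<void*>       m_slots{};
};
```

One caveat this sketch makes visible: a flat `std::vector` relocates its buffer on resize, so a reader racing with growth could observe a stale buffer. That limitation is presumably what motivated the later switch from `std::vector` to a chunked stable_vector, whose element addresses never move.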

Test Plan

To be tested in CI/CD or rocm-systems.

Test Result

TBA

Submission Checklist

ajanicijamd and others added 18 commits January 31, 2025 11:39
Fixed a bug in processing command line parameters
…ction not being called when a default is defined (#3)
Invalid RPATH used for libunwind properties. Failing RPM generations and rpath check on RHEL 10.

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Update submodule to ROCm/GOTCHA@c04cc3f (v1.0.8 +).
Now points to the ROCm fork of GOTCHA
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
* Exclude perf_event_uncore events from evaluation

* Change1: Optimization and cleaner code fixes
Fallback to use `patchelf`, if `chrpath` is not present.
Change some logging from "WARNING"s to "AUTHOR_WARNING"s.

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
…owth

- Replace fixed std::array with std::vector for dynamic thread storage
- Add ensure_capacity() with double-checked locking pattern
- Use std::atomic<size_t> for lock-free capacity reads
- Add bounds checking in get_storage operation
- Implement geometric growth (2x doubling) when capacity exceeded
- Initial capacity set to TIMEMORY_MAX_THREADS, grows as needed
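As a rough illustration of why a chunked "stable_vector" (per the revised PR title) helps here: elements live in fixed-size heap-allocated chunks, so growth appends chunks without moving existing elements, and pointers handed out earlier stay valid. A hypothetical single-threaded sketch (locking omitted; `ChunkSize` and the names are illustrative, not the PR's actual code):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical sketch of a chunked stable_vector; growth and access would
// still need the synchronization discussed elsewhere in this PR.
template <typename T, std::size_t ChunkSize = 64>
class stable_vector
{
public:
    T& at(std::size_t idx)
    {
        // Append chunks until idx is covered; existing chunks never move,
        // so previously returned element addresses remain valid.
        while(idx >= m_chunks.size() * ChunkSize)
            m_chunks.emplace_back(std::make_unique<std::array<T, ChunkSize>>());
        return (*m_chunks[idx / ChunkSize])[idx % ChunkSize];
    }

private:
    std::vector<std::unique_ptr<std::array<T, ChunkSize>>> m_chunks{};
};
```

Growing only ever appends to `m_chunks` (which may reallocate the vector of pointers, but not the chunks themselves), which is what makes per-element references stable across growth.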
@anujshuk-amd anujshuk-amd force-pushed the users/anujshuk/dynamic-thread-storage_ds branch 2 times, most recently from 8d9b5e6 to 696a160 Compare January 14, 2026 09:16
@anujshuk-amd anujshuk-amd force-pushed the users/anujshuk/dynamic-thread-storage_ds branch from 1bb036a to 2624144 Compare January 19, 2026 16:59
@anujshuk-amd anujshuk-amd changed the title Convert thread storage from std::array to std::vector with dynamic growth Convert thread storage from std::array to stable_vector with dynamic growth Jan 29, 2026

@sputhala-amd sputhala-amd left a comment


May I suggest checking whether the following approach is feasible:

  1. The current approach has dynamic_storage_base<StorageType, N> + set_storage<T> : base, with get() forwarding to get_storage().
    Instead, could we have one type, dynamic_storage<StorageType>, that owns all behavior? set_storage<T> and get_storage<T> would only forward to a single static storage instance for storage<T>*. No base class; the storage type is a single template that encapsulates capacity, growth, and access.

  2. In the current approach, capacity() takes the mutex and returns m_chunks.size() * ChunkSize, so get_storage()(idx) does if (_idx >= get_capacity()) and pays a lock on every call.
    Instead, keep an std::atomic<size_t> m_capacity, updated under the mutex whenever chunks are added. capacity() becomes m_capacity.load(std::memory_order_acquire) with no lock.

  3. In the current approach, every at() / operator[] takes the mutex to read the chunk pointer, so concurrent reads serialize.
    Instead, use std::shared_mutex: writers (growth, reserve, ensure_capacity) take std::unique_lock; readers (at(), operator[] when only reading the chunk pointer) take std::shared_lock. Multiple readers can run in parallel.
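The three suggestions above can be combined into a single owning template. A hypothetical sketch (requires C++17 for std::shared_mutex; names and ChunkSize are illustrative, not a definitive implementation):

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical sketch combining the reviewer's suggestions: one owning
// template, an atomic capacity for lock-free bounds checks, and a
// shared_mutex so concurrent readers do not serialize.
template <typename T, std::size_t ChunkSize = 64>
class dynamic_storage
{
public:
    // Suggestion 2: lock-free capacity read.
    std::size_t capacity() const { return m_capacity.load(std::memory_order_acquire); }

    // Suggestion 3: readers take a shared lock and run in parallel.
    T* get(std::size_t idx)
    {
        if(idx >= capacity()) return nullptr;
        std::shared_lock<std::shared_mutex> rd{ m_mutex };
        return &(*m_chunks[idx / ChunkSize])[idx % ChunkSize];
    }

    // Writers (growth) take an exclusive lock; capacity is published
    // with a release store only after the chunks exist.
    void ensure_capacity(std::size_t idx)
    {
        if(idx < capacity()) return;  // fast path, no lock
        std::unique_lock<std::shared_mutex> wr{ m_mutex };
        while(idx >= m_chunks.size() * ChunkSize)
            m_chunks.emplace_back(std::make_unique<std::array<T, ChunkSize>>());
        m_capacity.store(m_chunks.size() * ChunkSize, std::memory_order_release);
    }

private:
    std::shared_mutex                                      m_mutex;
    std::atomic<std::size_t>                               m_capacity{ 0 };
    std::vector<std::unique_ptr<std::array<T, ChunkSize>>> m_chunks{};
};
```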

```cpp
}

protected:
static storage_array_t& get_storage()
```


dynamic_storage_base::get_storage() returns a static instance, while set_storage<T>::get() below returns a different static _v.

```diff
 static storage_array_t& get()
 {
-    static storage_array_t _v = { nullptr };
+    static storage_array_t _v(max_threads, nullptr);
```


Should return get_storage(), so there is a single static instance. All set_storage and get_storage logic uses that same instance.

```cpp
}

std::vector<chunk_ptr> m_chunks{};
size_t m_size{0};
```


m_size should be std::atomic<size_t> to ensure thread safety.

```cpp
    m_chunks.push_back(std::move(chunk));
}

std::vector<chunk_ptr> m_chunks{};
```


Usage of m_chunks should be guarded with locks, because multiple threads could call ensure_capacity and add_chunk at the same time.

```cpp
{
    std::lock_guard<std::mutex> lock(m_mutex);
    while(idx >= capacity())
        add_chunk();
```


After ensure_capacity(idx), at()/operator[] does (*m_chunks[idx/ChunkSize])[idx%ChunkSize] while another thread could be calling add_chunk() and modifying m_chunks.

```cpp
size_t size() const { return m_size; }
size_t capacity() const { return m_chunks.size() * ChunkSize; }

void reserve(size_t new_cap)
```


reserve() here is also not thread-safe; it should be guarded with a lock.

@sputhala-amd

The PR commit history is difficult to follow because it includes unrelated changes.
Could you please create a fresh branch from rocprofiler-systems and open a new PR containing only the intended changes? This will make the review much easier.
