An alternative TLS implementation to fix a crash on musl/aarch64#600
An alternative TLS implementation to fix a crash on musl/aarch64#600zhengyu123 wants to merge 19 commits into
Conversation
CI Test ResultsRun: #27779791157 | Commit:
Status Overview
Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled Summary: Total: 32 | Passed: 32 | Failed: 0 Updated: 2026-06-18 18:30:00 UTC |
|
Reliability & Chaos Results✅ All reliability & chaos checks passed Pipeline: https://gitlab.ddbuild.io/DataDog/java-profiler/-/pipelines/119735058 |
There was a problem hiding this comment.
Pull request overview
This PR introduces a pthread-key-based ThreadLocal helper as an alternative to C++ thread_local, and migrates LivenessTracker’s per-thread random/subsampling state to use it (motivated by musl/aarch64 TLS issues). It also adds C++ unit tests validating the new TLS wrapper behavior.
Changes:
- Added
ddprof-lib/src/main/cpp/threadLocal.himplementing aThreadLocal<T>wrapper on top ofpthread_key_*. - Updated
LivenessTracker::track()to replacethread_localPRNG/distribution/skipped-state withThreadLocalinstances. - Added
threadLocal_ut.cppgtest coverage forThreadLocalset/get semantics, per-thread isolation, and per-thread cleanup.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
ddprof-lib/src/main/cpp/threadLocal.h |
New pthread-key TLS abstraction (generic + double specialization). |
ddprof-lib/src/main/cpp/livenessTracker.cpp |
Switches liveness subsampling TLS state from thread_local to the new ThreadLocal wrapper. |
ddprof-lib/src/test/cpp/threadLocal_ut.cpp |
Adds gtests validating correctness and cleanup behavior of the new TLS wrapper. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 13c88912f3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| int err = pthread_key_create(&_key, F); | ||
| // What to do if we can not create a key? | ||
| assert(err == 0); |
There was a problem hiding this comment.
Handle pthread_key_create failures outside assert
When the process exhausts pthread TLS keys, pthread_key_create returns an error; in release builds this project compiles with -DNDEBUG, so this assertion is removed and the static _key remains zero/unusable. Subsequent liveness sampling get()/set() calls can then operate on an unrelated key or repeatedly allocate and leak per-thread RNG state instead of failing cleanly, so the error path needs non-assert handling such as an unconditional abort or disabling this TLS instance.
Useful? React with 👍 / 👎.
| ThreadLocal& operator=(const ThreadLocal&) = delete; | ||
|
|
||
| ThreadLocal() { | ||
| int err = pthread_key_create(&_key, F); |
There was a problem hiding this comment.
[HIGH · robustness] pthread_key_create failure is guarded only by assert(), which is compiled out under NDEBUG. In a release build a failed key creation leaves _key uninitialized; all subsequent get()/set() calls then operate on an arbitrary or zero key, silently aliasing another thread's TSD slot (e.g. the ProfiledThread key in thread.cpp).
Suggestion: replace the assert with an unconditional abort:
if (err != 0) { fprintf(stderr, "ThreadLocal: pthread_key_create failed: %d\n", err); abort(); }This matches the existing error handling in thread.cpp and makes failures visible in production.
| } | ||
|
|
||
| ~ThreadLocal() { | ||
| pthread_key_delete(_key); |
There was a problem hiding this comment.
[HIGH · robustness] pthread_key_delete(_key) is called unconditionally in the destructor. If pthread_key_create failed in the constructor and the assert was compiled out (NDEBUG / release build), _key stays at its zero-initialized value (0). Calling pthread_key_delete(0) deletes a key we don't own — on musl, key 0 is legitimately held by another component (see thread.cpp:91–93) — corrupting the global pthread key table.
Suggestion: track construction success with a bool _key_valid member:
_key_valid = (pthread_key_create(&_key, F) == 0);
// ...
if (_key_valid) pthread_key_delete(_key);Apply the same pattern to the double specialization.
| T get() { | ||
| void* p = pthread_getspecific(_key); | ||
| if (p == nullptr && C != nullptr) { | ||
| p = C(); |
There was a problem hiding this comment.
[HIGH · robustness] After p = C() allocates an object, set() asserts on pthread_setspecific; under NDEBUG the assert is stripped. If pthread_setspecific then fails silently, the TSD slot stays null. Every subsequent get() re-invokes C() and leaks the prior allocation — unbounded per-call leak for the lazy-create path.
Suggestion: check pthread_setspecific directly after C() and free on failure:
p = C();
if (pthread_setspecific(_key, (const void*)p) != 0) {
if (F) F(p);
return T{};
}Alternatively, make set() call abort() instead of assert so failure is not silently swallowed in release builds.
| pthread_key_delete(_key); | ||
| } | ||
|
|
||
| void set(T value) { |
There was a problem hiding this comment.
[MEDIUM · correctness] The primary template casts T directly to const void* with no compile-time guard that sizeof(T) == sizeof(void*). Instantiation with a sub-pointer-width type (e.g. ThreadLocal<int>) compiles silently but produces implementation-defined behaviour on the set/get round-trip for values that set the sign bit.
Suggestion: add a static_assert in the primary template constructor mirroring the guards already in the double specialization:
static_assert(sizeof(T) == sizeof(void*),
"ThreadLocal<T> requires sizeof(T)==sizeof(void*); use a pointer type or add a specialization");| pthread_key_delete(_key); | ||
| } | ||
|
|
||
| void set(T value) { |
There was a problem hiding this comment.
[MEDIUM · correctness] set(nullptr) on a lazy-create ThreadLocal (i.e. C != nullptr) is semantically identical to 'never set': the next get() call will invoke C() and return a fresh object instead of null. This makes it impossible to store a null value or 'unset' without triggering re-creation, which is a silent foot-gun for the planned future conversions mentioned in the PR description.
Suggestion: document explicitly that set(nullptr) is reserved for clear() and must not be used when C != nullptr, or provide a separate reset() method that bypasses the lazy-create path.
| return value; | ||
| } | ||
|
|
||
| void clear() { |
There was a problem hiding this comment.
[MEDIUM · robustness] ThreadLocal<double>::clear() uses assert(err == 0) for the pthread_setspecific return value; under NDEBUG the assert is stripped. If pthread_setspecific fails silently, the TSD slot retains the stale encoded double bits; the next get() sees a non-null value and decodes garbage instead of returning 0.0.
Suggestion: mirror the defensive pattern in the generic clear() — read the existing value first, attempt the set, and only proceed if it succeeded — or use abort() instead of assert so the failure is not silently swallowed in release builds.
| } | ||
| } | ||
| skipped = 0; // reset the subsampling skipped bytes | ||
| skipped.set(0); // reset the subsampling skipped bytes |
There was a problem hiding this comment.
[LOW · correctness] (Pre-existing, not introduced by this PR) skipped.set(0) resets the accumulated skipped-bytes counter even when the current event was dropped (table full), losing weight that should carry forward to the next successfully recorded sample.
Suggestion: move skipped.set(0) inside the idx < _table_cap branch so it only resets when the sample was actually recorded.
|
|
||
| // The create function lazily initializes storage on first get() and is invoked | ||
| // exactly once per thread; subsequent get()s return the same pointer. | ||
| TEST_F(ThreadLocalTest, Lazy_CreateOncePerThread) { |
There was a problem hiding this comment.
[MEDIUM · completeness] The Lazy_* tests share g_tracked_tl, g_created, and g_freed as namespace-scope statics. Counts are reset per test but the TSD key itself is shared. In parallel test runs or with a thread-pool GTest runner where worker threads are reused, freed counts can be non-deterministic.
Suggestion: use a per-test-local ThreadLocal instance (or scoped key creation/deletion in SetUp/TearDown) so each test owns its own TSD key and counters, eliminating cross-test contamination regardless of thread reuse.
| ASSERT_NE(nullptr, first); | ||
| EXPECT_EQ(first, second); // same instance reused (pointer compare only) | ||
| EXPECT_EQ(1234, value); // created via create_tracked() | ||
| EXPECT_EQ(1, g_created.load()); // created exactly once |
There was a problem hiding this comment.
[LOW · completeness] Lazy_CreateOncePerThread resets g_freed to 0 but never asserts its value after the thread joins. A stub implementation that omits CLEAN_FUNC entirely would still pass this test.
Suggestion: add EXPECT_EQ(1, g_freed.load()); after the g_created assertion to verify the TSD destructor is called on thread exit.
|
|
||
| // Per-thread accumulation mirrors LivenessTracker's `skipped` usage: each thread | ||
| // keeps its own running sum, isolated from the others. | ||
| TEST_F(ThreadLocalTest, Double_PerThreadAccumulation) { |
There was a problem hiding this comment.
[LOW · completeness] The test suite has zero coverage for clear() on either ThreadLocal specialization. A stub clear() that did nothing would pass all 8 tests without failure.
Suggestion: add at minimum: (1) a test for generic clear() verifying the stored pointer is zeroed and the destructor ran; (2) a test for ThreadLocal<double>::clear() verifying get() returns 0.0 afterwards; (3) a test for clear() when F == nullptr verifying the slot is actually nulled.
What does this PR do?:
This PR is intended to replace
thread_localvariables (except Otel ones) with an alternative implementation, which is based onpthread_get/setspecific()API.Motivation:
We have encountered a quite few defects related to TLS - whose implementation is compiler specific. In some circumstances, the implementation is very restrictive that does not fit our usage pattern - it manifests in SCP-1238
Additional Notes:
To limit the scope, I only replaced 3 thread locals in
livenessTrackerwhich led to the crash reported in SCP-1238](https://datadoghq.atlassian.net/browse/SCP-1238)Converting
ProfiledThreadto use newly introducedThreadLocalwill be done in separate PR.How to test the change?:
For Datadog employees:
credentials of any kind, I've requested a security review (run the
dd:platform-security-reviewskill, or file a request via the PSEC review form).
bewairealso runs automatically on every PR.Unsure? Have a question? Request a review!