Skip to content

An alternative TLS implementation to fix a crash on musl/aarch64#600

Open
zhengyu123 wants to merge 19 commits into
mainfrom
zgu/tls_replacement
Open

An alternative TLS implementation to fix a crash on musl/aarch64#600
zhengyu123 wants to merge 19 commits into
mainfrom
zgu/tls_replacement

Conversation

@zhengyu123

@zhengyu123 zhengyu123 commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?:
This PR is intended to replace thread_local variables (except Otel ones) with an alternative implementation, which is based on pthread_get/setspecific() API.

Motivation:
We have encountered a quite few defects related to TLS - whose implementation is compiler specific. In some circumstances, the implementation is very restrictive that does not fit our usage pattern - it manifests in SCP-1238

Additional Notes:
To limit the scope, I only replaced 3 thread locals in livenessTracker which led to the crash reported in SCP-1238](https://datadoghq.atlassian.net/browse/SCP-1238)

Converting ProfiledThread to use newly introduced ThreadLocal will be done in separate PR.

How to test the change?:

  • CI tests
  • New test case.

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a security review (run the dd:platform-security-review
    skill, or file a request via the PSEC review form).
    bewaire also runs automatically on every PR.
  • This PR doesn't touch any of that.
  • JIRA: PROF-15114

Unsure? Have a question? Request a review!

@dd-octo-sts

dd-octo-sts Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

CI Test Results

Run: #27779791157 | Commit: 93f91c9 | Duration: 14m 55s (longest job)

All 32 test jobs passed

Status Overview

JDK glibc-aarch64/debug glibc-amd64/debug musl-aarch64/debug musl-amd64/debug
8 - - -
8-ibm - - -
8-j9 - -
8-librca - -
8-orcl - - -
11 - - -
11-j9 - -
11-librca - -
17 - -
17-graal - -
17-j9 - -
17-librca - -
21 - -
21-graal - -
21-librca - -
25 - -
25-graal - -
25-librca - -

Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled

Summary: Total: 32 | Passed: 32 | Failed: 0


Updated: 2026-06-18 18:30:00 UTC

@datadog-prod-us1-4

datadog-prod-us1-4 Bot commented Jun 17, 2026

Copy link
Copy Markdown

Pipelines

Fix all issues with BitsAI

⚠️ Warnings

🚦 14 Pipeline jobs failed

DataDog/java-profiler | benchmarks-candidate-aarch64: [alloc]   View in Datadog   GitLab

DataDog/java-profiler | benchmarks-candidate-aarch64: [cpu,wall,alloc,memleak]   View in Datadog   GitLab

DataDog/java-profiler | benchmarks-candidate-aarch64: [cpu,wall]   View in Datadog   GitLab

View all 14 failed jobs.

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 13c8891 | Docs | Datadog PR Page | Give us feedback!

@zhengyu123 zhengyu123 changed the title Zgu/tls replacement WIP: An alternative TLS implementation to fix a crash on musl/aarch64 Jun 18, 2026
@dd-octo-sts

dd-octo-sts Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Reliability & Chaos Results

All reliability & chaos checks passed Pipeline: https://gitlab.ddbuild.io/DataDog/java-profiler/-/pipelines/119735058

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a pthread-key-based ThreadLocal helper as an alternative to C++ thread_local, and migrates LivenessTracker’s per-thread random/subsampling state to use it (motivated by musl/aarch64 TLS issues). It also adds C++ unit tests validating the new TLS wrapper behavior.

Changes:

  • Added ddprof-lib/src/main/cpp/threadLocal.h implementing a ThreadLocal<T> wrapper on top of pthread_key_*.
  • Updated LivenessTracker::track() to replace thread_local PRNG/distribution/skipped-state with ThreadLocal instances.
  • Added threadLocal_ut.cpp gtest coverage for ThreadLocal set/get semantics, per-thread isolation, and per-thread cleanup.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 12 comments.

File Description
ddprof-lib/src/main/cpp/threadLocal.h New pthread-key TLS abstraction (generic + double specialization).
ddprof-lib/src/main/cpp/livenessTracker.cpp Switches liveness subsampling TLS state from thread_local to the new ThreadLocal wrapper.
ddprof-lib/src/test/cpp/threadLocal_ut.cpp Adds gtests validating correctness and cleanup behavior of the new TLS wrapper.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ddprof-lib/src/main/cpp/threadLocal.h Outdated
Comment thread ddprof-lib/src/main/cpp/threadLocal.h Outdated
Comment thread ddprof-lib/src/main/cpp/threadLocal.h
Comment thread ddprof-lib/src/main/cpp/livenessTracker.cpp Outdated
Comment thread ddprof-lib/src/main/cpp/livenessTracker.cpp Outdated
Comment thread ddprof-lib/src/main/cpp/threadLocal.h
Comment thread ddprof-lib/src/main/cpp/threadLocal.h
Comment thread ddprof-lib/src/main/cpp/threadLocal.h
Comment thread ddprof-lib/src/main/cpp/threadLocal.h
Comment thread ddprof-lib/src/main/cpp/threadLocal.h
zhengyu123 and others added 13 commits June 18, 2026 09:57
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@zhengyu123 zhengyu123 changed the title WIP: An alternative TLS implementation to fix a crash on musl/aarch64 An alternative TLS implementation to fix a crash on musl/aarch64 Jun 18, 2026
@zhengyu123 zhengyu123 marked this pull request as ready for review June 18, 2026 18:09
@zhengyu123 zhengyu123 requested a review from a team as a code owner June 18, 2026 18:09

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 13c88912f3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +57 to +59
int err = pthread_key_create(&_key, F);
// What to do if we can not create a key?
assert(err == 0);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle pthread_key_create failures outside assert

When the process exhausts pthread TLS keys, pthread_key_create returns an error; in release builds this project compiles with -DNDEBUG, so this assertion is removed and the static _key remains zero/unusable. Subsequent liveness sampling get()/set() calls can then operate on an unrelated key or repeatedly allocate and leak per-thread RNG state instead of failing cleanly, so the error path needs non-assert handling such as an unconditional abort or disabling this TLS instance.

Useful? React with 👍 / 👎.

ThreadLocal& operator=(const ThreadLocal&) = delete;

ThreadLocal() {
int err = pthread_key_create(&_key, F);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH · robustness] pthread_key_create failure is guarded only by assert(), which is compiled out under NDEBUG. In a release build a failed key creation leaves _key uninitialized; all subsequent get()/set() calls then operate on an arbitrary or zero key, silently aliasing another thread's TSD slot (e.g. the ProfiledThread key in thread.cpp).

Suggestion: replace the assert with an unconditional abort:

if (err != 0) { fprintf(stderr, "ThreadLocal: pthread_key_create failed: %d\n", err); abort(); }

This matches the existing error handling in thread.cpp and makes failures visible in production.

}

~ThreadLocal() {
pthread_key_delete(_key);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH · robustness] pthread_key_delete(_key) is called unconditionally in the destructor. If pthread_key_create failed in the constructor and the assert was compiled out (NDEBUG / release build), _key stays at its zero-initialized value (0). Calling pthread_key_delete(0) deletes a key we don't own — on musl, key 0 is legitimately held by another component (see thread.cpp:91–93) — corrupting the global pthread key table.

Suggestion: track construction success with a bool _key_valid member:

_key_valid = (pthread_key_create(&_key, F) == 0);
// ...
if (_key_valid) pthread_key_delete(_key);

Apply the same pattern to the double specialization.

T get() {
void* p = pthread_getspecific(_key);
if (p == nullptr && C != nullptr) {
p = C();

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH · robustness] After p = C() allocates an object, set() asserts on pthread_setspecific; under NDEBUG the assert is stripped. If pthread_setspecific then fails silently, the TSD slot stays null. Every subsequent get() re-invokes C() and leaks the prior allocation — unbounded per-call leak for the lazy-create path.

Suggestion: check pthread_setspecific directly after C() and free on failure:

p = C();
if (pthread_setspecific(_key, (const void*)p) != 0) {
    if (F) F(p);
    return T{};
}

Alternatively, make set() call abort() instead of assert so failure is not silently swallowed in release builds.

pthread_key_delete(_key);
}

void set(T value) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM · correctness] The primary template casts T directly to const void* with no compile-time guard that sizeof(T) == sizeof(void*). Instantiation with a sub-pointer-width type (e.g. ThreadLocal<int>) compiles silently but produces implementation-defined behaviour on the set/get round-trip for values that set the sign bit.

Suggestion: add a static_assert in the primary template constructor mirroring the guards already in the double specialization:

static_assert(sizeof(T) == sizeof(void*),
    "ThreadLocal<T> requires sizeof(T)==sizeof(void*); use a pointer type or add a specialization");

pthread_key_delete(_key);
}

void set(T value) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM · correctness] set(nullptr) on a lazy-create ThreadLocal (i.e. C != nullptr) is semantically identical to 'never set': the next get() call will invoke C() and return a fresh object instead of null. This makes it impossible to store a null value or 'unset' without triggering re-creation, which is a silent foot-gun for the planned future conversions mentioned in the PR description.

Suggestion: document explicitly that set(nullptr) is reserved for clear() and must not be used when C != nullptr, or provide a separate reset() method that bypasses the lazy-create path.

return value;
}

void clear() {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM · robustness] ThreadLocal<double>::clear() uses assert(err == 0) for the pthread_setspecific return value; under NDEBUG the assert is stripped. If pthread_setspecific fails silently, the TSD slot retains the stale encoded double bits; the next get() sees a non-null value and decodes garbage instead of returning 0.0.

Suggestion: mirror the defensive pattern in the generic clear() — read the existing value first, attempt the set, and only proceed if it succeeded — or use abort() instead of assert so the failure is not silently swallowed in release builds.

}
}
skipped = 0; // reset the subsampling skipped bytes
skipped.set(0); // reset the subsampling skipped bytes

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW · correctness] (Pre-existing, not introduced by this PR) skipped.set(0) resets the accumulated skipped-bytes counter even when the current event was dropped (table full), losing weight that should carry forward to the next successfully recorded sample.

Suggestion: move skipped.set(0) inside the idx < _table_cap branch so it only resets when the sample was actually recorded.


// The create function lazily initializes storage on first get() and is invoked
// exactly once per thread; subsequent get()s return the same pointer.
TEST_F(ThreadLocalTest, Lazy_CreateOncePerThread) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM · completeness] The Lazy_* tests share g_tracked_tl, g_created, and g_freed as namespace-scope statics. Counts are reset per test but the TSD key itself is shared. In parallel test runs or with a thread-pool GTest runner where worker threads are reused, freed counts can be non-deterministic.

Suggestion: use a per-test-local ThreadLocal instance (or scoped key creation/deletion in SetUp/TearDown) so each test owns its own TSD key and counters, eliminating cross-test contamination regardless of thread reuse.

ASSERT_NE(nullptr, first);
EXPECT_EQ(first, second); // same instance reused (pointer compare only)
EXPECT_EQ(1234, value); // created via create_tracked()
EXPECT_EQ(1, g_created.load()); // created exactly once

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW · completeness] Lazy_CreateOncePerThread resets g_freed to 0 but never asserts its value after the thread joins. A stub implementation that omits CLEAN_FUNC entirely would still pass this test.

Suggestion: add EXPECT_EQ(1, g_freed.load()); after the g_created assertion to verify the TSD destructor is called on thread exit.


// Per-thread accumulation mirrors LivenessTracker's `skipped` usage: each thread
// keeps its own running sum, isolated from the others.
TEST_F(ThreadLocalTest, Double_PerThreadAccumulation) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW · completeness] The test suite has zero coverage for clear() on either ThreadLocal specialization. A stub clear() that did nothing would pass all 8 tests without failure.

Suggestion: add at minimum: (1) a test for generic clear() verifying the stored pointer is zeroed and the destructor ran; (2) a test for ThreadLocal<double>::clear() verifying get() returns 0.0 afterwards; (3) a test for clear() when F == nullptr verifying the slot is actually nulled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants