fix(profiler): name native threads periodically (PROF-15139)#601
Conversation
JIT/GC and other non-Java threads have no JVMTI ThreadStart, so they were named only at dump time from /proc; transient ones that exited first showed as "[tid=N]". Resolve native thread names periodically from the Libraries refresher thread (pure /proc, no JVMTI), which avoids the dying-thread crash class fixed in PROF-14548. Resolve names outside _ti_lock to avoid holding it across /proc I/O, decimate the scan to a 2s cadence and gate it on isRunning(), harden OS::threadName NUL-termination, and fix std::string length in the resolver. Adds ThreadInfo::updateThreadName unit tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
Pull request overview
This PR improves profile readability by proactively naming native (non-Java) threads (e.g., JIT/GC) while they are still alive, by periodically resolving names from /proc on the existing Libraries refresher thread. It also hardens thread-name handling and adds focused C++ unit coverage for the new ThreadInfo::updateThreadName locking contract.
Changes:
- Periodically call
Profiler::updateNativeThreadNames()fromLibraries::refresherLoop()(2s cadence, gated onProfiler::isRunning()). - Refactor
ThreadInfo::updateThreadName()so the resolver runs outside_ti_lock, avoiding holding the lock across blocking/procI/O. - Harden Linux
/proc/.../commNUL-termination and fix native-thread resolver to avoid storing trailing buffer garbage; add new gtest unit tests.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| ddprof-lib/src/test/cpp/threadInfo_ut.cpp | Adds unit tests validating ThreadInfo::updateThreadName behavior and the resolve-outside-lock race contract. |
| ddprof-lib/src/main/cpp/threadInfo.cpp | Refactors updateThreadName to resolve outside the lock and preserve “do not overwrite” behavior on concurrent inserts. |
| ddprof-lib/src/main/cpp/profiler.h | Exposes and documents updateNativeThreadNames() for periodic invocation from the refresher thread. |
| ddprof-lib/src/main/cpp/profiler.cpp | Ensures resolved native thread names use std::string(name_buf) (no trailing garbage) during enumeration. |
| ddprof-lib/src/main/cpp/os_linux.cpp | Improves /proc thread-name NUL-termination logic to avoid dropping a real last character and ensure termination. |
| ddprof-lib/src/main/cpp/libraries.cpp | Piggy-backs native thread-name resolution onto the Libraries refresher thread with decimation and state gating. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d5e97b6daf
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
CI Test ResultsRun: #27763609595 | Commit:
Status Overview
Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled Summary: Total: 32 | Passed: 32 | Failed: 0 Updated: 2026-06-18 13:57:25 UTC |
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: edebd321a1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Reliability & Chaos Results✅ All reliability & chaos checks passed Pipeline: https://gitlab.ddbuild.io/DataDog/java-profiler/-/pipelines/119650514 |
rkennke
left a comment
There was a problem hiding this comment.
This is a solid change, looks good to me!
The periodic refresher scan skips a thread whose /proc comm still equals the process's own name (likely mid-init), deferring to a later scan or the dump-time pass. Avoids latching a provisional name under the first-writer-wins ThreadInfo::updateThreadName. Dump-time path is unchanged (defer=false). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
What does this PR do?:
Resolves native (non-Java) thread names periodically from the
Librariesrefresher thread so JIT compiler / GC threads no longer show up as[tid=N]in profiles. Previously these threads — which receive no JVMTIThreadStart— were named only at dump time from/proc, so any that exited before the dump lost their name.The periodic resolution uses the pure
/procpath (open/read/close) from a normal background thread, deliberately avoiding the JVMTI/vmStructsdying-thread crash class fixed in PROF-14548.Includes the follow-up hardening from review:
ThreadInfo::updateThreadNameresolves the name outside_ti_lock, so the lock is no longer held across blocking/procI/O.The scan is gated on
Profiler::isRunning()and decimated to a 2s cadence to bound its cost on high-thread-count processes.OS::threadNameNUL-termination is hardened to never drop a real character or leave the buffer unterminated.Fixes
std::string(name_buf, buffer_size)→std::string(name_buf)in the resolver (was storing the NUL + trailing garbage).Motivation:
PROF-15139 — native thread names (JIT/GC) regressed to
[tid=N], making timelines noisy and hard to read.Additional Notes:
Threads living shorter than the 2s refresh cadence are inherently unnameable via
/procand will still fall back to[tid=N]. Eager JVMTI-based resolution is intentionally avoided (see PROF-14548 / #518: dereferencing a terminating thread's freedosthreadstruct caused SIGSEGV).How to test the change?:
New C++ unit tests in
ddprof-lib/src/test/cpp/threadInfo_ut.cppcoverThreadInfo::updateThreadName, including the resolve-outside-lock concurrency contract (a racingset()during the unlocked window correctly wins). Run with./gradlew :ddprof-lib:gtestDebug_threadInfo_ut— 4/4 pass.For Datadog employees:
credentials of any kind, I've requested a security review (run the
dd:platform-security-reviewskill, or file a request via the PSEC review form).
bewairealso runs automatically on every PR.Unsure? Have a question? Request a review!