Skip to content

fix(profiler): name native threads periodically (PROF-15139)#601

Merged
jbachorik merged 4 commits into
mainfrom
jb/prof-15139-native-thread-names
Jun 18, 2026
Merged

fix(profiler): name native threads periodically (PROF-15139)#601
jbachorik merged 4 commits into
mainfrom
jb/prof-15139-native-thread-names

Conversation

@jbachorik

@jbachorik jbachorik commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator

What does this PR do?:

Resolves native (non-Java) thread names periodically from the Libraries refresher thread so JIT compiler / GC threads no longer show up as [tid=N] in profiles. Previously these threads — which receive no JVMTI ThreadStart — were named only at dump time from /proc, so any that exited before the dump lost their name.

The periodic resolution uses the pure /proc path (open/read/close) from a normal background thread, deliberately avoiding the JVMTI/vmStructs dying-thread crash class fixed in PROF-14548.

Includes the follow-up hardening from review:

  • ThreadInfo::updateThreadName resolves the name outside _ti_lock, so the lock is no longer held across blocking /proc I/O.

  • The scan is gated on Profiler::isRunning() and decimated to a 2s cadence to bound its cost on high-thread-count processes.

  • OS::threadName NUL-termination is hardened to never drop a real character or leave the buffer unterminated.

  • Fixes std::string(name_buf, buffer_size)std::string(name_buf) in the resolver (was storing the NUL + trailing garbage).

Motivation:

PROF-15139 — native thread names (JIT/GC) regressed to [tid=N], making timelines noisy and hard to read.

Additional Notes:

Threads living shorter than the 2s refresh cadence are inherently unnameable via /proc and will still fall back to [tid=N]. Eager JVMTI-based resolution is intentionally avoided (see PROF-14548 / #518: dereferencing a terminating thread's freed osthread struct caused SIGSEGV).

How to test the change?:

New C++ unit tests in ddprof-lib/src/test/cpp/threadInfo_ut.cpp cover ThreadInfo::updateThreadName, including the resolve-outside-lock concurrency contract (a racing set() during the unlocked window correctly wins). Run with ./gradlew :ddprof-lib:gtestDebug_threadInfo_ut — 4/4 pass.

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a security review (run the dd:platform-security-review
    skill, or file a request via the PSEC review form).
    bewaire also runs automatically on every PR.
  • This PR doesn't touch any of that.
  • JIRA: PROF-15139

Unsure? Have a question? Request a review!

JIT/GC and other non-Java threads have no JVMTI ThreadStart, so they
were named only at dump time from /proc; transient ones that exited
first showed as "[tid=N]". Resolve native thread names periodically
from the Libraries refresher thread (pure /proc, no JVMTI), which
avoids the dying-thread crash class fixed in PROF-14548.

Resolve names outside _ti_lock to avoid holding it across /proc I/O,
decimate the scan to a 2s cadence and gate it on isRunning(), harden
OS::threadName NUL-termination, and fix std::string length in the
resolver. Adds ThreadInfo::updateThreadName unit tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jbachorik jbachorik added the AI label Jun 17, 2026
@jbachorik jbachorik requested a review from Copilot June 17, 2026 21:50
@jbachorik

Copy link
Copy Markdown
Collaborator Author

@codex review

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves profile readability by proactively naming native (non-Java) threads (e.g., JIT/GC) while they are still alive, by periodically resolving names from /proc on the existing Libraries refresher thread. It also hardens thread-name handling and adds focused C++ unit coverage for the new ThreadInfo::updateThreadName locking contract.

Changes:

  • Periodically call Profiler::updateNativeThreadNames() from Libraries::refresherLoop() (2s cadence, gated on Profiler::isRunning()).
  • Refactor ThreadInfo::updateThreadName() so the resolver runs outside _ti_lock, avoiding holding the lock across blocking /proc I/O.
  • Harden Linux /proc/.../comm NUL-termination and fix native-thread resolver to avoid storing trailing buffer garbage; add new gtest unit tests.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
ddprof-lib/src/test/cpp/threadInfo_ut.cpp Adds unit tests validating ThreadInfo::updateThreadName behavior and the resolve-outside-lock race contract.
ddprof-lib/src/main/cpp/threadInfo.cpp Refactors updateThreadName to resolve outside the lock and preserve “do not overwrite” behavior on concurrent inserts.
ddprof-lib/src/main/cpp/profiler.h Exposes and documents updateNativeThreadNames() for periodic invocation from the refresher thread.
ddprof-lib/src/main/cpp/profiler.cpp Ensures resolved native thread names use std::string(name_buf) (no trailing garbage) during enumeration.
ddprof-lib/src/main/cpp/os_linux.cpp Improves /proc thread-name NUL-termination logic to avoid dropping a real last character and ensure termination.
ddprof-lib/src/main/cpp/libraries.cpp Piggy-backs native thread-name resolution onto the Libraries refresher thread with decimation and state gating.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ddprof-lib/src/main/cpp/threadInfo.cpp Outdated
Comment thread ddprof-lib/src/test/cpp/threadInfo_ut.cpp
Comment thread ddprof-lib/src/main/cpp/profiler.cpp Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d5e97b6daf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread ddprof-lib/src/main/cpp/threadInfo.cpp Outdated
Comment thread ddprof-lib/src/test/cpp/threadInfo_ut.cpp
@datadog-prod-us1-6

datadog-prod-us1-6 Bot commented Jun 17, 2026

Copy link
Copy Markdown

Pipelines

Fix all issues with BitsAI

⚠️ Warnings

🚦 14 Pipeline jobs failed

DataDog/java-profiler | benchmarks-candidate-aarch64: [alloc]   View in Datadog   GitLab

DataDog/java-profiler | benchmarks-candidate-aarch64: [cpu,wall,alloc,memleak]   View in Datadog   GitLab

DataDog/java-profiler | benchmarks-candidate-aarch64: [cpu,wall]   View in Datadog   GitLab

View all 14 failed jobs.

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: d35effb | Docs | Datadog PR Page | Give us feedback!

@dd-octo-sts

dd-octo-sts Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

CI Test Results

Run: #27763609595 | Commit: 123fee3 | Duration: 13m 27s (longest job)

All 32 test jobs passed

Status Overview

JDK glibc-aarch64/debug glibc-amd64/debug musl-aarch64/debug musl-amd64/debug
8 - - -
8-ibm - - -
8-j9 - -
8-librca - -
8-orcl - - -
11 - - -
11-j9 - -
11-librca - -
17 - -
17-graal - -
17-j9 - -
17-librca - -
21 - -
21-graal - -
21-librca - -
25 - -
25-graal - -
25-librca - -

Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled

Summary: Total: 32 | Passed: 32 | Failed: 0


Updated: 2026-06-18 13:57:25 UTC

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jbachorik jbachorik marked this pull request as ready for review June 18, 2026 09:31
@jbachorik jbachorik requested a review from a team as a code owner June 18, 2026 09:31

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: edebd321a1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread ddprof-lib/src/main/cpp/libraries.cpp Outdated
@dd-octo-sts

dd-octo-sts Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Reliability & Chaos Results

All reliability & chaos checks passed Pipeline: https://gitlab.ddbuild.io/DataDog/java-profiler/-/pipelines/119650514

@rkennke rkennke left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a solid change, looks good to me!

jbachorik and others added 2 commits June 18, 2026 14:36
The periodic refresher scan skips a thread whose /proc comm still equals
the process's own name (likely mid-init), deferring to a later scan or the
dump-time pass. Avoids latching a provisional name under the first-writer-wins
ThreadInfo::updateThreadName. Dump-time path is unchanged (defer=false).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jbachorik jbachorik merged commit 30c0af5 into main Jun 18, 2026
105 checks passed
@jbachorik jbachorik deleted the jb/prof-15139-native-thread-names branch June 18, 2026 14:06
@github-actions github-actions Bot added this to the 1.45.0 milestone Jun 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants