Skip to content

feat(PROF-15098): add batch reapply-by-id-and-bytes context API#598

Merged
jbachorik merged 21 commits into
mainfrom
prof-15098-context-by-id
Jun 18, 2026
Merged

feat(PROF-15098): add batch reapply-by-id-and-bytes context API#598
jbachorik merged 21 commits into
mainfrom
prof-15098-context-by-id

Conversation

@jbachorik

@jbachorik jbachorik commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

What does this PR do?:

Adds setContextAttributesByIdAndBytes(int[], byte[][]) to ThreadContext, JavaProfiler, and ContextSetter. For each slot whose constantId > 0, writes both the DD sidecar encoding (read by the signal handler) and the OTEP attrs_data UTF-8 value (read by external eBPF/OTEP profilers) inside a single detach/attach window — no String allocation, hashing, or Dictionary lookup.

Also fixes the snapshotTags(int[]) guard from <= to >= (the old condition silently skipped the copy when the snapshot array was exactly the right size), and links the two independent MAX_CUSTOM_SLOTS / TAGS_STORAGE_LIMIT constants with cross-referencing comments.

Fixes ContextWallClockTest weight attribution: method1Impl accumulates wall-clock samples during monitor.wait() while method3 runs concurrently on the executor thread. Those samples were inflating method1's share to ~55% instead of the expected ~33%. Excluding Object.wait frames from the method1Impl attribution measures self-time only. The 0.3 allowedError for vm/vmx/fp/dwarf modes is retained to cover DWARF/VM trace-ID fragmentation that sheds some method2/method3 samples.

Motivation:

PROF-15098 — dd-trace-java's reapplyAppContext hot path must restore application-managed context attributes after a setContext span-activation wipes all custom slots. The existing setContextAttribute(int, String) API re-hashes and re-encodes the value on every call; the new API accepts the precomputed constant ID (from snapshotTags) and UTF-8 bytes (retained from the original String) so the hot path is zero-allocation.

Implementation notes:

  • Two-pass validation (check all slots before any write) prevents half-written records on malformed input.
  • setContextAttributesByIdAndBytes returns false and skips all writes when the record is already invalid (valid=0), preventing accidental resurrection of a cleared span-less record.
  • All N slot writes share one detach/attach window so a signal handler never observes a partially re-applied attribute set.
  • A shared writeSlot helper is used by both setContextAttributeDirect and the new batch path to avoid duplication of the sidecar + attrs_data write logic.
  • The (constantId, bytes) pairing is unverifiable by the callee; the contract is documented in the Javadoc.

Benchmarks (macOS/arm64, JDK 21, avgt ns/op):

Benchmark Score Error
reapplyByIdAndBytes — bare reapply, IDs+bytes already in hand 115 ± 1.5 ns
reapplyCycle — full cycle: setContext wipe + reapply (1t) 45 ± 0.8 ns
reapplyCycle_2t 46 ± 0.3 ns
reapplyCycle_4t 60 ± 1.4 ns
spanLifecycle (baseline: setContext + one setContextAttribute) 14 ns
setContextFull 6 ns

reapplyCycle at 45 ns restores 5 slots (compact + insert each) vs spanLifecycle at 14 ns for a single attribute — expected given the 5× attrs_data work. Scales cleanly under thread contention (46→60 ns at 4 threads, same pattern as spanLifecycle).

How to test the change?:

Six new tests in TagContextTest (deterministic readContextAttribute / snapshotTags readback — no flaky JFR-sample assertions):

  • testReapplyByIdAndBytes — happy path: set, snapshot, wipe via setContext, reapply, verify both sidecar and attrs_data views
  • testReapplyByIdAndBytesRejectsBadArgs — null arrays, length mismatch, null/oversized bytes
  • testReapplyByIdAndBytesReplacesExistingValue — compact-then-insert path (overwrites existing value)
  • testReapplyByIdAndBytesAfterClear — reapply after per-slot clearContextValue (valid stays 1)
  • testReapplyByIdAndBytesClearedRecord — reapply on a clearContext-cleared record (valid=0); must return false and not resurrect
  • testReapplyByIdAndBytesOverflowRollback — 10×255-byte values overflow attrs_data; overflowed slot's sidecar zeroed; slots written before the overflow are retained

Stress coverage: ReapplyContextAntagonist added to the chaos harness — drives the full reapply cycle from 8 threads while wall-clock signals fire, racing the detach/attach window. Run with --antagonists reapply-context.

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a security review (run the dd:platform-security-review
    skill, or file a request via the PSEC review form).
    bewaire also runs automatically on every PR.
  • This PR doesn't touch any of that.
  • JIRA: PROF-15098

Unsure? Have a question? Request a review!

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@jbachorik jbachorik added the AI label Jun 15, 2026
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@datadog-datadog-prod-us1-2

datadog-datadog-prod-us1-2 Bot commented Jun 15, 2026

Copy link
Copy Markdown

Pipelines

Fix all issues with BitsAI

⚠️ Warnings

🚦 14 Pipeline jobs failed

DataDog/java-profiler | benchmarks-candidate-aarch64: [alloc]   View in Datadog   GitLab

DataDog/java-profiler | benchmarks-candidate-aarch64: [cpu,wall,alloc,memleak]   View in Datadog   GitLab

DataDog/java-profiler | benchmarks-candidate-aarch64: [cpu,wall]   View in Datadog   GitLab

View all 14 failed jobs.

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 9f3e4ed | Docs | Datadog PR Page | Give us feedback!

@dd-octo-sts

dd-octo-sts Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

CI Test Results

Run: #27773866793 | Commit: 264552f | Duration: 14m 56s (longest job)

All 32 test jobs passed

Status Overview

JDK glibc-aarch64/debug glibc-amd64/debug musl-aarch64/debug musl-amd64/debug
8 - - -
8-ibm - - -
8-j9 - -
8-librca - -
8-orcl - - -
11 - - -
11-j9 - -
11-librca - -
17 - -
17-graal - -
17-j9 - -
17-librca - -
21 - -
21-graal - -
21-librca - -
25 - -
25-graal - -
25-librca - -

Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled

Summary: Total: 32 | Passed: 32 | Failed: 0


Updated: 2026-06-18 16:48:54 UTC

jbachorik and others added 4 commits June 15, 2026 17:41
Comment thread ddprof-lib/src/main/java/com/datadoghq/profiler/ThreadContext.java Outdated
Comment thread ddprof-lib/src/main/java/com/datadoghq/profiler/ContextSetter.java
- setContextAttributesByIdAndBytes: return false when constantIds.length > MAX_CUSTOM_SLOTS
- snapshotTags(int[]): zero indices beyond attributes.size() to prevent foreign sidecar leakage
- Add 5 acceptance tests
@jbachorik jbachorik requested a review from Copilot June 17, 2026 18:23
@jbachorik

Copy link
Copy Markdown
Collaborator Author

@codex review

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a zero-allocation “reapply app context” path by adding a batch API that restores custom context attributes using precomputed (constant-id, UTF-8-bytes) pairs in a single detach/attach window, and fixes a snapshotTags(int[]) guard bug that previously skipped exact-sized buffers.

Changes:

  • Add setContextAttributesByIdAndBytes(int[], byte[][]) to ThreadContext, JavaProfiler, and ContextSetter, reusing a shared slot-write helper to keep DD sidecar + OTEP attrs_data consistent.
  • Fix ContextSetter.snapshotTags(int[]) to copy for exact-sized buffers and zero-out extra indices for oversized buffers.
  • Add targeted tests, plus JMH + chaos-harness coverage for the reapply cycle under signal pressure.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
ddprof-test/src/test/java/com/datadoghq/profiler/context/TagContextTest.java Adds unit tests for batch reapply-by-id-and-bytes behavior and snapshotTags buffer-size semantics.
ddprof-lib/src/main/java/com/datadoghq/profiler/ThreadContext.java Implements the batch reapply API and factors shared write logic to keep sidecar and attrs_data updates atomic.
ddprof-lib/src/main/java/com/datadoghq/profiler/JavaProfiler.java Exposes the new batch API through the public profiler facade.
ddprof-lib/src/main/java/com/datadoghq/profiler/ContextSetter.java Fixes snapshotTags guard/zeroing and adds a convenience wrapper for the new batch API.
ddprof-stresstest/src/jmh/java/com/datadoghq/profiler/stresstest/scenarios/throughput/ThreadContextBenchmark.java Adds JMH benchmarks for the bare reapply and full reapply cycle under different thread counts.
ddprof-stresstest/src/chaos/java/com/datadoghq/profiler/chaos/ReapplyContextAntagonist.java Adds a chaos antagonist to hammer the reapply cycle concurrently while wall-clock signals fire.
ddprof-stresstest/src/chaos/java/com/datadoghq/profiler/chaos/Main.java Registers the new antagonist under --antagonists reapply-context.
ddprof-stresstest/build.gradle.kts Adds ddprof-lib API compile-only dependency for the chaos source set.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ddprof-lib/src/main/java/com/datadoghq/profiler/ThreadContext.java Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 00194f8ae3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread ddprof-lib/src/main/java/com/datadoghq/profiler/JavaProfiler.java
Comment thread ddprof-lib/src/main/java/com/datadoghq/profiler/ContextSetter.java
@jbachorik jbachorik requested a review from Copilot June 17, 2026 20:22
@jbachorik

Copy link
Copy Markdown
Collaborator Author

@codex review

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Comment thread ddprof-lib/src/main/java/com/datadoghq/profiler/JavaProfiler.java Outdated
Comment thread ddprof-lib/src/main/java/com/datadoghq/profiler/JavaProfiler.java Outdated
Comment thread ddprof-lib/src/main/java/com/datadoghq/profiler/JavaProfiler.java Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fc14d7cc36

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread ddprof-lib/src/main/java/com/datadoghq/profiler/JavaProfiler.java Outdated
jbachorik and others added 3 commits June 17, 2026 22:52
…ClockTest

method1Impl blocks in monitor.wait() while method3 runs on the executor
thread. Those WAITING samples were counted as method1Weight, inflating it
to ~55% instead of the expected ~33%.

Exclude stack traces containing "Object.wait" from the method1 attribution
so the assertion measures self-time only. Remove the 0.3 allowedError
relaxation that was masking the root cause.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
The Object.wait exclusion fixed the method1Impl double-counting, but the
0.2 tolerance is too tight for DWARF/VM modes where native-frame PC
variation fragments trace IDs and sheds method2/method3 samples.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@jbachorik jbachorik marked this pull request as ready for review June 17, 2026 21:45
@jbachorik jbachorik requested a review from a team as a code owner June 17, 2026 21:45
@jbachorik jbachorik requested a review from ivoanjo June 18, 2026 09:03

@rkennke rkennke left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good change!
Have you verified that dd-trace-java is ok with the flip <= vs >= semantics in snapshotTags(int[])?

  • Oversized (length > size): old = silent no-op (the bug); new = copies correctly.
  • Undersized (length < size): old = partial copy up to length; new = no-op.

So any external caller that passed a deliberately-undersized buffer expecting a partial copy would now get nothing.

@jbachorik

Copy link
Copy Markdown
Collaborator Author

Have you verified that dd-trace-java is ok with the flip

There will be a followup PR for dd-trace-java - it will need to get more adjustments for this functionality, anyhow

@ivoanjo ivoanjo left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 This looks good to me. I've left a bunch of comments, but they're all in the "we can remove a bit of code here, we can throw instead of swallowing failure here, we can simplify a bit here" vibe so nothing big

Comment thread ddprof-lib/src/main/java/com/datadoghq/profiler/ContextSetter.java
Comment thread ddprof-lib/src/main/java/com/datadoghq/profiler/ContextSetter.java Outdated
Comment thread ddprof-lib/src/main/java/com/datadoghq/profiler/JavaProfiler.java Outdated
Comment thread ddprof-lib/src/main/java/com/datadoghq/profiler/ThreadContext.java Outdated
Comment thread ddprof-lib/src/main/java/com/datadoghq/profiler/ThreadContext.java
Comment thread ddprof-test/src/test/java/com/datadoghq/profiler/context/TagContextTest.java Outdated
jbachorik and others added 2 commits June 18, 2026 14:44
- Remove snapshotTags(int[]) overload; inline copyTags in no-arg form
- Make MAX_CUSTOM_SLOTS package-private; drop duplicate TAGS_STORAGE_LIMIT
- Throw NPE/IAE for null/mismatched args in setContextAttributesByIdAndBytes
- Remove pre-validation loop; let write loop handle bad utf8 naturally
- Drop "Delegates to" impl detail from JavaProfiler Javadoc
- Use distinct traceIdLow (not spanId) in antagonist, benchmark, tests
- Delete 3 dead snapshotTags(int[]) sizing tests

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
snapshotTags(int[]) is used in dd-trace-java PR 11646 — removing it
broke the caller. Restore the overload and the 3 contract tests.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@dd-octo-sts

dd-octo-sts Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Reliability & Chaos Results

All reliability & chaos checks passed Pipeline: https://gitlab.ddbuild.io/DataDog/java-profiler/-/pipelines/119752144

jbachorik and others added 7 commits June 18, 2026 15:40
Move null + length checks for active slots into a pre-pass before
detach() so a thrown NPE/IAE never leaves the record with valid=0.
Add post-throw sidecar assertions to the test to catch regressions.
Add int/uint32 safety comment in sframe.cpp.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- ReapplyContextAntagonist: throw ISE instead of counting reapply
  failures — reapply must always succeed when record is valid
- Remove now-dead reapplyFailures field and its stopGracefully logging
- TagContextTest: hoist Assumptions.assumeTrue(!Platform.isJ9()) into
  @beforeeach to avoid repeating it in every test method

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@jbachorik jbachorik merged commit bbd314b into main Jun 18, 2026
112 checks passed
@jbachorik jbachorik deleted the prof-15098-context-by-id branch June 18, 2026 17:20
@github-actions github-actions Bot added this to the 1.45.0 milestone Jun 18, 2026
// the record detached (valid=0) after an exception unwinds past attach().
for (int i = 0; i < len; i++) {
if (constantIds[i] > 0) {
byte[] bytes = Objects.requireNonNull(utf8[i], "utf8[" + i + "]");

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, isn't this an allocation? 👀

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically

"utf8[" + i + "]"

...?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes 😱

#606

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants