feat(PROF-15098): add batch reapply-by-id-and-bytes context API#598
Conversation
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
|
CI Test ResultsRun: #27773866793 | Commit:
Status Overview
Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled Summary: Total: 32 | Passed: 32 | Failed: 0 Updated: 2026-06-18 16:48:54 UTC |
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- setContextAttributesByIdAndBytes: return false when constantIds.length > MAX_CUSTOM_SLOTS - snapshotTags(int[]): zero indices beyond attributes.size() to prevent foreign sidecar leakage - Add 5 acceptance tests
|
@codex review |
There was a problem hiding this comment.
Pull request overview
This PR introduces a zero-allocation “reapply app context” path by adding a batch API that restores custom context attributes using precomputed (constant-id, UTF-8-bytes) pairs in a single detach/attach window, and fixes a snapshotTags(int[]) guard bug that previously skipped exact-sized buffers.
Changes:
- Add
setContextAttributesByIdAndBytes(int[], byte[][])toThreadContext,JavaProfiler, andContextSetter, reusing a shared slot-write helper to keep DD sidecar + OTEPattrs_dataconsistent. - Fix
ContextSetter.snapshotTags(int[])to copy for exact-sized buffers and zero-out extra indices for oversized buffers. - Add targeted tests, plus JMH + chaos-harness coverage for the reapply cycle under signal pressure.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| ddprof-test/src/test/java/com/datadoghq/profiler/context/TagContextTest.java | Adds unit tests for batch reapply-by-id-and-bytes behavior and snapshotTags buffer-size semantics. |
| ddprof-lib/src/main/java/com/datadoghq/profiler/ThreadContext.java | Implements the batch reapply API and factors shared write logic to keep sidecar and attrs_data updates atomic. |
| ddprof-lib/src/main/java/com/datadoghq/profiler/JavaProfiler.java | Exposes the new batch API through the public profiler facade. |
| ddprof-lib/src/main/java/com/datadoghq/profiler/ContextSetter.java | Fixes snapshotTags guard/zeroing and adds a convenience wrapper for the new batch API. |
| ddprof-stresstest/src/jmh/java/com/datadoghq/profiler/stresstest/scenarios/throughput/ThreadContextBenchmark.java | Adds JMH benchmarks for the bare reapply and full reapply cycle under different thread counts. |
| ddprof-stresstest/src/chaos/java/com/datadoghq/profiler/chaos/ReapplyContextAntagonist.java | Adds a chaos antagonist to hammer the reapply cycle concurrently while wall-clock signals fire. |
| ddprof-stresstest/src/chaos/java/com/datadoghq/profiler/chaos/Main.java | Registers the new antagonist under --antagonists reapply-context. |
| ddprof-stresstest/build.gradle.kts | Adds ddprof-lib API compile-only dependency for the chaos source set. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 00194f8ae3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fc14d7cc36
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…ClockTest method1Impl blocks in monitor.wait() while method3 runs on the executor thread. Those WAITING samples were counted as method1Weight, inflating it to ~55% instead of the expected ~33%. Exclude stack traces containing "Object.wait" from the method1 attribution so the assertion measures self-time only. Remove the 0.3 allowedError relaxation that was masking the root cause. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
The Object.wait exclusion fixed the method1Impl double-counting, but the 0.2 tolerance is too tight for DWARF/VM modes where native-frame PC variation fragments trace IDs and sheds method2/method3 samples. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
rkennke
left a comment
There was a problem hiding this comment.
Good change!
Have you verified that dd-trace-java is ok with the flip <= vs >= semantics in snapshotTags(int[])?
- Oversized (length > size): old = silent no-op (the bug); new = copies correctly.
- Undersized (length < size): old = partial copy up to length; new = no-op.
So any external caller that passed a deliberately-undersized buffer expecting a partial copy would now get nothing.
There will be a followup PR for dd-trace-java - it will need to get more adjustments for this functionality, anyhow |
ivoanjo
left a comment
There was a problem hiding this comment.
👍 This looks good to me. I've left a bunch of comments, but they're all in the "we can remove a bit of code here, we can throw instead of swallowing failure here, we can simplify a bit here" vibe so nothing big
- Remove snapshotTags(int[]) overload; inline copyTags in no-arg form - Make MAX_CUSTOM_SLOTS package-private; drop duplicate TAGS_STORAGE_LIMIT - Throw NPE/IAE for null/mismatched args in setContextAttributesByIdAndBytes - Remove pre-validation loop; let write loop handle bad utf8 naturally - Drop "Delegates to" impl detail from JavaProfiler Javadoc - Use distinct traceIdLow (not spanId) in antagonist, benchmark, tests - Delete 3 dead snapshotTags(int[]) sizing tests Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
snapshotTags(int[]) is used in dd-trace-java PR 11646 — removing it broke the caller. Restore the overload and the 3 contract tests. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Reliability & Chaos Results✅ All reliability & chaos checks passed Pipeline: https://gitlab.ddbuild.io/DataDog/java-profiler/-/pipelines/119752144 |
Move null + length checks for active slots into a pre-pass before detach() so a thrown NPE/IAE never leaves the record with valid=0. Add post-throw sidecar assertions to the test to catch regressions. Add int/uint32 safety comment in sframe.cpp. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- ReapplyContextAntagonist: throw ISE instead of counting reapply failures — reapply must always succeed when record is valid - Remove now-dead reapplyFailures field and its stopGracefully logging - TagContextTest: hoist Assumptions.assumeTrue(!Platform.isJ9()) into @beforeeach to avoid repeating it in every test method Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
| // the record detached (valid=0) after an exception unwinds past attach(). | ||
| for (int i = 0; i < len; i++) { | ||
| if (constantIds[i] > 0) { | ||
| byte[] bytes = Objects.requireNonNull(utf8[i], "utf8[" + i + "]"); |
There was a problem hiding this comment.
Wait, isn't this an allocation? 👀
There was a problem hiding this comment.
Specifically
"utf8[" + i + "]"
...?
What does this PR do?:
Adds
setContextAttributesByIdAndBytes(int[], byte[][])toThreadContext,JavaProfiler, andContextSetter. For each slot whoseconstantId > 0, writes both the DD sidecar encoding (read by the signal handler) and the OTEPattrs_dataUTF-8 value (read by external eBPF/OTEP profilers) inside a singledetach/attachwindow — no String allocation, hashing, or Dictionary lookup.Also fixes the
snapshotTags(int[])guard from<=to>=(the old condition silently skipped the copy when the snapshot array was exactly the right size), and links the two independentMAX_CUSTOM_SLOTS/TAGS_STORAGE_LIMITconstants with cross-referencing comments.Fixes
ContextWallClockTestweight attribution:method1Implaccumulates wall-clock samples duringmonitor.wait()whilemethod3runs concurrently on the executor thread. Those samples were inflatingmethod1's share to ~55% instead of the expected ~33%. ExcludingObject.waitframes from themethod1Implattribution measures self-time only. The 0.3allowedErrorfor vm/vmx/fp/dwarf modes is retained to cover DWARF/VM trace-ID fragmentation that sheds somemethod2/method3samples.Motivation:
PROF-15098 —
dd-trace-java'sreapplyAppContexthot path must restore application-managed context attributes after asetContextspan-activation wipes all custom slots. The existingsetContextAttribute(int, String)API re-hashes and re-encodes the value on every call; the new API accepts the precomputed constant ID (fromsnapshotTags) and UTF-8 bytes (retained from the original String) so the hot path is zero-allocation.Implementation notes:
setContextAttributesByIdAndBytesreturnsfalseand skips all writes when the record is already invalid (valid=0), preventing accidental resurrection of a cleared span-less record.detach/attachwindow so a signal handler never observes a partially re-applied attribute set.writeSlothelper is used by bothsetContextAttributeDirectand the new batch path to avoid duplication of the sidecar + attrs_data write logic.(constantId, bytes)pairing is unverifiable by the callee; the contract is documented in the Javadoc.Benchmarks (macOS/arm64, JDK 21,
avgtns/op):reapplyByIdAndBytes— bare reapply, IDs+bytes already in handreapplyCycle— full cycle:setContextwipe + reapply (1t)reapplyCycle_2treapplyCycle_4tspanLifecycle(baseline:setContext+ onesetContextAttribute)setContextFullreapplyCycleat 45 ns restores 5 slots (compact + insert each) vsspanLifecycleat 14 ns for a single attribute — expected given the 5× attrs_data work. Scales cleanly under thread contention (46→60 ns at 4 threads, same pattern asspanLifecycle).How to test the change?:
Six new tests in
TagContextTest(deterministicreadContextAttribute/snapshotTagsreadback — no flaky JFR-sample assertions):testReapplyByIdAndBytes— happy path: set, snapshot, wipe viasetContext, reapply, verify both sidecar and attrs_data viewstestReapplyByIdAndBytesRejectsBadArgs— null arrays, length mismatch, null/oversized bytestestReapplyByIdAndBytesReplacesExistingValue— compact-then-insert path (overwrites existing value)testReapplyByIdAndBytesAfterClear— reapply after per-slotclearContextValue(validstays 1)testReapplyByIdAndBytesClearedRecord— reapply on aclearContext-cleared record (valid=0); must returnfalseand not resurrecttestReapplyByIdAndBytesOverflowRollback— 10×255-byte values overflowattrs_data; overflowed slot's sidecar zeroed; slots written before the overflow are retainedStress coverage:
ReapplyContextAntagonistadded to the chaos harness — drives the full reapply cycle from 8 threads while wall-clock signals fire, racing thedetach/attachwindow. Run with--antagonists reapply-context.For Datadog employees:
credentials of any kind, I've requested a security review (run the
dd:platform-security-reviewskill, or file a request via the PSEC review form).
bewairealso runs automatically on every PR.Unsure? Have a question? Request a review!