Skip to content

Port OTEP-4947 thread-context writer from polarsignals/custom-labels#347

Draft
szegedi wants to merge 15 commits into
mainfrom
otel-thread-ctx
Draft

Port OTEP-4947 thread-context writer from polarsignals/custom-labels#347
szegedi wants to merge 15 commits into
mainfrom
otel-thread-ctx

Conversation

@szegedi

@szegedi szegedi commented Jun 9, 2026

Copy link
Copy Markdown

Adds a Node.js writer for the OpenTelemetry Thread Local Context Record (OTEP-4947), ported from the in-development upstream at polarsignals/custom-labels#16.

What's in this branch

Native addon (bindings/)

  • otel-thread-ctx.cc + otel-thread-ctx.hh: the writer, namespaced as dd::OtelThreadCtx::Init(exports) and called from binding.cc.
  • The discovery TLS symbol otel_thread_ctx_nodejs_v1 stays in extern "C" at file scope so it's exported by name through the dd_pprof.node dynsym table. It's a 32-byte struct holding the four fields a reader needs: cped_slot (V8 isolate's ContinuationPreservedEmbedderData slot pointer), als_handle (Global<Object> to the writer's AsyncLocalStorage), als_identity_hash (JS identity hash for hash-bucket narrowing), and undefined_addr (cached per-isolate undefined singleton for clean "no context" detection by the reader).
  • Records use a C99 flexible-array-member tail and are right-sized to fit the encoded attribute payload, with a 36-byte attrs_data floor (one cache line total — matches the OTEP "frugal writer" guidance) and ×2 geometric growth on append, capped at the OTEP-recommended 612-byte attrs_data ceiling. Append-on-realloc updates the wrapper's internal record_ pointer in place, so every async-context frame that holds the same wrapper reference observes the new buffer (no per-CPED record divergence).
  • binding.gyp adds -mtls-dialect=gnu2 on x86_64 Linux (required by OTEP-4947 for TLSDESC; on arm64 TLSDESC is the only dynamic TLS model so no flag is needed).

TypeScript layer (ts/src/otel-thread-ctx.ts)

API surface (Linux only; no-op stubs elsewhere). The shape mirrors the wall profiler's setContext/getContext pattern: callers construct one ThreadContext per logical scope (typically per trace span) and re-install the same JS reference on every async-context fire.

  • class ThreadContext — constructable from JS via new pprof.otelThreadCtx.ThreadContext(traceId, spanId, attributes?). Instance methods: appendAttributes(attrs) (append-only, mutates the record in place), isTruncated(), and debugBytes() (debug-only). Lifetime is GC-managed: the underlying native record is freed when no JS or async-context-frame reference survives.
  • getContext(): ThreadContext | undefined — returns whatever's attached to the active async-context frame.
  • setContext(ctx) — installs (or detaches, with undefined) a context in the active async-context frame. Re-installing the same reference across many frames is cheap (no allocation); per-span caching of the context object is the intended usage.
  • runWithContext(ctx, fn) — scoped variant: als.run-style.
  • makeNamedContext(keys) — returns a NamedContext with buildContext(opts) (resolves namedAttributes against the captured key list and constructs a ThreadContext) plus sugar methods (enterWithContext / runWithContext / clearContext) that compose buildContext with the top-level functions. It also exposes processContextAttributes, a frozen snapshot of the OTEP-4719 entries the caller should publish.

traceId/spanId are raw Uint8Array (16 and 8 bytes; Buffer works as a subclass). attributes is positional: index N is the value for uint8 key index N on the wire; null/undefined/holes are skipped. Per-value cap of 255 UTF-8 bytes (uint8 length prefix), total attrs_data cap of 612 bytes (OTEP-recommended 640-byte total record minus 28-byte header).

Process-context attributes

makeNamedContext(keys).processContextAttributes exposes a frozen snapshot ready to spread into whatever OTEP-4719 process-context publisher the application uses:

{
    'threadlocal.schema_version': 'nodejs_v1',
    'threadlocal.attribute_key_map': [...keys],
    // V8 layout constants captured from the V8 headers the addon was
    // compiled against — let the reader walk our wrapper and V8's
    // OrderedHashMap layout without doing its own V8-internal-symbol
    // lookups for the pointer-compression / sandbox state.
    'threadlocal.nodejs_v1.wrapped_object_offset': 24,
    'threadlocal.nodejs_v1.tagged_size': 8,
}

Tests

Mocha suite under ts/test/test-otel-thread-ctx.ts covering: ThreadContext construction and input parsing, on-the-wire record encoding (including multibyte UTF-8 truncation at the 255-byte per-value cap), the 612-byte cap and isTruncated, in-place append vs geometric realloc, async propagation, setContext/getContext/runWithContext lifecycle and idempotency, NamedContext.buildContext name resolution, processContextAttributes shape and immutability, and a readelf --dyn-syms check that the TLS symbol is exported with the right binding / visibility / type. The whole describe block is skipped on non-Linux.

A second commit adds a scripts/docker/ Dockerfile + launcher and a test:docker npm script that builds the addon and runs the full test suite in a Linux container — useful for running the new tests from macOS dev machines. End-to-end verified at 158 passing (96 existing pprof tests + the new suite).

szegedi added 2 commits June 9, 2026 16:05
Ports the in-development OpenTelemetry thread-context writer that
lives on the otel-thread-ctx-node branch of polarsignals/custom-labels
(szegedi fork) into this project. The two codebases will likely
diverge again later; for now this is a snapshot of the current state.

Structurally:
- bindings/otel-thread-ctx.cc/.hh: the native addon code, namespaced
  in `dd::` and exposed via OtelThreadCtx::Init(exports) called from
  binding.cc. The thread_local otel_thread_ctx_nodejs_v1 discovery
  symbol stays in extern "C" at file scope so it's exported by name
  through the dd_pprof.node dynsym table.
- ts/src/otel-thread-ctx.ts: the runWithContext / enterWithContext /
  makeNamedContext API, loading the native addon via node-gyp-build
  like the rest of this project.
- ts/test/test-otel-thread-ctx.ts: mocha port of the node:test suite.
  Skipped wholesale on non-Linux.
- binding.gyp: adds bindings/otel-thread-ctx.cc to both target source
  lists and the -mtls-dialect=gnu2 cflag on x86_64 Linux (required by
  the OTEP-4947 spec; on arm64 TLSDESC is the only dynamic TLS model
  so no flag is needed).

Verified by mocha against the built dd_pprof.node in a Linux
container (Node 22 with --experimental-async-context-frame):
35 passing.
Mirrors the test:docker mechanism in custom-labels/js: a Dockerfile
under scripts/docker/ extending node:24-bookworm with python3 and
build-essential, plus a launcher script that builds the image
(cached), mounts the repo read-only, copies it into /tmp/work inside
the container, and runs `npm install && npm test`. The host tree is
never modified (no stray node_modules/, build/, out/).

Node 24 is used so the full test suite — including the new OTEP-4947
thread-context tests, which need AsyncContextFrame — runs without
extra Node flags.

Run via `npm run test:docker`.
@datadog-prod-us1-3

This comment has been minimized.

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown

Overall package size

Self size: 2.42 MB
Deduped: 3.12 MB
No deduping: 3.12 MB

Dependency sizes | name | version | self size | total size | |------|---------|-----------|------------| | pprof-format | 2.2.2 | 500.53 kB | 500.53 kB | | source-map | 0.7.6 | 185.63 kB | 185.63 kB | | node-gyp-build | 4.8.4 | 13.86 kB | 13.86 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

szegedi added 4 commits June 10, 2026 15:44
- Add a static_assert that offsetof(CtxWrap, record_) ==
  sizeof(node::ObjectWrap), since that offset is part of the reader ABI.
  Restructure CtxWrap so record_, capacity_, and truncated_ live in a
  single public access section: C++ leaves cross-access-control field
  ordering implementation-defined, so splitting them would allow a
  conforming compiler to reorder the bookkeeping fields ahead of
  record_.
- Add an acq_rel signal fence between the pointer swap and free() in
  the reallocate path. The pre-existing release fence only constrains
  prior writes; nothing was stopping the compiler from hoisting free()
  above the publication store, which would let a stopped reader follow
  self->record_ into freed memory.
- Restore the [[unlikely]] annotation on the IsConstructCall() check.
- Misc local cleanups (std::min/max in two spots, assert valid==1 after
  the memcpy instead of redundantly setting it).
The CtxWrap accessor that returns the raw record as a Uint8Array is only
intended for tests and out-of-process-reader development. Naming it
DebugBytes (and exposing it as wrap.debugBytes() on the JS prototype)
makes that explicit at every call site.
v8::internal::Internals::kContinuationPreservedEmbedderDataOffset was
introduced in Node 22. Older versions don't have
ContinuationPreservedEmbedderData at all, and the TS layer already
refuses to install the hook there via asyncContextFrameError, so
StoreAls is never actually invoked on Node < 22 — we just need the
addon to compile so the package installs.
@szegedi szegedi added the semver-minor Usually minor non-breaking improvements label Jun 11, 2026
szegedi added 3 commits June 11, 2026 18:06
Two more Node-version-sensitive spots blocking the prebuild against
Node 18.0.0 headers:

- ArrayBuffer::Data() wasn't exposed in V8 10.1 (Node 18.0). Switch to
  GetBackingStore()->Data(), which has been available since V8 7.4 /
  Node 12. The shared_ptr atomic is a per-call cost in CopyBytes
  (twice per CtxWrap::New) and DebugBytes — neither is a hot path.
- kEmbedderDataSlotExternalPointerOffset is Node 22+. Same shape as the
  earlier kContinuationPreservedEmbedderDataOffset guard: publish a
  sentinel 0 on older Node so the addon's exported surface stays
  consistent across majors. A would-be reader can't reach a live record
  through it anyway (no CPED on Node < 22).
…removal)

Node 26 ships V8 14.x, which removes the old String::Utf8Length /
WriteUtf8 / NO_NULL_TERMINATION trio in favor of the V2 versions, and
removes Object::GetIsolate() entirely. Switch the encode loop to the
V2 forms on Node >= 24 (Node 22 ships V8 12.4 which never gets V2;
Node 24's V8 13.6 has both, Node 26's V8 14.x has only V2). Replace
exports->GetIsolate() with Isolate::GetCurrent() unconditionally —
they're equivalent during module init and the latter is the only
version that survives Node 26.
- Reformat ts/test/test-otel-thread-ctx.ts via gts (prettier).
- Drop unused parameter declarations from the non-Linux stubs in
  ts/src/otel-thread-ctx.ts (they were carrying _-prefix names that
  gts's eslint still flags); TS allows fewer params on the assigned
  function than the declared variable's signature requires.
- Use strict ==/!= equality instead of loose null-check.
- Disable no-sparse-arrays in the test file: holes in attribute arrays
  are part of the wire format we're verifying.
- Use `void` prefix on the runWithContext() call inside the sync test
  whose return type confuses no-floating-promises.
__attribute__((visibility("default"))) is a GCC/Clang extension that
MSVC doesn't recognize, breaking the Windows prebuild. Guard it
behind __GNUC__/__clang__. Visibility is irrelevant on Windows anyway
since the OTEP-4947 reader contract is ELF-TLSDESC and only meaningful
on Linux.
Two cases the prior unconditional describe didn't cover:

- AsyncContextFrame (the writer's discovery substrate) is opt-in on
  Node 22/23 (requires --experimental-async-context-frame), on by
  default in Node 24+ (disable-able via --no-async-context-frame), and
  absent on Node < 22. The TS layer's asyncContextFrameError refuses
  to install the hook in each of those cases; the test now mirrors
  the same predicate so the suite is skipped instead of failing every
  test with "feature unavailable".

- The discovery-contract test reads the addon binary from
  build/Release/dd_pprof.node, which exists only on the
  build-from-source path. The prebuild-install / node-gyp-build CI
  matrix uses a prebuilt binary from prebuilds/, so the path doesn't
  exist there. Skip that one test when the file isn't present.
szegedi added 4 commits June 16, 2026 10:27
Add an otelThreadCtx namespace alongside time/heap so consumers can
reach the writer (runWithContext, enterWithContext, clearContext,
appendAttributes, isContextTruncated, makeNamedContext) via
require('@datadog/pprof').otelThreadCtx without importing internal
paths. The debug-only _currentRecordBytes accessor stays unexposed.
Native CtxWrap.spanIdMatches(buf) memcmps the 8-byte argument against
record_->span_id and returns a boolean — no allocation on the JS side
when the caller passes a stable Uint8Array (typically cached on the
span object).

Exposed at the JS layer as:

  - top-level currentSpanIdMatches(spanIdBytes), returns false outside
    a context, on non-Linux platforms, and for arguments that aren't
    8-byte Uint8Arrays.
  - NamedContext.currentSpanIdMatches(spanIdBytes), a passthrough.

Motivation: dd-trace-js's storage:enter channel fires on every
async-context resume. Without a way to ask "is this span already
the active context?", the writer ends up allocating a fresh CtxWrap
on each enter, even when re-entering the same span — the same
allocation-churn pattern the wall profiler fixed in dd-trace-js#8638.
…elpers

Reshape the otelThreadCtx namespace around an explicitly-allocated
CtxWrap object so consumers can cache one record per span and re-install
it without allocating churn:

- The native CtxWrap class is now constructable from JS via
  `new pprof.otelThreadCtx.CtxWrap(traceId, spanId, attributes?)`.
- New top-level functions get/set/runWithContext take a wrap reference
  (or undefined). `getContext() === wrap` becomes the cheap identity
  check that replaces the previous currentSpanIdMatches dance.
- The opts-form helpers (`enterWithContext`, `appendAttributes`,
  `clearContext`, `isContextTruncated`, `currentSpanIdMatches`) are
  removed at the top level. Callers go through the wrap directly:
  `wrap.appendAttributes(...)`, `wrap.isTruncated()`,
  `setContext(undefined)`.
- NamedContext stays as a name→index resolver: `buildContext(opts)`
  returns a CtxWrap with attributes resolved positionally;
  `runWithContext` / `enterWithContext` / `clearContext` are kept as
  one-liner sugar that compose with the new top-level functions.

Native: the CtxWrap class is registered with its new name (was
"OtelThreadCtxWrap") and the per-instance "append" method is now
"appendAttributes" for parity with the JS-side phrasing. The
SpanIdMatches binding is dropped.

Reasoning: under AsyncContextFrame each fork inherits the wrap by
reference, so once dd-trace-js caches one CtxWrap per span and
re-installs it on every storage:enter, both the allocation churn we
saw in the wall profiler (PR dd-trace-js#8638) and the "different
wraps in different CPEDs for the same span" stale-record edge case
go away — there's exactly one record per span across the whole
lifetime, and mutations via wrap.appendAttributes propagate
naturally because the native realloc-on-append path updates the
wrap's record_ pointer in place, never the JS wrapper.
`CtxWrap` leaks the C++ ObjectWrap implementation detail — as far as
the JS API is concerned, the object IS the thread context. Rename:

- The JS-visible class name (SetClassName) and TS interface:
  `CtxWrap` → `ThreadContext`.
- The TS constructor interface: `CtxWrapCtor` → `ThreadContextCtor`.
- The addon constructor export: `addon.ctxWrap` → `addon.threadContext`.
- The non-Linux fallback class: `NoopCtxWrap` → `NoopThreadContext`.
- Parameter/variable names previously named `wrap` (or `_wrap`) →
  `context` (or `_context`).

The native C++ class itself stays `CtxWrap` — that's the conventional
ObjectWrap-ish naming on the C++ side and isn't user-visible.

Error messages updated to refer to "ThreadContext" too ("not a
ThreadContext", "ThreadContext must be called with `new`", etc.).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

semver-minor Usually minor non-breaking improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant