
feat(tracing): OTLP trace metrics export [DO NOT MERGE] #18354

Draft
mabdinur wants to merge 23 commits into main from munir/otlp-trace-metrics-export

mabdinur commented May 29, 2026

Warning

DO NOT MERGE: this PR pins libdatadog to a feature branch. Companion PR: DataDog/libdatadog#2067.

Overview

Wires the libdatadog OTLP trace-metrics export into dd-trace-py. When enabled, eligible spans are aggregated natively in Rust into a traces.span.sdk.metrics.duration histogram and exported to a /v1/metrics endpoint alongside traces. dd-trace-py's role is configuration only: stats aggregation, OTLP encoding, and HTTP export all run inside libdatadog.

Non-obvious details for reviewers

  • Enabled via OTEL_TRACES_SPAN_METRICS_ENABLED. Auto-enables when both OTEL_TRACES_EXPORTER=otlp and DD_METRICS_OTEL_ENABLED=true are set.
  • Flush cadence is fixed at 10s and is not driven by OTEL_METRIC_EXPORT_INTERVAL. Use _DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL (ms) to shorten it in tests.
  • OTEL_EXPORTER_OTLP_METRICS_ENDPOINT is passed to libdatadog verbatim — it must include /v1/metrics; it is not auto-suffixed.
  • _dd.stats_computed: "true" is set on every outbound OTLP trace ResourceSpans to tell Datadog Agent OTLP receivers to skip their concentrator and prevent double-counted APM metrics. Backwards compatible with released Agents.
  • src/native/Cargo.lock is pinned to the head of the libdatadog munir/otlp-trace-metrics branch (9cd1ad5).
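
For reviewers who want the enablement and endpoint rules in one place, here is a minimal Python sketch of the first and third bullets. It is not the code in this PR. The function names, the accepted truthy/falsy strings, and the rule that an explicit OTEL_TRACES_SPAN_METRICS_ENABLED value overrides the auto-enable path are all assumptions made for illustration.

```python
from typing import Mapping, Optional
from urllib.parse import urlparse

# Assumption: which strings count as true/false is illustrative only.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def _tri_state(value: Optional[str]) -> Optional[bool]:
    """Return True/False for an explicit setting, None when unset or unrecognized."""
    if value is None:
        return None
    normalized = value.strip().lower()
    if normalized in _TRUE:
        return True
    if normalized in _FALSE:
        return False
    return None


def is_otlp_trace_metrics_enabled(env: Mapping[str, str]) -> bool:
    """Hypothetical resolver: explicit flag first, then the auto-enable rule."""
    explicit = _tri_state(env.get("OTEL_TRACES_SPAN_METRICS_ENABLED"))
    if explicit is not None:
        # Assumption: an explicit value wins over auto-enablement.
        return explicit
    traces_via_otlp = env.get("OTEL_TRACES_EXPORTER", "").strip().lower() == "otlp"
    otel_metrics_on = _tri_state(env.get("DD_METRICS_OTEL_ENABLED")) is True
    return traces_via_otlp and otel_metrics_on


def metrics_endpoint_has_path(endpoint: str) -> bool:
    """Check that the endpoint already ends in /v1/metrics (it is not auto-suffixed)."""
    return urlparse(endpoint).path.rstrip("/").endswith("/v1/metrics")
```

A check like `metrics_endpoint_has_path` could be useful in a test fixture to catch a bare `http://host:4318` value before it is handed to libdatadog verbatim.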

Testing: tests/opentelemetry/test_otlp_trace.py::test_otlp_trace_metrics_exported_via_http. Cross-tracer coverage lives in the DataDog/system-tests parametric suite.

mabdinur and others added 6 commits May 29, 2026 15:57
Adds the tri-state DD_TRACE_OTEL_STATS_COMPUTATION_ENABLED setting and a
_is_otlp_trace_metrics_enabled resolver that auto-enables OTLP trace metrics
export when both OTLP trace export and OTel metrics export are enabled.

Co-authored-by: Cursor <cursoragent@cursor.com>
Adds in-memory span-stats aggregation (SpanAggKey/SpanAggStats/SpanBuckets/
TimeBuckets) and a dependency-free OTLP serializer that emits the
dd.trace.span.duration histogram (count+sum, delta temporality) as protobuf
or JSON, mirroring the dd-trace-js reference implementation.

Co-authored-by: Cursor <cursoragent@cursor.com>
Adds the HTTP exporter that POSTs serialized span-stats payloads to the OTLP
/v1/metrics endpoint with the protocol-appropriate Content-Type, swallows
transport errors, and emits export attempt/success/error telemetry counters.

Co-authored-by: Cursor <cursoragent@cursor.com>
Adds the SpanProcessor/PeriodicService that buckets finished top-level and
measured spans by end time and periodically exports the aggregated stats as
OTLP metrics, flushing remaining buckets on shutdown.

Co-authored-by: Cursor <cursoragent@cursor.com>
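
The bucketing described in the commit above can be sketched roughly as follows. This is an illustration of the idea only, not the removed OtlpSpanStatsProcessor: the class and field names, the (service, name) aggregation key, and the 10 s default bucket width are assumptions, and the top-level/measured eligibility check is left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Tuple

BUCKET_NS = 10 * 1_000_000_000  # assumed 10 s bucket width for this sketch

Key = Tuple[str, str]  # hypothetical aggregation key: (service, span name)


@dataclass
class Stats:
    count: int = 0
    sum_ns: int = 0


class SpanBuckets:
    """Aggregate span durations into time buckets keyed by span end time."""

    def __init__(self, bucket_ns: int = BUCKET_NS) -> None:
        self._bucket_ns = bucket_ns
        self._buckets: Dict[int, Dict[Key, Stats]] = defaultdict(lambda: defaultdict(Stats))

    def add(self, service: str, name: str, start_ns: int, duration_ns: int) -> None:
        end_ns = start_ns + duration_ns
        # Align the end time down to the start of its bucket.
        bucket_start = end_ns - (end_ns % self._bucket_ns)
        stats = self._buckets[bucket_start][(service, name)]
        stats.count += 1
        stats.sum_ns += duration_ns

    def flush(self, now_ns: int, force: bool = False) -> Dict[int, Dict[Key, Stats]]:
        """Pop buckets that have closed; force=True pops everything (shutdown)."""
        ready = [b for b in self._buckets if force or b + self._bucket_ns <= now_ns]
        return {b: dict(self._buckets.pop(b)) for b in sorted(ready)}
```

Because `flush` pops each bucket it returns, every export carries only the spans seen since the previous one, which matches the delta temporality mentioned in the serializer commit.
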
Registers OtlpSpanStatsProcessor in the default span processor factory when
OTLP trace metrics are enabled, and turns off native stats computation in that
case to avoid double-counting. Adds an end-to-end OTLP trace-metrics export
test and routes the exporter request through the connection's base path.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

cit-pr-commenter-54b7da Bot commented May 29, 2026

Codeowners resolved as

src/native/Cargo.lock                                                   @DataDog/apm-core-python

datadog-datadog-prod-us1 Bot commented May 29, 2026

⚠️ Warnings

🚦 15 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-py | contrib/opentelemetry 1/4

🧪 1 Test failed

test_otlp_trace_metrics_exported_via_http[py3.12] from test_otlp_trace.py
Expected status 0, got 1.
=== Captured STDOUT ===
=== End of captured STDOUT ===
=== Captured STDERR ===
Traceback (most recent call last):
  File "tests/opentelemetry/test_otlp_trace.py", line 139, in <module>
    assert metrics_payload is not None, "No OTLP metrics payload received by mock server"
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: No OTLP metrics payload received by mock server
=== End of captured STDERR ===

DataDog/apm-reliability/dd-trace-py | contrib/opentelemetry 2/4

🧪 1 Test failed

test_otlp_trace_metrics_exported_via_http[py3.12] from test_otlp_trace.py
Expected status 0, got 1.
=== Captured STDOUT ===
=== End of captured STDOUT ===
=== Captured STDERR ===
Traceback (most recent call last):
  File "tests/opentelemetry/test_otlp_trace.py", line 139, in <module>
    assert metrics_payload is not None, "No OTLP metrics payload received by mock server"
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: No OTLP metrics payload received by mock server
=== End of captured STDERR ===

DataDog/apm-reliability/dd-trace-py | contrib/opentelemetry 3/4

🧪 1 Test failed

test_otlp_trace_metrics_exported_via_http[py3.12] from test_otlp_trace.py
Expected status 0, got 1.
=== Captured STDOUT ===
=== End of captured STDOUT ===
=== Captured STDERR ===
Traceback (most recent call last):
  File "tests/opentelemetry/test_otlp_trace.py", line 139, in <module>
    assert metrics_payload is not None, "No OTLP metrics payload received by mock server"
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: No OTLP metrics payload received by mock server
=== End of captured STDERR ===

ℹ️ Info

No other issues found

❄️ No new flaky tests detected

🔄 Datadog retried 6 tests; 0 passed on retry

🚧 19 failed tests were ignored due to quarantine

🔗 Commit SHA: c374358

pr-commenter Bot commented May 29, 2026

Benchmarks

Benchmark execution time: 2026-06-16 01:15:01

Comparing candidate commit c374358 in PR branch munir/otlp-trace-metrics-export with baseline commit fd67a37 in branch main.

Found 0 performance improvements and 5 performance regressions! Performance is the same for 612 metrics, 10 unstable metrics.

scenario:iastaspects-stringio_noaspect

  • 🟥 execution_time [+42.099µs; +48.949µs] or [+11.988%; +13.939%]

scenario:iastaspectsospath-ospathbasename_aspect

  • 🟥 execution_time [+93.789µs; +101.051µs] or [+21.800%; +23.487%]

scenario:span-start

  • 🟥 execution_time [+1.150ms; +1.290ms] or [+7.359%; +8.254%]

scenario:startup-ddtrace_run_send_span

  • 🟥 execution_time [+1.602s; +1.605s] or [+86.173%; +86.351%]

scenario:telemetryaddmetric-1-count-metric-1-times

  • 🟥 execution_time [+294.949ns; +333.579ns] or [+14.230%; +16.094%]

mabdinur and others added 2 commits May 31, 2026 23:57
The subprocess harness fails when the native writer logs a trace-send error to
stderr. Route traces to the same mock OTLP server and filter received payloads
by path so the metrics assertions are isolated.

Co-authored-by: Cursor <cursoragent@cursor.com>
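
The path filtering mentioned in the commit above might look something like this on the test side. It is a sketch only: the `Received` record and `payloads_for_path` helper are hypothetical names, and the real mock server in tests/opentelemetry may store requests differently.

```python
from typing import Iterable, List, NamedTuple


class Received(NamedTuple):
    """Hypothetical record of one request captured by a mock OTLP server."""

    path: str
    body: bytes


def payloads_for_path(requests: Iterable[Received], path: str) -> List[bytes]:
    """Return request bodies whose path matches, ignoring query string and trailing slash."""
    return [
        r.body
        for r in requests
        if r.path.split("?", 1)[0].rstrip("/") == path
    ]
```

With traces and metrics both posted to the same server, the metrics assertions would then only look at `payloads_for_path(received, "/v1/metrics")`, and an empty result corresponds to the "No OTLP metrics payload received by mock server" failure reported above.
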
Replace the Python-side OTLP span-stats pipeline with libdatadog-native
stats computation, OTLP histogram encoding, and HTTP export. dd-trace-py
now only reads the OTLP metrics configuration and supplies the endpoint
and headers to the NativeWriter, which delegates to libdatadog.

- Remove ddtrace/internal/otlp_stats/* and the OtlpSpanStatsProcessor wiring
- Add set_otlp_metrics_endpoint/headers native bindings
- Pin libdatadog to the munir/otlp-trace-metrics feature branch
- Move unit tests to libdatadog; keep the e2e test exporting via the
  native stats worker's periodic flush

DO NOT MERGE: depends on the libdatadog feature branch; the final
implementation will land in libdatadog.

Co-authored-by: Cursor <cursoragent@cursor.com>
mabdinur and others added 15 commits June 1, 2026 17:02
Align OTLP trace-metrics with the cross-tracer contract: rename the
enable flag to OTEL_CLIENT_STATS_COMPUTATION_ENABLED, add
DD_TRACE_OTEL_SEMANTICS_ENABLED to emit OTel-only attributes, and fix the
flush cadence at 10s with the internal _DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL
(ms) test override. Re-pin libdatadog and migrate to libdd-remote-config.

Co-authored-by: Cursor <cursoragent@cursor.com>
Update the OTLP trace-metrics integration test for the renamed
traces.span.sdk.metrics.duration histogram, read service identity from
the InstrumentationScope, and shorten the fixed flush cadence via
_DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL.

Co-authored-by: Cursor <cursoragent@cursor.com>
Pins src/native/Cargo.lock to libdatadog 837e1917, which emits the OTLP
duration histogram with fixed spanmetrics-style explicit bounds.

Co-authored-by: Cursor <cursoragent@cursor.com>
libdatadog now reports service.name/service.version/deployment.environment.name
as resource attributes (single InstrumentationScope), emitting service.name on a
data point only when a span's service differs from the configured default. Re-pin
libdatadog to 897ee5cb9 and update the OTLP trace-metrics test accordingly.

Co-authored-by: Cursor <cursoragent@cursor.com>
libdatadog no longer emits a `dd-trace` InstrumentationScope (redundant with the
resource's telemetry.sdk.* attributes). Re-pin libdatadog to 1612d50ef and assert
the exported metrics carry no scope.

Co-authored-by: Cursor <cursoragent@cursor.com>
…TRACES_SPAN_METRICS_ENABLED

Rename the OTLP trace-metrics enablement env var and regenerate the supported
configuration tables.

Co-authored-by: Cursor <cursoragent@cursor.com>
…efix)

Picks up the OTLP trace-metric attribute rename (dd.* -> datadog.*) from libdatadog.

Co-authored-by: Cursor <cursoragent@cursor.com>
Bumps the libdatadog branch pin to the restructured tip of
DataDog/libdatadog#munir/otlp-trace-metrics. The OTLP trace-metrics PR
on the libdatadog side is now five focused commits with an identical
working tree:

  09aa180ea feat(ddsketch,trace-utils): OTLP roundtrip helpers
  21d4f8933 feat(trace-stats): expose exact per-cell sum/min/max via OTLP sidecar
  ab4140d51 refactor(data-pipeline): share OTLP HTTP transport; add OtlpMetricsConfig
  e8077d3e0 feat(data-pipeline): export client-computed span stats as OTLP trace metrics
  dbc8c4bf4 feat(trace-stats)!: key aggregation by gRPC method name

The dd-trace-py side picks up identical OTLP trace-metrics behavior; only
the underlying commit hash changes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…als)

Bumps the libdatadog branch pin to the cleaned tip of
DataDog/libdatadog#munir/otlp-trace-metrics. The libdatadog PR is unchanged
in behavior — the new tip just restores doc/comment removals that weren't
core to the OTLP trace-metrics work (`OtlpTraceConfig` field docs,
`bufferLen` comment, `send_otlp_traces_http` multi-paragraph doc).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…9942)

Rebases DataDog/libdatadog#munir/otlp-trace-metrics onto the latest main
(a82069942 — Azure Functions instance name fallback fix) to eliminate
libdd-common drift that was causing the internal package_ffi_on_windows
GitLab CI job to fail. OTLP trace-metrics behavior is unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…otlp-trace-metrics

The libdatadog branch added a Default variant to AssignmentReason.
Map it to Reason::Default in the From impl.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…races)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…restore V1 negotiation)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>