feat(tracing): OTLP trace metrics export [DO NOT MERGE] #18354
Draft
mabdinur wants to merge 23 commits into
Adds the tri-state DD_TRACE_OTEL_STATS_COMPUTATION_ENABLED setting and a _is_otlp_trace_metrics_enabled resolver that auto-enables OTLP trace metrics export when both OTLP trace export and OTel metrics export are enabled. Co-authored-by: Cursor <cursoragent@cursor.com>
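A minimal sketch of the tri-state resolution described above, not the PR's `_is_otlp_trace_metrics_enabled`: it reads a plain mapping instead of `os.environ`, takes the flag name as a parameter (later commits rename it), and assumes that "OTLP trace export enabled" and "OTel metrics export enabled" mean `OTEL_TRACES_EXPORTER=otlp` and `DD_METRICS_OTEL_ENABLED=true`, the conditions the PR description lists for the renamed flag. The accepted true/false spellings are also assumptions.

```python
from typing import Mapping, Optional

_TRUTHY = frozenset({"1", "true", "yes", "on"})
_FALSY = frozenset({"0", "false", "no", "off"})


def parse_tristate(raw: Optional[str]) -> Optional[bool]:
    """True/False for an explicit value; None when unset or unrecognized."""
    if raw is None:
        return None
    value = raw.strip().lower()
    if value in _TRUTHY:
        return True
    if value in _FALSY:
        return False
    return None


def resolve_otlp_trace_metrics_enabled(
    env: Mapping[str, str],
    flag: str = "DD_TRACE_OTEL_STATS_COMPUTATION_ENABLED",  # renamed in later commits
) -> bool:
    explicit = parse_tristate(env.get(flag))
    if explicit is not None:
        # An explicit true/false always wins over auto-detection.
        return explicit
    # Unset: auto-enable only when OTLP trace export and OTel metrics export are both on.
    # Assumption: those map to OTEL_TRACES_EXPORTER=otlp and DD_METRICS_OTEL_ENABLED=true.
    traces_via_otlp = env.get("OTEL_TRACES_EXPORTER", "").strip().lower() == "otlp"
    metrics_enabled = parse_tristate(env.get("DD_METRICS_OTEL_ENABLED")) is True
    return traces_via_otlp and metrics_enabled
```

Passing `flag="OTEL_TRACES_SPAN_METRICS_ENABLED"` applies the same rule under the final name from the PR description. Treating unrecognized values as unset is a design choice of this sketch; the real resolver may log or behave differently.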
Adds in-memory span-stats aggregation (SpanAggKey/SpanAggStats/SpanBuckets/TimeBuckets) and a dependency-free OTLP serializer that emits the dd.trace.span.duration histogram (count+sum, delta temporality) as protobuf or JSON, mirroring the dd-trace-js reference implementation. Co-authored-by: Cursor <cursoragent@cursor.com>
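The aggregation this commit describes can be sketched as time buckets keyed by span end time, each holding count and sum per aggregation key. The class names and key fields below are hypothetical (the PR's SpanAggKey fields are not shown), and the 10-second bucket width is an assumption borrowed from the fixed flush cadence introduced in a later commit.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Dict, List, NamedTuple, Tuple

NS_PER_S = 1_000_000_000


class AggKey(NamedTuple):
    # Hypothetical dimensions; the PR's SpanAggKey fields are not shown.
    service: str
    name: str
    resource: str
    error: bool


@dataclass
class AggStats:
    count: int = 0
    duration_sum_ns: int = 0

    def add(self, duration_ns: int) -> None:
        self.count += 1
        self.duration_sum_ns += duration_ns


class BucketedSpanStats:
    def __init__(self, width_ns: int = 10 * NS_PER_S) -> None:
        self.width_ns = width_ns
        # bucket start (ns) -> aggregation key -> stats
        self.buckets: DefaultDict[int, Dict[AggKey, AggStats]] = defaultdict(dict)

    def add(self, key: AggKey, end_ns: int, duration_ns: int) -> None:
        # Spans are bucketed by end time, aligned down to the bucket width.
        start = end_ns - end_ns % self.width_ns
        self.buckets[start].setdefault(key, AggStats()).add(duration_ns)

    def drain(self, now_ns: int, force: bool = False) -> List[Tuple[int, Dict[AggKey, AggStats]]]:
        """Remove and return closed buckets (every bucket when force=True), oldest first."""
        ready = sorted(s for s in self.buckets if force or s + self.width_ns <= now_ns)
        return [(s, self.buckets.pop(s)) for s in ready]
```

Only count and sum are kept per key, which is all a count+sum delta histogram needs; `drain(..., force=True)` covers the shutdown case where the still-open bucket must be exported too.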
Adds the HTTP exporter that POSTs serialized span-stats payloads to the OTLP /v1/metrics endpoint with the protocol-appropriate Content-Type, swallows transport errors, and emits export attempt/success/error telemetry counters. Co-authored-by: Cursor <cursoragent@cursor.com>
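A minimal sketch of the exporter behavior described above, using only `urllib.request`: pick the Content-Type from the OTLP protocol, POST the payload, never raise on transport failure, and count attempts, successes, and errors. The protocol-to-Content-Type table follows OTLP/HTTP conventions; the counter names and the 5-second timeout are hypothetical.

```python
import urllib.request
from collections import Counter
from typing import Callable, Mapping, Optional

# OTLP/HTTP Content-Type per protocol; an unknown protocol raises KeyError.
CONTENT_TYPES = {
    "http/protobuf": "application/x-protobuf",
    "http/json": "application/json",
}


def build_request(
    endpoint: str,
    payload: bytes,
    protocol: str,
    headers: Optional[Mapping[str, str]] = None,
) -> urllib.request.Request:
    req = urllib.request.Request(endpoint, data=payload, method="POST")
    for name, value in (headers or {}).items():
        req.add_header(name, value)
    # Set last so a user-supplied header cannot override the protocol's Content-Type.
    req.add_header("Content-Type", CONTENT_TYPES[protocol])
    return req


def export(
    req: urllib.request.Request,
    telemetry: Counter,
    send: Callable[..., object] = urllib.request.urlopen,
) -> bool:
    telemetry["export_attempts"] += 1  # hypothetical counter names
    try:
        with send(req, timeout=5.0):
            pass
    except OSError:
        # URLError, HTTPError (urlopen raises it for non-2xx) and timeouts are all
        # OSError subclasses: record the failure instead of propagating it.
        telemetry["export_errors"] += 1
        return False
    telemetry["export_successes"] += 1
    return True
```

Injecting `send` keeps the sketch testable without a network; the default is plain `urlopen`.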
Adds the SpanProcessor/PeriodicService that buckets finished top-level and measured spans by end time and periodically exports the aggregated stats as OTLP metrics, flushing remaining buckets on shutdown. Co-authored-by: Cursor <cursoragent@cursor.com>
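The eligibility rule and the periodic-plus-final flush can be sketched without ddtrace's internal `PeriodicService`. This is a minimal sketch under assumptions: `is_eligible` takes plain booleans rather than inspecting a real span, and `PeriodicFlusher` is a hypothetical stand-in that calls an injected `flush(force)` callback, where `force=False` exports only closed buckets and `force=True` exports everything.

```python
import threading
from typing import Callable


def is_eligible(is_top_level: bool, measured: bool) -> bool:
    # Only top-level spans and spans explicitly marked as measured feed the stats.
    return is_top_level or measured


class PeriodicFlusher:
    """Calls flush(False) every interval and flush(True) once on stop()."""

    def __init__(self, flush: Callable[[bool], None], interval_s: float) -> None:
        self._flush = flush
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, name="span-stats-flush", daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _run(self) -> None:
        # Event.wait returns False on timeout (time to flush) and True once stop() is called.
        while not self._stop.wait(self._interval_s):
            self._flush(False)  # export only buckets whose window has closed

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
        self._flush(True)  # on shutdown, export every remaining bucket
```

Running the final flush on the caller's thread after `join()` guarantees it cannot race a periodic flush still in progress.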
Registers OtlpSpanStatsProcessor in the default span processor factory when OTLP trace metrics are enabled, and turns off native stats computation in that case to avoid double-counting. Adds an end-to-end OTLP trace-metrics export test and routes the exporter request through the connection's base path. Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Benchmarks
Benchmark execution time: 2026-06-16 01:15:01
Comparing candidate commit c374358 in PR branch
Found 0 performance improvements and 5 performance regressions! Performance is the same for 612 metrics, 10 unstable metrics.
- scenario:iastaspects-stringio_noaspect
- scenario:iastaspectsospath-ospathbasename_aspect
- scenario:span-start
- scenario:startup-ddtrace_run_send_span
- scenario:telemetryaddmetric-1-count-metric-1-times
The subprocess harness fails when the native writer logs a trace-send error to stderr. Route traces to the same mock OTLP server and filter received payloads by path so the metrics assertions are isolated. Co-authored-by: Cursor <cursoragent@cursor.com>
Replace the Python-side OTLP span-stats pipeline with libdatadog-native stats computation, OTLP histogram encoding, and HTTP export. dd-trace-py now only reads the OTLP metrics configuration and supplies the endpoint and headers to the NativeWriter, which delegates to libdatadog.
- Remove ddtrace/internal/otlp_stats/* and the OtlpSpanStatsProcessor wiring
- Add set_otlp_metrics_endpoint/headers native bindings
- Pin libdatadog to the munir/otlp-trace-metrics feature branch
- Move unit tests to libdatadog; keep the e2e test exporting via the native stats worker's periodic flush
DO NOT MERGE: depends on the libdatadog feature branch; the final implementation will land in libdatadog. Co-authored-by: Cursor <cursoragent@cursor.com>
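The commit hands headers to the native side but does not show where they come from. One plausible source is the OpenTelemetry `OTEL_EXPORTER_OTLP_METRICS_HEADERS` variable, whose spec format is a comma-separated list of `key=value` pairs with percent-encoded values. The sketch below parses that format into a dict; that dd-trace-py reads this particular variable, and how the dict reaches `set_otlp_metrics_headers`, are assumptions not confirmed by the PR.

```python
from typing import Dict
from urllib.parse import unquote


def parse_otlp_headers(raw: str) -> Dict[str, str]:
    """Parse 'k1=v1,k2=v2' (percent-encoded values) into a dict.

    Entries without '=' or with an empty key are skipped; a value may itself contain '='.
    """
    headers: Dict[str, str] = {}
    for item in raw.split(","):
        key, sep, value = item.partition("=")
        key = key.strip()
        if not sep or not key:
            continue
        headers[key] = unquote(value.strip())
    return headers
```

Skipping malformed entries rather than raising keeps a single bad pair from disabling export entirely; a stricter implementation might log them instead.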
Align OTLP trace-metrics with the cross-tracer contract: rename the enable flag to OTEL_CLIENT_STATS_COMPUTATION_ENABLED, add DD_TRACE_OTEL_SEMANTICS_ENABLED to emit OTel-only attributes, and fix the flush cadence at 10s with the internal _DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL (ms) test override. Re-pin libdatadog and migrate to libdd-remote-config. Co-authored-by: Cursor <cursoragent@cursor.com>
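The fixed 10s cadence with a millisecond test override can be sketched as a small resolver. This is a minimal sketch: where the override is actually read (dd-trace-py or libdatadog) is not stated, and falling back to 10s for invalid or non-positive values is an assumption. Because the commit describes a fixed cadence, the sketch deliberately does not consult `OTEL_METRIC_EXPORT_INTERVAL`.

```python
from typing import Mapping

DEFAULT_FLUSH_INTERVAL_S = 10.0


def flush_interval_seconds(env: Mapping[str, str]) -> float:
    # Internal test-only override, expressed in milliseconds.
    raw = env.get("_DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL")
    if raw is None:
        return DEFAULT_FLUSH_INTERVAL_S
    try:
        ms = int(raw)
    except ValueError:
        return DEFAULT_FLUSH_INTERVAL_S  # assumption: ignore unparsable values
    return ms / 1000.0 if ms > 0 else DEFAULT_FLUSH_INTERVAL_S
```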
Update the OTLP trace-metrics integration test for the renamed traces.span.sdk.metrics.duration histogram, read service identity from the InstrumentationScope, and shorten the fixed flush cadence via _DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL. Co-authored-by: Cursor <cursoragent@cursor.com>
Pins src/native/Cargo.lock to libdatadog 837e1917, which emits the OTLP duration histogram with fixed spanmetrics-style explicit bounds. Co-authored-by: Cursor <cursoragent@cursor.com>
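Explicit-bound histograms follow the OTLP bucket rule: with n sorted bounds there are n + 1 buckets, bucket i covers (bounds[i-1], bounds[i]] with the upper bound inclusive, and the last bucket is open-ended. The sketch below applies that rule with `bisect_left`. The commit does not list the spanmetrics-style bounds libdatadog uses, so the bounds in the test are made up, and their unit (seconds) is an assumption.

```python
from bisect import bisect_left
from typing import List, Sequence


def bucket_counts(values: Sequence[float], bounds: Sequence[float]) -> List[int]:
    """Count values into len(bounds) + 1 OTLP explicit-bound buckets.

    Bucket i covers (bounds[i-1], bounds[i]]; the last bucket is (bounds[-1], +inf).
    """
    counts = [0] * (len(bounds) + 1)
    for v in values:
        # bisect_left finds the first bound >= v, i.e. the bucket whose upper bound includes v.
        counts[bisect_left(bounds, v)] += 1
    return counts
```

For example, with hypothetical bounds `[0.005, 0.01, 0.1]`, a value exactly equal to `0.005` lands in bucket 0 because OTLP upper bounds are inclusive.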
libdatadog now reports service.name/service.version/deployment.environment.name as resource attributes (single InstrumentationScope), emitting service.name on a data point only when a span's service differs from the configured default. Re-pin libdatadog to 897ee5cb9 and update the OTLP trace-metrics test accordingly. Co-authored-by: Cursor <cursoragent@cursor.com>
libdatadog no longer emits a `dd-trace` InstrumentationScope (redundant with the resource's telemetry.sdk.* attributes). Re-pin libdatadog to 1612d50ef and assert the exported metrics carry no scope. Co-authored-by: Cursor <cursoragent@cursor.com>
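The payload shape described in the last two commits can be sketched as an OTLP/JSON export request: service identity as resource attributes, a single ScopeMetrics entry with no `scope`, a delta-temporality histogram, and `service.name` on a data point only when the span's service differs from the configured default. This is a sketch under assumptions: libdatadog may send protobuf rather than JSON, the unit is unknown, and explicit bucket counts and any other data-point attributes are omitted for brevity.

```python
import json
from typing import Any, Dict, List, Sequence

AGGREGATION_TEMPORALITY_DELTA = 1  # OTLP AggregationTemporality enum: DELTA = 1


def _str_attr(key: str, value: str) -> Dict[str, Any]:
    return {"key": key, "value": {"stringValue": value}}


def build_duration_metrics_json(
    default_service: str,
    version: str,
    env: str,
    points: Sequence[Dict[str, Any]],  # each: {"service": str, "count": int, "sum": float}
    start_ns: int,
    end_ns: int,
) -> str:
    data_points: List[Dict[str, Any]] = []
    for p in points:
        attrs: List[Dict[str, Any]] = []
        if p["service"] != default_service:
            # service.name on a data point only when it differs from the resource's service.
            attrs.append(_str_attr("service.name", p["service"]))
        data_points.append({
            "attributes": attrs,
            "startTimeUnixNano": str(start_ns),  # 64-bit ints as decimal strings (proto3 JSON)
            "timeUnixNano": str(end_ns),
            "count": str(p["count"]),
            "sum": p["sum"],
        })
    request = {
        "resourceMetrics": [{
            # Service identity lives on the resource, not on an InstrumentationScope.
            "resource": {"attributes": [
                _str_attr("service.name", default_service),
                _str_attr("service.version", version),
                _str_attr("deployment.environment.name", env),
            ]},
            # One ScopeMetrics entry with no "scope" field.
            "scopeMetrics": [{
                "metrics": [{
                    "name": "traces.span.sdk.metrics.duration",
                    "histogram": {
                        "aggregationTemporality": AGGREGATION_TEMPORALITY_DELTA,
                        "dataPoints": data_points,
                    },
                }],
            }],
        }],
    }
    return json.dumps(request)
```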
…TRACES_SPAN_METRICS_ENABLED
Rename the OTLP trace-metrics enablement env var and regenerate the supported configuration tables. Co-authored-by: Cursor <cursoragent@cursor.com>
…efix)
Picks up the OTLP trace-metric attribute rename (dd.* -> datadog.*) from libdatadog. Co-authored-by: Cursor <cursoragent@cursor.com>
Bumps the libdatadog branch pin to the restructured tip of DataDog/libdatadog#munir/otlp-trace-metrics. The OTLP trace-metrics PR on the libdatadog side is now five focused commits with an identical working tree:
- 09aa180ea feat(ddsketch,trace-utils): OTLP roundtrip helpers
- 21d4f8933 feat(trace-stats): expose exact per-cell sum/min/max via OTLP sidecar
- ab4140d51 refactor(data-pipeline): share OTLP HTTP transport; add OtlpMetricsConfig
- e8077d3e0 feat(data-pipeline): export client-computed span stats as OTLP trace metrics
- dbc8c4bf4 feat(trace-stats)!: key aggregation by gRPC method name
The dd-trace-py side picks up identical OTLP trace-metrics behavior; only the underlying commit hash changes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…als)
Bumps the libdatadog branch pin to the cleaned tip of DataDog/libdatadog#munir/otlp-trace-metrics. The libdatadog PR is unchanged in behavior: the new tip only restores docs and comments whose removal wasn't core to the OTLP trace-metrics work (the `OtlpTraceConfig` field docs, the `bufferLen` comment, and the multi-paragraph `send_otlp_traces_http` doc). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…9942)
Rebases DataDog/libdatadog#munir/otlp-trace-metrics onto the latest main (a82069942, the Azure Functions instance name fallback fix) to eliminate libdd-common drift that was causing the internal package_ffi_on_windows GitLab CI job to fail. OTLP trace-metrics behavior is unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…otlp-trace-metrics
The libdatadog branch added a Default variant to AssignmentReason. Map it to Reason::Default in the From impl. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…races) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…restore V1 negotiation) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Warning
DO NOT MERGE: pins `libdatadog` to a feature branch. Companion: DataDog/libdatadog#2067.

Overview
Wires the `libdatadog` OTLP trace-metrics export into dd-trace-py. When enabled, eligible spans are aggregated natively in Rust into a `traces.span.sdk.metrics.duration` histogram and exported to a `/v1/metrics` endpoint alongside traces. dd-trace-py's role is configuration only: stats aggregation, OTLP encoding, and HTTP export all run inside `libdatadog`.

Non-obvious details for reviewers
- Enablement: `OTEL_TRACES_SPAN_METRICS_ENABLED`. Auto-enables when both `OTEL_TRACES_EXPORTER=otlp` and `DD_METRICS_OTEL_ENABLED=true` are set.
- Flush cadence: fixed at 10s rather than following `OTEL_METRIC_EXPORT_INTERVAL`. Use `_DD_TRACE_METRICS_OTEL_FLUSH_INTERVAL` (ms) to shorten it in tests.
- `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` is passed to libdatadog verbatim: it must include `/v1/metrics`, because it is not auto-suffixed.
- `_dd.stats_computed: "true"` is set on every outbound OTLP trace `ResourceSpans` to tell Datadog Agent OTLP receivers to skip their concentrator, which prevents double-counted APM metrics. This is backwards compatible with released Agents.
- `src/native/Cargo.lock` is pinned to the `munir/otlp-trace-metrics` branch head (`9cd1ad5`).

Testing: `tests/opentelemetry/test_otlp_trace.py::test_otlp_trace_metrics_exported_via_http`. Cross-tracer coverage is in the DataDog/system-tests parametric suite.
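Because the metrics endpoint is used verbatim, a value without the `/v1/metrics` path will POST to the wrong URL. A hypothetical pre-flight check (not something this PR adds) could look like the sketch below; using the signal-specific endpoint as-is also matches how the OpenTelemetry exporter spec treats `OTEL_EXPORTER_OTLP_<SIGNAL>_ENDPOINT`, while only the base `OTEL_EXPORTER_OTLP_ENDPOINT` gets a path appended.

```python
from urllib.parse import urlsplit


def has_metrics_path(endpoint: str) -> bool:
    # The value is used verbatim, so its path must already end in /v1/metrics.
    # A trailing slash is rejected on purpose: "/v1/metrics/" is a different URL.
    return urlsplit(endpoint).path.endswith("/v1/metrics")
```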