feat(tracer): Implement OTLP Traces Export#4600

Open
mtoffl01 wants to merge 49 commits into main from mtoff/otlp-export-traces

Conversation

@mtoffl01
Contributor

@mtoffl01 mtoffl01 commented Mar 25, 2026

What does this PR do?

Implements the OTLP trace export pipeline (See: RFC) for dd-trace-go: when OTEL_TRACES_EXPORTER=otlp is set, the tracer converts Datadog spans into OTLP protobuf format and sends them directly to an OpenTelemetry Collector instead of the Datadog agent.

This adds three new components:

  1. span_to_otlp.go — converts internal Datadog spans to OTLP TracesData protobuf, including resource attributes, span kind/status, events, links, 128-bit trace IDs, and sampling filtering (only sampled spans are exported).
  2. otlp_writer.go — a traceWriter implementation that buffers converted OTLP spans, marshals them to protobuf, and flushes with retry logic.
  3. otlp_transport.go — a lightweight HTTP transport for sending protobuf payloads to an OTLP endpoint, separate from the existing Datadog-protocol transport.

The PR also renames the existing transport interface and config.transport field to ddTransport to make it clear that it handles Datadog-specific traffic (msgpack traces, stats), and ensures ddTransport always points at the Datadog agent even when OTLP mode is active.

Motivation

We are adding native OTLP export support to dd-trace-go so users can send traces directly to any OTLP-compatible collector. A prior PR added the configuration layer that resolves OTEL_TRACES_EXPORTER, OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, and OTEL_EXPORTER_OTLP_TRACES_HEADERS. This PR builds the runtime pipeline on top of that configuration.

IMPORTANT NOTE!

The existing agentTraceWriter and ddTransport were too tightly coupled to Datadog conventions (msgpack encoding, agent sampling rate feedback, stats collection) to extend for the OTLP path, so a separate writer and transport were introduced instead. The otlpTraceWriter owns its own otlpTransport directly (rather than reaching through config.ddTransport), keeping OTLP concerns isolated from the Datadog agent path. A follow-up PR may remove ddTransport from the config struct entirely, instead passing the transport directly to each writer and consumer at construction time. This would make all writer implementations structurally consistent, each owning its transport as a field, and eliminate the implicit coupling through config.

An additional PR to disable stats collection in otlp mode is in progress here.

Reviewer's Checklist

  • Changed code has unit tests for its functionality at or near 100% coverage.
  • System-Tests covering this feature have been added and enabled with the va.b.c-dev version tag.
  • There is a benchmark for any new code, or changes to existing code.
  • If this interacts with the agent in a new way, a system test has been added.
  • New code is free of linting errors. You can check this by running make lint locally.
  • New code doesn't break existing tests. You can check this by running make test locally.
  • Add an appropriate team label so this PR gets put in the right place for the release notes.
  • All generated files are up to date. You can check this by running make generate locally.
  • Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild. Make sure all nested modules are up to date by running make fix-modules locally.

Unsure? Have a question? Request a review!

mtoffl01 and others added 30 commits March 23, 2026 09:30
… service name differs from global service name
…me (#4576)

### What does this PR do?

Moves the `strconv.ParseInt` call for `_dd.p.dm` (decision maker) out of the v1 payload flush path and into the write path.

A `dm uint32` field is added to the `trace` struct. It is kept in sync with `propagatingTags[keyDecisionMaker]` at all mutation sites (`setPropagatingTagLocked`, `unsetPropagatingTagLocked`, `replacePropagatingTags`). A shared `unsetPropagatingTagLocked` helper is introduced so the delete-and-clear logic is not duplicated between `unsetPropagatingTag` and `setSamplingPriorityLockedWithForce`.

`payloadV1.push` now reads the pre-computed value via `trace.decisionMaker()` instead of fetching the string from the propagating tags map and parsing it on every flush.

### Motivation

`payloadV1.push` is called once per finished trace chunk. On every call it was acquiring a read lock, doing a map lookup, and then running `strconv.ParseInt` — all to read a value that was set at most once or twice during the lifetime of the trace. Moving the parse to write-time eliminates the per-flush map lookup and parse, replacing them with a single scalar read behind an `RLock`.

### Reviewer's Checklist

- [x] Changed code has unit tests for its functionality at or near 100% coverage.
- [x] New code is free of linting errors. You can check this by running `make lint` locally.
- [x] New code doesn't break existing tests. You can check this by running `make test` locally.
- [x] Add an appropriate team label so this PR gets put in the right place for the release notes.
- [x] All generated files are up to date. You can check this by running `make generate` locally.


Co-authored-by: dario.castane <dario.castane@datadoghq.com>
…ED` once, use a global `bool` (#4548)

Stop checking the environment variable `DD_TRACE_128_BIT_TRACEID_GENERATION_ENABLED` on every span start; check it once and cache the result in a global `bool`.

It would be better to load it in `newConfig` and make the value available from there, but the check happens in `newSpanContext`, and modifying that would ripple through many files, including tests.

Complete #1905 work using the current benchmarking platform.

- [x] Changed code has unit tests for its functionality at or near 100% coverage.
- [x] There is a benchmark for any new code, or changes to existing code.
- [x] New code is free of linting errors. You can check this by running `make lint` locally.
- [x] New code doesn't break existing tests. You can check this by running `make test` locally.
- [x] Add an appropriate team label so this PR gets put in the right place for the release notes.

Unsure? Have a question? Request a review!

Co-authored-by: kakkoyun <kakkoyun@users.noreply.github.com>
Co-authored-by: kemal.akkoyun <kemal.akkoyun@datadoghq.com>
Signed-off-by: Moe Zein <moe.zein@datadoghq.com>
Co-authored-by: Benjamin De Bernardi <debernardi.benjamin@gmail.com>
@pr-commenter

pr-commenter bot commented Mar 25, 2026

Benchmarks

Benchmark execution time: 2026-03-27 18:57:30

Comparing candidate commit 12d75b3 in PR branch mtoff/otlp-export-traces with baseline commit ee8dce2 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 215 metrics, 9 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because most changes are small:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

@mtoffl01 mtoffl01 marked this pull request as ready for review March 25, 2026 18:51
@mtoffl01 mtoffl01 requested a review from a team as a code owner March 25, 2026 18:51
@mtoffl01 mtoffl01 marked this pull request as draft March 25, 2026 19:38
@mtoffl01 mtoffl01 marked this pull request as ready for review March 26, 2026 19:44
@mtoffl01
Contributor Author

mtoffl01 commented Mar 27, 2026

Benchmark comparison: OTLP vs Agent TraceWriter

Time (sec/op)

| Benchmark | OTLP | Agent | Comparison |
| --- | --- | --- | --- |
| Add/1span | 512.8ns | 549.7ns | ~same |
| Add/5spans | 2.513µs | 772.9ns | Agent 3.3x faster |
| Add/10spans | 5.076µs | 1.399µs | Agent 3.6x faster |
| Add/50spans | 25.06µs | 6.370µs | Agent 3.9x faster |
| Flush | 47.12µs | 390.7µs | OTLP 8.3x faster |
| ConcurrentAdd/1 goroutine | 2.893µs | 1.918µs | Agent 1.5x faster |
| ConcurrentAdd/2 goroutines | 6.359µs | 4.543µs | Agent 1.4x faster |
| ConcurrentAdd/4 goroutines | 12.35µs | 7.930µs | Agent 1.6x faster |
| ConcurrentAdd/8 goroutines | 24.53µs | 16.42µs | Agent 1.5x faster |

Allocations (allocs/op)

| Benchmark | OTLP | Agent | Comparison |
| --- | --- | --- | --- |
| Add/1span | 15 | 1 | Agent 15x fewer |
| Add/5spans | 75 | 1 | Agent 75x fewer |
| Add/10spans | 150 | 1 | Agent 150x fewer |
| Add/50spans | 750 | 1 | Agent 750x fewer |
| Flush | 102 | 223 | OTLP 2.2x fewer |

Summary

  • add: OTLP is slower because it allocates proto structs per span (15 allocs/span), whereas the agent encodes directly into a reusable byte buffer (1 alloc/span).
  • flush: OTLP is faster despite needing to proto.Marshal at flush time, whereas the agent's payload is already serialized and just needs to be sent. The difference is likely due to the agent flush path doing more work (payload bookkeeping, rate reading, statsd metrics) than the OTLP flush path (proto.Marshal, send, retry).
  • concurrent add: Both scale linearly with goroutine count, indicating clean mutex behavior with no unexpected contention overhead.

@darccio
Member

darccio commented Mar 27, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c371138f9b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

```go
func newOTLPTraceWriter(c *config) *otlpTraceWriter {
	return &otlpTraceWriter{
		config:    c,
		transport: newOTLPTransport(c.httpClient, c.internalConfig.OTLPTraceURL(), c.internalConfig.OTLPHeaders()),
```

P1 Badge Build OTLP transport with a non-UDS HTTP client

In OTLP mode this constructor reuses c.httpClient, but newConfig builds that client from the Datadog agent URL and, when the agent is configured via unix:// (or auto-detected UDS), the client dialer always targets the Unix socket regardless of request host. As a result, OTLP exports to OTEL_EXPORTER_OTLP_TRACES_ENDPOINT are sent through the agent socket and fail instead of reaching the collector. This breaks OTLP trace export in UDS-based deployments unless users manually override the HTTP client.


Comment on lines +276 to +280
```go
	case int64:
		return otlpIntValue(val)
	case float64:
		return otlpDoubleValue(val)
	case []any:
```

P2 Badge Preserve typed meta_struct values in OTLP conversion

This conversion only treats int64, float64, and []any as typed OTLP values, so common meta_struct payloads like int, []int64, []string, and other concrete slice/integer forms fall into the JSON-string fallback path. That silently changes structured attributes into strings in OTLP output, which loses type information and can break downstream processing that expects numeric/array values.


Contributor Author


It's accurate that there are a few types not captured by this switch but to capture all types would be infinite and recursive. The only types stored in meta_struct and not captured here, that I could find, are:

  1. map[string][]any
  2. map[string][]*stacktrace.Event

```go
	return &otlptrace.Span{
		TraceId:      convertTraceID(s.context.traceID.Upper(), s.context.traceID.Lower()),
		SpanId:       convertSpanID(s.spanID),
		ParentSpanId: convertSpanID(s.parentID),
```
Member


What should be the value of ParentSpanId if s.parentID is 0?

Contributor Author


I think []byte{0}... @zacharycmontoya is that true?

Comment on lines +165 to +170
```go
	add := func(key string, val *otlpcommon.AnyValue) bool {
		if val != nil {
			attrs = append(attrs, &otlpcommon.KeyValue{Key: key, Value: val})
		}
		return len(attrs) < maxAttributesCount
	}
```
Member


Could this be a real function? No need for a closure here, I think.

Also, what will happen with the dropped attributes when len(attrs) == maxAttributesCount? I'm curious too about the returned boolean, because it's checked sometimes but not always.

Contributor Author


  1. Closure vs function: You're right, this closure is built and torn down on every call to convertSpanAttributes, which we expect to happen very frequently.
  2. Once we hit the limit, we silently drop the rest of the attributes for now.
  3. Checking the returned boolean inconsistently was an oversight; it should be fixed now.

"github.com/DataDog/dd-trace-go/v2/internal/version"
)

const maxAttributesCount = 128
Member


Thinking about this limit, I assume it comes from the spec. Isn't the order in convertSpanAttributes relevant?

If meta contains hundreds of tags, which is pretty common, we risk losing important attributes. Maps iterate in random order, so we should probably guarantee that the required attributes the collector expects are added first.

Contributor Author


Yes, but the priority order is explicitly undefined for now, beyond span.kind and operation.name which have the top priority.

From my conversation with @zacharycmontoya the other day:

We should prioritize these two span attributes [span.kind and operation.name] over all others, so I recommend adding them first.
I don't really want to prescribe an order of preference for the meta/metrics/metastruct, only that operation.name and span.type are always set first

Contributor Author


In follow-up PRs we would support the "dropped attributes count" -- a metric that increments for every attribute we have to drop beyond the limit -- but that's not in scope yet

```go
	assert.Equal(t, numAdders*spansPerAdder, totalSpans)
}

// TODO: Add benchmark tests.
```
Member


Aren't benchmarks present in otlp_writer_bench_test.go?

Contributor Author


yes this was leftover 😆


```go
// send posts a protobuf-encoded payload to the configured OTLP endpoint.
func (t *otlpTransport) send(data []byte) error {
	req, err := http.NewRequest("POST", t.endpoint, bytes.NewReader(data))
```
Member


Should we use a context to be able to cancel the request in case there are timeouts?

@darccio darccio requested review from a team and darccio and removed request for darccio March 27, 2026 17:46
mtoffl01 and others added 2 commits March 27, 2026 14:08