Conversation
… to determine otlp mode
…nfig registry to reflect this
…ut ddspan -> otlpspan conversion
… span_to_otlp_test.go
… service name differs from global service name
…me (#4576) ### What does this PR do? Moves the `strconv.ParseInt` call for `_dd.p.dm` (decision maker) out of the v1 payload flush path and into the write path. A `dm uint32` field is added to the `trace` struct. It is kept in sync with `propagatingTags[keyDecisionMaker]` at all mutation sites (`setPropagatingTagLocked`, `unsetPropagatingTagLocked`, `replacePropagatingTags`). A shared `unsetPropagatingTagLocked` helper is introduced so the delete-and-clear logic is not duplicated between `unsetPropagatingTag` and `setSamplingPriorityLockedWithForce`. `payloadV1.push` now reads the pre-computed value via `trace.decisionMaker()` instead of fetching the string from the propagating tags map and parsing it on every flush. ### Motivation `payloadV1.push` is called once per finished trace chunk. On every call it was acquiring a read lock, doing a map lookup, and then running `strconv.ParseInt` — all to read a value that was set at most once or twice during the lifetime of the trace. Moving the parse to write-time eliminates the per-flush map lookup and parse, replacing them with a single scalar read behind an `RLock`. ### Reviewer's Checklist - [x] Changed code has unit tests for its functionality at or near 100% coverage. - [x] New code is free of linting errors. You can check this by running `make lint` locally. - [x] New code doesn't break existing tests. You can check this by running `make test` locally. - [x] Add an appropriate team label so this PR gets put in the right place for the release notes. - [x] All generated files are up to date. You can check this by running `make generate` locally. Co-authored-by: dario.castane <dario.castane@datadoghq.com>
…ED` once, use a global `bool` (#4548) Stop checking environment variable `DD_TRACE_128_BIT_TRACEID_GENERATION_ENABLED` on span start. Checks it once, use a global `bool`. It'd be better to load it on `newConfig` and make the value available, but this one happens in `newSpanContext`, which modification will cause a lot of changes in multiple files, including tests. Complete #1905 work using the current benchmarking platform. - [x] Changed code has unit tests for its functionality at or near 100% coverage. - [x] There is a benchmark for any new code, or changes to existing code. - [x] New code is free of linting errors. You can check this by running `make lint` locally. - [x] New code doesn't break existing tests. You can check this by running `make test` locally. - [x] Add an appropriate team label so this PR gets put in the right place for the release notes. Unsure? Have a question? Request a review! Co-authored-by: kakkoyun <kakkoyun@users.noreply.github.com> Co-authored-by: kemal.akkoyun <kemal.akkoyun@datadoghq.com>
Signed-off-by: Moe Zein <moe.zein@datadoghq.com>
Co-authored-by: Benjamin De Bernardi <debernardi.benjamin@gmail.com>
BenchmarksBenchmark execution time: 2026-03-27 18:57:30 Comparing candidate commit 12d75b3 in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 215 metrics, 9 unstable metrics.
|
Benchmark comparison: OTLP vs Agent TraceWriterTime (sec/op)
Allocations (allocs/op)
Summary
|
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c371138f9b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| func newOTLPTraceWriter(c *config) *otlpTraceWriter { | ||
| return &otlpTraceWriter{ | ||
| config: c, | ||
| transport: newOTLPTransport(c.httpClient, c.internalConfig.OTLPTraceURL(), c.internalConfig.OTLPHeaders()), |
There was a problem hiding this comment.
Build OTLP transport with a non-UDS HTTP client
In OTLP mode this constructor reuses c.httpClient, but newConfig builds that client from the Datadog agent URL and, when the agent is configured via unix:// (or auto-detected UDS), the client dialer always targets the Unix socket regardless of request host. As a result, OTLP exports to OTEL_EXPORTER_OTLP_TRACES_ENDPOINT are sent through the agent socket and fail instead of reaching the collector. This breaks OTLP trace export in UDS-based deployments unless users manually override the HTTP client.
Useful? React with 👍 / 👎.
| case int64: | ||
| return otlpIntValue(val) | ||
| case float64: | ||
| return otlpDoubleValue(val) | ||
| case []any: |
There was a problem hiding this comment.
Preserve typed meta_struct values in OTLP conversion
This conversion only treats int64, float64, and []any as typed OTLP values, so common meta_struct payloads like int, []int64, []string, and other concrete slice/integer forms fall into the JSON-string fallback path. That silently changes structured attributes into strings in OTLP output, which loses type information and can break downstream processing that expects numeric/array values.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
It's accurate that there are a few types not captured by this switch but to capture all types would be infinite and recursive. The only types stored in meta_struct and not captured here, that I could find, are:
- map[string][]any
- map[string][]*stacktrace.Event
| return &otlptrace.Span{ | ||
| TraceId: convertTraceID(s.context.traceID.Upper(), s.context.traceID.Lower()), | ||
| SpanId: convertSpanID(s.spanID), | ||
| ParentSpanId: convertSpanID(s.parentID), |
There was a problem hiding this comment.
What should be the value of ParentSpanId if s.parentID is 0?
There was a problem hiding this comment.
I think []byte{0}... @zacharycmontoya is that true?
ddtrace/tracer/span_to_otlp.go
Outdated
| add := func(key string, val *otlpcommon.AnyValue) bool { | ||
| if val != nil { | ||
| attrs = append(attrs, &otlpcommon.KeyValue{Key: key, Value: val}) | ||
| } | ||
| return len(attrs) < maxAttributesCount | ||
| } |
There was a problem hiding this comment.
Could this be a real function? No need for a closure here, I think.
Also, what will happen with the dropped attributes when len(attrs) == maxAttributesCount? I'm curious too about the returned boolean, because it's checked sometimes but not always.
There was a problem hiding this comment.
- Closure vs function: You're right, this function will be built and torn down on every call to convertSpanAttributes, which we expect to happen many many times
- Once we hit the limit, we silently drop the rest of the attributes for now
- Checking/not checking the returned boolean was an oversight, it should be fixed now.
| "github.com/DataDog/dd-trace-go/v2/internal/version" | ||
| ) | ||
|
|
||
| const maxAttributesCount = 128 |
There was a problem hiding this comment.
Thinking about this limit, I assume it comes from the spec: isn't the order in convertSpanAttributes relevant?
If meta contains hundreds of tags, which is pretty common, we risk losing important attributes. Iterating maps happens in random order, so we might should guarantee adding first the required attributes that the collector might expect.
There was a problem hiding this comment.
Yes, but the priority order is explicitly undefined for now, beyond span.kind and operation.name which have the top priority.
From my conversation with @zacharycmontoya the other day:
We should prioritize these two span attributes [span.kind and operation.name] over all others, so I recommend adding them first.
I don't really want to prescribe an order of preference for the meta/metrics/metastruct, only that operation.name and span.type are always set first
There was a problem hiding this comment.
In follow-up PRs we would support the "dropped attributes count" -- a metric that increments for every attribute we have to drop beyond the limit -- but that's not in scope yet
| assert.Equal(t, numAdders*spansPerAdder, totalSpans) | ||
| } | ||
|
|
||
| // TODO: Add benchmark tests. |
There was a problem hiding this comment.
Aren't benchmarks present in otlp_writer_bench_test.go?
There was a problem hiding this comment.
yes this was leftover 😆
|
|
||
| // send posts a protobuf-encoded payload to the configured OTLP endpoint. | ||
| func (t *otlpTransport) send(data []byte) error { | ||
| req, err := http.NewRequest("POST", t.endpoint, bytes.NewReader(data)) |
There was a problem hiding this comment.
Should we use a context to be able to cancel the request in case there are timeouts?
Co-authored-by: Dario Castañé <d@rio.hn>
What does this PR do?
Implements the OTLP trace export pipeline (See: RFC) for dd-trace-go: when
OTEL_TRACES_EXPORTER=otlpis set, the tracer converts Datadog spans into OTLP protobuf format and sends them directly to an OpenTelemetry Collector instead of the Datadog agent.This adds three new components:
The PR also renames the existing transport interface and config.transport field to ddTransport to make it clear that it handles Datadog-specific traffic (msgpack traces, stats), and ensures ddTransport always points at the Datadog agent even when OTLP mode is active.
Motivation
We are adding native OTLP export support to dd-trace-go so users can send traces directly to any OTLP-compatible collector. A prior PR added the configuration layer that resolves
OTEL_TRACES_EXPORTER,OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, andOTEL_EXPORTER_OTLP_TRACES_HEADERS. This PR builds the runtime pipeline on top of that configuration.IMPORTANT NOTE!
The existing
agentTraceWriterandddTransportwas too tightly coupled to Datadog conventions — msgpack encoding, agent sampling rate feedback, stats collection — to extend it for the otlp path, so a separate writer and transport were introduced instead. TheotlpTraceWriterowns its ownotlpTransportdirectly (rather than reaching throughconfig.ddTransport), keeping OTLP concerns isolated from the Datadog agent path. A follow-up PR may removeddTransportfrom the config struct entirely, instead passing the transport directly to each writer and consumer at construction time. This would make all writer implementations structurally consistent — each owning its transport as a field — and eliminate the implicit coupling through config.An additional PR to disable stats collection in otlp mode is in progress here.
Reviewer's Checklist
make lintlocally.make testlocally.make generatelocally.make fix-moduleslocally.Unsure? Have a question? Request a review!