feat(data-pipeline): OTLP HTTP/protobuf trace export#2115
Conversation
Records the approved design: vendor OTLP trace + collector protos and generate prost types (zero new runtime deps), keep the hand-rolled serde JSON path, share one mapper with a serde->prost converter, and select the protocol via builder + C FFI. Includes the dd-trace-py companion wiring and the layered E2E plan (local receiver, system-tests, sdk-backend-verify). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Bite-sized, TDD-structured plan across 9 phases: vendor+generate prost types, serde->prost converter, encoder dispatch, protocol config through builder + C FFI, full validation gauntlet + libdatadog PR, then dd-trace-py PyO3/writer wiring and three E2E tiers (local receiver, system-tests, sdk-backend-verify) + dd-trace-py PR. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Vendors opentelemetry/proto/trace/v1/trace.proto and opentelemetry/proto/collector/trace/v1/trace_service.proto from open-telemetry/opentelemetry-proto commit 1e725b853bc8f6b46ee62e8232e4c83017b9536f (matching the already-vendored common.proto and resource.proto). Adds both protos to the prost_build compile list in build.rs, generates the committed Rust types (opentelemetry.proto.trace.v1.rs and opentelemetry.proto.collector.trace.v1.rs), and updates _includes.rs. Also qualifies "Span" → "pb.Span" / "pb.idx.Span" in build.rs type_attribute calls to prevent serde derives from leaking into the new opentelemetry::proto::trace::v1::Span type.
…ests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…-type match Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…o avoid doctest failures
… Grpc content-type arm
|
📚 Documentation Check Results📦
|
Clippy Allow Annotation ReportComparing clippy allow annotations between branches:
Summary by Rule
Annotation Counts by File
Annotation Stats by Crate
About This ReportThis report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality. |
🔒 Cargo Deny Results📦
|
ee71538 to
664f16f
Compare
Artifact Size Benchmark Reportaarch64-alpine-linux-musl
aarch64-unknown-linux-gnu
libdatadog-x64-windows
libdatadog-x86-windows
x86_64-alpine-linux-musl
x86_64-unknown-linux-gnu
|
The OTLP design spec and implementation plan are linked from the PR description (internal chonk) rather than committed to the repo. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Selecting OtlpProtocol::Grpc on the Rust builder previously built a working-looking exporter that then failed on every send with a mis-typed Internal(InvalidWorkerState) error. Reject it in build_async (covering sync build + wasm) with BuilderErrorKind::InvalidConfiguration (FFI: InvalidArgument), matching the C FFI set_otlp_protocol setter so both entry points fail fast and identically. The send-time arm stays as a defensive guard. Also replace the "skip as instructed" placeholder comment in the OTLP serde->prost converter with a real test exercising link()/event() conversion (link trace/span ID byte assembly, tracestate, and link/event attributes). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mabdinur
left a comment
There was a problem hiding this comment.
Faithful protobuf encoder with safe fallbacks (no panics), correct content-type selection, and gRPC is rejected at both build and send time. No breaking API changes and good JSON/protobuf parity tests. LGTM.
mabdinur
left a comment
There was a problem hiding this comment.
Approval stands, these are all nits / follow-up material, nothing blocking:
-
proto_convert.rs (perf): on the http/protobuf path, mapper.rs formats every ID to a hex string and every timestamp to a string, then this code reverses it (hex_to_bytes/parse_u64) and deep-clones the whole serde tree, per span per export. Native IDs are u128/u64 already. Reasonable as a documented v1, but worth a follow-up to build the prost types straight from the native span types and skip the round-trip.
-
otlp/config.rs: OtlpProtocol is public but not #[non_exhaustive]. Nothing breaks today since it's new, but without it any future variant becomes a breaking change for downstream exhaustive matches. Worth adding now while the enum is fresh.
-
otlp_encoder/mod.rs (scope question): confirming this is traces-only, which seems like the right first step with OtlpProtocol as the encoding axis. Are metrics/logs intentionally out of scope here, handled by separate signal pipelines rather than extending this path?
-
json_types.rs (maintainability): the OTLP schema now has two hand-maintained representations (serde json_types and prost) bridged by the From impls in proto_convert, with no shared source of truth, so a field added to one side can silently diverge. The cross-encoder parity test only compares name and spanId. Could we extend it to trace_id (base64 vs raw bytes), status, and at least one attribute so drift gets caught?
-
libdd-data-pipeline-ffi/src/trace_exporter.rs: the protocol string re-parse can't fail today since the setter already restricts the values, so an error here is a silent no-op. If the accepted set ever drifts from FromStr this would quietly drop the protocol. Consider expect/log, or rely on the builder validation.
| // Copyright 2024-Present Datadog, Inc. https://www.datadoghq.com/ | ||
| // SPDX-License-Identifier: Apache-2.0 | ||
|
|
||
| //! Converts the hand-rolled serde OTLP request (the JSON wire model) into the generated |
There was a problem hiding this comment.
This is going to be incredibly unperformant. The mapper is taking a Span and turning everything into a String for JSON. And then this is converting everything back for protobuf, which is also doing unnecessary extra string allocations.
The problem is the mapper is conflating mapping to OTLP and serializing the wire format. I think it would be much better to refactor the mapper to use an intermediate representation and separate concerns. Using prost for the IR probably makes the most sense? It'll be much more performant to build prost types in the mapper, then serialize them directly to JSON via Serde.
There was a problem hiding this comment.
This to me is the fundamental problem. It re-introduces the same problems I just spent 9 months trying to fix.
There was a problem hiding this comment.
| } | ||
| } | ||
|
|
||
| impl From<&j::ExportTraceServiceRequest> for ProtoReq { |
There was a problem hiding this comment.
From taking the request by ref, with only a borrow you can't move the owned Strings out is leading to a ton of clones in proto_convert. Where this is used the serde request is a single-use intermediate that quickly gets dropped after. If you change this to From<ExportTraceServiceRequest> you can avoid a lot of costly allocations.
| crate::otlp::config::OtlpProtocol::HttpProtobuf => { | ||
| libdd_common::header::APPLICATION_PROTOBUF | ||
| } | ||
| crate::otlp::config::OtlpProtocol::HttpJson | crate::otlp::config::OtlpProtocol::Grpc => { |
There was a problem hiding this comment.
I wouldn't recommended silently setting the content_type to JSON here when matching on GRPC. Just return an error. The function signature is already a Result.
| #[allow(dead_code)] | ||
| pub(crate) protocol: OtlpProtocol, | ||
| /// OTLP export protocol (selects body encoding and content-type). | ||
| pub protocol: OtlpProtocol, |
There was a problem hiding this comment.
We're matching protocol several times in this PR and it's kind of awkward because OtlpProtocol contains a currently unsupported variant. Why not keep the user facing OtlpProtocol as is, and internally have something like enum OtlpWireProtocol { Json, Protobuf }? It'll make the code much cleaner and when GRPC support is added you can get rid of OTLPWireProtocol. Or you could do:
impl OtlpWireProtocol {
fn content_type(&self) -> HeaderValue { /* 2 arms */ }
fn encode(&self, req: &ExportTraceServiceRequest) -> Result<Vec<u8>> { /* 2 arms */ }
} and encapsulate the correct behavior in the type.
|
|
||
| config.out_dir(output_path.clone()); | ||
|
|
||
| // The vendored OpenTelemetry trace protos carry doc comments with indented example blocks |
There was a problem hiding this comment.
This means no docs for the protobuf files will be generated at all. You can just update the Cargo.toml and add doctest = false to [lib] and get rid of this.
There was a problem hiding this comment.
Good call on keeping the docs. doctest = false on [lib] doesn't actually stop cargo test --doc from compiling the proto comments, so instead I dropped disable_comments, kept the docs, and fenced the one example as a text block so rustdoc renders it instead of running it. Updated in 18f28c5.
There was a problem hiding this comment.
This doc needs to be updated
| ) | ||
| } | ||
|
|
||
| /// Sets the OTLP export protocol. Accepts the OTel-standard values `http/json` (default) or |
There was a problem hiding this comment.
The doc should probably make it clear that this function is inert if an otlp endpoint isn't set.
| if let Some(ref proto) = config.otlp_protocol { | ||
| // The FFI setter only stores "http/json"/"http/protobuf", so this parse always | ||
| // succeeds here; a parse failure just leaves the builder's default protocol. | ||
| if let Ok(p) = proto.parse::<libdd_data_pipeline::otlp::OtlpProtocol>() { |
There was a problem hiding this comment.
Similar to other comments, this is dangerous if there is ever code drift. This silently drops a parse failure and falls back to the default protocol.
| TraceExporterError::Internal(InternalErrorKind::InvalidWorkerState(e.to_string())) | ||
| })?, | ||
| OtlpProtocol::HttpProtobuf => { | ||
| libdd_trace_utils::otlp_encoder::encode_otlp_protobuf(&request) |
There was a problem hiding this comment.
Where's the error handling here, like the with the other protocols?
There was a problem hiding this comment.
Added in 809914e - wire.encode() returns a Result now and the error is logged + propagated, same as the others.
| use std::time::Duration; | ||
|
|
||
| /// OTLP trace export protocol. HTTP/JSON is currently supported. | ||
| /// OTLP trace export protocol. |
There was a problem hiding this comment.
Seems like grpc should still be excluded here?
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t; both encodings
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… encoding, errors on grpc Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The build-time gRPC check fired unconditionally, so a `grpc` protocol (e.g. resolved from the OTel-default OTEL_EXPORTER_OTLP_PROTOCOL) failed the build of a normal Datadog-agent exporter even with no OTLP endpoint set — violating the "protocol is inert without an endpoint" contract. Move the rejection inside the OTLP-endpoint branch so it only fires when OTLP export is actually enabled. The send-time arm remains a guard. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…per-span alloc Pre-push review cleanups: keep the internal OtlpWireProtocol and the json_serializer module crate-private (not public API surface), and use eq_ignore_ascii_case in the span-kind mappers to avoid a per-span to_lowercase() allocation on the encode hot path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Criterion benches for the OTLP encoder hot paths — native spans -> prost IR (the mapper), and prost IR -> HTTP/protobuf and HTTP/JSON wire — plus end-to-end map+encode, over ~1000-span payloads (one large trace and many small traces). Inputs are decoded from msgpack into borrowed SpanSlices, matching the production exporter path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… hex Pre-size the per-export span Vec and per-span attribute Vec in the prost mapper to avoid reallocations as they fill. This also makes the resulting prost IR more compact, speeding up the downstream protobuf encode (a sequential read of the IR). Serialize OTLP/JSON trace/span ids from a stack buffer (hex::encode_to_slice) instead of allocating a String per id. Benches (~1000 spans): map -21%, protobuf encode -19%, JSON encode -9%, end-to-end -10..12%. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ck buffer OTLP encodes 64-bit integers (nanosecond timestamps and intValue attributes) as JSON strings to avoid precision loss; these were each allocating a String per span via to_string(). Format them into a stack buffer instead (NumStr, mirroring the HexId id writer). Removes ~3 timestamp + N int-attribute String allocations per span on the JSON path. Bench: encode_json a further -4.5% (now -12.5% vs the original baseline). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s text Addresses the build.rs comment-strip review note. `[lib] doctest = false` does not suppress doctests under `cargo test --doc` (a required gate), so instead fence the one offending example block (Span.attributes in trace.proto) as a ```text block — rustdoc renders it as text, not a Rust doctest — and drop `disable_comments` so the OTLP proto docs are generated onto the prost structs again. `cargo test --doc` stays green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
test_otlp_grpc_without_endpoint_still_builds builds a real TraceExporter, which spawns the agent_info worker and makes syscalls miri can't execute. Add #[cfg_attr(miri, ignore)] to match the file's other live-build tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What does this PR do?
Adds OTLP HTTP/protobuf as a trace-export encoding alongside HTTP/JSON, selectable via the OTel-standard
OTEL_EXPORTER_OTLP_TRACES_PROTOCOL(http/jsondefault,http/protobuf). The generatedprostOTLP types are the single intermediate representation: the mapper builds them directly from native spans, protobuf isprost::encode_to_vec, and JSON is a serde serializer over the same types.grpcparses but is rejected (HTTP-only).Motivation
libdatadog's OTLP trace export only spoke HTTP/JSON. SDKs that honor
OTEL_EXPORTER_OTLP_TRACES_PROTOCOLneedhttp/protobufto match the OTel default and to talk to collectors that expect protobuf.Additional Notes
encode_otlp_protobufisencode_to_vec;encode_otlp_jsonis a serde serializer (json_serializer.rs) emitting OTLP-spec JSON (hex ids, base64 bytes, int64-as-string, lowerCamelCase, proto3 defaults omitted).OtlpProtocolcarriesGrpc; an internalOtlpWireProtocol { Json, Protobuf }encapsulatescontent_type()/encode(). gRPC is rejected, but only when an OTLP endpoint is configured, so agrpcvalue left over fromOTEL_EXPORTER_OTLP_PROTOCOLstays inert for a normal agent exporter.How to test the change?
cargo test -p libdd-trace-utils -p libdd-data-pipeline -p libdd-data-pipeline-ffi,cargo test --doc, clippy, fmt, andcargo ffi-testpass. A parity test asserts the JSON and protobuf encodings carry the same span from the one prost IR.application/x-protobuf(HTTP 200), spans ingested with correct service/resource and a preserved 128-bit trace id.cargo bench -p libdd-trace-utils --bench main -- otlp/.BREAKING CHANGE: removes the previously public
libdd_trace_utils::otlp_encoder::json_typesmodule (the hand-rolled OTLP JSON model:OtlpSpan,ExportTraceServiceRequest,KeyValue,Status, thespan_kind/status_codeconsts, etc.). OTLP encoding now builds prost-generated types as the single IR. libdatadog consumers pin by version, so they pick this up on the next release.