Skip to content

feat: Stable Session ID headers for telemetry#8352

Open
khanayan123 wants to merge 9 commits intomasterfrom
ayan.khan/stable-session-id-headers
Open

feat: Stable Session ID headers for telemetry#8352
khanayan123 wants to merge 9 commits intomasterfrom
ayan.khan/stable-session-id-headers

Conversation

@khanayan123
Copy link
Copy Markdown

@khanayan123 khanayan123 commented Mar 20, 2026

Summary

Adds stable session ID headers to telemetry requests per the Stable Service Instance Identifier RFC.

  • DD-Session-ID (runtime_id) added to every telemetry request via default headers
  • DD-Root-Session-ID added only when inherited from a parent process (child process detection)
  • Root session ID propagated to child processes via _DD_ROOT_DOTNET_SESSION_ID env var (registered in config registry)
  • Root session ID initialized eagerly in Instrumentation.InitializeNoNativeParts() so child processes spawned before the first telemetry flush inherit it

Files changed

  • Instrumentation.cs — Eagerly initializes root session ID at startup
  • RuntimeId.csGetRootSessionId() with env var inheritance and auto-propagation
  • TelemetryConstants.cs — Header name constants
  • TelemetryHttpHeaderNames.cs — Session headers added to default agent and intake headers
  • supported-configurations.yaml — Registers _DD_ROOT_DOTNET_SESSION_ID in config registry
  • RuntimeIdTests.cs — Tests for root session ID (default + inherited paths)

Related PRs

Test plan

  • SetsRequiredHeaders validates DD-Session-ID present on all telemetry
  • RootSessionId_UsesRuntimeIdWhenNotInherited_AndInheritsWhenSet covers both paths
  • System tests validate cross-process session ID inheritance

@khanayan123 khanayan123 added area:tracer The core tracer library (Datadog.Trace, does not include OpenTracing, native code, or integrations) area:telemetry and removed area:tracer The core tracer library (Datadog.Trace, does not include OpenTracing, native code, or integrations) labels Mar 20, 2026
@pr-commenter
Copy link
Copy Markdown

pr-commenter bot commented Mar 23, 2026

Benchmarks

Benchmark execution time: 2026-03-31 02:34:00

Comparing candidate commit 9eff4f3 in PR branch ayan.khan/stable-session-id-headers with baseline commit f22a608 in branch master.

Found 7 performance improvements and 7 performance regressions! Performance is the same for 257 metrics, 17 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

scenario:Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces net6.0

  • 🟩 execution_time [-106.881ms; -106.543ms] or [-50.915%; -50.754%]

scenario:Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces netcoreapp3.1

  • 🟩 execution_time [-94.271ms; -92.490ms] or [-46.175%; -45.303%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorSimpleBody netcoreapp3.1

  • 🟩 execution_time [-17.097ms; -11.223ms] or [-8.071%; -5.298%]

scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSlice netcoreapp3.1

  • 🟥 execution_time [+527.460µs; +702.573µs] or [+22.855%; +30.442%]
  • 🟥 throughput [-113.174op/s; -84.810op/s] or [-25.656%; -19.226%]

scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSliceWithPool net6.0

  • 🟥 execution_time [+123.711µs; +128.702µs] or [+12.183%; +12.675%]
  • 🟥 throughput [-110.919op/s; -106.808op/s] or [-11.263%; -10.846%]

scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearch net472

  • 🟩 throughput [+15300.412op/s; +16531.514op/s] or [+5.010%; +5.413%]

scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearchAsync netcoreapp3.1

  • 🟥 throughput [-42822.050op/s; -36355.514op/s] or [-8.880%; -7.539%]

scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark net6.0

  • 🟥 allocated_mem [+24.595KB; +24.629KB] or [+9.655%; +9.668%]

scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatBenchmark net6.0

  • 🟩 throughput [+1860.366op/s; +3734.239op/s] or [+8.935%; +17.935%]

scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatBenchmark netcoreapp3.1

  • 🟩 throughput [+1493.869op/s; +3250.370op/s] or [+8.016%; +17.442%]

scenario:Benchmarks.Trace.SingleSpanAspNetCoreBenchmark.SingleSpanAspNetCore net6.0

  • 🟥 execution_time [+5.177ms; +6.107ms] or [+5.977%; +7.051%]

scenario:Benchmarks.Trace.SpanBenchmark.StartFinishSpan net6.0

  • 🟩 execution_time [-22.088ms; -16.451ms] or [-10.174%; -7.578%]

@dd-trace-dotnet-ci-bot
Copy link
Copy Markdown

dd-trace-dotnet-ci-bot bot commented Mar 23, 2026

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing This PR (8352) and master.

✅ No regressions detected - check the details below

Full Metrics Comparison

FakeDbCommand

Metric Master (Mean ± 95% CI) Current (Mean ± 95% CI) Change Status
.NET Framework 4.8 - Baseline
duration71.99 ± (72.07 - 72.44) ms71.76 ± (71.67 - 71.97) ms-0.3%
.NET Framework 4.8 - Bailout
duration75.66 ± (75.57 - 75.92) ms76.20 ± (76.11 - 76.47) ms+0.7%✅⬆️
.NET Framework 4.8 - CallTarget+Inlining+NGEN
duration1069.02 ± (1069.60 - 1075.40) ms1076.34 ± (1077.33 - 1084.01) ms+0.7%✅⬆️
.NET Core 3.1 - Baseline
process.internal_duration_ms22.54 ± (22.46 - 22.63) ms22.26 ± (22.23 - 22.30) ms-1.2%
process.time_to_main_ms84.75 ± (84.24 - 85.26) ms83.56 ± (83.36 - 83.76) ms-1.4%
runtime.dotnet.exceptions.count0 ± (0 - 0)0 ± (0 - 0)+0.0%
runtime.dotnet.mem.committed10.91 ± (10.90 - 10.91) MB10.93 ± (10.93 - 10.93) MB+0.2%✅⬆️
runtime.dotnet.threads.count12 ± (12 - 12)12 ± (12 - 12)+0.0%
.NET Core 3.1 - Bailout
process.internal_duration_ms22.25 ± (22.22 - 22.28) ms22.26 ± (22.22 - 22.29) ms+0.0%✅⬆️
process.time_to_main_ms84.75 ± (84.56 - 84.94) ms84.66 ± (84.48 - 84.84) ms-0.1%
runtime.dotnet.exceptions.count0 ± (0 - 0)0 ± (0 - 0)+0.0%
runtime.dotnet.mem.committed10.95 ± (10.95 - 10.95) MB10.97 ± (10.96 - 10.97) MB+0.2%✅⬆️
runtime.dotnet.threads.count13 ± (13 - 13)13 ± (13 - 13)+0.0%
.NET Core 3.1 - CallTarget+Inlining+NGEN
process.internal_duration_ms222.65 ± (221.58 - 223.72) ms224.70 ± (223.51 - 225.89) ms+0.9%✅⬆️
process.time_to_main_ms534.07 ± (533.06 - 535.08) ms531.59 ± (530.29 - 532.89) ms-0.5%
runtime.dotnet.exceptions.count0 ± (0 - 0)0 ± (0 - 0)+0.0%
runtime.dotnet.mem.committed48.27 ± (48.24 - 48.30) MB48.31 ± (48.28 - 48.35) MB+0.1%✅⬆️
runtime.dotnet.threads.count28 ± (28 - 28)28 ± (28 - 28)-1.2%
.NET 6 - Baseline
process.internal_duration_ms20.86 ± (20.83 - 20.90) ms20.96 ± (20.93 - 20.99) ms+0.5%✅⬆️
process.time_to_main_ms71.58 ± (71.42 - 71.74) ms71.32 ± (71.18 - 71.46) ms-0.4%
runtime.dotnet.exceptions.count0 ± (0 - 0)0 ± (0 - 0)+0.0%
runtime.dotnet.mem.committed10.61 ± (10.61 - 10.61) MB10.64 ± (10.64 - 10.65) MB+0.3%✅⬆️
runtime.dotnet.threads.count10 ± (10 - 10)10 ± (10 - 10)+0.0%
.NET 6 - Bailout
process.internal_duration_ms21.07 ± (21.04 - 21.10) ms21.00 ± (20.97 - 21.03) ms-0.3%
process.time_to_main_ms73.17 ± (73.01 - 73.34) ms72.84 ± (72.69 - 72.98) ms-0.5%
runtime.dotnet.exceptions.count0 ± (0 - 0)0 ± (0 - 0)+0.0%
runtime.dotnet.mem.committed10.71 ± (10.71 - 10.72) MB10.75 ± (10.75 - 10.75) MB+0.3%✅⬆️
runtime.dotnet.threads.count11 ± (11 - 11)11 ± (11 - 11)+0.0%
.NET 6 - CallTarget+Inlining+NGEN
process.internal_duration_ms390.50 ± (388.33 - 392.66) ms388.12 ± (385.98 - 390.27) ms-0.6%
process.time_to_main_ms530.09 ± (529.12 - 531.06) ms537.36 ± (536.37 - 538.34) ms+1.4%✅⬆️
runtime.dotnet.exceptions.count0 ± (0 - 0)0 ± (0 - 0)+0.0%
runtime.dotnet.mem.committed50.30 ± (50.27 - 50.32) MB50.37 ± (50.34 - 50.40) MB+0.1%✅⬆️
runtime.dotnet.threads.count28 ± (28 - 28)28 ± (28 - 28)+0.1%✅⬆️
.NET 8 - Baseline
process.internal_duration_ms19.24 ± (19.21 - 19.28) ms19.23 ± (19.20 - 19.26) ms-0.1%
process.time_to_main_ms71.98 ± (71.82 - 72.15) ms71.73 ± (71.59 - 71.88) ms-0.3%
runtime.dotnet.exceptions.count0 ± (0 - 0)0 ± (0 - 0)+0.0%
runtime.dotnet.mem.committed7.66 ± (7.66 - 7.67) MB7.70 ± (7.70 - 7.71) MB+0.5%✅⬆️
runtime.dotnet.threads.count10 ± (10 - 10)10 ± (10 - 10)+0.0%
.NET 8 - Bailout
process.internal_duration_ms19.38 ± (19.34 - 19.43) ms19.42 ± (19.38 - 19.45) ms+0.2%✅⬆️
process.time_to_main_ms73.10 ± (72.95 - 73.24) ms73.47 ± (73.31 - 73.63) ms+0.5%✅⬆️
runtime.dotnet.exceptions.count0 ± (0 - 0)0 ± (0 - 0)+0.0%
runtime.dotnet.mem.committed7.70 ± (7.69 - 7.71) MB7.75 ± (7.74 - 7.75) MB+0.6%✅⬆️
runtime.dotnet.threads.count11 ± (11 - 11)11 ± (11 - 11)+0.0%
.NET 8 - CallTarget+Inlining+NGEN
process.internal_duration_ms309.95 ± (307.65 - 312.24) ms310.07 ± (307.47 - 312.67) ms+0.0%✅⬆️
process.time_to_main_ms488.91 ± (488.14 - 489.68) ms494.97 ± (494.04 - 495.90) ms+1.2%✅⬆️
runtime.dotnet.exceptions.count0 ± (0 - 0)0 ± (0 - 0)+0.0%
runtime.dotnet.mem.committed37.21 ± (37.18 - 37.23) MB37.29 ± (37.26 - 37.31) MB+0.2%✅⬆️
runtime.dotnet.threads.count27 ± (27 - 27)27 ± (27 - 27)-0.0%

HttpMessageHandler

Metric Master (Mean ± 95% CI) Current (Mean ± 95% CI) Change Status
.NET Framework 4.8 - Baseline
duration212.28 ± (212.00 - 212.96) ms188.74 ± (188.59 - 189.32) ms-11.1%
.NET Framework 4.8 - Bailout
duration216.60 ± (216.22 - 217.27) ms192.23 ± (192.02 - 192.38) ms-11.3%
.NET Framework 4.8 - CallTarget+Inlining+NGEN
duration1248.66 ± (1245.83 - 1253.25) ms1148.79 ± (1151.97 - 1160.65) ms-8.0%
.NET Core 3.1 - Baseline
process.internal_duration_ms208.15 ± (207.68 - 208.62) ms184.28 ± (184.00 - 184.56) ms-11.5%
process.time_to_main_ms90.85 ± (90.55 - 91.15) ms79.11 ± (78.96 - 79.27) ms-12.9%
runtime.dotnet.exceptions.count3 ± (3 - 3)3 ± (3 - 3)+0.0%
runtime.dotnet.mem.committed15.94 ± (15.92 - 15.95) MB16.14 ± (16.08 - 16.21) MB+1.3%✅⬆️
runtime.dotnet.threads.count20 ± (20 - 20)20 ± (19 - 20)-1.8%
.NET Core 3.1 - Bailout
process.internal_duration_ms209.42 ± (208.87 - 209.96) ms183.42 ± (183.19 - 183.64) ms-12.4%
process.time_to_main_ms92.71 ± (92.42 - 93.00) ms80.38 ± (80.30 - 80.46) ms-13.3%
runtime.dotnet.exceptions.count3 ± (3 - 3)3 ± (3 - 3)+0.0%
runtime.dotnet.mem.committed15.99 ± (15.97 - 16.01) MB16.02 ± (15.89 - 16.15) MB+0.2%✅⬆️
runtime.dotnet.threads.count21 ± (21 - 21)20 ± (20 - 20)-4.7%
.NET Core 3.1 - CallTarget+Inlining+NGEN
process.internal_duration_ms418.97 ± (417.11 - 420.82) ms390.11 ± (388.74 - 391.49) ms-6.9%
process.time_to_main_ms567.14 ± (565.72 - 568.57) ms518.62 ± (517.72 - 519.52) ms-8.6%
runtime.dotnet.exceptions.count3 ± (3 - 3)3 ± (3 - 3)+0.0%
runtime.dotnet.mem.committed59.08 ± (59.02 - 59.13) MB58.40 ± (58.21 - 58.60) MB-1.1%
runtime.dotnet.threads.count30 ± (30 - 30)30 ± (30 - 30)-0.2%
.NET 6 - Baseline
process.internal_duration_ms213.13 ± (212.66 - 213.61) ms188.73 ± (188.47 - 188.99) ms-11.4%
process.time_to_main_ms78.73 ± (78.47 - 78.99) ms68.92 ± (68.81 - 69.03) ms-12.5%
runtime.dotnet.exceptions.count4 ± (4 - 4)4 ± (4 - 4)+0.0%
runtime.dotnet.mem.committed16.23 ± (16.22 - 16.25) MB15.76 ± (15.58 - 15.94) MB-2.9%
runtime.dotnet.threads.count20 ± (19 - 20)18 ± (17 - 18)-10.2%
.NET 6 - Bailout
process.internal_duration_ms213.11 ± (212.53 - 213.69) ms187.20 ± (187.02 - 187.37) ms-12.2%
process.time_to_main_ms80.34 ± (80.12 - 80.57) ms69.83 ± (69.78 - 69.89) ms-13.1%
runtime.dotnet.exceptions.count4 ± (4 - 4)4 ± (4 - 4)+0.0%
runtime.dotnet.mem.committed16.23 ± (16.21 - 16.26) MB15.53 ± (15.36 - 15.71) MB-4.3%
runtime.dotnet.threads.count21 ± (20 - 21)18 ± (18 - 19)-10.6%
.NET 6 - CallTarget+Inlining+NGEN
process.internal_duration_ms597.39 ± (595.21 - 599.57) ms597.32 ± (594.74 - 599.91) ms-0.0%
process.time_to_main_ms571.09 ± (569.78 - 572.40) ms517.87 ± (517.16 - 518.59) ms-9.3%
runtime.dotnet.exceptions.count4 ± (4 - 4)4 ± (4 - 4)+0.0%
runtime.dotnet.mem.committed61.71 ± (61.60 - 61.82) MB61.97 ± (61.88 - 62.07) MB+0.4%✅⬆️
runtime.dotnet.threads.count31 ± (31 - 31)30 ± (30 - 30)-1.8%
.NET 8 - Baseline
process.internal_duration_ms211.64 ± (211.21 - 212.07) ms185.89 ± (185.65 - 186.13) ms-12.2%
process.time_to_main_ms78.83 ± (78.53 - 79.13) ms68.12 ± (68.01 - 68.23) ms-13.6%
runtime.dotnet.exceptions.count4 ± (4 - 4)4 ± (4 - 4)+0.0%
runtime.dotnet.mem.committed11.54 ± (11.52 - 11.56) MB11.60 ± (11.49 - 11.70) MB+0.5%✅⬆️
runtime.dotnet.threads.count19 ± (19 - 19)17 ± (17 - 18)-8.4%
.NET 8 - Bailout
process.internal_duration_ms211.89 ± (211.40 - 212.37) ms184.72 ± (184.56 - 184.87) ms-12.8%
process.time_to_main_ms79.80 ± (79.56 - 80.04) ms69.08 ± (69.02 - 69.13) ms-13.4%
runtime.dotnet.exceptions.count4 ± (4 - 4)4 ± (4 - 4)+0.0%
runtime.dotnet.mem.committed11.62 ± (11.60 - 11.64) MB11.58 ± (11.46 - 11.70) MB-0.3%
runtime.dotnet.threads.count20 ± (20 - 20)18 ± (18 - 19)-8.4%
.NET 8 - CallTarget+Inlining+NGEN
process.internal_duration_ms570.41 ± (562.32 - 578.51) ms524.59 ± (521.80 - 527.39) ms-8.0%
process.time_to_main_ms524.72 ± (523.63 - 525.82) ms474.36 ± (473.68 - 475.04) ms-9.6%
runtime.dotnet.exceptions.count4 ± (4 - 4)4 ± (4 - 4)+0.0%
runtime.dotnet.mem.committed51.39 ± (51.29 - 51.48) MB51.02 ± (50.99 - 51.05) MB-0.7%
runtime.dotnet.threads.count30 ± (30 - 30)30 ± (30 - 30)-0.4%
Comparison explanation

Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:

  • Welch test with statistical test for significance of 5%
  • Only results indicating a difference greater than 5% and 5 ms are considered.

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

Duration charts
FakeDbCommand (.NET Framework 4.8)
gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8352) - mean (72ms)  : 70, 74
    master - mean (72ms)  : 70, 75

    section Bailout
    This PR (8352) - mean (76ms)  : 74, 78
    master - mean (76ms)  : 74, 77

    section CallTarget+Inlining+NGEN
    This PR (8352) - mean (1,081ms)  : 1033, 1128
    master - mean (1,072ms)  : 1031, 1114

Loading
FakeDbCommand (.NET Core 3.1)
gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8352) - mean (113ms)  : 109, 116
    master - mean (115ms)  : 103, 127

    section Bailout
    This PR (8352) - mean (113ms)  : 110, 116
    master - mean (114ms)  : 110, 118

    section CallTarget+Inlining+NGEN
    This PR (8352) - mean (796ms)  : 776, 816
    master - mean (792ms)  : 774, 811

Loading
FakeDbCommand (.NET 6)
gantt
    title Execution time (ms) FakeDbCommand (.NET 6)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8352) - mean (98ms)  : 96, 101
    master - mean (98ms)  : 95, 101

    section Bailout
    This PR (8352) - mean (100ms)  : 98, 102
    master - mean (100ms)  : 98, 103

    section CallTarget+Inlining+NGEN
    This PR (8352) - mean (952ms)  : 906, 999
    master - mean (948ms)  : 916, 979

Loading
FakeDbCommand (.NET 8)
gantt
    title Execution time (ms) FakeDbCommand (.NET 8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8352) - mean (99ms)  : 95, 103
    master - mean (99ms)  : 96, 102

    section Bailout
    This PR (8352) - mean (100ms)  : 98, 103
    master - mean (100ms)  : 98, 102

    section CallTarget+Inlining+NGEN
    This PR (8352) - mean (834ms)  : 800, 867
    master - mean (832ms)  : 788, 875

Loading
HttpMessageHandler (.NET Framework 4.8)
gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8352) - mean (189ms)  : 185, 193
    master - mean (212ms)  : 207, 218

    section Bailout
    This PR (8352) - mean (192ms)  : 190, 194
    master - mean (217ms)  : 211, 222

    section CallTarget+Inlining+NGEN
    This PR (8352) - mean (1,156ms)  : 1093, 1219
    master - mean (1,250ms)  : 1193, 1306

Loading
HttpMessageHandler (.NET Core 3.1)
gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8352) - mean (272ms)  : 266, 277
    master - mean (309ms)  : 299, 318

    section Bailout
    This PR (8352) - mean (271ms)  : 269, 274
    master - mean (312ms)  : 302, 321

    section CallTarget+Inlining+NGEN
    This PR (8352) - mean (940ms)  : 913, 968
    master - mean (1,027ms)  : 999, 1056

Loading
HttpMessageHandler (.NET 6)
gantt
    title Execution time (ms) HttpMessageHandler (.NET 6)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8352) - mean (265ms)  : 262, 269
    master - mean (301ms)  : 293, 310

    section Bailout
    This PR (8352) - mean (265ms)  : 262, 267
    master - mean (303ms)  : 293, 312

    section CallTarget+Inlining+NGEN
    This PR (8352) - mean (1,145ms)  : 1111, 1180
    master - mean (1,201ms)  : 1165, 1238

Loading
HttpMessageHandler (.NET 8)
gantt
    title Execution time (ms) HttpMessageHandler (.NET 8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8352) - mean (263ms)  : 260, 267
    master - mean (301ms)  : 294, 308

    section Bailout
    This PR (8352) - mean (263ms)  : 261, 265
    master - mean (302ms)  : 296, 308

    section CallTarget+Inlining+NGEN
    This PR (8352) - mean (1,031ms)  : 986, 1075
    master - mean (1,131ms)  : 1010, 1253

Loading

@khanayan123 khanayan123 force-pushed the ayan.khan/stable-session-id-headers branch from c70ab79 to 14696e1 Compare March 24, 2026 14:55
Implements the Stable Service Instance Identifier RFC for .NET telemetry.

- DD-Session-ID (= runtime_id) added to every telemetry request
- DD-Root-Session-ID added only when the process inherited _DD_ROOT_DOTNET_SESSION_ID from a parent (child process scenario)
- Root session ID auto-propagates to child processes via Environment.SetEnvironmentVariable()
- _DD_ROOT_DOTNET_SESSION_ID registered in supported-configurations.yaml as ConfigurationKeys.Telemetry.RootSessionId

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@khanayan123 khanayan123 force-pushed the ayan.khan/stable-session-id-headers branch from 1334cb0 to dc79c61 Compare March 25, 2026 15:20
khanayan123 and others added 2 commits March 25, 2026 12:39
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@khanayan123 khanayan123 marked this pull request as ready for review March 26, 2026 18:34
@khanayan123 khanayan123 requested review from a team as code owners March 26, 2026 18:34
@lucaspimentel
Copy link
Copy Markdown
Member

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4351aa034f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Eagerly set _DD_ROOT_DOTNET_SESSION_ID in TracerManager constructor
so child processes spawned before the first telemetry flush inherit it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@khanayan123
Copy link
Copy Markdown
Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Can't wait for the next one!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

documentation: |-
Configuration key for whether telemetry metrics should be sent.
<see cref="Datadog.Trace.Telemetry.TelemetrySettings.MetricsEnabled"/>
_DD_ROOT_DOTNET_SESSION_ID:
Copy link
Copy Markdown
Member

@lucaspimentel lucaspimentel Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the leading underscore? _DD... instead of DD..? Is that something we're doing now for internal env vars? The RFC says DD_ROOT_<LIB>_SESSION_ID.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this seems non standard

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefixing environment variables with _DD is convention we use in dd-trace-py to mark a configuration as internal/private. All implementations currently follow this spec, the implementation plan was out of date but I just pushed a fix.

Ideally we'd use _DD to be consistent across SDKs but this isn't a big deal for us. We would like to do what's best for the .NET library and ideally this PR would introduce minimal changes to library internals. From a backend perspective as long as the expected headers sent by the telemetry client we should be good.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's the standard and it follows the spec and the other implementations, then yeah, go for it. I only asked because I had not seen it before and I didn't see in the RFC. Thanks!

(I wonder though it we have places where we look for DD_* env vars that would need to be updated. Probably not production code, but maybe tests or scripts that list all DD env vars for troubleshooting.)

Copy link
Copy Markdown
Author

@khanayan123 khanayan123 Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _DD prefix was established by the Python implementation (dd-trace-py#16839).

The implementation proposal/parent RFC only mandates the HTTP header names (DD-Session-ID, DD-Root-Session-ID) the env var naming (DD_ROOT_LIB_SESSION_ID) is mentioned only in the "Process Spawning: Fork & Exec Support" appendix as an implementation detail.

The underscore prefix signals this is internal: we don't want to expose it as a public API, and keeping it private gives us flexibility to replace the propagation mechanism in a minor version without a breaking change.

That said, this is purely an internal convention if the DD_ prefix is preferred for the DOTNET implementation, I can update it. It's not customer-facing either way.

Copy link
Copy Markdown
Author

@khanayan123 khanayan123 Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per discussion on slack and after updating the implementation proposal we will keep _DD_ROOT_LIB_SESSION_ID as the naming convention across SDKs for this env var

private static string _rootSessionId;

public static string Get() => LazyInitializer.EnsureInitialized(ref _runtimeId, () => GetImpl());
public static string Get() => LazyInitializer.EnsureInitialized(ref _runtimeId, () => GetRuntimeIdImpl());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming of public Get() and private GetImpl() was intentional (one method is the private implementation of the other).

To keep that naming convention, we can either rename the private GetRuntimeIdImpl() back to GetImpl(), or rename the public method to GetRuntimeId()`, which I think is better overall to distinguish the two:

GetRuntimeId()     -> GetRuntimeIdImpl()
GetRootSessionId() -> GetRootSessionIdImpl()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RuntimeId.GetRuntimeId() 😅 arguably we need a different name for RuntimeId now but whatever 😅

Copy link
Copy Markdown
Member

@lucaspimentel lucaspimentel Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RuntimeId.GetRuntimeId()

Whoa. I was looking only at the method names and didn't notice this. We could also go with:

Get()              -> GetImpl()
GetRootSessionId() -> GetRootSessionIdImpl()

No strong feelings either way, as long as we keep the internal consistency: Method()->MethodImpl().

Copy link
Copy Markdown
Author

@khanayan123 khanayan123 Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went with approach 2 to minimize lines changes:

Get() -> GetImpl()
GetRootSessionId() -> GetRootSessionIdImpl()

Environment.GetEnvironmentVariable(ConfigurationKeys.Telemetry.RootSessionId)
.Should().Be(rootSessionId);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file contains tests for the Tracer class. Since these two tests are for the RuntimeId class, can you move this into a tracer/test/Datadog.Trace.Tests/Util/RuntimeIdTests.cs file? In Datadog.Trace.Tests we generally try to match the file/namespace layout of Datadog.Trace.

tracer/src/Datadog.Trace/Util/RuntimeId.cs

tracer/test/Datadog.Trace.Tests/Util/RuntimeIdTests.cs

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to tracer/test/Datadog.Trace.Tests/Util/RuntimeIdTests.cs

var rootSessionId = Datadog.Trace.Util.RuntimeId.GetRootSessionId();
Environment.GetEnvironmentVariable(ConfigurationKeys.Telemetry.RootSessionId)
.Should().Be(rootSessionId);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a test that sets env var _DD_ROOT_DOTNET_SESSION_ID and asserts that RuntimeId.GetRootSessionId()returns the inherited value (not the current runtime_id)?

I'm not sure how we would handle resetting the private static fields in RuntimeId, though. Might need some refactoring to make it more testable.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added RuntimeId.ResetForTests() (follows SmallCacheOrNoCache.ResetForTests() pattern) and a single test that covers both paths: first asserts GetRootSessionId() defaults to runtime_id when no env var is set, then resets, pre-sets the env var, and asserts the inherited value is returned instead of the current runtime_id

documentation: |-
Configuration key for whether telemetry metrics should be sent.
<see cref="Datadog.Trace.Telemetry.TelemetrySettings.MetricsEnabled"/>
_DD_ROOT_DOTNET_SESSION_ID:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this seems non standard

Comment on lines +60 to +66
var sessionId = RuntimeId.Get();
request.AddHeader(TelemetryConstants.SessionIdHeader, sessionId);
var rootSessionId = RuntimeId.GetRootSessionId();
if (rootSessionId != sessionId)
{
request.AddHeader(TelemetryConstants.RootSessionIdHeader, rootSessionId);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that these headers are constants for the lifetime of the process, it would be preferable to add them to the TelemetryAgentHttpHeaderHelper and TelemetryAgentHttpHeaderHelper types. The other headers are added here because they depend on the payload. (Arguably container Metadata should be added elsewhere too, but that's a pre-existing issue 😅 )

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved session headers to TelemetryHttpHeaderNames

private static string _rootSessionId;

public static string Get() => LazyInitializer.EnsureInitialized(ref _runtimeId, () => GetImpl());
public static string Get() => LazyInitializer.EnsureInitialized(ref _runtimeId, () => GetRuntimeIdImpl());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RuntimeId.GetRuntimeId() 😅 arguably we need a different name for RuntimeId now but whatever 😅

}

var rootId = Get();
EnvironmentHelpers.SetEnvironmentVariable(ConfigurationKeys.Telemetry.RootSessionId, rootId);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clarity, this simply won't work in many cases, such that this whole approach is potentially flawed...

  • On Unix, environment changes are not seen by native processes. There's no way to remediate this.
  • Users may well call Process.Start() with a custom env var block, in which case this won't be inherited.
  • If UseShellExecute=true is set when starting a new process, this won't work.

The "good" news is that starting new dotnet processes is not a standard pattern in general, so hopefully it just doesn't really matter...

Copy link
Copy Markdown
Contributor

@anna-git anna-git Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as we already instrument ProcessStart , maybe the RootSessionId could be passed down via the instrumentation setting some additional args or something 🤔

Copy link
Copy Markdown
Member

@lucaspimentel lucaspimentel Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

starting new dotnet processes is not a standard pattern

Yet we have a whole instrumentation for it. 😉

tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/Process

(edit: I had not refreshed to see Anna's comment before I posted mine 😅 )

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lucaspimentel @anna-git @andrewlock Good point about the Process instrumentation.

Would you prefer we inject _DD_ROOT_DOTNET_SESSION_ID directly into the child process's ProcessStartInfo.Environment via the existing ClrProfiler/AutoInstrumentation/Process hooks, rather than relying on Environment.SetEnvironmentVariable()? And would that address the Unix limitations Andrew mentioned and handle cases where the user provides a custom env block or sets UseShellExecute=true?

Copy link
Copy Markdown
Author

@khanayan123 khanayan123 Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think since dotnet already instruments process start it might make sense to utilize that instead of env vars, we implemented it in a similar way for Node

Copy link
Copy Markdown
Author

@khanayan123 khanayan123 Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved per @andrewlock's recommendation, we'll keep SetEnvironmentVariable() and accept it won't give 100% coverage. Process instrumentation is too risky for this. Spawning child .NET processes is rare enough that this is acceptable.

Comment on lines +116 to +120

// Eagerly initialize the root session ID so child processes inherit it
// even if spawned before the first telemetry flush.
_ = RuntimeId.GetRootSessionId();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than putting this here, I wonder if you actually need to call this in Instrumentation.Initialize, and make sure this flows through to the native profiler too. The reason being that they might send telemetry independently (I believe libdatadog does for example), and in that case you need to make sure they also apply the RootSessionId 🤔.

If you do modify that interface and pass it through to the profiler side, then you can remove that here too.

Copy link
Copy Markdown
Author

@khanayan123 khanayan123 Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewlock Yes I believe that libdatadog does send telemetry independently, and those requests currently won't include the session headers.

@mabdinur is working on adding session ID support to libdatadog's FFI and once that lands we'll have a follow-up PR to pass it through from the managed side

I've also moved the GetRootSessionId() call earlier to Instrumentation.InitializeNoNativeParts()

- implementation: A
type: string
default: null
product: Telemetry
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be marked as a "platform key" so that it can be read directly from the environment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, anything starting with _DD_ has to be controlled by ConfigRegistry, hence the constraint in the analyzer

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any resolution on this @anna-git @andrewlock? Since this is purely an internal implementation detail (the RFC only mandates the HTTP header names, not the env var), I'm happy to rename it to whatever works best with DOTNET standards whether that's dropping the _DD prefix to allow it to be a platform key, or keeping the current approach

Copy link
Copy Markdown
Author

@khanayan123 khanayan123 Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved per discussion with @anna-git , this should stay as a config key in ConfigRegistry (not a platform key). Platform keys are for non-Datadog config keys. Since this is ours (_DD_ prefix), ConfigRegistry is the right place, which is how it's currently implemented.

- Rename private GetRuntimeIdImpl() back to GetImpl() to preserve naming convention
- Move session ID headers from per-request in JsonTelemetryTransport to
  TelemetryHttpHeaderNames default headers (constant for process lifetime)
- Move tests from TracerTests.cs to Util/RuntimeIdTests.cs to match source layout
- Move eager root session ID init from TracerManager to Instrumentation.InitializeNoNativeParts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@khanayan123 khanayan123 requested review from a team as code owners March 27, 2026 15:56
Comment on lines +25 to +35
var headers = new List<KeyValuePair<string, string>>
{
new(TelemetryConstants.ClientLibraryLanguageHeader, TracerConstants.Language),
new(TelemetryConstants.ClientLibraryVersionHeader, TracerConstants.AssemblyVersion),
new(HttpHeaderNames.TracingEnabled, "false") // don't add automatic instrumentation to requests directed to the agent
];
new(HttpHeaderNames.TracingEnabled, "false"), // don't add automatic instrumentation to requests directed to the agent
new(TelemetryConstants.SessionIdHeader, RuntimeId.Get()),
};

AddRootSessionIdHeader(headers);

return headers.ToArray();
Copy link
Copy Markdown
Member

@lucaspimentel lucaspimentel Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to use List<T> and we know the number of headers it will contain, we should initialize it with the required size to minimize the heap allocations of growing the list.

We should also look into removing ToArray(), which allocates another array. Maybe we can change the return type from KeyValuePair<string, string>[] to List<KeyValuePair<string, string>>.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to pre-sized arrays with direct index assignment

$"{TelemetryConstants.ClientLibraryLanguageHeader}: {TracerConstants.Language}" + DatadogHttpValues.CrLf +
$"{TelemetryConstants.ClientLibraryVersionHeader}: {TracerConstants.AssemblyVersion}" + DatadogHttpValues.CrLf +
$"{HttpHeaderNames.TracingEnabled}: false" + DatadogHttpValues.CrLf;
internal static string HttpSerializedDefaultAgentHeaders { get; } = BuildSerializedAgentHeaders();
Copy link
Copy Markdown
Member

@lucaspimentel lucaspimentel Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This used to be a compile-time constant but now it's doing more work in a static initializer, including calls to RuntimeId.Get() and RuntimeId.GetRootSessionId(). Consider using a lazy initializer instead.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to LazyInitializer.EnsureInitialized, defers the RuntimeId calls until first access instead of running in the static initializer

internal static KeyValuePair<string, string>[] GetDefaultIntakeHeaders(TelemetrySettings.AgentlessSettings settings)
{
var headerCount = settings.Cloud is null ? 4 : 7;
var headers = new List<KeyValuePair<string, string>>
Copy link
Copy Markdown
Member

@lucaspimentel lucaspimentel Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above. If we're doing to use List<T>, try to initialize to the required size and try to avoid the ToArray().

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to pre-sized arrays with direct index assignment

headers[4] = new(TelemetryConstants.CloudProviderHeader, cloud.Provider);
headers[5] = new(TelemetryConstants.CloudResourceTypeHeader, cloud.ResourceType);
headers[6] = new(TelemetryConstants.CloudResourceIdentifierHeader, cloud.ResourceIdentifier);
serialized += $"{TelemetryConstants.RootSessionIdHeader}: {rootSessionId}" + DatadogHttpValues.CrLf;
Copy link
Copy Markdown
Member

@lucaspimentel lucaspimentel Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't concatenate strings like this. Consider using a cached StringBuilder or a ValueStringBuilder to reduce heap allocations.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to StringBuilderCache.Acquire() / GetStringAndRelease()

$"{HttpHeaderNames.TracingEnabled}: false" + DatadogHttpValues.CrLf +
$"{TelemetryConstants.SessionIdHeader}: {RuntimeId.Get()}" + DatadogHttpValues.CrLf;

var sessionId = RuntimeId.Get();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's another call to RuntimeId.Get() two lines above that we may want to consolidate.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@khanayan123 khanayan123 marked this pull request as draft March 30, 2026 14:14
khanayan123 and others added 2 commits March 30, 2026 11:44
- Use pre-sized arrays instead of List<T> + ToArray()
- Lazy-initialize HttpSerializedDefaultAgentHeaders
- Use StringBuilderCache instead of string concatenation
- Consolidate duplicate RuntimeId.Get() calls

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add RuntimeId.ResetForTests() to allow resetting cached state
- Add RootSessionId_InheritsFromEnvVar test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@khanayan123 khanayan123 force-pushed the ayan.khan/stable-session-id-headers branch from 199c641 to 3406b39 Compare March 31, 2026 01:30
@khanayan123 khanayan123 marked this pull request as ready for review March 31, 2026 01:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants