
Harden Linux Docker retries in CI #8358

Open
NachoEchevarria wants to merge 43 commits into master from dd/ci/linux-docker-cgroup-retries


@NachoEchevarria
Collaborator

@NachoEchevarria NachoEchevarria commented Mar 24, 2026

Summary

Adds a Linux Docker readiness check to CI, mirroring the existing Windows approach (PR #8245). It targets transient cgroup/systemd/D-Bus failures, the largest category of CI flakes (10 commits blocked).

Reason for change

retryCountOnTaskFailure retries on the same agent, so if Docker is broken on that VM, all retries fail. This adds active recovery (service restart) before Docker commands run.

For instance:

2026-03-25T15:05:08.0833114Z docker: Error response from daemon: failed to create task for container: Unavailable: error reading from server: EOF
2026-03-25T15:05:08.0833917Z 
2026-03-25T15:05:08.0834644Z Run 'docker run --help' for more information
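A minimal POSIX sh sketch of the "active recovery before Docker commands run" idea: probe the daemon, run a recovery command between probes, and return non-zero if it never becomes ready so the step fails instead of carrying on. The function name `ensure_ready`, its parameters, and the attempt/delay values are hypothetical; this is not the contents of ensure-docker-ready-linux.sh, which this page only shows in fragments.

```shell
# Hypothetical helper: probe a service, attempt recovery between probes,
# and return non-zero if it never becomes ready.
# Usage: ensure_ready PROBE_CMD RECOVER_CMD MAX_ATTEMPTS DELAY_SECONDS
ensure_ready() {
    probe=$1
    recover=$2
    max_attempts=$3
    delay=$4
    attempt=1
    while :; do
        # Evaluated as an `if` condition, so a failing probe never trips `set -e`.
        if $probe >/dev/null 2>&1; then
            echo "ready after $attempt attempt(s)"
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "still not ready after $max_attempts attempt(s)" >&2
            return 1
        fi
        echo "not ready (attempt $attempt/$max_attempts); running recovery" >&2
        # A failed recovery is logged but does not stop the loop; the next probe decides.
        $recover >/dev/null 2>&1 || echo "recovery command failed" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}
```

On an agent this might be called as `ensure_ready "docker info" "sudo systemctl restart docker" 3 10`, assuming the agent user may run `sudo systemctl`. Passing commands as strings relies on word splitting, so it only suits simple commands without quoted arguments.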

Changes

  • New ensure-docker-ready-linux.sh: waits for the Docker daemon, attempts systemctl restart docker if the service is down (mirroring Windows Restart-Service), and emits diagnostics on failure
  • ensure-docker-ready.yml: Linux readiness step added alongside the existing Windows step
  • run-in-docker.yml: embedded an ensure-docker-ready.yml call, which automatically covers all ~27 Linux jobs that use this template
  • ultimate-pipeline.yml: added the readiness check to the integration_tests_windows_msi job (the only uncovered Windows docker-compose caller)
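The ensure-docker-ready-linux.sh bullet says the script emits diagnostics on failure but does not list them. One plausible sketch, assuming the useful signals are `systemctl status docker`, recent `journalctl -u docker` output, and `docker info`; the helper names `run_diag` and `dump_docker_diagnostics` are hypothetical. Each command is guarded so that a missing tool or a failing diagnostic never hides the original error.

```shell
# Hypothetical diagnostics runner: print a header, then run the command if its
# binary exists. Failures are reported but never abort the dump.
run_diag() {
    echo "=== $* ==="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(exited with status $?)"
    else
        echo "($1 not available on this agent)"
    fi
}

# Hypothetical set of Docker-related diagnostics to collect before failing the step.
dump_docker_diagnostics() {
    run_diag systemctl status docker --no-pager
    run_diag journalctl -u docker --no-pager -n 50
    run_diag docker info
}
```

Because `run_diag` always returns success, calling `dump_docker_diagnostics` right before `exit 1` keeps the step's failure status tied to the readiness check itself.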

@datadog-datadog-prod-us1-2
Contributor


Bits Dev status: ✅ Done

CI Auto-fix: Disabled | Enable

Comment @DataDog to request changes

@datadog-prod-us1-4

I can only run on private repositories.

@NachoEchevarria NachoEchevarria added the AI Generated Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos label Mar 24, 2026
@github-actions github-actions bot added the area:builds project files, build scripts, pipelines, versioning, releases, packages label Mar 25, 2026
@pr-commenter

pr-commenter bot commented Mar 25, 2026

Benchmarks

Benchmark execution time: 2026-03-27 12:57:20

Comparing candidate commit d98d1f7 in PR branch dd/ci/linux-docker-cgroup-retries with baseline commit 05f70bb in branch master.

Found 7 performance improvements and 10 performance regressions! Performance is the same for 244 metrics, 27 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'
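The significance rule above can be expressed as a small check. This is a sketch, not the benchmarking platform's code: `classify_ci` is a hypothetical helper that takes the lower and upper CI bounds and the threshold, all in percent, and uses `awk` for the floating-point comparisons that POSIX sh lacks.

```shell
# Hypothetical classifier: a relative-difference CI [lower, upper] (percent)
# is significant only if it lies entirely outside [-threshold, +threshold].
# Usage: classify_ci LOWER UPPER THRESHOLD
classify_ci() {
    awk -v lo="$1" -v hi="$2" -v t="$3" 'BEGIN {
        if (lo > t)       print "significantly higher"   # whole CI above +threshold
        else if (hi < -t) print "significantly lower"    # whole CI below -threshold
        else              print "not significant"        # CI overlaps the threshold band
    }'
}
```

With the second diagram's values, `classify_ci 1.3 3.1 1` prints `significantly higher` (worse, for an execution-time metric), while the first diagram's bounds, `classify_ci -0.6 1.2 1`, print `not significant`. The 1% threshold is only the illustrative value from the diagram; the configured SIGNIFICANT_IMPACT_THRESHOLD is not stated in this report.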

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleMoreComplexBody net6.0

  • 🟥 execution_time [+9.876ms; +15.676ms] or [+5.000%; +7.936%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleSimpleBody net6.0

  • 🟩 execution_time [-30.845ms; -24.461ms] or [-13.776%; -10.925%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorMoreComplexBody net6.0

  • 🟥 execution_time [+11.553ms; +15.432ms] or [+5.850%; +7.814%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorSimpleBody net6.0

  • 🟥 execution_time [+18.966ms; +23.302ms] or [+9.501%; +11.674%]

scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeLegacyArgs net6.0

  • 🟩 execution_time [-26.655ms; -26.058ms] or [-13.252%; -12.955%]

scenario:Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafRealisticBenchmarkWithAttack netcoreapp3.1

  • 🟩 execution_time [-47.774µs; -20.624µs] or [-12.900%; -5.569%]
  • 🟩 throughput [+141.604op/s; +325.082op/s] or [+5.154%; +11.833%]

scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSlice net472

  • 🟩 execution_time [-127.875µs; -118.725µs] or [-6.255%; -5.808%]
  • 🟩 throughput [+30.282op/s; +32.503op/s] or [+6.190%; +6.644%]

scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSliceWithPool net6.0

  • 🟥 execution_time [+74.669µs; +79.531µs] or [+7.372%; +7.852%]
  • 🟥 throughput [-71.961op/s; -67.706op/s] or [-7.288%; -6.858%]

scenario:Benchmarks.Trace.CharSliceBenchmark.OriginalCharSlice net6.0

  • 🟥 execution_time [+120.406µs; +128.527µs] or [+6.190%; +6.608%]
  • 🟥 throughput [-31.897op/s; -29.935op/s] or [-6.204%; -5.823%]

scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark net6.0

  • 🟩 allocated_mem [-20.033KB; -19.994KB] or [-7.298%; -7.284%]

scenario:Benchmarks.Trace.SpanBenchmark.StartFinishSpan netcoreapp3.1

  • 🟥 execution_time [+13.892ms; +19.264ms] or [+6.996%; +9.702%]

scenario:Benchmarks.Trace.SpanBenchmark.StartFinishTwoScopes net6.0

  • 🟥 execution_time [+10.307ms; +13.845ms] or [+5.217%; +7.008%]

scenario:Benchmarks.Trace.SpanBenchmark.StartFinishTwoScopes netcoreapp3.1

  • 🟥 execution_time [+14.574ms; +18.775ms] or [+7.362%; +9.485%]

@dd-trace-dotnet-ci-bot

dd-trace-dotnet-ci-bot bot commented Mar 25, 2026

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing This PR (8358) and master.

⚠️ Potential regressions detected

HttpMessageHandler

Metric | Master (Mean ± 95% CI) | Current (Mean ± 95% CI) | Change | Status
.NET Framework 4.8 - Baseline
duration | 193.46 ± (193.25 - 194.07) ms | 212.89 ± (212.90 - 214.32) ms | +10.0% | ❌⬆️
.NET Framework 4.8 - Bailout
duration | 196.16 ± (196.05 - 196.84) ms | 219.00 ± (218.82 - 220.74) ms | +11.6% | ❌⬆️
.NET Framework 4.8 - CallTarget+Inlining+NGEN
duration | 1150.65 ± (1151.26 - 1157.73) ms | 1241.20 ± (1238.73 - 1246.80) ms | +7.9% | ❌⬆️
Full Metrics Comparison

FakeDbCommand

Metric | Master (Mean ± 95% CI) | Current (Mean ± 95% CI) | Change | Status
.NET Framework 4.8 - Baseline
duration | 72.15 ± (72.15 - 72.46) ms | 72.24 ± (72.26 - 72.57) ms | +0.1% | ✅⬆️
.NET Framework 4.8 - Bailout
duration | 76.39 ± (76.17 - 76.50) ms | 76.13 ± (76.10 - 76.45) ms | -0.3% |
.NET Framework 4.8 - CallTarget+Inlining+NGEN
duration | 1073.18 ± (1072.51 - 1078.17) ms | 1074.21 ± (1074.99 - 1080.45) ms | +0.1% | ✅⬆️
.NET Core 3.1 - Baseline
process.internal_duration_ms | 22.40 ± (22.35 - 22.45) ms | 22.38 ± (22.33 - 22.43) ms | -0.1% |
process.time_to_main_ms | 83.68 ± (83.44 - 83.92) ms | 83.57 ± (83.40 - 83.75) ms | -0.1% |
runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% |
runtime.dotnet.mem.committed | 10.91 ± (10.91 - 10.91) MB | 10.90 ± (10.90 - 10.91) MB | -0.1% |
runtime.dotnet.threads.count | 12 ± (12 - 12) | 12 ± (12 - 12) | +0.0% |
.NET Core 3.1 - Bailout
process.internal_duration_ms | 22.25 ± (22.21 - 22.29) ms | 22.33 ± (22.29 - 22.37) ms | +0.4% | ✅⬆️
process.time_to_main_ms | 84.98 ± (84.78 - 85.18) ms | 84.81 ± (84.61 - 85.01) ms | -0.2% |
runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% |
runtime.dotnet.mem.committed | 10.94 ± (10.94 - 10.95) MB | 10.94 ± (10.94 - 10.95) MB | +0.0% | ✅⬆️
runtime.dotnet.threads.count | 13 ± (13 - 13) | 13 ± (13 - 13) | +0.0% |
.NET Core 3.1 - CallTarget+Inlining+NGEN
process.internal_duration_ms | 223.85 ± (222.81 - 224.88) ms | 225.58 ± (224.20 - 226.96) ms | +0.8% | ✅⬆️
process.time_to_main_ms | 533.57 ± (532.31 - 534.83) ms | 532.67 ± (531.24 - 534.11) ms | -0.2% |
runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% |
runtime.dotnet.mem.committed | 48.32 ± (48.29 - 48.35) MB | 48.27 ± (48.24 - 48.30) MB | -0.1% |
runtime.dotnet.threads.count | 28 ± (28 - 28) | 28 ± (28 - 28) | -0.1% |
.NET 6 - Baseline
process.internal_duration_ms | 21.00 ± (20.97 - 21.02) ms | 21.03 ± (20.99 - 21.06) ms | +0.1% | ✅⬆️
process.time_to_main_ms | 72.14 ± (71.99 - 72.29) ms | 72.52 ± (72.36 - 72.69) ms | +0.5% | ✅⬆️
runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% |
runtime.dotnet.mem.committed | 10.63 ± (10.63 - 10.63) MB | 10.63 ± (10.63 - 10.64) MB | +0.1% | ✅⬆️
runtime.dotnet.threads.count | 10 ± (10 - 10) | 10 ± (10 - 10) | +0.0% |
.NET 6 - Bailout
process.internal_duration_ms | 21.02 ± (20.99 - 21.06) ms | 20.95 ± (20.91 - 20.99) ms | -0.3% |
process.time_to_main_ms | 73.19 ± (73.01 - 73.37) ms | 73.12 ± (72.95 - 73.30) ms | -0.1% |
runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% |
runtime.dotnet.mem.committed | 10.73 ± (10.73 - 10.73) MB | 10.74 ± (10.74 - 10.75) MB | +0.1% | ✅⬆️
runtime.dotnet.threads.count | 11 ± (11 - 11) | 11 ± (11 - 11) | +0.0% |
.NET 6 - CallTarget+Inlining+NGEN
process.internal_duration_ms | 216.01 ± (214.61 - 217.40) ms | 215.53 ± (214.12 - 216.94) ms | -0.2% |
process.time_to_main_ms | 530.99 ± (529.64 - 532.35) ms | 535.09 ± (533.67 - 536.51) ms | +0.8% | ✅⬆️
runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% |
runtime.dotnet.mem.committed | 50.12 ± (50.10 - 50.15) MB | 50.14 ± (50.11 - 50.17) MB | +0.0% | ✅⬆️
runtime.dotnet.threads.count | 29 ± (29 - 29) | 29 ± (29 - 29) | +0.5% | ✅⬆️
.NET 8 - Baseline
process.internal_duration_ms | 19.14 ± (19.09 - 19.18) ms | 19.31 ± (19.27 - 19.35) ms | +0.9% | ✅⬆️
process.time_to_main_ms | 71.36 ± (71.18 - 71.54) ms | 71.73 ± (71.59 - 71.87) ms | +0.5% | ✅⬆️
runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% |
runtime.dotnet.mem.committed | 7.69 ± (7.68 - 7.70) MB | 7.68 ± (7.67 - 7.68) MB | -0.2% |
runtime.dotnet.threads.count | 10 ± (10 - 10) | 10 ± (10 - 10) | +0.0% |
.NET 8 - Bailout
process.internal_duration_ms | 19.35 ± (19.31 - 19.38) ms | 19.43 ± (19.39 - 19.48) ms | +0.4% | ✅⬆️
process.time_to_main_ms | 73.21 ± (73.05 - 73.37) ms | 73.14 ± (72.97 - 73.31) ms | -0.1% |
runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% |
runtime.dotnet.mem.committed | 7.75 ± (7.74 - 7.76) MB | 7.73 ± (7.73 - 7.74) MB | -0.2% |
runtime.dotnet.threads.count | 11 ± (11 - 11) | 11 ± (11 - 11) | +0.0% |
.NET 8 - CallTarget+Inlining+NGEN
process.internal_duration_ms | 160.07 ± (159.25 - 160.89) ms | 160.60 ± (159.57 - 161.64) ms | +0.3% | ✅⬆️
process.time_to_main_ms | 490.08 ± (489.03 - 491.13) ms | 493.63 ± (492.45 - 494.80) ms | +0.7% | ✅⬆️
runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% |
runtime.dotnet.mem.committed | 36.93 ± (36.90 - 36.95) MB | 36.85 ± (36.83 - 36.88) MB | -0.2% |
runtime.dotnet.threads.count | 28 ± (28 - 28) | 28 ± (28 - 28) | -0.1% |

HttpMessageHandler

Metric | Master (Mean ± 95% CI) | Current (Mean ± 95% CI) | Change | Status
.NET Framework 4.8 - Baseline
duration | 193.46 ± (193.25 - 194.07) ms | 212.89 ± (212.90 - 214.32) ms | +10.0% | ❌⬆️
.NET Framework 4.8 - Bailout
duration | 196.16 ± (196.05 - 196.84) ms | 219.00 ± (218.82 - 220.74) ms | +11.6% | ❌⬆️
.NET Framework 4.8 - CallTarget+Inlining+NGEN
duration | 1150.65 ± (1151.26 - 1157.73) ms | 1241.20 ± (1238.73 - 1246.80) ms | +7.9% | ❌⬆️
.NET Core 3.1 - Baseline
process.internal_duration_ms | 187.25 ± (186.89 - 187.62) ms | 210.28 ± (209.49 - 211.07) ms | +12.3% | ✅⬆️
process.time_to_main_ms | 80.70 ± (80.47 - 80.93) ms | 89.53 ± (89.24 - 89.82) ms | +10.9% | ✅⬆️
runtime.dotnet.exceptions.count | 3 ± (3 - 3) | 3 ± (3 - 3) | +0.0% |
runtime.dotnet.mem.committed | 16.14 ± (16.11 - 16.17) MB | 15.93 ± (15.91 - 15.94) MB | -1.3% |
runtime.dotnet.threads.count | 20 ± (19 - 20) | 20 ± (20 - 20) | +2.0% | ✅⬆️
.NET Core 3.1 - Bailout
process.internal_duration_ms | 186.52 ± (186.21 - 186.82) ms | 209.95 ± (209.18 - 210.72) ms | +12.6% | ✅⬆️
process.time_to_main_ms | 82.09 ± (81.94 - 82.24) ms | 91.21 ± (90.88 - 91.54) ms | +11.1% | ✅⬆️
runtime.dotnet.exceptions.count | 3 ± (3 - 3) | 3 ± (3 - 3) | +0.0% |
runtime.dotnet.mem.committed | 16.15 ± (16.13 - 16.18) MB | 16.04 ± (16.02 - 16.06) MB | -0.7% |
runtime.dotnet.threads.count | 21 ± (20 - 21) | 21 ± (21 - 21) | +1.5% | ✅⬆️
.NET Core 3.1 - CallTarget+Inlining+NGEN
process.internal_duration_ms | 395.92 ± (394.50 - 397.34) ms | 421.00 ± (419.62 - 422.39) ms | +6.3% | ✅⬆️
process.time_to_main_ms | 522.87 ± (521.84 - 523.90) ms | 565.36 ± (564.11 - 566.60) ms | +8.1% | ✅⬆️
runtime.dotnet.exceptions.count | 3 ± (3 - 3) | 3 ± (3 - 3) | +0.0% |
runtime.dotnet.mem.committed | 58.82 ± (58.67 - 58.98) MB | 59.19 ± (59.15 - 59.23) MB | +0.6% | ✅⬆️
runtime.dotnet.threads.count | 30 ± (30 - 30) | 30 ± (30 - 30) | +0.5% | ✅⬆️
.NET 6 - Baseline
process.internal_duration_ms | 191.61 ± (191.24 - 191.97) ms | 219.31 ± (218.27 - 220.35) ms | +14.5% | ✅⬆️
process.time_to_main_ms | 69.75 ± (69.61 - 69.89) ms | 79.56 ± (79.26 - 79.87) ms | +14.1% | ✅⬆️
runtime.dotnet.exceptions.count | 4 ± (4 - 4) | 4 ± (4 - 4) | +0.0% |
runtime.dotnet.mem.committed | 16.12 ± (15.98 - 16.25) MB | 16.19 ± (16.17 - 16.20) MB | +0.4% | ✅⬆️
runtime.dotnet.threads.count | 19 ± (18 - 19) | 20 ± (20 - 20) | +5.1% | ✅⬆️
.NET 6 - Bailout
process.internal_duration_ms | 191.37 ± (190.95 - 191.78) ms | 223.50 ± (222.51 - 224.48) ms | +16.8% | ✅⬆️
process.time_to_main_ms | 71.00 ± (70.85 - 71.16) ms | 82.13 ± (81.73 - 82.54) ms | +15.7% | ✅⬆️
runtime.dotnet.exceptions.count | 4 ± (4 - 4) | 4 ± (4 - 4) | +0.0% |
runtime.dotnet.mem.committed | 16.07 ± (15.92 - 16.22) MB | 16.20 ± (16.18 - 16.21) MB | +0.8% | ✅⬆️
runtime.dotnet.threads.count | 20 ± (20 - 20) | 21 ± (20 - 21) | +4.0% | ✅⬆️
.NET 6 - CallTarget+Inlining+NGEN
process.internal_duration_ms | 413.66 ± (412.25 - 415.07) ms | 448.81 ± (447.09 - 450.52) ms | +8.5% | ✅⬆️
process.time_to_main_ms | 523.82 ± (522.76 - 524.87) ms | 568.00 ± (566.84 - 569.16) ms | +8.4% | ✅⬆️
runtime.dotnet.exceptions.count | 4 ± (4 - 4) | 4 ± (4 - 4) | +0.0% |
runtime.dotnet.mem.committed | 60.94 ± (60.90 - 60.99) MB | 60.73 ± (60.65 - 60.80) MB | -0.4% |
runtime.dotnet.threads.count | 31 ± (30 - 31) | 31 ± (31 - 31) | +0.9% | ✅⬆️
.NET 8 - Baseline
process.internal_duration_ms | 188.84 ± (188.46 - 189.21) ms | 215.23 ± (214.58 - 215.88) ms | +14.0% | ✅⬆️
process.time_to_main_ms | 69.45 ± (69.24 - 69.65) ms | 77.47 ± (77.26 - 77.69) ms | +11.6% | ✅⬆️
runtime.dotnet.exceptions.count | 4 ± (4 - 4) | 4 ± (4 - 4) | +0.0% |
runtime.dotnet.mem.committed | 11.79 ± (11.77 - 11.82) MB | 11.52 ± (11.51 - 11.54) MB | -2.3% |
runtime.dotnet.threads.count | 18 ± (18 - 18) | 19 ± (19 - 19) | +4.5% | ✅⬆️
.NET 8 - Bailout
process.internal_duration_ms | 188.15 ± (187.87 - 188.42) ms | 214.37 ± (213.38 - 215.37) ms | +13.9% | ✅⬆️
process.time_to_main_ms | 70.31 ± (70.19 - 70.44) ms | 78.74 ± (78.51 - 78.98) ms | +12.0% | ✅⬆️
runtime.dotnet.exceptions.count | 4 ± (4 - 4) | 4 ± (4 - 4) | +0.0% |
runtime.dotnet.mem.committed | 11.80 ± (11.73 - 11.86) MB | 11.58 ± (11.57 - 11.60) MB | -1.8% |
runtime.dotnet.threads.count | 19 ± (19 - 19) | 20 ± (20 - 20) | +5.0% | ✅⬆️
.NET 8 - CallTarget+Inlining+NGEN
process.internal_duration_ms | 340.99 ± (339.88 - 342.11) ms | 448.51 ± (441.98 - 455.05) ms | +31.5% | ✅⬆️
process.time_to_main_ms | 480.85 ± (479.91 - 481.79) ms | 523.41 ± (522.40 - 524.41) ms | +8.8% | ✅⬆️
runtime.dotnet.exceptions.count | 4 ± (4 - 4) | 4 ± (4 - 4) | +0.0% |
runtime.dotnet.mem.committed | 48.81 ± (48.77 - 48.85) MB | 50.49 ± (50.36 - 50.62) MB | +3.4% | ✅⬆️
runtime.dotnet.threads.count | 30 ± (30 - 30) | 30 ± (30 - 30) | +0.6% | ✅⬆️
Comparison explanation

Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure one-off costs. Cases where the execution-time results for the PR are worse than the latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:

  • Welch's t-test at a 5% significance level
  • Only results indicating a difference greater than 5% and 5 ms are considered.
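The magnitude part of these thresholds can be checked from the reported means alone; the Welch test cannot, since it needs the per-run samples, which this report does not include. A sketch with a hypothetical `exceeds_thresholds` helper that succeeds only when the absolute difference is above both 5 ms and 5% of the master mean:

```shell
# Hypothetical magnitude filter (the Welch test itself is not reproduced here).
# Usage: exceeds_thresholds MASTER_MEAN_MS CURRENT_MEAN_MS
exceeds_thresholds() {
    awk -v base="$1" -v cur="$2" 'BEGIN {
        diff = cur - base
        if (diff < 0) diff = -diff          # compare magnitudes in either direction
        pct = 100 * diff / base
        # Exit status 0 (success) only when both limits are exceeded.
        exit !(diff > 5 && pct > 5)
    }'
}
```

For example, HttpMessageHandler on .NET Framework 4.8 (Baseline) moves from 193.46 ms to 212.89 ms, a 19.43 ms (about 10.0%) increase, so `exceeds_thresholds 193.46 212.89` succeeds, while FakeDbCommand's 72.15 ms to 72.24 ms does not.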

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

Duration charts
FakeDbCommand (.NET Framework 4.8)
gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8358) - mean (72ms)  : 70, 75
    master - mean (72ms)  : 70, 75

    section Bailout
    This PR (8358) - mean (76ms)  : 75, 78
    master - mean (76ms)  : 75, 78

    section CallTarget+Inlining+NGEN
    This PR (8358) - mean (1,078ms)  : 1039, 1116
    master - mean (1,075ms)  : 1035, 1115

FakeDbCommand (.NET Core 3.1)
gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8358) - mean (113ms)  : 109, 116
    master - mean (113ms)  : 107, 119

    section Bailout
    This PR (8358) - mean (114ms)  : 111, 117
    master - mean (114ms)  : 111, 117

    section CallTarget+Inlining+NGEN
    This PR (8358) - mean (795ms)  : 776, 814
    master - mean (796ms)  : 773, 818

FakeDbCommand (.NET 6)
gantt
    title Execution time (ms) FakeDbCommand (.NET 6)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8358) - mean (100ms)  : 97, 103
    master - mean (99ms)  : 96, 103

    section Bailout
    This PR (8358) - mean (100ms)  : 98, 103
    master - mean (100ms)  : 97, 103

    section CallTarget+Inlining+NGEN
    This PR (8358) - mean (788ms)  : 769, 808
    master - mean (782ms)  : 758, 807

FakeDbCommand (.NET 8)
gantt
    title Execution time (ms) FakeDbCommand (.NET 8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8358) - mean (99ms)  : 95, 102
    master - mean (98ms)  : 95, 101

    section Bailout
    This PR (8358) - mean (100ms)  : 98, 102
    master - mean (100ms)  : 98, 103

    section CallTarget+Inlining+NGEN
    This PR (8358) - mean (694ms)  : 678, 709
    master - mean (689ms)  : 668, 710

HttpMessageHandler (.NET Framework 4.8)
gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8358) - mean (214ms)  : 203, 224
    master - mean (194ms)  : 189, 198

    section Bailout
    This PR (8358) - mean (220ms)  : crit, 205, 234
    master - mean (196ms)  : 193, 200

    section CallTarget+Inlining+NGEN
    This PR (8358) - mean (1,243ms)  : crit, 1184, 1301
    master - mean (1,154ms)  : 1108, 1201

HttpMessageHandler (.NET Core 3.1)
gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8358) - mean (310ms)  : 293, 328
    master - mean (276ms)  : 270, 283

    section Bailout
    This PR (8358) - mean (311ms)  : crit, 296, 326
    master - mean (277ms)  : 273, 281

    section CallTarget+Inlining+NGEN
    This PR (8358) - mean (1,025ms)  : crit, 992, 1058
    master - mean (950ms)  : 921, 979

HttpMessageHandler (.NET 6)
gantt
    title Execution time (ms) HttpMessageHandler (.NET 6)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8358) - mean (308ms)  : 290, 327
    master - mean (269ms)  : 264, 274

    section Bailout
    This PR (8358) - mean (314ms)  : crit, 293, 336
    master - mean (270ms)  : 265, 276

    section CallTarget+Inlining+NGEN
    This PR (8358) - mean (1,058ms)  : crit, 1020, 1097
    master - mean (968ms)  : 945, 991

HttpMessageHandler (.NET 8)
gantt
    title Execution time (ms) HttpMessageHandler (.NET 8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8358) - mean (303ms)  : 291, 315
    master - mean (268ms)  : 262, 274

    section Bailout
    This PR (8358) - mean (305ms)  : crit, 285, 325
    master - mean (268ms)  : 264, 272

    section CallTarget+Inlining+NGEN
    This PR (8358) - mean (1,009ms)  : crit, 911, 1107
    master - mean (852ms)  : 833, 871


@NachoEchevarria NachoEchevarria marked this pull request as ready for review March 27, 2026 09:30
@NachoEchevarria NachoEchevarria requested a review from a team as a code owner March 27, 2026 09:30
#!/usr/bin/env bash
# Linux Docker readiness check — mirrors the Windows PowerShell logic in ensure-docker-ready.yml.
# Waits for the Docker daemon, attempts service restarts if needed, and fails fast
# so the job can be rescheduled on a different agent.
Member

# so the job can be rescheduled on a different agent.

This doesn't happen automatically though, right? You have to manually retry? But it would never work anyway, so that's fine, just checking 🙂

# Waits for the Docker daemon, attempts service restarts if needed, and fails fast
# so the job can be rescheduled on a different agent.

set -u
Member

Should probably exit on failing commands too? Or does that break things? Meh, maybe best to leave it 😅

Suggested change
set -u
set -eu
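On the `set -eu` question, one relevant detail of POSIX shell semantics (not checked against the full script): `set -e` does not fire for a command used as an `if`/`while` condition or for any command in an `&&`/`||` list other than the last, so probes written as conditions keep working, while a bare command that can fail, such as an unguarded `docker info`, would end the script at that point. The hypothetical `demo_set_e` function runs each case in a child `sh -ec` so the abort is observable:

```shell
# Demonstrates how `set -e` treats failing commands in different positions.
# Each case runs in its own `sh -ec` child so an abort does not end the demo.
demo_set_e() {
    # A failing command used as an `if` condition does not trigger errexit.
    sh -ec 'if false; then :; fi; echo "if-condition: continued"'
    # Neither does a failing command on the left of `||`.
    sh -ec 'false || echo "or-list: continued"'
    # A bare failing command aborts the child shell before its echo runs.
    sh -ec 'false; echo "bare command: continued"' || echo "bare command: aborted"
}
```

Running `demo_set_e` prints `if-condition: continued`, `or-list: continued`, and `bare command: aborted`, which is why switching to `set -eu` is usually safe for condition-style checks but can change behavior for bare diagnostic commands.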

fi

# If Docker is not responding, try restarting the service
if command -v systemctl >/dev/null 2>&1; then
Member

If this command fails (because systemctl is not available), then we don't actually restart anything, so we may as well just exit early instead, I think? We can do that after logging the Docker state if we want to. Alternatively, we could just not restart it, but still try to wait?

Comment on lines +96 to +97
log "Docker service is ${svc_status}. Attempting restart ${restart_count}/${DOCKER_MAX_RESTARTS}..."
try_restart_docker
Member

(suggestion based on previous comment)

Suggested change
log "Docker service is ${svc_status}. Attempting restart ${restart_count}/${DOCKER_MAX_RESTARTS}..."
try_restart_docker
if ! command -v systemctl >/dev/null 2>&1; then
log "Docker service is ${svc_status}, but systemctl is not available, so unable to explicitly restart service."
else
log "Docker service is ${svc_status}. Attempting restart ${restart_count}/${DOCKER_MAX_RESTARTS}..."
try_restart_docker
fi

Collaborator Author

In this case, we are already checking systemctl is-active docker in line 93, so if we reach this part of the code, the command was successful. I will leave the original code. WDYT?

Member

Yeah, but the problem is if it wasn't successful, we end up stuck in this loop where we're just sleeping repeatedly, not restarting anything, but still sleeping 🤔 That's the scenario this is trying to highlight 😄
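The scenario described in this thread can be handled by checking for systemctl once and giving up immediately when it is absent, rather than sleeping through attempts that can never restart anything. A sketch with a hypothetical `restart_docker_or_bail` helper; `SYSTEMCTL_BIN` is an assumed override hook (convenient for testing), not something the PR defines. It returns 2 when no restart is possible, so the caller can tell that apart from a restart that ran but failed:

```shell
# Hypothetical restart step: bail out early when systemctl is missing
# instead of looping through sleeps that cannot help.
# SYSTEMCTL_BIN is an assumed override; it defaults to systemctl.
restart_docker_or_bail() {
    bin=${SYSTEMCTL_BIN:-systemctl}
    if ! command -v "$bin" >/dev/null 2>&1; then
        echo "Docker is not responding and $bin is not available; unable to restart the service." >&2
        return 2
    fi
    echo "Restarting docker via $bin..."
    # Propagates systemctl's own exit status (non-zero if the restart failed).
    "$bin" restart docker
}
```

Inside the wait loop, a caller might capture the status with `restart_docker_or_bail || rc=$?` and `exit 1` when `rc` is 2, while a plain restart failure can still fall through to the next readiness probe. Restarting the service normally needs root, so on CI agents this may need to run under `sudo`.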

@@ -0,0 +1,113 @@
#!/usr/bin/env bash
Member

Generally we try to just use POSIX shell instead of bash; if it's simple to convert to that, it may save us some pain later.
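For reference, a few common bash-only constructs and POSIX sh replacements that typically come up in a conversion like this. The helpers below are hypothetical illustrations (including this `log`, whose real implementation is not shown on this page), since the full script is not visible here:

```shell
# bash: [[ "$status" == "active" ]]   ->  POSIX: [ "$status" = "active" ]
is_active() { [ "$1" = "active" ]; }

# bash: [[ "$msg" == *EOF* ]]          ->  POSIX: pattern matching with case
mentions_eof() {
    case $1 in
        *EOF*) return 0 ;;
        *) return 1 ;;
    esac
}

# bash: for i in {1..3}                ->  POSIX: explicit counter with $(( ))
count_to() {
    i=1
    while [ "$i" -le "$1" ]; do
        printf '%s ' "$i"
        i=$((i + 1))
    done
    echo
}

# bash: echo -e "..." and `local`     ->  POSIX: printf; `local` is not specified
# by POSIX, so plain variables (or a subshell) are the portable choice.
log() { printf '[%s] %s\n' "$(date -u +%H:%M:%S)" "$*"; }
```

For instance, `mentions_eof "error reading from server: EOF"` succeeds on the daemon error quoted in the PR description, and `count_to 3` prints `1 2 3`.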

displayName: BuildWindowsIntegrationTests
retryCountOnTaskFailure: 3

- template: steps/ensure-docker-ready.yml
Member

Good catch 👍

NachoEchevarria and others added 5 commits March 27, 2026 12:47
Co-authored-by: Andrew Lock <andrew.lock@datadoghq.com>
Co-authored-by: Andrew Lock <andrew.lock@datadoghq.com>
Co-authored-by: Andrew Lock <andrew.lock@datadoghq.com>
@NachoEchevarria NachoEchevarria requested a review from a team as a code owner March 27, 2026 12:13