Reduce sdk-build job timeout from 150 to 50 minutes #54752
Pull request overview
This PR reduces the default Azure Pipelines job timeout for the sdk-build job template from 150 minutes to 45 minutes, aligning timeouts with the faster “tests queued to Helix” workflow and limiting wasted compute when jobs hang.
Changes:
- Lower `timeoutInMinutes` default in `eng/pipelines/templates/jobs/sdk-build.yml` from 150 to 45 minutes.
Now that tests are queued to Helix rather than run inline, the build legs complete much faster (worst case ~40 min on macOS). The previous 150-minute timeout wastes significant compute when a job hangs. 50 minutes provides ~25% headroom above the observed worst case.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
7367ad1 to 29dac69
nagilson left a comment
I was initially hesitant to lower this because I've seen it go over 150 in the past and succeed, but the data you demonstrated here was helpful in building confidence.
In theory, having links might be helpful to back up the data, but I also recognize the CI runs will be deleted shortly.
Now, why dotnetup got stuck at:
Downloading .NET SDK 11.0.100-preview.5.26227.104 ( 29.0 MB / 183.6 MB): 15%
is of interest to me, although the most concerning part is perhaps that it's not respecting the 10-minute timeout and ran for 2 hours. I'll investigate or file an issue.
Keep in mind the Helix Monitor Job is what allows this change. The "build" jobs no longer wait on Helix; they simply queue the work items.
@nagilson - FYI dotnetup failure just happened in the fullFX leg.
Got it, thanks - I'm treating this as 2 issues, the install script fallback logic using '' and then the transient issue.
Now that tests are queued to Helix rather than run inline, the build legs complete much faster. The previous 150-minute timeout wastes significant compute when a job hangs (e.g. build 1462067 where a macOS job hung for the full 150 minutes on SDK install).
Analysis
Sampled 7 recent successful PR builds (definition 101) and measured build leg durations (excluding Dotnet-Format Integration Tests):
The worst-case observed is ~40 minutes (macOS arm64 TestBuild). A 50-minute timeout provides ~25% headroom above the observed maximum while cutting wasted compute by ~67% when hangs occur.
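The two percentages above can be re-derived from the three numbers quoted in this description (150, 50, and ~40 minutes). The following is only a sketch of that arithmetic; the function and constant names are made up for illustration and are not part of the pipeline or the repo.

```python
# Illustrative arithmetic only; names here are hypothetical, not from the repo.

def headroom_pct(timeout_min: float, worst_case_min: float) -> float:
    """Margin of the timeout above the slowest observed successful leg, in percent."""
    return (timeout_min - worst_case_min) / worst_case_min * 100


def hang_savings_pct(old_timeout_min: float, new_timeout_min: float) -> float:
    """Reduction in minutes burned by a job that hangs until it times out, in percent."""
    return (old_timeout_min - new_timeout_min) / old_timeout_min * 100


OLD_TIMEOUT_MIN = 150  # previous default
NEW_TIMEOUT_MIN = 50   # proposed default
WORST_CASE_MIN = 40    # approximate worst case observed (macOS arm64 TestBuild)

if __name__ == "__main__":
    print(f"headroom: {headroom_pct(NEW_TIMEOUT_MIN, WORST_CASE_MIN):.0f}%")
    print(f"savings on hang: {hang_savings_pct(OLD_TIMEOUT_MIN, NEW_TIMEOUT_MIN):.0f}%")
```

The savings figure only applies to jobs that hang until the timeout fires; legs that finish normally are unaffected by the lower limit.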