
test: fix update-check TempDir race + ws Start_GracefulShutdown poll #72

Merged
tzone85 merged 1 commit into main from test/flake-remediation-sleep-to-poll on Jun 11, 2026

tzone85 commented Jun 11, 2026

Owner

Summary

Two flake reductions surfaced by the 2026-06-11 test audit, bundled because they share the same shape: replace a fixed sleep or a leaky goroutine with deterministic behavior.

  • `TestCheckForModelUpdates_UpdateCheckDisabledInConfig` (and its sibling `ValidConfig_NoUpdateCheck`): `DefaultConfig` now sets `update_check=true`, so even tests that "expect early return" fall through and start the background poll goroutine. That goroutine writes `<HOME>/.nxd/update-status.json` AFTER the test returns, racing `t.TempDir`'s cleanup. In CI this reproduces as `unlinkat … directory not empty` under `-race`. Fix: hard-disable the check with `t.Setenv("NXD_UPDATE_CHECK", "false")`; the function then returns at root.go:70 before any goroutine spawns.
  • `TestServer_Start_GracefulShutdown` used a flat 200 ms sleep to wait for the listener to bind. Replace it with a 10 ms poll on `s.BindAddr()` capped at 2 s. Three repeated runs now complete in ~20 ms each (vs 200 ms before).

Test plan

  • TestCheckForModelUpdates_* × 5 reruns clean locally under -race.
  • TestServer_Start_GracefulShutdown × 3 reruns: each ~20 ms.
  • `go build ./...`, `go vet ./...`, and `go test ./... -count=1 -timeout 240s` all green.

Audit traceability

Test audit TEST-P2-2 + task #7 (update-check flake).

Two flake reductions surfaced by the 2026-06-11 test audit, bundled
because they share the same shape: replace a sleep or a leaky goroutine
with a deterministic wait.

- TestCheckForModelUpdates_UpdateCheckDisabledInConfig (+ sibling
  ValidConfig_NoUpdateCheck): DefaultConfig now has update_check=true,
  so even tests that "expect early return" fall through and start the
  background poll goroutine. That goroutine writes
  <HOME>/.nxd/update-status.json AFTER the test returns, racing
  t.TempDir's cleanup. Repro in CI shows up as `unlinkat … directory
  not empty` under -race. Hard-disable the poll with
  t.Setenv("NXD_UPDATE_CHECK", "false"); the function then returns at
  line 70 of root.go before any goroutine spawns.
- TestServer_Start_GracefulShutdown used a flat 200ms sleep to wait for
  the listener to bind. Replace with a 10ms poll loop on s.BindAddr()
  with a 2s deadline. Three repeated runs now complete in ~20ms each
  (vs 200ms before), and no hard-coded wait is left to turn flaky on
  slow CI runners.
  The existing waitForPort helper couldn't be used because the test sets
  s.port = 0 (letting the OS pick the port).

Surfaced by 2026-06-11 test audit (TEST-P2-2, plus task #7 update-check
flake).
tzone85 merged commit 9ceed31 into main on Jun 11, 2026
9 of 10 checks passed
tzone85 deleted the test/flake-remediation-sleep-to-poll branch on June 11, 2026 at 11:38