fix(ci): require N consecutive HTTP successes in wait-for-grpc.sh (next) #63
Merged
Observed flake: the probe returns HTTP 200 once, on the first attempt that clears the connection-refused phase, and exits. Tests start, and ALL tests fail with 'TypeError: Failed to fetch' against the gRPC backend. The single-probe gate isn't strict enough: a one-shot 200 (e.g. tonic-health responding before the rest of the dispatcher is fully wired) currently passes.
Upgrade the readiness signal to N consecutive HTTP successes spaced PROBE_INTERVAL apart (defaults: 3 successes, 0.5s apart), so the probe only declares the server ready after ~1s of demonstrably stable response. Any non-success in the streak resets it to zero and the slow-poll loop resumes, so a momentary blip during init doesn't get counted twice on either side.
Tracked occurrences across recent PR runs: web-sdk PR #23 ci-shard-4, PR #29 ci-shard-1 + ci-shard-4, PR #27 multiple shards.
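The streak logic described above can be sketched roughly as follows. This is not the contents of scripts/wait-for-grpc.sh, only a minimal illustration: PROBE_INTERVAL and the default values (3 successes, 0.5s apart, 1s slow poll, 90s deadline) are taken from this PR's text, while REQUIRED_SUCCESSES, SLOW_POLL_INTERVAL, DEADLINE_SECONDS, probe_once and wait_until_stable are hypothetical names. It also assumes that "success" means curl saw any HTTP status at all, with curl's 000 standing for "no response".

```shell
#!/bin/sh
# Sketch of an N-consecutive-successes readiness gate (not the real script).
# PROBE_INTERVAL and the default values come from the PR text; every other
# name here is hypothetical.
REQUIRED_SUCCESSES="${REQUIRED_SUCCESSES:-3}"
PROBE_INTERVAL="${PROBE_INTERVAL:-0.5}"
SLOW_POLL_INTERVAL="${SLOW_POLL_INTERVAL:-1}"
DEADLINE_SECONDS="${DEADLINE_SECONDS:-90}"

# probe_once URL: one plain GET. Succeeds when curl reports any HTTP status
# (100-599); curl prints 000 when no response arrived (e.g. connection refused).
probe_once() {
  code=$(curl -s -o /dev/null --max-time 2 -w '%{http_code}' "$1" 2>/dev/null) || true
  case "$code" in
    [1-5][0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# wait_until_stable CMD [ARGS...]: rerun CMD until it has succeeded
# REQUIRED_SUCCESSES times in a row. Returns 1 if the deadline passes first.
wait_until_stable() {
  streak=0
  deadline=$(( $(date +%s) + DEADLINE_SECONDS ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    if "$@"; then
      streak=$(( streak + 1 ))
      if [ "$streak" -ge "$REQUIRED_SUCCESSES" ]; then
        return 0
      fi
      sleep "$PROBE_INTERVAL"       # fast cadence while a streak is building
    else
      streak=0                      # any failure discards the whole streak
      sleep "$SLOW_POLL_INTERVAL"   # fall back to the slow poll
    fi
  done
  return 1
}
```

A caller might run `wait_until_stable probe_once "$GRPC_URL"`, where GRPC_URL is a placeholder for the server address. With the defaults, three successes separated by two 0.5s sleeps give the ~1s stability window mentioned above.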
WiktorStarczewski added a commit that referenced this pull request on Apr 30, 2026
…en configs
- vitest.config.js: exclude crates/web-client/js/passkey-keystore.js
from coverage scope. The new file depends on browser-only WebAuthn
PRF APIs that aren't reachable from node, so it can't be unit-tested
via vitest. It's covered end-to-end by
crates/web-client/test/passkey-keystore.test.ts under Playwright
instead. Without this exclusion, coverage was 73.67% (vs 95% gate).
- knip.jsonc: remove 'dexie' from ignoreDependencies. dexie was
previously only loaded transitively at test runtime (via rollup-bundled
page.evaluate scripts that knip's static scan can't see), but
passkey-keystore.js now imports it directly at the source level —
making the ignore unnecessary. Knip flagged this with 'Remove from
ignoreDependencies'.
- Pulls in scripts/wait-for-grpc.sh's stricter probe (#62 / #63), which
addresses the gRPC dispatch flake that previously hit several PRs'
integration shards.
cargo check --workspace --target wasm32-unknown-unknown is clean (build
artifact compiles against the dep retarget at miden-client wiktor-storekeys
that this PR carries on Cargo.toml). Test stage may surface additional
issues now that the build clears — particularly the 'webclient_new
undefined' WASM-symbol issue we saw on prior runs of this branch — but
those need to be diagnosed against a fresh CI run rather than guessed at.
Summary
Tighten `scripts/wait-for-grpc.sh` so a single transient HTTP success doesn't pass the readiness gate. The probe now requires N consecutive HTTP successes spaced `PROBE_INTERVAL` apart before declaring the gRPC server ready (defaults: 3 successes, 0.5s apart → ~1s of stable response).
Why
Observed flake mode: `wait-for-grpc.sh` declared the server responsive after the FIRST attempt that returned an HTTP code in [1-5][0-9][0-9]. Empirically, the testing-node-builder's HTTP layer can flicker up briefly during init (tonic-health, for example, comes online before the rest of the dispatcher is fully wired). The probe exits, tests start, and ALL tests fail with `TypeError: Failed to fetch` against the gRPC backend.
Recent occurrences this would have caught:
- web-sdk PR #23: ci-shard-4
- PR #29: ci-shard-1 + ci-shard-4
- PR #27: multiple shards
Behaviour change
- Any non-success in the streak, including `000` / connect-refused, resets it to zero, so the slow-poll loop resumes.
The slow `sleep 1` and the 90s deadline are unchanged; the streak adds at most ~1s to the steady-state ready-detection time.
Test plan
- `Detect relevant changes` flags this as `non_docs` (it's a script in `scripts/`).
If the flake persists after this lands, the next escalation is probing with a real grpc-web request (vs plain GET) so we exercise the actual dispatch path. Tracking that as a follow-up.
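For reference, a grpc-web probe of the kind mentioned as a follow-up could look something like the sketch below. Everything in it is an assumption rather than planned code: GRPC_METHOD_PATH is a placeholder that would need to name a unary method served by the real dispatcher (not tonic-health) that accepts an empty request message, and the function names are hypothetical. It relies on the grpc-web wire format, where each message is prefixed by a 1-byte flag and a 4-byte big-endian length, and on the server reporting `grpc-status` either as a response header or inside the trailer frame of the body.

```shell
#!/bin/sh
# Hypothetical sketch: probe with a grpc-web POST instead of a plain GET.
# GRPC_METHOD_PATH is a placeholder; substitute a unary method the backend
# actually serves and whose request message may be empty.
GRPC_METHOD_PATH="${GRPC_METHOD_PATH:-/example.Service/Method}"

# Emit one grpc-web data frame carrying an empty message:
# 1 flag byte (0x00 = data) followed by a 4-byte big-endian length of 0.
grpc_web_empty_frame() {
  printf '\000\000\000\000\000'
}

# Read a raw HTTP response (headers + body) on stdin and succeed only if it
# reports grpc-status 0, whether as a header or inside the trailer frame.
has_ok_grpc_status() {
  LC_ALL=C tr -d '\r' | LC_ALL=C grep -a -i -q -E 'grpc-status: *0$'
}

# grpc_web_probe BASE_URL: send one grpc-web request; true on grpc-status 0.
grpc_web_probe() {
  grpc_web_empty_frame |
    curl -s -i --max-time 2 \
      -H 'Content-Type: application/grpc-web+proto' \
      -H 'X-Grpc-Web: 1' \
      --data-binary @- \
      "$1$GRPC_METHOD_PATH" 2>/dev/null |
    has_ok_grpc_status
}
```

Checking `grpc-status` rather than the HTTP code is the point of the design: a gRPC server can answer HTTP 200 while still reporting a non-zero gRPC status, so the HTTP code alone would not show that dispatch works.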