ci: probe /healthz, not /, when waiting for wash dev #2
Open
ericgregory wants to merge 2 commits into
The integration runner waits for the dev server by polling GET /. The api-gateway has an explicit early-return for / in components/api-gateway/src/routes.rs::dispatch that returns 200 "ocelaudit booting" *before* AppState::startup() has finished — every other path returns 503 in that window. Result: the runner declared ready as soon as / came up, then the tests/api/m*.sh scripts immediately hit /healthz and /api/v1/* and got 503 across the board. Switch the probe to /healthz, which is only 200 once AppState is Ok (storage initialized, signer loaded). All the m*.sh scripts already wait_for /healthz themselves, so this aligns the runner with what the tests expect. Refs: tests/api/_runner.sh ready-loop; routes.rs dispatch Err-arm.
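The gating behavior described in this commit message can be modeled with a small sketch. This is not the code in `routes.rs`: the function signature, the `"ok"` body, and the `404` fallback are assumptions made for illustration, and only the paths, the `200 "ocelaudit booting"` early return, and the `503` on a failed startup come from the description above.

```rust
/// Hypothetical model of the gateway's dispatch while startup is unresolved.
/// `state` stands in for the result of `AppState::startup()`.
/// Returns (HTTP status, body).
fn dispatch(path: &str, state: &Result<(), String>) -> (u16, String) {
    // Early return: `/` answers 200 without consulting `state` at all,
    // which is why it is a misleading readiness signal.
    if path == "/" {
        return (200, "ocelaudit booting".to_string());
    }
    match state {
        // Startup succeeded: /healthz reports ready (body is an assumption).
        Ok(()) if path == "/healthz" => (200, "ok".to_string()),
        // Other routes are out of scope for this sketch.
        Ok(()) => (404, "not found".to_string()),
        // Err arm: every non-root path returns 503, with the startup
        // error message as the response body.
        Err(e) => (503, e.clone()),
    }
}
```

With a failed startup, `dispatch("/", ...)` still yields `200`, while `dispatch("/healthz", ...)` yields `503` together with the error text, which is the mismatch the runner tripped over.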
The /healthz probe is the right semantic gate but it surfaced a deeper issue: AppState::startup() never succeeds in CI, so /healthz stays 503 for the full 60s. The 503 body carries the actual startup error per RouteResponse::err in routes.rs, and the runner was discarding it. Capture one final /healthz response (status + body) and dump the full wash dev log (not just the last 50 lines) so the next CI failure shows the underlying error message.
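The ready-loop change described here can be sketched as follows. The real runner is a shell script; this Rust version only illustrates the logic, and `Probe`, `wait_ready`, and the closure-based probe are hypothetical names introduced for the example. The point is that the loop keeps the last non-200 response so that a timeout can report the status and body instead of discarding them.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// One probe result: HTTP status and response body.
#[derive(Debug, Clone, PartialEq)]
struct Probe {
    status: u16,
    body: String,
}

/// Polls `probe` until it reports 200 or `timeout` elapses.
/// On timeout, returns the most recent non-200 response (if any)
/// so the caller can log the startup error carried in its body.
fn wait_ready<F>(mut probe: F, timeout: Duration, interval: Duration) -> Result<(), Option<Probe>>
where
    F: FnMut() -> Option<Probe>, // None models a refused or failed connection
{
    let deadline = Instant::now() + timeout;
    let mut last = None;
    loop {
        if let Some(p) = probe() {
            if p.status == 200 {
                return Ok(());
            }
            last = Some(p); // remember what the server actually said
        }
        if Instant::now() >= deadline {
            return Err(last);
        }
        sleep(interval);
    }
}
```

A caller would pass a closure that performs `GET /healthz` and, on `Err(Some(p))`, print `p.status` and `p.body` next to the full server log.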
Summary
The integration runner (`tests/api/_runner.sh`) waited for the dev server by polling `GET /`. The api-gateway has an explicit early-return for `/` in `components/api-gateway/src/routes.rs::dispatch` that returns `200 "ocelaudit booting"` before `AppState::startup()` has finished; every other path returns `503` in that window. Result: the runner declared ready as soon as `/` came up, then `tests/api/m*.sh` hit `/healthz` and `/api/v1/*` and got `503` across the board (the symptom you've been seeing on main since 2026-05-01, commit `3da49e4`, the M14 split into csl-service + api-gateway).

This PR:
- Switches the probe to `/healthz` (only `200` once `AppState` is `Ok`: storage initialized, signer loaded).
- Captures a final `/healthz` status + body and the full `wash dev` log. The `503` body carries the exact `AppState::startup()` error via `RouteResponse::err`; discarding it was hiding the real bug.

What the new diagnostics reveal
With this PR applied, CI still fails — but now with a clear pointer to the underlying issue. Latest run:
WASI errno 44 = `ENOENT`. So `AppState::startup()` is failing because some filesystem op against `/data` returns "no such file or directory" inside the wasm sandbox. The likely failure points are in `components/api-gateway/src/state.rs`:
- `JsonFsStorage::open("/data")` → `fs::create_dir_all`
- `storage.users_seed_if_empty()` → write `/data/users.json`
- `SessionSigner::from_env_or_keyfile` → write `/data/session.key`

The runner pre-stages `.cache/ocelaudit-data` before booting `wash dev`, and `.wash/config.yaml` maps it to `/data` via the `volumes` block, so the host directory exists at boot. My guess (not verified) is that the M14 introduction of `service_file` alongside the existing `volumes` block changes how wash dev wires preopens for the main component vs. the service. Worth looking at:
- Are `/data` preopens being applied to both the api-gateway component and the csl-service in wash 2.0.5? (Code path: `wash-runtime/src/engine/workload.rs` ~line 1126, the `for (host_path, mount) in components.values().flat_map(...)` loop.)
- Is the `./.cache/ocelaudit-data` host_path being resolved against the wrong cwd somewhere downstream?

A maintainer with wash-dev internals fluency could probably localize this faster than I can from CI logs alone.
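One way to narrow down which of the three filesystem steps returns `ENOENT` is to label each error with the step and path that produced it. The sketch below is a hypothetical stand-in, not the code in `state.rs`: `startup`, `at`, the file contents, and the seed-if-missing checks are all assumptions, and only the order of operations and the file names under the data directory come from the description above.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Wraps an I/O error with the step and path that produced it,
/// so a 503 body or log line points at the failing operation.
fn at(step: &str, path: &Path, e: io::Error) -> String {
    format!("{} {}: {}", step, path.display(), e)
}

/// Hypothetical stand-in for the three startup steps:
/// open storage, seed users, load or create the session key.
fn startup(data_dir: &Path) -> Result<(), String> {
    // Step 1: make sure the storage directory exists.
    fs::create_dir_all(data_dir).map_err(|e| at("create_dir_all", data_dir, e))?;

    // Step 2: seed the users file only when it is missing.
    let users = data_dir.join("users.json");
    if !users.exists() {
        fs::write(&users, "[]").map_err(|e| at("seed users", &users, e))?;
    }

    // Step 3: write a session key only when none is present
    // (the key material here is a placeholder).
    let key = data_dir.join("session.key");
    if !key.exists() {
        fs::write(&key, "placeholder-key").map_err(|e| at("write session key", &key, e))?;
    }
    Ok(())
}
```

If the real startup path reported errors in this shape, the captured `/healthz` body would show whether the failure is in the directory creation or in one of the two writes, which would help distinguish a missing preopen from a wrongly resolved host path.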
Test plan
- Expect `/healthz` to flip to `200` quickly and all `tests/api/m*.sh` to pass.