ci: probe /healthz, not /, when waiting for wash dev by ericgregory · Pull Request #2 · cosmonic-labs/ocelaudit

ericgregory · 2026-05-11T19:31:55Z

Summary

The integration runner (tests/api/_runner.sh) waited for the dev server by polling GET /. The api-gateway has an explicit early-return for / in components/api-gateway/src/routes.rs::dispatch that returns 200 "ocelaudit booting" before AppState::startup() has finished — every other path returns 503 in that window. Result: the runner declared ready as soon as / came up, then tests/api/m*.sh hit /healthz and /api/v1/* and got 503 across the board (the symptom you've been seeing on main since 2026-05-01, commit 3da49e4 — the M14 split into csl-service + api-gateway).
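To make the race concrete, here is a minimal Rust sketch of the dispatch behavior described above. It is not the code in routes.rs. The `Response` struct, the `startup` parameter, the `"ok"` health body, and the 404 fallback are assumptions made for illustration. Only the 200 "ocelaudit booting" early return and the 503-with-error arm come from the description.

```rust
/// Stand-in for the real response type (assumption: the actual
/// RouteResponse in routes.rs is not shown in this PR).
#[derive(Debug, PartialEq)]
struct Response {
    status: u16,
    body: String,
}

/// `startup` models the outcome of AppState::startup():
/// Ok once storage and signer are ready, Err with a message otherwise.
fn dispatch(path: &str, startup: &Result<(), String>) -> Response {
    // Early return: `/` answers 200 whether or not startup has succeeded.
    if path == "/" {
        return Response { status: 200, body: "ocelaudit booting".to_string() };
    }
    match startup {
        // Err arm: every other path reports 503 and carries the startup error.
        Err(e) => Response { status: 503, body: format!("{{\"error\":\"{}\"}}", e) },
        // Ok arm: hypothetical routing, for illustration only.
        Ok(()) => match path {
            "/healthz" => Response { status: 200, body: "ok".to_string() },
            _ => Response { status: 404, body: "not found".to_string() },
        },
    }
}
```

With `startup` set to an `Err`, `dispatch("/", ..)` still yields 200 while `dispatch("/healthz", ..)` yields 503, which is exactly the window in which the old probe declared readiness too early.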

This PR:

Switches the runner's readiness probe to /healthz (only 200 once AppState is Ok — storage initialized, signer loaded).
On readiness timeout, dumps the final /healthz status + body and the full wash dev log. The 503 body carries the exact AppState::startup() error via RouteResponse::err — discarding it was hiding the real bug.
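The two changes can be modeled as a single loop. The sketch below is in Rust rather than the shell of tests/api/_runner.sh, so it is an illustration of the logic and not the runner's code. The `Probe` struct, `wait_ready`, and the injected `probe` closure (standing in for an HTTP GET against /healthz) are hypothetical names.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// One observation of the health endpoint: HTTP status plus response body.
#[derive(Debug, Clone, PartialEq)]
struct Probe {
    status: u16,
    body: String,
}

/// Poll `probe` until it reports 200 or `timeout` elapses.
/// On timeout, return the last observation so the caller can print the
/// status and body instead of discarding them.
fn wait_ready<F>(mut probe: F, timeout: Duration, interval: Duration) -> Result<(), Probe>
where
    F: FnMut() -> Probe,
{
    let deadline = Instant::now() + timeout;
    loop {
        let last = probe();
        if last.status == 200 {
            return Ok(());
        }
        if Instant::now() >= deadline {
            return Err(last);
        }
        sleep(interval);
    }
}
```

A caller would match on the `Err(last)` case, print `last.status` and `last.body`, and then dump the wash dev log before exiting non-zero.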

What the new diagnostics reveal

With this PR applied, CI still fails — but now with a clear pointer to the underlying issue. Latest run:

!! wash dev did not become ready within 60s.
-- final /healthz status + body --
  status=503
  body:
    {"error":"io: No such file or directory (os error 44)"}

WASI errno 44 = ENOENT (on a Linux host ENOENT is errno 2, so the value 44 is consistent with the error being raised inside the guest). So AppState::startup() is failing because some filesystem op against /data returns "no such file or directory" inside the wasm sandbox. The likely failure points are in components/api-gateway/src/state.rs:

JsonFsStorage::open("/data") → fs::create_dir_all
storage.users_seed_if_empty() → write /data/users.json
SessionSigner::from_env_or_keyfile → write /data/session.key
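The current error body does not say which of these three steps failed. One way to narrow it down, sketched below under stated assumptions, is to tag each step's I/O error with the step name before it reaches the 503 body. The `step` helper and this `startup` function are hypothetical: they only mimic the three suspects above with plain `std::fs` calls and are not the code in state.rs.

```rust
use std::{fs, io, path::Path};

/// Prefix an I/O error with the name of the startup step that produced it.
fn step<T>(name: &str, result: io::Result<T>) -> Result<T, String> {
    result.map_err(|e| format!("{}: {}", name, e))
}

/// Hypothetical startup sequence mirroring the three suspects:
/// create the data directory, seed users.json, write session.key.
/// (Placeholder contents; the real seeding and key logic differ.)
fn startup(data_dir: &Path) -> Result<(), String> {
    step("create_dir_all", fs::create_dir_all(data_dir))?;
    step("write users.json", fs::write(data_dir.join("users.json"), b"[]"))?;
    step("write session.key", fs::write(data_dir.join("session.key"), b"placeholder"))?;
    Ok(())
}
```

With this shape, an ENOENT would surface as something like `write users.json: ...` rather than a bare `io:` message, which would tell us whether the preopen is missing entirely or only writes under it fail.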

The runner pre-stages .cache/ocelaudit-data before booting wash dev and .wash/config.yaml maps it to /data via the volumes block, so the host directory exists at boot. My guess (not verified) is that the M14 introduction of service_file alongside the existing volumes block changes how wash dev wires preopens for the main component vs. the service. Worth looking at:

Are /data preopens being applied to both the api-gateway component and the csl-service in wash 2.0.5? (Code path: wash-runtime/src/engine/workload.rs ~line 1126, the for (host_path, mount) in components.values().flat_map(...) loop.)
Is the relative ./.cache/ocelaudit-data host_path being resolved against the wrong cwd somewhere downstream?
Would an absolute host_path (set by the runner via temp config) sidestep the issue?
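The second and third questions come down to which base directory a relative host_path is joined to. The sketch below shows how the same relative path lands in different places depending on the base. `absolutize` is a hypothetical helper, not a wash API, and whether wash resolves host_path this way is unverified.

```rust
use std::path::{Component, Path, PathBuf};

/// Resolve `host_path` against `base` without touching the filesystem.
/// Absolute inputs are returned unchanged; `.` is dropped and `..` pops
/// one component (symlinks are not considered in this sketch).
fn absolutize(base: &Path, host_path: &Path) -> PathBuf {
    if host_path.is_absolute() {
        return host_path.to_path_buf();
    }
    let mut out = base.to_path_buf();
    for component in host_path.components() {
        match component {
            Component::CurDir => {}
            Component::ParentDir => {
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}
```

Resolved against the repository root, `./.cache/ocelaudit-data` points at the directory the runner pre-stages. Resolved against any other cwd it points at a path that does not exist, which would produce ENOENT. Writing an already-absolute host_path into a temp config would make the result independent of cwd.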

A maintainer with wash-dev internals fluency could probably localize this faster than I can from CI logs alone.

Test plan

CI exercises the new probe + diagnostic on every run.
Once the underlying preopen / startup issue is fixed, expect /healthz to flip 200 quickly and all tests/api/m*.sh to pass.

The integration runner waits for the dev server by polling GET /. The api-gateway has an explicit early-return for / in components/api-gateway/src/routes.rs::dispatch that returns 200 "ocelaudit booting" *before* AppState::startup() has finished — every other path returns 503 in that window. Result: the runner declared ready as soon as / came up, then the tests/api/m*.sh scripts immediately hit /healthz and /api/v1/* and got 503 across the board. Switch the probe to /healthz, which is only 200 once AppState is Ok (storage initialized, signer loaded). All the m*.sh scripts already wait_for /healthz themselves, so this aligns the runner with what the tests expect. Refs: tests/api/_runner.sh ready-loop; routes.rs dispatch Err-arm.

The /healthz probe is the right semantic gate but it surfaced a deeper issue: AppState::startup() never succeeds in CI, so /healthz stays 503 for the full 60s. The 503 body carries the actual startup error per RouteResponse::err in routes.rs, and the runner was discarding it. Capture one final /healthz response (status + body) and dump the full wash dev log (not just the last 50 lines) so the next CI failure shows the underlying error message.

ericgregory added 2 commits May 11, 2026 15:31

ericgregory mentioned this pull request May 11, 2026

test-api: AppState::startup() fails with ENOENT on /data since M14 split #3 (Open)

ericgregory wants to merge 2 commits into main from fix-ci-readiness-probe
