Gate web deploy behind infra in one ordered workflow #8

Merged
johncarmack1984 merged 1 commit into main from deploy-ordering-gate on Jun 17, 2026

@johncarmack1984

Summary

Fixes the deploy-ordering race that surfaced when the temperature-unification PR merged: deploy-web and deploy-infra were separate push-triggered workflows with separate concurrency groups, so a merge touching both ran them in parallel. deploy-web (50s) finished before deploy-infra (1m16s), so the new app went live requesting lattice.json ~26s before the lambda that writes it had deployed — and a brand-new feed stays 404 until its schedule first fires anyway.

This folds both into one ordered deploy workflow:

  • changes: dorny/paths-filter sets web/infra flags.
  • infra (if infra changed) — cdk deploy, then primes alerts/temp/windtex (parallel, synchronous invokes) so the data the web reads exists before publish. A FunctionError fails the job.
  • web (needs: infra) — publishes only after infra succeeds, or directly when only web/ changed (always() + a result guard, so a failed infra never ships a web that depends on it).

Web-only and infra-only merges keep their fast single-sided paths via the path filter. A single concurrency: deploy group serializes deploys.
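
The prime step's control flow (invoke every feed in parallel, wait for all of them, fail if any reports a FunctionError) can be sketched as follows. This is a TypeScript model rather than the shell script the workflow actually runs, and the `Invoker` type, `failedFeeds`, and `primeFeeds` are hypothetical names introduced for illustration. The only field it relies on from the Lambda invoke response is `FunctionError`, which is set when a synchronous invoke reaches the function but the function itself errors.

```typescript
// Fields this sketch reads from a synchronous Lambda invoke response.
// FunctionError is present only when the function itself failed.
interface InvokeResult {
  StatusCode?: number;
  FunctionError?: string;
}

// Hypothetical invoker; in practice this would wrap the AWS SDK or CLI.
type Invoker = (functionName: string) => Promise<InvokeResult>;

// Return the names whose invoke result carries a FunctionError.
function failedFeeds(names: string[], results: InvokeResult[]): string[] {
  return names.filter((_, i) => results[i]?.FunctionError !== undefined);
}

// Invoke all feeds in parallel, wait for every response, then throw if any
// reported a FunctionError so the calling job exits non-zero.
async function primeFeeds(names: string[], invoke: Invoker): Promise<void> {
  const results = await Promise.all(names.map((name) => invoke(name)));
  const failed = failedFeeds(names, results);
  if (failed.length > 0) {
    throw new Error(`prime failed for: ${failed.join(", ")}`);
  }
}
```

Checking every response after `Promise.all` resolves, rather than bailing on the first error, means one failed feed does not hide the status of the other two in the job log.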

⚠️ Required once before this takes effect

The infra job's prime step needs lambda:InvokeFunction, which this PR adds to the deploy role in github-oidc-stack.ts. That stack is deployed locally, not by CI, so before (or right after) merging, run:

just profile=<stormdeck-admin> cdk deploy oidc

If you skip it, the first deploy run's prime step fails with AccessDenied — which the gate turns into a safe failure (the web job is held back, so nothing half-broken publishes); deploy the OIDC stack and re-run. (A fresh bootstrap via just cdk deploy oidc already includes the grant.)
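
For reference, the grant amounts to a single IAM statement on the deploy role. The sketch below builds that statement as plain policy JSON in TypeScript. It is only an illustration of the shape: github-oidc-stack.ts presumably expresses the grant through CDK constructs, and the `invokeGrant` helper and the ARN shown are hypothetical placeholders.

```typescript
// Minimal shape of one IAM policy statement (standard policy JSON fields).
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string[];
}

// Build a statement that lets a role invoke only the listed functions.
function invokeGrant(functionArns: string[]): PolicyStatement {
  return {
    Effect: "Allow",
    Action: ["lambda:InvokeFunction"],
    Resource: functionArns,
  };
}

// Placeholder ARN: the real account id, region, and function name differ.
const statement = invokeGrant([
  "arn:aws:lambda:us-east-1:123456789012:function:example-ingest",
]);
console.log(JSON.stringify(statement, null, 2));
```

Scoping `Resource` to the ingest function's ARN, instead of `*`, keeps the deploy role from invoking unrelated functions in the account.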

Notes

  • The prod gap from the temperature merge is already closed — lattice.json was primed into prod at 15:37 UTC and the site's grid works. This PR is the systemic fix so it can't recur.
  • Local prime tooling note: my just weather prime can't reach the stormdeck account from this machine (ambient creds are newearth-admin, in a different account), which is why the CI prime, running in-account via OIDC, is the right home for it.

Verification

  • actionlint clean on all changed workflows (validated jobs/needs/if expressions + shellcheck on the prime script).
  • cdk typecheck passes; CI will run cdk synth on the new IAM policy on this PR.
  • Logic traced: web waits for infra when infra runs; runs directly when infra is skipped (web-only); is gated off when infra fails/cancels.
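
The traced logic can be written out as a small decision function. This is a TypeScript model of the gate, not the workflow's actual `if:` expression. It assumes the web job runs only when the path filter flagged web changes, and that it treats an infra result of `success` or `skipped` as safe. The names `Changes`, `JobResult`, and `webShouldRun` are hypothetical.

```typescript
// Result values GitHub Actions reports for a job listed in `needs`.
type JobResult = "success" | "failure" | "cancelled" | "skipped";

// Flags the path-filter job is assumed to output.
interface Changes {
  web: boolean;
  infra: boolean;
}

// web publishes only if web changed AND infra either succeeded or never ran.
function webShouldRun(changes: Changes, infraResult: JobResult): boolean {
  const infraOk = infraResult === "success" || infraResult === "skipped";
  return changes.web && infraOk;
}

// Both sides changed, infra succeeded: web waits for infra, then publishes.
console.log(webShouldRun({ web: true, infra: true }, "success")); // true
// Web-only merge: infra is skipped, web publishes directly.
console.log(webShouldRun({ web: true, infra: false }, "skipped")); // true
// Infra failed or was cancelled: web is held back.
console.log(webShouldRun({ web: true, infra: true }, "failure")); // false
console.log(webShouldRun({ web: true, infra: true }, "cancelled")); // false
```

The `skipped` case is why the workflow needs `always()`: without it, a job whose `needs` dependency was skipped is itself skipped, which would block web-only merges.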

Docs & attribution

  • README updated (the deploy section + the CD / auto-release paragraphs).
  • On-map attribution — N/A (no data source changed).
  • Every new external source credited — N/A.
  • N/A — no data sources, user-facing behavior, costs, or architecture changed. (CI/CD architecture changed; README updated accordingly.)

deploy-web and deploy-infra were separate push-triggered workflows with separate
concurrency groups, so a merge touching both ran them in parallel. deploy-web
(50s) finished before deploy-infra (1m16s), briefly publishing a web that
requested lattice.json ~26s before the lambda that writes it had deployed.

Fold both into one ordered `deploy` workflow: a path-filter job gates the work,
the infra job deploys the stack and then primes the weather feeds, and the web
job `needs` infra — publishing only after infra succeeds, or directly on a
web-only merge (and never if infra failed). Web-only and infra-only merges keep
their fast single-sided paths via the path filter.

- deploy.yml: changes (paths-filter) -> infra (cdk deploy + parallel prime of
  alerts/temp/windtex) -> web (needs infra)
- github-oidc-stack.ts: grant the deploy role lambda:InvokeFunction on the
  ingest so the prime step can run
- delete deploy-web.yml + deploy-infra.yml; update README + comment references
@johncarmack1984 johncarmack1984 added the enhancement New feature or request label Jun 17, 2026
@johncarmack1984 johncarmack1984 merged commit fab6113 into main Jun 17, 2026
3 checks passed
@johncarmack1984 johncarmack1984 deleted the deploy-ordering-gate branch June 18, 2026 21:37