
Add mmgis-deployment skill: scripted local deployments with seeded golden baseline #128

Merged
BhattaraiSijan merged 134 commits into development from feature/mmgis-deployment-skill on Jun 16, 2026

Conversation

@CarsonDavis

Summary

Adds a committed Claude Code skill (.claude/skills/mmgis-deployment/) that codifies how MMGIS runs locally — one shared Postgres+PostGIS container, N coexisting deployments (worktrees, clones, or the main checkout) differing only in PORT and DB_NAME, each cloned from a frozen mmgis_golden baseline so it boots as a working app.

What's included

  • Scripts (deterministic, stateless — all state derived from git worktree list, each .env, and the DB container): create, start, stop, list, doctor, test, teardown, refresh-golden, seed-golden, plus a shared _lib.sh.
  • References: deployment-model.md (how local deploy works, ports, golden-template mechanics) and troubleshooting.md (symptom → cause → fix, keyed from doctor.sh).
  • Baseline distribution: the golden database is never shipped. Instead, seed/baseline-mission.json (sanitized config, no tokens) and seed-golden.sh rebuild an identical baseline on any machine via MMGIS's own APIs (first_signup -> login -> /api/configure/add). MAPBOX_TOKEN (new optional entry in sample.env) is injected at seed time and never committed.
  • Safety rails: teardown.sh refuses on uncommitted/unpushed work and protects mmgis/mmgis_golden; golden writers stage-and-swap so a failed run can't leave a broken baseline; kill operations only touch node processes on the deployment's ports.

Cold start for a new machine

npm install --force && npm run db:start
.claude/skills/mmgis-deployment/scripts/seed-golden.sh
.claude/skills/mmgis-deployment/scripts/create.sh <name>
.claude/skills/mmgis-deployment/scripts/start.sh ../MMGIS-<name>

Testing

  • All scripts pass bash -n; reviewed via multi-angle code review with confirmed findings fixed (repo anchoring, idempotent re-runs, fail-fast ordering).
  • seed-golden.sh verified end-to-end against a throwaway target DB: seeded admin (111) + baseline mission, token substitution, and CREATE DATABASE ... TEMPLATE cloning all confirmed.
  • list.sh/doctor.sh exercised against the live multi-worktree setup, including from foreign working directories.

CarsonDavis and others added 28 commits June 9, 2026 15:04
Distributes the deployment baseline as a recipe instead of a database:
seed/baseline-mission.json (sanitized mission config, token placeholder)
plus seed-golden.sh, which boots a temporary server against a scratch DB
and seeds an admin + the baseline mission via MMGIS's own APIs
(first_signup -> login -> configure/add), then renames it to mmgis_golden.
MAPBOX_TOKEN is injected from the environment at seed time; tokens are
never committed.

Introduce MMGIS_DEPLOYMENT_MODE (full|lean, default full) with a backend
helper, client-exposed build flags, the STATIC_MISSION_CONFIG webpack
alias, and a gitignore rule for the baked-config stub. No behavior
change; nothing reads the gate yet.
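As a rough illustration of the gate this commit introduces, a backend helper could resolve the mode as sketched below. Only the variable name MMGIS_DEPLOYMENT_MODE and its values (full|lean, default full) come from the commit message. The function names, the env-object parameter, and the fallback for unrecognized values are assumptions, not the code in this PR.

```javascript
// Hypothetical helper: resolves the deployment mode from an env-like object.
// Assumption: unset or unrecognized values fall back to the default, "full".
const VALID_MODES = ["full", "lean"];

function getDeploymentMode(env = process.env) {
  const raw = String(env.MMGIS_DEPLOYMENT_MODE || "").trim().toLowerCase();
  return VALID_MODES.includes(raw) ? raw : "full";
}

// Convenience predicates a route-mounting module could branch on.
function isFull(env = process.env) {
  return getDeploymentMode(env) === "full";
}

function isLean(env = process.env) {
  return getDeploymentMode(env) === "lean";
}
```

Taking the environment as a parameter keeps the helper testable without mutating process.env.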
In lean deployments the adjacent-server proxy routes never register,
the sidecar spawner is a no-op, the Configure Pug shell receives
WITH_*=false, and init-db skips the mmgis-stac database. Full mode
(the default) is unchanged.

… Configure

In lean deployments the Datasets and Geodatasets API modules never
mount and their Configure nav tabs are hidden. The deployment mode is
plumbed through the Pug shell to window.mmgisglobal for the Configure
SPA, riding the same path as the existing WITH_* flags. Full mode
(the default) is unchanged; the geodatasets table still syncs in both
modes.

…an mode

In lean deployments the Missions/ static stack (including the _time_
compositor riding middleware.missions) never mounts, the link
shortener does not register, and the /api/utils endpoints that read
the on-disk Missions/ tree or local SPICE data (getprofile, getbands,
getminmax, queryTilesetTimes, ll2aerll, chronice) are gated.
proj42wkt and healthcheck stay. Full mode (the default) is unchanged.

In lean mode the Draw module's /api/draw and /api/files routes never
mount, and updateTools skips the Draw tool so it is absent from the
Essence bundle and the Configure tool list. Draw code stays on disk
(Webhooks requires it at load) and its tables still sync in both
modes. Full mode (the default) is unchanged.

The tile-populate-from-x button auto-fills tile fields by calling the
local /titiler proxy and reading tilemapresource.xml from Missions/ —
both gated out of lean deployments. Render the button only outside
lean; the External Service URL fields it would populate stay fully
editable. Full mode is unchanged.

Add a bounded retry around the init-db database connection (covers
ECONNREFUSED-style connection errors, not just SQLSTATE classes), an
optional first-superadmin seed from SEED_SUPERADMIN_* env vars, a
DISABLE_FIRST_SIGNUP gate for the open first-signup page (server
route + admin login page), and a server-side WebSocket ping/pong
heartbeat so idle-closing proxies don't silently drop connections.
All four default to today's behavior when the new env vars are unset.
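The bounded retry mentioned in this commit might look roughly like the sketch below. It is not the code in scripts/init-db.js. The function names, the attempt count, the delay, and the exact list of retryable codes are assumptions, and a wrapped error (for example from an ORM) may need unwrapping before the check.

```javascript
// Hypothetical sketch of a bounded connection retry.
// `connect` is any async function that resolves on success and rejects with
// an Error that may carry a `code` property.
const RETRYABLE_NODE_CODES = new Set(["ECONNREFUSED", "ECONNRESET", "ETIMEDOUT"]);

function isRetryable(err) {
  const code = err && err.code ? String(err.code) : "";
  // Node socket errors use codes like ECONNREFUSED. PostgreSQL SQLSTATE
  // class 08 is "connection exception" and 57P03 is "cannot_connect_now".
  return RETRYABLE_NODE_CODES.has(code) || code.startsWith("08") || code === "57P03";
}

async function connectWithRetry(connect, { attempts = 5, delayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      // Stop on non-retryable errors or once the attempt budget is spent.
      if (!isRetryable(err) || attempt === attempts) break;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

Capping the attempts means a misconfigured host still fails the boot instead of hanging it indefinitely.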
Builds can now produce a backend-less dashboard bundle: SERVER is
build-substituted via env.raw, the dormant calls.js branch dispatches
through a STATIC_HANDLERS table covering all 40 named calls
(bake/compute/drop per the lean API disposition), ServiceUrls resolves
only external URLs in static mode, and backend-dependent surfaces
(login, WebSocket, mission deeplinks, credentialed viewer fetches,
identifier/measure elevation queries, coordinates elevation readout)
short-circuit. Server builds default to SERVER=node and are unchanged.

In static dashboard builds the single-band COG color range is fetched
from the layer's external TiTiler statistics endpoint, shapefile
export computes its .prj client-side instead of calling proj42wkt,
and the time-slider histogram is disabled (the slider itself still
scrubs and animates). Server builds are unchanged.

New Deployments backend module (model syncs in both modes; routes mount
only in lean, admin-gated) with publish/update/delete/list against
per-dashboard CloudFormation stacks (S3 + CloudFront + password
Function, stack name mmgis-dashboard-<id>). publish-static.js bakes a
mission into a static bundle and uploads it; webhooks fire on publish,
update, and delete. Configure gains a Deployments page, a lean-only
nav tab, and a save-bar Publish button. Full mode is unchanged.

Add a delete confirmation modal, make bundle uploads retryable
(explicit ContentLength), skip live stack lookups for deleted rows,
centralize deployment status strings, make publish task re-runs
idempotent when the stack already exists, surface SaveBar publish
errors and guard against double-click duplicates, and deduplicate
requireEnv.

infrastructure/ holds the lean deployment's ECS task definitions,
two-roles-per-task least-privilege IAM (everything pinned to the
mmgis-dashboard-* prefix or the shared asset bucket), the admin
CloudFront config (AllViewer + CachingDisabled, /assets/* behavior),
the password-gate CloudFront Function reference, and the shared S3
asset bucket. deploy-lean.yml builds and pushes the image and rolls
out a new task-definition revision via ECS Express Mode. trust proxy
becomes 2 for the CloudFront->ALB->ECS hop count. The full/upstream
deployment uses none of this.

Give ecr:GetAuthorizationToken its required Resource:"*" (the action
supports no resource scoping; the prior ARN pin was an implicit deny),
grant the admin task role the mmgis-dashboard-* teardown permissions
CloudFormation exercises with the caller's credentials during inline
DeleteStack, drop the dashboards password from the admin task (only
the publish task reads it), and remove the publish task role's unused
runtime secret-read grant.

In lean mode the Upload module writes validated images to the shared
admin asset bucket under assets/<mission>/... and returns the
root-relative /assets/... path, which resolves same-origin in the
admin (CloudFront /assets/* behavior) and in published dashboards
(PR 8 copies the keys into each dashboard bucket). Full mode keeps
writing to Missions/ on disk, unchanged. Validators, size cap, and
the mission path-traversal guard apply identically in both modes.

While any deployment is provisioning, updating, or deleting, the
Deployments page refreshes itself every 15s, and a Configure-level
watcher raises a snackbar when a publish completes (with the dashboard
link), fails, or a delete finishes — so the admin no longer has to
refresh manually to learn the outcome. Polling runs only while a
transition is in flight; one shared poller prevents duplicate request
streams. Frontend-only.
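The "poll only while a transition is in flight, with one shared poller" behavior can be sketched as follows. The status strings, the function names, and the module-level timer are assumptions made for illustration. The commit only states the 15s interval and the three transitional states.

```javascript
// Hypothetical sketch; the status strings below are assumed, not taken from the PR.
const IN_FLIGHT_STATUSES = new Set(["provisioning", "updating", "deleting"]);

function hasTransitionInFlight(deployments) {
  return deployments.some((d) => IN_FLIGHT_STATUSES.has(d.status));
}

// A single module-level timer is shared by every caller, so the page and the
// Configure-level watcher cannot start two request streams.
let sharedTimer = null;

function syncPoller(deployments, refresh, intervalMs = 15000) {
  const needed = hasTransitionInFlight(deployments);
  if (needed && sharedTimer === null) {
    sharedTimer = setInterval(refresh, intervalMs);
  } else if (!needed && sharedTimer !== null) {
    clearInterval(sharedTimer);
    sharedTimer = null;
  }
  return sharedTimer !== null; // true while polling is active
}
```

Calling syncPoller after every list refresh starts the timer when a transition appears and stops it once all rows have settled.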
Keep link-carrying toasts (the published-dashboard URL) on screen until
dismissed, combine simultaneous completion toasts into one message
instead of dropping all but the last, and avoid a dangling dash when a
published row has no URL.

PR bodies in the lean series (#138, #139) reference follow-up.md for
deferred items; commit it and the review/merge/deploy next-steps doc so
those references resolve.

The lean ADR commits to two admin-only WebSocket flows (Configure lock
warnings and layer-update push), but ENABLE_MMGIS_WEBSOCKETS and
ENABLE_CONFIG_WEBSOCKETS default to off, so a by-the-README deploy
shipped them silently disabled. Set both true in the admin task def and
document why.

@github-actions

github-actions Bot commented Jun 10, 2026

✅ Version Already Updated

This PR includes a manual version update to 4.2.11-20260611

No automatic version bump needed.

Add the overhaul-1 plugin-debt items and the non-coder UX gaps surfaced
by the 2026-06-10 integrated sanity check.
slesaad previously approved these changes Jun 11, 2026
Lean 1/13: Deployment-mode foundation
Lean 2/13: Gate sidecar proxy, spawner, and mmgis-stac creation
…into lean/pr-03-datasets-geodatasets

# Conflicts:
#	API/Backend/Config/setup.js
Lean 3/13: Gate Datasets/Geodatasets; expose DEPLOYMENT_MODE to Configure
Lean 4/13: Gate Draw out of lean deployments
Lean 5/13: Gate Missions serving, link shortener, and Missions-bound utils
Lean 7/13: Static frontend mode — SERVER flag, dispatcher, ServiceUrls
…into lean/pr-12-hardening

# Conflicts:
#	scripts/init-db.js
Lean 12/13: Harden boot and runtime for managed hosting
Lean 6/13: Hide the populate-from-COG button in lean Configure
Lean 9/13: Static-mode COG range, projection WKT, time-histogram disable
Lean 8/13: Publish flow — Deployments module, publish task, Configure page
Lean 11/13: AWS infra recipes — ECS, IAM, CloudFront, deploy pipeline
Lean 10/13: Repoint lean asset uploads to the shared S3 bucket

Resolve two pre-existing CI failures inherited from development:

- Playwright npm ci failed with an ERESOLVE peer conflict: react-chartjs-2@5
  requires chart.js@4 but chartjs-plugin-zoom@1 pinned chart.js@3. Upgrade
  chart.js ^3.6.0 -> ^4.4.0 and chartjs-plugin-zoom ^1.2.1 -> ^2.2.0 (v2
  accepts chart.js >=3.2.0), so all chart deps agree on v4 and resolve
  cleanly without --force.

- Docker build failed at the micromamba env step: '. /root/.bashrc' ran
  under /bin/sh (dash) and could not parse the bash-only init block. Call
  the micromamba binary directly with MAMBA_ROOT_PREFIX instead of sourcing
  .bashrc.

Drop the now-unnecessary 'npm install --force' from the Dockerfile.

Verified: npm ci (no flags) succeeds, npm run build compiles against
chart.js v4, and unit tests behave identically before/after the upgrade.

Mission/deployment names containing quotes, &, <, or a literal </script>
previously pasted unescaped into the published dashboard's index.html,
breaking the inline <script> (MAIN_MISSION et al.) or the <title>
(LINK_PREVIEW_*). Escape JS-string-context placeholders via JSON.stringify
(plus < -> \u003c so </script> can't close the element) and HTML-context
placeholders as HTML entities.
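The two escaping contexts described in this commit can be sketched as below. The helper names are illustrative and are not the ones used in publish-static.js. Only the technique (JSON.stringify plus < -> \u003c for script context, HTML entities for HTML context) comes from the commit message.

```javascript
// Hypothetical helpers showing the two escaping contexts.

// For placeholders inside an inline <script>: JSON.stringify yields a quoted,
// escaped JS string literal, and rewriting "<" as \u003c keeps a literal
// "</script>" inside the value from closing the element early.
function escapeForInlineScript(value) {
  return JSON.stringify(String(value)).replace(/</g, "\\u003c");
}

// For placeholders in HTML text or attribute context (for example <title>).
// "&" is replaced first so the entities added afterwards are not re-escaped.
function escapeForHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```

Because \u003c is a valid escape inside a JS string literal, the script still sees the original name at runtime.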
copyPrefix/copyObjectIfExists wrapped the whole "bucket/key" string in
encodeURIComponent, turning every / into %2F. S3 then reads the result as a
single literal bucket name and the copy fails with NoSuchBucket/InvalidArgument,
so publishing any mission with assets errored out. Add buildCopySource() which
encodes each key segment but preserves the / separators and the bucket/key
boundary. Strengthen the copyPrefix test to assert the raw CopySource (the old
assertion decoded it first, masking the bug) and cover a key with a space.
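A minimal sketch of the per-segment encoding this commit describes. The (bucket, key) signature is an assumption, and the real buildCopySource() may differ in shape.

```javascript
// Sketch of the idea behind buildCopySource(); signature assumed.
function buildCopySource(bucket, key) {
  // Encode each key segment on its own so the "/" separators survive.
  const encodedKey = key
    .split("/")
    .map((segment) => encodeURIComponent(segment))
    .join("/");
  // The bucket/key boundary stays a literal "/".
  return `${bucket}/${encodedKey}`;
}

// The buggy form, for contrast: every "/" becomes %2F, so the whole string
// reads as one literal bucket name.
function buggyCopySource(bucket, key) {
  return encodeURIComponent(`${bucket}/${key}`);
}
```

For example, buildCopySource("shared-assets", "assets/My Mission/logo 1.png") returns "shared-assets/assets/My%20Mission/logo%201.png", while the buggy form yields a string with no "/" left in it.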
The DELETE teardown read the bucket only from settings.bucket, which the
async publish task writes at the very end (after it has already uploaded the
bundle). A delete fired after CREATE_COMPLETE but before that write skipped
emptyBucket, so DeleteStack failed on a non-empty bucket and orphaned it.

Extract the teardown into an exported teardownDeployment() and fall back to
the stack's BucketName output (the source of truth, populated in exactly that
window) when settings.bucket is null. Add a unit test covering the fallback
path and the settings.bucket-present shortcut.
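The fallback order can be illustrated with a small resolver. This is a sketch under assumptions: the function name is hypothetical, and `stackOutputs` is taken to be the Outputs array of a CloudFormation stack description, whose entries carry OutputKey and OutputValue.

```javascript
// Hypothetical sketch of the bucket lookup used before emptying the bucket.
function resolveBucketName(settings, stackOutputs) {
  // Shortcut: the publish task has already written the bucket name.
  if (settings && settings.bucket) return settings.bucket;
  // Fallback: read the stack's BucketName output, which exists once the
  // stack is created even though settings.bucket is still null.
  const output = (stackOutputs || []).find((o) => o.OutputKey === "BucketName");
  return output ? output.OutputValue : null;
}
```

A null result means there is no bucket to empty yet, so a caller can skip straight to deleting the stack.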
getTiTilerUrl/getTiTilerPgStacUrl return null in a static build with no
external service configured, but five builders interpolated that null into a
string like "null/cog/tiles/..." — a request to a bogus host. buildTipgUrl
and buildVeloserverUrl already guarded; extend the same 'if (baseUrl == null)
return null' to buildTiTilerCogTilesUrl, buildTiTilerPointUrl,
buildColormapImageUrl and the two pgSTAC builders (same latent bug). Callers
already tolerate null (e.g. LayersTool gates the colormap <img> on a truthy
URL). Add a unit test covering the null return and the node-mode happy path.
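The guard is small, and a sketch makes the failure mode concrete. The builder below is a stand-in: its name, parameters, and path shape are assumptions rather than the real buildTiTilerCogTilesUrl.

```javascript
// Hypothetical builder showing the guard. Without it, a null base URL is
// interpolated as the string "null".
function buildTilesUrlUnguarded(baseUrl, z, x, y) {
  return `${baseUrl}/cog/tiles/${z}/${x}/${y}`;
}

function buildTilesUrl(baseUrl, z, x, y) {
  // `== null` matches both null and undefined.
  if (baseUrl == null) return null;
  return `${baseUrl}/cog/tiles/${z}/${x}/${y}`;
}
```

With a null base the unguarded form produces "null/cog/tiles/1/2/3", while the guarded form returns null so callers can skip the request.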
…up to #4)

buildStacItemsUrl shared the same latent bug as the other builders: it
interpolated getStacUrl()'s result, which is null in a static build with no
configured STAC service, producing "null/collections/.../items". Unlike the
colormap caller, the consumer (LayersTool gears handler) did fetch(stacUrl)
with no null check, so the builder guard alone wasn't enough — fetch(null)
coerces to fetch("null"). Add the builder guard and wrap the fetch in
'if (stacUrl != null)' (an early return would skip the layer-toggle logic that
follows). Add buildStacItemsUrl to the null-return test.

…e coverage

This branch had hardwired the in-browser projStringToWkt converter (which only
covers ~12 projections, returning null otherwise) into the shapefile export in
EVERY mode, so full deployments — which still have the GDAL backend — silently
failed to export shapefiles for any other CRS. Route the conversion through the
existing calls.api('proj42wkt') dispatcher instead: full/node builds hit the
GDAL backend (all projections), static dashboards fall back to projStringToWkt
via STATIC_HANDLERS. This restores the development-branch behavior. Drop the now
unused projStringToWkt import from both tools.

…ckend

Follow-up to #6. Routing solely through calls.api was correct for the full
admin and static dashboards but broke the lean admin: it runs as a live node
server (SERVER=node) with the proj42wkt route unmounted by lean's isFull()
gate, so calls.api 404s on a common projection that the in-browser converter
handles fine.

Try projStringToWkt first (covers common projections, no round-trip, works in
every deployment) and only fall back to calls.api('proj42wkt') -> GDAL when it
returns null. Full admin still reaches GDAL for exotic projections (the #6
win); lean/static have no GDAL, so an unsupported projection fails gracefully.
Because the client always tries first, the export no longer depends on the
node-vs-static gating axis being coherent with lean (the deferred #5 work).
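The client-first order described above can be sketched as follows. `localConvert` stands in for projStringToWkt and `remoteConvert` for the calls.api('proj42wkt') round-trip. Both are passed in as parameters because their real signatures are not shown in this PR, and the promise-based shape is an assumption.

```javascript
// Hypothetical sketch of "try in-browser first, fall back to the backend".
async function resolveWkt(projString, localConvert, remoteConvert) {
  // In-browser conversion: no round-trip, works in every deployment.
  const local = localConvert(projString);
  if (local != null) return local;
  try {
    // Only reached for projections the in-browser converter cannot handle.
    return await remoteConvert(projString);
  } catch (err) {
    // Lean and static have no GDAL route, so treat a failed call as
    // "unsupported projection" instead of letting the export throw.
    return null;
  }
}
```

A null result lets the export report an unsupported projection in lean and static deployments, while full deployments still reach GDAL for the exotic cases.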
…-bucket fixes

Documentation catch-up for the fixes that landed:
- adr.md: delete flow now empties the bucket (name from settings.bucket,
  falling back to the stack BucketName output mid-provision) before
  DeleteStack; clarify CFN teardown empties first.
- api.md / pr-09: proj42wkt is now a fallback — the shapefile export tries
  the in-browser projStringToWkt first and only calls the GDAL route when it
  can't handle the projection (full hits GDAL; lean stays gated 404).
- utils.js: route comment reworded to match the hybrid (fallback, not
  primary).

BhattaraiSijan merged commit 42b9e0a into development on Jun 16, 2026
4 of 5 checks passed

3 participants