auth: fail-fast on missing signing key + enlace doctor CLI (closes #11) #12
Merged
thorwhalen merged 4 commits into main on Apr 21, 2026
When [auth].enabled is true but the signing key env var is unset or shorter than 32 chars, build_backend() now raises EnlaceConfigError instead of silently skipping auth wiring. The old silent path caused a production incident where /auth/* requests fell through to the SPA catch-all and returned index.html (see #11). Loud opt-out: ENLACE_ALLOW_UNSIGNED=1 restores the prior behavior while logging an error banner, for operators diagnosing a broken box. Also: the OAuth ImportError swallow now logs an error listing the configured providers, so a missing authlib install stops being invisible.
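The decision described in this commit message can be sketched as follows. This is a minimal, hedged illustration assuming a plain environment-variable lookup, not enlace's actual code: `resolve_signing_key`, `oauth_available`, the logger name, and the log and error wording are all hypothetical, and `EnlaceConfigError` is redefined locally as a stand-in. The real logic lives in `compose.py:_wire_auth_and_stores` and may be structured differently.

```python
import logging
import os

# Hypothetical logger name; enlace's real logger may differ.
logger = logging.getLogger("enlace.compose")

MIN_KEY_LENGTH = 32  # keys shorter than this are rejected (per this PR)


class EnlaceConfigError(RuntimeError):
    """Local stand-in for enlace's EnlaceConfigError."""


def resolve_signing_key(env=None, key_var="ENLACE_SIGNING_KEY"):
    """Return the signing key, or None if the operator explicitly opted out.

    Raises EnlaceConfigError if the key is unset or too short, unless
    ENLACE_ALLOW_UNSIGNED=1 is set.
    """
    env = os.environ if env is None else env
    key = env.get(key_var, "")
    if len(key) >= MIN_KEY_LENGTH:
        return key
    problem = "unset" if not key else f"only {len(key)} chars (need {MIN_KEY_LENGTH}+)"
    if env.get("ENLACE_ALLOW_UNSIGNED") == "1":
        # Loud opt-out: keep booting without auth, but make it hard to miss.
        logger.error(
            "AUTH DISABLED: %s is %s and ENLACE_ALLOW_UNSIGNED=1; /auth/* is NOT mounted",
            key_var,
            problem,
        )
        return None
    raise EnlaceConfigError(
        f"[auth].enabled is true but {key_var} is {problem}. Set it in the "
        "gateway's environment, or set ENLACE_ALLOW_UNSIGNED=1 to boot "
        "without auth for diagnostics."
    )


def oauth_available(providers):
    """Return True if authlib imports; otherwise log which providers are affected."""
    try:
        import authlib  # noqa: F401  (optional dependency)
    except ImportError:
        logger.error(
            "OAuth providers %s are configured but authlib is not installed; "
            "OAuth login will be unavailable",
            ", ".join(sorted(providers)),
        )
        return False
    return True
```

In this sketch a caller would skip auth wiring only when `resolve_signing_key()` returns `None`, which can now happen only through the explicit opt-out rather than through a silently missing key.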
Probes a running gateway over plain urllib (no new deps) to catch silent-degradation failures that static validation can't:

- signing-key env check (only meaningful when run in the gateway's env; use --envfile or --skip-env-checks from outside)
- oauth importability
- frontend_dir sanity
- HTTP: /auth/csrf must return JSON with a csrf key (this is the probe that would have caught #11 directly, since the SPA catch-all returns text/html when auth is unwired)
- HTTP: each app's frontend mount exists (200, 401, or 3xx; fails on 404 and 5xx so 'protected but responding' stays green)
- HTTP: each app's API mount returns non-5xx

Output: pretty text by default, or --json for CI / deploy pipelines. Exits nonzero on any failure.
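The HTTP checks listed above can be sketched with nothing but the standard library. This is a hedged illustration, not enlace's code: `fetch`, `check_csrf`, `check_frontend`, and `check_api` are hypothetical names. One detail matters for the frontend rule: `urllib` follows redirects by default, so the sketch installs a handler that declines them; otherwise a 3xx login redirect would be reported as whatever the redirect target returns. Status codes the description doesn't mention (for example 403 on a frontend mount) are treated as failures here, which is an assumption.

```python
import json
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Decline redirects so 3xx responses surface as-is."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


_opener = urllib.request.build_opener(_NoRedirect)


def fetch(url, timeout=5.0):
    """Return (status, content_type, body) without raising on 3xx/4xx/5xx."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return resp.status, resp.headers.get("Content-Type", ""), resp.read()
    except urllib.error.HTTPError as err:
        ctype = err.headers.get("Content-Type", "") if err.headers else ""
        return err.code, ctype, err.read()


def check_csrf(content_type, body):
    """/auth/csrf must be JSON with a 'csrf' key; the SPA catch-all serves text/html."""
    if "json" not in content_type.lower():
        return False, f"content-type {content_type!r} (auth unwired? SPA catch-all?)"
    try:
        payload = json.loads(body)
    except ValueError:
        return False, "body is not valid JSON"
    if not isinstance(payload, dict) or "csrf" not in payload:
        return False, "JSON has no 'csrf' key"
    return True, "ok"


def check_frontend(status):
    """Frontend mount: 200, 401, or any 3xx passes; everything else fails."""
    return status in (200, 401) or 300 <= status < 400


def check_api(status):
    """API mount: anything below 500 passes."""
    return status < 500
```

A probe would call `fetch(base_url + "/auth/csrf")`, where `base_url` is the gateway address being checked, and pass the returned content type and body to `check_csrf`. Keeping the checks as pure functions of status, headers, and body makes them easy to unit-test without a live server.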
Summary
Fixes #11. Two changes, two commits:
1. Fail-fast when `[auth].enabled = true` and `ENLACE_SIGNING_KEY` is missing or malformed. This replaces the silent `return` at `compose.py:_wire_auth_and_stores` that caused the production incident described in #11 (Auth silently disabled when `ENLACE_SIGNING_KEY` is missing): the gateway booted clean and `systemctl is-active` said `active`, but `/auth/*` was unmounted and the SPA catch-all returned `<!doctype html>` for every CSRF/login request. The new behavior raises `EnlaceConfigError` with a clear remediation message. Loud opt-out via `ENLACE_ALLOW_UNSIGNED=1` for operators diagnosing a broken box.
2. New `enlace doctor` subcommand: a post-deploy smoke tool. It probes a running gateway over plain `urllib` (no new deps) and reports pass/fail per check. It catches exactly the regression that motivated this PR: when `/auth/csrf` returns `text/html` instead of JSON, doctor fails loudly. It also checks each app's frontend and API mounts, OAuth importability, and `frontend_dir` sanity.

Bonus fix: the OAuth `ImportError` swallow in `_wire_auth_and_stores` (`compose.py:470`) now logs an `ERROR` banner listing the configured providers, so a missing authlib install stops being invisible.

New CLI
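Two pieces of the CLI behavior described in the commit message can be sketched as follows: reading an `--envfile` so the signing-key check can run from outside the gateway's environment, and reporting either pretty text or `--json` with a nonzero exit code on any failure. This is a hedged sketch, not enlace's implementation: `CheckResult`, `parse_envfile`, and `render` are hypothetical names, and the envfile parser assumes a simplified `KEY=VALUE` format (comments and matching quotes only, no variable expansion or `export` prefixes).

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CheckResult:
    name: str
    ok: bool
    detail: str = ""


def parse_envfile(text):
    """Parse simple KEY=VALUE lines; skip blanks and '#'/';' comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignore it
        value = value.strip()
        # Strip one pair of matching surrounding quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def render(results, as_json=False):
    """Return (output_text, exit_code); exit code is 1 if any check failed."""
    failed = sum(not r.ok for r in results)
    passed = len(results) - failed
    if as_json:
        out = json.dumps(
            {"passed": passed, "failed": failed, "checks": [asdict(r) for r in results]},
            indent=2,
        )
    else:
        lines = [
            f"[{'PASS' if r.ok else 'FAIL'}] {r.name}" + (f": {r.detail}" if r.detail else "")
            for r in results
        ]
        lines.append(f"{passed} pass, {failed} fail")
        out = "\n".join(lines)
    return out, 1 if failed else 0
```

A command entry point would print the text and pass the code to `sys.exit`, which is what lets a deploy pipeline gate on the result.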
Tested against live production
After wiring it in, ran `enlace doctor` against `apps.thorwhalen.com`'s gateway (the one that had the original incident): 25 pass, 0 fail. Reproducing the regression by reverting `EnvironmentFile=` in the systemd unit is covered by the tw_platform PR that consumes this as a post-deploy smoke test.

Breaking change?
Technically yes: any deployment that currently has `[auth].enabled = true` and no `ENLACE_SIGNING_KEY` set will now refuse to start. But such a deployment is already broken: `/auth/*` wasn't working, and any protected mount was either unreachable or (worse) unchecked. Failing loudly is strictly safer than the silent serve-the-SPA fallback. `ENLACE_ALLOW_UNSIGNED=1` is the escape hatch for operators who need to boot the gateway without auth for diagnostics.

Test plan
- … (`pytest enlace/tests tests`)
- … (`test_auth_failfast.py`)
- … (`test_doctor.py`)
- … `--skip-env-checks`; 30 pass / 0 fail with `--envfile /opt/tw_platform/.env`
- … `EnlaceConfigError` with actionable message
- `ENLACE_ALLOW_UNSIGNED=1` restores silent path with loud log line