OIDC Support#1746
Open
pranav-super wants to merge 28 commits into
This was referenced Aug 12, 2025
@jmorton @pranav-super We should tag up about this PR in relation to my new changes in #1741 where I've added socket consolidation + user store defined in a client-side store passed around via svelte context. |
Implement complete OIDC authentication flow including: - Server-side OIDC client using Arctic library with PKCE support - Login, callback, logout, and token refresh endpoints - Updated hooks.server.ts to handle OIDC authentication mode - Modified subscribable stores and effects for token management - Request utilities updated for authenticated API calls
Migrate from passing user through PageData to using a centralized userStore for authentication state. This change: - Removes user parameter threading through page components - Updates all stores to access user from centralized auth store - Refactors route layouts to use reactive auth state - Removes unnecessary +page.ts files that only passed user data - Enables role changes to propagate without full page reload # Conflicts: # src/components/plan/PlanMergeReview.svelte # src/routes/+layout.server.ts # src/routes/+layout.svelte # src/routes/constraints/+layout.svelte # src/routes/constraints/+page.svelte # src/routes/constraints/edit/[id]/+page.svelte # src/routes/constraints/new/+page.svelte # src/routes/dictionaries/+page.svelte # src/routes/expansion/+layout.svelte # src/routes/expansion/rules/+page.svelte # src/routes/expansion/rules/edit/[id]/+page.svelte # src/routes/expansion/rules/new/+page.svelte # src/routes/expansion/runs/+page.svelte # src/routes/expansion/sets/+page.svelte # src/routes/expansion/sets/new/+page.svelte # src/routes/external-sources/+layout.svelte # src/routes/external-sources/sources/+page.svelte # src/routes/external-sources/types/+page.svelte # src/routes/models/+layout.svelte # src/routes/models/+page.svelte # src/routes/models/[id]/+page.svelte # src/routes/parcels/+layout.svelte # src/routes/parcels/+page.svelte # src/routes/parcels/edit/[id]/+page.svelte # src/routes/parcels/new/+page.svelte # src/routes/plans/+page.svelte # src/routes/plans/[id]/+page.svelte # src/routes/plans/[id]/merge/+page.svelte # src/routes/scheduling/+layout.svelte # src/routes/scheduling/+page.svelte # src/routes/scheduling/conditions/edit/[id]/+page.svelte # src/routes/scheduling/conditions/new/+page.svelte # src/routes/scheduling/goals/edit/[id]/+page.svelte # src/routes/scheduling/goals/new/+page.svelte # src/routes/sequence-templates/+layout.svelte # src/routes/sequence-templates/+page.svelte # src/routes/tags/+page.svelte # 
src/routes/workspaces/+layout.svelte # src/routes/workspaces/+page.svelte # src/routes/workspaces/[workspaceId]/+layout@.svelte # src/routes/workspaces/[workspaceId]/actions/+layout@.svelte # src/routes/workspaces/[workspaceId]/actions/runs/[runId]/+layout@.svelte # src/routes/workspaces/[workspaceId]/actions/runs/[runId]/+page.svelte # src/stores/sequencing.ts # src/stores/tags.ts
Create rule.ts module for enforcing authentication requirements at the route level, enabling consistent access control across the application.
Implement Playwright tests for OIDC authentication including: - OIDC fixture for handling auth flows in tests - Login/logout test scenarios - Token refresh verification - Updated helpers and AppNav fixture for OIDC support
Align legacy auth routes (login, logout, changeRole) with OIDC implementation by using SvelteKit's event-based cookie API instead of manual header manipulation. Reduces code duplication and improves consistency.
The Client singleton was initiating a fetch for the well-known configuration but not awaiting it, causing endpoint values to be undefined when the constructor completed before the fetch. Changes: - Convert Client.instance to async getter returning Promise<Client> - Move initialization to async init() method that awaits well-known fetch - Update all call sites to await Client.instance - Uncomment OIDC_CLIENT_SECRET env var to fix type error
# Conflicts: # src/routes/workspaces/[workspaceId]/actions/+layout.ts # src/routes/workspaces/[workspaceId]/actions/+page.svelte # src/routes/workspaces/[workspaceId]/actions/runs/[runId]/+layout.ts
…clean up existing fixture
…s reconnecting banner on intentional WS restarts


___REQUIRES_GATEWAY_PR___="131"
Summary
Adds OpenID Connect (OIDC) authentication to plandev-ui, alongside the existing username/password (JWT) and SSO/CAM login modes. Operators authenticate through their organization's identity provider (e.g., Keycloak) using the standard Authorization Code + PKCE flow, and the UI handles refresh, role-switching, and logout end-to-end. OIDC is the standard protocol for organizational single sign-on and is supported by most modern identity providers (Keycloak, Okta, Azure AD, etc.). This PR enables PlanDev to be deployed in environments where the mission team manages access centrally through its existing OIDC infrastructure rather than maintaining a per-app user store. OIDC is off by default and is enabled by setting PUBLIC_AUTH_OIDC_ENABLED=true.
Glossary
- verifier, sends only its SHA-256 hash to the IdP up front, and reveals the verifier only when exchanging the code for tokens. Prevents an attacker who intercepts the authorization code from being able to redeem it.
- /protocol/openid-connect/certs on Keycloak) listing the public keys it uses to sign tokens. Anyone validating a token fetches the matching public key from this URL. This is how plandev-ui, plandev-gateway, and Hasura all independently verify JWT signatures without sharing a secret with the IdP.
OIDC alongside existing auth modes
PlanDev now supports three mutually exclusive auth modes, selected at runtime via env vars:
gateway login() → direct DB write
PUBLIC_AUTH_SSO_ENABLED=true | gateway login() → direct DB write
PUBLIC_AUTH_OIDC_ENABLED=true | gateway session() → direct DB write (new, this PR)
The three are mutually exclusive at runtime because Hasura validates JWTs with one configuration (HS256 or RS256+JWKS) and can't accept both simultaneously on the current Hasura v2.12.1.
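As a rough illustration of how a single mode could be resolved from these flags, here is a minimal sketch. The function name resolveAuthMode, the AuthMode type, the fallback to JWT when neither flag is set, and the decision to throw when both flags are set are assumptions made for illustration; they are not taken from this PR's code.

```typescript
// Hypothetical sketch: pick exactly one auth mode from the env flags above.
type AuthMode = "jwt" | "sso" | "oidc";

function resolveAuthMode(env: Record<string, string | undefined>): AuthMode {
  const ssoEnabled = env["PUBLIC_AUTH_SSO_ENABLED"] === "true";
  const oidcEnabled = env["PUBLIC_AUTH_OIDC_ENABLED"] === "true";

  // Assumption: enabling both is treated as a configuration error, because
  // Hasura can only validate one JWT configuration at a time.
  if (ssoEnabled && oidcEnabled) {
    throw new Error("PUBLIC_AUTH_SSO_ENABLED and PUBLIC_AUTH_OIDC_ENABLED are mutually exclusive");
  }
  if (oidcEnabled) return "oidc";
  if (ssoEnabled) return "sso";
  // Assumption: username/password (JWT) is the mode when neither flag is set.
  return "jwt";
}
```

Failing fast on a contradictory configuration is one possible design choice; a deployment could equally prefer a fixed precedence order.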
The auth-mode router lives in src/hooks.server.ts and dispatches to handleJWTAuth, handleSSOAuth, or handleOIDCAuth. Each ends with event.locals.user populated; downstream code is mode-agnostic.
Architecture: how the pieces talk in OIDC mode
Full first-time login flow:
sequenceDiagram
    autonumber
    participant B as Browser
    participant U as plandev-ui<br/>(SvelteKit server)
    participant I as IdP<br/>(Keycloak)
    participant G as plandev-gateway
    participant H as Hasura
    participant P as Postgres
    Note over B,U: User clicks "Login Using OIDC"
    B->>U: GET /oidc/login
    U->>U: Generate PKCE verifier, state, nonce<br/>(set as short-lived cookies)
    U-->>B: 302 → IdP /authorize
    B->>I: GET /authorize (code_challenge, state, nonce)
    I-->>B: 302 → /oidc/callback?code=...&state=...
    B->>U: GET /oidc/callback
    U->>I: POST /token (code + PKCE verifier)
    I-->>U: access_token, id_token, refresh_token
    U->>U: Verify signatures via JWKS<br/>(issuer, audience, nonce)
    U-->>B: Set-Cookie + 302 → /plans
    Note over B,P: Per request from now on
    B->>U: GET /plans
    U->>U: Verify JWT (JWKS)
    U->>G: GET /auth/session (Bearer token)
    G->>G: Verify JWT (JWKS) + extract claims
    G->>P: SELECT users_and_roles<br/>(INSERT if new, lazy upsert)
    G-->>U: { success: true }
    U-->>B: SSR'd page
    B->>H: GraphQL query/subscribe (Bearer token)
    H->>H: Verify JWT (JWKS)
    H->>P: data
    P-->>H: rows
    H-->>B: data
Notes:
- /auth/session path: called once per server request to validate the JWT and (new in this PR) provision first-time users.
- gateway → SELECT/INSERT users_and_roles). The gateway writes the row directly via SQL with elevated DB credentials, the same pattern JWT/SSO modes use today.
Refresh has its own flow, triggered by a browser-side timer rather than the IdP:
sequenceDiagram
    autonumber
    participant T as Browser timer
    participant B as Browser
    participant U as plandev-ui
    participant I as IdP
    participant H as Hasura (WS)
    Note over T: Set to fire ~10s before<br/>access token's exp claim
    T->>B: setTimeout fires
    B->>U: POST /oidc/refresh
    U->>I: POST /token (refresh_token grant)
    I-->>U: new access/id/refresh tokens
    U->>U: Verify new tokens via JWKS
    U-->>B: Set-Cookie (new tokens)
    B->>B: cookieStore change → restartSharedClient()
    B->>H: close WS (code 4999 "Client Restart") + reconnect<br/>with new accessToken in connectionParams
    B->>B: subscribe(accessTokenDecoded) fires<br/>schedule next refresh for new exp
Why OIDC lives in the UI rather than the gateway
Two pieces of OIDC have to live in the UI no matter what:
- verifier, oidc_state, oidc_nonce) must land on the UI's origin because the IdP redirects back to the UI's /oidc/callback. The gateway runs on a different origin (e.g., localhost:9000 vs the UI's localhost:3000, and different hosts/ports in production), so a cookie set by the gateway wouldn't be sent on the IdP→UI redirect. They have to live on the UI's origin to be available when the callback handler reads them.
What we did move gateway-side is what fits gateway-side naturally: user provisioning (writing permissions.users on first login). That matches what JWT/SSO modes already do today via gateway login() → getUserRoles(). It uses direct DB writes with elevated credentials, bypassing Hasura's per-role insert permissions.
Responsibilities of Services:
/auth/session, lazy user-row upsert via SQL
How API users fit in (aerie-cli, scripts, automation)
OIDC support in this PR is for the browser flow only. API users (aerie-cli, scheduled jobs, custom scripts hitting Hasura directly) wouldn't use any of this code path. They authenticate against the IdP separately using flows appropriate for non-browser clients (device code flow for interactive CLI logins, client credentials grant for service-to-service automation, etc.) and end up holding a JWT they can send to Hasura/gateway/action-server as Authorization: Bearer <token>.
That works because Hasura validates every JWT against the IdP's JWKS regardless of where the token came from. So putting browser-side OIDC logic in the UI doesn't restrict API access in any way: aerie-cli and other consumers go directly to the IdP. The UI's OIDC code is a browser convenience layer, not a chokepoint.
When aerie-cli adds OIDC support, it can use this PR's realm config as a reference for the IdP-side setup (test users, role mappers, default-role attribute) but its client-side code will be different (no PKCE/cookies; device code flow + token storage on disk).
Technical breakdown
OIDC flow (UI server-side)
- src/lib/server/oidc.ts: Client singleton (PKCE), JWT verification via JWKS, lazy init() from OIDC_WELL_KNOWN_URL, signature + audience + issuer + nonce checks per OIDC spec.
- src/routes/oidc/login/+page.server.ts: Generates PKCE verifier, state, nonce; validates the back query param against open-redirect attacks.
- src/routes/oidc/callback/+page.server.ts: Validates state, exchanges code for tokens, verifies nonce, sets cookies, redirects back to the original page.
- src/routes/oidc/refresh/+server.ts: Refreshes access/id tokens; returns 401 on refresh-token rejection so the client can detect and log out.
- src/routes/oidc/logout/+server.ts: Clears local cookies, redirects to IdP logout with id_token_hint.
OIDC flow (UI client-side)
- src/lib/stores/oidc.ts: Refresh scheduling derived from the access token's exp claim; cookie-store listener proactively restarts the GraphQL WS on accessToken change so Hasura doesn't unilaterally close the connection on expiry.
- src/utilities/auth.ts: Shared computeRolesFromJWT (used by all three auth modes).
- src/lib/types/oidc.ts: Shared extractClaims helper + ClaimsConfig type (server and client agree on the JWT claim shape).
Hooks + layout
- src/hooks.server.ts: Three-mode dispatch (JWT/SSO/OIDC), CSP report-only headers, prefix-anchored "non-protected path" detection.
- src/routes/+layout.server.ts: Auth-gate via enforce(locals.user, userIsDefined); prefix-anchored to avoid substring-match false negatives.
- src/routes/+layout.svelte: Reactive user store; OIDC-only cookie-store + refresh wiring.
Gateway-side (cross-repo)
- src/packages/auth/functions.ts: session() now calls getUserRoles (lazy upsert) so OIDC users get a permissions.users row on first sight via direct DB write. Same pattern as JWT/SSO login uses today; bypasses Hasura GraphQL permissions entirely.
User-visible impacts in OIDC mode
What an operator notices that's different from JWT/SSO mode:
- /login says "Login Using OIDC" and routes through the IdP.
- default_role attribute (per-user, mission-configurable).
- /login?reason=Session expired - please log in again. The reason survives the IdP roundtrip via a short-lived server-side logoutReason cookie set by /oidc/logout and consumed by +layout.server.ts when it redirects to /login. WebSocket subscriptions don't blink during a refresh: the proactive restart cycles cleanly with the new token, and the "Reconnecting..." banner is explicitly suppressed for intentional restarts (gated on close code 4999 + reason "Client Restart", so a foreign close at the same code can't accidentally swallow the banner).
- id_token_hint to the IdP's logout endpoint so the IdP session is also destroyed (not just the local plandev session).
How to run
plandev-ui env vars
Required in plandev-ui's .env (or sourced from .env.test.oidc for local OIDC test runs):
- PUBLIC_AUTH_OIDC_ENABLED: set to true to enable OIDC. The login page renders the "Login Using OIDC" button when this is set; otherwise the UI falls back to JWT/SSO.
- OIDC_CLIENT_ID: the ID the IdP assigned to your client (often something like aerie). The PKCE flow sends this on every IdP call.
- OIDC_REDIRECT_URI: where the IdP sends the browser back after auth. Must match what's registered with the IdP. Locally: http://localhost:3000/oidc/callback.
- OIDC_ISSUER: the IdP's issuer URL (e.g., http://localhost:8000/realms/aerie-dev for the test realm). Used to validate the iss claim on incoming JWTs.
- OIDC_JWKS_URL: the IdP's JWKS endpoint (/protocol/openid-connect/certs on Keycloak). The UI fetches public keys from here to verify token signatures.
- OIDC_AUDIENCE: the expected aud claim on the ID token. Typically equals OIDC_CLIENT_ID. Validated only on the ID token per OIDC spec; access tokens are treated as opaque.
- OIDC_SCOPES: space-separated scopes requested at login. openid profile email is the typical baseline; defaults to that if unset.
Either supply the well-known discovery URL OR each endpoint individually:
- OIDC_WELL_KNOWN_URL: the IdP's well-known config URL (usually contains /.well-known/openid-configuration). When set, OIDC_AUTHORIZATION_URL, OIDC_TOKEN_URL, and OIDC_LOGOUT_URL are auto-discovered at startup.
- OIDC_AUTHORIZATION_URL / OIDC_TOKEN_URL / OIDC_LOGOUT_URL: explicit endpoints. Required only if well-known discovery is unavailable.
Optional advanced configuration:
- OIDC_ALGORITHMS: space-separated list of allowed JWT signing algorithms. Defaults to RS256, which most IdPs use by default. Override only if your IdP signs with a different algorithm (RS384, RS512, ES256, etc.).
- OIDC_CLAIMS_NAMESPACE / OIDC_CLAIMS_USER_ID / OIDC_CLAIMS_ALLOWED_ROLES / OIDC_CLAIMS_DEFAULT_ROLE: claim paths within the JWT. Default to the Hasura convention (https://hasura.io/jwt/claims + x-hasura-user-id etc.). Override if your IdP can't be configured to emit Hasura-style claims and you map them in Hasura's claims_map instead.
- PUBLIC_OIDC_CLAIMS_*: same four claim paths, but read by browser-side code. Must match the non-PUBLIC versions; the duplication is needed because SvelteKit's $env/dynamic/public only exposes PUBLIC_-prefixed vars to the browser.
- OIDC_CLIENT_SECRET: not used by the current PKCE flow (PKCE is what replaces the need for a client secret in public/browser clients). Present in .env for forward compatibility if a future flow (e.g., client credentials) is added.
Hasura + gateway env vars
If using OIDC, PlanDev's HASURA_GRAPHQL_JWT_SECRET needs to change from HS256 to RS256 with jwk_url matching OIDC_JWKS_URL. Example:
{
  "type": "RS256",
  "jwk_url": "http://aerie_keycloak:8000/realms/aerie-dev/protocol/openid-connect/certs",
  "claims_namespace": "https://hasura.io/jwt/claims"
}
The same HASURA_GRAPHQL_JWT_SECRET is consumed by plandev-gateway, action-server, and workspace-server; all of them need to validate IdP-issued JWTs against the same JWKS.
For local development against the bundled Keycloak test realm:
Test users (configured in e2e-tests/oauth/realm-export.json):
- AerieAdmin / password: has all three roles
- AerieUser / password: user + viewer
- AerieViewer / password: viewer only
Tear down:
Testing
The OIDC suite covers: login as admin/user/viewer, role switching, refresh, logout, multi-tab refresh coordination, tab-backgrounding refresh.
Manual test scenarios
Prerequisites: OIDC stack up (npm run test:e2e:oidc:setup), UI started with OIDC env vars sourced. Browser must support the Cookie Store API (Chrome/Edge) for automatic token refresh.
Normal operation (happy path)
- /oidc/refresh request appears
- connection_init)
Offline resilience
Role switching
- x-hasura-role in headers
Logout flow
- /login
Long-running session
HMR during development (dev only)
Big decisions and where the complexity is
OIDC lives in the UI; user provisioning lives in the gateway. Why each piece is where it is — see "Why OIDC lives in the UI rather than the gateway" above. Short version: PKCE cookies and refresh trigger have to be in the UI because of how the browser hits the IdP and Hasura; user-row provisioning fits the gateway because that's where every other auth mode writes the row today.
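For readers unfamiliar with what the PKCE cookie protects, the verifier/challenge relationship can be sketched as follows. This is an illustrative sketch using Node's built-in crypto module, not the PR's code (the PR uses the Arctic library for this); the function names generateCodeVerifier and codeChallengeS256 are assumptions chosen for this example.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Hypothetical helper: 32 random bytes, base64url-encoded, gives a
// 43-character verifier (the minimum length RFC 7636 allows).
function generateCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

// S256 method from RFC 7636: challenge = BASE64URL(SHA-256(ASCII(verifier))).
// Only this hash goes to the IdP in the /authorize request; the verifier
// itself stays server-side (here, in a short-lived cookie) until the
// code-for-token exchange.
function codeChallengeS256(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}
```

Because the IdP only ever sees the hash up front, an intercepted authorization code cannot be redeemed without the verifier that is stored on the UI's origin.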
WebSocket lifecycle + token refresh — non-obvious behavior worth a closer read, collapsed because it's longest:
WS Lifecycle & Token Refresh details
Hasura validates the JWT at connection_init and also continuously monitors token expiration, closing WebSocket connections when JWTs expire (observed in Hasura logs: "Could not verify JWT: JWTExpired", close codes 1006/4400).
Implementation:
- INTENTIONAL_RESTART_CODE = 4999, INTENTIONAL_RESTART_REASON = "Client Restart") so the closed handler can recognize our restart and suppress the transient 'reconnecting' state; otherwise the banner would flash on every refresh. Gating on both code AND reason prevents a foreign close at the same code from accidentally swallowing the banner. Trade-off: if our reconnect ever hangs on a network/server issue, the banner won't surface until graphql-ws's own connectionAckWaitTimeout fires (~15s) and a non-4999 close follows.
- accessTokenDecoded (object, never deduped) rather than the numeric delay store (which Svelte primitive-deduped, causing refreshes to silently stop after identical-delay cycles)
- connectionState listener (fast, ~100ms when graphql-ws auto-reconnects) and a 5s fallback timer (kicks graphql-ws out of lazy mode when all subscriptions are terminated)
- /oidc/refresh returning 401 → logout('Session expired - please log in again') → /oidc/logout?reason=... → server stashes the reason in an httpOnly logoutReason cookie (60s TTL) → IdP end_session_endpoint → IdP redirects back to origin (query params stripped) → +layout.server.ts consumes and deletes the cookie, appending &reason=... to the /login redirect → existing /login reason handling surfaces it to the user.
- on.error matches both JWTExpired and JWSError (signature-invalid tokens, e.g., from IdP key rotation or tampering)
Refresh trigger lives in the client. The "gateway" can't intercept Hasura HTTP/WS calls (the UI hits Hasura directly, the gateway isn't a network gateway), so the refresh trigger has to live in the browser. It is setTimeout-based, with a refresh-on-401 fallback in the WS error handler.
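Two of the rules described above are small enough to sketch directly: how long to wait before refreshing, and how to tell an intentional WebSocket restart from a foreign close. The constants INTENTIONAL_RESTART_CODE and INTENTIONAL_RESTART_REASON and the roughly 10-second lead time come from this description; the function names, the CloseInfo type, and the clamping to zero are assumptions for illustration, not the PR's implementation.

```typescript
const INTENTIONAL_RESTART_CODE = 4999;
const INTENTIONAL_RESTART_REASON = "Client Restart";
const REFRESH_LEAD_MS = 10_000; // refresh ~10s before the access token expires

// Hypothetical helper: milliseconds to wait before calling /oidc/refresh.
// expSeconds is the JWT exp claim (seconds since epoch); nowMs is Date.now().
// Clamped to 0 so an already-due token is refreshed immediately (assumption).
function refreshDelayMs(expSeconds: number, nowMs: number, leadMs: number = REFRESH_LEAD_MS): number {
  return Math.max(0, expSeconds * 1000 - nowMs - leadMs);
}

interface CloseInfo {
  code: number;
  reason: string;
}

// Both the code AND the reason must match, so a foreign close that happens
// to use code 4999 still surfaces the "Reconnecting..." banner.
function isIntentionalRestart(close: CloseInfo): boolean {
  return close.code === INTENTIONAL_RESTART_CODE && close.reason === INTENTIONAL_RESTART_REASON;
}
```

A caller might then schedule the refresh with something like setTimeout(refresh, refreshDelayMs(decoded.exp, Date.now())), where refresh and decoded are whatever the store provides.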
Auth-mode env coupling. Hasura's HASURA_GRAPHQL_JWT_SECRET is process-global; one Hasura instance can't validate both HS256 (gateway-signed) and RS256 (Keycloak-signed) tokens simultaneously on v2.12.1. CI runs two phases: the regular suite with HS256, then OIDC with RS256.
Each user's "default role" comes from a setting on their IdP profile, not from a guess. A user can have multiple roles (e.g., an admin who also has user and viewer). When they log in, the UI has to pick one to start in. The previous Keycloak config asked Keycloak to pick a role from the user's list, and Keycloak picked arbitrarily, so an admin could land in viewer-mode unpredictably.
Now each IdP user has an explicit default_role attribute on their profile (e.g., "default_role": ["aerie_admin"]), and that attribute is what gets stamped into the JWT. Two practical benefits:
- viewer for safer day-to-day use even though those users have higher privileges they can opt into via the dropdown. We don't hardcode an "admin > user > viewer" priority anywhere in the UI.
Configured in e2e-tests/oauth/realm-export.json (test realm); each production deployment configures the same on their own IdP.
Open questions
- user cookie ({ id, token }); OIDC mode stores the access token directly as a plain accessToken cookie. The shared getToken() in src/stores/gqlClient.ts:81-102 has to branch on which shape it sees, and similar mode-specific branching exists in the hooks dispatcher and gateway-side cookie handling. Long-term, unifying on the OIDC-style (accessToken cookie + derive id from JWT claim) would collapse a lot of duplication. Out of scope here because it's cross-repo (gateway constructs the user cookie today) and the refactor logically pulls in SSO/CAM's cookie handling too, both modes whose auth we don't want to risk regressing in this PR. Worth its own coordinated PR after this lands.
- src/lib/server/oidc.ts sets secure: !dev on auth cookies. This works in dev (HTTP) and production over HTTPS, but breaks if a production deploy ever runs behind HTTP (e.g., internal network without TLS). Worth a follow-up to thread the request URL in and compute secure = url.protocol === 'https:'.
- setTimeout isn't cancelled when logout() is called, so it can fire once during the navigation to /oidc/logout, producing an extra /oidc/refresh + /oidc/logout pair in logs. No functional impact. Optional fix: cancelScheduledRefresh() called from logout().
- src/stores/plan.ts:111 destructures revision from data without null-guarding. Out of scope for this PR, separate follow-up.
- HASURA_GRAPHQL_JWT_SECRETS (plural) so one Hasura could validate both HS256 and RS256 simultaneously, eliminating the CI two-phase split.
TODOs
- secure flag depend on request protocol, not just dev mode (production-HTTP deploy correctness)
- cancelScheduledRefresh() in logout() to silence the cosmetic duplicate refresh + logout pair
- plan.ts:111 destructure-of-null here or as a separate PR (it's pre-existing, not OIDC-introduced)
- docs/TESTING.md (or rely on this PR description; either way, devs need to know about test:e2e:oidc:setup)
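The first TODO (deriving the cookie secure flag from the request protocol rather than from dev mode) might look roughly like the sketch below. The function name cookieSecureFlag and the structural { protocol } parameter are assumptions for illustration; a real handler would pass the request's URL object.

```typescript
// Hypothetical helper: decide the cookie "secure" attribute from the request
// URL instead of the build mode, so a production deploy served over plain
// HTTP still gets cookies the browser will accept and send back.
function cookieSecureFlag(url: { protocol: string }): boolean {
  return url.protocol === "https:";
}
```

One caveat worth checking before adopting this: behind a TLS-terminating proxy the server may see an http: URL even though the browser connected over HTTPS, in which case the flag would need to come from the forwarded protocol instead.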