release: To Prod by suisuss · Pull Request #1388 · KeeperHub/keeperhub

suisuss · 2026-05-27T04:05:39Z

No description provided.

Reset-password form: the authenticator code input is no longer conditional on a server retry. The fields are now ordered reset code -> authenticator code -> new password -> confirm password, with helper copy that explains the authenticator field is optional for accounts without 2FA. Users who have TOTP enrolled no longer need a round-trip with mfa_code_required to learn the field exists. Step indicator: the current and done bubbles use bg-keeperhub-green instead of the primary navy. The pending state drops the muted treatment so the digit and label sit at the default foreground weight, making the upcoming step legible instead of looking disabled. The digit uses dark:text-background so the dark navy shows on the bright green bubble in dark mode. Same three changes apply to the sign-in dual-factor stepper, the shared DualFactorSteps component, and the TOTP setup dialog so the whole MFA surface stays consistent.

interceptOauthCallback had six silent-return branches plus two success paths and no way to tell which one a failing user took. Each branch now emits a structured warn with provider + status + the relevant identifying fields so Sentry / Loki surface the path. Each log line is keyed on a stable error_type so we can group: - non_redirect_response (Better Auth returned 4xx/5xx) - no_session_token_in_set_cookie (interceptor saw no session cookie) - session_row_not_found (cookie was set but row missing) - user_email_missing (row found but no email to deliver an OTP to) - pending_oauth_mfa_path (TOTP-enrolled user, defer to /verify-mfa) - pending_signup_mfa_path (no TOTP, defer to /enroll-mfa)

Defense-in-depth backstop for the deactivated-user auth bypass. Adds a BEFORE INSERT trigger on sessions that rejects any insert whose owning user has deactivated_at set, covering paths that bypass the app-layer gate in lib/auth-deactivation-guard.ts (future OAuth providers, direct db.insert(sessions) call sites, hook regressions). The rejection raises a dedicated SQLSTATE KH001 plus a fixed message so a future detection layer can classify it by error code alone. Shipped as a hand-authored versioned migration (0090) since drizzle-kit generate is broken on this repo; no snapshot, matching 0086-0088. Tests: static drift assertions on the migration SQL (tests/unit) and a runtime KH001 contract test against a real Postgres (tests/e2e/vitest).

The scheduler selected due workflows filtering only enabled=true, never excluding soft-deleted workflows or those whose owner is deactivated, so stale-but-enabled workflows were dispatched and only stopped by the block_executions INSERT trigger. The five execution entry points (scheduler select, executor dispatch, HTTP execute, webhook, agent-call lookup) each gated lifecycle state differently and had drifted. Introduce lib/workflow/executable.ts as the single source of truth: - workflowExecutableConditions(): SQL WHERE fragment for the select sites (scheduler, agent-call lookup), joining users for the owner-active clause. - getWorkflowExecutability(): in-memory discriminated predicate for the fetch-then-gate sites (executor, execute, webhook), returning a reason so callers preserve their HTTP semantics (webhook 410 disabled vs 404 gone). Both encode the same enabled + not-deleted + owner-active columns. Org/ subscription status stays in the billing layer; the block_executions DB trigger remains a hard backstop. Behavior changes: agent-call lookup now also requires enabled=true; a webhook workflow that is both deleted and disabled now returns 404 (gone) rather than 410.

The runtime test reads the SQLSTATE from err.cause.code (drizzle wraps the postgres-js error in DrizzleQueryError); the docstring claimed err.code. Also corrected the skip note: tests/setup.ts always sets a default DATABASE_URL, so the no-URL skip branch rarely fires - SKIP_INFRA_TESTS is the real opt-out.

The trigger reads committed deactivated_at, so a session insert racing a not-yet-committed deactivation can slip through. Document that this is a backstop behind the app-layer gate, not an airtight guarantee, so no future reader assumes it closes the concurrent-deactivation window on its own.

Qualify the table as public.users so the trigger always reads the real table regardless of the caller's search_path (the function runs SECURITY INVOKER). Removes the search_path-shadowing ambiguity of the unqualified reference. Update the drift test's matcher to expect the qualified name.

Add a self-contained e2e case: a user deactivated then reactivated (deactivated_at back to NULL) can have a session created again. Proves the trigger evaluates the live deactivated_at rather than a one-time state.

The execute route serves both automated dispatch (internal service auth) and the interactive editor "Run" button (session/OAuth). Workflows default to enabled=false, so gating enabled unconditionally would 404 every test run of a not-yet-enabled workflow during authoring. Apply the enabled gate only on the internal/automated branch; the interactive branch still blocks soft-deleted and deactivated-owner workflows but allows a disabled one through so it can be test-run. Also drop the now-redundant access.isDeleted checks - deletion is handled uniformly by the executability gate.

…ated-trigger feat: add sessions INSERT trigger rejecting deactivated owners

The executor fetched the workflow, then issued a second query for the owner's deactivatedAt. Fold it into a single leftJoin so each dispatched message costs one round-trip instead of two. The route sites keep their separate lookup - the app db does not register the workflows relation and a join there would force a large test-harness rework for a single indexed lookup.

The forgot-password route generated its own 6-digit code and wrote it to the verifications table verbatim, sitting next to the better-auth emailOTP rows that storeOTP "encrypted" has been hardening for months. A DB read alone was therefore enough to recover a live reset code, the exact threat the encrypted store was put in to close. Same primitives as the strict-signin path: symmetricEncrypt under BETTER_AUTH_SECRET, stored as <ciphertext>:0 to match the emailOTP suffix shape; constant-time decrypt-and-compare on verify. Also drop the 500 on send failure. The row is already minted and the rate limiter has already debited, so escalating a SendGrid hiccup also leaks the "this email exists" bit that the user-not-found and OAuth-only branches go out of their way to suppress. sendVerificationOTP keeps the failure log path. UI copy shifts from "If an account exists ..." to a confident "Check your inbox for the reset code." for the same reason.

Reset-password form: the "(if 2FA is enabled)" hint and helper paragraph are gone; the field is just labelled "Authenticator code" since the server already knows whether it needs that input. The now-dead forgotNeedsMfa state and the mfa_code_required reveal branch in the client are deleted; the field is always present so there is nothing to reveal. Success toast switches to "Check your inbox for the reset code." to match the new server copy. MFA stepper across the sign-in, reset, /verify-mfa and /enroll-mfa surfaces: - "Email code" and "Authenticator code" input labels picked up the keeperhub-green-dark accent so they read as "the thing to fill in," not just another label. - Pending step number and label drop the muted treatment so upcoming steps stay legible instead of looking disabled. - The TOTP setup dialog's "Skip for now" button on the download-codes step is renamed to "Skip" and made the primary action, themed bg-keeperhub-green with the same dark-on-green contrast pattern the bubbles use.

Org API keys are admin/owner-only at /api/keys via requireAdminOrOwnerWithMfa, but the overlay let any member click "New API Key" and the per-row trash button. The server rejection landed as a generic 403 toast which read as a bug. Members now see the list as before but the create action and the per-row delete are disabled, with a banner above the list and a tooltip on the trash button explaining the requirement. Personal webhook (User) keys stay fully writable for every user since those are user-scoped, not org-scoped. Role state is read from useActiveMember, which is built on authClient.useActiveOrganization, so the gate flips automatically when the user switches orgs without a refresh.

Drives the actual GET /api/internal/schedules handler against a real database and asserts it returns only the healthy workflow's schedule while excluding soft-deleted, disabled, and deactivated-owner ones, then re-includes a schedule after the owner is reactivated. Unmocks the globally-stubbed @/lib/db so the route runs against seeded rows; skips when no database is available.

The module gates five execution entry points, not four.

The execute and webhook routes carried near-verbatim copies of the background kick-off (start() + runId persist + error classification). Extract a single lib/workflow/execute-in-background.ts that both call, parameterized by a log prefix and endpoint, so the two cannot drift (the webhook copy had already lost the credential-handling SECURITY comment).

The K8s runner and in-process executor re-checked enabled and deleted_at as a defensive backstop but not owner deactivation, so they could drift from the dispatcher's gate if the owner was deactivated between dispatch and execution. Route both through getWorkflowExecutability so all three lifecycle conditions are covered, cancelling the execution rather than running it.

The executor created execution rows as "pending" (then runner/in-process flip to "running"), but the execute and webhook routes created them as "running", so API-triggered executions skipped the pending phase. Create rows as "pending" in the routes too and flip to "running" in executeWorkflowInBackground, guarded on the current status so a fast completion is never clobbered back to running. Every entry point now shares pending -> running -> terminal. ("pending" is already the workflow_executions schema default.)

…n status executeWorkflow writes the final execution status itself - via its _workflowComplete step into logWorkflowCompleteDb, which carries the error->success reconciliation, the no-overwrite guard, and richer fields (transactionHashes, error classification, duration). The K8s runner and in-process executor then called updateExecutionStatus again after it returned, overwriting that authoritative row with a simpler one and defeating the reconciliation (verified: executeWorkflow alone leaves the row at "success" with completedAt set, no updateExecutionStatus needed). Drop the post-run updateExecutionStatus(success/error) calls from both paths (schedule bookkeeping and the catch-block fallback stay), and guard updateExecutionStatus so its remaining callers (running, cancelled, fatal fallback) can never overwrite a row that already reached success/cancelled - matching the engine's own KEEP-431 guard. Execution-row output on these paths now matches the API/DevKit path.

The running-execution concurrency cap was only checked on the API route, so schedule/block/event runs dispatched to k8s-job or in-process targets bypassed it entirely. Extract the check into a server-only-free lib/workflow/concurrency that both the route wrapper and the standalone executor share, and enforce it in the executor before creating the execution row. On breach the executor throws so the SQS message is redelivered after the visibility timeout (back-pressure) rather than dropping the scheduled run.

…s guards The executor's billing-guard reimplemented the monthly execution-limit check that lib/billing/plans-server.ts already does - the same count query and the same allow/overage/block branch order, maintained by hand on both sides of the server-only boundary. Extract a server-only-free lib/billing/execution-limit-core (countMonthlyExecutions, decideExecutionLimit, effectiveExecutionLimit) that both call, so the branch logic and count can no longer drift. Each side keeps its own result shape and its own debt source: the route's is Stripe-aware and server-only, the executor's is a plain active-debt sum, so those cannot be merged. Feature gating is already shared via lib/features/workflow-validator (validateWorkflowFeatures), so only the inherently db-bound plan lookup differs there - left as is. Minor: the executor's active-debt block result now reports the actual monthly used count instead of 0 (it now counts before deciding, like the route).

…flow call The in-process executor and K8s runner call executeWorkflow directly rather than start() from workflow/api, which the DevKit editor plugin flags. That is deliberate: these are standalone processes with no DevKit run-processor, so the pod is the execution boundary and runs the workflow synchronously. Document the rationale and the durability tradeoff (no checkpoint/resume; with backoffLimit:0 a crashed pod leaves the row "running" until a sweeper closes it) at both call sites. Routing through start() would persist runs nothing executes, so it is not the fix; the stuck-running gap is tracked separately.

The two success-path diagnostics for the OAuth interceptor were logSystemWarn, which routes through captureException and surfaces in Sentry as a warning. Every legitimate OAuth login by a member without TOTP would therefore page Sentry with pending_signup_mfa_path, drowning the actual signals we added the logs for. Switch the two success branches to console.info with the same structured payload. The four genuine-failure branches (non_redirect_response, no_session_token_in_set_cookie, session_row_not_found, user_email_missing) stay on logSystemWarn because those should page.

…fa-stepper-green fix(auth): reset-password UX, MFA stepper polish, API-keys role gate

The runner and in-process executors no longer relied on their own write to close the execution row, deferring entirely to executeWorkflow's internal _workflowComplete write. That write is best-effort (caught and logged, never rethrown), so if it was lost the row stayed stuck running with no recovery. Tighten updateExecutionStatus to only transition a non-terminal row (exclude success, error, and cancelled), then re-issue the success/error write from the runner and in-process success branches. The guard makes the re-issued write a no-op once the engine's own terminal write landed - so it cannot clobber the reconciled status or richer fields - and closes the row only when that write was lost.

countMonthlyExecutions sums two COUNT(*) subqueries; postgres-js serializes bigint as a string, so used came back as a string for the executor billing guard (the query builder it replaced cast to ::int and returned a number). Cast each COUNT to ::int so both the Next billing path and the executor get a numeric used.

fallbackCompleteExecution was defined but never called anywhere, and its two required env vars (EVENTS_SERVICE_API_KEY, NEXT_PUBLIC_APP_URL) are not wired into the executor, so it would no-op even if invoked. It also wrote via an endpoint guarded only against cancelled, so it could clobber a terminal status. The guarded same-handle backstop now covers the lost-write case; drop the dead indirection and its test.

The agent-call lookup gated on workflowExecutableConditions (which folds in enabled), so a listed-but-disabled workflow fell out of the result set and collapsed to a generic 404 - misreporting a paused workflow as gone, and discarding the disabled-vs-gone distinction the rest of the lifecycle work preserves. Split the predicate: workflowReachableConditions gates the hard-gone states (soft-deleted, deactivated owner) in SQL as 404, and the route checks enabled in-memory on the loaded row, returning 503 temporarily-unavailable for a disabled workflow. No info leak since the listing is already public. workflowExecutableConditions now composes on top of the reachable predicate so the scheduler select is unchanged.

Real-DB test proving the guard that makes the runner/in-process re-issued write a safe backstop: it closes a still-running row, and never overwrites a row already in a terminal state (success, error, cancelled). The error case exercises the ne(status,'error') clause specifically, so a re-issued write cannot clobber the engine's reconciled terminal status.

…cle-filter fix: gate workflow execution on a shared lifecycle predicate

joelorzet and others added 30 commits May 26, 2026 21:48

test: TECH-12 cover session insert after owner reactivation

406140b

Add a self-contained e2e case: a user deactivated then reactivated (deactivated_at back to NULL) can have a session created again. Proves the trigger evaluates the live deactivated_at rather than a one-time state.

Merge pull request #1385 from KeeperHub/feat/TECH-12-sessions-deactiv…

4dd72ce

…ated-trigger feat: add sessions INSERT trigger rejecting deactivated owners

docs: KEEP-611 correct entry-point count in executable predicate

4227733

The module gates five execution entry points, not four.

Merge pull request #1387 from KeeperHub/fix/reset-password-form-and-m…

1975028

…fa-stepper-green fix(auth): reset-password UX, MFA stepper polish, API-keys role gate

Merge pull request #1386 from KeeperHub/fix/KEEP-611-scheduler-lifecy…

71017b8

…cle-filter fix: gate workflow execution on a shared lifecycle predicate

suisuss temporarily deployed to staging May 27, 2026 04:05 — with GitHub Actions Inactive

suisuss temporarily deployed to staging May 27, 2026 04:17 — with GitHub Actions Inactive

suisuss merged commit ca30461 into prod May 27, 2026
25 checks passed

OleksandrUA had a problem deploying to staging May 27, 2026 16:42 — with GitHub Actions Failure

OleksandrUA had a problem deploying to prod May 27, 2026 17:06 — with GitHub Actions Failure

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

release: To Prod#1388

release: To Prod#1388
suisuss merged 31 commits into
prodfrom
staging

suisuss commented May 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

suisuss commented May 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants