release: To Prod by suisuss · Pull Request #1359 · KeeperHub/keeperhub

suisuss · 2026-05-25T02:06:53Z

No description provided.

The scheduler dispatcher polls every 60s, so the effective firing resolution is 60s. A row with `interval_seconds = 30` doesn't fire every 30s -- it fires every poll cycle (every 60s), because shouldTriggerInterval's `elapsedMs % intervalMs < 60_000` predicate is always true when intervalMs < 60_000. Silent under-firing, not a crash. Same class of bug as the original KEEP-575 "UI says X, schedule does Y" mismatch. The UI minimum is 1 minute so UI traffic never hits this path. Only API / DB-direct writes can land sub-60s values; reject them so the config falls back to cron (or no-op) at sync time instead of being stored and silently miscadenced. Three new boundary tests via extractScheduleConfig: - interval=30s -> cron fallback - interval=59s -> cron fallback (one below the floor) - interval=60s -> interval mode accepted (the exact floor)

…er script scripts/scheduler/schedule-dispatcher.ts is a standalone variant that parses cron_expression directly. Since KEEP-575 PR #1282 commit 2510ed5, interval-mode rows store a fixed `0 0 1 1 *` sentinel in that column ("00:00 on Jan 1"); if this script parsed it, it would "fire" the schedule once a year. Add intervalSeconds to the SELECT, filter out non-null values, and log the skipped count so the omission is visible rather than appearing as an unexplained drop in dispatched schedules. The production dispatcher in keeperhub-scheduler/schedule-dispatcher/dispatch.ts already handles both modes correctly; this fix only narrows the standalone variant to its actual capability.

updateScheduleStatus in workflow-runner-profiled.ts computes next_run_at by parsing cron_expression. For interval-mode rows the column holds a `0 0 1 1 *` sentinel; parsing it would write "Jan 1 next year" to next_run_at -- visibly wrong for an "every N minutes" schedule. The production updateScheduleStatus in lib/schedule-service.ts (and its peer in keeperhub-executor/lib/db-helpers.ts) handles interval mode correctly via computeNextIntervalRunTime. This profiling runner is invoked manually and isn't on the critical timing path, so log and skip instead of pulling the interval helpers in.

…ent demotion The original parser-level guardrail (commit 373299c) returned null for sub-60s values, causing extractScheduleConfig to silently fall through to the cron branch. The save endpoint then returned 200 while either demoting the schedule to cron mode (if a cron expression was set) or deleting the schedule entirely (if not). The dispatcher stopped miscadencing, but the API contract still lied to the caller. Promote the parser-level guardrail to a typed throw, propagate it through the service layer as a distinguishable result code, and have the workflow save routes return 400 SCHEDULE_INTERVAL_TOO_SMALL. - parseIntervalSeconds throws IntervalTooSmallError (carries raw + minimum) for sub-60s values. Absent / empty / non-numeric inputs still return null -- only out-of-range numeric values throw. - syncWorkflowSchedule catches the throw and returns { synced: false, error, code: "interval_too_small" }. Other sync failures (timezone, cron) keep their existing warn-and-continue behaviour; only this one is promoted to a hard 400. - PATCH /api/workflows/[workflowId] runs extractScheduleConfig BEFORE the DB update so a rejected sub-60s value never lands as persisted nodes paired with a missing or miscadenced schedule row. - Admin enable-workflow returns 400 on the same code for consistency, in case a pre-validation-era row reaches it. Tests: - 3 boundary tests in schedule-service.test.ts now assert IntervalTooSmallError is thrown instead of asserting cron fallback, plus a new test verifying error.raw and error.minimum are populated so callers can render structured error bodies. - New integration test tests/integration/workflow-schedule-validation-route.test.ts covers the PATCH route: 30s / 59s reject, no partial DB write on reject, 60s accept, manual trigger unaffected. - tests/integration/workflow-listing-route.test.ts mock switched to vi.importActual + override because the route now imports extractScheduleConfig alongside syncWorkflowSchedule. DB enforcement (a CHECK constraint on workflow_schedules.interval_seconds) is deliberately not in this commit -- it's a separate concern and a separate change.

The 60s floor is used by parseIntervalSeconds today and will be the source of truth for any future code that needs to express "the minimum interval" (UI validators, zod schemas at API boundaries, a future DB CHECK constraint). Keeping it as a private module const guarantees drift -- re-declared 60s here, hard-coded 60s in a validator there. Pure export change; no behaviour difference.

…andalone dispatcher The profiling runner skip in scripts/runtime/workflow_runtime_analysis/workflow-runner-profiled.ts uses console.warn for the same "this script only handles cron mode, skipping the interval-mode row" event. The standalone dispatcher was using console.log for the equivalent skip, which is inconsistent and buries the omission in info-level noise. Align both to warn.

The existing boundary tests (in tests/unit/schedule-service.test.ts) exercise the parser transitively via extractScheduleConfig. Their assertions describe the downstream function's behaviour ("falls back to cron when ...") rather than the parser's own contract, so they couple to caller semantics that may change. Add 19 direct tests pinning the parser's three observable outcomes: absent input (undefined / null / empty / NaN / non-numeric) -> null unusable input (zero / negative / Infinity / objects) -> null numeric < MIN_INTERVAL_SECONDS -> throws numeric >= MIN_INTERVAL_SECONDS -> floored int Includes the boundary cases (59, 60), the fractional edges (59.9 floors to 59 and throws; 60.7 floors to 60 and passes), and a populated-error assertion verifying raw and minimum are accessible on the thrown IntervalTooSmallError so callers can render structured 400 bodies.

…one dispatcher The filter `(s) => s.intervalSeconds === null` is intentionally strict rather than `!= null && >= 60`. A row with `interval_seconds = 0` falls into the "skip as interval-mode" bucket rather than the cron bucket -- which is the safe choice, because the cron path would parse the `0 0 1 1 *` sentinel and fire the schedule once a year. The case isn't reachable through the API today (parseIntervalSeconds returns null for zero) and would be impossible if a DB CHECK constraint enforcing `interval_seconds IS NULL OR >= 60` lands. But the script's filter is the only defense against a 0 value reaching this code path from a script, psql, or future migration that bypasses the service layer, so call it out explicitly rather than leaving it as an implicit consequence of the `=== null` choice. No behaviour change; comment-only.

CHECK (interval_seconds IS NULL OR interval_seconds >= 60) on workflow_schedules. The application-layer rejection added in 47a6c5f covers normal API traffic, but anything that bypasses the service layer (scripts, psql, future migrations, refactors that drop the throw-handling) can still write sub-60s values. The CHECK is the defense-in-depth layer that makes the floor impossible to violate at the storage level. Constraint name workflow_schedules_interval_seconds_min_60 matches the project's existing convention (table_column_kind pattern visible in workflow_schedules_workflow_id_workflows_id_fk etc.). Pre-flight (2026-05-21): - staging: SELECT COUNT(*) FROM workflow_schedules WHERE interval_seconds IS NOT NULL AND interval_seconds < 60 -> 0 rows - prod: same query -> 0 rows Zero existing violators means the migration lands clean in a single ALTER TABLE; no NOT VALID / VALIDATE two-step and no backfill needed. The table is small (single-digit thousands of rows), so the brief ACCESS EXCLUSIVE lock during the validation scan is functionally instant. The literal 60 in SQL mirrors MIN_INTERVAL_SECONDS in lib/cron-utils.ts -- SQL can't import from TS, so the two must be kept in sync if the floor ever changes. A comment on both sides flags this.

…t migration Pre-flight on 2026-05-22 against staging (39 rows) and prod (396 rows). Both well under the ~100k threshold where ADD CONSTRAINT ... CHECK's ACCESS EXCLUSIVE validation scan would meaningfully block concurrent traffic. Single-statement ALTER TABLE is correct at this size; the validation scan is microsecond-fast. Documented the verified row counts and the threshold at which a future maintainer should switch to the NOT VALID + VALIDATE pattern so the decision is reproducible. Comment-only; migration SQL is unchanged.

…riminator The previous shape had code as an optional field on the failure branch: | { synced: false; error: string; code?: "interval_too_small" }; That left call sites free to ignore code without TS complaint. If a future failure category lands (e.g. an "invalid_webhook_url" hard fail), existing call sites won't be forced to handle it -- they'll silently fall into the soft path. Replace with a required kind discriminator and split hard/soft into two distinct variants: | { synced: false; kind: "soft"; error: string } | { synced: false; kind: "hard"; code: "interval_too_small"; error: string } Hard failures MUST be surfaced as a 400; soft failures (timezone, cron) are warn-logged and the save still succeeds. Adding a new hard-kind code now requires updating every call site's narrowing, caught by TS exhaustiveness checks. Call sites updated: - app/api/admin/test/enable-workflow/route.staging.ts: switches the branch from `syncResult.code === "interval_too_small"` to `syncResult.kind === "hard"`. - app/api/workflows/[workflowId]/route.ts: handlePostUpdateSideEffects reads `syncResult.error` only when `!syncResult.synced`, unchanged. No behaviour change for working paths; new shape just enforces exhaustive handling at the type level.

The try/catch around extractScheduleConfig in syncWorkflowSchedule only handles IntervalTooSmallError -- any other thrown error rethrows. This contract was documented in code (the `throw error` line) but not pinned by a test. Three tests: 1. parseIntervalSeconds throws generic Error -> syncWorkflowSchedule rejects with the same Error. 2. parseIntervalSeconds throws a custom subclass that is NOT IntervalTooSmallError -> rejects, preserves subclass identity. 3. Sanity: parseIntervalSeconds throws IntervalTooSmallError -> syncWorkflowSchedule returns { synced: false, kind: "hard", code: "interval_too_small" } and does NOT rethrow. The third test guards against a refactor that deletes the catch block entirely (which would make tests 1 and 2 still pass for the wrong reason -- any error would rethrow because there's no catch). Implementation note: parseIntervalSeconds is the only thing inside extractScheduleConfig that throws today, so it's the natural injection point. Mocked via vi.importActual + override to keep IntervalTooSmallError real (instanceof checks in syncWorkflowSchedule still work).

Two new tests for the route's sync-result handling: 1. kind:hard (interval_too_small) -> 400 SCHEDULE_INTERVAL_TOO_SMALL with the error message echoed in `message`. Direct coverage of the branch added in 98959c9, which had no test until now. 2. kind:soft (invalid timezone) -> 200, schedule sync result is included in the response body. Pins the "warn-and-continue" behaviour so it can't silently flip to 400 in a future refactor of ScheduleSyncResult. The happy-path test (sync succeeds -> 200) was already present and unchanged. The mocked sync return values use the new discriminated shape from 33a3955 (kind: "hard" | "soft") to assert call-site behaviour rather than the looser code? field that existed before.

App-layer signup defenses ahead of retiring the emergency signup whitelist trigger from migration 0084. - BetterAuth captcha plugin (Cloudflare Turnstile) gated to /sign-up/email only. Production fails fast at module load if the secret key is missing; CI/test and admin-test-endpoint modes skip the plugin so existing Playwright signup tests keep working. - Per-IP signup rate limit (5/hour) added to rateLimit.customRules, declared before the existing /* bypass rule so first-match wins. Chains through rateLimitBypassRule so the X-Test-API-Key bypass continues to work for E2E. - AuthDialog mounts the Turnstile widget in the signup view when NEXT_PUBLIC_TURNSTILE_SITE_KEY is set, passes the token via signUp.email({ fetchOptions: { headers: { "x-captcha-response" } } }), resets the widget on view-leave and every error path so single-use tokens never get reused. - .env.example documents both env vars plus Cloudflare's published always-pass / always-fail test keys for local dev. - New tests/unit/signup-defenses.test.ts asserts plugin config shape, the four skip predicates, the production-throw, and rate-limit rule precedence + bypass behavior. Verification: - pnpm vitest tests/unit/signup-defenses.test.ts -> 8/8 pass - pnpm type-check -> 0 errors - pnpm check -> remaining errors are pre-existing in unrelated files OAuth (GitHub/Google) signup is intentionally not captcha-gated; the provider already authenticates the user.

next build evaluates route modules during "Collecting page data" with NODE_ENV=production but no runtime secrets injected. The hard assertion at module load was therefore crashing the Docker build even though it correctly fires at server boot. Skip the assertion when NEXT_PHASE === "phase-production-build". The same check still fires under phase-production-server (next start) and under any custom server that doesn't set NEXT_PHASE, so production runtime stays fail-fast. New unit test covers the build-phase skip.

Retires the block_user_signup_security_2026_05_21 trigger + function installed via migration 0084 during the 2026-05-21 incident. App-layer signup defenses (Cloudflare Turnstile gated on /sign-up/email plus per-IP signup rate limit) replace it. Identifiers in the DROP statements are explicit. The sibling block_executions_security_2026_05_21 trigger (migration 0082) and the cascade-deactivation triggers (migration 0085) are untouched. Verified locally against a fresh DB: After 0086: only block_executions_security_2026_05_21 remains. block_user_signup_security_2026_05_21 and its function are gone. Merge gates: - KEEP-621 (app-layer defenses) live in prod. - Detection layer live. - Staging soak shows acceptable Turnstile success rate.

DROP TRIGGER IF EXISTS matches by name. If the trigger was renamed ad-hoc on prod (the same drift pattern TECH-11 is reconciling), the drop silently no-ops and the signup gate stays live with nobody noticing. The pg NOTICE forces the mismatch into deploy logs so on-call sees it during the rollout. Doesn't block the migration - this is observability, not a gate.

Catches future edits that would widen the DROP target, remove the drift NOTICE, or accidentally touch the sibling triggers that must stay in place (block_executions_security_2026_05_21, block_database_integrations_security_2026_05_21 - though 0083 already retired the latter, the assertion is defense against re-introduction). Strips SQL line-comments before checking identifier names so the migration's prose can legitimately reference sibling triggers when explaining why they're left alone.

Migration 0085 adds a single cascade_user_deactivation trigger, not a family of "cascade-deactivation triggers". Comment now names the trigger explicitly so a future reader grepping for it lands in the right place. Test also gains a structural assertion that the cascade trigger isn't named in any DDL statement - same defense as for the executions kill-switch.

Defends the OAuth + email-OTP re-entry vector. New sessions whose login-time country (CF-IPCountry attested) differs from the user's recent session history are flagged in sessions.risk_flags_json; if the user has TOTP enrolled the session is minted with requires_mfa=true and getDualAuthContext returns a mfa_required code on every subsequent request until the user passes the step-up at /verify-mfa. Server: - twoFactor() Better Auth plugin wired with TOTP + backup codes. Email-OTP-as-second-factor is intentionally left without a sender since email is already our primary login factor. - assessLoginRisk reads CF-Connecting-IP / CF-IPCountry and compares against the user's last 25 session attestations. New-country logins flag the session; null geo (local dev / direct origin) is treated as inconclusive. - session.create.before computes the signal and merges requires_mfa + risk_flags_json into the session insert. - Custom enrollment endpoints bypass the plugin's password gate so OAuth / email-OTP users can enroll: /api/user/totp/setup, /enroll (atomic verify + backup-code mint, forwards rotated session cookie), /backup-codes (TOTP-gated regenerate), /verify-stepup, /disable, /status. Anonymous users (name "Anonymous" / temp- email) are refused. - two_factor.user_id UNIQUE + INSERT ... ON CONFLICT DO UPDATE keeps enrollment idempotent under concurrent /setup calls. Client: - TwoFactorSection in Settings -> Security shows compact status row. - TotpSetupDialog: linear three-step wizard (scan -> verify -> save codes) with horizontal numbered StepIndicator matching the deploy- safe wizard. The save-codes step locks every close path (X, ESC, outside-click, beforeunload) until copy + download + retype-confirm pass. - TotpManageDialog: name, enrolled date, regenerate codes (TOTP- gated), disable (TOTP-gated). - /verify-mfa page redirects out unless the current session has requires_mfa=true; on success clears the flag and stamps mfa_verified_at. Schema (migration 0086): - two_factor (id, user_id UNIQUE, secret, backup_codes nullable, name, enrolled_at) with secrets and codes encrypted by BETTER_AUTH_SECRET. - users.two_factor_enabled flag. - sessions.requires_mfa / mfa_verified_at / risk_flags_json. Tests: tests/unit/login-risk.test.ts covers no-CF-headers, first-attestation, known-country, new-country, malformed JSON, and current-country exclusion paths.

Three layered checks on the two highest-risk wallet actions: - role = "owner" in the active org (admins are no longer accepted) - users.two_factor_enabled = true (action refused if not enrolled) - sessions.requires_mfa = false (step-up cleared on this session) All three combine in lib/middleware/owner-mfa-guard.ts and return distinct error codes (not_owner / mfa_not_enrolled / mfa_pending) so the client can branch UX without parsing strings. Routes updated: - /api/user/wallet/withdraw - /api/user/wallet/export-key/request - /api/user/wallet/export-key/verify (mirrors the gate on /request so an attacker who somehow obtains the wallet-inbox OTP can't skip straight to /verify) The wallet-creator and wallet-inbox-email-OTP checks on the export flow remain in place; this PR adds the owner+MFA layer on top.

Adds two helpers alongside the existing requireOwnerWithMfa: - requireAdminOrOwnerWithMfa: same shape, accepts either role. - requireMfaEnrolled: self-scoped (no org role check), gates only on users.two_factor_enabled and sessions.requires_mfa. Switches all three helpers to a discriminated GuardOk | GuardError return so callers can narrow with `if (!guard.ok)` instead of an `"error" in guard` shape that didn't survive TypeScript narrowing on the success branch. Updates the three existing callsites (withdraw, export-key request, export-key verify) to the new narrowing. The actions they protect are unchanged from the previous commit — owner-only + MFA enrolled + step-up cleared — only the narrowing syntax changes.

Protects: - POST /api/keys (mint a new organization API key) - DELETE /api/keys/[keyId] (revoke / soft-delete an organization API key) Both routes now require role = admin or owner in the active org, users.two_factor_enabled = true, and sessions.requires_mfa = false before the action is honored. Returns 403 with code one of not_admin_or_owner / mfa_not_enrolled / mfa_pending. Why both legs symmetrically: - Minting an org API key produces a long-lived credential that bypasses any future session MFA. - Revoking is the inverse; without the same gate an attacker on a session could rotate keys (revoke + mint) to lock owners out and substitute their own.

Protects: - POST /api/api-keys (mint a new user-scoped API key) - DELETE /api/api-keys/[keyId] (delete a user-scoped API key) User-scoped keys (wfb_ prefix) live on the apiKeys table keyed by userId, not organizationId, so the owner-role gate does not apply. Both legs now require users.two_factor_enabled = true and sessions.requires_mfa = false via requireMfaEnrolled. Returns 403 with code mfa_not_enrolled or mfa_pending. User-scoped keys are the highest-leverage MFA-immune credential a stolen session can produce — once minted, the key authenticates out-of-band of the session entirely. Gating creation is critical; gating deletion keeps the create/delete pair symmetric so a compromised session cannot rotate the user's keys.

Protects: - POST /api/agentic-wallet/[id]/approve (authorize a pending fund-moving operation) - POST /api/agentic-wallet/[id]/reject (cancel a pending operation) Existing checks remain in place: the caller must be the wallet's linkedUserId (proves wallet ownership). On top of that, both legs now require users.two_factor_enabled = true and sessions.requires_mfa = false via requireMfaEnrolled. Approve is the obvious target — it commits a fund movement. Reject gets the same gate because a stolen session could otherwise DoS legitimate requests by pre-emptively cancelling them, and keeping the pair on identical authorization rules prevents skew if either endpoint changes later. Out of scope here: /api/agentic-wallet/link. That route already verifies HMAC proof-of-possession of the wallet's provisioning secret in addition to the session; HMAC is a stronger factor than TOTP for that flow.

Adds lib/client/handle-guard-error.ts. Reads a 403 response from any of the routes gated by lib/middleware/owner-mfa-guard.ts, parses the discriminated GuardError body, and dispatches by `code`: - mfa_not_enrolled -> calls onEnrollMfa (callsite opens Settings) - mfa_pending -> calls onPendingMfa(currentPath) (callsite navigates to /verify-mfa?next=<currentPath>) - not_owner -> toast with role explanation - not_admin_or_owner -> same Returns true when handled so the caller stops without surfacing a generic error toast on top. Returns false for non-403 responses and for 403s without a recognized code, leaving the caller's existing error path intact. Callsites are wired in subsequent commits.

Protected route this UI calls: - POST /api/user/wallet/withdraw Before: a 403 from the owner+MFA gate surfaced as a raw "Withdrawal failed" toast carrying the server's error string. The user had no path forward. Now: routes the response through handleGuardError. If the server returned a recognized GuardError code, the helper handles it and the modal returns to the input state instead of dropping into the red "error" state. Specifically: - mfa_not_enrolled -> opens Settings overlay so the user can enable TOTP without leaving the wallet flow - mfa_pending -> redirects to /verify-mfa?next=<current path> - not_owner -> toast explaining only owners can withdraw Non-guard 403s and other failures still drop through to the existing error branch.

Protected routes this UI calls: - POST /api/user/wallet/export-key/request (send wallet-inbox OTP) - POST /api/user/wallet/export-key/verify (consume OTP + return key) Both legs are gated by requireOwnerWithMfa on the server, so either can return a 403 with code not_owner / mfa_not_enrolled / mfa_pending. Routes both fetch calls through handleGuardError with a shared options object: - mfa_not_enrolled -> closes the export dialog, opens Settings so the user can enable TOTP - mfa_pending -> closes the dialog, redirects to /verify-mfa with the current URL as `next` - not_owner -> toast explaining only owners can export On the /request leg the dialog falls back to "idle". On /verify it falls back to "otp" so the user can retry with a refreshed code if the gate failure was about step-up (which they can clear in another tab) rather than enrollment.

Protected routes this UI calls: - POST /api/keys (create org API key) - DELETE /api/keys/[keyId] (revoke org API key) - POST /api/api-keys (create user API key) - DELETE /api/api-keys/[keyId] (delete user API key) The same overlay handles both tabs (Webhook = user-scoped, Organisation = org-scoped). Each underlying endpoint is gated by requireAdminOrOwnerWithMfa or requireMfaEnrolled on the server. Routes both create and delete through handleGuardError. On a recognized 403: - mfa_not_enrolled -> closes the overlay stack and opens Settings so the user can enroll TOTP - mfa_pending -> closes the stack and redirects to /verify-mfa with the current URL as `next` - not_admin_or_owner -> toast explaining the role requirement Non-guard errors keep their existing toast paths intact.

Replaces the wallet-inbox email-OTP factor on the export-key flow with a TOTP fresh challenge using the same authenticator the user enrolled for KEEP-619. The owner+MFA gate (requireOwnerWithMfa) stays in place; this is purely swapping the second factor. Server: - DELETED /api/user/wallet/export-key/request No more email send, no more keyExportCodes write. - REWROTE /api/user/wallet/export-key/verify Accepts { code: <TOTP> }. Verifies via auth.api.verifyTOTP (the same primitive used by /verify-mfa and the manage dialog). Wallet-creator check + Turnkey export unchanged. UI: components/overlays/wallet/export-private-key-button.tsx - Removed the "requesting" step that fired the email. - Dialog now opens directly to the TOTP code input. - Step states reduced from 5 to 4 (idle/totp/verifying/done). Why drop the wallet-inbox factor: - Single source of truth for second factors. Users were trained on TOTP via the KEEP-619 enrollment UI; the wallet-inbox flow was a separate mental model with its own UX. - The wallet-inbox email was often the user's own account address anyway, so the factor was effectively single-channel (compromise the user's mailbox -> compromise both). - TOTP secret is encrypted by BETTER_AUTH_SECRET and bound to a physical device; email OTPs traverse SMTP and live in an inbox the attacker can already read after most compromises. The owner+MFA passive gate (role=owner, MFA enrolled, step-up cleared) plus this fresh TOTP challenge gives the export flow four checks in total: 1) authenticated session, 2) owner role, 3) MFA enrolled on the account, 4) fresh code from authenticator. The integration test suite that exercised the email-OTP path (tests/integration/wallet-export-key-route.test.ts) is removed; the new endpoint is a thin wrapper around auth.api.verifyTOTP which is itself covered upstream.

Adds /api/auth/strict-signin/{start,complete} so credential sign-in for TOTP-enrolled users verifies password, email OTP, and TOTP atomically before any session row is minted. Blocks Better Auth's /sign-in/email-otp for TOTP users so the session-on-OTP path can't be used as a TOTP bypass. Dialog now drives the three-step flow (password -> email OTP -> TOTP) and hard-reloads on success so the session cookie is picked up. Alert dialog rewritten as a thin Dialog wrapper to avoid the Turbopack issue with the Radix umbrella export. user-menu no longer unmounts AuthDialog mid-flow during session refetch.

… screen /strict-signin/start now checks users.two_factor_enabled. If false it signs the user in directly via auth.api.signInEmail and forwards the session cookies. The dialog reads signedIn=true and hard-reloads, skipping the email-OTP + TOTP steps. Users with TOTP enrolled continue through the existing dual-factor flow. Forced-enrollment surface no longer redirects on onEnrolled (that fires right after verify, before the save-codes phase). The dialog stays open through the save-codes phase; on close, the page swaps to a success card with a 10s auto-redirect and a Continue button so the redirect is acknowledged rather than abrupt.

Extracts the dual-factor (authenticator + email OTP) prompt into a single component and the response-code handling into a hook, replacing six inlined copies across withdraw, export private key, api keys (create + revoke), settings (email change), change password, and deactivate account. The hook owns totpCode / emailOtp / awaitingEmailOtp state plus a handleResponse that translates the server's factors_required / mfa_code_invalid / email_code_invalid codes into the correct local state transitions. Consumers pass an onError sink (toast or local setError) so call sites stay terse. Also strips the diagnostic console.logs that were added during the strict-signin wiring; logSystemError calls for real failures are kept.

…itive-actions feat(auth): owner+MFA gate on withdraw and key export

# Conflicts: # components/auth/dialog.tsx # drizzle/meta/_journal.json

…factor Replaces the string-equality compare of the decrypted email OTP with crypto.timingSafeEqual in both the strict-signin endpoint and the requireDualFactor helper. The 6-digit alphabet made the timing channel narrow, but the canonical primitive is constant-time. Adds a sliding-window counter (10 attempts per 15 minutes, per (user, action)) in front of requireDualFactor. It covers both empty-body floods that mint email OTPs at the victim's inbox and brute-force guesses against a live verifications row. The counter is wiped on a successful both-factor verify so a legitimately confused user does not get locked out after a few typos. In-memory per pod, matching lib/mcp/rate-limit.ts; effective limit is LIMIT * num_replicas. Migrate to Redis when replica count starts to matter.

CodeQL js/request-forgery (critical) on the strict-signin/start endpoint: the OTP-send call used new URL(request.url).origin to build the target, which reflects the request's Host header. An attacker setting Host: evil.com would have caused the server to POST the sign-in OTP request to the attacker's domain. Replaces the self-fetch with a direct auth.api.sendVerificationOTP call. Better Auth's plugin pipeline runs in-process so the encrypted sign-in-otp-<email> verifications row is still seeded the same way the HTTP endpoint would, with no host-controlled URL construction in the loop.

…DualFactorInput Disabling TOTP from the manage dialog now runs through requireDualFactor with action totp_disable, matching the gate on withdraw / export-key / api-keys / password-change. A stolen session plus one factor can no longer turn off MFA. The manage dialog Disable view swaps the single authenticator-code input for the shared DualFactorInput so the user sees both factors required. DualFactorInput stops hiding the email field. Both inputs render from the start; the email input is disabled with a "we will email a code after you submit" hint until the server emits factors_required, at which point it becomes editable and the hint switches to "enter the code we just emailed". Surfacing both factors up front matches the server-side contract and stops the "wait, where is the email field?" confusion that the lazy-reveal pattern caused.

Adds a shared two-step DualFactorSteps wizard (email then authenticator) mirroring TotpSetupDialog's numbered indicator pattern, and threads it through every dual-factor surface: withdraw, export-key, api-keys (create + revoke), settings deactivate, change-password, totp-manage disable, and the new /verify-mfa step-up dialog. Each consumer owns a prereq phase (label, DEACTIVATE confirmation, etc.) and hands off to the wizard for the code steps. The wizard pre-fetches the email-OTP silently on mount so the inbox lands the code before the user types anything; a Resend control bypasses the one-shot guard. Fixes a class of bugs around the requires_mfa flag. Without declaring session.additionalFields in betterAuth({...}), Better Auth was silently dropping the requiresMfa / mfaVerifiedAt / riskFlagsJson fields from session inserts so the proxy step-up gate never fired, meaning TOTP-enrolled OAuth users sailed past /verify-mfa entirely. Declaring the fields restores the contract. verify-stepup now runs requireDualFactor with action session_stepup so the step-up demands both an authenticator code and an emailed code, symmetric with the credential strict-signin atomic chain. OAuth-pre-MFA sessions get a 10-min TTL so a stolen cookie expires before the legitimate user finishes step-up. Follow-up will refactor OAuth callback to defer session creation until both codes verify. Other fixes: - handleSignIn: missing setLoading(false) on the 401 branch was leaving the form permanently busy on a wrong-password attempt - strict-signin/route.ts: email-OTP row is now consumed AFTER signInEmail+verifyTOTP succeed instead of before, so a transient Better Auth flake does not lock the user out of retries - totp-disable runs through requireDualFactor + a step wizard so disabling MFA can no longer be done with a stolen session and a single TOTP code

When a TOTP-enrolled user finishes the provider OAuth dance, the callback wrapper deletes the session row Better Auth just minted, clears the session cookie, and issues a signed pending_oauth_mfa cookie instead. The cookie carries only userId/email/redirect under an HMAC; on its own it has no auth power. /api/auth/oauth-mfa-finalize verifies it alongside both codes via requireDualFactor (action oauth_signin) and only then writes a fresh session row with the hashed token plus sets the real session cookie. /verify-mfa detects the cookie and routes the wizard to the finalize endpoint instead of the existing-session step-up path. The shared layout now drops the navigation sidebar + workflow toolbar on /verify-mfa and /enroll-mfa so no auth-aware UI renders on those routes, regardless of whether a session row exists. Matches the security posture of the strict-signin atomic chain: no usable session exists for the user-agent until both factors verify.

Better Auth's verifyTOTP flips users.two_factor_enabled = true AND rotates the session. The new (rotated) session row then runs through session.create.before, which now reads two_factor_enabled = true and sets requires_mfa = true. Result: the user enrolled, but the very next request bounces to /verify-mfa, even though they just proved TOTP in the same enroll call. Fix: after the verifyTOTP success and backup-codes persist, update every session belonging to this user to requires_mfa = false and mfa_verified_at = now. The freshest possible TOTP proof is the one we just verified inside this request, so the step-up gate has been satisfied for those sessions.

…n-step-up-mfa feat(auth): risk-based step-up MFA with TOTP

Credential and OAuth signups previously minted a session immediately on email verification and relied on the proxy gate to bounce the user to /enroll-mfa for forced enrollment. Between the verify and the enroll step a usable session existed for an account that had no second factor. This branch closes that window by deferring the session until TOTP is enrolled. The flow is now symmetric with the strict-signin atomic chain: no usable session for the user-agent until both verification steps complete. - Credential signup: after emailOtp.verifyEmail succeeds, the dialog POSTs /api/auth/finish-credential-signup instead of signIn.email. That endpoint re-verifies the password, confirms users.email_verified, and issues a signed pending_signup_mfa cookie. No session row is created. - OAuth signup: interceptOauthCallback in /api/auth/[...all] branches on users.two_factor_enabled. Users who never enrolled take the new pending-signup path: the session Better Auth just minted is deleted, the session cookie is cleared, the pending_signup_mfa cookie is set, and the browser is redirected to /enroll-mfa. Already-enrolled users keep the existing pending_oauth_mfa flow that defers to /verify-mfa. - /enroll-mfa detects which auth shape is on the request (session vs pending_signup_mfa) and renders the wizard in the matching mode. The setup/enroll endpoints accept both auth shapes via a new resolveEnrollMfaCaller helper. - /api/user/totp/enroll on the pending-signup path verifies the TOTP code directly against the encrypted secret in two_factor (no session to chain through verifyTOTP with), flips users.two_factor_enabled, persists backup codes, and mints the first session for the account in one transaction. The session cookie returned is the FIRST usable session for the user. pending_signup_mfa is HMAC-signed under BETTER_AUTH_SECRET with a 30-minute TTL. It carries only userId / email / provider / redirect. The cookie alone has no auth power: it only unlocks /enroll-mfa and the setup/enroll endpoints, both of which still require generating + verifying a TOTP code via the user's own authenticator app.

… enroll Better Auth signs the better-auth.session_token cookie via hono's setSignedCookie: HMAC-SHA256(rawToken, BETTER_AUTH_SECRET) base64, joined to the raw token with a literal "." separator. getSession verifies that signature and rejects any value that does not match. The two routes that mint sessions manually (oauth-mfa-finalize and the pending-signup branch of totp/enroll) were writing unsigned raw tokens, so the cookie looked valid client-side but was treated as absent on every subsequent request and the user landed anonymous despite the sessions row existing. New helper signSessionCookieValue in lib/auth-session-token-hash.ts produces the same format hono emits; both routes use it now. Also the enrollment success card's Continue button used to race the browser's cookie commit on a fast click: window.location.assign fired before the cookie was sent on the next request. The button now hits /api/auth/get-session first to confirm the session is live; if not yet visible, a 500ms grace plus the SSR-on-target fallback catches the race.

Two follow-ups on the no-session-until-MFA path inside /api/user/totp/enroll: 1. Rate limit the TOTP verify. The pending-signup branch is reached while the caller is unauthenticated by session, so Better Auth's per-route plugin rate limiter does not protect it. Without a counter, an attacker holding a stolen pending_signup_mfa cookie could brute the 6-digit code over the full 30-minute cookie TTL. Use the same sliding-window primitive that gates requireDualFactor, keyed on (userId, "totp_enroll"). The counter is wiped on a successful verify so a typo burst does not lock out a legitimate user. 2. Make the four writes that finalize enrollment a single transaction (backup-codes update + users.two_factor_enabled flip + sessions insert). A mid-flight failure previously could leave the account with two_factor_enabled = true but no session row, while the pending cookie was about to be cleared, bricking the user.

…-email-verify fix(auth): no session until MFA is enrolled on new signups

suisuss and others added 30 commits May 22, 2026 13:26

joelorzet added 12 commits May 25, 2026 14:40

Merge pull request #1356 from KeeperHub/feat/KEEP-619-owner-only-sens…

fae95e6

…itive-actions feat(auth): owner+MFA gate on withdraw and key export

Merge branch 'staging' into feat/KEEP-619-risk-based-login-step-up-mfa

ab21331

# Conflicts: # components/auth/dialog.tsx # drizzle/meta/_journal.json

Merge pull request #1355 from KeeperHub/feat/KEEP-619-risk-based-logi…

f0222e5

…n-step-up-mfa feat(auth): risk-based step-up MFA with TOTP

joelorzet temporarily deployed to staging May 25, 2026 21:34 — with GitHub Actions Inactive

joelorzet temporarily deployed to staging May 25, 2026 21:47 — with GitHub Actions Inactive

joelorzet added 4 commits May 25, 2026 18:59

Merge pull request #1366 from KeeperHub/fix/signup-requires-mfa-after…

dbbf26e

…-email-verify fix(auth): no session until MFA is enrolled on new signups

joelorzet temporarily deployed to staging May 25, 2026 22:36 — with GitHub Actions Inactive

joelorzet temporarily deployed to staging May 25, 2026 22:49 — with GitHub Actions Inactive

eskp merged commit c658d73 into prod May 26, 2026
33 checks passed

eskp mentioned this pull request May 26, 2026

fix(auth): suppress signup notifications for non-fresh users #1372

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

release: To Prod#1359

release: To Prod#1359
eskp merged 87 commits into
prodfrom
staging

suisuss commented May 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

suisuss commented May 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants