Skip to content

fix(auth): suppress signup notifications for non-fresh users#1372

Merged
eskp merged 2 commits into
stagingfrom
feature/fix-spurious-discord-webhook
May 26, 2026
Merged

fix(auth): suppress signup notifications for non-fresh users#1372
eskp merged 2 commits into
stagingfrom
feature/fix-spurious-discord-webhook

Conversation

@eskp

@eskp eskp commented May 26, 2026

Copy link
Copy Markdown

Summary

  • Adds a freshness guard (lib/auth-notification-guard.ts) that gates notifyDiscordSignup and subscribeToMailerLite on user.createdAt being within the last hour.
  • Wraps the two call sites in lib/auth.ts (databaseHooks.user.create.after and emailVerification.afterEmailVerification) with the guard.
  • Adds 9 unit tests anchored on the exact regression timestamp.

What was firing the false-positive ping

Last night the signup Discord channel pinged for jerry george / jerrygithub360@gmail.com / OAuth even though that user has existed since 2026-04-27. Loki + the prod DB nailed the flow end-to-end (timestamps in UTC):

2026-05-25 14:12:13.432  Better Auth: "Sign-up attempt for existing email: <user-email>"
2026-05-25 14:12:13.578  [Auth] Sending OTP to <user-email> for email-verification
2026-05-25 14:12:13.807  [Email] OTP sent
2026-05-25 14:12:44.450  [Auth] afterEmailVerification fired  <-- Discord ping posts here
2026-05-25 14:12:45.628  Better Auth ERROR: Credential account not found
2026-05-25 14:12:51.836  Better Auth ERROR: Credential account not found
2026-05-25 14:12:56.027  Better Auth ERROR: Credential account not found

The user row's created_at = 2026-04-27 21:02:58.926Z, updated_at bumped to 2026-05-25 14:12:44.445Z (the verification touched updated_at but not created_at). Only one accounts row exists, provider_id = github, with created_at 4 milliseconds after the user row's — confirming the original signup was OAuth-only, not email/password.

So the actor: posted /sign-up/email with an existing email, received an OTP in that mailbox, entered it. Better Auth fired afterEmailVerification against the existing user row (better-auth's updateUser doesn't reset createdAt), and our notification helper happily pinged the signup channel because it had no concept of "is this user actually new". Three follow-up password-sign-in attempts failed because there is no credential accounts row.

Why the existing guards didn't catch it

  • databaseHooks.user.create.after is gated on emailVerified, which is the right gate for an OAuth signup but doesn't help here because that hook didn't fire at all (no new user row was created).
  • afterEmailVerification had no guard at all. The original commit body that introduced this notification path (2026-03-23) assumed afterEmailVerification only fires on first-signup verification. That's true in normal use, but Better Auth's hook is event-shaped ("an email got verified") not state-shaped ("a user just signed up"). When /sign-up/email is hit with an existing email, the synthetic-anti-enumeration response combined with emailOTP.sendVerificationOnSignUp: true causes an OTP to flow to the real inbox; verifying it then fires afterEmailVerification against the existing row.
  • KEEP-608 deactivation guard correctly refused to mint a session (verified: jerry has 0 sessions, no post-deactivation activity). That layer worked. The signup notification fires earlier in the flow, before session creation, so it didn't benefit from the guard.
  • KEEP-621 Turnstile on /sign-up/email only shipped to prod 10 hours after this incident (PR release: To Prod #1359 merged 2026-05-26 00:08 UTC; the false ping fired 2026-05-25 14:12 UTC). Even with Turnstile, an actor who clears the CAPTCHA can still trigger the same chain.

Loki sweep: campaign or one-off?

48-hour search:

  • Sign-up attempt for existing email: 1 event total — this incident only.
  • Credential account not found: 11 events across 5 emails — 4 of 5 are staff accounts with 1-5 fumbled attempts each (normal "wrong password" volume). Not a campaign signature.

Fix

// lib/auth-notification-guard.ts
export function isFreshSignup(user: { createdAt?: Date | string | null }): boolean {
  if (!user.createdAt) return false;
  const createdAtMs = (user.createdAt instanceof Date
    ? user.createdAt
    : new Date(user.createdAt)).getTime();
  if (Number.isNaN(createdAtMs)) return false;
  return Date.now() - createdAtMs < 60 * 60 * 1000;
}

The 1-hour window is generous enough for slow OTP / verification-link flows (OTP expiry is 5 min but resends and email-link clicks can stretch the window) while suppressing any "re-event" path that surfaces a user row older than an hour. The same guard wraps both notification call sites in lib/auth.ts.

This fix narrows the predicate to the state-shape the original commit intended (a user just signed up) instead of trusting the event-shape that Better Auth actually delivers (an email was verified). It does not depend on Better Auth's hook semantics being correct.

Test plan

  • pnpm type-check clean (exit 0)
  • pnpm check 0 diagnostics on changed files; repo-wide error count drops by 1 (the stale console.log("[Auth] afterEmailVerification fired", ...) is removed)
  • npx vitest run tests/unit/auth-notification-guard.test.ts -> 9/9 pass
  • Full npx vitest run: 6200 pass, 14 fail — all 14 are pre-existing E2E-Vitest failures requiring local Postgres :5433 / LocalStack SQS; none touch lib/auth*
  • One regression test asserts the exact 2026-04-27 -> 2026-05-25 14:12 case from prod

Out of scope, filed as separate Linear ticket (Joel / High / Security Incident project)

The deeper "/sign-up/email leaks OTP to existing-email inbox -> existing user can be re-verified by mailbox-holder" footgun. Three potential mitigations: set sendVerificationOnSignUp: false and request OTPs explicitly only for confirmed-new users; wire emailAndPassword.onExistingUserSignUp to send an account-recovery email instead; or open a Better Auth upstream issue to make /email-otp/verify-email mirror /verify-email's "skip-if-already-verified" guard.

The signup Discord webhook and MailerLite subscribe fire from two hooks
in lib/auth.ts: databaseHooks.user.create.after (gated on emailVerified)
and emailVerification.afterEmailVerification (no gate). Both can deliver
a user object whose row is not actually freshly created.

The trigger observed in prod on 2026-05-25 14:12 UTC: a POST /sign-up/email
landed for an email (jerrygithub360@gmail.com) that already had an
OAuth-only account from 2026-04-27. Better Auth's email/password sign-up
flow sends an OTP for any email it is asked about; once verified, the
afterEmailVerification hook fires for the already-existing user and
notifyDiscordSignup posts a false-positive "New signup" alert to the
incident channel. Three subsequent password-sign-in attempts failed with
"Credential account not found" because the existing user is OAuth-only.
Loki confirmed the firing sequence end-to-end.

Add a small freshness helper (lib/auth-notification-guard.ts) and wrap
both call sites with it: notifications fire only when user.createdAt is
within the last hour. Real signups complete OTP / verification-link
flows well within that window; any "re-event" path that surfaces an
older user object is now silently filtered out of the signup channel.

The window is intentionally generous (1 hour, not 5 min) to cover slow
inbox-check verification flows without being so long that re-events
slip through.

This fix is independent of and additive to KEEP-621 (Cloudflare
Turnstile on /sign-up/email), which had not yet shipped to prod at the
time of the observed incident.

Tests:
- lib/auth-notification-guard.ts: new helper, single exported function
- tests/unit/auth-notification-guard.test.ts: 9 cases covering fresh
  user, slow-OTP user, expired user, the 2026-05-25 regression case,
  exact-window-boundary, ISO-string timestamp, missing field, null
  field, invalid date string
- pnpm type-check: clean
- pnpm check: 0 diagnostics on changed files; repo-wide error count
  decreases by 1 (the previously-present "[Auth] afterEmailVerification
  fired" console.log is removed)
Address code review feedback on PR #1372:

- Widen NEW_SIGNUP_NOTIFICATION_WINDOW_MS from 1h to 24h. The 1h ceiling
  collides with Better Auth's default 3600s verification-link TTL (a
  click at minute 59-60 with any network skew falls past the cutoff)
  and with the OTP-resend path where a user who started signup hours
  earlier and finally enters the OTP would be silently dropped. 24h
  comfortably absorbs both flows plus app-pod clock skew while still
  suppressing the re-verification-of-existing-user case the guard
  exists for. Export the constant so tests can reference it directly.

- Change the boundary check from `<` to `<=` so the exact window tick
  is treated as fresh; closes a one-millisecond hole at the upper edge.

- Tighten the parameter type from `Date | string | null` to
  `Date | null`. Better Auth's adapter always passes a Date object
  (`internalAdapter` -> Drizzle -> pg returns `Date` for `timestamp`
  columns), so the string branch was unreachable in current callers
  but exposed a latent timezone-naive-string parsing hazard if anyone
  ever called the helper with a naive ISO string (`new Date(str)` is
  locale-dependent without an offset). Dropping the string branch
  eliminates that surface and matches Better Auth's User type
  (`createdAt: Date`).

- Restore the multi-line ternary in lib/auth.ts:570-577 to the
  single-line form that ships on staging. This was a formatter
  rule that I had reverted in the previous commit thinking it was
  unrelated; it's the canonical Biome-preferred form.

- Strengthen the OAuth-path call-site comment: `user.create.after`
  only fires on actual inserts in current Better Auth (per
  `db/with-hooks.mjs:25-42`), so the freshness check there is pure
  belt-and-suspenders against a future adapter change. Real OAuth
  signups have `createdAt = now` and pass it trivially.

Tests:
- Expand from 9 to 11 cases. New: exact-window-boundary-passes (now
  inclusive), 1-hour-link-TTL with margin (Better Auth default flow),
  future-dated createdAt (NTP skew), epoch-zero. Pin the constant by
  importing NEW_SIGNUP_NOTIFICATION_WINDOW_MS so future changes to the
  window are reflected uniformly. Keep the 28-day production
  regression anchored on the actual 2026-04-27 incident timestamp.

- pnpm type-check: clean
- pnpm check (scoped to changed files): 0 diagnostics, no fixes applied
- npx vitest run tests/unit/auth-notification-guard.test.ts: 11/11 pass
@eskp

eskp commented May 26, 2026

Copy link
Copy Markdown
Author

Code review feedback addressed in 669d102e. Resolution summary:

BLOCKER B-1 (window too tight) — addressed.

  • NEW_SIGNUP_NOTIFICATION_WINDOW_MS widened 1h -> 24h. Covers Better Auth's default 3600s verification-link TTL plus margin, the OTP-resend-later path, and app-pod clock skew. Still suppresses the actual regression (~28-day-old user re-verifying).
  • Boundary changed from < to <= so the exact window tick is fresh.

WARN W-1 (change-email path also suppressed) — intentional, now noted in comments.
Better Auth's afterEmailVerification fires from change-email flows too. Suppressing those notifications on the "New signup" channel is the desired behavior; the embed title would have been misleading for an email change. Worth a separate channel/embed for email-change events if anyone wants visibility there.

WARN W-2, W-4 (type-shape hazards) — addressed.
Parameter type tightened from Date | string | null to Date | null. Matches Better Auth's User type (createdAt: Date) and eliminates the unreachable-but-latent timezone-naive-string parsing path. String-input test removed.

WARN W-3 (boundary semantics didn't match comment) — addressed by <= change + comment refresh.

WARN W-5 (OAuth-path guard is defense-in-depth) — kept the guard, updated the comment to call it out explicitly. Cost is one boolean predicate; benefit is robustness if Better Auth's hook semantics ever drift.

WARN W-6 (test should pin the constant) — addressed. NEW_SIGNUP_NOTIFICATION_WINDOW_MS is exported and tests now express bounds as WINDOW - 1 / WINDOW / WINDOW + 1.

WARN W-7 (test coverage gaps) — most addressed. New cases: future-dated createdAt (NTP skew), epoch zero, exact-boundary inclusive, 1-hour-link-TTL with margin. ISO-string and Unix-millis cases dropped along with the type tightening (no longer reachable).

WARN W-8 (clock skew tolerance was zero) — absorbed by the 24h window.

INFO I-1, I-2, I-3, I-4 — comment + test docstring tweaks; literal-date kept in the regression test because the value is the actual prod-incident timestamp (2026-04-27 21:02:58.926Z), which is the anchor we care about. Arithmetic comment added alongside the literal so readers see the 28-day delta.

Adjacent finding filed as KEEP-634 (assigned Joel, High, SEV-1 project): the underlying Better Auth /sign-up/email + emailOTP flow that allows a mailbox-holder to re-verify an existing account. This PR only fixes the notification noise; KEEP-634 covers the flow defect.

Verification:

  • pnpm type-check: clean
  • npx @biomejs/biome check on changed files: 0 diagnostics
  • npx vitest run tests/unit/auth-notification-guard.test.ts: 11/11 pass

Ready for re-review.

@eskp eskp merged commit 7b5c6c6 into staging May 26, 2026
41 checks passed
@eskp eskp deleted the feature/fix-spurious-discord-webhook branch May 26, 2026 04:00
@github-actions

Copy link
Copy Markdown

🧹 PR Environment Cleaned Up

The PR environment has been successfully deleted.

Deleted Resources:

  • Namespace: pr-1372
  • All Helm releases (Keeperhub, Scheduler, Event services)
  • PostgreSQL Database (including data)
  • LocalStack, Redis
  • All associated secrets and configs

All resources have been cleaned up and will no longer incur costs.

@github-actions

Copy link
Copy Markdown

ℹ️ No PR Environment to Clean Up

No PR environment was found for this PR. This is expected if:

  • The PR never had the deploy-pr-environment label
  • The environment was already cleaned up
  • The deployment never completed successfully

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant