test: fix e2e suite setup blockers #1584

Merged
suisuss merged 22 commits into staging from feat/KEEP-815-point-mythos-e2e on Jun 20, 2026


suisuss commented Jun 18, 2026

What

Gets the e2e Playwright suite running green end to end. It was red across ~two dozen tests from a mix of setup blockers, test-isolation bugs, fixture/schema drift, feature-gating gaps, and flaky timing -- plus one real app-behavior gap in the auth flow. Also documents the e2e testing approach (testability signals + the agent-driven test loop).

Setup blockers

  • Org-scoped integrations cleanup (cleanup.ts, seed.ts): integrations is org-scoped (created_by / organization_id); cleanup still deleted by the removed user_id column, which aborted Playwright global-setup. Switched to created_by.
  • Append-only audit trigger in teardown (cleanup.ts, seed.ts, forgot-password-reset-revocation.test.ts): deleting a test user makes the security_audit_log foreign key set actor_user_id to null, an update the append-only trigger rejects. Teardown runs as the postgres superuser, so it now sets session_replication_role = 'replica' on the cleanup connection, which suppresses ordinary triggers for that session.
  • Runner log assertion (full-pipeline.test.ts): updated the expected substring to the current runner output.
  • MFA gate bypass for e2e (playwright.config.ts): sends the x-load-test-mfa-bypass header (gated on LOAD_TEST_BYPASS_TOKEN) so credentialed e2e traffic clears the mandatory-MFA enrollment gate in proxy.ts.
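A minimal sketch of the audit-trigger teardown pattern, assuming a node-postgres-style client. The `Queryable` interface, `deleteTestUser`, and the `"user"` table name are hypothetical, not what cleanup.ts actually contains. It uses `SET LOCAL` so the setting ends with the transaction instead of persisting on a pooled connection; that scoping is a design choice of this sketch, not necessarily what the PR does.

```typescript
// Hypothetical minimal shape of a pg-style client (e.g. node-postgres `Client`).
interface Queryable {
  query(text: string, params?: unknown[]): Promise<unknown>;
}

/**
 * Deletes one test user with ordinary triggers suppressed.
 * Assumes a superuser connection and a table named "user" (hypothetical).
 * SET LOCAL confines the setting to this transaction. Replica mode also skips
 * the internal triggers that implement foreign keys, so referential actions
 * (such as ON DELETE SET NULL) do not run for this delete either.
 */
async function deleteTestUser(client: Queryable, userId: string): Promise<void> {
  await client.query("BEGIN");
  try {
    await client.query("SET LOCAL session_replication_role = 'replica'");
    await client.query('DELETE FROM "user" WHERE id = $1', [userId]);
    await client.query("COMMIT");
  } catch (err) {
    // Undo the partial transaction, then surface the original error.
    await client.query("ROLLBACK");
    throw err;
  }
}
```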

Test isolation and determinism

  • Shared-user org pollution (organization-wallet.test.ts, tests/utils/db.ts): the Organization-Creation tests created orgs as the shared persistent test user, which flipped that user's active org (via the afterCreateOrganization hook) and 404'd every later org-scoped test (workflow save, enable/disable toggle, marketplace listing). Reset the user's active org after that block, and make createTestWorkflow resolve the owning org deterministically (owner, oldest) so the helper and the session agree.
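A sketch of the deterministic org-resolution rule (owner, oldest) for createTestWorkflow. The `Membership` shape and `resolveOwningOrg` name are hypothetical; the real helper likely expresses this as an `ORDER BY ... LIMIT 1` query. The tiebreak on `organizationId` is an added assumption so equal timestamps still resolve to a single org.

```typescript
// Hypothetical membership row shape; the real tests/utils/db.ts query may differ.
interface Membership {
  organizationId: string;
  role: string; // e.g. "owner" | "admin" | "member"
  createdAt: Date;
}

// Returns the oldest org the user owns, or undefined if they own none.
// filter() copies the array, so sort() never mutates the caller's input.
function resolveOwningOrg(memberships: readonly Membership[]): string | undefined {
  const owned = memberships
    .filter((m) => m.role === "owner")
    .sort(
      (a, b) =>
        a.createdAt.getTime() - b.createdAt.getTime() ||
        a.organizationId.localeCompare(b.organizationId),
    );
  return owned[0]?.organizationId;
}
```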

Schema / fixture drift

  • credential-bundle (credential-bundle.test.ts): the manual integration INSERT used user_id (column is created_by) and omitted visibility; execution authorizes integrations as the org principal, which only accepts visibility = 'organization'. Fixed both. The assertions also need a workflow execution to complete, which only advances under the production runtime, so the suite is gated on NEXT_BUILD_MODE=production (matching back-forward-hydration).

Feature-gated fixtures

  • Pro-plan test org (utils/seed.ts, auth.setup.ts, utils/db.ts): Send Webhook (scheduled-workflow) and HTTP Request import (workflow-io-modal) are Pro-gated actions. Added a seeded pro-plan user/org and pointed those suites at it via test.use({ storageState }), leaving the default e2e-test-org free for the billing tests.

Flaky timing

  • Shell-flicker assertions (marketplace-tab.test.ts, tabbed-hub-shell.test.ts): the "no skeleton on the surrounding shell during tab swap" check used a fixed 150ms wait that raced a transient content skeleton mounting before its [role="tabpanel"] wrapper. Replaced with a poll-to-settle.
  • web3-balance (utils/workflow.ts): the editor could enter build mode before auth hydrated, leaving the Save button gated ("Sign in to save"). createWorkflow now waits for auth-ready first.
  • back-forward-hydration /hub: settleClientDataFetches keyed off a Loading... sentinel that /hub never renders, so the button count was sampled mid-hydration. Replaced with a poll-to-convergence.
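The shell-flicker and hydration fixes share one idea: keep sampling until the value stops changing instead of sampling once after a fixed delay. Playwright's `expect.poll` covers the case where the target value is known in advance; converging on an unknown value needs a stability check, sketched here as a hypothetical `pollUntilStable` (name and options are illustrative, not the suite's actual helper).

```typescript
interface PollOptions {
  intervalMs: number; // delay between samples
  stableSamples: number; // consecutive identical samples required
  timeoutMs: number; // give up after this long
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Samples `read` until it returns the same value `stableSamples` times in a
// row, then returns that value. Throws if it never settles within timeoutMs.
async function pollUntilStable<T>(
  read: () => Promise<T>,
  { intervalMs, stableSamples, timeoutMs }: PollOptions,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  let previous = await read();
  let streak = 1;
  while (streak < stableSamples) {
    if (Date.now() > deadline) {
      throw new Error(`value did not settle within ${timeoutMs}ms`);
    }
    await sleep(intervalMs);
    const current = await read();
    // Any change resets the streak; only an unbroken run counts as settled.
    streak = Object.is(current, previous) ? streak + 1 : 1;
    previous = current;
  }
  return previous;
}
```

In a spec it might be called as `await pollUntilStable(() => page.getByRole("button").count(), { intervalMs: 100, stableSamples: 3, timeoutMs: 5000 })`. Comparison uses `Object.is`, so it suits primitive samples such as counts.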

App fix

  • Unverified sign-in redirect (app/api/auth/strict-signin/start/route.ts, components/auth/dialog.tsx): an unverified sign-in returned a generic 500 (Better Auth's signInEmail throws "email not verified", which the route caught as a generic failure), so the auth dialog had no signal to redirect and stayed on the sign-in form. The route now pre-checks email_verified after the password match and returns a typed email_not_verified (403); the dialog mirrors its signup path -- re-send the verification OTP and route to the verify view. Verified sign-in is unaffected: the dialog's new branch only fires on the new email_not_verified code.
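A sketch of how the dialog could branch on the typed error. Only the route, the `email_not_verified` code, and the 403 status come from this PR; the response body shape and the `nextSignInStep` function are hypothetical.

```typescript
// Hypothetical body shape for /api/auth/strict-signin/start.
type StartResponseBody = { ok: true } | { ok: false; error: string };

type NextStep =
  | { kind: "continue" }
  | { kind: "verify-email"; email: string } // re-send OTP, then show the verify view
  | { kind: "show-error"; message: string };

function nextSignInStep(status: number, body: StartResponseBody, email: string): NextStep {
  // The only case that redirects: the typed 403 added by this PR.
  if (status === 403 && !body.ok && body.error === "email_not_verified") {
    return { kind: "verify-email", email };
  }
  if (status >= 200 && status < 300 && body.ok) {
    return { kind: "continue" };
  }
  // Everything else keeps the previous behavior: stay on the form with an error.
  return { kind: "show-error", message: body.ok ? "Unexpected response" : body.error };
}
```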

Local environment

Running the suite against pnpm dev needs two vars in the dev server's .env (already documented in .env.example): TEST_API_KEY and INCLUDE_TEST_ENDPOINTS=true. Without them the per-IP signup rate limit (5/hour) is not bypassed and signup-driven tests 429 after the 5th account. CI is unaffected -- it disables rate-limiting outright.

Docs

  • tests/README.md: data-* alias / data-snapshot "reading data back" conventions and the Agent-Driven Test Development loop.
  • CLAUDE.md: Playwright section points at the above plus the /test-write / /test-debug commands.

Out of scope

Some full vitest/protocol-coverage prerequisites are operational rather than code here (executor deployment env vars, DevKit db:setup-workflow ordering vs db:migrate, WORKFLOW_TARGET_WORLD).

- cleanup/seed: integrations is org-scoped (created_by), not user_id
- cleanup/seed/forgot-password: bypass the append-only security_audit_log
  trigger when deleting test users (session_replication_role = replica)
- full-pipeline: match current runner 'not executable (disabled)' log line
- playwright: send x-load-test-mfa-bypass header to clear the mandatory-MFA gate
suisuss added 2 commits June 18, 2026 20:15
…ky suites

The serial e2e suite shared one persistent user with ambiguous org
resolution, and several fixtures had drifted from the schema / feature model.

- organization-wallet: reset the persistent user's active org after the
  ORG-CREATE block, which was flipping it (afterCreateOrganization) and 404ing
  every later org-scoped test (workflow-save, webhook-toggle, slug-gate).
- createTestWorkflow: select the owning org deterministically (owner, oldest)
  so the helper and the session resolve the same org.
- credential-bundle: fix integration INSERT schema drift (user_id -> created_by)
  and seed 'organization' visibility so the execute-time org-principal gate
  authorizes the reference.
- add a seeded pro-plan test user/org; run the Send Webhook (scheduled-workflow)
  and HTTP Request import (workflow-io-modal) suites as it, since those actions
  are Pro-gated. e2e-test-org stays free for the billing tests.
- hub shell-flicker: replace the racy fixed 150ms wait with a deterministic
  poll so a transient content skeleton mounting before its [role=tabpanel]
  wrapper is not miscounted as a shell flicker (marketplace-tab, tabbed-hub-shell).
- credential-bundle: skip unless NEXT_BUILD_MODE=production -- the assertions
  need a workflow execution to complete, which the DevKit worker only advances
  under the production runtime (under next dev, runs hang at pending).
- createWorkflow: wait for the org switcher (auth-ready) before building, so the
  editor does not create an anonymous workflow whose Save button stays gated
  ('Sign in to save workflows') -- fixes the web3-balance flake.
- signup verify-view waits: 15s -> 30s to tolerate the dev server's first-request
  route compilation (auth.test.ts x7 + signUp helper); fixes the intermittent
  'stuck on Create account' in auth / invitations.
- back-forward-hydration: poll the button count to convergence instead of a
  single mid-hydration snapshot (the /hub page renders no 'Loading...' sentinel).
/api/auth/strict-signin/start let Better Auth's signInEmail throw 'email not
verified' and caught it as a generic 500, so the auth dialog had no signal to
redirect and stayed on the sign-in form (e2e: 'unverified user signing in
redirects to verification'). Pre-check email_verified after the password match
and return a typed email_not_verified (403); the dialog mirrors its signup path
-- re-send the verification OTP and route to the verify view. Verified sign-in
is unaffected: the dialog's new branch only fires on the email_not_verified code.

Also revert the signup verify-view timeout (auth.test.ts + signUp helper) from
30s back to 15s -- that was a misdiagnosis. The signup failures were a local
rate-limit (the dev env was missing INCLUDE_TEST_ENDPOINTS, documented in
.env.example), not cold-compile slowness.
Adds the data-* alias / data-snapshot 'reading data back' conventions and the
Agent-Driven Test Development loop to tests/README.md, and points CLAUDE.md's
Playwright section at them plus the /test-write and /test-debug commands.
… resolution

Unit, integration, and e2e-ephemeral jobs now declare environment: staging so
secrets resolve to the staging tier instead of the repo-level fallback. The
e2e-ephemeral workflow moves from the prod/staging ternary to always staging,
which also fixes prod-branch runs where staging-only secrets (TURNKEY_*,
TESTNET_FUNDER_PK) previously resolved empty.
suisuss added 2 commits June 19, 2026 16:30
The e2e production build asserts SANDBOX_BACKEND=remote and SANDBOX_URL (the
/.well-known/workflow/v1/step guard); set them in the start-app build/start env
so the ephemeral build clears the gate. Add a start-sandbox composite action
and run it in e2e-vitest-ephemeral so SANDBOX_URL points at a live container.
…test

CI's scoped scripts skipped tests outside their paths. Run them in place:
lib/**/__tests__ and the metrics-collector server test via test-unit,
postgres-world via test:e2e:vitest, and the sandbox package's own unit tests
via a new step. Add sandbox-run-code.test.ts as a live consumer driving
runRemote() against the e2e sandbox.
…ss sign-in

playwright.config.ts sends x-load-test-mfa-bypass only when LOAD_TEST_BYPASS_TOKEN
is set, and proxy.ts verifies that header against the same env var to clear the
mandatory-MFA enrollment gate. Neither the app nor the playwright test step had
it, so auth.setup never reached the org switcher and gated all 122 tests. Wire
it into both (scoped to the playwright job).
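A sketch of the client half of that wiring, assuming the header carries the token value itself (proxy.ts defines what it actually checks). `mfaBypassHeaders` is a hypothetical name.

```typescript
// Attach the MFA-bypass header only when LOAD_TEST_BYPASS_TOKEN is set and
// non-empty; otherwise send nothing, so the gate in proxy.ts stays in force.
function mfaBypassHeaders(env: Record<string, string | undefined>): Record<string, string> {
  const token = env.LOAD_TEST_BYPASS_TOKEN;
  return token ? { "x-load-test-mfa-bypass": token } : {};
}
```

In playwright.config.ts the result would feed Playwright's `extraHTTPHeaders` option, e.g. `use: { extraHTTPHeaders: mfaBypassHeaders(process.env) }`, so no header is sent when the variable is unset or empty.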
suisuss added 10 commits June 19, 2026 20:02
INV-SEND-3 asserted an 'already invited' error toast on re-inviting a pending
email, but the org plugin sets cancelPendingInvitationsOnReInvite: true
(lib/auth.ts) -- re-inviting cancels the pending invite and re-sends (success).
The 'already invited' string exists nowhere in the app, so that toast can never
appear; the test failed all 3 retries (consistent, not flaky). Rewrite it to
assert the actual re-invite-re-sends behavior. The invitation-system code is
correct and intentional -- only the test was stale.
The invitations playwright suite signs in as a few shared persistent
users many times across its serial tests. The strict-signin/start
credential-attempt limiter (5/email/15min) had no test bypass, so later
sign-ins returned 429 and the auth dialog never closed.

App: exempt authenticated e2e traffic from the credential-attempt limiter,
gated identically to better-auth's rateLimitBypassRule
(testEndpointsEnabled + matching X-Test-API-Key). Wire build-time
INCLUDE_TEST_ENDPOINTS and runtime ALLOW_TEST_ENDPOINTS into the e2e
start-app action, scoped to the playwright job only.

Test: reuse the inviter storage state so inviter-first tests skip the UI
sign-in entirely; signInAsInviter becomes idempotent and the per-describe
clearCookies move inline to the four tests that must start logged out.
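A sketch of the limiter exemption's gate, assuming the header is compared against `TEST_API_KEY`. `shouldBypassCredentialLimiter` is hypothetical and does not reproduce better-auth's rateLimitBypassRule; it only mirrors the two conditions named above. The constant-time comparison is a defensive choice for a secret-bearing header.

```typescript
import { timingSafeEqual } from "node:crypto";

// Bypass only when test endpoints are enabled AND the provided key matches.
function shouldBypassCredentialLimiter(
  testEndpointsEnabled: boolean,
  providedKey: string | null,
  expectedKey: string | undefined,
): boolean {
  if (!testEndpointsEnabled || !providedKey || !expectedKey) return false;
  const a = Buffer.from(providedKey);
  const b = Buffer.from(expectedKey);
  // timingSafeEqual throws on a length mismatch, so check lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

A route might call it as `shouldBypassCredentialLimiter(testEndpointsEnabled, request.headers.get("x-test-api-key"), process.env.TEST_API_KEY)`; `Headers.get` is case-insensitive, so the header's casing does not matter.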
The shadcn InputOTP component intermittently drops digits when populated
via a single Playwright fill(), leaving the verify button disabled and
hanging the subsequent click (observed: 2-3 of 6 digits landing, varying
per run). This surfaced in INV-RECV-1 once the credential-rate-limit fix
let the test reach the email-verification step.

Add fillOtpInput: clear, type the code with pressSequentially, assert the
input holds the full value, and retry until it sticks. Use it in both
signUpAndVerify and the INV-RECV-1 accept-invite verify step.
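A sketch of the fillOtpInput retry loop. `OtpField` is a structural subset of Playwright's `Locator` (`clear`, `pressSequentially`, and `inputValue` all exist there), so a real locator can be passed directly; the `maxAttempts` default of 3 is an assumption.

```typescript
// Structural subset of Playwright's Locator used by the helper.
interface OtpField {
  clear(): Promise<void>;
  pressSequentially(text: string): Promise<void>;
  inputValue(): Promise<string>;
}

// Clear, type digit by digit, and confirm the field holds the whole code;
// retry when digits were dropped, and fail loudly after maxAttempts.
async function fillOtpInput(field: OtpField, code: string, maxAttempts = 3): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    await field.clear();
    await field.pressSequentially(code);
    if ((await field.inputValue()) === code) return;
  }
  throw new Error(`OTP input did not hold "${code}" after ${maxAttempts} attempts`);
}
```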
storeOTP is "encrypted" (KEEP-625), so verifications.value is stored as
`<symmetric-encrypted>:<keyVersion>`. The e2e OTP reader (getOtpViaDb)
and the admin test endpoint stripped the version suffix but never
decrypted, returning the raw ciphertext as the "OTP". The segmented OTP
input then kept only the first 6 chars of that ciphertext, so the verify
button stayed disabled and INV-RECV-1 hung on the Verify & Join click.

Decrypt the ciphertext with BETTER_AUTH_SECRET (the same primitive the
strict-signin verifier uses) to recover the real 6-digit code, in both
getOtpViaDb and app/api/admin/test/otp. Validated the split+decrypt
round-trip locally against better-auth's symmetricEncrypt.
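A sketch of the split-then-decrypt step in getOtpViaDb. The decrypt function is injected because this snippet does not call better-auth directly; `splitVersionedValue`, `readOtp`, and the `Decrypt` signature are hypothetical. Splitting on the last colon is a defensive choice.

```typescript
// Splits a stored verifications.value of the form `<ciphertext>:<keyVersion>`.
// lastIndexOf keeps the split correct even if the ciphertext contained a colon.
function splitVersionedValue(stored: string): { ciphertext: string; keyVersion: string } {
  const i = stored.lastIndexOf(":");
  if (i <= 0 || i === stored.length - 1) {
    throw new Error("expected <ciphertext>:<keyVersion>");
  }
  return { ciphertext: stored.slice(0, i), keyVersion: stored.slice(i + 1) };
}

// Hypothetical signature for the injected symmetric-decrypt primitive.
type Decrypt = (ciphertext: string, key: string) => Promise<string>;

// Recovers the OTP. In the real helper `decrypt` would be the same primitive
// the strict-signin verifier uses, keyed by BETTER_AUTH_SECRET.
async function readOtp(stored: string, secret: string, decrypt: Decrypt): Promise<string> {
  const { ciphertext } = splitVersionedValue(stored);
  const otp = await decrypt(ciphertext, secret);
  // Guard against silently returning ciphertext, the original bug.
  if (!/^\d{6}$/.test(otp)) throw new Error("decrypted value is not a 6-digit OTP");
  return otp;
}
```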
Revert the decrypt change to /api/admin/test/otp. That endpoint has a
dedicated integration test (admin-otp-route.test.ts) asserting the legacy
split-only contract, and it is not on the ephemeral e2e path, which reads
the OTP via getOtpViaDb. Decrypting there broke the integration test
without being needed for this suite. getOtpViaDb keeps the decrypt.

Note: the admin endpoint and its test still encode the pre-KEEP-625
split-only behavior, so a deployed-env OTP read (BASE_URL set) would get
ciphertext -- a separate, pre-existing follow-up.
After email verification the app forces new signups through TOTP
enrollment (dialog.tsx: "New signups must enroll TOTP"); the
LOAD_TEST_BYPASS_TOKEN only clears the login step-up, not the enrollment
wizard. signUpAndVerify expected the dialog to close after email verify,
so INV-RECV-2/3 and ORG-2 hung on the enrollment screen.

Read the rendered base32 setup key, generate a matching RFC 6238 TOTP
(HMAC-SHA1, 30s, dynamic truncation - validated byte-for-byte against the
server's generateTotp), submit it, and dismiss the backup-codes step.
Gated on the enrollment input appearing so other signup flows are
unaffected.
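A self-contained sketch of the RFC 6238 computation the enrollment helper needs: base32-decode the setup key, HMAC-SHA1 the 30-second counter, then apply dynamic truncation. `base32Decode` and `totp` are hypothetical names; this version is checked against the RFC 6238 SHA-1 test vectors rather than against the server's generateTotp.

```typescript
import { createHmac } from "node:crypto";

const BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

// RFC 4648 base32 decode; ignores padding, whitespace, and case.
function base32Decode(input: string): Buffer {
  const clean = input.toUpperCase().replace(/[\s=]/g, "");
  const bytes: number[] = [];
  let buffer = 0;
  let bits = 0;
  for (const ch of clean) {
    const value = BASE32_ALPHABET.indexOf(ch);
    if (value === -1) throw new Error(`invalid base32 character: ${ch}`);
    buffer = ((buffer << 5) | value) & 0xffff; // only the low 12 bits are ever needed
    bits += 5;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((buffer >> bits) & 0xff);
    }
  }
  return Buffer.from(bytes);
}

// HOTP (RFC 4226, HMAC-SHA1) over the time-step counter floor(seconds / step).
function totp(base32Secret: string, nowMs: number, digits = 6, stepSeconds = 30): string {
  const counter = Math.floor(nowMs / 1000 / stepSeconds);
  const msg = Buffer.alloc(8);
  msg.writeBigUInt64BE(BigInt(counter)); // 8-byte big-endian counter
  const hmac = createHmac("sha1", base32Decode(base32Secret)).update(msg).digest();
  // Dynamic truncation: the low nibble of the last byte picks a 4-byte window.
  const offset = hmac[hmac.length - 1] & 0x0f;
  const binary =
    ((hmac[offset] & 0x7f) << 24) |
    (hmac[offset + 1] << 16) |
    (hmac[offset + 2] << 8) |
    hmac[offset + 3];
  return String(binary % 10 ** digits).padStart(digits, "0");
}
```

A code generated just before a 30-second boundary may expire before the server checks it, depending on how much clock skew the server tolerates, so a helper may prefer to wait for a fresh window when close to one.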
The accept-invite page defaults to the create-account view; INV-RECV-2
accepts as an existing user, so it clicks the "Sign in" toggle before the
"Sign In & Join" button exists. Surfaced once the TOTP-enrollment fix let
signUpAndVerify create the invitee and the test reached this step.
Consume intent.redirectTo as the post-sign-in landing when it is a
same-origin relative path (guarded against open redirects), falling back
to / otherwise. Both sign-in success paths now navigate to the resolved
target instead of always /. Also activates the existing use-template
redirect intent.
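A sketch of the same-origin guard for intent.redirectTo. `resolvePostSignInTarget` is hypothetical; it parses against a placeholder origin so that inputs like `//evil.example` and `/\evil.example`, which browsers resolve as protocol-relative URLs, fall back to `/`.

```typescript
// Throwaway base used only to detect whether a path escapes the current origin.
const PLACEHOLDER_ORIGIN = "https://placeholder.invalid";

// Accepts only same-origin relative paths such as "/invite/abc?x=1";
// anything else (absolute URLs, protocol-relative tricks, junk) becomes "/".
function resolvePostSignInTarget(redirectTo: string | null | undefined): string {
  if (!redirectTo || !redirectTo.startsWith("/")) return "/";
  try {
    const url = new URL(redirectTo, PLACEHOLDER_ORIGIN);
    if (url.origin !== PLACEHOLDER_ORIGIN) return "/";
    return url.pathname + url.search + url.hash;
  } catch {
    return "/";
  }
}
```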
…FA dialog

Existing invitees now have mandatory TOTP, so the inline signIn.email path
returned a twoFactorRedirect with no session and the page 401'd on accept.
Open the shared AuthDialog (full password + email-OTP + TOTP flow) with a
redirectTo back to the invite; on return the authenticated accept view
handles it. Removes the dead inline sign-in handler and the transitional
state that only masked the old SPA race.
signUpAndVerify/completeTotpEnrollment now return the TOTP setup key.
Add getSignInOtpFromDb (reads the sign-in-otp verifications row, which the
admin OTP API does not expose), completeMfaSignInDialog (password ->
email-OTP -> TOTP), and fillContentEditableOtp (the email-OTP field is a
contentEditable div, not an input). INV-RECV-2 now drives the dialog and
accepts on return.
@github-actions


🧹 PR Environment Cleaned Up

The PR environment has been successfully deleted.

Deleted Resources:

  • Namespace: pr-1584
  • All Helm releases (Keeperhub, Scheduler, Event services)
  • PostgreSQL Database (including data)
  • LocalStack, Redis
  • All associated secrets and configs

All resources have been cleaned up and will no longer incur costs.

@github-actions


ℹ️ No PR Environment to Clean Up

No PR environment was found for this PR. This is expected if:

  • The PR never had the deploy-pr-environment label
  • The environment was already cleaned up
  • The deployment never completed successfully
