ci: restructure e2e test workflows and add admin test API #472

Open
suisuss wants to merge 42 commits into staging from feat/KEEP-1351-e2e-stability

suisuss commented Mar 3, 2026

Summary

  • Eliminates the workflow_run cascade that caused phantom CI runs on every PR push
  • Converts workflow chains to workflow_call reusable workflows coordinated by orchestrator files (ci-pipeline.yml, release-pipeline.yml)
  • Renames e2e-tests.yml to e2e-tests-ephemeral.yml with ephemeral/remote naming convention
  • Adds admin test API endpoints (/api/admin/test/otp, /api/admin/test/invitation) for remote Playwright DB lookups
  • Extracts duplicated Node.js/pnpm/Playwright CI setup into .github/actions/setup-node-pnpm composite action

Changes

CI/CD

  • Convert workflow_run chains to workflow_call orchestrators
  • Prevent label additions from re-triggering completed jobs
  • Standardize actions/checkout to v6 across all workflow files
  • Extract ~100 lines of duplicated setup steps into a single composite action with install-playwright and discover-plugins inputs

Test Reliability

  • signOut() now throws on missing user menu instead of silently passing
  • OTP and invitation API polling use exponential backoff (500ms base, 4s cap, 8 retries)
  • Admin test API 500 responses return generic "Internal server error" instead of leaking error.message
  • Integration test assertions updated to match sanitized error responses
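
For reviewers who want to see the polling schedule concretely, here is a minimal sketch of exponential backoff with the numbers quoted above (500ms base, 4s cap, 8 retries). The names `backoffDelay` and `pollWithBackoff` are hypothetical, and reading "8 retries" as one initial attempt plus eight retries is an assumption; the PR's actual helper may differ.

```typescript
// Hypothetical helper names; only the 500 ms base, 4 s cap, and 8 retries come from the PR text.
type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

/** Delay before retry number `attempt` (0-based): doubles from baseMs, clamped to capMs. */
function backoffDelay(attempt: number, baseMs = 500, capMs = 4000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

/**
 * Calls `lookup` until it returns a non-null value.
 * Assumption: "8 retries" means one initial attempt plus up to eight retries.
 */
async function pollWithBackoff<T>(
  lookup: () => Promise<T | null>,
  maxRetries = 8,
  sleep: Sleep = defaultSleep,
): Promise<T> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const result = await lookup();
    if (result !== null) return result;
    // Sleep only if another attempt will follow.
    if (attempt < maxRetries) await sleep(backoffDelay(attempt));
  }
  throw new Error(`No result after ${maxRetries} retries`);
}
```

Under these assumptions the waits are 500, 1000, 2000, then 4000 ms five times, so a lookup that never succeeds gives up after roughly 23.5 seconds of sleeping. The `sleep` parameter is injectable only so the schedule can be checked without real timers.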

Admin Test API

  • GET /api/admin/test/otp?email= -- latest OTP lookup
  • GET /api/admin/test/invitation?email= -- pending invitation lookup
  • Gated by TEST_API_KEY with timing-safe comparison
  • Email restricted to @techops.services domain
  • Unit and integration tests included
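
The gating described above could look roughly like the sketch below. Only `TEST_API_KEY`, the timing-safe comparison, and the `@techops.services` restriction come from this PR; the function names and the mapping of failures to 401 and 403 are assumptions made for illustration.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// From the PR text: lookups are limited to this email domain.
const ALLOWED_EMAIL_DOMAIN = "@techops.services";

/**
 * Constant-time secret comparison. Both inputs are hashed first so the
 * buffers always have equal length (timingSafeEqual throws otherwise).
 */
function keysMatch(provided: string, expected: string): boolean {
  const a = createHash("sha256").update(provided).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}

type GateResult = { ok: true } | { ok: false; status: 401 | 403 };

/**
 * Hypothetical gate for the admin test endpoints.
 * Assumption: a bad or missing key maps to 401 and a disallowed email to 403.
 */
function checkAdminTestRequest(
  providedKey: string | null,
  email: string,
  expectedKey: string | undefined,
): GateResult {
  if (!expectedKey || !providedKey || !keysMatch(providedKey, expectedKey)) {
    return { ok: false, status: 401 };
  }
  if (!email.toLowerCase().endsWith(ALLOWED_EMAIL_DOMAIN)) {
    return { ok: false, status: 403 };
  }
  return { ok: true };
}
```

Hashing before comparing is one common way to satisfy the equal-length requirement of `timingSafeEqual`; the PR does not say which approach it uses.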

Documentation

  • Comprehensive test data seeding, wallet, Sepolia, secrets, and cleanup docs
  • Fixed stale workflow_run references, mermaid diagrams, test counts
  • Added composite action documentation to workflow docs
  • Renamed "Local" context to "Ephemeral" throughout

Test plan

  • pnpm test:e2e:vitest passes locally
  • pnpm test:e2e (Playwright) passes locally
  • CI ephemeral E2E pipeline passes on push to staging
  • Remote Playwright passes against deployed staging (skipped for now: the tests are flaky and need better rate-limit mitigation for post-deploy verification)
  • Admin test API returns 401/403/404/200 for expected inputs

github-actions bot commented Mar 3, 2026

PR Environment Deployment Failed

The PR environment deployment encountered an error.

Please check the workflow logs for details.

Common issues:

  • Database initialization timeout
  • Image build failures
  • Resource constraints

suisuss added 13 commits March 5, 2026 11:57
Same fix as OTP polling: exponential backoff (500ms -> 4s cap)
instead of linear 500ms retries.
Stop forwarding error.message to the client in 500 responses.
The detail is still logged server-side via console.error.
Admin routes now return "Internal server error" instead of
forwarding error.message. Update test expectations to match.
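
A minimal sketch of the sanitized 500 path described in the two commit messages above: the detail is logged server-side and the client only sees a generic message. The function name, the log prefix, and the `{ error: ... }` body shape are assumptions; it uses the standard Fetch API `Response`, which is global in Node 18+.

```typescript
/**
 * Hypothetical helper: logs the real error on the server and returns a
 * generic 500 body so error.message never reaches the client.
 */
function internalErrorResponse(error: unknown): Response {
  // Full detail stays in server logs only (log prefix is illustrative).
  console.error("[admin-test-api]", error);
  return new Response(JSON.stringify({ error: "Internal server error" }), {
    status: 500,
    headers: { "Content-Type": "application/json" },
  });
}
```

Test expectations then assert on the fixed string rather than on whatever message the underlying failure produced.
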
deploy-keeperhub.yaml is now a reusable workflow called via
workflow_call from ci-pipeline.yml, not triggered by workflow_run.
…ction

Deduplicate ~40 lines of setup steps repeated across 6 jobs.
New .github/actions/setup-node-pnpm/action.yml with two boolean
inputs: install-playwright and discover-plugins.
- Rename "Local" context to "Ephemeral" to match naming convention
- Fix mermaid diagram: replace non-existent check-e2e-label with check-labels
- Update e2e-tests-ephemeral.yml trigger to reflect workflow_call pattern
- Correct remote vitest test count from 114+ to ~130
- Add composite action (setup-node-pnpm) documentation to workflow docs
…ignals

Add data attributes to app components for test automation:
- workflow-canvas: data-ready for canvas load state
- org-switcher: data-state for switching/loading/ready
- accept-invite: data-page-state for hydration state

Replace waitForTimeout/networkidle with element assertions:
- auth.setup: remove 2s sleeps, use domcontentloaded + org-switcher wait
- auth utils: replace networkidle with org-switcher visibility
- workflow utils: use data-ready, waitForURL, element assertions
- invitations: replace retry loop with data-page-state wait
- workflow.test/schedule-trigger.test: remove all waitForTimeout calls
- organization-wallet.test: replace toast race with element assertion
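
The shift from fixed sleeps to readiness signals can be sketched as below. To keep the example standalone it declares minimal structural types instead of importing Playwright; Playwright's `Page.locator()` and `Locator.waitFor()` are compatible with them. The helper name, the default timeout, and the selector used in the usage note are hypothetical.

```typescript
// Minimal structural stand-ins for Playwright's Page and Locator.
interface LocatorLike {
  waitFor(options?: {
    state?: "attached" | "detached" | "visible" | "hidden";
    timeout?: number;
  }): Promise<void>;
}
interface PageLike {
  locator(selector: string): LocatorLike;
}

/**
 * Waits until the element matched by `selector` is visible AND carries the
 * expected data attribute value, instead of sleeping for a fixed time.
 * Assumption: the 15 s default timeout is illustrative.
 */
async function waitForDataAttribute(
  page: PageLike,
  selector: string,
  attribute: string,
  value: string,
  timeout = 15000,
): Promise<void> {
  await page
    .locator(`${selector}[${attribute}="${value}"]`)
    .waitFor({ state: "visible", timeout });
}
```

For example, `waitForDataAttribute(page, '[data-testid="org-switcher"]', "data-state", "ready")` would wait for the org switcher to report `ready`; the `data-testid` value there is a guess, while `data-state` and its `ready` value come from the commit message.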

Stabilize playwright config:
- fullyParallel: false, workers: 1 (serial to avoid shared-state conflicts)
- retries: 2 (handles environmental flakiness)
- reporter: github + html in CI, list locally
- Unskip analytics-gas, scheduled-workflow, web3-balance, para-wallet tests
- Remove stop-execution.test.ts and ORG-4 placeholder (unimplemented UI)
- Fix Para Wallet "Create Wallet" ambiguous selector (data-slot scoping)
- Fix analytics-gas scrollIntoView race (wait for attachment first)
- Remove CI sharding from e2e-tests-ephemeral.yml (serial execution)
- Move regex literal to top-level scope (biome lint fix)
- Use consistent button[role="combobox"] selector in signIn()
- Update docs with CI execution model and stability decisions
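
The serial-execution settings listed under "Stabilize playwright config" could be expressed roughly as follows. The option names (`fullyParallel`, `workers`, `retries`, `reporter`) are real Playwright Test options; the `buildPlaywrightOptions` function and its types are hypothetical and exist only so the sketch runs without importing `@playwright/test`.

```typescript
type ReporterEntry = [string];

interface StabilityOptions {
  fullyParallel: boolean;
  workers: number;
  retries: number;
  reporter: string | ReporterEntry[];
}

/** Hypothetical builder for the stability-related options named in the commit. */
function buildPlaywrightOptions(isCI: boolean): StabilityOptions {
  return {
    fullyParallel: false, // run test files serially to avoid shared-state conflicts
    workers: 1,
    retries: 2, // absorbs environmental flakiness
    reporter: isCI ? [["github"], ["html"]] : "list",
  };
}
```

In a real `playwright.config.ts` these values would typically be spread into the object passed to `defineConfig`, with `isCI` derived from something like `!!process.env.CI`; how the PR detects CI is not stated.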

github-actions bot commented Mar 5, 2026

🚀 PR Environment Deployed

Your PR environment has been successfully deployed!

Environment Details:

Components:

  • Keeperhub Application
  • PostgreSQL Database (isolated instance)
  • LocalStack (SQS emulation)
  • Redis (isolated instance)
  • Scheduler Dispatcher (staging image)
  • Scheduler Executor (staging image)
  • SC Event Tracker (staging image)
  • SC Event Worker (staging image)

The environment will be automatically cleaned up when this PR is closed or merged.

Read the test API key from AWS Parameter Store instead of GitHub
Secrets so the CI runner uses the same value as the deployed app.

suisuss added 2 commits March 5, 2026 19:03
Read TEST_API_KEY from the deployed pod environment so the CI runner
uses the exact same value as the app. Skip ephemeral and vitest-remote
jobs on this branch to iterate faster on playwright-remote.
When a synchronize event only changes files under .github/, tests/,
or config files like playwright.config/vitest.config, skip the
deploy step and just re-run the tests against the existing deployment.
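
The skip-deploy decision above amounts to classifying the changed paths. A minimal sketch, assuming the changed file list is already available as repo-relative paths; the exact patterns are illustrative, since the commit only names `.github/`, `tests/`, and config files like playwright.config and vitest.config.

```typescript
// Illustrative patterns for "test or CI only" paths; the real list may differ.
const TEST_ONLY_PATTERNS: RegExp[] = [
  /^\.github\//,
  /^tests\//,
  /^(playwright|vitest)\.config\.[cm]?[jt]s$/,
];

/**
 * Returns true when every changed file is CI or test related, meaning the
 * existing deployment can be reused and only the tests need to re-run.
 * An empty change list returns false so that a deploy is never skipped by accident.
 */
function isTestOnlyChange(changedFiles: string[]): boolean {
  return (
    changedFiles.length > 0 &&
    changedFiles.every((file) => TEST_ONLY_PATTERNS.some((pattern) => pattern.test(file)))
  );
}
```

Treating an empty list as "not test-only" is a conservative design choice made for this sketch, not something the commit specifies.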

suisuss added 2 commits March 5, 2026 19:26
The deployed PR environment runs NODE_ENV=development, not test/CI,
so rate limiting was still active. Add DISABLE_AUTH_RATE_LIMIT env var
to PR environment values and check it in the auth config.
Use better-auth customRules to bypass rate limiting when requests
include a valid X-Test-API-Key header. Playwright config sends this
header automatically when TEST_API_KEY env var is set. This keeps
rate limiting active for real users while allowing E2E tests to run
without hitting limits in PR, staging, and prod environments.
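
On the client side, sending the header only when a key is configured might look like the sketch below. The name `getTestHeaders` matches the helper this PR adds to the vitest E2E utils, but this body is a guess at its behavior, and passing the environment as an argument is a choice made here to keep the example standalone.

```typescript
/**
 * Returns the rate-limit bypass header when TEST_API_KEY is set, and an
 * empty object otherwise so that ordinary runs send nothing extra.
 * Assumption: an empty string counts as "not set".
 */
function getTestHeaders(env: Record<string, string | undefined>): Record<string, string> {
  const key = env.TEST_API_KEY;
  return key ? { "X-Test-API-Key": key } : {};
}
```

In a Playwright config the result could be supplied as `use.extraHTTPHeaders`, for example `extraHTTPHeaders: getTestHeaders(process.env)`, so every browser request carries the header. The server-side check in better-auth `customRules` is not shown because its exact shape is not given in this conversation.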

suisuss added 2 commits March 5, 2026 19:35
Add testFetch() and getTestHeaders() to vitest E2E utils for future
tests that hit auth endpoints. Also add X-Test-API-Key to Playwright
admin-fetch headers.
Fix customRules return type (return currentRule instead of undefined),
remove unused biome-ignore suppression, drop unnecessary async.

suisuss added 2 commits March 5, 2026 20:02
Remote and ephemeral E2E test jobs are disabled (if: false) across
deploy-keeperhub, deploy-pr-environment, and e2e-tests-ephemeral
workflows while auth/rate-limit infrastructure is being stabilised.
Remote tests gated by ENABLE_E2E_REMOTE_TESTS, ephemeral tests by
ENABLE_E2E_EPHEMERAL_TESTS. Both are GitHub repository variables.
Currently ephemeral=true, remote=false.
