feat: 2-tier E2E test system — granular touchfiles + gate/periodic split (v0.11.16.0) #450

Open
garrytan wants to merge 4 commits into main from garrytan/e2e-test-triage

Conversation

@garrytan
Owner

Summary

  • Granular global touchfiles: Shrunk GLOBAL_TOUCHFILES from 9 to 3 entries. gen-skill-docs.ts (changed 51 times in 30 days) no longer triggers all 56 tests — only the ~27 that actually depend on it. Same for llm-judge.ts, test-server.ts, worktree.ts, and Codex/Gemini session runners.
  • 2-tier test system: Every E2E test is classified as gate (blocks PRs, ~$8.50) or periodic (weekly cron, ~$11.50). CI runs gate tests by default via EVALS_TIER=gate. Periodic tests run Monday 6 AM UTC via new evals-periodic.yml workflow.
  • Replaced EVALS_FAST with EVALS_TIER env var (gate/periodic). Removed allow_failure flags from CI matrix.
  • Safety net: Free validation test ensures E2E_TIERS keys always match E2E_TOUCHFILES keys.
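The tier split described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the test names, the `EXAMPLE_TIERS` map, and the `selectByTier` helper are hypothetical, and treating an unset `EVALS_TIER` as "no tier filter" is an assumption. Only the tier values `gate` and `periodic` and the `EVALS_TIER` variable name come from the description.

```typescript
// Sketch only: hypothetical test names and helper, not the PR's implementation.
type Tier = "gate" | "periodic";

// Stand-in for a tier map like E2E_TIERS; the entries are invented for illustration.
const EXAMPLE_TIERS: Record<string, Tier> = {
  "example-fast-check": "gate",
  "example-slow-journey": "periodic",
  "example-smoke": "gate",
};

// Returns the test names to run for a given EVALS_TIER value.
// Assumption: an unset value applies no tier filter.
function selectByTier(
  tiers: Record<string, Tier>,
  evalsTier: string | undefined,
): string[] {
  const names = Object.keys(tiers).sort();
  if (evalsTier === undefined) return names;
  if (evalsTier !== "gate" && evalsTier !== "periodic") {
    // Fail loudly on typos rather than silently running nothing.
    throw new Error(`Unknown EVALS_TIER value: ${evalsTier}`);
  }
  return names.filter((name) => tiers[name] === evalsTier);
}
```

In a real runner the second argument would presumably come from the environment (for example `process.env.EVALS_TIER`); it is passed explicitly here so the sketch stays self-contained.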

Test Coverage

All new code paths have test coverage. 558 free tests pass, 0 fail.

New tests added:

  • "E2E_TIERS covers exactly the same tests as E2E_TOUCHFILES": prevents tier map drift
  • "E2E_TIERS only contains valid tier values": catches typos
  • Updated test "gen-skill-docs.ts is a scoped touchfile": verifies it is no longer global
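The drift checks listed above amount to a set comparison between the two maps plus a value check. A hedged sketch of that idea, assuming both maps are plain objects keyed by test name (the `findTierDrift` helper and its result shape are hypothetical, not taken from this PR):

```typescript
// Sketch only: a hypothetical helper showing the kind of check the new tests perform.
interface TierDrift {
  missingTier: string[]; // tests with touchfiles but no tier entry
  extraTier: string[];   // tier entries with no matching touchfile entry
  invalidTier: string[]; // tier entries whose value is not "gate" or "periodic"
}

function findTierDrift(
  touchfiles: Record<string, string[]>,
  tiers: Record<string, string>,
): TierDrift {
  const validTiers = new Set<string>(["gate", "periodic"]);
  const touchKeys = new Set<string>(Object.keys(touchfiles));
  const tierKeys = new Set<string>(Object.keys(tiers));
  return {
    missingTier: Object.keys(touchfiles).filter((k) => !tierKeys.has(k)).sort(),
    extraTier: Object.keys(tiers).filter((k) => !touchKeys.has(k)).sort(),
    invalidTier: Object.keys(tiers)
      .filter((k) => {
        const value = tiers[k];
        return value === undefined || !validTiers.has(value);
      })
      .sort(),
  };
}
```

A free test could then assert that all three arrays are empty, so a test added to one map but not the other fails without spending anything on LLM calls.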

Pre-Landing Review

No issues found. Test infrastructure only — no SQL, no LLM trust boundaries, no frontend changes.

Test plan

  • All 558 free tests pass (touchfiles, skill-validation, gen-skill-docs)
  • Tier validation test catches missing/extra entries
  • gen-skill-docs.ts change triggers ~27 tests, not 56
  • Backward compatibility: EVALS_ALL=1 still runs everything
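To make the difference between global and scoped touchfiles concrete, here is a rough sketch of diff-based selection. Everything in it is an assumption for illustration: the file paths, the test names, the `selectTests` helper, and the simplified pattern matching (exact path, or a directory prefix written as `dir/**`). The actual selection logic and glob handling in the repository may differ.

```typescript
// Sketch only: hypothetical paths, names, and matching rules.

// Simplified matcher: "dir/**" matches anything under dir/, otherwise exact match.
function matchesPattern(pattern: string, file: string): boolean {
  if (pattern.endsWith("/**")) return file.startsWith(pattern.slice(0, -2));
  return pattern === file;
}

function selectTests(
  changedFiles: string[],
  globalTouchfiles: string[],
  touchfiles: Record<string, string[]>,
  evalsAll: boolean,
): string[] {
  const allTests = Object.keys(touchfiles).sort();
  // EVALS_ALL=1 bypasses selection and runs everything.
  if (evalsAll) return allTests;
  const hit = (patterns: string[]): boolean =>
    changedFiles.some((f) => patterns.some((p) => matchesPattern(p, f)));
  // A change to a global touchfile triggers every test.
  if (hit(globalTouchfiles)) return allTests;
  // Otherwise only tests whose own touchfiles were changed are selected.
  return allTests.filter((name) => hit(touchfiles[name] ?? []));
}

// Invented example data: one global dependency, two tests with scoped dependencies.
const EXAMPLE_GLOBALS = ["test/helpers/session-runner.ts"];
const EXAMPLE_TOUCHFILES: Record<string, string[]> = {
  "docs-check": ["scripts/gen-skill-docs.ts", "docs/**"],
  "browse-check": ["browse/**"],
};
```

With this data, a change to `scripts/gen-skill-docs.ts` selects only `docs-check`, while a change to the global helper selects both tests. That is the effect the PR describes when it moves a frequently edited file out of the global list.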

🤖 Generated with Claude Code

garrytan and others added 3 commits March 24, 2026 08:15
- Shrink GLOBAL_TOUCHFILES from 9 to 3 (only truly global deps)
- Move scoped deps (gen-skill-docs, llm-judge, test-server, worktree,
  codex/gemini session runners) into individual test entries
- Add E2E_TIERS map classifying each test as gate or periodic
- Replace EVALS_FAST with EVALS_TIER env var (gate/periodic)
- Add tier validation test (E2E_TIERS keys must match E2E_TOUCHFILES)
- CI runs only gate tests; periodic tests run weekly via cron
- Add evals-periodic.yml workflow (Monday 6 AM UTC + manual)
- Remove allow_failure flags (gate tests should be reliable)
- Add test:gate and test:periodic scripts, remove test:e2e:fast

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
github-actions bot commented Mar 24, 2026

E2E Evals: ❌ FAIL

64/81 tests passed | $9.25 total cost | 12 parallel runners

Suite             Result   Status   Cost
e2e-browse        7/7      ✅       $0.33
e2e-deploy        4/4      ✅       $0.60
e2e-design        3/3      ✅       $0.53
e2e-plan          7/7      ✅       $1.02
e2e-qa-workflow   3/3      ✅       $0.90
e2e-review        5/5      ✅       $0.93
e2e-routing       7/20     ❌       $3.77
e2e-workflow      4/8      ❌       $0.69
llm-judge         24/24    ✅       $0.48

12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

Failures

  • ❌ journey-ideation: success
  • ❌ journey-visual-qa: success
  • ❌ journey-qa: success
  • ❌ journey-ideation: success
  • ❌ journey-debug: success
  • ❌ journey-visual-qa: success
  • ❌ journey-ideation: success
  • ❌ journey-qa: success
  • ❌ journey-qa: success
  • ❌ journey-visual-qa: success
  • ❌ journey-debug: success
  • ❌ journey-design-system: success
  • ❌ journey-debug: success
  • ❌ /ship local workflow: success
  • ❌ /ship local workflow: success
  • ❌ /ship local workflow: success
  • ❌ /setup-browser-cookies detect: error_max_turns

browse/dist/ is already in .gitignore — the binary was committed
by mistake in dc5e053. Untrack it so it stops showing as modified.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>