feat: 2-tier E2E test system — granular touchfiles + gate/periodic split (v0.11.16.0)#450
Open
- Shrink GLOBAL_TOUCHFILES from 9 to 3 (only truly global deps)
- Move scoped deps (gen-skill-docs, llm-judge, test-server, worktree, codex/gemini session runners) into individual test entries
- Add E2E_TIERS map classifying each test as gate or periodic
- Replace EVALS_FAST with EVALS_TIER env var (gate/periodic)
- Add tier validation test (E2E_TIERS keys must match E2E_TOUCHFILES)
- CI runs only gate tests; periodic tests run weekly via cron
- Add evals-periodic.yml workflow (Monday 6 AM UTC + manual)
- Remove allow_failure flags (gate tests should be reliable)
- Add test:gate and test:periodic scripts, remove test:e2e:fast
# Conflicts: # CLAUDE.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
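The touchfile change in the commit above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the test names, file paths, the `matches` helper, and the `selectTests` signature are all hypothetical, and the real `GLOBAL_TOUCHFILES` / `E2E_TOUCHFILES` entries are not shown in this conversation. It assumes a test is selected when a changed file hits either a global touchfile or one of that test's own scoped touchfiles.

```typescript
// Hypothetical data: real entries are not shown in the PR text.
const GLOBAL_TOUCHFILES: string[] = ["test/helpers/session-runner.ts"];

const E2E_TOUCHFILES: Record<string, string[]> = {
  // Scoped dependency: only this test reruns when gen-skill-docs.ts changes.
  "skill-docs-e2e": ["scripts/gen-skill-docs.ts", "skills/"],
  "browse-e2e": ["browse/src/", "test/helpers/test-server.ts"],
};

// Assumed matching rule: a touchfile ending in "/" is a directory prefix,
// anything else must equal the changed path exactly.
function matches(changed: string, touchfile: string): boolean {
  return touchfile.endsWith("/")
    ? changed.startsWith(touchfile)
    : changed === touchfile;
}

// Returns the names of the tests that should run for a set of changed files.
// `runAll` stands in for an override such as EVALS_ALL=1.
function selectTests(changedFiles: string[], runAll: boolean = false): string[] {
  const entries = Object.entries(E2E_TOUCHFILES);
  if (runAll) return entries.map(([name]) => name);

  // A hit on any global touchfile selects every test.
  const globalHit = changedFiles.some((f) =>
    GLOBAL_TOUCHFILES.some((t) => matches(f, t)),
  );
  if (globalHit) return entries.map(([name]) => name);

  // Otherwise select only tests whose scoped touchfiles were touched.
  return entries
    .filter(([, touchfiles]) =>
      touchfiles.some((t) => changedFiles.some((f) => matches(f, t))),
    )
    .map(([name]) => name);
}
```

Under these assumptions, the fewer paths that sit in the global list, the fewer changes fan out to the whole suite, which is the motivation for moving frequently edited files into individual test entries.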
E2E Evals: ❌ FAIL
64/81 tests passed | $9.25 total cost | 12 parallel runners
12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite
Failures
browse/dist/ is already in .gitignore; the binary was committed by mistake in dc5e053. Untrack it so it stops showing as modified.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- Shrunk GLOBAL_TOUCHFILES from 9 to 3 entries. gen-skill-docs.ts (changed 51 times in 30 days) no longer triggers all 56 tests, only the ~27 that actually depend on it. Same for llm-judge.ts, test-server.ts, worktree.ts, and the Codex/Gemini session runners.
- Each test is now classified as gate (blocks PRs, ~$8.50) or periodic (weekly cron, ~$11.50). CI runs gate tests by default via EVALS_TIER=gate. Periodic tests run Monday 6 AM UTC via the new evals-periodic.yml workflow.
- Replaced EVALS_FAST with the EVALS_TIER env var (gate/periodic). Removed allow_failure flags from the CI matrix.
- A tier validation test checks that E2E_TIERS keys always match E2E_TOUCHFILES keys.
Test Coverage
All new code paths have test coverage. 558 free tests pass, 0 fail.
New tests added:
- E2E_TIERS covers exactly the same tests as E2E_TOUCHFILES: prevents tier map drift
- E2E_TIERS only contains valid tier values: catches typos
- gen-skill-docs.ts is a scoped touchfile: verifies it's no longer global
Pre-Landing Review
No issues found. Test infrastructure only — no SQL, no LLM trust boundaries, no frontend changes.
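For reviewers who want a concrete picture of the tier checks listed under Test Coverage, here is a rough sketch. The function names `tierMapProblems` and `testsForTier` are hypothetical and the PR's real tests may be structured differently. The sketch also assumes that an unset tier means "no tier filter", which the PR text does not state.

```typescript
// The two tier values named in the PR description.
const VALID_TIERS: readonly string[] = ["gate", "periodic"];

// Reports every way the tier map can drift from the touchfile map:
// a test with no tier, a tier for an unknown test, or a misspelled tier.
function tierMapProblems(
  touchfiles: Record<string, string[]>,
  tiers: Record<string, string>,
): string[] {
  const touchfileNames = new Set(Object.keys(touchfiles));
  const tierNames = new Set(Object.keys(tiers));
  const problems: string[] = [];

  for (const name of touchfileNames) {
    if (!tierNames.has(name)) problems.push(`missing tier: ${name}`);
  }
  for (const [name, tier] of Object.entries(tiers)) {
    if (!touchfileNames.has(name)) problems.push(`unknown test: ${name}`);
    if (!VALID_TIERS.includes(tier)) problems.push(`invalid tier: ${name}=${tier}`);
  }
  return problems;
}

// Filters test names by tier, as a value like EVALS_TIER=gate might.
// Assumption: an undefined tier applies no filter at all.
function testsForTier(
  tiers: Record<string, string>,
  tier: string | undefined,
): string[] {
  return Object.entries(tiers)
    .filter(([, t]) => tier === undefined || t === tier)
    .map(([name]) => name);
}
```

A test built on this would assert that `tierMapProblems(E2E_TOUCHFILES, E2E_TIERS)` returns an empty array, so adding a test to one map without the other fails fast in the free test suite rather than silently skipping an eval.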
Test plan
- gen-skill-docs.ts change triggers ~27 tests, not 56
- EVALS_ALL=1 still runs everything
🤖 Generated with Claude Code