From 6f6b2e9f3671ba06ecf43f19101ff75864f6bdb0 Mon Sep 17 00:00:00 2001 From: Rhuan Barreto Date: Fri, 29 May 2026 00:54:23 +0200 Subject: [PATCH 1/3] test(engine): fix flaky Windows timeout in git-files scope test MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The `resolveScopedFiles > warns when file scope exceeds threshold` test created 1001 real files plus `git add .` to trip the file-count warning. On Windows CI runners that filesystem work exceeded the per-test 30s timeout, killing `git add .` mid-run (exit 143 = SIGTERM) and flaking the Windows smoke test. Make the warn threshold injectable via an optional `fileWarnThreshold` parameter on `resolveScopedFiles` (defaults to SCOPE_FILE_WARN_THRESHOLD). The test now injects a threshold of 5 and creates only 6 files — fast and deterministic on every platform — and drops the 30s per-test override (which was tighter than the 60s global, making timeouts more likely). Capture the pattern in ARCH-005 (Testing Standards). Signed-off-by: Rhuan Barreto --- .archgate/adrs/ARCH-005-testing-standards.md | 4 ++++ .claude/agent-memory/archgate-developer/MEMORY.md | 2 ++ src/engine/git-files.ts | 11 +++++++++-- tests/engine/git-files.test.ts | 11 ++++++++--- 4 files changed, 23 insertions(+), 5 deletions(-) diff --git a/.archgate/adrs/ARCH-005-testing-standards.md b/.archgate/adrs/ARCH-005-testing-standards.md index dc86f6ac..442c4637 100644 --- a/.archgate/adrs/ARCH-005-testing-standards.md +++ b/.archgate/adrs/ARCH-005-testing-standards.md @@ -46,6 +46,8 @@ Use Bun's built-in test runner (`bun test`) for all tests. Test files go in `tes - Use descriptive test names that explain the expected behavior - **When mocking `fetch` in tests, assign directly to `globalThis.fetch`** — use `globalThis.fetch = mockFn as unknown as typeof fetch`. Restore in `afterEach` via `mock.restore()` (from `bun:test`) or by reassigning the original reference before the test. - **Wrap `spyOn` / `mockImplementation` calls in `try/finally` to guarantee `mockRestore()` runs** — when `expect()` assertions fail, they throw immediately, skipping any `mockRestore()` that follows. The un-restored spy leaks into subsequent tests, causing false positives or false negatives. Pattern: `const spy = spyOn(...).mockImplementation(() => {}); try { /* assertions */ } finally { spy.mockRestore(); }`. Alternatively, create and restore spies in `beforeEach`/`afterEach` hooks instead of inline. +- **Make large production thresholds injectable so tests can use a small value** — When production code only acts past a large numeric threshold (e.g., `SCOPE_FILE_WARN_THRESHOLD = 1000` in `src/engine/git-files.ts`), add an optional parameter that defaults to the module constant (e.g., `resolveScopedFiles(root, globs, { fileWarnThreshold })`). Tests inject a tiny value such as `5` and create only a handful of files to exercise the same code path. This keeps the test fast and deterministic on every platform instead of materializing thousands of fixture files. +- **Only ever raise a per-test timeout override above the global, never below it** — the project global is `bun test --timeout 60000` (60s). A per-test `}, 30_000` override makes that test *more* likely to time out than the default. Use a per-test override solely to grant a genuinely slow test more time than 60s; never set one shorter than the global. ### Don't @@ -59,6 +61,7 @@ Use Bun's built-in test runner (`bun test`) for all tests. Test files go in `tes - Don't import test utilities from `node:test` — use Bun's built-in `bun:test` module - **Don't use `mock.module("node:fetch", ...)` to intercept HTTP fetch calls** — in Bun, the runtime fetch is `globalThis.fetch` and `mock.module` targeting `node:fetch` does not intercept it. The mock silently has no effect: the real network is hit, making tests non-deterministic and dependent on external services. Assign `globalThis.fetch` directly instead (see Do's above). - **Don't place `mockRestore()` after assertions without `try/finally` protection** — if the assertion throws, the spy is never restored and contaminates subsequent tests. This is especially dangerous when spying on globals like `console.warn` or `globalThis.fetch`, where a leaked spy silently suppresses or redirects output for every test that follows. +- **Don't materialize large filesystem fixtures (1000+ files, plus `git add .`) just to cross a production threshold** — on Windows CI runners that filesystem work is slow enough to exceed the per-test timeout, killing the staging subprocess mid-run (`git add . failed (exit 143)`, where 143 = 128 + SIGTERM) and producing a flaky failure that does not reproduce locally. Inject a small threshold instead (see Do's above). This was the root cause of a recurring Windows smoke-test flake on the `resolveScopedFiles` warning test. ## Implementation Pattern @@ -213,6 +216,7 @@ Code reviewers MUST verify: 5. Tests that call `git commit` on a temp repo configure `user.email` and `user.name` locally before committing 6. Tests that mock HTTP fetch assign `globalThis.fetch` directly — no `mock.module("node:fetch", ...)` usage 7. Tests that use `spyOn` or `mockImplementation` inline (not in `beforeEach`/`afterEach`) wrap the spy lifecycle in `try/finally` to guarantee `mockRestore()` runs even when assertions fail +8. Tests that exercise a numeric production threshold inject a small threshold value rather than generating fixtures large enough to trip the production default, and no per-test timeout override is shorter than the global `--timeout 60000` ## References diff --git a/.claude/agent-memory/archgate-developer/MEMORY.md b/.claude/agent-memory/archgate-developer/MEMORY.md index d663156b..fc19be38 100644 --- a/.claude/agent-memory/archgate-developer/MEMORY.md +++ b/.claude/agent-memory/archgate-developer/MEMORY.md @@ -30,6 +30,8 @@ Skipping steps 2 or 3 is a workflow violation. The user should NEVER have to inv ## Patterns & Fixes +- **Make large production thresholds injectable rather than building giant test fixtures** — Tests that must cross a big numeric threshold in production code (e.g., the 1000-file `SCOPE_FILE_WARN_THRESHOLD` in `src/engine/git-files.ts`) should NOT create 1000+ real files + `git add .` to trip it. On Windows CI runners that filesystem work is slow enough to exceed the per-test timeout — `git add .` gets SIGTERM'd mid-run (`git add . failed (exit 143)`, 143 = 128+SIGTERM) and the whole suite fails flakily. Fix: add an optional override (e.g., `fileWarnThreshold?: number` defaulting to the module constant) to the function under test, and have the test inject a tiny value (5) so it only needs ~6 files. Caused the `git-files > resolveScopedFiles > warns when file scope exceeds threshold` flake on PR #366's Windows smoke test. Also captured in ARCH-005. +- **Never set a per-test timeout override SHORTER than the global `--timeout`** — CI runs `bun test --timeout 60000` (60s). A per-test `}, 30_000` override makes that test MORE likely to time out than the default, the opposite of the usual intent. Only override the per-test timeout to raise it above the global, never to lower it. The 30s override on the git-files threshold test was the proximate cause of its Windows flake. - **YAML double-quoted strings require escaped backslashes for Windows paths in tests** — YAML interprets `\` as an escape character inside double-quoted strings. Writing `cwd: "E:\project"` silently corrupts the parsed value because `\p` is not a valid escape sequence. Fix: use `JSON.stringify(path)` to produce properly escaped YAML values (e.g., `cwd: ${JSON.stringify(cwd)}`). JSON and YAML double-quoted strings share the same escape syntax. Encountered in Copilot CLI session-context tests (`workspace.yaml` with Windows paths). - **`git commit` in temp repos requires local identity** — CI runners have no global `user.email`/`user.name` configured. Any test that runs `git commit` on a temp repo MUST call `git config user.email` and `git config user.name` locally after `git init`. Fails with a cryptic `ShellPromise` error in CI; passes locally. Also captured in ARCH-005 Do's. - **Never use `bunx prettier` directly** — Always use `bun run format` (to fix) or `bun run format:check` (to verify). Using `bunx prettier` can fail or use a different version than the project's devDependency. The same applies to all dev tools: prefer `bun run