From 6f6b2e9f3671ba06ecf43f19101ff75864f6bdb0 Mon Sep 17 00:00:00 2001
From: Rhuan Barreto <rhuan@barreto.work>
Date: Fri, 29 May 2026 00:54:23 +0200
Subject: [PATCH 1/3] test(engine): fix flaky Windows timeout in git-files
 scope test
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The `resolveScopedFiles > warns when file scope exceeds threshold` test
created 1001 real files plus `git add .` to trip the file-count warning.
On Windows CI runners that filesystem work exceeded the per-test 30s
timeout, killing `git add .` mid-run (exit 143 = SIGTERM) and flaking the
Windows smoke test.

Make the warn threshold injectable via an optional `fileWarnThreshold`
parameter on `resolveScopedFiles` (defaults to SCOPE_FILE_WARN_THRESHOLD).
The test now injects a threshold of 5 and creates only 6 files — fast and
deterministic on every platform — and drops the 30s per-test override
(which was tighter than the 60s global, making timeouts more likely).

Capture the pattern in ARCH-005 (Testing Standards).

Signed-off-by: Rhuan Barreto <rhuan@barreto.work>
---
 .archgate/adrs/ARCH-005-testing-standards.md      |  4 ++++
 .claude/agent-memory/archgate-developer/MEMORY.md |  2 ++
 src/engine/git-files.ts                           | 11 +++++++++--
 tests/engine/git-files.test.ts                    | 11 ++++++++---
 4 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/.archgate/adrs/ARCH-005-testing-standards.md b/.archgate/adrs/ARCH-005-testing-standards.md
index dc86f6ac..442c4637 100644
--- a/.archgate/adrs/ARCH-005-testing-standards.md
+++ b/.archgate/adrs/ARCH-005-testing-standards.md
@@ -46,6 +46,8 @@ Use Bun's built-in test runner (`bun test`) for all tests. Test files go in `tes
 - Use descriptive test names that explain the expected behavior
 - **When mocking `fetch` in tests, assign directly to `globalThis.fetch`** — use `globalThis.fetch = mockFn as unknown as typeof fetch`. Restore in `afterEach` via `mock.restore()` (from `bun:test`) or by reassigning the original reference before the test.
 - **Wrap `spyOn` / `mockImplementation` calls in `try/finally` to guarantee `mockRestore()` runs** — when `expect()` assertions fail, they throw immediately, skipping any `mockRestore()` that follows. The un-restored spy leaks into subsequent tests, causing false positives or false negatives. Pattern: `const spy = spyOn(...).mockImplementation(() => {}); try { /* assertions */ } finally { spy.mockRestore(); }`. Alternatively, create and restore spies in `beforeEach`/`afterEach` hooks instead of inline.
+- **Make large production thresholds injectable so tests can use a small value** — When production code only acts past a large numeric threshold (e.g., `SCOPE_FILE_WARN_THRESHOLD = 1000` in `src/engine/git-files.ts`), add an optional parameter that defaults to the module constant (e.g., `resolveScopedFiles(root, globs, { fileWarnThreshold })`). Tests inject a tiny value such as `5` and create only a handful of files to exercise the same code path. This keeps the test fast and deterministic on every platform instead of materializing thousands of fixture files.
+- **Only ever raise a per-test timeout override above the global, never below it** — the project global is `bun test --timeout 60000` (60s). A per-test `}, 30_000` override makes that test *more* likely to time out than the default. Use a per-test override solely to grant a genuinely slow test more time than 60s; never set one shorter than the global.
 
 ### Don't
 
@@ -59,6 +61,7 @@ Use Bun's built-in test runner (`bun test`) for all tests. Test files go in `tes
 - Don't import test utilities from `node:test` — use Bun's built-in `bun:test` module
 - **Don't use `mock.module("node:fetch", ...)` to intercept HTTP fetch calls** — in Bun, the runtime fetch is `globalThis.fetch` and `mock.module` targeting `node:fetch` does not intercept it. The mock silently has no effect: the real network is hit, making tests non-deterministic and dependent on external services. Assign `globalThis.fetch` directly instead (see Do's above).
 - **Don't place `mockRestore()` after assertions without `try/finally` protection** — if the assertion throws, the spy is never restored and contaminates subsequent tests. This is especially dangerous when spying on globals like `console.warn` or `globalThis.fetch`, where a leaked spy silently suppresses or redirects output for every test that follows.
+- **Don't materialize large filesystem fixtures (1000+ files, plus `git add .`) just to cross a production threshold** — on Windows CI runners that filesystem work is slow enough to exceed the per-test timeout, killing the staging subprocess mid-run (`git add . failed (exit 143)`, where 143 = 128 + SIGTERM) and producing a flaky failure that does not reproduce locally. Inject a small threshold instead (see Do's above). This was the root cause of a recurring Windows smoke-test flake on the `resolveScopedFiles` warning test.
 
 ## Implementation Pattern
 
@@ -213,6 +216,7 @@ Code reviewers MUST verify:
 5. Tests that call `git commit` on a temp repo configure `user.email` and `user.name` locally before committing
 6. Tests that mock HTTP fetch assign `globalThis.fetch` directly — no `mock.module("node:fetch", ...)` usage
 7. Tests that use `spyOn` or `mockImplementation` inline (not in `beforeEach`/`afterEach`) wrap the spy lifecycle in `try/finally` to guarantee `mockRestore()` runs even when assertions fail
+8. Tests that exercise a numeric production threshold inject a small threshold value rather than generating fixtures large enough to trip the production default, and no per-test timeout override is shorter than the global `--timeout 60000`
 
 ## References
 
diff --git a/.claude/agent-memory/archgate-developer/MEMORY.md b/.claude/agent-memory/archgate-developer/MEMORY.md
index d663156b..fc19be38 100644
--- a/.claude/agent-memory/archgate-developer/MEMORY.md
+++ b/.claude/agent-memory/archgate-developer/MEMORY.md
@@ -30,6 +30,8 @@ Skipping steps 2 or 3 is a workflow violation. The user should NEVER have to inv
 
 ## Patterns & Fixes
 
+- **Make large production thresholds injectable rather than building giant test fixtures** — Tests that must cross a big numeric threshold in production code (e.g., the 1000-file `SCOPE_FILE_WARN_THRESHOLD` in `src/engine/git-files.ts`) should NOT create 1000+ real files + `git add .` to trip it. On Windows CI runners that filesystem work is slow enough to exceed the per-test timeout — `git add .` gets SIGTERM'd mid-run (`git add . failed (exit 143)`, 143 = 128+SIGTERM) and the whole suite fails flakily. Fix: add an optional override (e.g., `fileWarnThreshold?: number` defaulting to the module constant) to the function under test, and have the test inject a tiny value (5) so it only needs ~6 files. Caused the `git-files > resolveScopedFiles > warns when file scope exceeds threshold` flake on PR #366's Windows smoke test. Also captured in ARCH-005.
+- **Never set a per-test timeout override SHORTER than the global `--timeout`** — CI runs `bun test --timeout 60000` (60s). A per-test `}, 30_000` override makes that test MORE likely to time out than the default, the opposite of the usual intent. Only override the per-test timeout to raise it above the global, never to lower it. The 30s override on the git-files threshold test was the proximate cause of its Windows flake.
 - **YAML double-quoted strings require escaped backslashes for Windows paths in tests** — YAML interprets `\` as an escape character inside double-quoted strings. Writing `cwd: "E:\project"` silently corrupts the parsed value because `\p` is not a valid escape sequence. Fix: use `JSON.stringify(path)` to produce properly escaped YAML values (e.g., `cwd: ${JSON.stringify(cwd)}`). JSON and YAML double-quoted strings share the same escape syntax. Encountered in Copilot CLI session-context tests (`workspace.yaml` with Windows paths).
 - **`git commit` in temp repos requires local identity** — CI runners have no global `user.email`/`user.name` configured. Any test that runs `git commit` on a temp repo MUST call `git config user.email` and `git config user.name` locally after `git init`. Fails with a cryptic `ShellPromise` error in CI; passes locally. Also captured in ARCH-005 Do's.
 - **Never use `bunx prettier` directly** — Always use `bun run format` (to fix) or `bun run format:check` (to verify). Using `bunx prettier` can fail or use a different version than the project's devDependency. The same applies to all dev tools: prefer `bun run <script>` over `bunx <tool>` when a package.json script exists.
diff --git a/src/engine/git-files.ts b/src/engine/git-files.ts
index 568ebf44..04c3a8b1 100644
--- a/src/engine/git-files.ts
+++ b/src/engine/git-files.ts
@@ -68,12 +68,19 @@ export function getGitTrackedFiles(
 export async function resolveScopedFiles(
   projectRoot: string,
   adrFileGlobs?: string[],
-  options?: { respectGitignore?: boolean; adrId?: string }
+  options?: {
+    respectGitignore?: boolean;
+    adrId?: string;
+    /** Override the file-count warning threshold (defaults to SCOPE_FILE_WARN_THRESHOLD). */
+    fileWarnThreshold?: number;
+  }
 ): Promise<string[]> {
   const hasExplicitFiles = (adrFileGlobs?.length ?? 0) > 0;
   const patterns = hasExplicitFiles ? adrFileGlobs! : ["**/*"];
   const respectGitignore = options?.respectGitignore !== false;
   const label = options?.adrId ? `ADR ${options.adrId}` : "resolveScopedFiles";
+  const fileWarnThreshold =
+    options?.fileWarnThreshold ?? SCOPE_FILE_WARN_THRESHOLD;
 
   if (!respectGitignore && !hasExplicitFiles) {
     logWarn(
@@ -105,7 +112,7 @@ export async function resolveScopedFiles(
 
   const all = [...new Set(results.flat())].sort();
 
-  if (all.length > SCOPE_FILE_WARN_THRESHOLD || scanMs > SCOPE_SCAN_WARN_MS) {
+  if (all.length > fileWarnThreshold || scanMs > SCOPE_SCAN_WARN_MS) {
     logWarn(
       `${label}: Resolved ${all.length} files from patterns: ${patterns.join(", ")} (scan took ${Math.round(scanMs)}ms). Consider narrowing the \`files\` patterns in the ADR frontmatter to improve performance.`
     );
diff --git a/tests/engine/git-files.test.ts b/tests/engine/git-files.test.ts
index 054bdec5..6ad9e4c3 100644
--- a/tests/engine/git-files.test.ts
+++ b/tests/engine/git-files.test.ts
@@ -383,7 +383,11 @@ describe("git-files", () => {
     test("warns when file scope exceeds threshold", async () => {
       await git(["init"], tempDir);
       mkdirSync(join(tempDir, "src"), { recursive: true });
-      const fileCount = SCOPE_FILE_WARN_THRESHOLD + 1;
+      // Use a tiny injected threshold so we only need a handful of files —
+      // creating 1000+ real files + `git add .` is slow enough on Windows
+      // runners to trip the per-test timeout (SIGTERM mid-`git add`).
+      const fileWarnThreshold = 5;
+      const fileCount = fileWarnThreshold + 1;
       for (let i = 0; i < fileCount; i++) {
         writeFileSync(
           join(tempDir, "src", `file-${String(i).padStart(4, "0")}.ts`),
@@ -395,8 +399,9 @@ describe("git-files", () => {
       try {
         const files = await resolveScopedFiles(tempDir, ["src/**/*.ts"], {
           adrId: "SCOPE-001",
+          fileWarnThreshold,
         });
-        expect(files.length).toBeGreaterThan(SCOPE_FILE_WARN_THRESHOLD);
+        expect(files.length).toBeGreaterThan(fileWarnThreshold);
         const warnCalls = warnSpy.mock.calls.map((args) => args.join(" "));
         expect(
           warnCalls.some(
@@ -410,7 +415,7 @@ describe("git-files", () => {
       } finally {
         warnSpy.mockRestore();
       }
-    }, 30_000);
+    });
 
     test("does not warn when file scope is within threshold", async () => {
       await git(["init"], tempDir);

From e007c22606d1c116a5bd88e0c867315baf02a8f3 Mon Sep 17 00:00:00 2001
From: Rhuan Barreto <rhuan@barreto.work>
Date: Fri, 29 May 2026 00:56:53 +0200
Subject: [PATCH 2/3] style(adr): apply oxfmt formatting to ARCH-005
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

oxfmt formats markdown too — normalize emphasis markers in the
testing-standards ADR so `bun run format:check` passes in CI.

Signed-off-by: Rhuan Barreto <rhuan@barreto.work>
---
 .archgate/adrs/ARCH-005-testing-standards.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.archgate/adrs/ARCH-005-testing-standards.md b/.archgate/adrs/ARCH-005-testing-standards.md
index 442c4637..8a7dfc5f 100644
--- a/.archgate/adrs/ARCH-005-testing-standards.md
+++ b/.archgate/adrs/ARCH-005-testing-standards.md
@@ -47,7 +47,7 @@ Use Bun's built-in test runner (`bun test`) for all tests. Test files go in `tes
 - **When mocking `fetch` in tests, assign directly to `globalThis.fetch`** — use `globalThis.fetch = mockFn as unknown as typeof fetch`. Restore in `afterEach` via `mock.restore()` (from `bun:test`) or by reassigning the original reference before the test.
 - **Wrap `spyOn` / `mockImplementation` calls in `try/finally` to guarantee `mockRestore()` runs** — when `expect()` assertions fail, they throw immediately, skipping any `mockRestore()` that follows. The un-restored spy leaks into subsequent tests, causing false positives or false negatives. Pattern: `const spy = spyOn(...).mockImplementation(() => {}); try { /* assertions */ } finally { spy.mockRestore(); }`. Alternatively, create and restore spies in `beforeEach`/`afterEach` hooks instead of inline.
 - **Make large production thresholds injectable so tests can use a small value** — When production code only acts past a large numeric threshold (e.g., `SCOPE_FILE_WARN_THRESHOLD = 1000` in `src/engine/git-files.ts`), add an optional parameter that defaults to the module constant (e.g., `resolveScopedFiles(root, globs, { fileWarnThreshold })`). Tests inject a tiny value such as `5` and create only a handful of files to exercise the same code path. This keeps the test fast and deterministic on every platform instead of materializing thousands of fixture files.
-- **Only ever raise a per-test timeout override above the global, never below it** — the project global is `bun test --timeout 60000` (60s). A per-test `}, 30_000` override makes that test *more* likely to time out than the default. Use a per-test override solely to grant a genuinely slow test more time than 60s; never set one shorter than the global.
+- **Only ever raise a per-test timeout override above the global, never below it** — the project global is `bun test --timeout 60000` (60s). A per-test `}, 30_000` override makes that test _more_ likely to time out than the default. Use a per-test override solely to grant a genuinely slow test more time than 60s; never set one shorter than the global.
 
 ### Don't
 

From ada3df3aaa40e78188cbb5e97372174cf5af6efe Mon Sep 17 00:00:00 2001
From: Rhuan Barreto <rhuan@barreto.work>
Date: Fri, 29 May 2026 00:57:15 +0200
Subject: [PATCH 3/3] docs(memory): note oxfmt formats markdown/ADRs

Signed-off-by: Rhuan Barreto <rhuan@barreto.work>
---
 .claude/agent-memory/archgate-developer/MEMORY.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.claude/agent-memory/archgate-developer/MEMORY.md b/.claude/agent-memory/archgate-developer/MEMORY.md
index fc19be38..ef256811 100644
--- a/.claude/agent-memory/archgate-developer/MEMORY.md
+++ b/.claude/agent-memory/archgate-developer/MEMORY.md
@@ -30,6 +30,7 @@ Skipping steps 2 or 3 is a workflow violation. The user should NEVER have to inv
 
 ## Patterns & Fixes
 
+- **`oxfmt` formats markdown too — run `bun run format` after editing any `.md`, including ADRs** — `format:check` (part of `bun run validate` and the CI "Lint, Test & Check" job) runs `oxfmt --check .` over ALL files, not just `.ts`. It normalizes markdown (e.g., emphasis `*word*` → `_word_`). The `adr-author` skill writes ADR markdown but does NOT auto-format it, so ADR edits frequently fail CI `format:check` even when the TS changes are clean. Always run `bun run format` before committing ADR/markdown edits. Tripped CI on PR #372 (ARCH-005 edit).
 - **Make large production thresholds injectable rather than building giant test fixtures** — Tests that must cross a big numeric threshold in production code (e.g., the 1000-file `SCOPE_FILE_WARN_THRESHOLD` in `src/engine/git-files.ts`) should NOT create 1000+ real files + `git add .` to trip it. On Windows CI runners that filesystem work is slow enough to exceed the per-test timeout — `git add .` gets SIGTERM'd mid-run (`git add . failed (exit 143)`, 143 = 128+SIGTERM) and the whole suite fails flakily. Fix: add an optional override (e.g., `fileWarnThreshold?: number` defaulting to the module constant) to the function under test, and have the test inject a tiny value (5) so it only needs ~6 files. Caused the `git-files > resolveScopedFiles > warns when file scope exceeds threshold` flake on PR #366's Windows smoke test. Also captured in ARCH-005.
 - **Never set a per-test timeout override SHORTER than the global `--timeout`** — CI runs `bun test --timeout 60000` (60s). A per-test `}, 30_000` override makes that test MORE likely to time out than the default, the opposite of the usual intent. Only override the per-test timeout to raise it above the global, never to lower it. The 30s override on the git-files threshold test was the proximate cause of its Windows flake.
 - **YAML double-quoted strings require escaped backslashes for Windows paths in tests** — YAML interprets `\` as an escape character inside double-quoted strings. Writing `cwd: "E:\project"` silently corrupts the parsed value because `\p` is not a valid escape sequence. Fix: use `JSON.stringify(path)` to produce properly escaped YAML values (e.g., `cwd: ${JSON.stringify(cwd)}`). JSON and YAML double-quoted strings share the same escape syntax. Encountered in Copilot CLI session-context tests (`workspace.yaml` with Windows paths).