Skip to content

fix(persistence): close TOCTOU windows in YAML output repo walkers (swamp-club#249)#1319

Merged
stack72 merged 1 commit intomainfrom
worktree-cryptic-hugging-zebra
May 5, 2026
Merged

fix(persistence): close TOCTOU windows in YAML output repo walkers (swamp-club#249)#1319
stack72 merged 1 commit intomainfrom
worktree-cryptic-hugging-zebra

Conversation

@stack72
Copy link
Copy Markdown
Contributor

@stack72 stack72 commented May 5, 2026

Summary

Followup to #1307. Closes the two pre-existing TOCTOU windows in the YAML output repo walkers that were flagged as out-of-scope when #1307 merged.

What

Issue 1 — Directory-level TOCTOU in findAll

yaml_output_repository.ts:118 — pre-fix, a NotFound from Deno.readDir(methodDir) (concurrent bulk delete of a method directory) escaped the inner for-await to the outer catch and returned [], discarding outputs already collected from earlier method directories of the same type. The fix wraps the inner readDir(methodDir) body in its own try/catch that continues on NotFound. The outer catch keeps its role of "type directory itself doesn't exist."

The same shape exists in findAllGlobalSince (line 200) — same race, smaller blast radius (current model type only). Fixed for consistency.

Issue 2 — Per-file TOCTOU in findById and delete

yaml_output_repository.ts:65 (findById) and :195 (delete) — pre-fix, a NotFound from Deno.readTextFile on a non-target file (concurrent delete during iteration) propagated to the outer catch and returned null / aborted the search even when the target existed. The fix wraps the readTextFile + parseYaml + match block in a per-file try/catch that continues on NotFound.

For delete, the loop is restructured as find-then-act: per-file try/catch wraps only the search; notifyDirty + Deno.remove + cleanupEmptyParentDirs run after the loop in the existing outer try/catch, preserving the existing semantic that a NotFound from Deno.remove (target deleted concurrently) is absorbed silently (idempotent delete).

Why no fix in YamlWorkflowRunRepository.findById

The issue body lists this method too, but the existing entry.name.includes(runId) filter at line 78 restricts readTextFile to the target file only — runIds are 36-char UUIDs that never substring-match each other, and atomicWriteTextFile's .{freshUuid}.tmp intermediate files don't contain the runId. The only NotFound readTextFile can raise is for the target's own concurrent deletion, for which returning null is correct. Added an in-place comment documenting this so a future maintainer who removes that filter doesn't silently re-introduce the bug.

Tests

Four new strict regression tests in yaml_output_repository_test.ts, each fails against pre-fix code and passes against the fix:

  • findAll: method directory deleted mid-iteration is skipped, not fatal
  • findAllGlobalSince: method directory deleted mid-iteration is skipped, not fatal
  • findById: non-target file deleted mid-iteration still returns target
  • delete: non-target file deleted mid-iteration still deletes target

The "remove file/dir before call" pattern from #1307's tests doesn't strictly trigger the race on most filesystems (readdir returns only currently-existing entries). The new tests use a withPhantomReadDirEntry helper that monkey-patches Deno.readDir for the call's duration to yield a phantom entry pointing at nothing on disk; production code follows up with readTextFile/readDir on the phantom path, getting a real NotFound, deterministically exercising the TOCTOU window. The helper follows the same shape as the codebase's established Deno.env save/restore pattern.

Test plan

  • `deno fmt` — clean
  • `deno lint` — clean
  • `deno check` — clean
  • `deno run test` — full suite passes (one unrelated flaky parallel-test in `shutdown_handlers_test.ts` SIGINT-on-POSIX, passes in isolation both with and without these changes)
  • All four new tests verified failing pre-fix and passing post-fix via stash/unstash cycle
  • `deno run compile` — binary refreshed

🤖 Generated with Claude Code

…wamp-club#249)

Followup to #1307. Closes the two pre-existing TOCTOU windows that
were flagged as out-of-scope when #1307 merged.

## What

### Issue 1 — Directory-level TOCTOU in `findAll`

`yaml_output_repository.ts:118` — pre-fix, a `NotFound` from
`Deno.readDir(methodDir)` (concurrent bulk delete of a method
directory) escaped the inner `for-await` to the outer catch and
returned `[]`, discarding outputs already collected from earlier
method directories of the same type. The fix wraps the inner
`readDir(methodDir)` body in its own `try/catch` that `continue`s on
`NotFound`. The outer catch keeps its role of "type directory itself
doesn't exist."

The same shape exists in `findAllGlobalSince` (line 200) — same race,
smaller blast radius (current model type only). Fixed for consistency.

### Issue 2 — Per-file TOCTOU in `findById` and `delete`

`yaml_output_repository.ts:65` (`findById`) and `:195` (`delete`) —
pre-fix, a `NotFound` from `Deno.readTextFile` on a non-target file
(concurrent delete during iteration) propagated to the outer catch
and returned `null` / aborted the search even when the target existed.
The fix wraps the `readTextFile + parseYaml + match` block in a
per-file `try/catch` that `continue`s on `NotFound`.

For `delete`, the loop is restructured as find-then-act: per-file
`try/catch` wraps only the search; `notifyDirty + Deno.remove +
cleanupEmptyParentDirs` run after the loop in the existing outer
`try/catch`, preserving the existing semantic that a `NotFound` from
`Deno.remove` (target deleted concurrently) is absorbed silently
(idempotent delete).

### Why no fix in `YamlWorkflowRunRepository.findById`

The issue body lists this method too, but the existing
`entry.name.includes(runId)` filter at line 78 restricts `readTextFile`
to the target file only — runIds are 36-char UUIDs that never
substring-match each other, and `atomicWriteTextFile`'s
`.{freshUuid}.tmp` intermediate files don't contain the runId. The
only `NotFound` `readTextFile` can raise is for the target's own
concurrent deletion, for which returning `null` is correct. Added an
in-place comment documenting this so a future maintainer who removes
that filter doesn't silently re-introduce the bug.

## Tests

Four new strict regression tests in `yaml_output_repository_test.ts`,
each fails against pre-fix code and passes against the fix:

- `findAll: method directory deleted mid-iteration is skipped, not fatal`
- `findAllGlobalSince: method directory deleted mid-iteration is skipped, not fatal`
- `findById: non-target file deleted mid-iteration still returns target`
- `delete: non-target file deleted mid-iteration still deletes target`

The "remove file/dir before call" pattern from #1307's tests doesn't
strictly trigger the race on most filesystems (`readdir` returns only
currently-existing entries). The new tests use a
`withPhantomReadDirEntry` helper that monkey-patches `Deno.readDir`
for the call's duration to yield a phantom entry pointing at nothing
on disk; production code follows up with `readTextFile`/`readDir` on
the phantom path, getting a real `NotFound`, deterministically
exercising the TOCTOU window. The helper follows the same shape as
the codebase's established `Deno.env` save/restore pattern.

## Test plan

- [x] \`deno fmt\` — clean
- [x] \`deno lint\` — clean
- [x] \`deno check\` — clean
- [x] \`deno run test\` — full suite passes (one unrelated flaky
  parallel-test in \`shutdown_handlers_test.ts\` SIGINT-on-POSIX,
  passes in isolation both with and without these changes)
- [x] All four new tests verified failing pre-fix and passing
  post-fix via stash/unstash cycle
- [x] \`deno run compile\` — binary refreshed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Review

Critical / High

None.

Medium

None.

Low

  1. yaml_output_repository_test.ts:514withPhantomReadDirEntry URL.pathname fallback: The typeof path === "string" ? path : path.pathname branch for URL objects would produce a /-prefixed, percent-encoded pathname on Windows that wouldn't match targetDir (a native path from join()). In practice this never fires because all production Deno.readDir calls in this repo pass strings, and this is test-only code. No action needed.

Verdict

PASS — The TOCTOU fixes are well-scoped and structurally consistent. Each new try/catch correctly narrows NotFound handling to the specific TOCTOU gap (per-file or per-method-dir) while re-throwing all other errors. The delete find-then-act restructuring preserves idempotent-delete semantics via the existing outer catch. The for await iterator is properly closed on break. The withPhantomReadDirEntry test helper deterministically exercises the race windows and restores global state in finally. The defensive comment in yaml_workflow_run_repository.ts documenting why the fix is unnecessary there is a good guard against future regressions.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Blocking Issues

None.

Suggestions

  1. Test concurrency safety (yaml_output_repository_test.ts:507-528): The withPhantomReadDirEntry helper monkey-patches the global Deno.readDir. Deno runs tests concurrently by default within a file. If two phantom-patching tests overlap, the save/restore of the original can race — test B saves test A's wrapper as "original" and restores it in finally, leaving a stale closure on Deno.readDir. In practice this is unlikely to cause failures (each phantom targets a unique temp dir path, so the stale wrapper is a no-op passthrough), but worth noting. If flakes ever appear, serializing these four tests (e.g., wrapping in a Deno.test group with sanitizeOps: false) would eliminate the window.

  2. Comment repetition: The per-file try/catch comment block (5 lines explaining the TOCTOU window) appears three times in yaml_output_repository.ts — in findById, findAll, and findAllGlobalSince. The per-method-directory comment appears twice. The WHY is genuinely non-obvious and warrants documentation, but you could consider a single doc-comment on a private helper or a file-level note, then short one-liners at each site referencing it. Not blocking — the current approach has the advantage of being self-contained at each call site.

Overall this is a well-scoped, well-tested fix. The find-then-act restructure of delete is a nice improvement, and the yaml_workflow_run_repository.ts comment documenting why the fix is unnecessary there is a good defensive practice.

@stack72 stack72 merged commit d67b0d3 into main May 5, 2026
11 checks passed
@stack72 stack72 deleted the worktree-cryptic-hugging-zebra branch May 5, 2026 20:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant