fix(persistence): close TOCTOU windows in YAML output repo walkers (swamp-club#249)#1319
fix(persistence): close TOCTOU windows in YAML output repo walkers (swamp-club#249)#1319
Conversation
…wamp-club#249) Followup to #1307. Closes the two pre-existing TOCTOU windows that were flagged as out-of-scope when #1307 merged. ## What ### Issue 1 — Directory-level TOCTOU in `findAll` `yaml_output_repository.ts:118` — pre-fix, a `NotFound` from `Deno.readDir(methodDir)` (concurrent bulk delete of a method directory) escaped the inner `for-await` to the outer catch and returned `[]`, discarding outputs already collected from earlier method directories of the same type. The fix wraps the inner `readDir(methodDir)` body in its own `try/catch` that `continue`s on `NotFound`. The outer catch keeps its role of "type directory itself doesn't exist." The same shape exists in `findAllGlobalSince` (line 200) — same race, smaller blast radius (current model type only). Fixed for consistency. ### Issue 2 — Per-file TOCTOU in `findById` and `delete` `yaml_output_repository.ts:65` (`findById`) and `:195` (`delete`) — pre-fix, a `NotFound` from `Deno.readTextFile` on a non-target file (concurrent delete during iteration) propagated to the outer catch and returned `null` / aborted the search even when the target existed. The fix wraps the `readTextFile + parseYaml + match` block in a per-file `try/catch` that `continue`s on `NotFound`. For `delete`, the loop is restructured as find-then-act: per-file `try/catch` wraps only the search; `notifyDirty + Deno.remove + cleanupEmptyParentDirs` run after the loop in the existing outer `try/catch`, preserving the existing semantic that a `NotFound` from `Deno.remove` (target deleted concurrently) is absorbed silently (idempotent delete). ### Why no fix in `YamlWorkflowRunRepository.findById` The issue body lists this method too, but the existing `entry.name.includes(runId)` filter at line 78 restricts `readTextFile` to the target file only — runIds are 36-char UUIDs that never substring-match each other, and `atomicWriteTextFile`'s `.{freshUuid}.tmp` intermediate files don't contain the runId. The only `NotFound` `readTextFile` can raise is for the target's own concurrent deletion, for which returning `null` is correct. Added an in-place comment documenting this so a future maintainer who removes that filter doesn't silently re-introduce the bug. ## Tests Four new strict regression tests in `yaml_output_repository_test.ts`, each fails against pre-fix code and passes against the fix: - `findAll: method directory deleted mid-iteration is skipped, not fatal` - `findAllGlobalSince: method directory deleted mid-iteration is skipped, not fatal` - `findById: non-target file deleted mid-iteration still returns target` - `delete: non-target file deleted mid-iteration still deletes target` The "remove file/dir before call" pattern from #1307's tests doesn't strictly trigger the race on most filesystems (`readdir` returns only currently-existing entries). The new tests use a `withPhantomReadDirEntry` helper that monkey-patches `Deno.readDir` for the call's duration to yield a phantom entry pointing at nothing on disk; production code follows up with `readTextFile`/`readDir` on the phantom path, getting a real `NotFound`, deterministically exercising the TOCTOU window. The helper follows the same shape as the codebase's established `Deno.env` save/restore pattern. ## Test plan - [x] \`deno fmt\` — clean - [x] \`deno lint\` — clean - [x] \`deno check\` — clean - [x] \`deno run test\` — full suite passes (one unrelated flaky parallel-test in \`shutdown_handlers_test.ts\` SIGINT-on-POSIX, passes in isolation both with and without these changes) - [x] All four new tests verified failing pre-fix and passing post-fix via stash/unstash cycle - [x] \`deno run compile\` — binary refreshed 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Adversarial Review
Critical / High
None.
Medium
None.
Low
yaml_output_repository_test.ts:514—withPhantomReadDirEntryURL.pathname fallback: Thetypeof path === "string" ? path : path.pathnamebranch for URL objects would produce a/-prefixed, percent-encoded pathname on Windows that wouldn't matchtargetDir(a native path fromjoin()). In practice this never fires because all productionDeno.readDircalls in this repo pass strings, and this is test-only code. No action needed.
Verdict
PASS — The TOCTOU fixes are well-scoped and structurally consistent. Each new try/catch correctly narrows NotFound handling to the specific TOCTOU gap (per-file or per-method-dir) while re-throwing all other errors. The delete find-then-act restructuring preserves idempotent-delete semantics via the existing outer catch. The for await iterator is properly closed on break. The withPhantomReadDirEntry test helper deterministically exercises the race windows and restores global state in finally. The defensive comment in yaml_workflow_run_repository.ts documenting why the fix is unnecessary there is a good guard against future regressions.
There was a problem hiding this comment.
Code Review
Blocking Issues
None.
Suggestions
-
Test concurrency safety (
yaml_output_repository_test.ts:507-528): ThewithPhantomReadDirEntryhelper monkey-patches the globalDeno.readDir. Deno runs tests concurrently by default within a file. If two phantom-patching tests overlap, the save/restore of the original can race — test B saves test A's wrapper as "original" and restores it infinally, leaving a stale closure onDeno.readDir. In practice this is unlikely to cause failures (each phantom targets a unique temp dir path, so the stale wrapper is a no-op passthrough), but worth noting. If flakes ever appear, serializing these four tests (e.g., wrapping in aDeno.testgroup withsanitizeOps: false) would eliminate the window. -
Comment repetition: The per-file try/catch comment block (5 lines explaining the TOCTOU window) appears three times in
yaml_output_repository.ts— infindById,findAll, andfindAllGlobalSince. The per-method-directory comment appears twice. The WHY is genuinely non-obvious and warrants documentation, but you could consider a single doc-comment on a private helper or a file-level note, then short one-liners at each site referencing it. Not blocking — the current approach has the advantage of being self-contained at each call site.
Overall this is a well-scoped, well-tested fix. The find-then-act restructure of delete is a nice improvement, and the yaml_workflow_run_repository.ts comment documenting why the fix is unnecessary there is a good defensive practice.
Summary
Followup to #1307. Closes the two pre-existing TOCTOU windows in the YAML output repo walkers that were flagged as out-of-scope when #1307 merged.
What
Issue 1 — Directory-level TOCTOU in
findAllyaml_output_repository.ts:118— pre-fix, aNotFoundfromDeno.readDir(methodDir)(concurrent bulk delete of a method directory) escaped the innerfor-awaitto the outer catch and returned[], discarding outputs already collected from earlier method directories of the same type. The fix wraps the innerreadDir(methodDir)body in its owntry/catchthatcontinues onNotFound. The outer catch keeps its role of "type directory itself doesn't exist."The same shape exists in
findAllGlobalSince(line 200) — same race, smaller blast radius (current model type only). Fixed for consistency.Issue 2 — Per-file TOCTOU in
findByIdanddeleteyaml_output_repository.ts:65(findById) and:195(delete) — pre-fix, aNotFoundfromDeno.readTextFileon a non-target file (concurrent delete during iteration) propagated to the outer catch and returnednull/ aborted the search even when the target existed. The fix wraps thereadTextFile + parseYaml + matchblock in a per-filetry/catchthatcontinues onNotFound.For
delete, the loop is restructured as find-then-act: per-filetry/catchwraps only the search;notifyDirty + Deno.remove + cleanupEmptyParentDirsrun after the loop in the existing outertry/catch, preserving the existing semantic that aNotFoundfromDeno.remove(target deleted concurrently) is absorbed silently (idempotent delete).Why no fix in
YamlWorkflowRunRepository.findByIdThe issue body lists this method too, but the existing
entry.name.includes(runId)filter at line 78 restrictsreadTextFileto the target file only — runIds are 36-char UUIDs that never substring-match each other, andatomicWriteTextFile's.{freshUuid}.tmpintermediate files don't contain the runId. The onlyNotFoundreadTextFilecan raise is for the target's own concurrent deletion, for which returningnullis correct. Added an in-place comment documenting this so a future maintainer who removes that filter doesn't silently re-introduce the bug.Tests
Four new strict regression tests in
yaml_output_repository_test.ts, each fails against pre-fix code and passes against the fix:findAll: method directory deleted mid-iteration is skipped, not fatalfindAllGlobalSince: method directory deleted mid-iteration is skipped, not fatalfindById: non-target file deleted mid-iteration still returns targetdelete: non-target file deleted mid-iteration still deletes targetThe "remove file/dir before call" pattern from #1307's tests doesn't strictly trigger the race on most filesystems (
readdirreturns only currently-existing entries). The new tests use awithPhantomReadDirEntryhelper that monkey-patchesDeno.readDirfor the call's duration to yield a phantom entry pointing at nothing on disk; production code follows up withreadTextFile/readDiron the phantom path, getting a realNotFound, deterministically exercising the TOCTOU window. The helper follows the same shape as the codebase's establishedDeno.envsave/restore pattern.Test plan
🤖 Generated with Claude Code