fix(persistence): close TOCTOU window in find* walkers (review followup)#1307
fix(persistence): close TOCTOU window in find* walkers (review followup)#1307
Conversation
Followup to swamp-club#240 / #1306. Review surfaced two related issues that landed without being addressed: 1. **TOCTOU race in find* walkers**: `Deno.stat` and `Deno.readTextFile` in `YamlWorkflowRunRepository.findRunsSinceByWorkflowId` and `YamlOutputRepository.findAllGlobalSince` were unguarded. A concurrent delete (GC, `deleteAllByWorkflowId`, output cleanup) between `readDir` and the per-file ops would throw `NotFound`, propagate to the outer catch, and silently discard runs already collected for the workflow (run repo) or abandon the rest of the model type's outputs (output repo). The pre-existing `findAllByWorkflowId` and `findAll(type)` shared the bug. Fix: wrap each per-file body in its own `try/catch` that `continue`s on `NotFound`, matching the pattern `unified_data_repository.ts:273` already uses. The outer catch keeps its role of handling "directory itself doesn't exist." All four walkers across the workflow-run and output repos now use the same shape. 2. **`YamlOutputRepository.findAllGlobalSince` and `FileSystemUnifiedDataRepository.findAllGlobalSince` lacked direct unit tests**. They were covered indirectly via service-layer mocks, so a divergence between the three implementations could ship undetected. Added tests mirroring the workflow-run repo's patterns: in-window-only filtering and the "file deleted mid-iteration" race that this PR's #1 fix prevents. Also includes two cosmetic polishes called out in review: - `--limit <count:number>` → `<n:number>` to match `data_search` and `workflow_run_search` flag placeholders in `--help` output. - Added `swamp summarise --limit 10` to the command's example block so the flag shows up alongside `--since` examples. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
CLI UX Review
Blocking
None.
Suggestions
None.
Verdict
PASS — The only UX-facing change is in summarise.ts: the --limit placeholder was renamed from <count:number> to <n:number>, making it consistent with data_search, workflow_run_search, and data_query which all use <n:number>. A new example (swamp summarise --limit 10) was also added to improve discoverability. All remaining changes are internal persistence fixes and tests with no user-visible effect.
There was a problem hiding this comment.
Code Review
Clean, well-scoped followup. The TOCTOU fix is correct and consistently applied across all four walkers, matching the existing pattern in unified_data_repository.ts:273. The new tests directly exercise both the time-window filtering and the race-condition resilience for each repository, which pins the fix against future regressions.
Blocking Issues
None.
Suggestions
-
Pre-existing TOCTOU in
findById/deletewalkers —YamlOutputRepository.findById(line 65–93) andYamlWorkflowRunRepository.findById(line 71–103) have the same unguardedreadTextFileinside areadDirloop. A concurrent delete of a non-target file stops the scan early and returnsnull, potentially missing the actual target listed later. Same forYamlOutputRepository.delete. Not in scope of this PR (and the race window is smaller since these return on first match), but worth a follow-up ticket. -
Comment verbosity — The per-file
try/catchblocks carry a 4–5 line comment in each of the four walkers explaining the same TOCTOU rationale. The WHY is genuinely non-obvious and warrants a comment, but after the first instance a shorter one-liner (e.g.// Skip files deleted between readDir and stat/read (TOCTOU)) would suffice and reduce duplication. Minor style point — each method is independently readable as-is.
There was a problem hiding this comment.
Adversarial Review
I traced every code path in the four modified walkers (findAll, findAllGlobalSince in the output repo; findAllByWorkflowId, findRunsSinceByWorkflowId in the workflow-run repo), the three new test files, and the CLI polish change.
Critical / High
None.
Medium
None.
Low
-
Pre-existing TOCTOU at directory level in
findAll(yaml_output_repository.ts:150): If a method directory is deleted betweenreadDir(typeDir)andreadDir(methodDir), theNotFoundescapes to the outer catch which returns[], discarding all outputs already collected for that type. The per-file fix in this PR is the right granularity — directory deletion is far rarer than file deletion (it only happens via bulk ops likedeleteAllByWorkflowId) — but noting for completeness that the window still exists one level up. Not introduced by this PR. -
Pre-existing TOCTOU in
findById(both repos):findByIdstill hasreadTextFileafterreadDirwithout per-file try/catch. A concurrent deletion of a non-target file during the scan would cause the search to abort and returnnulleven if the target file exists. The failure mode is benign (false "not found") and the single-file lookup has a much narrower race window than the multi-file walkers this PR fixes. Not introduced by this PR.
Verdict
PASS. The TOCTOU fix is correct and consistent across all four walkers. Each inner continue targets the right loop level (verified for the triple-nested findAllGlobalSince in the output repo). The inner catch re-throws non-NotFound errors so genuine I/O failures still propagate. The outer catch retains its original role (directory-doesn't-exist). The findAllGlobalSince tests properly simulate the race by deleting files between save and query. The CLI polish (<count:number> → <n:number>, added example) is harmless. Clean PR.
There was a problem hiding this comment.
Adversarial Review
Critical / High
None.
Medium
None.
Low
-
Pre-existing TOCTOU at directory level in findAll (yaml_output_repository.ts:150): If a method directory is deleted between readDir(typeDir) and readDir(methodDir), the NotFound escapes to the outer catch which returns [], discarding all outputs already collected for that type. The per-file fix in this PR is the right granularity -- directory deletion is far rarer than file deletion (it only happens via bulk ops like deleteAllByWorkflowId) -- but noting for completeness that the window still exists one level up. Not introduced by this PR.
-
Pre-existing TOCTOU in findById (both repos): findById still has readTextFile after readDir without per-file try/catch. A concurrent deletion of a non-target file during the scan would cause the search to abort and return null even if the target file exists. The failure mode is benign (false "not found") and the single-file lookup has a much narrower race window than the multi-file walkers this PR fixes. Not introduced by this PR.
Verdict
PASS. The TOCTOU fix is correct and consistent across all four walkers. Each inner continue targets the right loop level (verified for the triple-nested findAllGlobalSince in the output repo). The inner catch re-throws non-NotFound errors so genuine I/O failures still propagate. The outer catch retains its original role (directory-doesn't-exist). The findAllGlobalSince tests properly simulate the race by deleting files between save and query. The CLI polish (count to n, added example) is harmless. Clean PR.
Followup to #1306 (swamp-club#240). Addresses review comments that surfaced after merge.
What
TOCTOU fix in find* walkers (the one that matters)
Deno.statandDeno.readTextFilein the workflow-run and output repos' `find*` methods were unguarded. A concurrent delete — GC, `deleteAllByWorkflowId`, output cleanup, another swamp invocation — between `readDir` and the per-file ops throws `NotFound`, propagates to the outer catch, and:The fix mirrors the pattern `unified_data_repository.ts:273` already uses: wrap each per-file body in its own `try/catch` that `continue`s on `NotFound`. The outer catch keeps its role of handling "directory itself doesn't exist." All four walkers across the workflow-run and output repos now use the same shape.
This isn't a regression introduced by #1306 — the pre-existing `findAllByWorkflowId` and `findAll(type)` had it too — but the new `findAllGlobalSince` methods are specifically designed for large-repo scenarios where the race window is wider (3 ops per file instead of 1), so it's worth fixing now while the design is fresh.
Direct unit tests for
findAllGlobalSinceThe original PR added dedicated tests for the workflow-run repo's two-stage filter but left the output and data repo implementations covered only indirectly via service-layer mocks. Future divergence between the three implementations could ship undetected. Added tests mirroring the workflow-run patterns:
Polish
Test plan
🤖 Generated with Claude Code