fix(persistence): close TOCTOU window in find* walkers (review followup) by stack72 · Pull Request #1307 · systeminit/swamp

stack72 · 2026-05-05T13:39:30Z

Followup to #1306 (swamp-club#240). Addresses review comments that surfaced after merge.

What

TOCTOU fix in find* walkers (the one that matters)

Deno.stat and Deno.readTextFile in the workflow-run and output repos' `find*` methods were unguarded. A concurrent delete — GC, `deleteAllByWorkflowId`, output cleanup, another swamp invocation — between `readDir` and the per-file ops throws `NotFound`, propagates to the outer catch, and:

Workflow run repo (`findRunsSinceByWorkflowId` / `findAllByWorkflowId`): returns `[]` for that workflow → silently discards every run already parsed and pushed to the array.
Output repo (`findAllGlobalSince` / `findAll(type)`): `continue`s past the entire current model type → silently drops the rest of that type's outputs.

The fix mirrors the pattern `unified_data_repository.ts:273` already uses: wrap each per-file body in its own `try/catch` that `continue`s on `NotFound`. The outer catch keeps its role of handling "directory itself doesn't exist." All four walkers across the workflow-run and output repos now use the same shape.

This isn't a regression introduced by #1306 — the pre-existing `findAllByWorkflowId` and `findAll(type)` had it too — but the new `findAllGlobalSince` methods are specifically designed for large-repo scenarios where the race window is wider (3 ops per file instead of 1), so it's worth fixing now while the design is fresh.

Direct unit tests for `findAllGlobalSince`

The original PR added dedicated tests for the workflow-run repo's two-stage filter but left the output and data repo implementations covered only indirectly via service-layer mocks. Future divergence between the three implementations could ship undetected. Added tests mirroring the workflow-run patterns:

In-window-only filtering on output and data repos.
The "file deleted mid-iteration" race that this PR's TOCTOU fix prevents (and pins the data repo's existing per-file try/catch so it can't regress).

Polish

`--limit count:number` → `<n:number>` to match `data_search` and `workflow_run_search` placeholders in `--help`.
Added `swamp summarise --limit 10` to the command's example block.

Test plan

`deno check` — clean
`deno lint` — clean
`deno fmt` — clean
`deno run test` — 5430 passing, 0 failing (+17 new tests vs. main: 4 TOCTOU + 4 findAllGlobalSince across the three repos, plus the polish stays in the existing CLI tests)
`deno run compile` — binary refreshed
All four walkers (`findAllByWorkflowId`, `findRunsSinceByWorkflowId`, `findAll(type)`, output `findAllGlobalSince`) have a "file deleted mid-iteration is skipped, not fatal" test that fails against the pre-fix code and passes against the new pattern

🤖 Generated with Claude Code

Followup to swamp-club#240 / #1306. Review surfaced two related issues that landed without being addressed: 1. **TOCTOU race in find* walkers**: `Deno.stat` and `Deno.readTextFile` in `YamlWorkflowRunRepository.findRunsSinceByWorkflowId` and `YamlOutputRepository.findAllGlobalSince` were unguarded. A concurrent delete (GC, `deleteAllByWorkflowId`, output cleanup) between `readDir` and the per-file ops would throw `NotFound`, propagate to the outer catch, and silently discard runs already collected for the workflow (run repo) or abandon the rest of the model type's outputs (output repo). The pre-existing `findAllByWorkflowId` and `findAll(type)` shared the bug. Fix: wrap each per-file body in its own `try/catch` that `continue`s on `NotFound`, matching the pattern `unified_data_repository.ts:273` already uses. The outer catch keeps its role of handling "directory itself doesn't exist." All four walkers across the workflow-run and output repos now use the same shape. 2. **`YamlOutputRepository.findAllGlobalSince` and `FileSystemUnifiedDataRepository.findAllGlobalSince` lacked direct unit tests**. They were covered indirectly via service-layer mocks, so a divergence between the three implementations could ship undetected. Added tests mirroring the workflow-run repo's patterns: in-window-only filtering and the "file deleted mid-iteration" race that this PR's #1 fix prevents. Also includes two cosmetic polishes called out in review: - `--limit <count:number>` → `<n:number>` to match `data_search` and `workflow_run_search` flag placeholders in `--help` output. - Added `swamp summarise --limit 10` to the command's example block so the flag shows up alongside `--since` examples. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions

CLI UX Review

Blocking

None.

Suggestions

None.

Verdict

PASS — The only UX-facing change is in summarise.ts: the --limit placeholder was renamed from <count:number> to <n:number>, making it consistent with data_search, workflow_run_search, and data_query which all use <n:number>. A new example (swamp summarise --limit 10) was also added to improve discoverability. All remaining changes are internal persistence fixes and tests with no user-visible effect.

github-actions

Code Review

Clean, well-scoped followup. The TOCTOU fix is correct and consistently applied across all four walkers, matching the existing pattern in unified_data_repository.ts:273. The new tests directly exercise both the time-window filtering and the race-condition resilience for each repository, which pins the fix against future regressions.

Blocking Issues

None.

Suggestions

Pre-existing TOCTOU in findById / delete walkers — YamlOutputRepository.findById (line 65–93) and YamlWorkflowRunRepository.findById (line 71–103) have the same unguarded readTextFile inside a readDir loop. A concurrent delete of a non-target file stops the scan early and returns null, potentially missing the actual target listed later. Same for YamlOutputRepository.delete. Not in scope of this PR (and the race window is smaller since these return on first match), but worth a follow-up ticket.
Comment verbosity — The per-file try/catch blocks carry a 4–5 line comment in each of the four walkers explaining the same TOCTOU rationale. The WHY is genuinely non-obvious and warrants a comment, but after the first instance a shorter one-liner (e.g. // Skip files deleted between readDir and stat/read (TOCTOU)) would suffice and reduce duplication. Minor style point — each method is independently readable as-is.

github-actions

Adversarial Review

I traced every code path in the four modified walkers (findAll, findAllGlobalSince in the output repo; findAllByWorkflowId, findRunsSinceByWorkflowId in the workflow-run repo), the three new test files, and the CLI polish change.

Critical / High

None.

Medium

None.

Low

Pre-existing TOCTOU at directory level in findAll (yaml_output_repository.ts:150): If a method directory is deleted between readDir(typeDir) and readDir(methodDir), the NotFound escapes to the outer catch which returns [], discarding all outputs already collected for that type. The per-file fix in this PR is the right granularity — directory deletion is far rarer than file deletion (it only happens via bulk ops like deleteAllByWorkflowId) — but noting for completeness that the window still exists one level up. Not introduced by this PR.
Pre-existing TOCTOU in findById (both repos): findById still has readTextFile after readDir without per-file try/catch. A concurrent deletion of a non-target file during the scan would cause the search to abort and return null even if the target file exists. The failure mode is benign (false "not found") and the single-file lookup has a much narrower race window than the multi-file walkers this PR fixes. Not introduced by this PR.

Verdict

PASS. The TOCTOU fix is correct and consistent across all four walkers. Each inner continue targets the right loop level (verified for the triple-nested findAllGlobalSince in the output repo). The inner catch re-throws non-NotFound errors so genuine I/O failures still propagate. The outer catch retains its original role (directory-doesn't-exist). The findAllGlobalSince tests properly simulate the race by deleting files between save and query. The CLI polish (<count:number> → <n:number>, added example) is harmless. Clean PR.

github-actions

Adversarial Review

Critical / High

None.

Medium

None.

Low

Pre-existing TOCTOU at directory level in findAll (yaml_output_repository.ts:150): If a method directory is deleted between readDir(typeDir) and readDir(methodDir), the NotFound escapes to the outer catch which returns [], discarding all outputs already collected for that type. The per-file fix in this PR is the right granularity -- directory deletion is far rarer than file deletion (it only happens via bulk ops like deleteAllByWorkflowId) -- but noting for completeness that the window still exists one level up. Not introduced by this PR.
Pre-existing TOCTOU in findById (both repos): findById still has readTextFile after readDir without per-file try/catch. A concurrent deletion of a non-target file during the scan would cause the search to abort and return null even if the target file exists. The failure mode is benign (false "not found") and the single-file lookup has a much narrower race window than the multi-file walkers this PR fixes. Not introduced by this PR.

Verdict

PASS. The TOCTOU fix is correct and consistent across all four walkers. Each inner continue targets the right loop level (verified for the triple-nested findAllGlobalSince in the output repo). The inner catch re-throws non-NotFound errors so genuine I/O failures still propagate. The outer catch retains its original role (directory-doesn't-exist). The findAllGlobalSince tests properly simulate the race by deleting files between save and query. The CLI polish (count to n, added example) is harmless. Clean PR.

github-actions Bot approved these changes May 5, 2026

View reviewed changes

github-actions Bot reviewed May 5, 2026

View reviewed changes

stack72 merged commit f630b98 into main May 5, 2026
11 checks passed

stack72 deleted the fix/persistence-toctou-walkers branch May 5, 2026 13:51

stack72 mentioned this pull request May 5, 2026

fix(persistence): close TOCTOU windows in YAML output repo walkers (swamp-club#249) #1319

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(persistence): close TOCTOU window in find* walkers (review followup)#1307

fix(persistence): close TOCTOU window in find* walkers (review followup)#1307
stack72 merged 1 commit intomainfrom
fix/persistence-toctou-walkers

stack72 commented May 5, 2026

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

stack72 commented May 5, 2026

What

TOCTOU fix in find* walkers (the one that matters)

Direct unit tests for findAllGlobalSince

Polish

Test plan

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

CLI UX Review

Blocking

Suggestions

Verdict

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Code Review

Blocking Issues

Suggestions

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Adversarial Review

Critical / High

Medium

Low

Verdict

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Adversarial Review

Critical / High

Medium

Low

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Direct unit tests for `findAllGlobalSince`