perf(summarise): skip pre-cutoff scans and cap output on large repos #1306
Conversation
`swamp summarise --since 1d` hung on repos with thousands of workflow runs because SummaryService called `findAllGlobal()` on three repos to load every output, run, and data record into memory before filtering by the cutoff.

This PR pushes the time bound into the repository contract via a new `findAllGlobalSince(cutoff)` method on `WorkflowRunRepository`, `OutputRepository`, and `DataRepositoryReader`. The YAML implementations use a two-stage filter: an mtime pre-filter (`Deno.stat`, skip parse if mtime < cutoff — runs are rewritten on every status transition, so mtime < cutoff implies the run completed before cutoff) and a parse-verify fallback that re-checks `startedAt`/`createdAt` after parse to handle long-running workflows and backup-restore scenarios where mtime is unreliable.

It also adds an opt-in `--limit N` flag that caps each per-group `runs[]` array; counts (`succeeded`, `failed`, `total`) always reflect every matching run in the window. When truncation occurs, the group sets an optional `truncated: true` field — omitted otherwise, so existing JSON consumers see no shape change. Default is unlimited, preserving prior behavior.

Hand-measured against synthetic 3000- and 10000-run fixtures (warm cache, macOS APFS):

- N=3000, in-window=50: `findAllGlobal` 343ms → `findAllGlobalSince` 88ms (3.9x)
- N=3000, in-window=500: `findAllGlobal` 345ms → `findAllGlobalSince` 144ms (2.4x)
- N=10000, in-window=100: `findAllGlobal` 1141ms → `findAllGlobalSince` 281ms (4.1x)

Cold-cache wins are larger because `Deno.stat` is much cheaper than `readFile + parseYaml`. Speedup grows with the ratio of out-of-window to in-window runs — exactly the case the issue describes (`--since 1d` over months of accumulated runs).

`integration/summarise_perf_test.ts` pins the optimization deterministically by counting parses against a 500-run fixture; if the mtime pre-filter ever regresses, the parse count jumps from ~10 back to 500 and the test fails with a concrete error. Wall-clock time is intentionally not asserted (flaky on shared CI). A new CLI test in `src/cli/commands/summarise_test.ts` covers the flag surface. Service tests cover limit truncation, the optional `truncated` field, and that counts remain authoritative when `runs[]` is capped.

Streaming JSON output, an interactive progress indicator, and a `--sample` mode (also proposed in the issue) are deferred to follow-ups; they require invasive renderer reshaping that is out of scope here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
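The two-stage filter described above can be sketched roughly as follows. This is a minimal synchronous illustration, not the PR's actual code: `statMtime` and `parseRun` are hypothetical injected stand-ins for `Deno.stat` and the `readTextFile + parseYaml` step, so the shape is visible without touching the filesystem.

```typescript
// Minimal synchronous sketch of the two-stage filter; the real repository
// methods are async and call Deno.stat / readTextFile + parseYaml.
interface RunRecord {
  startedAt: number; // epoch ms; the parsed, authoritative timestamp
}

function findRunsSince(
  paths: string[],
  cutoffMs: number,
  statMtime: (path: string) => number | undefined, // stand-in for Deno.stat().mtime
  parseRun: (path: string) => RunRecord, // stand-in for readFile + parseYaml
): RunRecord[] {
  const results: RunRecord[] = [];
  for (const path of paths) {
    // Stage A: cheap mtime pre-filter. Runs are rewritten on every status
    // transition, so mtime < cutoff implies the run completed before cutoff.
    const mtimeMs = statMtime(path);
    if (mtimeMs !== undefined && mtimeMs < cutoffMs) continue;
    // Stage B: parse-verify, always the source of truth. Handles long-running
    // workflows and backup restores where mtime is unreliable.
    const run = parseRun(path);
    if (run.startedAt >= cutoffMs) results.push(run);
  }
  return results;
}
```

Note that Stage A can only produce false positives (files it fails to skip), never false negatives, so Stage B remains the single inclusion check.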
CLI UX Review
Blocking
- `src/cli/commands/summarise.ts:66` — validation error uses `new Error` instead of `UserError`: `throw new Error("--limit must be a positive integer");`

  Every other command-level validation in the codebase throws `UserError` (see `issue_ripple.ts:78`, `datastore_sync.ts:67-77`, `vault_put.ts:191`, `vault_create.ts:51`). `UserError` is the established pattern for user-facing validation messages — the CLI's error handler suppresses the stack trace for `UserError` and prints only the message. Throwing a plain `Error` here will print a stack trace to the user for what is a simple input error.

  Fix: Import `UserError` from `../../domain/errors.ts` and change the throw to: `throw new UserError("--limit must be a positive integer");`
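The distinction matters because of how a CLI-boundary error handler typically branches. A minimal sketch of the pattern, with hypothetical names (the real `UserError` lives in `src/domain/errors.ts` and the real handler in the CLI entrypoint):

```typescript
// Hypothetical stand-in for the project's UserError marker class.
class UserError extends Error {}

// Sketch of a CLI-boundary handler: user-facing errors print only their
// message; unexpected errors keep the stack trace for debugging.
function renderError(error: unknown): string {
  if (error instanceof UserError) {
    return `error: ${error.message}`;
  }
  return error instanceof Error ? (error.stack ?? String(error)) : String(error);
}
```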
Suggestions
- `--limit` help text is terse — "Cap per-group run details (counts still reflect all matching runs)" is accurate but dense. A hint that this takes a positive integer (e.g. 10) would help discoverability, matching the style of `--since`, which has "(e.g. 1h, 1d, 7d, 1w)". Not blocking.
- Truncation indicator in log mode is placed after failure counts — "(showing 3 of 20)" appears as the last segment after "✓ 15 ✗ 5". A user skimming quickly might not see it. Consider making it the first element or adding it on its own indented line. Not blocking — the information is present and accurate.
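The suggested reordering could look something like this hypothetical formatter (not the PR's renderer code; the real renderer's field names will differ):

```typescript
// Render a group line with the truncation hint leading, so it isn't lost
// after the success/failure counts.
function renderGroupLine(
  name: string,
  succeeded: number,
  failed: number,
  shown: number,
  total: number,
): string {
  const truncation = shown < total ? `(showing ${shown} of ${total}) ` : "";
  return `${truncation}${name} ✓ ${succeeded} ✗ ${failed}`;
}
```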
Verdict
NEEDS CHANGES — one blocking issue: validation error must use `UserError` to avoid exposing a stack trace to users on bad `--limit` input. The rest of the UX (help text, log-mode formatting, JSON shape with `truncated` flag, consistent counts-vs-details split) is clean.
Code Review
Well-structured PR. The core optimization — pushing the time bound into the repository contract via `findAllGlobalSince(cutoff)` — is the right DDD call: services shouldn't filter what the repository should never have returned. The two-stage mtime pre-filter / parse-verify approach is sound, and the Stage B fallback correctly handles the backup-restore and long-running-workflow edge cases. The `--limit` + `truncated` flag design keeps counts authoritative while bounding detail output — a clean JSON contract.
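The counts-vs-details contract described above can be sketched with hypothetical shapes (the PR's actual `MethodGroup`/`WorkflowRunGroup` types have more fields):

```typescript
interface RunDetail {
  id: string;
  succeeded: boolean;
}

interface GroupSummary {
  succeeded: number;
  failed: number;
  total: number;
  runs: RunDetail[];
  truncated?: true; // present only when runs[] was capped
}

// Counts are computed over ALL matching runs; only the detail array is capped.
function summariseGroup(matching: RunDetail[], limit?: number): GroupSummary {
  const succeeded = matching.filter((r) => r.succeeded).length;
  const group: GroupSummary = {
    succeeded,
    failed: matching.length - succeeded,
    total: matching.length,
    runs: limit !== undefined ? matching.slice(0, limit) : matching,
  };
  // The field is omitted entirely when nothing was dropped, so existing
  // JSON consumers see no shape change.
  if (limit !== undefined && matching.length > limit) group.truncated = true;
  return group;
}
```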
DDD
- `findAllGlobalSince(cutoff)` aligns with the Specification pattern without introducing a separate type — appropriate scope.
- `SummaryService` remains a proper domain service orchestrating across aggregates.
- The `DataRepositoryReader` read-only projection in `summary_types.ts` is appropriate for query-side concerns.
- Repository interface contracts are well-documented with JSDoc explaining when to prefer `findAllGlobalSince` over `findAllGlobal`.
CLAUDE.md Compliance
- ✅ License headers on all new files
- ✅ `@std/path` (`join`) used for all path construction
- ✅ Named exports throughout
- ✅ Test naming follows the `"functionName: describes behavior"` convention
- ✅ CLI test has `initializeLogging` + models barrel import
- ✅ Windows `.catch(() => {})` in integration test's `withTempDir`
- ✅ Renderer handles both log and json modes (truncation annotation in log output)
- ✅ CLI and renderer imports from `src/libswamp/mod.ts` (no internal path imports introduced)
Test Coverage
Good coverage across layers: 3 unit tests for `YamlWorkflowRunRepository.findAllGlobalSince` (in-window, long-running rejection, backup-restore), 3 service-level tests (limit truncation, no-truncation omission, default unlimited), 4 CLI flag-surface tests, and 1 integration test pinning the parse-count regression. Mock repos in `summary_service_test.ts` and `execution_service_test.ts` correctly implement the new interface method.
Security
No concerns. Filesystem walks stay within repo boundaries (`this.baseDir`), error handling properly catches `NotFound` without leaking paths, and the `--limit` validation rejects non-positive values at the CLI boundary.
Blocking Issues
None.
Suggestions
- Unit tests for output/data repo implementations: `YamlOutputRepository.findAllGlobalSince` and `FileSystemUnifiedDataRepository.findAllGlobalSince` don't have their own unit tests. The code is structurally identical to the well-tested workflow run repo, so the risk is low — but unit tests next to each source file would match the project convention. Consider adding them in a follow-up.
- `--limit` integer validation: The error message says "positive integer" but Cliffy's `:number` type accepts floats like `1.5`. `Array.slice` truncates to an integer so behavior is correct, but consider adding a `Number.isInteger()` check to match the stated contract, or using Cliffy's `:integer` type if available.
- Pre-existing (not from this PR): `src/presentation/renderers/summarise.ts` imports `ActivitySummary` directly from `../../domain/summary/summary_types.ts` instead of through `src/libswamp/mod.ts`. Not introduced by this PR, but worth a note for future cleanup.
Adversarial Review
Critical / High
None found.
Medium
- `src/infrastructure/persistence/yaml_output_repository.ts:201` and `yaml_workflow_run_repository.ts:251` — Per-file `Deno.stat`/`readTextFile` not wrapped in individual try/catch for TOCTOU races.

  If a YAML file is deleted between the `readDir` iteration and the `Deno.stat()` call (Stage A), the `NotFound` error propagates to the outer catch block:

  - In `findAllGlobalSince` (output repo, line 221-222): `continue` skips to the next model type, silently dropping any remaining outputs for the current type that hadn't been iterated yet.
  - In `findRunsSinceByWorkflowId` (workflow run repo, line 268-269): `return []` discards all runs already collected in the `runs` array for that workflow — a worse outcome, since it loses already-gathered results.

  Breaking example: Another process (or a concurrent `swamp gc`) deletes a single output file mid-scan. The output repo silently skips all remaining files for that model type; the workflow run repo returns zero runs for that workflow even if dozens matched.

  However: The existing `findAllByWorkflowId` (line 109-130) and `findAll` (output repo, line 122-147) have the identical pattern — outer catch only, no per-file guard. This is a pre-existing design choice in the codebase, not a regression introduced by this PR. The `unified_data_repository.ts` `findAllForModelSince` (line 273-280) is the odd one out that does wrap `stat` individually. For consistency, wrapping the per-file stat in its own try/catch (like the data repo does) would be a nice follow-up, but I'm not flagging this as blocking since it's consistent with the existing code.
Low
- `src/cli/commands/summarise.ts:58` — `--limit` accepts non-integer floats.

  Cliffy's `:number` type parses `--limit 2.5` as `2.5`. The validation at line 65 (`options.limit <= 0`) passes. `Array.prototype.slice(0, 2.5)` silently truncates to `slice(0, 2)`. The error message says "positive integer" but the input isn't validated as such. In practice, nobody passes `--limit 2.5`, and the behavior is merely imprecise rather than wrong. Could add `!Number.isInteger(options.limit)` to the guard if desired.
- `integration/summarise_perf_test.ts:113` — Monkey-patching `Deno.readTextFile` is fragile under parallel test execution.

  The `instrumentReadCounter` replaces the global `Deno.readTextFile`. If Deno's test runner executes test files concurrently (e.g. when run with `--parallel`), another test file reading YAML workflow-run files could increment the counter, making the `budget` assertion flaky. The `try/finally` restore is correct, but the window between patch and restore is the entire test body. In practice this is unlikely to cause issues, since other test files probably don't hit workflow-run YAML files and the integration test likely runs in its own file. But it's worth noting.
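The patch-and-count pattern the test relies on can be sketched generically. This is a simplified stand-in, not the test's actual `instrumentReadCounter`; the real code patches `Deno.readTextFile` and restores the original in a `finally` block:

```typescript
// Wrap a function so every call increments a counter. The caller swaps the
// wrapped version in for the global, and must restore the original in a
// finally block so the patch never leaks past the test body.
function instrumentCounter<A extends unknown[], R>(
  fn: (...args: A) => R,
): { wrapped: (...args: A) => R; count: () => number } {
  let calls = 0;
  return {
    wrapped: (...args: A) => {
      calls++;
      return fn(...args);
    },
    count: () => calls,
  };
}
```

Because the counter is process-global while patched, any concurrent caller of the patched function inflates it — which is exactly the flakiness risk the review describes.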
Verdict
PASS. The PR is well-designed. The two-stage mtime pre-filter is a sound optimization with correct fallback semantics (Stage B is always the source of truth). The `--limit` truncation correctly preserves aggregate counts while capping detail arrays. Interface changes are properly propagated to all implementors and test mocks. The TOCTOU error-handling gap is pre-existing and consistent with the codebase's current patterns — not a regression.
Review feedback on #1306:

1. The --limit validation threw `new Error(...)`, which prints a stack trace to the user. Every other CLI-level validation in the codebase throws `UserError` (issue_ripple.ts, datastore_sync.ts, vault_put.ts, vault_create.ts); the CLI's error handler suppresses the stack and prints only the message for UserError. Switch to UserError for consistency.
2. Cliffy's `:number` type accepts floats, so `--limit 2.5` slipped past the `<= 0` guard and silently truncated via `Array.prototype.slice(0, 2.5)` → `slice(0, 2)`. The error message says "positive integer" — make the validation match by adding `!Number.isInteger(options.limit)` to the guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
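The tightened guard described in item 2 can be sketched as a standalone function. This is a hypothetical helper for illustration; in the PR the check lives inline in the command action and throws the project's `UserError` rather than a plain `Error`:

```typescript
// Rejects zero, negatives, floats, and NaN with a single message that
// matches the stated contract ("positive integer"). Note that
// Number.isInteger(NaN) is false, so NaN is rejected by the same branch.
function validateLimit(limit: number): number {
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new Error("--limit must be a positive integer");
  }
  return limit;
}
```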
CLI UX Review
Blocking
None.
Suggestions
- Flag placeholder inconsistency (`src/cli/commands/summarise.ts:59`): The new flag is declared as `--limit <count:number>`, while the existing `--limit` flags in `data_search.ts` and `workflow_run_search.ts` use `--limit <n:number>`. Cliffy renders the placeholder in help text, so this creates a minor cosmetic inconsistency. Consider `<n:number>` for uniformity.
- Flag description wording (`src/cli/commands/summarise.ts:60`): "Cap per-group run details (counts still reflect all matching runs)" is longer than comparable descriptions elsewhere ("Max results", "Maximum results (unlimited when omitted)"). The parenthetical is valuable because the semantic is non-obvious — counts staying authoritative under truncation is worth calling out. A slight tightening like "Max run details per group (counts always reflect all runs)" would be a bit more readable, but the current version is acceptable.
- Missing usage example: The command has two `--since` examples but no example for `--limit`. Adding something like `.example("Cap detail output on a large repo", "swamp summarise --limit 10")` would help users discover the flag. Low priority since the help text documents it, but the pattern of leading with examples is already established on this command.
Verdict
PASS — `--limit` is correctly wired through to JSON output, the `truncated` field is omitted when not needed (no silent shape change), error validation is clear and uses `UserError`, and the "(showing N of M)" hint in log mode is readable and consistently applied to both method and workflow sections.
Code Review
Well-structured performance improvement. The two-stage mtime pre-filter is a clean optimization — correctness is guaranteed by Stage B, and the invariant (mtime < cutoff ⇒ startedAt < cutoff for completed runs) is well-documented. The `--limit` flag design is thoughtful: counts stay authoritative, `truncated` is omitted when unused (no silent JSON shape change), and the second commit correctly tightens validation (`UserError` + integer check).

DDD alignment is good: pushing the time bound into the repository contract via `findAllGlobalSince` follows the Specification pattern without introducing a new type. The domain types (`MethodGroup.truncated`, `WorkflowRunGroup.truncated`) cleanly model the capping semantics. `SummaryService` remains a proper domain service coordinating multiple repositories.
Blocking Issues
None.
Suggestions
- Missing unit tests for `YamlOutputRepository.findAllGlobalSince` and `FileSystemUnifiedDataRepository.findAllGlobalSince` — The workflow run repo gets dedicated unit tests for the two-stage filter (in-window only, long-running rejection, backup-restore), but the output and data repo implementations of the same pattern have no direct unit tests. They're structurally identical and covered indirectly via service-layer mocks, so this isn't blocking, but adding parallel tests would guard against future divergence between the three implementations.
- TOCTOU window on `Deno.stat` in output and workflow-run repos — In `findAllGlobalSince` for both `YamlOutputRepository` and `YamlWorkflowRunRepository`, `Deno.stat()` is called without a per-file NotFound guard. If a file is deleted between `readDir` and `stat`, the outer `try/catch` swallows the `NotFound` but also abandons remaining files in that directory (output repo) or workflow (run repo). The data repo wraps `stat` in its own `try/catch` with `continue` on `NotFound`, which is more resilient. Consider aligning the pattern. Low risk in practice, since file deletion during summarise is unlikely.
- Integration test monkey-patches `Deno.readTextFile` — The `instrumentReadCounter` approach is pragmatic for this regression test, but worth noting that it makes the test sensitive to runtime internals. If Deno ever changes `readTextFile` to be non-writable on the namespace object, the test would fail for an unrelated reason. Not blocking — it's the simplest way to count parses without plumbing instrumentation through the repo class.
Adversarial Review
Critical / High
None found.
Medium
- `src/infrastructure/persistence/yaml_workflow_run_repository.ts:251` — TOCTOU race in `findRunsSinceByWorkflowId` discards partially-collected runs.

  `Deno.stat(path)` at line 251 is unguarded. If a file is deleted between `Deno.readDir` and `Deno.stat` (e.g., a concurrent GC or deletion), the `Deno.errors.NotFound` propagates to the outer catch at line 267, which returns `[]` — discarding all runs already collected in the `runs` array for that workflow ID.

  Breaking example: Concurrent `deleteAllByWorkflowId` removes a file while `findAllGlobalSince` is mid-iteration over that workflow's directory. The stat throws `NotFound`, the catch returns `[]`, and the caller loses legitimate in-window runs that were already parsed and pushed to `runs`.

  Suggested fix: Wrap the stat in its own try/catch, matching the pattern used in `unified_data_repository.ts:273-278`:

  ```ts
  try {
    const stat = await Deno.stat(path);
    const mtimeMs = stat.mtime?.getTime();
    if (mtimeMs !== undefined && mtimeMs < cutoffMs) continue;
  } catch (error) {
    if (error instanceof Deno.errors.NotFound) continue;
    throw error;
  }
  ```

  Mitigating factor: The pre-existing `findAllByWorkflowId` (used by `findAllGlobal`) has the same unguarded `readTextFile` pattern, so this isn't a regression — but it's worth fixing, since the new method specifically targets large repos where the race window is wider.
- `src/infrastructure/persistence/yaml_output_repository.ts:201` — Same TOCTOU pattern in `findAllGlobalSince`.

  `Deno.stat(path)` at line 201 is unguarded. A deleted file causes the exception to propagate to the outer catch at line 222, which `continue`s past the entire current model type. Results from other model types are preserved (they're already in `results`), but any results already parsed for the current model type's inner method loop are lost.

  Suggested fix: Same as above — wrap `Deno.stat` in a targeted try/catch that continues on `NotFound`.
Low
- `integration/summarise_perf_test.ts:113` — `instrumentReadCounter` monkey-patches a global.

  `Deno.readTextFile` is replaced at the process level. If Deno's test runner executes multiple test files in the same isolate concurrently, any parallel test that reads `workflow-run-*.yaml` files would inflate the counter. The budget of `N_IN_WINDOW + 5` (15) provides slack, but the test is inherently fragile under concurrent execution. The `finally` restore prevents leaking the patch, which is good.
Verdict
PASS. The two-stage mtime pre-filter is sound — Stage A can only produce false positives (not false negatives), and Stage B is always the source of truth for inclusion. The `--limit` validation correctly rejects zero, negative, float, and NaN inputs using `UserError`. The truncation logic correctly preserves aggregate counts while capping detail arrays, and the `truncated` flag is only set when actual truncation occurs. All repository interfaces, concrete implementations, and test mocks are consistently updated. The medium-severity TOCTOU findings follow pre-existing patterns and only matter under concurrent mutation — they're worth addressing but don't block this PR.
Summary
Resolves swamp-club#240.
- `swamp summarise --since 1d` hung on repos with thousands of workflow runs because `SummaryService` called `findAllGlobal()` on three repos to load every output, run, and data record into memory before filtering by the cutoff.
- Adds a `findAllGlobalSince(cutoff)` method to `WorkflowRunRepository`, `OutputRepository`, and `DataRepositoryReader`.
- The YAML implementations use an mtime pre-filter (`Deno.stat`, skip parse if mtime < cutoff) plus a parse-verify fallback that re-checks `startedAt`/`createdAt` to handle long-running workflows and backup-restore scenarios.
- Adds an opt-in `--limit N` flag that caps each per-group `runs[]` array. Counts (`succeeded`, `failed`, `total`) always reflect every matching run; an optional `truncated: true` field is set only when truncation occurred (omitted otherwise — no silent JSON shape change).

Measured perf delta
Synthetic fixtures, warm cache, macOS APFS:

| Fixture | In-window | `findAllGlobal` | `findAllGlobalSince` | Speedup |
| --- | --- | --- | --- | --- |
| N=3000 | 50 | 343ms | 88ms | 3.9x |
| N=3000 | 500 | 345ms | 144ms | 2.4x |
| N=10000 | 100 | 1141ms | 281ms | 4.1x |

Cold-cache wins are larger because `Deno.stat` is much cheaper than `readFile + parseYaml`. Speedup grows with the ratio of out-of-window to in-window runs — exactly the case the issue describes (`--since 1d` over months of accumulated runs).

DDD note
`findAllGlobal()` is a domain-layer leak — services shouldn't filter what the repository should never have returned. The new `findAllGlobalSince(cutoff)` aligns with the Specification pattern (the cutoff is the spec) without introducing a separate type. Both methods coexist; new callers with a time bound should prefer `findAllGlobalSince`. JSDoc on each new method explains the preference. A future cleanup could parameterize via an `ActivityFilter` value object — out of scope here.

What's deferred
Streaming JSON output, an interactive progress indicator, and a `--sample` mode (also proposed in the issue) require invasive renderer reshaping. Filed as follow-ups.

Test plan
- `deno check` — clean
- `deno lint` — clean
- `deno fmt` — clean
- `deno run test` — 5413 passing, 0 failing
- `deno run compile` — binary refreshed
- Service tests cover `--limit` truncation with counts staying authoritative, the `truncated` flag emitted only when actual truncation occurred, and default-unlimited preserving all runs
- `integration/summarise_perf_test.ts` deterministic regression test: counts parses against a 500-run fixture; if the mtime pre-filter regresses, parse count jumps from ~10 back to 500 and the test fails

🤖 Generated with Claude Code