perf(summarise): skip pre-cutoff scans and cap output on large repos#1306

Merged
stack72 merged 2 commits into main from perf/summarise-large-repos on May 5, 2026

Conversation

Contributor

@stack72 stack72 commented May 5, 2026

Summary

Resolves swamp-club#240. swamp summarise --since 1d hung on repos with thousands of workflow runs because SummaryService called findAllGlobal() on three repos to load every output, run, and data record into memory before filtering by the cutoff.

  • Pushes the time bound into the repository contract via a new findAllGlobalSince(cutoff) method on WorkflowRunRepository, OutputRepository, and DataRepositoryReader.
  • YAML implementations use a two-stage filter: an mtime pre-filter (Deno.stat, skip parse if mtime < cutoff) plus a parse-verify fallback that re-checks startedAt/createdAt to handle long-running workflows and backup-restore scenarios.
  • Adds an opt-in --limit N flag that caps each per-group runs[] array. Counts (succeeded, failed, total) always reflect every matching run; an optional truncated: true field is set only when truncation occurred (omitted otherwise — no silent JSON shape change).
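
The two-stage filter described above can be sketched as follows. This is an illustrative stand-in for the YAML repository internals, not the actual code: `RunRecord`, `StatFn`, `ParseFn`, and the injected callbacks are hypothetical names chosen so the logic is testable in isolation.

```typescript
type RunRecord = { startedAt: Date };
type StatFn = (path: string) => { mtimeMs: number };
type ParseFn = (path: string) => RunRecord;

function findRunsSince(
  paths: string[],
  cutoff: Date,
  stat: StatFn,
  parse: ParseFn,
): RunRecord[] {
  const cutoffMs = cutoff.getTime();
  const results: RunRecord[] = [];
  for (const path of paths) {
    // Stage A: mtime pre-filter — skip the expensive parse entirely when
    // the file was last written before the cutoff.
    if (stat(path).mtimeMs < cutoffMs) continue;
    // Stage B: parse and verify — startedAt is the source of truth. This
    // rejects long-running runs (recent mtime, old start) and survives
    // scrambled mtimes after a backup restore.
    const run = parse(path);
    if (run.startedAt.getTime() >= cutoffMs) results.push(run);
  }
  return results;
}
```

Stage A can only produce false positives (a file that passes the pre-filter but is out of window), never false negatives, because a run file's mtime is at least its start time; Stage B then makes the final call.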

Measured perf delta

Synthetic fixtures, warm cache, macOS APFS:

  N total   in-window   findAllGlobal   findAllGlobalSince   speedup
  3,000     50          343ms           88ms                 3.9x
  3,000     500         345ms           144ms                2.4x
  10,000    100         1141ms          281ms                4.1x

Cold-cache wins are larger because Deno.stat is much cheaper than readFile + parseYaml. Speedup grows with the ratio of out-of-window to in-window runs — exactly the case the issue describes (--since 1d over months of accumulated runs).

DDD note

findAllGlobal() is a domain-layer leak — services shouldn't filter what the repository should never have returned. The new findAllGlobalSince(cutoff) aligns with the Specification pattern (cutoff is the spec) without introducing a separate type. Both methods coexist; new callers with a time bound should prefer findAllGlobalSince. JSDoc on each new method explains the preference. A future cleanup could parameterize via an ActivityFilter value object — out of scope here.
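
A minimal sketch of the coexisting contract, assuming illustrative shapes (`WorkflowRun`, the in-memory implementation) that are not in the diff — only the two method names come from the PR:

```typescript
interface WorkflowRun {
  startedAt: Date;
}

interface WorkflowRunRepository {
  /** Loads every run. Prefer findAllGlobalSince when a time bound exists. */
  findAllGlobal(): Promise<WorkflowRun[]>;
  /** Loads only runs started at or after the cutoff; the cutoff is the spec. */
  findAllGlobalSince(cutoff: Date): Promise<WorkflowRun[]>;
}

// Hypothetical in-memory implementation demonstrating the contract.
class InMemoryWorkflowRunRepository implements WorkflowRunRepository {
  private readonly runs: WorkflowRun[];
  constructor(runs: WorkflowRun[]) {
    this.runs = runs;
  }
  findAllGlobal(): Promise<WorkflowRun[]> {
    return Promise.resolve([...this.runs]);
  }
  findAllGlobalSince(cutoff: Date): Promise<WorkflowRun[]> {
    return Promise.resolve(
      this.runs.filter((r) => r.startedAt.getTime() >= cutoff.getTime()),
    );
  }
}
```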

What's deferred

Streaming JSON output, an interactive progress indicator, and a --sample mode (also proposed in the issue) require invasive renderer reshaping. Filed as follow-ups.

Test plan

  • deno check — clean
  • deno lint — clean
  • deno fmt — clean
  • deno run test — 5413 passing, 0 failing
  • deno run compile — binary refreshed
  • Repo unit tests cover the two-stage filter: in-window only, long-running run rejection (mtime > cutoff but startedAt < cutoff), backup-restore scenario (mtime scrambled, Stage B catches it)
  • Service tests cover: cutoff filtering, --limit truncation with counts staying authoritative, truncated flag emitted only when actual truncation occurred, default-unlimited preserves all runs
  • CLI test pins the flag surface
  • integration/summarise_perf_test.ts adds a deterministic regression test that counts parses against a 500-run fixture; if the mtime pre-filter regresses, the parse count jumps from ~10 back to 500 and the test fails

🤖 Generated with Claude Code

`swamp summarise --since 1d` hung on repos with thousands of workflow runs
because SummaryService called `findAllGlobal()` on three repos to load every
output, run, and data record into memory before filtering by the cutoff.

Pushes the time bound into the repository contract via a new
`findAllGlobalSince(cutoff)` method on `WorkflowRunRepository`,
`OutputRepository`, and `DataRepositoryReader`. The YAML implementations use
a two-stage filter: an mtime pre-filter (`Deno.stat`, skip parse if mtime <
cutoff — runs are rewritten on every status transition, so mtime < cutoff
implies the run completed before cutoff) and a parse-verify fallback that
re-checks `startedAt`/`createdAt` after parse to handle long-running workflows
and backup-restore scenarios where mtime is unreliable.

Adds an opt-in `--limit N` flag that caps each per-group `runs[]` array;
counts (`succeeded`, `failed`, `total`) always reflect every matching run in
the window. When truncation occurs, the group sets an optional
`truncated: true` field — omitted otherwise so existing JSON consumers see
no shape change. Default is unlimited, preserving prior behavior.
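
The truncation semantics can be sketched as a pure function; `Group`, `RunDetail`, and `buildGroup` are hypothetical names, not the real summary types:

```typescript
type RunDetail = { id: string; succeeded: boolean };
type Group = {
  succeeded: number;
  failed: number;
  total: number;
  runs: RunDetail[];
  truncated?: true; // present only when runs[] was actually capped
};

function buildGroup(matching: RunDetail[], limit?: number): Group {
  const group: Group = {
    // Counts are computed over every matching run, before any capping.
    succeeded: matching.filter((r) => r.succeeded).length,
    failed: matching.filter((r) => !r.succeeded).length,
    total: matching.length,
    runs: limit === undefined ? matching : matching.slice(0, limit),
  };
  // Set the flag only on actual truncation so the JSON shape is unchanged
  // for existing consumers when the cap never bites.
  if (limit !== undefined && matching.length > limit) group.truncated = true;
  return group;
}
```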

Hand-measured against synthetic 3000- and 10000-run fixtures (warm cache,
macOS APFS):

  N=3000,  in-window=50:    findAllGlobal 343ms → findAllGlobalSince 88ms  (3.9x)
  N=3000,  in-window=500:   findAllGlobal 345ms → findAllGlobalSince 144ms (2.4x)
  N=10000, in-window=100:   findAllGlobal 1141ms → findAllGlobalSince 281ms (4.1x)

Cold-cache wins are larger because `Deno.stat` is much cheaper than
`readFile + parseYaml`. Speedup grows with the ratio of out-of-window to
in-window runs — exactly the case the issue describes (`--since 1d` over
months of accumulated runs).

`integration/summarise_perf_test.ts` pins the optimization deterministically
by counting parses against a 500-run fixture; if the mtime pre-filter ever
regresses, parse count jumps from ~10 back to 500 and the test fails with a
concrete error. Wall-clock is intentionally not asserted (flaky on shared
CI). New CLI test in `src/cli/commands/summarise_test.ts` covers the flag
surface. Service tests cover limit truncation, the optional `truncated`
field, and that counts remain authoritative when `runs[]` is capped.

Streaming JSON output, an interactive progress indicator, and a `--sample`
mode (also proposed in the issue) are deferred to follow-ups; they require
invasive renderer reshaping that is out of scope here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@github-actions github-actions Bot left a comment


CLI UX Review

Blocking

  1. src/cli/commands/summarise.ts:66 — validation error uses new Error instead of UserError

    throw new Error("--limit must be a positive integer");

    Every other command-level validation in the codebase throws UserError (see issue_ripple.ts:78, datastore_sync.ts:67-77, vault_put.ts:191, vault_create.ts:51). UserError is the established pattern for user-facing validation messages — the CLI's error handler suppresses the stack trace for UserError and prints only the message. Throwing a plain Error here will print a stack trace to the user for what is a simple input error.

    Fix: Import UserError from ../../domain/errors.ts and change the throw to:

    throw new UserError("--limit must be a positive integer");

Suggestions

  1. --limit help text is terse — "Cap per-group run details (counts still reflect all matching runs)" is accurate but dense. A hint that the flag takes a positive integer (e.g. 10) would help discoverability, matching the style of --since, which has (e.g. 1h, 1d, 7d, 1w). Not blocking.

  2. Truncation indicator in log mode is placed after failure counts — (showing 3 of 20) appears as the last segment after ✓ 15 ✗ 5. A user skimming quickly might not see it. Consider making it the first element or adding it on its own indented line. Not blocking — the information is present and accurate.

Verdict

NEEDS CHANGES — one blocking issue: validation error must use UserError to avoid exposing a stack trace to users on bad --limit input. The rest of the UX (help text, log-mode formatting, JSON shape with truncated flag, consistent counts-vs-details split) is clean.

github-actions Bot previously approved these changes May 5, 2026

@github-actions github-actions Bot left a comment


Code Review

Well-structured PR. The core optimization — pushing the time bound into the repository contract via findAllGlobalSince(cutoff) — is the right DDD call: services shouldn't filter what the repository should never have returned. The two-stage mtime pre-filter / parse-verify approach is sound, and the Stage B fallback correctly handles the backup-restore and long-running-workflow edge cases. The --limit + truncated flag design keeps counts authoritative while bounding detail output — clean JSON contract.

DDD

  • findAllGlobalSince(cutoff) aligns with the Specification pattern without introducing a separate type — appropriate scope.
  • SummaryService remains a proper domain service orchestrating across aggregates.
  • The DataRepositoryReader read-only projection in summary_types.ts is appropriate for query-side concerns.
  • Repository interface contracts are well-documented with JSDoc explaining when to prefer findAllGlobalSince over findAllGlobal.

CLAUDE.md Compliance

  • ✅ License headers on all new files
  • ✅ @std/path (join) used for all path construction
  • ✅ Named exports throughout
  • ✅ Test naming follows "functionName: describes behavior" convention
  • ✅ CLI test has initializeLogging + models barrel import
  • ✅ Windows .catch(() => {}) in integration test's withTempDir
  • ✅ Renderer handles both log and json modes (truncation annotation in log output)
  • ✅ CLI and renderer imports from src/libswamp/mod.ts (no internal path imports introduced)

Test Coverage

Good coverage across layers: 3 unit tests for YamlWorkflowRunRepository.findAllGlobalSince (in-window, long-running rejection, backup-restore), 3 service-level tests (limit truncation, no-truncation omission, default unlimited), 4 CLI flag-surface tests, and 1 integration test pinning parse-count regression. Mock repos in summary_service_test.ts and execution_service_test.ts correctly implement the new interface method.

Security

No concerns. Filesystem walks stay within repo boundaries (this.baseDir), error handling properly catches NotFound without leaking paths, and the --limit validation rejects non-positive values at the CLI boundary.

Blocking Issues

None.

Suggestions

  1. Unit tests for output/data repo implementations: YamlOutputRepository.findAllGlobalSince and FileSystemUnifiedDataRepository.findAllGlobalSince don't have their own unit tests. The code is structurally identical to the well-tested workflow run repo, so the risk is low — but unit tests next to each source file would match the project convention. Consider adding them in a follow-up.

  2. --limit integer validation: The error message says "positive integer" but Cliffy's :number type accepts floats like 1.5. Array.slice truncates to integer so behavior is correct, but consider adding a Number.isInteger() check to match the stated contract, or using Cliffy's :integer type if available.

  3. Pre-existing (not from this PR): src/presentation/renderers/summarise.ts imports ActivitySummary directly from ../../domain/summary/summary_types.ts instead of through src/libswamp/mod.ts. Not introduced by this PR, but worth a note for future cleanup.


@github-actions github-actions Bot left a comment


Adversarial Review

Critical / High

None found.

Medium

  1. src/infrastructure/persistence/yaml_output_repository.ts:201 and yaml_workflow_run_repository.ts:251 — Per-file Deno.stat / readTextFile not wrapped in individual try/catch for TOCTOU races.

    If a YAML file is deleted between the readDir iteration and the Deno.stat() call (Stage A), the NotFound error propagates to the outer catch block:

    • In findAllGlobalSince (output repo, line 221-222): continue skips to the next model type, silently dropping any remaining outputs for the current type that hadn't been iterated yet.
    • In findRunsSinceByWorkflowId (workflow run repo, line 268-269): return [] discards all runs already collected in the runs array for that workflow — a worse outcome since it loses already-gathered results.

    Breaking example: Another process (or a concurrent swamp gc) deletes a single output file mid-scan. The output repo silently skips all remaining files for that model type; the workflow run repo returns zero runs for that workflow even if dozens matched.

    However: The existing findAllByWorkflowId (line 109-130) and findAll (output repo, line 122-147) have the identical pattern — outer catch only, no per-file guard. This is a pre-existing design choice in the codebase, not a regression introduced by this PR. The unified_data_repository.ts:findAllForModelSince (line 273-280) is the odd one out that does wrap stat individually. For consistency, wrapping the per-file stat in its own try/catch (like the data repo does) would be a nice follow-up, but I'm not flagging this as blocking since it's consistent with the existing code.

Low

  1. src/cli/commands/summarise.ts:58 — --limit accepts non-integer floats.

    Cliffy's :number type parses --limit 2.5 as 2.5. The validation at line 65 (options.limit <= 0) passes. Array.prototype.slice(0, 2.5) silently truncates to slice(0, 2). The error message says "positive integer" but the input isn't validated as such. In practice, nobody passes --limit 2.5, and the behavior is merely imprecise rather than wrong. Could add !Number.isInteger(options.limit) to the guard if desired.

  2. integration/summarise_perf_test.ts:113 — Monkey-patching Deno.readTextFile is fragile under parallel test execution.

    The instrumentReadCounter replaces the global Deno.readTextFile. If Deno's test runner parallelizes test files (which it does by default), another test file reading YAML workflow-run files concurrently could increment the counter, making the budget assertion flaky. The try/finally restore is correct, but the window between patch and restore is the entire test body. In practice this is unlikely to cause issues since other test files probably don't hit workflow-run YAML files, and the integration test is likely run in its own file. But it's worth noting.
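
The monkey-patch-with-restore pattern the test relies on can be sketched generically; `withCallCounter` is a hypothetical helper shown only to illustrate the try/finally discipline, not the actual instrumentReadCounter:

```typescript
// Temporarily wraps target.fn to count invocations, guaranteeing the
// original is restored even if the body throws.
function withCallCounter<A extends unknown[], R>(
  target: { fn: (...args: A) => R },
  body: (count: () => number) => void,
): void {
  const original = target.fn;
  let calls = 0;
  target.fn = (...args: A): R => {
    calls++;
    return original(...args); // pass through to the real implementation
  };
  try {
    body(() => calls);
  } finally {
    target.fn = original; // never leak the patch past the test body
  }
}
```

The fragility the review describes remains inherent: the counter sees every call made anywhere in the process while the patch is active, so parallel tests touching the same function would inflate it.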

Verdict

PASS. The PR is well-designed. The two-stage mtime pre-filter is a sound optimization with correct fallback semantics (Stage B is always the source of truth). The --limit truncation correctly preserves aggregate counts while capping detail arrays. Interface changes are properly propagated to all implementors and test mocks. The TOCTOU error-handling gap is pre-existing and consistent with the codebase's current patterns — not a regression.

Review feedback on #1306:

1. The --limit validation threw `new Error(...)`, which prints a stack trace
   to the user. Every other CLI-level validation in the codebase throws
   `UserError` (issue_ripple.ts, datastore_sync.ts, vault_put.ts, vault_create.ts);
   the CLI's error handler suppresses the stack and prints only the message
   for UserError. Switch to UserError for consistency.

2. Cliffy's `:number` type accepts floats, so `--limit 2.5` slipped past the
   `<= 0` guard and silently truncated via `Array.prototype.slice(0, 2.5) →
   slice(0, 2)`. The error message says "positive integer" — make the
   validation match by adding `!Number.isInteger(options.limit)` to the
   guard.
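
A minimal sketch of the tightened guard described in point 2, assuming `UserError` stands in for the project's domain error type and `validateLimit` is an illustrative helper:

```typescript
class UserError extends Error {}

function validateLimit(limit: number | undefined): void {
  if (limit === undefined) return; // flag omitted: unlimited
  // Reject floats as well as zero/negative values so behavior matches the
  // "positive integer" message instead of silently truncating via slice.
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new UserError("--limit must be a positive integer");
  }
}
```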

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@github-actions github-actions Bot left a comment


CLI UX Review

Blocking

None.

Suggestions

  1. Flag placeholder inconsistency (src/cli/commands/summarise.ts:59): The new flag is declared as --limit <count:number>, while the existing --limit flags in data_search.ts and workflow_run_search.ts use --limit <n:number>. Cliffy renders the placeholder in help text, so this creates a minor cosmetic inconsistency. Consider <n:number> for uniformity.

  2. Flag description wording (src/cli/commands/summarise.ts:60): "Cap per-group run details (counts still reflect all matching runs)" is longer than comparable descriptions elsewhere ("Max results", "Maximum results (unlimited when omitted)"). The parenthetical is valuable because the semantic is non-obvious — counts staying authoritative under truncation is worth calling out. A slight tightening like "Max run details per group (counts always reflect all runs)" would be a bit more readable, but the current version is acceptable.

  3. Missing usage example: The command has two --since examples but no example for --limit. Adding something like .example("Cap detail output on a large repo", "swamp summarise --limit 10") would help users discover the flag. Low priority since the help text documents it, but the pattern of leading with examples is already established on this command.

Verdict

PASS — --limit is correctly wired through to JSON output, the truncated field is omitted when not needed (no silent shape change), error validation is clear and uses UserError, and the (showing N of M) hint in log mode is readable and consistently applied to both method and workflow sections.


@github-actions github-actions Bot left a comment


Code Review

Well-structured performance improvement. The two-stage mtime pre-filter is a clean optimization — correctness is guaranteed by Stage B, and the invariant (mtime < cutoff ⇒ startedAt < cutoff for completed runs) is well-documented. The --limit flag design is thoughtful: counts stay authoritative, truncated is omitted when unused (no silent JSON shape change), and the second commit correctly tightens validation (UserError + integer check).

DDD alignment is good: pushing the time bound into the repository contract via findAllGlobalSince follows the Specification pattern without introducing a new type. The domain types (MethodGroup.truncated, WorkflowRunGroup.truncated) cleanly model the capping semantics. SummaryService remains a proper domain service coordinating multiple repositories.

Blocking Issues

None.

Suggestions

  1. Missing unit tests for YamlOutputRepository.findAllGlobalSince and FileSystemUnifiedDataRepository.findAllGlobalSince — The workflow run repo gets dedicated unit tests for the two-stage filter (in-window only, long-running rejection, backup-restore), but the output and data repo implementations of the same pattern have no direct unit tests. They're structurally identical and covered indirectly via service-layer mocks, so this isn't blocking, but adding parallel tests would guard against future divergence between the three implementations.

  2. TOCTOU window on Deno.stat in output and workflow-run repos — In findAllGlobalSince for both YamlOutputRepository and YamlWorkflowRunRepository, Deno.stat() is called without a per-file NotFound guard. If a file is deleted between readDir and stat, the outer try/catch swallows the NotFound but also abandons remaining files in that directory (output repo) or workflow (run repo). The data repo wraps stat in its own try/catch with continue on NotFound, which is more resilient. Consider aligning the pattern. Low risk in practice since file deletion during summarise is unlikely.

  3. Integration test monkey-patches Deno.readTextFile — The instrumentReadCounter approach is pragmatic for this regression test, but worth noting it makes the test sensitive to runtime internals. If Deno ever changes readTextFile to be non-writable on the namespace object, the test would fail for an unrelated reason. Not blocking — it's the simplest way to count parses without plumbing instrumentation through the repo class.


@github-actions github-actions Bot left a comment


Adversarial Review

Critical / High

None found.

Medium

  1. src/infrastructure/persistence/yaml_workflow_run_repository.ts:251 — TOCTOU race in findRunsSinceByWorkflowId discards partially-collected runs. Deno.stat(path) at line 251 is unguarded. If a file is deleted between Deno.readDir and Deno.stat (e.g., a concurrent GC or deletion), the Deno.errors.NotFound propagates to the outer catch at line 267, which returns [] — discarding all runs already collected in the runs array for that workflow ID.

    Breaking example: Concurrent deleteAllByWorkflowId removes a file while findAllGlobalSince is mid-iteration over that workflow's directory. The stat throws NotFound, the catch returns [], and the caller loses legitimate in-window runs that were already parsed and pushed to runs.

    Suggested fix: Wrap the stat in its own try/catch, matching the pattern used in unified_data_repository.ts:273-278:

    try {
      const stat = await Deno.stat(path);
      const mtimeMs = stat.mtime?.getTime();
      if (mtimeMs !== undefined && mtimeMs < cutoffMs) continue;
    } catch (error) {
      if (error instanceof Deno.errors.NotFound) continue;
      throw error;
    }

    Mitigating factor: The pre-existing findAllByWorkflowId (used by findAllGlobal) has the same unguarded readTextFile pattern, so this isn't a regression — but it's worth fixing since the new method is specifically targeting large repos where the race window is wider.

  2. src/infrastructure/persistence/yaml_output_repository.ts:201 — Same TOCTOU pattern in findAllGlobalSince. Deno.stat(path) at line 201 is unguarded. A deleted file causes the exception to propagate to the outer catch at line 222, which continues past the entire current model type. Results from other model types are preserved (they're already in results), but any results already parsed for the current model type's inner method loop are lost.

    Suggested fix: Same as above — wrap Deno.stat in a targeted try/catch that continues on NotFound.

Low

  1. integration/summarise_perf_test.ts:113 — instrumentReadCounter monkey-patches a global. Deno.readTextFile is replaced at the process level. If Deno's test runner executes multiple test files in the same isolate concurrently, any parallel test that reads workflow-run-*.yaml files would inflate the counter. The budget of N_IN_WINDOW + 5 (15) provides slack, but the test is inherently fragile under concurrent execution. The finally restore prevents leaking the patch, which is good.

Verdict

PASS. The two-stage mtime pre-filter is sound — Stage A can only produce false positives (not false negatives), and Stage B is always the source of truth for inclusion. The --limit validation correctly rejects zero, negative, float, and NaN inputs using UserError. The truncation logic correctly preserves aggregate counts while capping detail arrays, and the truncated flag is only set when actual truncation occurs. All repository interfaces, concrete implementations, and test mocks are consistently updated. The medium-severity TOCTOU findings follow pre-existing patterns and only matter under concurrent mutation — they're worth addressing but don't block this PR.

@stack72 stack72 merged commit ff890cd into main May 5, 2026
11 checks passed
@stack72 stack72 deleted the perf/summarise-large-repos branch May 5, 2026 13:27