Skip to content

perf(catalog): batch version deletes, WAL checkpoint after GC, add datastore compact (swamp-club#241)#1311

Merged
stack72 merged 2 commits intomainfrom
worktree-squishy-mapping-boole
May 5, 2026
Merged

perf(catalog): batch version deletes, WAL checkpoint after GC, add datastore compact (swamp-club#241)#1311
stack72 merged 2 commits intomainfrom
worktree-squishy-mapping-boole

Conversation

@stack72
Copy link
Copy Markdown
Contributor

@stack72 stack72 commented May 5, 2026

Summary

Three coordinated SQLite catalog performance improvements for swamp-club#241:

  • Batch version deletes: CatalogStore.bulkRemoveVersions() wraps N individual DELETE statements in a single BEGIN IMMEDIATE transaction, eliminating per-row WAL fsync overhead during version GC.
  • WAL checkpoint after GC: dataGc calls PRAGMA wal_checkpoint(TRUNCATE) at the end of every live (non-dry-run) GC run, resetting the WAL file to 0 bytes. Stats are reported in the completed event and rendered in both log and json output modes. Note: TRUNCATE returns (0,0,0) on full success (post-truncation counts, not pre), so renderers special-case that result.
  • swamp datastore compact command: New command exposing on-demand WAL checkpoint + VACUUM so operators can reclaim space without running GC.

Test Plan

  • CatalogStore: 3 new tests — bulkRemoveVersions atomicity, no-op for empty array, checkpoint WAL truncation
  • dataGc: 3 new tests — compactCatalog called on real run, skipped on dry-run, WAL stats default to 0 when not provided
  • datastoreCompact: 4 new tests — event sequence, WAL page counts in completed event, dbBytesReclaimed calculation, zero when db size doesn't decrease
  • All 39 tests pass; deno check and deno lint clean

…tastore compact command (swamp-club#241)

Three coordinated improvements to SQLite catalog performance under heavy GC load:

1. `bulkRemoveVersions` on `CatalogStore` wraps N individual DELETE statements in a single
   BEGIN IMMEDIATE transaction, eliminating per-row WAL fsync overhead during version GC.

2. `compactCatalog` dep added to `DataGcDeps`; after a live GC run `dataGc` calls
   `PRAGMA wal_checkpoint(TRUNCATE)` so the WAL file is reset to 0 bytes. WAL page stats
   are reported in the completed event and rendered in both log and JSON output modes.
   TRUNCATE returns (0,0,0) on full success, so the log renderer special-cases that result.

3. New `swamp datastore compact` command exposes on-demand WAL checkpoint + VACUUM so
   operators can reclaim space without running GC.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLI UX Review

Blocking

  1. src/presentation/renderers/data_gc.ts — misleading WAL message on --dry-run

    When a user runs swamp data gc --dry-run, gc.ts skips the WAL checkpoint (const compact = !input.dryRun ? deps.compactCatalog?.() : undefined) and sets walPagesTotal: 0 and walPagesCheckpointed: 0 in the completed event. The log renderer's WAL block checks:

    if (e.data.walPagesTotal > 0 && e.data.walPagesCheckpointed < e.data.walPagesTotal) {
      // partial checkpoint
    } else {
      logger.info`WAL checkpointed and truncated`; // ← fires on dry-run
    }

    Because 0 > 0 is false, the else branch fires unconditionally, telling the user "WAL checkpointed and truncated" even though no checkpoint happened. This is actively incorrect output for a safety operation that users rely on to preview without side effects.

    Fix: Gate the WAL lines on !e.data.dryRun:

    if (!e.data.dryRun) {
      if (e.data.walPagesTotal > 0 && e.data.walPagesCheckpointed < e.data.walPagesTotal) {
        logger.info`WAL partial checkpoint: ...`;
      } else {
        logger.info`WAL checkpointed and truncated`;
      }
    }

Suggestions

  1. src/presentation/renderers/datastore_compact.ts — raw byte count is hard to read

    logger.info\Catalog compacted: reclaimed ${e.data.dbBytesReclaimed} bytes`outputs raw integers (e.g.,400000000 bytes). A human-readable format (e.g., 391 KB, 3.8 MB`) would be more scannable. Not blocking since no existing renderers use a humanized formatter either, but worth a follow-up.

  2. src/presentation/renderers/data_gc.ts (JSON mode) — ambiguous WAL zeros on dry-run

    In JSON output, a dry-run emits "walPagesTotal": 0, "walPagesCheckpointed": 0. A script consuming this cannot distinguish "dry-run skipped the checkpoint" from "real run, WAL was already empty". Consider adding a "walCheckpointed": false boolean (or similar) so consumers know whether the checkpoint actually ran.

Verdict

NEEDS CHANGES — the data gc --dry-run log path reports "WAL checkpointed and truncated" when no checkpoint occurred.

github-actions[bot]
github-actions Bot previously approved these changes May 5, 2026
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Clean, well-structured PR that adds three coordinated SQLite performance improvements. The architecture follows the project's established DDD patterns well — deps-based dependency injection for datastoreCompact, proper separation of application logic from infrastructure, and both log and json output modes supported. The libswamp import boundary is respected in all CLI and renderer files. Test coverage is solid: 10 new tests across the three features with good edge-case coverage (empty arrays, dry-run skip, partial checkpoints, no-shrink vacuum).

Blocking Issues

None.

Suggestions

  1. catalogStore.close() not in try/finally (src/cli/commands/datastore_compact.ts:93): If consumeStream throws (e.g., the renderer's error handler throws UserError), catalogStore.close() is skipped. Wrapping the consumeStream + close() in try/finally would be more defensive — though in practice the process exits immediately on CLI errors, so the impact is minimal.

  2. Duplicated _catalog.db path construction (src/cli/commands/datastore_compact.ts:75): The join(dataBaseDir, "_catalog.db") pattern is also in src/infrastructure/persistence/repository_factory.ts:94. Extracting a shared helper (or exposing the path from the factory) would prevent these two locations from diverging. Low risk since the name is unlikely to change.

  3. initializeLogging in compact_test.ts: The compact_test.ts calls await initializeLogging({}) in every test, but the sibling gc_test.ts (which also uses withGeneratorSpan via the generator under test) runs fine without it. The calls are harmless but may be unnecessary.

  4. CatalogCheckpointStats not re-exported from mod.ts: The CatalogCheckpointStats type is defined in catalog_store.ts but isn't needed externally since deps interfaces define their own return shapes. Just noting this is intentional and correct — no action needed.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Review

Critical / High

None.

Medium

  1. Misleading WAL log on dry-runsrc/presentation/renderers/data_gc.ts:38-47

    On a dry-run GC, compactCatalog is skipped so walPagesTotal and walPagesCheckpointed both default to 0 (via ?? 0 in gc.ts:177-178). The renderer condition walPagesTotal > 0 && walPagesCheckpointed < walPagesTotal evaluates to false, so it falls through to the else branch and logs "WAL checkpointed and truncated" — even though no checkpoint was performed.

    A successful PRAGMA wal_checkpoint(TRUNCATE) also returns (0, 0, 0) (post-truncation state), so the renderer can't distinguish "never attempted" from "fully successful" with the current data model.

    Breaking example: swamp data gc --dry-run outputs:

    GC complete: deleted 0 items
    WAL checkpointed and truncated   ← wrong, nothing was checkpointed
    

    Suggested fix: Guard the WAL log block on !e.data.dryRun, e.g.:

    if (!e.data.dryRun) {
      if (e.data.walPagesTotal > 0 && e.data.walPagesCheckpointed < e.data.walPagesTotal) {
        logger.info`WAL partial checkpoint: ...`;
      } else {
        logger.info`WAL checkpointed and truncated`;
      }
    }
  2. Missing try/finally for catalogStore.close()src/cli/commands/datastore_compact.ts:92-93

    If consumeStream propagates an exception (e.g., checkpoint() hits SQLITE_BUSY after the 5s busy_timeout, or vacuum() fails due to an I/O error), catalogStore.close() on line 93 is never reached. The SQLite connection leaks. For a CLI that exits immediately this is benign, but it's an easy fix:

    try {
      await consumeStream(datastoreCompact(ctx, deps), renderer.handlers());
    } finally {
      catalogStore.close();
    }

Low

  1. Dead error event in datastoreCompact generatorsrc/libswamp/datastores/compact.ts:34,57-59

    DatastoreCompactEvent includes { kind: "error"; error: SwampError } and both renderers implement an error handler, but the generator never catches exceptions and never yields an error event. Exceptions from checkpoint() / vacuum() propagate uncaught from the async generator. This is consistent with the dataGc generator pattern so it's not wrong per se, but the error variant and its handlers are dead code.

  2. dbBytesReclaimed can under-reportsrc/libswamp/datastores/compact.ts:55-67

    beforeSize is measured pre-checkpoint (main DB only, not the WAL file). After checkpoint flushes WAL pages into the main DB, the main DB can grow. After VACUUM shrinks it, afterSize may still be larger than the original beforeSize, causing Math.max(0, beforeSize - afterSize) to report 0 even when VACUUM reclaimed significant space relative to the post-checkpoint size. The Math.max(0, ...) guard prevents negatives, but the metric will read "Catalog already compact" in cases where real compaction happened. A more accurate measurement would capture beforeSize after checkpoint but before VACUUM.

Verdict

PASS — The core logic (batch deletes, WAL checkpoint, VACUUM) is correct and well-tested. The two medium findings are UX-level issues (misleading dry-run log, missing finally block) that don't risk data loss or corruption.

- Gate WAL log lines on !dryRun in data_gc renderer; dry-run skips
  compactCatalog so walPagesTotal/walPagesCheckpointed both default to 0,
  causing the else branch to falsely log "WAL checkpointed and truncated"
- Wrap consumeStream in try/finally so catalogStore.close() is guaranteed
  even when the renderer throws
- Measure beforeSize after checkpoint (not before) so the WAL pages are
  already flushed to the main DB, giving an accurate pre-VACUUM baseline
  for dbBytesReclaimed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLI UX Review

Blocking

None.

Suggestions

  1. Raw byte count in log modesrc/presentation/renderers/datastore_compact.ts:52

    The completed handler emits:

    Catalog compacted: reclaimed ${e.data.dbBytesReclaimed} bytes
    

    Every other renderer in the codebase that displays byte sizes in log mode (datastore_setup.ts, data_list.ts, data_search.tsx, data_query.ts) uses a local formatBytes/formatSize helper to produce 400.0KB / 1.2MB style output. For a command whose explicit purpose is reclaiming disk space, a raw number like reclaimed 1073741824 bytes is awkward where reclaimed 1.0GB would be immediately scannable. Consider copying the same formatBytes helper used in datastore_setup.ts.

  2. "WAL checkpointed and truncated" when WAL was already emptysrc/presentation/renderers/data_gc.ts:47 and datastore_compact.ts:49

    When walPagesTotal == 0 (WAL was already clean), both renderers fall through to the else branch and print "WAL checkpointed and truncated". Technically accurate for TRUNCATE mode, but "truncated" implies there was something to truncate. A message like "WAL already clean" for the zero case would be clearer. Minor, and the PR comment documents the SQLite (0,0,0) behaviour, so this is low priority.

Verdict

PASS — both log and JSON modes are fully implemented for the new swamp datastore compact command, the command description and examples are clear, flag naming is consistent with other datastore subcommands, and error handling follows the established UserError pattern. The raw-bytes display is a style nit rather than a blocker.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Clean, well-structured PR that adds three coordinated SQLite catalog performance improvements with proper layered architecture.

Architecture & DDD ✅

  • Proper layer separation: CatalogStore (infrastructure) → datastoreCompact/dataGc (application services in libswamp) → CLI command + renderer (presentation)
  • Dependency injection via DatastoreCompactDeps and DataGcDeps interfaces follows established patterns
  • Application services use async generator event streams, consistent with existing codebase conventions

Import Boundary ✅

  • CLI command (datastore_compact.ts) imports from libswamp/mod.ts — correct
  • Renderer (datastore_compact.ts) imports from libswamp/mod.ts — correct
  • New types properly exported from libswamp/mod.ts

Security ✅

  • bulkRemoveVersions uses parameterized queries (? placeholders) — no SQL injection risk
  • Proper BEGIN IMMEDIATE / COMMIT / ROLLBACK transaction handling
  • try/finally ensures catalogStore.close() in the compact command
  • Math.max(0, beforeSize - afterSize) prevents negative byte reclamation values

Tests ✅

  • 3 CatalogStore tests: bulkRemoveVersions atomicity, empty-array no-op, checkpoint WAL truncation
  • 3 dataGc tests: compact called on live run, skipped on dry-run, WAL stats default when dep omitted
  • 4 datastoreCompact tests: event sequence, WAL page counts, dbBytesReclaimed calculation, zero-reclaim case
  • Tests live next to source files per conventions

Code Style ✅

  • License headers on all new files
  • Both log and json output modes supported
  • WAL log lines properly gated behind !dryRun (second commit fix)
  • AnyOptions pattern matches established CLI convention
  • LogTape tagged templates use bare interpolation per CLAUDE.md

Suggestions (non-blocking)

  1. CatalogCheckpointStats is exported from catalog_store.ts but the deps interfaces in compact.ts and gc.ts define their own inline type shapes. Could reference the named type for discoverability, though structural typing makes this work fine as-is.
  2. The beforeSize measurement after checkpoint (second commit fix) is a nice catch — ensures WAL pages are flushed before measuring the pre-VACUUM baseline.

LGTM — solid performance improvement with clean architecture and thorough tests.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Review

I traced every code path across the 12 changed files, focusing on transaction safety, SQLite WAL checkpoint semantics, resource lifecycle, and error propagation. The code is solid.

Critical / High

None.

Medium

  1. src/infrastructure/persistence/unified_data_repository.ts:1386bulkRemoveVersions removes catalog rows for versions whose disk deletion may have failed. The Promise.allSettled batch (line 1352) can contain rejected promises for versions whose directories couldn't be removed. However, versionsToRemove (the full list, not the success-filtered list) is passed to bulkRemoveVersions. This means a version could exist on disk but have no catalog entry. Mitigating factor: this is pre-existing behavior — the old per-row removeVersion loop iterated over the same versionsToRemove list regardless of disk-deletion outcome. This PR correctly preserves the existing semantics. Not blocking.

Low

  1. src/infrastructure/persistence/catalog_store.ts:550ROLLBACK could theoretically mask the original error. If COMMIT throws and then this.db.exec("ROLLBACK") also throws, the original commit error is lost. In practice this is near-impossible with SQLite (a failed COMMIT auto-rollbacks, so the explicit ROLLBACK is a safety belt on a no-op). No action needed.

  2. src/presentation/renderers/data_gc.ts:39-48 — WAL log message is slightly misleading if WAL mode is not active. If the database were somehow not in WAL mode, PRAGMA wal_checkpoint(TRUNCATE) returns (-1, -1) for log/checkpointed. The condition walPagesTotal > 0 would be false, so it logs "WAL checkpointed and truncated" even though nothing happened. In practice the constructor always sets PRAGMA journal_mode=WAL, so this path is unreachable.

Observations (not findings)

  • Transaction pattern in bulkRemoveVersions is correct: BEGIN IMMEDIATE acquires a reserved lock immediately (no reader starvation), the prepared statement is reused across iterations (good for perf), and ROLLBACK in the catch block is a proper cleanup.
  • TRUNCATE checkpoint semantics are handled correctly everywhere. The renderers and PR description both correctly note that wal_checkpoint(TRUNCATE) returns (0,0,0) on full success (post-truncation counts).
  • Resource cleanup in datastore_compact.ts is proper — catalogStore.close() is in a finally block. The data_gc.ts path doesn't close the catalog store created by createDataGcDeps, but this is pre-existing and not introduced by this PR.
  • catalogDbSize measurement order in datastoreCompact is correct: measured after checkpoint (so WAL pages are flushed to main DB) but before VACUUM, giving an accurate pre-VACUUM baseline. The second commit explicitly fixed this ordering.
  • Test coverage is good: atomicity of bulk delete, no-op for empty arrays, WAL truncation verification, dry-run skipping, event sequence, and dbBytesReclaimed calculation including the zero-delta edge case.

Verdict

PASS — Well-structured performance improvement with correct transaction handling, proper WAL checkpoint semantics, and thorough test coverage. No issues introduced by this PR.

@stack72 stack72 merged commit 7df1020 into main May 5, 2026
11 checks passed
@stack72 stack72 deleted the worktree-squishy-mapping-boole branch May 5, 2026 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant