Conversation
…tastore compact command (swamp-club#241) Three coordinated improvements to SQLite catalog performance under heavy GC load: 1. `bulkRemoveVersions` on `CatalogStore` wraps N individual DELETE statements in a single BEGIN IMMEDIATE transaction, eliminating per-row WAL fsync overhead during version GC. 2. `compactCatalog` dep added to `DataGcDeps`; after a live GC run `dataGc` calls `PRAGMA wal_checkpoint(TRUNCATE)` so the WAL file is reset to 0 bytes. WAL page stats are reported in the completed event and rendered in both log and JSON output modes. TRUNCATE returns (0,0,0) on full success, so the log renderer special-cases that result. 3. New `swamp datastore compact` command exposes on-demand WAL checkpoint + VACUUM so operators can reclaim space without running GC. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
CLI UX Review
Blocking
-
src/presentation/renderers/data_gc.ts— misleading WAL message on--dry-runWhen a user runs
swamp data gc --dry-run,gc.tsskips the WAL checkpoint (const compact = !input.dryRun ? deps.compactCatalog?.() : undefined) and setswalPagesTotal: 0andwalPagesCheckpointed: 0in the completed event. The log renderer's WAL block checks:if (e.data.walPagesTotal > 0 && e.data.walPagesCheckpointed < e.data.walPagesTotal) { // partial checkpoint } else { logger.info`WAL checkpointed and truncated`; // ← fires on dry-run }
Because
0 > 0is false, the else branch fires unconditionally, telling the user "WAL checkpointed and truncated" even though no checkpoint happened. This is actively incorrect output for a safety operation that users rely on to preview without side effects.Fix: Gate the WAL lines on
!e.data.dryRun:if (!e.data.dryRun) { if (e.data.walPagesTotal > 0 && e.data.walPagesCheckpointed < e.data.walPagesTotal) { logger.info`WAL partial checkpoint: ...`; } else { logger.info`WAL checkpointed and truncated`; } }
Suggestions
-
src/presentation/renderers/datastore_compact.ts— raw byte count is hard to readlogger.info\Catalog compacted: reclaimed ${e.data.dbBytesReclaimed} bytes`outputs raw integers (e.g.,400000000 bytes). A human-readable format (e.g.,391 KB,3.8 MB`) would be more scannable. Not blocking since no existing renderers use a humanized formatter either, but worth a follow-up. -
src/presentation/renderers/data_gc.ts(JSON mode) — ambiguous WAL zeros on dry-runIn JSON output, a dry-run emits
"walPagesTotal": 0, "walPagesCheckpointed": 0. A script consuming this cannot distinguish "dry-run skipped the checkpoint" from "real run, WAL was already empty". Consider adding a"walCheckpointed": falseboolean (or similar) so consumers know whether the checkpoint actually ran.
Verdict
NEEDS CHANGES — the data gc --dry-run log path reports "WAL checkpointed and truncated" when no checkpoint occurred.
There was a problem hiding this comment.
Code Review
Clean, well-structured PR that adds three coordinated SQLite performance improvements. The architecture follows the project's established DDD patterns well — deps-based dependency injection for datastoreCompact, proper separation of application logic from infrastructure, and both log and json output modes supported. The libswamp import boundary is respected in all CLI and renderer files. Test coverage is solid: 10 new tests across the three features with good edge-case coverage (empty arrays, dry-run skip, partial checkpoints, no-shrink vacuum).
Blocking Issues
None.
Suggestions
-
catalogStore.close()not in try/finally (src/cli/commands/datastore_compact.ts:93): IfconsumeStreamthrows (e.g., the renderer'serrorhandler throwsUserError),catalogStore.close()is skipped. Wrapping theconsumeStream+close()in try/finally would be more defensive — though in practice the process exits immediately on CLI errors, so the impact is minimal. -
Duplicated
_catalog.dbpath construction (src/cli/commands/datastore_compact.ts:75): Thejoin(dataBaseDir, "_catalog.db")pattern is also insrc/infrastructure/persistence/repository_factory.ts:94. Extracting a shared helper (or exposing the path from the factory) would prevent these two locations from diverging. Low risk since the name is unlikely to change. -
initializeLoggingincompact_test.ts: Thecompact_test.tscallsawait initializeLogging({})in every test, but the siblinggc_test.ts(which also useswithGeneratorSpanvia the generator under test) runs fine without it. The calls are harmless but may be unnecessary. -
CatalogCheckpointStatsnot re-exported frommod.ts: TheCatalogCheckpointStatstype is defined incatalog_store.tsbut isn't needed externally since deps interfaces define their own return shapes. Just noting this is intentional and correct — no action needed.
There was a problem hiding this comment.
Adversarial Review
Critical / High
None.
Medium
-
Misleading WAL log on dry-run —
src/presentation/renderers/data_gc.ts:38-47On a dry-run GC,
compactCatalogis skipped sowalPagesTotalandwalPagesCheckpointedboth default to 0 (via?? 0ingc.ts:177-178). The renderer conditionwalPagesTotal > 0 && walPagesCheckpointed < walPagesTotalevaluates tofalse, so it falls through to theelsebranch and logs "WAL checkpointed and truncated" — even though no checkpoint was performed.A successful
PRAGMA wal_checkpoint(TRUNCATE)also returns(0, 0, 0)(post-truncation state), so the renderer can't distinguish "never attempted" from "fully successful" with the current data model.Breaking example:
swamp data gc --dry-runoutputs:GC complete: deleted 0 items WAL checkpointed and truncated ← wrong, nothing was checkpointedSuggested fix: Guard the WAL log block on
!e.data.dryRun, e.g.:if (!e.data.dryRun) { if (e.data.walPagesTotal > 0 && e.data.walPagesCheckpointed < e.data.walPagesTotal) { logger.info`WAL partial checkpoint: ...`; } else { logger.info`WAL checkpointed and truncated`; } }
-
Missing try/finally for
catalogStore.close()—src/cli/commands/datastore_compact.ts:92-93If
consumeStreampropagates an exception (e.g.,checkpoint()hits SQLITE_BUSY after the 5s busy_timeout, orvacuum()fails due to an I/O error),catalogStore.close()on line 93 is never reached. The SQLite connection leaks. For a CLI that exits immediately this is benign, but it's an easy fix:try { await consumeStream(datastoreCompact(ctx, deps), renderer.handlers()); } finally { catalogStore.close(); }
Low
-
Dead
errorevent indatastoreCompactgenerator —src/libswamp/datastores/compact.ts:34,57-59DatastoreCompactEventincludes{ kind: "error"; error: SwampError }and both renderers implement anerrorhandler, but the generator never catches exceptions and never yields an error event. Exceptions fromcheckpoint()/vacuum()propagate uncaught from the async generator. This is consistent with thedataGcgenerator pattern so it's not wrong per se, but theerrorvariant and its handlers are dead code. -
dbBytesReclaimedcan under-report —src/libswamp/datastores/compact.ts:55-67beforeSizeis measured pre-checkpoint (main DB only, not the WAL file). After checkpoint flushes WAL pages into the main DB, the main DB can grow. After VACUUM shrinks it,afterSizemay still be larger than the originalbeforeSize, causingMath.max(0, beforeSize - afterSize)to report 0 even when VACUUM reclaimed significant space relative to the post-checkpoint size. TheMath.max(0, ...)guard prevents negatives, but the metric will read "Catalog already compact" in cases where real compaction happened. A more accurate measurement would capturebeforeSizeafter checkpoint but before VACUUM.
Verdict
PASS — The core logic (batch deletes, WAL checkpoint, VACUUM) is correct and well-tested. The two medium findings are UX-level issues (misleading dry-run log, missing finally block) that don't risk data loss or corruption.
- Gate WAL log lines on !dryRun in data_gc renderer; dry-run skips compactCatalog so walPagesTotal/walPagesCheckpointed both default to 0, causing the else branch to falsely log "WAL checkpointed and truncated" - Wrap consumeStream in try/finally so catalogStore.close() is guaranteed even when the renderer throws - Measure beforeSize after checkpoint (not before) so the WAL pages are already flushed to the main DB, giving an accurate pre-VACUUM baseline for dbBytesReclaimed Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
CLI UX Review
Blocking
None.
Suggestions
-
Raw byte count in log mode —
src/presentation/renderers/datastore_compact.ts:52The completed handler emits:
Catalog compacted: reclaimed ${e.data.dbBytesReclaimed} bytesEvery other renderer in the codebase that displays byte sizes in log mode (
datastore_setup.ts,data_list.ts,data_search.tsx,data_query.ts) uses a localformatBytes/formatSizehelper to produce400.0KB/1.2MBstyle output. For a command whose explicit purpose is reclaiming disk space, a raw number likereclaimed 1073741824 bytesis awkward wherereclaimed 1.0GBwould be immediately scannable. Consider copying the sameformatByteshelper used indatastore_setup.ts. -
"WAL checkpointed and truncated" when WAL was already empty —
src/presentation/renderers/data_gc.ts:47anddatastore_compact.ts:49When
walPagesTotal == 0(WAL was already clean), both renderers fall through to the else branch and print "WAL checkpointed and truncated". Technically accurate for TRUNCATE mode, but "truncated" implies there was something to truncate. A message like "WAL already clean" for the zero case would be clearer. Minor, and the PR comment documents the SQLite (0,0,0) behaviour, so this is low priority.
Verdict
PASS — both log and JSON modes are fully implemented for the new swamp datastore compact command, the command description and examples are clear, flag naming is consistent with other datastore subcommands, and error handling follows the established UserError pattern. The raw-bytes display is a style nit rather than a blocker.
There was a problem hiding this comment.
Code Review
Clean, well-structured PR that adds three coordinated SQLite catalog performance improvements with proper layered architecture.
Architecture & DDD ✅
- Proper layer separation:
CatalogStore(infrastructure) →datastoreCompact/dataGc(application services in libswamp) → CLI command + renderer (presentation) - Dependency injection via
DatastoreCompactDepsandDataGcDepsinterfaces follows established patterns - Application services use async generator event streams, consistent with existing codebase conventions
Import Boundary ✅
- CLI command (
datastore_compact.ts) imports fromlibswamp/mod.ts— correct - Renderer (
datastore_compact.ts) imports fromlibswamp/mod.ts— correct - New types properly exported from
libswamp/mod.ts
Security ✅
bulkRemoveVersionsuses parameterized queries (?placeholders) — no SQL injection risk- Proper
BEGIN IMMEDIATE/COMMIT/ROLLBACKtransaction handling try/finallyensurescatalogStore.close()in the compact commandMath.max(0, beforeSize - afterSize)prevents negative byte reclamation values
Tests ✅
- 3
CatalogStoretests:bulkRemoveVersionsatomicity, empty-array no-op,checkpointWAL truncation - 3
dataGctests: compact called on live run, skipped on dry-run, WAL stats default when dep omitted - 4
datastoreCompacttests: event sequence, WAL page counts,dbBytesReclaimedcalculation, zero-reclaim case - Tests live next to source files per conventions
Code Style ✅
- License headers on all new files
- Both
logandjsonoutput modes supported - WAL log lines properly gated behind
!dryRun(second commit fix) AnyOptionspattern matches established CLI convention- LogTape tagged templates use bare interpolation per CLAUDE.md
Suggestions (non-blocking)
CatalogCheckpointStatsis exported fromcatalog_store.tsbut the deps interfaces incompact.tsandgc.tsdefine their own inline type shapes. Could reference the named type for discoverability, though structural typing makes this work fine as-is.- The
beforeSizemeasurement after checkpoint (second commit fix) is a nice catch — ensures WAL pages are flushed before measuring the pre-VACUUM baseline.
LGTM — solid performance improvement with clean architecture and thorough tests.
There was a problem hiding this comment.
Adversarial Review
I traced every code path across the 12 changed files, focusing on transaction safety, SQLite WAL checkpoint semantics, resource lifecycle, and error propagation. The code is solid.
Critical / High
None.
Medium
src/infrastructure/persistence/unified_data_repository.ts:1386—bulkRemoveVersionsremoves catalog rows for versions whose disk deletion may have failed. ThePromise.allSettledbatch (line 1352) can contain rejected promises for versions whose directories couldn't be removed. However,versionsToRemove(the full list, not the success-filtered list) is passed tobulkRemoveVersions. This means a version could exist on disk but have no catalog entry. Mitigating factor: this is pre-existing behavior — the old per-rowremoveVersionloop iterated over the sameversionsToRemovelist regardless of disk-deletion outcome. This PR correctly preserves the existing semantics. Not blocking.
Low
-
src/infrastructure/persistence/catalog_store.ts:550—ROLLBACKcould theoretically mask the original error. IfCOMMITthrows and thenthis.db.exec("ROLLBACK")also throws, the original commit error is lost. In practice this is near-impossible with SQLite (a failed COMMIT auto-rollbacks, so the explicit ROLLBACK is a safety belt on a no-op). No action needed. -
src/presentation/renderers/data_gc.ts:39-48— WAL log message is slightly misleading if WAL mode is not active. If the database were somehow not in WAL mode,PRAGMA wal_checkpoint(TRUNCATE)returns(-1, -1)for log/checkpointed. The conditionwalPagesTotal > 0would be false, so it logs "WAL checkpointed and truncated" even though nothing happened. In practice the constructor always setsPRAGMA journal_mode=WAL, so this path is unreachable.
Observations (not findings)
- Transaction pattern in
bulkRemoveVersionsis correct:BEGIN IMMEDIATEacquires a reserved lock immediately (no reader starvation), the prepared statement is reused across iterations (good for perf), andROLLBACKin the catch block is a proper cleanup. - TRUNCATE checkpoint semantics are handled correctly everywhere. The renderers and PR description both correctly note that
wal_checkpoint(TRUNCATE)returns(0,0,0)on full success (post-truncation counts). - Resource cleanup in
datastore_compact.tsis proper —catalogStore.close()is in afinallyblock. Thedata_gc.tspath doesn't close the catalog store created bycreateDataGcDeps, but this is pre-existing and not introduced by this PR. catalogDbSizemeasurement order indatastoreCompactis correct: measured after checkpoint (so WAL pages are flushed to main DB) but before VACUUM, giving an accurate pre-VACUUM baseline. The second commit explicitly fixed this ordering.- Test coverage is good: atomicity of bulk delete, no-op for empty arrays, WAL truncation verification, dry-run skipping, event sequence, and
dbBytesReclaimedcalculation including the zero-delta edge case.
Verdict
PASS — Well-structured performance improvement with correct transaction handling, proper WAL checkpoint semantics, and thorough test coverage. No issues introduced by this PR.
Summary
Three coordinated SQLite catalog performance improvements for swamp-club#241:
CatalogStore.bulkRemoveVersions()wraps N individualDELETEstatements in a singleBEGIN IMMEDIATEtransaction, eliminating per-row WAL fsync overhead during version GC.dataGccallsPRAGMA wal_checkpoint(TRUNCATE)at the end of every live (non-dry-run) GC run, resetting the WAL file to 0 bytes. Stats are reported in thecompletedevent and rendered in bothlogandjsonoutput modes. Note:TRUNCATEreturns(0,0,0)on full success (post-truncation counts, not pre), so renderers special-case that result.swamp datastore compactcommand: New command exposing on-demand WAL checkpoint +VACUUMso operators can reclaim space without running GC.Test Plan
CatalogStore: 3 new tests —bulkRemoveVersionsatomicity, no-op for empty array,checkpointWAL truncationdataGc: 3 new tests —compactCatalogcalled on real run, skipped on dry-run, WAL stats default to 0 when not provideddatastoreCompact: 4 new tests — event sequence, WAL page counts in completed event,dbBytesReclaimedcalculation, zero when db size doesn't decreasedeno checkanddeno lintclean