feat(hub): automated WAL-checkpoint + incremental vacuum (ADR-045 D4, #79) #288
Merged
…s (ADR-045 D4, #79)

Two storage-hygiene gaps from the D2 store split (#79): WAL files grow unbounded and freed pages never return to the OS. They are distinct mechanisms, addressed separately.

- WAL growth is reader-pinning: the hub's long-lived SSE readers keep the auto-checkpoint from ever reaching the WAL head. Fixed by a periodic wal_checkpoint(TRUNCATE), not VACUUM.
- Reclamation uses auto_vacuum=INCREMENTAL (bounded, interleaves with readers), not full VACUUM (~2x disk + global write lock for an O(DB-size) duration, which is hostile to a small always-on VPS).

New event/digest shards are born auto_vacuum=INCREMENTAL (the pragma rides the schema-creating writer connection in openStorePool; hub.db keeps freelist reuse). A background loop (runStoreMaintenance, same ctx lifetime as the other sweeps; HUB_STORE_MAINTENANCE_DISABLE / _INTERVAL) checkpoints + runs a bounded incremental_vacuum with hysteresis (>=25% free and above a floor, reclaim down to a watermark, capped per pass) so an active firehose can't thrash. incremental_vacuum is a no-op where auto_vacuum!=INCREMENTAL, so the pass is safe on hub.db and legacy shards.

`hub-server db vacuum` now sets auto_vacuum=INCREMENTAL before the rebuild, doubling as the one-time legacy-shard converter.

Tests (locally -race clean): fresh shard is INCREMENTAL; a pass reclaims free pages (bounded by the cap); no reclaim below threshold; safe no-op on a NONE store; WAL truncated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
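The hysteresis gate described in the commit message can be sketched as a pure decision function. This is a minimal illustration, not the code from the PR: the type `reclaimPolicy`, the function `pagesToReclaim`, and every numeric value except the 25% trigger are hypothetical, and the watermark is approximated against the current page count.

```go
package main

import "fmt"

// reclaimPolicy holds the hysteresis knobs. The PR only states ">=25% free",
// "a floor", "a watermark" and "capped per pass"; the field names and the
// concrete floor/watermark/cap values used in main are assumptions.
type reclaimPolicy struct {
	triggerFreePct  int64 // start reclaiming at or above this free-page percentage
	minFreePages    int64 // floor: skip stores with fewer free pages than this
	targetFreePct   int64 // watermark: reclaim until roughly this percentage stays free
	maxPagesPerPass int64 // cap on pages handed to incremental_vacuum in one pass
}

// pagesToReclaim returns N for "PRAGMA incremental_vacuum(N)", or 0 to skip.
// pageCount and freePages correspond to SQLite's PRAGMA page_count and
// PRAGMA freelist_count for the store being maintained.
func pagesToReclaim(pageCount, freePages int64, p reclaimPolicy) int64 {
	if pageCount <= 0 || freePages < p.minFreePages {
		return 0 // empty store, or below the floor: not worth a pass
	}
	if freePages*100 < pageCount*p.triggerFreePct {
		return 0 // below the trigger ratio: leave the freelist for reuse
	}
	// Keep targetFreePct of the current file on the freelist so new writes
	// can reuse pages instead of growing the file again right away.
	keep := pageCount * p.targetFreePct / 100
	n := freePages - keep
	if n <= 0 {
		return 0
	}
	if n > p.maxPagesPerPass {
		n = p.maxPagesPerPass // bound the work done in a single pass
	}
	return n
}

func main() {
	p := reclaimPolicy{triggerFreePct: 25, minFreePages: 256, targetFreePct: 10, maxPagesPerPass: 1024}
	fmt.Println(pagesToReclaim(10000, 4000, p)) // 40% free, wants 3000, capped to 1024
	fmt.Println(pagesToReclaim(10000, 2000, p)) // 20% free, below trigger: 0
}
```

The gap between the trigger (25%) and the watermark (10% here) is what gives the hysteresis: once a store has been trimmed, it has to accumulate a meaningful amount of free space again before the next reclaim fires.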
5b1fe92 to 906f847
physercoe pushed a commit that referenced this pull request on Jun 14, 2026
Hub robustness sweep (#74–#79) + Projects-tab segmented sub-tabs.

- ADR-045 D4 storage maintenance (#288)
- raw-SQL-error no-leak sweep (#280/#283), rows.Err audit (#286), FTS/routing status codes (#287), owner-or-steward gate (#281), read-pool cap + rows.Close defer (#292), additive pagination (#293)
- segmented Projects | Workspaces tabs (#289)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Closes the storage-hygiene half of #79 via a new ADR-045 D4 (amends D2). Reasoned through with the director (incremental vs periodic full VACUUM): incremental wins decisively for an always-on small VPS with long-lived SSE readers.
Two problems, two mechanisms (not conflated)
- … `-wal`. Fixed by periodic `wal_checkpoint(TRUNCATE)`, not VACUUM.
- `auto_vacuum=INCREMENTAL` + bounded `incremental_vacuum`, not full `VACUUM` (~2× disk + global write lock for O(DB-size), hostile to a 2 GB VPS, and it fights the SSE readers).

Change

- `auto_vacuum=INCREMENTAL`: pragma rides the schema-creating writer conn (`openStorePool`); `hub.db` keeps freelist reuse (low delete volume).
- `runStoreMaintenance` loop (same ctx lifetime as the other sweeps; `HUB_STORE_MAINTENANCE_DISABLE`, `HUB_STORE_MAINTENANCE_INTERVAL` default 5 m). Per open shard writer (`hub.db` + each open team's events/digest): `wal_checkpoint(TRUNCATE)` then a hysteresis-gated `incremental_vacuum` (≥25 % free and above a floor → reclaim down to a watermark, capped per pass). No-op where `auto_vacuum` ≠ `INCREMENTAL`, so safe on `hub.db`/legacy. Evicted teams checkpoint on pool close.
- `hub-server db vacuum` now sets `auto_vacuum=INCREMENTAL` before the rebuild → doubles as the one-time legacy-shard converter. Full VACUUM stays operator-only/offline.

Tests (locally `-race` clean; `internal/server` race suite green, 456 s)

`TestNewShardIsIncrementalAutoVacuum`, `TestMaintainStoreReclaimsFreePages` (+ per-pass cap), `TestMaintainStoreNoReclaimBelowThreshold`, `TestMaintainStoreSafeOnNonIncremental`, `TestMaintainStoreTruncatesWAL`. `go build`/`vet` clean.

🤖 Generated with Claude Code