feat(hub): automated WAL-checkpoint + incremental vacuum (ADR-045 D4, #79)#288

Merged: physercoe merged 1 commit into main from feat/store-maintenance-d4 on Jun 14, 2026

@physercoe (Owner)

Closes the storage-hygiene half of #79 via a new ADR-045 D4 (amends D2). Reasoned through with the director (incremental vs periodic full VACUUM): incremental wins decisively for an always-on small VPS with long-lived SSE readers.

Two problems, two mechanisms (not conflated)

  • WAL growth = reader pinning. SQLite's auto-checkpoint can only copy frames up to the oldest live reader's snapshot, and the WAL is reset only after a checkpoint completes with no reader still using it. The hub's long-lived SSE streams keep checkpoints from reaching the head, so -wal grows without bound. Fixed by periodic wal_checkpoint(TRUNCATE), not VACUUM.
  • Free pages never returned to OS. Fixed by auto_vacuum=INCREMENTAL + bounded incremental_vacuum — not full VACUUM (~2× disk + global write lock for O(DB-size), hostile to a 2 GB VPS, and it fights the SSE readers).
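
To make the separation concrete, here is a minimal Go sketch of the statements one maintenance pass would issue against a single store. The three pragmas are standard SQLite; the helper names `newShardStatements` and `passStatements` are hypothetical and are not taken from the PR's code.

```go
package main

import "fmt"

// newShardStatements is a hypothetical helper: the pragma a freshly created
// shard needs before its first table is created, so that auto_vacuum takes
// effect without a rebuild.
func newShardStatements() []string {
	return []string{"PRAGMA auto_vacuum=INCREMENTAL"}
}

// passStatements is a hypothetical helper returning, in order, the statements
// for one maintenance pass. reclaimPages <= 0 skips the vacuum step.
func passStatements(reclaimPages int) []string {
	stmts := []string{
		// Mechanism 1: move WAL content into the main file and truncate -wal.
		"PRAGMA wal_checkpoint(TRUNCATE)",
	}
	if reclaimPages > 0 {
		// Mechanism 2: release at most reclaimPages freelist pages to the OS.
		stmts = append(stmts, fmt.Sprintf("PRAGMA incremental_vacuum(%d)", reclaimPages))
	}
	return stmts
}

func main() {
	for _, s := range newShardStatements() {
		fmt.Println(s)
	}
	for _, s := range passStatements(512) {
		fmt.Println(s)
	}
}
```

In a real pass these strings would be executed on the shard's writer connection. The checkpoint pragma returns a row whose first column is non-zero when the checkpoint was blocked, so a caller could treat that as "try again next interval" rather than as an error.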

Change

  • New event/digest shards born auto_vacuum=INCREMENTAL — pragma rides the schema-creating writer conn (openStorePool); hub.db keeps freelist reuse (low delete volume).
  • runStoreMaintenance loop (same ctx lifetime as the other sweeps; HUB_STORE_MAINTENANCE_DISABLE, HUB_STORE_MAINTENANCE_INTERVAL default 5 m). Per open shard writer (hub.db + each open team's events/digest): wal_checkpoint(TRUNCATE) then a hysteresis-gated incremental_vacuum (≥25 % free and above a floor → reclaim down to a watermark, capped per pass). No-op where auto_vacuum≠INCREMENTAL, so safe on hub.db/legacy. Evicted teams checkpoint on pool close.
  • hub-server db vacuum now sets auto_vacuum=INCREMENTAL before the rebuild → doubles as the one-time legacy-shard converter. Full VACUUM stays operator-only/offline.
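
The hysteresis gate can be illustrated with a small pure function. This is a sketch under stated assumptions, not the PR's implementation: the PR only says "≥25 % free and above a floor → reclaim down to a watermark, capped per pass", so the type `reclaimPolicy`, the function `pagesToReclaim`, and every numeric value other than 25 % are hypothetical.

```go
package main

import "fmt"

// reclaimPolicy holds the hysteresis knobs. Only the 25% trigger comes from
// the PR text; the other values used in main are illustrative assumptions.
type reclaimPolicy struct {
	TriggerFreePct int   // reclaim only when free pages are >= this % of the file
	FloorPages     int64 // and the freelist is strictly above this many pages
	WatermarkPct   int   // reclaim until roughly this % of pages remain free
	MaxPerPass     int64 // never release more than this many pages in one pass
}

// pagesToReclaim returns how many freelist pages one pass should release,
// given the values of PRAGMA page_count and PRAGMA freelist_count.
func pagesToReclaim(p reclaimPolicy, pageCount, freePages int64) int64 {
	if pageCount <= 0 || freePages <= 0 {
		return 0
	}
	// Gate 1: tiny freelists are not worth the I/O.
	if freePages <= p.FloorPages {
		return 0
	}
	// Gate 2: free fraction must reach the trigger (integer math, no floats).
	if freePages*100 < int64(p.TriggerFreePct)*pageCount {
		return 0
	}
	// Target: leave WatermarkPct of the current file on the freelist so that
	// ongoing writes can reuse pages instead of growing the file again.
	keep := pageCount * int64(p.WatermarkPct) / 100
	n := freePages - keep
	if n <= 0 {
		return 0
	}
	// Cap the work done in a single pass.
	if n > p.MaxPerPass {
		n = p.MaxPerPass
	}
	return n
}

func main() {
	p := reclaimPolicy{TriggerFreePct: 25, FloorPages: 256, WatermarkPct: 10, MaxPerPass: 1024}
	fmt.Println(pagesToReclaim(p, 10000, 4000)) // 40% free, capped: 1024
	fmt.Println(pagesToReclaim(p, 10000, 2000)) // 20% free, below trigger: 0
	fmt.Println(pagesToReclaim(p, 4000, 1200))  // 30% free, under the cap: 800
	fmt.Println(pagesToReclaim(p, 400, 200))    // 50% free but under the floor: 0
}
```

The gap between the trigger and the watermark is what provides the hysteresis: once a pass has reclaimed down to the watermark, the store has to accumulate free pages back up to the trigger before the next reclaim, so a busy writer does not cause a vacuum on every interval.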

Tests (locally -race clean — internal/server race suite green, 456 s)

TestNewShardIsIncrementalAutoVacuum, TestMaintainStoreReclaimsFreePages (+per-pass cap), TestMaintainStoreNoReclaimBelowThreshold, TestMaintainStoreSafeOnNonIncremental, TestMaintainStoreTruncatesWAL. go build/vet clean.

🤖 Generated with Claude Code

…s (ADR-045 D4, #79)

Two storage-hygiene gaps from the D2 store split (#79): WAL files grow
unbounded and freed pages never return to the OS. They are distinct
mechanisms, addressed separately.

- WAL growth is reader-pinning: the hub's long-lived SSE readers keep the
  auto-checkpoint from ever reaching the WAL head. Fixed by a periodic
  wal_checkpoint(TRUNCATE), not VACUUM.
- Reclamation uses auto_vacuum=INCREMENTAL (bounded, interleaves with
  readers), not full VACUUM (~2x disk + global write lock for an
  O(DB-size) duration — hostile to a small always-on VPS).

New event/digest shards are born auto_vacuum=INCREMENTAL (the pragma
rides the schema-creating writer connection in openStorePool; hub.db
keeps freelist reuse). A background loop (runStoreMaintenance, same ctx
lifetime as the other sweeps; HUB_STORE_MAINTENANCE_DISABLE / _INTERVAL)
checkpoints + runs a bounded incremental_vacuum with hysteresis (>=25%
free and above a floor, reclaim down to a watermark, capped per pass) so
an active firehose can't thrash. incremental_vacuum is a no-op where
auto_vacuum!=INCREMENTAL, so the pass is safe on hub.db and legacy
shards. `hub-server db vacuum` now sets auto_vacuum=INCREMENTAL before
the rebuild, doubling as the one-time legacy-shard converter.

Tests (locally -race clean): fresh shard is INCREMENTAL; a pass reclaims
free pages (bounded by the cap); no reclaim below threshold; safe no-op
on a NONE store; WAL truncated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
physercoe force-pushed the feat/store-maintenance-d4 branch from 5b1fe92 to 906f847 (June 14, 2026 04:35)
physercoe merged commit 20233cb into main (Jun 14, 2026)
4 checks passed
physercoe deleted the feat/store-maintenance-d4 branch (June 14, 2026 04:38)
physercoe pushed a commit that referenced this pull request (Jun 14, 2026):
Hub robustness sweep (#74#79) + Projects-tab segmented sub-tabs.
- ADR-045 D4 storage maintenance (#288)
- raw-SQL-error no-leak sweep (#280/#283), rows.Err audit (#286),
  FTS/routing status codes (#287), owner-or-steward gate (#281),
  read-pool cap + rows.Close defer (#292), additive pagination (#293)
- segmented Projects | Workspaces tabs (#289)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>