TL;DR
The change_log table is append-only and never pruned. Because DB triggers append a row on every session/PR/CI update (not just inserts), and the SCM observer + lifecycle rewrite those rows continuously, the SQLite file under ~/.ao/data grows without bound for as long as the daemon runs. There's no retention window, row cap, prune query, or VACUUM anywhere in the codebase.
Not a duplicate: the existing CDC/storage issues (#110, #104, #266) and #269 don't touch retention.
Background
change_log is the durable CDC log that powers every live update in the app. The flow is:
- SQLite triggers append a row to
change_log whenever a watched row changes.
- The poller (
internal/cdc) tails the table every 100ms (DefaultPollInterval) and fans events out to subscribers.
- Clients replay missed events on reconnect via
/events?after=<seq> / Last-Event-ID.
This design is great — the problem is purely that nothing ever removes old rows.
Why it grows without bound
The high-volume writers are the update triggers, not the inserts:
migrations/0001_init.sql defines sessions_cdc_update, pr_cdc_update, and pr_checks_cdc_update. Each fires INSERT INTO change_log on every row update.
- The SCM observer (
internal/observe/scm/observer.go) polls PRs/CI on DefaultTickInterval and persists PR + check facts each cycle → a new row per changed fact.
- Lifecycle/activity-state transitions update
sessions continuously while an agent is active.
So a single long-lived session with an open PR under active CI emits a steady drip of rows, forever.
What's missing
queries/changelog.sql only has ReadChangeLogAfter + MaxChangeLogSeq — no delete/prune query.
- The one targeted delete (
DeleteSession) is explicitly a no-op for live sessions and only clears one session's rows on hard delete.
- No
VACUUM, retention job, or row cap exists anywhere. (grep -ri 'retention\|prune\|compact' internal/ returns nothing relevant.)
Impact
- Unbounded SQLite + WAL growth on the user's disk over weeks/months of normal use.
- The base table and
idx_change_log_project keep growing; the DB file never shrinks even after sessions are deleted.
Why retention is safe here
The poller keeps only an in-memory cursor and SeekToHeads on restart. Durable replay is bounded by what clients actually request (after seq). Events older than the lowest live cursor — and certainly older than a generous age/seq window — are never needed again. A bounded policy does not break the replay contract.
Suggested fix (small + additive)
- Add a slow-ticker retention sweep on the daemon (well off the 100ms poll path):
DELETE FROM change_log WHERE seq <= ? for a watermark behind head (by created_at age, or a fixed seq lag).
- Add the prune query to
queries/changelog.sql and regen sqlc — or run it via tx.ExecContext like the existing DeleteSession delete, given the documented sqlc placeholder quirk in that file.
- Note the retention guarantee in
docs/architecture.md next to the existing CDC description.
Staying additive (new query + sweeper) avoids touching already-merged migrations, per the repo's hard rules.
Acceptance criteria
Filed after reviewing all open/closed issues and PRs — this retention gap isn't tracked anywhere else.
TL;DR
The
change_logtable is append-only and never pruned. Because DB triggers append a row on every session/PR/CI update (not just inserts), and the SCM observer + lifecycle rewrite those rows continuously, the SQLite file under~/.ao/datagrows without bound for as long as the daemon runs. There's no retention window, row cap, prune query, orVACUUManywhere in the codebase.Background
change_logis the durable CDC log that powers every live update in the app. The flow is:change_logwhenever a watched row changes.internal/cdc) tails the table every 100ms (DefaultPollInterval) and fans events out to subscribers./events?after=<seq>/Last-Event-ID.This design is great — the problem is purely that nothing ever removes old rows.
Why it grows without bound
The high-volume writers are the update triggers, not the inserts:
migrations/0001_init.sqldefinessessions_cdc_update,pr_cdc_update, andpr_checks_cdc_update. Each firesINSERT INTO change_logon every row update.internal/observe/scm/observer.go) polls PRs/CI onDefaultTickIntervaland persists PR + check facts each cycle → a new row per changed fact.sessionscontinuously while an agent is active.So a single long-lived session with an open PR under active CI emits a steady drip of rows, forever.
What's missing
queries/changelog.sqlonly hasReadChangeLogAfter+MaxChangeLogSeq— no delete/prune query.DeleteSession) is explicitly a no-op for live sessions and only clears one session's rows on hard delete.VACUUM, retention job, or row cap exists anywhere. (grep -ri 'retention\|prune\|compact' internal/returns nothing relevant.)Impact
idx_change_log_projectkeep growing; the DB file never shrinks even after sessions are deleted.Why retention is safe here
The poller keeps only an in-memory cursor and
SeekToHeads on restart. Durable replay is bounded by what clients actually request (afterseq). Events older than the lowest live cursor — and certainly older than a generous age/seq window — are never needed again. A bounded policy does not break the replay contract.Suggested fix (small + additive)
DELETE FROM change_log WHERE seq <= ?for a watermark behind head (bycreated_atage, or a fixed seq lag).queries/changelog.sqland regen sqlc — or run it viatx.ExecContextlike the existingDeleteSessiondelete, given the documented sqlc placeholder quirk in that file.docs/architecture.mdnext to the existing CDC description.Staying additive (new query + sweeper) avoids touching already-merged migrations, per the repo's hard rules.
Acceptance criteria
change_logstops growing without bound under sustained session/PR/CI activity.Filed after reviewing all open/closed issues and PRs — this retention gap isn't tracked anywhere else.