Skip to content

bug(storage): change_log grows forever — no retention or pruning #302

@areycruzer

Description

@areycruzer

TL;DR

The change_log table is append-only and never pruned. Because DB triggers append a row on every session/PR/CI update (not just inserts), and the SCM observer + lifecycle rewrite those rows continuously, the SQLite file under ~/.ao/data grows without bound for as long as the daemon runs. There's no retention window, row cap, prune query, or VACUUM anywhere in the codebase.

Not a duplicate: the existing CDC/storage issues (#110, #104, #266) and #269 don't touch retention.


Background

change_log is the durable CDC log that powers every live update in the app. The flow is:

  1. SQLite triggers append a row to change_log whenever a watched row changes.
  2. The poller (internal/cdc) tails the table every 100ms (DefaultPollInterval) and fans events out to subscribers.
  3. Clients replay missed events on reconnect via /events?after=<seq> / Last-Event-ID.

This design is great — the problem is purely that nothing ever removes old rows.

Why it grows without bound

The high-volume writers are the update triggers, not the inserts:

  • migrations/0001_init.sql defines sessions_cdc_update, pr_cdc_update, and pr_checks_cdc_update. Each fires INSERT INTO change_log on every row update.
  • The SCM observer (internal/observe/scm/observer.go) polls PRs/CI on DefaultTickInterval and persists PR + check facts each cycle → a new row per changed fact.
  • Lifecycle/activity-state transitions update sessions continuously while an agent is active.

So a single long-lived session with an open PR under active CI emits a steady drip of rows, forever.

What's missing

  • queries/changelog.sql only has ReadChangeLogAfter + MaxChangeLogSeqno delete/prune query.
  • The one targeted delete (DeleteSession) is explicitly a no-op for live sessions and only clears one session's rows on hard delete.
  • No VACUUM, retention job, or row cap exists anywhere. (grep -ri 'retention\|prune\|compact' internal/ returns nothing relevant.)

Impact

  • Unbounded SQLite + WAL growth on the user's disk over weeks/months of normal use.
  • The base table and idx_change_log_project keep growing; the DB file never shrinks even after sessions are deleted.

Why retention is safe here

The poller keeps only an in-memory cursor and SeekToHeads on restart. Durable replay is bounded by what clients actually request (after seq). Events older than the lowest live cursor — and certainly older than a generous age/seq window — are never needed again. A bounded policy does not break the replay contract.

Suggested fix (small + additive)

  1. Add a slow-ticker retention sweep on the daemon (well off the 100ms poll path): DELETE FROM change_log WHERE seq <= ? for a watermark behind head (by created_at age, or a fixed seq lag).
  2. Add the prune query to queries/changelog.sql and regen sqlc — or run it via tx.ExecContext like the existing DeleteSession delete, given the documented sqlc placeholder quirk in that file.
  3. Note the retention guarantee in docs/architecture.md next to the existing CDC description.

Staying additive (new query + sweeper) avoids touching already-merged migrations, per the repo's hard rules.

Acceptance criteria

  • change_log stops growing without bound under sustained session/PR/CI activity.
  • Durable CDC replay still works for any client cursor inside the retention window.
  • Test covers: rows past the watermark are deleted; rows within it (and unread by the poller cursor) are retained.

Filed after reviewing all open/closed issues and PRs — this retention gap isn't tracked anywhere else.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingpriority: mediumFix when convenientstoragePersistence lane

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions