bug(storage): change_log grows forever — no retention or pruning

## TL;DR

The `change_log` table is append-only and **never pruned**. Because DB triggers append a row on every session/PR/CI **update** (not just inserts), and the SCM observer + lifecycle rewrite those rows continuously, the SQLite file under `~/.ao/data` grows without bound for as long as the daemon runs. There's no retention window, row cap, prune query, or `VACUUM` anywhere in the codebase.

> Not a duplicate: the existing CDC/storage issues (#110, #104, #266) and #269 don't touch retention.

---

## Background

`change_log` is the durable CDC log that powers every live update in the app. The flow is:

1. SQLite **triggers** append a row to `change_log` whenever a watched row changes.
2. The **poller** (`internal/cdc`) tails the table every 100ms (`DefaultPollInterval`) and fans events out to subscribers.
3. Clients **replay** missed events on reconnect via `/events?after=<seq>` / `Last-Event-ID`.

This design is great — the problem is purely that nothing ever removes old rows.

## Why it grows without bound

The high-volume writers are the **update** triggers, not the inserts:

- `migrations/0001_init.sql` defines `sessions_cdc_update`, `pr_cdc_update`, and `pr_checks_cdc_update`. Each fires `INSERT INTO change_log` on **every row update**.
- The SCM observer (`internal/observe/scm/observer.go`) polls PRs/CI on `DefaultTickInterval` and persists PR + check facts each cycle → a new row per changed fact.
- Lifecycle/activity-state transitions update `sessions` continuously while an agent is active.

So a single long-lived session with an open PR under active CI emits a steady drip of rows, forever.

## What's missing

- `queries/changelog.sql` only has `ReadChangeLogAfter` + `MaxChangeLogSeq` — **no delete/prune query**.
- The one targeted delete (`DeleteSession`) is explicitly a no-op for live sessions and only clears one session's rows on hard delete.
- No `VACUUM`, retention job, or row cap exists anywhere. (`grep -ri 'retention\|prune\|compact' internal/` returns nothing relevant.)

## Impact

- Unbounded SQLite + WAL growth on the user's disk over weeks/months of normal use.
- The base table and `idx_change_log_project` keep growing; the DB file never shrinks even after sessions are deleted.

## Why retention is safe here

The poller keeps only an in-memory cursor and `SeekToHead`s on restart. Durable replay is bounded by what clients actually request (`after` seq). Events older than the lowest live cursor — and certainly older than a generous age/seq window — are never needed again. A bounded policy does **not** break the replay contract.

## Suggested fix (small + additive)

1. Add a slow-ticker retention sweep on the daemon (well off the 100ms poll path): `DELETE FROM change_log WHERE seq <= ?` for a watermark behind head (by `created_at` age, or a fixed seq lag).
2. Add the prune query to `queries/changelog.sql` and regen sqlc — or run it via `tx.ExecContext` like the existing `DeleteSession` delete, given the documented sqlc placeholder quirk in that file.
3. Note the retention guarantee in `docs/architecture.md` next to the existing CDC description.

Staying additive (new query + sweeper) avoids touching already-merged migrations, per the repo's hard rules.

## Acceptance criteria

- [ ] `change_log` stops growing without bound under sustained session/PR/CI activity.
- [ ] Durable CDC replay still works for any client cursor inside the retention window.
- [ ] Test covers: rows past the watermark are deleted; rows within it (and unread by the poller cursor) are retained.

---
<sub>Filed after reviewing all open/closed issues and PRs — this retention gap isn't tracked anywhere else.</sub>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bug(storage): change_log grows forever — no retention or pruning #302

TL;DR

Background

Why it grows without bound

What's missing

Impact

Why retention is safe here

Suggested fix (small + additive)

Acceptance criteria

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

bug(storage): change_log grows forever — no retention or pruning #302

Description

TL;DR

Background

Why it grows without bound

What's missing

Impact

Why retention is safe here

Suggested fix (small + additive)

Acceptance criteria

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions