
fix: summarize conversations only once idle, and refresh summaries as they grow#109

Open: minyek wants to merge 1 commit into obra:main from minyek:fix/summarizer-stale-partial-resummary


Contributor

@minyek minyek commented Jun 7, 2026

Description

Summaries are write-once. A conversation that gets summarized while still in
progress keeps a summary of only its early portion forever — every later turn
is invisible to the summary. For long-running or resumed conversations, the
one-line summary shown next to search results is therefore misleading.

The fix has two complementary halves: defer summarizing a conversation until
it has gone idle (so a still-active conversation is never frozen into a partial
snapshot), and refresh a summary when its transcript later grows. Note the
behavioral change from the defer half: a conversation is no longer summarized
until it has been idle for the quiescence window (EPISODIC_MEMORY_SUMMARY_QUIESCENCE_HOURS,
default 1h), so very recent conversations intentionally have no summary yet.

(Search results themselves were always complete — search runs over indexed
exchanges, kept fresh via INSERT OR REPLACE. This fixes only the summary
displayed alongside them.)

Symptom

There is no error in the logs — the symptom is silently wrong output:

  • <archive>/<project>/<session>-summary.txt for an active or long conversation
    describes only the first handful of exchanges.
  • Continuing the conversation over hours/days never updates it.
  • stats/verify count the file as "summarized", so it is never revisited.

Environment

  • episodic-memory v1.4.2
  • Claude Agent SDK summarizer
  • All installs affected; most visible to anyone who keeps long or resumed
    sessions.

Steps to Reproduce

  1. Start a conversation and let a sync summarize it early (a few exchanges).
  2. Continue the conversation for many more turns over time.
  3. Run sync again (or let the SessionStart background sync run).
  4. Inspect <archive>/<project>/<session>-summary.txt.
  5. Observe it still summarizes only the early portion — never re-summarized,
    even though the transcript has grown substantially.

Root Cause

shouldQueueForSummary re-queued a conversation only when its summary file
was missing or held a stale error sentinel. Once a real summary existed, the
conversation was treated as done permanently:

  1. No coverage signal. Nothing recorded how much of the transcript a given
    summary reflected, so transcript growth was undetectable.
  2. Append-only transcripts. A summary written at exchange 5 is never
    refreshed at exchange 500 — the file only ever grows, but nothing acted on
    that growth.
  3. No idle awareness. Even if growth had been detected, summarizing an
    in-progress conversation would freeze a mid-session snapshot that is itself
    immediately stale.

Fix Summary

  1. Coverage header. Real summaries now carry a machine-readable first line —
    __COVERAGE__ {"bytes":<n>,"lastExchange":"<iso>","schema":1} — recording the
    archived JSONL byte size the summary reflects (formatSummaryFile /
    parseSummaryFile, buildCoverage, writeSummary).
  2. Growth-driven re-queue. shouldQueueForSummary(path, currentArchiveBytes)
    re-queues a real summary when the archive has grown past its covered bytes
    (append-only ⇒ a size increase means new content).
  3. Quiescence gate. isQuiescent and EPISODIC_MEMORY_SUMMARY_QUIESCENCE_HOURS
    (default 1) ensure a conversation is (re)summarized only after it has been
    idle for the window, so a mid-session snapshot is never frozen. Idle is
    measured from the last exchange's timestamp, because the archive copy's
    mtime is reset on every sync and is unreliable.
  4. Gentle upgrade. Legacy header-less summaries are stamped with a
    current-size baseline (ensureCoverageBaseline), a header rewrite with no
    LLM call, so existing installs gain a growth baseline without a mass
    re-summarization on upgrade.
  5. Preserve on failure. A re-summary failure now keeps the prior real
    summary; an error sentinel is written only for a first-ever failure, so a
    transient SDK error never discards a good summary (writeErrorSentinelIfNew).
  6. Display. search strips the coverage header from the summary shown
    alongside results.
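
To make the coverage header in items 1 and 6 concrete, here is a minimal sketch of a header round trip. It is not the PR's code: the function names formatWithCoverage and parseWithCoverage, the Coverage interface, and the handling of malformed headers are assumptions made for illustration. Only the `__COVERAGE__` prefix and the three JSON fields come from the description above.

```typescript
// Hypothetical shape; the PR documents only these three JSON fields.
interface Coverage {
  bytes: number;        // archived JSONL byte size the summary reflects
  lastExchange: string; // ISO timestamp of the last exchange covered
  schema: number;       // header format version
}

const COVERAGE_PREFIX = "__COVERAGE__ ";

// Prepend the machine-readable header as the first line of the summary file.
function formatWithCoverage(coverage: Coverage, body: string): string {
  return `${COVERAGE_PREFIX}${JSON.stringify(coverage)}\n${body}`;
}

// Split a summary file into its coverage (if any) and the displayable body.
// Assumption: a missing or malformed header is treated like a legacy file,
// so coverage is null and the whole text is returned as the body.
function parseWithCoverage(text: string): { coverage: Coverage | null; body: string } {
  if (!text.startsWith(COVERAGE_PREFIX)) return { coverage: null, body: text };
  const newline = text.indexOf("\n");
  const headerLine = newline === -1 ? text : text.slice(0, newline);
  const body = newline === -1 ? "" : text.slice(newline + 1);
  try {
    const parsed = JSON.parse(headerLine.slice(COVERAGE_PREFIX.length));
    if (
      typeof parsed?.bytes === "number" &&
      typeof parsed?.lastExchange === "string" &&
      typeof parsed?.schema === "number"
    ) {
      return { coverage: parsed as Coverage, body };
    }
  } catch {
    // Malformed JSON: fall through to the legacy result below.
  }
  return { coverage: null, body: text };
}
```

A display path such as search would show only the `body` field, which is how the header stays out of the summary printed next to results.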

Internally, the gate + write + error-sentinel logic is now shared
(needsSummary, writeErrorSentinelIfNew, summarizeIfQuiescent) across the
three indexer paths and the sync loop, and the _HOURS config defaults convert
to milliseconds at a single boundary (MS_PER_HOUR).
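
The single hours-to-milliseconds boundary and the idle check could look roughly like the sketch below. The helper names quiescenceWindowMs and isIdle are hypothetical, and so is the fallback policy (missing, non-numeric, or negative values fall back to the 1 hour default, and an unparseable timestamp counts as not idle). The PR's isQuiescent may behave differently.

```typescript
const MS_PER_HOUR = 60 * 60 * 1000;
const DEFAULT_QUIESCENCE_HOURS = 1;

// Read the _HOURS setting and convert it to milliseconds in one place.
// Assumption: invalid values fall back to the default rather than throwing.
function quiescenceWindowMs(env: Record<string, string | undefined>): number {
  const raw = env.EPISODIC_MEMORY_SUMMARY_QUIESCENCE_HOURS;
  const hours = raw === undefined || raw.trim() === "" ? NaN : Number(raw);
  const valid = Number.isFinite(hours) && hours >= 0;
  return (valid ? hours : DEFAULT_QUIESCENCE_HOURS) * MS_PER_HOUR;
}

// Idle is measured from the last exchange's timestamp, not from file mtime.
function isIdle(lastExchangeIso: string, nowMs: number, windowMs: number): boolean {
  const last = Date.parse(lastExchangeIso);
  if (Number.isNaN(last)) return false; // assumption: unknown age means "not idle"
  return nowMs - last >= windowMs;
}
```

In a real process the first argument would typically be `process.env`. Taking the environment as a parameter keeps the sketch testable without touching global state.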

Tests added

  • test/summary-coverage.test.ts — coverage header round-trip and legacy baseline stamping.
  • test/summary-quiescence.test.ts — quiescence verdicts and _HOURS env parsing.
  • test/summary-sentinel.test.ts — growth-aware gate decision table; needsSummary / writeErrorSentinelIfNew.
  • test/sync-resummary.test.ts — growth-driven re-summary, no-op when unchanged, preserve-on-failure.
  • test/search-summary-strip.test.ts — the header never leaks into the displayed summary.
  • Updates to sync-error-sentinel, show, and sync tests for the new header/gate.

Fix Verified

  • Full suite: 242 tests across 40 files pass.
  • Real end-to-end on the built dist (isolated temp dirs, real summarizer):
    • Seed an 8 237-byte conversation → first sync writes a summary stamped "bytes":8237.
    • Grow the transcript to 41 069 bytes → next sync re-summarizes; coverage re-stamped "bytes":41069.
    • No further growth → next sync performs no re-summary.
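
The three syncs above can be replayed against a simplified growth gate. This is a sketch under stated assumptions, not the PR's shouldQueueForSummary: the growthGate function and its signature are hypothetical, the conversation is assumed to be idle at every sync so the quiescence gate never blocks, and error sentinels are left out.

```typescript
type GateDecision = "summarize" | "skip";

// coveredBytes is null when no real summary exists yet.
// Transcripts are append-only, so a larger archive means new content.
function growthGate(coveredBytes: number | null, currentArchiveBytes: number): GateDecision {
  if (coveredBytes === null) return "summarize";
  return currentArchiveBytes > coveredBytes ? "summarize" : "skip";
}

// Replay the verified sequence: seed, grow, then no further growth.
let covered: number | null = null;
const decisions: GateDecision[] = [];
for (const archiveBytes of [8237, 41069, 41069]) {
  const decision = growthGate(covered, archiveBytes);
  decisions.push(decision);
  if (decision === "summarize") covered = archiveBytes; // re-stamp coverage
}
console.log(decisions.join(", "));
```

The third sync is skipped because the archive size equals the covered size, which matches the "no re-summary" result in the end-to-end run.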

Caveat / Notes

  • Re-summary is triggered by byte growth, not an exchange-count delta. This
    is deliberate: transcripts are append-only and the per-sync coverage scan is
    ~23 ms even against a multi-GB archive, so no recency cutoff or minimum-delta
    threshold is needed. The quiescence gate bounds work to one re-summary per
    idle-after-growth episode.

… they grow

Summaries were write-once and could freeze a still-active conversation into a partial mid-session snapshot that was then never refreshed, so the one-line summary shown beside search results for long or resumed conversations was misleading. This fixes both halves: defer summarizing until a conversation has gone idle, and refresh a summary when its transcript later grows. (Search itself was always complete — it indexes exchanges; this fixes only the displayed summary.)

- Coverage header `__COVERAGE__ {bytes,lastExchange,schema}` records the archived JSONL byte size a summary reflects.
- shouldQueueForSummary re-queues a real summary once the archive grows past its covered bytes (transcripts are append-only).
- Quiescence gate (isQuiescent + EPISODIC_MEMORY_SUMMARY_QUIESCENCE_HOURS, default 1) only (re)summarizes after the conversation has been idle, measured from the last exchange's timestamp, so a mid-session snapshot is never frozen.
- Legacy header-less summaries get a no-LLM baseline stamp on upgrade — no mass re-summarization.
- A re-summary failure preserves the prior real summary; an error sentinel is written only for a first-time failure.
- search strips the coverage header from the displayed summary.

Gating and writing are shared (needsSummary, writeErrorSentinelIfNew, summarizeIfQuiescent) across the indexer paths and the sync loop; the _HOURS config defaults convert to ms at a single boundary.