Skip to content

feat(memory): add --summary mode + volume fan-out signal to recall#77

Merged
gupsammy merged 3 commits into
mainfrom
feat/recall-summary-fanout
Jun 2, 2026
Merged

feat(memory): add --summary mode + volume fan-out signal to recall#77
gupsammy merged 3 commits into
mainfrom
feat/recall-summary-fanout

Conversation

@gupsammy

@gupsammy gupsammy commented Jun 2, 2026

Copy link
Copy Markdown
Owner

What

Adds a --summary retrieval mode and a mechanical volume-based fan-out signal to the recall-conversations skill, so broad/retrospective queries stop flooding the main context.

Why

A taxonomy of 61 real historical recall invocations showed two populations:

  • ~72% are continuation-restore / specific-lookup — small targeted answers; fan-out would only add latency.
  • ~30% are broad-retro-synthesis / gap-antipattern / decision-extraction — high blast radius, heavy consume.

Investigating the heavy cluster surfaced the load-bearing fact: the recall scripts emit full untruncated transcripts (formatting.py renders every m.content). The precomputed context_summary column exists (94% populated, written at import time by summarizer.py) but recall never read it. So recent_chats.py --limit 20 --verbose (the run-retro recipe) was pouring ~20 full transcripts into context.

How — two tiers, gated by measured volume (not session count)

  1. --summary (tier 1): emit the precomputed context_summary instead of full content. Single-pass, free at query time, skips the per-message query. Measured ~3x reduction (309,798 → 85,885 chars on a 20-session retro).
  2. Fan out (tier 2): only when even summaries overflow. The scripts emit a two-tier signal via a shared cli_common.volume_flags() helper — summary_suggested (large full pull → use --summary) and fanout_suggested (large even after --summary → fan out), plus an INFO: stderr nudge in markdown mode. SKILL.md step 4 documents per-project general-purpose fan-out (shard by project, never by arbitrary count — count splits sever a decision/antipattern thread).

Triggering keys on lens type + measured output volume, never raw session count — continuation-restore and specific-lookup (the 72%) never escalate.

Changes

  • scripts/memory_lib/cli_common.py--summary flag, FANOUT_SUGGEST_CHARS, volume_signal(), volume_flags() (single source of truth for both scripts' JSON meta), emit_volume_signal().
  • scripts/recent_chats.py / search_conversations.pysummary routing + JSON meta via volume_flags().
  • scripts/memory_lib/formatting.py — summary branch in format_markdown_session (early return; placeholder for empty summary).
  • SKILL.md — gated step-4 escalation + Agent in allowed-tools.
  • references/tool-reference.md--summary rows + signal docs.

Testing

  • New tests/claude-memory/test_summary_mode.py — 30 tests: volume_signal boundaries (49999/50000/50001), formatter summary branch, summary-mode routing in both scripts (via the in-memory memory_db fixture), and the JSON-meta truth table called through the shared volume_flags() helper (guards production, not a copy).
  • Fixed 2 stale footer assertions in test_context_injection.py (the footer now emits the canonical claude-memory:recall-conversations name).
  • Full tests/claude-memory suite: 448 passed, 0 failed.

Notes

  • Version auto-bumped 0.8.104 → 0.8.105 by the auto-version pre-commit hook.
  • Adding Agent to allowed-tools widens the skill's tool surface; it is gated behind the rare tier-2 path so the common 72% of calls are unaffected.

recall scripts emit full transcripts, flooding context on broad/retro queries. Add a --summary flag (recent_chats + search_conversations) that emits the precomputed context_summary column instead of full message content (~3x smaller, single-pass, skips the per-message query).

Add a mechanical two-tier volume signal via a shared cli_common.volume_flags() helper: summary_suggested (large full pull -> use --summary) and fanout_suggested (large even after --summary -> fan out by project), plus an INFO stderr nudge in markdown mode. The orchestrator escalates by measured output volume, not raw session count.

SKILL.md gains a gated step-4 escalation (per-project general-purpose fan-out) and Agent in allowed-tools; tool-reference.md documents the flag and signals.

Tests: add tests/claude-memory/test_summary_mode.py (30 tests covering volume_signal boundaries, the formatter summary branch, summary-mode routing in both scripts, and the JSON meta truth table via the shared helper). Fix 2 stale footer assertions in test_context_injection.py (footer now uses the canonical skill name).
@claude

claude Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Claude finished @gupsammy's task in 5m 12s —— View job


PR Review: feat(memory): add --summary mode + volume fan-out signal to recall

Tasks

  • Gather context and understand the request
  • Fetch PR diff and read changed files
  • Review Python scripts against stdlib/quality criteria
  • Review SKILL.md and tool-reference.md changes
  • Review test coverage
  • Post comprehensive review findings

What looks good

Python rules compliance — All changed .py files open with from __future__ import annotations. No third-party imports anywhere. No RETURNING clause in SQL. open_db_or_exit retains WAL mode + busy_timeout = 5000. Dynamic IN-clause placeholders use ",".join("?" * len(...)). FTS sanitization is untouched. ✓

Single source of truthvolume_flags() and volume_signal() live in cli_common.py and both scripts import them rather than duplicating the logic. The test truth-table calls volume_flags() directly, so it guards production derivation rather than a copy. Smart design choice. ✓

Summary branch in format_markdown_session — The session.get("summary") is not None check correctly distinguishes summary mode (key present, possibly "") from full mode (key absent). The early return correctly skips the full message rendering block. The empty-summary placeholder is clear. ✓

Volume threshold is mechanical — Gated on measured character count, not session count. The "continuation-restore" and "specific-lookup" paths (the 72%) will never trigger escalation regardless of how many sessions they touch. This is the right call. ✓

Mutual exclusivitysummary_suggested and fanout_suggested are mutually exclusive by construction (is_large and not summary_mode vs is_large and summary_mode). Pinned explicitly by test_flags_are_mutually_exclusive_when_large. ✓

Version syncplugin.json and marketplace.json both bumped to 0.8.105. README badges updated. ✓

Test coverage — 30 tests covering boundary conditions (49999/50000/50001), both modes, NULL DB values, JSON round-trip serialisability, and the full flag truth table. test_context_injection.py stale-footer fixes are correct (the hook now emits claude-memory:recall-conversations). ✓


One issue to fix before merge

b.context_summary lacks the same PRAGMA guard as b.tool_counts in recent_chats.py

The existing has_tool_counts PRAGMA check guards b.tool_counts from being selected on old DBs that predate the column. b.context_summary was added in the same _migrate_columns batch (lines 385–386 of db.py), but this PR hardcodes it into the static SQL string:

# recent_chats.py — after this PR
tool_counts_col = ", b.tool_counts" if has_tool_counts else ""
sql = f"""
    SELECT ..., b.id as branch_db_id, b.context_summary{tool_counts_col}
    ...
"""

If has_tool_counts is False (unmigrated DB from before the _migrate_columns batch), the query will fail with no such column: b.context_summary, rather than the graceful fallback the has_tool_counts pattern was designed to provide. This is only a concern for a DB that was created before both columns were introduced and has never been opened by the import hook — low probability but real.

The least-invasive fix is to gate context_summary on the same flag, since both columns are added together:

# recent_chats.py — suggested fix
context_summary_col = ", b.context_summary" if has_tool_counts else ""
tool_counts_col = ", b.tool_counts" if has_tool_counts else ""
sql = f"""
    SELECT ..., b.id as branch_db_id{context_summary_col}{tool_counts_col}
    ...
"""

Then in the unpacking:

if has_tool_counts:
    (..., branch_db_id, context_summary, tool_counts_json) = session
else:
    (..., branch_db_id) = session
    context_summary = None
    tool_counts_json = None

The same fix applies to both SQL paths in search_conversations.py (lines 76–79 and 101–104). Fix this →


Suggestions (non-blocking)

total_messages: 0 in JSON summary modeformat_json_sessions computes total_messages = sum(len(s.get("messages", [])) for s in sessions). In summary mode the messages key is absent, so this always emits 0. JSON consumers could mistake it for "session had no messages" rather than "messages weren't fetched." A null or omitting the field in summary mode would be less misleading, though it's a breaking change to the JSON schema if anything downstream consumes it. At minimum, the field could be documented in tool-reference.md.

SKILL.md step 4 verbosity — The new step runs ~200 words across two sub-bullets. The CLAUDE.md guidance is "terse and operational." The fanout guidance is useful but could drop the worked-out reasoning ("count-based splits sever a decision/antipattern thread") and keep just the actionable imperative. The prose rationale belongs in the PR description (where it already lives, well).

--summary --verbose interaction — Works correctly (verbose metadata renders before the summary block, full messages skipped). Not documented in tool-reference.md or the EXAMPLES block, so the combination may surprise users. A one-line note under the --summary row would help.


Verdict

The core design is sound and the implementation is clean. The PRAGMA guard asymmetry is the only change I'd call required before merge — everything else is polishing. 30-test suite with 448 passing makes this easy to ship once that guard is addressed.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8fdf9fe307

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

b.files_modified, b.commits, s.git_branch,
p.name as project, p.path as project_path,
b.id as branch_db_id{tool_counts_col}
b.id as branch_db_id, b.context_summary{tool_counts_col}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard summary column reads on older databases

For an existing conversations.db that predates the context_summary migration and has not yet been opened by the setup/sync path, even a normal recent_chats.py run without --summary now fails with no such column: b.context_summary because this SELECT always reads the new column. open_db_or_exit() only opens the DB and does not run _migrate_columns, while the previous tool_counts read stayed compatible by probing PRAGMA table_info; please either migrate before querying or gate/fallback this column read, and apply the same pattern in search_conversations.py.

Useful? React with 👍 / 👎.

Both PR reviewers (Codex P2 + Claude) flagged that the new SELECT added b.context_summary unconditionally, so a normal recall on a DB predating the context_summary migration (not yet opened by import/sync, which runs _migrate_columns) would crash with 'no such column'. The sibling tool_counts column was already PRAGMA-guarded.

Probe context_summary independently via PRAGMA table_info in both scripts (more robust than coupling to has_tool_counts) and use index-based row unpacking to handle any column-presence combination. Full mode is unaffected on old DBs; summary mode falls back to ''. Adds 3 regression tests (ALTER TABLE DROP COLUMN simulates a pre-migration DB).
@gupsammy

gupsammy commented Jun 2, 2026

Copy link
Copy Markdown
Owner Author

Addressed the one review finding (raised by both Codex P2 and Claude) in 2570a47.

Issue: recent_chats.py / search_conversations.py added b.context_summary to the SELECT unconditionally, so a normal recall on a DB predating the context_summary migration (not yet opened by the import/sync path, which runs _migrate_columns) would fail with no such column — even without --summary.

Fix: probe context_summary independently via PRAGMA table_info(branches) in both scripts (rather than coupling to the has_tool_counts flag, so the two columns can't drift) and use index-based row unpacking to handle any column-presence combination. Full mode is unaffected on old DBs; summary mode falls back to "". Added 3 regression tests that ALTER TABLE ... DROP COLUMN to simulate a pre-migration DB. Full suite: 451 passed.

@codex review

@claude

claude Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Claude finished @gupsammy's task in 1m 40s —— View job


Claude finished reviewing PR #77


Tasks

  • Gather context and understand the PR
  • Fetch PR diff and read changed files (especially fix in 2570a47)
  • Verify PRAGMA guard fix in recent_chats.py and search_conversations.py
  • Review new regression tests for pre-migration DB simulation
  • Post comprehensive review findings

Fix Verification: guard context_summary read for unmigrated DBs (2570a47)

The fix correctly resolves the blocker flagged in both prior reviews. Here's the detailed breakdown:

recent_chats.py — correct and robust

The fix probes context_summary independently (rather than coupling to has_tool_counts), so the two optional columns can evolve separately without risk of divergence:

has_context_summary = "context_summary" in branch_columns  # line 70
context_summary_col = ", b.context_summary" if has_context_summary else ""  # line 75

Index-based unpacking correctly tracks column positions across all four combinations:

has_context_summary has_tool_counts col at tool_counts read Result
True True 12 session[12] → tool_counts
False True 11 session[11] → tool_counts
True False None
False False None

search_conversations.py — correct and simpler

search_conversations.py never reads tool_counts, so there's only one optional column to handle. The pattern is clean:

(_session_id, uuid, ..., branch_db_id) = session[:9]  # line 130–131
context_summary = session[9] if has_context_summary else None  # line 132

The PRAGMA probe is placed before the FTS/LIKE branch selection, so the guard covers both code paths without duplication. ✓

Regression tests — complete

TestMissingContextSummaryColumn (3 tests) uses ALTER TABLE branches DROP COLUMN context_summary to simulate an unmigrated DB, which is the correct technique — it drops the column after seeding so the structural absence is genuine, not just a NULL value. Covers:

  • test_recent_full_mode_survives_missing_column — normal recall returns full messages, no crash ✓
  • test_recent_summary_mode_falls_back_to_empty--summary on pre-migration DB yields ""
  • test_search_summary_mode_survives_missing_column — LIKE search with --summary degrades gracefully ✓

FTS path isn't tested separately, but the PRAGMA probe happens before the FTS/LIKE branch, so coverage is sufficient.


Status of prior non-blocking suggestions

These three remain unaddressed from the initial review — all still non-blocking:

  1. total_messages: 0 in JSON summary mode (formatting.py:142) — sum(len(s.get("messages", [])) ...) always yields 0 in summary mode since the messages key is absent. A null or omitting the field would be less ambiguous for JSON consumers.
  2. SKILL.md step 4 verbosity — the two fan-out sub-bullets still run ~200 words; CLAUDE.md asks for terse operational prose. The rationale ("count-based splits sever a decision thread") belongs in the PR description rather than the skill file.
  3. --summary --verbose interaction undocumented — the combination works correctly but isn't mentioned in tool-reference.md or the EXAMPLES block.

Verdict

The required fix is correctly implemented and thoroughly tested. The independent has_context_summary probe (rather than reusing has_tool_counts) is actually a stronger approach than the one originally suggested — it makes each column's availability explicit and prevents future coupling bugs. Suite at 451 passed, 0 failed.

This PR is ready to merge. The three non-blocking suggestions above can be addressed in a follow-up if desired.

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 🎉

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…summary --verbose

Addresses two non-blocking review suggestions. format_json_sessions reported total_messages: 0 in summary mode (sessions carry 'summary', not 'messages') — misleading for JSON consumers; now reports total_summaries in summary mode and total_messages in full mode. tool-reference.md documents that --summary composes with --verbose. Adds 2 counter tests. Suite: 453 passed. (Left SKILL.md step-4 prose intact — rule+reasoning is deliberate per CLAUDE.md authoring principle.)
@claude

claude Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Claude finished @gupsammy's task in 1m 56s —— View job


Tasks

  • Gather context and understand the PR
  • Fetch PR diff and read changed files (including latest commit c643004)
  • Review the new commit addressing non-blocking suggestions
  • Post comprehensive review findings

Review: Latest commit c643004

This commit addresses two of the three non-blocking suggestions from the prior review. Here's the detailed breakdown:

format_json_sessions — correct and clean

The total_messages: 0 mislead is fixed by detecting retrieval mode from the session shape rather than a separate flag, which is the right call — the format_json_sessions function has no summary_mode argument and adding one would widen its signature unnecessarily.

if any("summary" in s for s in sessions):
    output["total_summaries"] = sum(1 for s in sessions if "summary" in s)
else:
    output["total_messages"] = sum(len(s.get("messages", [])) for s in sessions)

The key-existence heuristic is reliable because "summary" is only populated by the summary-mode code path; full mode never sets this key. sessions = [] correctly falls to the else branch and emits total_messages: 0. ✓

The inline comment explains why the counter is mode-dependent — a non-obvious invariant that belongs exactly here. ✓

tool-reference.md — accurate and concise

--summary composes with --verbose: you still get the files-modified / commits / tool-counts metadata header, only the conversation body is replaced by the precomputed summary. In JSON, summary mode reports total_summaries instead of total_messages.

Verified against format_markdown_session (line 98–118 verbose block runs unconditionally, summary early-return at line 120 follows). The doc matches the code. ✓

Two new tests — sufficient

TestJsonSessionCounters covers the full/summary split cleanly. A mixed-session list (some with "summary", some without) isn't tested, but never occurs in practice — both scripts apply --summary uniformly. No gap. ✓

Version bump — 0.8.106 → 0.8.107, synced in both plugin.json and marketplace.json. ✓


Status of all prior findings

Finding Status
PRAGMA guard asymmetry (b.context_summary without has_context_summary probe) ✅ Fixed in 2570a47 — independent probe, index-based unpacking, 3 regression tests
total_messages: 0 in JSON summary mode ✅ Fixed in c643004 — now emits total_summaries in summary mode
--summary --verbose interaction undocumented ✅ Fixed in c643004 — one-liner added to tool-reference.md
SKILL.md step-4 verbosity ⚠️ Deliberately left intact (commit message: "rule+reasoning is deliberate per CLAUDE.md authoring principle") — non-blocking

The one remaining item is a judgment call the author made consciously. The fan-out sub-bullets contain operational reasoning the agent needs to choose correctly between shard-by-project and shard-by-count — it can be argued to earn its token budget. Non-blocking. ✓


Verdict

All required changes from prior reviews are correctly implemented. Suite at 453 passed, 0 failed. This PR is ready to merge.

@gupsammy gupsammy merged commit e99c982 into main Jun 2, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant