feat(memory): add --summary mode + volume fan-out signal to recall#77
Conversation
recall scripts emit full transcripts, flooding context on broad/retro queries. Add a --summary flag (recent_chats + search_conversations) that emits the precomputed context_summary column instead of full message content (~3x smaller, single-pass, skips the per-message query). Add a mechanical two-tier volume signal via a shared cli_common.volume_flags() helper: summary_suggested (large full pull -> use --summary) and fanout_suggested (large even after --summary -> fan out by project), plus an INFO stderr nudge in markdown mode. The orchestrator escalates by measured output volume, not raw session count. SKILL.md gains a gated step-4 escalation (per-project general-purpose fan-out) and Agent in allowed-tools; tool-reference.md documents the flag and signals. Tests: add tests/claude-memory/test_summary_mode.py (30 tests covering volume_signal boundaries, the formatter summary branch, summary-mode routing in both scripts, and the JSON meta truth table via the shared helper). Fix 2 stale footer assertions in test_context_injection.py (footer now uses the canonical skill name).
|
Claude finished @gupsammy's task in 5m 12s —— View job PR Review: feat(memory): add --summary mode + volume fan-out signal to recallTasks
What looks goodPython rules compliance — All changed Single source of truth — Summary branch in Volume threshold is mechanical — Gated on measured character count, not session count. The "continuation-restore" and "specific-lookup" paths (the 72%) will never trigger escalation regardless of how many sessions they touch. This is the right call. ✓ Mutual exclusivity — Version sync — Test coverage — 30 tests covering boundary conditions (49999/50000/50001), both modes, NULL DB values, JSON round-trip serialisability, and the full flag truth table. One issue to fix before merge
The existing # recent_chats.py — after this PR
tool_counts_col = ", b.tool_counts" if has_tool_counts else ""
sql = f"""
SELECT ..., b.id as branch_db_id, b.context_summary{tool_counts_col}
...
"""If The least-invasive fix is to gate # recent_chats.py — suggested fix
context_summary_col = ", b.context_summary" if has_tool_counts else ""
tool_counts_col = ", b.tool_counts" if has_tool_counts else ""
sql = f"""
SELECT ..., b.id as branch_db_id{context_summary_col}{tool_counts_col}
...
"""Then in the unpacking: if has_tool_counts:
(..., branch_db_id, context_summary, tool_counts_json) = session
else:
(..., branch_db_id) = session
context_summary = None
tool_counts_json = NoneThe same fix applies to both SQL paths in Suggestions (non-blocking)
SKILL.md step 4 verbosity — The new step runs ~200 words across two sub-bullets. The CLAUDE.md guidance is "terse and operational." The fanout guidance is useful but could drop the worked-out reasoning ("count-based splits sever a decision/antipattern thread") and keep just the actionable imperative. The prose rationale belongs in the PR description (where it already lives, well).
VerdictThe core design is sound and the implementation is clean. The PRAGMA guard asymmetry is the only change I'd call required before merge — everything else is polishing. 30-test suite with 448 passing makes this easy to ship once that guard is addressed. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8fdf9fe307
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| b.files_modified, b.commits, s.git_branch, | ||
| p.name as project, p.path as project_path, | ||
| b.id as branch_db_id{tool_counts_col} | ||
| b.id as branch_db_id, b.context_summary{tool_counts_col} |
There was a problem hiding this comment.
Guard summary column reads on older databases
For an existing conversations.db that predates the context_summary migration and has not yet been opened by the setup/sync path, even a normal recent_chats.py run without --summary now fails with no such column: b.context_summary because this SELECT always reads the new column. open_db_or_exit() only opens the DB and does not run _migrate_columns, while the previous tool_counts read stayed compatible by probing PRAGMA table_info; please either migrate before querying or gate/fallback this column read, and apply the same pattern in search_conversations.py.
Useful? React with 👍 / 👎.
Both PR reviewers (Codex P2 + Claude) flagged that the new SELECT added b.context_summary unconditionally, so a normal recall on a DB predating the context_summary migration (not yet opened by import/sync, which runs _migrate_columns) would crash with 'no such column'. The sibling tool_counts column was already PRAGMA-guarded. Probe context_summary independently via PRAGMA table_info in both scripts (more robust than coupling to has_tool_counts) and use index-based row unpacking to handle any column-presence combination. Full mode is unaffected on old DBs; summary mode falls back to ''. Adds 3 regression tests (ALTER TABLE DROP COLUMN simulates a pre-migration DB).
|
Addressed the one review finding (raised by both Codex P2 and Claude) in Issue: Fix: probe @codex review |
|
Claude finished @gupsammy's task in 1m 40s —— View job Claude finished reviewing PR #77 Tasks
Fix Verification:
|
has_context_summary |
has_tool_counts |
col at tool_counts read | Result |
|---|---|---|---|
| True | True | 12 | session[12] → tool_counts ✓ |
| False | True | 11 | session[11] → tool_counts ✓ |
| True | False | — | None ✓ |
| False | False | — | None ✓ |
search_conversations.py — correct and simpler
search_conversations.py never reads tool_counts, so there's only one optional column to handle. The pattern is clean:
(_session_id, uuid, ..., branch_db_id) = session[:9] # line 130–131
context_summary = session[9] if has_context_summary else None # line 132The PRAGMA probe is placed before the FTS/LIKE branch selection, so the guard covers both code paths without duplication. ✓
Regression tests — complete
TestMissingContextSummaryColumn (3 tests) uses ALTER TABLE branches DROP COLUMN context_summary to simulate an unmigrated DB, which is the correct technique — it drops the column after seeding so the structural absence is genuine, not just a NULL value. Covers:
test_recent_full_mode_survives_missing_column— normal recall returns full messages, no crash ✓test_recent_summary_mode_falls_back_to_empty—--summaryon pre-migration DB yields""✓test_search_summary_mode_survives_missing_column— LIKE search with--summarydegrades gracefully ✓
FTS path isn't tested separately, but the PRAGMA probe happens before the FTS/LIKE branch, so coverage is sufficient.
Status of prior non-blocking suggestions
These three remain unaddressed from the initial review — all still non-blocking:
total_messages: 0in JSON summary mode (formatting.py:142) —sum(len(s.get("messages", [])) ...)always yields0in summary mode since themessageskey is absent. Anullor omitting the field would be less ambiguous for JSON consumers.- SKILL.md step 4 verbosity — the two fan-out sub-bullets still run ~200 words; CLAUDE.md asks for terse operational prose. The rationale ("count-based splits sever a decision thread") belongs in the PR description rather than the skill file.
--summary --verboseinteraction undocumented — the combination works correctly but isn't mentioned intool-reference.mdor theEXAMPLESblock.
Verdict
The required fix is correctly implemented and thoroughly tested. The independent has_context_summary probe (rather than reusing has_tool_counts) is actually a stronger approach than the one originally suggested — it makes each column's availability explicit and prevents future coupling bugs. Suite at 451 passed, 0 failed.
This PR is ready to merge. The three non-blocking suggestions above can be addressed in a follow-up if desired.
|
Codex Review: Didn't find any major issues. 🎉 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
…summary --verbose Addresses two non-blocking review suggestions. format_json_sessions reported total_messages: 0 in summary mode (sessions carry 'summary', not 'messages') — misleading for JSON consumers; now reports total_summaries in summary mode and total_messages in full mode. tool-reference.md documents that --summary composes with --verbose. Adds 2 counter tests. Suite: 453 passed. (Left SKILL.md step-4 prose intact — rule+reasoning is deliberate per CLAUDE.md authoring principle.)
|
Claude finished @gupsammy's task in 1m 56s —— View job Tasks
Review: Latest commit
|
| Finding | Status |
|---|---|
PRAGMA guard asymmetry (b.context_summary without has_context_summary probe) |
✅ Fixed in 2570a47 — independent probe, index-based unpacking, 3 regression tests |
total_messages: 0 in JSON summary mode |
✅ Fixed in c643004 — now emits total_summaries in summary mode |
--summary --verbose interaction undocumented |
✅ Fixed in c643004 — one-liner added to tool-reference.md |
| SKILL.md step-4 verbosity |
The one remaining item is a judgment call the author made consciously. The fan-out sub-bullets contain operational reasoning the agent needs to choose correctly between shard-by-project and shard-by-count — it can be argued to earn its token budget. Non-blocking. ✓
Verdict
All required changes from prior reviews are correctly implemented. Suite at 453 passed, 0 failed. This PR is ready to merge.
What
Adds a
--summaryretrieval mode and a mechanical volume-based fan-out signal to therecall-conversationsskill, so broad/retrospective queries stop flooding the main context.Why
A taxonomy of 61 real historical recall invocations showed two populations:
continuation-restore/specific-lookup— small targeted answers; fan-out would only add latency.broad-retro-synthesis/gap-antipattern/decision-extraction— high blast radius, heavy consume.Investigating the heavy cluster surfaced the load-bearing fact: the recall scripts emit full untruncated transcripts (
formatting.pyrenders everym.content). The precomputedcontext_summarycolumn exists (94% populated, written at import time bysummarizer.py) but recall never read it. Sorecent_chats.py --limit 20 --verbose(the run-retro recipe) was pouring ~20 full transcripts into context.How — two tiers, gated by measured volume (not session count)
--summary(tier 1): emit the precomputedcontext_summaryinstead of full content. Single-pass, free at query time, skips the per-message query. Measured ~3x reduction (309,798 → 85,885 chars on a 20-session retro).cli_common.volume_flags()helper —summary_suggested(large full pull → use--summary) andfanout_suggested(large even after--summary→ fan out), plus anINFO:stderr nudge in markdown mode.SKILL.mdstep 4 documents per-projectgeneral-purposefan-out (shard by project, never by arbitrary count — count splits sever a decision/antipattern thread).Triggering keys on lens type + measured output volume, never raw session count —
continuation-restoreandspecific-lookup(the 72%) never escalate.Changes
scripts/memory_lib/cli_common.py—--summaryflag,FANOUT_SUGGEST_CHARS,volume_signal(),volume_flags()(single source of truth for both scripts' JSON meta),emit_volume_signal().scripts/recent_chats.py/search_conversations.py—summaryrouting + JSON meta viavolume_flags().scripts/memory_lib/formatting.py— summary branch informat_markdown_session(early return; placeholder for empty summary).SKILL.md— gated step-4 escalation +Agentinallowed-tools.references/tool-reference.md—--summaryrows + signal docs.Testing
tests/claude-memory/test_summary_mode.py— 30 tests:volume_signalboundaries (49999/50000/50001), formatter summary branch, summary-mode routing in both scripts (via the in-memorymemory_dbfixture), and the JSON-meta truth table called through the sharedvolume_flags()helper (guards production, not a copy).test_context_injection.py(the footer now emits the canonicalclaude-memory:recall-conversationsname).tests/claude-memorysuite: 448 passed, 0 failed.Notes
auto-versionpre-commit hook.Agenttoallowed-toolswidens the skill's tool surface; it is gated behind the rare tier-2 path so the common 72% of calls are unaffected.