Releases: codenamev/claude_memory
v0.12.1 — Upgrade-Experience Patches (setup-vectors, doctor EmbeddingsCheck, plugin manifest fix)
Theme: Upgrade-experience patches surfaced by the 0.12.0 soak. Four small but high-impact fixes — all uncovered by one user upgrading a single project — closing visibility gaps in the doctor and the plugin manifest. No schema changes, no breaking changes.
Added
- claude-memory setup-vectors command: the documented opt-in path for end users who want vector recall via the BAAI/bge-small-en-v1.5 model. fastembed remains a dev/test gem dependency by design (the default install stays light); this command verifies the chosen provider is loadable (gracefully prompts to gem install fastembed if not), writes CLAUDE_MEMORY_EMBEDDING_PROVIDER (and optional CLAUDE_MEMORY_EMBEDDING_MODEL) to the project's .claude/settings.json env block (the same mechanism Claude Code uses for OTel), and re-indexes existing facts via the existing IndexCommand (skip with --no-reindex). Supports --status for current config and --dry-run for inspection. Preserves unrelated settings.json keys.
- Checks::EmbeddingsCheck in claude-memory doctor: surfaces the active embedding provider name and dimensions, hints to set CLAUDE_MEMORY_EMBEDDING_PROVIDER=fastembed when on tfidf default and fastembed is loadable, and reports dimension mismatches between stored vectors and the current provider. Closes the visibility gap where a user could see sqlite-vec available ✓ while silently running on tfidf.
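The settings.json write described for setup-vectors can be pictured roughly as below. This is a minimal sketch, not the gem's implementation: the helper name merge_embedding_env and the sample settings content are hypothetical, and only the two environment variable names come from the notes above.

```ruby
require "json"

# Hypothetical helper (not the gem's code): add the embedding env vars to a
# parsed settings.json hash while leaving every unrelated key untouched.
def merge_embedding_env(settings, provider:, model: nil)
  updated = settings.dup
  env = (settings["env"] || {}).dup
  env["CLAUDE_MEMORY_EMBEDDING_PROVIDER"] = provider
  env["CLAUDE_MEMORY_EMBEDDING_MODEL"] = model if model
  updated["env"] = env
  updated
end

# Sample settings content is invented for illustration.
existing = JSON.parse('{"permissions": {"allow": ["Bash"]}, "env": {"OTHER": "1"}}')
merged = merge_embedding_env(existing, provider: "fastembed")
puts JSON.pretty_generate(merged)
```

Working on a copy of the env hash keeps the input untouched, which is one simple way to support a dry-run style preview before anything is written to disk.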
Fixed
- plugin.json declared skills: "./skills/" and outputStyles: "./output-styles/" pointing at non-existent directories. Per Claude Code's plugin reference, distill-transcripts.md is correctly a flat command (not a skill); both forms register as /<name> slash commands. Dead keys removed. Plugin spec rewritten as deletion-safe ("every directory key in plugin.json points at an existing directory") so this can't regress.
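The deletion-safe rule quoted above ("every directory key in plugin.json points at an existing directory") could be checked along these lines. This is a sketch under assumptions: the helper name is hypothetical, the "commands" key in the sample manifest is invented, and treating any string value ending in "/" as a directory key is a heuristic chosen for illustration, not the project's actual spec.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical check: return manifest keys whose value looks like a directory
# path (assumed here: a string ending in "/") but is missing under plugin_root.
def missing_directory_keys(manifest, plugin_root)
  manifest
    .select { |_key, value| value.is_a?(String) && value.end_with?("/") }
    .reject { |_key, value| Dir.exist?(File.expand_path(value, plugin_root)) }
    .keys
end

missing = Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "commands"))
  manifest = {
    "name" => "claude-memory",
    "commands" => "./commands/",          # exists in this temp layout
    "skills" => "./skills/",              # dead key from the notes above
    "outputStyles" => "./output-styles/"  # dead key from the notes above
  }
  missing_directory_keys(manifest, root)
end
puts missing.inspect
```

Because the check iterates over whatever keys are present, removing a key later cannot make it fail, which is the property "deletion-safe" describes.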
Documentation
- README "Upgrading" section now documents the marketplace-refresh + /reload-plugins flow explicitly. After /plugin marketplace update <name>, users must run /reload-plugins or restart Claude Code for new slash commands to appear; this bit one user upgrading to 0.12.0 looking for /distill-transcripts. Includes /audit-memory and /distill-transcripts as named examples.
Upgrade Notes
- No DB migrations. Schema stays at v18.
- After gem update claude_memory, run /plugin marketplace update claude-memory && /reload-plugins (or restart Claude Code) to see the new /distill-transcripts and /audit-memory slash commands.
- Existing fact bases continue to use whatever embedding provider they were indexed under. To opt into fastembed, run claude-memory setup-vectors; it handles provider switching + re-index in one step.
- claude-memory doctor will now emit a warning on tfidf default with fastembed loadable. This is informational, not an error; the system continues to function on tfidf.
🧪 Real Eval Validation
Results: 0/6 passed
Duration: 68.54s
Estimated Cost: ~$0.12
v0.12.0 — Release Discipline, Observability, Self-Audit
Theme: Release Discipline + Observability + Self-Audit — the infrastructure that makes a 1.0 semver promise defensible. This release locks down the public API surface, adds the observability primitives (OTel ingestion, dashboard Telemetry) and the self-audit toolkit (claude-memory audit) that serve the visibility pillar, and ships the negative-fact harm benchmark + staleness guard that make the long-horizon-quality claim measurable rather than aspirational.
Added
- Staleness guard for single-value facts: single-value predicates (uses_database/deployment_platform/auth_method) are exclusive claims Claude follows authoritatively, so a stale one is the most dangerous kind of memory. The 0.12 harm benchmark caught Claude emitting git push heroku HEAD:main from a stale deployment_platform fact with zero hedge, and supersession only protects against this if the replacement was recorded. New Recall::StalenessAnnotator (pure function) flags single-value facts that are old (valid_from/created_at older than injection_stale_days, default 180) AND not recently confirmed (last_recalled_at null or stale); Hook::ContextInjector appends a ⚠ stale: recorded YYYY-MM-DD … verify before relying marker at SessionStart so Claude can hedge or verify instead of blindly following. Multi-value predicates are never annotated (they accumulate; one stale entry isn't authoritative). New Configuration#injection_stale_days (CLAUDE_MEMORY_INJECTION_STALE_DAYS), deliberately much longer than the 14-day dashboard review window. Serves the 1.0 long-horizon-quality pillar: it's the first defense against memory degrading session quality over months.
- Negative-fact harm benchmark (full 13-scenario corpus + release gate): expands the 0.11 3-scenario prototype to 13 cases across four harm classes (stale_tech, mismatched_scope, superseded_undetected, and the new reference_material_as_fact). Each scenario ships a project_files scaffold whose current state contradicts the wrong memory fact, so the test measures "does Claude follow stale/wrong memory over the project's actual state?" rather than reacting to an empty directory. Scored best-of-N (default 3 runs, majority vote per scenario via HARM_BENCH_RUNS) to absorb single-shot LLM nondeterminism. HARM_RATE_THRESHOLD (default 1%) fails the run if the majority-harmed scenario rate is exceeded, making "memory doesn't make Claude wrong" a measurable release gate rather than a marketing claim. The first full-corpus real-mode run surfaced a real harm (stale deployment fact) and a harness confound (empty-tmpdir noise), which drove both the staleness guard above and the scaffold + best-of-N harness hardening.
- claude-memory audit (memory health diagnostic): productionizes the 2026-05-21 contamination audit into a stable diagnostic surface anyone using claude_memory can run on their own setup. Ten contract checks (C001-C010) cover open conflicts, single-cardinality multiplicity, distillation backlog, shortcut-leak detection, duplicate global conventions, bare-conclusion rate, project starvation, auto-memory import gaps, and single-cardinality churn. --json is the stable contract for CI; --severity filters; --no-exit always exits 0. The /audit-memory slash command wraps the same runner for an interactive walkthrough. docs/audit_runbook.md documents each check's rationale and remediation. CHECK_METHODS is append-only by design so JSON consumers don't break when new checks land. New claude-memory import-auto-memory retroactively pulls ~/.claude/projects/<slug>/memory/*.md entries that AutoMemoryMirror previously missed (slug bug: tr("/", "-") left underscores intact, so claude_memory paths never matched). Contributes to the visibility pillar of 1.0.
- Contamination guardrails (ReferenceMaterialDetector example-quote guard + Resolver :discard path): the distiller used to treat example sentences in docs/CLAUDE.md ("e.g., postgres", "for example, mysql") as literal claims about the project, accumulating 103 rejected single-cardinality facts over six weeks before being caught by the 2026-05-21 audit. Two defenses now: (1) ReferenceMaterialDetector flags single-cardinality predicate extractions whose source text contains e.g., / for example / i.e. quote patterns so they're tagged reference material at write time; (2) Resolver gains a :discard resolution path for the same shape so the fact never lands even if the detector misses. Memory shortcuts (memory.decisions/.conventions/.architecture) refactored from FTS text search (which returned facts whose object matched the predicate keyword) to predicate-based filtering via PredicatePolicy, with project-DB precedence over global. Closes a class of "is memory still trustworthy?" bugs that erode the 1.0 stability claim.
- OpenTelemetry ingestion + dashboard Telemetry tab: Claude Code can now export metrics, log-style events, and (opt-in) traces straight into the dashboard via OTLP/HTTP/JSON. New claude-memory otel CLI manages the env block in .claude/settings.json (--enable, --disable, --enable-traces, --capture-prompts, --status, --verify); the dashboard exposes /v1/metrics, /v1/logs, /v1/traces on 127.0.0.1:3377 and a new "Telemetry" drawer showing cost per hour, tokens by model, top tools by latency, and a per-prompt journey waterfall that UNIONs otel_events with the existing activity_events. Schema v18 adds otel_metrics/otel_events/otel_traces plus an additive prompt_id column on activity_events for journey correlation. Privacy posture: nothing past metric counts is captured by default; OTEL_LOG_USER_PROMPTS only flips on with explicit --capture-prompts confirmation; traces remain 501-gated until the user opts in. Sweep retention defaults: 30 days metrics, 14 days events, 7 days traces.
- Pre-release hook smoke gate (bin/pre-release-smoke): verifies the installed claude-memory gem actually fires hooks correctly and populates expected detail_json fields per spec/smoke/expected_fields.yml. Codifies the verification convention from feedback_hooks_run_installed_gem.md into a machine-enforced release gate. The trap has been sprung twice (2026-04-16 ActivityLog, 2026-04-30 #47 token-budget); the gate exists so it can't be sprung a third time. Wired into the /release skill as Phase 1 Step 6 (after specs, before lint). First 0.12.0 milestone item.
- /study-repo memory-discipline guard (prompt-only): top-level "CRITICAL: Memory Discipline" section in .claude/skills/study-repo/SKILL.md explicitly forbids the LLM from extracting external projects' tech stack as project-level facts. Roots the cleanup work claude-memory reject had to do during 0.11 (27-fact misattribution cluster on 2026-04-23/24, see quality_review.md 2026-04-30 cause-4 finding). Defense-in-depth detector deferred to 0.12.x or later, only built if measurement shows persistent leakage.
- API stability audit (docs/api_stability.md): authoritative public-API contract enumerating which CLI commands, MCP tools, hook events, Ruby classes, and schema surfaces are stable / experimental / internal. Default-to-internal applied throughout; the doc is the source of truth for what 1.0's semver promise will lock down. New ClaudeMemory::Deprecations.warn(name:, replacement:, removed_in:) module wired into PredicatePolicy.canonicalize as the first soft-rename: has_convention and primary_language synonyms now emit deprecation warnings scheduled for removal in 1.0.0. README + CLAUDE.md link to the new doc; suppress noise via CLAUDE_MEMORY_NO_DEPRECATIONS=1.
- Release-to-release benchmark scoreboard: bin/run-evals now writes spec/benchmarks/results/<version>.json after each run; new bin/bench-diff compares the current scoreboard against the most recent prior tagged version's and exits non-zero if any tracked pass-rate dropped beyond the threshold (default -5%, configurable via --threshold). Wired into /release skill Phase 1 as Step 7; the release aborts on regressions before publish. First release with this gate is 0.12.0 itself; from 0.13.0 onward bench-diff actively gates against 0.12 baselines.
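The scoreboard comparison that bin/bench-diff performs can be illustrated with a short sketch. Everything here is an assumption made for illustration: the scoreboard shape (benchmark name mapped to a pass rate between 0.0 and 1.0), the helper name, and the reading of the -5% threshold as percentage points rather than a relative drop. The actual JSON layout written by bin/run-evals is not described in these notes.

```ruby
# Hypothetical scoreboard shape: { "benchmark name" => pass rate in 0.0..1.0 }.
# Returns the benchmarks whose pass rate fell by more than `threshold`
# (assumed here to mean absolute percentage points).
def regressions(previous, current, threshold: 0.05)
  previous.each_with_object({}) do |(name, old_rate), dropped|
    new_rate = current[name]
    next if new_rate.nil? # benchmark not tracked in the current run

    delta = new_rate - old_rate
    dropped[name] = delta.round(4) if delta < -threshold
  end
end

previous = { "harm_bench" => 0.96, "repeat_correction" => 0.83 }
current  = { "harm_bench" => 0.90, "repeat_correction" => 0.80 }

failed = regressions(previous, current)
puts failed.inspect
# A release gate would exit non-zero when `failed` is not empty.
```

Comparing against the prior tagged version's file, rather than a fixed target, is what lets the first gated release (0.12.0) pass trivially and later releases gate against real baselines.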
Deferred to 0.13
- CLAUDE.md comparative baseline numbers (#4): the comparative E2E harness compares static CLAUDE.md (auto-loaded into context) against ClaudeMemory's MCP-tool retrieval, but in headless claude -p mode Claude doesn't proactively call the recall tools, so the comparison doesn't yet exercise ClaudeMemory's retrieval path fairly (first run returned a misleading ClaudeMemory 0/10 = no-memory 0/10 vs CLAUDE.md 8/10). Publishing that would mislead, so the numbers are withheld and the harness fix is tracked for 0.13. This surfaced a genuine separable observation, also tracked for 0.13: in fully headless, non-tool-forcing usage, ClaudeMemory's contribution rides entirely on the SessionStart context-hook injection. See docs/1_0_punchlist.md #4 / #16.
Upgrade Notes
- Schema migrates automatically to v18 (OTel telemetry tables + prompt_id on activity_events) on first DB open via Sequel::Migrator; no manual step. Round-trip migration specs cover the upgrade path from prior release boundaries.
- The staleness marker now appears in SessionStart context for single-value facts (uses_database/deployment_platform/auth_method) older than 180 days and not recently recalled. This is additive and advisory (a ⚠ stale … verify before relying note). Tune the window with CLAUDE_MEMORY_INJECTION_STALE_DAYS; the existing CLAUDE_MEMORY_STALE_DAYS (dashboard review window) is unchanged.
- No breaking API changes. has_convention/primary_language predicate synonyms continue to emit deprecation warnings (scheduled for removal in 1.0.0); suppress via CLAUDE_MEMORY_NO_DEPRECATIONS=1.
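The staleness rule in these notes (single-value predicate, older than the window, and not recently recalled) can be sketched as a pure function. This is not Recall::StalenessAnnotator itself: the fact shape, the helper name, and the choice to reuse the same window for "recently recalled" are assumptions made for illustration. The marker text keeps the elided "…" exactly as the notes show it.

```ruby
require "date"

# Predicates named in the release notes as single-value.
SINGLE_VALUE_PREDICATES = %w[uses_database deployment_platform auth_method].freeze

# Hypothetical fact shape: { predicate:, recorded_on: Date, last_recalled_on: Date or nil }.
# Returns the advisory marker string, or nil when no annotation applies.
def stale_marker(fact, today:, stale_days: 180)
  return nil unless SINGLE_VALUE_PREDICATES.include?(fact[:predicate])

  cutoff = today - stale_days
  old = fact[:recorded_on] < cutoff
  recalled = fact[:last_recalled_on]
  # Assumption: "recently confirmed" uses the same window as the age check.
  unconfirmed = recalled.nil? || recalled < cutoff
  return nil unless old && unconfirmed

  "⚠ stale: recorded #{fact[:recorded_on].iso8601} … verify before relying"
end

today = Date.new(2026, 6, 1)
fact = { predicate: "deployment_platform", recorded_on: Date.new(2025, 1, 1), last_recalled_on: nil }
puts stale_marker(fact, today: today)
```

Passing today in as an argument keeps the function free of clock access, which makes the boundary cases straightforward to test.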
🧪 Real Eval Validation
Results: 2/6 passed
v0.11.0 — Trust & Cost: Token Budget, Quality Score, ROI Nudge, Show, Harm Prototype
Theme: Trust & Cost — five user-visible signals that answer "is memory still worth it?" with numbers a skeptical user can read in <30 seconds.
Added
- Token budget telemetry: every successful SessionStart context injection now records an estimated context_tokens count on its activity_events row. Surfaced three ways:
  - Dashboard Trust panel emits a token_budget block with p50/p95/avg/sample_size over the last 30 days, so the JSON dashboard endpoint and any downstream consumer answer "what does memory cost per session?"
  - claude-memory digest includes a "Context cost" subsection between activity and new-knowledge so the weekly report shows the price tag next to the value.
  - claude-memory stats --tokens [--since DAYS] reports total sessions, p50/p95/avg/min/max, and a histogram across <500 / 500-1k / 1-2k / 2-5k / 5k+ buckets.
- Pure additive — no schema migration. Historical events written before this release simply contribute zero samples until new injections accumulate.
- First 0.11.0 milestone item from the 1.0 punchlist (Trust & Cost). Closes the "what % of my SessionStart token budget does memory consume?" gap.
- Hallucination rate metric: the dashboard now quantifies how clean the fact base is, not just how full it is. Distill::BareConclusionDetector is the production-side mirror of the SessionStart prompt's reason-clause requirement (decision/convention facts must embed "because…" / "so that…" / "to avoid…"). Surfaced two ways:
  - Dashboard Trust panel emits a quality_score block aggregating across project + global active facts: suspect_count (predicate=reference, retagged by ReferenceMaterialDetector), bare_conclusion_count, percentages, and an overall 0–100 score (higher = cleaner). Returns 100 on empty stores so fresh installs aren't penalized.
  - claude-memory digest includes a "Quality" section showing the score breakdown plus the in-window rejection rate ("of facts created in the last 7 days, X% have been rejected since"), so calibration drift is visible.
- Second 0.11.0 milestone item. Pairs with token-budget telemetry to answer "is memory still worth its cost?" via two skeptic-friendly numbers.
- claude-memory show: new CLI command prints what memory would inject at the next SessionStart in plain Markdown. Runs the exact Hook::ContextInjector path real sessions use, so output matches what Claude actually receives. Footer reports fact count, ~token estimate, and char count so users see the SessionStart cost at a glance.
  - Default suppresses the raw-transcript "Pending Knowledge Extraction" dump (intended for LLM distillation, not human reading); pass --pending to include it.
  - --source SOURCE (startup/resume/clear) simulates each fresh-session entrypoint so users can preview which sections would appear.
- Third 0.11.0 milestone item. Closes the inspectability gap: trust requires being able to see what memory will inject, the same way cat CLAUDE.md works.
- First-week ROI nudge: at SessionEnd, memory now prints memory contributed N facts this session, %used = X for the first 10 sessions, then quiets. New users get user-visible proof memory is doing work for them without having to know about the dashboard. Once trust is established (or it isn't), the nudge gets out of the way.
  - New claude-memory hook nudge subcommand + Hook::Handler#nudge. SessionEnd config now wires [ingest, sweep, nudge] in order.
  - Silent on CLAUDE_MEMORY_NO_NUDGE=1 opt-out, missing session_id, n=0 contributions, and after MAX_NUDGES emissions. The empty-session silent path doesn't burn a slot; quiet sessions don't count toward the 10.
  - Activity event roi_nudge records {n, used, pct, prior_count} per emission so a future migration could change the threshold without re-counting from raw events.
- Fourth 0.11.0 milestone item. Cold-start trust signal that pairs with #47 (token cost) and #48 (quality) to make the first-week answer to "is this worth it?" visible without effort.
- Harm benchmark prototype: spec/benchmarks/dataset/harm_scenarios.yml + spec/benchmarks/e2e/harm_bench_spec.rb. Three hand-written cases spanning the riskiest harm classes (stale_tech, mismatched_scope, superseded_undetected). The first ClaudeMemory benchmark that measures whether memory can make Claude wrong; every other benchmark only measures whether memory helps.
  - Structure validation (regex compile, fact loadability, harm-class coverage) runs in stub mode as part of the :benchmark tag.
  - Real-mode runner: EVAL_MODE=real bundle exec rspec spec/benchmarks/e2e/harm_bench_spec.rb (needs claude CLI on PATH, ~$2-8 per run). Reports harm rate; doesn't enforce a threshold yet (that's the 0.12 release gate).
- 0.11.0 risk-de-risking item. If even one of these three surfaces a harm now, the full 10-15-case benchmark planned for 0.12 will likely reveal a fundamental issue — better to learn that at 0.11 than at 0.12. Real-mode prototype run on 2026-04-30 reported 0/3 harm — green light to expand to the full corpus in 0.12.
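The token-budget summary listed earlier in this Added section (p50/p95 plus a histogram across <500 / 500-1k / 1-2k / 2-5k / 5k+) can be sketched as follows. This is an illustration only: the nearest-rank percentile method and the exact bucket boundaries (for example, that 500 falls in 500-1k) are assumptions, and the sample values are invented.

```ruby
# Bucket labels come from the release notes; the boundary handling is assumed.
TOKEN_BUCKETS = [
  ["<500",   0...500],
  ["500-1k", 500...1000],
  ["1-2k",   1000...2000],
  ["2-5k",   2000...5000],
  ["5k+",    5000..Float::INFINITY]
].freeze

# Nearest-rank percentile (an assumed method; the gem may compute it differently).
def percentile(samples, q)
  sorted = samples.sort
  rank = (q * sorted.size).ceil
  sorted[[rank, 1].max - 1]
end

def token_histogram(samples)
  TOKEN_BUCKETS.to_h { |label, range| [label, samples.count { |n| range.cover?(n) }] }
end

samples = [300, 800, 1200, 2500, 6000] # invented context_tokens values
puts "p50=#{percentile(samples, 0.50)} p95=#{percentile(samples, 0.95)}"
puts token_histogram(samples).inspect
```

With only a handful of sessions the p95 collapses to the maximum, which is one reason a sample_size field alongside the percentiles is useful.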
Changed
- Hallucination-rate metric calibration: Dashboard::Trust#quality_score now reports a windowed (last 30d) "live" score as the headline plus a "historical" block over all active facts. Production verification on 2026-04-30 (recorded in docs/quality_review.md) showed the unwindowed metric was technically correct but pragmatically misleading: 97% of bare-conclusion facts pre-dated the 2026-04-20 reason-clause prompt commit, and the entire 7-day rejection cluster was a single-class systemic failure (a /study-repo burst), not ongoing noise. The split makes the metric actionable: live score = ongoing extraction quality, historical = legacy data. The digest's "Quality" section uses the live score as the headline.
Fixed
- Real-eval CLI runner now passes allowed_tools through explicitly so the harm benchmark and other real-mode benches can pre-allow MCP memory tools without per-test wiring.
Upgrade Notes
- No schema migration. All new features ship purely additive.
- Hooks run the installed gem from PATH, not the working tree. After upgrading, bundle exec rake install (or gem install claude_memory) is required for the new SessionEnd nudge, claude-memory show command, --tokens stats flag, and context_tokens activity-event field to actually fire on real hook events.
- Existing quality_score consumers will see additional fields (window_days, historical) in the snapshot. The original keys (score, total_active, suspect_count, bare_conclusion_count, suspect_pct, bare_pct) remain at the top level and now reflect the 30-day live window; historical numbers move to the historical sub-hash.
🧪 Real Eval Validation
Results: 4/6 passed
Duration: 73.33s
Estimated Cost: ~$0.12
v0.10.0 — Dashboard, Observability, Memory Quality
Added
Dashboard — feed-first redesign with observability built in
- New feed-first dashboard UI with scope-aware moments, fact detail modal, query tester, and activity drilldown. Reuse, Trust, Knowledge, Conflicts, and Moments panels each backed by a dedicated module (Dashboard::{Reuse, Trust, Knowledge, Conflicts, Moments}) under unit tests, replacing the prior all-in-API-class layout.
- 👍/👎 feedback on individual moments with persisted verdicts (schema v16, moment_feedback table). Trust panel surfaces a 30-day up/down ratio so the dashboard can answer "when memory surfaces something, are users marking it useful?".
- Utilization ratio panel: of facts extracted in the last 30 days, how many has Claude actually used in a recall or context injection? Color-coded (green ≥40%, yellow ≥15%, red below). Hidden on fresh installs to avoid misleading zeros.
- Conflict deduping at the display layer: identical (subject, predicate, object_pair) detections collapse into one row with a ×N badge. Sidebar "Needs review" count now reflects distinct contradictions, not raw row count.
- Activity events drilldown: each moment opens a payload modal with prettified JSONL, recall trigger correlation (which user prompt motivated this lookup), and linked-fact resolution scoped per database.
- Vector index health threshold and clickable remediation hints in the health dashboard.
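The utilization panel's color thresholds listed above (green ≥40%, yellow ≥15%, red below, hidden on fresh installs) reduce to a small mapping. A minimal sketch, assuming a hypothetical helper name and assuming that "fresh install" means zero extracted facts in the window:

```ruby
# Hypothetical helper: map used/extracted counts to the panel color.
# Returns nil when there is nothing to report (assumed meaning of "hidden").
def utilization_color(used, extracted)
  return nil if extracted.zero?

  ratio = used.to_f / extracted
  if ratio >= 0.40
    :green
  elsif ratio >= 0.15
    :yellow
  else
    :red
  end
end

puts utilization_color(50, 100).inspect
puts utilization_color(20, 100).inspect
puts utilization_color(5, 100).inspect
puts utilization_color(0, 0).inspect
```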
CLI — observability surfaces and one-shot cleanups
- claude-memory digest [--since DAYS] [--output FILE]: weekly markdown report. Sections: Activity, New knowledge by predicate, Utilization (extracted vs used), Conflicts, Feedback. No new schema; renders from existing aggregates.
- claude-memory census [--root DIR]: privacy-safe cross-project vocabulary scan. Aggregates per-DB predicate × status counts, novel predicates, synonym candidates. Suppresses object literals, entity names, and paths; per-DB IDs are SHA256-prefixed.
- claude-memory dedupe-conflicts [--scope SCOPE] [--dry-run]: one-shot cleanup for historical conflict-row duplication that predates the Resolver dedup fix (commit f571ba4). Groups by (subject, predicate, normalized object pair), keeps the earliest, migrates provenance to the keeper.
- claude-memory reclassify-references [--scope SCOPE] [--dry-run]: retags active convention facts that the new Distill::ReferenceMaterialDetector flags as reference material (LOC counts, star counts, "X is a plugin..." templates, "by Firstname Lastname" attributions).
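The grouping rule that dedupe-conflicts applies (group by subject, predicate, and normalized object pair; keep the earliest) can be sketched like this. The row shape, the helper name, and the normalization (trim, downcase, order-insensitive pair) are assumptions for illustration; the notes do not spell out what "normalized" means in the real command.

```ruby
# Hypothetical conflict row: { id:, subject:, predicate:, object_a:, object_b:, detected_at: }.
# Builds a plan naming the keeper and the duplicates to fold into it.
def dedupe_plan(rows)
  groups = rows.group_by do |row|
    pair = [row[:object_a], row[:object_b]].map { |o| o.strip.downcase }.sort
    [row[:subject], row[:predicate], pair]
  end

  groups.values.map do |group|
    keeper, *duplicates = group.sort_by { |row| row[:detected_at] }
    { keep: keeper[:id], merge: duplicates.map { |row| row[:id] } }
  end
end

rows = [
  { id: 1, subject: "repo", predicate: "uses_database", object_a: "postgres", object_b: "mysql", detected_at: 100 },
  { id: 2, subject: "repo", predicate: "uses_database", object_a: "MySQL ", object_b: "Postgres", detected_at: 200 },
  { id: 3, subject: "repo", predicate: "auth_method", object_a: "jwt", object_b: "session", detected_at: 150 }
]
puts dedupe_plan(rows).inspect
```

Returning a plan rather than mutating rows mirrors the --dry-run preview: the same structure can be printed or applied.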
Memory quality
- Access-based staleness scoring (improvements.md #35). Schema v17 adds last_recalled_at to facts. Sweep::RecallTimestampRefresher derives the field periodically from activity_events; claude-memory stats --stale [--stale-days N] lists facts that haven't been recalled inside the threshold. Replaces the prior "active facts minus seen-in-recalls" approximation.
- Auto-memory mirror (improvements.md #36). On fresh sessions, the SessionStart context hook scans ~/.claude/projects/<slug>/memory/*.md and surfaces new or changed entries as extraction candidates so users can promote auto-memory observations into claude_memory without manual copy-paste.
- Reasoning requirement enforced in distillation (improvements.md #34). The SessionStart prompt and the /distill-transcripts skill now require a why clause for decision and convention predicates ("because…", "so that…", etc.). Audit found ~75% of facts were bare conclusions before this change.
- Distill::ReferenceMaterialDetector reclassifies convention facts whose object text matches reference patterns. New reference predicate registered in PredicatePolicy with its own :references snapshot section. Detector runs at write time in ManagementHandlers#store_extraction so mislabeling can't persist.
- Predicate census command (#30) for cross-project vocabulary audits; see CLI section above.
Benchmarks and observability
- Repeat-correction benchmark harness (improvements.md #32). spec/benchmarks/e2e/repeat_correction_spec.rb pre-loads a past correction as a memory fact, runs the prompt through real Claude under EVAL_MODE=real, and reports pass rate (no violation patterns matched). Starter set of 2 scenarios drawn from this project's recurring gotchas.
- Relevance ratio metric (improvements.md #31). Hook::ContextInjector#emitted_subjects exposes the subjects injected at SessionStart; BenchmarkHelpers::RelevanceMetrics measures whether they appear in Claude's response. Trend signal for memory-application quality, integrated into devmemeval_spec.rb.
- MCP server embeds the V=R/C ("Verify before Recommend / Correct") mental model in agent instructions so memory recommendations come with built-in verification cues.
Schema v15 → v17 (additive only, automatic on first run)
- Migration 015: adds activity_events table for hook/recall/context/sweep telemetry. Powers the dashboard timeline, moments feed, and efficacy reports.
- Migration 016: adds moment_feedback table (unique on event_id) for the dashboard 👍/👎 surface.
- Migration 017: adds nullable facts.last_recalled_at for access-based staleness scoring.
1.0 readiness track
- New docs/1_0_punchlist.md opens the path to 1.0: token-budget telemetry, hallucination-rate metric, negative-fact harm benchmark, CLAUDE.md baseline publication, claude-memory show, benchmark scoreboard. Ten entries (#47-56) added to docs/improvements.md with concrete file:line plumbing notes.
Changed
- Resolver#apply_conflict no longer creates a duplicate disputed fact + conflict row when the same contradicting value is re-extracted. Looks up disputed facts in the same (subject, predicate) slot and reinforces with provenance instead.
- Resolver no longer treats the distiller's scope_hint as a scope override. scope_hint is advisory metadata; fact.scope must match the DB the row lives in. Earlier behavior caused scope leakage where global-hinted distillations landed in the project DB.
- Hook::ContextInjector adds emitted_fact_ids and emitted_subjects accessors so benchmark harnesses can attribute injection contributions per session.
- SQLiteStore decomposed via module inclusion: LLMCache and MetricsAggregator extracted into lib/claude_memory/store/. SQLiteStore back under 600 LOC.
- Dashboard::API decomposed: FactPresenter, Conflicts, Efficacy::Reporter, Timeline, Health extracted into dedicated classes following the boundary pattern. API now routes/delegates rather than aggregating.
- Dashboard releases DB connections after each HTTP request (was holding connections open for the lifetime of the WEBrick session).
- Sweep::Maintenance gains dedupe_open_conflicts and reclassify_references for the one-shot CLI commands above.
- Round-trip migration specs from v12, v13, v14 → v17 (per-version migrations covered by spec/claude_memory/store/migrations/). Codifies the release-blocker convention: any schema bump must round-trip from each prior major-release boundary back ~3 releases.
Fixed
- Dashboard surfaces an actionable hint when Recall hits FTS5 corruption (run claude-memory compact) rather than a generic error.
- Dashboard query tester unwraps the nested Recall result shape rather than printing the raw envelope.
- Dashboard health checks correctly detect the claude-memory hook installation across the two-level Claude Code hooks structure (was reporting false negatives when hooks were installed under a matcher block).
- Dashboard Efficacy "this session" correlation falls back to a time window when the recall event has no session_id (MCP tool calls don't thread session_id).
- Bulk-reject in the Conflicts modal now retries with an actionable message when the server-side state is stale.
Upgrade Notes
Schema bump v14 → v17. Three migrations run automatically on first launch after upgrade. All three are additive (no existing data is rewritten):
- Migration 015 creates activity_events (hook/recall telemetry).
- Migration 016 creates moment_feedback (dashboard verdicts).
- Migration 017 adds facts.last_recalled_at (NULL by default; Sweep::RecallTimestampRefresher populates it on the next sweep cycle from existing activity_events).
The migration delta has round-trip spec coverage in spec/claude_memory/store/migrations/. Forward-compatibility: 0.10.0 databases cannot be opened by 0.9.x or earlier. Downgrade is destructive — back up ~/.claude/memory.sqlite3 and .claude/memory.sqlite3 before downgrading.
Optional historical cleanups. Two new admin commands address data tails left by earlier bugs that have since been fixed at the source:
claude-memory dedupe-conflicts --dry-run # preview duplicate conflict rows
claude-memory dedupe-conflicts # consolidate them
claude-memory reclassify-references --dry-run # preview reference-material mislabels
claude-memory reclassify-references # retag them
Both are opt-in. Neither runs in the regular sweep cycle. Use --scope global to clean the global DB.
Telemetry footprint. The activity_events table grows with hook activity. The dashboard surfaces this by default and powers the timeline/moments/efficacy panels. Retention pruning is not yet automatic (planned for a follow-up); manual cleanup via DELETE FROM activity_events WHERE occurred_at < ? is safe — the dashboard tolerates missing history.
v0.9.1 — MCP JSON-RPC notifications fix
Fixed
- MCP server now conforms to JSON-RPC 2.0: notifications (messages without an id) never receive a response. Previously, notifications/initialized (which Claude Code sends after every handshake) triggered a spurious Method not found error frame, causing strict MCP clients to mark the server failed on /mcp reconnect after the initial connection.
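The JSON-RPC 2.0 rule behind this fix is that a message without an id is a notification and must not be answered, even when the method is unknown. A minimal sketch of a dispatcher that follows the rule; the handler table and function name are hypothetical and this is not the gem's MCP server code:

```ruby
require "json"

# Hypothetical method table for illustration only.
RPC_HANDLERS = {
  "ping" => ->(_params) { {} }
}.freeze

# Returns a JSON response string, or nil when no response may be sent.
def handle_message(raw)
  msg = JSON.parse(raw)
  notification = !msg.key?("id")
  handler = RPC_HANDLERS[msg["method"]]

  if handler.nil?
    # Unknown notifications are dropped silently; unknown requests get -32601.
    return nil if notification

    error = { "code" => -32601, "message" => "Method not found" }
    return JSON.generate({ "jsonrpc" => "2.0", "id" => msg["id"], "error" => error })
  end

  result = handler.call(msg["params"])
  return nil if notification

  JSON.generate({ "jsonrpc" => "2.0", "id" => msg["id"], "result" => result })
end

puts handle_message('{"jsonrpc":"2.0","method":"notifications/initialized"}').inspect
puts handle_message('{"jsonrpc":"2.0","id":1,"method":"ping"}')
puts handle_message('{"jsonrpc":"2.0","id":2,"method":"nope"}')
```

Checking for the presence of the id key (rather than a truthy value) matters because the notification test is about whether the member exists at all.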
v0.9.0 — Predicate Design Overhaul, Reject/Restore, Telemetry
Highlights
Predicate vocabulary overhaul — curated from 13 → 8 predicates based on a multi-project survey of real memory databases. uses_framework reclassified as multi-value (fixing silent data loss in production). PredicatePolicy is now the single source of truth for vocabulary, snapshot sections, synonym canonicalization, and LLM guidance.
New commands: reject and restore — first-class tools for managing distiller quality. Mark hallucinated facts as wrong, or recover facts that were superseded by an obsolete classification.
MCP tool-call telemetry — every tool invocation is timed and recorded. claude-memory stats --tools shows call counts, latency percentiles, and error rates.
Proactive memory recall — MCP instructions now direct Claude to check conventions before code generation, architecture before explanations, and decisions before refactoring. A/B testing showed this produces 76-line accurate architecture explanations vs honest refusals without memory.
Added
- claude-memory reject <id_or_docid> command + memory.reject_fact MCP tool: explicitly mark distiller hallucinations as wrong, closing associated conflicts
- claude-memory restore --predicate NAME command: recover facts superseded by obsolete single-value predicate classifications (Jaccard-based token overlap heuristic)
- MCP tool-call telemetry: mcp_tool_calls table, claude-memory stats --tools [--since DAYS], 90-day retention via Sweep
- CLAUDE_CONFIG_DIR env var support for non-standard Claude Code config locations
- Predicate synonym canonicalization at insert time (has_convention → convention, primary_language → uses_language)
- Novel predicate warnings at insert time
- NullDistiller emits uses_language facts for detected language entities
- YARD documentation across 13 core source files (+473 lines)
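The restore command in the list above is described as using a Jaccard-based token overlap heuristic. The notes do not say how the score is applied, so the following only illustrates the similarity measure itself, with an assumed tokenizer (lowercased alphanumeric runs) and a hypothetical function name:

```ruby
require "set"

# Jaccard similarity between the token sets of two strings:
# |A ∩ B| / |A ∪ B|, defined here as 0.0 when both sets are empty.
def jaccard(a, b)
  tokens_a = a.downcase.scan(/[a-z0-9]+/).to_set
  tokens_b = b.downcase.scan(/[a-z0-9]+/).to_set
  union = tokens_a | tokens_b
  return 0.0 if union.empty?

  (tokens_a & tokens_b).size.to_f / union.size
end

puts jaccard("Rails 7", "rails 7")       # identical token sets
puts jaccard("Ruby on Rails", "rails")   # one shared token out of three
puts jaccard("Turbo", "Tailwind")        # no shared tokens
```

A low overlap between a superseded object and the value that replaced it suggests the two were distinct facts (for example, two different frameworks) rather than a correction, which is the situation the uses_framework recovery targets.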
Changed
- uses_framework reclassified as multi-value: real projects use multiple frameworks (Rails + Turbo + Tailwind). Prior single-value classification silently superseded valid facts. Run claude-memory restore --predicate uses_framework to recover
- PredicatePolicy is single source of truth for vocabulary, snapshot sections, synonym canonicalization, and LLM guidance
- Predicate vocabulary curated 13 → 8 based on multi-project usage data
- Registry::COMMANDS stores {class:, description:} with direct class references
- Plugin and gem descriptions rewritten to be outcome-focused
Fixed
- StatsCommand broken in production: used Sequel.sqlite (requires unlisted sqlite3 gem). Now uses extralite adapter
- Missing embeddings command in shell completion output
Upgrade Notes
Schema: v12 → v14 (automatic). Migration 013 adds mcp_tool_calls. Migration 014 canonicalizes stale predicate names in existing facts.
Action required for uses_framework recovery: If your project uses multiple frameworks, past sessions may have superseded valid facts:
claude-memory restore --predicate uses_framework --dry-run # preview
claude-memory restore --predicate uses_framework # restore
Pruned predicates still work: preference, workflow, dependency, testing_strategy, tool_usage, ci_platform fall through to default multi-value policy. Existing facts are unaffected.
Full Changelog: v0.8.0...v0.9.0
v0.8.0
Added
Three-Layer Distillation Pipeline
- Automatic distillation via NullDistiller in ingest pipeline (Layer 1: regex-based, P95 < 5ms)
- Context hook injection for LLM-based extraction at SessionStart (Layer 2: Claude Code as distiller, zero extra cost)
- /distill-transcripts skill for manual deep extraction (Layer 3: on-demand, depth-aware prompts)
- memory.undistilled and memory.mark_distilled MCP tools for distillation tracking
- Hook::DistillationRunner extracted from Handler for context hook injection
- TaskCompleted and TeammateIdle hook events for ingest triggers
- Distillation metrics backfill on database initialization
- Doctor check for undistilled content
- Pending distillation count in memory.status output
Recall Enhancements
- Intent parameter for recall query disambiguation (#3)
- Retrieval score traces for semantic search (#5)
- Configurable embedding providers with dimension checking
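Dimension checking matters because vectors produced by one provider cannot be compared with vectors from a provider that emits a different length. A minimal sketch of such a check, assuming stored vectors are plain arrays; the error class and method name are hypothetical, not the gem's API.

```ruby
# Hypothetical error type for this sketch.
class DimensionMismatch < StandardError; end

# Returns true when every stored vector matches the provider's dimension,
# otherwise raises with a count of the offending vectors.
def check_dimensions!(stored_vectors, provider_dimensions)
  mismatched = stored_vectors.reject { |vector| vector.length == provider_dimensions }
  return true if mismatched.empty?

  raise DimensionMismatch,
    "#{mismatched.size} stored vector(s) do not match provider dimension #{provider_dimensions}"
end
```

A caller would typically respond to the error by re-indexing with the current provider rather than mixing vector lengths.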
Hook Enhancements
- statusMessage on all hooks for descriptive spinner text during hook execution
- StopFailure hook to capture transcript data even on session errors (rate limits, server errors)
- Notification hook with idle_prompt matcher for opportunistic sweep during idle
New Commands & Skills
- install-skill command and memory-recall agent (#8, #12)
- Shell completion command for bash and zsh (#18)
Distillation Benchmark Results
- NullDistiller: Concept Recall 0.952, Fact Precision/Recall 1.000 (31 test cases)
- Claude Code LLM: Concept Recall 0.902 (all 41 cases), 0.900 on semantic cases (vs 0.333 for regex)
- Average 1.6 facts stored per case across LLM extraction
- E2E distillation recall benchmark and extraction quality benchmarks
- Concept-based matching for distiller-agnostic benchmark comparison
Fixed
- --allowedTools added to ClaudeCliRunner for MCP tool permissions
- Test isolation for context hook when global database has facts
Internal
- Extracted RetryHandler and SchemaManager modules from SQLiteStore
- Extracted Recall into engine strategy pattern with DualEngine, LegacyEngine, and shared QueryCore
- Extracted Tools god object into 6 handler modules
- Added 36 specs for 5 previously untested files
- All 3 god objects eliminated, 0 files over 500 lines
v0.7.1
Added
Three-Level Sweep Escalation
- Maintenance class with light/standard/deep sweep levels for progressive database maintenance
- Exposed sweep escalation via memory.sweep_now MCP tool with configurable level
- Tool escalation workflow added to MCP QueryGuide documentation
Embedding Deduplication
- Content-addressed deduplication for embeddings using SHA256 hashing
- Deduplication before vector scoring in fallback path to prevent duplicate results
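Content-addressed deduplication keys each embedding by a hash of its source text, so identical text is embedded once. The sketch below shows the idea with an in-memory cache; the class name, the block-based embedder, and the content_hash field are assumptions for illustration, not the gem's storage layout.

```ruby
require "digest"

# Hypothetical in-memory cache keyed by the SHA256 of the text.
class EmbeddingCacheSketch
  attr_reader :computed

  def initialize(&embedder)
    @embedder = embedder
    @by_hash = {}
    @computed = 0 # how many times the embedder actually ran
  end

  def embed(text)
    key = Digest::SHA256.hexdigest(text)
    return @by_hash[key] if @by_hash.key?(key)

    @computed += 1
    @by_hash[key] = @embedder.call(text)
  end
end

# Drop candidates with the same content hash before scoring, keeping the first.
def dedupe_candidates(candidates)
  candidates.uniq { |candidate| candidate[:content_hash] }
end
```

Deduplicating before scoring also avoids returning the same content twice when several rows share one hash.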
MCP Enhancements
- Structured error classification for MCP tools via ErrorClassifier module
- Dynamic knowledge summary in MCP server instructions via InstructionsBuilder
Fixed
- Plugin hook loading error: Removed explicit hooks reference from plugin.json manifest. Claude Code auto-loads hooks/hooks.json from the plugin root, so declaring it caused "Duplicate hooks file detected" errors on plugin install
Internal
- Influence study: lossless-claw v0.3.0 DAG-based lossless context management
- Marked 7 improvements as implemented (#10, #11, #14, #15, #16, #19, #20)
v0.7.0
ClaudeMemory v0.7.0
Added
FTS5 Contentless Mode
- FTS5 tables now created with content='' for ~40% smaller databases
- Auto-detection: both legacy and contentless formats work seamlessly
- compact command rebuilds FTS index to contentless format
- stats command reports FTS format and optimization hints
Worktree-Aware Project Paths
- Project database now resolves to main repository root across git worktrees
- Prevents duplicate project databases when using git worktree
- Opt-out: set CLAUDE_MEMORY_ISOLATE_WORKTREES=1 for per-worktree isolation
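One way to resolve a linked worktree back to its main repository is to read the worktree's .git file, which git writes as a "gitdir: <main>/.git/worktrees/<name>" pointer. The sketch below takes that approach and honors the opt-out variable. It is an assumption about how such resolution could work, not the gem's code: the function name is hypothetical, and it only handles absolute gitdir paths in standard, non-bare repositories.

```ruby
# Hypothetical resolver: returns the directory whose project database
# should be used for the given checkout.
def main_repo_root(dir)
  # Opt-out from the release notes: keep each worktree isolated.
  return dir if ENV["CLAUDE_MEMORY_ISOLATE_WORKTREES"] == "1"

  dot_git = File.join(dir, ".git")
  # In a main checkout .git is a directory; in a linked worktree it is a file.
  return dir unless File.file?(dot_git)

  gitdir = File.read(dot_git)[/\Agitdir: (.+)$/, 1]
  return dir unless gitdir

  gitdir = gitdir.strip
  common = gitdir.sub(%r{/worktrees/[^/]+/?\z}, "")
  # No /worktrees/ segment (for example a submodule): leave the path alone.
  return dir if common == gitdir

  File.dirname(common) # parent of the main repository's .git directory
end
```

Shelling out to git rev-parse --git-common-dir is an alternative that delegates the edge cases to git itself.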
MCP Enhancements
- Tool annotations: readOnlyHint, idempotentHint, destructiveHint on all 21 tools
- Stdout protection: MCP server redirects $stdout to $stderr to prevent protocol corruption from accidental puts/print calls
- Self-excluding agent conversations via SELF_CONTEXT_MARKER to prevent meta-pollution
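The stdout protection described above can be sketched in a few lines of Ruby: keep a private handle to the real stdout for protocol messages, then point the global $stdout at $stderr so stray puts or print calls land on stderr instead. The method name is hypothetical and the gem's server may structure this differently.

```ruby
require "json"
require "stringio"

# Returns the original stdout handle for protocol writes and redirects the
# global $stdout so accidental output goes to stderr.
def protect_stdout
  protocol_out = $stdout
  $stdout = $stderr
  protocol_out
end

# Usage sketch: only explicit writes to the returned handle reach the client.
def demo_protocol_write(protocol_out)
  puts "debug noise" # now goes to stderr
  protocol_out.puts(JSON.generate({jsonrpc: "2.0", id: 1, result: {}}))
end
```

Because an stdio MCP server exchanges JSON-RPC messages over stdout, any unrelated line written there can break the client's parser, which is why the redirect happens before any tool code runs.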
New Commands
- git-lfs command for setting up git-lfs tracking of project memory databases
Fixed
- Narrowed rescue clauses in discover_other_projects (was bare rescue, now catches specific exceptions)
- FTS entries now cleaned up when content is pruned by sweeper (prevents orphaned index entries)
- FTS index rebuilt during compact for consistent state after upgrades
- Real evals CI: install gem and use correct release API
Internal
- Resolver refactored for better thread safety (parameters instead of instance variables)
- SnippetExtractor DRY refactoring
- StoreManager.promote_fact single-transaction safety
- Influence study: QMD v2.0.1 SDK-first architecture analysis
22 CLI commands · 21 MCP tools · 1,435 tests
v0.6.0
What's New
Native Vector Storage (sqlite-vec)
- Integrated sqlite-vec for native KNN vector search
- VectorIndex class with vec0 virtual table for cosine similarity search
- Dual-write: embeddings stored in both JSON column and vec0 index
- claude-memory index --vec flag for backfilling existing embeddings into vec0
- Fast path in Recall uses sqlite-vec KNN when available, falls back to JSON + Ruby
- Sweeper cleans up vec0 entries for superseded/expired facts
- Doctor and MCP status/stats report vec0 availability and coverage
- Cross-platform support with platform-specific gem installation
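The "JSON + Ruby" fallback mentioned above amounts to decoding each stored embedding and scoring it against the query in plain Ruby. A minimal sketch, assuming rows are hashes with an id and a JSON-encoded embedding; the row shape and method names are hypothetical.

```ruby
require "json"

# Cosine similarity of two equal-length numeric arrays; 0.0 for zero vectors.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Brute-force top-k: score every row, sort descending, keep the first k.
# Returns [id, score] pairs.
def top_k(rows, query, k)
  rows
    .map { |row| [row[:id], cosine_similarity(JSON.parse(row[:embedding_json]), query)] }
    .sort_by { |_, score| -score }
    .first(k)
end
```

This scans every row, so its cost grows linearly with the number of stored embeddings, which is the motivation for preferring the vec0 KNN path when the extension is available.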
Database Maintenance
- compact command for database maintenance (VACUUM + integrity check)
- export command for fact backup and migration to JSON
Hook Enhancements
- SessionStart context injection via hookSpecificOutput.additionalContext
  - Injects recent facts and project context at session start
- Tool-specific observation compression for reduced token usage
- --async flag for non-blocking hook execution
- Hook error classification for graceful degradation
- Conversation exclusion markers for session-level opt-out
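For readers unfamiliar with SessionStart context injection, the hook communicates by printing a JSON object whose hookSpecificOutput.additionalContext field carries the text to inject. The sketch below builds such a payload. The hookEventName field reflects my understanding of Claude Code's hook output format and should be checked against its documentation; the method name and the bullet formatting of facts are assumptions.

```ruby
require "json"

# Build the JSON a SessionStart hook would print to stdout.
# facts: array of short strings to surface at session start.
def session_start_output(facts)
  context = facts.map { |fact| "- #{fact}" }.join("\n")
  JSON.generate(
    hookSpecificOutput: {
      hookEventName: "SessionStart", # assumed field, verify against the hooks docs
      additionalContext: context
    }
  )
end
```

A hook script would print the returned string and exit successfully, leaving it to Claude Code to add the context to the new session.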
MCP Discovery
- memory.list_projects MCP tool for discovering all project databases
Developer Experience
- Dynamic MCP server instructions with progressive disclosure documentation
- Comparative benchmark suite with QMD and grepai adapters
Bug Fixes
- Recall returned no results: DualQueryTemplate accessed stores before initializing them, causing all recall queries to silently return empty results. Refactored to use existing store_for_scope method.
- Doctor crashed on sqlite-vec tables: SchemaValidator iterated all tables including vec0 virtual tables, which require the sqlite-vec extension. Now skips facts_vec* tables using prefix match.
- Forward-migrated databases: Older gem versions now gracefully handle databases migrated by newer versions instead of crashing.
- Hybrid retrieval ordering: Preserved BM25 scores and RRF ordering in hybrid search results instead of re-sorting by source/time.
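Reciprocal rank fusion (RRF), referenced in the last fix, merges several ranked lists by summing 1 / (k + rank) for each item across lists. The sketch below shows the standard formula with the conventional k = 60; the method name and the use of bare ids are assumptions, and the gem's hybrid search may weight or combine scores differently.

```ruby
# rankings: array of ranked id lists, best first (for example BM25 and vector).
# Returns ids ordered by fused score, highest first.
def rrf(rankings, k: 60)
  scores = Hash.new(0.0)
  rankings.each do |ranked_ids|
    ranked_ids.each_with_index do |id, index|
      scores[id] += 1.0 / (k + index + 1) # ranks are 1-based
    end
  end
  scores.sort_by { |_, score| -score }.map(&:first)
end
```

Re-sorting the fused list by source or timestamp afterwards would discard exactly the ordering this computes, which is the regression the fix describes.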
Stats
- 21 MCP tools, 22 CLI commands
- 1316 test examples, 0 failures
- Full changelog: CHANGELOG.md