Releases: codenamev/claude_memory

v0.12.1 — Upgrade-Experience Patches (setup-vectors, doctor EmbeddingsCheck, plugin manifest fix)

05 Jun 15:36
Immutable release. Only release title and notes can be modified.

Theme: Upgrade-experience patches surfaced by the 0.12.0 soak. Four small but high-impact fixes — all uncovered by one user upgrading a single project — closing visibility gaps in the doctor and the plugin manifest. No schema changes, no breaking changes.

Added

  • claude-memory setup-vectors command — the documented opt-in path for end users who want vector recall via the BAAI/bge-small-en-v1.5 model. fastembed remains a dev/test gem dependency by design (the default install stays light); this command verifies the chosen provider is loadable (gracefully prompts to gem install fastembed if not), writes CLAUDE_MEMORY_EMBEDDING_PROVIDER (and optional CLAUDE_MEMORY_EMBEDDING_MODEL) to the project's .claude/settings.json env block — the same mechanism Claude Code uses for OTel — and re-indexes existing facts via the existing IndexCommand (skip with --no-reindex). Supports --status for current config and --dry-run for inspection. Preserves unrelated settings.json keys.
  • Checks::EmbeddingsCheck in claude-memory doctor — surfaces the active embedding provider name and dimensions, hints to set CLAUDE_MEMORY_EMBEDDING_PROVIDER=fastembed when on tfidf default and fastembed is loadable, and reports dimension mismatches between stored vectors and the current provider. Closes the visibility gap where a user could see sqlite-vec available ✓ while silently running on tfidf without knowing.
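
The settings.json update described for setup-vectors can be pictured as a merge into the env block that leaves every other key alone. This is a minimal sketch, not the gem's implementation: `merge_env` and the `someOtherSetting` key are hypothetical, and only the environment variable name comes from the notes above.

```ruby
require "json"

# Hypothetical helper (not the gem's code): merge new environment variables
# into the "env" block of a settings.json document, keeping all other keys.
def merge_env(settings_json, new_env)
  settings = settings_json.to_s.strip.empty? ? {} : JSON.parse(settings_json)
  settings["env"] = settings.fetch("env", {}).merge(new_env)
  JSON.pretty_generate(settings)
end

# Illustrative input: one unrelated key plus an existing env entry.
existing = '{"someOtherSetting": true, "env": {"OTHER_VAR": "1"}}'
updated = merge_env(existing, { "CLAUDE_MEMORY_EMBEDDING_PROVIDER" => "fastembed" })
puts updated
```

Parsing and re-serializing the whole document, rather than patching text, is what makes it straightforward to preserve unrelated keys.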

Fixed

  • plugin.json declared skills: "./skills/" and outputStyles: "./output-styles/" pointing at non-existent directories. Per Claude Code's plugin reference, distill-transcripts.md is correctly a flat command (not a skill); both forms register as /<name> slash commands. Dead keys removed. Plugin spec rewritten as deletion-safe ("every directory key in plugin.json points at an existing directory") so this can't regress.
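
The deletion-safe rule quoted above ("every directory key in plugin.json points at an existing directory") can be sketched as a small check. This is an illustration under stated assumptions: `missing_directories` is a hypothetical helper, the manifest shown is made up, and treating any string value ending in "/" as a directory key is a heuristic, not the project's actual spec.

```ruby
require "json"
require "tmpdir"
require "fileutils"

# Hypothetical check: return the manifest keys whose value looks like a
# directory path (a string ending in "/") but does not exist under root.
def missing_directories(manifest, root)
  manifest
    .select { |_key, value| value.is_a?(String) && value.end_with?("/") }
    .reject { |_key, value| Dir.exist?(File.expand_path(value, root)) }
    .keys
end

Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "commands"))
  manifest = JSON.parse('{"name": "demo", "commands": "./commands/", "skills": "./skills/"}')
  p missing_directories(manifest, root) # "skills" is the only dead key here
end
```

Because the check iterates over whatever keys are present, removing a key from the manifest never breaks it, which is the sense in which such a spec is deletion-safe.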

Documentation

  • README "Upgrading" section now documents the marketplace-refresh + /reload-plugins flow explicitly. After /plugin marketplace update <name> users must run /reload-plugins or restart Claude Code for new slash commands to appear — this bit one user upgrading to 0.12.0 looking for /distill-transcripts. Includes /audit-memory and /distill-transcripts as named examples.

Upgrade Notes

  • No DB migrations. Schema stays at v18.
  • After gem update claude_memory, run /plugin marketplace update claude-memory && /reload-plugins (or restart Claude Code) to see the new /distill-transcripts and /audit-memory slash commands.
  • Existing fact bases continue to use whatever embedding provider they were indexed under. To opt into fastembed, run claude-memory setup-vectors — it handles provider switching + re-index in one step.
  • claude-memory doctor will now emit a warning on tfidf default with fastembed loadable. This is informational, not an error; the system continues to function on tfidf.

🧪 Real Eval Validation

Results: 0/6 passed ⚠️ 6 failed
Duration: 68.54s
Estimated Cost: ~$0.12

⚠️ Some real eval tests failed. Check the workflow logs for details.

v0.12.0 — Release Discipline, Observability, Self-Audit

01 Jun 13:17

Theme: Release Discipline + Observability + Self-Audit — the infrastructure that makes a 1.0 semver promise defensible. This release locks down the public API surface, adds the observability primitives (OTel ingestion, dashboard Telemetry) and the self-audit toolkit (claude-memory audit) that serve the visibility pillar, and ships the negative-fact harm benchmark + staleness guard that make the long-horizon-quality claim measurable rather than aspirational.

Added

  • Staleness guard for single-value facts — single-value predicates (uses_database / deployment_platform / auth_method) are exclusive claims Claude follows authoritatively, so a stale one is the most dangerous kind of memory. The 0.12 harm benchmark caught Claude emitting git push heroku HEAD:main from a stale deployment_platform fact with zero hedge — and supersession only protects against this if the replacement was recorded. New Recall::StalenessAnnotator (pure function) flags single-value facts that are old (valid_from/created_at older than injection_stale_days, default 180) AND not recently confirmed (last_recalled_at null or stale); Hook::ContextInjector appends a ⚠ stale: recorded YYYY-MM-DD … verify before relying marker at SessionStart so Claude can hedge or verify instead of blindly following. Multi-value predicates are never annotated (they accumulate; one stale entry isn't authoritative). New Configuration#injection_stale_days (CLAUDE_MEMORY_INJECTION_STALE_DAYS), deliberately much longer than the 14-day dashboard review window. Serves the 1.0 long-horizon-quality pillar — it's the first defense against memory degrading session quality over months.
  • Negative-fact harm benchmark — full 13-scenario corpus + release gate — expands the 0.11 3-scenario prototype to 13 cases across four harm classes (stale_tech, mismatched_scope, superseded_undetected, and the new reference_material_as_fact). Each scenario ships a project_files scaffold whose current state contradicts the wrong memory fact, so the test measures "does Claude follow stale/wrong memory over the project's actual state?" rather than reacting to an empty directory. Scored best-of-N (default 3 runs, majority vote per scenario via HARM_BENCH_RUNS) to absorb single-shot LLM nondeterminism. HARM_RATE_THRESHOLD (default 1%) fails the run if the majority-harmed scenario rate is exceeded — making "memory doesn't make Claude wrong" a measurable release gate rather than a marketing claim. The first full-corpus real-mode run surfaced a real harm (stale deployment fact) and a harness confound (empty-tmpdir noise), which drove both the staleness guard above and the scaffold + best-of-N harness hardening.
  • claude-memory audit — memory health diagnostic — productionizes the 2026-05-21 contamination audit into a stable diagnostic surface anyone using claude_memory can run on their own setup. Ten contract checks (C001-C010) cover open conflicts, single-cardinality multiplicity, distillation backlog, shortcut-leak detection, duplicate global conventions, bare-conclusion rate, project starvation, auto-memory import gaps, and single-cardinality churn. --json is the stable contract for CI; --severity filters; --no-exit always exits 0. The /audit-memory slash command wraps the same runner for an interactive walkthrough. docs/audit_runbook.md documents each check's rationale and remediation. CHECK_METHODS is append-only by design so JSON consumers don't break when new checks land. New claude-memory import-auto-memory retroactively pulls ~/.claude/projects/<slug>/memory/*.md entries that AutoMemoryMirror previously missed (slug bug: tr("/", "-") left underscores intact, so claude_memory paths never matched). Contributes to the visibility pillar of 1.0.
  • Contamination guardrails — ReferenceMaterialDetector example-quote guard + Resolver :discard path — the distiller used to treat example sentences in docs/CLAUDE.md ("e.g., postgres", "for example, mysql") as literal claims about the project, accumulating 103 rejected single-cardinality facts over six weeks before being caught by the 2026-05-21 audit. Two defenses now: (1) ReferenceMaterialDetector flags single-cardinality predicate extractions whose source text contains e.g., / for example / i.e. quote patterns so they're tagged reference material at write time; (2) Resolver gains a :discard resolution path for the same shape so the fact never lands even if the detector misses. Memory shortcuts (memory.decisions / .conventions / .architecture) refactored from FTS text search (which returned facts whose object matched the predicate keyword) to predicate-based filtering via PredicatePolicy, with project-DB precedence over global. Closes a class of "is memory still trustworthy?" bugs that erode the 1.0 stability claim.
  • OpenTelemetry ingestion + dashboard Telemetry tab — Claude Code can now export metrics, log-style events, and (opt-in) traces straight into the dashboard via OTLP/HTTP/JSON. New claude-memory otel CLI manages the env block in .claude/settings.json (--enable, --disable, --enable-traces, --capture-prompts, --status, --verify); the dashboard exposes /v1/metrics, /v1/logs, /v1/traces on 127.0.0.1:3377 and a new "Telemetry" drawer showing cost per hour, tokens by model, top tools by latency, and a per-prompt journey waterfall that UNIONs otel_events with the existing activity_events. Schema v18 adds otel_metrics/otel_events/otel_traces plus an additive prompt_id column on activity_events for journey correlation. Privacy posture: nothing past metric counts is captured by default; OTEL_LOG_USER_PROMPTS only flips on with explicit --capture-prompts confirmation; traces remain 501-gated until the user opts in. Sweep retention defaults: 30 days metrics, 14 days events, 7 days traces.
  • Pre-release hook smoke gate (bin/pre-release-smoke) — verifies the installed claude-memory gem actually fires hooks correctly and populates expected detail_json fields per spec/smoke/expected_fields.yml. Codifies the verification convention from feedback_hooks_run_installed_gem.md into a machine-enforced release gate. The trap has been sprung twice (2026-04-16 ActivityLog, 2026-04-30 #47 token-budget); the gate exists so it can't be sprung a third time. Wired into the /release skill as Phase 1 Step 6 (after specs, before lint). First 0.12.0 milestone item.
  • /study-repo memory-discipline guard (prompt-only): a top-level "CRITICAL: Memory Discipline" section in .claude/skills/study-repo/SKILL.md explicitly forbids the LLM from extracting external projects' tech stacks as project-level facts. This addresses the root cause of the cleanup that claude-memory reject had to do during 0.11 (a 27-fact misattribution cluster on 2026-04-23/24; see the quality_review.md 2026-04-30 cause-4 finding). A defense-in-depth detector is deferred to 0.12.x or later and will be built only if measurement shows persistent leakage.
  • API stability audit (docs/api_stability.md) — authoritative public-API contract enumerating which CLI commands, MCP tools, hook events, Ruby classes, and schema surfaces are stable / experimental / internal. Default-to-internal applied throughout; the doc is the source of truth for what 1.0's semver promise will lock down. New ClaudeMemory::Deprecations.warn(name:, replacement:, removed_in:) module wired into PredicatePolicy.canonicalize as the first soft-rename — has_convention and primary_language synonyms now emit deprecation warnings scheduled for removal in 1.0.0. README + CLAUDE.md link to the new doc; suppress noise via CLAUDE_MEMORY_NO_DEPRECATIONS=1.
  • Release-to-release benchmark scoreboard: bin/run-evals now writes spec/benchmarks/results/<version>.json after each run; the new bin/bench-diff compares the current scoreboard against that of the most recent prior tagged version and exits non-zero if any tracked pass rate dropped beyond the threshold (default -5%, configurable via --threshold). Wired into the /release skill Phase 1 as Step 7, so the release aborts on regressions before publish. The first release with this gate is 0.12.0 itself; from 0.13.0 onward bench-diff actively gates against 0.12 baselines.
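
The best-of-N scoring and release gate described for the harm benchmark can be sketched as follows. This is a hedged illustration, not the harness code: the method names and scenario labels are hypothetical, and the only values taken from the notes are the majority-vote rule, three runs per scenario, and the 1% default threshold.

```ruby
# Hypothetical scoring sketch. Each scenario maps to N boolean run outcomes,
# where true means Claude followed the wrong memory fact in that run.
def majority_harmed?(runs)
  runs.count(true) > runs.size / 2.0
end

# Fraction of scenarios that were harmed in a majority of their runs.
def harm_rate(results)
  return 0.0 if results.empty?

  harmed = results.count { |_scenario, runs| majority_harmed?(runs) }
  harmed.to_f / results.size
end

# The gate fails only when the harm rate exceeds the threshold.
def gate_passes?(results, threshold: 0.01)
  harm_rate(results) <= threshold
end

results = {
  "stale_tech_example" => [true, true, false],          # harmed in 2 of 3 runs
  "mismatched_scope_example" => [false, true, false],   # a single harmed run is outvoted
  "superseded_undetected_example" => [false, false, false]
}

puts harm_rate(results)
puts gate_passes?(results)
```

Majority voting means one nondeterministic bad run does not mark a scenario as harmed, while a scenario that fails most of its runs still counts in full.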

Deferred to 0.13

  • CLAUDE.md comparative baseline numbers (#4) — the comparative E2E harness compares static CLAUDE.md (auto-loaded into context) against ClaudeMemory's MCP-tool retrieval, but in headless claude -p mode Claude doesn't proactively call the recall tools, so the comparison doesn't yet exercise ClaudeMemory's retrieval path fairly (first run returned a misleading ClaudeMemory 0/10 = no-memory 0/10 vs CLAUDE.md 8/10). Publishing that would mislead, so the numbers are withheld and the harness fix is tracked for 0.13. This surfaced a genuine separable observation — in fully headless, non-tool-forcing usage, ClaudeMemory's contribution rides entirely on the SessionStart context-hook injection — also tracked for 0.13. See docs/1_0_punchlist.md #4 / #16.

Upgrade Notes

  • Schema migrates automatically to v18 (OTel telemetry tables + prompt_id on activity_events) on first DB open via Sequel::Migrator — no manual step. Round-trip migration specs cover the upgrade path from prior release boundaries.
  • The staleness marker now appears in SessionStart context for single-value facts (uses_database / deployment_platform / auth_method) older than 180 days and not recently recalled. This is additive and advisory (a ⚠ stale … verify before relying note). Tune the window with CLAUDE_MEMORY_INJECTION_STALE_DAYS; the existing CLAUDE_MEMORY_STALE_DAYS (dashboard review window) is unchanged.
  • No breaking API changes. has_convention / primary_language predicate synonyms continue to emit deprecation warnings (scheduled for removal in 1.0.0); suppress via CLAUDE_MEMORY_NO_DEPRECATIONS=1.
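
The staleness rule in these notes (old AND not recently confirmed, single-value predicates only) can be sketched as a pure function. This is not the gem's Recall::StalenessAnnotator: the `stale?` name and the fact Hash shape are hypothetical, the predicate list contains only the three examples named above, and reusing the same 180-day window for "recently recalled" is an assumption.

```ruby
require "date"

# Single-value predicates named in the release notes (illustrative list).
SINGLE_VALUE_PREDICATES = %w[uses_database deployment_platform auth_method].freeze

# Hypothetical pure function. A fact is a Hash with :predicate,
# :recorded_on (Date) and :last_recalled_on (Date or nil).
def stale?(fact, today:, stale_days: 180)
  return false unless SINGLE_VALUE_PREDICATES.include?(fact[:predicate])

  cutoff = today - stale_days
  old = fact[:recorded_on] < cutoff
  # Assumption: a recall counts as "recent" if it falls inside the same window.
  unconfirmed = fact[:last_recalled_on].nil? || fact[:last_recalled_on] < cutoff
  old && unconfirmed
end

today = Date.new(2026, 6, 1)
heroku = { predicate: "deployment_platform", recorded_on: Date.new(2025, 1, 1), last_recalled_on: nil }
puts stale?(heroku, today: today) # old and never confirmed
```

Passing `today` in explicitly keeps the function free of clock access, which is what makes it easy to test.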

🧪 Real Eval Validation

Results: 2/6 passed ⚠️ 4 fai...

v0.11.0 — Trust & Cost: Token Budget, Quality Score, ROI Nudge, Show, Harm Prototype

30 Apr 21:37

Theme: Trust & Cost — five user-visible signals that answer "is memory still worth it?" with numbers a skeptical user can read in <30 seconds.

Added

  • Token budget telemetry — every successful SessionStart context injection now records an estimated context_tokens count on its activity_events row. Surfaced three ways:
    • Dashboard Trust panel emits a token_budget block with p50/p95/avg/sample_size over the last 30 days, so the JSON dashboard endpoint and any downstream consumer answer "what does memory cost per session?"
    • claude-memory digest includes a "Context cost" subsection between activity and new-knowledge so the weekly report shows the price tag next to the value.
    • claude-memory stats --tokens [--since DAYS] reports total sessions, p50/p95/avg/min/max, and a histogram across <500 / 500-1k / 1-2k / 2-5k / 5k+ buckets.
  • Purely additive: no schema migration. Historical events written before this release simply contribute zero samples until new injections accumulate.
  • First 0.11.0 milestone item from the 1.0 punchlist (Trust & Cost). Closes the "what % of my SessionStart token budget does memory consume?" gap.
  • Hallucination rate metric — the dashboard now quantifies how clean the fact base is, not just how full it is. Distill::BareConclusionDetector is the production-side mirror of the SessionStart prompt's reason-clause requirement (decision/convention facts must embed "because…" / "so that…" / "to avoid…"). Surfaced two ways:
    • Dashboard Trust panel emits a quality_score block aggregating across project + global active facts: suspect_count (predicate=reference, retagged by ReferenceMaterialDetector), bare_conclusion_count, percentages, and an overall 0–100 score (higher = cleaner). Returns 100 on empty stores so fresh installs aren't penalized.
    • claude-memory digest includes a "Quality" section showing the score breakdown plus the in-window rejection rate ("of facts created in the last 7 days, X% have been rejected since"), so calibration drift is visible.
  • Second 0.11.0 milestone item. Pairs with token-budget telemetry to answer "is memory still worth its cost?" via two skeptic-friendly numbers.
  • claude-memory show — new CLI command prints what memory would inject at the next SessionStart in plain Markdown. Runs the exact Hook::ContextInjector path real sessions use, so output matches what Claude actually receives. Footer reports fact count, ~token estimate, and char count so users see the SessionStart cost at a glance.
    • Default suppresses the raw-transcript "Pending Knowledge Extraction" dump (intended for LLM distillation, not human reading); pass --pending to include it.
    • --source SOURCE (startup/resume/clear) simulates each fresh-session entrypoint so users can preview which sections would appear.
  • Third 0.11.0 milestone item. Closes the inspectability gap — trust requires being able to see what memory will inject, the same way cat CLAUDE.md works.
  • First-week ROI nudge — at SessionEnd, memory now prints memory contributed N facts this session, %used = X for the first 10 sessions, then quiets. New users get user-visible proof memory is doing work for them without having to know about the dashboard. Once trust is established (or it isn't), the nudge gets out of the way.
    • New claude-memory hook nudge subcommand + Hook::Handler#nudge. SessionEnd config now wires [ingest, sweep, nudge] in order.
    • Silent on CLAUDE_MEMORY_NO_NUDGE=1 opt-out, missing session_id, n=0 contributions, and after MAX_NUDGES emissions. The empty-session silent path doesn't burn a slot — quiet sessions don't count toward the 10.
    • Activity event roi_nudge records {n, used, pct, prior_count} per emission so a future migration could change the threshold without re-counting from raw events.
  • Fourth 0.11.0 milestone item. Cold-start trust signal that pairs with #47 (token cost) and #48 (quality) to make the first-week answer to "is this worth it?" visible without effort.
  • Harm benchmark prototype: spec/benchmarks/dataset/harm_scenarios.yml + spec/benchmarks/e2e/harm_bench_spec.rb. Three hand-written cases spanning the riskiest harm classes (stale_tech, mismatched_scope, superseded_undetected). The first ClaudeMemory benchmark that measures whether memory can make Claude wrong; every other benchmark only measures whether memory helps.
    • Structure validation (regex compile, fact loadability, harm-class coverage) runs in stub mode as part of :benchmark tag.
    • Real-mode runner: EVAL_MODE=real bundle exec rspec spec/benchmarks/e2e/harm_bench_spec.rb — needs claude CLI on PATH, ~$2-8 per run. Reports harm rate; doesn't enforce a threshold yet (that's the 0.12 release gate).
  • 0.11.0 de-risking item. If even one of these three scenarios surfaces a harm now, the full 10-15-case benchmark planned for 0.12 will likely reveal a fundamental issue, and it is better to learn that at 0.11 than at 0.12. The real-mode prototype run on 2026-04-30 reported 0/3 harm, a green light to expand to the full corpus in 0.12.
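
The token-budget summary (percentiles plus the five histogram buckets listed above) can be sketched like this. It is an illustration only: the helper names are hypothetical, the bucket edges are assumed to be half-open (a 1,000-token session lands in "1-2k"), and nearest-rank is an assumed percentile method since the notes do not name one.

```ruby
# Bucket labels from the notes; the half-open edges are an assumption.
BUCKETS = {
  "<500" => 0...500,
  "500-1k" => 500...1_000,
  "1-2k" => 1_000...2_000,
  "2-5k" => 2_000...5_000,
  "5k+" => 5_000..Float::INFINITY
}.freeze

# Count non-negative token samples per bucket.
def histogram(samples)
  counts = BUCKETS.keys.to_h { |label| [label, 0] }
  samples.each do |tokens|
    label, _range = BUCKETS.find { |_label, range| range.cover?(tokens) }
    counts[label] += 1
  end
  counts
end

# Nearest-rank percentile (assumed method); returns nil for an empty sample.
def percentile(samples, pct)
  return nil if samples.empty?

  sorted = samples.sort
  rank = (pct / 100.0 * sorted.size).ceil
  sorted[[rank, 1].max - 1]
end

samples = [120, 480, 900, 1_500, 2_500, 7_000]
p histogram(samples)
p percentile(samples, 50) # => 900
p percentile(samples, 95) # => 7000
```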

Changed

  • Hallucination-rate metric calibration: Dashboard::Trust#quality_score now reports a windowed (last 30d) "live" score as the headline plus a "historical" block over all active facts. Production verification on 2026-04-30 (recorded in docs/quality_review.md) showed the unwindowed metric was technically correct but pragmatically misleading: 97% of bare-conclusion facts pre-dated the 2026-04-20 reason-clause prompt commit, and the entire 7-day rejection cluster was a single-class systemic failure (a /study-repo burst), not ongoing noise. The split makes the metric actionable: live score = ongoing extraction quality, historical = legacy data. The digest's "Quality" section uses the live score as the headline.

Fixed

  • Real-eval CLI runner now passes allowed_tools through explicitly so the harm benchmark and other real-mode benches can pre-allow MCP memory tools without per-test wiring.

Upgrade Notes

  • No schema migration. All new features are purely additive.
  • Hooks run the installed gem from PATH, not the working tree. After upgrading, bundle exec rake install (or gem install claude_memory) is required for the new SessionEnd nudge, claude-memory show command, --tokens stats flag, and context_tokens activity-event field to actually fire on real hook events.
  • Existing quality_score consumers will see additional fields (window_days, historical) in the snapshot. The original keys (score, total_active, suspect_count, bare_conclusion_count, suspect_pct, bare_pct) remain at the top level and now reflect the 30-day live window — historical numbers move to the historical sub-hash.

🧪 Real Eval Validation

Results: 4/6 passed ⚠️ 2 failed
Duration: 73.33s
Estimated Cost: ~$0.12

⚠️ Some real eval tests failed. Check the workflow logs for details.

v0.10.0 — Dashboard, Observability, Memory Quality

28 Apr 19:59

Added

Dashboard — feed-first redesign with observability built in

  • New feed-first dashboard UI with scope-aware moments, fact detail modal, query tester, and activity drilldown. Reuse, Trust, Knowledge, Conflicts, and Moments panels each backed by a dedicated module (Dashboard::{Reuse, Trust, Knowledge, Conflicts, Moments}) under unit tests, replacing the prior all-in-API-class layout.
  • 👍/👎 feedback on individual moments with persisted verdicts (schema v16, moment_feedback table). Trust panel surfaces a 30-day up/down ratio so the dashboard can answer "when memory surfaces something, are users marking it useful?".
  • Utilization ratio panel — of facts extracted in the last 30 days, how many has Claude actually used in a recall or context injection? Color-coded (green ≥40%, yellow ≥15%, red below). Hidden on fresh installs to avoid misleading zeros.
  • Conflict deduping at the display layer: identical (subject, predicate, object_pair) detections collapse into one row with a ×N badge. Sidebar "Needs review" count now reflects distinct contradictions, not raw row count.
  • Activity events drilldown: each moment opens a payload modal with prettified JSONL, recall trigger correlation (which user prompt motivated this lookup), and linked-fact resolution scoped per database.
  • Vector index health threshold and clickable remediation hints in the health dashboard.
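
The utilization panel's color coding can be sketched directly from the thresholds quoted above (green at 40% or more, yellow at 15% or more, red below). The helper name is hypothetical, and returning nil to model "hidden on fresh installs" when nothing has been extracted is an assumption about how the hiding is decided.

```ruby
# Hypothetical helper mirroring the thresholds in the notes.
# used: facts Claude recalled or injected; extracted: facts from the last 30 days.
def utilization_color(used, extracted)
  return nil if extracted.zero? # assumption: nothing to show on a fresh install

  ratio = used.to_f / extracted
  if ratio >= 0.40
    :green
  elsif ratio >= 0.15
    :yellow
  else
    :red
  end
end

p utilization_color(45, 100) # => :green
p utilization_color(20, 100) # => :yellow
p utilization_color(5, 100)  # => :red
```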

CLI — observability surfaces and one-shot cleanups

  • claude-memory digest [--since DAYS] [--output FILE] — weekly markdown report. Sections: Activity, New knowledge by predicate, Utilization (extracted vs used), Conflicts, Feedback. No new schema; renders from existing aggregates.
  • claude-memory census [--root DIR] — privacy-safe cross-project vocabulary scan. Aggregates per-DB predicate × status counts, novel predicates, synonym candidates. Suppresses object literals, entity names, and paths; per-DB IDs are SHA256-prefixed.
  • claude-memory dedupe-conflicts [--scope SCOPE] [--dry-run] — one-shot cleanup for historical conflict-row duplication that predates the Resolver dedup fix (commit f571ba4). Groups by (subject, predicate, normalized object pair), keeps the earliest, migrates provenance to the keeper.
  • claude-memory reclassify-references [--scope SCOPE] [--dry-run] — retags active convention facts that the new Distill::ReferenceMaterialDetector flags as reference material (LOC counts, star counts, "X is a plugin..." templates, "by Firstname Lastname" attributions).
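
The grouping rule behind dedupe-conflicts (group by subject, predicate, and normalized object pair, then keep the earliest) can be sketched as a dry-run planner. This is an illustration, not the command's code: the row shape, `plan_dedupe`, and the normalization (trim, lowercase, order-insensitive pair) are all assumptions.

```ruby
# Assumed normalization: trim, lowercase, and ignore the order of the pair.
def normalize_pair(object_a, object_b)
  [object_a, object_b].map { |o| o.to_s.strip.downcase }.sort
end

# Hypothetical planner. rows are Hashes with :id, :subject, :predicate,
# :object_a, :object_b and a sortable :detected_at.
def plan_dedupe(rows)
  rows
    .group_by { |r| [r[:subject], r[:predicate], normalize_pair(r[:object_a], r[:object_b])] }
    .values
    .select { |group| group.size > 1 }
    .map do |group|
      keeper, *dupes = group.sort_by { |r| r[:detected_at] }
      { keep: keeper[:id], remove: dupes.map { |r| r[:id] } }
    end
end

rows = [
  { id: 1, subject: "repo", predicate: "uses_database", object_a: "postgres", object_b: "mysql", detected_at: 10 },
  { id: 2, subject: "repo", predicate: "uses_database", object_a: "MySQL", object_b: "Postgres ", detected_at: 5 },
  { id: 3, subject: "repo", predicate: "auth_method", object_a: "jwt", object_b: "session", detected_at: 7 }
]

p plan_dedupe(rows) # => [{:keep=>2, :remove=>[1]}] on Ruby versions before 3.4
```

Returning a plan instead of deleting rows mirrors the --dry-run preview; the real command additionally migrates provenance to the keeper, which this sketch does not model.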

Memory quality

  • Access-based staleness scoring (improvements.md #35). Schema v17 adds last_recalled_at to facts. Sweep::RecallTimestampRefresher derives the field periodically from activity_events; claude-memory stats --stale [--stale-days N] lists facts that haven't been recalled inside the threshold. Replaces the prior "active facts minus seen-in-recalls" approximation.
  • Auto-memory mirror (improvements.md #36). On fresh sessions, the SessionStart context hook scans ~/.claude/projects/<slug>/memory/*.md and surfaces new or changed entries as extraction candidates so users can promote auto-memory observations into claude_memory without manual copy-paste.
  • Reasoning requirement enforced in distillation (improvements.md #34). The SessionStart prompt and the /distill-transcripts skill now require a why clause for decision and convention predicates ("because…", "so that…", etc.). Audit found ~75% of facts were bare conclusions before this change.
  • Distill::ReferenceMaterialDetector reclassifies convention facts whose object text matches reference patterns. New reference predicate registered in PredicatePolicy with its own :references snapshot section. Detector runs at write time in ManagementHandlers#store_extraction so mislabeling can't persist.
  • Predicate census command (#30) for cross-project vocabulary audits — see CLI section above.
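
The reason-clause requirement can be sketched as a simple pattern check. This is not the gem's detector: the regex covers only the three phrases quoted in these notes ("because", "so that", "to avoid"), and the predicate names `decision` and `convention` are taken from the prose rather than from the actual vocabulary table.

```ruby
# Phrases quoted in the notes; the real list may be longer.
REASON_CLAUSE = /\b(because|so that|to avoid)\b/i

# Hypothetical check: a decision/convention fact with no reason clause
# is a "bare conclusion". Other predicates are never flagged.
def bare_conclusion?(predicate, object_text)
  return false unless %w[decision convention].include?(predicate)

  !REASON_CLAUSE.match?(object_text)
end

p bare_conclusion?("convention", "Use frozen string literals")                          # => true
p bare_conclusion?("convention", "Use frozen string literals to avoid mutation bugs")   # => false
p bare_conclusion?("uses_database", "postgres")                                         # => false
```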

Benchmarks and observability

  • Repeat-correction benchmark harness (improvements.md #32). spec/benchmarks/e2e/repeat_correction_spec.rb pre-loads a past correction as a memory fact, runs the prompt through real Claude under EVAL_MODE=real, and reports pass rate (no violation patterns matched). Starter set of 2 scenarios drawn from this project's recurring gotchas.
  • Relevance ratio metric (improvements.md #31). Hook::ContextInjector#emitted_subjects exposes the subjects injected at SessionStart; BenchmarkHelpers::RelevanceMetrics measures whether they appear in Claude's response. Trend signal for memory-application quality, integrated into devmemeval_spec.rb.
  • MCP server embeds the V=R/C ("Verify before Recommend / Correct") mental model in agent instructions so memory recommendations come with built-in verification cues.

Schema v14 → v17 (additive only, automatic on first run)

  • Migration 015: adds activity_events table for hook/recall/context/sweep telemetry. Powers the dashboard timeline, moments feed, and efficacy reports.
  • Migration 016: adds moment_feedback table (unique on event_id) for the dashboard 👍/👎 surface.
  • Migration 017: adds nullable facts.last_recalled_at for access-based staleness scoring.

1.0 readiness track

  • New docs/1_0_punchlist.md opens the path to 1.0: token-budget telemetry, hallucination-rate metric, negative-fact harm benchmark, CLAUDE.md baseline publication, claude-memory show, benchmark scoreboard. Ten entries (#47-56) added to docs/improvements.md with concrete file:line plumbing notes.

Changed

  • Resolver#apply_conflict no longer creates a duplicate disputed fact + conflict row when the same contradicting value is re-extracted. Looks up disputed facts in the same (subject, predicate) slot and reinforces with provenance instead.
  • Resolver no longer treats the distiller's scope_hint as a scope override. scope_hint is advisory metadata; fact.scope must match the DB the row lives in. Earlier behavior caused scope leakage where global-hinted distillations landed in the project DB.
  • Hook::ContextInjector adds emitted_fact_ids and emitted_subjects accessors so benchmark harnesses can attribute injection contributions per session.
  • SQLiteStore decomposed via module inclusion: LLMCache and MetricsAggregator extracted into lib/claude_memory/store/. SQLiteStore back under 600 LOC.
  • Dashboard::API decomposed: FactPresenter, Conflicts, Efficacy::Reporter, Timeline, Health extracted into dedicated classes following the boundary pattern. API now routes/delegates rather than aggregating.
  • Dashboard releases DB connections after each HTTP request (was holding connections open for the lifetime of the WEBrick session).
  • Sweep::Maintenance gains dedupe_open_conflicts and reclassify_references for the one-shot CLI commands above.
  • Round-trip migration specs from v12, v13, v14 → v17 (per-version migrations covered by spec/claude_memory/store/migrations/). Codifies the release-blocker convention: any schema bump must round-trip from each prior major-release boundary back ~3 releases.

Fixed

  • Dashboard surfaces an actionable hint (run claude-memory compact) when Recall hits FTS5 corruption, rather than a generic error.
  • Dashboard query tester unwraps the nested Recall result shape rather than printing the raw envelope.
  • Dashboard health checks correctly detect the claude-memory hook installation across the two-level Claude Code hooks structure (was reporting false negatives when hooks were installed under a matcher block).
  • Dashboard Efficacy "this session" correlation falls back to a time window when the recall event has no session_id (MCP tool calls don't thread session_id).
  • Bulk-reject in the Conflicts modal now retries with an actionable message when the server-side state is stale.

Upgrade Notes

Schema bump v14 → v17. Three migrations run automatically on first launch after upgrade. All three are additive (no existing data is rewritten):

  1. Migration 015 creates activity_events (hook/recall telemetry).
  2. Migration 016 creates moment_feedback (dashboard verdicts).
  3. Migration 017 adds facts.last_recalled_at (NULL by default; Sweep::RecallTimestampRefresher populates it on the next sweep cycle from existing activity_events).

The migration delta has round-trip spec coverage in spec/claude_memory/store/migrations/. Forward-compatibility: 0.10.0 databases cannot be opened by 0.9.x or earlier. Downgrade is destructive — back up ~/.claude/memory.sqlite3 and .claude/memory.sqlite3 before downgrading.

Optional historical cleanups. Two new admin commands address data tails left by earlier bugs that have since been fixed at the source:

claude-memory dedupe-conflicts --dry-run   # preview duplicate conflict rows
claude-memory dedupe-conflicts             # consolidate them
claude-memory reclassify-references --dry-run   # preview reference-material mislabels
claude-memory reclassify-references             # retag them

Both are opt-in. Neither runs in the regular sweep cycle. Use --scope global to clean the global DB.

Telemetry footprint. The activity_events table grows with hook activity. The dashboard surfaces this by default and powers the timeline/moments/efficacy panels. Retention pruning is not yet automatic (planned for a follow-up); manual cleanup via DELETE FROM activity_events WHERE occurred_at < ? is safe — the dashboard tolerates missing history.

v0.9.1 — MCP JSON-RPC notifications fix

28 Apr 20:11

Fixed

  • MCP server now conforms to JSON-RPC 2.0: notifications (messages without an id) never receive a response. Previously, notifications/initialized — which Claude Code sends after every handshake — triggered a spurious Method not found error frame, causing strict MCP clients to mark the server failed on /mcp reconnect after the initial connection.
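
The JSON-RPC 2.0 rule behind this fix is that a message without an id is a notification and must never be answered, even when the method is unknown. A minimal sketch, assuming a hypothetical `handle_message` dispatcher and handler table (this is not the gem's MCP server code):

```ruby
require "json"

# Hypothetical dispatcher. methods maps method names to callables.
# Returns a JSON response string, or nil when no response may be sent.
def handle_message(raw, methods)
  message = JSON.parse(raw)
  handler = methods[message["method"]]

  unless message.key?("id")
    # Notification: run the handler if one exists, but never reply,
    # not even with "Method not found".
    handler&.call(message["params"])
    return nil
  end

  if handler
    JSON.generate({ "jsonrpc" => "2.0", "id" => message["id"], "result" => handler.call(message["params"]) })
  else
    JSON.generate({ "jsonrpc" => "2.0", "id" => message["id"],
                    "error" => { "code" => -32601, "message" => "Method not found" } })
  end
end

methods = { "ping" => ->(_params) { "pong" } }
p handle_message('{"jsonrpc":"2.0","method":"notifications/initialized"}', methods) # => nil
puts handle_message('{"jsonrpc":"2.0","id":1,"method":"ping"}', methods)
```

The error code -32601 is the one JSON-RPC 2.0 reserves for "Method not found"; it is still returned for requests that carry an id.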

v0.9.0 — Predicate Design Overhaul, Reject/Restore, Telemetry

16 Apr 17:41

Highlights

Predicate vocabulary overhaul — curated from 13 → 8 predicates based on a multi-project survey of real memory databases. uses_framework reclassified as multi-value (fixing silent data loss in production). PredicatePolicy is now the single source of truth for vocabulary, snapshot sections, synonym canonicalization, and LLM guidance.

New commands: reject and restore — first-class tools for managing distiller quality. Mark hallucinated facts as wrong, or recover facts that were superseded by an obsolete classification.

MCP tool-call telemetry — every tool invocation is timed and recorded. claude-memory stats --tools shows call counts, latency percentiles, and error rates.

Proactive memory recall — MCP instructions now direct Claude to check conventions before code generation, architecture before explanations, and decisions before refactoring. A/B testing showed this produces 76-line accurate architecture explanations vs honest refusals without memory.


Added

  • claude-memory reject <id_or_docid> command + memory.reject_fact MCP tool — explicitly mark distiller hallucinations as wrong, closing associated conflicts
  • claude-memory restore --predicate NAME command — recover facts superseded by obsolete single-value predicate classifications (Jaccard-based token overlap heuristic)
  • MCP tool-call telemetry: mcp_tool_calls table, claude-memory stats --tools [--since DAYS], 90-day retention via Sweep
  • CLAUDE_CONFIG_DIR env var support for non-standard Claude Code config locations
  • Predicate synonym canonicalization at insert time (has_convention → convention, primary_language → uses_language)
  • Novel predicate warnings at insert time
  • NullDistiller emits uses_language facts for detected language entities
  • Proactive memory recall guidance in MCP server instructions
  • YARD documentation across 13 core source files (+473 lines)
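
Synonym canonicalization and novel-predicate warnings at insert time might look roughly like the sketch below. The two synonym pairs come from the list above. Everything else is an assumption for illustration: the names `SYNONYMS`, `KNOWN_PREDICATES`, and `canonicalize_predicate` are hypothetical, and the vocabulary shown is only a subset made of predicates these notes mention.

```ruby
# Synonym pairs named in the notes; the real table may contain more.
SYNONYMS = {
  "has_convention" => "convention",
  "primary_language" => "uses_language"
}.freeze

# Assumed subset of the curated vocabulary, for illustration only.
KNOWN_PREDICATES = %w[convention uses_language uses_framework].freeze

# Returns [canonical_predicate, warning_or_nil] for a predicate about to be inserted.
def canonicalize_predicate(raw)
  name = raw.to_s.strip.downcase
  canonical = SYNONYMS.fetch(name, name)
  warning = KNOWN_PREDICATES.include?(canonical) ? nil : "novel predicate: #{canonical}"
  [canonical, warning]
end
```

Canonicalizing before the vocabulary check means a known synonym never triggers a novel-predicate warning.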

Changed

  • uses_framework reclassified as multi-value — real projects use multiple frameworks (Rails + Turbo + Tailwind). Prior single-value classification silently superseded valid facts. Run claude-memory restore --predicate uses_framework to recover
  • PredicatePolicy is single source of truth for vocabulary, snapshot sections, synonym canonicalization, and LLM guidance
  • Predicate vocabulary curated 13 → 8 based on multi-project usage data
  • Registry::COMMANDS stores {class:, description:} with direct class references
  • Plugin and gem descriptions rewritten to be outcome-focused
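
To see why the single-value classification lost data, consider this simplified model of supersession. It is a sketch, not the gem's resolver: `Fact`, `insert_fact`, and `active_objects` are hypothetical names, and real supersession likely involves more state than a status flag.

```ruby
Fact = Struct.new(:subject, :predicate, :object, :status)

# Inserts a fact. For single-value predicates, any active fact with the same
# subject and predicate but a different object is marked superseded first.
def insert_fact(facts, subject, predicate, object, single_value:)
  if single_value
    facts.each do |f|
      next unless f.status == :active && f.subject == subject && f.predicate == predicate
      f.status = :superseded unless f.object == object
    end
  end
  facts << Fact.new(subject, predicate, object, :active)
end

def active_objects(facts)
  facts.select { |f| f.status == :active }.map(&:object)
end

old_policy = []
%w[Rails Turbo Tailwind].each do |fw|
  insert_fact(old_policy, "repo", "uses_framework", fw, single_value: true)
end
active_objects(old_policy) # => ["Tailwind"]

new_policy = []
%w[Rails Turbo Tailwind].each do |fw|
  insert_fact(new_policy, "repo", "uses_framework", fw, single_value: false)
end
active_objects(new_policy) # => ["Rails", "Turbo", "Tailwind"]
```

Under the single-value policy each new framework silently retires the previous one, which matches the data loss described above.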

Fixed

  • StatsCommand broken in production — used Sequel.sqlite (requires unlisted sqlite3 gem). Now uses extralite adapter
  • Missing embeddings command in shell completion output

Upgrade Notes

Schema: v12 → v14 (automatic). Migration 013 adds mcp_tool_calls. Migration 014 canonicalizes stale predicate names in existing facts.

Action required for uses_framework recovery: If your project uses multiple frameworks, past sessions may have superseded valid facts:

claude-memory restore --predicate uses_framework --dry-run   # preview
claude-memory restore --predicate uses_framework              # restore
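
The restore heuristic is described only as "Jaccard-based token overlap", so the following is a sketch of that measure rather than the command's logic. The tokenizer, the 0.5 threshold, and the rule that low overlap marks a restore candidate are all assumptions, and `tokens`, `jaccard`, and `restore_candidate?` are hypothetical names.

```ruby
require "set"

# Lowercased word tokens of a string, as a Set.
def tokens(text)
  text.downcase.scan(/[a-z0-9_]+/).to_set
end

# Jaccard similarity: size of the intersection divided by size of the union.
# Defined here as 0.0 when both token sets are empty.
def jaccard(a, b)
  ta = tokens(a)
  tb = tokens(b)
  union = ta | tb
  return 0.0 if union.empty?
  (ta & tb).size.fdiv(union.size)
end

# Assumption: a superseded value that shares few tokens with its replacement
# is a distinct value (worth restoring), not a rewording of the same one.
def restore_candidate?(superseded_object, replacement_object, threshold: 0.5)
  jaccard(superseded_object, replacement_object) < threshold
end
```

For example, "Rails" and "Tailwind CSS" share no tokens and would be flagged, while "Rails 7" and "rails 7" score 1.0 and would not.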

Pruned predicates still work: preference, workflow, dependency, testing_strategy, tool_usage, ci_platform fall through to default multi-value policy. Existing facts are unaffected.


Full Changelog: v0.8.0...v0.9.0

v0.8.0

30 Mar 17:58

Added

Three-Layer Distillation Pipeline

  • Automatic distillation via NullDistiller in ingest pipeline (Layer 1: regex-based, P95 < 5ms)
  • Context hook injection for LLM-based extraction at SessionStart (Layer 2: Claude Code as distiller, zero extra cost)
  • /distill-transcripts skill for manual deep extraction (Layer 3: on-demand, depth-aware prompts)
  • memory.undistilled and memory.mark_distilled MCP tools for distillation tracking
  • Hook::DistillationRunner extracted from Handler for context hook injection
  • TaskCompleted and TeammateIdle hook events for ingest triggers
  • Distillation metrics backfill on database initialization
  • Doctor check for undistilled content
  • Pending distillation count in memory.status output

Recall Enhancements

  • Intent parameter for recall query disambiguation (#3)
  • Retrieval score traces for semantic search (#5)
  • Configurable embedding providers with dimension checking

Hook Enhancements

  • statusMessage on all hooks for descriptive spinner text during hook execution
  • StopFailure hook to capture transcript data even on session errors (rate limits, server errors)
  • Notification hook with idle_prompt matcher for opportunistic sweep during idle

New Commands & Skills

  • install-skill command and memory-recall agent (#8, #12)
  • Shell completion command for bash and zsh (#18)

Distillation Benchmark Results

  • NullDistiller: Concept Recall 0.952, Fact Precision/Recall 1.000 (31 test cases)
  • Claude Code LLM: Concept Recall 0.902 (all 41 cases), 0.900 on semantic cases (vs 0.333 for regex)
  • Average 1.6 facts stored per case across LLM extraction
  • E2E distillation recall benchmark and extraction quality benchmarks
  • Concept-based matching for distiller-agnostic benchmark comparison

Fixed

  • --allowedTools added to ClaudeCliRunner for MCP tool permissions
  • Test isolation for context hook when global database has facts

Internal

  • Extracted RetryHandler and SchemaManager modules from SQLiteStore
  • Extracted Recall into engine strategy pattern with DualEngine, LegacyEngine, and shared QueryCore
  • Extracted Tools god object into 6 handler modules
  • Added 36 specs for 5 previously untested files
  • All 3 god objects eliminated, 0 files over 500 lines

v0.7.1

30 Mar 17:47

Added

Three-Level Sweep Escalation

  • Maintenance class with light/standard/deep sweep levels for progressive database maintenance
  • Exposed sweep escalation via memory.sweep_now MCP tool with configurable level
  • Tool escalation workflow added to MCP QueryGuide documentation

Embedding Deduplication

  • Content-addressed deduplication for embeddings using SHA256 hashing
  • Deduplication before vector scoring in fallback path to prevent duplicate results
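
Content-addressed deduplication can be sketched as keying each embedding by the SHA256 of its source text. The class and method names below (`EmbeddingCache`, `dedupe_by_content`) are hypothetical, the embedder is any callable supplied by the caller, and the gem's actual storage layout is not shown in these notes.

```ruby
require "digest"

# Caches embeddings by the SHA256 of their source text so identical content
# is embedded once. The embedder is any callable: text -> vector.
class EmbeddingCache
  attr_reader :computed

  def initialize(embedder)
    @embedder = embedder
    @by_hash = {}
    @computed = 0
  end

  def embed(text)
    key = Digest::SHA256.hexdigest(text)
    @by_hash.fetch(key) do
      @computed += 1
      @by_hash[key] = @embedder.call(text)
    end
  end
end

# Drops results whose text hashes to one already seen, keeping the first occurrence.
def dedupe_by_content(results)
  results.uniq { |r| Digest::SHA256.hexdigest(r[:text]) }
end
```

Running `dedupe_by_content` before scoring keeps identical snippets from occupying several slots in the result list.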

MCP Enhancements

  • Structured error classification for MCP tools via ErrorClassifier module
  • Dynamic knowledge summary in MCP server instructions via InstructionsBuilder

Fixed

  • Plugin hook loading error: Removed explicit hooks reference from plugin.json manifest — Claude Code auto-loads hooks/hooks.json from the plugin root, so declaring it caused "Duplicate hooks file detected" errors on plugin install

Internal

  • Influence study: lossless-claw v0.3.0 DAG-based lossless context management
  • Marked 7 improvements as implemented (#10, #11, #14, #15, #16, #19, #20)

v0.7.0

13 Mar 13:22

ClaudeMemory v0.7.0

Added

FTS5 Contentless Mode

  • FTS5 tables now created with content='' for ~40% smaller databases
  • Auto-detection: both legacy and contentless formats work seamlessly
  • compact command rebuilds FTS index to contentless format
  • stats command reports FTS format and optimization hints

Worktree-Aware Project Paths

  • Project database now resolves to main repository root across git worktrees
  • Prevents duplicate project databases when using git worktree
  • Opt-out: set CLAUDE_MEMORY_ISOLATE_WORKTREES=1 for per-worktree isolation
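
One way to resolve a shared root across worktrees is to ask git for its common directory, which linked worktrees share with the main checkout. The notes do not say how the gem does it, so this is a sketch under assumptions: `root_from_common_dir` and `project_root` are hypothetical names, a standard non-bare layout is assumed, and the opt-out branch simply returns the current directory.

```ruby
require "open3"

# Derives the main repository root from git's common dir (normally "<root>/.git").
# Assumes a standard non-bare layout.
def root_from_common_dir(common_dir, cwd)
  File.dirname(File.expand_path(common_dir, cwd))
end

# Resolves the directory that should own the project database.
def project_root(cwd = Dir.pwd, env: ENV)
  # Opt-out: keep one database per worktree.
  return cwd if env["CLAUDE_MEMORY_ISOLATE_WORKTREES"] == "1"

  out, status = Open3.capture2e("git", "rev-parse", "--git-common-dir", chdir: cwd)
  return cwd unless status.success? # not a git repository: fall back to cwd

  root_from_common_dir(out.strip, cwd)
rescue Errno::ENOENT
  cwd # git is not installed
end
```

Because every linked worktree reports the same common directory, they all map to the same root and therefore to a single project database.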

MCP Enhancements

  • Tool annotations: readOnlyHint, idempotentHint, destructiveHint on all 21 tools
  • Stdout protection: MCP server redirects $stdout to $stderr to prevent protocol corruption from accidental puts/print calls
  • Self-excluding agent conversations via SELF_CONTEXT_MARKER to prevent meta-pollution
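
The stdout protection idea can be shown with a small helper that keeps a handle to the real stdout for protocol frames and points `$stdout` at the error stream. The helper name `with_protected_stdout` is hypothetical. Unlike a long-running server, which would presumably redirect once at startup, this sketch restores `$stdout` afterwards so it is easy to try in isolation.

```ruby
# Swaps $stdout for the error stream while the block runs, handing the block
# the original stdout as the only channel for protocol frames.
def with_protected_stdout(err = $stderr)
  protocol_out = $stdout
  $stdout = err
  yield protocol_out
ensure
  $stdout = protocol_out
end

# Usage: a stray puts lands on stderr, only explicit writes reach the client.
# with_protected_stdout do |out|
#   puts "debug noise"                  # goes to $stderr
#   out.puts '{"jsonrpc":"2.0","id":1}' # goes to the real stdout
# end
```

Since `Kernel#puts` and `Kernel#print` write to whatever `$stdout` currently references, reassigning the global is enough to divert accidental output.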

New Commands

  • git-lfs command for setting up git-lfs tracking of project memory databases

Fixed

  • Narrowed rescue clauses in discover_other_projects (was bare rescue, now catches specific exceptions)
  • FTS entries now cleaned up when content is pruned by sweeper (prevents orphaned index entries)
  • FTS index rebuilt during compact for consistent state after upgrades
  • Real evals CI workflow now installs the gem and uses the correct release API

Internal

  • Resolver refactored for better thread safety (parameters instead of instance variables)
  • SnippetExtractor DRY refactoring
  • StoreManager.promote_fact single-transaction safety
  • Influence study: QMD v2.0.1 SDK-first architecture analysis

22 CLI commands · 21 MCP tools · 1,435 tests

v0.6.0

06 Mar 15:56

What's New

Native Vector Storage (sqlite-vec)

  • Integrated sqlite-vec for native KNN vector search
    • VectorIndex class with vec0 virtual table for cosine similarity search
    • Dual-write: embeddings stored in both JSON column and vec0 index
    • claude-memory index --vec flag for backfilling existing embeddings into vec0
    • Fast path in Recall uses sqlite-vec KNN when available, falls back to JSON + Ruby
    • Sweeper cleans up vec0 entries for superseded/expired facts
    • Doctor and MCP status/stats report vec0 availability and coverage
    • Cross-platform support with platform-specific gem installation
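
When the vec0 index is unavailable, the "JSON + Ruby" fallback amounts to a brute-force scan. The sketch below shows that technique in general terms. The row shape (`:id`, `:embedding_json`) and the names `cosine_similarity` and `knn_fallback` are assumptions for illustration, not the gem's schema or API.

```ruby
require "json"

# Cosine similarity of two equal-length numeric vectors.
def cosine_similarity(a, b)
  raise ArgumentError, "dimension mismatch" unless a.size == b.size
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norm.zero? ? 0.0 : dot / norm
end

# Brute-force nearest neighbours: parse each stored JSON embedding, score it
# against the query, and return the k best [id, score] pairs, best first.
def knn_fallback(rows, query, k: 5)
  rows
    .map { |row| [row[:id], cosine_similarity(JSON.parse(row[:embedding_json]), query)] }
    .max_by(k) { |_id, score| score }
end
```

This scan is linear in the number of stored embeddings, which is the cost a native KNN index avoids on the fast path.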

Database Maintenance

  • compact command for database maintenance (VACUUM + integrity check)
  • export command for fact backup and migration to JSON

Hook Enhancements

  • SessionStart context injection via hookSpecificOutput.additionalContext
    • Injects recent facts and project context at session start
  • Tool-specific observation compression for reduced token usage
  • --async flag for non-blocking hook execution
  • Hook error classification for graceful degradation
  • Conversation exclusion markers for session-level opt-out

MCP Discovery

  • memory.list_projects MCP tool for discovering all project databases

Developer Experience

  • Dynamic MCP server instructions with progressive disclosure documentation
  • Comparative benchmark suite with QMD and grepai adapters

Bug Fixes

  • Recall returned no results: DualQueryTemplate accessed stores before initializing them, causing all recall queries to silently return empty results. Refactored to use existing store_for_scope method.
  • Doctor crashed on sqlite-vec tables: SchemaValidator iterated all tables including vec0 virtual tables, which require the sqlite-vec extension. Now skips facts_vec* tables using prefix match.
  • Forward-migrated databases: Older gem versions now gracefully handle databases migrated by newer versions instead of crashing.
  • Hybrid retrieval ordering: Preserved BM25 scores and RRF ordering in hybrid search results instead of re-sorting by source/time.
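
For readers unfamiliar with RRF (reciprocal rank fusion), the ordering mentioned in the last fix can be sketched as follows. This shows the general technique, not the gem's implementation: `rrf_fuse` is a hypothetical name, and `k = 60` is the constant commonly used with RRF, assumed here.

```ruby
# Reciprocal rank fusion: each ranked list contributes 1 / (k + rank) to a
# document's score, with rank starting at 1. Ties are broken by id.
def rrf_fuse(ranked_lists, k: 60)
  scores = Hash.new(0.0)
  ranked_lists.each do |ids|
    ids.each_with_index { |id, i| scores[id] += 1.0 / (k + i + 1) }
  end
  scores.sort_by { |id, score| [-score, id.to_s] }.map(&:first)
end

bm25_order   = %w[a b c]
vector_order = %w[b d a]
rrf_fuse([bm25_order, vector_order]) # => ["b", "a", "d", "c"]
```

Re-sorting the fused list by source or timestamp afterwards would discard exactly the ordering this computes, which is the behavior the fix removes.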

Stats

  • 21 MCP tools, 22 CLI commands
  • 1,316 test examples, 0 failures
  • Full changelog: CHANGELOG.md