Releases · codenamev/claude_memory

05 Jun 15:36

Immutable

v0.12.1

0cf2e66

v0.12.1 — Upgrade-Experience Patches (setup-vectors, doctor EmbeddingsCheck, plugin manifest fix) Latest

Theme: Upgrade-experience patches surfaced by the 0.12.0 soak. Four small but high-impact fixes, all found when one user upgraded a single project, that close visibility gaps in the doctor and the plugin manifest. No schema changes, no breaking changes.

Added

claude-memory setup-vectors command — the documented opt-in path for end users who want vector recall via the BAAI/bge-small-en-v1.5 model. fastembed remains a dev/test gem dependency by design (the default install stays light); this command verifies the chosen provider is loadable (gracefully prompts to gem install fastembed if not), writes CLAUDE_MEMORY_EMBEDDING_PROVIDER (and optional CLAUDE_MEMORY_EMBEDDING_MODEL) to the project's .claude/settings.json env block — the same mechanism Claude Code uses for OTel — and re-indexes existing facts via the existing IndexCommand (skip with --no-reindex). Supports --status for current config and --dry-run for inspection. Preserves unrelated settings.json keys.
Checks::EmbeddingsCheck in claude-memory doctor: surfaces the active embedding provider name and dimensions, hints to set CLAUDE_MEMORY_EMBEDDING_PROVIDER=fastembed when running on the tfidf default with fastembed loadable, and reports dimension mismatches between stored vectors and the current provider. Closes the visibility gap where a user could see sqlite-vec available ✓ while unknowingly running on tfidf.
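The settings.json update that setup-vectors performs can be pictured as a non-destructive merge into the env block. The following is a minimal sketch, not the gem's implementation: the helper name merge_embedding_env and the sample keys (permissions, OTHER) are hypothetical, and only the two CLAUDE_MEMORY_EMBEDDING_* variable names come from the notes above.

```ruby
require "json"

# Hypothetical helper (not part of claude_memory): returns a copy of the
# settings hash with the embedding env vars set and everything else intact.
def merge_embedding_env(settings, provider:, model: nil)
  env = (settings["env"] || {}).dup
  env["CLAUDE_MEMORY_EMBEDDING_PROVIDER"] = provider
  env["CLAUDE_MEMORY_EMBEDDING_MODEL"] = model if model
  settings.merge("env" => env)
end

# Sample settings with unrelated keys that must survive the update.
existing = JSON.parse('{"permissions": {"allow": ["Bash"]}, "env": {"OTHER": "1"}}')
updated = merge_embedding_env(existing, provider: "fastembed",
                              model: "BAAI/bge-small-en-v1.5")
puts JSON.pretty_generate(updated)
```

Building a new hash instead of mutating the parsed one keeps a --dry-run style preview cheap: the caller can print the result without writing it back.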

Fixed

plugin.json declared skills: "./skills/" and outputStyles: "./output-styles/" pointing at non-existent directories. Per Claude Code's plugin reference, distill-transcripts.md is correctly a flat command (not a skill); both forms register as /<name> slash commands. Dead keys removed. Plugin spec rewritten as deletion-safe ("every directory key in plugin.json points at an existing directory") so this can't regress.
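The deletion-safe rule quoted above ("every directory key in plugin.json points at an existing directory") could be checked along these lines. This is a sketch under assumptions: the helper name dead_directory_keys is hypothetical, a "directory key" is taken to mean any string value ending in "/", and the commands key in the sample manifest is illustrative.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical check: returns the manifest keys whose value looks like a
# directory path ("./something/") but does not exist under plugin_root.
def dead_directory_keys(manifest, plugin_root)
  dead = manifest.select do |_key, value|
    value.is_a?(String) && value.end_with?("/") &&
      !File.directory?(File.join(plugin_root, value))
  end
  dead.keys
end

Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "commands"))
  manifest = {
    "name" => "claude-memory",
    "commands" => "./commands/",          # exists
    "skills" => "./skills/",              # missing
    "outputStyles" => "./output-styles/"  # missing
  }
  p dead_directory_keys(manifest, root)
end
```

Because the check iterates over whatever keys are present, removing a directory key from the manifest never makes it fail, which is the property the notes describe.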

Documentation

README "Upgrading" section now documents the marketplace-refresh + /reload-plugins flow explicitly. After /plugin marketplace update <name> users must run /reload-plugins or restart Claude Code for new slash commands to appear — this bit one user upgrading to 0.12.0 looking for /distill-transcripts. Includes /audit-memory and /distill-transcripts as named examples.

Upgrade Notes

No DB migrations. Schema stays at v18.
After gem update claude_memory, run /plugin marketplace update claude-memory && /reload-plugins (or restart Claude Code) to see the new /distill-transcripts and /audit-memory slash commands.
Existing fact bases continue to use whatever embedding provider they were indexed under. To opt into fastembed, run claude-memory setup-vectors — it handles provider switching + re-index in one step.
claude-memory doctor will now emit a warning on tfidf default with fastembed loadable. This is informational, not an error; the system continues to function on tfidf.
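The decision logic behind the new doctor check might look roughly like the sketch below. The CheckResult struct, the embeddings_check name, the severity assigned to a dimension mismatch, and the 256-dimension tfidf figure in the usage lines are all assumptions for illustration; only the tfidf/fastembed hint and the mismatch report come from these notes.

```ruby
# Hypothetical result type; the gem's real check may report differently.
CheckResult = Struct.new(:status, :message)

def embeddings_check(provider:, dimensions:, stored_dimensions:, fastembed_loadable:)
  # Stored vectors built under a different provider cannot be compared.
  if stored_dimensions && stored_dimensions != dimensions
    return CheckResult.new(
      :warning,
      "stored vectors are #{stored_dimensions}-d but #{provider} produces #{dimensions}-d"
    )
  end

  # Informational hint only: the system keeps working on tfidf.
  if provider == "tfidf" && fastembed_loadable
    return CheckResult.new(
      :warning,
      "running on tfidf; set CLAUDE_MEMORY_EMBEDDING_PROVIDER=fastembed to opt in"
    )
  end

  CheckResult.new(:ok, "#{provider} (#{dimensions} dimensions)")
end

hint = embeddings_check(provider: "tfidf", dimensions: 256,
                        stored_dimensions: 256, fastembed_loadable: true)
puts "#{hint.status}: #{hint.message}"
```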

🧪 Real Eval Validation

Results: 0/6 passed ⚠️ 6 failed
Duration: 68.54s
Estimated Cost: ~$0.12

⚠️ Some real eval tests failed. Check the workflow logs for details.

01 Jun 13:17

codenamev

Immutable

v0.12.0

829017f

v0.12.0 — Release Discipline, Observability, Self-Audit

Theme: Release Discipline + Observability + Self-Audit — the infrastructure that makes a 1.0 semver promise defensible. This release locks down the public API surface, adds the observability primitives (OTel ingestion, dashboard Telemetry) and the self-audit toolkit (claude-memory audit) that serve the visibility pillar, and ships the negative-fact harm benchmark + staleness guard that make the long-horizon-quality claim measurable rather than aspirational.

Added

Staleness guard for single-value facts — single-value predicates (uses_database / deployment_platform / auth_method) are exclusive claims Claude follows authoritatively, so a stale one is the most dangerous kind of memory. The 0.12 harm benchmark caught Claude emitting git push heroku HEAD:main from a stale deployment_platform fact with zero hedge — and supersession only protects against this if the replacement was recorded. New Recall::StalenessAnnotator (pure function) flags single-value facts that are old (valid_from/created_at older than injection_stale_days, default 180) AND not recently confirmed (last_recalled_at null or stale); Hook::ContextInjector appends a ⚠ stale: recorded YYYY-MM-DD … verify before relying marker at SessionStart so Claude can hedge or verify instead of blindly following. Multi-value predicates are never annotated (they accumulate; one stale entry isn't authoritative). New Configuration#injection_stale_days (CLAUDE_MEMORY_INJECTION_STALE_DAYS), deliberately much longer than the 14-day dashboard review window. Serves the 1.0 long-horizon-quality pillar — it's the first defense against memory degrading session quality over months.
Negative-fact harm benchmark — full 13-scenario corpus + release gate — expands the 0.11 3-scenario prototype to 13 cases across four harm classes (stale_tech, mismatched_scope, superseded_undetected, and the new reference_material_as_fact). Each scenario ships a project_files scaffold whose current state contradicts the wrong memory fact, so the test measures "does Claude follow stale/wrong memory over the project's actual state?" rather than reacting to an empty directory. Scored best-of-N (default 3 runs, majority vote per scenario via HARM_BENCH_RUNS) to absorb single-shot LLM nondeterminism. HARM_RATE_THRESHOLD (default 1%) fails the run if the majority-harmed scenario rate is exceeded — making "memory doesn't make Claude wrong" a measurable release gate rather than a marketing claim. The first full-corpus real-mode run surfaced a real harm (stale deployment fact) and a harness confound (empty-tmpdir noise), which drove both the staleness guard above and the scaffold + best-of-N harness hardening.
claude-memory audit — memory health diagnostic — productionizes the 2026-05-21 contamination audit into a stable diagnostic surface anyone using claude_memory can run on their own setup. Ten contract checks (C001-C010) cover open conflicts, single-cardinality multiplicity, distillation backlog, shortcut-leak detection, duplicate global conventions, bare-conclusion rate, project starvation, auto-memory import gaps, and single-cardinality churn. --json is the stable contract for CI; --severity filters; --no-exit always exits 0. The /audit-memory slash command wraps the same runner for an interactive walkthrough. docs/audit_runbook.md documents each check's rationale and remediation. CHECK_METHODS is append-only by design so JSON consumers don't break when new checks land. New claude-memory import-auto-memory retroactively pulls ~/.claude/projects/<slug>/memory/*.md entries that AutoMemoryMirror previously missed (slug bug: tr("/", "-") left underscores intact, so claude_memory paths never matched). Contributes to the visibility pillar of 1.0.
Contamination guardrails — ReferenceMaterialDetector example-quote guard + Resolver :discard path — the distiller used to treat example sentences in docs/CLAUDE.md ("e.g., postgres", "for example, mysql") as literal claims about the project, accumulating 103 rejected single-cardinality facts over six weeks before being caught by the 2026-05-21 audit. Two defenses now: (1) ReferenceMaterialDetector flags single-cardinality predicate extractions whose source text contains e.g., / for example / i.e. quote patterns so they're tagged reference material at write time; (2) Resolver gains a :discard resolution path for the same shape so the fact never lands even if the detector misses. Memory shortcuts (memory.decisions / .conventions / .architecture) refactored from FTS text search (which returned facts whose object matched the predicate keyword) to predicate-based filtering via PredicatePolicy, with project-DB precedence over global. Closes a class of "is memory still trustworthy?" bugs that erode the 1.0 stability claim.
OpenTelemetry ingestion + dashboard Telemetry tab — Claude Code can now export metrics, log-style events, and (opt-in) traces straight into the dashboard via OTLP/HTTP/JSON. New claude-memory otel CLI manages the env block in .claude/settings.json (--enable, --disable, --enable-traces, --capture-prompts, --status, --verify); the dashboard exposes /v1/metrics, /v1/logs, /v1/traces on 127.0.0.1:3377 and a new "Telemetry" drawer showing cost per hour, tokens by model, top tools by latency, and a per-prompt journey waterfall that UNIONs otel_events with the existing activity_events. Schema v18 adds otel_metrics/otel_events/otel_traces plus an additive prompt_id column on activity_events for journey correlation. Privacy posture: nothing past metric counts is captured by default; OTEL_LOG_USER_PROMPTS only flips on with explicit --capture-prompts confirmation; traces remain 501-gated until the user opts in. Sweep retention defaults: 30 days metrics, 14 days events, 7 days traces.
Pre-release hook smoke gate (bin/pre-release-smoke) — verifies the installed claude-memory gem actually fires hooks correctly and populates expected detail_json fields per spec/smoke/expected_fields.yml. Codifies the verification convention from feedback_hooks_run_installed_gem.md into a machine-enforced release gate. The trap has been sprung twice (2026-04-16 ActivityLog, 2026-04-30 #47 token-budget); the gate exists so it can't be sprung a third time. Wired into the /release skill as Phase 1 Step 6 (after specs, before lint). First 0.12.0 milestone item.
/study-repo memory-discipline guard (prompt-only): a top-level "CRITICAL: Memory Discipline" section in .claude/skills/study-repo/SKILL.md explicitly forbids the LLM from extracting external projects' tech stack as project-level facts. Addresses the root cause of the cleanup work claude-memory reject had to do during 0.11 (27-fact misattribution cluster on 2026-04-23/24, see quality_review.md 2026-04-30 cause-4 finding). Defense-in-depth detector deferred to 0.12.x or later, only built if measurement shows persistent leakage.
API stability audit (docs/api_stability.md) — authoritative public-API contract enumerating which CLI commands, MCP tools, hook events, Ruby classes, and schema surfaces are stable / experimental / internal. Default-to-internal applied throughout; the doc is the source of truth for what 1.0's semver promise will lock down. New ClaudeMemory::Deprecations.warn(name:, replacement:, removed_in:) module wired into PredicatePolicy.canonicalize as the first soft-rename — has_convention and primary_language synonyms now emit deprecation warnings scheduled for removal in 1.0.0. README + CLAUDE.md link to the new doc; suppress noise via CLAUDE_MEMORY_NO_DEPRECATIONS=1.
Release-to-release benchmark scoreboard — bin/run-evals now writes spec/benchmarks/results/<version>.json after each run; new bin/bench-diff compares the current scoreboard against the most recent prior tagged version's and exits non-zero if any tracked pass-rate dropped beyond the threshold (default -5%, configurable via --threshold). Wired into /release skill Phase 1 as Step 7 — the release aborts on regressions before publish. First release with this gate is 0.12.0 itself; from 0.13.0 onward bench-diff actively gates against 0.12 baselines.
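The regression gate that bench-diff applies can be illustrated with a small comparison over two scoreboards. This sketch makes several assumptions: scoreboards are modelled as plain hashes of pass rates between 0.0 and 1.0 (the real JSON layout is not described here), the threshold is read as percentage points, and the metric names are invented.

```ruby
# Hypothetical comparison: returns the metrics whose pass rate dropped by
# more than `threshold` (0.05 = 5 percentage points) since the last release.
def regressions(previous, current, threshold: 0.05)
  previous.each_with_object({}) do |(metric, old_rate), dropped|
    new_rate = current[metric]
    next if new_rate.nil? # metric no longer tracked; nothing to compare

    delta = new_rate - old_rate
    dropped[metric] = delta.round(4) if delta < -threshold
  end
end

previous = { "harm_bench" => 0.90, "repeat_correction" => 0.75 }
current  = { "harm_bench" => 0.80, "repeat_correction" => 0.73 }

dropped = regressions(previous, current)
puts(dropped.empty? ? "no regressions" : "regressions: #{dropped}")
```

A release script built on this would exit non-zero whenever the returned hash is non-empty, matching the abort-before-publish behaviour described above.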

Deferred to 0.13

CLAUDE.md comparative baseline numbers (#4) — the comparative E2E harness compares static CLAUDE.md (auto-loaded into context) against ClaudeMemory's MCP-tool retrieval, but in headless claude -p mode Claude doesn't proactively call the recall tools, so the comparison doesn't yet exercise ClaudeMemory's retrieval path fairly (first run returned a misleading ClaudeMemory 0/10 = no-memory 0/10 vs CLAUDE.md 8/10). Publishing that would mislead, so the numbers are withheld and the harness fix is tracked for 0.13. This surfaced a genuine separable observation — in fully headless, non-tool-forcing usage, ClaudeMemory's contribution rides entirely on the SessionStart context-hook injection — also tracked for 0.13. See docs/1_0_punchlist.md #4 / #16.

Upgrade Notes

Schema migrates automatically to v18 (OTel telemetry tables + prompt_id on activity_events) on first DB open via Sequel::Migrator — no manual step. Round-trip migration specs cover the upgrade path from prior release boundaries.
The staleness marker now appears in SessionStart context for single-value facts (uses_database / deployment_platform / auth_method) older than 180 days and not recently recalled. This is additive and advisory (a ⚠ stale … verify before relying note). Tune the window with CLAUDE_MEMORY_INJECTION_STALE_DAYS; the existing CLAUDE_MEMORY_STALE_DAYS (dashboard review window) is unchanged.
No breaking API changes. has_convention / primary_language predicate synonyms continue to emit deprecation warnings (scheduled for removal in 1.0.0); suppress via CLAUDE_MEMORY_NO_DEPRECATIONS=1.
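The staleness rule described in these notes lends itself to a pure function. The sketch below is an approximation, not Recall::StalenessAnnotator itself: the stale_marker name and the hash-based fact shape are hypothetical, and it assumes the same window applies to both the recorded date and last_recalled_at. The "…" in the marker is copied as-is from the notes.

```ruby
require "date"

SINGLE_VALUE_PREDICATES = %w[uses_database deployment_platform auth_method].freeze

# Returns a marker string for old, unconfirmed single-value facts, else nil.
def stale_marker(fact, today:, stale_days: 180)
  # Multi-value predicates accumulate, so they are never annotated.
  return nil unless SINGLE_VALUE_PREDICATES.include?(fact[:predicate])

  cutoff = today - stale_days
  recorded = fact[:valid_from] || fact[:created_at]
  return nil if recorded.nil? || recorded >= cutoff

  # A recent recall counts as confirmation (assumed to use the same window).
  recalled = fact[:last_recalled_at]
  return nil if recalled && recalled >= cutoff

  "⚠ stale: recorded #{recorded.iso8601} … verify before relying"
end

fact = { predicate: "deployment_platform",
         valid_from: Date.new(2025, 1, 1),
         last_recalled_at: nil }
puts stale_marker(fact, today: Date.new(2026, 6, 1))
```

Passing today in explicitly keeps the function deterministic, which is what makes it straightforward to unit test.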

🧪 Real Eval Validation

Results: 2/6 passed ⚠️ 4 fai...

30 Apr 21:37

codenamev

Immutable

v0.11.0

c8ed0bc

v0.11.0 — Trust & Cost: Token Budget, Quality Score, ROI Nudge, Show, Harm Prototype

Theme: Trust & Cost — five user-visible signals that answer "is memory still worth it?" with numbers a skeptical user can read in <30 seconds.

Added

Token budget telemetry — every successful SessionStart context injection now records an estimated context_tokens count on its activity_events row. Surfaced three ways:
- Dashboard Trust panel emits a token_budget block with p50/p95/avg/sample_size over the last 30 days, so the JSON dashboard endpoint and any downstream consumer answer "what does memory cost per session?"
- claude-memory digest includes a "Context cost" subsection between activity and new-knowledge so the weekly report shows the price tag next to the value.
- claude-memory stats --tokens [--since DAYS] reports total sessions, p50/p95/avg/min/max, and a histogram across <500 / 500-1k / 1-2k / 2-5k / 5k+ buckets.
Pure additive — no schema migration. Historical events written before this release simply contribute zero samples until new injections accumulate.
First 0.11.0 milestone item from the 1.0 punchlist (Trust & Cost). Closes the "what % of my SessionStart token budget does memory consume?" gap.
Hallucination rate metric — the dashboard now quantifies how clean the fact base is, not just how full it is. Distill::BareConclusionDetector is the production-side mirror of the SessionStart prompt's reason-clause requirement (decision/convention facts must embed "because…" / "so that…" / "to avoid…"). Surfaced two ways:
- Dashboard Trust panel emits a quality_score block aggregating across project + global active facts: suspect_count (predicate=reference, retagged by ReferenceMaterialDetector), bare_conclusion_count, percentages, and an overall 0–100 score (higher = cleaner). Returns 100 on empty stores so fresh installs aren't penalized.
- claude-memory digest includes a "Quality" section showing the score breakdown plus the in-window rejection rate ("of facts created in the last 7 days, X% have been rejected since"), so calibration drift is visible.
Second 0.11.0 milestone item. Pairs with token-budget telemetry to answer "is memory still worth its cost?" via two skeptic-friendly numbers.
claude-memory show — new CLI command prints what memory would inject at the next SessionStart in plain Markdown. Runs the exact Hook::ContextInjector path real sessions use, so output matches what Claude actually receives. Footer reports fact count, ~token estimate, and char count so users see the SessionStart cost at a glance.
- Default suppresses the raw-transcript "Pending Knowledge Extraction" dump (intended for LLM distillation, not human reading); pass --pending to include it.
- --source SOURCE (startup/resume/clear) simulates each fresh-session entrypoint so users can preview which sections would appear.
Third 0.11.0 milestone item. Closes the inspectability gap — trust requires being able to see what memory will inject, the same way cat CLAUDE.md works.
First-week ROI nudge: at SessionEnd, memory now prints "memory contributed N facts this session, %used = X" for the first 10 sessions, then goes quiet. New users get visible proof that memory is doing work for them without having to know about the dashboard. Once trust is established (or it isn't), the nudge gets out of the way.
- New claude-memory hook nudge subcommand + Hook::Handler#nudge. SessionEnd config now wires [ingest, sweep, nudge] in order.
- Silent on CLAUDE_MEMORY_NO_NUDGE=1 opt-out, missing session_id, n=0 contributions, and after MAX_NUDGES emissions. The empty-session silent path doesn't burn a slot — quiet sessions don't count toward the 10.
- Activity event roi_nudge records {n, used, pct, prior_count} per emission so a future migration could change the threshold without re-counting from raw events.
Fourth 0.11.0 milestone item. Cold-start trust signal that pairs with #47 (token cost) and #48 (quality) to make the first-week answer to "is this worth it?" visible without effort.
Harm benchmark prototype — spec/benchmarks/dataset/harm_scenarios.yml + spec/benchmarks/e2e/harm_bench_spec.rb. Three hand-written cases spanning the riskiest harm classes (stale_tech, mismatched_scope, superseded_undetected). The first ClaudeMemory benchmark that measures whether memory can make Claude wrong — every other benchmark only measures whether memory helps.
- Structure validation (regex compile, fact loadability, harm-class coverage) runs in stub mode as part of :benchmark tag.
- Real-mode runner: EVAL_MODE=real bundle exec rspec spec/benchmarks/e2e/harm_bench_spec.rb — needs claude CLI on PATH, ~$2-8 per run. Reports harm rate; doesn't enforce a threshold yet (that's the 0.12 release gate).
0.11.0 de-risking item. If even one of these three surfaces a harm now, the full 10-15-case benchmark planned for 0.12 will likely reveal a fundamental issue; better to learn that at 0.11 than at 0.12. The real-mode prototype run on 2026-04-30 reported 0/3 harm, a green light to expand to the full corpus in 0.12.
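The silent paths listed for the first-week ROI nudge can be read as a chain of guard clauses. This is a sketch only: the nudge_message name and its parameters are hypothetical, and computing the percentage as used / contributed is an assumption based on the {n, used, pct, prior_count} event payload.

```ruby
MAX_NUDGES = 10

# Returns the nudge text, or nil when the nudge should stay silent.
def nudge_message(env:, session_id:, contributed:, used:, prior_count:)
  return nil if env["CLAUDE_MEMORY_NO_NUDGE"] == "1"      # explicit opt-out
  return nil if session_id.nil? || session_id.empty?      # cannot attribute
  return nil if contributed.zero?                         # quiet session, no slot burned
  return nil if prior_count >= MAX_NUDGES                 # first 10 sessions only

  pct = (100.0 * used / contributed).round
  "memory contributed #{contributed} facts this session, %used = #{pct}"
end

puts nudge_message(env: {}, session_id: "s1", contributed: 4, used: 3, prior_count: 2)
```

Since the empty-session case returns before any counter is touched, a caller that increments prior_count only on a non-nil result gets the "quiet sessions don't count" behaviour for free.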

Changed

Hallucination-rate metric calibration — Dashboard::Trust#quality_score now reports a windowed (last 30d) "live" score as the headline plus a "historical" block over all active facts. Production verification on 2026-04-30 (recorded in docs/quality_review.md) showed the unwindowed metric was technically correct but pragmatically misleading: 97% of bare-conclusion facts pre-dated the 2026-04-20 reason-clause prompt commit, and the entire 7-day rejection cluster was a single-class systemic failure (a /study-repo burst), not ongoing noise. The split makes the metric actionable: live score = ongoing extraction quality, historical = legacy data. The digest's "Quality" section uses the live score as the headline.

Fixed

Real-eval CLI runner now passes allowed_tools through explicitly so the harm benchmark and other real-mode benches can pre-allow MCP memory tools without per-test wiring.

Upgrade Notes

No schema migration. All new features ship purely additive.
Hooks run the installed gem from PATH, not the working tree. After upgrading, bundle exec rake install (or gem install claude_memory) is required for the new SessionEnd nudge, claude-memory show command, --tokens stats flag, and context_tokens activity-event field to actually fire on real hook events.
Existing quality_score consumers will see additional fields (window_days, historical) in the snapshot. The original keys (score, total_active, suspect_count, bare_conclusion_count, suspect_pct, bare_pct) remain at the top level and now reflect the 30-day live window — historical numbers move to the historical sub-hash.
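For consumers of the quality_score snapshot, the new shape could be handled as in the sketch below. Every number is made up, string keys are an assumption (as they would appear after parsing the JSON endpoint), and the keys inside the historical sub-hash are assumed to mirror the top-level ones.

```ruby
# Illustrative snapshot only; the values are invented for the example.
snapshot = {
  "score" => 92,
  "total_active" => 50,
  "suspect_count" => 1,
  "bare_conclusion_count" => 3,
  "suspect_pct" => 2.0,
  "bare_pct" => 6.0,
  "window_days" => 30,                                       # new field
  "historical" => { "score" => 71, "total_active" => 400 }   # new sub-hash
}

# Hypothetical consumer that tolerates snapshots without the new fields.
def quality_headline(snapshot)
  historical = snapshot["historical"]
  line = "live score #{snapshot["score"]}"
  line += " (last #{snapshot["window_days"]}d)" if snapshot["window_days"]
  line += ", historical #{historical["score"]}" if historical
  line
end

puts quality_headline(snapshot)
```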

🧪 Real Eval Validation

Results: 4/6 passed ⚠️ 2 failed
Duration: 73.33s
Estimated Cost: ~$0.12

⚠️ Some real eval tests failed. Check the workflow logs for details.

28 Apr 19:59

codenamev

Immutable

v0.10.0

65dc1db

v0.10.0 — Dashboard, Observability, Memory Quality

Added

Dashboard — feed-first redesign with observability built in

New feed-first dashboard UI with scope-aware moments, fact detail modal, query tester, and activity drilldown. Reuse, Trust, Knowledge, Conflicts, and Moments panels each backed by a dedicated module (Dashboard::{Reuse, Trust, Knowledge, Conflicts, Moments}) under unit tests, replacing the prior all-in-API-class layout.
👍/👎 feedback on individual moments with persisted verdicts (schema v16, moment_feedback table). Trust panel surfaces a 30-day up/down ratio so the dashboard can answer "when memory surfaces something, are users marking it useful?".
Utilization ratio panel — of facts extracted in the last 30 days, how many has Claude actually used in a recall or context injection? Color-coded (green ≥40%, yellow ≥15%, red below). Hidden on fresh installs to avoid misleading zeros.
Conflict deduping at the display layer: identical (subject, predicate, object_pair) detections collapse into one row with a ×N badge. Sidebar "Needs review" count now reflects distinct contradictions, not raw row count.
Activity events drilldown: each moment opens a payload modal with prettified JSONL, recall trigger correlation (which user prompt motivated this lookup), and linked-fact resolution scoped per database.
Vector index health threshold and clickable remediation hints in the health dashboard.
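The color thresholds given for the utilization ratio panel map directly onto a small function. In this sketch the utilization_color name is hypothetical, and treating "zero facts extracted" as the fresh-install case that hides the panel is an assumption.

```ruby
# Returns :green, :yellow, or :red for the utilization panel,
# or nil when the panel should stay hidden.
def utilization_color(used, extracted)
  return nil if extracted.zero? # assumed fresh-install case: avoid misleading zeros

  pct = 100.0 * used / extracted
  if pct >= 40
    :green
  elsif pct >= 15
    :yellow
  else
    :red
  end
end

puts utilization_color(9, 20).inspect   # 45% of extracted facts were used
puts utilization_color(1, 20).inspect   # 5% of extracted facts were used
```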

CLI — observability surfaces and one-shot cleanups

claude-memory digest [--since DAYS] [--output FILE] — weekly markdown report. Sections: Activity, New knowledge by predicate, Utilization (extracted vs used), Conflicts, Feedback. No new schema; renders from existing aggregates.
claude-memory census [--root DIR] — privacy-safe cross-project vocabulary scan. Aggregates per-DB predicate × status counts, novel predicates, synonym candidates. Suppresses object literals, entity names, and paths; per-DB IDs are SHA256-prefixed.
claude-memory dedupe-conflicts [--scope SCOPE] [--dry-run] — one-shot cleanup for historical conflict-row duplication that predates the Resolver dedup fix (commit f571ba4). Groups by (subject, predicate, normalized object pair), keeps the earliest, migrates provenance to the keeper.
claude-memory reclassify-references [--scope SCOPE] [--dry-run] — retags active convention facts that the new Distill::ReferenceMaterialDetector flags as reference material (LOC counts, star counts, "X is a plugin..." templates, "by Firstname Lastname" attributions).
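The grouping that dedupe-conflicts and the dashboard's ×N badge rely on can be sketched with group_by. The Conflict struct, the normalization rule (trim, downcase, sort the pair), and the use of the lowest id as "earliest" are all assumptions made for the example.

```ruby
# Hypothetical row shape for a detected conflict.
Conflict = Struct.new(:id, :subject, :predicate, :object_a, :object_b, keyword_init: true)

# Order-insensitive, case-insensitive form of the two contradicting objects.
def normalized_pair(conflict)
  [conflict.object_a, conflict.object_b].map { |o| o.to_s.strip.downcase }.sort
end

# Collapses duplicates into one entry per (subject, predicate, object pair),
# keeping the earliest row and counting how many rows it stands for.
def collapse(conflicts)
  conflicts
    .group_by { |c| [c.subject, c.predicate, normalized_pair(c)] }
    .map { |_key, rows| { keeper: rows.min_by(&:id), count: rows.size } }
end

rows = [
  Conflict.new(id: 3, subject: "repo", predicate: "uses_database", object_a: "postgres", object_b: "MySQL"),
  Conflict.new(id: 1, subject: "repo", predicate: "uses_database", object_a: "mysql", object_b: "Postgres"),
  Conflict.new(id: 2, subject: "repo", predicate: "auth_method", object_a: "jwt", object_b: "session")
]

collapse(rows).each do |entry|
  puts "conflict ##{entry[:keeper].id} ×#{entry[:count]}"
end
```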

Memory quality

Access-based staleness scoring (improvements.md #35). Schema v17 adds last_recalled_at to facts. Sweep::RecallTimestampRefresher derives the field periodically from activity_events; claude-memory stats --stale [--stale-days N] lists facts that haven't been recalled inside the threshold. Replaces the prior "active facts minus seen-in-recalls" approximation.
Auto-memory mirror (improvements.md #36). On fresh sessions, the SessionStart context hook scans ~/.claude/projects/<slug>/memory/*.md and surfaces new or changed entries as extraction candidates so users can promote auto-memory observations into claude_memory without manual copy-paste.
Reasoning requirement enforced in distillation (improvements.md #34). The SessionStart prompt and the /distill-transcripts skill now require a why clause for decision and convention predicates ("because…", "so that…", etc.). Audit found ~75% of facts were bare conclusions before this change.
Distill::ReferenceMaterialDetector reclassifies convention facts whose object text matches reference patterns. New reference predicate registered in PredicatePolicy with its own :references snapshot section. Detector runs at write time in ManagementHandlers#store_extraction so mislabeling can't persist.
Predicate census command (#30) for cross-project vocabulary audits — see CLI section above.

Benchmarks and observability

Repeat-correction benchmark harness (improvements.md #32). spec/benchmarks/e2e/repeat_correction_spec.rb pre-loads a past correction as a memory fact, runs the prompt through real Claude under EVAL_MODE=real, and reports pass rate (no violation patterns matched). Starter set of 2 scenarios drawn from this project's recurring gotchas.
Relevance ratio metric (improvements.md #31). Hook::ContextInjector#emitted_subjects exposes the subjects injected at SessionStart; BenchmarkHelpers::RelevanceMetrics measures whether they appear in Claude's response. Trend signal for memory-application quality, integrated into devmemeval_spec.rb.
MCP server embeds the V=R/C ("Verify before Recommend / Correct") mental model in agent instructions so memory recommendations come with built-in verification cues.

Schema v14 → v17 (additive only, automatic on first run)

Migration 015: adds activity_events table for hook/recall/context/sweep telemetry. Powers the dashboard timeline, moments feed, and efficacy reports.
Migration 016: adds moment_feedback table (unique on event_id) for the dashboard 👍/👎 surface.
Migration 017: adds nullable facts.last_recalled_at for access-based staleness scoring.

1.0 readiness track

New docs/1_0_punchlist.md opens the path to 1.0: token-budget telemetry, hallucination-rate metric, negative-fact harm benchmark, CLAUDE.md baseline publication, claude-memory show, benchmark scoreboard. Ten entries (#47-56) added to docs/improvements.md with concrete file:line plumbing notes.

Changed

Resolver#apply_conflict no longer creates a duplicate disputed fact + conflict row when the same contradicting value is re-extracted. Looks up disputed facts in the same (subject, predicate) slot and reinforces with provenance instead.
Resolver no longer treats the distiller's scope_hint as a scope override. scope_hint is advisory metadata; fact.scope must match the DB the row lives in. Earlier behavior caused scope leakage where global-hinted distillations landed in the project DB.
Hook::ContextInjector adds emitted_fact_ids and emitted_subjects accessors so benchmark harnesses can attribute injection contributions per session.
SQLiteStore decomposed via module inclusion: LLMCache and MetricsAggregator extracted into lib/claude_memory/store/. SQLiteStore back under 600 LOC.
Dashboard::API decomposed: FactPresenter, Conflicts, Efficacy::Reporter, Timeline, Health extracted into dedicated classes following the boundary pattern. API now routes/delegates rather than aggregating.
Dashboard releases DB connections after each HTTP request (was holding connections open for the lifetime of the WEBrick session).
Sweep::Maintenance gains dedupe_open_conflicts and reclassify_references for the one-shot CLI commands above.
Round-trip migration specs from v12, v13, v14 → v17 (per-version migrations covered by spec/claude_memory/store/migrations/). Codifies the release-blocker convention: any schema bump must round-trip from each prior major-release boundary back ~3 releases.

Fixed

Dashboard surfaces an actionable hint when Recall hits FTS5 corruption (suggesting claude-memory compact) rather than a generic error.
Dashboard query tester unwraps the nested Recall result shape rather than printing the raw envelope.
Dashboard health checks correctly detect the claude-memory hook installation across the two-level Claude Code hooks structure (was reporting false negatives when hooks were installed under a matcher block).
Dashboard Efficacy "this session" correlation falls back to a time window when the recall event has no session_id (MCP tool calls don't thread session_id).
Bulk-reject in the Conflicts modal now retries with an actionable message when the server-side state is stale.

Upgrade Notes

Schema bump v14 → v17. Three migrations run automatically on first launch after upgrade. All three are additive (no existing data is rewritten):

Migration 015 creates activity_events (hook/recall telemetry).
Migration 016 creates moment_feedback (dashboard verdicts).
Migration 017 adds facts.last_recalled_at (NULL by default; Sweep::RecallTimestampRefresher populates it on the next sweep cycle from existing activity_events).

The migration delta has round-trip spec coverage in spec/claude_memory/store/migrations/. Forward-compatibility: 0.10.0 databases cannot be opened by 0.9.x or earlier. Downgrade is destructive — back up ~/.claude/memory.sqlite3 and .claude/memory.sqlite3 before downgrading.

Optional historical cleanups. Two new admin commands address data tails left by earlier bugs that have since been fixed at the source:

claude-memory dedupe-conflicts --dry-run   # preview duplicate conflict rows
claude-memory dedupe-conflicts             # consolidate them
claude-memory reclassify-references --dry-run   # preview reference-material mislabels
claude-memory reclassify-references             # retag them

Both are opt-in. Neither runs in the regular sweep cycle. Use --scope global to clean the global DB.

Telemetry footprint. The activity_events table grows with hook activity. The dashboard surfaces it by default, and it powers the timeline/moments/efficacy panels. Retention pruning is not yet automatic (planned for a follow-up); manual cleanup via DELETE FROM activity_events WHERE occurred_at < ? is safe because the dashboard tolerates missing history.

28 Apr 20:11

codenamev

Immutable

v0.9.1

2e151dc

v0.9.1 — MCP JSON-RPC notifications fix

Fixed

MCP server now conforms to JSON-RPC 2.0: notifications (messages without an id) never receive a response. Previously, notifications/initialized — which Claude Code sends after every handshake — triggered a spurious Method not found error frame, causing strict MCP clients to mark the server failed on /mcp reconnect after the initial connection.
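The JSON-RPC 2.0 rule behind this fix is that a message without an id is a notification and must never be answered, not even with an error. A minimal dispatcher that respects it might look like the sketch below; the HANDLERS table and handle_message name are hypothetical, while error code -32601 is the standard "Method not found" code.

```ruby
require "json"

# Hypothetical method table for the example.
HANDLERS = {
  "ping" => ->(_params) { {} }
}.freeze

# Returns a JSON response string, or nil when nothing must be sent.
def handle_message(raw)
  request = JSON.parse(raw)
  notification = !request.key?("id")
  handler = HANDLERS[request["method"]]

  if handler.nil?
    return nil if notification # unknown notification: stay silent

    error = { "code" => -32601, "message" => "Method not found" }
    return JSON.generate({ "jsonrpc" => "2.0", "id" => request["id"], "error" => error })
  end

  result = handler.call(request["params"])
  return nil if notification # known notification: run it, but do not reply

  JSON.generate({ "jsonrpc" => "2.0", "id" => request["id"], "result" => result })
end

p handle_message('{"jsonrpc":"2.0","method":"notifications/initialized"}')
puts handle_message('{"jsonrpc":"2.0","id":1,"method":"ping"}')
```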

16 Apr 17:41

codenamev

Immutable

v0.9.0

6509bd3

v0.9.0 — Predicate Design Overhaul, Reject/Restore, Telemetry

Highlights

Predicate vocabulary overhaul — curated from 13 → 8 predicates based on a multi-project survey of real memory databases. uses_framework reclassified as multi-value (fixing silent data loss in production). PredicatePolicy is now the single source of truth for vocabulary, snapshot sections, synonym canonicalization, and LLM guidance.

New commands: reject and restore — first-class tools for managing distiller quality. Mark hallucinated facts as wrong, or recover facts that were superseded by an obsolete classification.

MCP tool-call telemetry — every tool invocation is timed and recorded. claude-memory stats --tools shows call counts, latency percentiles, and error rates.

Proactive memory recall — MCP instructions now direct Claude to check conventions before code generation, architecture before explanations, and decisions before refactoring. A/B testing showed this produces 76-line accurate architecture explanations vs honest refusals without memory.

Added

claude-memory reject <id_or_docid> command + memory.reject_fact MCP tool — explicitly mark distiller hallucinations as wrong, closing associated conflicts
claude-memory restore --predicate NAME command — recover facts superseded by obsolete single-value predicate classifications (Jaccard-based token overlap heuristic)
MCP tool-call telemetry: mcp_tool_calls table, claude-memory stats --tools [--since DAYS], 90-day retention via Sweep
CLAUDE_CONFIG_DIR env var support for non-standard Claude Code config locations
Predicate synonym canonicalization at insert time (has_convention → convention, primary_language → uses_language)
Novel predicate warnings at insert time
NullDistiller emits uses_language facts for detected language entities
Proactive memory recall guidance in MCP server instructions
YARD documentation across 13 core source files (+473 lines)
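The Jaccard-based token overlap heuristic mentioned for the restore command measures how much two strings share, as intersection size divided by union size of their token sets. The sketch below shows the measure itself; the tokenizer is an assumption, and how restore applies the score (thresholds, which facts are compared) is not described in these notes.

```ruby
require "set"

# Assumed tokenizer: lowercase alphanumeric runs.
def tokens(text)
  text.downcase.scan(/[a-z0-9]+/).to_set
end

# Jaccard similarity: |A ∩ B| / |A ∪ B|, from 0.0 (disjoint) to 1.0 (identical).
def jaccard(a, b)
  set_a = tokens(a)
  set_b = tokens(b)
  union = set_a | set_b
  return 0.0 if union.empty?

  (set_a & set_b).size.to_f / union.size
end

puts jaccard("Ruby on Rails", "rails")
puts jaccard("Tailwind CSS", "PostgreSQL")
```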

Changed

uses_framework reclassified as multi-value — real projects use multiple frameworks (Rails + Turbo + Tailwind). Prior single-value classification silently superseded valid facts. Run claude-memory restore --predicate uses_framework to recover
PredicatePolicy is single source of truth for vocabulary, snapshot sections, synonym canonicalization, and LLM guidance
Predicate vocabulary curated 13 → 8 based on multi-project usage data
Registry::COMMANDS stores {class:, description:} with direct class references
Plugin and gem descriptions rewritten to be outcome-focused
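To make the single-value versus multi-value distinction concrete, here is a minimal sketch of how a policy table could drive canonicalization and supersession at insert time. Only the two synonym pairs and the multi-value status of uses_framework come from these notes. The `SINGLE_VALUE` list, the `example_single` predicate, and the fact shape are hypothetical and do not reflect PredicatePolicy's real interface.

```ruby
# Synonym pairs taken from the release notes.
SYNONYMS = {
  "has_convention" => "convention",
  "primary_language" => "uses_language"
}.freeze

# Hypothetical list of predicates that hold exactly one active value.
SINGLE_VALUE = %w[example_single].freeze

def canonical(predicate)
  SYNONYMS.fetch(predicate, predicate)
end

# Append a fact; for single-value predicates, supersede the prior active fact.
# Anything not listed in SINGLE_VALUE is treated as multi-value.
def insert_fact(facts, predicate, object)
  predicate = canonical(predicate)
  if SINGLE_VALUE.include?(predicate)
    facts.each do |fact|
      fact[:status] = "superseded" if fact[:predicate] == predicate && fact[:status] == "active"
    end
  end
  facts << { predicate: predicate, object: object, status: "active" }
end

facts = []
%w[Rails Turbo Tailwind].each { |name| insert_fact(facts, "uses_framework", name) }
insert_fact(facts, "primary_language", "Ruby")
facts.each { |fact| puts fact }
```

Under a single-value policy, the second and third framework inserts would each have superseded the previous one, which is the failure mode the reclassification addresses.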

Fixed

StatsCommand was broken in production: it used Sequel.sqlite, which requires the sqlite3 gem that was not listed as a dependency. It now uses the extralite adapter.
Missing embeddings command in shell completion output

Upgrade Notes

Schema: v12 → v14 (automatic). Migration 013 adds mcp_tool_calls. Migration 014 canonicalizes stale predicate names in existing facts.

Action required for uses_framework recovery: If your project uses multiple frameworks, past sessions may have superseded valid facts:

claude-memory restore --predicate uses_framework --dry-run   # preview
claude-memory restore --predicate uses_framework              # restore

Pruned predicates still work: preference, workflow, dependency, testing_strategy, tool_usage, ci_platform fall through to default multi-value policy. Existing facts are unaffected.

Full Changelog: v0.8.0...v0.9.0


30 Mar 17:58

codenamev

Immutable

v0.8.0

c909035

v0.8.0

Added

Three-Layer Distillation Pipeline

Automatic distillation via NullDistiller in ingest pipeline (Layer 1: regex-based, P95 < 5ms)
Context hook injection for LLM-based extraction at SessionStart (Layer 2: Claude Code as distiller, zero extra cost)
/distill-transcripts skill for manual deep extraction (Layer 3: on-demand, depth-aware prompts)
memory.undistilled and memory.mark_distilled MCP tools for distillation tracking
Hook::DistillationRunner extracted from Handler for context hook injection
TaskCompleted and TeammateIdle hook events for ingest triggers
Distillation metrics backfill on database initialization
Doctor check for undistilled content
Pending distillation count in memory.status output
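A regex-based first layer of the kind described for Layer 1 might look like the sketch below. The patterns and the fact shape are hypothetical, since the real NullDistiller rules are not shown in these notes. The uses_language predicate name is borrowed from the v0.9.0 notes.

```ruby
# Hypothetical language patterns; the real extraction rules may differ.
LANGUAGE_PATTERNS = {
  "Ruby" => /\bruby\b/i,
  "Python" => /\bpython\b/i,
  "TypeScript" => /\btypescript\b/i
}.freeze

# Return one fact hash per language mentioned in the text. No LLM call is
# involved, which is what keeps this kind of layer cheap.
def distill(text)
  LANGUAGE_PATTERNS
    .select { |_name, pattern| text.match?(pattern) }
    .keys
    .map { |name| { predicate: "uses_language", object: name } }
end

distill("The API is written in Ruby with a TypeScript frontend.").each { |fact| puts fact }
```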

Recall Enhancements

Intent parameter for recall query disambiguation (#3)
Retrieval score traces for semantic search (#5)
Configurable embedding providers with dimension checking

Hook Enhancements

statusMessage on all hooks for descriptive spinner text during hook execution
StopFailure hook to capture transcript data even on session errors (rate limits, server errors)
Notification hook with idle_prompt matcher for opportunistic sweep during idle

New Commands & Skills

install-skill command and memory-recall agent (#8, #12)
Shell completion command for bash and zsh (#18)

Distillation Benchmark Results

NullDistiller: Concept Recall 0.952, Fact Precision/Recall 1.000 (31 test cases)
Claude Code LLM: Concept Recall 0.902 (all 41 cases), 0.900 on semantic cases (vs 0.333 for regex)
Average 1.6 facts stored per case across LLM extraction
E2E distillation recall benchmark and extraction quality benchmarks
Concept-based matching for distiller-agnostic benchmark comparison
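For readers unfamiliar with the metrics, precision and recall over a set of expected versus extracted items can be computed as in this sketch. It shows the standard definitions only. The benchmark's actual matching rules (for example, how concepts are compared) are not described in these notes, so exact string equality is assumed here.

```ruby
require "set"

# precision = hits / extracted, recall = hits / expected.
# Both are reported as 0.0 when their denominator is empty.
def precision_recall(expected, extracted)
  expected = expected.to_set
  extracted = extracted.to_set
  hits = (expected & extracted).size
  {
    precision: extracted.empty? ? 0.0 : hits.fdiv(extracted.size),
    recall: expected.empty? ? 0.0 : hits.fdiv(expected.size)
  }
end

puts precision_recall(%w[rails postgres rspec], %w[rails postgres])
```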

Fixed

--allowedTools added to ClaudeCliRunner for MCP tool permissions
Test isolation for context hook when global database has facts

Internal

Extracted RetryHandler and SchemaManager modules from SQLiteStore
Extracted Recall into engine strategy pattern with DualEngine, LegacyEngine, and shared QueryCore
Extracted Tools god object into 6 handler modules
Added 36 specs for 5 previously untested files
All 3 god objects eliminated, 0 files over 500 lines


30 Mar 17:47

codenamev

Immutable

v0.7.1

eb7aee5

v0.7.1

Added

Three-Level Sweep Escalation

Maintenance class with light/standard/deep sweep levels for progressive database maintenance
Exposed sweep escalation via memory.sweep_now MCP tool with configurable level
Tool escalation workflow added to MCP QueryGuide documentation

Embedding Deduplication

Content-addressed deduplication for embeddings using SHA256 hashing
Deduplication before vector scoring in fallback path to prevent duplicate results
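Content-addressed deduplication keys each embedding by a hash of its source text, so identical content is embedded and returned only once. The sketch below illustrates the idea with Ruby's standard Digest library. The `EmbeddingCache` class, the `embedder` callable, and the result shape are assumptions, not the gem's actual code.

```ruby
require "digest"

# Cache embeddings by the SHA256 of their source text.
# `embedder` is any callable that maps text to a vector (hypothetical).
class EmbeddingCache
  attr_reader :computed

  def initialize(embedder)
    @embedder = embedder
    @by_hash = {}
    @computed = 0
  end

  def fetch(text)
    key = Digest::SHA256.hexdigest(text)
    @by_hash.fetch(key) do
      @computed += 1
      @by_hash[key] = @embedder.call(text)
    end
  end
end

# Drop results whose content hash was already seen, keeping the first.
def dedupe_results(results)
  results.uniq { |result| Digest::SHA256.hexdigest(result[:text]) }
end

cache = EmbeddingCache.new(->(text) { [text.length.to_f] })
cache.fetch("uses Rails")
cache.fetch("uses Rails")
puts cache.computed
```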

MCP Enhancements

Structured error classification for MCP tools via ErrorClassifier module
Dynamic knowledge summary in MCP server instructions via InstructionsBuilder

Fixed

Plugin hook loading error: Removed explicit hooks reference from plugin.json manifest — Claude Code auto-loads hooks/hooks.json from the plugin root, so declaring it caused "Duplicate hooks file detected" errors on plugin install

Internal

Influence study: lossless-claw v0.3.0 DAG-based lossless context management
Marked 7 improvements as implemented (#10, #11, #14, #15, #16, #19, #20)


13 Mar 13:22

codenamev

Immutable

v0.7.0

d5b1ff8

v0.7.0

ClaudeMemory v0.7.0

Added

FTS5 Contentless Mode

FTS5 tables now created with content='' for ~40% smaller databases
Auto-detection: both legacy and contentless formats work seamlessly
compact command rebuilds FTS index to contentless format
stats command reports FTS format and optimization hints

Worktree-Aware Project Paths

Project database now resolves to main repository root across git worktrees
Prevents duplicate project databases when using git worktree
Opt-out: set CLAUDE_MEMORY_ISOLATE_WORKTREES=1 for per-worktree isolation
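One way to resolve a linked worktree back to its main repository is to read the worktree's `.git` file, which git writes as `gitdir: <main>/.git/worktrees/<name>`. The sketch below is a pure function over that file's contents. It is an assumption about approach and not necessarily how the gem resolves paths, and the function name and signature are hypothetical.

```ruby
# Given a checkout path and the contents of its `.git` file (nil when `.git`
# is a directory, i.e. the main checkout), return the path to key the
# project database on.
def project_root(checkout_path, dot_git_contents, env = ENV)
  # Opt-out documented in the release notes.
  return checkout_path if env["CLAUDE_MEMORY_ISOLATE_WORKTREES"] == "1"
  return checkout_path if dot_git_contents.nil?

  match = dot_git_contents.match(%r{\Agitdir: (.+)/\.git/worktrees/[^/\n]+\s*\z})
  match ? match[1] : checkout_path
end

pointer = "gitdir: /home/me/app/.git/worktrees/feature\n"
puts project_root("/home/me/app-feature", pointer, {})
```

Passing the environment as a parameter keeps the function easy to test. Files that do not match the worktree pattern (such as submodule pointers) fall back to the checkout path.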

MCP Enhancements

Tool annotations: readOnlyHint, idempotentHint, destructiveHint on all 21 tools
Stdout protection: MCP server redirects $stdout to $stderr to prevent protocol corruption from accidental puts/print calls
Self-excluding agent conversations via SELF_CONTEXT_MARKER to prevent meta-pollution
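The stdout protection can be pictured as follows: keep a private handle for protocol frames, then point the global `$stdout` at the diagnostics stream so that stray `puts` calls cannot interleave with JSON-RPC output. This is a minimal sketch of the technique. The helper name and keyword arguments are hypothetical.

```ruby
require "json"
require "stringio"

# Redirect the global $stdout to `diagnostics` for the duration of the block,
# yielding the original protocol stream for deliberate writes.
def with_protected_stdout(protocol_out: $stdout, diagnostics: $stderr)
  original = $stdout
  $stdout = diagnostics
  yield protocol_out
ensure
  $stdout = original
end

protocol = StringIO.new
logs = StringIO.new
with_protected_stdout(protocol_out: protocol, diagnostics: logs) do |out|
  puts "debug: handling request" # accidental output, lands in logs
  out.puts JSON.generate({ jsonrpc: "2.0", id: 1, result: {} })
end
```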

New Commands

git-lfs command for setting up git-lfs tracking of project memory databases

Fixed

Narrowed rescue clauses in discover_other_projects (was bare rescue, now catches specific exceptions)
FTS entries now cleaned up when content is pruned by sweeper (prevents orphaned index entries)
FTS index rebuilt during compact for consistent state after upgrades
Real evals CI: install gem and use correct release API

Internal

Resolver refactored for better thread safety (parameters instead of instance variables)
SnippetExtractor DRY refactoring
StoreManager.promote_fact single-transaction safety
Influence study: QMD v2.0.1 SDK-first architecture analysis

22 CLI commands · 21 MCP tools · 1,435 tests


06 Mar 15:56

codenamev

Immutable

v0.6.0

7992f27

v0.6.0

What's New

Native Vector Storage (sqlite-vec)

Integrated sqlite-vec for native KNN vector search
- VectorIndex class with vec0 virtual table for cosine similarity search
- Dual-write: embeddings stored in both JSON column and vec0 index
- claude-memory index --vec flag for backfilling existing embeddings into vec0
- Fast path in Recall uses sqlite-vec KNN when available, falls back to JSON + Ruby
- Sweeper cleans up vec0 entries for superseded/expired facts
- Doctor and MCP status/stats report vec0 availability and coverage
- Cross-platform support with platform-specific gem installation
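The "JSON + Ruby" fallback path amounts to a brute-force nearest-neighbour scan. A minimal sketch, assuming embeddings are stored as JSON arrays in a column and ranked by cosine similarity (the row shape and method names are hypothetical):

```ruby
require "json"

# Cosine similarity of two equal-length vectors; 0.0 if either has zero norm.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |y| y * y })
  norm.zero? ? 0.0 : dot / norm
end

# Score every row against the query and keep the k most similar,
# returned as [id, score] pairs in descending score order.
def knn(query, rows, k)
  rows
    .map { |row| [row[:id], cosine_similarity(query, JSON.parse(row[:embedding_json]))] }
    .max_by(k) { |_id, score| score }
end

rows = [
  { id: 1, embedding_json: "[1.0, 0.0]" },
  { id: 2, embedding_json: "[0.0, 1.0]" },
  { id: 3, embedding_json: "[1.0, 1.0]" }
]
knn([1.0, 0.0], rows, 2).each { |id, score| puts "#{id}: #{score.round(3)}" }
```

This scan is linear in the number of stored vectors, which is the cost a native vec0 index avoids.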

Database Maintenance

compact command for database maintenance (VACUUM + integrity check)
export command for fact backup and migration to JSON

Hook Enhancements

SessionStart context injection via hookSpecificOutput.additionalContext
- Injects recent facts and project context at session start
Tool-specific observation compression for reduced token usage
--async flag for non-blocking hook execution
Hook error classification for graceful degradation
Conversation exclusion markers for session-level opt-out
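For context, a SessionStart hook adds context by printing a JSON object with a `hookSpecificOutput.additionalContext` field to stdout. The sketch below builds such a payload. The fact shape and the wording of the injected text are assumptions, not the gem's actual output.

```ruby
require "json"

# Build the JSON payload a SessionStart hook would print to stdout.
# `facts` is a list of hypothetical {subject:, predicate:, object:} hashes.
def session_start_output(facts)
  lines = facts.map { |f| "- #{f[:subject]} #{f[:predicate]} #{f[:object]}" }
  JSON.generate({
    hookSpecificOutput: {
      hookEventName: "SessionStart",
      additionalContext: "Known project facts:\n#{lines.join("\n")}"
    }
  })
end

puts session_start_output([
  { subject: "repo", predicate: "uses_framework", object: "Rails" },
  { subject: "repo", predicate: "uses_language", object: "Ruby" }
])
```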

MCP Discovery

memory.list_projects MCP tool for discovering all project databases

Developer Experience

Dynamic MCP server instructions with progressive disclosure documentation
Comparative benchmark suite with QMD and grepai adapters

Bug Fixes

Recall returned no results: DualQueryTemplate accessed stores before initializing them, causing all recall queries to silently return empty results. Refactored to use existing store_for_scope method.
Doctor crashed on sqlite-vec tables: SchemaValidator iterated all tables including vec0 virtual tables, which require the sqlite-vec extension. Now skips facts_vec* tables using prefix match.
Forward-migrated databases: Older gem versions now gracefully handle databases migrated by newer versions instead of crashing.
Hybrid retrieval ordering: Preserved BM25 scores and RRF ordering in hybrid search results instead of re-sorting by source/time.
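For readers unfamiliar with RRF (Reciprocal Rank Fusion), the sketch below fuses two ranked lists by summing 1 / (k + rank) per item. It illustrates the general technique only. The constant k = 60 is a commonly used default assumed here, and the gem's actual fusion parameters are not stated in these notes.

```ruby
# Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)),
# with ranks starting at 1. Returns ids ordered by descending fused score.
def rrf(ranked_lists, k: 60)
  scores = Hash.new(0.0)
  ranked_lists.each do |list|
    list.each_with_index { |id, index| scores[id] += 1.0 / (k + index + 1) }
  end
  scores.sort_by { |_id, score| -score }.map(&:first)
end

bm25_order = %w[fact_a fact_b fact_c]
vector_order = %w[fact_b fact_c fact_a]
puts rrf([bm25_order, vector_order]).inspect
```

Re-sorting the fused list by source or timestamp afterwards would discard this ordering, which is the behaviour the fix above removes.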

Stats

21 MCP tools, 22 CLI commands
1316 test examples, 0 failures
Full changelog: CHANGELOG.md

