Non-384-dim fastembed model fails on existing DB (vec0 dimension fixed at 384); plus recall-breaking corrupt FTS5 rank path undetected by doctor #7

@codenamev

Summary

Switching from the default tfidf backend to a fastembed model whose dimension is not 384 (e.g. BAAI/bge-base-en-v1.5 at 768) fails on an existing database. Indexing aborts with a vec0 dimension mismatch, and the built-in dimension-change handling does not recover. I also hit a separate corrupt-FTS5 failure mode during recall that the health checks do not detect. Details and a working manual recovery below.

Environment

  • claude_memory 0.12.1 (MCP plugin install)
  • Ruby 4.0.1, fastembed 1.1.0, sqlite-vec 0.1.9, extralite 2.14
  • macOS (arm64, Apple Silicon)
  • Existing project DB previously on the default tfidf provider, vec_indexed = 0 (no embeddings yet)

Finding 1: non-384 fastembed model cannot be adopted on an existing DB

Repro

  1. Existing DB created under the default config (tfidf, 384). embedding_dimensions meta is unset.
  2. Configure a 768-dim model:
    export CLAUDE_MEMORY_EMBEDDING_PROVIDER=fastembed
    export CLAUDE_MEMORY_EMBEDDING_MODEL=BAAI/bge-base-en-v1.5
    
  3. claude-memory embeddings check reports all OK (model resolves to 768-dim).
  4. claude-memory index --provider fastembed --scope all --force

Result

Failed: Extralite::SQLError: Dimension mismatch for inserted vector for the
"embedding" column. Expected 384 dimensions but received 768.
  .../claude_memory/index/vector_index.rb:160 in insert_embedding
  .../claude_memory/commands/index_command.rb:160 in process_batch

Root cause

The facts_vec vec0 virtual table is created with a fixed width read from the embedding_dimensions meta:

  • VectorIndex#initialize sets @dimensions = store.get_meta("embedding_dimensions")&.to_i || DEFAULT_DIMENSIONS (384).
  • ensure_vec_table! runs CREATE VIRTUAL TABLE IF NOT EXISTS facts_vec USING vec0(... embedding float[#{@dimensions}] ...).

But embedding_dimensions is only written at the end of a successful index run (IndexCommand#process_batch -> store.set_meta("embedding_dimensions", generator.dimensions.to_s)). On a fresh or tfidf DB the meta is unset, so the table is created at 384, and the first 768-dim insert fails. This is a chicken-and-egg problem: the dimension is recorded only after a successful run, but the run cannot succeed until the table has been created at the right dimension.
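To make the ordering problem concrete, here is a minimal sketch of the fallback. FakeStore is a hypothetical stand-in for the real store (only get_meta / set_meta are modeled), and table_dimensions simply mirrors the expression quoted from VectorIndex#initialize; none of this is the plugin's actual code.

```ruby
# Sketch only: FakeStore and table_dimensions are illustrative names.
DEFAULT_DIMENSIONS = 384

class FakeStore
  def initialize
    @meta = {}
  end

  def get_meta(key)
    @meta[key]
  end

  def set_meta(key, value)
    @meta[key] = value
  end
end

# Mirrors: store.get_meta("embedding_dimensions")&.to_i || DEFAULT_DIMENSIONS
def table_dimensions(store)
  store.get_meta("embedding_dimensions")&.to_i || DEFAULT_DIMENSIONS
end

store = FakeStore.new
generator_dimensions = 768 # BAAI/bge-base-en-v1.5

# Meta is unset, so the vec0 table would be created 384 wide.
width_before = table_dimensions(store)

# The first 768-dim insert is rejected by a 384-wide table, so the run
# never reaches the point where the meta would be written.
insert_would_fail = (width_before != generator_dimensions)

# Writing the meta before the first insert changes the resolved width.
store.set_meta("embedding_dimensions", generator_dimensions.to_s)
width_after = table_dimensions(store)

puts "before=#{width_before} fails=#{insert_would_fail} after=#{width_after}"
```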

IndexCommand#handle_dimension_mismatch does not help here either. DimensionCheck returns :fresh when the meta is unset (not :mismatch), so the stale-clearing path is skipped. And even on a true :mismatch, clear_stale_embeddings only nulls embedding_json / vec_indexed_at and calls vector_index.clear!. It never drops or recreates the facts_vec table, whose dimension is immutable once created. So a genuine 384 -> 768 change would also leave a 384-wide table in place and fail the same way.

Manual recovery that worked

Per scope (global + project):

  1. Set the meta to the target dim before indexing: store.set_meta("embedding_dimensions", "768") (and embedding_provider).
  2. DROP TABLE IF EXISTS facts_vec with the sqlite-vec extension loaded so the virtual table destructor runs.
  3. claude-memory index --provider fastembed --scope all --force then recreates facts_vec at 768 and indexes all active facts successfully.

Suggested fix

When the resolved provider's dimension differs from the table's actual dimension (or the meta is unset), drop and recreate facts_vec at the new dimension as part of indexing. Two concrete options:

  • Have DimensionCheck treat :fresh + a non-default provider dimension as needing a (re)create, and make clear_stale_embeddings (or a new VectorIndex#recreate!) actually DROP TABLE facts_vec so ensure_vec_table! rebuilds it at the current dimension.
  • Or write the embedding_dimensions meta from the resolved generator before the first insert, and recreate the table whenever the meta changes.
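Either option needs a way to compare the provider's dimension with the table's actual width rather than with the meta. A rough sketch of that decision follows, assuming the CREATE statement for facts_vec can be read back from sqlite_master. Both helper names are hypothetical, and the column list in the usage lines is illustrative because the real one is elided above.

```ruby
# Hypothetical helper: pull the declared width out of the vec0 CREATE
# statement. Returns nil when no "embedding float[N]" column is found.
def vec0_width(create_sql)
  match = create_sql.to_s.match(/embedding\s+float\[(\d+)\]/i)
  match && match[1].to_i
end

# Hypothetical decision made before the first insert.
#   create_sql nil     -> table does not exist yet, create it
#   width matches      -> keep the table
#   anything else      -> drop and recreate at the provider's dimension
def vec_table_action(create_sql, provider_dimensions)
  return :create if create_sql.nil?

  vec0_width(create_sql) == provider_dimensions ? :keep : :recreate
end

# Illustrative statement; the real column list is not shown in this report.
existing = "CREATE VIRTUAL TABLE facts_vec USING vec0(embedding float[384])"

vec_table_action(existing, 768) # => :recreate
vec_table_action(existing, 384) # => :keep
vec_table_action(nil, 768)      # => :create
```

Checking the table itself avoids relying on the meta, which is exactly the value that is missing in the failing case.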

Finding 2: recall crashes on a corrupt contentless FTS5 index, undetected by health checks

After indexing succeeded, claude-memory recall "..." failed with:

Extralite::Error: database disk image is malformed
  .../claude_memory/index/lexical_fts.rb:42 in LexicalFTS#search

What is notable is how narrow the failure is. All of these passed on the same DB:

  • PRAGMA integrity_check -> ok
  • FTS5 INSERT INTO content_fts(content_fts) VALUES('integrity-check') -> no error
  • SELECT count(*) FROM content_fts -> fine
  • SELECT rowid FROM content_fts WHERE text MATCH 'example' LIMIT 3 -> returns rows

Only the BM25 ranking path fails:

  • SELECT rowid FROM content_fts WHERE text MATCH 'example' ORDER BY rank LIMIT 3 -> database disk image is malformed

That is exactly what LexicalFTS#search and #search_with_ranks issue (... text MATCH ? ORDER BY rank ...), so recall is fully broken even though claude-memory doctor reports "Schema health: healthy" and the embedding/vector side is fine. The table is the contentless fts5(text, content='', tokenize='porter unicode61').
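Because only the ranked query fails, a health probe would have to issue that exact shape of query. A minimal sketch, assuming a db object that responds to query(sql, *params) and raises on SQLite errors; the method name, the return shape, and the default probe term are all hypothetical.

```ruby
# Same query shape as the failing one above, limited to a single row.
RANK_PROBE_SQL =
  "SELECT rowid FROM content_fts WHERE text MATCH ? ORDER BY rank LIMIT 1"

# Hypothetical probe: returns a small status hash instead of raising, so a
# doctor-style check can report the failure alongside its other results.
# The probe term is a placeholder; a term known to match may be needed to
# exercise the ranking path.
def fts_rank_probe(db, term = "example")
  db.query(RANK_PROBE_SQL, term)
  { status: :ok }
rescue StandardError => e
  { status: :error, message: e.message }
end
```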

Rebuilding the FTS index fixed it (LexicalFTS#rebuild!, which is what claude-memory compact runs). After rebuild, MATCH ... ORDER BY rank works and recall returns ranked results.

I could not determine the original cause of the corruption (it predated the embedding change, since content_fts is never touched by the index command). Two suggestions regardless:

  • doctor / the schema checks should exercise a MATCH ... ORDER BY rank probe, since PRAGMA integrity_check and the FTS5 integrity-check both miss this and it takes recall down entirely.
  • Consider catching this specific malformed-on-rank failure in LexicalFTS#search and surfacing a "run claude-memory compact" hint rather than an unhandled stacktrace.
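The second suggestion could look roughly like the wrapper below. It is a sketch: FtsRankCorruptError and with_fts_hint are hypothetical names, and it matches on the error message text because I have not checked which error class the adapter raises for this case.

```ruby
# Hypothetical error type carrying the recovery hint.
class FtsRankCorruptError < StandardError; end

MALFORMED_PATTERN = /database disk image is malformed/i

# Runs the given block (the ranked FTS query). A "malformed" failure is
# re-raised with a hint; any other error propagates unchanged.
def with_fts_hint
  yield
rescue StandardError => e
  raise unless e.message.match?(MALFORMED_PATTERN)

  raise FtsRankCorruptError,
        "FTS5 rank query failed (#{e.message}). " \
        "Run `claude-memory compact` to rebuild the index."
end
```

A caller would wrap only the ORDER BY rank query, so unrelated database errors keep their original class and backtrace.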

Minor: leaked serve-mcp processes

Separately, I found 16 claude-memory serve-mcp processes accumulated over ~3 weeks (oldest 20 days), several no longer attached to any live session, all holding SQLite connections. They are not the cause of the above (both failures reproduce with a single connection), but the lack of cleanup of old MCP server processes is worth a look.
