fix: include vectors in backup and fix Float32Array serialization #145
baizenghu wants to merge 2 commits into CortexReach:main from
Conversation
…up JSONL

Backup JSONL now contains vector data, so restores work without re-embedding (critical for intranet environments where the embedding API may be unavailable). Backup files include a _meta header line with model/dimensions/timestamp. list() gains an optional includeVectors parameter (default false) to keep existing callers unaffected.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
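A sketch of the backup layout this commit describes: the first JSONL line is a _meta header, each following line is one entry. The field names (`model`, `dimensions`, `createdAt`) and shapes here are assumptions for illustration, not the PR's exact schema.

```typescript
// Hypothetical backup-layout sketch; field names are assumed, not the PR's schema.
interface BackupMeta {
  _meta: true;
  model: string;
  dimensions: number;
  createdAt: string;
}

interface BackupEntry {
  id: string;
  text: string;
  vector: number[]; // already converted to a plain array for JSON
}

function writeBackupLines(
  entries: BackupEntry[],
  model: string,
  dimensions: number,
): string[] {
  const meta: BackupMeta = {
    _meta: true,
    model,
    dimensions,
    createdAt: new Date().toISOString(),
  };
  // JSONL: one JSON document per line, header first, then the entries.
  return [JSON.stringify(meta), ...entries.map((e) => JSON.stringify(e))];
}

const lines = writeBackupLines(
  [{ id: "a1", text: "hello", vector: [0.1, 0.2] }],
  "text-embedding-3-small",
  2,
);
console.log(lines.length); // 2 (header line + 1 entry line)
```

A restore tool can read the first line, compare `model`/`dimensions` against the current embedder, and skip re-embedding when they match.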
LanceDB returns Float32Array (not a plain Array), so Array.isArray() returns false. Use Array.from() to correctly convert typed arrays when includeVectors is true.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Review: fix: include vectors in backup and fix Float32Array serialization

Verdict: Fix-then-merge — the same missing-test-file issue blocks this one too.

✅ What's working

🔴 Blocking
No test file.
Suggested test:

// Simulate the Float32Array that LanceDB would return
const fa = new Float32Array([0.1, 0.2, 0.3]);
assert.ok(Array.isArray(Array.from(fa))); // Array.from works
assert.ok(!Array.isArray(fa)); // sanity: raw Float32Array is not Array
assert.equal(Array.from(fa).length, 3);
// Null guard: a missing vector should serialize as an empty array
const vec = null;
const result = vec ? Array.from(vec) : [];
assert.deepEqual(result, []);

Wire it into the test suite.

🟡 Suggested before merge
Backup size can grow dramatically without warning. At 1024 dimensions (float32), each memory entry adds ~4 KB of vector data. 1000 entries → backup grows from ~200 KB to ~4 MB; at the 10,000-entry cap → ~40 MB per file, and 7 rotations → ~280 MB total. The success log already prints the entry count — adding a size estimate would help operators catch unexpected growth:

const backupBytes = Buffer.byteLength(metaLine + "\n" + dataLines.join("\n") + "\n");
api.logger.debug(`backup: ${allMemories.length} entries, ${(backupBytes/1024/1024).toFixed(1)} MB → ${backupFile}`);

⚪ Non-blocking
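The size figures quoted above can be sanity-checked with quick arithmetic. A sketch, using the review's numbers (float32 is 4 bytes per dimension; the JSON text form will be somewhat larger than this raw estimate):

```typescript
// Back-of-envelope check of the backup-size figures: raw float32 bytes only.
const dims = 1024;
const bytesPerEntry = dims * 4; // 4096 bytes ≈ 4 KB of vector data per entry
const entryCap = 10_000; // the entry cap mentioned in the review
const rotations = 7;

const perFileMB = (entryCap * bytesPerEntry) / 1024 / 1024;
console.log(perFileMB.toFixed(1)); // "39.1", i.e. roughly the ~40 MB per file
console.log((perFileMB * rotations).toFixed(0)); // "273", i.e. ~280 MB total
```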
Clean fix for a real offline-restore problem. The blocking issue is solely the missing test coverage — the logic itself is correct.
I'll analyze this and get back to you.
@codex review |

Problem
The memory_backup tool produces JSONL files with empty vector arrays ("vector": []). When restoring from backup, all memories must be re-embedded, which is slow and costly — especially for large memory stores.

Root Cause
LanceDB returns vectors as Float32Array (a typed array), but the backup code used Array.isArray() to check vectors before serialization. Since Array.isArray(new Float32Array(...)) returns false, vectors silently fell through to the empty-array default.

Solution
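A minimal sketch of the failure mode and the Array.from() conversion. The exact guard used in the PR is not shown; ArrayBuffer.isView is one standard way to detect a typed array, used here for illustration:

```typescript
// Reproduce the Array.isArray() pitfall with a typed array.
const vector = new Float32Array([0.1, 0.2, 0.3]); // what LanceDB hands back

// Old check: typed arrays fail Array.isArray, so the guard always falls
// through to the empty-array default.
const buggy = Array.isArray(vector) ? Array.from(vector) : [];

// One working alternative: detect any typed array, then convert.
const fixed = ArrayBuffer.isView(vector) ? Array.from(vector) : [];

console.log(buggy.length); // 0: vectors silently dropped before the fix
console.log(fixed.length); // 3: typed array converted to a JSON-safe array
```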
1. Add an includeVectors option to store.list() (src/store.ts)
   - Defaults to includeVectors = false (backward compatible)
   - When true, adds "vector" to the LanceDB select columns
   - Converts Float32Array to a plain array via Array.from() for proper JSON serialization
2. Include vectors in backup with a metadata header (index.ts)
   - Calls store.list(..., true) to include vectors

Bug Fix Details
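A hedged sketch of the list() change described above. The real LanceDB query API and row shape are not shown in the PR, so this models rows as plain objects and focuses on the default-off flag and the Array.from() conversion:

```typescript
// Hypothetical row shape; the PR's actual store types are not shown here.
interface MemoryRow {
  id: string;
  text: string;
  vector?: Float32Array | number[];
}

// Sketch of list() with the new flag; real code would also add "vector"
// to the LanceDB select columns when includeVectors is true.
function list(rows: MemoryRow[], includeVectors = false) {
  return rows.map((r) => ({
    id: r.id,
    text: r.text,
    // Array.from handles both typed arrays and plain arrays, so the
    // result is always JSON-serializable.
    vector: includeVectors && r.vector ? Array.from(r.vector) : [],
  }));
}

const rows: MemoryRow[] = [
  { id: "m1", text: "note", vector: new Float32Array([1, 2]) },
];
console.log(list(rows)[0].vector.length); // 0: default keeps old behavior
console.log(list(rows, true)[0].vector.length); // 2: typed array converted
```

Keeping the parameter default false means every existing caller keeps getting lightweight rows; only the backup path opts in.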
Testing
Verified backup now produces correct vector data (non-empty arrays with proper float values). Restore with matching embedding model skips re-embedding entirely.