Fix HNSW boot-integrity check: compare element count, not label_map size #27
_repopulate_label_map_from_sqlite() runs immediately before _initialize_hnsw_index(), so len(_label_map) always equals sqlite_count and the integrity check never fired. The HNSW file could have fewer elements (e.g. 1 vs 2 in SQLite) causing knn_query(k=2) on a 1-element index to raise RuntimeError, breaking every insert via the pattern separation gate. Fix: compare self._hnsw.get_current_count() (actual file elements) to sqlite_count instead of len(_label_map). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Heads up: the red build & test (macOS) here is pre-existing on This PR's branch differs from
The same job on the Release v1.2.0 push to Happy to send a separate PR fixing the macOS-CI issues (the 🤖 via Claude Code |
The bug
HippoDB._initialize_hnsw_index decides whether to rebuild the HNSW index by comparing the in-memory _label_map size against the SQLite row count.
But _label_map is repopulated from SQLite (_repopulate_label_map_from_sqlite()) before this check runs, so len(self._label_map) already equals sqlite_count. The condition is effectively always false and the rebuild never fires.
When the on-disk HNSW file is stale relative to SQLite (e.g. an interrupted write), the daemon boots with the HNSW index holding fewer elements than the label map / SQLite believe exist. The integrity check that's supposed to catch and repair this silently passes. A subsequent knn_query(k=2) against a 1-element index then raises RuntimeError on every insert.
Impact (observed on Windows)
This blocks the deferred-capture drain completely: with HNSW count=1 but _label_map/SQLite count=2, every capture insert crashes, so a backlog of session files never drains. After the fix, the same store drains cleanly (a 66-file / 322-turn backlog went from N=2 to draining normally).
The fix
Compare the actual HNSW element count (self._hnsw.get_current_count()) against the SQLite row count instead of len(self._label_map), so a real divergence triggers the rebuild.
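For illustration only, here is a minimal Python sketch of why the old comparison could not fire and the new one does. FakeIndex, needs_rebuild_old, and needs_rebuild_new are hypothetical names invented for this example, and the use of != is an assumption. The real check lives in HippoDB._initialize_hnsw_index and its exact code is not reproduced here. Only get_current_count() and the quantities being compared come from the description above.

```python
class FakeIndex:
    """Hypothetical stand-in for the HNSW index; it only models the element count."""

    def __init__(self, count):
        self._count = count

    def get_current_count(self):
        # Mirrors the accessor named in the PR: number of elements actually in the index.
        return self._count


def needs_rebuild_old(label_map, sqlite_count):
    # Pre-fix comparison (operator assumed): label_map has just been
    # repopulated from SQLite, so its length always equals sqlite_count.
    return len(label_map) != sqlite_count


def needs_rebuild_new(index, sqlite_count):
    # Post-fix comparison (operator assumed): uses the count of elements
    # the loaded index really holds.
    return index.get_current_count() != sqlite_count


# Scenario from the PR: SQLite has 2 rows, the on-disk HNSW file has 1 element.
sqlite_rows = {0: "a", 1: "b"}
label_map = dict(sqlite_rows)   # repopulated from SQLite before the check runs
stale_index = FakeIndex(1)      # stale index loaded from disk

old_result = needs_rebuild_old(label_map, len(sqlite_rows))
new_result = needs_rebuild_new(stale_index, len(sqlite_rows))
print("old check triggers rebuild:", old_result)
print("new check triggers rebuild:", new_result)
```

With a consistent index (element count equal to the SQLite row count) both versions report that no rebuild is needed, which matches the claim that behavior is unchanged in the healthy case.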
4 lines, no behavioral change when the index is already consistent. Surfaced while validating the new Windows support (v1.2.0) end-to-end.
🤖 Generated with Claude Code