Problem
forward_index is #[serde(skip)] in Bm25Index, so it is not restored on deserialization and comes back empty. remove_document then falls back to an O(|V|) full scan of all posting lists (|V| = number of unique terms) instead of the O(terms_in_doc) path.
Impact
Performance trap: remove_document on a deserialized index with 50K unique terms is ~10,000x slower than on a freshly-built index.
Fix Options
- Serialize forward_index (increases snapshot size)
- Add ensure_forward_index(&mut self) that rebuilds from inverted index (matches existing ensure_doc_lengths_vec pattern)
- Document the performance cliff in remove_document docstring
Option 2 (ensure_forward_index) is recommended, since it is consistent with existing patterns.
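A minimal sketch of what the ensure_forward_index approach could look like. Everything here is an assumption for illustration: MiniIndex is a hypothetical stand-in for Bm25Index, and its field names, the DocId type, and the posting-list layout are not taken from the khive-bm25 source. The "forward index empty while postings exist" check is also only one possible way to detect a freshly deserialized index.

```rust
use std::collections::HashMap;

// Hypothetical doc id type; the real crate may use something else.
type DocId = u32;

/// Simplified stand-in for Bm25Index (assumed layout, not the real struct).
#[derive(Default)]
struct MiniIndex {
    /// term -> ids of documents containing it (the part that is serialized).
    inverted: HashMap<String, Vec<DocId>>,
    /// doc id -> unique terms in that doc. In the real struct this field is
    /// #[serde(skip)], so it is empty after deserialization.
    forward_index: HashMap<DocId, Vec<String>>,
}

impl MiniIndex {
    fn add_document(&mut self, doc: DocId, terms: &[&str]) {
        let mut unique: Vec<String> = terms.iter().map(|t| t.to_string()).collect();
        unique.sort();
        unique.dedup();
        for term in &unique {
            self.inverted.entry(term.clone()).or_default().push(doc);
        }
        self.forward_index.insert(doc, unique);
    }

    /// Rebuild the forward index from the posting lists if it is missing.
    /// One O(total postings) pass, paid once instead of on every removal.
    fn ensure_forward_index(&mut self) {
        if !self.forward_index.is_empty() || self.inverted.is_empty() {
            return; // already populated, or nothing to rebuild from
        }
        for (term, postings) in &self.inverted {
            for &doc in postings {
                self.forward_index.entry(doc).or_default().push(term.clone());
            }
        }
    }

    /// Remove a document, touching only the posting lists of its own terms.
    /// Returns false if the document is not in the index.
    fn remove_document(&mut self, doc: DocId) -> bool {
        self.ensure_forward_index();
        let Some(terms) = self.forward_index.remove(&doc) else {
            return false;
        };
        for term in terms {
            let now_empty = match self.inverted.get_mut(&term) {
                Some(postings) => {
                    postings.retain(|&d| d != doc);
                    postings.is_empty()
                }
                None => false,
            };
            if now_empty {
                self.inverted.remove(&term);
            }
        }
        true
    }
}
```

With this shape, the first remove_document call after deserialization pays the rebuild cost once, and later calls stay on the O(terms_in_doc) path. Calling ensure_forward_index right after loading a snapshot would move that cost out of the removal path entirely.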
Source
Critic review of PR #301.
Files
crates/khive-bm25/src/index/mod.rs:463-464
crates/khive-bm25/src/index/indexing.rs:178-189