
fix: khive-bm25 forward_index not serialized — O(|V|) remove perf cliff #307

@ohdearquant


Problem

forward_index is marked #[serde(skip)] in Bm25Index, so it is empty after deserialization. remove_document then falls back to an O(|V|) full scan of every posting list instead of taking the O(terms_in_doc) path.
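A minimal sketch of the two removal paths, assuming a simplified index. The struct, field types, and whitespace tokenizer below are hypothetical and are not khive-bm25's actual code. Here remove_document returns the number of posting lists it visits, so the O(terms_in_doc) vs O(|V|) difference can be observed directly. Clearing forward_index stands in for what #[serde(skip)] leaves behind after deserialization.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical, simplified stand-in for Bm25Index (names/types are assumptions).
#[derive(Default)]
struct Bm25Index {
    // Inverted index: term -> ids of documents containing it (posting lists).
    inverted: HashMap<String, HashSet<u64>>,
    // Forward index: doc id -> its distinct terms. In the real crate this field
    // is #[serde(skip)], so it comes back empty after deserialization.
    forward_index: HashMap<u64, Vec<String>>,
}

impl Bm25Index {
    fn add_document(&mut self, doc: u64, text: &str) {
        let mut terms: Vec<String> = text.split_whitespace().map(str::to_lowercase).collect();
        terms.sort();
        terms.dedup();
        for t in &terms {
            self.inverted.entry(t.clone()).or_default().insert(doc);
        }
        self.forward_index.insert(doc, terms);
    }

    /// Removes `doc` and returns how many posting lists were visited.
    fn remove_document(&mut self, doc: u64) -> usize {
        let mut visited = 0;
        if let Some(terms) = self.forward_index.remove(&doc) {
            // Fast path: only the doc's own posting lists, O(terms_in_doc).
            for t in terms {
                visited += 1;
                let now_empty = match self.inverted.get_mut(&t) {
                    Some(postings) => {
                        postings.remove(&doc);
                        postings.is_empty()
                    }
                    None => false,
                };
                if now_empty {
                    self.inverted.remove(&t);
                }
            }
        } else {
            // Fallback: no forward entry, so scan every posting list, O(|V|).
            for postings in self.inverted.values_mut() {
                visited += 1;
                postings.remove(&doc);
            }
            self.inverted.retain(|_, p| !p.is_empty());
        }
        visited
    }
}
```

For example, after indexing 1,000 documents of the form "shared common term{i}" (a vocabulary of 1,002 terms), removing one document from the fresh index visits 3 posting lists. Removing another document after clearing forward_index visits all 1,001 remaining lists.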

Impact

Performance trap: remove_document on a deserialized index with 50K unique terms is ~10,000x slower than on a freshly built index.
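One rough way to read the ~10,000x figure assumes that visiting a posting list costs about the same in both paths. This is an assumption: the real per-visit cost depends on how posting lists are stored. Under that model, the slowdown is the ratio of posting lists touched, where t_d is the number of distinct terms in the removed document:

```latex
\frac{T_{\text{scan}}}{T_{\text{fwd}}} \approx \frac{|V|}{t_d},
\qquad
\frac{50{,}000}{t_d} \approx 10{,}000 \;\Rightarrow\; t_d \approx 5
```

By this estimate, a ~10,000x ratio at |V| = 50,000 corresponds to documents with about 5 distinct terms. Longer documents would see a smaller ratio, but the fallback cost still grows linearly with vocabulary size.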

Fix Options

  1. Serialize forward_index (increases snapshot size).
  2. Add ensure_forward_index(&mut self), which rebuilds the forward index from the inverted index (matches the existing ensure_doc_lengths_vec pattern).
  3. Document the performance cliff in the remove_document docstring.

Option 2 is recommended, as it is consistent with existing patterns such as ensure_doc_lengths_vec.
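A minimal sketch of Option 2, again on a hypothetical simplified index rather than khive-bm25's real types. ensure_forward_index rebuilds the doc -> terms map by inverting the posting lists once, at O(total postings) cost, and remove_document calls it before taking the fast path. The forward_index_ready flag and the new() constructor are assumptions for illustration. The sketch assumes the flag would also be #[serde(skip)], so it deserializes as false.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical simplified index; field names are assumptions, not the crate's.
#[derive(Default)]
struct Bm25Index {
    inverted: HashMap<String, HashSet<u64>>,
    // #[serde(skip)] in the real crate: empty after deserialization.
    forward_index: HashMap<u64, Vec<String>>,
    // Assumed to be skipped by serde too, so a loaded index starts as `false`.
    forward_index_ready: bool,
}

impl Bm25Index {
    // A freshly built index maintains forward_index incrementally.
    fn new() -> Self {
        Self { forward_index_ready: true, ..Default::default() }
    }

    fn add_document(&mut self, doc: u64, text: &str) {
        let mut terms: Vec<String> = text.split_whitespace().map(str::to_lowercase).collect();
        terms.sort();
        terms.dedup();
        for t in &terms {
            self.inverted.entry(t.clone()).or_default().insert(doc);
        }
        self.forward_index.insert(doc, terms);
    }

    /// Rebuilds forward_index from the inverted index unless it is known to be
    /// complete. Paid once per loaded index, O(total postings).
    fn ensure_forward_index(&mut self) {
        if self.forward_index_ready {
            return;
        }
        let mut forward: HashMap<u64, Vec<String>> = HashMap::new();
        for (term, postings) in &self.inverted {
            for &doc in postings {
                forward.entry(doc).or_default().push(term.clone());
            }
        }
        self.forward_index = forward;
        self.forward_index_ready = true;
    }

    fn remove_document(&mut self, doc: u64) {
        self.ensure_forward_index();
        if let Some(terms) = self.forward_index.remove(&doc) {
            // Always the O(terms_in_doc) path once the forward index exists.
            for t in terms {
                let now_empty = match self.inverted.get_mut(&t) {
                    Some(p) => {
                        p.remove(&doc);
                        p.is_empty()
                    }
                    None => false,
                };
                if now_empty {
                    self.inverted.remove(&t);
                }
            }
        }
    }
}
```

The sketch uses an explicit flag rather than a forward_index.is_empty() check. A document added after loading would make forward_index non-empty while it is still missing every pre-load document, so an emptiness check would skip the rebuild.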

Source

Critic review of PR #301.

Files

  • crates/khive-bm25/src/index/mod.rs:463-464
  • crates/khive-bm25/src/index/indexing.rs:178-189

Metadata


Assignees

No one assigned

Labels

enhancement (New feature or request)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
