vector-store: implement FTS index actor with Tantivy #470

Draft
knowack1 wants to merge 3 commits into scylladb:master from knowack1:vector-617-implement-bm25-search-execution

knowack1 (Collaborator) commented Jun 2, 2026

Closes: VECTOR-617

knowack1 added 3 commits May 28, 2026 10:26
- Add fts_index module with FtsIndex actor and FtsIndexExt trait
- Rename index/ directory to vs_index/
- Rename vs_index::Index to VsIndex for clarity alongside FtsIndex
- Update monitor_items to match-dispatch on Index variants directly
- Update httproutes to pattern-match on Index::Vs for ANN queries

The FTS actor now tracks document count: AddDocument increments,
RemoveDocument decrements (saturating), RemovePartition resets to zero,
and Count responds with the current value.

An integration test asserts that the FTS index correctly counts documents
ingested during fullscan. The db_basic mock is fixed to derive DbIndexKind
from IndexMetadata.kind and handle GetIndexParams for non-VS indexes.

Replace the stub FTS index actor with a real Tantivy 0.26.1-backed
implementation using per-partition indexes (mirroring usearch pattern).
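
The counting rules stated in the commit messages above (AddDocument increments, RemoveDocument decrements with saturation, RemovePartition resets to zero) can be illustrated with a minimal sketch. The `FtsMessage` enum and the `apply` and `replay` functions are hypothetical names chosen for this example, not the PR's actual types, and the sketch keeps a single plain counter rather than per-partition state.

```rust
/// Hypothetical message set; the names mirror the commit message only.
#[derive(Debug, Clone, Copy)]
enum FtsMessage {
    AddDocument,
    RemoveDocument,
    RemovePartition,
}

/// Applies one message to the running document count.
fn apply(count: usize, msg: FtsMessage) -> usize {
    match msg {
        FtsMessage::AddDocument => count.saturating_add(1),
        // Saturating: removing from an empty index leaves the count at zero.
        FtsMessage::RemoveDocument => count.saturating_sub(1),
        // Dropping the partition discards every counted document.
        FtsMessage::RemovePartition => 0,
    }
}

/// Replays a sequence of messages starting from an empty index.
fn replay(msgs: &[FtsMessage]) -> usize {
    msgs.iter().fold(0, |count, msg| apply(count, *msg))
}
```

Replaying two adds, three removes, and one more add yields a count of 1, because the third remove has nothing left to subtract.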

Copilot AI left a comment


Pull request overview

This PR adds initial full-text search (FTS) index support by introducing a Tantivy-backed FTS index actor and wiring it into the engine and HTTP status reporting. It also refactors the existing vector-search (VS) index internals into a dedicated vs_index module and adds an integration test for FTS index readiness/count.

Changes:

  • Add Tantivy-based FTS index actor (fts_index) and integrate it into engine/index plumbing.
  • Refactor VS index actor/message types (Index → VsIndex) and move VS internals under vs_index.
  • Add an integration test ensuring an FTS index reaches Serving with the expected document count.

Reviewed changes

Copilot reviewed 19 out of 21 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| crates/vector-store/tests/integration/main.rs | Registers new FTS integration test module. |
| crates/vector-store/tests/integration/fts.rs | Adds integration test asserting FTS index count via /status. |
| crates/vector-store/tests/integration/db_basic.rs | Extends test DB mock to support FTS rows and index kind mapping. |
| crates/vector-store/src/vs_index/validator.rs | Adds embedding dimension validation helper + unit tests. |
| crates/vector-store/src/vs_index/usearch.rs | Updates VS index actor message type to VsIndex and adjusts imports/traits. |
| crates/vector-store/src/vs_index/opensearch.rs | Same VsIndex message-type refactor for OpenSearch backend. |
| crates/vector-store/src/vs_index/mod.rs | Exposes VsIndex instead of old Index type. |
| crates/vector-store/src/vs_index/factory.rs | Updates factory trait to return Sender<VsIndex>. |
| crates/vector-store/src/vs_index/actor.rs | Renames actor message enum to VsIndex and updates IndexExt. |
| crates/vector-store/src/monitor_items.rs | Routes DB changes to either VS or FTS index actor based on index kind. |
| crates/vector-store/src/lib.rs | Wires in new fts_index/vs_index modules and updates factory exports. |
| crates/vector-store/src/indexes.rs | Stores a unified Index adapter (VS or FTS) in IndexEntry. |
| crates/vector-store/src/index.rs | Introduces Index adapter enum wrapping VS/FTS senders. |
| crates/vector-store/src/httproutes.rs | Makes index status/metrics counting work for both VS and FTS. |
| crates/vector-store/src/fts_index/tantivy.rs | Implements Tantivy-backed per-partition FTS actor + unit tests. |
| crates/vector-store/src/fts_index/mod.rs | Adds FTS module wiring and re-exports. |
| crates/vector-store/src/fts_index/actor.rs | Defines FTS actor messages and FtsIndexExt trait helpers. |
| crates/vector-store/src/engine.rs | Creates either VS or FTS index actors based on IndexKind. |
| crates/vector-store/Cargo.toml | Adds tantivy dependency to vector-store crate. |
| Cargo.toml | Adds workspace dependency pin for Tantivy. |
| Cargo.lock | Locks Tantivy and its transitive dependencies. |

Comment on lines +67 to +83
fn handle_add_document(partition: &mut PartitionState, primary_id: PrimaryId, document: String) {
    let primary_id_field = partition.schema.get_field("primary_id").unwrap();
    let body_field = partition.schema.get_field("body").unwrap();

    let mut doc = TantivyDocument::new();
    doc.add_u64(primary_id_field, u64::from(primary_id));
    doc.add_text(body_field, &document);
    let _ = partition.writer.add_document(doc);
    if let Err(err) = partition.writer.commit() {
        error!(
            "fts: failed to commit add for {:?}: {err}",
            partition.partition_id
        );
        return;
    }
    partition.size.fetch_add(1, Ordering::Relaxed);
}
Comment on lines +85 to +97
fn handle_remove_document(partition: &mut PartitionState, primary_id: PrimaryId) {
    let primary_id_field = partition.schema.get_field("primary_id").unwrap();
    let term = tantivy::Term::from_field_u64(primary_id_field, u64::from(primary_id));
    partition.writer.delete_term(term);
    if let Err(err) = partition.writer.commit() {
        error!(
            "fts: failed to commit remove for {:?}: {err}",
            partition.partition_id
        );
        return;
    }
    partition.size.fetch_sub(1, Ordering::Relaxed);
}
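
The commit message describes the RemoveDocument decrement as saturating, while the quoted `handle_remove_document` calls `fetch_sub`, which wraps around on underflow for the standard atomic integer types. The following is a minimal sketch of a saturating decrement built on `fetch_update`. It assumes `partition.size` is an `AtomicUsize` (the field's type is not shown in the quoted diff), and `saturating_decrement` is a hypothetical helper name.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Decrements `size` by one, stopping at zero instead of wrapping.
/// Returns the value observed before the update.
fn saturating_decrement(size: &AtomicUsize) -> usize {
    // The closure always returns `Some`, so `fetch_update` can only yield `Ok`;
    // both arms carry the previous value, which keeps the match exhaustive.
    match size.fetch_update(Ordering::Relaxed, Ordering::Relaxed, |n| {
        Some(n.saturating_sub(1))
    }) {
        Ok(prev) | Err(prev) => prev,
    }
}
```

Calling the helper on a counter that is already zero leaves it at zero, whereas `fetch_sub(1, ...)` on the same counter would store `usize::MAX`.
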
Comment on lines +180 to +191
FtsIndex::AddDocument {
    partition_id,
    primary_id,
    document,
    ..
} => {
    let partition = partitions.entry(partition_id).or_insert_with(|| {
        PartitionState::new(partition_id)
            .expect("fts: failed to create partition state")
    });
    handle_add_document(partition, primary_id, document);
}
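
In the snippet above, `.expect(...)` inside `or_insert_with` panics if partition creation fails. One possible alternative, sketched here under stated assumptions, matches on the map entry so that a creation error can be returned to the caller. `PartitionState` below is a simplified hypothetical stand-in (a `u32` key, a `String` error, and an artificial failure rule for odd ids), not the PR's type.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical stand-in for the PR's `PartitionState`.
struct PartitionState {
    docs: Vec<String>,
}

impl PartitionState {
    /// Fallible constructor; odd ids fail purely to exercise the error path.
    fn new(partition_id: u32) -> Result<Self, String> {
        if partition_id % 2 == 1 {
            return Err(format!("cannot create partition {partition_id}"));
        }
        Ok(Self { docs: Vec::new() })
    }
}

/// Returns the partition, creating it on first use, and propagates
/// a creation failure to the caller instead of panicking.
fn get_or_create(
    partitions: &mut HashMap<u32, PartitionState>,
    partition_id: u32,
) -> Result<&mut PartitionState, String> {
    match partitions.entry(partition_id) {
        Entry::Occupied(entry) => Ok(entry.into_mut()),
        // Nothing is inserted when `new` fails, so the map stays unchanged.
        Entry::Vacant(entry) => Ok(entry.insert(PartitionState::new(partition_id)?)),
    }
}
```

With this shape, the message handler could log the error and skip the document, leaving the actor running.
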
Comment on lines +340 to +343
let count_result = match &index {
    Index::Vs(sender) => sender.count(index_key).await,
    Index::Fts(sender) => sender.count(index_key).await,
};
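
Both arms of the `match` above perform the same call, so the dispatch could live on the adapter enum itself. The sketch below shows that shape in a simplified, synchronous form. `VsSender`, `FtsSender`, and the `Count` trait are hypothetical stand-ins; the PR's senders are channel handles with async extension-trait methods that take an index key.

```rust
/// Hypothetical stand-ins for the two sender types wrapped by `Index`.
struct VsSender {
    vectors: usize,
}

struct FtsSender {
    documents: usize,
}

/// Shared counting behavior (assumed trait name for this sketch).
trait Count {
    fn count(&self) -> usize;
}

impl Count for VsSender {
    fn count(&self) -> usize {
        self.vectors
    }
}

impl Count for FtsSender {
    fn count(&self) -> usize {
        self.documents
    }
}

/// Adapter enum over the two index kinds.
enum Index {
    Vs(VsSender),
    Fts(FtsSender),
}

impl Index {
    /// Dispatches once here so call sites need no `match`.
    fn count(&self) -> usize {
        match self {
            Index::Vs(sender) => sender.count(),
            Index::Fts(sender) => sender.count(),
        }
    }
}
```

A call site then reduces to `index.count()`, and adding a third index kind only requires touching the adapter.
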