vector-store: implement FTS index actor with Tantivy #470
Draft
knowack1 wants to merge 3 commits into
- Add fts_index module with FtsIndex actor and FtsIndexExt trait
- Rename index/ directory to vs_index/
- Rename vs_index::Index to VsIndex for clarity alongside FtsIndex
- Update monitor_items to match-dispatch on Index variants directly
- Update httproutes to pattern-match on Index::Vs for ANN queries
The FTS actor now tracks document count: AddDocument increments, RemoveDocument decrements (saturating), RemovePartition resets to zero, and Count responds with the current value. An integration test asserts that the FTS index correctly counts documents ingested during fullscan. The db_basic mock is fixed to derive DbIndexKind from IndexMetadata.kind and handle GetIndexParams for non-VS indexes.
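The counting rules in that commit message can be sketched as a small state machine. This is an illustrative sketch only: `CountMsg` and `DocCounter` are hypothetical names, and the real `FtsIndex` messages carry more fields (partition id, primary id, document body) than shown here.

```rust
/// Hypothetical, reduced message set; not the PR's actual FtsIndex enum.
#[derive(Debug)]
pub enum CountMsg {
    AddDocument,
    RemoveDocument,
    RemovePartition,
}

/// Tracks the document count the way the commit message describes.
#[derive(Debug, Default)]
pub struct DocCounter {
    count: usize,
}

impl DocCounter {
    /// Applies one message to the counter.
    pub fn apply(&mut self, msg: CountMsg) {
        match msg {
            CountMsg::AddDocument => self.count = self.count.saturating_add(1),
            // Saturating: removing from an empty index leaves the count at zero.
            CountMsg::RemoveDocument => self.count = self.count.saturating_sub(1),
            CountMsg::RemovePartition => self.count = 0,
        }
    }

    /// Current value, as a Count request would report it.
    pub fn count(&self) -> usize {
        self.count
    }
}
```

Applying `RemoveDocument` to a fresh `DocCounter` leaves `count()` at 0 rather than underflowing, which is the property the word "saturating" promises.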
Replace the stub FTS index actor with a real implementation backed by Tantivy 0.26.1, using one index per partition (mirroring the usearch pattern).
Pull request overview
This PR adds initial full-text search (FTS) index support by introducing a Tantivy-backed FTS index actor and wiring it into the engine and HTTP status reporting. It also refactors the existing vector-search (VS) index internals into a dedicated vs_index module and adds an integration test for FTS index readiness/count.
Changes:
- Add Tantivy-based FTS index actor (`fts_index`) and integrate it into engine/index plumbing.
- Refactor VS index actor/message types (`Index` → `VsIndex`) and move VS internals under `vs_index`.
- Add an integration test ensuring an FTS index reaches `Serving` with the expected document count.
Reviewed changes
Copilot reviewed 19 out of 21 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| crates/vector-store/tests/integration/main.rs | Registers new FTS integration test module. |
| crates/vector-store/tests/integration/fts.rs | Adds integration test asserting FTS index count via /status. |
| crates/vector-store/tests/integration/db_basic.rs | Extends test DB mock to support FTS rows and index kind mapping. |
| crates/vector-store/src/vs_index/validator.rs | Adds embedding dimension validation helper + unit tests. |
| crates/vector-store/src/vs_index/usearch.rs | Updates VS index actor message type to VsIndex and adjusts imports/traits. |
| crates/vector-store/src/vs_index/opensearch.rs | Same VsIndex message-type refactor for OpenSearch backend. |
| crates/vector-store/src/vs_index/mod.rs | Exposes VsIndex instead of old Index type. |
| crates/vector-store/src/vs_index/factory.rs | Updates factory trait to return Sender<VsIndex>. |
| crates/vector-store/src/vs_index/actor.rs | Renames actor message enum to VsIndex and updates IndexExt. |
| crates/vector-store/src/monitor_items.rs | Routes DB changes to either VS or FTS index actor based on index kind. |
| crates/vector-store/src/lib.rs | Wires in new fts_index/vs_index modules and updates factory exports. |
| crates/vector-store/src/indexes.rs | Stores a unified Index adapter (VS or FTS) in IndexEntry. |
| crates/vector-store/src/index.rs | Introduces Index adapter enum wrapping VS/FTS senders. |
| crates/vector-store/src/httproutes.rs | Makes index status/metrics counting work for both VS and FTS. |
| crates/vector-store/src/fts_index/tantivy.rs | Implements Tantivy-backed per-partition FTS actor + unit tests. |
| crates/vector-store/src/fts_index/mod.rs | Adds FTS module wiring and re-exports. |
| crates/vector-store/src/fts_index/actor.rs | Defines FTS actor messages and FtsIndexExt trait helpers. |
| crates/vector-store/src/engine.rs | Creates either VS or FTS index actors based on IndexKind. |
| crates/vector-store/Cargo.toml | Adds tantivy dependency to vector-store crate. |
| Cargo.toml | Adds workspace dependency pin for Tantivy. |
| Cargo.lock | Locks Tantivy and its transitive dependencies. |
Comment on lines +67 to +83

    fn handle_add_document(partition: &mut PartitionState, primary_id: PrimaryId, document: String) {
        let primary_id_field = partition.schema.get_field("primary_id").unwrap();
        let body_field = partition.schema.get_field("body").unwrap();

        let mut doc = TantivyDocument::new();
        doc.add_u64(primary_id_field, u64::from(primary_id));
        doc.add_text(body_field, &document);
        let _ = partition.writer.add_document(doc);
        if let Err(err) = partition.writer.commit() {
            error!(
                "fts: failed to commit add for {:?}: {err}",
                partition.partition_id
            );
            return;
        }
        partition.size.fetch_add(1, Ordering::Relaxed);
    }
Comment on lines +85 to +97

    fn handle_remove_document(partition: &mut PartitionState, primary_id: PrimaryId) {
        let primary_id_field = partition.schema.get_field("primary_id").unwrap();
        let term = tantivy::Term::from_field_u64(primary_id_field, u64::from(primary_id));
        partition.writer.delete_term(term);
        if let Err(err) = partition.writer.commit() {
            error!(
                "fts: failed to commit remove for {:?}: {err}",
                partition.partition_id
            );
            return;
        }
        partition.size.fetch_sub(1, Ordering::Relaxed);
    }
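The PR description says RemoveDocument decrements with saturation, while the excerpt above calls `fetch_sub`, which wraps around on underflow for the standard atomic integer types. One possible way to get a saturating decrement on an atomic counter is `fetch_update`. This is a sketch under the assumption that `partition.size` is an `AtomicUsize` (the excerpt does not show its type); `saturating_dec` is a hypothetical helper name.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Decrements `size` by one, stopping at zero instead of wrapping.
/// Returns the value observed before the update.
pub fn saturating_dec(size: &AtomicUsize) -> usize {
    size.fetch_update(Ordering::Relaxed, Ordering::Relaxed, |n| {
        // Always Some, so the update never fails; zero simply stays zero.
        Some(n.saturating_sub(1))
    })
    .unwrap_or_else(|prev| prev)
}
```

Whether the count should drop at all when `delete_term` matches no document is a separate question that this sketch does not address.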
Comment on lines +180 to +191

    FtsIndex::AddDocument {
        partition_id,
        primary_id,
        document,
        ..
    } => {
        let partition = partitions.entry(partition_id).or_insert_with(|| {
            PartitionState::new(partition_id)
                .expect("fts: failed to create partition state")
        });
        handle_add_document(partition, primary_id, document);
    }
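In the excerpt above, a failure in `PartitionState::new` panics inside `or_insert_with`. A possible alternative is to match on the map entry so the constructor error reaches the caller and nothing is inserted. The sketch below uses only the standard library; `Partition`, its fields, and `get_or_create` are hypothetical stand-ins, not the PR's types.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical stand-in for the PR's PartitionState.
#[derive(Debug)]
pub struct Partition {
    pub id: u32,
    pub docs: usize,
}

impl Partition {
    /// Fallible constructor; id 0 fails here only to exercise the error path.
    pub fn new(id: u32) -> Result<Self, String> {
        if id == 0 {
            return Err("partition 0 is reserved in this sketch".to_string());
        }
        Ok(Partition { id, docs: 0 })
    }
}

/// Returns the partition for `id`, creating it on first use.
/// A constructor failure is returned to the caller and nothing is inserted.
pub fn get_or_create(
    partitions: &mut HashMap<u32, Partition>,
    id: u32,
) -> Result<&mut Partition, String> {
    match partitions.entry(id) {
        Entry::Occupied(slot) => Ok(slot.into_mut()),
        Entry::Vacant(slot) => Ok(slot.insert(Partition::new(id)?)),
    }
}
```

With this shape the actor loop could log the error and skip the message instead of aborting the task, though whether that is the desired behavior is a design decision for the PR.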
Comment on lines +340 to +343

    let count_result = match &index {
        Index::Vs(sender) => sender.count(index_key).await,
        Index::Fts(sender) => sender.count(index_key).await,
    };
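Both arms of the match above make the same call, so the dispatch could live in a method on the adapter enum. The following is a rough, synchronous sketch of that idea using `std::sync::mpsc` and threads. It assumes nothing about the PR's real channel types or async runtime; `VsMsg`, `FtsMsg`, `spawn_vs`, and `spawn_fts` are hypothetical.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical single-message protocols for the two actor kinds.
pub enum VsMsg {
    Count { reply: Sender<usize> },
}
pub enum FtsMsg {
    Count { reply: Sender<usize> },
}

/// Adapter enum wrapping either kind of actor handle.
pub enum Index {
    Vs(Sender<VsMsg>),
    Fts(Sender<FtsMsg>),
}

impl Index {
    /// Asks the wrapped actor for its count; None if the actor is gone.
    pub fn count(&self) -> Option<usize> {
        let (reply, rx) = channel();
        match self {
            Index::Vs(tx) => tx.send(VsMsg::Count { reply }).ok()?,
            Index::Fts(tx) => tx.send(FtsMsg::Count { reply }).ok()?,
        }
        rx.recv().ok()
    }
}

/// Spawns a toy VS actor that always reports `n` items.
pub fn spawn_vs(n: usize) -> Index {
    let (tx, rx) = channel();
    thread::spawn(move || {
        for VsMsg::Count { reply } in rx {
            let _ = reply.send(n);
        }
    });
    Index::Vs(tx)
}

/// Spawns a toy FTS actor that always reports `n` documents.
pub fn spawn_fts(n: usize) -> Index {
    let (tx, rx) = channel();
    thread::spawn(move || {
        for FtsMsg::Count { reply } in rx {
            let _ = reply.send(n);
        }
    });
    Index::Fts(tx)
}
```

Callers such as the status route would then write `index.count()` without matching on the variant, while ANN-only paths can still pattern-match on `Index::Vs` as the PR description states.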
Closes: VECTOR-617