vector-store: introduce fts index module with document counting #462

Open
knowack1 wants to merge 4 commits into scylladb:master from knowack1:vector-621-index-adapter-abstraction

Conversation

@knowack1 knowack1 commented May 25, 2026

Collaborator

This PR introduces the full-text-search (FTS) index infrastructure alongside the existing vector-search index. The core change is a new Index enum that dispatches operations to either a VsIndex or FtsIndex actor through their respective mpsc::Sender channels. Callers match on the enum variant directly rather than going through an adapter with methods, keeping the dispatch explicit at each call site.
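
The dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: it uses `std::sync::mpsc::Sender` as a stand-in for whatever `mpsc::Sender` the crate actually uses, and the message variants (`Ann`, `AddDocument`) and the `ann` helper are hypothetical names chosen for the example.

```rust
use std::sync::mpsc::Sender;

// Hypothetical message types; the real actors presumably carry richer messages.
#[allow(dead_code)]
#[derive(Debug)]
enum VsIndex {
    Ann { k: usize },
}

#[allow(dead_code)]
#[derive(Debug)]
enum FtsIndex {
    AddDocument { id: u64 },
}

// The dispatcher: one variant per index kind, each wrapping that actor's sender.
enum Index {
    Vs(Sender<VsIndex>),
    Fts(Sender<FtsIndex>),
}

impl From<Sender<VsIndex>> for Index {
    fn from(tx: Sender<VsIndex>) -> Self {
        Index::Vs(tx)
    }
}

impl From<Sender<FtsIndex>> for Index {
    fn from(tx: Sender<FtsIndex>) -> Self {
        Index::Fts(tx)
    }
}

// A call site matches on the variant explicitly. ANN only makes sense for
// vector search, so the FTS arm returns an error instead of sending anything.
fn ann(index: &Index, k: usize) -> Result<(), String> {
    match index {
        Index::Vs(tx) => tx.send(VsIndex::Ann { k }).map_err(|e| e.to_string()),
        Index::Fts(_) => Err("ANN is not supported on an FTS index".to_string()),
    }
}
```

Because every call site contains its own `match`, adding a third index kind would make the compiler flag each place that needs a decision, which is presumably the motivation for keeping the dispatch explicit.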

The previous index/ directory is renamed to vs_index/ and its actor enum is renamed from Index to VsIndex so both index types follow a consistent naming convention. A new fts_index module provides the FtsIndex actor with document add/remove/count operations and an extension trait (FtsIndexExt) mirroring the existing IndexExt pattern.
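
As a rough sketch of the extension-trait pattern mentioned above: the trait is implemented directly on the sender, so callers get method-style calls while the actor still receives plain messages. The method names, the message fields, and the synchronous `std::sync::mpsc` channel are assumptions made for this example; the real `FtsIndexExt` is most likely async and may differ in shape.

```rust
use std::sync::mpsc::{self, Sender};

// Hypothetical message set; Count carries a reply channel for the answer.
enum FtsIndex {
    AddDocument { id: u64 },
    RemoveDocument { id: u64 },
    Count { reply: Sender<usize> },
}

// Extension trait: gives the bare sender a method-style API.
trait FtsIndexExt {
    fn add_document(&self, id: u64);
    fn remove_document(&self, id: u64);
    fn count(&self) -> Option<usize>;
}

impl FtsIndexExt for Sender<FtsIndex> {
    fn add_document(&self, id: u64) {
        // A send error only means the actor has stopped; ignored in this sketch.
        let _ = self.send(FtsIndex::AddDocument { id });
    }

    fn remove_document(&self, id: u64) {
        let _ = self.send(FtsIndex::RemoveDocument { id });
    }

    fn count(&self) -> Option<usize> {
        // Request/response: send a one-off reply channel, then wait on it.
        let (reply, rx) = mpsc::channel();
        self.send(FtsIndex::Count { reply }).ok()?;
        rx.recv().ok()
    }
}
```

With the trait in scope, a caller holding a `Sender<FtsIndex>` can write `tx.add_document(7)` or `tx.count()` without building messages by hand.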

VECTOR-621

@knowack1 knowack1 changed the title Vector 621 index adapter abstraction vector-search: add IndexAdapter abstraction for VS and FTS indexes May 25, 2026
@knowack1 knowack1 force-pushed the vector-621-index-adapter-abstraction branch 2 times, most recently from 06f71bb to 3728664 Compare May 27, 2026 06:05
@knowack1 knowack1 changed the title vector-search: add IndexAdapter abstraction for VS and FTS indexes vector-store: introduce fts index module with document counting May 27, 2026
@knowack1 knowack1 changed the title vector-store: introduce fts index module with document counting vector-store: introduce fts index module May 27, 2026
@knowack1 knowack1 changed the title vector-store: introduce fts index module vector-store: introduce fts index module with document counting May 27, 2026
@knowack1 knowack1 force-pushed the vector-621-index-adapter-abstraction branch 3 times, most recently from 84cf87f to be774ee Compare May 30, 2026 04:28
@knowack1 knowack1 requested a review from Copilot May 30, 2026 04:28

Copilot AI left a comment

Pull request overview

Introduces an FTS (full-text-search) index module alongside the existing vector-search index, and adds an Index enum dispatcher so call sites explicitly handle both variants. The previous index/ module is renamed to vs_index/ (with enum Index → VsIndex) for naming symmetry with the new fts_index/.

Changes:

  • New fts_index actor module providing FtsIndex messages, an FtsIndexExt trait, and a minimal in-memory document counter; FTS branch wired into engine::add_index, monitor_items, and HTTP status/metrics routes.
  • New Index enum in src/index.rs (Vs(Sender<VsIndex>) / Fts(Sender<FtsIndex>)) with From impls; IndexEntry and engine get_index now return/store Index rather than a raw vector-search sender.
  • Module rename index/ → vs_index/ (enum renamed to VsIndex), with all imports and exports updated, plus a new integration test tests/integration/fts.rs and scan_fn_documents helper.

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 1 comment.

Show a summary per file
| File | Description |
| --- | --- |
| crates/vector-store/src/fts_index/mod.rs | New FTS actor with add/remove/count messages and naive in-memory counter. |
| crates/vector-store/src/index.rs | New Index enum wrapping VS/FTS senders with From conversions. |
| crates/vector-store/src/engine.rs | Removes FTS not-implemented guard; adds create_index dispatch on IndexKind. |
| crates/vector-store/src/monitor_items.rs | Routes add/remove/partition operations per index variant; updated tests. |
| crates/vector-store/src/httproutes.rs | Calls count and ANN against the new Index enum; rejects ANN for FTS. |
| crates/vector-store/src/indexes.rs | IndexEntry now holds Index and returns it by reference. |
| crates/vector-store/src/lib.rs | Adds fts_index/vs_index modules and re-exports IndexFactory from vs_index. |
| crates/vector-store/src/vs_index/{mod,actor,factory,usearch,opensearch,validator}.rs | Mechanical rename of index/ → vs_index/ and Index → VsIndex. |
| crates/vector-store/tests/integration/fts.rs | New integration test asserting FTS index becomes Serving with count == 2. |
| crates/vector-store/tests/integration/db_basic.rs | Factors scan_fn via make_scan_fn; adds scan_fn_documents; maps IndexKind to DbIndexKind. |
| crates/vector-store/tests/integration/main.rs | Registers the new fts integration test module. |
Comments suppressed due to low confidence (1)

crates/vector-store/src/vs_index/actor.rs:80

  • Naming inconsistency: the actor enum was renamed from Index to VsIndex, and the new FTS module exposes FtsIndex + FtsIndexExt, but the corresponding vector-search trait is still called IndexExt (re-exported as vs_index::IndexExt). For symmetry with FtsIndexExt and to match the renamed enum, consider renaming this trait to VsIndexExt.

Comment thread crates/vector-store/src/fts_index/mod.rs Outdated
@knowack1 knowack1 marked this pull request as ready for review May 30, 2026 04:41
@knowack1 knowack1 requested a review from ewienik May 30, 2026 05:03

@ewienik ewienik left a comment

Collaborator

Consider splitting the first commit into such commits:

  • move index dir into vs_index (to preserve git history)
  • refactor vs_index names
  • create fts_index module (and inside it mod.rs, actor.rs, factory.rs and tantivy.rs, similarly to the vs_index module; in the future we could have other implementations of the fts index)

Comment thread crates/vector-store/src/indexes.rs Outdated
Comment thread crates/vector-store/src/httproutes.rs Outdated
Comment thread crates/vector-store/src/httproutes.rs
Comment thread crates/vector-store/src/httproutes.rs Outdated
Comment thread crates/vector-store/src/httproutes.rs Outdated
Comment thread crates/vector-store/src/vs_index/factory.rs Outdated
Comment thread crates/vector-store/src/engine.rs Outdated
Comment thread crates/vector-store/src/fts_index/mod.rs Outdated
Comment thread crates/vector-store/src/fts_index/mod.rs Outdated
Comment thread crates/vector-store/src/fts_index/mod.rs Outdated
@knowack1 knowack1 force-pushed the vector-621-index-adapter-abstraction branch 2 times, most recently from def35b7 to 75c3f23 Compare June 8, 2026 11:02
@knowack1 knowack1 requested a review from Copilot June 8, 2026 11:02

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 19 changed files in this pull request and generated no new comments.

@knowack1 knowack1 force-pushed the vector-621-index-adapter-abstraction branch from 75c3f23 to 28e4f83 Compare June 8, 2026 11:21
@knowack1 knowack1 requested a review from Copilot June 8, 2026 11:22

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 19 changed files in this pull request and generated 2 comments.

Comment thread crates/vector-store/tests/integration/db_basic.rs
Comment thread crates/vector-store/src/monitor_items.rs
@knowack1 knowack1 requested a review from ewienik June 8, 2026 11:42
knowack1 commented Jun 8, 2026

Collaborator Author

@ewienik All comments have been addressed, please re-review the latest changes.

knowack1 added 4 commits June 9, 2026 09:49
Rename the index/ directory to vs_index/ to make room for a parallel
full-text-search index module. This commit only moves the files and
updates the module path so git records the rename and preserves history;
type names are left unchanged and refactored in a follow-up commit.

Rename the vector-search index types so they carry a Vs prefix, mirroring
the naming used for the upcoming full-text-search module:

- Index        -> VsIndex
- IndexExt     -> VsIndexExt
- IndexFactory -> VsIndexFactory
- IndexConfiguration -> VsIndexConfiguration

This keeps both index implementations following a consistent naming
convention.

- Add fts_index module with FtsIndex actor, FtsIndexExt, and TantivyIndexFactory
- Introduce IndexDispatch trait in monitor_items for generic index dispatch
- Make monitor_items::new generic over IndexDispatch (VsIndex, FtsIndex)
- Split engine add_index into add_index_vs and add_index_fts via AddIndexContext
- Extend Indexes to store FTS entries alongside VS entries
- Update httproutes for FTS index status and count
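
One way to picture the generic dispatch listed above is a trait implemented by each message type, so the monitor can build add/remove messages without naming a concrete index. The trait methods, the `Row` struct, the message fields, and the free function `monitor_items` below are all assumptions for illustration; the PR's actual `IndexDispatch` may look quite different.

```rust
use std::sync::mpsc::Sender;

// Hypothetical message types for the two index kinds.
#[derive(Debug, PartialEq)]
enum VsIndex {
    AddVector { key: u64, vector: Vec<f32> },
    RemoveVector { key: u64 },
}

#[derive(Debug, PartialEq)]
enum FtsIndex {
    AddDocument { key: u64, text: String },
    RemoveDocument { key: u64 },
}

// Hypothetical row shape handed to the monitor by a database scan.
struct Row {
    key: u64,
    text: String,
    vector: Vec<f32>,
}

// Assumed trait shape: each message type knows how to build its own
// add/remove messages, so the monitor stays generic.
trait IndexDispatch: Sized {
    fn add(row: &Row) -> Self;
    fn remove(key: u64) -> Self;
}

impl IndexDispatch for VsIndex {
    fn add(row: &Row) -> Self {
        VsIndex::AddVector { key: row.key, vector: row.vector.clone() }
    }
    fn remove(key: u64) -> Self {
        VsIndex::RemoveVector { key }
    }
}

impl IndexDispatch for FtsIndex {
    fn add(row: &Row) -> Self {
        FtsIndex::AddDocument { key: row.key, text: row.text.clone() }
    }
    fn remove(key: u64) -> Self {
        FtsIndex::RemoveDocument { key }
    }
}

// Generic stand-in for the monitor: forwards upserts and deletes to whichever
// index it is given and returns how many messages were delivered.
fn monitor_items<I: IndexDispatch>(index: &Sender<I>, upserts: &[Row], deletes: &[u64]) -> usize {
    let mut sent = 0;
    for row in upserts {
        if index.send(I::add(row)).is_ok() {
            sent += 1;
        }
    }
    for &key in deletes {
        if index.send(I::remove(key)).is_ok() {
            sent += 1;
        }
    }
    sent
}
```

The same `monitor_items` body then serves both `Sender<VsIndex>` and `Sender<FtsIndex>`, with the per-index differences confined to the two trait impls.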
The FTS actor now tracks document count: AddDocument increments,
RemoveDocument decrements (saturating), RemovePartition resets to zero,
and Count responds with the current value.

An integration test asserts that the FTS index correctly counts documents
ingested during fullscan. The db_basic mock is fixed to derive DbIndexKind
from IndexMetadata.kind and handle GetIndexParams for non-VS indexes.
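
The counting rules in this commit message can be sketched as a small state-transition function. The payload-free message variants and the `apply` helper are simplifications assumed for the example; only the four operations and their effect on the count come from the commit message.

```rust
// Hypothetical, payload-free version of the four operations named above.
enum FtsIndex {
    AddDocument,
    RemoveDocument,
    RemovePartition,
    Count,
}

// Applies one message to the running count; returns Some(count) only for Count.
fn apply(count: &mut usize, msg: FtsIndex) -> Option<usize> {
    match msg {
        FtsIndex::AddDocument => {
            *count += 1;
            None
        }
        FtsIndex::RemoveDocument => {
            // Saturating: removing from an empty index leaves the count at zero.
            *count = (*count).saturating_sub(1);
            None
        }
        FtsIndex::RemovePartition => {
            // Resets the whole counter, as described in the commit message.
            *count = 0;
            None
        }
        FtsIndex::Count => Some(*count),
    }
}
```

In this sketch the counter tracks messages rather than distinct documents, so adding the same document twice would count it twice.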
@knowack1 knowack1 force-pushed the vector-621-index-adapter-abstraction branch from 28e4f83 to caab5d3 Compare June 9, 2026 08:05
knowack1 commented Jun 9, 2026

Collaborator Author

> Consider splitting the first commit into such commits:
>
>   • move index dir into vs_index (to preserve git history)
>   • refactor vs_index names
>   • create fts_index module (and inside it mod.rs, actor.rs, factory.rs and tantivy.rs, similarly to the vs_index module; in the future we could have other implementations of the fts index)

Done
