feat: Add BM25 lexical search support for Milvus destination #646
Draft
micmarty-deepsense wants to merge 2 commits into main
Add configuration flag and helper method to enable BM25 full-text search in Milvus 2.5+ destinations using the built-in BM25 Function API.

Changes:
- Add enable_lexical_search flag to MilvusUploadStagerConfig
- Add create_bm25_schema() static helper for creating BM25-enabled collection schemas
- Add unit tests for new configuration and schema generation
- BM25 function auto-generates sparse vectors from text field for lexical search

Requires Milvus 2.5+ and manual schema creation before ingestion using the provided create_bm25_schema() helper method.
Add enable_lexical_search flag to MilvusUploadStagerConfig to indicate that the collection is configured for BM25 full-text search with sparse vectors. Add create_bm25_schema() static helper method that provides an example schema for the Milvus 2.5+ BM25 Function API. Users must manually create the collection with this schema before ingestion. The BM25 Function auto-generates sparse vectors from text content for keyword-based lexical search.
Summary
- Add enable_lexical_search configuration flag to Milvus destination connector
- Add create_bm25_schema() helper method for creating BM25-enabled collection schemas

Details
This PR adds support for BM25 lexical (full-text) search in Milvus destinations using the built-in BM25 Function API available in Milvus 2.5+.
Key Changes:
- Configuration Flag: Added enable_lexical_search boolean field to MilvusUploadStagerConfig
  - Defaults to False for backward compatibility
- Schema Helper: Added create_bm25_schema() static method to MilvusUploadStager
  - Includes a text field with enable_analyzer=True, a sparse_vector field, and a dense embedding field
  - Intended for use with MilvusClient.create_collection()
- Tests: Added comprehensive unit tests
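To illustrate why a False default keeps existing configurations working, here is a minimal sketch. `StagerConfigSketch` is a hypothetical stand-in that models only the new flag; it is not the real MilvusUploadStagerConfig, whose base class and other fields are not shown in this PR description.

```python
from dataclasses import dataclass


@dataclass
class StagerConfigSketch:
    """Hypothetical stand-in for MilvusUploadStagerConfig (only the new flag)."""

    # Opt-in flag: existing configs that never mention it keep their old behavior.
    enable_lexical_search: bool = False


# A config written before this change is constructed without the flag.
legacy_config = StagerConfigSketch()

# A config for a collection that was created with a BM25-enabled schema.
bm25_config = StagerConfigSketch(enable_lexical_search=True)
```

Because the flag is opt-in, pipelines that do not set it should behave as they did before this change.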
Usage Pattern:
The collection schema must be created BEFORE ingestion using the helper:
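The original usage snippet is not preserved here. As a rough sketch of what creating such a collection could look like with pymilvus directly (this is not the PR's create_bm25_schema() implementation), the example below builds the three fields named above plus a BM25 function. The `id` primary key, the 384-dimension default, the index choices, the function name `text_bm25`, and both helper names (`bm25_field_specs`, `create_bm25_collection`) are assumptions made for illustration.

```python
from typing import Any, Dict, List


def bm25_field_specs(dense_dim: int = 384) -> List[Dict[str, Any]]:
    """Describe the fields of a BM25-enabled collection as plain dicts.

    The "id" primary key and the 384-dim default are assumptions of this sketch.
    """
    return [
        {"field_name": "id", "datatype": "VARCHAR", "is_primary": True, "max_length": 64},
        # enable_analyzer=True lets Milvus tokenize this field for BM25.
        {"field_name": "text", "datatype": "VARCHAR", "max_length": 65535, "enable_analyzer": True},
        # Populated by the BM25 function on the server, not supplied at insert time.
        {"field_name": "sparse_vector", "datatype": "SPARSE_FLOAT_VECTOR"},
        {"field_name": "embedding", "datatype": "FLOAT_VECTOR", "dim": dense_dim},
    ]


def create_bm25_collection(uri: str, collection_name: str, dense_dim: int = 384) -> None:
    """Create the collection before ingestion (needs pymilvus and Milvus 2.5+)."""
    # Imported lazily so the field specs above stay usable without pymilvus.
    from pymilvus import DataType, Function, FunctionType, MilvusClient

    client = MilvusClient(uri=uri)
    schema = MilvusClient.create_schema(auto_id=False)
    for spec in bm25_field_specs(dense_dim):
        kwargs = dict(spec)
        kwargs["datatype"] = getattr(DataType, kwargs["datatype"])
        schema.add_field(**kwargs)

    # The BM25 function derives sparse_vector from text on the server side.
    schema.add_function(
        Function(
            name="text_bm25",
            function_type=FunctionType.BM25,
            input_field_names=["text"],
            output_field_names=["sparse_vector"],
        )
    )

    index_params = client.prepare_index_params()
    index_params.add_index(
        field_name="sparse_vector",
        index_type="SPARSE_INVERTED_INDEX",
        metric_type="BM25",
    )
    index_params.add_index(
        field_name="embedding", index_type="AUTOINDEX", metric_type="COSINE"
    )
    client.create_collection(
        collection_name=collection_name, schema=schema, index_params=index_params
    )
```

Calling `create_bm25_collection("http://localhost:19530", "docs")` against a running Milvus 2.5+ instance would create the collection; the URI and collection name are placeholders. Ingestion with enable_lexical_search=True would then target that existing collection.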
Requirements:
Related:
enable_lexical_search to connector config input model

Test Plan