All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Added:
- Added `BasicDiskVectorDatabase` to provide a basic vector database that is automatically persisted to disk, along with the associated `BasicDiskVectorStore` and `BasicDiskVocabularyStore`.
Added:
- Added `IBatchEmbeddingsGenerator` and `.AddTextsAsync` to support adding texts to the database in batches.
- Added OpenAI support for `IBatchEmbeddingsGenerator`.
- Added `IVectorTextResultItem.Similarity` and marked `IVectorTextResultItem.VectorComparison` obsolete. `VectorComparison` will be removed in the future.
- Added more comment metadata to the code.
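The batch path can be pictured with a small Python sketch (the library itself is C#). It assumes a hypothetical `BatchEmbeddingsGenerator` protocol standing in for `IBatchEmbeddingsGenerator`, and a hypothetical synchronous `add_texts` helper standing in for `.AddTextsAsync`. The sketch only shows the idea: a batch-capable provider is called once per batch, and any other provider falls back to one call per text.

```python
from typing import Dict, List, Protocol, Tuple, runtime_checkable


@runtime_checkable
class BatchEmbeddingsGenerator(Protocol):
    """Hypothetical stand-in for IBatchEmbeddingsGenerator: many texts per call."""

    def generate_embeddings(self, texts: List[str]) -> List[List[float]]: ...


class LengthEmbedder:
    """Toy single-text provider: embeds a text as [len(text)] and counts calls."""

    def __init__(self) -> None:
        self.calls = 0

    def generate_embedding(self, text: str) -> List[float]:
        self.calls += 1
        return [float(len(text))]


class BatchLengthEmbedder(LengthEmbedder):
    """Toy batch-capable provider: one call covers the whole list."""

    def generate_embeddings(self, texts: List[str]) -> List[List[float]]:
        self.calls += 1
        return [[float(len(t))] for t in texts]


def add_texts(db: Dict[int, Tuple[str, List[float]]], generator, texts: List[str]) -> List[int]:
    """Hypothetical synchronous analog of .AddTextsAsync (not the library's API)."""
    if isinstance(generator, BatchEmbeddingsGenerator):
        vectors = generator.generate_embeddings(texts)  # a single batched request
    else:
        vectors = [generator.generate_embedding(t) for t in texts]  # one request per text
    next_id = max(db, default=0) + 1
    ids = []
    for text, vector in zip(texts, vectors):
        db[next_id] = (text, vector)
        ids.append(next_id)
        next_id += 1
    return ids
```

Batching is most useful for remote embeddings providers, where each call is likely a separate network round trip.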
Fixed:
- Fixed a bug when loading a saved database from a file/stream where `IntIdGenerator` or `NumericIdGenerator` lost the max Id. As a result, adding new texts to the database caused existing texts to be overwritten. This specifically affected the `SharpVector.OpenAI` and `SharpVector.Ollama` libraries, but the fix is implemented within the core `Build5Nines.SharpVector` library.
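Here is a minimal Python sketch of the failure mode and the fix. It assumes a hypothetical `IntIdGenerator` analog with a hypothetical `set_most_recent_id` method, neither of which is the library's actual API. A generator recreated on load starts counting from zero unless it is told the largest Id already in use.

```python
from typing import Dict, Iterable, Tuple


class IntIdGenerator:
    """Hypothetical Python analog of an incrementing integer Id generator."""

    def __init__(self) -> None:
        self._last_id = 0

    def new_id(self) -> int:
        self._last_id += 1
        return self._last_id

    def set_most_recent_id(self, existing_ids: Iterable[int]) -> None:
        # Hypothetical method: resume after the largest Id already in use.
        self._last_id = max(existing_ids, default=0)


def load_database(saved: Dict[int, str], restore_max_id: bool) -> Tuple[Dict[int, str], IntIdGenerator]:
    texts = dict(saved)           # texts deserialized from the file/stream
    generator = IntIdGenerator()  # a freshly created generator starts at 0
    if restore_max_id:
        generator.set_most_recent_id(texts.keys())
    return texts, generator


def add_text(texts: Dict[int, str], generator: IntIdGenerator, text: str) -> int:
    new_id = generator.new_id()
    texts[new_id] = text          # silently overwrites if the Id is reused
    return new_id
```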
Added:
- Added an optional `filter` parameter of type `Func<TMetadata, bool>` to the `.Search` and `.SearchAsync` methods. It is called for each text item in the database, which allows more advanced filtering before the vector similarity search runs and results are returned. If it is undefined or `null`, it is ignored.
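Conceptually, the `filter` predicate narrows the candidate set before any similarity is computed. The Python sketch below assumes a simple in-memory store of hypothetical `Item` records and uses cosine similarity. Its `metadata_filter` argument plays the role of the C# `Func<TMetadata, bool>` parameter.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Item:
    """Hypothetical stored record: text, its vector, and its metadata."""
    text: str
    vector: List[float]
    metadata: dict


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.hypot(*a) * math.hypot(*b)
    return dot / norms if norms else 0.0


def search(
    items: Dict[int, Item],
    query: List[float],
    metadata_filter: Optional[Callable[[dict], bool]] = None,
) -> List[Tuple[int, float]]:
    # The filter is evaluated once per stored item, before any similarity is
    # computed. None means no filtering at all.
    candidates = [
        (item_id, item)
        for item_id, item in items.items()
        if metadata_filter is None or metadata_filter(item.metadata)
    ]
    scored = [(item_id, cosine(query, item.vector)) for item_id, item in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Filtering first means the more expensive similarity calculation only runs on items that could be returned anyway.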
Add:
- Added the `VectorTextResultItem.Id` property, so it's easy to get the database ID for search results if necessary.
- `IVectorDatabase` now inherits from `IEnumerable`, so you can easily iterate over the text documents that have been added to the database.
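The following Python sketch shows what enumerating the database gives consuming code. A hypothetical `MemoryVectorDatabase` class implements `__iter__`, which is the Python counterpart of `IEnumerable`, and each stored item carries its database `id`. The class and field names are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass(frozen=True)
class VectorTextItem:
    """Hypothetical stored item that knows its own database Id."""
    id: int
    text: str
    metadata: Optional[str] = None


class MemoryVectorDatabase:
    """Hypothetical sketch of a store that can be iterated like an IEnumerable."""

    def __init__(self) -> None:
        self._items: Dict[int, VectorTextItem] = {}
        self._next_id = 1

    def add_text(self, text: str, metadata: Optional[str] = None) -> int:
        item = VectorTextItem(self._next_id, text, metadata)
        self._items[item.id] = item
        self._next_id += 1
        return item.id

    def __iter__(self) -> Iterator[VectorTextItem]:
        # Yield stored documents in insertion order so callers can use `for`.
        return iter(self._items.values())

    def __len__(self) -> int:
        return len(self._items)
```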
Fixed:
- Fixed text tokenization to correctly remove special characters.
- Updated `BasicTextPreprocessor` to support Emoji characters too.
- Refactorings for more Clean Code.
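The Python sketch below shows one way to tokenize so that special characters are stripped while emoji are kept. It is an assumed approach rather than `BasicTextPreprocessor`'s actual logic. The emoji code-point ranges are a deliberate simplification, because full coverage needs Unicode emoji property data.

```python
from typing import List

# Assumed, simplified emoji ranges: Misc Symbols/Dingbats and the main emoji blocks.
EMOJI_RANGES = [(0x2600, 0x27BF), (0x1F300, 0x1FAFF)]


def is_emoji(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


def tokenize(text: str) -> List[str]:
    tokens: List[str] = []
    current: List[str] = []
    for ch in text.lower():
        if ch.isalnum():              # letters/digits in any script, including CJK
            current.append(ch)
            continue
        if current:                   # any non-alphanumeric character ends a word
            tokens.append("".join(current))
            current = []
        if is_emoji(ch):
            tokens.append(ch)         # each emoji becomes its own token
        # other special characters (punctuation, symbols, whitespace) are dropped
    if current:
        tokens.append("".join(current))
    return tokens
```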
Breaking Changes:
- The `.Search` and `.SearchAsync` methods now return an `IVectorTextResultItem<TId, TDocument, TMetadata>` instead of a `VectorTextResultItem<TDocument, TMetadata>`. If you're using the library as the documentation shows, you won't see any changes or have any issues with this update.
Add:
- Added Ollama support via the `Build5Nines.SharpVector.Ollama` NuGet package.
- Added `Build5Nines.SharpVector.Embeddings.IEmbeddingsGenerator` to support creating external embeddings providers.
- Added the `Build5Nines.OpenAI.IOpenAIMemoryVectorDatabase` interface.
Fixed:
- Internal refactoring of the save/load database persistence file code to make it more maintainable and reusable going forward.
- Implemented some performance tweaks in the code, such as using `const string` and other best practices, to help overall performance when handling larger amounts of data.
Add:
- Added `SerializeToBinaryStream` and `DeserializeFromBinaryStream` methods to replace (and mark obsolete) the `SerializeToJsonStream` and `DeserializeFromJsonStream` methods. They read/write binary zip file data, not JSON, so they were named incorrectly.
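The rename becomes clear from the bytes themselves: a zip archive starts with the `PK` signature, not a JSON character. This Python sketch uses the standard `zipfile` module. The entry names `database.json` and `vocabulary.json` are a hypothetical layout, not SharpVector's actual file format.

```python
import io
import json
import zipfile
from typing import BinaryIO, Dict, Tuple


def serialize_to_binary_stream(items: Dict[int, str], vocabulary: Dict[str, int], stream: BinaryIO) -> None:
    # Hypothetical layout: one JSON document per zip entry, deflate-compressed.
    with zipfile.ZipFile(stream, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("database.json", json.dumps(items))
        zf.writestr("vocabulary.json", json.dumps(vocabulary))


def deserialize_from_binary_stream(stream: BinaryIO) -> Tuple[Dict[int, str], Dict[str, int]]:
    with zipfile.ZipFile(stream, "r") as zf:
        # JSON object keys are always strings, so convert the Ids back to int.
        items = {int(k): v for k, v in json.loads(zf.read("database.json")).items()}
        vocabulary = json.loads(zf.read("vocabulary.json"))
    return items, vocabulary
```

Usage: write into an `io.BytesIO`, then `seek(0)` before reading it back.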
Fixed:
- Fixed the `.LoadFromFile` method, which was previously inaccessible.
Added:
- Exposed the internal vector array of `VectorTextItem` through the `VectorTextResultItem.Vectors` property, so consuming code can access the vector array when required. This is mostly for more flexible usage of the library.
- Added Overlapping Window text chunking (`TextChunkingMethod.OverlappingWindow`) to `TextDataLoader` for enhanced document segmentation with overlapping content, improving metadata extraction and search result relevance.
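A word-based overlapping window can be sketched in Python as follows. The `window_size` and `overlap` parameter names and the whitespace word splitting are assumptions for illustration, and `TextDataLoader`'s actual options may differ. Each window starts `window_size - overlap` words after the previous one, so consecutive chunks share `overlap` words of context.

```python
from typing import List


def overlapping_window_chunks(text: str, window_size: int, overlap: int) -> List[str]:
    """Split text into windows of `window_size` words that share `overlap`
    words with the previous window (parameter names are illustrative)."""
    if window_size <= 0 or not 0 <= overlap < window_size:
        raise ValueError("require window_size > 0 and 0 <= overlap < window_size")
    words = text.split()
    step = window_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + window_size]))
        if start + window_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```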
Fixed:
- When using `Data.TextDataLoader` with `TextChunkingMethod.FixedLength`, text was split on space characters, which didn't work correctly with Chinese text. It now works correctly with Chinese characters too.
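The Python sketch below contrasts the two behaviors. Packing space-separated words leaves unspaced Chinese text as one oversized chunk, while slicing by character count still splits it. Both functions are hypothetical illustrations, not the library's code, and character slicing is only one possible fix.

```python
from typing import List


def split_on_spaces(text: str, chunk_size: int) -> List[str]:
    """Naive approach: pack whole space-separated words up to chunk_size characters."""
    chunks: List[str] = []
    current = ""
    for word in text.split(" "):
        candidate = word if not current else current + " " + word
        if len(candidate) <= chunk_size or not current:
            current = candidate       # a single word longer than chunk_size is never split
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def fixed_length_chunks(text: str, chunk_size: int) -> List[str]:
    """Slice by character count, so scripts without spaces (e.g. Chinese) still split."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```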
Added:
- Added data persistence capability to save/load from a file or to/from a `Stream` (both SharpVector and SharpVector.OpenAI).
- Added Chinese language/character support.
Breaking Change:
- Refactored `IVocabularyStore` to be used within `MemoryDictionaryVectorStoreWithVocabulary`. This simplifies the implementation of `MemoryVectorDatabaseBase` and helps enable the data persistence capability.
Notes:
- The breaking change only applies if the base classes are being used. If `BasicMemoryVectorDatabase` is being used, this will likely not break applications that depend on this library. However, where code explicitly depends on `VectorTextResult` and its properties (without using `var` in consuming code), minor code changes might be needed when migrating from previous versions of the library.
- Upgrade to .NET 8 or higher
Added:
- Simplified the object model by combining the Async and non-Async classes. `BasicMemoryVectorDatabase` now supports both synchronous and asynchronous operations.
- Refactored to remove unnecessary classes where the `Async` versions work just fine.
- Improved async/await and multi-threading use.
Added:
- Added `Async` versions of classes to support multi-threading.
- Metadata is no longer required when calling `.AddText()` and `.AddTextAsync()`.
- Refactored `IVectorSimilarityCalculator` to `IVectorComparer`, and `CosineVectorSimilarityCalculatorAsync` to `CosineSimilarityVectorComparerAsync`.
- Added the new `EuclideanDistanceVectorComparerAsync`.
- Fixed `MemoryVectorDatabase` so it no longer requires the unused `TId` generic type.
- Renamed the `VectorSimilarity` and `Similarity` properties to `VectorComparison`.
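The two comparers differ in sort direction and in whether vector magnitude matters. Cosine similarity is higher for more similar vectors and ignores vector length, while Euclidean distance is lower for more similar vectors. The Python classes below are hypothetical stand-ins for `CosineSimilarityVectorComparerAsync` and `EuclideanDistanceVectorComparerAsync`. Each carries a `higher_is_better` flag, which is an assumption of this sketch, so that the ranking code knows which way to sort.

```python
import math
from typing import List, Sequence


class CosineSimilarityComparer:
    """Hypothetical stand-in for CosineSimilarityVectorComparerAsync."""

    higher_is_better = True  # 1.0 = same direction, 0.0 = orthogonal

    def calculate(self, a: Sequence[float], b: Sequence[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norms = math.hypot(*a) * math.hypot(*b)
        return dot / norms if norms else 0.0


class EuclideanDistanceComparer:
    """Hypothetical stand-in for EuclideanDistanceVectorComparerAsync."""

    higher_is_better = False  # 0.0 = identical vectors

    def calculate(self, a: Sequence[float], b: Sequence[float]) -> float:
        return math.dist(a, b)


def rank(comparer, query: Sequence[float], vectors: List[Sequence[float]]) -> List[int]:
    """Return vector indexes ordered from most to least similar to the query."""
    scores = [comparer.calculate(query, v) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: scores[i], reverse=comparer.higher_is_better)
```

With the query `[1, 0]`, the vector `[10, 0]` ranks first under cosine similarity because it points in exactly the same direction. Under Euclidean distance, `[1, 0.5]` ranks first because it is physically closer.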
Added:
- Added the `TextDataLoader` class to support different methods of text chunking when loading documents into the vector database.
Added:
- Introduced the `BasicMemoryVectorDatabase` class as the basic vector database implementation. It uses a Bag of Words vectorization strategy, Cosine similarity, a dictionary vocabulary store, and a basic text preprocessor.
- Added more use of C# generics, so the library is more customizable and custom vector databases can be implemented if desired.
- Added `VectorTextResultItem.Similarity` so consuming code can inspect the similarity of the text in the vector search results.
- Updated the `.Search` method to support search result paging and a threshold for similarity comparison.
- Added some basic unit tests.
Added:
- Initial release - let's do this!