A high-performance, in-memory vector store with FastEmbed integration for Python applications.
The library supports a wide variety of embedding models:
- BGE Models: BGEBaseENV15, BGELargeENV15, BGESmallENV15 (with quantized variants)
- Nomic Models: NomicEmbedTextV1, NomicEmbedTextV15 (with quantized variants)
- GTE Models: GTEBaseENV15, GTELargeENV15 (with quantized variants)
- Multilingual Models: MultilingualE5Small, MultilingualE5Base, MultilingualE5Large
- Specialized Models: ClipVitB32, JinaEmbeddingsV2BaseCode, ModernBertEmbedLarge
- And many more...
Requirements:

- Python 3.8 or higher
Install from PyPI:

```bash
pip install fastembed-vectorstore
```

Or install from source:

1. Clone the repository:

```bash
git clone https://github.com/sauravniraula/fastembed_vectorstore.git
cd fastembed_vectorstore
```

2. Install the package:

```bash
pip install -e .
```

Basic usage:

```python
from fastembed_vectorstore import FastembedVectorstore, FastembedEmbeddingModel

# Initialize with a model
model = FastembedEmbeddingModel.BGESmallENV15
vectorstore = FastembedVectorstore(model)

# Optional configuration
# vectorstore = FastembedVectorstore(
#     model,
#     show_download_progress=False,       # default: True
#     cache_directory="fastembed_cache",  # default: fastembed_cache
# )

# Documents to add
documents = [
    "The quick brown fox jumps over the lazy dog",
    "A quick brown dog jumps over the lazy fox",
    "The lazy fox sleeps while the quick brown dog watches",
    "Python is a programming language",
    "Rust is a systems programming language",
]

# Embed and store documents
success = vectorstore.embed_documents(documents)
print(f"Documents embedded: {success}")

# Search for similar documents
query = "What is Python?"
results = vectorstore.search(query, n=3)
for doc, similarity in results:
    print(f"Document: {doc}")
    print(f"Similarity: {similarity:.4f}")
    print("---")

# Save the vector store
vectorstore.save("my_vectorstore.json")

# Load the vector store later
loaded_vectorstore = FastembedVectorstore.load(model, "my_vectorstore.json")

# Optional configuration
# loaded_vectorstore = FastembedVectorstore.load(
#     model,
#     "my_vectorstore.json",
#     show_download_progress=False,       # default: True
#     cache_directory="fastembed_cache",  # default: fastembed_cache
# )
```

`FastembedEmbeddingModel` is an enum containing all supported embedding models. Choose based on your use case:
- Small models: Faster, lower memory usage (e.g., `BGESmallENV15`)
- Base models: Balanced performance (e.g., `BGEBaseENV15`)
- Large models: Higher-quality embeddings (e.g., `BGELargeENV15`)
- Quantized models: Reduced memory usage (e.g., `BGESmallENV15Q`)
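As a rough guide to the memory tradeoff, you can estimate the raw size of the stored vectors from the embedding dimension and document count. The dimensions below are assumptions for the BGE v1.5 family (verify against the FastEmbed model list), and `estimated_bytes` is an illustrative helper, not part of this library:

```python
# Back-of-the-envelope memory estimate for an in-memory vector store.
# Dimensions are assumed values for the BGE v1.5 family -- check the
# FastEmbed model list for the exact figures.
MODEL_DIMS = {
    "BGESmallENV15": 384,
    "BGEBaseENV15": 768,
    "BGELargeENV15": 1024,
}

def estimated_bytes(model_name: str, n_docs: int, bytes_per_value: int = 4) -> int:
    """Rough size of the stored embeddings (float32 values by default)."""
    return MODEL_DIMS[model_name] * n_docs * bytes_per_value

# 100k documents with the small model is roughly 154 MB of raw vectors.
print(estimated_bytes("BGESmallENV15", 100_000) / 1e6, "MB")
```

Quantized variants shrink this further by storing fewer bytes per value, at some cost in embedding quality.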
```python
vectorstore = FastembedVectorstore(
    model: FastembedEmbeddingModel,
    show_download_progress: bool | None = ...,
    cache_directory: str | os.PathLike[str] | None = ...,
)
```

Args:

- `model`: Embedding model to use.
- `show_download_progress`: Whether to show model download progress. Defaults to `True`.
- `cache_directory`: Directory in which to cache downloaded model files. Defaults to `fastembed_cache`.
- `embed_documents(documents)`: Embeds a list of documents and stores them in the vector store.
- `search(query, n)`: Searches for the documents most similar to the query. Returns a list of `(document, similarity_score)` tuples.
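To build intuition for the similarity scores, here is a minimal pure-Python sketch of cosine-similarity ranking, a common choice for this kind of search. The `cosine_similarity` and `top_n` helpers are illustrative only and are not part of this library, whose actual scoring may differ:

```python
import math

def cosine_similarity(a, b):
    # Dot product of the two vectors divided by the product of their norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_n(query_vec, doc_vecs, docs, n=3):
    # Score every stored vector against the query and keep the best n.
    scored = [(doc, cosine_similarity(query_vec, vec))
              for doc, vec in zip(docs, doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

docs = ["about python", "about rust"]
vecs = [[1.0, 0.0], [0.0, 1.0]]
print(top_n([0.9, 0.1], vecs, docs, n=1))  # "about python" ranks first
```

A score of 1.0 means the vectors point in the same direction; scores near 0 mean the documents are unrelated.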
- `save(path)`: Saves the vector store to a JSON file.
- `load(model, path)`: Loads a vector store from a JSON file.
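Conceptually, a JSON snapshot of an in-memory store only needs the documents and their vectors. The schema below is a hypothetical illustration of such a round trip, not the actual file format written by `save`:

```python
import json
import os
import tempfile

# Hypothetical schema -- the real file written by save() may differ.
store = {
    "documents": ["Python is a programming language"],
    "embeddings": [[0.12, -0.03, 0.88]],
}

path = os.path.join(tempfile.gettempdir(), "vectorstore_demo.json")

# Persist to disk...
with open(path, "w", encoding="utf-8") as f:
    json.dump(store, f)

# ...and restore later.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded == store)  # round-trips losslessly
```

Because the snapshot contains only data, `load` also takes the model so that new queries can be embedded consistently with the stored vectors.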
- Memory Usage: All embeddings are stored in memory, so consider the size of your document collection.
- Model Selection: Smaller models are faster but may produce lower-quality embeddings.
- Batch Processing: The `embed_documents` method processes documents in batches for efficiency.
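The batching idea can be sketched in plain Python. The `batched` helper and the batch size of 256 are illustrative assumptions; the library's internal batch size and strategy are not documented here:

```python
def batched(items, batch_size=256):
    """Yield successive fixed-size slices of a list."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

documents = [f"doc {i}" for i in range(1000)]
batches = list(batched(documents, batch_size=256))
print(len(batches))      # 4 batches: 256 + 256 + 256 + 232
print(len(batches[-1]))  # 232
```

Embedding a batch at a time amortizes model overhead across many documents instead of paying it once per document.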
- Semantic Search: Find documents similar to a query
- Document Clustering: Group similar documents together
- Recommendation Systems: Find similar items or content
- Question Answering: Retrieve relevant context for Q&A systems
- Content Discovery: Help users find related content
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Saurav Niraula - sauravniraula
- Email: developmentsaurav@gmail.com
- Built with FastEmbed for efficient text embeddings