GigaVector is a high-performance, production-ready vector database library written in C with optional Python bindings. It is designed for applications that require fast approximate nearest neighbor search, semantic memory management, and LLM integration.
- Multiple Index Algorithms: KD-Tree, HNSW (Hierarchical Navigable Small Worlds), IVFPQ (Inverted File with Product Quantization), and Sparse Index
- Distance Metrics: Euclidean and cosine distance with optimized implementations
- Rich Metadata: Support for multiple key-value metadata pairs per vector with efficient filtering
- Persistence: Snapshot-based persistence with Write-Ahead Logging (WAL) for durability
- Memory Management: Structure-of-Arrays storage, quantization options, and configurable resource limits
- Semantic Memory Layer: Extract, store, and consolidate memories from conversations with importance scoring
- LLM Integration: Support for OpenAI, Anthropic, and Google LLMs for memory extraction and generation
- Embedding Services: Integration with OpenAI, Google, and HuggingFace embedding APIs
- Context Graphs: Build entity-relationship graphs for context-aware retrieval
- Production Ready: Monitoring, statistics, health checks, and comprehensive error handling
- SIMD Optimizations: Automatic detection and use of SSE4.2, AVX2, and AVX-512F
- Thread-Safe: Concurrent read operations with external write synchronization
- Memory Efficient: Optimized data structures and optional quantization
- Scalable: Supports millions of vectors with configurable resource limits
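For readers new to the two distance metrics listed above, the following pure-Python sketch shows what each one computes. It is an illustration only and does not call GigaVector or reflect its optimized implementations. The function names `euclidean_distance` and `cosine_distance` are hypothetical, and cosine distance is taken here as 1 minus cosine similarity, which is a common convention but an assumption about the library's exact definition.

```python
import math


def euclidean_distance(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity (assumed convention, see the note above)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 3-4-5 triangle
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))     # orthogonal vectors
```

Euclidean distance is sensitive to vector magnitude, while cosine distance depends only on direction, so the better choice usually depends on whether your embeddings are normalized.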
make # builds everything (library + main executable)
make lib # builds static and shared libraries into build/lib/
make c-test # runs C tests (needs LD_LIBRARY_PATH=build/lib)
For complete build and test instructions, including LLM tests, see the Build and Test Guide.
# Configure build
cmake -B build -DCMAKE_BUILD_TYPE=Release
# Build library and executables
cmake --build build
# Run tests
cd build && ctest
# Install (optional)
cmake --install build --prefix /usr/local
CMake Options:
- -DBUILD_SHARED_LIBS=ON/OFF - Build shared library (default: ON)
- -DBUILD_TESTS=ON/OFF - Build test executables (default: ON)
- -DBUILD_BENCHMARKS=ON/OFF - Build benchmark executables (default: ON)
- -DENABLE_SANITIZERS=ON/OFF - Enable sanitizers (ASAN, TSAN, UBSAN) (default: OFF)
- -DENABLE_COVERAGE=ON/OFF - Enable code coverage (default: OFF)
Example with options:
cmake -B build \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_TESTS=ON \
-DBUILD_BENCHMARKS=ON \
-DENABLE_SANITIZERS=OFF
cmake --build build
The CMake build system automatically detects and enables available SIMD optimizations (SSE4.2, AVX2, AVX-512F, FMA) for optimal performance.
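If you want to see which of these instruction sets your own CPU advertises, the sketch below parses the `flags` line of `/proc/cpuinfo` on x86 Linux. It is independent of how the CMake build performs its detection, which is not described here, and the helper name `detect_simd` is hypothetical.

```python
from pathlib import Path

# Flag names as they appear in /proc/cpuinfo on x86 Linux, mapped to the
# labels used in this README.
SIMD_FLAGS = {
    "sse4_2": "SSE4.2",
    "avx2": "AVX2",
    "avx512f": "AVX-512F",
    "fma": "FMA",
}


def detect_simd(cpuinfo_text):
    """Return the SIMD extensions listed in the first 'flags' line, if any."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Everything after the first colon is a space-separated flag list.
            _, _, rest = line.partition(":")
            present = set(rest.split())
            return [label for flag, label in SIMD_FLAGS.items() if flag in present]
    return []


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(detect_simd(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found; this check only works on Linux")
```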
From PyPI:
pip install gigavector
From source:
cd python
python -m pip install .
from gigavector import Database, DistanceType, IndexType
with Database.open("example.db", dimension=4, index=IndexType.KDTREE) as db:
    db.add_vector([1, 2, 3, 4], metadata={"tag": "sample", "owner": "user"})
    hits = db.search([1, 2, 3, 4], k=1, distance=DistanceType.EUCLIDEAN)
    print(hits[0].distance, hits[0].vector.metadata)
Persistence:
db.save("example.db") # snapshot
# On reopen, WAL is replayed automatically
with Database.open("example.db", dimension=4, index=IndexType.KDTREE) as db:
    ...
IVFPQ training (dimension must match):
train = [[(i % 10) / 10.0 for _ in range(8)] for i in range(256)]
with Database.open(None, dimension=8, index=IndexType.IVFPQ) as db:
    db.train_ivfpq(train)
    db.add_vector([0.5] * 8)
GigaVector supports various environment variables for API keys and configuration.
Quick Setup:
# Copy the example file and fill in your API keys
cp .env.example .env
# Edit .env with your actual API keys
Required for Tests:
- OPENAI_API_KEY - For LLM and embedding tests
- ANTHROPIC_API_KEY - For Anthropic/Claude LLM tests
Optional:
- GOOGLE_API_KEY - For Google embeddings
- GV_WAL_DIR - Override WAL directory location
See .env.example for a complete list with descriptions, or API Keys Documentation for detailed information.
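Before running the LLM and embedding tests, it can help to confirm that the variables named above are actually set. The sketch below is one way to do that; the helper `missing_variables` is hypothetical and not part of GigaVector. It reads only the process environment, so values in `.env` must already be exported or loaded by your own tooling.

```python
import os

# Variable names taken from the lists above.
REQUIRED_FOR_TESTS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")
OPTIONAL = ("GOOGLE_API_KEY", "GV_WAL_DIR")


def missing_variables(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables(REQUIRED_FOR_TESTS)
    if missing:
        print("Missing variables required for tests:", ", ".join(missing))
    else:
        print("All variables required for tests are set.")
    for name in missing_variables(OPTIONAL):
        print("Optional variable not set:", name)
```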
- Usage Guide - Comprehensive guide for using GigaVector
- Build and Test Guide - Complete build instructions and testing
- Python Bindings Guide - Python API documentation and best practices
- C API Guide - C API usage patterns and examples
- API Reference - Complete API reference with detailed function documentation
- Architecture Documentation - Deep dive into system architecture and design
- API Keys Guide - Environment variables and API key configuration
- Deployment Guide - Production deployment, scaling, and operations
- Security Guide - Security best practices and hardening
- Troubleshooting Guide - Common issues and solutions
- Basic Usage Examples - Getting started with C and Python APIs
- Advanced Features - Advanced patterns and optimization techniques
- Performance Tuning Guide - Index selection, parameter tuning, and optimization
- Contributing Guidelines - How to contribute to GigaVector
This project is licensed under the DBaJ-NC-CFL License.
