Shah91n/Weaviate-Disk-Storage-Calculator

Weaviate Disk Storage Calculator

Weaviate Disk Storage Calculator is a tool designed to estimate disk storage requirements for the Weaviate vector database. It provides insights into storage usage based on your data characteristics and allows for extrapolation from existing datasets. The calculator offers a detailed breakdown of storage components, including objects, vector indexes, and inverted indexes, to help you plan capacity accurately.

Visit Weaviate Disk Storage Calculator

Features

Comprehensive Storage Calculation

  • Estimate storage requirements based on:
    • Number of objects and average object size
    • Vector dimensions and quantization methods
    • Inverted index factors for filterable and searchable properties, with defaults derived from real-world dataset analysis
  • Supports all Weaviate quantization methods:
    • None (32-bit float)
    • Scalar Quantization (SQ)
    • Product Quantization (PQ)
    • Rotational Quantization RQ-8 (8-bit per dimension)
    • Rotational Quantization RQ-1 (1-bit per dimension, packed in uint64 blocks)
    • Binary Quantization (BQ)
  • Provides a detailed breakdown of storage components:
    • Object Properties Storage
    • Uncompressed Vector Storage (float32, stored alongside compressed index when quantization is enabled)
    • HNSW Index Storage (quantized or full float32 when no quantization)
    • Commit Log Storage (HNSW write-ahead log, ~10–20% of vector storage)
    • Filterable Properties Index Storage (est.)
    • Searchable Properties (BM25) Index Storage (est.)
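The breakdown above can be sketched as a small function. This is a minimal illustration, not the calculator's actual code: the function name, the parameter names, and the default factors (`commit_log_ratio`, `filterable_factor`, `searchable_factor`) are hypothetical. It also assumes that the inverted index factors scale with object storage, that the commit log ratio applies to the float32 vector size, and that float32 vectors are counted separately only when quantization is enabled.

```python
from typing import Optional


def estimate_breakdown(
    num_objects: int,
    avg_object_bytes: int,
    dims: int,
    compressed_bytes_per_vector: Optional[int] = None,  # None = no quantization
    commit_log_ratio: float = 0.15,   # hypothetical midpoint of the ~10-20% range
    filterable_factor: float = 0.2,   # hypothetical default, not the app's value
    searchable_factor: float = 0.3,   # hypothetical default, not the app's value
) -> dict:
    """Return an estimated per-component storage breakdown in bytes."""
    objects = num_objects * avg_object_bytes
    float32_vectors = num_objects * dims * 4  # 4 bytes per float32 dimension

    if compressed_bytes_per_vector is None:
        # No quantization: the HNSW index holds the full float32 vectors.
        uncompressed_vectors = 0
        hnsw_index = float32_vectors
    else:
        # Quantization: float32 vectors are kept alongside the compressed index.
        uncompressed_vectors = float32_vectors
        hnsw_index = num_objects * compressed_bytes_per_vector

    breakdown = {
        "objects": objects,
        "uncompressed_vectors": uncompressed_vectors,
        "hnsw_index": hnsw_index,
        "commit_log": commit_log_ratio * float32_vectors,
        "filterable_index": filterable_factor * objects,
        "searchable_index": searchable_factor * objects,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


if __name__ == "__main__":
    # 1M objects, 2 KB each, 1536-dim vectors, 192 bytes per compressed vector
    for name, size in estimate_breakdown(1_000_000, 2000, 1536, 192).items():
        print(f"{name:>22}: {size / 1e9:.3f} GB")
```

Treat the output as a rough planning figure. The real factors depend on the schema and data, which is why the app exposes them in the sidebar.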

Extrapolation from Sample Data

  • Predict storage requirements for larger datasets based on a sample from your existing Weaviate instance.
  • Supports both linear and conservative extrapolation models to account for growth overhead.
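As a rough sketch of the two extrapolation models: linear extrapolation scales the measured bytes per object to the target object count, and a conservative model adds a safety margin on top. The function name and the 20% `overhead` used below are hypothetical and may differ from what the app applies.

```python
def extrapolate_storage(
    sample_objects: int,
    sample_bytes: float,
    target_objects: int,
    overhead: float = 0.0,  # 0.0 = linear; > 0 = conservative (hypothetical margin)
) -> float:
    """Project disk usage in bytes for target_objects from a measured sample."""
    if sample_objects <= 0:
        raise ValueError("sample_objects must be positive")
    bytes_per_object = sample_bytes / sample_objects
    return bytes_per_object * target_objects * (1.0 + overhead)


if __name__ == "__main__":
    # Sample: 100k objects occupying 5 GB on disk; target: 1M objects
    linear = extrapolate_storage(100_000, 5e9, 1_000_000)
    conservative = extrapolate_storage(100_000, 5e9, 1_000_000, overhead=0.2)
    print(f"linear:       {linear / 1e9:.1f} GB")
    print(f"conservative: {conservative / 1e9:.1f} GB")
```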

Interactive Visualizations

  • Detailed bar charts showing the full breakdown of storage components.
  • Side-by-side comparison of different quantization methods.

Installation

  1. Clone the repository:

    git clone https://github.com/Shah91n/Weaviate-Disk-Storage-Calculator.git
    cd Weaviate-Disk-Storage-Calculator
  2. Install the required dependencies:

    pip install -r requirements.txt
  3. Run the application:

    streamlit run streamlit_app.py

Usage

  1. Open the application in your browser.
  2. Navigate between the tabs:
    • 📊 Parameter-Based Calculation: Input your data characteristics to get a detailed storage breakdown. Adjust inverted index factors in the sidebar for fine-tuning.
    • 📈 Extrapolate from Dataset: Use metrics from your current dataset to project future storage needs.
  3. Use the "Clear All & Reset" button in the sidebar to reset all fields to their default values.

Notes on Compression Methods

  • Scalar Quantization (SQ): Quantizes each dimension to 8 bits (1 byte) instead of a 32-bit float, reducing compressed vector size by ~75% with minimal quality loss.
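  • Product Quantization (PQ): Splits each vector into segments and stores a compact code per segment instead of the raw floats, so compressed size scales with the number of segments rather than the number of dimensions.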
  • Rotational Quantization RQ-8: Rotates vectors then quantizes each dimension to 8 bits (1 byte). Same compressed size as SQ but higher accuracy thanks to the rotation step. Weaviate stores both the full float32 vectors (for re-ranking) and the RQ-8 compressed HNSW index on disk.
  • Rotational Quantization RQ-1: Rotates vectors then quantizes to 1 bit per dimension, packed into uint64 blocks (ceil(dims / 64) × 8 bytes). Identical on-disk layout to BQ, but the rotation preserves more structure. Weaviate stores both the full float32 vectors and the compressed index.
  • Binary Quantization (BQ): Compresses vectors to 1 bit per dimension packed into uint64 blocks (ceil(dims / 64) × 8 bytes), achieving ~32× compression. Weaviate stores both the full float32 vectors and the BQ-compressed HNSW index on disk.
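The per-vector sizes described in these notes can be checked with a short sketch. The function name and method labels are illustrative only. The PQ branch assumes 1 byte per segment (that is, at most 256 centroids per segment), which the notes above do not state.

```python
import math
from typing import Optional


def compressed_bytes_per_vector(
    dims: int, method: str, pq_segments: Optional[int] = None
) -> int:
    """Approximate size in bytes of one vector under a given quantization method."""
    method = method.lower()
    if method == "none":
        return dims * 4                      # 32-bit float per dimension
    if method in ("sq", "rq-8"):
        return dims                          # 8 bits (1 byte) per dimension
    if method in ("bq", "rq-1"):
        return math.ceil(dims / 64) * 8      # 1 bit per dimension in uint64 blocks
    if method == "pq":
        if pq_segments is None:
            raise ValueError("pq_segments is required for PQ")
        return pq_segments                   # assumption: 1 byte per segment
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    dims = 1536
    full = compressed_bytes_per_vector(dims, "none")
    for m in ("none", "sq", "rq-8", "rq-1", "bq"):
        size = compressed_bytes_per_vector(dims, m)
        print(f"{m:>5}: {size:5d} bytes ({full / size:.0f}x smaller than float32)")
```

Note that the ~32× figure for BQ and RQ-1 holds exactly only when the dimension count is a multiple of 64. A 100-dimension vector, for instance, still occupies two uint64 blocks (16 bytes).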

License

© 2026 Weaviate Storage Calculator. This tool is for estimation purposes only. Actual storage requirements may vary.
