This guide provides practical strategies for optimizing memory usage and scaling Weaviate deployments. In most vector systems, memory is the primary cost driver and a key performance bottleneck.
Weaviate uses two distinct memory spaces. Total Pod/container memory is the sum of these spaces plus runtime overhead.
| Space | Components |
|---|---|
| Go Heap | HNSW graph connections, vector cache, compressed vectors |
| Off-Heap | Memory-mapped (mmap) LSM segments, goroutine stacks, GC metadata |
The 1.5x Rule: Container memory typically runs at about 1.5x Go heap in-use. The OS needs additional space for page cache (off-heap) to maintain high-speed data retrieval.
HNSW graph memory is dominated by layer 0 connections, because about 100% of nodes appear in layer 0.
- Formula: `bytes_per_conn = 2-5 bytes` (variable encoding based on index size)
- Optimization: Reducing `maxConnections` can significantly lower RAM usage, with a possible small reduction in recall.
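To get a feel for what the 2-5 byte range means in practice, here is a rough sketch that multiplies node count, connections per node, and bytes per connection. The function name, the node count, and the figure of 64 connections per node are illustrative assumptions, not values from Weaviate; substitute the numbers for your own index.

```python
def hnsw_layer0_bytes(nodes: int, conns_per_node: int, bytes_per_conn: int) -> int:
    """Rough layer-0 graph size: every node stores conns_per_node links."""
    return nodes * conns_per_node * bytes_per_conn


# Hypothetical workload: 10M nodes, 64 layer-0 connections per node (assumed).
nodes = 10_000_000
conns_per_node = 64

low = hnsw_layer0_bytes(nodes, conns_per_node, 2)   # 2 bytes per connection
high = hnsw_layer0_bytes(nodes, conns_per_node, 5)  # 5 bytes per connection

print(f"Layer 0 estimate: {low / 1e9:.2f} GB to {high / 1e9:.2f} GB")
```

Halving `conns_per_node` halves both ends of the range, which is why `maxConnections` is the main lever for graph memory.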
Vectors are cached as float32 values (4 bytes per dimension).
- Formula: `Cache Memory = cached_vectors × (dimensions × 4 + 30) bytes`
- Optimization: Use `vectorCacheMaxObjects` to prevent unbounded cache growth.
  - Frequently queried vectors should remain in memory for consistent low-latency lookups.
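The cache formula above can be evaluated directly. This is a minimal sketch; the function name and the example workload (10M cached vectors at 768 dimensions) are assumptions chosen for illustration.

```python
def vector_cache_bytes(cached_vectors: int, dimensions: int) -> int:
    """Cache memory per the formula: cached_vectors x (dimensions x 4 + 30) bytes."""
    return cached_vectors * (dimensions * 4 + 30)


# Hypothetical workload: 10M cached vectors, 768 dimensions each.
cache_bytes = vector_cache_bytes(10_000_000, 768)
print(f"Vector cache: {cache_bytes / 1e9:.2f} GB")
```

Passing your intended `vectorCacheMaxObjects` value as `cached_vectors` gives the heap the cache would consume if it filled completely.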
Object data resides in mmap'd segments off-heap. Segment files larger than 8 KB are memory-mapped; segments at or below 8 KB are read into Go heap memory.
Follow these steps to estimate production memory requirements:
- Calculate Go Heap: `HNSW + Vector Cache + 2 GB buffer`.
- Set GOMEMLIMIT: `Total Heap × 1.2` (adds 20% headroom under load).
- Set Container Limit: `GOMEMLIMIT / 0.8` (Weaviate maps GOMEMLIMIT to 80% of container memory).
- Go Heap: about 68 GB.
- GOMEMLIMIT: 82 GB.
- Container Limit: `82 / 0.8` = about 103 GB.
- Expected Runtime Usage: about 102 GB (aligned with the 1.5x rule).
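The sizing steps and the example figures can be reproduced with a short sketch. The function name and dictionary keys are hypothetical, and rounding each result up to a whole gigabyte is an assumption made here to match the rounded numbers in the example.

```python
import math


def size_memory(go_heap_gb: float) -> dict:
    """Derive GOMEMLIMIT and the container limit from an estimated Go heap (GB)."""
    # Step 2: add 20% headroom, rounded up to a whole GB (rounding is an assumption).
    gomemlimit_gb = math.ceil(go_heap_gb * 1.2)
    # Step 3: GOMEMLIMIT should be 80% of the container limit.
    container_limit_gb = math.ceil(gomemlimit_gb / 0.8)
    # Cross-check against the 1.5x rule.
    expected_runtime_gb = go_heap_gb * 1.5
    return {
        "gomemlimit_gb": gomemlimit_gb,
        "container_limit_gb": container_limit_gb,
        "expected_runtime_gb": expected_runtime_gb,
    }


# Go heap of about 68 GB, as in the example above.
plan = size_memory(68)
print(plan)
```

The container limit and the 1.5x estimate land within about 1 GB of each other, which is a useful sanity check when you substitute your own heap estimate.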
As vector count grows, Go heap usage increases. Because of the 1.5x rule, a 10 GB heap increase often requires about 15 GB additional container RAM to preserve performance.
- 80% Threshold: If `go_memstats_heap_inuse_bytes` stays above 80% of `GOMEMLIMIT`, scale vertically or shard horizontally.
  - Horizontal scaling guidance: shard across nodes only when your design supports it (for example, `desiredCount=3` with `RF=3`), since rebalancing may occur.
- Performance Degradation: If heap usage approaches the limit, GC runs more frequently (CPU spikes), and the OS may evict off-heap page cache, which increases query latency.
- Scaling Lead Time: Scale before the limit is reached. For example, when planning growth from 10M to 15M vectors, estimate future heap and increase container memory by about 1.5x the projected heap increase.
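The threshold check and the lead-time estimate above can be expressed as two small helpers. This is a sketch only: the function names are hypothetical, and the metric values are passed in as plain numbers rather than fetched from a monitoring system.

```python
GIB = 2**30


def should_scale(heap_inuse_bytes: int, gomemlimit_bytes: int, threshold: float = 0.8) -> bool:
    """True when heap in-use exceeds the given fraction of GOMEMLIMIT."""
    return heap_inuse_bytes > threshold * gomemlimit_bytes


def extra_container_gb(current_heap_gb: float, projected_heap_gb: float,
                       multiplier: float = 1.5) -> float:
    """Additional container memory for a projected heap increase (1.5x rule)."""
    return max(0.0, projected_heap_gb - current_heap_gb) * multiplier


# Heap at 70 GiB against an 82 GiB GOMEMLIMIT: above the 80% threshold (65.6 GiB).
print(should_scale(70 * GIB, 82 * GIB))
# A 10 GB projected heap increase calls for about 15 GB more container memory.
print(extra_container_gb(20, 30))
```

In practice the check should be applied to a sustained reading rather than a single sample, since the guidance is about heap usage that stays above the threshold.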
- Strategy 1: Reduce Graph Size: Lower `maxConnections` to reduce HNSW graph RAM.
- Strategy 2: Enable Compression: Use PQ, SQ, BQ, or RQ to reduce vector memory footprint.
- Strategy 3: Limit Cache: Set `vectorCacheMaxObjects` intentionally, based on your hot data set.
- Strategy 4: Set GOMEMLIMIT Correctly: Set `GOMEMLIMIT` to about 80% of the container memory limit to avoid aggressive GC behavior and OOM risk.
- GOMEMLIMIT: Set to 80% of total container memory to reduce OOM risk while preserving runtime overhead.
- 1.5x Rule Alignment: Confirm container limits include the multiplier for OS page cache.
- Vector Cache: Explicitly set `vectorCacheMaxObjects` instead of relying on the default (1e12, effectively unlimited for most deployments).
- Vertical Scaling Buffer: Define a scale-up trigger (for example, heap in-use > 80% of `GOMEMLIMIT`) to keep enough operational lead time.
- HNSW `maxConnections` is tuned to your recall vs. memory target.
- Compression Strategy (PQ/SQ/BQ/RQ) is selected and validated for your workload.
- Heap Tracking: Monitor `go_memstats_heap_inuse_bytes` as the primary active-memory baseline.
- Off-Heap Approximation: Track `container.memory.usage - go_memstats_heap_inuse_bytes` to monitor page cache health.
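The off-heap approximation is a simple subtraction, and dividing container usage by heap in-use shows how closely a deployment tracks the 1.5x rule. The sketch below assumes both values have already been read from your metrics backend; the function names are hypothetical.

```python
GIB = 2**30


def off_heap_bytes(container_usage_bytes: int, heap_inuse_bytes: int) -> int:
    """Approximate off-heap memory: container usage minus Go heap in-use."""
    return container_usage_bytes - heap_inuse_bytes


def container_to_heap_ratio(container_usage_bytes: int, heap_inuse_bytes: int) -> float:
    """Ratio to compare against the 1.5x rule."""
    return container_usage_bytes / heap_inuse_bytes


# Example readings: 102 GiB container usage, 68 GiB heap in-use.
container_usage = 102 * GIB
heap_inuse = 68 * GIB

print(off_heap_bytes(container_usage, heap_inuse) / GIB)     # off-heap in GiB
print(container_to_heap_ratio(container_usage, heap_inuse))  # ratio vs. 1.5
```

A ratio drifting well below 1.5 while heap grows may indicate that page cache is being squeezed, which is the condition the scaling guidance warns about.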
You can also use this calculator for memory and CPU sizing: Weaviate-Memory-CPU-Calculator: https://github.com/Shah91n/Weaviate-Memory-CPU-Calculator