Architecture
Zeppelin is an S3-native vector search engine. All persistent state lives in object storage. Nodes are stateless and disposable — restart any node and it reconstructs its view from S3.
```
    ┌─────────────────────────────────────────────┐
    │                 HTTP Client                 │
    └──────────────────────┬──────────────────────┘
                           │
                    ┌──────▼──────┐
                    │  Axum HTTP  │  (routes, handlers, middleware)
                    │   Server    │
                    └──────┬──────┘
                           │
          ┌────────────────┼────────────────┐
          │                │                │
    ┌─────▼─────┐   ┌──────▼──────┐  ┌──────▼──────┐
    │ Namespace │   │    Query    │  │     WAL     │
    │  Manager  │   │   Engine    │  │   Writer    │
    └─────┬─────┘   └──────┬──────┘  └──────┬──────┘
          │                │                │
          │         ┌──────▼──────┐         │
          │         │    Index    │         │
          │         │ (IVF/HANN)  │         │
          │         └──────┬──────┘         │
          │                │                │
    ┌─────▼────────────────▼────────────────▼─────┐
    │               Disk Cache (LRU)              │
    └──────────────────────┬──────────────────────┘
                           │
                    ┌──────▼──────┐
                    │   Storage   │  (object_store wrapper)
                    │    Layer    │
                    └──────┬──────┘
                           │
          ┌────────────────┼────────────────┐
          │                │                │
     ┌────▼───┐       ┌────▼───┐     ┌──────▼─────┐
     │ AWS S3 │       │  GCS   │     │ Azure Blob │
     └────────┘       └────────┘     └────────────┘
```

- **S3 is the source of truth.** Never trust local state over S3. The manifest on S3 is always authoritative; the local cache is disposable.
- **Immutable artifacts.** WAL fragments and segments are write-once and never modified in place. The manifest tracks what exists.
- **No fallbacks.** Code crashes explicitly on errors. There is no silent degradation, no swallowing of errors, and no default values for things that should be configured.
- **Stateless nodes.** Any node can serve any namespace. On startup, a node scans S3 to discover namespaces (see the sketch below).
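
As a concrete illustration of the last point, namespace discovery can be as simple as a delimiter listing against the bucket root. This is a minimal sketch using the `object_store` crate; the helper name, the env-based S3 configuration, and the assumption that namespaces sit directly at the bucket root are illustrative, not Zeppelin's actual code.

```rust
use object_store::{aws::AmazonS3Builder, ObjectStore};

/// Hypothetical startup discovery: one common prefix per namespace.
async fn discover_namespaces(bucket: &str) -> object_store::Result<Vec<String>> {
    let store = AmazonS3Builder::from_env()
        .with_bucket_name(bucket)
        .build()?;

    // A delimiter listing at the root returns "<namespace>" prefixes without
    // reading any of the objects underneath them.
    let listing = store.list_with_delimiter(None).await?;
    Ok(listing
        .common_prefixes
        .into_iter()
        .map(|prefix| prefix.to_string())
        .collect())
}
```

Each namespace occupies its own prefix, laid out as follows: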
```
<namespace>/
├── meta.json                     # Namespace metadata
├── manifest.json                 # WAL manifest (fragments + segments)
├── lease.json                    # Writer lease (fencing token)
├── wal/
│   ├── <ulid>.fragment.json      # WAL fragment (vectors + deletes)
│   └── ...
└── segments/
    └── <segment_id>/
        ├── centroids.bin         # IVF centroid vectors
        ├── cluster_<N>.bin       # Full-precision cluster data
        ├── f16_cluster_<N>.bin   # f16-compressed cluster data
        ├── sq_calibration.bin    # SQ8 calibration parameters
        ├── sq_cluster_<N>.bin    # SQ8-quantized cluster data
        ├── pq_codebook.bin       # PQ codebook
        ├── pq_cluster_<N>.bin    # PQ-encoded cluster data
        ├── attributes_<N>.json   # Cluster attribute data
        ├── bitmap_<field>.bin    # RoaringBitmap per attribute field
        ├── fts_<field>.bin       # Inverted index per FTS field
        └── tree.json             # Hierarchical tree structure
```
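
The manifest is the piece that ties this layout together: it records which WAL fragments and which segments are currently live. A rough sketch of its shape; only the `FragmentRef` and `SegmentRef` names come from the write and compaction paths below, and the fields shown here are illustrative.

```rust
use serde::{Deserialize, Serialize};

/// Illustrative manifest shape; the real definitions live in src/wal/.
#[derive(Serialize, Deserialize)]
struct Manifest {
    /// Uncompacted WAL fragments, in write order.
    fragments: Vec<FragmentRef>,
    /// Compacted index segments currently visible to queries.
    segments: Vec<SegmentRef>,
}

#[derive(Serialize, Deserialize)]
struct FragmentRef {
    /// ULID of the <ulid>.fragment.json object under wal/.
    id: String,
    /// Checksum of the fragment body (illustrative field).
    checksum: u64,
}

#[derive(Serialize, Deserialize)]
struct SegmentRef {
    /// Directory name under segments/.
    id: String,
    /// Number of vectors indexed in the segment (illustrative field).
    num_vectors: u64,
}
```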

| Module | Responsibility |
|---|---|
| `src/storage/` | Object store wrapper. Nothing above this touches `object_store` directly |
| `src/wal/` | Write-ahead log: fragment serialization, manifest management, leases |
| `src/namespace/` | Namespace CRUD and metadata |
| `src/index/` | Vector indexing: IVF-Flat, Hierarchical, SQ8, PQ, bitmap, f16 |
| `src/compaction/` | Background WAL → segment compaction |
| `src/cache/` | Local disk cache with LRU eviction |
| `src/server/` | Axum HTTP handlers (thin layer over domain logic) |
| `src/fts/` | Full-text search: tokenizer, BM25, inverted indexes, `rank_by` |
| `src/query.rs` | Query execution: manifest read → WAL scan + segment search → merge |
| `src/config.rs` | Configuration loading (env vars + TOML + defaults) |
| `src/types.rs` | Core types: `VectorEntry`, `Filter`, `DistanceMetric`, `IndexType` |
| `src/error.rs` | Error types with HTTP status code mapping |
| `src/metrics.rs` | Prometheus metrics registry |
```
Client POST /v1/namespaces/:ns/vectors
        │
        ▼
Handler: validate dimensions, batch size, vector IDs
        │
        ▼
WalWriter::append()
        │
        ├── Serialize vectors + attributes to JSON
        ├── Compute xxHash checksum
        ├── Write fragment to S3: <ns>/wal/<ulid>.fragment.json
        └── CAS update manifest.json (add FragmentRef)
```
In short, each write will:

- Validate input against namespace metadata
- Create a WAL fragment with vectors, attributes, and optional deletes
- Write the fragment to S3 (immutable, write-once)
- Update the manifest via CAS (compare-and-swap using ETags)
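
A condensed sketch of the append step, assuming the `object_store`, `serde`/`serde_json`, `ulid`, and `xxhash-rust` crates. The `Fragment` shape and the `append_fragment` helper are illustrative; the manifest CAS is sketched separately in the compaction section below.

```rust
use object_store::{path::Path, ObjectStore};
use serde::Serialize;
use ulid::Ulid;

/// Illustrative fragment shape; the real schema lives in src/wal/.
#[derive(Serialize)]
struct Fragment {
    vectors: Vec<(String, Vec<f32>)>, // (id, embedding) pairs; attributes omitted
    deletes: Vec<String>,
}

/// Serialize, checksum, and write one immutable fragment under
/// <ns>/wal/<ulid>.fragment.json.
async fn append_fragment(
    store: &dyn ObjectStore,
    ns: &str,
    fragment: &Fragment,
) -> object_store::Result<(Path, u64)> {
    let body = serde_json::to_vec(fragment).expect("fragment serializes");

    // Integrity checksum over the serialized bytes (xxh3_64 from xxhash-rust);
    // where Zeppelin actually records it is not shown here.
    let checksum = xxhash_rust::xxh3::xxh3_64(&body);

    // Write-once object: never modified after this put.
    let path = Path::from(format!("{ns}/wal/{}.fragment.json", Ulid::new()));
    store.put(&path, body.into()).await?;
    Ok((path, checksum))
}
```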
```
Client POST /v1/namespaces/:ns/query
        │
        ▼
Handler: validate query, resolve consistency
        │
        ▼
Read manifest.json from S3
        │
        ├── [Strong] Scan all WAL fragments
        │       └── Brute-force distance/BM25 on each fragment
        │
        ├── [Always] Search index segments
        │       ├── Load centroids from cache/S3
        │       ├── Find top-nprobe nearest centroids
        │       ├── Load cluster data for those centroids
        │       ├── Apply bitmap pre-filter (if available)
        │       └── Compute distances, collect top-k per segment
        │
        └── Merge WAL + segment results
                ├── Deduplicate (WAL wins on conflict)
                ├── Remove deleted IDs
                ├── Apply post-filter
                └── Return top-k results
```
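
The merge at the bottom of the query path is plain in-memory bookkeeping. A sketch, with a hypothetical `Hit` type standing in for the real result struct (post-filtering is omitted):

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical scored hit; lower scores rank higher in this sketch.
struct Hit {
    id: String,
    score: f32,
}

/// Merge WAL and segment hits: WAL wins on duplicate ids, deleted ids are
/// dropped, and the top-k survivors are returned.
fn merge_results(
    wal_hits: Vec<Hit>,
    segment_hits: Vec<Hit>,
    deleted: &HashSet<String>,
    k: usize,
) -> Vec<Hit> {
    let mut by_id: HashMap<String, Hit> = HashMap::new();
    // Insert segment hits first so WAL hits overwrite them on conflict.
    for hit in segment_hits.into_iter().chain(wal_hits) {
        by_id.insert(hit.id.clone(), hit);
    }

    let mut merged: Vec<Hit> = by_id
        .into_values()
        .filter(|hit| !deleted.contains(&hit.id))
        .collect();
    merged.sort_by(|a, b| a.score.total_cmp(&b.score));
    merged.truncate(k);
    merged
}
```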
```
Background compaction loop (every N seconds)
        │
        ▼
For each namespace with pending WAL fragments:
        │
        ├── 1. Read manifest (get fragment list + existing segments)
        ├── 2. Acquire lease (fencing token)
        ├── 3. Load all WAL fragments
        ├── 4. Merge with existing segment data
        ├── 5. Compute delete set
        ├── 6. Train k-means centroids
        ├── 7. Assign vectors to clusters
        ├── 8. Write cluster artifacts to S3
        │       ├── centroids.bin
        │       ├── cluster_<N>.bin (full precision)
        │       ├── f16_cluster_<N>.bin (if f16 enabled)
        │       ├── sq_cluster_<N>.bin (if SQ8)
        │       ├── pq_cluster_<N>.bin (if PQ)
        │       ├── attributes_<N>.json
        │       ├── bitmap_<field>.bin (per attribute)
        │       └── fts_<field>.bin (per FTS field)
        ├── 9. CAS update manifest (add SegmentRef, clear processed fragments)
        └── 10. Deferred deletion of old artifacts
```
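
Step 7 of the loop reduces to a nearest-centroid scan over the trained centroids. A sketch using squared Euclidean distance; the real index follows the namespace's configured `DistanceMetric`:

```rust
/// Assign each vector to the index of its nearest centroid.
fn assign_to_clusters(vectors: &[Vec<f32>], centroids: &[Vec<f32>]) -> Vec<usize> {
    vectors
        .iter()
        .map(|v| {
            centroids
                .iter()
                .enumerate()
                .map(|(i, c)| {
                    // Squared Euclidean distance; no sqrt needed for argmin.
                    let d: f32 = v.iter().zip(c).map(|(a, b)| (a - b) * (a - b)).sum();
                    (i, d)
                })
                .min_by(|a, b| a.1.total_cmp(&b.1))
                .map(|(i, _)| i)
                .expect("at least one centroid")
        })
        .collect()
}
```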
Compaction is atomic via CAS: if another writer updates the manifest concurrently, the compaction retries. Old segments are deleted only after the new manifest is committed (deferred deletion pattern).
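
The CAS itself can be expressed with `object_store`'s conditional put, guarded by the manifest's ETag. A minimal sketch, assuming a recent `object_store` version and a backend configured for conditional writes; `cas_update_manifest` and its retry policy are illustrative:

```rust
use object_store::{path::Path, ObjectStore, PutMode, PutOptions, UpdateVersion};

/// Read the current manifest and its ETag, apply a mutation, and write back
/// only if the object is unchanged; retry on a lost race.
async fn cas_update_manifest(
    store: &dyn ObjectStore,
    path: &Path,
    apply: impl Fn(&[u8]) -> Vec<u8>,
) -> object_store::Result<()> {
    loop {
        let current = store.get(path).await?;
        let etag = current.meta.e_tag.clone();
        let updated = apply(&current.bytes().await?);

        let opts = PutOptions {
            mode: PutMode::Update(UpdateVersion { e_tag: etag, version: None }),
            ..Default::default()
        };
        match store.put_opts(path, updated.into(), opts).await {
            Ok(_) => return Ok(()),
            // Another writer updated the manifest first: reload and retry.
            Err(object_store::Error::Precondition { .. }) => continue,
            Err(err) => return Err(err),
        }
    }
}
```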
| Level | Behavior | Use Case |
|---|---|---|
| Strong | Reads manifest + scans all uncompacted WAL fragments + queries segments | Default. See all committed writes |
| Eventual | Reads segments only (skips WAL) | Faster queries, may miss recent writes |
The consistency level is set per query via the `consistency` field.
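
In code this boils down to a switch on whether the WAL scan runs at all. An illustrative sketch; the real types live with the query engine in `src/query.rs` and `src/types.rs`:

```rust
/// Illustrative consistency level; the real enum may differ.
enum Consistency {
    /// Scan uncompacted WAL fragments in addition to index segments.
    Strong,
    /// Query index segments only; recent uncompacted writes may be missed.
    Eventual,
}

fn should_scan_wal(consistency: &Consistency) -> bool {
    matches!(consistency, Consistency::Strong)
}
```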