-
Notifications
You must be signed in to change notification settings - Fork 6
Formal Verification
Zeppelin uses TLA+ to formally verify correctness of its concurrent algorithms. These specs have found real bugs and proven key invariants before deployment.
Distributed systems with concurrent writes, compaction, and queries have subtle race conditions that are nearly impossible to find through testing alone. TLA+ exhaustively explores all possible interleavings to prove or disprove invariants.
Key results:
- Found real bugs in compaction safety and query read consistency before they hit production
- Proved correctness of the multi-writer lease protocol, bitmap filter algebra, and BM25 query consistency
- 26,000+ states explored across 15 specifications
| Spec | Invariants | States | What it verifies |
|---|---|---|---|
CompactionSafety.tla |
5 | ~5,000 | No data loss during WAL → segment compaction. Concurrent query sees consistent view. Deferred deletion is safe. |
QueryReadConsistency.tla |
4 | ~3,000 | Strong queries never miss committed writes. WAL overrides segment on conflict. Deleted vectors excluded. |
MultiWriterLease.tla |
4 | 24,965 | Fencing tokens prevent zombie writes. Two-layer defense (fencing + CAS) is both necessary and sufficient. |
| Spec | Invariants | States | What it verifies |
|---|---|---|---|
BM25QueryConsistency.tla |
5 | 272 | Committed docs never missed. WAL overrides segment on merge. Deleted docs never returned. Concurrent write + compaction + query. |
InvertedIndexBuildSafety.tla |
4 | 139 | Posting lists cover all docs. IDF stats consistent. Compaction preserves non-deleted data. |
BM25ScoreCommutativity.tla |
5 | 5 | Sum commutativity, WAL vs segment scoring equivalence, Product associativity, TopK stability under tie-breaking. |
| Spec | Invariants | States | What it verifies |
|---|---|---|---|
BitmapFilterAlgebra.tla |
6 | 324 | And/Or/Not/Eq/Range/In filter evaluation matches brute-force. Bitmap intersection = logical AND. |
BitmapBuildCoverage.tla |
4 | 65 | Build process produces correct bitmaps from attribute data. Numeric keys sorted. High-cardinality excluded. |
BitmapSearchEquivalence.tla |
3 | 8 | Bitmap-accelerated search returns same results as post-filter search. |
| Spec | Invariants | States | What it verifies |
|---|---|---|---|
HierarchicalVectorCoverage.tla |
4 | 263 | Partition correctness: every vector reachable, no duplicates across clusters. |
BeamSearchTermination.tla |
5 | 30 | Beam search terminates on both balanced and unbalanced trees. Monotonic progress. |
HierarchicalBuildAtomicity.tla |
3 | 34 | Concurrent build + query safety. Query sees consistent snapshot. |
| Spec | Invariants | States | What it verifies |
|---|---|---|---|
WalFragmentIntegrity.tla |
3 | ~100 | Fragment checksums detect corruption. Write-once semantics. |
CASManifestSafety.tla |
3 | ~200 | ETag-based CAS prevents concurrent manifest corruption. |
CacheEvictionSafety.tla |
3 | ~150 | LRU eviction never evicts pinned centroids. Cache coherence. |
The MultiWriterLease.tla spec proved that neither fencing alone nor CAS alone prevents zombie writes:
- Fencing without CAS: A stale writer can pass the fencing check, then a new writer acquires the lease and writes, then the stale writer's write arrives — TOCTOU race.
- CAS without fencing: A stale writer can read the current manifest, compute new state, and win the CAS race because its ETag is still valid.
- Both together: The fencing check catches most stale writers early, and CAS catches the remaining TOCTOU gap.
The CompactionSafety.tla spec showed that deleting old segments before committing the new manifest can cause phantom reads — a concurrent query reads the old manifest, then tries to fetch a segment that was already deleted.
Solution: Write new segment → CAS manifest → Delete old segment (deferred deletion).
The BitmapFilterAlgebra.tla spec exhaustively verified that bitmap evaluation (And = intersection, Or = union, Not = complement) produces identical results to brute-force filter evaluation across all 324 possible states with mixed attribute types.
The BM25ScoreCommutativity.tla spec verified that a document's BM25 score is the same whether the document is stored in the WAL or in a segment. This is crucial for correct merge behavior during compaction.
Prerequisites:
- Java 21+ (OpenJDK)
- TLA+ tools (
tla2tools.jar)
cd formal-verifications/tla
# Run a specific spec
/opt/homebrew/opt/openjdk@21/bin/java \
-jar ../tla2tools.jar \
-config MultiWriterLease.cfg \
MultiWriterLease.tla
# All specs use CHECK_DEADLOCK FALSE (terminal states are expected)Each .tla file has a corresponding .cfg file that defines:
- Constants and their values
- Invariants to check
-
CHECK_DEADLOCK FALSE(since terminal states are expected in these models)
Getting Started
API & SDKs
Configuration
Architecture
Operations