A hands-on guide to vector databases. Learn by experimentation, not memorization.
This guide answers four fundamental questions through interactive experiments:
- Correctness: Are my search results accurate?
- Performance: How fast is it really?
- Durability: What happens when things break?
- Scalability: Where are the limits?
Each experiment takes 1-5 minutes and produces real, measurable results.
┌─────────────────────────┬──────────────────────────┐
│ Control Panel (Left) │ Results Panel (Right) │
├─────────────────────────┼──────────────────────────┤
│ 📤 Document Upload │ 🔍 Search Interface │
│ 📊 Performance Metrics │ 🕐 Recent Queries │
│ ⚡ Quick Actions │ 📄 Search Results │
│ 📈 Latency Visualization│ │
│ 🧪 Advanced Diagnostics │ │
└─────────────────────────┴──────────────────────────┘
One-click testing with sensible defaults:
⚡ Benchmark → 500 vectors, ~10s
✓ Accuracy → NumPy parity check
🔄 Health → System diagnostics
📊 Scale → Quick scaling profile
Customizable test parameters for deep analysis:
- Benchmark: 100 / 500 / 1K / 2K vectors
- Scale Test: Quick / Standard / Thorough profiles
- Accuracy: Top-5 / Top-10 / Top-20 verification
- Health: Comprehensive system report
Live dashboard (5-second refresh):
| Metric | Description |
|---|---|
| Indexed | Documents in database |
| Queries | Total searches executed |
| P95 Latency | 95th percentile response time |
| Status | Color-coded health indicator |
Experiment 1: Accuracy Verification
Time: 2-3 seconds
Question: Is VectorLiteDB returning the correct results?
- Navigate to http://localhost:8000
- Click "✓ Accuracy" in Quick Actions
- Observe results
✅ Accuracy Verified
Perfect Match!
VectorLiteDB results are identical to the gold-standard
NumPy brute-force implementation.
Test Details:
• Algorithm: Cosine similarity
• Test vectors: 1,000 random samples
• Top-K compared: 5 results
• Baseline: NumPy (reference)
• System: VectorLiteDB (test)
| Result | Meaning | Action |
|---|---|---|
| ✅ Perfect Match | Results are mathematically correct | No action needed |
| ⚠️ Tie-breaking differences only | Ordering varies among near-identical scores | None; acceptable (floating-point precision) |
| ❌ Mismatch | Algorithm error detected | Update VectorLiteDB or rebuild DB |
Technical Note: Minor differences in ordering can occur when multiple documents have nearly identical scores (e.g., 0.85234 vs 0.85233). This is normal floating-point behavior and doesn't affect search quality.
Experiment 2: Performance Monitoring
Time: Continuous (live monitoring)
Question: How fast are my searches?
Monitor the Performance panel in the left sidebar. Metrics auto-refresh every 5 seconds.
| Status | P95 Latency | Interpretation |
|---|---|---|
| 🟢 Excellent | < 50ms | Production-grade, no optimization needed |
| 🔵 Good | 50-100ms | Acceptable for most use cases |
| 🟡 OK | 100-300ms | Noticeable delay, consider optimization |
| 🔴 Slow | > 300ms | User experience impacted, action required |
P95 Latency (95th Percentile)
95% of all search requests complete faster than this value. We track P95 instead of average because:
- Captures worst-case user experience
- Reveals performance outliers
- Better indicator of system stability
Good performance: P95 < 2× P50
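To make the rule concrete, here is a minimal sketch (using invented sample latencies, not real dashboard data) that computes P50/P95 with NumPy and applies the P95 < 2× P50 check:

import numpy as np

# Invented per-query latencies; one outlier to show why P95 matters.
latencies_ms = np.array([12, 14, 15, 15, 16, 18, 22, 25, 41, 95])
p50, p95 = np.percentile(latencies_ms, [50, 95])
print(f"P50={p50:.1f}ms  P95={p95:.1f}ms  healthy={p95 < 2 * p50}")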
Status Calculation
- Indexed file count
- Query volume
- Latency percentiles
- Combined health score
Pro Tip: Users perceive latency differently based on context:
- < 100ms: Feels instant
- 100-300ms: Noticeable but acceptable
- > 300ms: Frustratingly slow
Experiment 3: Benchmark
Time: 2 seconds to 1 minute (depending on test size)
Question: How many documents can I index before performance degrades?
Option A: Quick Test
- Click "⚡ Benchmark" (tests 500 vectors)
Option B: Custom Test
- Expand "Advanced Tests"
- Select vector count (100 / 500 / 1K / 2K)
- Click "Run Benchmark"
✓ Benchmark Complete
Configuration:
Vectors: 500
Dimensions: 384
Metric: cosine
Performance:
Insert: 0.8ms/vector (total: 400ms)
Search: 45.2ms avg per query (measured over 1K queries)
Storage: 12.3 MB
Assessment: Good Performance ✓
Search latency within acceptable range for production use.
| Metric | Good | Acceptable | Poor |
|---|---|---|---|
| Insert Speed (per vector) | < 2ms | 2-10ms | > 10ms |
| Search Latency | < 50ms | 50-200ms | > 200ms |
| Storage Efficiency | ~2KB/vec | ~5KB/vec | > 10KB/vec |
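If you want to reproduce these numbers outside the dashboard, a rough timing harness might look like the sketch below. The db.insert and db.search calls are hypothetical stand-ins, since the exact VectorLiteDB API isn't shown here:

import time
import numpy as np

def bench(db, n=500, dim=384, queries=100):
    # Generate random vectors, then time inserts and searches separately.
    vecs = np.random.default_rng(0).standard_normal((n, dim)).astype(np.float32)

    t0 = time.perf_counter()
    for i, v in enumerate(vecs):
        db.insert(str(i), v)                  # hypothetical insert call
    insert_ms = (time.perf_counter() - t0) * 1000 / n

    t0 = time.perf_counter()
    for q in vecs[:queries]:
        db.search(q, k=5)                     # hypothetical search call
    search_ms = (time.perf_counter() - t0) * 1000 / queries

    print(f"insert: {insert_ms:.2f} ms/vector   search: {search_ms:.1f} ms/query")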
⚠️ Slow Inserts Detected (15.2ms avg)
Likely causes:
• Project stored on iCloud Drive or network storage
• Slow disk I/O performance
• Background sync processes interfering
Recommended fix:
Move database to local, non-synced storage:
mkdir -p ~/Local/vectorbench-db
mv kb.db ~/Local/vectorbench-db/
ln -s ~/Local/vectorbench-db/kb.db kb.db
Root Cause: VectorLiteDB uses SQLite with PRAGMA synchronous=FULL, which forces disk writes for durability. Cloud-synced folders add network latency to every write operation, causing 50-100× slowdowns.
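To confirm the setting yourself, a quick check with Python's sqlite3 module (assuming the database file is kb.db, as in the fix above) looks like this:

import sqlite3

# Read the current durability setting directly from the database file.
conn = sqlite3.connect("kb.db")
level = conn.execute("PRAGMA synchronous").fetchone()[0]
print({0: "OFF", 1: "NORMAL", 2: "FULL", 3: "EXTRA"}.get(level, level))
conn.close()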
Experiment 4: Scale Test
Time: 20 seconds to 3 minutes
Question: How does performance degrade as data grows?
- Expand "Advanced Tests" → "Scale Test"
- Select profile:
| Profile | Test Sizes | Duration | Use Case |
|---|---|---|---|
| Quick | 100, 250, 500 | ~20s | Daily health checks |
| Standard | 500, 1K, 2K | ~1min | Sprint validation |
| Thorough | 1K, 2.5K, 5K | ~3min | Release qualification |
- Click "Run Scale Test"
- Observe real-time progress and chart
Search Latency (ms)
↑
120 │ •
100 │ •
80 │ •
60 │ •
40 │ •
20 │•
0 └─────────────────────────────→
100 250 500 1K 2K 5K
Vectors
Linear Growth (Expected)
Latency increases proportionally with data size.
This is normal for brute-force search algorithms.
Steep Curve (Warning)
Non-linear growth indicates you're approaching
practical limits. Consider migration to ANN-based
solutions (Chroma, Qdrant, FAISS).
Flat Curve (Unusual)
Performance isn't scaling with data. Possible causes:
• Test size too small to show differences
• Aggressive caching
• Bottleneck elsewhere in system
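One way to classify the curve numerically is to fit latency ≈ c · N^k on log-log axes; the sketch below uses invented sample numbers, not real VectorBench output:

import numpy as np

# Fit a power law to scale-test points; the exponent k labels the curve shape.
sizes = np.array([100, 250, 500, 1000, 2000])
latency_ms = np.array([8.0, 19.0, 41.0, 83.0, 170.0])   # invented sample data

k, _ = np.polyfit(np.log(sizes), np.log(latency_ms), 1)
print(f"growth exponent k = {k:.2f}")
# k ≈ 1 → linear (expected); k well above 1 → steep curve; k ≈ 0 → flat curve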
Experiment 5: Crash Recovery Test
Time: 10-15 seconds
Question: Will my data survive a crash?
python -c "
from tests.test_persistence_crash import test_normal_persistence
test_normal_persistence()
"

✅ PASS: Data persisted correctly
- Database reopened successfully
- Vector count intact
- Metadata preserved
- No corruption detected
❌ FAIL: Data integrity compromised
This is a serious issue indicating:
• VectorLiteDB version bug
• Unreliable storage medium
• Filesystem corruption
• Insufficient write permissions
Action required:
1. Implement regular backups
2. Consider enabling SQLite WAL mode
3. Verify storage medium reliability
4. Check filesystem for errors
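For step 2, a minimal sketch of enabling WAL mode follows, assuming the underlying SQLite file is kb.db and that VectorLiteDB tolerates a changed journal mode:

import sqlite3

# Switch the database to write-ahead logging; the setting persists in the file.
conn = sqlite3.connect("kb.db")
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(f"journal mode: {mode}")
conn.close()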
Definition: Validation that VectorLiteDB produces identical results to a reference implementation.
Implementation:
- Generate 1,000 random 384-dim vectors
- Index in both VectorLiteDB and NumPy
- Execute identical queries
- Compare top-K results (accounting for ties)
Why it matters: Without parity checks, you can't trust your search results are correct.
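A minimal sketch of the NumPy side of this check follows; db_search is a hypothetical stand-in for the VectorLiteDB query call:

import numpy as np

# Build the brute-force reference index: 1,000 random 384-dim unit vectors.
rng = np.random.default_rng(42)
vecs = rng.standard_normal((1000, 384)).astype(np.float32)
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)   # unit norm → cosine = dot

def numpy_top_k(query, k=5):
    q = query / np.linalg.norm(query)
    return set(np.argsort(vecs @ q)[::-1][:k])        # exact top-K by cosine

query = rng.standard_normal(384).astype(np.float32)
# Compare as sets so tied scores don't cause false failures:
# assert numpy_top_k(query) == set(db_search(query, k=5))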
| Percentile | What It Measures |
|---|---|
| P50 (Median) | Typical user experience |
| P95 | Worst experience for 95% of users |
| P99 | Outliers and edge cases |
Why P95? It balances capturing most user experiences with filtering out extreme outliers that might be measurement errors.
Brute Force (VectorLiteDB)
✓ 100% accurate results
✓ Simple implementation
✓ Predictable behavior
✗ O(N) time complexity
✗ Doesn't scale to millions
ANN (Approximate Nearest Neighbor)
✓ Sub-linear time complexity
✓ Scales to billions of vectors
✗ ~95-99% recall (trade accuracy for speed)
✗ Complex implementation
✗ Harder to tune
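For a feel of the ANN side, here is a sketch using FAISS (one of the migration targets named above), assuming faiss-cpu is installed; it is an illustration, not part of VectorBench:

import faiss
import numpy as np

# HNSW answers in sub-linear time but returns approximate (~95-99% recall) results.
dim = 384
vecs = np.random.default_rng(0).standard_normal((10_000, dim)).astype(np.float32)
faiss.normalize_L2(vecs)                   # unit vectors → inner product = cosine

index = faiss.IndexHNSWFlat(dim, 32, faiss.METRIC_INNER_PRODUCT)  # M=32 links/node
index.add(vecs)

scores, ids = index.search(vecs[:1], 5)    # approximate top-5 for one query
print(ids[0])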
Algorithm for the status indicator:

def status(p95_latency: float) -> str:
    if p95_latency < 50:
        return "🟢 Excellent"
    elif p95_latency < 100:
        return "🔵 Good"
    elif p95_latency < 300:
        return "🟡 OK"
    else:
        return "🔴 Slow"

Q: Why do parity checks sometimes show different result orderings?
A: When multiple documents have nearly identical similarity scores (e.g., 0.8523 vs 0.8522), the ordering between them is arbitrary. This is due to floating-point precision limits and doesn't affect search quality. As long as the set of results matches, the check passes.
Example:
NumPy: [doc3, doc7, doc2, doc9, doc1]
VectorLiteDB: [doc7, doc3, doc2, doc9, doc1]
Result: ✅ PASS (docs 3 and 7 are tied)
Q: My search is consistently slow. What should I do?
A: Follow this diagnostic tree:
1. Check Status indicator
   - Green/Blue: No action needed
   - Yellow/Red: Continue troubleshooting
2. Run Scale Test
   - Linear growth: Normal brute-force behavior
   - Steep curve: Approaching scale limits
3. Check indexed document count
   - < 10K: Should be fast, investigate environment
   - 10K-50K: Expected to be slower
   - > 50K: Consider migrating to an ANN-based solution
4. Verify environment
   - Not on cloud-synced storage
   - Fast SSD with good I/O
   - Sufficient RAM (1GB+ for 10K vectors)
Q: Database file size is growing rapidly. Is this a problem?
A: This is normal behavior. Here's the math:
Per-vector storage:
384 dims × 4 bytes = 1,536 bytes (vector)
+ ~500-2000 bytes (metadata)
+ ~500 bytes (SQLite overhead)
= 2.5-4 KB per vector
Expected growth:
1K vectors → ~10 MB
10K vectors → ~100 MB
50K vectors → ~500 MB
If growth is significantly higher, you may have:
- Excessive metadata per document
- Large text chunks not properly summarized
- Duplicate entries
Check with: SELECT COUNT(*), AVG(LENGTH(metadata)) FROM vectors;
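As a back-of-envelope check of that math (payload only; real files can run larger once SQLite indexes and page fragmentation are counted):

# Estimate per-vector payload and total size from the figures above.
dims, float_bytes = 384, 4
low = dims * float_bytes + 500 + 500       # light metadata + SQLite overhead
high = dims * float_bytes + 2000 + 500     # heavy metadata + SQLite overhead
for n in (1_000, 10_000, 50_000):
    print(f"{n:>6} vectors: {n * low / 1e6:.1f}-{n * high / 1e6:.1f} MB (payload only)")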
Q: What's the difference between Quick Actions and Advanced Tests?
A:
Quick Actions → One-click testing with production defaults
- Benchmark: 500 vectors
- Accuracy: Top-5 verification
- Scale: Quick profile
Advanced Tests → Full control for specialized testing
- Benchmark: 100-2,000 vectors
- Accuracy: Top-5 to Top-20
- Scale: Quick/Standard/Thorough profiles
Use Quick Actions for daily health checks. Use Advanced Tests when you need specific test parameters or deeper analysis.
Q: Why are my inserts taking >10ms each?
A: Almost always environmental issues:
Root causes (in order of likelihood):
1. iCloud/OneDrive/Dropbox sync (90% of cases)
   - Solution: Move DB to ~/Local/vectorbench-db/
2. Network-attached storage
   - Solution: Use local SSD
3. Slow disk I/O
   - Check: sudo fs_usage -f filesys | grep kb.db
   - Solution: Upgrade storage or reduce sync load
4. Insufficient disk space
   - Check: df -h
   - Solution: Free up space (< 10% free triggers slowdowns)
Not a VectorLiteDB bug. The library uses standard SQLite with synchronous writes for durability.
Use this table to understand what each test tells you about your system:
| Test | Question Answered | Good Result | Bad Result | Next Steps |
|---|---|---|---|---|
| Accuracy | Are results correct? | Perfect Match | Mismatch Detected | Update library or rebuild DB |
| Performance | Is it fast enough? | Green/Blue status | Red status | Profile queries, check environment |
| Benchmark | What are the limits? | Search < 100ms | Search > 300ms | Reduce data or migrate to ANN |
| Scale | How does it grow? | Linear curve | Steep/flat curve | Investigate bottlenecks |
| Crash Test | Is data durable? | ✅ PASS | ❌ FAIL | Enable backups, check storage |
Daily health check:
1. Upload new documents
2. Run quick search tests
3. Monitor performance metrics
4. Check status indicator

Release qualification:
1. Run full accuracy verification
2. Execute standard scale test
3. Verify P95 < 100ms
4. Run crash recovery test
5. Export results for documentation

Troubleshooting slow performance:
1. Check performance status
2. Run benchmark with multiple sizes
3. Execute scale test
4. Verify environment (iCloud, storage)
5. Check logs for errors
Run parity checks regularly. Perfect Match = mathematically correct results.
Target P95 < 50ms for excellent UX. Yellow/Red status = investigate immediately.
Brute-force is O(N). When the curve steepens, it's time to consider ANN solutions.
Slow inserts (>10ms) = cloud sync or network storage. Fix the environment, not the code.
VectorBench excels at 10K-100K vectors. Beyond that, migrate to purpose-built vector databases.
Ready to dive deeper? Start with Experiment 1 at http://localhost:8000 or explore the testing framework.