Seed, query, and benchmark vector databases using Wikipedia embeddings. Supports turbopuffer, Pinecone, and Supabase (pgvector), with cost estimation across all three plus Elasticsearch and OpenSearch.
~224K Wikipedia articles with pre-computed embeddings. The original embeddings come from Supabase/wikipedia-en-embeddings; the dataset also includes embeddings from OpenAI's text-embedding-3-small and text-embedding-3-large models.
| Namespace | Model | Dimensions |
|---|---|---|
| wiki-openai | ada-002 | 1536 |
| wiki-minilm | all-MiniLM-L6-v2 | 384 |
| wiki-gte | gte-small | 384 |
| wiki-3-small | text-embedding-3-small | 512 |
| wiki-3-large | text-embedding-3-large | 1024 |
All five are hosted on Hugging Face and downloaded with the download command.
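As a rough sanity check on dataset size, the raw vector payload per namespace can be estimated from the dimensions in the table above. The sketch below assumes float32 storage (4 bytes per dimension) and rounds the row count to exactly 224,000; both are assumptions, and the real files also carry article text and file-format overhead, so actual sizes will be larger.

```typescript
// Dimensions per namespace, copied from the table above.
const NAMESPACE_DIMS: Record<string, number> = {
  "wiki-openai": 1536,
  "wiki-minilm": 384,
  "wiki-gte": 384,
  "wiki-3-small": 512,
  "wiki-3-large": 1024,
};

// Assumptions: ~224K rows rounded to 224,000, float32 (4 bytes) per dimension.
const ROWS = 224_000;
const BYTES_PER_FLOAT = 4;

// Raw bytes needed to hold only the vectors of one namespace.
function rawVectorBytes(dimensions: number, rows: number = ROWS): number {
  return rows * dimensions * BYTES_PER_FLOAT;
}

for (const [namespace, dims] of Object.entries(NAMESPACE_DIMS)) {
  const gb = rawVectorBytes(dims) / 1e9;
  console.log(`${namespace}: ${gb.toFixed(2)} GB of raw vectors`);
}
```

The same function gives a quick lower bound when choosing `--vectors` and `--dimensions` values for a cost estimate.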
Requires Bun.
bun install
cp .env.example .env
# Fill in your API keys in .env

All commands are run via bun src/index.ts. Use --backend to select a backend (default: tpuf).
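To illustrate how a flag with a default might be resolved, here is a minimal sketch. It is not the project's actual argument parser: `parseBackend` is a hypothetical helper, and while `tpuf` and `pinecone` appear in this README, the value `supabase` is an assumed name for the Supabase backend.

```typescript
// "tpuf" and "pinecone" appear in this README; "supabase" is an assumed value.
const BACKENDS = ["tpuf", "pinecone", "supabase"] as const;
type Backend = (typeof BACKENDS)[number];

// Hypothetical helper: read `--backend <name>` from argv, falling back to "tpuf".
function parseBackend(argv: string[]): Backend {
  let value: string = "tpuf";
  argv.forEach((arg, i) => {
    const next = argv[i + 1];
    if (arg === "--backend" && next !== undefined) {
      value = next;
    }
  });
  if (!(BACKENDS as readonly string[]).includes(value)) {
    throw new Error(`Unknown backend "${value}". Expected one of: ${BACKENDS.join(", ")}`);
  }
  return value as Backend;
}

console.log(parseBackend(["seed", "--backend", "pinecone", "--batch-size", "100"])); // pinecone
console.log(parseBackend(["seed"])); // tpuf
```

Rejecting unknown values early keeps a typo in the flag from silently falling back to the default backend.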
bun src/index.ts download

Downloads all five datasets (~6.4 GB total) from Hugging Face to data/.
bun src/index.ts seed
bun src/index.ts seed --namespace wiki-gte --limit 1000
bun src/index.ts seed --backend pinecone --batch-size 100

Re-embed text using OpenAI's newer models:
bun src/index.ts embed --model text-embedding-3-small
bun src/index.ts embed --model text-embedding-3-large --concurrency 3

bun src/index.ts query --doc-id "some-document-id"

# Recall (turbopuffer only; uses the recall API)
bun src/index.ts recall-benchmark
# Single-query latency
bun src/index.ts latency-benchmark --queries 50
# Throughput (QPS under load)
bun src/index.ts throughput-benchmark --concurrency 10
# Upsert throughput
bun src/index.ts upsert-benchmark --namespace wiki-gte --records 10000

All benchmarks accept --output <path> to save JSON results.
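For readers interpreting the benchmark numbers, the sketch below shows the textbook definitions of recall@k and nearest-rank latency percentiles. It is an illustration only: it does not reproduce how this tool or turbopuffer's recall API computes its figures, and the function names and the shape of the JSON summary are assumptions.

```typescript
// Fraction of the exact top-k IDs that the approximate search also returned.
function recallAtK(approxIds: string[], exactIds: string[], k: number): number {
  const truth = exactIds.slice(0, k);
  if (truth.length === 0) return 0;
  const returned = new Set(approxIds.slice(0, k));
  const hits = truth.filter((id) => returned.has(id)).length;
  return hits / truth.length;
}

// Nearest-rank percentile over latency samples in milliseconds.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1]!;
}

// Made-up sample data, only to exercise the functions.
const latenciesMs = [12, 15, 11, 40, 13];
const summary = {
  p50Ms: percentile(latenciesMs, 50),
  p99Ms: percentile(latenciesMs, 99),
  recallAt3: recallAtK(["a", "b", "x"], ["a", "b", "c"], 3),
};
console.log(JSON.stringify(summary, null, 2));
```

Reporting a high percentile alongside the median matters here because a single slow query (40 ms in the sample) is invisible in p50 but dominates p99.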
Compare estimated monthly costs across backends:
bun src/index.ts cost-estimate
bun src/index.ts cost-estimate --vectors 1000000 --dimensions 384 --queries 500000

bun src/index.ts stats # Namespace row counts
bun src/index.ts delete --confirm # Delete all wiki-* namespaces
bun src/index.ts supabase-sql # Generate pgvector setup SQL

| Variable | Required for |
|---|---|
| TURBOPUFFER_API_KEY | turbopuffer backend |
| TURBOPUFFER_REGION | turbopuffer (optional) |
| PINECONE_API_KEY | pinecone backend |
| SUPABASE_URL | supabase backend |
| SUPABASE_ANON_KEY | supabase backend |
| OPENAI_API_KEY | embed command |
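A small pre-flight check can catch a missing key before a long seed or benchmark run starts. The sketch below encodes the table above as a lookup; `missingEnv` is a hypothetical helper rather than part of this project, and the backend name `supabase` is an assumption.

```typescript
// Required variables per backend, taken from the table above.
// The "supabase" backend name and this helper are assumptions.
const REQUIRED_ENV: Record<string, string[]> = {
  tpuf: ["TURBOPUFFER_API_KEY"], // TURBOPUFFER_REGION is optional
  pinecone: ["PINECONE_API_KEY"],
  supabase: ["SUPABASE_URL", "SUPABASE_ANON_KEY"],
};

// Return the names of required variables that are unset or empty.
function missingEnv(
  backend: string,
  env: Record<string, string | undefined>,
): string[] {
  const required = REQUIRED_ENV[backend];
  if (required === undefined) throw new Error(`Unknown backend: ${backend}`);
  return required.filter((name) => !env[name]);
}

// Placeholder values, not real credentials.
console.log(missingEnv("supabase", { SUPABASE_URL: "https://example.supabase.co" }));
```

In a real script, `process.env` would be passed as the second argument; taking the environment as a parameter keeps the check easy to test.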
MIT