Same document. Same question. Different chunking — different answer.
An interactive demo showing why chunking strategy matters as much as the vector database you pick. Built to demonstrate end-to-end RAG: real embeddings, real similarity search, real LLM answer generation.
You give Chunk Lab a document and a question. It runs a complete RAG pipeline:
- Chunks the document using your selected strategy (fixed-size, recursive, or sentence-level)
- Embeds each chunk into a vector using Voyage AI
- Searches by cosine similarity against the embedded query
- Generates an answer with Claude using only the top retrieved chunks
Then you switch strategies and rerun — and see how chunking choice changes everything downstream.
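In code, the pipeline is roughly the sketch below. The helper names (`chunkDocument`, `embed`, `addChunks`, `search`) and the prompt wording are illustrative assumptions, not necessarily the repo's exact exports; the real logic lives in `app/api/chunk/route.ts` and `lib/`.

```ts
// Rough sketch of the end-to-end pipeline behind app/api/chunk/route.ts.
// Names imported from lib/* are assumptions about this repo's exports.
import Anthropic from '@anthropic-ai/sdk';
import { chunkDocument } from '@/lib/chunking';        // assumed export
import { embed } from '@/lib/embeddings';              // assumed export
import { addChunks, search } from '@/lib/vectorStore'; // assumed exports

export async function runPipeline(
  document: string,
  question: string,
  strategy: 'fixed' | 'recursive' | 'sentence',
) {
  // 1. Chunk the document with the selected strategy
  const chunks = chunkDocument(document, strategy);

  // 2. Embed every chunk and the question (Voyage AI, voyage-3-lite)
  const chunkVectors = await embed(chunks);
  const [queryVector] = await embed([question]);

  // 3. Load the chunks into the in-memory store and rank by cosine similarity
  addChunks(chunks.map((text, i) => ({ text, embedding: chunkVectors[i] })));
  const topChunks = search(queryVector, 3).map((hit) => hit.text);

  // 4. Ask Claude to answer using only the retrieved chunks
  const claude = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
  const response = await claude.messages.create({
    model: 'claude-haiku-4-5',
    max_tokens: 512,
    messages: [{
      role: 'user',
      content:
        `Answer the question using only the context below.\n\n` +
        `Context:\n${topChunks.join('\n---\n')}\n\nQuestion: ${question}`,
    }],
  });

  return { chunks, topChunks, answer: response.content };
}
```

Only step 1 changes when you switch strategies; steps 2 through 4 simply react to different chunk boundaries.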
| Layer | Tech |
|---|---|
| Framework | Next.js 14 (App Router) + TypeScript |
| Embeddings | Voyage AI (voyage-3-lite, 512-dim) |
| Vector store | In-memory cosine similarity |
| LLM | Anthropic Claude (claude-haiku-4-5) |
| Styling | Tailwind CSS |
| Deploy target | Vercel |
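The embedding layer is a single HTTP call. Here is a sketch of what `lib/embeddings.ts` might look like, assuming Voyage AI's public REST embeddings endpoint; the real module's export name and options may differ.

```ts
// Sketch of a Voyage AI embedding helper, assuming the REST embeddings endpoint.
// Export name, error handling, and options are illustrative, not the repo's code.
export async function embed(
  texts: string[],
  inputType: 'document' | 'query' = 'document',
): Promise<number[][]> {
  const res = await fetch('https://api.voyageai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.VOYAGE_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'voyage-3-lite',
      input: texts,
      input_type: inputType, // 'document' for chunks, 'query' for the question
    }),
  });

  if (!res.ok) {
    throw new Error(`Voyage API error: ${res.status}`);
  }

  const json = await res.json();
  // Each item in `data` carries one embedding vector (512 dims for voyage-3-lite)
  return json.data.map((d: { embedding: number[] }) => d.embedding);
}
```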
The vector store is intentionally in-memory. Pinecone, Weaviate, ChromaDB, and Qdrant all do the same fundamental thing — storage + similarity search. To swap one in, replace lib/vectorStore.ts. The rest of the app stays untouched.
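For a sense of how small that surface is, here is a sketch of an in-memory store of this kind (not necessarily the exact code in lib/vectorStore.ts):

```ts
// Minimal in-memory "vector store": an array plus cosine similarity.
// A sketch; the real lib/vectorStore.ts may differ in names and details.
interface StoredChunk {
  text: string;
  embedding: number[];
}

const store: StoredChunk[] = [];

export function addChunks(chunks: StoredChunk[]) {
  store.push(...chunks);
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

export function search(queryEmbedding: number[], topK = 3) {
  return store
    .map((chunk) => ({
      text: chunk.text,
      score: cosineSimilarity(queryEmbedding, chunk.embedding),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

Pinecone, Weaviate, ChromaDB, and Qdrant replace the array and the loop with an index, but the contract stays the same: embeddings in, scored chunks out.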
```bash
git clone https://github.com/<your-username>/chunk-lab.git
cd chunk-lab
npm install
```

You'll need two API keys:

- Anthropic — console.anthropic.com/settings/keys. Free tier is plenty.
- Voyage AI — dash.voyageai.com/api-keys. Free tier gives 200M tokens/month.
Create `.env.local` in the project root:

```
ANTHROPIC_API_KEY=sk-ant-...
VOYAGE_API_KEY=pa-...
```

Then start the dev server:

```bash
npm run dev
```

Open http://localhost:3000.
```bash
git init
git add .
git commit -m "initial commit"
git branch -M main
git remote add origin https://github.com/<your-username>/chunk-lab.git
git push -u origin main
```

Then connect the repo to Vercel:

- Go to vercel.com/new
- Import your `chunk-lab` repo
- Vercel auto-detects Next.js — no config needed
In the import flow (or under Project → Settings → Environment Variables), add:
| Name | Value |
|---|---|
| `ANTHROPIC_API_KEY` | your Anthropic key |
| `VOYAGE_API_KEY` | your Voyage key |
Click Deploy. First build takes ~90 seconds. Subsequent deploys ship in ~30 seconds on every git push.
```
chunk-lab/
├── app/
│   ├── api/chunk/route.ts     # The full RAG pipeline endpoint
│   ├── globals.css            # Tailwind + base styles
│   ├── layout.tsx             # Root layout + metadata
│   └── page.tsx               # The UI
├── lib/
│   ├── chunking.ts            # 3 chunking strategies
│   ├── embeddings.ts          # Voyage AI embedding calls
│   ├── samples.ts             # Demo documents
│   └── vectorStore.ts         # In-memory cosine similarity
├── .env.example               # Template for env vars
├── next.config.js
├── package.json
├── postcss.config.js
├── tailwind.config.js
├── tsconfig.json
└── README.md
```
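Since the strategies in `lib/chunking.ts` are the heart of the demo, here is a rough sketch of how two of them could be implemented (illustrative only; the real chunk sizes, overlap, and splitting rules may differ):

```ts
// Illustrative sketches of two chunking strategies; lib/chunking.ts may differ.

// Fixed-size: slice every `size` characters with a small overlap.
// Cheap and predictable, but happily cuts sentences (and answers) in half.
export function fixedSizeChunks(text: string, size = 500, overlap = 50): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}

// Sentence-level: split on sentence boundaries so each chunk is a whole thought.
export function sentenceChunks(text: string): string[] {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((sentence) => sentence.trim())
    .filter(Boolean);
}
```

Recursive chunking sits in between: split on the largest separators first (paragraphs, then sentences, then words) until each piece fits the size budget.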
For typical usage on the demo:
- Voyage embeddings: effectively free. Each run embeds ~10-20 chunks plus 1 query; the 200M-tokens/month free tier covers millions of runs.
- Claude: ~$0.001 per run (Haiku is cheap, prompt is small).
If you embed Chunk Lab in a portfolio and it gets 1,000 visits averaging 3 runs each, that's ~3,000 runs, or roughly $3 in Claude costs. Budget accordingly.
Why no real vector database? The honest answer: for demonstrating chunking effects, in-memory cosine similarity does exactly what a vector DB does, minus the indexing, persistence, and clustering layers that only matter at scale.

Adding Pinecone would have meant another API key, another account, slower setup, and zero new insight for the demo's purpose. The decision is documented in `lib/vectorStore.ts`.

If you want to see Chunk Lab v2 with real Pinecone, swap out the `search()` function. The interface stays the same.
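As a sketch of what that swap might look like, assuming Pinecone's TypeScript SDK (`@pinecone-database/pinecone`), an index named `chunk-lab` that already holds the chunk embeddings, and a `text` metadata field (all assumptions, not code from this repo):

```ts
// Hypothetical Pinecone-backed drop-in for the search() in lib/vectorStore.ts.
// Index name and metadata shape are assumptions for illustration.
import { Pinecone } from '@pinecone-database/pinecone';

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pinecone.index('chunk-lab');

// Same contract as the in-memory version: query embedding in, scored chunks out.
export async function search(queryEmbedding: number[], topK = 3) {
  const result = await index.query({
    vector: queryEmbedding,
    topK,
    includeMetadata: true,
  });

  return (result.matches ?? []).map((match) => ({
    text: String(match.metadata?.text ?? ''),
    score: match.score ?? 0,
  }));
}
```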