DocBlock is a Retrieval-Augmented Generation (RAG) app that turns any document into a chat partner. You upload a file, it gets chunked, embedded, and indexed into a private vector collection — then every answer the LLM gives is grounded in retrieved excerpts from your document, with page-level citations.
The UI is deliberately loud: thick black strokes, candy-bright blocks, hard offset shadows, and mono type. It's designed to look like a sticker pack, not a chatbot.
Why "neo-brutalist"? Most AI products lean glassy and polite. DocBlock leans the other way — chunky borders, raw color blocks, sharp shadows. The interface should feel as physical as the document you just dropped into it.
| Feature | Details |
|---|---|
| Bring-your-own document | Upload any PDF or .txt file (≤ 15 MB) |
| Per-document isolation | Every upload gets its own Qdrant collection — chats never leak across files |
| Grounded answers | The LLM is forced to refuse questions it can't answer from the retrieved context |
| Page-aware citations | Each answer expands to show exact chunks with page numbers |
| Persistent session | Refresh-safe via localStorage |
| Snappy UI | 3px strokes, hard shadows, hover-tilt on every interactive surface |
```
┌─────────────────────────┐
│   Upload (PDF / TXT)    │
└────────────┬────────────┘
             ▼
┌─────────────────────────┐
│    Page-level loader    │  WebPDFLoader / inline text
└────────────┬────────────┘
             ▼
┌─────────────────────────┐
│   Recursive chunking    │  size 1100, overlap 180
└────────────┬────────────┘
             ▼
┌─────────────────────────┐
│ HuggingFace embeddings  │  all-MiniLM-L6-v2 · 384-d
└────────────┬────────────┘
             ▼
┌─────────────────────────┐
│    Qdrant collection    │  docblock_<sessionId>
└────────────┬────────────┘
             ▼
┌─────────────────────────┐
│   Question ─► top-k=6   │
└────────────┬────────────┘
             ▼
┌─────────────────────────┐
│  Groq · Llama 3.3 70B   │  temperature 0.15
└────────────┬────────────┘
             ▼
┌─────────────────────────┐
│   Answer + Citations    │
└─────────────────────────┘
```
| Layer | Choice |
|---|---|
| Frontend | Next.js 16 (App Router) · React 19 · Tailwind v4 · custom brutalist CSS |
| Backend | Next.js Route Handlers (Node runtime) |
| Vector DB | Qdrant Cloud (free tier) |
| Embeddings | HuggingFace Inference API · sentence-transformers/all-MiniLM-L6-v2 |
| LLM | Groq · llama-3.3-70b-versatile |
| RAG glue | LangChain (community, qdrant, textsplitters) |
| PDF parsing | WebPDFLoader — browser-compatible, serverless-safe |
| Hosting | Vercel |
DocBlock uses LangChain's `RecursiveCharacterTextSplitter` with a deliberately sentence-aware separator chain:

```js
{
  chunkSize: 1100,
  chunkOverlap: 180,
  separators: ["\n\n", "\n", ". ", "? ", "! ", "; ", ", ", " ", ""]
}
```

The splitter walks down that list in order — paragraphs first, then line breaks, then sentence terminators, then clauses, then words — and only falls back to character splits as a last resort. This keeps natural prose units intact whenever the chunk size allows.
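As a minimal sketch, assuming LangChain's `@langchain/textsplitters` package and a `docs` array coming out of the loader step:

```js
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

// Same config the ingest route reports under its `strategy` key.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1100,
  chunkOverlap: 180,
  separators: ["\n\n", "\n", ". ", "? ", "! ", "; ", ", ", " ", ""],
});

// `docs` is the page-level Document array produced by the loader.
const chunks = await splitter.splitDocuments(docs);
```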
After splitting, every chunk is:

- Trimmed, then dropped if shorter than 24 characters — those are almost always page numbers, footers, or splitter artefacts that hurt retrieval recall.
- Annotated with `{ source, page, chunkIndex, charCount, wordCount }`.
- Page-tagged so citations can point at a real page in the source PDF.
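A sketch of that cleanup pass, inferred from the metadata fields above; the `loc.pageNumber` access assumes WebPDFLoader's default page metadata:

```js
// Hypothetical cleanup pass mirroring the rules above (not the repo's exact code).
const cleaned = chunks
  .map((c) => ({ ...c, pageContent: c.pageContent.trim() }))
  .filter((c) => c.pageContent.length >= 24) // drop page numbers, footers, artefacts
  .map((c, i) => ({
    pageContent: c.pageContent,
    metadata: {
      source: c.metadata.source,
      page: c.metadata.loc?.pageNumber ?? 1, // assumes WebPDFLoader's loc.pageNumber
      chunkIndex: i,
      charCount: c.pageContent.length,
      wordCount: c.pageContent.split(/\s+/).filter(Boolean).length,
    },
  }));
```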
The 1100/180 ratio (~16% overlap) is the compromise: large enough to keep ideas spanning a chunk boundary recoverable, small enough to avoid pulling the same content under multiple chunk IDs at retrieval time.
The chunking config is also returned in the `/api/ingest` response under a `strategy` key, which makes A/B testing different splitters trivial.
- Accepts a single file via `multipart/form-data` (field name `file`).
- Validates: ≤ 15 MB, MIME `application/pdf` or `text/plain`.
- PDFs → `WebPDFLoader` with `splitPages: true` so page numbers survive.
- TXT → wrapped as a single `Document` with `page: 1`.
- Documents are chunked, filtered, and embedded.
- A new `sessionId` (10-char nanoid) gets its own collection `docblock_<sessionId>` in Qdrant (sketched below).
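A minimal sketch of that last step, assuming the LangChain Qdrant and HuggingFace integrations from the stack table; the actual wiring in `app/api/ingest/route.js` may differ:

```js
import { nanoid } from "nanoid";
import { QdrantVectorStore } from "@langchain/qdrant";
import { HuggingFaceInferenceEmbeddings } from "@langchain/community/embeddings/hf";

const sessionId = nanoid(10);

const embeddings = new HuggingFaceInferenceEmbeddings({
  apiKey: process.env.HUGGINGFACE_API_KEY,
  model: "sentence-transformers/all-MiniLM-L6-v2", // 384-d vectors
});

// `cleaned` is the filtered chunk array from the cleanup sketch above.
// Creates the per-session collection and indexes the chunks into it.
await QdrantVectorStore.fromDocuments(cleaned, embeddings, {
  url: process.env.QDRANT_URL,
  apiKey: process.env.QDRANT_API_KEY,
  collectionName: `docblock_${sessionId}`,
});
```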
Example response

```json
{
  "sessionId": "abc123",
  "fileName": "paper.pdf",
  "pages": 12,
  "chunks": 42,
  "strategy": {
    "splitter": "recursive-character",
    "chunkSize": 1100,
    "chunkOverlap": 180
  }
}
```

Request shape
```json
{
  "sessionId": "abc123",
  "question": "What is this paper about?",
  "history": [{ "role": "user", "content": "..." }]
}
```

The route then:
- Opens the per-session Qdrant collection.
- Runs a top-`k = 6` similarity search using the same embedding model used at index time.
- Formats retrieved excerpts as `[excerpt N | page X | chunk #Y]`.
- Builds the prompt — strict system prompt + last 6 turns + user question.
- Calls Groq's `llama-3.3-70b-versatile` at `temperature: 0.15`.
- Returns the answer plus structured citations (page, chunk index, 240-char snippet).
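Condensed into a sketch: the `groq-sdk` call shape is the official client's, `embeddings` is the same instance as in the ingest sketch, `sessionId`, `question`, and `history` come from the request body, and `buildSystemPrompt` is a hypothetical helper sketched further below:

```js
import Groq from "groq-sdk";
import { QdrantVectorStore } from "@langchain/qdrant";

const store = await QdrantVectorStore.fromExistingCollection(embeddings, {
  url: process.env.QDRANT_URL,
  apiKey: process.env.QDRANT_API_KEY,
  collectionName: `docblock_${sessionId}`,
});

// Top-k retrieval with the same 384-d embedder used at index time.
const excerpts = await store.similaritySearch(question, 6);

const context = excerpts
  .map((d, i) =>
    `[excerpt ${i + 1} | page ${d.metadata.page} | chunk #${d.metadata.chunkIndex}]\n${d.pageContent}`)
  .join("\n\n");

const groq = new Groq({ apiKey: process.env.GROQ_API_KEY });
const completion = await groq.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  temperature: 0.15,
  messages: [
    { role: "system", content: buildSystemPrompt(context) },
    ...history.slice(-6), // last 6 turns
    { role: "user", content: question },
  ],
});

const answer = completion.choices[0].message.content;
```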
The system prompt explicitly instructs the model to:

- treat the supplied excerpts as the only source of truth;
- reply "I couldn't find that in the document." when the excerpts don't cover the question;
- cite pages inline as `(p. N)`;
- never invent facts, numbers, names, or quotes.
Combined with the low temperature, this keeps responses tightly bound to the document.
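A paraphrased sketch of such a prompt builder (the repo's exact wording will differ):

```js
// Hypothetical builder: paraphrases the four rules above, not the repo's exact text.
function buildSystemPrompt(context) {
  return [
    "You are DocBlock, an assistant that answers strictly from document excerpts.",
    "Treat the excerpts below as the only source of truth.",
    "If they don't cover the question, reply exactly: \"I couldn't find that in the document.\"",
    "Cite pages inline as (p. N). Never invent facts, numbers, names, or quotes.",
    "",
    "Excerpts:",
    context,
  ].join("\n");
}
```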
```
app/
├── layout.js            ← fonts + metadata
├── page.js              ← brutalist UI (upload + chat)
├── globals.css          ← design tokens (palette, shadows, animations)
└── api/
    ├── ingest/route.js  ← parse → chunk → embed → index
    └── chat/route.js    ← retrieve → ground → generate
lib/
└── rag.js               ← embedder + Qdrant helpers
public/                  ← static assets
.env.local               ← API keys (gitignored)
```
```bash
git clone https://github.com/RAJVEER42/notebook-LLM.git
cd notebook-LLM
npm install --legacy-peer-deps
```

Create `.env.local` at the project root:
```
GROQ_API_KEY=your_groq_key
HUGGINGFACE_API_KEY=your_hf_inference_key
QDRANT_URL=https://your-cluster.qdrant.io
QDRANT_API_KEY=your_qdrant_api_key
```

| Service | Where to get it |
|---|---|
| Groq API key | https://console.groq.com/keys |
| HuggingFace token | https://huggingface.co/settings/tokens (needs Inference API access) |
| Qdrant cluster | https://cloud.qdrant.io/ (free tier works) |
```bash
npm run dev
```

Open http://localhost:3000 and drop a PDF.
DocBlock is built for one-click Vercel deploys.
- Push this repo to GitHub.
- Import it into Vercel (auto-detects Next.js).
- Add the four env vars above in Settings → Environment Variables.
- Deploy.
Serverless notes
`/api/ingest` runs with `maxDuration: 60` and `/api/chat` with `maxDuration: 300`, both on the Node runtime. PDF parsing uses `WebPDFLoader`, so there are no native dependencies — it just works inside Vercel's serverless environment.
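Those limits are ordinary Next.js route segment config exports, e.g. at the top of `app/api/ingest/route.js`:

```js
// Standard Next.js App Router segment config.
export const runtime = "nodejs";
export const maxDuration = 60; // /api/chat exports 300 instead
```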
The brutalist look is enforced via a small set of CSS tokens:
| Token | Value | Used for |
|---|---|---|
| Stroke | `3px solid #000` | Every interactive surface |
| Shadow | `6px 6px 0 0 #000` | Resting elevation |
| Shadow (lift) | `10px 10px 0 0 #000` | Hover state |
| Pink | `#ff5da2` | Primary actions, accents |
| Yellow | `#ffd23f` | Header, active blocks |
| Blue | `#4d80ff` | User messages, CTAs |
| Green | `#36d399` | Success, "live" indicators |
| Lilac | `#c084fc` | Tertiary accents |
| Paper | `#fff8e7` | Background |
| Display font | Space Grotesk | Headings, body |
| Mono font | JetBrains Mono | Inputs, labels, tags |
Interaction rules:
- Hover → `translate(-2px, -2px)` + thicker shadow + slight tilt
- Active → `translate(2px, 2px)` + tight shadow (the block "sinks in")
- All transitions in 140–160 ms for a snappy, physical feel
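In stylesheet form, roughly (selector and the exact lifted/sunken shadow values are illustrative, not copied from `globals.css`):

```css
/* Illustrative block style, not the repo's exact selectors or values. */
.block {
  border: 3px solid #000;
  box-shadow: 6px 6px 0 0 #000;
  transition: transform 150ms ease, box-shadow 150ms ease;
}
.block:hover {
  transform: translate(-2px, -2px) rotate(-0.4deg); /* lift + slight tilt */
  box-shadow: 10px 10px 0 0 #000;
}
.block:active {
  transform: translate(2px, 2px); /* the block "sinks in" */
  box-shadow: 2px 2px 0 0 #000;
}
```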
- Multi-document chat (query across several uploaded files)
- Streaming responses (token-by-token)
- Highlight the cited region inside an inline PDF preview
- Export chat as Markdown / PDF
- Hybrid retrieval (BM25 + dense) for keyword-heavy docs
- Re-ranking with a cross-encoder
Built as a learning project — feel free to fork, remix, and ship your own version.
