tiny-rag-lab is a learning-first RAG engine/laboratory for understanding how
classic retrieval-augmented generation works end to end.
The goal is to keep the RAG lifecycle visible: document loading, text normalization, chunking, metadata, embeddings, local vector search, retrieval, prompt assembly, answer generation, citations, evaluation, and failure inspection.
Phases 1, 1.5, 1.6, 1.7, and 1.8 are complete:
- Phase 1 — Naive Classic RAG: full pipeline from corpus to grounded answers with citations
- Phase 1.5 — Retrieval Mechanics: BM25 keyword retrieval, hybrid retrieval, and retriever comparison flags
- Phase 1.6 — Evaluation Harness: retrieval quality metrics (rag eval) against a prepared QA set
- Phase 1.7 — Observability And Debugging: retrieve/ask traces, stage latency, and optional JSON trace output
- Phase 1.8 — RAG Failure Lab: curated failure cases and rag diagnose for baseline vs. intervention retrieval
Completed phase contracts:
- Phase index
- Phase 1 spec · taskboard
- Phase 1.5 spec · taskboard
- Phase 1.6 spec · taskboard
- Phase 1.7 spec · taskboard
- Phase 1.8 spec · taskboard
Phase 1 delivers a minimal but complete CLI-first RAG baseline:
local corpus -> documents -> normalized text -> chunks -> embeddings
-> local vector index -> query embedding -> cosine retrieval
-> grounded prompt -> generated answer with citations
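The chunking step in the flow above can be sketched as a sliding window with overlap. This is a minimal illustration, not the project's implementation: the function name `chunk_text` is hypothetical, the defaults mirror the `--chunk-size 800 --chunk-overlap 120` values used in the CLI example, and it assumes sizes are counted in characters (the project may count tokens or split on sentence boundaries instead).

```python
def chunk_text(text: str, chunk_size: int = 800, chunk_overlap: int = 120) -> list[str]:
    """Split text into fixed-size character windows that share an overlap.

    Hypothetical helper: sizes are assumed to be in characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text,
        # so no trailing chunk is fully contained in the previous one.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of indexing some text twice.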
Key decisions:
- Python implementation
- argparse CLI
- primary corpus: IBM watsonxDocsQA
- local embeddings: sentence-transformers/all-MiniLM-L6-v2
- OpenAI-compatible online generation for real answers
- fake embedder and fake generator for tests
- local index files under .tiny-rag/index/
- no vector database in Phase 1
- no LangChain/LlamaIndex/Haystack wrapper in Phase 1
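Without a vector database, cosine retrieval can be a brute-force scan over every stored vector. The sketch below shows the idea in plain Python. The names `cosine_similarity` and `top_k` and the dict-based index layout are assumptions for illustration; the actual on-disk index format under .tiny-rag/index/ is not described here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 5):
    """Score every chunk against the query and return the k best (id, score) pairs.

    `index` is a hypothetical in-memory mapping of chunk id -> embedding.
    """
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A linear scan like this is easy to inspect and is typically adequate for a small learning corpus; it trades speed for visibility.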
Phase 1.5 adds inspectable retrieval strategies to compare dense vector search, BM25 keyword search, and hybrid retrieval with Reciprocal Rank Fusion.
query + index -> dense retrieval | BM25 retrieval -> optional RRF fusion
-> ranked chunks and eval reports tagged with retriever=dense|bm25|hybrid
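Reciprocal Rank Fusion combines ranked lists by summing 1 / (k + rank) for each item across the lists, so it only needs ranks and never compares raw dense and BM25 scores directly. The sketch below is a minimal version. The function name is hypothetical, and the constant k=60 is the commonly cited default, not necessarily the value this project uses.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of chunk ids into one list of (id, score).

    Each list contributes 1 / (k + rank) per item, with ranks starting at 1.
    k=60 is an assumed default, not taken from the project.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense = ["c1", "c2", "c3"]   # placeholder dense ranking
bm25 = ["c3", "c1", "c4"]    # placeholder BM25 ranking
fused = reciprocal_rank_fusion([dense, bm25])
```

Chunks that appear in both lists (here c1 and c3) rise above chunks found by only one retriever.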
Phase 1.6 adds a rag eval command that measures retrieval quality against the
prepared qa.jsonl evaluation set. Four deterministic metrics are reported:
hit rate @ k, MRR, context precision, and context recall.
qa.jsonl + index -> embed questions -> retrieve top-k -> compare to gold docs
-> hit rate, MRR, context precision, context recall
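The four metrics can be computed per question from the retrieved ids and the gold ids, then averaged. The definitions below are one plausible reading and are assumptions: precision is the fraction of retrieved items that are gold, recall is the fraction of gold items retrieved, and the function names are hypothetical. The project's `rag eval` may define them differently (for example, a rank-weighted context precision).

```python
def retrieval_metrics(retrieved: list[str], gold: list[str], k: int = 5) -> dict[str, float]:
    """Per-question metrics over the top-k retrieved document ids."""
    top = retrieved[:k]
    gold_set = set(gold)
    hits = [doc_id in gold_set for doc_id in top]
    # Rank (1-based) of the first gold document, or None if none was retrieved.
    first_hit = next((rank for rank, hit in enumerate(hits, start=1) if hit), None)
    found = {doc_id for doc_id in top if doc_id in gold_set}
    return {
        "hit": 1.0 if first_hit else 0.0,
        "reciprocal_rank": 1.0 / first_hit if first_hit else 0.0,
        "context_precision": sum(hits) / len(top) if top else 0.0,
        "context_recall": len(found) / len(gold_set) if gold_set else 0.0,
    }


def aggregate(per_question: list[dict[str, float]]) -> dict[str, float]:
    """Average per-question values into hit rate @ k, MRR, precision, and recall."""
    if not per_question:
        return {}
    n = len(per_question)
    return {
        "hit_rate": sum(m["hit"] for m in per_question) / n,
        "mrr": sum(m["reciprocal_rank"] for m in per_question) / n,
        "context_precision": sum(m["context_precision"] for m in per_question) / n,
        "context_recall": sum(m["context_recall"] for m in per_question) / n,
    }
```

Because every value depends only on ids and ranks, repeated runs against the same index give the same numbers.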
Phase 1.7 adds trace records and human-readable trace output for retrieve and ask flows. Traces expose the retriever, top-k, ranked chunks, scores, citations, prompt/answer context, and stage latency.
query + retrieval/ask flow -> trace fields -> readable trace and optional JSON
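A trace record with per-stage latency can be modeled as a small dataclass plus a timing context manager. This is a sketch under assumptions: the class name `Trace`, its field names, and the JSON layout are hypothetical and only echo the trace fields listed above; they are not the project's actual schema.

```python
import json
import time
from contextlib import contextmanager
from dataclasses import asdict, dataclass, field


@dataclass
class Trace:
    """Hypothetical trace record for one retrieve or ask call."""
    query: str
    retriever: str
    top_k: int
    ranked_chunks: list = field(default_factory=list)
    stage_latency_ms: dict = field(default_factory=dict)

    @contextmanager
    def stage(self, name: str):
        """Time the enclosed block and store its duration in milliseconds."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.stage_latency_ms[name] = (time.perf_counter() - start) * 1000.0

    def to_json(self) -> str:
        """Serialize the whole trace for optional JSON output."""
        return json.dumps(asdict(self), indent=2)


trace = Trace(query="question text", retriever="dense", top_k=5)
with trace.stage("retrieve"):
    # Placeholder result; a real run would call the retriever here.
    trace.ranked_chunks.append({"chunk_id": "doc-1#0", "score": 0.83})
print(trace.to_json())
```

Recording latency in a `finally` clause keeps the timing even when a stage raises, which is often when the trace is most useful.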
Phase 1.8 adds a failure lab for curated retrieval failure scenarios. The
rag diagnose command compares each case's baseline and intervention retrieval
config, labels heuristic failure modes, and reports whether failures were
confirmed, fixed, moved, or unchanged.
failure cases + index -> baseline retrieval + intervention retrieval
-> failure labels, metrics, and diagnosis report
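One way to read the confirmed / fixed / moved / unchanged outcomes is as a comparison of the failure labels found in the baseline and intervention runs. The sketch below is an interpretation, not the project's logic: the function name, the label strings, and the exact rules are all assumptions. Here "confirmed" means the baseline reproduces the expected failure, and the remaining outcome is derived from how the label sets differ.

```python
def diagnose_case(expected_failure: str,
                  baseline_labels: list[str],
                  intervention_labels: list[str]) -> dict:
    """Compare failure labels from two retrieval runs of the same case.

    Assumed rules (hypothetical):
      confirmed -> the baseline shows the expected failure label
      unchanged -> both runs show the same set of labels
      fixed     -> the intervention shows no failure labels
      moved     -> the intervention shows a different, non-empty set of labels
    """
    baseline = set(baseline_labels)
    intervention = set(intervention_labels)

    if intervention == baseline:
        outcome = "unchanged"
    elif not intervention:
        outcome = "fixed"
    else:
        outcome = "moved"

    return {"confirmed": expected_failure in baseline, "outcome": outcome}


# Label names below are placeholders for illustration.
report = diagnose_case("missed_gold", ["missed_gold"], ["low_rank_gold"])
```

Keeping "confirmed" separate from the outcome distinguishes a case that never reproduced its failure from one the intervention actually repaired.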
rag index --corpus PATH --index-dir .tiny-rag/index --chunk-size 800 --chunk-overlap 120
rag retrieve "question text" --index-dir .tiny-rag/index --top-k 5 --retriever dense
rag retrieve "question text" --index-dir .tiny-rag/index --top-k 5 --retriever bm25
rag retrieve "question text" --index-dir .tiny-rag/index --top-k 5 --retriever hybrid
rag ask "question text" --index-dir .tiny-rag/index --top-k 5
rag eval --qa-file corpus/watsonx-docsqa/qa.jsonl --index-dir .tiny-rag/index --top-k 5 --retriever dense
rag eval --qa-file corpus/watsonx-docsqa/qa.jsonl --index-dir .tiny-rag/index --top-k 5 --retriever bm25
rag eval --qa-file corpus/watsonx-docsqa/qa.jsonl --index-dir .tiny-rag/index --top-k 5 --retriever hybrid
rag diagnose --cases-file tests/fixtures/failure/cases.jsonl --index-dir .tiny-rag/index
Help is available for each command:
uv run rag --help
uv run rag index --help
uv run rag retrieve --help
uv run rag ask --help
uv run rag eval --help
uv run rag diagnose --help
Install/sync dependencies:
uv sync --group dev
Run tests:
uv run pytest --tb=short -q
Prepare the primary corpus after dependencies are installed:
uv run python scripts/prepare_watsonx_docsqa.py --inspect
uv run python scripts/prepare_watsonx_docsqa.py --output-dir corpus/watsonx-docsqa
Generated corpora and indexes are intentionally ignored by git:
corpus/
.tiny-rag/
- Proposal: project purpose, philosophy, and non-goals
- Roadmap: directional phase sequence
- Architecture: conceptual RAG planes and boundaries
- Agent guidelines: collaboration, review, and handoff workflow
- File structure: quick repository map
- Phase docs: active phase pointer and phase contracts
For implementation work, the phase spec and taskboard under docs/phases/ are
the source of truth.