RAG Lab introduces a closed-loop retrieval intelligence framework that augments conventional RAG pipelines with post-hoc evaluation, causal diagnosis, and optimization guidance.
A modular Streamlit app to compare seven Retrieval-Augmented Generation strategies side-by-side:
| Mode | What it does |
|---|---|
| No RAG | Direct LLM call — no retrieval |
| Vector RAG | Hybrid vector + BM25 search with optional cross-encoder reranking |
| Graph RAG | Knowledge graph entity traversal (configurable 1–3 hops) |
| Vector + Graph RAG | Combined vector search + graph traversal for broader context |
| Agentic RAG | LangGraph ReAct agent that autonomously picks tools |
| FABLE RAG | Hierarchical bi-path retrieval — top-down semantic + bottom-up vector |
| MACER RAG | Multi-agent iterative context evolution (Retriever → Constructor → Reflector loop) |
Most RAG systems stop at generation. RAG Lab closes the loop: after every response it evaluates what was retrieved, diagnoses why quality was high or low, and surfaces concrete configuration changes to improve the next run — without any manual inspection.
The seven retrieval strategies serve as a comparison surface. Graph RAG excels at relational queries ("Who founded X?"), Vector RAG at semantic similarity, Agentic RAG at multi-step reasoning, and MACER at progressively refining context when one pass isn't enough. The closed-loop layer — LLM-as-a-Judge, noise analysis, efficiency metrics, and optimization suggestions — applies equally to all of them.
- Frontend: Streamlit
- LLM: Any OpenAI-compatible API (default: LM Studio at `localhost:1234`)
- Embeddings: `sentence-transformers/all-MiniLM-L6-v2` (runs locally)
- Reranker: `cross-encoder/ms-marco-MiniLM-L-6-v2`
- Vector Store: ChromaDB (persistent, cosine similarity)
- Knowledge Graph: NetworkX (persisted to JSON)
- Agent: LangGraph ReAct with `search_vector`, `search_graph`, `verify_info` tools
- Clustering: scikit-learn k-means (FABLE hierarchy)
- Python: 3.13 required (3.14 breaks Pydantic v1 used by ChromaDB/LangChain)
```shell
# 1. Clone
git clone https://github.com/paloknath/rag_lab.git
cd rag_lab

# 2. Create venv with Python 3.13
python3.13 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# 3. Install
pip install -r requirements.txt

# 4. Start your LLM server (e.g. LM Studio on port 1234)

# 5. Run
streamlit run app.py --server.headless true
```

Open http://localhost:8501 in your browser.
- Upload PDF or TXT files via the sidebar and click Ingest
- Select a retrieval mode from the dropdown
- Chat — ask questions about your documents
- Compare — switch modes and ask the same question to see the difference
- Hybrid Alpha slider: `1.0` = pure vector, `0.0` = pure BM25
- Reranking toggle: enable/disable cross-encoder reranking
- Graph Traversal Hops slider: 1–3 hops from matched entities (more hops = broader but potentially noisier context)
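The Hybrid Alpha slider blends the two ranked result sets. A minimal sketch of alpha-weighted score fusion (the function names and the normalized-score dicts are illustrative, not the app's actual internals):

```python
def hybrid_score(vector_score: float, bm25_score: float, alpha: float) -> float:
    """Blend normalized scores: alpha=1.0 is pure vector, 0.0 is pure BM25."""
    return alpha * vector_score + (1.0 - alpha) * bm25_score

def fuse(vector_hits: dict, bm25_hits: dict, alpha: float, top_k: int = 5) -> list:
    """Merge two result sets (doc_id -> normalized score) into one ranking.

    Documents missing from one ranking get a score of 0.0 for that signal.
    """
    doc_ids = set(vector_hits) | set(bm25_hits)
    scored = {
        doc_id: hybrid_score(vector_hits.get(doc_id, 0.0),
                             bm25_hits.get(doc_id, 0.0), alpha)
        for doc_id in doc_ids
    }
    return sorted(scored, key=scored.get, reverse=True)[:top_k]
```

With `alpha=1.0` the BM25 scores are ignored entirely, which is why the slider endpoints behave as "pure vector" and "pure BM25".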
The Top-Down Branches slider controls how many cluster branches are explored during semantic hierarchy navigation. FABLE runs two retrieval paths simultaneously — a top-down semantic traversal from cluster summaries to leaf chunks, and a bottom-up vector search that gathers cluster context for matched documents.
The bi-path trace is shown in the Hierarchy Navigation Trace panel.
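The top-down path can be pictured as a greedy descent of the cluster tree, keeping the most query-similar branches at each level. A sketch under assumed node shapes (`children`, `embedding`, `chunks` are illustrative field names, not FABLE's real data model):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_down(node: dict, query_emb: np.ndarray, branches: int = 2) -> list:
    """Descend a summary-cluster tree, following the `branches` children whose
    summary embeddings are most similar to the query; return leaf chunks."""
    if not node["children"]:          # leaf node: holds actual document chunks
        return node["chunks"]
    ranked = sorted(node["children"],
                    key=lambda c: cosine(query_emb, c["embedding"]),
                    reverse=True)
    leaves = []
    for child in ranked[:branches]:   # Top-Down Branches slider caps this
        leaves.extend(top_down(child, query_emb, branches))
    return leaves
```

The bottom-up path would run an ordinary vector search first and then walk upward to attach each hit's cluster summaries as extra context.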
The Max Iterations slider caps the retriever-constructor-reflector loop. The loop exits early when the Reflector agent judges the accumulated context sufficient. The full iteration trace is shown in the Iteration Trace panel.
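The loop structure described above can be sketched as follows, with the three agents passed in as plain callables (a simplification of the actual multi-agent implementation):

```python
def macer_loop(query, retrieve, construct, reflect, max_iterations=3):
    """Retriever -> Constructor -> Reflector loop with early exit.

    `retrieve`, `construct`, and `reflect` stand in for the three agents:
    fetch new chunks, fold them into the accumulated context, and judge
    whether that context is now sufficient to answer the query.
    """
    context, trace = [], []
    for i in range(max_iterations):
        chunks = retrieve(query, context)       # Retriever: gather new evidence
        context = construct(context, chunks)    # Constructor: merge / extract facts
        sufficient = reflect(query, context)    # Reflector: enough to answer?
        trace.append({"iteration": i + 1,
                      "new_chunks": len(chunks),
                      "sufficient": sufficient})
        if sufficient:                          # early exit before the cap
            break
    return context, trace
```

The Max Iterations slider corresponds to `max_iterations`; the trace list is what a panel like Iteration Trace would render.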
The agent's thinking process is shown in a live Agent Trace panel — you can see which tools it calls and why.
Enable the LLM-as-a-Judge toggle in the sidebar to automatically score each response on four metrics after every query:
| Metric | What it measures |
|---|---|
| Context Relevance | How relevant are the retrieved passages to the query? (1–5) |
| Context Sufficiency | Do the passages provide enough information to fully answer? (1–5) |
| Faithfulness | Is the answer grounded in the context without hallucination? (1–5) |
| Answer Relevance | Does the answer directly address the query? (1–5) |
Scores are colour-coded: 🟢 ≥ 4, 🟡 ≥ 3, 🔴 < 3. Adds ~3–6s latency.
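The colour bands map directly onto the 1–5 scale. A trivial sketch of that mapping (illustrative, not the app's rendering code):

```python
def traffic_light(score: float) -> str:
    """Map a 1-5 judge score to the colour bands used in the UI."""
    if score >= 4:
        return "🟢"
    if score >= 3:
        return "🟡"
    return "🔴"
```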
Enable the Context Noise Analysis toggle to classify every retrieved chunk as relevant, partial, or irrelevant relative to the query. Results are shown after the answer is rendered (non-blocking) with colour-coded per-chunk highlights:
- ✅ Relevant — directly supports the query
- ⚠️ Partial — tangentially related or partially useful
- ❌ Irrelevant — off-topic noise
The Noise Ratio (irrelevant chunks / total chunks) gives an at-a-glance signal of retrieval precision. Adds ~2–4s latency.
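Given the per-chunk labels above, the Noise Ratio is a one-line computation (label strings here are assumed to match the three categories):

```python
from collections import Counter

def noise_ratio(labels: list) -> float:
    """Noise Ratio = irrelevant chunks / total retrieved chunks."""
    if not labels:
        return 0.0
    return Counter(labels)["irrelevant"] / len(labels)
```

A ratio near 0.0 means the retriever is precise; a ratio creeping toward 0.5+ suggests lowering `TOP_K_RETRIEVAL`, enabling reranking, or reducing graph hops.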
```
app.py             → Streamlit UI (sidebar, chat, metrics, evaluation, noise analysis)
config.py          → All configuration constants
ingestion.py       → Document loading, chunking, embedding, KG + FABLE hierarchy
retrievers.py      → Strategy pattern: BaseRetriever + 7 implementations
evaluation.py      → LLM-as-a-Judge: 4-metric quality scoring
noise_analysis.py  → Context Noise Analyzer: per-chunk relevance classification
```
- Strategy Pattern — all retrievers share a common `retrieve(query) -> RetrievalResult` interface, making it easy to add new strategies
- Parent-Document Retrieval — child chunks (300 tokens) are indexed for search, but parent chunks (800 tokens) are returned to the LLM for richer context
- Knowledge Graph — the LLM extracts `(Subject, Predicate, Object)` triplets during ingestion, stored in NetworkX and persisted to disk
- FABLE Hierarchy — k-means cluster tree with LLM-generated summaries, built at ingestion time and queried via cosine similarity at retrieval time
- MACER Loop — iterative context refinement: each cycle retrieves new chunks, extracts facts, and evaluates sufficiency before deciding to continue or stop
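The strategy pattern boils down to a small shared contract. A sketch of the `retrieve(query) -> RetrievalResult` interface named above (the dataclass fields and the example subclass are illustrative, not the repo's exact definitions):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

@dataclass
class RetrievalResult:
    """Common return type shared by every strategy."""
    chunks: list = field(default_factory=list)   # parent chunks handed to the LLM
    trace: dict = field(default_factory=dict)    # per-strategy debug info

class BaseRetriever(ABC):
    @abstractmethod
    def retrieve(self, query: str) -> RetrievalResult:
        """Every strategy implements only this one method."""

class NoRAGRetriever(BaseRetriever):
    """Trivial strategy: no retrieval, the LLM answers directly."""
    def retrieve(self, query: str) -> RetrievalResult:
        return RetrievalResult(chunks=[], trace={"mode": "no_rag"})
```

Because the UI only ever calls `retrieve()`, adding an eighth strategy is one new subclass plus a dropdown entry.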
Edit `config.py` to change:

```python
LLM_BASE_URL = "http://localhost:1234/v1"  # Any OpenAI-compatible endpoint
LLM_MODEL_NAME = "your-model-name"         # Model loaded in your LLM server
CHILD_CHUNK_SIZE = 300                     # Tokens for search index
PARENT_CHUNK_SIZE = 800                    # Tokens for LLM context
TOP_K_RETRIEVAL = 10                       # Initial candidates
TOP_K_RERANK = 5                           # After reranking
GRAPH_HOPS_DEFAULT = 1                     # Graph traversal depth (1–3)
FABLE_NUM_LEVELS = 2                       # Hierarchy depth
FABLE_TOP_K_BRANCHES = 3                   # Branches explored top-down
MACER_MAX_ITERATIONS = 3                   # Max iterative refinement loops
```

MIT