RAG Lab introduces a closed-loop retrieval intelligence framework that augments conventional RAG pipelines with post-hoc evaluation, causal diagnosis, and optimization guidance.
A modular Streamlit app to compare seven Retrieval-Augmented Generation strategies side-by-side:
| Mode | What it does |
|---|---|
| No RAG | Direct LLM call — no retrieval |
| Vector RAG | Hybrid vector + BM25 search with optional cross-encoder reranking |
| Graph RAG | Knowledge graph entity traversal (configurable 1–3 hops) |
| Vector + Graph RAG | Combined vector search + graph traversal for broader context |
| Agentic RAG | LangGraph ReAct agent that autonomously picks tools |
| FABLE RAG | Hierarchical bi-path retrieval — top-down semantic + bottom-up vector |
| MACER RAG | Multi-agent iterative context evolution (Retriever → Constructor → Reflector loop) |
Most RAG systems stop at generation. RAG Lab closes the loop: after every response it evaluates what was retrieved, diagnoses why quality was high or low, and surfaces concrete configuration changes to improve the next run — without any manual inspection.
The seven retrieval strategies serve as a comparison surface. Graph RAG excels at relational queries ("Who founded X?"), Vector RAG at semantic similarity, Agentic RAG at multi-step reasoning, and MACER at progressively refining context when one pass isn't enough. The closed-loop layer — LLM-as-a-Judge, noise analysis, efficiency metrics, and optimization suggestions — applies equally to all of them.
- Frontend: Streamlit
- LLM: Any OpenAI-compatible API (default: LM Studio at `localhost:1234`)
- Embeddings: `sentence-transformers/all-MiniLM-L6-v2` (runs locally)
- Reranker: `cross-encoder/ms-marco-MiniLM-L-6-v2`
- Vector Store: ChromaDB (persistent, cosine similarity)
- Knowledge Graph: NetworkX (persisted to JSON)
- Agent: LangGraph ReAct with `search_vector`, `search_graph`, `verify_info` tools
- Clustering: scikit-learn k-means (FABLE hierarchy)
- Python: 3.13 required (3.14 breaks Pydantic v1 used by ChromaDB/LangChain)
```shell
# 1. Clone
git clone https://github.com/paloknath/rag_lab.git
cd rag_lab

# 2. Create venv with Python 3.13
python3.13 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# 3. Install
pip install -r requirements.txt

# 4. Start your LLM server (e.g. LM Studio on port 1234)

# 5. Run
streamlit run app.py --server.headless true
```

Open http://localhost:8501 in your browser.
- Upload PDF or TXT files via the sidebar and click Ingest
- Select a retrieval mode from the dropdown
- Chat — ask questions about your documents
- Compare — switch modes and ask the same question to see the difference
- Hybrid Alpha slider: `1.0` = pure vector, `0.0` = pure BM25
- Reranking toggle: enable/disable cross-encoder reranking
- Graph Traversal Hops slider: 1–3 hops from matched entities (more hops = broader but potentially noisier context)
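The Hybrid Alpha slider blends the two ranked result sets. A minimal sketch of alpha-weighted score fusion (the function names and the normalized-score dicts are illustrative, not the app's actual internals):

```python
def hybrid_score(vector_score: float, bm25_score: float, alpha: float) -> float:
    """Blend normalized scores: alpha=1.0 is pure vector, 0.0 is pure BM25."""
    return alpha * vector_score + (1.0 - alpha) * bm25_score

def fuse(vector_hits: dict, bm25_hits: dict, alpha: float, top_k: int = 5) -> list:
    """Merge two result sets (doc_id -> normalized score) into one ranking.

    Documents missing from one ranking get a score of 0.0 for that signal.
    """
    doc_ids = set(vector_hits) | set(bm25_hits)
    scored = {
        doc_id: hybrid_score(vector_hits.get(doc_id, 0.0),
                             bm25_hits.get(doc_id, 0.0), alpha)
        for doc_id in doc_ids
    }
    return sorted(scored, key=scored.get, reverse=True)[:top_k]
```

With `alpha=1.0` the BM25 scores are ignored entirely, which is why the slider endpoints behave as "pure vector" and "pure BM25".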
The Top-Down Branches slider controls how many cluster branches are explored during semantic hierarchy navigation. FABLE runs two retrieval paths simultaneously — a top-down semantic traversal from cluster summaries to leaf chunks, and a bottom-up vector search that gathers cluster context for matched documents.
The bi-path trace is shown in the Hierarchy Navigation Trace panel.
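The top-down path can be pictured as a greedy descent of the cluster tree, keeping the most query-similar branches at each level. A sketch under assumed node shapes (`children`, `embedding`, `chunks` are illustrative field names, not FABLE's real data model):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_down(node: dict, query_emb: np.ndarray, branches: int = 2) -> list:
    """Descend a summary-cluster tree, following the `branches` children whose
    summary embeddings are most similar to the query; return leaf chunks."""
    if not node["children"]:          # leaf node: holds actual document chunks
        return node["chunks"]
    ranked = sorted(node["children"],
                    key=lambda c: cosine(query_emb, c["embedding"]),
                    reverse=True)
    leaves = []
    for child in ranked[:branches]:   # Top-Down Branches slider caps this
        leaves.extend(top_down(child, query_emb, branches))
    return leaves
```

The bottom-up path would run an ordinary vector search first and then walk upward to attach each hit's cluster summaries as extra context.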
The Max Iterations slider caps the retriever-constructor-reflector loop. The loop exits early when the Reflector agent judges the accumulated context sufficient. The full iteration trace is shown in the Iteration Trace panel.
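The loop structure described above can be sketched as follows, with the three agents passed in as plain callables (a simplification of the actual multi-agent implementation):

```python
def macer_loop(query, retrieve, construct, reflect, max_iterations=3):
    """Retriever -> Constructor -> Reflector loop with early exit.

    `retrieve`, `construct`, and `reflect` stand in for the three agents:
    fetch new chunks, fold them into the accumulated context, and judge
    whether that context is now sufficient to answer the query.
    """
    context, trace = [], []
    for i in range(max_iterations):
        chunks = retrieve(query, context)       # Retriever: gather new evidence
        context = construct(context, chunks)    # Constructor: merge / extract facts
        sufficient = reflect(query, context)    # Reflector: enough to answer?
        trace.append({"iteration": i + 1,
                      "new_chunks": len(chunks),
                      "sufficient": sufficient})
        if sufficient:                          # early exit before the cap
            break
    return context, trace
```

The Max Iterations slider corresponds to `max_iterations`; the trace list is what a panel like Iteration Trace would render.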
The agent's thinking process is shown in a live Agent Trace panel — you can see which tools it calls and why.
Enable the LLM-as-a-Judge toggle in the sidebar to automatically score each response on four metrics after every query:
| Metric | What it measures |
|---|---|
| Context Relevance | How relevant are the retrieved passages to the query? (1–5) |
| Context Sufficiency | Do the passages provide enough information to fully answer? (1–5) |
| Faithfulness | Is the answer grounded in the context without hallucination? (1–5) |
| Answer Relevance | Does the answer directly address the query? (1–5) |
Scores are colour-coded: 🟢 ≥ 4, 🟡 ≥ 3, 🔴 < 3. Adds ~3–6s latency.
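The colour bands map directly onto the 1–5 scale. A trivial sketch of that mapping (illustrative, not the app's rendering code):

```python
def traffic_light(score: float) -> str:
    """Map a 1-5 judge score to the colour bands used in the UI."""
    if score >= 4:
        return "🟢"
    if score >= 3:
        return "🟡"
    return "🔴"
```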
Enable the Context Noise Analysis toggle to classify every retrieved chunk as relevant, partial, or irrelevant relative to the query. Results are shown after the answer is rendered (non-blocking) with colour-coded per-chunk highlights:
- ✅ Relevant — directly supports the query
- ⚠️ Partial — tangentially related or partially useful
- ❌ Irrelevant — off-topic noise
The Noise Ratio (irrelevant chunks / total chunks) gives an at-a-glance signal of retrieval precision. Adds ~2–4s latency.
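Given the per-chunk labels above, the Noise Ratio is a one-line computation (label strings here are assumed to match the three categories):

```python
from collections import Counter

def noise_ratio(labels: list) -> float:
    """Noise Ratio = irrelevant chunks / total retrieved chunks."""
    if not labels:
        return 0.0
    return Counter(labels)["irrelevant"] / len(labels)
```

A ratio near 0.0 means the retriever is precise; a ratio creeping toward 0.5+ suggests lowering `TOP_K_RETRIEVAL`, enabling reranking, or reducing graph hops.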
```
app.py             → Streamlit UI (sidebar, chat, metrics, evaluation, noise analysis)
config.py          → All configuration constants
ingestion.py       → Document loading, chunking, embedding, KG + FABLE hierarchy
retrievers.py      → Strategy pattern: BaseRetriever + 7 implementations
evaluation.py      → LLM-as-a-Judge: 4-metric quality scoring
noise_analysis.py  → Context Noise Analyzer: per-chunk relevance classification
```
- Strategy Pattern — all retrievers share a common `retrieve(query) -> RetrievalResult` interface, making it easy to add new strategies
- Parent-Document Retrieval — child chunks (300 tokens) are indexed for search, but parent chunks (800 tokens) are returned to the LLM for richer context
- Knowledge Graph — the LLM extracts `(Subject, Predicate, Object)` triplets during ingestion, stored in NetworkX and persisted to disk
- FABLE Hierarchy — k-means cluster tree with LLM-generated summaries, built at ingestion time and queried via cosine similarity at retrieval time
- MACER Loop — iterative context refinement: each cycle retrieves new chunks, extracts facts, and evaluates sufficiency before deciding to continue or stop
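The strategy pattern boils down to a small shared contract. A sketch of the `retrieve(query) -> RetrievalResult` interface named above (the dataclass fields and the example subclass are illustrative, not the repo's exact definitions):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

@dataclass
class RetrievalResult:
    """Common return type shared by every strategy."""
    chunks: list = field(default_factory=list)   # parent chunks handed to the LLM
    trace: dict = field(default_factory=dict)    # per-strategy debug info

class BaseRetriever(ABC):
    @abstractmethod
    def retrieve(self, query: str) -> RetrievalResult:
        """Every strategy implements only this one method."""

class NoRAGRetriever(BaseRetriever):
    """Trivial strategy: no retrieval, the LLM answers directly."""
    def retrieve(self, query: str) -> RetrievalResult:
        return RetrievalResult(chunks=[], trace={"mode": "no_rag"})
```

Because the UI only ever calls `retrieve()`, adding an eighth strategy is one new subclass plus a dropdown entry.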
Edit `config.py` to change:

```python
LLM_BASE_URL = "http://localhost:1234/v1"  # Any OpenAI-compatible endpoint
LLM_MODEL_NAME = "your-model-name"         # Model loaded in your LLM server
CHILD_CHUNK_SIZE = 300                     # Tokens for search index
PARENT_CHUNK_SIZE = 800                    # Tokens for LLM context
TOP_K_RETRIEVAL = 10                       # Initial candidates
TOP_K_RERANK = 5                           # After reranking
GRAPH_HOPS_DEFAULT = 1                     # Graph traversal depth (1–3)
FABLE_NUM_LEVELS = 2                       # Hierarchy depth
FABLE_TOP_K_BRANCHES = 3                   # Branches explored top-down
MACER_MAX_ITERATIONS = 3                   # Max iterative refinement loops
```

MIT