RAG Studio

A web-based Retrieval-Augmented Generation (RAG) system that allows users to upload documents, ask questions, and receive answers grounded explicitly in retrieved source content.

Architecture

What is RAG Studio?

RAG Studio demonstrates the core RAG pipeline:

Upload PDF or TXT documents
Index documents using semantic embeddings
Query documents with natural language questions
Generate answers grounded in retrieved document chunks
Attribute answers by showing the exact source chunks used

The system emphasizes source transparency by always showing which document chunks were used to generate each answer.
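To make the "semantic embeddings" step concrete, here is a minimal sketch of how chunks can be ranked by cosine similarity against a query vector. The tiny hand-written vectors stand in for real embeddings, and the names cosine_similarity and rank_chunks are hypothetical; in the project the embedding and search steps are handled by SentenceTransformers and ChromaDB, whose exact usage is not shown here.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as "no similarity"
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)


def rank_chunks(
    query_vec: List[float], chunk_vecs: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Score every chunk against the query and sort best-first."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


# Toy vectors (assumption): real embeddings have hundreds of dimensions.
toy_chunks = {
    "chunk-about-pricing": [1.0, 0.0],
    "chunk-about-setup": [0.0, 1.0],
}
ranking = rank_chunks([1.0, 0.0], toy_chunks)
```

The chunk whose vector points in the same direction as the query ranks first, which is the property that lets a question match text that uses different wording.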

Features

Multi-Query Retrieval - Automatically rewrites queries in multiple ways for better document matching
Smart Similarity Filtering - Applies the similarity threshold only when more than 5 documents are indexed
Strict Document Grounding - Answers are generated only when relevant document context exists
Multi-document Support - Upload and manage multiple documents
Document Management - Delete documents and their embeddings
Source Attribution - Shows which documents were used for each answer
Semantic Search - ChromaDB vector database for retrieval
Answer Generation - Google Gemini for grounded responses
Clean UI - Streamlit chat interface with left-right layout

Multi-Query Vector Search

The system improves retrieval by:

Query Analysis - LLM analyzes user intent
Query Rewriting - Generates 2-3 alternative phrasings
Multi-Vector Search - Searches with all query variations
Result Aggregation - Deduplicates and ranks by best scores
Quality Boost - Documents matching multiple queries rank higher
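The aggregation and boost steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the function name aggregate_results, the additive boost, and its default value of 0.05 are all hypothetical, and scores are assumed to be similarities where higher is better.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One result list per query variation; each entry is (chunk_id, similarity).
QueryResults = List[Tuple[str, float]]


def aggregate_results(
    per_query_results: List[QueryResults], boost: float = 0.05
) -> List[Tuple[str, float]]:
    """Merge results from several query variations into one ranked list.

    Each chunk keeps its best score across all variations. A chunk matched
    by more than one variation gets a small additive boost per extra match
    (the boost rule is an assumption made for this sketch).
    """
    best: Dict[str, float] = {}
    hits: Dict[str, int] = defaultdict(int)

    for results in per_query_results:
        counted = set()  # count each chunk at most once per variation
        for chunk_id, score in results:
            best[chunk_id] = max(score, best.get(chunk_id, float("-inf")))
            if chunk_id not in counted:
                counted.add(chunk_id)
                hits[chunk_id] += 1

    ranked = [(cid, best[cid] + boost * (hits[cid] - 1)) for cid in best]
    # Sort by final score descending, then by id so ties are deterministic.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


example = [
    [("a", 0.9), ("b", 0.6)],  # results for the original query
    [("b", 0.7), ("c", 0.8)],  # results for a rewritten query
]
ranked_default = aggregate_results(example)
ranked_strong = aggregate_results(example, boost=0.15)
```

With the default boost, chunk "b" stays behind "c" even though two variations matched it; with a larger boost it moves ahead. The size of the boost controls how much agreement between variations is trusted over a single strong match.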

Grounding Enforcement

When more than 5 documents are indexed, the system declines to answer if no retrieved chunk meets the similarity threshold.

When you ask a question:

The system rewrites your query into multiple phrasings.
It searches the uploaded documents with every variation.
If more than 5 documents are indexed, it applies similarity threshold filtering (0.5).
If 5 or fewer documents are indexed, it returns all matches (no threshold).
If no relevant content is found while the threshold applies, it says so explicitly.
The LLM is called only when relevant context exists.
All answers include source attribution.
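The gating logic described above might look something like the sketch below. The 0.5 threshold and the 5-document cutoff come from this README; everything else is an assumption made for illustration, including the function names, the refusal message, the use of >= at the threshold boundary, and the generate callable that stands in for the Gemini call.

```python
from typing import Callable, Dict, List, Tuple

SIMILARITY_THRESHOLD = 0.5  # from the README
DOC_COUNT_CUTOFF = 5        # threshold applies only above this many documents
NO_CONTEXT_MESSAGE = "No relevant content was found in the uploaded documents."  # hypothetical wording

Match = Tuple[str, float]  # (chunk_text, similarity), higher is better


def select_context(matches: List[Match], num_indexed_docs: int) -> List[Match]:
    """Apply the similarity threshold only when enough documents are indexed."""
    if num_indexed_docs > DOC_COUNT_CUTOFF:
        return [m for m in matches if m[1] >= SIMILARITY_THRESHOLD]
    return list(matches)


def answer_question(
    question: str,
    matches: List[Match],
    num_indexed_docs: int,
    generate: Callable[[str, List[str]], str],
) -> Dict[str, object]:
    """Return a grounded answer, or an explicit refusal without calling the LLM."""
    context = select_context(matches, num_indexed_docs)
    if not context:
        return {"answer": NO_CONTEXT_MESSAGE, "sources": []}
    chunks = [text for text, _ in context]
    return {"answer": generate(question, chunks), "sources": chunks}


# Stand-in generator that records whether it was called.
calls: List[str] = []


def fake_generate(question: str, chunks: List[str]) -> str:
    calls.append(question)
    return f"answer based on {len(chunks)} chunk(s)"


refused = answer_question("q1", [("weak match", 0.4)], 6, fake_generate)
small_corpus = answer_question("q2", [("weak match", 0.4)], 3, fake_generate)
filtered = answer_question("q3", [("weak", 0.4), ("strong", 0.6)], 6, fake_generate)
```

Returning early before the generate call is what keeps the LLM from being invoked without context; the sources list is built from the same chunks that were passed to the generator, so attribution cannot drift from what the model actually saw.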

System Components

Backend (FastAPI):

/upload - Accept and store document files
/index - Extract text, chunk, embed, and store in vector database
/query - Retrieve relevant chunks and generate grounded answers
/documents - List all indexed documents
/documents/{filename} - Delete a document and its embeddings
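The /index endpoint splits extracted text into chunks before embedding. The README does not say how chunks are formed, so the following is only one common approach: fixed-size character windows with a small overlap. The function name chunk_text and the default sizes are hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into overlapping character windows.

    The overlap repeats the tail of one chunk at the head of the next so
    that a sentence cut at a boundary still appears whole in one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece.strip():  # skip windows that contain only whitespace
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample_chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
```

Smaller chunks give more precise source attribution, while larger chunks carry more surrounding context into the prompt; the overlap is a hedge against splitting a relevant passage across two chunks.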

Components:

SentenceTransformers for embeddings (local model)
ChromaDB for vector storage (persistent)
Google Gemini for answer generation
Streamlit frontend with chat interface

Setup Instructions

Prerequisites

Python 3.8 or higher
Google Gemini API key - Get one at https://aistudio.google.com/api-keys

Installation

Clone the repository:

git clone https://github.com/nkmohit/rag-studio.git
cd rag-studio

Install dependencies:

pip install -r requirements.txt

Download the embedding model:

python download_model.py

This downloads the sentence-transformers model to ./utils/models/retriever/

Create environment file:

echo "GEMINI_API_KEY=your_api_key_here" > .env

Replace your_api_key_here with your actual Gemini API key.

Running the Application

Start the backend server:

uvicorn main:app --reload

The backend runs at http://localhost:8000

In a separate terminal, start the Streamlit UI:

streamlit run streamlit_app.py

The UI opens automatically in your browser at http://localhost:8501

Usage

Upload Documents - Click "Upload New Document" and select PDF or TXT files
Manage Documents - View all indexed documents in the left panel
Ask Questions - Type questions in the chat interface
View Sources - See which documents and chunks were used for each answer

Design Principles

Multi-Query Retrieval - Rewrite queries for comprehensive document coverage
Strict Grounding - Never answer without relevant document context
Similarity Filtering - Retrieve only chunks above the similarity threshold (applied when more than 5 documents are indexed)
Source Attribution - Always show which documents were used
Clear Separation of Concerns - Loader, embeddings, generation are independent
Explicit Over Implicit - Clear error messages when context is missing
No Hallucination - LLM instructed to answer only from provided context

License

MIT License - See LICENSE file for details.
