A fully local, offline Retrieval-Augmented Generation (RAG) pipeline that lets you ask natural language questions over a collection of PDF research papers — with cited answers powered by a local LLM (no API key required).
- Ingests PDF research papers using PyMuPDF
- Chunks each page into ~400-token pieces with overlap
- Embeds chunks into 384-dimensional vectors using Sentence Transformers
- Indexes them in a FAISS vector store for fast similarity search
- Retrieves the top-4 most relevant passages for any query
- Generates a grounded answer with source citations using a local Ollama LLM
```
PDF Papers
    │
    ▼
Text Extraction (PyMuPDF)
    │
    ▼
400-token Chunks (tiktoken + LangChain splitter)
    │
    ▼
Embeddings (sentence-transformers/all-MiniLM-L6-v2 · 384-dim)
    │
    ▼
FAISS Index (IndexFlatIP · cosine similarity)
    │
    ▼ ◄── User Question (embedded the same way)
Top-4 Retrieval
    │
    ▼
LLM (Ollama · llama3.2) ──► Answer + Citations
```
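The ~400-token chunks with 50-token overlap (tiktoken for counting, LangChain's splitter for the actual splitting) boil down to a sliding window over tokens. A minimal pure-Python sketch of that windowing logic (a simplification of what the splitter does, not the pipeline's actual code):

```python
def chunk_tokens(tokens, chunk_size=400, overlap=50):
    """Split a token list into overlapping windows.

    Consecutive chunks share `overlap` tokens, so sentences near a
    chunk boundary still appear with surrounding context.
    """
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

# In the real pipeline the tokens come from tiktoken's cl100k_base encoding:
#   enc = tiktoken.get_encoding("cl100k_base")
#   tokens = enc.encode(page_text)
```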
| Component | Library / Tool |
|---|---|
| PDF parsing | PyMuPDF (fitz) |
| Tokenization | tiktoken (cl100k_base) |
| Text splitting | LangChain RecursiveCharacterTextSplitter |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| Vector store | FAISS (IndexFlatIP) |
| LLM | Ollama (llama3.2 3B — local, free, offline) |
| Orchestration | LangChain (langchain-core, langchain-ollama) |
| Interface | Jupyter Notebook |
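A note on the vector store: `IndexFlatIP` scores by inner product, which equals cosine similarity only when the embeddings are L2-normalized first. A NumPy sketch of what the index computes at query time (illustrative, not the notebook's code):

```python
import numpy as np

def cosine_top_k(query_vec, doc_matrix, k=4):
    """Brute-force equivalent of FAISS IndexFlatIP over normalized vectors:
    inner product == cosine similarity, search == top-k over all rows."""
    # L2-normalize so dot products equal cosine similarities
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = d @ q                   # one similarity per stored chunk
    top = np.argsort(-scores)[:k]    # indices of the k most similar chunks
    return top, scores[top]
```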
- Python 3.11+ (via conda recommended)
- Ollama installed and running
- `llama3.2` model pulled
```bash
git clone https://github.com/<your-username>/RAG-Research-QA.git
cd RAG-Research-QA
conda create -n rag_env python=3.11 -y
conda activate rag_env
pip install -r requirements.txt
```

```bash
# macOS
brew install ollama
brew services start ollama
ollama pull llama3.2
```

Drop any PDF research papers into the `papers/` folder.

```bash
jupyter notebook RAG_Pipeline.ipynb
```

Register the kernel if needed:

```bash
python -m ipykernel install --user --name rag_env --display-name "RAG Project (Python 3.11)"
```

| Cell | Purpose |
|---|---|
| Cell 1 | Install all dependencies |
| Cell 2 | Configure paths, chunk size, model names |
| Cell 3 | Extract text from all PDFs in papers/ |
| Cell 4 | Split pages into 400-token chunks |
| Cell 5 | Generate embeddings (384-dim, CPU) |
| Cell 6 | Build & save FAISS index to data/ |
| Cell 6b | (Optional) Load existing index — skip re-embedding |
| Cell 7 | Ask a question → get answer + citations |
Typical workflow:
- First time / adding new papers: Run Cells 2 → 3 → 4 → 5 → 6, then Cell 7
- Returning session: Run Cell 6b (loads saved index), then Cell 7
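Cell 7's answer step comes down to stuffing the retrieved passages into a prompt that instructs the LLM to cite its sources. A sketch of that prompt assembly (the chunk dict layout and instruction wording here are assumptions, not the notebook's exact code):

```python
def build_prompt(question, chunks):
    """Assemble retrieved chunks into a grounded, citable prompt.

    Each chunk is assumed to be a dict with 'text', 'source', and
    'page' keys, as produced during ingestion.
    """
    context = "\n\n".join(
        f"[{i + 1}] ({c['source']}, p. {c['page']}) {c['text']}"
        for i, c in enumerate(chunks)
    )
    return (
        "Answer the question using ONLY the context below. "
        "Cite sources as [1], [2], ... after each claim. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

The resulting string is what gets passed to the Ollama model; the numbered tags let the answer's citations be mapped back to specific papers and pages.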
```
RAG/
├── RAG_Pipeline.ipynb               # Main notebook — full pipeline
├── requirements.txt                 # Python dependencies
├── .env.example                     # Environment variable template
├── papers/                          # Drop your PDF papers here
│   ├── attention_is_all_you_need.pdf
│   ├── bert.pdf
│   ├── gpt3.pdf
│   ├── llama.pdf
│   └── rag_paper.pdf
└── data/                            # Auto-generated (gitignored)
    ├── index.faiss                  # FAISS vector index
    └── metadata.pkl                 # Chunk metadata
```
All settings are in Cell 2 of the notebook:
```python
PAPERS_DIR = "papers"       # folder with your PDFs
CHUNK_SIZE = 400            # tokens per chunk
CHUNK_OVERLAP = 50          # token overlap between chunks
TOP_K = 4                   # number of passages retrieved per query
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
OLLAMA_MODEL = "llama3.2"   # any model you have pulled in Ollama
```

| Metric | Value |
|---|---|
| Embedding model size | ~90 MB |
| Embedding speed | ~500 chunks/min on CPU |
| FAISS search latency | < 50 ms for 18K vectors |
| LLM response time | ~1–2 min (llama3.2 on CPU) |
| Scales to | ~10,000+ papers before needing ANN index |
This setup handles 200+ papers with no code changes — just drop PDFs in and re-run Cells 3→6.
For 10,000+ papers, switch to an approximate index:
```python
# In Cell 6, replace IndexFlatIP with an IVF index. Pass the inner-product
# metric explicitly — IndexIVFFlat defaults to L2:
quantizer = faiss.IndexFlatIP(dim)
index = faiss.IndexIVFFlat(quantizer, dim, 100, faiss.METRIC_INNER_PRODUCT)
index.train(embeddings)  # learn the 100 cluster centroids
index.add(embeddings)    # then add the vectors to the trained index
```

MIT