This repository contains a small retrieval-augmented generation (RAG) pipeline that I built to teach myself how modern LLM tooling works. The documents in the `data/` directory (not included in this repo for copyright reasons) are ingested, chunked, embedded into a Chroma vector database, and then queried through an Ollama-hosted Llama 3.2 model.
Although the data is personal, the code path demonstrates the full pipeline you would follow to build a grounded Q&A assistant over any small PDF/TXT corpus and optionally expose it via a Streamlit UI.
- Ingestion script that reads PDFs/TXT, splits them with LangChain text splitters, and stores embeddings in a persistent Chroma vector database.
- CLI query interface plus a Streamlit-powered GUI that reload the vector store, track recent Q&A history, and send grounded prompts to Llama 3.2 via Ollama.
- Pytest suite that validates ingestion, retrieval, and prompt construction logic with lightweight stubs.
- Clear separation between data preparation (`ingest.py`), inference (`query.py`), and presentation (`app.py`).
| Path | Description |
|---|---|
| `data/` | Place PDF/TXT files here before running ingestion (not bundled in the repo). |
| `vectorstore/` | Folder where the persisted Chroma DB is written; created after ingestion. |
| `src/ingest.py` | Orchestrates the ingestion pipeline. See the method breakdown below. |
| `src/query.py` | Command-line interface for grounded, history-aware question answering. |
| `src/app.py` | Streamlit front end that wraps the query pipeline, including session-level history. |
| `src/utils.py` | Placeholder for helper functions (currently unused). |
| `tests/test_ingest.py` | Pytest coverage for document loading, chunking, and vector store creation. |
| `tests/test_query.py` | Tests prompt formatting, model initialization, and the query loop. |
| `tests/test_retrieval.py` | Ensures retrieved snippets and sources are surfaced when querying. |
| `requirements.txt` | Python dependencies used across ingestion and query steps. |
load_documents(data_dir="./data"): Iterates through PDFs/TXT files, loads their contents using LangChain loaders, and keeps a running count for visibility.chunk_documents(documents): AppliesRecursiveCharacterTextSplitter(chunk size 1000, overlap 200) so downstream retrieval captures enough context while staying efficient.create_vecstore(chunks, persist_dir="./vectorstore"): Generates embeddings using HuggingFacesentence-transformers/all-MiniLM-L6-v2, builds a Chroma vector database, and persists it for reuse during querying.main(): High-level wrapper that calls the three steps in sequence, prints progress, and guides the user to runquery.pynext.
load_vecstore(persist_dir="./vectorstore"): Reloads the Chroma store with the same embedding function so similarity search returns vector-compatible results.init_llm(): Spins up an Ollama client pointed atllama3.2(temperature 0.1) for low-variance answers.format_docs(docs): Converts retrieved documents into a human-readable context string, including per-document metadata (source, page).create_prompt(context, question, history): Crafts instructions that force the LLM to stay grounded in retrieved context while also adding the last few Q&A turns for conversational continuity.query(vectorstore, llm, question, history, k=3): Implements the RAG loop—retrieve similar chunks, build a history-aware prompt, invoke the LLM, print the answer, and list sources with short excerpts.main(): CLI loop that keeps answering questions until the user typesquit/exit, appending successful exchanges to thehistorylist.
Streamlit app that caches the vector store/LLM, keeps conversational state in `st.session_state.history`, reuses the same retrieval and prompt-building logic from `query.py`, and renders answers with expandable source panels. Run it with `streamlit run src/app.py`.
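
For context, the caching pattern looks roughly like this. The helper names are borrowed from `query.py` and the sketch above, and it assumes `query()` returns the answer string rather than only printing it.

```python
# Rough shape of app.py: helper names are assumed to match query.py.
import streamlit as st

from query import init_llm, load_vecstore, query


@st.cache_resource  # build the vector store and LLM once per server process
def get_resources():
    return load_vecstore(), init_llm()


vectorstore, llm = get_resources()

if "history" not in st.session_state:
    st.session_state.history = []

question = st.text_input("Ask a question about the indexed documents")
if question:
    answer = query(vectorstore, llm, question, st.session_state.history)
    st.write(answer)
    with st.expander("Sources"):
        for doc in vectorstore.similarity_search(question, k=3):
            st.write(doc.metadata.get("source", "unknown"), doc.page_content[:200])
```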
- Create & activate an environment

  ```bash
  python -m venv .venv
  source .venv/bin/activate   # or .venv\Scripts\activate on Windows
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```
- Add your documents

  - Place PDF/TXT files into `./data`. In my case these were personal publications, so they were intentionally left out of the repository.
- Run ingestion

  ```bash
  python src/ingest.py
  ```

  This loads documents, chunks them, creates embeddings, and persists the Chroma vector store under `./vectorstore`.

- Start querying

  ```bash
  python src/query.py
  ```

  Type questions at the prompt. The script retrieves the top 3 relevant chunks, builds a grounded prompt for Llama 3.2, and displays both the answer and cited sources.
- Optional: launch the Streamlit UI

  ```bash
  streamlit run src/app.py
  ```

  This provides a simple web interface for the same retrieval pipeline.
- Run tests

  ```bash
  pytest
  ```

  Tests rely on stubs/mocks, so they can run without downloading large models.
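
To illustrate the stubbing idea (this is not a copy of the repository's tests), a fake embedding class can stand in for the sentence-transformers model so `create_vecstore` never downloads anything:

```python
# Illustration only: stub out the embedding model so tests run offline.
from langchain_core.documents import Document

import ingest  # assumes src/ is importable (e.g. via pytest's pythonpath option)


class FakeEmbeddings:
    """Stand-in for the sentence-transformers model: fixed-size dummy vectors."""

    def embed_documents(self, texts):
        return [[0.0] * 384 for _ in texts]

    def embed_query(self, text):
        return [0.0] * 384


def test_create_vecstore_with_stubbed_embeddings(monkeypatch, tmp_path):
    # Swap the real embedding class for the stub so no model download happens.
    monkeypatch.setattr(ingest, "HuggingFaceEmbeddings", lambda **kwargs: FakeEmbeddings())
    chunks = [Document(page_content="hello world", metadata={"source": "a.txt"})]
    store = ingest.create_vecstore(chunks, persist_dir=str(tmp_path))
    assert store.similarity_search("hello", k=1)
```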
- Make sure Ollama is installed locally and has the `llama3.2` model pulled (e.g. `ollama pull llama3.2`) before running `query.py`.
- Because this project is part of my personal portfolio, the code is intentionally lightweight and showcases end-to-end comprehension rather than production hardening.
- Potential future improvements: swap in a GPU-backed embedding model (the embedder is currently hardcoded for CPU; see the sketch below), add document upload to the Streamlit app, or package the ingestion/query steps into a small API service.
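
For the GPU point specifically, one plausible tweak (an assumption, not current repo code) is to pass a device through `model_kwargs` when constructing the embeddings:

```python
# Possible GPU-backed variant of the embedder used in create_vecstore
# (an assumption, not current repo code).
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cuda"},  # use "cpu" where no GPU is available
)
```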