A privacy-first, fully local RAG (Retrieval-Augmented Generation) system for querying your personal notes, PDFs, and Markdown files — powered by ChromaDB, SentenceTransformers, and Ollama.
| Feature | Details |
|---|---|
| 100% local | No cloud APIs — your data never leaves your machine |
| Multi-format ingestion | PDF, Markdown (`.md`), plain text (`.txt`) |
| Hybrid search | BM25 + dense vector search fused with Reciprocal Rank Fusion (RRF) |
| Query refinement loop | Iterative LLM-powered query rewriting for better retrieval |
| Streamlit UI | Chat interface, one-off search, summarisation, and insights tabs |
| CLI | Full-featured command-line interface + REPL |
| Idempotent ingestion | Re-ingesting the same file never creates duplicate chunks |
| Fully configurable | All parameters in `.env` or environment variables |
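One way the idempotent-ingestion guarantee can be achieved is with content-addressed chunk IDs: if an ID is derived deterministically from the chunk's source and text, re-ingesting an unchanged file upserts the same IDs instead of inserting new rows. A minimal sketch of the idea (illustrative only; the project's actual scheme in `ingestion.py` may differ):

```python
# Sketch: derive a stable chunk ID from source path + chunk text, so
# repeated ingestion of identical content maps to the same vector-store key.
import hashlib

def chunk_id(source_path: str, chunk_text: str) -> str:
    digest = hashlib.sha256(f"{source_path}\x00{chunk_text}".encode("utf-8"))
    return digest.hexdigest()[:16]

# Identical input always yields the same ID, so an upsert replaces the
# existing entry rather than duplicating it.
assert chunk_id("notes.md", "hello") == chunk_id("notes.md", "hello")
```

With IDs like these, ChromaDB's upsert semantics make re-ingestion a no-op for unchanged chunks.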
```
┌──────────────────────────────────────────────────────────────────┐
│                          User Interface                          │
│         CLI (cli.py)              Streamlit UI (app.py)          │
└──────────────────────────────┬───────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────────┐
│                RAG Pipeline (src/rag_pipeline.py)                │
│                                                                  │
│  ┌───────────────┐  ┌───────────────────┐  ┌───────────────┐     │
│  │   Ingestion   │  │  Query Rewriter   │  │  LLM Client   │     │
│  │ (ingestion.py)│  │(query_rewriter.py)│  │   (llm.py)    │     │
│  └───────┬───────┘  └─────────┬─────────┘  └───────┬───────┘     │
│          │                    │                    │             │
│          ▼                    ▼                    │             │
│  ┌───────────────┐  ┌───────────────────┐          │             │
│  │  Embeddings   │  │ Hybrid Retriever  │◄─────────┘             │
│  │(embeddings.py)│  │  (retrieval.py)   │                        │
│  └───────┬───────┘  └─────────┬─────────┘                        │
│          │                    │                                  │
│          ▼                    ▼                                  │
│      ┌──────────────────────────────────┐                        │
│      │           Vector Store           │                        │
│      │         (vector_store.py)        │                        │
│      │             ChromaDB             │                        │
│      └──────────────────────────────────┘                        │
└──────────────────────────────────────────────────────────────────┘
```
```
Ingestion:
  Document (PDF/MD/TXT) → Chunk → Embed (SentenceTransformers) → Store (ChromaDB)

Query:
  Query → [Refine loop] → Embed → Hybrid Search (BM25 + Vector + RRF) → LLM → Answer
```
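The fusion step in the query path can be sketched concretely. Reciprocal Rank Fusion scores each document as the sum of `1 / (k + rank)` over every ranked list it appears in, so documents that rank well in *both* BM25 and vector search float to the top. A minimal illustration (function and variable names are hypothetical, not the project's API):

```python
# Sketch of Reciprocal Rank Fusion (RRF) over two ranked lists of doc IDs.
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse ranked lists; k is the smoothing constant (RRF_K in .env)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]    # lexical ranking
vector_hits = ["doc1", "doc5", "doc3"]  # dense ranking
print(rrf_fuse([bm25_hits, vector_hits])[:2])  # → ['doc1', 'doc3']
```

`doc1` wins because it appears near the top of both lists, even though neither retriever ranked it first everywhere; this is the behaviour that makes RRF a robust default for hybrid search.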
```bash
# Python 3.10+
python --version

# Install Ollama from https://ollama.com
ollama serve
ollama pull llama3   # or mistral, phi3, gemma2, etc.
```

```bash
git clone https://github.com/YOUR_USERNAME/rag-knowledge-base
cd rag-knowledge-base
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

```bash
cp .env.example .env
# Edit .env to set your preferred model and paths
```

```bash
# Ingest a directory of notes
python cli.py ingest ./docs_sample

# Or a single file
python cli.py ingest ./my_notes/important.pdf
```

```bash
# CLI one-shot
python cli.py query "What are the main components of a RAG system?"

# Streaming response
python cli.py query "Explain hybrid search" --stream

# Streamlit UI
streamlit run app.py
```

```
usage: rag-kb [-h] [--db DB] [--model MODEL] {ingest,query,summarise,insights,stats,repl} ...

Commands:
  ingest      Ingest a file or directory
  query       Ask a question (with optional streaming & refinement)
  summarise   Summarise a topic from the knowledge base
  insights    Generate insights from the knowledge base
  stats       Show knowledge base statistics
  repl        Interactive REPL mode
```
```bash
# Ingest
python cli.py ingest ./my_notes --no-recursive

# Query with options
python cli.py query "What is BM25?" --mode bm25 --stream
python cli.py query "Explain RAG" --no-refine

# Summarise
python cli.py summarise "data engineering best practices"

# Insights
python cli.py insights "machine learning"

# Stats
python cli.py stats

# Interactive REPL
python cli.py repl
# In REPL: /mode hybrid|semantic|bm25, /stats, /quit
```

```bash
streamlit run app.py
```

Opens at http://localhost:8501 with four tabs:
| Tab | Feature |
|---|---|
| Chat | Streaming Q&A with source attribution |
| Search | Raw hybrid/semantic/BM25 search results |
| Insights | AI-generated insights from your notes |
| Summarise | Topic summarisation |
All settings can be configured via `.env` or environment variables:

| Variable | Default | Description |
|---|---|---|
| `CHROMA_DB_PATH` | `./chroma_db` | ChromaDB persistence directory |
| `EMBEDDING_MODEL` | `all-MiniLM-L6-v2` | SentenceTransformer model |
| `EMBEDDING_DEVICE` | `cpu` | `cpu` or `cuda` |
| `OLLAMA_MODEL` | `llama3` | Ollama model name |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama server URL |
| `TOP_K_SEMANTIC` | `10` | Dense retrieval candidates |
| `TOP_K_BM25` | `10` | BM25 retrieval candidates |
| `TOP_K_FINAL` | `5` | Final chunks passed to LLM |
| `RRF_K` | `60` | RRF smoothing constant |
| `CHUNK_SIZE` | `512` | Words per chunk |
| `CHUNK_OVERLAP` | `64` | Overlap between chunks |
| `MAX_REFINEMENT_LOOPS` | `2` | Query refinement iterations |
| `REFINEMENT_SCORE_THRESHOLD` | `0.35` | Minimum relevance score to skip refinement |
```bash
pytest tests/ -v
pytest tests/ --cov=src --cov-report=term-missing
```

| Format | Extension | Notes |
|---|---|---|
| Markdown | `.md` | YAML front-matter is stripped |
| Plain text | `.txt` | UTF-8 |
| PDF | `.pdf` | Text extraction via pdfplumber |
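The front-matter stripping noted for Markdown above can be done with a few lines of string handling. A minimal sketch (illustrative; the real ingestion code may handle more edge cases):

```python
# Sketch: drop a leading YAML front-matter block ("---" ... "---") from
# a Markdown document before chunking, so metadata keys are not embedded.
def strip_front_matter(text: str) -> str:
    if text.startswith("---\n"):
        end = text.find("\n---\n", 4)  # closing delimiter after the opener
        if end != -1:
            return text[end + 5:]
    return text  # no front-matter: return unchanged

doc = "---\ntitle: Notes\ntags: [rag]\n---\n# Heading\nBody text\n"
print(strip_front_matter(doc))  # → "# Heading\nBody text\n"
```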
- Cross-encoder reranking — add a `ms-marco-MiniLM` reranker after retrieval
- HyDE — Hypothetical Document Embeddings for better open-domain QA
- Multi-modal — embed images from PDFs (CLIP embeddings)
- Conversation memory — multi-turn chat with context window management
- Document tagging — auto-tag chunks with topics for filtered search
- Evaluation harness — RAGAS-based automated quality scoring
- Web scraping ingestion — ingest URLs directly into the knowledge base
- Export — export Q&A sessions as Markdown reports
- Fork the repo
- Create a feature branch (`git checkout -b feat/cross-encoder-reranker`)
- Commit your changes (`git commit -m "feat: add cross-encoder reranker"`)
- Push and open a PR
MIT — see LICENSE for details.
- ChromaDB — vector database
- SentenceTransformers — embedding models
- Ollama — local LLM inference
- rank-bm25 — BM25 implementation
- pdfplumber — PDF text extraction
- Streamlit — web UI framework