A production-quality document question-answering system built on a Retrieval-Augmented Generation (RAG) pipeline. Upload PDF, TXT, or Markdown files and ask natural-language questions — the system retrieves the most relevant passages and presents them with relevance scores and source citations.
Runs entirely locally with no API keys required. Optionally enable generative answers via OpenAI.
Architecture
```
┌─────────────────────────────────────────────────────────┐
│                      Streamlit UI                       │
│  ┌──────────────┐  ┌─────────────────────────────────┐  │
│  │   Sidebar    │  │         Chat Interface          │  │
│  │  - Upload    │  │        User question            │  │
│  │  - Stats     │  │             │                   │  │
│  │  - Settings  │  │             ▼                   │  │
│  │  - Clear     │  │  Retrieved passages + scores    │  │
│  └──────────────┘  └─────────────────────────────────┘  │
└────────────┬────────────────────┬───────────────────────┘
             │ upload             │ query
             ▼                    ▼
┌────────────────────┐  ┌────────────────────┐
│  Ingestion Layer   │  │  Retrieval Layer   │
│  ┌──────────────┐  │  │  ┌──────────────┐  │
│  │  PDF / TXT   │  │  │  │  Embedding   │  │
│  │   Loader     │  │  │  │   Engine     │  │
│  └──────┬───────┘  │  │  └──────┬───────┘  │
│         ▼          │  │         ▼          │
│  ┌──────────────┐  │  │  ┌──────────────┐  │
│  │  Recursive   │  │  │  │    FAISS     │  │
│  │   Chunker    │  │  │  │  VectorStore │  │
│  └──────┬───────┘  │  │  └──────┬───────┘  │
│         ▼          │  │         ▼          │
│  ┌──────────────┐  │  │  ┌──────────────┐  │
│  │  Sentence    │  │  │  │  Retriever   │  │
│  │  Transformer │  │  │  │  (top-k +    │  │
│  │  Embeddings  │  │  │  │   scoring)   │  │
│  └──────────────┘  │  │  └──────────────┘  │
└────────────────────┘  └────────┬───────────┘
                                 │
                                 ▼
                      ┌────────────────────┐
                      │      QA Chain      │
                      │  Extractive mode:  │
                      │   ranked passages  │
                      │  Generative mode:  │
                      │   LLM synthesis    │
                      └────────────────────┘
```
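The Recursive Chunker box above is the heart of the ingestion path. A minimal sketch of paragraph-aware splitting with overlap — the parameter names `chunk_size` and `overlap` are illustrative, and the real `app/ingestion/chunker.py` may differ:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text on paragraph boundaries first, falling back to
    overlapping fixed-size windows for oversized paragraphs."""
    chunks: list[str] = []
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if len(para) <= chunk_size:
            # Paragraph fits in one chunk: keep it whole.
            chunks.append(para)
        else:
            # Fixed-size fallback: slide a window, repeating `overlap`
            # characters so context carries across chunk boundaries.
            step = chunk_size - overlap
            for start in range(0, len(para), step):
                chunks.append(para[start:start + chunk_size])
                if start + chunk_size >= len(para):
                    break
    return chunks
```

Preferring paragraph boundaries keeps each chunk semantically coherent; the overlap only kicks in when a paragraph is too large to embed as a single unit.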
- Multi-format ingestion — PDF (PyPDF2), plain text, and Markdown
- Recursive text chunking — paragraph-aware splitting with configurable overlap for context preservation
- Local embeddings — sentence-transformers `all-MiniLM-L6-v2` (384-dim, runs on CPU, no API key)
- FAISS vector store — persistent index with save/load to disk
- Similarity search — cosine similarity with configurable top-k and score threshold
- Extractive QA — surfaces the most relevant passages with scores and source citations (no LLM needed)
- Optional generative QA — toggle on OpenAI-powered answers when an API key is available
- Chat-style UI — conversational interface with expandable source passages
- Tested — unit tests for chunking, vector storage, retrieval, and answer generation
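The similarity-search feature can be illustrated with a dependency-free sketch. The real retriever uses FAISS; the function and signature below (`retrieve`, `top_k`, `score_threshold`) are assumptions for illustration:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, doc_vecs, top_k=3, score_threshold=0.25):
    """Rank documents by similarity, keep the top-k above the threshold."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(i, s) for i, s in scored[:top_k] if s >= score_threshold]
```

The score threshold filters out passages that ranked in the top-k but are still too dissimilar to be useful — exactly what lets the extractive mode say "no relevant passage found" instead of surfacing noise.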
| Component | Technology |
|---|---|
| Web UI | Streamlit |
| Embeddings | sentence-transformers |
| Vector Store | FAISS (faiss-cpu) |
| PDF Parsing | PyPDF2 |
| LLM (optional) | OpenAI API |
| Testing | pytest |
| Language | Python 3.12 |
```bash
# Clone the repository
git clone https://github.com/yourusername/docqa-ai.git
cd docqa-ai

# Install dependencies
make install

# Run the application
make run
```

The app opens at http://localhost:8501. Upload a document via the sidebar and start asking questions.
- Python 3.12+
- ~500 MB disk space for the sentence-transformers model (downloaded on first run)
```bash
# Install dependencies
python3 -m pip install -r requirements.txt

# Run the application
python3 -m streamlit run app/main.py

# Run the tests
python3 -m pytest tests/ -v
```

Set your OpenAI API key to unlock LLM-powered answer synthesis:
```bash
export OPENAI_API_KEY="sk-..."
make run
```

Then toggle "Use LLM for answers" in the sidebar.
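In generative mode the retrieved passages are stitched into an LLM prompt before synthesis. A hedged sketch of what a template in `app/qa/prompts.py` might look like — the actual template text and the `build_prompt` helper are assumptions, not the repository's code:

```python
# Hypothetical template; the real wording lives in app/qa/prompts.py.
QA_TEMPLATE = """Answer the question using only the context below.
If the context is insufficient, say so.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(question: str, passages: list[str]) -> str:
    # Number each passage so the model can cite sources as [1], [2], ...
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return QA_TEMPLATE.format(context=context, question=question)
```

Grounding the prompt in numbered passages is what lets the generative answer carry the same source citations as the extractive mode.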
```
docqa-ai/
├── app/
│   ├── main.py              # Streamlit application
│   ├── config.py            # Centralised settings (dataclasses)
│   ├── ingestion/
│   │   ├── loader.py        # PDF/TXT/MD file loaders
│   │   ├── chunker.py       # Recursive text chunking with overlap
│   │   └── embeddings.py    # Sentence-transformer embedding engine
│   ├── retrieval/
│   │   ├── vectorstore.py   # FAISS index management + persistence
│   │   └── retriever.py     # Similarity search with score filtering
│   ├── qa/
│   │   ├── chain.py         # Extractive + generative QA chain
│   │   └── prompts.py       # Prompt templates for LLM mode
│   └── utils/
│       └── text.py          # Unicode normalisation, whitespace cleaning
├── tests/
│   ├── test_chunker.py      # Chunking strategy tests
│   ├── test_retriever.py    # Vector store and retrieval tests
│   └── test_chain.py        # QA chain and prompt formatting tests
├── sample_docs/
│   └── sample.txt           # Sample document about RAG pipelines
├── requirements.txt
├── Makefile
└── .gitignore
```
- **Extractive-first approach**: The default mode requires zero API keys. It retrieves and ranks passages, displaying them with relevance scores — this demonstrates the full RAG pipeline while being immediately usable.
- **Recursive chunking**: Paragraph boundaries are preferred over fixed-size splits, preserving semantic coherence. Configurable overlap ensures context continuity across chunk boundaries.
- **Normalised cosine similarity**: FAISS `IndexFlatIP` with L2-normalised vectors gives exact cosine similarity scores, making relevance scores interpretable (0-1 range).
- **Singleton embedding model**: The sentence-transformer model is loaded once via `lru_cache` and shared across the session, avoiding repeated ~500 MB model loads.
- **Persistent vector store**: The FAISS index and metadata are saved to disk after each upload, surviving Streamlit reruns and browser refreshes.
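The `IndexFlatIP` decision rests on a simple identity: the inner product of two L2-normalised vectors equals their cosine similarity. A dependency-free check of that identity (FAISS itself is not needed to see it):

```python
import math

def l2_normalise(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def inner_product(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def cosine(a: list[float], b: list[float]) -> float:
    return inner_product(a, b) / (
        math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    )
```

Because normalisation happens once at indexing time, an exact inner-product index then returns exact cosine scores with no extra work per query.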
License: MIT