Chat with your PDFs like they’re alive — upload lecture notes, textbooks, resumes, or any PDF, and ask questions in natural language. The app retrieves context from your documents and combines it with Gemini’s reasoning to answer clearly.
- 📄 Upload and process multiple PDFs.
- ✂️ Smart text chunking for long documents.
- 🧠 Vector embeddings with FAISS for semantic search.
- 🤖 Google Gemini integration for natural QnA and summarization.
- 📝 Document summaries generated automatically.
- 💬 Conversation memory to keep context from past questions.
- 📥 Export your entire chat history + document summaries as JSON.
- Frontend/UI: Streamlit
- Document parsing: PyPDF2
- Chunking:
RecursiveCharacterTextSplitter(LangChain) - Embeddings: SpaCy (
en_core_web_sm) - Vector DB: FAISS
- LLM: Google Gemini (
gemini-1.5-flash) - Config & Env: python-dotenv
- Persistence: JSON export
┌───────────────┐
│ PDF(s) │
└───────┬───────┘
│ Extract (PyPDF2)
▼
┌────────────────────┐
│ Text Splitter │
│ (chunk_size=1000) │
└─────────┬──────────┘
│
▼
┌────────────────────┐
│ Embeddings (SpaCy) │
└─────────┬──────────┘
│
▼
┌────────────────────┐
│ Vector DB (FAISS) │
└─────────┬──────────┘
│ Retrieval (top-k)
▼
┌─────────────────────────┐
│ Gemini LLM │
│ - Uses context chunks │
│ - Falls back to GK if │
│ no match found │
└─────────┬──────────────┘
│
▼
┌────────────────────┐
│ Chat UI (Streamlit)│
│ + conversation mem │
└────────────────────┘
- Upload PDF → Extract text → Chunk it → Vectorize → Store in FAISS
- Ask question → Vectorize → Search in FAISS → Get context → Gemini generates answer
- If not found → fallback to general AI knowledge
- Chat + Export feature
-
Clone repo
git clone https://github.com/your-repo/pdf-qna-bot.git cd pdf-qna-bot -
Create virtual environment
python -m venv venv source venv/bin/activate # Mac/Linux venv\Scripts\activate # Windows
-
Install dependencies
pip install -r requirements.txt
-
Download SpaCy model
python -m spacy download en_core_web_sm
-
Set environment variable Create a
.envfile in project root:GEMINI_API_KEY=your_api_key_here -
Run app
streamlit run app.py
-
Upload one or more PDFs from the sidebar.
-
Click "Process Documents" — text is split, embedded, and stored in FAISS.
-
Ask any question in the chat box.
- If answer is in PDF → retrieved and answered using Gemini.
- If not found → Gemini provides a fallback general knowledge answer.
-
See summaries per document in the sidebar.
-
Export chat + summaries as JSON.
Made with ❤️🔥 in AI domain.