Chat with your PDFs using the power of LangChain, OpenAI, and FAISS, all wrapped in a slick Streamlit interface.
This app lets you upload multiple PDFs and ask natural language questions about their content. It uses semantic search to retrieve the relevant passages and a conversational AI model (GPT-3.5/4) to answer from your documents.
Upload PDFs → Process → Chat in natural language
- You upload one or more PDFs through the sidebar of the app. These could be reports, manuals, research papers, or anything else.
- The app reads all the text from those PDFs using a PDF parser (PyPDF2). This raw text might be hundreds of lines long.
- It then splits the text into smaller, overlapping chunks (like paragraphs), so they're easier for the AI to handle. Think of this as breaking a book into pages.
- Each chunk is converted into a vector, a mathematical representation that captures the meaning of the text. This is done using OpenAI's embedding model.
- All these vectors are stored in a FAISS vector index, which acts like a super-fast "search engine for meaning."
- When you ask a question, such as "What findings are mentioned in the scan report?", your question is also converted into a vector.
- The app searches the FAISS index for chunks that are semantically similar to the question (even if the words don't exactly match).
- The most relevant chunks are passed to the GPT model, which reads them along with your question and answers as if it had read your documents.
- The conversation is remembered, so you can ask follow-up questions naturally.
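
The chunking and retrieval steps above can be sketched in plain Python. This is a minimal illustration, not the app's code: the app delegates these steps to LangChain, OpenAI embeddings, and FAISS. The function names (`split_into_chunks`, `toy_embed`, `top_k_chunks`) and the default sizes are hypothetical, and the bag-of-words `toy_embed` is only a stand-in that matches on shared words, whereas a real embedding model also matches on meaning.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def toy_embed(text):
    """Stand-in for an embedding model (assumption): word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    question_vec = toy_embed(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(question_vec, toy_embed(chunk)),
        reverse=True,
    )
    return ranked[:k]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which is why the splitter steps forward by `chunk_size - overlap` rather than by `chunk_size`.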
- 🧾 Upload and chat with multiple PDFs
- ⚡ Built with LangChain + FAISS + OpenAI
- 🧠 Remembers previous questions in the chat
- 🎯 Retrieves semantic matches from documents
- 💬 Clean Streamlit chat UI with custom templates
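
Chat memory and retrieved context come together when the prompt for the model is built. The sketch below shows one way this could look, under stated assumptions: the app relies on LangChain for memory, so the `ChatMemory` class, its `max_turns` cap, and the prompt wording are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class ChatMemory:
    """Keeps recent (question, answer) turns so follow-ups have context."""

    max_turns: int = 5  # assumption: cap the history to keep prompts short
    turns: list = field(default_factory=list)

    def __post_init__(self):
        if self.max_turns < 1:
            raise ValueError("max_turns must be at least 1")

    def add_turn(self, question, answer):
        """Record a finished exchange and drop the oldest ones over the cap."""
        self.turns.append((question, answer))
        self.turns = self.turns[-self.max_turns:]

    def build_prompt(self, question, context_chunks):
        """Combine retrieved chunks, chat history, and the new question."""
        history = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.turns)
        context = "\n---\n".join(context_chunks)
        return (
            "Answer using only the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Conversation so far:\n{history or '(none)'}\n\n"
            f"User: {question}\nAssistant:"
        )
```

Because earlier turns are included in the prompt, a follow-up such as "And what is the diagnosis?" can be understood in light of the previous question.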

    ├── main.py              # Main Streamlit app
    ├── htmlTemplates.py     # Custom HTML & CSS for chatbot UI
    ├── .env                 # OpenAI API key stored securely
    └── requirements.txt     # Python dependencies

| Component | Role |
|---|---|
| Streamlit | Frontend framework |
| PyPDF2 | Extracts text from uploaded PDFs |
| LangChain | Orchestrates LLM + retrieval + memory |
| OpenAI | Provides embeddings + GPT model responses |
| FAISS | Fast semantic search on text embeddings |
| dotenv | Loads API keys securely |
- Clone the repo

      git clone https://github.com/yourusername/pdf-chat-ai.git
      cd pdf-chat-ai

- Create `.env` file

      OPENAI_API_KEY=your_openai_key

- Install dependencies

      pip install -r requirements.txt

- Run the app

      streamlit run main.py

See requirements.txt or install manually:

    streamlit
    PyPDF2
    langchain
    openai
    faiss-cpu
    python-dotenv

- Go to the sidebar and upload PDFs
- Click Process
- Ask any question about the documents, such as:
  - "What abnormalities are found?"
  - "Summarize the second report"
  - "What is the diagnosis?"
- `.env` keeps your OpenAI key out of the source code (keep the file out of version control)
- API calls are handled server-side in Streamlit
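
As a small illustration of the key handling, the check below fails fast when the key is missing or still set to the placeholder from the setup steps. It is a sketch, not the app's code: `get_openai_key` is a hypothetical helper, and it assumes python-dotenv's `load_dotenv()` has already been called so that values from `.env` are visible in the process environment.

```python
import os


def get_openai_key(env=None):
    """Return the OpenAI API key, or raise if it is missing or a placeholder."""
    # Assumption: load_dotenv() from python-dotenv has already run,
    # so entries from .env are available through os.environ.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key or key == "your_openai_key":
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key
```

Checking once at startup gives a clear error message instead of a failed API call after the first question.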
- Add source highlighting and chunk citations
- Add document summarization button
- Support scanned PDFs via OCR (e.g., with Tesseract)
- Integrate Whisper for audio-to-text documents
MIT License
