MultiPDF - Personal Assistant is an interactive document-querying application built with Python, LangChain, Streamlit, and Hugging Face.
It lets users upload multiple PDFs, ask questions conversationally, and get answers backed by citations to the exact source passages.
Deployed using Streamlit Cloud: personalpdfreader.streamlit.app/
Features:
- Upload and process multiple PDF documents simultaneously.
- Ask follow-up questions that build on earlier turns of the conversation.
- Advanced Document Retrieval:
  - Uses MMR (Maximal Marginal Relevance) to fetch a broad pool of candidate chunks and then balance relevance against diversity, so small documents (like resumes) don't get buried by large PDFs.
  - Automatically stamps each text chunk with its source filename so the LLM doesn't mix up content from different documents.
- Transparent Sourcing: Appends a clean HTML list to the bottom of the bot's answers, showing the exact document snippets it used.
- Custom Prompting: Overrides LangChain's default prompts to avoid memory poisoning, so the app works reliably with open-weight instruction models.
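In LangChain, MMR retrieval is usually switched on through a vector store's `as_retriever(search_type="mmr", search_kwargs={"k": ..., "fetch_k": ..., "lambda_mult": ...})`; the README doesn't say which values this app uses. The sketch below shows only the selection rule itself, in plain Python over precomputed embedding vectors. The function names `cosine` and `mmr_select` are hypothetical, and the toy 2-D vectors stand in for real Sentence-Transformers embeddings.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int = 4,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Greedy Maximal Marginal Relevance over candidate embeddings.

    Returns the indices of up to k candidates, in selection order. Each step
    picks the candidate that maximizes
        lambda_mult * sim(query, c) - (1 - lambda_mult) * max sim(c, selected)
    so lambda_mult=1.0 is plain similarity search and lower values favor diversity.
    """
    relevance = [cosine(query, c) for c in candidates]
    selected: List[int] = []
    remaining = list(range(len(candidates)))

    def mmr_score(i: int) -> float:
        # Redundancy = similarity to the closest chunk already selected.
        redundancy = max(
            (cosine(candidates[i], candidates[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.5]
chunks = [
    [1.0, 0.0],   # chunk from a large PDF
    [1.0, 0.05],  # near-duplicate chunk from the same PDF
    [0.3, 1.0],   # chunk from a small resume
]
print(mmr_select(query, chunks, k=2, lambda_mult=1.0))  # [1, 0]: pure similarity
print(mmr_select(query, chunks, k=2, lambda_mult=0.5))  # [1, 2]: resume chunk kept
```

With pure similarity the two near-identical chunks from the large PDF fill both slots; with `lambda_mult=0.5` the redundancy penalty pushes the duplicate out and lets the less similar but distinct resume chunk in.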
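The source stamping and the HTML source list can be sketched together. This is a minimal illustration, not the app's code: `Chunk` is a hypothetical stand-in for LangChain's `Document` (which likewise carries `page_content` and `metadata`), and the `[Source: ...]` prefix format and the HTML markup are assumptions.

```python
import html
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Chunk:
    """Hypothetical stand-in for a LangChain Document (text + metadata)."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def source_prefix(filename: str) -> str:
    """The stamp placed at the top of every chunk (format is an assumption)."""
    return f"[Source: {filename}]\n"


def stamp_chunks(texts: List[str], filename: str) -> List[Chunk]:
    """Prefix each chunk's text with its filename and record it in metadata."""
    return [
        Chunk(page_content=source_prefix(filename) + text, metadata={"source": filename})
        for text in texts
    ]


def sources_html(chunks: List[Chunk], max_chars: int = 150) -> str:
    """Render retrieved chunks as an HTML list to append below an answer."""
    items = []
    for chunk in chunks:
        filename = chunk.metadata.get("source", "unknown")
        body = chunk.page_content
        # Strip the stamp so the snippet shows only the document's own text.
        prefix = source_prefix(filename)
        if body.startswith(prefix):
            body = body[len(prefix):]
        snippet = body[:max_chars] + ("..." if len(body) > max_chars else "")
        # Escape both parts: PDF text can contain "<", ">" or "&".
        items.append(f"<li><b>{html.escape(filename)}</b>: {html.escape(snippet)}</li>")
    return "<ul>" + "".join(items) + "</ul>"


chunks = stamp_chunks(["Jane Doe, Python developer"], "resume.pdf")
print(sources_html(chunks))
# <ul><li><b>resume.pdf</b>: Jane Doe, Python developer</li></ul>
```

Because the filename is part of `page_content`, it is embedded and passed to the LLM together with the text, not only kept in metadata. In Streamlit, an HTML string like this can be rendered with `st.markdown(html_string, unsafe_allow_html=True)`.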
Tech Stack:
- Python 3.12
- LangChain
- Streamlit
- FAISS
- Hugging Face Inference API
- Sentence-Transformers
- PyPDF2
Getting Started:
- Clone the repository
git clone https://github.com/yourUsername/Personal_PDF_Chatbot.git
- Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate   (macOS/Linux)
venv\Scripts\activate   (Windows)
- Set up your environment variables
Create a .env file in the project root containing your Hugging Face API token:
HUGGINGFACE_HUB_API_TOKEN=your_token_here
- Install dependencies
pip install -r requirements.txt
- Start the app via Streamlit
streamlit run app.py
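The app needs the token from the .env file in its environment at startup. Many projects use python-dotenv's `load_dotenv()` for this; to stay dependency-free, the sketch below parses the file with the standard library only. `load_env_file` and `get_hf_token` are hypothetical helper names, not functions from this repository; the variable name is the one from the setup step.

```python
import os
from pathlib import Path

TOKEN_VAR = "HUGGINGFACE_HUB_API_TOKEN"


def load_env_file(path: str = ".env") -> None:
    """Copy KEY=VALUE lines from a .env file into os.environ.

    Blank lines and '#' comments are skipped, surrounding quotes are removed,
    and variables already set in the real environment are left untouched.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip("\"'"))


def get_hf_token(path: str = ".env") -> str:
    """Return the Hugging Face token, failing early with a clear message."""
    load_env_file(path)
    token = os.getenv(TOKEN_VAR)
    # An empty value (e.g. "HUGGINGFACE_HUB_API_TOKEN=") counts as missing.
    if not token:
        raise RuntimeError(f"{TOKEN_VAR} is not set; add it to your .env file")
    return token
```

Failing at startup with a named variable is easier to debug than an authentication error from the first Inference API call. Keep .env out of version control (for example, by listing it in .gitignore) so the token isn't published with the repository.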