anushkamohan18/AI-Chat_Pdf

📚 AI PDF Chatbot

Chat with your PDFs using the power of LangChain, OpenAI, and FAISS, all wrapped in a slick Streamlit interface.

This app lets you upload multiple PDFs and ask natural language questions about their content. It uses semantic search to retrieve the relevant passages and a conversational AI model (GPT-3.5/4) to answer from them.


🚀 Demo

Demo Preview

Upload PDFs → Process → Chat in natural language


🧠 How It Works

  1. You upload one or more PDFs through the sidebar of the app. These could be reports, manuals, research papers, or anything else.

  2. The app reads all the text from those PDFs using a PDF parser. This raw text might be hundreds of lines long.

  3. It then splits the text into smaller, overlapping chunks (like paragraphs), so they're easier for the AI to handle. Think of this as breaking a book into pages.

  4. Each chunk is converted into a vector: a mathematical representation that captures the meaning of the text. This is done using OpenAI's embedding model.

  5. All these vectors are stored in a FAISS vector database, which acts like a super-fast "search engine for meaning."

  6. Now, when you ask a question, like:

    "What findings are mentioned in the scan report?"

    your question is also converted into a vector.

  7. The app searches the FAISS database for chunks that are semantically similar (even if the words don't exactly match).

  8. The most relevant chunks are passed to the OpenAI chat model (GPT-3.5/4), which reads them along with your question and responds as if it had read your documents.

  9. The conversation is remembered, so you can ask follow-up questions naturally.
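
Steps 3 to 7 can be sketched in plain Python. This is a minimal illustration, not the app's actual code: `split_into_chunks`, `toy_embed`, `cosine`, and `top_k_chunks` are hypothetical names, the bag-of-words `toy_embed` is only a stand-in for OpenAI's embedding model (it matches shared words, not meaning), and the brute-force ranking is a stand-in for a FAISS index. The chunk size and overlap values are assumptions as well.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into character windows that overlap by `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def toy_embed(text):
    """Stand-in for an embedding model: a sparse bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_chunks(question, chunks, k=2):
    """Rank chunks by similarity to the question and keep the best k."""
    query_vector = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query_vector, toy_embed(c)), reverse=True)
    return ranked[:k]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, provided it is shorter than the overlap. In the real pipeline the embedding model and FAISS replace `toy_embed` and the sorted scan, but the shape of the computation is the same: embed the chunks, embed the question, return the nearest chunks.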


🛠️ Features

  • 🧾 Upload and chat with multiple PDFs
  • ⚡ Built with LangChain + FAISS + OpenAI
  • 🧠 Remembers previous questions in the chat
  • 🎯 Retrieves semantic matches from documents
  • 💬 Clean Streamlit chat UI with custom templates
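
As a rough illustration of how chat memory and retrieved chunks can be combined, the sketch below folds earlier turns into the prompt for the next question. It is an assumption-laden example rather than the app's code: `ChatSession`, `build_prompt`, and `record` are hypothetical names, and the prompt wording is invented. LangChain provides its own memory and chain classes for this.

```python
from dataclasses import dataclass, field


@dataclass
class ChatSession:
    """Hypothetical container for earlier (question, answer) turns."""
    history: list = field(default_factory=list)

    def build_prompt(self, question, context_chunks):
        """Assemble retrieved chunks, past turns, and the new question into one prompt."""
        lines = ["Answer using only the context below.", "", "Context:"]
        lines += [f"- {chunk}" for chunk in context_chunks]
        if self.history:
            lines += ["", "Conversation so far:"]
            for past_question, past_answer in self.history:
                lines += [f"User: {past_question}", f"Assistant: {past_answer}"]
        lines += ["", f"User: {question}", "Assistant:"]
        return "\n".join(lines)

    def record(self, question, answer):
        """Store a finished turn so later prompts can refer back to it."""
        self.history.append((question, answer))
```

Because each new prompt carries the earlier turns, a follow-up such as "Is it serious?" can be resolved against the previous question and answer.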

📂 Project Structure

├── main.py                # Main Streamlit app
├── htmlTemplates.py       # Custom HTML & CSS for chatbot UI
├── .env                   # OpenAI API key stored securely
├── requirements.txt       # Python dependencies

🧑‍💻 Tech Stack

| Component | Role |
|-----------|------|
| Streamlit | Frontend framework |
| PyPDF2 | Extracts text from uploaded PDFs |
| LangChain | Orchestrates LLM + retrieval + memory |
| OpenAI | Provides embeddings + GPT model responses |
| FAISS | Fast semantic search on text embeddings |
| dotenv | Loads API keys securely |


⚙️ Installation

  1. Clone the repo
     git clone https://github.com/yourusername/pdf-chat-ai.git
     cd pdf-chat-ai
  2. Create .env file
     OPENAI_API_KEY=your_openai_key
  3. Install dependencies
     pip install -r requirements.txt
  4. Run the app
     streamlit run main.py

📥 Requirements

See requirements.txt or install manually:

streamlit
PyPDF2
langchain
openai
faiss-cpu
python-dotenv

❓ Usage

  1. Go to the sidebar and upload PDFs
  2. Click Process
  3. Ask any question about the documents like:
    • "What abnormalities are found?"
    • "Summarize the second report"
    • "What is the diagnosis?"

🔐 Security

  • The OpenAI key is read from .env instead of being hard-coded; keep .env out of version control
  • API calls are handled server-side in Streamlit
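
One way to fail fast when the key is missing is sketched below. It is not taken from the app: `get_openai_key` is a hypothetical helper, and the error message is invented. `load_dotenv` comes from the python-dotenv package listed in the requirements; the import is guarded so the sketch still runs without it.

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:  # keep the sketch usable when python-dotenv is not installed
    load_dotenv = None


def get_openai_key(environ=None):
    """Return the OpenAI key, or raise a clear error if it is missing."""
    if environ is None:
        if load_dotenv is not None:
            load_dotenv()  # loads variables from a .env file if one is found
        environ = os.environ
    key = environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key
```

Calling `get_openai_key()` once at startup surfaces a missing or empty key before the first PDF is processed, rather than at the first API call.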

📌 To-Do / Future Enhancements

  • Add source highlighting and chunk citations
  • Add document summarization button
  • Support scanned OCR PDFs (e.g., with Tesseract)
  • Integrate Whisper for audio-to-text documents

📄 License

MIT License


🙌 Acknowledgements

About

AI-Chat_Pdf is a Python project that lets users interact with PDF files through an AI chat interface, making it easy to extract and understand information from documents.
