InsightPDF - RAG-Powered AI Chatbot

Project Overview

InsightPDF is a PDF-based Retrieval-Augmented Generation (RAG) chatbot that lets users query PDF documents in natural language. Instead of manually searching through pages, you ask a question, and the chatbot retrieves the relevant passages from your PDF and uses them to produce a context-aware answer.

This chatbot is built with Streamlit, LangChain, HuggingFace embeddings, FAISS, and Groq LLM.

Live Demo: https://insightpdf---rag-powered-ai-chatbot.streamlit.app/

Screenshots

Key Features

Context-Aware PDF Intelligence: Engineered for deep, targeted querying, allowing users to extract precise insights instantly without manual skimming or document searching.
Intelligent Text Processing: Extracts text automatically and applies recursive chunking, which splits on larger boundaries first so that related text stays together and retrieval accuracy improves.
High-Speed Retrieval with FAISS: Implements local vector storage for optimized similarity searches, ensuring near-instant access to relevant document segments.
Ultra-Fast Inference via Groq: Delivers real-time, grounded responses by leveraging Groq’s Tensor Streaming Processor (TSP) architecture for deterministic, low-latency inference.
Semantic Precision with HuggingFace: Employs state-of-the-art embedding models to generate high-fidelity semantic representations, ensuring superior search relevance.
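The recursive chunking idea mentioned above can be illustrated with a small standard-library sketch. This is not the project's code or LangChain's splitter; the function name `recursive_chunks`, the separator order, and the `max_len` default are assumptions chosen for illustration. It tries coarse separators first (blank lines), then finer ones, and falls back to a fixed-width cut only when nothing else fits.

```python
from typing import List, Sequence

# Hypothetical separator order, coarse to fine (an assumption for this sketch).
SEPARATORS = ("\n\n", "\n", " ", "")


def recursive_chunks(
    text: str, max_len: int = 200, separators: Sequence[str] = SEPARATORS
) -> List[str]:
    """Split text into chunks of at most max_len characters."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not separators or separators[0] == "":
        # Last resort: cut at a fixed width.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= max_len:
            current = candidate  # keep merging small pieces together
            continue
        if current.strip():
            chunks.append(current)
        if len(part) <= max_len:
            current = part
        else:
            # The piece alone is too long: retry with finer separators.
            chunks.extend(recursive_chunks(part, max_len, finer))
            current = ""
    if current.strip():
        chunks.append(current)
    return chunks
```

Calling `recursive_chunks(page_text, max_len=500)` on extracted page text would return a list of strings, each short enough to embed individually. A production splitter usually also adds overlap between neighbouring chunks, which this sketch omits.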

Installation

Clone the Repository

git clone https://github.com/SruthiPuli/InsightPDF---RAG-Powered-AI-Chatbot.git
cd InsightPDF---RAG-Powered-AI-Chatbot

Optional: Create a Python Virtual Environment

To avoid package dependency issues, it is recommended to create a virtual environment before installing the required libraries. You can skip this step if you prefer installing packages globally.

# Create a virtual environment named 'my_venv'
python -m venv my_venv

# Activate the virtual environment
# On Windows
my_venv\Scripts\activate

# On macOS/Linux
source my_venv/bin/activate

Install Project Dependencies

Install all required dependencies using the following command:

# python packages
pip install -r requirements.txt

Usage

Once the setup is complete, start the Streamlit app by running:

# Run the chatbot
streamlit run app.py
# Open the URL in your browser (usually http://localhost:8501)

Upload any PDF file using the file uploader in the Streamlit interface.
Once the document is processed and indexed, start asking questions through the chat input.
The chatbot retrieves relevant context from the PDF and answers your queries in real time.
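As a rough picture of the last step, retrieved chunks are typically placed into the prompt ahead of the user's question. The sketch below assumes a hypothetical helper named `build_prompt` and an illustrative instruction wording; the actual prompt used in `app.py` is not shown in this README.

```python
from typing import List


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Combine retrieved chunks and the user's question into one prompt string."""
    # Number each chunk so the model (and a reader) can tell them apart.
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(context_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\nAnswer:"
    )
```

The resulting string would then be sent to the LLM. Restricting the model to the supplied context is what keeps answers grounded in the uploaded PDF.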

Configuration

Configure Environment Variables

Ensure your .env file is set up with your Groq API key:

# To access Groq LLM
GROQ_API_KEY="your_actual_api_key_here"

Then, in your Python script, load it like this:

# Python
import os
from dotenv import load_dotenv

load_dotenv()
api_key = os.getenv("GROQ_API_KEY")
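`os.getenv` returns `None` when the variable is missing, which can surface later as a confusing authentication error. One option, sketched here with a hypothetical helper named `require_env`, is to fail early with a clear message.

```python
import os


def require_env(name: str) -> str:
    """Return the environment variable's value, or raise if it is unset or empty."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return value
```

After `load_dotenv()` has run, `api_key = require_env("GROQ_API_KEY")` would replace the plain `os.getenv` call.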

Streamlit Deployment (Important)

If you deploy this application on Streamlit Cloud, do not use the .env file.

Go to your app dashboard on Streamlit Cloud.
Open Settings → Secrets.
Add your API key in the following format:

# Secrets
GROQ_API_KEY = "your_actual_api_key_here"
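To let the same code run both locally and on Streamlit Cloud, the key lookup can check a secrets mapping first and fall back to environment variables. The helper below, `resolve_api_key`, is a hypothetical sketch rather than code from `app.py`; it accepts any mapping so it can be exercised without Streamlit installed.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    secrets: Optional[Mapping[str, str]] = None,
    env: Mapping[str, str] = os.environ,
    name: str = "GROQ_API_KEY",
) -> Optional[str]:
    """Prefer a value from the secrets mapping, then fall back to the environment."""
    if secrets is not None and name in secrets:
        return secrets[name]
    return env.get(name)
```

On Streamlit Cloud one would presumably pass `st.secrets` as the first argument. Locally, where no secrets file may exist, passing `None` avoids touching it and the value loaded from `.env` is used instead.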

Tech Stack

Python – Core programming language used for application logic.
Streamlit – Interactive web framework for building the user interface.
LangChain – Provides end-to-end components for building Retrieval-Augmented Generation (RAG) pipelines, including document loading, text splitting, embedding integration, vector store management, and seamless LLM orchestration.
Groq LLM – Generates responses in real time, leveraging Groq’s Tensor Streaming Processor (TSP) architecture for fast, deterministic, low-latency inference. The free tier allows 14,000+ requests per day.
HuggingFace Embeddings – Responsible for converting document text and user queries into semantic vector representations, enabling accurate similarity-based retrieval.
FAISS – High-performance similarity-search library, used here as the vector store for fast search over embedded document chunks.
PyPDF (PdfReader) – Extracts and processes text from PDF documents.
Sentence-Transformers – Provides pre-trained embedding models; this project uses sentence-transformers/all-MiniLM-L6-v2 for lightweight, high-quality embeddings that balance speed and semantic accuracy.
Python-dotenv – Manages environment variables securely during local development.
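To make the embedding and vector-search roles above concrete, here is a toy sketch of similarity-based retrieval using only the standard library. The two-dimensional vectors and the function names `cosine` and `top_k` are made up for illustration; in the real stack the vectors come from the embedding model and FAISS performs the nearest-neighbour search with optimized indexes.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int = 2
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k vectors most similar to the query."""
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The indices returned by `top_k` map back to the stored chunks, whose text is then passed to the LLM as context. This brute-force scan is linear in the number of chunks, which is fine for a single PDF but is the cost that dedicated indexes are designed to reduce.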

Folder Structure

pdf-rag-chatbot/
├─ outputs/               # Images, Live Demo Video
├─ .gitattributes         # Tells Git how to handle files
├─ app.py                 # Main Streamlit app
├─ requirements.txt       # Python dependencies
├─ sample_pdf.pdf         # Sample PDF to upload
├─ LICENSE                # MIT License
└─ README.md              # README File

Contributions

Contributions are welcome! If you’d like to improve this project, feel free to fork the repository, create a new branch, and submit a pull request. Bug reports, feature requests, and documentation improvements are all appreciated.

License

This project is licensed under the MIT License. If you fork or use this project, please give credit by mentioning or pinging me: Sruthi Pulipati (GitHub: SruthiPuli).

About

This project was developed solely by Sruthi Pulipati (GitHub: SruthiPuli).
