Project Overview
This is an AI-powered PDF Document Summarizer App that extracts text from PDF files and generates concise summaries using the LaMini-Flan-T5 model. The app is built with Streamlit for an interactive web interface and integrates LangChain for smart text chunking and Transformers for natural language processing.
Key Features
- Upload and preview PDF documents directly in-browser
- Intelligent document chunking using LangChain
- Summarization using LaMini-Flan-T5 from Hugging Face
- Efficient text preprocessing to avoid token overflow
- Built-in PDF viewer for side-by-side comparison
- Streamlit-powered UI for fast deployment
Tech Stack
- Frontend: Streamlit
- NLP: Hugging Face Transformers (pipeline API), LaMini-Flan-T5
- Text Preprocessing: LangChain (RecursiveCharacterTextSplitter)
- PDF Parsing: PyPDFLoader (LangChain Community)
- Frameworks: PyTorch
- Other Tools: Base64 encoding for PDF rendering
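The Base64 step in the stack can be illustrated with a small helper. This is a minimal sketch, not the app's actual code: the function name `pdf_iframe_html` and the `height` parameter are assumptions made for illustration. It encodes the PDF bytes and wraps them in an `<iframe>` that uses a data URI.

```python
import base64


def pdf_iframe_html(pdf_bytes: bytes, height: int = 600) -> str:
    """Return an <iframe> tag that embeds the PDF as a base64 data URI.

    Hypothetical helper: the name and the default height are illustrative.
    """
    # b64encode returns bytes; decode to str so it can be placed in HTML.
    encoded = base64.b64encode(pdf_bytes).decode("utf-8")
    return (
        f'<iframe src="data:application/pdf;base64,{encoded}" '
        f'width="100%" height="{height}" type="application/pdf"></iframe>'
    )
```

In a Streamlit app, the returned string could be rendered with `st.markdown(html, unsafe_allow_html=True)`. Browser handling of large data URIs varies, so this approach likely suits small to medium documents best.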
How It Works
- Upload a .pdf file via the Streamlit interface
- Text is extracted and chunked using RecursiveCharacterTextSplitter
- The summarization pipeline runs with T5ForConditionalGeneration
- The original PDF and the generated summary are displayed side by side
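The steps above can be sketched roughly as follows. Several details are assumptions rather than facts from this project: the checkpoint name `MBZUAI/LaMini-Flan-T5-248M`, the chunk size and overlap, the function names, and the choice to summarize each chunk separately and join the results (the app may instead summarize the combined text once). Heavy libraries are imported lazily inside the functions that need them.

```python
from typing import Callable, List


def load_and_split(pdf_path: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> List[str]:
    """Extract text from a PDF and split it into overlapping chunks."""
    from langchain_community.document_loaders import PyPDFLoader
    try:
        # Newer LangChain releases ship the splitters in a separate package.
        from langchain_text_splitters import RecursiveCharacterTextSplitter
    except ImportError:
        from langchain.text_splitter import RecursiveCharacterTextSplitter

    pages = PyPDFLoader(pdf_path).load()  # one Document per page
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap  # assumed values
    )
    return [doc.page_content for doc in splitter.split_documents(pages)]


def build_summarizer(checkpoint: str = "MBZUAI/LaMini-Flan-T5-248M") -> Callable[[str], str]:
    """Build a text -> summary callable (checkpoint name is an assumption)."""
    from transformers import AutoTokenizer, T5ForConditionalGeneration, pipeline

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = T5ForConditionalGeneration.from_pretrained(checkpoint)
    pipe = pipeline("summarization", model=model, tokenizer=tokenizer)

    def summarize(text: str) -> str:
        # truncation=True cuts inputs that exceed the model's input limit.
        return pipe(text, truncation=True)[0]["summary_text"]

    return summarize


def summarize_chunks(chunks: List[str], summarize: Callable[[str], str]) -> str:
    """Summarize each non-empty chunk and join the partial summaries."""
    return "\n".join(summarize(chunk) for chunk in chunks if chunk.strip())


# Example wiring (downloads the model on first run; "report.pdf" is a placeholder):
# chunks = load_and_split("report.pdf")
# print(summarize_chunks(chunks, build_summarizer()))
```

Passing the summarizer in as a callable keeps `summarize_chunks` easy to test with a stand-in function, without loading the model.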
Installation & Setup
# Clone the repository
git clone https://github.com/yourusername/document-summarizer-app.git
cd document-summarizer-app
# Create a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows
# Install required packages
pip install -r requirements.txt
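
After installation, a quick sanity check can confirm that the main dependencies resolve in the active environment. This is only a sketch: the module names below are guesses based on the tech stack, and the authoritative list is whatever requirements.txt contains.

```python
import importlib.util
from typing import Iterable, List

# Assumed top-level module names; adjust to match requirements.txt.
ASSUMED_MODULES = ["streamlit", "langchain", "transformers", "torch", "pypdf"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(ASSUMED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```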