pandu992003/multi_model_RAG_chatbot

🤖 Multimodal RAG Chatbot System

Python · LangChain · Pinecone · Gradio

An intelligent AI chatbot that understands documents, speaks with you, and generates images

Powered by advanced RAG technology, multimodal processing, and AI image generation

🚀 Quick Start · ✨ Features · 📖 How to Use · 🐳 Docker · ❓ Help


🌟 What is This?

This is a smart chatbot that can:

  • 📚 Read and understand your documents (PDFs, text files, images)
  • 💬 Answer questions about what it read
  • 🎤 Listen to your voice and respond back
  • 🎨 Generate images based on document content
  • 🖼️ Extract text from images using OCR technology

Think of it as your personal AI assistant that reads documents and discusses them with you!


✨ Features

🧠 Smart Document Understanding

📄 Multiple Formats

  • PDF documents
  • Text files (.txt)
  • Images with text (JPG, PNG, etc.)

🔍 Intelligent Search

  • Semantic similarity search
  • Context-aware responses
  • Accurate information retrieval
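The semantic search above boils down to comparing embedding vectors by cosine similarity and returning the closest chunk. A toy sketch of the idea (the real system uses HuggingFace embeddings and Pinecone; the two-dimensional vectors here are purely illustrative):

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec, chunks):
    """Return the chunk whose embedding is most similar to the query."""
    return max(chunks, key=lambda c: cosine_similarity(query_vec, c[1]))[0]

# (text, embedding) pairs — hypothetical toy data
docs = [("Chunk about dogs", [1.0, 0.0]),
        ("Chunk about cats", [0.0, 1.0])]
best = search([0.9, 0.1], docs)  # the dog chunk is the closest match
```

In the real pipeline the vectors have hundreds of dimensions and the nearest-neighbor search is handled by Pinecone, but the ranking principle is the same.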

💬 Interactive Communication

✍️ Text Chat

  • Ask questions naturally
  • Get detailed answers
  • Copy and share responses

🎙️ Voice Chat

  • Speak your questions
  • Hear AI responses
  • Hands-free interaction

🎨 Visual Intelligence

📸 OCR Processing

  • Extract text from images
  • Process scanned documents
  • Understand visual content

🖼️ Image Generation

  • Create visualizations
  • Generate diagrams
  • Illustrate concepts

🏗️ How It Works

graph LR
    A[📄 Upload Document] --> B[🔄 Process & Store]
    B --> C[💾 Vector Database]
    D[❓ Ask Question] --> E[🔍 Search Similar Content]
    E --> C
    E --> F[🤖 AI Analysis]
    F --> G[💬 Answer]
    F --> H[🎨 Optional Image]
    
    style A fill:#e3f2fd
    style C fill:#fff3e0
    style F fill:#e8f5e9
    style G fill:#f3e5f5
    style H fill:#fce4ec

The Magic Behind the Scenes

| Step | What Happens | Technology |
|------|--------------|------------|
| 1️⃣ Upload | You upload a document | PDF/Text/Image Reader |
| 2️⃣ Process | AI breaks it into pieces | LangChain Text Splitter |
| 3️⃣ Understand | Converts to AI language | HuggingFace Embeddings |
| 4️⃣ Store | Saves in smart database | Pinecone Vector DB |
| 5️⃣ Ask | You ask a question | Natural Language |
| 6️⃣ Search | Finds relevant parts | Semantic Search |
| 7️⃣ Answer | AI crafts response | OpenAI/OpenRouter LLM |
| 8️⃣ Visualize | Creates images (optional) | Stable Diffusion |
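Step 2 of the table, splitting a document into overlapping pieces, can be sketched in plain Python. This is not the project's actual splitter (it uses LangChain's text splitter); it only illustrates how the `CHUNK_SIZE` and `CHUNK_OVERLAP` settings from config.py interact:

```python
def split_text(text, chunk_size=800, chunk_overlap=150):
    """Split text into chunks of chunk_size characters, where each chunk
    shares chunk_overlap characters with the previous one, so context
    spanning a boundary is never lost."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the defaults, a 2,000-character document yields three chunks, and the last 150 characters of each chunk reappear at the start of the next.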

🚀 Quick Start

What You Need

1. API Keys (Free to Get!)

| Service | What It's For | Get It Here |
|---------|---------------|-------------|
| 🔑 OpenRouter | AI brain for answers | openrouter.ai |
| 🔑 Pinecone | Document storage | pinecone.io |
| 🔑 HuggingFace | Image generation | huggingface.co |

2. System Requirements

  • Computer: Windows, Mac, or Linux
  • Python: Version 3.9 or newer
  • Memory: 4GB minimum (8GB recommended)
  • GPU: Optional (for faster image generation)

📥 Installation Steps

Option 1: Simple Installation

Step 1: Download the Project

  • Download and extract the project files to your computer

Step 2: Create Virtual Environment

  • Open terminal/command prompt in the project folder
  • Run: python -m venv virtual_env
  • Activate it:
    • Windows: virtual_env\Scripts\activate
    • Mac/Linux: source virtual_env/bin/activate

Step 3: Install Dependencies

  • Run: pip install -r requirements.txt
  • Wait for installation to complete (may take 5-10 minutes)

Step 4: Configure API Keys

  • Create a file named .env in the project folder
  • Add your API keys (see format below)

Step 5: Launch!

  • Run: python main.py
  • Open browser to: http://127.0.0.1:7860

🔐 Setting Up API Keys

Create a file called .env in your project folder and add:

Format:

OPENAI_API_KEY=your_key_here
OPENAI_API_BASE=https://openrouter.ai/api/v1
PINECONE_API_KEY=your_key_here
HUGGINGFACE_TOKEN=your_token_here

Example:

OPENAI_API_KEY=sk-or-v1-abc123xyz...
OPENAI_API_BASE=https://openrouter.ai/api/v1
PINECONE_API_KEY=pcsk-abc123xyz...
HUGGINGFACE_TOKEN=hf_abc123xyz...

💡 Tip: Replace your_key_here with your actual API keys!
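At startup these keys are read from the .env file into the environment; the project most likely uses the python-dotenv package for this, but a minimal stdlib loader shows what happens under the hood (the function name and `setdefault` behavior are illustrative assumptions):

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: read KEY=value lines into os.environ,
    skipping blanks and # comments. Existing variables win."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

After calling `load_env()`, code elsewhere can read `os.environ["PINECONE_API_KEY"]` without the key ever being committed to the repository.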


📖 How to Use

Step 1: Upload a Document

What to do:

  1. Click "Upload PDF or TXT" button
  2. Select your document
  3. Click "📤 Upload to Knowledge Base"
  4. Wait for "✅ Document processed" message

Supported formats:

  • 📄 PDF files
  • 📝 Text files (.txt)
  • 🖼️ Images (JPG, PNG, BMP)

What happens:

  • ✅ Text is extracted
  • ✅ Content is analyzed
  • ✅ Information is stored
  • ✅ Ready for questions!

Step 2: Ask Questions (Text)

What to do:

  1. Type your question in the text box
  2. Click "🚀 Submit" or press Enter
  3. Read the AI's response

Example questions:

  • "What is this document about?"
  • "Summarize the main points"
  • "What does it say about [topic]?"
  • "Explain the key concepts"

Response types:

  • 📚 Document-based: Uses uploaded content
  • 💡 General: When no document loaded
  • 🎨 With image: If visualization requested

Step 3: Voice Interaction (Optional)

What to do:

  1. Click the 🎤 microphone icon
  2. Speak your question clearly
  3. Stop recording
  4. Get text + audio response

Tips for best results:

  • Speak clearly and at normal pace
  • Keep questions short and focused
  • Use a quiet environment
  • Wait for processing to complete

The process:

  1. 🎤 You speak
  2. 📝 Converted to text
  3. 🔍 AI searches document
  4. 💬 Generates answer
  5. 🔊 Reads answer back

Step 4: Image Generation (Optional)

What to do:

  1. Ask for a visualization in chat
  2. Or use the dedicated image section
  3. Describe what you want to see
  4. AI validates it's document-related
  5. Image appears if approved

Example requests:

  • "Show me a diagram of the architecture"
  • "Create a visualization of the workflow"
  • "Illustrate the main concept"

Important notes:

  • ✅ Only generates document-related images
  • ❌ Rejects random/generic requests
  • 🎨 Uses Stable Diffusion AI
  • ⏱️ Takes 10-30 seconds
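The "document-related only" gate above is enforced by the LLM in the real system. As a rough intuition for why "Draw a cat" gets rejected while "Diagram of the RAG architecture" passes, here is a toy word-overlap check (entirely illustrative — the function, stopword list, and threshold are assumptions, not the project's actual logic):

```python
def is_document_related(request, document_text, min_overlap=2):
    """Toy relevance gate: approve a request only if it shares at least
    min_overlap content words with the uploaded document."""
    stopwords = {"the", "a", "an", "of", "show", "me", "create", "draw"}
    req_words = {w.lower().strip(".,") for w in request.split()} - stopwords
    doc_words = {w.lower().strip(".,") for w in document_text.split()}
    return len(req_words & doc_words) >= min_overlap

doc = "RAG architecture with vector database retrieval and workflow"
is_document_related("Diagram of the RAG architecture", doc)  # approved
is_document_related("Draw a cat", doc)                       # rejected
```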

🐳 Docker Setup

Why Use Docker?

  • Easy setup - No manual installation
  • Consistent - Works the same everywhere
  • Isolated - Doesn't affect your system
  • Portable - Run anywhere Docker works

Quick Docker Start

Prerequisites:

  • Docker installed on your computer
  • Docker Compose installed

Launch in 3 steps:

| Step | Command | What It Does |
|------|---------|--------------|
| 1 | Create .env file | Add your API keys |
| 2 | docker-compose up --build | Build and start |
| 3 | Open http://localhost:7860 | Use the app! |
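For orientation, a docker-compose.yml for this setup might look roughly like the fragment below; the service name, port mapping, and env_file entry are assumptions, so defer to the repository's actual docker-compose.yml and DOCKER.md:

```yaml
services:
  chatbot:
    build: .            # build from the Dockerfile in this folder
    ports:
      - "7860:7860"     # expose the Gradio UI on localhost:7860
    env_file:
      - .env            # API keys stay out of the image itself
```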

📘 For detailed Docker instructions, see DOCKER.md


🎨 Technology Stack

Core Technologies

🧠 AI & LLM

OpenAI/OpenRouter
Language understanding
Response generation

💾 Vector Database

Pinecone
Document storage
Semantic search

🔤 Embeddings

HuggingFace
Sentence Transformers
Text understanding

⚙️ Framework

LangChain
RAG orchestration
Chain management

Additional Features

🎨 Image Gen

Stable Diffusion
AI art generation
Visual content

📸 OCR

EasyOCR
Text extraction
Image processing

🎤 Speech

SpeechRecognition
Voice to text
Audio processing

🖥️ Interface

Gradio
Beautiful web UI
Easy interaction

📁 Project Structure

📦 RAG1/
├── 🎯 main.py                    # Main application (start here!)
├── ⚙️ config.py                  # Settings and configuration
├── 🧠 rag_system.py              # Core RAG intelligence
├── 🎨 image_generator.py         # Image creation magic
├── 📸 ocr_processor.py           # Text extraction from images
├── 🎤 speech_tts.py              # Voice interaction
├── 🖥️ ui_components.py           # User interface elements
├── 💬 simple_rag_chat.py         # Alternative simple UI
├── 🔧 multimodal_rag.py          # Multimodal coordination
│
├── 📋 requirements.txt           # Required packages
├── 🔐 .env                       # Your API keys (create this!)
├── 🐳 Dockerfile                 # Docker image recipe
├── 🐳 docker-compose.yml         # Docker orchestration
│
└── 📖 README.md                  # This file!

🔧 Troubleshooting

Common Issues & Solutions

❌ "No module named 'xxx'"

Problem: Missing Python package

Solution:

  • Activate virtual environment
  • Run: pip install -r requirements.txt
  • Restart the application

❌ "API key not found"

Problem: Missing or incorrect API keys

Solution:

  • Check .env file exists
  • Verify API keys are correct
  • Remove any extra spaces
  • Restart application

❌ "Pinecone index error"

Problem: Database connection issue

Solution:

  • Verify Pinecone API key is correct
  • Check internet connection
  • Wait a moment and retry
  • Index is auto-created on first run

❌ "Out of memory" (Image Generation)

Problem: Not enough RAM for image generation

Solution:

  • Close other applications
  • Reduce image dimensions in config.py
  • Use smaller inference steps
  • Consider using Docker with memory limits

❌ "Speech recognition not working"

Problem: Missing voice dependencies

Solution:

  • Install: pip install SpeechRecognition pyaudio
  • Check microphone permissions
  • Ensure stable internet (uses Google's service)
  • Try uploading audio file instead

❌ "GPU not detected"

Problem: CUDA/GPU not available for Stable Diffusion

Solution:

  • App works on CPU (slower but functional)
  • For GPU: Install CUDA toolkit
  • Reinstall PyTorch with CUDA support
  • Check GPU compatibility with system

Getting Help

If you're still stuck:

  1. ✅ Check this README carefully
  2. ✅ Review DOCKER.md if using Docker
  3. ✅ Ensure all API keys are valid
  4. ✅ Check console/terminal for error messages
  5. ✅ Try with a smaller document first
  6. ✅ Restart the application

🎯 Usage Tips

📚 Best Practices for Documents

| Document Type | Recommendation | Why |
|---------------|----------------|-----|
| PDF | Use text-based PDFs | Scanned PDFs may not extract well |
| Text | Keep under 50 pages | Faster processing |
| Images | Clear, high contrast | Better OCR accuracy |
| Language | English works best | AI is optimized for English |

💡 Writing Better Questions

| ✅ Good Questions | ❌ Avoid |
|-------------------|----------|
| "What are the main skills mentioned?" | "Tell me everything" |
| "Summarize the work experience" | "What is this?" |
| "Explain the technical architecture" | Single-word questions |
| "What does it say about [specific topic]?" | Very vague questions |

🎨 Image Generation Tips

| ✅ Will Generate | ❌ Won't Generate |
|------------------|-------------------|
| "Diagram of the RAG architecture" | "Draw a cat" |
| "Visualize the workflow from document" | Random art requests |
| "Illustration of key concepts" | Generic images |
| "Technical diagram of the system" | Personal photos |

🌟 Advanced Configuration

Customizing the Experience

You can adjust settings in config.py:

Performance Settings

| Setting | Default | What It Does |
|---------|---------|--------------|
| CHUNK_SIZE | 800 | Document split size |
| CHUNK_OVERLAP | 150 | Overlap between chunks |
| LLM_TEMPERATURE | 0.7 | Response creativity (0-1) |
| LLM_MAX_TOKENS | 512 | Response length |

Image Settings

| Setting | Default | What It Does |
|---------|---------|--------------|
| DEFAULT_STEPS | 25 | Quality (higher = better) |
| DEFAULT_GUIDANCE | 7.5 | AI adherence to prompt |
| DEFAULT_WIDTH | 512 | Image width (pixels) |
| DEFAULT_HEIGHT | 512 | Image height (pixels) |

💡 Tip: Increase steps to 50 for higher quality images (slower)
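Taken together, the two tables above correspond to a block of constants in config.py. The exact variable names in the repository may differ slightly; this fragment mirrors the defaults listed in the tables:

```python
# Document processing (hypothetical config.py fragment mirroring the tables above)
CHUNK_SIZE = 800        # characters per document chunk
CHUNK_OVERLAP = 150     # shared characters between adjacent chunks
LLM_TEMPERATURE = 0.7   # 0 = deterministic, 1 = most creative
LLM_MAX_TOKENS = 512    # cap on response length

# Image generation
DEFAULT_STEPS = 25      # diffusion denoising steps (quality vs. speed)
DEFAULT_GUIDANCE = 7.5  # how strictly the image follows the prompt
DEFAULT_WIDTH = 512     # image width in pixels
DEFAULT_HEIGHT = 512    # image height in pixels
```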


🤝 Contributing

We welcome contributions! Here's how:

  1. Fork the repository
  2. Create your feature branch
  3. Test your changes thoroughly
  4. Submit a pull request

Areas for contribution:

  • 🐛 Bug fixes
  • ✨ New features
  • 📝 Documentation improvements
  • 🎨 UI enhancements
  • 🌍 Language support

📝 License

This project is licensed under the MIT License - free to use, modify, and distribute!


🙏 Credits

Built with amazing open-source tools:

  • LangChain - RAG framework
  • Pinecone - Vector database
  • OpenRouter - LLM access
  • Gradio - Beautiful UI
  • HuggingFace - AI models
  • Stable Diffusion - Image generation
  • EasyOCR - Text extraction

🎯 Future Roadmap

Coming soon:

  • 🌍 Multi-language support
  • 💾 Chat history saving
  • 📊 Analytics dashboard
  • 🔗 API endpoints
  • 📱 Mobile interface
  • 🎯 Custom models
  • 📈 Batch processing
  • 🔔 Real-time notifications

📞 Support

Need Help?

Quick answers:

  1. Check Troubleshooting
  2. Review DOCKER.md for Docker issues
  3. Ensure API keys are configured correctly

Still stuck?

  • Check existing issues on GitHub
  • Create a new issue with details
  • Include error messages and logs

🌟 Enjoy Your AI Assistant!

Made with ❤️ using cutting-edge AI technology

Star ⭐ this project if you find it helpful!

⬆ Back to Top
