pandu992003/multi_model_RAG_chatbot

🤖 Multimodal RAG Chatbot System

Python · LangChain · Pinecone · Gradio

An intelligent AI chatbot that understands documents, speaks with you, and generates images

Powered by advanced RAG technology, multimodal processing, and AI image generation

🚀 Quick Start · ✨ Features · 📖 How to Use · 🐳 Docker · ❓ Help


🌟 What is This?

This is a smart chatbot that can:

  • 📚 Read and understand your documents (PDFs, text files, images)
  • 💬 Answer questions about what it read
  • 🎤 Listen to your voice and respond back
  • 🎨 Generate images based on document content
  • 🖼️ Extract text from images using OCR technology

Think of it as your personal AI assistant that reads documents and discusses them with you!


✨ Features

🧠 Smart Document Understanding

📄 Multiple Formats

  • PDF documents
  • Text files (.txt)
  • Images with text (JPG, PNG, etc.)

🔍 Intelligent Search

  • Semantic similarity search
  • Context-aware responses
  • Accurate information retrieval
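The semantic search above boils down to comparing embedding vectors by cosine similarity and returning the closest chunk. A toy sketch of the idea (the real system uses HuggingFace embeddings and Pinecone; the two-dimensional vectors here are purely illustrative):

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec, chunks):
    """Return the chunk whose embedding is most similar to the query."""
    return max(chunks, key=lambda c: cosine_similarity(query_vec, c[1]))[0]

# (text, embedding) pairs — hypothetical toy data
docs = [("Chunk about dogs", [1.0, 0.0]),
        ("Chunk about cats", [0.0, 1.0])]
best = search([0.9, 0.1], docs)  # the dog chunk is the closest match
```

In the real pipeline the vectors have hundreds of dimensions and the nearest-neighbor search is handled by Pinecone, but the ranking principle is the same.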

💬 Interactive Communication

✍️ Text Chat

  • Ask questions naturally
  • Get detailed answers
  • Copy and share responses

🎙️ Voice Chat

  • Speak your questions
  • Hear AI responses
  • Hands-free interaction

🎨 Visual Intelligence

📸 OCR Processing

  • Extract text from images
  • Process scanned documents
  • Understand visual content

🖼️ Image Generation

  • Create visualizations
  • Generate diagrams
  • Illustrate concepts

🏗️ How It Works

graph LR
    A[📄 Upload Document] --> B[🔄 Process & Store]
    B --> C[💾 Vector Database]
    D[❓ Ask Question] --> E[🔍 Search Similar Content]
    E --> C
    E --> F[🤖 AI Analysis]
    F --> G[💬 Answer]
    F --> H[🎨 Optional Image]
    
    style A fill:#e3f2fd
    style C fill:#fff3e0
    style F fill:#e8f5e9
    style G fill:#f3e5f5
    style H fill:#fce4ec

The Magic Behind the Scenes

| Step | What Happens | Technology |
|------|--------------|------------|
| 1️⃣ Upload | You upload a document | PDF/Text/Image Reader |
| 2️⃣ Process | AI breaks it into pieces | LangChain Text Splitter |
| 3️⃣ Understand | Converts to AI language | HuggingFace Embeddings |
| 4️⃣ Store | Saves in smart database | Pinecone Vector DB |
| 5️⃣ Ask | You ask a question | Natural Language |
| 6️⃣ Search | Finds relevant parts | Semantic Search |
| 7️⃣ Answer | AI crafts response | OpenAI/OpenRouter LLM |
| 8️⃣ Visualize | Creates images (optional) | Stable Diffusion |
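Step 2 of the table, splitting a document into overlapping pieces, can be sketched in plain Python. This is not the project's actual splitter (it uses LangChain's text splitter); it only illustrates how the `CHUNK_SIZE` and `CHUNK_OVERLAP` settings from config.py interact:

```python
def split_text(text, chunk_size=800, chunk_overlap=150):
    """Split text into chunks of chunk_size characters, where each chunk
    shares chunk_overlap characters with the previous one, so context
    spanning a boundary is never lost."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the defaults, a 2,000-character document yields three chunks, and the last 150 characters of each chunk reappear at the start of the next.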

🚀 Quick Start

What You Need

1. API Keys (Free to Get!)

| Service | What It's For | Get It Here |
|---------|---------------|-------------|
| 🔑 OpenRouter | AI brain for answers | openrouter.ai |
| 🔑 Pinecone | Document storage | pinecone.io |
| 🔑 HuggingFace | Image generation | huggingface.co |

2. System Requirements

  • Computer: Windows, Mac, or Linux
  • Python: Version 3.9 or newer
  • Memory: 4GB minimum (8GB recommended)
  • GPU: Optional (for faster image generation)

📥 Installation Steps

Option 1: Simple Installation

Step 1: Download the Project

  • Download and extract the project files to your computer

Step 2: Create Virtual Environment

  • Open terminal/command prompt in the project folder
  • Run: python -m venv virtual_env
  • Activate it:
    • Windows: virtual_env\Scripts\activate
    • Mac/Linux: source virtual_env/bin/activate

Step 3: Install Dependencies

  • Run: pip install -r requirements.txt
  • Wait for installation to complete (may take 5-10 minutes)

Step 4: Configure API Keys

  • Create a file named .env in the project folder
  • Add your API keys (see format below)

Step 5: Launch!

  • Run: python main.py
  • Open browser to: http://127.0.0.1:7860

🔐 Setting Up API Keys

Create a file called .env in your project folder and add:

Format:

OPENAI_API_KEY=your_key_here
OPENAI_API_BASE=https://openrouter.ai/api/v1
PINECONE_API_KEY=your_key_here
HUGGINGFACE_TOKEN=your_token_here

Example:

OPENAI_API_KEY=sk-or-v1-abc123xyz...
OPENAI_API_BASE=https://openrouter.ai/api/v1
PINECONE_API_KEY=pcsk-abc123xyz...
HUGGINGFACE_TOKEN=hf_abc123xyz...

💡 Tip: Replace your_key_here with your actual API keys!
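At startup these keys are read from the .env file into the environment; the project most likely uses the python-dotenv package for this, but a minimal stdlib loader shows what happens under the hood (the function name and `setdefault` behavior are illustrative assumptions):

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: read KEY=value lines into os.environ,
    skipping blanks and # comments. Existing variables win."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

After calling `load_env()`, code elsewhere can read `os.environ["PINECONE_API_KEY"]` without the key ever being committed to the repository.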


📖 How to Use

Step 1: Upload a Document

What to do:

  1. Click "Upload PDF or TXT" button
  2. Select your document
  3. Click "📤 Upload to Knowledge Base"
  4. Wait for "✅ Document processed" message

Supported formats:

  • 📄 PDF files
  • 📝 Text files (.txt)
  • 🖼️ Images (JPG, PNG, BMP)

What happens:

  • ✅ Text is extracted
  • ✅ Content is analyzed
  • ✅ Information is stored
  • ✅ Ready for questions!

Step 2: Ask Questions (Text)

What to do:

  1. Type your question in the text box
  2. Click "🚀 Submit" or press Enter
  3. Read the AI's response

Example questions:

  • "What is this document about?"
  • "Summarize the main points"
  • "What does it say about [topic]?"
  • "Explain the key concepts"

Response types:

  • 📚 Document-based: Uses uploaded content
  • 💡 General: When no document loaded
  • 🎨 With image: If visualization requested

Step 3: Voice Interaction (Optional)

What to do:

  1. Click the 🎤 microphone icon
  2. Speak your question clearly
  3. Stop recording
  4. Get text + audio response

Tips for best results:

  • Speak clearly and at normal pace
  • Keep questions short and focused
  • Use a quiet environment
  • Wait for processing to complete

The process:

  1. 🎤 You speak
  2. 📝 Converted to text
  3. 🔍 AI searches document
  4. 💬 Generates answer
  5. 🔊 Reads answer back

Step 4: Image Generation (Optional)

What to do:

  1. Ask for a visualization in chat
  2. Or use the dedicated image section
  3. Describe what you want to see
  4. AI validates it's document-related
  5. Image appears if approved

Example requests:

  • "Show me a diagram of the architecture"
  • "Create a visualization of the workflow"
  • "Illustrate the main concept"

Important notes:

  • ✅ Only generates document-related images
  • ❌ Rejects random/generic requests
  • 🎨 Uses Stable Diffusion AI
  • ⏱️ Takes 10-30 seconds
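The "document-related only" gate above is enforced by the LLM in the real system. As a rough intuition for why "Draw a cat" gets rejected while "Diagram of the RAG architecture" passes, here is a toy word-overlap check (entirely illustrative — the function, stopword list, and threshold are assumptions, not the project's actual logic):

```python
def is_document_related(request, document_text, min_overlap=2):
    """Toy relevance gate: approve a request only if it shares at least
    min_overlap content words with the uploaded document."""
    stopwords = {"the", "a", "an", "of", "show", "me", "create", "draw"}
    req_words = {w.lower().strip(".,") for w in request.split()} - stopwords
    doc_words = {w.lower().strip(".,") for w in document_text.split()}
    return len(req_words & doc_words) >= min_overlap

doc = "RAG architecture with vector database retrieval and workflow"
is_document_related("Diagram of the RAG architecture", doc)  # approved
is_document_related("Draw a cat", doc)                       # rejected
```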

🐳 Docker Setup

Why Use Docker?

  • Easy setup - No manual installation
  • Consistent - Works the same everywhere
  • Isolated - Doesn't affect your system
  • Portable - Run anywhere Docker works

Quick Docker Start

Prerequisites:

  • Docker installed on your computer
  • Docker Compose installed

Launch in 3 steps:

| Step | Command | What It Does |
|------|---------|--------------|
| 1 | Create .env file | Add your API keys |
| 2 | docker-compose up --build | Build and start |
| 3 | Open http://localhost:7860 | Use the app! |
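For orientation, a docker-compose.yml for this setup might look roughly like the fragment below; the service name, port mapping, and env_file entry are assumptions, so defer to the repository's actual docker-compose.yml and DOCKER.md:

```yaml
services:
  chatbot:
    build: .            # build from the Dockerfile in this folder
    ports:
      - "7860:7860"     # expose the Gradio UI on localhost:7860
    env_file:
      - .env            # API keys stay out of the image itself
```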

📘 For detailed Docker instructions, see DOCKER.md


🎨 Technology Stack

Core Technologies

🧠 AI & LLM

OpenAI/OpenRouter
Language understanding
Response generation

💾 Vector Database

Pinecone
Document storage
Semantic search

🔤 Embeddings

HuggingFace
Sentence Transformers
Text understanding

⚙️ Framework

LangChain
RAG orchestration
Chain management

Additional Features

🎨 Image Gen

Stable Diffusion
AI art generation
Visual content

📸 OCR

EasyOCR
Text extraction
Image processing

🎤 Speech

SpeechRecognition
Voice to text
Audio processing

🖥️ Interface

Gradio
Beautiful web UI
Easy interaction

📁 Project Structure

📦 RAG1/
├── 🎯 main.py                    # Main application (start here!)
├── ⚙️ config.py                  # Settings and configuration
├── 🧠 rag_system.py              # Core RAG intelligence
├── 🎨 image_generator.py         # Image creation magic
├── 📸 ocr_processor.py           # Text extraction from images
├── 🎤 speech_tts.py              # Voice interaction
├── 🖥️ ui_components.py           # User interface elements
├── 💬 simple_rag_chat.py         # Alternative simple UI
├── 🔧 multimodal_rag.py          # Multimodal coordination
│
├── 📋 requirements.txt           # Required packages
├── 🔐 .env                       # Your API keys (create this!)
├── 🐳 Dockerfile                 # Docker image recipe
├── 🐳 docker-compose.yml         # Docker orchestration
│
└── 📖 README.md                  # This file!

🔧 Troubleshooting

Common Issues & Solutions

❌ "No module named 'xxx'"

Problem: Missing Python package

Solution:

  • Activate virtual environment
  • Run: pip install -r requirements.txt
  • Restart the application

❌ "API key not found"

Problem: Missing or incorrect API keys

Solution:

  • Check .env file exists
  • Verify API keys are correct
  • Remove any extra spaces
  • Restart application

❌ "Pinecone index error"

Problem: Database connection issue

Solution:

  • Verify Pinecone API key is correct
  • Check internet connection
  • Wait a moment and retry
  • Index is auto-created on first run

❌ "Out of memory" (Image Generation)

Problem: Not enough RAM for image generation

Solution:

  • Close other applications
  • Reduce image dimensions in config.py
  • Use smaller inference steps
  • Consider using Docker with memory limits

❌ "Speech recognition not working"

Problem: Missing voice dependencies

Solution:

  • Install: pip install SpeechRecognition pyaudio
  • Check microphone permissions
  • Ensure stable internet (uses Google's service)
  • Try uploading audio file instead

❌ "GPU not detected"

Problem: CUDA/GPU not available for Stable Diffusion

Solution:

  • App works on CPU (slower but functional)
  • For GPU: Install CUDA toolkit
  • Reinstall PyTorch with CUDA support
  • Check GPU compatibility with system

Getting Help

If you're still stuck:

  1. ✅ Check this README carefully
  2. ✅ Review DOCKER.md if using Docker
  3. ✅ Ensure all API keys are valid
  4. ✅ Check console/terminal for error messages
  5. ✅ Try with a smaller document first
  6. ✅ Restart the application

🎯 Usage Tips

📚 Best Practices for Documents

| Document Type | Recommendation | Why |
|---------------|----------------|-----|
| PDF | Use text-based PDFs | Scanned PDFs may not extract well |
| Text | Keep under 50 pages | Faster processing |
| Images | Clear, high contrast | Better OCR accuracy |
| Language | English works best | AI is optimized for English |

💡 Writing Better Questions

| ✅ Good Questions | ❌ Avoid |
|-------------------|----------|
| "What are the main skills mentioned?" | "Tell me everything" |
| "Summarize the work experience" | "What is this?" |
| "Explain the technical architecture" | Single-word questions |
| "What does it say about [specific topic]?" | Very vague questions |

🎨 Image Generation Tips

| ✅ Will Generate | ❌ Won't Generate |
|------------------|-------------------|
| "Diagram of the RAG architecture" | "Draw a cat" |
| "Visualize the workflow from document" | Random art requests |
| "Illustration of key concepts" | Generic images |
| "Technical diagram of the system" | Personal photos |

🌟 Advanced Configuration

Customizing the Experience

You can adjust settings in config.py:

Performance Settings

| Setting | Default | What It Does |
|---------|---------|--------------|
| CHUNK_SIZE | 800 | Document split size |
| CHUNK_OVERLAP | 150 | Overlap between chunks |
| LLM_TEMPERATURE | 0.7 | Response creativity (0-1) |
| LLM_MAX_TOKENS | 512 | Response length |

Image Settings

| Setting | Default | What It Does |
|---------|---------|--------------|
| DEFAULT_STEPS | 25 | Quality (higher = better) |
| DEFAULT_GUIDANCE | 7.5 | AI adherence to prompt |
| DEFAULT_WIDTH | 512 | Image width (pixels) |
| DEFAULT_HEIGHT | 512 | Image height (pixels) |

💡 Tip: Increase steps to 50 for higher quality images (slower)
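Taken together, the two tables above correspond to a block of constants in config.py. The exact variable names in the repository may differ slightly; this fragment mirrors the defaults listed in the tables:

```python
# Document processing (hypothetical config.py fragment mirroring the tables above)
CHUNK_SIZE = 800        # characters per document chunk
CHUNK_OVERLAP = 150     # shared characters between adjacent chunks
LLM_TEMPERATURE = 0.7   # 0 = deterministic, 1 = most creative
LLM_MAX_TOKENS = 512    # cap on response length

# Image generation
DEFAULT_STEPS = 25      # diffusion denoising steps (quality vs. speed)
DEFAULT_GUIDANCE = 7.5  # how strictly the image follows the prompt
DEFAULT_WIDTH = 512     # image width in pixels
DEFAULT_HEIGHT = 512    # image height in pixels
```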


🤝 Contributing

We welcome contributions! Here's how:

  1. Fork the repository
  2. Create your feature branch
  3. Test your changes thoroughly
  4. Submit a pull request

Areas for contribution:

  • 🐛 Bug fixes
  • ✨ New features
  • 📝 Documentation improvements
  • 🎨 UI enhancements
  • 🌍 Language support

📝 License

This project is licensed under the MIT License - free to use, modify, and distribute!


🙏 Credits

Built with amazing open-source tools:

  • LangChain - RAG framework
  • Pinecone - Vector database
  • OpenRouter - LLM access
  • Gradio - Beautiful UI
  • HuggingFace - AI models
  • Stable Diffusion - Image generation
  • EasyOCR - Text extraction

🎯 Future Roadmap

Coming soon:

  • 🌍 Multi-language support
  • 💾 Chat history saving
  • 📊 Analytics dashboard
  • 🔗 API endpoints
  • 📱 Mobile interface
  • 🎯 Custom models
  • 📈 Batch processing
  • 🔔 Real-time notifications

📞 Support

Need Help?

Quick answers:

  1. Check Troubleshooting
  2. Review DOCKER.md for Docker issues
  3. Ensure API keys are configured correctly

Still stuck?

  • Check existing issues on GitHub
  • Create a new issue with details
  • Include error messages and logs

🌟 Enjoy Your AI Assistant!

Made with ❤️ using cutting-edge AI technology

Star ⭐ this project if you find it helpful!

⬆ Back to Top
