An intelligent AI chatbot that understands documents, speaks with you, and generates images
Powered by advanced RAG technology, multimodal processing, and AI image generation
This is a smart chatbot that can:
- 📚 Read and understand your documents (PDFs, text files, images)
- 💬 Answer questions about what it read
- 🎤 Listen to your voice and respond back
- 🎨 Generate images based on document content
- 🖼️ Extract text from images using OCR technology
Think of it as your personal AI assistant that reads documents and discusses them with you!
- 📄 Multiple Formats
- 🔍 Intelligent Search
- ✍️ Text Chat
- 🎙️ Voice Chat
- 📸 OCR Processing
- 🖼️ Image Generation
graph LR
A[📄 Upload Document] --> B[🔄 Process & Store]
B --> C[💾 Vector Database]
D[❓ Ask Question] --> E[🔍 Search Similar Content]
E --> C
E --> F[🤖 AI Analysis]
F --> G[💬 Answer]
F --> H[🎨 Optional Image]
style A fill:#e3f2fd
style C fill:#fff3e0
style F fill:#e8f5e9
style G fill:#f3e5f5
style H fill:#fce4ec
| Step | What Happens | Technology |
|---|---|---|
| 1️⃣ Upload | You upload a document | PDF/Text/Image Reader |
| 2️⃣ Process | AI breaks it into pieces | LangChain Text Splitter |
| 3️⃣ Understand | Converts to AI language | HuggingFace Embeddings |
| 4️⃣ Store | Saves in smart database | Pinecone Vector DB |
| 5️⃣ Ask | You ask a question | Natural Language |
| 6️⃣ Search | Finds relevant parts | Semantic Search |
| 7️⃣ Answer | AI crafts response | OpenAI/OpenRouter LLM |
| 8️⃣ Visualize | Creates images (optional) | Stable Diffusion |
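Steps 6 and 7 can be illustrated with a small, dependency-free sketch. It is not the project's code: the real app delegates search to Pinecone and embeddings to HuggingFace, while the toy vectors, `top_k`, and `build_prompt` below are hypothetical names chosen for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

def build_prompt(question, chunks):
    """Combine the retrieved chunks and the question into one LLM prompt."""
    context = "\n\n".join(chunks)
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"

# Toy 3-dimensional vectors (hypothetical; real embeddings have hundreds of dimensions).
store = [
    ("Python developer with five years of experience.", [0.9, 0.1, 0.0]),
    ("Enjoys hiking and photography.", [0.0, 0.2, 0.9]),
    ("Built REST APIs with Flask.", [0.8, 0.3, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]  # stands in for the embedded question

best = top_k(query_vec, store, k=2)
prompt = build_prompt("What are the main skills mentioned?", best)
```

Only the two closest chunks reach the prompt, which is why specific questions tend to retrieve more useful context than vague ones.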
| Service | What It's For | Get It Here |
|---|---|---|
| 🔑 OpenRouter | AI brain for answers | openrouter.ai |
| 🔑 Pinecone | Document storage | pinecone.io |
| 🔑 HuggingFace | Image generation | huggingface.co |
- Computer: Windows, Mac, or Linux
- Python: Version 3.9 or newer
- Memory: 4GB minimum (8GB recommended)
- GPU: Optional (for faster image generation)
Step 1: Download the Project
- Download and extract the project files to your computer
Step 2: Create Virtual Environment
- Open terminal/command prompt in the project folder
- Run: `python -m venv virtual_env`
- Activate it:
  - Windows: `virtual_env\Scripts\activate`
  - Mac/Linux: `source virtual_env/bin/activate`
Step 3: Install Dependencies
- Run: `pip install -r requirements.txt`
- Wait for installation to complete (may take 5-10 minutes)
Step 4: Configure API Keys
- Create a file named `.env` in the project folder
- Add your API keys (see format below)
Step 5: Launch!
- Run: `python main.py`
- Open browser to: `http://127.0.0.1:7860`
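If the browser shows nothing after Step 5, a quick way to tell whether the app is actually listening is to probe the port. The helper below is a minimal sketch using only the standard library; `is_port_open` is a hypothetical name, not part of the project.

```python
import socket

def is_port_open(host="127.0.0.1", port=7860, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

Calling `is_port_open()` with the defaults checks the address from Step 5. A `False` result usually means `main.py` has not finished starting or exited with an error, so the terminal output is the next place to look.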
Create a file called .env in your project folder and add:
Format:
OPENAI_API_KEY=your_key_here
OPENAI_API_BASE=https://openrouter.ai/api/v1
PINECONE_API_KEY=your_key_here
HUGGINGFACE_TOKEN=your_token_here
Example:
OPENAI_API_KEY=sk-or-v1-abc123xyz...
OPENAI_API_BASE=https://openrouter.ai/api/v1
PINECONE_API_KEY=pcsk-abc123xyz...
HUGGINGFACE_TOKEN=hf_abc123xyz...
💡 Tip: Replace `your_key_here` with your actual API keys!
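Before launching, it can help to confirm that the `.env` file contains every key in the format above and that no placeholder was left behind. The sketch below is an assumption-laden illustration, not the app's own loader: `parse_env` and `find_problems` are hypothetical helpers, and the parser handles only simple `KEY=value` lines.

```python
REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "OPENAI_API_BASE",
    "PINECONE_API_KEY",
    "HUGGINGFACE_TOKEN",
]
PLACEHOLDERS = {"your_key_here", "your_token_here"}

def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Whitespace is stripped here only so the check can inspect the value.
        values[key.strip()] = value.strip()
    return values

def find_problems(values):
    """List human-readable problems for the required keys."""
    problems = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value:
            problems.append(f"{key} is missing or empty")
        elif value in PLACEHOLDERS:
            problems.append(f"{key} still holds the placeholder value")
    return problems

# Example input with one placeholder and one missing key.
sample = """
OPENAI_API_KEY=your_key_here
OPENAI_API_BASE=https://openrouter.ai/api/v1
PINECONE_API_KEY = pcsk-abc123xyz
"""
problems = find_problems(parse_env(sample))
```

To check a real file, pass its contents instead of `sample`, for example `parse_env(open(".env", encoding="utf-8").read())`.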
- ✅ Easy setup - No manual installation
- ✅ Consistent - Works the same everywhere
- ✅ Isolated - Doesn't affect your system
- ✅ Portable - Run anywhere Docker works
Prerequisites:
- Docker installed on your computer
- Docker Compose installed
Launch in 3 steps:
| Step | Command | What It Does |
|---|---|---|
| 1 | Create `.env` file | Add your API keys |
| 2 | `docker-compose up --build` | Build and start |
| 3 | Open `http://localhost:7860` | Use the app! |
📘 For detailed Docker instructions, see DOCKER.md
- OpenAI/OpenRouter: Language understanding, response generation
- Pinecone: Document storage, semantic search
- HuggingFace: Sentence Transformers, text understanding
- LangChain: RAG orchestration, chain management
- Stable Diffusion: AI art generation, visual content
- EasyOCR: Text extraction, image processing
- SpeechRecognition: Voice to text, audio processing
- Gradio: Beautiful web UI, easy interaction
📦 RAG1/
├── 🎯 main.py # Main application (start here!)
├── ⚙️ config.py # Settings and configuration
├── 🧠 rag_system.py # Core RAG intelligence
├── 🎨 image_generator.py # Image creation magic
├── 📸 ocr_processor.py # Text extraction from images
├── 🎤 speech_tts.py # Voice interaction
├── 🖥️ ui_components.py # User interface elements
├── 💬 simple_rag_chat.py # Alternative simple UI
├── 🔧 multimodal_rag.py # Multimodal coordination
│
├── 📋 requirements.txt # Required packages
├── 🔐 .env # Your API keys (create this!)
├── 🐳 Dockerfile # Docker image recipe
├── 🐳 docker-compose.yml # Docker orchestration
│
└── 📖 README.md # This file!
Problem: Missing Python package
Solution:
- Activate virtual environment
- Run: `pip install -r requirements.txt`
- Restart the application
Problem: Missing or incorrect API keys
Solution:
- Check that the `.env` file exists
- Verify API keys are correct
- Remove any extra spaces
- Restart application
Problem: Database connection issue
Solution:
- Verify Pinecone API key is correct
- Check internet connection
- Wait a moment and retry
- Index is auto-created on first run
Problem: Not enough RAM for image generation
Solution:
- Close other applications
- Reduce image dimensions in `config.py`
- Use fewer inference steps
- Consider using Docker with memory limits
Problem: Missing voice dependencies
Solution:
- Install: `pip install SpeechRecognition pyaudio`
- Check microphone permissions
- Ensure stable internet (uses Google's service)
- Try uploading audio file instead
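To test voice input without a microphone, an audio file can be transcribed directly with the SpeechRecognition package. This is a minimal sketch rather than the project's `speech_tts.py`: `transcribe_file` is a hypothetical helper, and it assumes the file is WAV, AIFF, or FLAC and that an internet connection is available for Google's web speech service.

```python
def transcribe_file(path):
    """Transcribe an audio file (WAV, AIFF, or FLAC) to text."""
    # Imported here so the function can be defined even if the package
    # is not installed yet (pip install SpeechRecognition).
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return ""  # the speech could not be understood
    except sr.RequestError as exc:
        raise RuntimeError(f"Speech service unreachable: {exc}") from exc
```

An empty string points to unclear audio, while a `RuntimeError` points to a network problem, which matches the two causes listed above.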
Problem: CUDA/GPU not available for Stable Diffusion
Solution:
- App works on CPU (slower but functional)
- For GPU: Install CUDA toolkit
- Reinstall PyTorch with CUDA support
- Check GPU compatibility with system
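A short check shows which device PyTorch would use. The sketch below assumes nothing about the project's own device selection; `pick_device` is a hypothetical helper that falls back to the CPU when PyTorch or CUDA is unavailable.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed in this environment
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
print(f"Image generation would run on: {device}")
```

If this prints `cpu` on a machine with an NVIDIA GPU, the installed PyTorch build most likely lacks CUDA support.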
If you're still stuck:
- ✅ Check this README carefully
- ✅ Review DOCKER.md if using Docker
- ✅ Ensure all API keys are valid
- ✅ Check console/terminal for error messages
- ✅ Try with a smaller document first
- ✅ Restart the application
| Document Type | Recommendation | Why |
|---|---|---|
| PDF | Use text-based PDFs | Scanned PDFs may not extract well |
| Text | Keep under 50 pages | Faster processing |
| Images | Clear, high contrast | Better OCR accuracy |
| Language | English works best | AI is optimized for English |
| ✅ Good Questions | ❌ Avoid |
|---|---|
| "What are the main skills mentioned?" | "Tell me everything" |
| "Summarize the work experience" | "What is this?" |
| "Explain the technical architecture" | Single word questions |
| "What does it say about [specific topic]?" | Very vague questions |
| ✅ Will Generate | ❌ Won't Generate |
|---|---|
| "Diagram of the RAG architecture" | "Draw a cat" |
| "Visualize the workflow from document" | Random art requests |
| "Illustration of key concepts" | Generic images |
| "Technical diagram of the system" | Personal photos |
You can adjust settings in config.py:
| Setting | Default | What It Does |
|---|---|---|
| CHUNK_SIZE | 800 | Document split size |
| CHUNK_OVERLAP | 150 | Overlap between chunks |
| LLM_TEMPERATURE | 0.7 | Response creativity (0-1) |
| LLM_MAX_TOKENS | 512 | Response length |
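To see how `CHUNK_SIZE` and `CHUNK_OVERLAP` interact, here is a simplified character-based splitter. It is only an approximation under stated assumptions: the LangChain text splitter used by the app also tries to break on separators such as paragraphs and sentences, and `split_with_overlap` is a hypothetical name.

```python
def split_with_overlap(text, chunk_size=800, overlap=150):
    """Split text into fixed-size chunks that share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks

chunks = split_with_overlap("a" * 2000)
print([len(c) for c in chunks])  # [800, 800, 700]
```

With the defaults, each chunk starts 650 characters after the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.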
| Setting | Default | What It Does |
|---|---|---|
| DEFAULT_STEPS | 25 | Quality (higher = better) |
| DEFAULT_GUIDANCE | 7.5 | AI adherence to prompt |
| DEFAULT_WIDTH | 512 | Image width (pixels) |
| DEFAULT_HEIGHT | 512 | Image height (pixels) |
💡 Tip: Increase steps to 50 for higher quality images (slower)
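The trade-off between quality and memory can be sketched as a small settings object. This is illustrative only: the actual layout of `config.py` is not shown here, `ImageSettings` is a hypothetical class, and the multiple-of-8 check reflects a common Stable Diffusion pipeline constraint rather than a documented rule of this app.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ImageSettings:
    steps: int = 25        # DEFAULT_STEPS
    guidance: float = 7.5  # DEFAULT_GUIDANCE
    width: int = 512       # DEFAULT_WIDTH
    height: int = 512      # DEFAULT_HEIGHT

    def __post_init__(self):
        if self.width % 8 or self.height % 8:
            raise ValueError("width and height must be multiples of 8")
        if self.steps < 1:
            raise ValueError("steps must be at least 1")

default = ImageSettings()
high_quality = replace(default, steps=50)                       # slower, more detail
low_memory = replace(default, width=384, height=384, steps=20)  # fewer pixels, less RAM
```

Lowering width and height reduces the number of pixels the model has to process, which is why it is the first thing to try when image generation runs out of memory.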
We welcome contributions! Here's how:
- Fork the repository
- Create your feature branch
- Test your changes thoroughly
- Submit a pull request
Areas for contribution:
- 🐛 Bug fixes
- ✨ New features
- 📝 Documentation improvements
- 🎨 UI enhancements
- 🌍 Language support
This project is licensed under the MIT License - free to use, modify, and distribute!
Built with amazing open-source tools:
- LangChain - RAG framework
- Pinecone - Vector database
- OpenRouter - LLM access
- Gradio - Beautiful UI
- HuggingFace - AI models
- Stable Diffusion - Image generation
- EasyOCR - Text extraction
Coming soon:
- 🌍 Multi-language support
- 💾 Chat history saving
- 📊 Analytics dashboard
- 🔗 API endpoints
- 📱 Mobile interface
- 🎯 Custom models
- 📈 Batch processing
- 🔔 Real-time notifications
Quick answers:
- Check Troubleshooting
- Review DOCKER.md for Docker issues
- Ensure API keys are configured correctly
Still stuck?
- Check existing issues on GitHub
- Create a new issue with details
- Include error messages and logs
Made with ❤️ using cutting-edge AI technology
Star ⭐ this project if you find it helpful!