A production-ready Retrieval-Augmented Generation (RAG) system that transforms your documents into intelligent conversations. Built with modern web technologies and AI capabilities for seamless document analysis.
Important
Backend Code Folder: To receive the production-ready backend code, follow on GitHub and message me on LinkedIn.
- Progressive File Upload: Upload up to 4 documents (PDF/DOCX) without losing existing files
- Intelligent Chat: Ask questions and get answers based on your document content
- Multi-Document Analysis: Query across multiple documents simultaneously
- Semantic Search: Advanced retrieval with both vector similarity and keyword matching
- Real-time Responses: Instant AI-powered answers with source attribution
- Multiple LLM Providers: OpenAI, Anthropic Claude, Google Gemini support
- Semantic Chunking: Intelligent text segmentation preserving context
- Vector Database: ChromaDB for efficient similarity search
- Session Management: Secure, temporary sessions with no data persistence
- Caching System: Embedding caching for improved performance
- Production Ready: Built with FastAPI, proper error handling, and logging
- Beautiful UI: Modern, responsive design with animations
- Drag & Drop: Intuitive file upload experience
- Real-time Feedback: Progress indicators and status updates
- File Management: Add, remove, and manage documents easily
- Source Attribution: See which documents provided each answer
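The Semantic Search feature combines vector similarity with keyword matching. One common way to blend the two is a weighted sum, sketched below. This is an illustration only and not the project's retriever: the function names, the whitespace tokenizer, and the `alpha` weight of 0.7 are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def keyword_overlap(query, text):
    """Fraction of distinct query terms that also appear in the text."""
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & text_terms) / len(query_terms)


def hybrid_score(query, query_vec, chunk_text, chunk_vec, alpha=0.7):
    """Blend both signals: alpha weights the vector score (hypothetical default)."""
    vector_part = cosine_similarity(query_vec, chunk_vec)
    keyword_part = keyword_overlap(query, chunk_text)
    return alpha * vector_part + (1 - alpha) * keyword_part
```

Raising `alpha` favors semantic matches; lowering it favors exact term matches, which can help with identifiers and proper nouns.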
- Python 3.8 or higher
- 4GB+ RAM recommended
- Modern web browser
- Clone the repository
git clone <repository-url>
cd enhanced_rag
- Make the startup script executable
chmod +x start.sh
- Run the application
./start.sh
The script will:
- Create a virtual environment
- Install all dependencies
- Initialize databases
- Start the server on http://localhost:8000
If you prefer manual setup:
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Start the server
uvicorn main:app --reload
- Open http://localhost:8000 in your browser
- Enter your name to create a session
- Choose an AI provider (OpenAI, Anthropic, or Gemini)
- Enter your API key and test the connection
- Drag & Drop: Drop files onto the upload zone
- Browse: Click to select files from your computer
- Progressive Upload: Add new files without removing existing ones
- Supported Formats: PDF, DOCX, DOC (up to 20MB each)
- Ask Questions: Type naturally about your documents
- View Sources: See which documents provided each answer
- File Management: Remove individual files or clear all
- Session Limits: Up to 50 questions per session
- Multi-file Queries: Ask questions that span multiple documents
- Contextual Answers: Get responses that combine information from different sources
- Real-time Processing: See upload progress and typing indicators
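The per-session limits described in this README (up to 4 documents and up to 50 questions) could be enforced with a small counter object. The sketch below is a hypothetical illustration: `SessionLimits` and its methods are not taken from the project's session code.

```python
from dataclasses import dataclass, field

# Limits as stated in this README; the constant names are assumptions.
MAX_FILES_PER_SESSION = 4
MAX_QUESTIONS_PER_SESSION = 50


@dataclass
class SessionLimits:
    """Tracks uploaded files and asked questions for one session."""
    files: list = field(default_factory=list)
    questions_asked: int = 0

    def add_file(self, filename):
        """Progressive upload: append a file, keeping the existing ones."""
        if len(self.files) >= MAX_FILES_PER_SESSION:
            raise ValueError("file limit reached for this session")
        self.files.append(filename)

    def record_question(self):
        """Count one question and return how many remain."""
        if self.questions_asked >= MAX_QUESTIONS_PER_SESSION:
            raise ValueError("question limit reached for this session")
        self.questions_asked += 1
        return MAX_QUESTIONS_PER_SESSION - self.questions_asked
```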
backend/
├── api/ # FastAPI routes
│ ├── routes_upload.py # File upload handling
│ ├── routes_rag.py # Chat and Q&A
│ ├── routes_validate.py # Authentication
│ └── routes_files.py # File management
├── core/ # Core system
│ ├── config.py # Configuration management
│ └── session.py # Session handling
├── ingestion/ # Document processing
│ ├── document_processor.py # PDF/DOCX extraction
│ └── chunker.py # Semantic chunking
├── embedding/ # Vector embeddings
│ └── embedding.py # Sentence transformers
├── vector/ # Vector storage
│ ├── vectorstore.py # ChromaDB interface
│ └── retriever.py # Advanced retrieval
└── llm/ # LLM integration
└── provider.py # Multi-provider support
frontend/
├── welcome.html # Landing page
├── auth.html # API key setup
├── chat.html # Main interface
└── static/
├── css/ # Styled components
└── js/ # Interactive features
Create a .env file or modify the generated one:
# Server Settings
HOST=0.0.0.0
PORT=8000
WORKERS=1
# File Upload Limits
MAX_FILES_PER_SESSION=4
MAX_FILE_SIZE_MB=20
# Session Management
MAX_QUESTIONS_PER_SESSION=50
SESSION_TIMEOUT_HOURS=24
# Vector Database
CHROMA_DIR=./chroma_store
# Embedding Model
EMBEDDING_MODEL=all-MiniLM-L6-v2
# Text Processing
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
- Models: GPT-3.5 Turbo, GPT-4
- API Key Format: sk-...
- Get API Key: OpenAI Platform
- Models: Claude 3 Haiku, Claude 3 Sonnet
- API Key Format: sk-ant-...
- Get API Key: Anthropic Console
- Models: Gemini Pro
- API Key Format: AI...
- Get API Key: Google AI Studio
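The key prefixes listed above allow a quick client-side sanity check before a connection test. The following is a minimal sketch under stated assumptions: `guess_provider` is a hypothetical helper, and the provider labels `anthropic` and `gemini` are guesses (only `openai` appears elsewhere in this README). Because `sk-ant-` also begins with `sk-`, the Anthropic prefix has to be checked first.

```python
def guess_provider(api_key):
    """Guess the provider from the key prefix; returns None if unrecognized.

    A prefix match does not prove the key is valid, it only catches
    obvious mix-ups such as pasting a key for the wrong provider.
    """
    key = api_key.strip()
    if key.startswith("sk-ant-"):  # must precede the generic "sk-" check
        return "anthropic"
    if key.startswith("sk-"):
        return "openai"
    if key.startswith("AI"):
        return "gemini"
    return None
```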
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
# Development
export RELOAD=true
export LOG_LEVEL=debug
# Production
export WORKERS=4
export LOG_LEVEL=warning
export SECRET_KEY=your-production-secret
export ALLOWED_ORIGINS=https://yourdomain.com
Modify backend/core/config.py:
EMBEDDING_MODEL = "sentence-transformers/all-mpnet-base-v2"
# or
EMBEDDING_MODEL = "sentence-transformers/all-distilroberta-v1"
Modify backend/ingestion/chunker.py:
chunker = SemanticChunker(
chunk_size=1500, # Larger chunks
overlap=300, # More overlap
)
import requests
# Upload files
# Open the file in a with-block so the handle is closed after the request
with open('document.pdf', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/api/upload/files',
        data={'session_id': 'your-session-id'},
        files={'files': f},
    )
# Ask questions
response = requests.post(
'http://localhost:8000/api/rag/ask',
json={'question': 'What is the main topic?'},
headers={
'X-Session-ID': 'your-session-id',
'X-API-Key': 'your-api-key',
'X-Provider': 'openai'
}
)
- main.py - FastAPI application entry point
- start.sh - Development startup script
- requirements.txt - Python dependencies
- backend/ - Server-side logic
- frontend/ - Client-side interface
- Backend: Add routes in backend/api/
- Frontend: Modify HTML/CSS/JS in frontend/
- Database: Update models in backend/core/
pytest tests/ -v
- File Processing: ~2-5 seconds per MB
- Query Response: ~1-3 seconds
- Memory Usage: ~500MB base + ~100MB per session
- Concurrent Users: 10+ (single worker)
- Use smaller embedding models for faster processing
- Increase chunk overlap for better retrieval
- Use multiple workers for production
- Enable response caching for repeated queries
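To see what the chunk overlap tip trades off, here is a minimal sketch of a fixed-size sliding-window splitter. It is deliberately simpler than semantic chunking and is not the project's chunker: `chunk_text` is a hypothetical helper that counts characters, with defaults mirroring the CHUNK_SIZE and CHUNK_OVERLAP values in the configuration.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into character windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With these defaults a 2,500-character text yields 3 chunks; raising the overlap to 300 yields 4. More overlap keeps context across chunk boundaries, at the cost of more chunks to embed and store.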
- No Persistent Storage: Files and conversations are session-only
- API Key Security: Keys stored in browser session only
- Encrypted Communication: HTTPS recommended for production
# Add to main.py for production
from fastapi.middleware.trustedhost import TrustedHostMiddleware
app.add_middleware(TrustedHostMiddleware, allowed_hosts=["example.com"])
# Ensure virtual environment is activated
source venv/bin/activate
pip install -r requirements.txt
# Clear vector database
rm -rf chroma_store/
# Restart application
# Reduce chunk size in config.py
CHUNK_SIZE = 500
# Use smaller embedding model
EMBEDDING_MODEL = "all-MiniLM-L6-v2"
- Check file size limits (20MB max)
- Verify file format (PDF/DOCX only)
- Ensure sufficient disk space
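The upload checks in this list can be run locally before sending a file. The sketch below is a hypothetical pre-flight helper, not the project's upload route; the constant names and the allowed-extension set (taken from the "PDF/DOCX only" item above) are assumptions.

```python
import os
import shutil

MAX_FILE_SIZE_MB = 20                    # limit stated in this README
ALLOWED_EXTENSIONS = {".pdf", ".docx"}   # assumed from "PDF/DOCX only"


def check_upload(filename, size_bytes):
    """Return a list of problems; an empty list means the file looks fine."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'no extension'}")
    if size_bytes > MAX_FILE_SIZE_MB * 1024 * 1024:
        problems.append(f"file exceeds {MAX_FILE_SIZE_MB}MB")
    return problems


def has_free_space(path, required_bytes):
    """True if the filesystem holding `path` has at least required_bytes free."""
    return shutil.disk_usage(path).free >= required_bytes
```

For a file on disk, `os.path.getsize(path)` supplies the `size_bytes` argument.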
export LOG_LEVEL=debug
export RELOAD=true
./start.sh
Once running, visit:
- Interactive Docs: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- OpenAPI JSON: http://localhost:8000/openapi.json
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
black backend/ frontend/
flake8 backend/
This project is licensed under the MIT License - see the LICENSE file for details.
- FastAPI for the excellent web framework
- ChromaDB for vector storage
- Sentence Transformers for embeddings
- Tailwind CSS for beautiful styling
- OpenAI/Anthropic/Google for AI capabilities
If you found this template helpful or want to discuss AI systems, feel free to reach out:
- 📧 Email: hassanaiengineer@gmail.com
- 🔗 LinkedIn: Hassan Khan
Built with ❤️ for intelligent document analysis
For questions, issues, or contributions, please visit our GitHub repository.
