🧠 NeuralRAG - Self-Correcting Multi-Modal RAG System


A production-ready Retrieval-Augmented Generation (RAG) system with self-correcting capabilities, multi-modal document processing, and hybrid search. Built for accuracy, scalability, and ease of deployment.

✨ Key Features

  • πŸ”„ Self-Correcting RAG Pipeline - Iterative validation and refinement of AI responses for maximum accuracy
  • πŸ“š Multi-Modal Document Processing - Specialized handlers for legal, financial, technical documents and images
  • πŸ” Hybrid Search Engine - Combines dense vector search (semantic) with BM25 sparse search (keyword)
  • 🎯 High Accuracy - Query rewriting, answer validation, and hallucination detection
  • πŸ’¬ Streaming Responses - Real-time answer generation with processing logs
  • πŸ—‚οΈ Collection Isolation - Organize documents into separate knowledge bases
  • πŸ“ Session Management - Persistent chat history with SQLite backend
  • 🐳 Docker Ready - One-command deployment with docker-compose
  • 🎨 Modern UI - Beautiful Next.js frontend with glassmorphism design

πŸ“‹ Table of Contents


πŸ— Architecture

System Overview

```mermaid
graph TB
    subgraph "Frontend Layer"
        UI[Next.js UI<br/>React + TailwindCSS]
    end
    
    subgraph "Backend Layer"
        API[FastAPI Server<br/>Port 8000]
        Router[Document Router<br/>Auto-Detection]
        
        subgraph "Document Handlers"
            LegalH[Legal Handler]
            FinH[Financial Handler]
            TechH[Technical Handler]
            ImgH[Image Handler<br/>Vision LLM]
        end
        
        subgraph "RAG Pipeline"
            QR[Query Rewriter<br/>Multi-Query]
            Search[Hybrid Search<br/>BM25 + Vector]
            Rerank[Cross-Encoder<br/>Reranking]
            SC[Self-Correction<br/>Validator]
        end
    end
    
    subgraph "Storage Layer"
        Chroma[(ChromaDB<br/>Vector Store)]
        BM25[(BM25 Index<br/>Sparse Search)]
        Parent[(Parent Store<br/>JSON)]
        SQLite[(SQLite<br/>Sessions)]
    end
    
    subgraph "LLM Layer"
        Ollama[Ollama<br/>LLM Service]
        Models[llama3<br/>llava vision]
    end
    
    UI -->|HTTP/REST| API
    API --> Router
    Router --> LegalH & FinH & TechH & ImgH
    
    API --> QR
    QR --> Search
    Search --> Chroma & BM25
    Search --> Rerank
    Rerank --> SC
    SC --> Ollama
    
    LegalH & FinH & TechH --> Chroma & BM25 & Parent
    ImgH --> Ollama
    API --> SQLite
    Ollama --> Models
    
    style UI fill:#a78bfa
    style API fill:#60a5fa
    style SC fill:#34d399
    style Ollama fill:#fb923c
```

Self-Correcting RAG Pipeline

The core innovation of NeuralRAG is its self-correcting pipeline that iteratively improves answer quality:

```mermaid
flowchart TD
    Start([User Query]) --> QR[Query Rewriter<br/>Generate 3 variations]
    QR --> HS[Hybrid Search<br/>BM25 + Dense Vector]
    
    HS --> RRF[Reciprocal Rank Fusion<br/>Combine Results]
    RRF --> Rerank[Cross-Encoder Reranking<br/>Top-K Selection]
    
    Rerank --> Gen[LLM Generation<br/>Generate Answer]
    Gen --> Val{Answer Validator<br/>Check Quality}
    
    Val -->|Confidence β‰₯ 0.6| Done([Return Answer])
    Val -->|Confidence < 0.6<br/>Iterations < 2| HD[Hallucination Detector]
    Val -->|Max Iterations| Done
    
    HD --> Check{Issues Found?}
    Check -->|Yes| Correct[Hallucination Correction<br/>Regenerate with Guidance]
    Check -->|No| Expand[Expand Search Context]
    
    Correct --> Gen
    Expand --> HS
    
    style Start fill:#a78bfa
    style Val fill:#fbbf24
    style HD fill:#f87171
    style Done fill:#34d399
```

Document Processing Pipeline

```mermaid
flowchart LR
    Upload[Document Upload] --> Detect{File Type<br/>Detection}
    
    Detect -->|PDF Analysis| Route{Content Analysis<br/>Score Indicators}
    Detect -->|Image| ImgH[Image Handler<br/>Vision LLM]
    
    Route -->|Legal Terms| LegalH[Legal Handler<br/>Semantic Chunking]
    Route -->|Financial Data| FinH[Financial Handler<br/>Table Extraction]
    Route -->|Technical Docs| TechH[Technical Handler<br/>Markdown Chunks]
    
    LegalH & FinH & TechH --> Parent[Parent Chunking<br/>~1000 chars]
    Parent --> Children[Child Chunking<br/>Sliding Window 200 words]
    
    ImgH --> Describe[Image Description<br/>Structured Extraction]
    Describe --> Children
    
    Children --> Embed[Sentence Transformer<br/>all-mpnet-base-v2]
    Embed --> Index[(Vector Index<br/>+<br/>BM25 Index)]
    
    style Upload fill:#a78bfa
    style Route fill:#fbbf24
    style Index fill:#34d399
```

Hybrid Search Architecture

```mermaid
graph TB
    Query[User Query] --> Embed[Embedding Model]
    Query --> Tokenize[BM25 Tokenizer]
    
    Embed --> VectorSearch[Dense Vector Search<br/>Cosine Similarity]
    Tokenize --> SparseSearch[Sparse BM25 Search<br/>TF-IDF Scoring]
    
    VectorSearch --> Results1[Top-K Semantic Matches]
    SparseSearch --> Results2[Top-K Keyword Matches]
    
    Results1 --> RRF[Reciprocal Rank Fusion<br/>k=60]
    Results2 --> RRF
    
    RRF --> Rerank[Cross-Encoder Reranking<br/>Fine-grained Scoring]
    Rerank --> Parent[Retrieve Parent Chunks<br/>Full Context]
    
    Parent --> Final[Final Ranked Results]
    
    style Query fill:#a78bfa
    style RRF fill:#fbbf24
    style Final fill:#34d399
```

Collection & Session Architecture

```mermaid
graph TB
    subgraph "User Space"
        User[User]
    end
    
    subgraph "Collections"
        C1[Collection: Legal Docs<br/>doc_count: 5]
        C2[Collection: Financial Q4<br/>doc_count: 3]
        C3[Collection: Default<br/>doc_count: 0]
    end
    
    subgraph "Sessions"
        S1[Session 1<br/>Contract Review<br/>β†’ Collection: Legal Docs]
        S2[Session 2<br/>Q4 Analysis<br/>β†’ Collection: Financial Q4]
        S3[Session 3<br/>General Chat<br/>β†’ Collection: Default]
    end
    
    subgraph "Vector Database"
        V1[(Legal Docs<br/>Vector Space)]
        V2[(Financial Q4<br/>Vector Space)]
        V3[(Default<br/>Vector Space)]
    end
    
    User --> S1 & S2 & S3
    S1 --> C1 --> V1
    S2 --> C2 --> V2
    S3 --> C3 --> V3
    
    style User fill:#a78bfa
    style C1 fill:#60a5fa
    style C2 fill:#60a5fa
    style C3 fill:#60a5fa
```

🎯 Features in Detail

1. Self-Correcting Pipeline

The system validates every answer and iteratively improves it:

  • Query Rewriting: Expands queries into 3 variations for better retrieval
  • Answer Validation: Checks factual grounding and confidence scores
  • Hallucination Detection: Identifies unsupported claims
  • Iterative Refinement: Re-generates answers with corrective guidance
  • Max 2 Iterations: Balances accuracy with response time

Example Workflow:

User: "What's the revenue?"
β†’ Query Rewriter: ["What is the revenue?", "Show revenue figures", "Revenue performance"]
β†’ Hybrid Search: Retrieve 10 chunks from all 3 queries
β†’ Generate Answer: "$5.2M in Q4"
β†’ Validator: confidence=0.85 βœ“
β†’ Return: High-confidence answer

2. Multi-Modal Document Processing

Supported Document Types:

| Type | Handler | Features | Use Cases |
|------|---------|----------|-----------|
| Legal | LegalHandler | Semantic chunking, metadata extraction, table-aware | Contracts, agreements, policies |
| Financial | FinancialHandler | Table parsing, numeric extraction, quarter detection | Earnings reports, balance sheets |
| Technical | TechnicalHandler | Markdown preservation, code blocks, hierarchical chunking | Manuals, documentation, guides |
| Images | ImageHandler | Vision LLM analysis, structured extraction | Charts, diagrams, photos |

Auto-Detection: The DocumentRouter analyzes file content and scores indicators for each type, automatically selecting the best handler.

3. Hybrid Search Engine

Dense Vector Search (Semantic):

  • Model: all-mpnet-base-v2 (768 dimensions)
  • Similarity: Cosine distance
  • Strength: Understands meaning and context

Sparse BM25 Search (Keyword):

  • Algorithm: BM25Okapi
  • Tokenization: Word-level
  • Strength: Exact matches and rare terms (see the sketch below)
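
A minimal sketch of keyword scoring, assuming the rank_bm25 package (its BM25Okapi class is the one named above); the corpus here is illustrative:

```python
from rank_bm25 import BM25Okapi

# Illustrative two-document corpus; NeuralRAG builds its index over child chunks.
corpus = ["revenue was 5.2 million in q4", "the contract term is 24 months"]
bm25 = BM25Okapi([doc.split() for doc in corpus])

# One relevance score per document for the tokenized query
scores = bm25.get_scores("q4 revenue".split())
```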

Reciprocal Rank Fusion (RRF):

  • Combines rankings from both search methods
  • Formula: score(d) = Σ 1 / (k + rank(d)), summed over each result list that contains d, with k = 60
  • Produces better overall relevance than either method alone (minimal implementation below)
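
A minimal, self-contained sketch of RRF as described above (function and variable names are illustrative, not the project's API):

```python
def reciprocal_rank_fusion(rankings: dict[str, list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists from multiple retrievers; highest fused score first."""
    scores: dict[str, float] = {}
    for ranked_ids in rankings.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = reciprocal_rank_fusion({
    "dense": ["d3", "d1", "d7"],
    "bm25": ["d1", "d9", "d3"],
})
# d1 ranks first: it appears near the top of both lists.
```

Because RRF uses only ranks, not raw scores, the BM25 and vector results need no score normalization before fusion.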

4. Session & Collection Management

  • Collections: Isolated document groups (e.g., "Q4 Financials", "Legal Contracts")
  • Sessions: Chat threads linked to collections
  • Persistence: SQLite database stores all history
  • Cleanup: Deleting a session removes its documents (if no other sessions use them)

πŸ“Έ Screenshot

Main Interface


πŸš€ Installation

Prerequisites

  • Docker and Docker Compose (recommended), or Python 3.9+ and Node.js for local development
  • Ollama for local LLM inference (pulled automatically in the Docker stack)
  • 8GB+ RAM recommended

Option 1: Docker Deployment (Recommended)

Step 1: Clone the repository

```bash
git clone <repository-url>
cd RAG_Project
```

Step 2: Configure environment (optional)

```bash
cp .env.example .env
# Edit .env to customize models and paths
```

Step 3: Start the stack

```bash
docker-compose up -d
```

This will:

  • Pull Ollama image and download llama3 and llava models
  • Build and start the backend (FastAPI)
  • Build and start the frontend (Next.js)
  • Create persistent volumes for data and models

Step 4: Access the application

  • Frontend: http://localhost:3000 (the Next.js default port)
  • API: http://localhost:8000 (interactive docs at http://localhost:8000/docs)

Step 5: Verify deployment

```bash
# Check health status
curl http://localhost:8000/health
```

Expected response:

```json
{
  "status": "healthy",
  "ollama": "connected",
  "database": "connected (0 docs)",
  "system": "NeuralRAG v2.0"
}
```

Option 2: Local Development

Step 1: Install Ollama

```bash
# Download from https://ollama.ai
# Pull required models
ollama pull llama3
ollama pull llava
```

Step 2: Set up Python backend

```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Start backend
uvicorn main:app --reload --port 8000
```

Step 3: Set up Next.js frontend

```bash
cd rag-frontend
npm install
npm run dev
```

Step 4: Configure environment. Create a .env file in the project root:

```bash
RAG_LLM_MODEL=llama3
RAG_VISION_MODEL=llava
RAG_EMBED_MODEL=all-mpnet-base-v2
OLLAMA_HOST=http://localhost:11434
```

πŸ“– Usage

Web Interface

1. Create a Collection (optional)

  • Click "New Collection" in sidebar
  • Name: "Q4 Financial Reports"
  • Description: "Earnings and balance sheets for Q4 2024"

2. Upload Documents

  • Select collection from dropdown
  • Click upload button or drag-and-drop
  • Supported formats: PDF, TXT, MD, DOC, DOCX, PNG, JPG

3. Start a Chat Session

  • Click "New Chat" button
  • Select the collection to chat with
  • System creates isolated session

4. Ask Questions

  • Type your question
  • Watch real-time processing logs (query rewriting, search, validation)
  • Receive answer with confidence score and sources

5. View Processing Logs

  • Toggle "Show Logs" to see self-correction pipeline
  • See query variations, search results, validation scores

API Usage

Upload a Document

```bash
curl -X POST "http://localhost:8000/upload?collection_id=default" \
  -F "file=@document.pdf"
```

Response:

```json
{
  "status": "success",
  "message": "Indexed 45 chunks from document.pdf",
  "document_type": "financial",
  "handler": "FinancialHandler",
  "collection_id": "default",
  "chunks_created": 45
}
```
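
The same upload from Python, as a sketch using the requests library (the field name matches the curl call above):

```python
import requests

with open("document.pdf", "rb") as f:
    r = requests.post(
        "http://localhost:8000/upload",
        params={"collection_id": "default"},
        files={"file": f},
    )
print(r.json()["chunks_created"])  # e.g. 45
```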

Chat (Non-Streaming)

```bash
curl -X POST "http://localhost:8000/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What was the Q4 revenue?",
    "collection_id": "default",
    "use_self_correction": true
  }'
```

Response:

```json
{
  "response": "The Q4 revenue was $5.2 million, representing a 15% increase...",
  "context_used": ["Source 1 excerpt...", "Source 2 excerpt..."],
  "confidence": 0.87,
  "iterations": 1,
  "was_corrected": false
}
```

Chat (Streaming)

```bash
curl -X POST "http://localhost:8000/chat/stream" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Summarize the contract terms",
    "collection_id": "legal-docs"
  }'
```

NDJSON Stream:

{"type": "log", "step": "query_rewriting", "message": "Generated 3 query variations"}
{"type": "log", "step": "search", "message": "Retrieved 12 candidates"}
{"type": "token", "content": "The contract "}
{"type": "token", "content": "specifies a "}
{"type": "token", "content": "term of 24 months..."}
{"type": "done", "confidence": 0.92, "sources": ["contract.pdf:page=2"]}

πŸ“š API Reference

Document Management

| Endpoint | Method | Description |
|----------|--------|-------------|
| /upload | POST | Upload and index a document |
| /documents | DELETE | Delete all documents (requires confirmation) |

Chat Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| /chat | POST | Standard chat (non-streaming) |
| /chat/stream | POST | Streaming chat with processing logs |
| /chat/history/{session_id} | GET | Get chat history for a session |
| /chat/history/{session_id} | DELETE | Clear chat history |

Collection Management

| Endpoint | Method | Description |
|----------|--------|-------------|
| /collections | GET | List all collections |
| /collections | POST | Create a new collection |
| /collections/{id} | DELETE | Delete a collection and its documents |

Session Management

| Endpoint | Method | Description |
|----------|--------|-------------|
| /sessions | GET | List all sessions |
| /sessions | POST | Create a new session |
| /sessions/{id} | GET | Get session details and messages |
| /sessions/{id} | PATCH | Update session name |
| /sessions/{id} | DELETE | Delete session and clean up resources |
| /sessions/{id}/messages | POST | Add a message to a session |

System

| Endpoint | Method | Description |
|----------|--------|-------------|
| /health | GET | Health check (Ollama, database) |
| /stats | GET | System statistics |
| / | GET | API info |

Complete API documentation: http://localhost:8000/docs


πŸ”§ Technical Details

Project Structure

```
RAG_Project/
β”œβ”€β”€ main.py                 # FastAPI application entry point
β”œβ”€β”€ config.py              # Centralized configuration
β”œβ”€β”€ database.py            # VectorDB with hybrid search
β”œβ”€β”€ collection_manager.py  # Collection isolation logic
β”œβ”€β”€ requirements.txt       # Python dependencies
β”œβ”€β”€ Dockerfile            # Backend container
β”œβ”€β”€ docker-compose.yml    # Full stack orchestration
β”‚
β”œβ”€β”€ handlers/             # Document processing
β”‚   β”œβ”€β”€ router.py         # Auto-detection and routing
β”‚   β”œβ”€β”€ legal.py          # Legal document handler
β”‚   β”œβ”€β”€ financial.py      # Financial document handler
β”‚   β”œβ”€β”€ technical.py      # Technical document handler
β”‚   └── image.py          # Image vision handler
β”‚
β”œβ”€β”€ self_correction/      # Self-correcting pipeline
β”‚   β”œβ”€β”€ pipeline.py       # Main RAG pipeline
β”‚   β”œβ”€β”€ validator.py      # Answer validation
β”‚   └── query_rewriter.py # Query expansion
β”‚
β”œβ”€β”€ search/              # Search components
β”‚   └── reranker.py      # Cross-encoder reranking
β”‚
β”œβ”€β”€ memory/              # Session management
β”‚   └── session_store.py # SQLite session storage
β”‚
β”œβ”€β”€ multimodal/          # Multi-modal support
β”‚   └── vision.py        # Image analysis with LLaVA
β”‚
β”œβ”€β”€ utils/               # Utilities
β”‚   β”œβ”€β”€ logger.py        # Structured logging
β”‚   └── auth.py          # Authentication helpers
β”‚
└── rag-frontend/        # Next.js frontend
    β”œβ”€β”€ app/
    β”‚   β”œβ”€β”€ page.tsx     # Main chat interface
    β”‚   └── components/  # React components
    β”œβ”€β”€ Dockerfile       # Frontend container
    └── package.json     # Node dependencies
```

Core Components

1. VectorDB (database.py)

Features:

  • Parent-child chunking strategy
  • Hybrid search (BM25 + Dense)
  • Reciprocal rank fusion
  • Collection-based isolation
  • Persistent storage (ChromaDB + JSON)

Key Methods:

  • add_documents(): Index with parent-child chunking
  • retrieve(): Hybrid search with RRF
  • delete_by_collection(): Cleanup by collection

2. SelfCorrectingRAG (self_correction/pipeline.py)

Workflow:

  1. Query rewriting (3 variations)
  2. Hybrid search for all variations
  3. Cross-encoder reranking
  4. LLM answer generation
  5. Answer validation (factual grounding check)
  6. If low confidence: hallucination correction or search expansion
  7. Max 2 iterations

Validation Criteria:

  • Factual grounding in context
  • Confidence threshold: 0.6
  • Hallucination detection via claim extraction (see the loop sketch below)
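
A condensed sketch of this loop; generate, validate, detect_hallucinations, and search are hypothetical stand-ins passed as callables, not the actual pipeline API:

```python
MAX_ITERATIONS = 2
CONFIDENCE_THRESHOLD = 0.6

def answer_with_correction(query, generate, validate, detect_hallucinations, search):
    """Illustrative self-correction loop; the helper callables are stand-ins."""
    context = search(query)
    guidance = ""
    answer = ""
    for _ in range(MAX_ITERATIONS):
        answer = generate(query, context, guidance)
        confidence = validate(answer, context)       # factual grounding check
        if confidence >= CONFIDENCE_THRESHOLD:
            return answer                            # high confidence: done
        issues = detect_hallucinations(answer, context)
        if issues:                                   # regenerate with corrective guidance
            guidance = f"Avoid these unsupported claims: {issues}"
        else:                                        # grounded but weak: widen retrieval
            context = search(query, expand=True)
    return answer                                    # best effort after max iterations
```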

3. DocumentRouter (handlers/router.py)

Detection Algorithm:

```python
# Image detection: by file extension
if ext in ['.png', '.jpg', '.jpeg']:
    return 'image'

# Content analysis for PDFs
text = extract_first_3000_chars(pdf)
scores = {
    'legal': count_legal_terms(text),
    'financial': count_financial_terms(text) + numeric_density(text),
    'technical': count_technical_terms(text) + code_patterns(text)
}
return max(scores, key=scores.get)  # type with the highest indicator score
```

4. Session Management (memory/session_store.py)

Schema (SQLite):

```sql
CREATE TABLE sessions (
    id TEXT PRIMARY KEY,
    name TEXT,
    collection_id TEXT,
    created_at TIMESTAMP
);

CREATE TABLE messages (
    id TEXT PRIMARY KEY,
    session_id TEXT,
    role TEXT,  -- 'user' or 'assistant'
    content TEXT,
    confidence REAL,
    sources TEXT,  -- JSON array
    timestamp TIMESTAMP
);
```
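
For illustration, inserting a message row with Python's built-in sqlite3 module (the database filename is an assumption):

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect("sessions.db")  # filename is an assumption
conn.execute(
    "INSERT INTO messages (id, session_id, role, content, confidence, sources, timestamp) "
    "VALUES (?, ?, ?, ?, ?, ?, datetime('now'))",
    (str(uuid.uuid4()), "session-1", "assistant",
     "The Q4 revenue was $5.2 million.", 0.87, json.dumps(["report.pdf"])),
)
conn.commit()
```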

Configuration

Environment variables (.env):

```bash
# Models
RAG_LLM_MODEL=llama3           # Chat model
RAG_VISION_MODEL=llava         # Image analysis
RAG_EMBED_MODEL=all-mpnet-base-v2  # Embeddings

# Storage
RAG_DATA_DIR=data              # Upload directory
RAG_CHROMA_DIR=chroma_db       # Vector database

# Chunking
RAG_CHUNK_SIZE=1000            # Parent chunk size
RAG_CHUNK_OVERLAP=100          # Overlap size

# Ollama (Docker)
OLLAMA_HOST=http://ollama:11434  # Container name
```
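
One way config.py might read these variables (a sketch; the actual loader may differ):

```python
import os

LLM_MODEL = os.getenv("RAG_LLM_MODEL", "llama3")
VISION_MODEL = os.getenv("RAG_VISION_MODEL", "llava")
EMBED_MODEL = os.getenv("RAG_EMBED_MODEL", "all-mpnet-base-v2")
CHUNK_SIZE = int(os.getenv("RAG_CHUNK_SIZE", "1000"))
CHUNK_OVERLAP = int(os.getenv("RAG_CHUNK_OVERLAP", "100"))
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")
```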

Performance Considerations

Embedding Model:

  • all-mpnet-base-v2: 768 dimensions, best quality
  • Alternative: all-MiniLM-L6-v2 (384 dimensions, faster); see the embedding sketch below
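
Embedding text with the named model via sentence-transformers, as a minimal sketch:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")
vectors = model.encode(["What was the Q4 revenue?"], normalize_embeddings=True)
print(vectors.shape)  # (1, 768)
```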

Chunking Strategy:

  • Parent chunks: ~1000 chars (semantic units)
  • Child chunks: 200 words sliding window
  • Retrieval runs on child chunks; their parents are returned for full context (see the sketch below)
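
A minimal sketch of the child-chunk sliding window; the 200-word window comes from the description above, but the 100-word stride here is an assumption:

```python
def child_chunks(text: str, window: int = 200, stride: int = 100) -> list[str]:
    """Overlapping word windows; the stride (overlap) is an assumed value.

    Note: trailing words past the last full window are dropped in this sketch.
    """
    words = text.split()
    if len(words) <= window:
        return [" ".join(words)]
    return [" ".join(words[i:i + window])
            for i in range(0, len(words) - window + 1, stride)]
```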

Search Parameters:

  • Initial retrieval: top-20 per query variation
  • After RRF: top-10 combined
  • After reranking: top-5 final (reranking sketch below)
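
The reranking stage as a sketch with sentence-transformers; the exact cross-encoder checkpoint is an assumption, since it is not named here:

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumed checkpoint
query = "What was the Q4 revenue?"
candidates = ["Revenue was $5.2M in Q4.", "The contract term is 24 months."]

# Score each (query, passage) pair, then keep the top 5 by score
scores = reranker.predict([(query, passage) for passage in candidates])
order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
top5 = [candidates[i] for i in order[:5]]
```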

Self-Correction:

  • Max iterations: 2
  • Confidence threshold: 0.6
  • End-to-end latency: ~10-15 seconds for the full pipeline

πŸ’» Development

Running Tests

```bash
# Backend unit tests (if available)
pytest tests/

# Check API health
curl http://localhost:8000/health
```

Development Workflow

Backend Changes:

```bash
# Auto-reload enabled by default
uvicorn main:app --reload --port 8000
```

Frontend Changes:

```bash
cd rag-frontend
npm run dev  # Hot reload enabled
```

Adding a New Document Handler

  1. Create handlers/custom.py:

```python
from .base import BaseHandler

class CustomHandler(BaseHandler):
    def get_type_name(self) -> str:
        return "custom"

    def ingest(self, file_path: str) -> str:
        # Extract and return the document's raw text
        pass

    def chunk(self, text: str) -> list[dict]:
        # Return a list of dicts with 'text' and 'metadata' keys
        pass
```
  2. Register it in handlers/router.py:

```python
from .custom import CustomHandler

self.handlers['custom'] = CustomHandler()
```
  3. Add detection logic in the classify() method.

Customizing the LLM

Switch to GPT-4 (requires API key):

```python
# In self_correction/pipeline.py, replace the ollama.chat() call with the
# OpenAI SDK (v1+ client shown; reads OPENAI_API_KEY from the environment):
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
answer = response.choices[0].message.content
```

Use Llama 3.1 (larger model):

```bash
# Pull model
docker exec neuralrag-ollama ollama pull llama3.1

# Update .env
RAG_LLM_MODEL=llama3.1
```

Database Management

View database stats:

```bash
curl http://localhost:8000/stats
```

Backup vector database:

```bash
# Copy chroma_db directory
docker cp neuralrag-backend:/app/chroma_db ./backup/

# For local: simply copy the directory
cp -r chroma_db backup/
```

Reset database:

```bash
# Stop containers
docker-compose down

# Remove volumes
docker volume rm neuralrag-chroma neuralrag-uploads

# Restart
docker-compose up -d
```

🀝 Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing)
  5. Open a Pull Request

πŸ“„ License

This project is licensed under the MIT License.

MIT License

Copyright (c) 2024

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

πŸ™ Acknowledgments

  • Ollama for local LLM hosting
  • ChromaDB for vector database
  • Sentence Transformers for embeddings
  • Next.js and FastAPI teams

πŸ“ž Support

For issues and questions:

  • Open an issue on GitHub
  • Check /docs endpoint for API documentation
  • Review logs in Docker: docker logs neuralrag-backend

Built for accurate, reliable AI-powered document Q&A
