RAG Chatbot - Intelligent Document Assistant

A production-ready Retrieval-Augmented Generation (RAG) chatbot system for conversing with your documents. Upload PDF, Word, or text files and ask questions about their content; answers are generated by Azure OpenAI (GPT-4o) from passages retrieved through Qdrant semantic search, and they include source citations.

🚀 Features

Core Capabilities

Multi-format Document Processing: PDF, DOCX, DOC, and TXT files
Intelligent Document Chunking: Semantic chunking with overlap for context preservation
Real-time Chat Interface: Live streaming responses with SignalR
Vector Semantic Search: Powered by Qdrant for accurate document retrieval
Citation Support: Automatic source document citations in responses
Background Processing: Async document processing with real-time status updates

Technical Features

Real-time Updates: Live document processing status via SignalR hubs
Multi-layer Caching: Redis caching for optimal performance
Health Monitoring: Comprehensive health checks for all services
Clean Architecture: Modular design with clear separation of concerns
Production Ready: Full error handling, logging, and monitoring

🏗 Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   React Web     │    │   .NET Core     │    │   Azure OpenAI  │
│   Frontend      │◄──►│   API           │◄──►│   GPT-4o        │
│                 │    │                 │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         │ SignalR               │                       │
         ▼                       ▼                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Document      │    │   SQL Server    │    │   Qdrant        │
│   Processing    │    │   Metadata      │    │   Vector DB     │
│   Worker        │    │                 │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Azure Blob    │    │   Redis Cache   │    │   Background    │
│   Storage       │    │                 │    │   Services      │
│                 │    │                 │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Project Structure

RagChatbot/
├── RagChatbot.API/                  # Web API & SignalR Hubs
│   ├── Controllers/                 # REST API endpoints
│   ├── Hubs/                        # SignalR real-time communication
│   └── frontend/                    # React TypeScript frontend
├── RagChatbot.Core/                 # Domain entities & interfaces
├── RagChatbot.Application/          # Business logic & services
├── RagChatbot.Infrastructure/       # External service implementations
│   ├── Data/                        # EF Core & database
│   ├── Services/                    # Azure OpenAI, Qdrant, Redis
│   └── Workers/                     # Background processing
├── RagChatbot.Tests.Unit/           # Unit tests
└── RagChatbot.Tests.Integration/    # Integration tests

🛠 Technology Stack

Backend

.NET 8 - Runtime and SDK for the C# backend
ASP.NET Core - REST API with OpenAPI/Swagger
SignalR - Real-time bidirectional communication
Entity Framework Core - ORM with SQL Server
Serilog - Structured logging

Frontend

React 18 - Modern component-based UI
TypeScript - Type-safe JavaScript
Tailwind CSS - Utility-first styling
Zustand - Lightweight state management
Microsoft SignalR Client - Real-time updates
Axios - HTTP client with interceptors

AI & Data

Azure OpenAI - GPT-4o for chat, text-embedding-ada-002 for embeddings
Qdrant - Vector database for semantic search
Redis - Multi-layer caching strategy
Azure Blob Storage - Scalable file storage

Document Processing

iText7 - PDF text extraction
DocumentFormat.OpenXml - Word document processing
Semantic Chunking - Intelligent text segmentation

🚀 Quick Start

Prerequisites

.NET 8 SDK
Node.js 18+
Docker & Docker Compose
Azure OpenAI API access
SQL Server (or LocalDB)

1. Clone & Setup

git clone https://github.com/yourusername/rag-chatbot-poc.git
cd rag-chatbot-poc

2. Start Infrastructure Services

# Start Qdrant and Redis
docker-compose up -d qdrant redis

3. Configure Backend

cd RagChatbot.API
cp appsettings.json appsettings.Development.json

Edit appsettings.Development.json:

{
  "ConnectionStrings": {
    "DefaultConnection": "Server=(localdb)\\mssqllocaldb;Database=RagChatbotDB;Trusted_Connection=true;",
    "Redis": "localhost:6379"
  },
  "AzureOpenAI": {
    "Endpoint": "https://your-resource.openai.azure.com/",
    "ApiKey": "your-api-key",
    "ChatDeploymentName": "gpt-4o",
    "EmbeddingDeploymentName": "text-embedding-ada-002",
    "ApiVersion": "2024-02-15-preview"
  },
  "Qdrant": {
    "Host": "localhost",
    "Port": 6333,
    "CollectionName": "documents"
  }
}

4. Initialize Database

dotnet ef database update

5. Start Backend

dotnet run

API available at: https://localhost:7262

6. Start Frontend

cd frontend
npm install
npm run dev

Frontend available at: http://localhost:3000

📖 Usage Guide

1. Document Upload

Navigate to the Documents page
Drag & drop or select files (PDF, DOCX, DOC, TXT)
Monitor real-time processing status
View processing statistics and chunk counts

2. Chat with Documents

Go to the Chat page
Create a new chat session
Ask questions about your uploaded documents
View source citations for all responses

3. Direct Queries

Use the Query page for one-off questions
Search documents without creating a chat session
View similarity scores and source chunks
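
The similarity scores shown on the Query page come from vector search, which Qdrant performs server-side. As a rough illustration of the idea, and not the project's code, the TypeScript sketch below ranks chunks by cosine similarity to a query embedding, drops anything under a threshold, and keeps the top few. The `Chunk` and `ScoredChunk` shapes, the function names, the use of cosine distance, and the tiny 3-dimensional vectors are all assumptions made for the example; the defaults 0.7 and 5 mirror `SimilarityThreshold` and `MaxRetrievedChunks` from RAG Settings.

```typescript
// Hypothetical shapes for the example; the real API may return different fields.
interface Chunk {
  id: string;
  embedding: number[];
}

interface ScoredChunk {
  id: string;
  score: number;
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every candidate, keep those at or above the threshold,
// sort best-first, and cap the result at maxChunks.
function selectChunks(
  query: number[],
  candidates: Chunk[],
  threshold = 0.7,
  maxChunks = 5
): ScoredChunk[] {
  return candidates
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .filter((c) => c.score >= threshold)
    .sort((l, r) => r.score - l.score)
    .slice(0, maxChunks);
}

const hits = selectChunks(
  [1, 0, 0],
  [
    { id: "intro", embedding: [1, 0, 0] },
    { id: "pricing", embedding: [0.8, 0.6, 0] },
    { id: "unrelated", embedding: [0, 1, 0] },
  ]
);
console.log(hits.map((h) => h.id));
```

In this toy data the "unrelated" chunk is orthogonal to the query, so it falls below the threshold and would not be passed to the chat model as context.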

🔧 Configuration

Environment Variables

# Database
ConnectionStrings__DefaultConnection="Server=localhost;Database=RagChatbot;..."
ConnectionStrings__Redis="localhost:6379"

# Azure OpenAI
AzureOpenAI__Endpoint="https://your-resource.openai.azure.com/"
AzureOpenAI__ApiKey="your-api-key"
AzureOpenAI__ChatDeploymentName="gpt-4o"
AzureOpenAI__EmbeddingDeploymentName="text-embedding-ada-002"

# Qdrant
Qdrant__Host="localhost"
Qdrant__Port="6333"
Qdrant__CollectionName="documents"

# Azure Storage
AzureStorage__ConnectionString="DefaultEndpointsProtocol=https;..."
AzureStorage__ContainerName="documents"
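
ASP.NET Core's environment-variable configuration provider treats a double underscore (`__`) as the section separator, so `AzureOpenAI__Endpoint` overrides the `Endpoint` property inside the `AzureOpenAI` section of appsettings.json. The TypeScript sketch below only mimics that mapping to show which JSON path each variable targets; `toNestedConfig` is a hypothetical helper written for this illustration and is not part of .NET or of this project.

```typescript
// Illustrative only: rebuilds the nested configuration shape that a set of
// "Section__Key" environment variables corresponds to.
type ConfigTree = { [key: string]: string | ConfigTree };

function toNestedConfig(env: Record<string, string>): ConfigTree {
  const root: ConfigTree = {};
  for (const flatKey of Object.keys(env)) {
    const value = env[flatKey];
    if (value === undefined) continue;
    const parts = flatKey.split("__");
    const leaf = parts[parts.length - 1] ?? flatKey;
    let node: ConfigTree = root;
    // Walk (and create when missing) every section before the final key.
    for (const section of parts.slice(0, -1)) {
      const existing = node[section];
      if (typeof existing === "object") {
        node = existing;
      } else {
        const created: ConfigTree = {};
        node[section] = created;
        node = created;
      }
    }
    node[leaf] = value;
  }
  return root;
}

const nested = toNestedConfig({
  AzureOpenAI__Endpoint: "https://your-resource.openai.azure.com/",
  AzureOpenAI__ChatDeploymentName: "gpt-4o",
  Qdrant__Port: "6333",
});
console.log(JSON.stringify(nested, null, 2));
```

The printed structure has the same nesting as the appsettings.Development.json example in Quick Start, which is why either mechanism can supply the same settings.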

RAG Settings

{
  "RagSettings": {
    "ChunkSize": 500,
    "ChunkOverlap": 50,
    "MaxRetrievedChunks": 5,
    "SimilarityThreshold": 0.7
  }
}
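
To illustrate how `ChunkSize` and `ChunkOverlap` interact, here is a minimal TypeScript sketch of sliding-window chunking. It is a simpler stand-in for the semantic chunking the backend is described as using, not the project's implementation. It assumes both settings are counted in whitespace-separated words, which this README does not specify (the real service may count tokens or characters), and `chunkWords` is a hypothetical name.

```typescript
// Hypothetical helper: fixed-size sliding-window chunking over words.
// Assumption: chunkSize and chunkOverlap are measured in words.
function chunkWords(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("Require chunkSize > 0 and 0 <= chunkOverlap < chunkSize");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const step = chunkSize - chunkOverlap; // how far the window advances each time
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    // Stop once the current window has reached the end of the text.
    if (start + chunkSize >= words.length) break;
  }
  return chunks;
}

// A 1000-word sample: w0 w1 ... w999
const sample = Array.from({ length: 1000 }, (_, i) => `w${i}`).join(" ");
const chunks = chunkWords(sample, 500, 50);
console.log(chunks.length);
```

With the values above, each window starts 450 words after the previous one, so consecutive chunks share 50 words. A larger overlap preserves more context across chunk boundaries at the cost of storing and embedding more text.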

🐳 Docker Deployment

Development

docker-compose up -d

Production

# Build and deploy
docker-compose -f docker-compose.prod.yml up -d

📊 Monitoring & Health Checks

Health Endpoints

/health - Overall application health
/health/ready - Readiness probe (DB/Cache)
/health/live - Liveness probe
/health/external - External service status
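
A deployment script or smoke test could poll these endpoints. The sketch below is one way to do that in TypeScript, assuming each endpoint answers with an HTTP status code that reflects its state (ASP.NET Core health checks return 200 for healthy and 503 for unhealthy by default). `checkHealth` and `HttpGet` are hypothetical names; the HTTP function is passed in so the helper does not depend on a particular client.

```typescript
// Minimal shape of an HTTP GET function. The global `fetch` in Node.js 18+
// and in browsers fits when wrapped as (url) => fetch(url).
type HttpGet = (url: string) => Promise<{ ok: boolean; status: number }>;

const HEALTH_PATHS = ["/health", "/health/ready", "/health/live", "/health/external"];

// Hypothetical helper: probes each endpoint and records the HTTP status per path.
// A network failure is recorded as status 0 instead of aborting the whole run.
async function checkHealth(baseUrl: string, get: HttpGet): Promise<Record<string, number>> {
  const base = baseUrl.replace(/\/+$/, ""); // avoid "//health" when baseUrl ends with "/"
  const report: Record<string, number> = {};
  for (const path of HEALTH_PATHS) {
    try {
      const response = await get(base + path);
      report[path] = response.status;
    } catch {
      report[path] = 0;
    }
  }
  return report;
}
```

Against a local backend this could be called as `checkHealth("https://localhost:7262", (url) => fetch(url))`, using the API address from Quick Start.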

Metrics & Logging

Serilog structured logging to console and files
Application Insights integration ready
Health check dashboard for monitoring dependencies

🧪 Testing

Backend Tests

# Unit tests
dotnet test RagChatbot.Tests.Unit

# Integration tests
dotnet test RagChatbot.Tests.Integration

# All tests with coverage
dotnet test --collect:"XPlat Code Coverage"

Frontend Tests

cd RagChatbot.API/frontend
npm test
npm run test:coverage

🔒 Security

API Authentication ready for JWT integration
CORS configured for frontend origins
Input validation on all endpoints
SQL injection protection via EF Core parameterized queries
XSS protection via React's default escaping of rendered values
Secrets management via Azure Key Vault (configurable)

🚀 Performance

Optimizations

Redis caching for frequently accessed data
Connection pooling for database and Redis
Lazy loading for large document collections
Streaming responses for real-time user experience
Background processing for CPU-intensive tasks

Scalability

Horizontal scaling ready with stateless design
Load balancing compatible
Container orchestration ready (Kubernetes)
CDN integration for static assets

🤝 Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit changes (git commit -m 'Add amazing feature')
Push to branch (git push origin feature/amazing-feature)
Open a Pull Request

Development Guidelines

Follow SOLID principles and clean architecture
Write unit tests for new functionality
Update documentation for API changes
Use conventional commits for clear history

📝 API Documentation

REST Endpoints

Chat Sessions: /api/chat/sessions
Documents: /api/documents
Queries: /api/query
Health: /health

SignalR Hubs

Chat Hub: /hubs/chat - Real-time messaging
Document Hub: /hubs/documents - Processing updates

Full API documentation available at /swagger when running in development mode.

🛟 Support

Common Issues

Azure OpenAI Connection: Verify endpoint and API key
Qdrant Connection: Ensure Docker container is running
Database Issues: Check connection string and run migrations
Frontend CORS: Verify the API URL in the frontend configuration and that the frontend origin is allowed by the API's CORS policy

Getting Help

Create an issue for bugs or feature requests
Check existing issues for solutions
Review logs for detailed error information

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🎯 Roadmap

Multi-tenant Support - Organization-based document isolation
Advanced Analytics - Usage metrics and conversation insights
Plugin System - Extensible document processors
Mobile App - React Native companion app
Voice Interface - Speech-to-text integration
Collaborative Features - Shared documents and conversations

Built with ❤️ for intelligent document interaction
