A professional-grade Retrieval-Augmented Generation (RAG) testing platform built with FastAPI, featuring intelligent knowledge ingestion, vector storage, and AI-powered query processing.
⚠️ Important: You must create a `secret.json` file with your API credentials before the system can run. See the Configuration section below for details.
Test the RAG serves as a comprehensive RAG testing platform that enables researchers, developers, and AI practitioners to experiment with and evaluate different RAG configurations from a single repository. This system provides a unified environment for testing various chunking strategies, embedding models, and LLM combinations to optimize RAG performance.
Multi-Strategy Testing:
- 8 Chunking Strategies: FIXED, OVERLAP, SENTENCE, PARAGRAPH, SEMANTIC, HYBRID, DOC_STRUCTURE, SLIDING_WINDOW
- 3 Embedding Models: ADA-002, text-embedding-3-small, text-embedding-3-large
- Multiple LLM Models: GPT-4o, GPT-4, GPT-4-turbo, GPT-3.5-turbo
- 3 Re-ranking Methods: MMR, Cross-Encoder, LLM Re-ranking
- Conversation Tracking: Advanced conversation memory with summarization and context continuity
Configuration-Driven Testing:
- Single Repository: Test all combinations without code changes
- Dynamic Configuration: Modify `keys.json` to switch between strategies
- Real-time Switching: Change embedding models, chunking strategies, and LLM models instantly
- Performance Metrics: Built-in latency tracking and quality assessment
Professional Testing Features:
- A/B Testing: Compare different configurations side-by-side
- Performance Benchmarking: Measure latency, accuracy, and retrieval quality
- Scalable Testing: Test with different knowledge bases and query types
- Production-Ready: Deploy and test in real-world scenarios
Conversation Tracking & Memory:
- Persistent Memory: Maintains conversation history across multiple queries
- Automatic Summarization: Generates summaries for each Q&A exchange and cumulative conversation context
- Context Continuity: Each query benefits from previous conversation context
- Memory Management: Configurable conversation storage limits with automatic cleanup
Conversation Configuration:
- Enable/Disable: Toggle conversation tracking via `CONVERSATION_MOOD` in `keys.json`
- Storage Limits: Configure maximum conversations to store (`NUMBER_OF_CONVERSATIONS_TO_STORE`)
- Dynamic Control: Change conversation settings without code modifications
- Memory Optimization: Automatic removal of oldest conversations when limit exceeded
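The limit-and-evict behaviour above can be sketched in a few lines of Python. This is an illustrative model only (the class and method names are hypothetical, not the repository's actual API):

```python
# Hypothetical sketch of a bounded conversation store. deque(maxlen=...)
# silently drops the oldest entry once full, mirroring the
# NUMBER_OF_CONVERSATIONS_TO_STORE behaviour described above.
from collections import deque


class ConversationMemory:
    def __init__(self, max_conversations: int = 10):
        self._store = deque(maxlen=max_conversations)

    def add_exchange(self, query: str, answer: str, summary: str) -> None:
        # Append one Q&A exchange; the oldest is evicted automatically
        # when the limit is exceeded.
        self._store.append({"query": query, "answer": answer, "summary": summary})

    def history(self) -> list:
        return list(self._store)

    def clear(self) -> None:
        self._store.clear()
```

With `max_conversations=2`, adding a third exchange evicts the first, so memory never grows without bound.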
Conversation API Endpoints:
- `GET /conversation/status`: View conversation statistics and current status
- `GET /conversation/history`: Retrieve complete conversation history
- `DELETE /conversation/clear`: Clear all conversation memory
- `GET /conversation/config`: Get current conversation configuration
Conversation Benefits:
- Enhanced Context: RAG responses improve with conversation history
- Better Continuity: Follow-up questions maintain context from previous exchanges
- Intelligent Summarization: Automatic generation of conversation summaries
- Memory Efficiency: Smart memory management prevents unlimited growth
Academic Research:
- Compare chunking strategies for different document types
- Evaluate embedding model performance across domains
- Study the impact of re-ranking methods on retrieval quality
- Analyze the relationship between chunk size and retrieval accuracy
Industry Applications:
- Optimize RAG systems for specific use cases
- Test different configurations for production deployment
- Benchmark performance across various embedding models
- Validate RAG improvements before deployment
Developer Testing:
- Rapid prototyping of RAG configurations
- Performance testing with real data
- Integration testing with different vector databases
- End-to-end RAG pipeline validation
- Intelligent Knowledge Ingestion: Process markdown files with configurable chunking strategies
- Advanced Embedding: Support for multiple Azure OpenAI embedding models (ADA-002, text-embedding-3-small, text-embedding-3-large)
- Vector Database Integration: Real-time Qdrant vector storage and retrieval
- RAG Pipeline: Complete retrieval-augmented generation with re-ranking and context assembly
- High Reusability: Configuration-driven architecture - change behavior via `keys.json`
- Professional Architecture: Modular design with single responsibility principle
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Knowledge │ │ Configurator │ │ keys.json │
│ Files (.md) │───▶│ (Settings) │◀───│ (Configuration)│
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Chunker.py │ │ Emdbeder.py │
│ (Text Chunking)│ │ (Embeddings) │
└─────────────────┘ └─────────────────┘
│ │
▼ ▼
┌─────────────────────────────────────────┐
│ KnowledgeIngestion.py │
│ (Complete Ingestion Pipeline) │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Qdrant Vector DB │
│ (Vector Storage) │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ RAG Pipeline │
│ (Retrieval + Generation) │
└─────────────────────────────────────────┘
- Python 3.8+
- Azure OpenAI API access
- Qdrant Cloud account
git clone https://github.com/rezwanx/TestTheRAG
cd TestTheRAG
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt

# Copy the template and add your credentials
cp secret.json.template secret.json
nano secret.json # Add your actual API keys
# Verify the file was created
ls -la secret.json  # Should exist and be ignored by git

⚠️ You MUST create secret.json before running the system!
- The system will not work without this file
- It contains your actual API keys and credentials
- This file is automatically ignored by git for security
Setup your configuration:
# 1. Copy the secret template and add your credentials
cp secret.json.template secret.json
nano secret.json # Add your actual API keys
# 2. Verify keys.json contains only placeholders (safe for git)
cat keys.json # Should show placeholder values like "your-api-key-here"
# 3. Test your configuration
python -c "from Configurator import get_config; print('✅ Configuration loaded successfully')"

⚠️ The secret.json file is required for the system to work!
- This file contains your actual API keys and credentials
- It's automatically ignored by git (never committed to the repository)
- Without this file, the system will fail to load configuration
- Always use the template (`secret.json.template`) as a starting point
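The dual-file loading described above can be sketched as follows. This is a hedged approximation of what a `Configurator` might do, assuming both files are flat JSON objects; the real `get_config` may differ:

```python
# Illustrative sketch: merge non-sensitive keys.json with secret.json,
# giving secrets priority. Not the repository's actual implementation.
import json
from pathlib import Path


def load_config(keys_path: str = "keys.json",
                secret_path: str = "secret.json") -> dict:
    config = json.loads(Path(keys_path).read_text())
    secret_file = Path(secret_path)
    if not secret_file.exists():
        raise FileNotFoundError(
            f"{secret_path} is required -- copy secret.json.template "
            "and fill in your credentials"
        )
    # Values from secret.json override any placeholders in keys.json.
    config.update(json.loads(secret_file.read_text()))
    return config
```

Failing fast when `secret.json` is absent matches the behaviour noted above: the system will not run without it.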
Configuration Files:
- `secret.json`: Contains sensitive API keys (🚫 NEVER COMMITTED TO GIT)
- `keys.json`: Contains non-sensitive configuration (✅ SAFE FOR PUBLIC REPOSITORY)
Required credentials in secret.json:
{
"AZURE_OPENAI_ENDPOINT": "https://your-endpoint.openai.azure.com",
"AZURE_OPENAI_API_KEY": "your-actual-api-key-here",
"AZURE_OPENAI_API_VERSION": "2024-02-01",
"QDRANT_URL": "https://your-cluster.qdrant.io",
"QDRANT_API_KEY": "your-actual-qdrant-key-here",
"QDRANT_COLLECTION": "knowledge_embeddings"
}

Complete configuration options in keys.json (non-sensitive settings only):
{
"_comment": "All configuration options for the knowledge processing system",
"_comment_credentials": "Sensitive credentials are loaded from secret.json (ignored by git)",
"CHUNK_STRATEGY": "SEMANTIC",
"_chunk_strategy_options": "FIXED, OVERLAP, SENTENCE, PARAGRAPH, SEMANTIC, HYBRID, DOC_STRUCTURE, SLIDING_WINDOW",
"EMBED_MODEL": "EMB_3_LARGE",
"_embed_model_options": "ADA_002, EMB_3_SMALL, EMB_3_LARGE",
"LLM_MODEL": "gpt-4o",
"_llm_model_options": "gpt-4o, gpt-4o-mini, gpt-4-turbo",
"LLM_MAX_TOKENS": 1000,
"LLM_TEMPERATURE": 0.7,
"RAG_TOP_K": 10,
"_rag_top_k_range": "8-20",
"RAG_RERANK_METHOD": "MMR",
"_rag_rerank_options": "MMR, CROSS_ENCODER, LLM_RERANK",
"RAG_FINAL_CHUNKS": 5,
"_rag_final_chunks_range": "3-7",
"RAG_CONTEXT_MAX_TOKENS": 1500,
"_rag_context_max_tokens_range": "1000-1500",
"RAG_SYSTEM_PROMPT": "You are a retrieval-augmented assistant. Use only the provided context. Cite sources. If unsure, say you don't know.",
"RAG_MMR_DIVERSITY_THRESHOLD": 0.7,
"_rag_mmr_diversity_range": "0.5-0.9",
"CONVERSATION_MOOD": true,
"_conversation_mood_options": "true, false",
"NUMBER_OF_CONVERSATIONS_TO_STORE": 10,
"_conversations_to_store_range": "5-20",
"CONVERSATIONS": []
}

- `CHUNK_STRATEGY`: Text chunking strategy. Options: `FIXED`, `OVERLAP`, `SENTENCE`, `PARAGRAPH`, `SEMANTIC`, `HYBRID`, `DOC_STRUCTURE`, `SLIDING_WINDOW`. Default: `SEMANTIC`
- `EMBED_MODEL`: Embedding model to use. Options: `ADA_002`, `EMB_3_SMALL`, `EMB_3_LARGE`. Default: `EMB_3_LARGE`
- `LLM_MODEL`: Language model to use. Options: `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`. Default: `gpt-4o`
- `LLM_MAX_TOKENS`: Maximum tokens for LLM responses. Range: 100-4000. Default: `1000`
- `LLM_TEMPERATURE`: LLM temperature (creativity). Range: 0.0-2.0. Default: `0.7`
- `RAG_TOP_K`: Number of chunks to retrieve initially. Range: 8-20. Default: `10`
- `RAG_RERANK_METHOD`: Re-ranking method. Options: `MMR`, `CROSS_ENCODER`, `LLM_RERANK`. Default: `MMR`
- `RAG_FINAL_CHUNKS`: Number of final chunks for context. Range: 3-7. Default: `5`
- `RAG_CONTEXT_MAX_TOKENS`: Maximum tokens for context. Range: 1000-1500. Default: `1500`
- `RAG_SYSTEM_PROMPT`: System prompt for the LLM. Default: "You are a retrieval-augmented assistant. Use only the provided context. Cite sources. If unsure, say you don't know."
- `RAG_MMR_DIVERSITY_THRESHOLD`: MMR diversity threshold. Range: 0.5-0.9. Default: `0.7`
- `CONVERSATION_MOOD`: Enable/disable conversation tracking. Options: `true`, `false`. Default: `false`
- `NUMBER_OF_CONVERSATIONS_TO_STORE`: Maximum conversations to store. Range: 5-20. Default: `10`
- `CONVERSATIONS`: Conversation storage array. Default: `[]`
🔐 Security Features:
- Automatic Git Protection: `secret.json` is automatically excluded from version control
- Clean Separation: `keys.json` contains only non-sensitive configuration settings
- Dual Configuration: System loads from both files with proper priority
- Credential History Cleaned: Previous commits with exposed credentials have been removed from git history

Verify your setup:
- `secret.json` exists locally with your real API keys
- `secret.json` is ignored by git (`git check-ignore secret.json` should return the file)
- `keys.json` contains only non-sensitive configuration settings
- Never commit `secret.json` to any repository
- Use environment variables in production deployments
# Start the FastAPI server
uvicorn main:app --reload --host 0.0.0.0 --port 8000

Ingest all knowledge files from the Knowledge folder.
- Process: Reads .md files → Chunks content → Creates embeddings → Stores in Qdrant
- Response: Ingestion statistics and results
Get current knowledge ingestion status.
- Response: File counts, collection info, configuration details
Clear all data from the knowledge base.
- Response: Clear operation results
Process natural language queries using RAG.
- Request: `{"query": "Your question here"}`
- Response: AI-generated answer with sources and metadata
Validate RAG pipeline components.
- Response: Component status, errors, and warnings
Get RAG pipeline configuration information.
- Response: Current settings and model information
Get conversation tracking status and statistics.
- Response: Conversation enabled status, total conversations, storage limits
Retrieve all stored conversation history.
- Response: Complete conversation history with summaries and metadata
Clear all stored conversation memory.
- Response: Confirmation of conversation memory clearing
Get current conversation configuration settings.
- Response: Conversation mood, storage limits, and configuration details
Direct LLM processing without knowledge retrieval.
- Request: `{"input": "Your text here"}`
- Response: `{"output": "LLM response"}`
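For scripted testing, the same requests can be issued from Python instead of curl. A minimal standard-library sketch; the base URL and a running server are assumptions taken from this README:

```python
# Hedged sketch of calling the POST endpoints (/rag, /llm) from Python.
# Assumes the FastAPI server from this README is running on localhost:8000.
import json
from urllib import request

BASE_URL = "http://localhost:8000"


def build_request(path: str, payload: dict) -> request.Request:
    # Both /rag and /llm accept a small JSON body, as documented above.
    return request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request("/rag", {"query": "What are the delivery guidelines?"})
    with request.urlopen(req) as resp:  # requires the API to be running
        print(json.loads(resp.read())["answer"])
```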
This repository implements a dual-configuration system to protect your API credentials:
- `secret.json` 🚫 NEVER COMMITTED: Contains your actual API keys
- `keys.json` ✅ SAFE FOR GIT: Contains only placeholder values
- `secret.json.template` 📋 SETUP GUIDE: Template for creating secret.json
- Automatic Git Protection: `secret.json` is automatically excluded from version control
- History Cleanup: Previous commits with exposed credentials have been completely removed
- Placeholder Safety: Public repository contains only safe placeholder values
- Dual Loading: System loads from both files with proper priority
- Copy Template: `cp secret.json.template secret.json`
- Add Credentials: Edit `secret.json` with your actual API keys
- Verify Safety: Ensure `keys.json` contains only placeholders
- Test Configuration: Run the configuration test to verify setup
- `secret.json` exists locally with real credentials
- `secret.json` is ignored by git (`git check-ignore secret.json`)
- `keys.json` contains only placeholder values
- Never commit `secret.json` to any repository
- Use environment variables in production
If you accidentally commit secret.json:
- Immediate Action: Remove the file from git tracking
- History Cleanup: Use `git filter-branch` to remove it from history
- Force Push: Update the remote repository to remove exposed credentials
- Rotate Keys: Generate new API keys from your service providers
- Verify Cleanup: Ensure no sensitive data remains in git history
- `ADA_002`: 1536 dimensions, cost-effective
- `EMB_3_SMALL`: 1536 dimensions, balanced performance
- `EMB_3_LARGE`: 3072 dimensions, highest quality
- `FIXED`: Fixed character count chunks
- `OVERLAP`: Overlapping chunks for context preservation
- `SENTENCE`: Sentence-based chunking
- `PARAGRAPH`: Paragraph-based chunking
- `SEMANTIC`: Semantic boundary detection (placeholder)
- `HYBRID`: Heading-based with token windows
- `DOC_STRUCTURE`: Markdown-aware structure preservation
- `SLIDING_WINDOW`: Token-based sliding window
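To make the difference between the first two strategies concrete, here is a toy character-based sketch; the real `Chunker.py` is richer (token-aware, configurable sizes), so treat this as illustration only:

```python
# Toy illustration of the FIXED and OVERLAP strategies. Character-based;
# assumes overlap < size.
def fixed_chunks(text: str, size: int) -> list:
    # Non-overlapping, fixed character count.
    return [text[i:i + size] for i in range(0, len(text), size)]


def overlap_chunks(text: str, size: int, overlap: int) -> list:
    # Each chunk shares `overlap` characters with its predecessor, so
    # context that spans a chunk boundary is preserved.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```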
- `RAG_TOP_K`: Initial chunks to retrieve (default: 10)
- `RAG_FINAL_CHUNKS`: Chunks after re-ranking (default: 5)
- `RAG_CONTEXT_MAX_TOKENS`: Maximum context size (default: 1500)
- `RAG_RERANK_METHOD`: MMR, CROSS_ENCODER, or LLM_RERANK
- `RAG_MMR_DIVERSITY_THRESHOLD`: Diversity vs. relevance balance (0.5-0.9)
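The MMR re-ranking step these parameters control can be sketched as a simplified pure-Python routine (the production implementation and its exact scoring may differ):

```python
# Simplified Maximal Marginal Relevance (MMR) sketch: balances relevance
# to the query against redundancy with already-selected chunks.
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, chunk_vecs, k, diversity=0.7):
    """Return indices of k chunks chosen by MMR.

    `diversity` plays the role of RAG_MMR_DIVERSITY_THRESHOLD here.
    """
    selected = []
    candidates = list(range(len(chunk_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, chunk_vecs[i])
            redundancy = max(
                (cosine(chunk_vecs[i], chunk_vecs[j]) for j in selected),
                default=0.0,
            )
            return diversity * relevance - (1 - diversity) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

A near-duplicate of an already-selected chunk scores poorly on the redundancy term, so diverse chunks win the remaining slots.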
Test Sentence-based Chunking:
// keys.json
{
"CHUNK_STRATEGY": "SENTENCE",
"EMBED_MODEL": "EMB_3_LARGE",
"LLM_MODEL": "gpt-4o"
}

Test Semantic Chunking:
// keys.json
{
"CHUNK_STRATEGY": "SEMANTIC",
"EMBED_MODEL": "EMB_3_LARGE",
"LLM_MODEL": "gpt-4o"
}

Test Hybrid Chunking:
// keys.json
{
"CHUNK_STRATEGY": "HYBRID",
"EMBED_MODEL": "EMB_3_LARGE",
"LLM_MODEL": "gpt-4o"
}

Test ADA-002 (Cost-effective):
// keys.json
{
"EMBED_MODEL": "ADA_002",
"CHUNK_STRATEGY": "SENTENCE"
}

Test text-embedding-3-small (Balanced):
// keys.json
{
"EMBED_MODEL": "EMB_3_SMALL",
"CHUNK_STRATEGY": "SENTENCE"
}

Test text-embedding-3-large (Highest Quality):
// keys.json
{
"EMBED_MODEL": "EMB_3_LARGE",
"CHUNK_STRATEGY": "SENTENCE"
}

Test GPT-4o (Latest):
// keys.json
{
"LLM_MODEL": "gpt-4o",
"LLM_MAX_TOKENS": 1000,
"LLM_TEMPERATURE": 0.7
}

Test GPT-4 (Stable):
// keys.json
{
"LLM_MODEL": "gpt-4",
"LLM_MAX_TOKENS": 1000,
"LLM_TEMPERATURE": 0.7
}

Test GPT-3.5-turbo (Fast):
// keys.json
{
"LLM_MODEL": "gpt-3.5-turbo",
"LLM_MAX_TOKENS": 1000,
"LLM_TEMPERATURE": 0.7
}

Test GPT-4o-mini (Balanced):
// keys.json
{
"LLM_MODEL": "gpt-4o-mini",
"LLM_MAX_TOKENS": 1500,
"LLM_TEMPERATURE": 0.5
}

High Precision Configuration:
// keys.json
{
"RAG_TOP_K": 20,
"RAG_FINAL_CHUNKS": 7,
"RAG_CONTEXT_MAX_TOKENS": 2000,
"RAG_RERANK_METHOD": "MMR",
"RAG_MMR_DIVERSITY_THRESHOLD": 0.8
}

Enable Conversation Tracking:
// keys.json
{
"CONVERSATION_MOOD": true,
"NUMBER_OF_CONVERSATIONS_TO_STORE": 10
}

Disable Conversation Tracking (Default):
// keys.json
{
"CONVERSATION_MOOD": false,
"NUMBER_OF_CONVERSATIONS_TO_STORE": 10
}

Test Conversation Memory Limits:
// keys.json
{
"CONVERSATION_MOOD": true,
"NUMBER_OF_CONVERSATIONS_TO_STORE": 5
}

Fast Response Configuration:
// keys.json
{
"RAG_TOP_K": 5,
"RAG_FINAL_CHUNKS": 3,
"RAG_CONTEXT_MAX_TOKENS": 1000,
"RAG_RERANK_METHOD": "MMR",
"RAG_MMR_DIVERSITY_THRESHOLD": 0.6
}

High Quality Configuration:
// keys.json
{
"RAG_TOP_K": 20,
"RAG_FINAL_CHUNKS": 7,
"RAG_CONTEXT_MAX_TOKENS": 2000,
"RAG_RERANK_METHOD": "LLM_RERANK",
"RAG_MMR_DIVERSITY_THRESHOLD": 0.8
}

Balanced Configuration:
// keys.json
{
"RAG_TOP_K": 10,
"RAG_FINAL_CHUNKS": 5,
"RAG_CONTEXT_MAX_TOKENS": 1500,
"RAG_RERANK_METHOD": "CROSS_ENCODER",
"RAG_MMR_DIVERSITY_THRESHOLD": 0.7
}

Test All Chunking Strategies:
// Test each strategy by changing CHUNK_STRATEGY
{
"CHUNK_STRATEGY": "FIXED", // Fixed-size chunks
"CHUNK_STRATEGY": "OVERLAP", // Overlapping chunks
"CHUNK_STRATEGY": "SENTENCE", // Sentence-based chunks
"CHUNK_STRATEGY": "PARAGRAPH", // Paragraph-based chunks
"CHUNK_STRATEGY": "SEMANTIC", // Semantic chunks (default)
"CHUNK_STRATEGY": "HYBRID", // Hybrid approach
"CHUNK_STRATEGY": "DOC_STRUCTURE", // Document structure aware
"CHUNK_STRATEGY": "SLIDING_WINDOW" // Sliding window chunks
}

Test All Embedding Models:
// Test each model by changing EMBED_MODEL
{
"EMBED_MODEL": "ADA_002", // Cost-effective
"EMBED_MODEL": "EMB_3_SMALL", // Balanced
"EMBED_MODEL": "EMB_3_LARGE" // Highest quality (default)
}

Test All Re-ranking Methods:
// Test each method by changing RAG_RERANK_METHOD
{
"RAG_RERANK_METHOD": "MMR", // Maximal Marginal Relevance (default)
"RAG_RERANK_METHOD": "CROSS_ENCODER", // Cross-encoder re-ranking
"RAG_RERANK_METHOD": "LLM_RERANK" // LLM-based re-ranking
}

Test System Prompt Customization:
// Customize the system prompt
{
"RAG_SYSTEM_PROMPT": "You are an expert assistant. Provide detailed, accurate answers based on the context. Always cite your sources."
}

Step 1: Setup Configuration A
# Update keys.json with Configuration A
# Test and record results
curl -X POST "http://localhost:8000/rag" \
-H "Content-Type: application/json" \
-d '{"query": "Your test query here"}'

Step 2: Switch to Configuration B
# Update keys.json with Configuration B
# Test same query and compare results
curl -X POST "http://localhost:8000/rag" \
-H "Content-Type: application/json" \
-d '{"query": "Your test query here"}'

Step 3: Compare Results
- Compare response quality
- Measure latency differences
- Analyze source relevance
- Evaluate answer accuracy
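A tiny harness makes the latency comparison repeatable. `run_query` here is any callable that issues the request (e.g. a urllib call to `/rag`), so the sketch itself has no network dependency; run it once per configuration and compare the means:

```python
# Minimal A/B timing harness for the workflow above. Illustrative only;
# quality scoring is left to you.
import statistics
import time


def benchmark(run_query, queries, repeats=3):
    """Return mean latency in milliseconds across all queries and repeats."""
    latencies = []
    for _ in range(repeats):
        for q in queries:
            start = time.perf_counter()
            run_query(q)
            latencies.append((time.perf_counter() - start) * 1000)
    return statistics.mean(latencies)
```

Switch `keys.json` to Configuration B between runs; since retrieval settings change the amount of work per query, even this crude timing usually separates the two configurations clearly.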
curl -X GET "http://localhost:8000/FeedKnowledge"

curl -X POST "http://localhost:8000/rag" \
-H "Content-Type: application/json" \
-d '{"query": "What are the delivery guidelines?"}'

curl -X GET "http://localhost:8000/rag/validate"

curl -X GET "http://localhost:8000/rag/info"

Test One Variable at a Time:
- Keep all other settings constant when testing a specific component
- Change only the variable you want to evaluate
- Record results for each configuration
Use Consistent Test Queries:
- Create a standardized set of test queries
- Include different types: factual, analytical, comparative
- Test with various query lengths and complexities
Measure Multiple Metrics:
- Latency: Response time for each configuration
- Quality: Relevance and accuracy of answers
- Retrieval: Number and quality of source documents
- Cost: API usage and computational resources
Baseline Establishment:
// Start with a baseline configuration
{
"CHUNK_STRATEGY": "SENTENCE",
"EMBED_MODEL": "EMB_3_LARGE",
"LLM_MODEL": "gpt-4o",
"RAG_TOP_K": 10,
"RAG_FINAL_CHUNKS": 5
}

Incremental Testing:
- Test different chunking strategies with same embedding model
- Test different embedding models with same chunking strategy
- Test different LLM models with same retrieval configuration
- Test different RAG parameters with same models
Performance Benchmarking:
- Run multiple queries with each configuration
- Calculate average latency and quality scores
- Compare results across different configurations
- Document findings and recommendations
Knowledge Base Variety:
- Test with different document types (technical, narrative, structured)
- Use various document lengths and complexities
- Include different domains and subject matters
Query Diversity:
- Factual Queries: "What is the definition of X?"
- Analytical Queries: "Compare and contrast X and Y"
- Procedural Queries: "How do I perform X?"
- Complex Queries: Multi-part questions requiring synthesis
Quantitative Metrics:
- Response time (latency)
- Token usage and costs
- Retrieval precision and recall
- Answer completeness scores
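Retrieval precision and recall can be computed against a hand-labelled set of relevant chunk IDs; the labels themselves are something you supply per test query:

```python
# Sketch of retrieval precision/recall over chunk IDs returned by the
# retriever versus a hand-labelled relevant set.
def precision_recall(retrieved_ids, relevant_ids):
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```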
Qualitative Assessment:
- Answer relevance and accuracy
- Source document quality
- Response coherence and clarity
- Factual correctness
Comparative Analysis:
- Side-by-side configuration comparisons
- Performance trade-offs identification
- Cost-benefit analysis
- Use case optimization recommendations
Load Testing:
- Test with high query volumes
- Measure system performance under load
- Identify bottlenecks and optimization opportunities
Integration Testing:
- Test with real-world data
- Validate end-to-end pipeline performance
- Ensure system reliability and stability
Monitoring Setup:
- Implement performance monitoring
- Set up alerting for system health
- Track usage patterns and optimization opportunities
test-the-rag/
├── main.py # FastAPI application and endpoints
├── Configurator.py # Configuration management
├── Emdbeder.py # Embedding operations
├── Chunker.py # Text chunking strategies
├── KnowledgeIngestion.py # Complete ingestion pipeline
├── KnowledgeProcessor.py # Knowledge file processing
├── QdrantManager.py # Vector database operations
├── RAGPipeline.py # RAG orchestration
├── RAGRetriever.py # Vector search and retrieval
├── RAGProcessor.py # Post-processing and re-ranking
├── llmhandler.py # LLM interaction
├── keys.json # Non-sensitive configuration (safe for git)
├── secret.json # Sensitive credentials (ignored by git)
├── secret.json.template # Template for secret.json setup
├── Knowledge/ # Knowledge files directory
│ └── sample_knowledge.md
├── requirements.txt # Python dependencies
└── README.md # This file
- Single Source of Truth: All embedding operations use `Emdbeder.py`
- Centralized Chunking: All chunking operations use `Chunker.py`
- Configuration-Driven: Change `keys.json` → change behavior everywhere
- Single Responsibility: Each module has one clear purpose
- Dependency Injection: Components receive configuration via constructor
- Error Handling: Comprehensive error messages and graceful fallbacks
- Type Safety: Full type hints and Pydantic models
- Modular Design: Easy to extend with new strategies
- Async Operations: Non-blocking I/O for better performance
- Vector Database: Scalable storage and retrieval
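As an illustration of the configuration-validation idea: the project itself uses Pydantic models, but this stdlib-only dataclass shows the same ranges from keys.json being enforced (field names and bounds taken from the configuration section above):

```python
# Illustrative stdlib-only stand-in for the project's Pydantic models,
# enforcing two of the documented keys.json ranges.
from dataclasses import dataclass


@dataclass
class RagSettings:
    rag_top_k: int = 10
    llm_temperature: float = 0.7

    def __post_init__(self):
        if not 8 <= self.rag_top_k <= 20:
            raise ValueError("RAG_TOP_K must be in 8-20")
        if not 0.0 <= self.llm_temperature <= 2.0:
            raise ValueError("LLM_TEMPERATURE must be in 0.0-2.0")
```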
Visit http://localhost:8000/docs for automatic interactive API documentation with Swagger UI.
- Setup: Follow installation steps above
- Configure: Create `secret.json` with your API credentials (REQUIRED!)
- Ingest: `GET /FeedKnowledge` to process your knowledge files
- Query: `POST /rag` to ask questions about your knowledge base

⚠️ You MUST create secret.json from the template before the system will work!
1. Enable Conversation Mode:
// keys.json
{
"CONVERSATION_MOOD": true,
"NUMBER_OF_CONVERSATIONS_TO_STORE": 10
}

2. Start a Conversation:
# First query
curl -X POST "http://localhost:8000/rag" \
-H "Content-Type: application/json" \
-d '{"query": "What are the delivery management guidelines?"}'

3. Follow-up Questions:
# Second query (benefits from conversation context)
curl -X POST "http://localhost:8000/rag" \
-H "Content-Type: application/json" \
-d '{"query": "How are they organized?"}'
# Third query (maintains conversation continuity)
curl -X POST "http://localhost:8000/rag" \
-H "Content-Type: application/json" \
-d '{"query": "What is the purpose of these guidelines?"}'

4. Check Conversation Status:
# View conversation statistics
curl -X GET "http://localhost:8000/conversation/status"
# View conversation history
curl -X GET "http://localhost:8000/conversation/history"
# Clear conversation memory
curl -X DELETE "http://localhost:8000/conversation/clear"

When conversation mode is enabled, RAG responses include conversation metadata:
{
"answer": "The delivery management guidelines are...",
"sources": [...],
"conversation": {
"enabled": true,
"serial_number": 3,
"current_qa_summary": "Summary of current Q&A exchange...",
"conversation_summary_so_far": "Cumulative conversation summary...",
"total_conversations": 3
},
"used_model": {...},
"latency_ms": 2500
}

Enhanced Context:
- Each query benefits from previous conversation context
- Follow-up questions maintain continuity
- Better understanding of user intent
Automatic Summarization:
- Current Q&A summary for each exchange
- Cumulative conversation summary
- Intelligent memory management
Memory Management:
- Configurable storage limits
- Automatic cleanup of old conversations
- Efficient memory usage
The system includes extensive logging for all operations:
Log Files:
- `rag_system.log`: Complete system logs with timestamps
- Console Output: Real-time logging with emoji indicators
Log Categories:
- 🚀 [API]: API endpoint calls and responses
- 🔍 [RAG_PIPELINE]: RAG pipeline step-by-step processing
- 📚 [INGESTION]: Knowledge ingestion operations
- 💬 [CONVERSATION]: Conversation tracking and management
- 🤖 [LLM]: LLM processing and responses
- ✅ [SUCCESS]: Successful operations
- ❌ [ERROR]: Error conditions and failures
Logging Features:
- Step-by-step tracking: Every RAG pipeline step is logged
- Performance metrics: Timing information for all operations
- Error tracking: Detailed error logging with context
- Conversation tracking: Complete conversation flow logging
- API monitoring: All API calls and responses logged
Example Log Output:
2024-01-15 10:30:15 - main - INFO - 🚀 [API] /rag endpoint called with query length: 45 chars
2024-01-15 10:30:15 - main - INFO - 📝 [API] Query preview: What are the delivery guidelines?
2024-01-15 10:30:15 - RAGPipeline - INFO - 🔍 [RAG_PIPELINE] Step 1: Embedding query...
2024-01-15 10:30:16 - RAGPipeline - INFO - ✅ [RAG_PIPELINE] Step 1 completed - Retrieved 10 chunks in 1250.5ms
2024-01-15 10:30:16 - RAGPipeline - INFO - 🔄 [RAG_PIPELINE] Step 2: Re-ranking 10 chunks...
2024-01-15 10:30:16 - RAGPipeline - INFO - ✅ [RAG_PIPELINE] Step 2 completed - Re-ranked to 5 chunks in 45.2ms
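A logger producing this line format can be configured as in the following sketch (the helper name and handler choice are illustrative, not the repository's actual setup):

```python
# Sketch reproducing the "timestamp - name - level - message" log format
# shown above, using only the standard library.
import logging
import sys


def make_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(
        logging.Formatter(
            "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
            datefmt="%Y-%m-%d %H:%M:%S",
        )
    )
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Usage: `make_logger("RAGPipeline").info("🔍 [RAG_PIPELINE] Step 1: Embedding query...")` emits a line in the format above.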
❌ "Keys file 'keys.json' not found"
- Solution: The `keys.json` file should be in the repository. If it's missing, check your git clone.
❌ "Missing required keys" or "Missing or placeholder values"
- Solution: You need to create `secret.json` from the template: `cp secret.json.template secret.json`, then edit `secret.json` with your actual API keys.
❌ "Configuration loaded successfully" but system fails
- Solution: Check that `secret.json` contains real API keys, not placeholder values.
❌ System works locally but fails in production
- Solution: Ensure `secret.json` exists in your production environment and contains valid credentials.
- Embedding: ~100ms per chunk (varies by model)
- RAG Query: ~2-5 seconds end-to-end
- Knowledge Ingestion: ~1-2 seconds per file
- Vector Search: <100ms for similarity search
- All secrets stored in `secret.json` (gitignored)
- Environment variable overrides supported
- No hardcoded credentials in code
- Secure API key management
- Follow the established architecture patterns
- Maintain high reusability principles
- Add comprehensive docstrings
- Update configuration in `keys.json`
- Test all components thoroughly
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2024 WhatTheBot RAG System
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Built with ❤️ using FastAPI, Azure OpenAI, and Qdrant
Test the RAG is specifically designed as a comprehensive testing platform for Retrieval-Augmented Generation systems. It provides researchers, developers, and AI practitioners with a unified environment to:
- Test Multiple Configurations: Experiment with different chunking strategies, embedding models, and LLM combinations
- Compare Performance: Benchmark different RAG approaches side-by-side
- Optimize Systems: Find the best configuration for specific use cases
- Validate Improvements: Test RAG enhancements before production deployment
Single Repository Testing:
- Test all RAG configurations from one codebase
- No need to maintain multiple testing environments
- Consistent testing methodology across all experiments
Configuration-Driven:
- Change behavior by modifying `keys.json`
- No code changes required for different configurations
- Rapid prototyping and experimentation
Production-Ready:
- Real Azure OpenAI integration
- Actual Qdrant vector database
- Professional monitoring and validation endpoints
Research-Friendly:
- Comprehensive documentation and examples
- Built-in performance metrics and monitoring
- Easy A/B testing and comparison workflows
This RAG system is a solid foundation for advanced retrieval-augmented generation applications, with several key areas identified for future enhancement:

- Architecture: Dependency injection patterns, service containers, and event-driven architecture to improve modularity and testability
- Code Quality: Comprehensive type safety with Pydantic models, advanced configuration validation, and extraction of magic numbers into well-defined constants
- Performance: Connection pooling, caching mechanisms, batch processing for embeddings, and async/await patterns throughout the pipeline
- Resilience: Custom exception hierarchies, circuit breaker patterns, retry mechanisms with exponential backoff, and comprehensive structured logging
- Testing: Unit tests, integration tests, performance benchmarks, and automated testing pipelines
- Security: Input sanitization, rate limiting, API key rotation, and audit logging
- Documentation: API documentation, architecture diagrams, deployment guides, and developer onboarding materials

Together these improvements will take the system from a functional RAG platform to a production-ready, enterprise-grade solution that can scale to high-volume workloads while maintaining reliability, security, and performance standards.