Test the RAG - Advanced RAG Knowledge Processing System

A professional-grade Retrieval-Augmented Generation (RAG) testing platform built with FastAPI, featuring intelligent knowledge ingestion, vector storage, and AI-powered query processing.

⚠️ Setup Required: This system requires you to create a secret.json file with your API credentials before it can run. See the Configuration section below for details.

🧪 Professional RAG Testing

Test the RAG serves as a comprehensive RAG testing platform that enables researchers, developers, and AI practitioners to experiment with and evaluate different RAG configurations from a single repository. This system provides a unified environment for testing various chunking strategies, embedding models, and LLM combinations to optimize RAG performance.

🎯 RAG Testing Capabilities

Multi-Strategy Testing:

  • 8 Chunking Strategies: FIXED, OVERLAP, SENTENCE, PARAGRAPH, SEMANTIC, HYBRID, DOC_STRUCTURE, SLIDING_WINDOW
  • 3 Embedding Models: ADA-002, text-embedding-3-small, text-embedding-3-large
  • Multiple LLM Models: GPT-4o, GPT-4, GPT-4-turbo, GPT-3.5-turbo
  • 3 Re-ranking Methods: MMR, Cross-Encoder, LLM Re-ranking
  • Conversation Tracking: Advanced conversation memory with summarization and context continuity

Configuration-Driven Testing:

  • Single Repository: Test all combinations without code changes
  • Dynamic Configuration: Modify keys.json to switch between strategies
  • Real-time Switching: Change embedding models, chunking strategies, and LLM models instantly
  • Performance Metrics: Built-in latency tracking and quality assessment

Professional Testing Features:

  • A/B Testing: Compare different configurations side-by-side
  • Performance Benchmarking: Measure latency, accuracy, and retrieval quality
  • Scalable Testing: Test with different knowledge bases and query types
  • Production-Ready: Deploy and test in real-world scenarios

💬 Advanced Conversation Features

Conversation Tracking & Memory:

  • Persistent Memory: Maintains conversation history across multiple queries
  • Automatic Summarization: Generates summaries for each Q&A exchange and cumulative conversation context
  • Context Continuity: Each query benefits from previous conversation context
  • Memory Management: Configurable conversation storage limits with automatic cleanup

Conversation Configuration:

  • Enable/Disable: Toggle conversation tracking via CONVERSATION_MOOD in keys.json
  • Storage Limits: Configure maximum conversations to store (NUMBER_OF_CONVERSATIONS_TO_STORE)
  • Dynamic Control: Change conversation settings without code modifications
  • Memory Optimization: Automatic removal of oldest conversations when limit exceeded
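The trimming behavior described above can be sketched in a few lines. This is an illustrative sketch, not the repository's actual implementation; the function name `store_conversation` is an assumption, while the limit corresponds to the `NUMBER_OF_CONVERSATIONS_TO_STORE` setting:

```python
# Minimal sketch of the conversation-trimming behavior described above.
# store_conversation is a hypothetical name; the limit maps to the
# NUMBER_OF_CONVERSATIONS_TO_STORE setting in keys.json.

def store_conversation(conversations, entry, limit):
    """Append a Q&A entry, dropping the oldest entries once the limit is hit."""
    conversations.append(entry)
    while len(conversations) > limit:
        conversations.pop(0)  # remove the oldest conversation first
    return conversations

history = []
for i in range(12):
    store_conversation(history, {"serial_number": i}, limit=10)
```

After twelve exchanges with a limit of ten, only the ten most recent entries survive, which is the "automatic removal of oldest conversations" behavior.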

Conversation API Endpoints:

  • GET /conversation/status: View conversation statistics and current status
  • GET /conversation/history: Retrieve complete conversation history
  • DELETE /conversation/clear: Clear all conversation memory
  • GET /conversation/config: Get current conversation configuration

Conversation Benefits:

  • Enhanced Context: RAG responses improve with conversation history
  • Better Continuity: Follow-up questions maintain context from previous exchanges
  • Intelligent Summarization: Automatic generation of conversation summaries
  • Memory Efficiency: Smart memory management prevents unlimited growth

🔬 Research & Development Use Cases

Academic Research:

  • Compare chunking strategies for different document types
  • Evaluate embedding model performance across domains
  • Study the impact of re-ranking methods on retrieval quality
  • Analyze the relationship between chunk size and retrieval accuracy

Industry Applications:

  • Optimize RAG systems for specific use cases
  • Test different configurations for production deployment
  • Benchmark performance across various embedding models
  • Validate RAG improvements before deployment

Developer Testing:

  • Rapid prototyping of RAG configurations
  • Performance testing with real data
  • Integration testing with different vector databases
  • End-to-end RAG pipeline validation

🚀 Features

  • Intelligent Knowledge Ingestion: Process markdown files with configurable chunking strategies
  • Advanced Embedding: Support for multiple Azure OpenAI embedding models (ADA-002, text-embedding-3-small, text-embedding-3-large)
  • Vector Database Integration: Real-time Qdrant vector storage and retrieval
  • RAG Pipeline: Complete retrieval-augmented generation with re-ranking and context assembly
  • High Reusability: Configuration-driven architecture - change behavior via keys.json
  • Professional Architecture: Modular design with single responsibility principle

📋 System Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Knowledge     │    │   Configurator  │    │   keys.json     │
│   Files (.md)   │───▶│   (Settings)    │◀───│  (Configuration)│
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │
         ▼                       ▼
┌─────────────────┐    ┌─────────────────┐
│   Chunker.py    │    │  Emdbeder.py    │
│  (Text Chunking)│    │ (Embeddings)    │
└─────────────────┘    └─────────────────┘
         │                       │
         ▼                       ▼
┌─────────────────────────────────────────┐
│        KnowledgeIngestion.py            │
│     (Complete Ingestion Pipeline)       │
└─────────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────────┐
│           Qdrant Vector DB              │
│        (Vector Storage)                 │
└─────────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────────┐
│            RAG Pipeline                 │
│  (Retrieval + Generation)               │
└─────────────────────────────────────────┘

🛠️ Installation & Setup

Prerequisites

  • Python 3.8+
  • Azure OpenAI API access
  • Qdrant Cloud account

1. Clone and Setup

git clone https://github.com/rezwanx/TestTheRAG
cd TestTheRAG
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

1.5. Create Secret Configuration (REQUIRED)

# Copy the template and add your credentials
cp secret.json.template secret.json
nano secret.json  # Add your actual API keys

# Verify the file was created
ls -la secret.json  # Should exist and be ignored by git

⚠️ CRITICAL: You must create secret.json before running the system!

  • The system will not work without this file
  • It contains your actual API keys and credentials
  • This file is automatically ignored by git for security

2. Configuration & Security Setup

⚠️ IMPORTANT SECURITY NOTICE: This repository uses a secure dual-configuration system to protect your API credentials.

Setup your configuration:

# 1. Copy the secret template and add your credentials
cp secret.json.template secret.json
nano secret.json  # Add your actual API keys

# 2. Verify keys.json contains only placeholders (safe for git)
cat keys.json  # Should show placeholder values like "your-api-key-here"

# 3. Test your configuration
python -c "from Configurator import get_config; print('✅ Configuration loaded successfully')"

⚠️ IMPORTANT: The secret.json file is required for the system to work!

  • This file contains your actual API keys and credentials
  • It's automatically ignored by git (never committed to the repository)
  • Without this file, the system will fail to load configuration
  • Always use the template (secret.json.template) as a starting point

Configuration Files:

  • secret.json: Contains sensitive API keys (🚫 NEVER COMMITTED TO GIT)
  • keys.json: Contains non-sensitive configuration (✅ SAFE FOR PUBLIC REPOSITORY)
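The dual-file loading can be pictured as a merge in which `secret.json` takes priority over the placeholders in `keys.json`. This is a hedged sketch of that behavior, not the code in `Configurator.py`; the function name `load_config` is an assumption:

```python
import json
from pathlib import Path

def load_config(keys_path="keys.json", secret_path="secret.json"):
    """Load non-sensitive settings, then overlay real credentials.

    Values from secret.json take priority, so placeholders in keys.json
    are replaced by actual keys. Raises if secret.json is missing,
    matching the documented 'system will not work without this file'.
    (Illustrative sketch; Configurator.py may differ.)"""
    config = json.loads(Path(keys_path).read_text())
    secret_file = Path(secret_path)
    if not secret_file.exists():
        raise FileNotFoundError(
            f"{secret_path} is required - copy secret.json.template and add your keys"
        )
    config.update(json.loads(secret_file.read_text()))
    return config
```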

Required credentials in secret.json:

{
  "AZURE_OPENAI_ENDPOINT": "https://your-endpoint.openai.azure.com",
  "AZURE_OPENAI_API_KEY": "your-actual-api-key-here",
  "AZURE_OPENAI_API_VERSION": "2024-02-01",
  "QDRANT_URL": "https://your-cluster.qdrant.io",
  "QDRANT_API_KEY": "your-actual-qdrant-key-here",
  "QDRANT_COLLECTION": "knowledge_embeddings"
}

Complete configuration options in keys.json (non-sensitive settings only):

{
  "_comment": "All configuration options for the knowledge processing system",
  "_comment_credentials": "Sensitive credentials are loaded from secret.json (ignored by git)",
  
  "CHUNK_STRATEGY": "SEMANTIC",
  "_chunk_strategy_options": "FIXED, OVERLAP, SENTENCE, PARAGRAPH, SEMANTIC, HYBRID, DOC_STRUCTURE, SLIDING_WINDOW",
  
  "EMBED_MODEL": "EMB_3_LARGE",
  "_embed_model_options": "ADA_002, EMB_3_SMALL, EMB_3_LARGE",
  
  "LLM_MODEL": "gpt-4o",
  "_llm_model_options": "gpt-4o, gpt-4o-mini, gpt-4-turbo",
  
  "LLM_MAX_TOKENS": 1000,
  "LLM_TEMPERATURE": 0.7,
  
  "RAG_TOP_K": 10,
  "_rag_top_k_range": "8-20",
  
  "RAG_RERANK_METHOD": "MMR",
  "_rag_rerank_options": "MMR, CROSS_ENCODER, LLM_RERANK",
  
  "RAG_FINAL_CHUNKS": 5,
  "_rag_final_chunks_range": "3-7",
  
  "RAG_CONTEXT_MAX_TOKENS": 1500,
  "_rag_context_max_tokens_range": "1000-1500",
  
  "RAG_SYSTEM_PROMPT": "You are a retrieval-augmented assistant. Use only the provided context. Cite sources. If unsure, say you don't know.",
  
  "RAG_MMR_DIVERSITY_THRESHOLD": 0.7,
  "_rag_mmr_diversity_range": "0.5-0.9",
  
  "CONVERSATION_MOOD": true,
  "_conversation_mood_options": "true, false",
  
  "NUMBER_OF_CONVERSATIONS_TO_STORE": 10,
  "_conversations_to_store_range": "5-20",
  
  "CONVERSATIONS": []
}

📋 Complete Configuration Reference

Chunking Configuration

  • CHUNK_STRATEGY: Text chunking strategy
    • Options: FIXED, OVERLAP, SENTENCE, PARAGRAPH, SEMANTIC, HYBRID, DOC_STRUCTURE, SLIDING_WINDOW
    • Default: SEMANTIC

Embedding Configuration

  • EMBED_MODEL: Embedding model to use
    • Options: ADA_002, EMB_3_SMALL, EMB_3_LARGE
    • Default: EMB_3_LARGE

LLM Configuration

  • LLM_MODEL: Language model to use
    • Options: gpt-4o, gpt-4o-mini, gpt-4-turbo
    • Default: gpt-4o
  • LLM_MAX_TOKENS: Maximum tokens for LLM responses
    • Range: 100-4000
    • Default: 1000
  • LLM_TEMPERATURE: LLM temperature (creativity)
    • Range: 0.0-2.0
    • Default: 0.7

RAG Pipeline Configuration

  • RAG_TOP_K: Number of chunks to retrieve initially
    • Range: 8-20
    • Default: 10
  • RAG_RERANK_METHOD: Re-ranking method
    • Options: MMR, CROSS_ENCODER, LLM_RERANK
    • Default: MMR
  • RAG_FINAL_CHUNKS: Number of final chunks for context
    • Range: 3-7
    • Default: 5
  • RAG_CONTEXT_MAX_TOKENS: Maximum tokens for context
    • Range: 1000-1500
    • Default: 1500
  • RAG_SYSTEM_PROMPT: System prompt for LLM
    • Default: "You are a retrieval-augmented assistant. Use only the provided context. Cite sources. If unsure, say you don't know."
  • RAG_MMR_DIVERSITY_THRESHOLD: MMR diversity threshold
    • Range: 0.5-0.9
    • Default: 0.7

Conversation Configuration

  • CONVERSATION_MOOD: Enable/disable conversation tracking
    • Options: true, false
    • Default: false
  • NUMBER_OF_CONVERSATIONS_TO_STORE: Maximum conversations to store
    • Range: 5-20
    • Default: 10
  • CONVERSATIONS: Conversation storage array
    • Default: []
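The numeric ranges in this reference lend themselves to a quick sanity check before starting the server. The snippet below is a hypothetical validator built from the documented ranges; the repository's `Configurator` may validate differently or not at all:

```python
# Hypothetical range check for the numeric settings documented above;
# the repository's Configurator may validate differently.

RANGES = {
    "LLM_MAX_TOKENS": (100, 4000),
    "LLM_TEMPERATURE": (0.0, 2.0),
    "RAG_TOP_K": (8, 20),
    "RAG_FINAL_CHUNKS": (3, 7),
    "RAG_CONTEXT_MAX_TOKENS": (1000, 1500),
    "RAG_MMR_DIVERSITY_THRESHOLD": (0.5, 0.9),
}

def validate_ranges(config):
    """Return a list of keys whose values fall outside the documented ranges."""
    errors = []
    for key, (lo, hi) in RANGES.items():
        value = config.get(key)
        if value is not None and not (lo <= value <= hi):
            errors.append(f"{key}={value} outside {lo}-{hi}")
    return errors
```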

🔐 Security Features:

  • Automatic Git Protection: secret.json is automatically excluded from version control
  • Clean Separation: keys.json contains only non-sensitive configuration settings
  • Dual Configuration: System loads from both files with proper priority
  • Credential History Cleaned: Previous commits with exposed credentials have been removed from git history

⚠️ Security Checklist:

  • secret.json exists locally with your real API keys
  • secret.json is ignored by git (git check-ignore secret.json should return the file)
  • keys.json contains only non-sensitive configuration settings
  • Never commit secret.json to any repository
  • Use environment variables in production deployments

3. Run the System

# Start the FastAPI server
uvicorn main:app --reload --host 0.0.0.0 --port 8000

📚 API Endpoints

Knowledge Management

GET /FeedKnowledge

Ingest all knowledge files from the Knowledge folder.

  • Process: Reads .md files → Chunks content → Creates embeddings → Stores in Qdrant
  • Response: Ingestion statistics and results

GET /knowledge/status

Get current knowledge ingestion status.

  • Response: File counts, collection info, configuration details

DELETE /knowledge/clear

Clear all data from the knowledge base.

  • Response: Clear operation results

RAG Query System

POST /rag

Process natural language queries using RAG.

  • Request: {"query": "Your question here"}
  • Response: AI-generated answer with sources and metadata

GET /rag/validate

Validate RAG pipeline components.

  • Response: Component status, errors, and warnings

GET /rag/info

Get RAG pipeline configuration information.

  • Response: Current settings and model information

Conversation Management

GET /conversation/status

Get conversation tracking status and statistics.

  • Response: Conversation enabled status, total conversations, storage limits

GET /conversation/history

Retrieve all stored conversation history.

  • Response: Complete conversation history with summaries and metadata

DELETE /conversation/clear

Clear all stored conversation memory.

  • Response: Confirmation of conversation memory clearing

GET /conversation/config

Get current conversation configuration settings.

  • Response: Conversation mood, storage limits, and configuration details

Direct LLM Processing

POST /process

Direct LLM processing without knowledge retrieval.

  • Request: {"input": "Your text here"}
  • Response: {"output": "LLM response"}

🔐 Security & Credential Management

Secure Configuration System

This repository implements a dual-configuration system to protect your API credentials:

File Structure:

  • secret.json 🚫 NEVER COMMITTED: Contains your actual API keys
  • keys.json ✅ SAFE FOR GIT: Contains only placeholder values
  • secret.json.template 📋 SETUP GUIDE: Template for creating secret.json

Security Features:

  • Automatic Git Protection: secret.json is automatically excluded from version control
  • History Cleanup: Previous commits with exposed credentials have been completely removed
  • Placeholder Safety: Public repository contains only safe placeholder values
  • Dual Loading: System loads from both files with proper priority

Setup Process:

  1. Copy Template: cp secret.json.template secret.json
  2. Add Credentials: Edit secret.json with your actual API keys
  3. Verify Safety: Ensure keys.json contains only placeholders
  4. Test Configuration: Run configuration test to verify setup

Security Checklist:

  • secret.json exists locally with real credentials
  • secret.json is ignored by git (git check-ignore secret.json)
  • keys.json contains only placeholder values
  • Never commit secret.json to any repository
  • Use environment variables in production

What Happens If Credentials Are Exposed:

If you accidentally commit secret.json:

  1. Immediate Action: Remove the file from git tracking
  2. History Cleanup: Use git filter-repo (or the older git filter-branch) to remove it from history
  3. Force Push: Update remote repository to remove exposed credentials
  4. Rotate Keys: Generate new API keys from your service providers
  5. Verify Cleanup: Ensure no sensitive data remains in git history

⚙️ Configuration Options

Embedding Models

  • ADA_002: 1536 dimensions, cost-effective
  • EMB_3_SMALL: 1536 dimensions, balanced performance
  • EMB_3_LARGE: 3072 dimensions, highest quality

Chunking Strategies

  • FIXED: Fixed character count chunks
  • OVERLAP: Overlapping chunks for context preservation
  • SENTENCE: Sentence-based chunking
  • PARAGRAPH: Paragraph-based chunking
  • SEMANTIC: Semantic boundary detection (placeholder)
  • HYBRID: Heading-based with token windows
  • DOC_STRUCTURE: Markdown-aware structure preservation
  • SLIDING_WINDOW: Token-based sliding window
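To make the SLIDING_WINDOW idea concrete, here is a minimal illustrative chunker. It splits on whitespace as a stand-in for real tokenization; `Chunker.py`'s actual implementation may tokenize and size windows differently:

```python
# Illustrative token-based sliding-window chunker (the SLIDING_WINDOW idea).
# Whitespace splitting stands in for real tokenization; Chunker.py may differ.

def sliding_window_chunks(text, window=50, stride=25):
    """Split whitespace tokens into overlapping windows of `window` tokens,
    advancing `stride` tokens per step (stride < window gives overlap)."""
    tokens = text.split()
    chunks = []
    for start in range(0, max(len(tokens) - window + stride, 1), stride):
        chunk = tokens[start:start + window]
        if chunk:
            chunks.append(" ".join(chunk))
    return chunks
```

With `stride` at half of `window`, every token appears in roughly two chunks, which is the overlap that preserves context across chunk boundaries.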

RAG Configuration

  • RAG_TOP_K: Initial chunks to retrieve (default: 10)
  • RAG_FINAL_CHUNKS: Chunks after re-ranking (default: 5)
  • RAG_CONTEXT_MAX_TOKENS: Maximum context size (default: 1500)
  • RAG_RERANK_METHOD: MMR, CROSS_ENCODER, or LLM_RERANK
  • RAG_MMR_DIVERSITY_THRESHOLD: Diversity vs relevance balance (0.5-0.9)
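The MMR re-ranking option balances relevance to the query against redundancy with chunks already selected. The sketch below is the standard MMR formulation, not necessarily how `RAGProcessor.py` weights scores; the diversity parameter plays the role of RAG_MMR_DIVERSITY_THRESHOLD:

```python
# Sketch of MMR (Maximal Marginal Relevance) re-ranking, the standard
# formulation. RAGProcessor.py's actual scoring may differ; `diversity`
# here plays the role of RAG_MMR_DIVERSITY_THRESHOLD.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr(query_vec, chunk_vecs, k, diversity=0.7):
    """Select k chunk indices, trading query relevance against
    similarity to chunks already selected."""
    selected, candidates = [], list(range(len(chunk_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, chunk_vecs[i])
            redundancy = max((cosine(chunk_vecs[i], chunk_vecs[j])
                              for j in selected), default=0.0)
            return diversity * relevance - (1 - diversity) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

Lower diversity weights penalize near-duplicate chunks more heavily, so a less relevant but novel chunk can displace a redundant one.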

🧪 RAG Testing Examples

1. Test Different Chunking Strategies

Test Sentence-based Chunking:

// keys.json
{
  "CHUNK_STRATEGY": "SENTENCE",
  "EMBED_MODEL": "EMB_3_LARGE",
  "LLM_MODEL": "gpt-4o"
}

Test Semantic Chunking:

// keys.json
{
  "CHUNK_STRATEGY": "SEMANTIC",
  "EMBED_MODEL": "EMB_3_LARGE",
  "LLM_MODEL": "gpt-4o"
}

Test Hybrid Chunking:

// keys.json
{
  "CHUNK_STRATEGY": "HYBRID",
  "EMBED_MODEL": "EMB_3_LARGE",
  "LLM_MODEL": "gpt-4o"
}

2. Test Different Embedding Models

Test ADA-002 (Cost-effective):

// keys.json
{
  "EMBED_MODEL": "ADA_002",
  "CHUNK_STRATEGY": "SENTENCE"
}

Test text-embedding-3-small (Balanced):

// keys.json
{
  "EMBED_MODEL": "EMB_3_SMALL",
  "CHUNK_STRATEGY": "SENTENCE"
}

Test text-embedding-3-large (Highest Quality):

// keys.json
{
  "EMBED_MODEL": "EMB_3_LARGE",
  "CHUNK_STRATEGY": "SENTENCE"
}

3. Test Different LLM Models

Test GPT-4o (Latest):

// keys.json
{
  "LLM_MODEL": "gpt-4o",
  "LLM_MAX_TOKENS": 1000,
  "LLM_TEMPERATURE": 0.7
}

Test GPT-4 (Stable):

// keys.json
{
  "LLM_MODEL": "gpt-4",
  "LLM_MAX_TOKENS": 1000,
  "LLM_TEMPERATURE": 0.7
}

Test GPT-3.5-turbo (Fast):

// keys.json
{
  "LLM_MODEL": "gpt-3.5-turbo",
  "LLM_MAX_TOKENS": 1000,
  "LLM_TEMPERATURE": 0.7
}

Test GPT-4o-mini (Balanced):

// keys.json
{
  "LLM_MODEL": "gpt-4o-mini",
  "LLM_MAX_TOKENS": 1500,
  "LLM_TEMPERATURE": 0.5
}

4. Test Different RAG Configurations

High Precision Configuration:

// keys.json
{
  "RAG_TOP_K": 20,
  "RAG_FINAL_CHUNKS": 7,
  "RAG_CONTEXT_MAX_TOKENS": 2000,
  "RAG_RERANK_METHOD": "MMR",
  "RAG_MMR_DIVERSITY_THRESHOLD": 0.8
}

5. Test Conversation Features

Enable Conversation Tracking:

// keys.json
{
  "CONVERSATION_MOOD": true,
  "NUMBER_OF_CONVERSATIONS_TO_STORE": 10
}

Disable Conversation Tracking (Default):

// keys.json
{
  "CONVERSATION_MOOD": false,
  "NUMBER_OF_CONVERSATIONS_TO_STORE": 10
}

Test Conversation Memory Limits:

// keys.json
{
  "CONVERSATION_MOOD": true,
  "NUMBER_OF_CONVERSATIONS_TO_STORE": 5
}

Fast Response Configuration:

// keys.json
{
  "RAG_TOP_K": 5,
  "RAG_FINAL_CHUNKS": 3,
  "RAG_CONTEXT_MAX_TOKENS": 1000,
  "RAG_RERANK_METHOD": "MMR",
  "RAG_MMR_DIVERSITY_THRESHOLD": 0.6
}

High Quality Configuration:

// keys.json
{
  "RAG_TOP_K": 20,
  "RAG_FINAL_CHUNKS": 7,
  "RAG_CONTEXT_MAX_TOKENS": 2000,
  "RAG_RERANK_METHOD": "LLM_RERANK",
  "RAG_MMR_DIVERSITY_THRESHOLD": 0.8
}

Balanced Configuration:

// keys.json
{
  "RAG_TOP_K": 10,
  "RAG_FINAL_CHUNKS": 5,
  "RAG_CONTEXT_MAX_TOKENS": 1500,
  "RAG_RERANK_METHOD": "CROSS_ENCODER",
  "RAG_MMR_DIVERSITY_THRESHOLD": 0.7
}

6. Complete Configuration Testing

Test All Chunking Strategies:

// Test each strategy by changing CHUNK_STRATEGY
{
  "CHUNK_STRATEGY": "FIXED",        // Fixed-size chunks
  "CHUNK_STRATEGY": "OVERLAP",      // Overlapping chunks
  "CHUNK_STRATEGY": "SENTENCE",     // Sentence-based chunks
  "CHUNK_STRATEGY": "PARAGRAPH",    // Paragraph-based chunks
  "CHUNK_STRATEGY": "SEMANTIC",     // Semantic chunks (default)
  "CHUNK_STRATEGY": "HYBRID",       // Hybrid approach
  "CHUNK_STRATEGY": "DOC_STRUCTURE", // Document structure aware
  "CHUNK_STRATEGY": "SLIDING_WINDOW" // Sliding window chunks
}

Test All Embedding Models:

// Test each model by changing EMBED_MODEL
{
  "EMBED_MODEL": "ADA_002",         // Cost-effective
  "EMBED_MODEL": "EMB_3_SMALL",     // Balanced
  "EMBED_MODEL": "EMB_3_LARGE"      // Highest quality (default)
}

Test All Re-ranking Methods:

// Test each method by changing RAG_RERANK_METHOD
{
  "RAG_RERANK_METHOD": "MMR",           // Maximal Marginal Relevance (default)
  "RAG_RERANK_METHOD": "CROSS_ENCODER", // Cross-encoder re-ranking
  "RAG_RERANK_METHOD": "LLM_RERANK"     // LLM-based re-ranking
}

Test System Prompt Customization:

// Customize the system prompt
{
  "RAG_SYSTEM_PROMPT": "You are an expert assistant. Provide detailed, accurate answers based on the context. Always cite your sources."
}

7. A/B Testing Workflow

Step 1: Setup Configuration A

# Update keys.json with Configuration A
# Test and record results
curl -X POST "http://localhost:8000/rag" \
  -H "Content-Type: application/json" \
  -d '{"query": "Your test query here"}'

Step 2: Switch to Configuration B

# Update keys.json with Configuration B
# Test same query and compare results
curl -X POST "http://localhost:8000/rag" \
  -H "Content-Type: application/json" \
  -d '{"query": "Your test query here"}'

Step 3: Compare Results

  • Compare response quality
  • Measure latency differences
  • Analyze source relevance
  • Evaluate answer accuracy
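For the latency comparison, the `latency_ms` field in each /rag response can be collected per configuration and summarized offline. This helper is illustrative (the platform does not ship it):

```python
# Hypothetical helper for step 3: compare latency samples (ms) collected
# from the latency_ms field of /rag responses under two configurations.
from statistics import mean, median

def compare_latency(samples_a, samples_b):
    """Summarize two latency sample sets and report the mean difference."""
    summary = {
        "config_a": {"mean": mean(samples_a), "median": median(samples_a)},
        "config_b": {"mean": mean(samples_b), "median": median(samples_b)},
    }
    summary["mean_delta_ms"] = summary["config_b"]["mean"] - summary["config_a"]["mean"]
    return summary
```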

🔧 Basic Usage Examples

1. Ingest Knowledge

curl -X GET "http://localhost:8000/FeedKnowledge"

2. Query with RAG

curl -X POST "http://localhost:8000/rag" \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the delivery guidelines?"}'

3. Check System Status

curl -X GET "http://localhost:8000/rag/validate"

4. Get Configuration Info

curl -X GET "http://localhost:8000/rag/info"

🎯 RAG Testing Best Practices

1. Systematic Testing Approach

Test One Variable at a Time:

  • Keep all other settings constant when testing a specific component
  • Change only the variable you want to evaluate
  • Record results for each configuration

Use Consistent Test Queries:

  • Create a standardized set of test queries
  • Include different types: factual, analytical, comparative
  • Test with various query lengths and complexities

Measure Multiple Metrics:

  • Latency: Response time for each configuration
  • Quality: Relevance and accuracy of answers
  • Retrieval: Number and quality of source documents
  • Cost: API usage and computational resources

2. Testing Methodology

Baseline Establishment:

// Start with a baseline configuration
{
  "CHUNK_STRATEGY": "SENTENCE",
  "EMBED_MODEL": "EMB_3_LARGE",
  "LLM_MODEL": "gpt-4o",
  "RAG_TOP_K": 10,
  "RAG_FINAL_CHUNKS": 5
}

Incremental Testing:

  1. Test different chunking strategies with same embedding model
  2. Test different embedding models with same chunking strategy
  3. Test different LLM models with same retrieval configuration
  4. Test different RAG parameters with same models

Performance Benchmarking:

  • Run multiple queries with each configuration
  • Calculate average latency and quality scores
  • Compare results across different configurations
  • Document findings and recommendations

3. Test Data Preparation

Knowledge Base Variety:

  • Test with different document types (technical, narrative, structured)
  • Use various document lengths and complexities
  • Include different domains and subject matters

Query Diversity:

  • Factual Queries: "What is the definition of X?"
  • Analytical Queries: "Compare and contrast X and Y"
  • Procedural Queries: "How do I perform X?"
  • Complex Queries: Multi-part questions requiring synthesis

4. Results Analysis

Quantitative Metrics:

  • Response time (latency)
  • Token usage and costs
  • Retrieval precision and recall
  • Answer completeness scores
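Retrieval precision and recall require a hand-labeled set of relevant chunk IDs per query; the platform does not compute them for you. A minimal sketch of precision@k / recall@k:

```python
# Sketch of retrieval precision@k / recall@k against a hand-labeled set
# of relevant chunk IDs; the platform does not compute these for you.

def precision_recall_at_k(retrieved_ids, relevant_ids, k):
    """Precision@k and recall@k for a single query."""
    top_k = retrieved_ids[:k]
    hits = sum(1 for cid in top_k if cid in relevant_ids)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall
```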

Qualitative Assessment:

  • Answer relevance and accuracy
  • Source document quality
  • Response coherence and clarity
  • Factual correctness

Comparative Analysis:

  • Side-by-side configuration comparisons
  • Performance trade-offs identification
  • Cost-benefit analysis
  • Use case optimization recommendations

5. Production Readiness Testing

Load Testing:

  • Test with high query volumes
  • Measure system performance under load
  • Identify bottlenecks and optimization opportunities

Integration Testing:

  • Test with real-world data
  • Validate end-to-end pipeline performance
  • Ensure system reliability and stability

Monitoring Setup:

  • Implement performance monitoring
  • Set up alerting for system health
  • Track usage patterns and optimization opportunities

📁 Project Structure

test-the-rag/
├── main.py                 # FastAPI application and endpoints
├── Configurator.py         # Configuration management
├── Emdbeder.py            # Embedding operations
├── Chunker.py             # Text chunking strategies
├── KnowledgeIngestion.py  # Complete ingestion pipeline
├── KnowledgeProcessor.py  # Knowledge file processing
├── QdrantManager.py       # Vector database operations
├── RAGPipeline.py         # RAG orchestration
├── RAGRetriever.py        # Vector search and retrieval
├── RAGProcessor.py        # Post-processing and re-ranking
├── llmhandler.py          # LLM interaction
├── keys.json              # Non-sensitive configuration (safe for git)
├── secret.json            # Sensitive credentials (ignored by git)
├── secret.json.template   # Template for secret.json setup
├── Knowledge/             # Knowledge files directory
│   └── sample_knowledge.md
├── requirements.txt       # Python dependencies
└── README.md             # This file

🏗️ Architecture Principles

High Reusability

  • Single Source of Truth: All embedding operations use Emdbeder.py
  • Centralized Chunking: All chunking operations use Chunker.py
  • Configuration-Driven: Change keys.json → Change behavior everywhere

Professional Standards

  • Single Responsibility: Each module has one clear purpose
  • Dependency Injection: Components receive configuration via constructor
  • Error Handling: Comprehensive error messages and graceful fallbacks
  • Type Safety: Full type hints and Pydantic models

Scalability

  • Modular Design: Easy to extend with new strategies
  • Async Operations: Non-blocking I/O for better performance
  • Vector Database: Scalable storage and retrieval

🔍 Interactive Documentation

Visit http://localhost:8000/docs for automatic interactive API documentation with Swagger UI.

🚀 Quick Start

  1. Setup: Follow installation steps above
  2. Configure: Create secret.json with your API credentials (REQUIRED!)
  3. Ingest: GET /FeedKnowledge to process your knowledge files
  4. Query: POST /rag to ask questions about your knowledge base

⚠️ IMPORTANT: You must create secret.json from the template before the system will work!

💬 Conversation Usage Examples

Basic Conversation Flow

1. Enable Conversation Mode:

// keys.json
{
  "CONVERSATION_MOOD": true,
  "NUMBER_OF_CONVERSATIONS_TO_STORE": 10
}

2. Start a Conversation:

# First query
curl -X POST "http://localhost:8000/rag" \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the delivery management guidelines?"}'

3. Follow-up Questions:

# Second query (benefits from conversation context)
curl -X POST "http://localhost:8000/rag" \
  -H "Content-Type: application/json" \
  -d '{"query": "How are they organized?"}'

# Third query (maintains conversation continuity)
curl -X POST "http://localhost:8000/rag" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the purpose of these guidelines?"}'

4. Check Conversation Status:

# View conversation statistics
curl -X GET "http://localhost:8000/conversation/status"

# View conversation history
curl -X GET "http://localhost:8000/conversation/history"

# Clear conversation memory
curl -X DELETE "http://localhost:8000/conversation/clear"

Conversation Response Format

When conversation mode is enabled, RAG responses include conversation metadata:

{
  "answer": "The delivery management guidelines are...",
  "sources": [...],
  "conversation": {
    "enabled": true,
    "serial_number": 3,
    "current_qa_summary": "Summary of current Q&A exchange...",
    "conversation_summary_so_far": "Cumulative conversation summary...",
    "total_conversations": 3
  },
  "used_model": {...},
  "latency_ms": 2500
}

Conversation Benefits

Enhanced Context:

  • Each query benefits from previous conversation context
  • Follow-up questions maintain continuity
  • Better understanding of user intent

Automatic Summarization:

  • Current Q&A summary for each exchange
  • Cumulative conversation summary
  • Intelligent memory management

Memory Management:

  • Configurable storage limits
  • Automatic cleanup of old conversations
  • Efficient memory usage

📊 Logging and Monitoring

Comprehensive Logging System

The system includes extensive logging for all operations:

Log Files:

  • rag_system.log: Complete system logs with timestamps
  • Console Output: Real-time logging with emoji indicators

Log Categories:

  • 🚀 [API]: API endpoint calls and responses
  • 🔍 [RAG_PIPELINE]: RAG pipeline step-by-step processing
  • 📚 [INGESTION]: Knowledge ingestion operations
  • 💬 [CONVERSATION]: Conversation tracking and management
  • 🤖 [LLM]: LLM processing and responses
  • ✅ [SUCCESS]: Successful operations
  • ❌ [ERROR]: Error conditions and failures

Logging Features:

  • Step-by-step tracking: Every RAG pipeline step is logged
  • Performance metrics: Timing information for all operations
  • Error tracking: Detailed error logging with context
  • Conversation tracking: Complete conversation flow logging
  • API monitoring: All API calls and responses logged
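The dual console/file logging described above can be reproduced with the standard `logging` module. This is a minimal sketch; the repository's actual logger configuration may differ:

```python
# Minimal sketch of the dual console/file logging described above;
# the repository's actual logger configuration may differ.
import logging

def setup_logging(log_file="rag_system.log"):
    formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    logger = logging.getLogger("RAGPipeline")
    logger.setLevel(logging.INFO)
    # One handler for real-time console output, one for the persistent log file
    for handler in (logging.StreamHandler(), logging.FileHandler(log_file)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger

logger = setup_logging()
logger.info("🔍 [RAG_PIPELINE] Step 1: Embedding query...")
```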

Example Log Output:

2024-01-15 10:30:15 - main - INFO - 🚀 [API] /rag endpoint called with query length: 45 chars
2024-01-15 10:30:15 - main - INFO - 📝 [API] Query preview: What are the delivery guidelines?
2024-01-15 10:30:15 - RAGPipeline - INFO - 🔍 [RAG_PIPELINE] Step 1: Embedding query...
2024-01-15 10:30:16 - RAGPipeline - INFO - ✅ [RAG_PIPELINE] Step 1 completed - Retrieved 10 chunks in 1250.5ms
2024-01-15 10:30:16 - RAGPipeline - INFO - 🔄 [RAG_PIPELINE] Step 2: Re-ranking 10 chunks...
2024-01-15 10:30:16 - RAGPipeline - INFO - ✅ [RAG_PIPELINE] Step 2 completed - Re-ranked to 5 chunks in 45.2ms

🔧 Troubleshooting

Common Issues

❌ "Keys file 'keys.json' not found"

  • Solution: The keys.json file should be in the repository. If missing, check your git clone.

❌ "Missing required keys" or "Missing or placeholder values"

  • Solution: You need to create secret.json from the template:
    cp secret.json.template secret.json
    # Then edit secret.json with your actual API keys

❌ "Configuration loaded successfully" but system fails

  • Solution: Check that secret.json contains real API keys, not placeholder values.

❌ System works locally but fails in production

  • Solution: Ensure secret.json exists in your production environment and contains valid credentials.

📊 Performance

  • Embedding: ~100ms per chunk (varies by model)
  • RAG Query: ~2-5 seconds end-to-end
  • Knowledge Ingestion: ~1-2 seconds per file
  • Vector Search: <100ms for similarity search

🔒 Security

  • All secrets stored in secret.json (gitignored)
  • Environment variable overrides supported
  • No hardcoded credentials in code
  • Secure API key management

🤝 Contributing

  1. Follow the established architecture patterns
  2. Maintain high reusability principles
  3. Add comprehensive docstrings
  4. Update configuration in keys.json
  5. Test all components thoroughly

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

MIT License

Copyright (c) 2024 WhatTheBot RAG System

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Built with ❤️ using FastAPI, Azure OpenAI, and Qdrant


🎯 About Test the RAG

Test the RAG is specifically designed as a comprehensive testing platform for Retrieval-Augmented Generation systems. It provides researchers, developers, and AI practitioners with a unified environment to:

  • Test Multiple Configurations: Experiment with different chunking strategies, embedding models, and LLM combinations
  • Compare Performance: Benchmark different RAG approaches side-by-side
  • Optimize Systems: Find the best configuration for specific use cases
  • Validate Improvements: Test RAG enhancements before production deployment

Why Test the RAG?

Single Repository Testing:

  • Test all RAG configurations from one codebase
  • No need to maintain multiple testing environments
  • Consistent testing methodology across all experiments

Configuration-Driven:

  • Change behavior by modifying keys.json
  • No code changes required for different configurations
  • Rapid prototyping and experimentation

Production-Ready:

  • Real Azure OpenAI integration
  • Actual Qdrant vector database
  • Professional monitoring and validation endpoints

Research-Friendly:

  • Comprehensive documentation and examples
  • Built-in performance metrics and monitoring
  • Easy A/B testing and comparison workflows

🚀 Future Improvements & Roadmap

This RAG system is a solid foundation for advanced retrieval-augmented generation applications, with several areas identified for future enhancement:

  • Architecture: Dependency injection patterns, service containers, and event-driven architecture to improve modularity and testability
  • Code Quality: Comprehensive type safety with Pydantic models, advanced configuration validation, and extraction of magic numbers into well-defined constants
  • Performance: Connection pooling, caching mechanisms, batch processing for embeddings, and async/await patterns throughout the pipeline
  • Resilience: Custom exception hierarchies, circuit breaker patterns, retry mechanisms with exponential backoff, and structured logging
  • Testing: Unit tests, integration tests, performance benchmarks, and automated testing pipelines
  • Security: Input sanitization, rate limiting, API key rotation, and audit logging
  • Documentation: API documentation, architecture diagrams, deployment guides, and developer onboarding materials

Together these improvements aim to move the system from a functional RAG platform to a production-ready, enterprise-grade solution that can handle high-volume workloads while maintaining reliability, security, and performance standards.
