A professional-grade Retrieval-Augmented Generation (RAG) testing platform built with FastAPI, featuring intelligent knowledge ingestion, vector storage, and AI-powered query processing.
⚠️ Important: You must create a `secret.json` file with your API credentials before the system can run. See the Configuration section below for details.
Test the RAG serves as a comprehensive RAG testing platform that enables researchers, developers, and AI practitioners to experiment with and evaluate different RAG configurations from a single repository. This system provides a unified environment for testing various chunking strategies, embedding models, and LLM combinations to optimize RAG performance.
Multi-Strategy Testing:
- 8 Chunking Strategies: FIXED, OVERLAP, SENTENCE, PARAGRAPH, SEMANTIC, HYBRID, DOC_STRUCTURE, SLIDING_WINDOW
- 3 Embedding Models: ADA-002, text-embedding-3-small, text-embedding-3-large
- Multiple LLM Models: GPT-4o, GPT-4, GPT-4-turbo, GPT-3.5-turbo
- 3 Re-ranking Methods: MMR, Cross-Encoder, LLM Re-ranking
- Conversation Tracking: Advanced conversation memory with summarization and context continuity
Configuration-Driven Testing:
- Single Repository: Test all combinations without code changes
- Dynamic Configuration: Modify `keys.json` to switch between strategies
- Real-time Switching: Change embedding models, chunking strategies, and LLM models instantly
- Performance Metrics: Built-in latency tracking and quality assessment
Professional Testing Features:
- A/B Testing: Compare different configurations side-by-side
- Performance Benchmarking: Measure latency, accuracy, and retrieval quality
- Scalable Testing: Test with different knowledge bases and query types
- Production-Ready: Deploy and test in real-world scenarios
Conversation Tracking & Memory:
- Persistent Memory: Maintains conversation history across multiple queries
- Automatic Summarization: Generates summaries for each Q&A exchange and cumulative conversation context
- Context Continuity: Each query benefits from previous conversation context
- Memory Management: Configurable conversation storage limits with automatic cleanup
Conversation Configuration:
- Enable/Disable: Toggle conversation tracking via `CONVERSATION_MOOD` in `keys.json`
- Storage Limits: Configure maximum conversations to store (`NUMBER_OF_CONVERSATIONS_TO_STORE`)
- Dynamic Control: Change conversation settings without code modifications
- Memory Optimization: Automatic removal of oldest conversations when limit exceeded
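The limit-and-evict behaviour above can be sketched in a few lines of Python. This is an illustrative model only (the class and method names are hypothetical, not the repository's actual API):

```python
# Hypothetical sketch of a bounded conversation store. deque(maxlen=...)
# silently drops the oldest entry once full, mirroring the
# NUMBER_OF_CONVERSATIONS_TO_STORE behaviour described above.
from collections import deque


class ConversationMemory:
    def __init__(self, max_conversations: int = 10):
        self._store = deque(maxlen=max_conversations)

    def add_exchange(self, query: str, answer: str, summary: str) -> None:
        # Append one Q&A exchange; the oldest is evicted automatically
        # when the limit is exceeded.
        self._store.append({"query": query, "answer": answer, "summary": summary})

    def history(self) -> list:
        return list(self._store)

    def clear(self) -> None:
        self._store.clear()
```

With `max_conversations=2`, adding a third exchange evicts the first, so memory never grows without bound.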
Conversation API Endpoints:
- `GET /conversation/status`: View conversation statistics and current status
- `GET /conversation/history`: Retrieve complete conversation history
- `DELETE /conversation/clear`: Clear all conversation memory
- `GET /conversation/config`: Get current conversation configuration
Conversation Benefits:
- Enhanced Context: RAG responses improve with conversation history
- Better Continuity: Follow-up questions maintain context from previous exchanges
- Intelligent Summarization: Automatic generation of conversation summaries
- Memory Efficiency: Smart memory management prevents unlimited growth
Academic Research:
- Compare chunking strategies for different document types
- Evaluate embedding model performance across domains
- Study the impact of re-ranking methods on retrieval quality
- Analyze the relationship between chunk size and retrieval accuracy
Industry Applications:
- Optimize RAG systems for specific use cases
- Test different configurations for production deployment
- Benchmark performance across various embedding models
- Validate RAG improvements before deployment
Developer Testing:
- Rapid prototyping of RAG configurations
- Performance testing with real data
- Integration testing with different vector databases
- End-to-end RAG pipeline validation
- Intelligent Knowledge Ingestion: Process markdown files with configurable chunking strategies
- Advanced Embedding: Support for multiple Azure OpenAI embedding models (ADA-002, text-embedding-3-small, text-embedding-3-large)
- Vector Database Integration: Real-time Qdrant vector storage and retrieval
- RAG Pipeline: Complete retrieval-augmented generation with re-ranking and context assembly
- High Reusability: Configuration-driven architecture - change behavior via `keys.json`
- Professional Architecture: Modular design with single responsibility principle
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Knowledge │ │ Configurator │ │ keys.json │
│ Files (.md) │───▶│ (Settings) │◀───│ (Configuration)│
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Chunker.py │ │ Emdbeder.py │
│ (Text Chunking)│ │ (Embeddings) │
└─────────────────┘ └─────────────────┘
│ │
▼ ▼
┌─────────────────────────────────────────┐
│ KnowledgeIngestion.py │
│ (Complete Ingestion Pipeline) │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Qdrant Vector DB │
│ (Vector Storage) │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ RAG Pipeline │
│ (Retrieval + Generation) │
└─────────────────────────────────────────┘
- Python 3.8+
- Azure OpenAI API access
- Qdrant Cloud account
git clone https://github.com/rezwanx/TestTheRAG
cd TestTheRAG
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt

# Copy the template and add your credentials
cp secret.json.template secret.json
nano secret.json # Add your actual API keys
# Verify the file was created
ls -la secret.json  # Should exist and be ignored by git

⚠️ You MUST create secret.json before running the system!
- The system will not work without this file
- It contains your actual API keys and credentials
- This file is automatically ignored by git for security
Setup your configuration:
# 1. Copy the secret template and add your credentials
cp secret.json.template secret.json
nano secret.json # Add your actual API keys
# 2. Verify keys.json contains only placeholders (safe for git)
cat keys.json # Should show placeholder values like "your-api-key-here"
# 3. Test your configuration
python -c "from Configurator import get_config; print('✅ Configuration loaded successfully')"

⚠️ The secret.json file is required for the system to work!
- This file contains your actual API keys and credentials
- It's automatically ignored by git (never committed to the repository)
- Without this file, the system will fail to load configuration
- Always use the template (`secret.json.template`) as a starting point
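The dual-file loading described above can be sketched as follows. This is a hedged approximation of what a `Configurator` might do, assuming both files are flat JSON objects; the real `get_config` may differ:

```python
# Illustrative sketch: merge non-sensitive keys.json with secret.json,
# giving secrets priority. Not the repository's actual implementation.
import json
from pathlib import Path


def load_config(keys_path: str = "keys.json",
                secret_path: str = "secret.json") -> dict:
    config = json.loads(Path(keys_path).read_text())
    secret_file = Path(secret_path)
    if not secret_file.exists():
        raise FileNotFoundError(
            f"{secret_path} is required -- copy secret.json.template "
            "and fill in your credentials"
        )
    # Values from secret.json override any placeholders in keys.json.
    config.update(json.loads(secret_file.read_text()))
    return config
```

Failing fast when `secret.json` is absent matches the behaviour noted above: the system will not run without it.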
Configuration Files:
- `secret.json`: Contains sensitive API keys (🚫 NEVER COMMITTED TO GIT)
- `keys.json`: Contains non-sensitive configuration (✅ SAFE FOR PUBLIC REPOSITORY)
Required credentials in secret.json:
{
"AZURE_OPENAI_ENDPOINT": "https://your-endpoint.openai.azure.com",
"AZURE_OPENAI_API_KEY": "your-actual-api-key-here",
"AZURE_OPENAI_API_VERSION": "2024-02-01",
"QDRANT_URL": "https://your-cluster.qdrant.io",
"QDRANT_API_KEY": "your-actual-qdrant-key-here",
"QDRANT_COLLECTION": "knowledge_embeddings"
}

Complete configuration options in keys.json (non-sensitive settings only):
{
"_comment": "All configuration options for the knowledge processing system",
"_comment_credentials": "Sensitive credentials are loaded from secret.json (ignored by git)",
"CHUNK_STRATEGY": "SEMANTIC",
"_chunk_strategy_options": "FIXED, OVERLAP, SENTENCE, PARAGRAPH, SEMANTIC, HYBRID, DOC_STRUCTURE, SLIDING_WINDOW",
"EMBED_MODEL": "EMB_3_LARGE",
"_embed_model_options": "ADA_002, EMB_3_SMALL, EMB_3_LARGE",
"LLM_MODEL": "gpt-4o",
"_llm_model_options": "gpt-4o, gpt-4o-mini, gpt-4-turbo",
"LLM_MAX_TOKENS": 1000,
"LLM_TEMPERATURE": 0.7,
"RAG_TOP_K": 10,
"_rag_top_k_range": "8-20",
"RAG_RERANK_METHOD": "MMR",
"_rag_rerank_options": "MMR, CROSS_ENCODER, LLM_RERANK",
"RAG_FINAL_CHUNKS": 5,
"_rag_final_chunks_range": "3-7",
"RAG_CONTEXT_MAX_TOKENS": 1500,
"_rag_context_max_tokens_range": "1000-1500",
"RAG_SYSTEM_PROMPT": "You are a retrieval-augmented assistant. Use only the provided context. Cite sources. If unsure, say you don't know.",
"RAG_MMR_DIVERSITY_THRESHOLD": 0.7,
"_rag_mmr_diversity_range": "0.5-0.9",
"CONVERSATION_MOOD": true,
"_conversation_mood_options": "true, false",
"NUMBER_OF_CONVERSATIONS_TO_STORE": 10,
"_conversations_to_store_range": "5-20",
"CONVERSATIONS": []
}

- `CHUNK_STRATEGY`: Text chunking strategy. Options: `FIXED`, `OVERLAP`, `SENTENCE`, `PARAGRAPH`, `SEMANTIC`, `HYBRID`, `DOC_STRUCTURE`, `SLIDING_WINDOW`. Default: `SEMANTIC`
- `EMBED_MODEL`: Embedding model to use. Options: `ADA_002`, `EMB_3_SMALL`, `EMB_3_LARGE`. Default: `EMB_3_LARGE`
- `LLM_MODEL`: Language model to use. Options: `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`. Default: `gpt-4o`
- `LLM_MAX_TOKENS`: Maximum tokens for LLM responses. Range: 100-4000. Default: `1000`
- `LLM_TEMPERATURE`: LLM temperature (creativity). Range: 0.0-2.0. Default: `0.7`
- `RAG_TOP_K`: Number of chunks to retrieve initially. Range: 8-20. Default: `10`
- `RAG_RERANK_METHOD`: Re-ranking method. Options: `MMR`, `CROSS_ENCODER`, `LLM_RERANK`. Default: `MMR`
- `RAG_FINAL_CHUNKS`: Number of final chunks for context. Range: 3-7. Default: `5`
- `RAG_CONTEXT_MAX_TOKENS`: Maximum tokens for context. Range: 1000-1500. Default: `1500`
- `RAG_SYSTEM_PROMPT`: System prompt for the LLM. Default: "You are a retrieval-augmented assistant. Use only the provided context. Cite sources. If unsure, say you don't know."
- `RAG_MMR_DIVERSITY_THRESHOLD`: MMR diversity threshold. Range: 0.5-0.9. Default: `0.7`
- `CONVERSATION_MOOD`: Enable/disable conversation tracking. Options: `true`, `false`. Default: `false`
- `NUMBER_OF_CONVERSATIONS_TO_STORE`: Maximum conversations to store. Range: 5-20. Default: `10`
- `CONVERSATIONS`: Conversation storage array. Default: `[]`
🔐 Security Features:
- Automatic Git Protection: `secret.json` is automatically excluded from version control
- Clean Separation: `keys.json` contains only non-sensitive configuration settings
- Dual Configuration: System loads from both files with proper priority
- Credential History Cleaned: Previous commits with exposed credentials have been removed from git history

Verify your setup:
- `secret.json` exists locally with your real API keys
- `secret.json` is ignored by git (`git check-ignore secret.json` should return the file)
- `keys.json` contains only non-sensitive configuration settings
- Never commit `secret.json` to any repository
- Use environment variables in production deployments
# Start the FastAPI server
uvicorn main:app --reload --host 0.0.0.0 --port 8000

Ingest all knowledge files from the Knowledge folder.
- Process: Reads .md files → Chunks content → Creates embeddings → Stores in Qdrant
- Response: Ingestion statistics and results
Get current knowledge ingestion status.
- Response: File counts, collection info, configuration details
Clear all data from the knowledge base.
- Response: Clear operation results
Process natural language queries using RAG.
- Request: `{"query": "Your question here"}`
- Response: AI-generated answer with sources and metadata
Validate RAG pipeline components.
- Response: Component status, errors, and warnings
Get RAG pipeline configuration information.
- Response: Current settings and model information
Get conversation tracking status and statistics.
- Response: Conversation enabled status, total conversations, storage limits
Retrieve all stored conversation history.
- Response: Complete conversation history with summaries and metadata
Clear all stored conversation memory.
- Response: Confirmation of conversation memory clearing
Get current conversation configuration settings.
- Response: Conversation mood, storage limits, and configuration details
Direct LLM processing without knowledge retrieval.
- Request: `{"input": "Your text here"}`
- Response: `{"output": "LLM response"}`
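For scripted testing, the same requests can be issued from Python instead of curl. A minimal standard-library sketch; the base URL and a running server are assumptions taken from this README:

```python
# Hedged sketch of calling the POST endpoints (/rag, /llm) from Python.
# Assumes the FastAPI server from this README is running on localhost:8000.
import json
from urllib import request

BASE_URL = "http://localhost:8000"


def build_request(path: str, payload: dict) -> request.Request:
    # Both /rag and /llm accept a small JSON body, as documented above.
    return request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request("/rag", {"query": "What are the delivery guidelines?"})
    with request.urlopen(req) as resp:  # requires the API to be running
        print(json.loads(resp.read())["answer"])
```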
This repository implements a dual-configuration system to protect your API credentials:
- `secret.json` 🚫 NEVER COMMITTED: Contains your actual API keys
- `keys.json` ✅ SAFE FOR GIT: Contains only placeholder values
- `secret.json.template` 📋 SETUP GUIDE: Template for creating secret.json
- Automatic Git Protection: `secret.json` is automatically excluded from version control
- History Cleanup: Previous commits with exposed credentials have been completely removed
- Placeholder Safety: Public repository contains only safe placeholder values
- Dual Loading: System loads from both files with proper priority
- Copy Template: `cp secret.json.template secret.json`
- Add Credentials: Edit `secret.json` with your actual API keys
- Verify Safety: Ensure `keys.json` contains only placeholders
- Test Configuration: Run the configuration test to verify setup
- `secret.json` exists locally with real credentials
- `secret.json` is ignored by git (`git check-ignore secret.json`)
- `keys.json` contains only placeholder values
- Never commit `secret.json` to any repository
- Use environment variables in production
If you accidentally commit secret.json:
- Immediate Action: Remove the file from git tracking
- History Cleanup: Use `git filter-branch` to remove it from history
- Force Push: Update the remote repository to remove exposed credentials
- Rotate Keys: Generate new API keys from your service providers
- Verify Cleanup: Ensure no sensitive data remains in git history
- `ADA_002`: 1536 dimensions, cost-effective
- `EMB_3_SMALL`: 1536 dimensions, balanced performance
- `EMB_3_LARGE`: 3072 dimensions, highest quality
- `FIXED`: Fixed character count chunks
- `OVERLAP`: Overlapping chunks for context preservation
- `SENTENCE`: Sentence-based chunking
- `PARAGRAPH`: Paragraph-based chunking
- `SEMANTIC`: Semantic boundary detection (placeholder)
- `HYBRID`: Heading-based with token windows
- `DOC_STRUCTURE`: Markdown-aware structure preservation
- `SLIDING_WINDOW`: Token-based sliding window
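To make the difference between the first two strategies concrete, here is a toy character-based sketch; the real `Chunker.py` is richer (token-aware, configurable sizes), so treat this as illustration only:

```python
# Toy illustration of the FIXED and OVERLAP strategies. Character-based;
# assumes overlap < size.
def fixed_chunks(text: str, size: int) -> list:
    # Non-overlapping, fixed character count.
    return [text[i:i + size] for i in range(0, len(text), size)]


def overlap_chunks(text: str, size: int, overlap: int) -> list:
    # Each chunk shares `overlap` characters with its predecessor, so
    # context that spans a chunk boundary is preserved.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```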
- `RAG_TOP_K`: Initial chunks to retrieve (default: 10)
- `RAG_FINAL_CHUNKS`: Chunks after re-ranking (default: 5)
- `RAG_CONTEXT_MAX_TOKENS`: Maximum context size (default: 1500)
- `RAG_RERANK_METHOD`: MMR, CROSS_ENCODER, or LLM_RERANK
- `RAG_MMR_DIVERSITY_THRESHOLD`: Diversity vs. relevance balance (0.5-0.9)
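The MMR re-ranking step these parameters control can be sketched as a simplified pure-Python routine (the production implementation and its exact scoring may differ):

```python
# Simplified Maximal Marginal Relevance (MMR) sketch: balances relevance
# to the query against redundancy with already-selected chunks.
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, chunk_vecs, k, diversity=0.7):
    """Return indices of k chunks chosen by MMR.

    `diversity` plays the role of RAG_MMR_DIVERSITY_THRESHOLD here.
    """
    selected = []
    candidates = list(range(len(chunk_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, chunk_vecs[i])
            redundancy = max(
                (cosine(chunk_vecs[i], chunk_vecs[j]) for j in selected),
                default=0.0,
            )
            return diversity * relevance - (1 - diversity) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

A near-duplicate of an already-selected chunk scores poorly on the redundancy term, so diverse chunks win the remaining slots.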
Test Sentence-based Chunking:
// keys.json
{
"CHUNK_STRATEGY": "SENTENCE",
"EMBED_MODEL": "EMB_3_LARGE",
"LLM_MODEL": "gpt-4o"
}

Test Semantic Chunking:
// keys.json
{
"CHUNK_STRATEGY": "SEMANTIC",
"EMBED_MODEL": "EMB_3_LARGE",
"LLM_MODEL": "gpt-4o"
}

Test Hybrid Chunking:
// keys.json
{
"CHUNK_STRATEGY": "HYBRID",
"EMBED_MODEL": "EMB_3_LARGE",
"LLM_MODEL": "gpt-4o"
}

Test ADA-002 (Cost-effective):
// keys.json
{
"EMBED_MODEL": "ADA_002",
"CHUNK_STRATEGY": "SENTENCE"
}

Test text-embedding-3-small (Balanced):
// keys.json
{
"EMBED_MODEL": "EMB_3_SMALL",
"CHUNK_STRATEGY": "SENTENCE"
}

Test text-embedding-3-large (Highest Quality):
// keys.json
{
"EMBED_MODEL": "EMB_3_LARGE",
"CHUNK_STRATEGY": "SENTENCE"
}

Test GPT-4o (Latest):
// keys.json
{
"LLM_MODEL": "gpt-4o",
"LLM_MAX_TOKENS": 1000,
"LLM_TEMPERATURE": 0.7
}

Test GPT-4 (Stable):
// keys.json
{
"LLM_MODEL": "gpt-4",
"LLM_MAX_TOKENS": 1000,
"LLM_TEMPERATURE": 0.7
}

Test GPT-3.5-turbo (Fast):
// keys.json
{
"LLM_MODEL": "gpt-3.5-turbo",
"LLM_MAX_TOKENS": 1000,
"LLM_TEMPERATURE": 0.7
}

Test GPT-4o-mini (Balanced):
// keys.json
{
"LLM_MODEL": "gpt-4o-mini",
"LLM_MAX_TOKENS": 1500,
"LLM_TEMPERATURE": 0.5
}

High Precision Configuration:
// keys.json
{
"RAG_TOP_K": 20,
"RAG_FINAL_CHUNKS": 7,
"RAG_CONTEXT_MAX_TOKENS": 2000,
"RAG_RERANK_METHOD": "MMR",
"RAG_MMR_DIVERSITY_THRESHOLD": 0.8
}

Enable Conversation Tracking:
// keys.json
{
"CONVERSATION_MOOD": true,
"NUMBER_OF_CONVERSATIONS_TO_STORE": 10
}

Disable Conversation Tracking (Default):
// keys.json
{
"CONVERSATION_MOOD": false,
"NUMBER_OF_CONVERSATIONS_TO_STORE": 10
}

Test Conversation Memory Limits:
// keys.json
{
"CONVERSATION_MOOD": true,
"NUMBER_OF_CONVERSATIONS_TO_STORE": 5
}

Fast Response Configuration:
// keys.json
{
"RAG_TOP_K": 5,
"RAG_FINAL_CHUNKS": 3,
"RAG_CONTEXT_MAX_TOKENS": 1000,
"RAG_RERANK_METHOD": "MMR",
"RAG_MMR_DIVERSITY_THRESHOLD": 0.6
}

High Quality Configuration:
// keys.json
{
"RAG_TOP_K": 20,
"RAG_FINAL_CHUNKS": 7,
"RAG_CONTEXT_MAX_TOKENS": 2000,
"RAG_RERANK_METHOD": "LLM_RERANK",
"RAG_MMR_DIVERSITY_THRESHOLD": 0.8
}

Balanced Configuration:
// keys.json
{
"RAG_TOP_K": 10,
"RAG_FINAL_CHUNKS": 5,
"RAG_CONTEXT_MAX_TOKENS": 1500,
"RAG_RERANK_METHOD": "CROSS_ENCODER",
"RAG_MMR_DIVERSITY_THRESHOLD": 0.7
}

Test All Chunking Strategies:
// Test each strategy by changing CHUNK_STRATEGY
{
"CHUNK_STRATEGY": "FIXED", // Fixed-size chunks
"CHUNK_STRATEGY": "OVERLAP", // Overlapping chunks
"CHUNK_STRATEGY": "SENTENCE", // Sentence-based chunks
"CHUNK_STRATEGY": "PARAGRAPH", // Paragraph-based chunks
"CHUNK_STRATEGY": "SEMANTIC", // Semantic chunks (default)
"CHUNK_STRATEGY": "HYBRID", // Hybrid approach
"CHUNK_STRATEGY": "DOC_STRUCTURE", // Document structure aware
"CHUNK_STRATEGY": "SLIDING_WINDOW" // Sliding window chunks
}

Test All Embedding Models:
// Test each model by changing EMBED_MODEL
{
"EMBED_MODEL": "ADA_002", // Cost-effective
"EMBED_MODEL": "EMB_3_SMALL", // Balanced
"EMBED_MODEL": "EMB_3_LARGE" // Highest quality (default)
}

Test All Re-ranking Methods:
// Test each method by changing RAG_RERANK_METHOD
{
"RAG_RERANK_METHOD": "MMR", // Maximal Marginal Relevance (default)
"RAG_RERANK_METHOD": "CROSS_ENCODER", // Cross-encoder re-ranking
"RAG_RERANK_METHOD": "LLM_RERANK" // LLM-based re-ranking
}

Test System Prompt Customization:
// Customize the system prompt
{
"RAG_SYSTEM_PROMPT": "You are an expert assistant. Provide detailed, accurate answers based on the context. Always cite your sources."
}

Step 1: Setup Configuration A
# Update keys.json with Configuration A
# Test and record results
curl -X POST "http://localhost:8000/rag" \
-H "Content-Type: application/json" \
-d '{"query": "Your test query here"}'

Step 2: Switch to Configuration B
# Update keys.json with Configuration B
# Test same query and compare results
curl -X POST "http://localhost:8000/rag" \
-H "Content-Type: application/json" \
-d '{"query": "Your test query here"}'

Step 3: Compare Results
- Compare response quality
- Measure latency differences
- Analyze source relevance
- Evaluate answer accuracy
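A tiny harness makes the latency comparison repeatable. `run_query` here is any callable that issues the request (e.g. a urllib call to `/rag`), so the sketch itself has no network dependency; run it once per configuration and compare the means:

```python
# Minimal A/B timing harness for the workflow above. Illustrative only;
# quality scoring is left to you.
import statistics
import time


def benchmark(run_query, queries, repeats=3):
    """Return mean latency in milliseconds across all queries and repeats."""
    latencies = []
    for _ in range(repeats):
        for q in queries:
            start = time.perf_counter()
            run_query(q)
            latencies.append((time.perf_counter() - start) * 1000)
    return statistics.mean(latencies)
```

Switch `keys.json` to Configuration B between runs; since retrieval settings change the amount of work per query, even this crude timing usually separates the two configurations clearly.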
curl -X GET "http://localhost:8000/FeedKnowledge"

curl -X POST "http://localhost:8000/rag" \
-H "Content-Type: application/json" \
-d '{"query": "What are the delivery guidelines?"}'

curl -X GET "http://localhost:8000/rag/validate"

curl -X GET "http://localhost:8000/rag/info"

Test One Variable at a Time:
- Keep all other settings constant when testing a specific component
- Change only the variable you want to evaluate
- Record results for each configuration
Use Consistent Test Queries:
- Create a standardized set of test queries
- Include different types: factual, analytical, comparative
- Test with various query lengths and complexities
Measure Multiple Metrics:
- Latency: Response time for each configuration
- Quality: Relevance and accuracy of answers
- Retrieval: Number and quality of source documents
- Cost: API usage and computational resources
Baseline Establishment:
// Start with a baseline configuration
{
"CHUNK_STRATEGY": "SENTENCE",
"EMBED_MODEL": "EMB_3_LARGE",
"LLM_MODEL": "gpt-4o",
"RAG_TOP_K": 10,
"RAG_FINAL_CHUNKS": 5
}

Incremental Testing:
- Test different chunking strategies with same embedding model
- Test different embedding models with same chunking strategy
- Test different LLM models with same retrieval configuration
- Test different RAG parameters with same models
Performance Benchmarking:
- Run multiple queries with each configuration
- Calculate average latency and quality scores
- Compare results across different configurations
- Document findings and recommendations
Knowledge Base Variety:
- Test with different document types (technical, narrative, structured)
- Use various document lengths and complexities
- Include different domains and subject matters
Query Diversity:
- Factual Queries: "What is the definition of X?"
- Analytical Queries: "Compare and contrast X and Y"
- Procedural Queries: "How do I perform X?"
- Complex Queries: Multi-part questions requiring synthesis
Quantitative Metrics:
- Response time (latency)
- Token usage and costs
- Retrieval precision and recall
- Answer completeness scores
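Retrieval precision and recall can be computed against a hand-labelled set of relevant chunk IDs; the labels themselves are something you supply per test query:

```python
# Sketch of retrieval precision/recall over chunk IDs returned by the
# retriever versus a hand-labelled relevant set.
def precision_recall(retrieved_ids, relevant_ids):
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```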
Qualitative Assessment:
- Answer relevance and accuracy
- Source document quality
- Response coherence and clarity
- Factual correctness
Comparative Analysis:
- Side-by-side configuration comparisons
- Performance trade-offs identification
- Cost-benefit analysis
- Use case optimization recommendations
Load Testing:
- Test with high query volumes
- Measure system performance under load
- Identify bottlenecks and optimization opportunities
Integration Testing:
- Test with real-world data
- Validate end-to-end pipeline performance
- Ensure system reliability and stability
Monitoring Setup:
- Implement performance monitoring
- Set up alerting for system health
- Track usage patterns and optimization opportunities
test-the-rag/
├── main.py # FastAPI application and endpoints
├── Configurator.py # Configuration management
├── Emdbeder.py # Embedding operations
├── Chunker.py # Text chunking strategies
├── KnowledgeIngestion.py # Complete ingestion pipeline
├── KnowledgeProcessor.py # Knowledge file processing
├── QdrantManager.py # Vector database operations
├── RAGPipeline.py # RAG orchestration
├── RAGRetriever.py # Vector search and retrieval
├── RAGProcessor.py # Post-processing and re-ranking
├── llmhandler.py # LLM interaction
├── keys.json # Non-sensitive configuration (safe for git)
├── secret.json # Sensitive credentials (ignored by git)
├── secret.json.template # Template for secret.json setup
├── Knowledge/ # Knowledge files directory
│ └── sample_knowledge.md
├── requirements.txt # Python dependencies
└── README.md # This file
- Single Source of Truth: All embedding operations use `Emdbeder.py`
- Centralized Chunking: All chunking operations use `Chunker.py`
- Configuration-Driven: Change `keys.json` → change behavior everywhere
- Single Responsibility: Each module has one clear purpose
- Dependency Injection: Components receive configuration via constructor
- Error Handling: Comprehensive error messages and graceful fallbacks
- Type Safety: Full type hints and Pydantic models
- Modular Design: Easy to extend with new strategies
- Async Operations: Non-blocking I/O for better performance
- Vector Database: Scalable storage and retrieval
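As an illustration of the configuration-validation idea: the project itself uses Pydantic models, but this stdlib-only dataclass shows the same ranges from keys.json being enforced (field names and bounds taken from the configuration section above):

```python
# Illustrative stdlib-only stand-in for the project's Pydantic models,
# enforcing two of the documented keys.json ranges.
from dataclasses import dataclass


@dataclass
class RagSettings:
    rag_top_k: int = 10
    llm_temperature: float = 0.7

    def __post_init__(self):
        if not 8 <= self.rag_top_k <= 20:
            raise ValueError("RAG_TOP_K must be in 8-20")
        if not 0.0 <= self.llm_temperature <= 2.0:
            raise ValueError("LLM_TEMPERATURE must be in 0.0-2.0")
```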
Visit http://localhost:8000/docs for automatic interactive API documentation with Swagger UI.
- Setup: Follow installation steps above
- Configure: Create `secret.json` with your API credentials (REQUIRED!)
- Ingest: `GET /FeedKnowledge` to process your knowledge files
- Query: `POST /rag` to ask questions about your knowledge base

⚠️ You MUST create secret.json from the template before the system will work!
1. Enable Conversation Mode:
// keys.json
{
"CONVERSATION_MOOD": true,
"NUMBER_OF_CONVERSATIONS_TO_STORE": 10
}

2. Start a Conversation:
# First query
curl -X POST "http://localhost:8000/rag" \
-H "Content-Type: application/json" \
-d '{"query": "What are the delivery management guidelines?"}'

3. Follow-up Questions:
# Second query (benefits from conversation context)
curl -X POST "http://localhost:8000/rag" \
-H "Content-Type: application/json" \
-d '{"query": "How are they organized?"}'
# Third query (maintains conversation continuity)
curl -X POST "http://localhost:8000/rag" \
-H "Content-Type: application/json" \
-d '{"query": "What is the purpose of these guidelines?"}'

4. Check Conversation Status:
# View conversation statistics
curl -X GET "http://localhost:8000/conversation/status"
# View conversation history
curl -X GET "http://localhost:8000/conversation/history"
# Clear conversation memory
curl -X DELETE "http://localhost:8000/conversation/clear"

When conversation mode is enabled, RAG responses include conversation metadata:
{
"answer": "The delivery management guidelines are...",
"sources": [...],
"conversation": {
"enabled": true,
"serial_number": 3,
"current_qa_summary": "Summary of current Q&A exchange...",
"conversation_summary_so_far": "Cumulative conversation summary...",
"total_conversations": 3
},
"used_model": {...},
"latency_ms": 2500
}

Enhanced Context:
- Each query benefits from previous conversation context
- Follow-up questions maintain continuity
- Better understanding of user intent
Automatic Summarization:
- Current Q&A summary for each exchange
- Cumulative conversation summary
- Intelligent memory management
Memory Management:
- Configurable storage limits
- Automatic cleanup of old conversations
- Efficient memory usage
The system includes extensive logging for all operations:
Log Files:
- `rag_system.log`: Complete system logs with timestamps
- Console Output: Real-time logging with emoji indicators
Log Categories:
- 🚀 [API]: API endpoint calls and responses
- 🔍 [RAG_PIPELINE]: RAG pipeline step-by-step processing
- 📚 [INGESTION]: Knowledge ingestion operations
- 💬 [CONVERSATION]: Conversation tracking and management
- 🤖 [LLM]: LLM processing and responses
- ✅ [SUCCESS]: Successful operations
- ❌ [ERROR]: Error conditions and failures
Logging Features:
- Step-by-step tracking: Every RAG pipeline step is logged
- Performance metrics: Timing information for all operations
- Error tracking: Detailed error logging with context
- Conversation tracking: Complete conversation flow logging
- API monitoring: All API calls and responses logged
Example Log Output:
2024-01-15 10:30:15 - main - INFO - 🚀 [API] /rag endpoint called with query length: 45 chars
2024-01-15 10:30:15 - main - INFO - 📝 [API] Query preview: What are the delivery guidelines?
2024-01-15 10:30:15 - RAGPipeline - INFO - 🔍 [RAG_PIPELINE] Step 1: Embedding query...
2024-01-15 10:30:16 - RAGPipeline - INFO - ✅ [RAG_PIPELINE] Step 1 completed - Retrieved 10 chunks in 1250.5ms
2024-01-15 10:30:16 - RAGPipeline - INFO - 🔄 [RAG_PIPELINE] Step 2: Re-ranking 10 chunks...
2024-01-15 10:30:16 - RAGPipeline - INFO - ✅ [RAG_PIPELINE] Step 2 completed - Re-ranked to 5 chunks in 45.2ms
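A logger producing this line format can be configured as in the following sketch (the helper name and handler choice are illustrative, not the repository's actual setup):

```python
# Sketch reproducing the "timestamp - name - level - message" log format
# shown above, using only the standard library.
import logging
import sys


def make_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(
        logging.Formatter(
            "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
            datefmt="%Y-%m-%d %H:%M:%S",
        )
    )
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Usage: `make_logger("RAGPipeline").info("🔍 [RAG_PIPELINE] Step 1: Embedding query...")` emits a line in the format above.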
❌ "Keys file 'keys.json' not found"
- Solution: The `keys.json` file should be in the repository. If it's missing, check your git clone.
❌ "Missing required keys" or "Missing or placeholder values"
- Solution: You need to create `secret.json` from the template: `cp secret.json.template secret.json`, then edit `secret.json` with your actual API keys.
❌ "Configuration loaded successfully" but system fails
- Solution: Check that `secret.json` contains real API keys, not placeholder values.
❌ System works locally but fails in production
- Solution: Ensure `secret.json` exists in your production environment and contains valid credentials.
- Embedding: ~100ms per chunk (varies by model)
- RAG Query: ~2-5 seconds end-to-end
- Knowledge Ingestion: ~1-2 seconds per file
- Vector Search: <100ms for similarity search
- All secrets stored in `secret.json` (gitignored)
- Environment variable overrides supported
- No hardcoded credentials in code
- Secure API key management
- Follow the established architecture patterns
- Maintain high reusability principles
- Add comprehensive docstrings
- Update configuration in `keys.json`
- Test all components thoroughly
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2024 WhatTheBot RAG System
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Built with ❤️ using FastAPI, Azure OpenAI, and Qdrant
Test the RAG is specifically designed as a comprehensive testing platform for Retrieval-Augmented Generation systems. It provides researchers, developers, and AI practitioners with a unified environment to:
- Test Multiple Configurations: Experiment with different chunking strategies, embedding models, and LLM combinations
- Compare Performance: Benchmark different RAG approaches side-by-side
- Optimize Systems: Find the best configuration for specific use cases
- Validate Improvements: Test RAG enhancements before production deployment
Single Repository Testing:
- Test all RAG configurations from one codebase
- No need to maintain multiple testing environments
- Consistent testing methodology across all experiments
Configuration-Driven:
- Change behavior by modifying `keys.json`
- No code changes required for different configurations
- Rapid prototyping and experimentation
Production-Ready:
- Real Azure OpenAI integration
- Actual Qdrant vector database
- Professional monitoring and validation endpoints
Research-Friendly:
- Comprehensive documentation and examples
- Built-in performance metrics and monitoring
- Easy A/B testing and comparison workflows
This RAG system is a solid foundation for advanced retrieval-augmented generation applications, with several key areas identified for future enhancement:

- Architecture: Dependency injection patterns, service containers, and event-driven architecture to improve modularity and testability
- Code Quality: Comprehensive type safety with Pydantic models, advanced configuration validation, and extraction of magic numbers into well-defined constants
- Performance: Connection pooling, caching mechanisms, batch processing for embeddings, and async/await patterns throughout the pipeline
- Resilience: Custom exception hierarchies, circuit breaker patterns, retry mechanisms with exponential backoff, and comprehensive structured logging
- Testing: Unit tests, integration tests, performance benchmarks, and automated testing pipelines
- Security: Input sanitization, rate limiting, API key rotation, and audit logging
- Documentation: API documentation, architecture diagrams, deployment guides, and developer onboarding materials

Together these improvements will take the system from a functional RAG platform to a production-ready, enterprise-grade solution that can scale to high-volume workloads while maintaining reliability, security, and performance standards.