```bash
pip install -r requirements.txt
```
- Download and install LMStudio
- Load a model of your choice
- Start the local server (default: http://localhost:1234/v1)
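Once the server is up, any OpenAI-compatible client can talk to it. A minimal sketch using only the standard library (the `build_chat_payload` helper and the `"local-model"` name are illustrative; LMStudio serves whichever model is currently loaded):

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # LMStudio's default server address

def build_chat_payload(question: str, context: str) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": "local-model",  # LMStudio uses the loaded model regardless of name
        "messages": [
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": question},
        ],
        "temperature": 0.2,
    }

def ask(question: str, context: str) -> str:
    """POST to the OpenAI-compatible chat completions endpoint."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_payload(question, context)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Calling `ask()` requires the server to be running with a model loaded; the payload builder works standalone.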
```bash
python verify_tests.py
```

```bash
python interactive_rag.py
```

```bash
# Test that everything is set up correctly
python test_basic.py
```
- 🔍 ModernBERT Embeddings: High-quality semantic embeddings for document search
- 🎯 PyLate/ColBERT Reranking: Advanced reranking for improved relevance
- 📚 Multi-format Support: Processes PDF and Markdown files from ./data directory
- 🤖 LMStudio Integration: Local LLM inference for privacy
- 💬 Interactive Chat: Real-time question answering
- 📊 Smart Chunking: Intelligent text splitting with boundary detection
- 🔄 MD5 File Tracking: Incremental updates based on file changes
- 📈 Token Analysis: Comprehensive token counting and optimization
Once the main system is running (`python interactive_rag.py`), you can use:
- Ask questions: Simply type your question about the documents
- `index` - Check for document changes and update the index incrementally
- `force-reindex`/`reindex` - Completely rebuild the entire index
- `status` - Check system status (ChromaDB, LMStudio, reranker, files)
- `files` - Show tracked files and their MD5 status
- `quit`/`exit`/`q` - Exit the system
The status command shows:
- ChromaDB collection document count
- LMStudio connection status and model name
- PyLate/ColBERT reranker status
- File tracking statistics
The files command displays:
- All tracked files with their relative paths
- MD5 checksums for change detection
- Last updated timestamps
The configuration utility helps manage chunk sizes and reindexing:
```bash
# Show current configuration
python config_util.py config

# Check if reindexing is needed after config changes
python config_util.py check

# Analyze collection chunks and tokens in detail
python config_util.py scan

# Force complete reindex with current configuration
python config_util.py reindex

# Show help
python config_util.py help
```
- `config`: Displays chunk size, overlap, retrieval settings, and token estimates
- `check`: Compares the current index with the configuration settings
- `scan`: Detailed analysis of chunk sizes, token counts, and efficiency
- `reindex`: Complete rebuild when chunk size or other core settings change
- After changing `CHUNK_SIZE` or `CHUNK_OVERLAP` in config.py
- When switching embedding models
- To analyze current chunking efficiency
- Before/after major configuration changes
Provides detailed chunk and token analysis:
```bash
# Run comprehensive analysis
python enhanced_scan.py
```
- Collection Overview: Total chunks and document count
- Chunk Size Statistics: Average, median, min/max chunk sizes
- Target Achievement: How well chunks match configured size
- Token Analysis: Estimated tokens per chunk and total context
- Size Distribution: Visual breakdown of chunk size ranges
- Per-Document Analysis: Detailed statistics for each source file
- Document Distribution: Analysis by chunk count per document
- Optimization Recommendations: Suggestions for improving chunking
- Target Achievement: Percentage of configured chunk size actually achieved
- Context Quality: Assessment based on estimated tokens with default results
- Token Distribution: Visual bars showing chunk size patterns
- Per-Document Stats: Individual file analysis with chunk counts and sizes
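The size statistics above can be approximated from chunk lengths alone. A sketch (the `chunk_stats` helper is illustrative, not the tool's actual API, and the 4-characters-per-token ratio is a rough heuristic rather than the real tokenizer):

```python
import statistics

def chunk_stats(chunks, target_size=8000, chars_per_token=4):
    """Summary statistics in the spirit of enhanced_scan.py's report."""
    sizes = [len(c) for c in chunks]
    return {
        "count": len(sizes),
        "avg": statistics.mean(sizes),
        "median": statistics.median(sizes),
        "min": min(sizes),
        "max": max(sizes),
        # How close the average chunk comes to the configured target
        "target_achievement_pct": 100 * statistics.mean(sizes) / target_size,
        # Rough token estimate: ~4 characters per token for English text
        "est_total_tokens": sum(sizes) // chars_per_token,
    }
```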
Key settings you can modify:
```python
CHUNK_SIZE = 8000             # Target characters per chunk
CHUNK_OVERLAP = 400           # Character overlap between chunks
DEFAULT_N_RESULTS = 8         # Documents returned by default
INITIAL_RETRIEVAL_COUNT = 20  # Documents fetched before reranking
EMBEDDING_MODEL = "answerdotai/ModernBERT-base"
COLBERT_MODEL = "colbert-ir/colbertv2.0"
```
- Changing `CHUNK_SIZE` or `CHUNK_OVERLAP` requires reindexing
- Use `python config_util.py check` to verify if reindexing is needed
- Larger chunks = more context but fewer specific matches
- More overlap = better boundary handling but more storage
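One way to see the storage trade-off: each chunk advances by `CHUNK_SIZE - CHUNK_OVERLAP` characters, so the chunk count for a document can be estimated up front (illustrative sketch with fixed-size windows; the real boundary-aware chunking will vary):

```python
import math

def estimate_chunk_count(n_chars, chunk_size=8000, overlap=400):
    """Rough chunk count for a document of n_chars characters."""
    stride = chunk_size - overlap  # effective advance per chunk
    return max(1, math.ceil((n_chars - overlap) / stride))
```

For example, doubling `CHUNK_OVERLAP` shrinks the stride and inflates both chunk count and storage for the same corpus.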
```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Add documents to data/ folder
cp your_documents.pdf data/

# 3. Check configuration
python config_util.py config

# 4. Start the system
python interactive_rag.py
```

```bash
# 1. Edit config.py with new settings

# 2. Check if reindexing is needed
python config_util.py check

# 3. If needed, reindex
python config_util.py reindex

# 4. Analyze results
python enhanced_scan.py
```

```bash
# 1. Analyze current chunking efficiency
python enhanced_scan.py

# 2. Check specific collection stats
python config_util.py scan

# 3. Optimize settings based on recommendations
# 4. Test with different queries
```

```bash
# 1. Start system (checks for file changes automatically)
python interactive_rag.py

# 2. Ask questions interactively
# 3. Use 'status' to check system health
# 4. Use 'index' if you've added new documents
```

Example session:

```
💭 Your question: What are the main topics in the AI agent research papers?
💭 Your question: How do multi-agent systems handle failures?
💭 Your question: What safety considerations are mentioned for LLM agents?
💭 Your question: Compare the different approaches to agent architectures
💭 Your question: status
💭 Your question: files
💭 Your question: index
💭 Your question: force-reindex
```
- Automatically reranks initial retrieval results for better relevance
- Fetches `INITIAL_RETRIEVAL_COUNT` documents, reranks to the top `DEFAULT_N_RESULTS`
- Falls back to similarity-based ranking if the reranker is unavailable
- Shows both similarity and reranking scores in results
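The retrieve-then-rerank flow with its fallback can be sketched as follows (`search_fn` and `rerank_fn` are placeholders for the vector search and the PyLate/ColBERT scorer, not the project's actual function names):

```python
def retrieve_and_rerank(query, search_fn, rerank_fn,
                        initial_count=20, n_results=8):
    """Fetch a broad candidate set, then rerank down to the final results."""
    # Over-fetch: similarity search returns initial_count candidates
    candidates = search_fn(query, initial_count)
    try:
        # rerank_fn returns [(doc, rerank_score), ...]
        scored = sorted(rerank_fn(query, candidates),
                        key=lambda pair: pair[1], reverse=True)
        return [doc for doc, _ in scored[:n_results]]
    except Exception:
        # Reranker unavailable: keep the original similarity order
        return candidates[:n_results]
```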
- MD5-based change detection for all documents
- Only reprocesses changed files
- Tracks files across subdirectories
- Automatic cleanup of deleted files
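MD5-based change detection reduces to comparing checksums against a stored map. A minimal sketch (the `changed_files` helper and its `{path: md5}` map are illustrative, not the project's exact data layout):

```python
import hashlib
from pathlib import Path

def file_md5(path: Path) -> str:
    """MD5 checksum of a file's bytes, used to detect content changes."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(8192), b""):
            h.update(block)
    return h.hexdigest()

def changed_files(data_dir: str, tracked: dict) -> list:
    """Return PDF/Markdown files that are new or modified vs the tracked map.

    Deleted files are the tracked paths that no longer exist on disk.
    """
    changes = []
    for p in Path(data_dir).rglob("*"):  # recursive: covers subdirectories
        if p.suffix.lower() in {".pdf", ".md"}:
            if tracked.get(str(p)) != file_md5(p):
                changes.append(str(p))
    return changes
```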
- Comprehensive token counting and analysis
- Context quality assessment
- Performance recommendations
- Memory usage optimization
- Respects paragraph, sentence, and word boundaries
- Adaptive chunk sizing (95%-115% of target)
- Configurable overlap for context preservation
- Quality scoring for boundary detection
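A simplified sketch of boundary-aware splitting with overlap (illustrative only; the real implementation also does adaptive sizing and boundary quality scoring):

```python
def chunk_text(text, chunk_size=8000, overlap=400):
    """Split text into ~chunk_size pieces, preferring paragraph,
    then sentence, then word boundaries near the target size."""
    chunks, start = [], 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Search the last 15% of the window for the best boundary
            window_start = start + int(chunk_size * 0.85)
            window = text[window_start:end]
            for sep in ("\n\n", ". ", " "):  # paragraph > sentence > word
                cut = window.rfind(sep)
                if cut != -1:
                    end = window_start + cut + len(sep)
                    break
        chunks.append(text[start:end])
        if end >= len(text):
            break
        start = end - overlap  # overlap preserves context across chunks
    return chunks
```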
- Ensure LMStudio is running with a model loaded
- Check the server URL in LMStudio settings (default: http://localhost:1234/v1)
- Verify the model is actively loaded (not just downloaded)
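A quick way to check the connection from Python: the OpenAI-compatible server exposes a `/v1/models` endpoint listing what is loaded (the `lmstudio_status` helper is illustrative):

```python
import json
import urllib.request

def lmstudio_status(base_url="http://localhost:1234/v1"):
    """Return the ids of models the server reports, or [] if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
            return [m["id"] for m in json.load(resp).get("data", [])]
    except OSError:
        return []  # server not running or not reachable
```

An empty list with LMStudio open usually means the server isn't started or no model is actively loaded.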
- Place PDF or Markdown files in the `./data/` directory
- Run the `index` command to reprocess without restarting
- Check file permissions and formats
- System automatically falls back to similarity ranking
- Check the PyLate installation: `pip install pylate`
- GPU memory issues: the system will fall back to CPU
- Use `enhanced_scan.py` to analyze chunking efficiency
- Consider adjusting `CHUNK_SIZE` based on recommendations
- Monitor context token usage with the `status` command
```bash
pip install -r requirements.txt --upgrade
```

```bash
# Check if reindexing needed
python config_util.py check

# Force reindex if configuration changed
python config_util.py reindex
```

The project includes comprehensive unit and integration tests to ensure reliability and correctness.
Before running the full test suite, verify that the basic test infrastructure works:
```bash
python verify_tests.py
```
This runs a lightweight verification to ensure the testing environment is properly set up, without importing heavy ML models.
The project includes a test runner script that supports multiple testing modes:
```bash
# Run all tests (unit + integration)
python run_tests.py all

# Run only unit tests (faster, no ML model loading)
python run_tests.py unit

# Run only integration tests
python run_tests.py integration

# Run tests for a specific class
python run_tests.py class TestTokenCounter

# Generate test coverage report
python run_tests.py coverage

# List available test classes and methods
python run_tests.py list
```
Tests core functionality without heavy dependencies:
- TokenCounter: Token counting and formatting
- FileTracker: MD5-based file change detection
- DocumentProcessor: Text extraction and chunking
- BERTEmbeddingFunction: Embedding function behavior (mocked)
- Utility Functions: GPU memory management
- Configuration: Config integration and validation
- Error Handling: Graceful error handling scenarios
Tests component interactions with mocks:
- PyLateReranker: Reranking functionality
- InteractiveRAGChain: End-to-end RAG workflow
- Document Processing: Full document processing pipeline
- LLM Integration: Language model interaction (mocked)
- Error Recovery: Fallback mechanisms
Lightweight tests that run without importing heavy modules:
- Python environment validation
- Basic file operations
- Configuration loading
Generate detailed test coverage reports:
```bash
python run_tests.py coverage
```
This creates an HTML coverage report in the `htmlcov/` directory that you can open in a browser to see detailed line-by-line coverage.
You can also run specific tests using Python's unittest module:
```bash
# Run a specific test class
python -m unittest test_interactive_rag.TestTokenCounter -v

# Run a specific test method
python -m unittest test_interactive_rag.TestTokenCounter.test_count_tokens_simple -v

# Run all unit tests with verbose output
python -m unittest test_interactive_rag -v
```
The test suite uses Python's built-in unittest framework and includes:
- Mock objects: For isolating components and avoiding heavy ML model loading
- Temporary directories: For safe file system testing
- Coverage reporting: Via the `coverage` package
If you encounter import errors when running tests:
```bash
pip install -r requirements.txt
```
The main `interactive_rag.py` module loads heavy ML models. Use these approaches for faster testing:
- Start with verification: `python verify_tests.py`
- Run unit tests only: `python run_tests.py unit`
- Use the test runner: `python run_tests.py` (optimized loading)
Ensure you're in the correct directory and virtual environment:
```bash
cd /path/to/MyRAG
source .venv/bin/activate  # or activate your virtual environment
python verify_tests.py
```

```
MyRAG/
├── interactive_rag.py       # Main interactive RAG system
├── config.py                # Central configuration
├── config_util.py           # Configuration management utility
├── enhanced_scan.py         # Detailed analysis tool
├── requirements.txt         # Python dependencies
├── test_basic.py            # Basic infrastructure tests
├── test_interactive_rag.py  # Unit tests
├── test_integration.py      # Integration tests
├── run_tests.py             # Test runner script
├── verify_tests.py          # Test verification utility
├── README.md                # This documentation
├── SETUP.md                 # Detailed setup guide
├── .gitignore               # Version control exclusions
├── data/                    # Your documents (PDF, MD)
│   ├── *.pdf
│   ├── *.md
│   └── subdirectories/      # Recursive scanning supported
├── chroma_db/               # Vector database (auto-generated)
└── htmlcov/                 # Test coverage reports (generated)
```
Set these before running for memory optimization:
```bash
export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.8
export PYTORCH_MPS_LOW_WATERMARK_RATIO=0.1
```
- Embedding Model: ModernBERT for high-quality embeddings
- Reranking Model: ColBERT v2.0 for relevance improvement
- LLM: Any model supported by LMStudio
- Adjust the `INITIAL_RETRIEVAL_COUNT` vs `DEFAULT_N_RESULTS` ratio
- Balance `CHUNK_SIZE` vs retrieval specificity
- Monitor token usage with the analysis tools
The project includes comprehensive unit and integration tests to ensure reliability and help with development.
```bash
# Verify testing infrastructure works
python test_basic.py
```
The test suite is designed with isolated, focused tests:
- `test_basic.py` - Infrastructure verification (always run this first)
- `test_interactive_rag.py` - Unit tests for core components
- `test_integration.py` - Integration tests for full workflows
- `run_tests.py` - Test runner with multiple options
Quick verification that the environment is set up correctly:
```bash
python test_basic.py
```
Test individual components in isolation:
- TestTokenCounter: Token counting and formatting
- TestFileTracker: MD5-based file change detection
- TestDocumentProcessor: Document parsing and chunking
- TestBERTEmbeddingFunction: Embedding generation (mocked)
- TestUtilityFunctions: Helper functions and error handling
Test full system integration with mocked external dependencies:
- TestPyLateReranker: Reranking functionality and fallbacks
- TestInteractiveRAGChainIntegration: End-to-end RAG workflow
- TestErrorHandlingIntegration: Error scenarios and recovery
```bash
# Show available commands
python run_tests.py

# Run all tests (requires proper mocking setup)
python run_tests.py all

# Run only unit tests
python run_tests.py unit

# Run specific test class
python run_tests.py class TestTokenCounter

# List available test classes
python run_tests.py list
```

```bash
# Run basic infrastructure tests (recommended first step)
python test_basic.py

# Run individual test files (may require environment setup)
python -m unittest test_interactive_rag.TestTokenCounter -v
python -m unittest test_integration.TestPyLateReranker -v
```

```bash
# Install coverage tool
pip install coverage

# Run with coverage
python run_tests.py coverage

# View report in htmlcov/index.html
```
- Mocking: External dependencies (ChromaDB, OpenAI, PyLate) are mocked for isolation
- Temporary Files: Tests use temporary directories to avoid conflicts
- Error Simulation: Tests verify graceful error handling
- No External Dependencies: Tests run without requiring LMStudio, ChromaDB, or model downloads
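The isolation pattern above can be sketched as a self-contained test: each test case creates its own temporary directory so runs never touch the real `data/` or `chroma_db/` folders (the test class and file name below are hypothetical, not part of the actual suite):

```python
import tempfile
import unittest
from pathlib import Path

class TestFileTrackingIsolated(unittest.TestCase):
    """Illustrative test: file scanning exercised against a throwaway dir."""

    def test_detects_new_markdown_file(self):
        with tempfile.TemporaryDirectory() as tmp:
            doc = Path(tmp) / "note.md"
            doc.write_text("# hello")
            # Recursive scan, mirroring how the tracker walks ./data
            found = sorted(p.name for p in Path(tmp).rglob("*.md"))
            self.assertEqual(found, ["note.md"])
```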
- Start with basics: `python test_basic.py`
- Test components: Run specific unit tests for the code you're working on
- Integration testing: Use integration tests for workflow verification
- Full suite: Run complete test suite before commits
- Import Errors: Run `test_basic.py` first to verify the environment
- Hanging Tests: Some tests may hang if they try to load actual models; ensure mocking is working
- Missing Dependencies: Install test dependencies with `pip install coverage` (optional)
- Isolation: Each test is independent and can run alone
- Mocking: External services are mocked to ensure fast, reliable tests
- Clear Assertions: Tests have clear success/failure criteria
- Documentation: Each test class and method is documented