networkdowntime/personal_rag

Interactive RAG Chain Setup Guide

Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Start LMStudio

  1. Download and install LMStudio
  2. Load a model of your choice
  3. Start the local server (default: http://localhost:1234/v1)

3. Quick Test Verification

python verify_tests.py

4. Run the Interactive RAG System

python interactive_rag.py

5. (Optional) Verify Installation

# Test that everything is set up correctly
python test_basic.py

Features

  • 🔍 ModernBERT Embeddings: High-quality semantic embeddings for document search
  • 🎯 PyLate/ColBERT Reranking: Advanced reranking for improved relevance
  • 📚 Multi-format Support: Processes PDF and Markdown files from ./data directory
  • 🤖 LMStudio Integration: Local LLM inference for privacy
  • 💬 Interactive Chat: Real-time question answering
  • 📊 Smart Chunking: Intelligent text splitting with boundary detection
  • 🔄 MD5 File Tracking: Incremental updates based on file changes
  • 📈 Token Analysis: Comprehensive token counting and optimization

Main Commands (interactive_rag.py)

Interactive Session Commands

Once the main system is running (python interactive_rag.py), you can use:

  • Ask questions: Simply type your question about the documents
  • index - Check for document changes and update index incrementally
  • force-reindex / reindex - Completely rebuild the entire index
  • status - Check system status (ChromaDB, LMStudio, reranker, files)
  • files - Show tracked files and their MD5 status
  • quit / exit / q - Exit the system

System Status Information

The status command shows:

  • ChromaDB collection document count
  • LMStudio connection status and model name
  • PyLate/ColBERT reranker status
  • File tracking statistics

File Management

The files command displays:

  • All tracked files with their relative paths
  • MD5 checksums for change detection
  • Last updated timestamps

Configuration Management (config_util.py)

The configuration utility helps manage chunk sizes and reindexing:

Commands

# Show current configuration
python config_util.py config

# Check if reindexing is needed after config changes
python config_util.py check

# Analyze collection chunks and tokens in detail
python config_util.py scan

# Force complete reindex with current configuration
python config_util.py reindex

# Show help
python config_util.py help

Configuration Analysis

  • config: Displays chunk size, overlap, retrieval settings, and token estimates
  • check: Compares current index with configuration settings
  • scan: Detailed analysis of chunk sizes, token counts, and efficiency
  • reindex: Complete rebuild when chunk size or other core settings change

When to Use config_util.py

  • After changing CHUNK_SIZE or CHUNK_OVERLAP in config.py
  • When switching embedding models
  • To analyze current chunking efficiency
  • Before/after major configuration changes

Enhanced Analysis (enhanced_scan.py)

Provides detailed chunk and token analysis:

# Run comprehensive analysis
python enhanced_scan.py

Analysis Features

  • Collection Overview: Total chunks and document count
  • Chunk Size Statistics: Average, median, min/max chunk sizes
  • Target Achievement: How well chunks match configured size
  • Token Analysis: Estimated tokens per chunk and total context
  • Size Distribution: Visual breakdown of chunk size ranges
  • Per-Document Analysis: Detailed statistics for each source file
  • Document Distribution: Analysis by chunk count per document
  • Optimization Recommendations: Suggestions for improving chunking

Understanding the Output

  • Target Achievement: Percentage of configured chunk size actually achieved
  • Context Quality: Assessment based on the estimated token total for the default number of results
  • Token Distribution: Visual bars showing chunk size patterns
  • Per-Document Stats: Individual file analysis with chunk counts and sizes

Configuration Files

config.py Settings

Key settings you can modify:

CHUNK_SIZE = 8000             # Target characters per chunk
CHUNK_OVERLAP = 400           # Character overlap between chunks
DEFAULT_N_RESULTS = 8         # Documents returned by default
INITIAL_RETRIEVAL_COUNT = 20  # Documents fetched before reranking
EMBEDDING_MODEL = "answerdotai/ModernBERT-base"
COLBERT_MODEL = "colbert-ir/colbertv2.0"

Important Notes

  • Changing CHUNK_SIZE or CHUNK_OVERLAP requires reindexing
  • Use config_util.py check to verify if reindexing is needed
  • Larger chunks = more context but fewer specific matches
  • More overlap = better boundary handling but more storage
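The trade-off above can be quantified: with overlap, each chunk after the first covers only CHUNK_SIZE - CHUNK_OVERLAP new characters. A rough sketch of the arithmetic (fixed-stride only; the project's chunker also snaps to boundaries, so real counts will differ slightly):

```python
import math

CHUNK_SIZE = 8000    # target characters per chunk (from config.py)
CHUNK_OVERLAP = 400  # characters shared between neighbouring chunks

def estimated_chunk_count(doc_chars: int) -> int:
    """Rough chunk count for a fixed-stride splitter (ignores boundary snapping)."""
    if doc_chars <= CHUNK_SIZE:
        return 1
    stride = CHUNK_SIZE - CHUNK_OVERLAP  # new characters covered per extra chunk
    return 1 + math.ceil((doc_chars - CHUNK_SIZE) / stride)

print(estimated_chunk_count(100_000))  # a 100k-character document -> 14 chunks
```

Doubling CHUNK_OVERLAP shrinks the stride, so the same document produces more chunks and more duplicated storage.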

Example Workflows

Initial Setup

# 1. Install dependencies
pip install -r requirements.txt

# 2. Add documents to data/ folder
cp your_documents.pdf data/

# 3. Check configuration
python config_util.py config

# 4. Start the system
python interactive_rag.py

After Configuration Changes

# 1. Edit config.py with new settings
# 2. Check if reindexing is needed
python config_util.py check

# 3. If needed, reindex
python config_util.py reindex

# 4. Analyze results
python enhanced_scan.py

Performance Analysis

# 1. Analyze current chunking efficiency
python enhanced_scan.py

# 2. Check specific collection stats
python config_util.py scan

# 3. Optimize settings based on recommendations
# 4. Test with different queries

Daily Usage

# 1. Start system (checks for file changes automatically)
python interactive_rag.py

# 2. Ask questions interactively
# 3. Use 'status' to check system health
# 4. Use 'index' if you've added new documents

Usage Examples

Question Examples

💭 Your question: What are the main topics in the AI agent research papers?
💭 Your question: How do multi-agent systems handle failures?
💭 Your question: What safety considerations are mentioned for LLM agents?
💭 Your question: Compare the different approaches to agent architectures

System Management

💭 Your question: status
💭 Your question: files
💭 Your question: index
💭 Your question: force-reindex

Advanced Features

PyLate/ColBERT Reranking

  • Automatically reranks initial retrieval results for better relevance
  • Fetches INITIAL_RETRIEVAL_COUNT documents, reranks to top DEFAULT_N_RESULTS
  • Falls back to similarity-based ranking if reranker unavailable
  • Shows both similarity and reranking scores in results
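The fetch-then-rerank flow can be sketched as follows. The scorer below is a stand-in word-overlap function, not the project's PyLate/ColBERT scorer, and the function names are illustrative:

```python
INITIAL_RETRIEVAL_COUNT = 20  # documents fetched from the vector store
DEFAULT_N_RESULTS = 8         # documents kept after reranking

def rerank(query, candidates, score_fn, top_k=DEFAULT_N_RESULTS):
    """Score every candidate against the query and keep the best top_k."""
    scored = [(score_fn(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

# Stub scorer standing in for ColBERT late-interaction scoring
def overlap_score(query, doc):
    return len(set(query.lower().split()) & set(doc.lower().split()))

candidates = [f"doc about topic {i}" for i in range(INITIAL_RETRIEVAL_COUNT)]
top = rerank("topic 3", candidates, overlap_score)
print(len(top))  # 8
```

The fallback behaviour described above corresponds to skipping `rerank` and keeping the first DEFAULT_N_RESULTS similarity hits when the reranker fails to load.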

Incremental File Updates

  • MD5-based change detection for all documents
  • Only reprocesses changed files
  • Tracks files across subdirectories
  • Automatic cleanup of deleted files

Token Management

  • Comprehensive token counting and analysis
  • Context quality assessment
  • Performance recommendations
  • Memory usage optimization
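When no tokenizer is loaded, token counts are commonly estimated from character length; a four-characters-per-token heuristic is typical (an assumption here, not necessarily the exact method interactive_rag.py uses):

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Cheap token estimate used when loading a real tokenizer is too heavy."""
    return max(1, round(len(text) / chars_per_token))

print(estimate_tokens("x" * 8000))  # one full-size chunk is roughly 2000 tokens
```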

Smart Chunking

  • Respects paragraph, sentence, and word boundaries
  • Adaptive chunk sizing (95%-115% of target)
  • Configurable overlap for context preservation
  • Quality scoring for boundary detection
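A boundary-respecting splitter in the spirit of the above might look like this minimal sketch: it prefers paragraph breaks, then sentence ends, then word breaks within a slack window around the target size. The real implementation additionally scores boundaries and applies overlap:

```python
def split_smart(text: str, target: int = 8000, slack: float = 0.15) -> list[str]:
    """Split text near `target` chars, snapping to the best nearby boundary."""
    chunks, start = [], 0
    while start < len(text):
        hard_end = min(start + int(target * (1 + slack)), len(text))
        if hard_end == len(text):
            chunks.append(text[start:])
            break
        window = text[start:hard_end]
        # Prefer paragraph breaks, then sentence ends, then plain spaces.
        for sep in ("\n\n", ". ", " "):
            cut = window.rfind(sep)
            if cut >= int(target * (1 - slack)):
                hard_end = start + cut + len(sep)
                break
        chunks.append(text[start:hard_end])
        start = hard_end
    return chunks
```

With no overlap applied, the chunks partition the input exactly; the production chunker would then re-extend each chunk backwards by CHUNK_OVERLAP characters.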

Troubleshooting

LMStudio Connection Issues

  • Ensure LMStudio is running with a model loaded
  • Check the server URL in LMStudio settings (default: http://localhost:1234/v1)
  • Verify the model is actively loaded (not just downloaded)
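Connectivity can be probed from Python before launching the full system. LMStudio exposes an OpenAI-compatible API, so listing models is a cheap health check (the endpoint path assumes the default server URL shown above):

```python
import json
import urllib.request
from urllib.error import URLError

def lmstudio_reachable(base_url: str = "http://localhost:1234/v1") -> bool:
    """True if the server answers /models, i.e. LMStudio is up with its API enabled."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=3) as resp:
            models = json.load(resp).get("data", [])
            print("loaded models:", [m.get("id") for m in models])
            return True
    except (URLError, OSError, ValueError):
        return False
```

An empty `data` list with a successful response usually means the server is running but no model is actively loaded.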

No Documents Found

  • Place PDF or Markdown files in the ./data/ directory
  • Run index command to reprocess without restarting
  • Check file permissions and formats

PyLate/ColBERT Issues

  • System automatically falls back to similarity ranking
  • Check PyLate installation: pip install pylate
  • GPU memory issues: System will use CPU fallback

Performance Issues

  • Use enhanced_scan.py to analyze chunking efficiency
  • Consider adjusting CHUNK_SIZE based on recommendations
  • Monitor context token usage with status command

Import Errors

pip install -r requirements.txt --upgrade

Configuration Mismatches

# Check if reindexing needed
python config_util.py check

# Force reindex if configuration changed
python config_util.py reindex

Testing

The project includes comprehensive unit and integration tests to ensure reliability and correctness.

Quick Test Verification

Before running the full test suite, verify that the basic test infrastructure works:

python verify_tests.py

This runs a lightweight verification to ensure the testing environment is properly set up without importing heavy ML models.

Running Tests

The project includes a test runner script that supports multiple testing modes:

# Run all tests (unit + integration)
python run_tests.py all

# Run only unit tests (faster, no ML model loading)
python run_tests.py unit

# Run only integration tests
python run_tests.py integration

# Run tests for a specific class
python run_tests.py class TestTokenCounter

# Generate test coverage report
python run_tests.py coverage

# List available test classes and methods
python run_tests.py list

Test Categories

Unit Tests (test_interactive_rag.py)

Tests core functionality without heavy dependencies:

  • TokenCounter: Token counting and formatting
  • FileTracker: MD5-based file change detection
  • DocumentProcessor: Text extraction and chunking
  • BERTEmbeddingFunction: Embedding function behavior (mocked)
  • Utility Functions: GPU memory management
  • Configuration: Config integration and validation
  • Error Handling: Graceful error handling scenarios

Integration Tests (test_integration.py)

Tests component interactions with mocks:

  • PyLateReranker: Reranking functionality
  • InteractiveRAGChain: End-to-end RAG workflow
  • Document Processing: Full document processing pipeline
  • LLM Integration: Language model interaction (mocked)
  • Error Recovery: Fallback mechanisms

Basic Infrastructure Tests (test_basic.py)

Lightweight tests that run without importing heavy modules:

  • Python environment validation
  • Basic file operations
  • Configuration loading

Test Coverage

Generate detailed test coverage reports:

python run_tests.py coverage

This creates an HTML coverage report in the htmlcov/ directory that you can open in a browser to see detailed line-by-line coverage.

Running Individual Tests

You can also run specific tests using Python's unittest module:

# Run a specific test class
python -m unittest test_interactive_rag.TestTokenCounter -v

# Run a specific test method
python -m unittest test_interactive_rag.TestTokenCounter.test_count_tokens_simple -v

# Run all unit tests with verbose output
python -m unittest test_interactive_rag -v

Test Dependencies

The test suite uses Python's built-in unittest framework and includes:

  • Mock objects: For isolating components and avoiding heavy ML model loading
  • Temporary directories: For safe file system testing
  • Coverage reporting: Via the coverage package
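The two isolation techniques listed above, mock objects and temporary directories, combine like this in a plain unittest case (illustrative only; the class and method names are hypothetical, not taken from the project's suite):

```python
import tempfile
import unittest
from pathlib import Path
from unittest.mock import MagicMock

class TestIsolationPatterns(unittest.TestCase):
    def test_mocked_embedder_and_temp_files(self):
        # A mock stands in for a heavy embedding model.
        embedder = MagicMock()
        embedder.embed.return_value = [0.0] * 768
        self.assertEqual(len(embedder.embed("hello")), 768)
        embedder.embed.assert_called_once_with("hello")

        # A temporary directory keeps file tests away from the real data/ folder.
        with tempfile.TemporaryDirectory() as tmp:
            doc = Path(tmp) / "sample.md"
            doc.write_text("# heading\nbody")
            self.assertTrue(doc.exists())
```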

Troubleshooting Tests

Import Errors

If you encounter import errors when running tests:

pip install -r requirements.txt

Slow Test Execution

The main interactive_rag.py module loads heavy ML models. Use these approaches for faster testing:

  1. Start with verification: python verify_tests.py
  2. Run unit tests only: python run_tests.py unit
  3. Use the test runner: python run_tests.py (optimized loading)

Test Environment Issues

Ensure you're in the correct directory and virtual environment:

cd /path/to/MyRAG
source .venv/bin/activate  # or activate your virtual environment
python verify_tests.py

File Structure

MyRAG/
├── interactive_rag.py       # Main interactive RAG system
├── config.py                # Central configuration
├── config_util.py           # Configuration management utility
├── enhanced_scan.py         # Detailed analysis tool
├── requirements.txt         # Python dependencies
├── test_basic.py            # Basic infrastructure tests
├── test_interactive_rag.py  # Unit tests
├── test_integration.py      # Integration tests
├── run_tests.py             # Test runner script
├── verify_tests.py          # Test verification utility
├── README.md                # This documentation
├── SETUP.md                 # Detailed setup guide
├── .gitignore               # Version control exclusions
├── data/                    # Your documents (PDF, MD)
│   ├── *.pdf
│   ├── *.md
│   └── subdirectories/      # Recursive scanning supported
├── chroma_db/               # Vector database (auto-generated)
└── htmlcov/                 # Test coverage reports (generated)

Advanced Configuration

Environment Variables

Set these before running for memory optimization:

export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.8
export PYTORCH_MPS_LOW_WATERMARK_RATIO=0.1

Model Selection

  • Embedding Model: ModernBERT for high-quality embeddings
  • Reranking Model: ColBERT v2.0 for relevance improvement
  • LLM: Any model supported by LMStudio

Performance Tuning

  • Adjust INITIAL_RETRIEVAL_COUNT vs DEFAULT_N_RESULTS ratio
  • Balance CHUNK_SIZE vs retrieval specificity
  • Monitor token usage with analysis tools

Testing

The project includes comprehensive unit and integration tests to ensure reliability and help with development.

Quick Test Verification

# Verify testing infrastructure works
python test_basic.py

Test Structure

The test suite is designed with isolated, focused tests:

  • test_basic.py - Infrastructure verification (always run this first)
  • test_interactive_rag.py - Unit tests for core components
  • test_integration.py - Integration tests for full workflows
  • run_tests.py - Test runner with multiple options

Test Categories

Basic Tests (test_basic.py)

Quick verification that the environment is set up correctly:

python test_basic.py

Unit Tests (test_interactive_rag.py)

Test individual components in isolation:

  • TestTokenCounter: Token counting and formatting
  • TestFileTracker: MD5-based file change detection
  • TestDocumentProcessor: Document parsing and chunking
  • TestBERTEmbeddingFunction: Embedding generation (mocked)
  • TestUtilityFunctions: Helper functions and error handling

Integration Tests (test_integration.py)

Test full system integration with mocked external dependencies:

  • TestPyLateReranker: Reranking functionality and fallbacks
  • TestInteractiveRAGChainIntegration: End-to-end RAG workflow
  • TestErrorHandlingIntegration: Error scenarios and recovery

Running Tests

Using the Test Runner

# Show available commands
python run_tests.py

# Run all tests (requires proper mocking setup)
python run_tests.py all

# Run only unit tests
python run_tests.py unit

# Run specific test class
python run_tests.py class TestTokenCounter

# List available test classes
python run_tests.py list

Direct Test Execution

# Run basic infrastructure tests (recommended first step)
python test_basic.py

# Run individual test files (may require environment setup)
python -m unittest test_interactive_rag.TestTokenCounter -v
python -m unittest test_integration.TestPyLateReranker -v

Coverage Reports (Optional)

# Install coverage tool
pip install coverage

# Run with coverage
python run_tests.py coverage
# View report in htmlcov/index.html

Test Features

  • Mocking: External dependencies (ChromaDB, OpenAI, PyLate) are mocked for isolation
  • Temporary Files: Tests use temporary directories to avoid conflicts
  • Error Simulation: Tests verify graceful error handling
  • No External Dependencies: Tests run without requiring LMStudio, ChromaDB, or model downloads

Development Testing Workflow

  1. Start with basics: python test_basic.py
  2. Test components: Run specific unit tests for the code you're working on
  3. Integration testing: Use integration tests for workflow verification
  4. Full suite: Run complete test suite before commits

Troubleshooting Tests

  • Import Errors: Run test_basic.py first to verify environment
  • Hanging Tests: Some tests may hang if they try to load actual models; ensure mocking is working
  • Missing Dependencies: Install test dependencies with pip install coverage (optional)

Test Design Principles

  • Isolation: Each test is independent and can run alone
  • Mocking: External services are mocked to ensure fast, reliable tests
  • Clear Assertions: Tests have clear success/failure criteria
  • Documentation: Each test class and method is documented

About

Local Interactive RAG Chain with Re-Ranking
