```bash
pip install -r requirements.txt
```
- Download and install LMStudio
- Load a model of your choice
- Start the local server (default: http://localhost:1234/v1)
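Once the server is up, any OpenAI-compatible client can talk to it. A minimal sketch using only the standard library (the `build_chat_payload` helper and the `"local-model"` name are illustrative; LMStudio serves whichever model is currently loaded):

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # LMStudio's default server address

def build_chat_payload(question: str, context: str) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": "local-model",  # LMStudio uses the loaded model regardless of name
        "messages": [
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": question},
        ],
        "temperature": 0.2,
    }

def ask(question: str, context: str) -> str:
    """POST to the OpenAI-compatible chat completions endpoint."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_payload(question, context)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Calling `ask()` requires the server to be running with a model loaded; the payload builder works standalone.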
```bash
python verify_tests.py
```

```bash
python interactive_rag.py
```

```bash
# Test that everything is set up correctly
python test_basic.py
```
- 🔍 ModernBERT Embeddings: High-quality semantic embeddings for document search
- 🎯 PyLate/ColBERT Reranking: Advanced reranking for improved relevance
- 📚 Multi-format Support: Processes PDF and Markdown files from ./data directory
- 🤖 LMStudio Integration: Local LLM inference for privacy
- 💬 Interactive Chat: Real-time question answering
- 📊 Smart Chunking: Intelligent text splitting with boundary detection
- 🔄 MD5 File Tracking: Incremental updates based on file changes
- 📈 Token Analysis: Comprehensive token counting and optimization
Once the main system is running (`python interactive_rag.py`), you can use:
- Ask questions: Simply type your question about the documents
- `index` - Check for document changes and update the index incrementally
- `force-reindex`/`reindex` - Completely rebuild the entire index
- `status` - Check system status (ChromaDB, LMStudio, reranker, files)
- `files` - Show tracked files and their MD5 status
- `quit`/`exit`/`q` - Exit the system
The status command shows:
- ChromaDB collection document count
- LMStudio connection status and model name
- PyLate/ColBERT reranker status
- File tracking statistics
The files command displays:
- All tracked files with their relative paths
- MD5 checksums for change detection
- Last updated timestamps
The configuration utility helps manage chunk sizes and reindexing:
```bash
# Show current configuration
python config_util.py config

# Check if reindexing is needed after config changes
python config_util.py check

# Analyze collection chunks and tokens in detail
python config_util.py scan

# Force complete reindex with current configuration
python config_util.py reindex

# Show help
python config_util.py help
```
- `config`: Displays chunk size, overlap, retrieval settings, and token estimates
- `check`: Compares the current index with the configuration settings
- `scan`: Detailed analysis of chunk sizes, token counts, and efficiency
- `reindex`: Complete rebuild when chunk size or other core settings change
- After changing `CHUNK_SIZE` or `CHUNK_OVERLAP` in config.py
- When switching embedding models
- To analyze current chunking efficiency
- Before/after major configuration changes
Provides detailed chunk and token analysis:
```bash
# Run comprehensive analysis
python enhanced_scan.py
```
- Collection Overview: Total chunks and document count
- Chunk Size Statistics: Average, median, min/max chunk sizes
- Target Achievement: How well chunks match configured size
- Token Analysis: Estimated tokens per chunk and total context
- Size Distribution: Visual breakdown of chunk size ranges
- Per-Document Analysis: Detailed statistics for each source file
- Document Distribution: Analysis by chunk count per document
- Optimization Recommendations: Suggestions for improving chunking
- Target Achievement: Percentage of configured chunk size actually achieved
- Context Quality: Assessment based on estimated tokens with default results
- Token Distribution: Visual bars showing chunk size patterns
- Per-Document Stats: Individual file analysis with chunk counts and sizes
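The size statistics above can be approximated from chunk lengths alone. A sketch (the `chunk_stats` helper is illustrative, not the tool's actual API, and the 4-characters-per-token ratio is a rough heuristic rather than the real tokenizer):

```python
import statistics

def chunk_stats(chunks, target_size=8000, chars_per_token=4):
    """Summary statistics in the spirit of enhanced_scan.py's report."""
    sizes = [len(c) for c in chunks]
    return {
        "count": len(sizes),
        "avg": statistics.mean(sizes),
        "median": statistics.median(sizes),
        "min": min(sizes),
        "max": max(sizes),
        # How close the average chunk comes to the configured target
        "target_achievement_pct": 100 * statistics.mean(sizes) / target_size,
        # Rough token estimate: ~4 characters per token for English text
        "est_total_tokens": sum(sizes) // chars_per_token,
    }
```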
Key settings you can modify:
```python
CHUNK_SIZE = 8000             # Target characters per chunk
CHUNK_OVERLAP = 400           # Character overlap between chunks
DEFAULT_N_RESULTS = 8         # Documents returned by default
INITIAL_RETRIEVAL_COUNT = 20  # Documents fetched before reranking
EMBEDDING_MODEL = "answerdotai/ModernBERT-base"
COLBERT_MODEL = "colbert-ir/colbertv2.0"
```
- Changing `CHUNK_SIZE` or `CHUNK_OVERLAP` requires reindexing
- Use `python config_util.py check` to verify if reindexing is needed
- Larger chunks = more context but fewer specific matches
- More overlap = better boundary handling but more storage
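One way to see the storage trade-off: each chunk advances by `CHUNK_SIZE - CHUNK_OVERLAP` characters, so the chunk count for a document can be estimated up front (illustrative sketch with fixed-size windows; the real boundary-aware chunking will vary):

```python
import math

def estimate_chunk_count(n_chars, chunk_size=8000, overlap=400):
    """Rough chunk count for a document of n_chars characters."""
    stride = chunk_size - overlap  # effective advance per chunk
    return max(1, math.ceil((n_chars - overlap) / stride))
```

For example, doubling `CHUNK_OVERLAP` shrinks the stride and inflates both chunk count and storage for the same corpus.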
```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Add documents to data/ folder
cp your_documents.pdf data/

# 3. Check configuration
python config_util.py config

# 4. Start the system
python interactive_rag.py
```

```bash
# 1. Edit config.py with new settings

# 2. Check if reindexing is needed
python config_util.py check

# 3. If needed, reindex
python config_util.py reindex

# 4. Analyze results
python enhanced_scan.py
```

```bash
# 1. Analyze current chunking efficiency
python enhanced_scan.py

# 2. Check specific collection stats
python config_util.py scan

# 3. Optimize settings based on recommendations
# 4. Test with different queries
```

```bash
# 1. Start system (checks for file changes automatically)
python interactive_rag.py

# 2. Ask questions interactively
# 3. Use 'status' to check system health
# 4. Use 'index' if you've added new documents
```

Example session:

```
💭 Your question: What are the main topics in the AI agent research papers?
💭 Your question: How do multi-agent systems handle failures?
💭 Your question: What safety considerations are mentioned for LLM agents?
💭 Your question: Compare the different approaches to agent architectures
💭 Your question: status
💭 Your question: files
💭 Your question: index
💭 Your question: force-reindex
```
- Automatically reranks initial retrieval results for better relevance
- Fetches `INITIAL_RETRIEVAL_COUNT` documents, reranks to the top `DEFAULT_N_RESULTS`
- Falls back to similarity-based ranking if the reranker is unavailable
- Shows both similarity and reranking scores in results
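The retrieve-then-rerank flow with its fallback can be sketched as follows (`search_fn` and `rerank_fn` are placeholders for the vector search and the PyLate/ColBERT scorer, not the project's actual function names):

```python
def retrieve_and_rerank(query, search_fn, rerank_fn,
                        initial_count=20, n_results=8):
    """Fetch a broad candidate set, then rerank down to the final results."""
    # Over-fetch: similarity search returns initial_count candidates
    candidates = search_fn(query, initial_count)
    try:
        # rerank_fn returns [(doc, rerank_score), ...]
        scored = sorted(rerank_fn(query, candidates),
                        key=lambda pair: pair[1], reverse=True)
        return [doc for doc, _ in scored[:n_results]]
    except Exception:
        # Reranker unavailable: keep the original similarity order
        return candidates[:n_results]
```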
- MD5-based change detection for all documents
- Only reprocesses changed files
- Tracks files across subdirectories
- Automatic cleanup of deleted files
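MD5-based change detection reduces to comparing checksums against a stored map. A minimal sketch (the `changed_files` helper and its `{path: md5}` map are illustrative, not the project's exact data layout):

```python
import hashlib
from pathlib import Path

def file_md5(path: Path) -> str:
    """MD5 checksum of a file's bytes, used to detect content changes."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(8192), b""):
            h.update(block)
    return h.hexdigest()

def changed_files(data_dir: str, tracked: dict) -> list:
    """Return PDF/Markdown files that are new or modified vs the tracked map.

    Deleted files are the tracked paths that no longer exist on disk.
    """
    changes = []
    for p in Path(data_dir).rglob("*"):  # recursive: covers subdirectories
        if p.suffix.lower() in {".pdf", ".md"}:
            if tracked.get(str(p)) != file_md5(p):
                changes.append(str(p))
    return changes
```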
- Comprehensive token counting and analysis
- Context quality assessment
- Performance recommendations
- Memory usage optimization
- Respects paragraph, sentence, and word boundaries
- Adaptive chunk sizing (95%-115% of target)
- Configurable overlap for context preservation
- Quality scoring for boundary detection
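A simplified sketch of boundary-aware splitting with overlap (illustrative only; the real implementation also does adaptive sizing and boundary quality scoring):

```python
def chunk_text(text, chunk_size=8000, overlap=400):
    """Split text into ~chunk_size pieces, preferring paragraph,
    then sentence, then word boundaries near the target size."""
    chunks, start = [], 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Search the last 15% of the window for the best boundary
            window_start = start + int(chunk_size * 0.85)
            window = text[window_start:end]
            for sep in ("\n\n", ". ", " "):  # paragraph > sentence > word
                cut = window.rfind(sep)
                if cut != -1:
                    end = window_start + cut + len(sep)
                    break
        chunks.append(text[start:end])
        if end >= len(text):
            break
        start = end - overlap  # overlap preserves context across chunks
    return chunks
```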
- Ensure LMStudio is running with a model loaded
- Check the server URL in LMStudio settings (default: http://localhost:1234/v1)
- Verify the model is actively loaded (not just downloaded)
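A quick way to check the connection from Python: the OpenAI-compatible server exposes a `/v1/models` endpoint listing what is loaded (the `lmstudio_status` helper is illustrative):

```python
import json
import urllib.request

def lmstudio_status(base_url="http://localhost:1234/v1"):
    """Return the ids of models the server reports, or [] if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
            return [m["id"] for m in json.load(resp).get("data", [])]
    except OSError:
        return []  # server not running or not reachable
```

An empty list with LMStudio open usually means the server isn't started or no model is actively loaded.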
- Place PDF or Markdown files in the `./data/` directory
- Run the `index` command to reprocess without restarting
- Check file permissions and formats
- System automatically falls back to similarity ranking
- Check the PyLate installation: `pip install pylate`
- GPU memory issues: the system will fall back to CPU
- Use `enhanced_scan.py` to analyze chunking efficiency
- Consider adjusting `CHUNK_SIZE` based on recommendations
- Monitor context token usage with the `status` command
```bash
pip install -r requirements.txt --upgrade
```

```bash
# Check if reindexing needed
python config_util.py check

# Force reindex if configuration changed
python config_util.py reindex
```

The project includes comprehensive unit and integration tests to ensure reliability and correctness.
Before running the full test suite, verify that the basic test infrastructure works:
```bash
python verify_tests.py
```
This runs a lightweight verification to ensure the testing environment is properly set up, without importing heavy ML models.
The project includes a test runner script that supports multiple testing modes:
```bash
# Run all tests (unit + integration)
python run_tests.py all

# Run only unit tests (faster, no ML model loading)
python run_tests.py unit

# Run only integration tests
python run_tests.py integration

# Run tests for a specific class
python run_tests.py class TestTokenCounter

# Generate test coverage report
python run_tests.py coverage

# List available test classes and methods
python run_tests.py list
```
Tests core functionality without heavy dependencies:
- TokenCounter: Token counting and formatting
- FileTracker: MD5-based file change detection
- DocumentProcessor: Text extraction and chunking
- BERTEmbeddingFunction: Embedding function behavior (mocked)
- Utility Functions: GPU memory management
- Configuration: Config integration and validation
- Error Handling: Graceful error handling scenarios
Tests component interactions with mocks:
- PyLateReranker: Reranking functionality
- InteractiveRAGChain: End-to-end RAG workflow
- Document Processing: Full document processing pipeline
- LLM Integration: Language model interaction (mocked)
- Error Recovery: Fallback mechanisms
Lightweight tests that run without importing heavy modules:
- Python environment validation
- Basic file operations
- Configuration loading
Generate detailed test coverage reports:
```bash
python run_tests.py coverage
```
This creates an HTML coverage report in the `htmlcov/` directory that you can open in a browser to see detailed line-by-line coverage.
You can also run specific tests using Python's unittest module:
```bash
# Run a specific test class
python -m unittest test_interactive_rag.TestTokenCounter -v

# Run a specific test method
python -m unittest test_interactive_rag.TestTokenCounter.test_count_tokens_simple -v

# Run all unit tests with verbose output
python -m unittest test_interactive_rag -v
```
The test suite uses Python's built-in unittest framework and includes:
- Mock objects: For isolating components and avoiding heavy ML model loading
- Temporary directories: For safe file system testing
- Coverage reporting: Via the `coverage` package
If you encounter import errors when running tests:
```bash
pip install -r requirements.txt
```
The main `interactive_rag.py` module loads heavy ML models. Use these approaches for faster testing:
- Start with verification: `python verify_tests.py`
- Run unit tests only: `python run_tests.py unit`
- Use the test runner: `python run_tests.py` (optimized loading)
Ensure you're in the correct directory and virtual environment:
```bash
cd /path/to/MyRAG
source .venv/bin/activate  # or activate your virtual environment
python verify_tests.py
```

```
MyRAG/
├── interactive_rag.py       # Main interactive RAG system
├── config.py                # Central configuration
├── config_util.py           # Configuration management utility
├── enhanced_scan.py         # Detailed analysis tool
├── requirements.txt         # Python dependencies
├── test_basic.py            # Basic infrastructure tests
├── test_interactive_rag.py  # Unit tests
├── test_integration.py      # Integration tests
├── run_tests.py             # Test runner script
├── verify_tests.py          # Test verification utility
├── README.md                # This documentation
├── SETUP.md                 # Detailed setup guide
├── .gitignore               # Version control exclusions
├── data/                    # Your documents (PDF, MD)
│   ├── *.pdf
│   ├── *.md
│   └── subdirectories/      # Recursive scanning supported
├── chroma_db/               # Vector database (auto-generated)
└── htmlcov/                 # Test coverage reports (generated)
```
Set these before running for memory optimization:
```bash
export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.8
export PYTORCH_MPS_LOW_WATERMARK_RATIO=0.1
```
- Embedding Model: ModernBERT for high-quality embeddings
- Reranking Model: ColBERT v2.0 for relevance improvement
- LLM: Any model supported by LMStudio
- Adjust the `INITIAL_RETRIEVAL_COUNT` vs `DEFAULT_N_RESULTS` ratio
- Balance `CHUNK_SIZE` vs retrieval specificity
- Monitor token usage with the analysis tools
The project includes comprehensive unit and integration tests to ensure reliability and help with development.
```bash
# Verify testing infrastructure works
python test_basic.py
```
The test suite is designed with isolated, focused tests:
- `test_basic.py` - Infrastructure verification (always run this first)
- `test_interactive_rag.py` - Unit tests for core components
- `test_integration.py` - Integration tests for full workflows
- `run_tests.py` - Test runner with multiple options
Quick verification that the environment is set up correctly:
```bash
python test_basic.py
```
Test individual components in isolation:
- TestTokenCounter: Token counting and formatting
- TestFileTracker: MD5-based file change detection
- TestDocumentProcessor: Document parsing and chunking
- TestBERTEmbeddingFunction: Embedding generation (mocked)
- TestUtilityFunctions: Helper functions and error handling
Test full system integration with mocked external dependencies:
- TestPyLateReranker: Reranking functionality and fallbacks
- TestInteractiveRAGChainIntegration: End-to-end RAG workflow
- TestErrorHandlingIntegration: Error scenarios and recovery
```bash
# Show available commands
python run_tests.py

# Run all tests (requires proper mocking setup)
python run_tests.py all

# Run only unit tests
python run_tests.py unit

# Run specific test class
python run_tests.py class TestTokenCounter

# List available test classes
python run_tests.py list
```

```bash
# Run basic infrastructure tests (recommended first step)
python test_basic.py

# Run individual test files (may require environment setup)
python -m unittest test_interactive_rag.TestTokenCounter -v
python -m unittest test_integration.TestPyLateReranker -v
```

```bash
# Install coverage tool
pip install coverage

# Run with coverage
python run_tests.py coverage

# View report in htmlcov/index.html
```
- Mocking: External dependencies (ChromaDB, OpenAI, PyLate) are mocked for isolation
- Temporary Files: Tests use temporary directories to avoid conflicts
- Error Simulation: Tests verify graceful error handling
- No External Dependencies: Tests run without requiring LMStudio, ChromaDB, or model downloads
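The isolation pattern above can be sketched as a self-contained test: each test case creates its own temporary directory so runs never touch the real `data/` or `chroma_db/` folders (the test class and file name below are hypothetical, not part of the actual suite):

```python
import tempfile
import unittest
from pathlib import Path

class TestFileTrackingIsolated(unittest.TestCase):
    """Illustrative test: file scanning exercised against a throwaway dir."""

    def test_detects_new_markdown_file(self):
        with tempfile.TemporaryDirectory() as tmp:
            doc = Path(tmp) / "note.md"
            doc.write_text("# hello")
            # Recursive scan, mirroring how the tracker walks ./data
            found = sorted(p.name for p in Path(tmp).rglob("*.md"))
            self.assertEqual(found, ["note.md"])
```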
- Start with basics: `python test_basic.py`
- Test components: Run specific unit tests for the code you're working on
- Integration testing: Use integration tests for workflow verification
- Full suite: Run complete test suite before commits
- Import Errors: Run `test_basic.py` first to verify the environment
- Hanging Tests: Some tests may hang if they try to load actual models; ensure mocking is working
- Missing Dependencies: Install test dependencies with `pip install coverage` (optional)
- Isolation: Each test is independent and can run alone
- Mocking: External services are mocked to ensure fast, reliable tests
- Clear Assertions: Tests have clear success/failure criteria
- Documentation: Each test class and method is documented