TheLion-ai/RocketRAG


πŸš€ RocketRAG

Fast, efficient, minimal, extensible, and elegant RAG system

RocketRAG is a high-performance Retrieval-Augmented Generation (RAG) system designed with a focus on speed, simplicity, and extensibility. Built on top of state-of-the-art libraries, it provides both CLI and web server capabilities for seamless integration into any workflow.

Demo video: rocketrag.mp4

🎯 Mission

RocketRAG aims to be the fastest and most efficient RAG library while maintaining:

  • Minimal footprint - Clean, lightweight codebase
  • Maximum extensibility - Pluggable architecture for all components
  • Peak performance - Leveraging the best-in-class libraries
  • Ease of use - Simple CLI and API interfaces

⚑ Performance-First Architecture

RocketRAG is built on top of cutting-edge, performance-optimized libraries:

  • Kreuzberg - fast document loading and parsing
  • Chonkie - semantic and recursive chunking
  • sentence-transformers / model2vec - embedding generation
  • Milvus Lite - embedded vector storage and retrieval
  • llama-cpp-python - quantized LLM inference

πŸš€ Quick Start

Installation

Using pip

pip install rocketrag

Using uvx (recommended for CLI usage)

# Run directly without installation
uvx rocketrag --help

# Or install globally
uv tool install rocketrag

Basic Usage

from rocketrag import RocketRAG

rag = RocketRAG("./data")  # Path to your data (supports PDF, TXT, MD, etc.)
rag.prepare()  # Build the vector database

# Ask questions
answer, sources = rag.ask("What is the main topic of the documents?")
print(answer)

CLI Usage

# Prepare documents from a directory
rocketrag prepare --data-dir ./documents

# Ask questions via CLI
rocketrag ask "What are the key findings?"

# Start web server
rocketrag server --port 8000

Using uvx (no installation required)

# Same commands work with uvx
uvx rocketrag prepare --data-dir ./documents
uvx rocketrag ask "What are the key findings?"
uvx rocketrag server --port 8000

# Run as module
uvx --from rocketrag python -m rocketrag --help

πŸ—οΈ Architecture

RocketRAG follows a modular, plugin-based architecture:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Document      β”‚    β”‚    Chunking     β”‚    β”‚   Vectorization β”‚
β”‚   Loaders       │───▢│   (Chonkie)     │───▢│ (SentenceTransf)β”‚
β”‚  (Kreuzberg)    β”‚    β”‚                 β”‚    β”‚                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                        β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”             β”‚
β”‚      LLM        β”‚    β”‚   Vector DB     β”‚β—€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚ (llama-cpp-py)  │◀───│ (Milvus Lite)   β”‚
β”‚                 β”‚    β”‚                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Core Components

  • BaseLoader: Pluggable document loading (PDF, TXT, MD, etc.)
  • BaseChunker: Configurable chunking strategies (semantic, recursive, etc.)
  • BaseVectorizer: Flexible embedding models
  • BaseLLM: Swappable language models
  • MilvusLiteDB: High-performance vector storage and retrieval

πŸ”§ Configuration

Custom Components

from rocketrag import RocketRAG
from rocketrag.vectors import SentenceTransformersVectorizer
from rocketrag.chonk import ChonkieChunker
from rocketrag.llm import LLamaLLM
from rocketrag.loaders import KreuzbergLoader

# Configure high-performance components
vectorizer = SentenceTransformersVectorizer(
    model_name="minishlab/potion-multilingual-128M"  # Fast multilingual model
)

chunker = ChonkieChunker(
    method="semantic",  # Semantic chunking for better context
    embedding_model="minishlab/potion-multilingual-128M",
    chunk_size=512
)

llm = LLamaLLM(
    repo_id="unsloth/gemma-3n-E2B-it-GGUF",
    filename="*Q8_0.gguf"  # Quantized for speed
)

loader = KreuzbergLoader()  # Ultra-fast document processing

rag = RocketRAG(
    vectorizer=vectorizer,
    chunker=chunker,
    llm=llm,
    loader=loader
)

CLI Configuration

# Custom chunking strategy
rocketrag prepare \
  --chonker chonkie \
  --chonker-args '{"method": "semantic", "chunk_size": 512}' \
  --vectorizer-args '{"model_name": "all-MiniLM-L6-v2"}'

# Custom LLM for inference
rocketrag ask "Your question" \
  --repo-id "microsoft/DialoGPT-medium" \
  --filename "*.gguf"

🌐 Web Server

RocketRAG includes a FastAPI-based web server with OpenAI-compatible endpoints:

# Start server
rocketrag server --port 8000 --host 0.0.0.0

API Endpoints

  • GET / - Interactive web interface
  • POST /ask - Question answering
  • POST /ask/stream - Streaming responses
  • GET /chat - Chat interface
  • GET /browse - Document browser
  • GET /visualize - Vector visualization
  • GET /health - Health check

Example API Usage

import requests

response = requests.post(
    "http://localhost:8000/ask",
    json={"question": "What are the main findings?"}
)

result = response.json()
print(result["answer"])
print(result["sources"])
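For the streaming endpoint, a client consumes the response incrementally instead of waiting for the full answer. The sketch below assumes /ask/stream emits newline-delimited text chunks; the actual wire format is not documented here, so treat the parsing as an assumption:

```python
from typing import Iterable


def join_stream(chunks: Iterable[str]) -> str:
    """Concatenate non-empty streamed text chunks into the full answer."""
    return "".join(chunk for chunk in chunks if chunk)


def ask_streaming(question: str, base_url: str = "http://localhost:8000") -> str:
    """POST to /ask/stream and assemble the streamed answer as it arrives."""
    import requests  # imported here so join_stream stays dependency-free

    with requests.post(
        f"{base_url}/ask/stream", json={"question": question}, stream=True
    ) as resp:
        resp.raise_for_status()
        return join_stream(resp.iter_lines(decode_unicode=True))
```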

🎨 Features

Core Features

  • ⚑ Ultra-fast document processing with Kreuzberg
  • 🧠 Semantic chunking with Chonkie and model2vec
  • πŸ” High-performance vector search with Milvus Lite
  • πŸ€– Optimized LLM inference with llama-cpp-python
  • πŸ“Š Rich CLI interface with progress bars and formatting
  • 🌐 Web server with interactive UI
  • πŸ”Œ Pluggable architecture for easy customization

Advanced Features

  • πŸ“ˆ Vector visualization for debugging and analysis
  • πŸ“š Document browsing interface
  • πŸ’¬ Streaming responses for real-time interaction
  • πŸ”„ Batch processing for large document sets
  • πŸ“ Metadata preservation throughout the pipeline
  • 🎯 Context-aware chunking for better retrieval
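Batch processing for large document sets, as listed above, usually amounts to feeding the embedder fixed-size slices rather than one text at a time. A minimal, generic sketch (not RocketRAG's internal code):

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield successive fixed-size slices; the last batch may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]
```

Embedding libraries such as sentence-transformers accept lists of texts, so batching like this keeps memory bounded while still amortizing per-call overhead.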

πŸ› οΈ Development

Installation for Development

git clone https://github.com/TheLion-ai/RocketRAG.git
cd rocketrag
pip install -e ".[dev]"

Running Tests

pytest tests/

Code Quality

ruff check .
ruff format .

πŸ“Š Performance

RocketRAG is designed for speed:

  • Document Loading: 10x faster with Kreuzberg's optimized parsers
  • Chunking: Semantic chunking with model2vec for superior context preservation
  • Vectorization: Optimized batch processing with sentence-transformers
  • Retrieval: Sub-millisecond vector search with Milvus Lite
  • Generation: GGUF quantization for 4x faster inference

🀝 Contributing

We welcome contributions! RocketRAG's modular architecture makes it easy to:

  • Add new document loaders
  • Implement custom chunking strategies
  • Integrate different embedding models
  • Support additional LLM backends
  • Enhance the web interface

πŸ™ Acknowledgments

RocketRAG builds upon the excellent work of:

  • Kreuzberg - document loading
  • Chonkie - chunking
  • sentence-transformers and model2vec - embeddings
  • Milvus Lite - vector database
  • llama-cpp-python - LLM inference
  • FastAPI - web server
