RAG-Anything is a state-of-the-art Multimodal Retrieval-Augmented Generation (RAG) system running entirely on local hardware. It leverages MinerU for high-fidelity document parsing and LightRAG for advanced graph-based retrieval, all powered by local LLMs via Ollama.
- All-in-One Processing: Handles text, images, tables, and mathematical equations from complex PDFs and Office documents.
- 100% Local: No API keys required. Your data never leaves your machine.
- Multimodal Knowledge Graph: Automatically extracts entities and relationships across different content types for deeper understanding.
- Vision-Aware Retrieval: Uses vision-language models (VLM) to analyze figures and charts directly.
- Framework:
raganything(HKUDS) - Indexing Engine:
LightRAG(Graph-based RAG) - Document Parser:
MinerU(VLM-based parsing) - Local LLM Server:
Ollama - Models Used:
- LLM:
gemma4:latest(Reasoning & Chat) - Vision:
qwen3-vl:4b(Image & Chart understanding) - Embeddings:
nomic-embed-text:latest(Vector search)
- LLM:
ollama_rag.py: The main entry point. Configures the RAG-Anything pipeline to use local Ollama endpoints for text completion, vision tasks, and embeddings.rag_storage/: Directory containing the LightRAG knowledge graph, vector database, and document status.output/: Contains the structured output from MinerU (JSON, Markdown, and extracted images).
- Install Ollama.
- Pull the required models:
ollama run gemma4:e2b ollama run qwen3-vl:4b ollama run nomic-embed-text
pip install "raganything[all]"- Navigate to the project directory:
cd "D:\Ray Codes\AG Projects\RAGAnything" - Start the indexing and interactive query session:
python ollama_rag.py "path/to/your/document.pdf"
- Academic Research: Index research papers and ask for comparisons between graphs, formulas, and text across multiple documents.
- Technical Datasheets: Extract precise specifications from complex tables and circuit diagrams in engineering PDFs.
- Financial Analysis: Analyze annual reports where key data is often trapped in charts and nested tables.
- Legal Discovery: Parse and index large contracts where cross-referencing between sections and exhibits is critical.
- Multi-Document Support: Expand the UI to manage and search across large libraries of documents simultaneously.
- Gradio Dashboard: A full web interface for document upload, visual graph exploration, and chat.
- Adaptive Parsing: Dynamically switch between
MinerU,Docling, andOCRbased on document complexity to save resources. - Export to Obsidian/Logseq: Automatically convert processed multimodal documents into linked notes for personal knowledge management.