# MultiModal RAG with ColPali

## Overview

This repository demonstrates how to integrate ColPali embeddings for advanced multimodal retrieval-augmented generation (RAG). We index a PDF for querying and combine it with a Llama 3.2 Vision-Language model for answer generation.

## ColPali Model

We incorporate the ColPali embedding model from Hugging Face, specifically `vidore/colpali-v1.2`, which provides robust embeddings for both text and page images. Byaldi's `RAGMultiModalModel` class is leveraged for indexing and retrieval.

## Installation Steps

1. Install Requirements

   ```bash
   pip install byaldi
   sudo apt-get install -y poppler-utils
   pip install huggingface_hub
   pip install -q together
   ```
2. Log in to Hugging Face

   Provide your Hugging Face access token (`HF_TOKEN`) to authenticate:

   ```python
   import os
   from huggingface_hub import login

   login(token=os.environ["HF_TOKEN"])  # or paste your actual token string here
   ```
3. Initialize the Model

   ```python
   from byaldi import RAGMultiModalModel

   model = RAGMultiModalModel.from_pretrained("vidore/colpali-v1.2")
   ```

## Index Creation

The PDF file `colpali.pdf` is downloaded and then passed to `model.index`, which builds a retrieval index over its pages. The `index_name` argument is set to `'colpali'`, as sketched below.
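
A minimal sketch of this step (the download URL is an assumption, since the repository does not state where `colpali.pdf` comes from; any local PDF works):

```python
import requests

# Download the ColPali paper as the document to index
# (this arXiv URL is an assumed source, not specified in the repo).
pdf_url = "https://arxiv.org/pdf/2407.01449"
with open("colpali.pdf", "wb") as f:
    f.write(requests.get(pdf_url).content)

# Build the page-level retrieval index with byaldi.
model.index(
    input_path="colpali.pdf",
    index_name="colpali",
    store_collection_with_index=True,  # keep base64 page images for the VLM step
    overwrite=True,
)
```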

## Querying the Model

After generating the index, we run a query such as:

```python
query = "What is ColPali's (late interaction) evaluation baseline score on DocQ and InfoQ?"
results = model.search(query, k=2)
```

The top retrievals, together with the original query, are then passed to the VLM to produce the best possible answer from the `colpali.pdf` content, as sketched below.
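
A sketch of that generation step, assuming the index was built with `store_collection_with_index=True` (so each result exposes a base64-encoded page image) and that the VLM is served by Together AI under the model ID `meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo` (an assumption; substitute whatever vision model endpoint you use):

```python
import os
from together import Together

client = Together(api_key=os.environ["TOGETHER_API_KEY"])

# Send the text query plus the top retrieved page image to the VLM.
response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo",  # assumed model ID
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": query},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{results[0].base64}"},
            },
        ],
    }],
)
print(response.choices[0].message.content)
```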

## MultiModal RAG Flow

1. User Query
2. Text + Vision Embedding via ColPali
3. Index → retrieve relevant pages
4. Llama 3.2 VLM processes both the text query and the retrieved PDF content
5. Generated Answer