Merged (17 commits)
2 changes: 1 addition & 1 deletion .github/workflows/publish.yml
@@ -16,7 +16,7 @@ jobs:

- uses: actions/setup-python@v6
with:
-        python-version: "3.x"
+        python-version: "3.14"

- name: Build release distributions
run: |
2 changes: 1 addition & 1 deletion .github/workflows/test.yml
@@ -11,7 +11,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
-        python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
+        python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"]

steps:
- uses: actions/checkout@v6
28 changes: 28 additions & 0 deletions CHANGELOG.md
@@ -10,10 +10,38 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
### Added

- Add ColQwen3 and BiQwen3 support (model + processor).
- Add regression tests for `ColPaliProcessor` to validate Transformers v5 modality registration and fallback loading behavior when a processor bundle is incomplete.

### Changed

- Bump runtime compatibility to `transformers>=5.0.0,<6.0.0`, `peft>=0.18.0,<0.19.0`, and `accelerate>=1.1.0,<2.0.0`.
- Update supported Python versions to `>=3.10,<3.15` and align CI workflows to Python 3.10–3.14.
- Update all affected processor subclasses (`Qwen2/Qwen2.5/Qwen3`, `Gemma3`, `Idefics3`, `ModernVBert`, `Qwen2.5 Omni`) to explicit `__init__` modality signatures required by Transformers v5 `ProcessorMixin`.

### Fixed

- Fix ColPali/PaliGemma model loading under Transformers v5 by adapting wrapper internals to new module layout and tied-weights expectations.
- Fix ColPali processor loading for checkpoints without a complete processor bundle by explicitly falling back to `AutoImageProcessor` + `AutoTokenizer`.
- Fix ColPali collator image token id lookup to use `convert_tokens_to_ids`, compatible with Transformers v5 tokenizer backend changes.
- Fix test collection on Python 3.14 by making `tests` an explicit package (`tests/__init__.py`).
- Fix CI formatting failure by applying `ruff format` to updated ColPali processing tests.
- Fix ColQwen2 and ColQwen2.5 initialization across Transformers versions by resolving hidden size from either `config.hidden_size` or `config.text_config.hidden_size`.
- Call `post_init()` in ColIdefics3 and ColModernVBert to align model initialization with Transformers v5 expectations.
- Improve `VisualRetrieverCollator` image token id resolution by preferring processor-level `image_token_id` when available.
- Fix ColQwen2 and ColQwen2.5 LoRA checkpoint key remapping for `custom_text_proj` (`base_model.model.*` -> model keys) to avoid missing/unexpected adapter keys at load time.
- Fix ColPali LoRA adapter key remapping for `custom_text_proj` (`base_model.model.*` -> model keys) and ignore expected missing `model.lm_head.weight` during load.
- Fix ColModernVBert LoRA adapter key remapping for `custom_text_proj` (`base_model.model.*` -> model keys) to avoid missing/unexpected adapter keys at load time.
- Fix ColQwen2.5-Omni LoRA adapter key remapping for `custom_text_proj` (`base_model.model.*` -> model keys) to avoid missing/unexpected adapter keys at load time.
- Fix ColQwen3 LoRA adapter key remapping for `custom_text_proj` (`base_model.model.*` -> model keys) to avoid missing/unexpected adapter keys at load time.
- Fix ColGemma3 LoRA adapter key remapping for `custom_text_proj` (`base_model.model.*` -> model keys) to avoid missing/unexpected adapter keys at load time.
- Ensure adapter loading remains robust across Transformers v5 base-load and PEFT adapter-load code paths, preventing silent fallback to randomly initialized projection adapters in retrieval models.
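The `base_model.model.*` remapping entries above all follow the same pattern; a minimal, self-contained sketch (illustrative helper name and sample keys, not the library's exact code) of what such a regex key remap does to a checkpoint state dict:

```python
import re

# PEFT saves LoRA checkpoints with keys prefixed "base_model.model.", which
# the plain model classes do not expect, so keys are renamed before loading.
KEY_MAPPING = {
    r"^base_model\.model\.custom_text_proj": "custom_text_proj",
}

def remap_state_dict_keys(state_dict, key_mapping=KEY_MAPPING):
    """Return a copy of `state_dict` with keys renamed per the regex mapping."""
    remapped = {}
    for key, value in state_dict.items():
        for pattern, replacement in key_mapping.items():
            key = re.sub(pattern, replacement, key)
        remapped[key] = value
    return remapped

checkpoint = {
    "base_model.model.custom_text_proj.weight": "proj_w",
    "model.language_model.embed_tokens.weight": "embed_w",
}
print(sorted(remap_state_dict_keys(checkpoint)))
# ['custom_text_proj.weight', 'model.language_model.embed_tokens.weight']
```

Without the remap, `custom_text_proj` weights would be reported as unexpected keys and the projection layer would silently stay randomly initialized.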

### Tests

- Cover ColQwen3 processing and modeling with slow integration tests.
- Run targeted non-slow processing tests for Gemma3, Idefics3, ModernVBert, Qwen2, Qwen2.5 and Qwen3 after the Transformers v5 processor-signature migration.
- Run slow ColPali model-loading and query-forward integration tests under Transformers v5 to validate end-to-end loading behavior.
- Expand adapter checkpoint key remapping regression tests to cover ColPali, ColGemma3, ColQwen2, ColQwen2.5, ColQwen3, ColQwen2.5-Omni and ColModernVBert, including registry-backed conversion checks where needed.

## [0.3.13] - 2025-11-15

11 changes: 5 additions & 6 deletions colpali_engine/collators/visual_retriever_collator.py
@@ -39,12 +39,11 @@ def __init__(

        # If processor is one of the supported types, extract the <image> token id.
        if isinstance(self.processor, (ColPaliProcessor,)):
-            image_token = "<image>"
-            try:
-                idx = self.processor.tokenizer.additional_special_tokens.index(image_token)
-                self.image_token_id = self.processor.tokenizer.additional_special_tokens_ids[idx]
-            except ValueError:
-                self.image_token_id = None
+            if hasattr(self.processor, "image_token_id"):
+                token_id = self.processor.image_token_id
+            else:
+                token_id = self.processor.tokenizer.convert_tokens_to_ids("<image>")
+            self.image_token_id = token_id if token_id is not None and token_id >= 0 else None

# Force padding to be on the right for ColPaliProcessor.
if isinstance(self.processor, ColPaliProcessor) and self.processor.tokenizer.padding_side != "right":
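The collator change above boils down to a small fallback chain; a standalone sketch with stub classes (the `Fake*` names and the token id 257152 are made up for illustration):

```python
class FakeTokenizer:
    def convert_tokens_to_ids(self, token):
        # Stub: pretend "<image>" maps to id 257152 (illustrative value only)
        return 257152 if token == "<image>" else -1

class FakeProcessor:
    # No processor-level `image_token_id`, so the tokenizer fallback is used.
    tokenizer = FakeTokenizer()

class FakeProcessorWithId:
    # A processor-level id takes precedence over the tokenizer lookup.
    image_token_id = 99
    tokenizer = FakeTokenizer()

def resolve_image_token_id(processor):
    """Prefer processor.image_token_id; else resolve "<image>" via the tokenizer."""
    if hasattr(processor, "image_token_id"):
        token_id = processor.image_token_id
    else:
        token_id = processor.tokenizer.convert_tokens_to_ids("<image>")
    # None or a negative id (an "unknown token" sentinel) means no image token.
    return token_id if token_id is not None and token_id >= 0 else None

print(resolve_image_token_id(FakeProcessor()))        # 257152
print(resolve_image_token_id(FakeProcessorWithId()))  # 99
```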
13 changes: 11 additions & 2 deletions colpali_engine/models/gemma3/bigemma3/processing_bigemma.py
@@ -22,10 +22,19 @@ class BiGemmaProcessor3(BaseVisualRetrieverProcessor, Gemma3Processor):  # noqa:

    def __init__(
        self,
-        *args,
+        image_processor,
+        tokenizer,
+        chat_template=None,
+        image_seq_length: int = 256,
        **kwargs,
    ):
-        super().__init__(*args, **kwargs)
+        super().__init__(
+            image_processor=image_processor,
+            tokenizer=tokenizer,
+            chat_template=chat_template,
+            image_seq_length=image_seq_length,
+            **kwargs,
+        )
self.tokenizer.padding_side = "left"

@classmethod
6 changes: 5 additions & 1 deletion colpali_engine/models/gemma3/colgemma3/modeling_colgemma.py
@@ -37,6 +37,9 @@ class ColGemma3(Gemma3Model):
"""

    main_input_name: ClassVar[str] = "doc_input_ids"  # transformers-related
+    _checkpoint_conversion_mapping = {
+        r"^base_model\.model\.custom_text_proj": "custom_text_proj",
+    }

def __init__(
self,
@@ -54,7 +57,8 @@ def __init__(
    def from_pretrained(cls, *args, **kwargs):
        key_mapping = kwargs.pop("key_mapping", None)
        if key_mapping is None:
-            key_mapping = super()._checkpoint_conversion_mapping
+            key_mapping = dict(getattr(super(), "_checkpoint_conversion_mapping", {}))
+            key_mapping.update(cls._checkpoint_conversion_mapping)
        return super().from_pretrained(*args, **kwargs, key_mapping=key_mapping)

@property
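The `from_pretrained` override above merges the inherited conversion mapping with the subclass's own entries; a pure-Python sketch of that merge with stand-in classes (`Base`/`Retriever` are hypothetical, not the real model hierarchy):

```python
class Base:
    # Stand-in for a parent class that already declares checkpoint renames.
    _checkpoint_conversion_mapping = {
        r"^model\.language_model": "model.model.language_model",
    }

class Retriever(Base):
    # Subclass adds its own rename for the custom projection head.
    _checkpoint_conversion_mapping = {
        r"^base_model\.model\.custom_text_proj": "custom_text_proj",
    }

    @classmethod
    def merged_key_mapping(cls):
        # Start from the parent mapping (if any), then layer subclass entries
        # on top, so inherited renames are preserved rather than replaced.
        mapping = dict(getattr(Base, "_checkpoint_conversion_mapping", {}))
        mapping.update(cls._checkpoint_conversion_mapping)
        return mapping

print(len(Retriever.merged_key_mapping()))  # 2: parent rename + subclass rename
```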
13 changes: 11 additions & 2 deletions colpali_engine/models/gemma3/colgemma3/processing_colgemma.py
@@ -49,10 +49,19 @@ class ColGemmaProcessor3(BaseVisualRetrieverProcessor, Gemma3Processor):

    def __init__(
        self,
-        *args,
+        image_processor,
+        tokenizer,
+        chat_template=None,
+        image_seq_length: int = 256,
        **kwargs,
    ):
-        super().__init__(*args, **kwargs)
+        super().__init__(
+            image_processor=image_processor,
+            tokenizer=tokenizer,
+            chat_template=chat_template,
+            image_seq_length=image_seq_length,
+            **kwargs,
+        )
# Set padding side to left (important for decoder-only models)
self.tokenizer.padding_side = "left"

@@ -20,6 +20,7 @@ def __init__(self, config, mask_non_image_embeddings: bool = False):
        self.linear = nn.Linear(self.model.config.text_config.hidden_size, self.dim)
        self.mask_non_image_embeddings = mask_non_image_embeddings
        self.main_input_name = "doc_input_ids"
+        self.post_init()

def forward(self, *args, **kwargs):
"""
@@ -24,8 +24,21 @@ class ColIdefics3Processor(
image_token: ClassVar[str] = "<image>"
visual_prompt_prefix: ClassVar[str] = "<|im_start|>User:<image>Describe the image.<end_of_utterance>\nAssistant:"

-    def __init__(self, *args, image_seq_len=64, **kwargs):
-        super().__init__(*args, image_seq_len=image_seq_len, **kwargs)
+    def __init__(
+        self,
+        image_processor,
+        tokenizer=None,
+        image_seq_len=64,
+        chat_template=None,
+        **kwargs,
+    ):
+        super().__init__(
+            image_processor=image_processor,
+            tokenizer=tokenizer,
+            image_seq_len=image_seq_len,
+            chat_template=chat_template,
+            **kwargs,
+        )
self.tokenizer.padding_side = "left"

def process_images(
@@ -1,9 +1,14 @@
 from torch import nn
+from transformers.conversion_mapping import get_checkpoint_conversion_mapping, register_checkpoint_conversion_mapping
+from transformers.core_model_loading import WeightRenaming

from colpali_engine.models.modernvbert.modeling_modernvbert import ModernVBertModel, ModernVBertPreTrainedModel


 class ColModernVBert(ModernVBertPreTrainedModel):
+    _checkpoint_conversion_mapping = {
+        r"^base_model\.model\.custom_text_proj": "custom_text_proj",
+    }
"""
Initializes the ColModernVBert model.

@@ -26,6 +31,15 @@ def __init__(self, config, mask_non_image_embeddings: bool = False, **kwargs):
        self.custom_text_proj = nn.Linear(self.model.config.text_config.hidden_size, self.dim)
        self.mask_non_image_embeddings = mask_non_image_embeddings
        self.main_input_name = "doc_input_ids"
+        self.post_init()
+
+    @classmethod
+    def from_pretrained(cls, *args, **kwargs):
+        key_mapping = kwargs.pop("key_mapping", None)
+        if key_mapping is None:
+            key_mapping = dict(getattr(super(), "_checkpoint_conversion_mapping", {}))
+            key_mapping.update(cls._checkpoint_conversion_mapping)
+        return super().from_pretrained(*args, **kwargs, key_mapping=key_mapping)

def forward(self, *args, **kwargs):
"""
@@ -50,3 +64,13 @@ def forward(self, *args, **kwargs):
            image_mask = (kwargs["input_ids"] == self.config.image_token_id).unsqueeze(-1)
            proj = proj * image_mask
        return proj
+
+
+if get_checkpoint_conversion_mapping("modernvbert") is None:
+    register_checkpoint_conversion_mapping(
+        "modernvbert",
+        [
+            WeightRenaming(source_patterns=k, target_patterns=v)
+            for k, v in ColModernVBert._checkpoint_conversion_mapping.items()
+        ],
+    )
@@ -26,8 +26,21 @@ class ColModernVBertProcessor(
"<|begin_of_text|>User:<image>Describe the image.<end_of_utterance>\nAssistant:"
)

-    def __init__(self, *args, image_seq_len=64, **kwargs):
-        super().__init__(*args, image_seq_len=image_seq_len, **kwargs)
+    def __init__(
+        self,
+        image_processor,
+        tokenizer=None,
+        image_seq_len=64,
+        chat_template=None,
+        **kwargs,
+    ):
+        super().__init__(
+            image_processor=image_processor,
+            tokenizer=tokenizer,
+            image_seq_len=image_seq_len,
+            chat_template=chat_template,
+            **kwargs,
+        )
self.tokenizer.padding_side = "left"

def process_images(
51 changes: 28 additions & 23 deletions colpali_engine/models/paligemma/bipali/modeling_bipali.py
@@ -7,6 +7,7 @@


 class BiPali(PaliGemmaPreTrainedModel):
+    _keys_to_ignore_on_load_missing = [r"model\.lm_head\.weight"]
    """
    BiPali is an implementation from the "ColPali: Efficient Document Retrieval with Vision Language Models" paper.
    Representations are average pooled to obtain a single vector representation.
@@ -17,6 +18,7 @@ class BiPali(PaliGemmaPreTrainedModel):
        "^model.vision_tower": "model.model.vision_tower",
        "^model.multi_modal_projector": "model.model.multi_modal_projector",
        "^model.language_model.lm_head": "model.lm_head",
+        r"^base_model\.model\.custom_text_proj": "custom_text_proj",
    }

@classmethod
@@ -29,36 +31,37 @@ def from_pretrained(cls, *args, **kwargs):
    def __init__(self, config: PaliGemmaConfig):
        super(BiPali, self).__init__(config=config)
        model: PaliGemmaForConditionalGeneration = PaliGemmaForConditionalGeneration(config)
-        if model.language_model._tied_weights_keys is not None:
-            self._tied_weights_keys = [f"model.language_model.{k}" for k in model.language_model._tied_weights_keys]
+        if model.model.language_model._tied_weights_keys is not None:
+            self._tied_weights_keys = [
+                f"model.model.language_model.{k}" for k in model.model.language_model._tied_weights_keys
+            ]
        self.model: PaliGemmaForConditionalGeneration = model
        self.model.lm_head = torch.nn.Identity()
        self.main_input_name = "doc_input_ids"
+        self.post_init()

    def get_input_embeddings(self):
-        return self.model.language_model.get_input_embeddings()
+        return self.model.model.language_model.get_input_embeddings()

    def set_input_embeddings(self, value):
-        self.model.language_model.set_input_embeddings(value)
+        self.model.model.language_model.set_input_embeddings(value)

    def get_output_embeddings(self):
-        return self.model.language_model.get_output_embeddings()
+        return self.model.model.language_model.get_output_embeddings()

    def set_output_embeddings(self, new_embeddings):
-        self.model.language_model.set_output_embeddings(new_embeddings)
+        self.model.model.language_model.set_output_embeddings(new_embeddings)

    def set_decoder(self, decoder):
-        self.model.language_model.set_decoder(decoder)
+        self.model.model.language_model.set_decoder(decoder)

    def get_decoder(self):
-        return self.model.language_model.get_decoder()
+        return self.model.model.language_model.get_decoder()

-    def tie_weights(self):
-        return self.model.language_model.tie_weights()
+    def tie_weights(self, *args, **kwargs):
+        return self.model.model.language_model.tie_weights(*args, **kwargs)

    def resize_token_embeddings(self, new_num_tokens: Optional[int] = None, pad_to_multiple_of=None) -> nn.Embedding:
-        model_embeds = self.model.language_model.resize_token_embeddings(new_num_tokens, pad_to_multiple_of)
+        model_embeds = self.model.model.language_model.resize_token_embeddings(new_num_tokens, pad_to_multiple_of)
        # update vocab size
        self.config.text_config.vocab_size = model_embeds.num_embeddings
        self.config.vocab_size = model_embeds.num_embeddings
@@ -89,37 +92,39 @@ class BiPaliProj(PaliGemmaPreTrainedModel):
    def __init__(self, config: PaliGemmaConfig):
        super(BiPaliProj, self).__init__(config=config)
        model: PaliGemmaForConditionalGeneration = PaliGemmaForConditionalGeneration(config)
-        if model.language_model._tied_weights_keys is not None:
-            self._tied_weights_keys = [f"model.language_model.{k}" for k in model.language_model._tied_weights_keys]
+        if model.model.language_model._tied_weights_keys is not None:
+            self._tied_weights_keys = [
+                f"model.model.language_model.{k}" for k in model.model.language_model._tied_weights_keys
+            ]
        self.model: PaliGemmaForConditionalGeneration = model
        self.main_input_name = "doc_input_ids"
        self.dim = 1024
        self.custom_text_proj = nn.Linear(self.model.config.text_config.hidden_size, self.dim)
+        self.post_init()

    def get_input_embeddings(self):
-        return self.model.language_model.get_input_embeddings()
+        return self.model.model.language_model.get_input_embeddings()

    def set_input_embeddings(self, value):
-        self.model.language_model.set_input_embeddings(value)
+        self.model.model.language_model.set_input_embeddings(value)

    def get_output_embeddings(self):
-        return self.model.language_model.get_output_embeddings()
+        return self.model.model.language_model.get_output_embeddings()

    def set_output_embeddings(self, new_embeddings):
-        self.model.language_model.set_output_embeddings(new_embeddings)
+        self.model.model.language_model.set_output_embeddings(new_embeddings)

    def set_decoder(self, decoder):
-        self.model.language_model.set_decoder(decoder)
+        self.model.model.language_model.set_decoder(decoder)

    def get_decoder(self):
-        return self.model.language_model.get_decoder()
+        return self.model.model.language_model.get_decoder()

-    def tie_weights(self):
-        return self.model.language_model.tie_weights()
+    def tie_weights(self, *args, **kwargs):
+        return self.model.model.language_model.tie_weights(*args, **kwargs)

    def resize_token_embeddings(self, new_num_tokens: Optional[int] = None, pad_to_multiple_of=None) -> nn.Embedding:
-        model_embeds = self.model.language_model.resize_token_embeddings(new_num_tokens, pad_to_multiple_of)
+        model_embeds = self.model.model.language_model.resize_token_embeddings(new_num_tokens, pad_to_multiple_of)
        # update vocab size
        self.config.text_config.vocab_size = model_embeds.num_embeddings
        self.config.vocab_size = model_embeds.num_embeddings