Failed to load mtmd context from separate mmproj gguf file #54

@biaks

Description

I installed the latest wheel, llama_cpp_python-0.3.23+cu130.basic-cp313-win_amd64.whl.

First I had to rename it manually, adding a second "cp313" part (the missing ABI tag), because otherwise pip rejected the filename:

python -m pip install https://github.com/JamePeng/llama-cpp-python/releases/download/v0.3.23-cu130-Basic-win-20260127/llama_cpp_python-0.3.23+cu130.basic-cp313-win_amd64.whl
ERROR: Invalid wheel filename (wrong number of parts): 'llama_cpp_python-0.3.23+cu130.basic-cp313-win_amd64'
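
For reference, the workaround looked roughly like this: download the wheel, add the missing ABI tag to the filename, then install the local file (commands approximate):

curl -L -O https://github.com/JamePeng/llama-cpp-python/releases/download/v0.3.23-cu130-Basic-win-20260127/llama_cpp_python-0.3.23+cu130.basic-cp313-win_amd64.whl
ren llama_cpp_python-0.3.23+cu130.basic-cp313-win_amd64.whl llama_cpp_python-0.3.23+cu130.basic-cp313-cp313-win_amd64.whl
python -m pip install llama_cpp_python-0.3.23+cu130.basic-cp313-cp313-win_amd64.whl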

After installing it, when I try to run create_chat_completion() with model_path="Qwen3VL-4B-Instruct.Q8_0.gguf" and clip_model_path="Qwen3VL-4B-Instruct.mmproj-Q8_0.gguf", I get this error:

  File "D:\ComfyUI\custom_nodes\my-llama-cpp\nodes.py", line 428, in generate
    response = _global_llm.create_chat_completion(
        messages=messages,
    ...<6 lines>...
        response_format=response_format_param,
    )
  File "D:\python_embeded\Lib\site-packages\llama_cpp\llama.py", line 2269, in create_chat_completion
    return handler(
        llama=self,
    ...<38 lines>...
        grammar=grammar,
    )
  File "D:\python_embeded\Lib\site-packages\llama_cpp\llama_chat_format.py", line 4289, in __call__
    return super().__call__(**kwargs)
           ~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "D:\python_embeded\Lib\site-packages\llama_cpp\llama_chat_format.py", line 2969, in __call__
    self._init_mtmd_context(llama)
    ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
  File "D:\python_embeded\Lib\site-packages\llama_cpp\llama_chat_format.py", line 2883, in _init_mtmd_context
    raise ValueError(f"Failed to load mtmd context from: {self.clip_model_path}")
ValueError: Failed to load mtmd context from: d:\models\Qwen3VL-4B-Instruct.mmproj-Q8_0.gguf

I tested it with another pair of Qwen2.5-VL models in GGUF format and got the same result.
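
For context, here is a simplified sketch of how the model is constructed and called in nodes.py. The chat-handler class name is a placeholder (I use whichever multimodal handler the build provides for Qwen3VL), and the paths and parameters are illustrative:

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler  # placeholder; the actual handler is the fork's Qwen3VL-capable one

# The mmproj GGUF goes to the chat handler, the main GGUF to Llama itself
chat_handler = Llava15ChatHandler(
    clip_model_path=r"d:\models\Qwen3VL-4B-Instruct.mmproj-Q8_0.gguf",
)
_global_llm = Llama(
    model_path=r"d:\models\Qwen3VL-4B-Instruct.Q8_0.gguf",
    chat_handler=chat_handler,
    n_ctx=8192,       # illustrative
    n_gpu_layers=-1,  # offload everything to the GPU
)

# This is the call that raises "Failed to load mtmd context from: ..."
response = _global_llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "file:///d:/images/example.png"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
)
print(response["choices"][0]["message"]["content"])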

Before this, I was building and installing the latest release, 0.3.16, from the original repository (https://github.com/abetlen/llama-cpp-python), and the mmproj models worked fine there; the problem is that 0.3.16 in the original repository does not support Qwen3VL.
I also tried simply updating the vendor/llama.cpp submodule in the original repository via git checkout master && git pull and rebuilding (roughly the steps shown below), but the build failed in the mtmd module with errors from the CUDA compiler.
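
The rebuild attempt looked approximately like this (Windows shell; the CMAKE_ARGS CUDA flag is the build option documented in the upstream README):

cd llama-cpp-python\vendor\llama.cpp
git checkout master
git pull
cd ..\..
set CMAKE_ARGS=-DGGML_CUDA=on
python -m pip install . --force-reinstall --no-cache-dir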

My system:

Windows 11

NVIDIA-SMI 591.74
Driver Version: 591.74
CUDA Version: 13.1
NVIDIA GeForce RTX 5070 Ti 16GiB

Python 3.13.9
fastapi 0.128.0
numpy 2.2.6
sse-starlette 3.1.2
uvicorn 0.40.0
pip 25.3
