Failed to load mtmd context from separate mmproj gguf file #54

@biaks

Description

I installed the latest wheel, llama_cpp_python-0.3.23+cu130.basic-cp313-win_amd64.whl.

First I had to rename it manually, adding a second "cp313" part (the missing ABI tag), because otherwise pip rejected the filename:

python -m pip install https://github.com/JamePeng/llama-cpp-python/releases/download/v0.3.23-cu130-Basic-win-20260127/llama_cpp_python-0.3.23+cu130.basic-cp313-win_amd64.whl
ERROR: Invalid wheel filename (wrong number of parts): 'llama_cpp_python-0.3.23+cu130.basic-cp313-win_amd64'
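
For reference, the workaround looked roughly like this: download the wheel, add the missing ABI tag to the filename, then install the local file (commands approximate):

curl -L -O https://github.com/JamePeng/llama-cpp-python/releases/download/v0.3.23-cu130-Basic-win-20260127/llama_cpp_python-0.3.23+cu130.basic-cp313-win_amd64.whl
ren llama_cpp_python-0.3.23+cu130.basic-cp313-win_amd64.whl llama_cpp_python-0.3.23+cu130.basic-cp313-cp313-win_amd64.whl
python -m pip install llama_cpp_python-0.3.23+cu130.basic-cp313-cp313-win_amd64.whl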

After installing it, when I try to run create_chat_completion() with model_path="Qwen3VL-4B-Instruct.Q8_0.gguf" and clip_model_path="Qwen3VL-4B-Instruct.mmproj-Q8_0.gguf", I get this error:

  File "D:\ComfyUI\custom_nodes\my-llama-cpp\nodes.py", line 428, in generate
    response = _global_llm.create_chat_completion(
        messages=messages,
    ...<6 lines>...
        response_format=response_format_param,
    )
  File "D:\python_embeded\Lib\site-packages\llama_cpp\llama.py", line 2269, in create_chat_completion
    return handler(
        llama=self,
    ...<38 lines>...
        grammar=grammar,
    )
  File "D:\python_embeded\Lib\site-packages\llama_cpp\llama_chat_format.py", line 4289, in __call__
    return super().__call__(**kwargs)
           ~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "D:\python_embeded\Lib\site-packages\llama_cpp\llama_chat_format.py", line 2969, in __call__
    self._init_mtmd_context(llama)
    ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
  File "D:\python_embeded\Lib\site-packages\llama_cpp\llama_chat_format.py", line 2883, in _init_mtmd_context
    raise ValueError(f"Failed to load mtmd context from: {self.clip_model_path}")
ValueError: Failed to load mtmd context from: d:\models\Qwen3VL-4B-Instruct.mmproj-Q8_0.gguf

I tested it with another pair of Qwen2.5-VL models in GGUF format and got the same result.
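
For context, here is a simplified sketch of how the model is constructed and called in nodes.py. The chat-handler class name is a placeholder (I use whichever multimodal handler the build provides for Qwen3VL), and the paths and parameters are illustrative:

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler  # placeholder; the actual handler is the fork's Qwen3VL-capable one

# The mmproj GGUF goes to the chat handler, the main GGUF to Llama itself
chat_handler = Llava15ChatHandler(
    clip_model_path=r"d:\models\Qwen3VL-4B-Instruct.mmproj-Q8_0.gguf",
)
_global_llm = Llama(
    model_path=r"d:\models\Qwen3VL-4B-Instruct.Q8_0.gguf",
    chat_handler=chat_handler,
    n_ctx=8192,       # illustrative
    n_gpu_layers=-1,  # offload everything to the GPU
)

# This is the call that raises "Failed to load mtmd context from: ..."
response = _global_llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "file:///d:/images/example.png"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
)
print(response["choices"][0]["message"]["content"])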

Before this, I was building and installing the latest release, 0.3.16, from the original repository (https://github.com/abetlen/llama-cpp-python), and the mmproj models worked fine there; the problem is that 0.3.16 in the original repository does not support Qwen3VL.
I also tried simply updating the vendor/llama.cpp submodule in the original repository via git checkout master && git pull and rebuilding (roughly the steps shown below), but the build failed in the mtmd module with errors from the CUDA compiler.
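
The rebuild attempt looked approximately like this (Windows shell; the CMAKE_ARGS CUDA flag is the build option documented in the upstream README):

cd llama-cpp-python\vendor\llama.cpp
git checkout master
git pull
cd ..\..
set CMAKE_ARGS=-DGGML_CUDA=on
python -m pip install . --force-reinstall --no-cache-dir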

My system:

Windows 11

NVIDIA-SMI 591.74
Driver Version: 591.74
CUDA Version: 13.1
NVIDIA GeForce RTX 5070 Ti 16GiB

Python 3.13.9
fastapi 0.128.0
numpy 2.2.6
sse-starlette 3.1.2
uvicorn 0.40.0
pip 25.3
