Description
I installed the latest wheel, llama_cpp_python-0.3.23+cu130.basic-cp313-win_amd64.whl.
First I had to manually rename it, adding a second "cp313" (the missing ABI tag) to its name, because otherwise pip rejected the install with an invalid-filename error:
python -m pip install https://github.com/JamePeng/llama-cpp-python/releases/download/v0.3.23-cu130-Basic-win-20260127/llama_cpp_python-0.3.23+cu130.basic-cp313-win_amd64.whl
ERROR: Invalid wheel filename (wrong number of parts): 'llama_cpp_python-0.3.23+cu130.basic-cp313-win_amd64'
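For reference, this is the rename that made pip accept the wheel (wheel filenames need five dash-separated parts, so the second cp313 fills the ABI-tag slot):
ren llama_cpp_python-0.3.23+cu130.basic-cp313-win_amd64.whl llama_cpp_python-0.3.23+cu130.basic-cp313-cp313-win_amd64.whl
python -m pip install llama_cpp_python-0.3.23+cu130.basic-cp313-cp313-win_amd64.whl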
After installing it, when I try to run create_chat_completion() with model_path="Qwen3VL-4B-Instruct.Q8_0.gguf" and clip_model_path="Qwen3VL-4B-Instruct.mmproj-Q8_0.gguf", I get this error:
File "D:\ComfyUI\custom_nodes\my-llama-cpp\nodes.py", line 428, in generate
response = _global_llm.create_chat_completion(
messages=messages,
...<6 lines>...
response_format=response_format_param,
)
File "D:\python_embeded\Lib\site-packages\llama_cpp\llama.py", line 2269, in create_chat_completion
return handler(
llama=self,
...<38 lines>...
grammar=grammar,
)
File "D:\python_embeded\Lib\site-packages\llama_cpp\llama_chat_format.py", line 4289, in __call__
return super().__call__(**kwargs)
~~~~~~~~~~~~~~~~^^^^^^^^^^
File "D:\python_embeded\Lib\site-packages\llama_cpp\llama_chat_format.py", line 2969, in __call__
self._init_mtmd_context(llama)
~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
File "D:\python_embeded\Lib\site-packages\llama_cpp\llama_chat_format.py", line 2883, in _init_mtmd_context
raise ValueError(f"Failed to load mtmd context from: {self.clip_model_path}")
ValueError: Failed to load mtmd context from: d:\models\Qwen3VL-4B-Instruct.mmproj-Q8_0.gguf
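The call that triggers this is essentially the following sketch. Qwen3VLChatHandler is only my guess at the fork's handler class name; substitute whatever this build actually exposes for Qwen3-VL:
from llama_cpp import Llama
# Assumption: the fork ships an mtmd-based Qwen3-VL handler under a name like this.
from llama_cpp.llama_chat_format import Qwen3VLChatHandler

handler = Qwen3VLChatHandler(
    clip_model_path=r"d:\models\Qwen3VL-4B-Instruct.mmproj-Q8_0.gguf"
)
llm = Llama(
    model_path=r"d:\models\Qwen3VL-4B-Instruct.Q8_0.gguf",
    chat_handler=handler,
    n_ctx=4096,
    n_gpu_layers=-1,
)
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Describe the image."}],
)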
I tested another pair of Qwen2.5-VL models in GGUF format and got the same result.
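For anyone reproducing this, here is a quick stdlib-only sanity check on the mmproj path (hypothetical local path; the first four bytes of a valid GGUF file are the ASCII magic b"GGUF"). If this passes, the file itself is fine and the failure is inside the mtmd loader:
from pathlib import Path

# Hypothetical path -- point this at the local mmproj file.
mmproj = Path(r"d:\models\Qwen3VL-4B-Instruct.mmproj-Q8_0.gguf")

assert mmproj.is_file(), f"file not found: {mmproj}"
with mmproj.open("rb") as f:
    magic = f.read(4)
print("magic:", magic, "size:", mmproj.stat().st_size)
# Anything other than b"GGUF" here would point to a truncated or corrupt
# download rather than a loader regression.
assert magic == b"GGUF", "not a GGUF file"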
Before this, I was building and installing the latest version, 0.3.16, from the original repository (https://github.com/abetlen/llama-cpp-python), and the mmproj models worked fine there. The problem is that version 0.3.16 in the original repository does not support Qwen3VL.
I also tried simply updating the vendor/llama.cpp submodule in the original repository via "git checkout master & git pull" and rebuilding (rough sequence below), but during the build I got errors from the CUDA compiler in the mtmd module.
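The rebuild sequence was roughly this (Windows cmd; the CMAKE_ARGS flag is the documented way to enable CUDA when building upstream llama-cpp-python from source):
cd vendor\llama.cpp
git checkout master
git pull
cd ..\..
set CMAKE_ARGS=-DGGML_CUDA=on
python -m pip install . --force-reinstall --no-cache-dir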
My system:
Windows 11
NVIDIA-SMI 591.74
Driver Version: 591.74
CUDA Version: 13.1
NVIDIA GeForce RTX 5070 Ti 16GiB
Python 3.13.9
fastapi 0.128.0
numpy 2.2.6
sse-starlette 3.1.2
uvicorn 0.40.0
pip 25.3