Prerequisites
Feature Description
Don't (re-)load the model weights if they are already loaded.
[Qwen3.5-9b]
m = o:\models\llama.cpp\Qwen3.5-9B-SOMPOA-heresy-MTP-Q6_K.gguf
mmproj = o:\models\llama.cpp\mmproj-Qwen3.5-9b-SOMPOA-heresy-MTP-BF16.gguf
chat-template-file = o:\models\llama.cpp\qwen_chat_template.jinja
[Qwen3.5-9b-noreason]
m = o:\models\llama.cpp\Qwen3.5-9B-SOMPOA-heresy-MTP-Q6_K.gguf
mmproj = o:\models\llama.cpp\mmproj-Qwen3.5-9b-SOMPOA-heresy-MTP-BF16.gguf
chat-template-file = o:\models\llama.cpp\qwen_chat_template.jinja
chat-template-kwargs = {"enable_thinking":false}
reasoning = 0
reasoning-budget = 0
;ctx-size = 111072
temp = 0.7
top-p = 0.95
top-k = 20
Let's say I want to serve the same model under several presets that differ only in their parameters. A full reload is not actually required in that case, yet it happens anyway and terminates the other preset's requests, even though they are going to the same model.
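To make the situation concrete, here is a minimal sketch that finds presets pointing at the same weight files. It is only an illustration, not llama.cpp code: `group_by_weights` is a hypothetical helper, the preset text is an abbreviated copy of the config above, and comparing paths case-insensitively is an assumption that fits Windows paths.

```python
import configparser
from collections import defaultdict

# Abbreviated copy of the two presets from this issue.
PRESETS = r"""
[Qwen3.5-9b]
m = o:\models\llama.cpp\Qwen3.5-9B-SOMPOA-heresy-MTP-Q6_K.gguf
mmproj = o:\models\llama.cpp\mmproj-Qwen3.5-9b-SOMPOA-heresy-MTP-BF16.gguf

[Qwen3.5-9b-noreason]
m = o:\models\llama.cpp\Qwen3.5-9B-SOMPOA-heresy-MTP-Q6_K.gguf
mmproj = o:\models\llama.cpp\mmproj-Qwen3.5-9b-SOMPOA-heresy-MTP-BF16.gguf
reasoning-budget = 0
temp = 0.7
"""


def group_by_weights(ini_text):
    """Map (model path, mmproj path) -> list of preset names using them."""
    # interpolation=None keeps '%' and backslashes in values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    groups = defaultdict(list)
    for name in parser.sections():
        section = parser[name]
        # Assumption: paths are compared case-insensitively (Windows).
        key = (section.get("m", "").lower(), section.get("mmproj", "").lower())
        groups[key].append(name)
    return dict(groups)


# Presets that share weights and so would not need a reload between them.
shared = [names for names in group_by_weights(PRESETS).values() if len(names) > 1]
print(shared)
```

Both presets end up in one group, which is exactly the case where switching between them should not need to touch the weights.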
Motivation
Efficiency: this would allow running one model on a small GPU with different parameters.
Possible Implementation
Store the hash or file path of the currently loaded model and compare it against the requested model to decide whether a reload is required.
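A rough sketch of that comparison, written in Python for brevity rather than as a patch to the server. All names here (`ModelIdentity`, `identify`, `WeightCache`) are hypothetical. Instead of hashing a multi-gigabyte file, this version assumes that resolved path plus file size and modification time is a good enough identity; a real content hash could be swapped in.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelIdentity:
    """Cheap stand-in for a content hash (assumption: path+size+mtime suffices)."""
    path: str
    size: int
    mtime_ns: int


def identify(path):
    # Resolve symlinks and normalise case so equivalent paths compare equal.
    real = os.path.normcase(os.path.realpath(path))
    st = os.stat(real)
    return ModelIdentity(real, st.st_size, st.st_mtime_ns)


class WeightCache:
    """Tracks which model file is loaded and reloads only when it changes."""

    def __init__(self):
        self.loaded = None
        self.load_count = 0

    def ensure_loaded(self, path):
        """Return True if a (re)load was performed, False if weights were reused."""
        wanted = identify(path)
        if wanted == self.loaded:
            return False  # same file: keep weights, only sampling params differ
        # Placeholder: a real server would load the weights here.
        self.loaded = wanted
        self.load_count += 1
        return True
```

With this in place, two presets that name the same `.gguf` file would trigger one load: the first `ensure_loaded` call returns `True`, the second returns `False`, and requests to the first preset would not need to be interrupted.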