
Feature Request: Router mode - Do not reload model files if they are already loaded. #25066

Description

@mcr-ksh

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Don't (re)load the model weights if they are already loaded.

[Qwen3.5-9b]
m = o:\models\llama.cpp\Qwen3.5-9B-SOMPOA-heresy-MTP-Q6_K.gguf
mmproj = o:\models\llama.cpp\mmproj-Qwen3.5-9b-SOMPOA-heresy-MTP-BF16.gguf
chat-template-file = o:\models\llama.cpp\qwen_chat_template.jinja

[Qwen3.5-9b-noreason]
m = o:\models\llama.cpp\Qwen3.5-9B-SOMPOA-heresy-MTP-Q6_K.gguf
mmproj = o:\models\llama.cpp\mmproj-Qwen3.5-9b-SOMPOA-heresy-MTP-BF16.gguf
chat-template-file = o:\models\llama.cpp\qwen_chat_template.jinja
chat-template-kwargs = {"enable_thinking":false}
reasoning = 0
reasoning-budget = 0
;ctx-size = 111072
temp = 0.7
top-p = 0.95
top-k = 20

Let's say I want to serve the same model with different parameters. A full reload is not actually required in that case, and it terminates in-flight requests to the other preset even though both presets use the same model files.
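A minimal Python sketch makes the point concrete. It uses only the standard-library `configparser` to parse the two presets above and report which keys they share and which differ. `diff_presets` is a hypothetical helper for illustration, not part of llama.cpp. The weight files (`m`, `mmproj`) and the template file are identical. Only the template kwargs, reasoning settings and sampling parameters differ.

```python
import configparser

# The two presets from this issue, copied verbatim (";ctx-size" is a comment line).
PRESETS = r"""
[Qwen3.5-9b]
m = o:\models\llama.cpp\Qwen3.5-9B-SOMPOA-heresy-MTP-Q6_K.gguf
mmproj = o:\models\llama.cpp\mmproj-Qwen3.5-9b-SOMPOA-heresy-MTP-BF16.gguf
chat-template-file = o:\models\llama.cpp\qwen_chat_template.jinja

[Qwen3.5-9b-noreason]
m = o:\models\llama.cpp\Qwen3.5-9B-SOMPOA-heresy-MTP-Q6_K.gguf
mmproj = o:\models\llama.cpp\mmproj-Qwen3.5-9b-SOMPOA-heresy-MTP-BF16.gguf
chat-template-file = o:\models\llama.cpp\qwen_chat_template.jinja
chat-template-kwargs = {"enable_thinking":false}
reasoning = 0
reasoning-budget = 0
;ctx-size = 111072
temp = 0.7
top-p = 0.95
top-k = 20
"""


def diff_presets(text: str, a: str, b: str):
    """Hypothetical helper: split two preset sections into shared and differing keys."""
    # interpolation=None so JSON values and backslash paths are taken literally
    cp = configparser.ConfigParser(interpolation=None)
    cp.read_string(text)
    sa, sb = dict(cp[a]), dict(cp[b])
    shared = {k: v for k, v in sa.items() if sb.get(k) == v}
    # A key missing from one section shows up as None on that side
    differing = {
        k: (sa.get(k), sb.get(k))
        for k in sorted(sa.keys() | sb.keys())
        if sa.get(k) != sb.get(k)
    }
    return shared, differing


if __name__ == "__main__":
    shared, differing = diff_presets(PRESETS, "Qwen3.5-9b", "Qwen3.5-9b-noreason")
    print("shared:", sorted(shared))
    print("differing:", sorted(differing))
```

Nothing in the shared set changes between the two presets, which is exactly the case where a reload could, in principle, be skipped.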

Motivation

Efficiency: running one model on a small GPU with different parameters.

Possible Implementation

Store the hash or file path of the currently loaded model, and compare it with the requested model's to decide whether a reload is required.
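A minimal Python sketch of that comparison follows. It assumes a router that keeps one loaded model per slot. `ModelSlot`, `weights_key` and `WEIGHT_KEYS` are hypothetical names, not llama.cpp internals. The key is built from the normalized `m` and `mmproj` paths. Passing `use_hash=True` also hashes the file contents, which catches a file replaced at the same path but has to read the whole file. That is slow for multi-GB GGUF files. Treating only `m` and `mmproj` as reload triggers is an assumption: other load-time options would presumably need to be part of the key too.

```python
import hashlib
import os

# Assumption: only these preset keys name weight files; other load-time options
# (e.g. GPU offload settings) would presumably also need to be part of the key.
WEIGHT_KEYS = ("m", "mmproj")


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Content hash of a file, read in chunks so large GGUF files are not held in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def weights_key(preset: dict, use_hash: bool = False) -> tuple:
    """Identify the weights a preset needs: normalized path per key, optional content hash."""
    parts = []
    for k in WEIGHT_KEYS:
        path = preset.get(k)
        if path is None:
            parts.append((k, None, None))
            continue
        norm = os.path.normcase(os.path.abspath(path))
        parts.append((k, norm, file_sha256(path) if use_hash else None))
    return tuple(parts)


class ModelSlot:
    """Hypothetical router slot: remembers what is loaded and skips redundant reloads."""

    def __init__(self, use_hash: bool = False):
        self.use_hash = use_hash
        self.loaded_key = None
        self.load_count = 0  # counts expensive loads, for illustration only

    def activate(self, preset: dict) -> bool:
        """Return True if weights were (re)loaded, False if the loaded weights are reused."""
        key = weights_key(preset, self.use_hash)
        if key == self.loaded_key:
            # Same weight files: only per-request settings (sampling, template kwargs) change
            return False
        self._load(preset)
        self.loaded_key = key
        return True

    def _load(self, preset: dict) -> None:
        # Placeholder for unloading the old model and loading the new weights
        self.load_count += 1
```

For example, `ModelSlot().activate({"m": "model.gguf"})` returns `True` on first use. A second `activate` on the same slot with the same `m` but a different `temp` returns `False`, so the loaded weights are reused. Comparing paths is cheap; hashing trades startup time for robustness against files overwritten in place.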

Metadata
Labels

enhancement (New feature or request)