Remove hardcoded number of output groups and rank #347

Open
LifeDJIK wants to merge 1 commit into antirez:main from LifeDJIK:main

LifeDJIK commented Jun 6, 2026

What changed:
Replaced the hardcoded values for n_groups and rank with DS4_N_OUT_GROUP and DS4_N_LORA_O.

Why:
This allows Pro models to run on the CPU backend. With the hardcoded values, ds4 stops with the error "grouped Q8_0 tensor has an unexpected layout". Example run on CPU with this change applied:

$ ./ds4 -m DeepSeek-V4-Pro-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-Instruct-imatrix.gguf
ds4: context buffers 631.09 MiB (ctx=32768, backend=cpu, prefill_chunk=0, raw_kv_rows=128, compressed_kv_rows=8194)
Commands:
  /help          Show this help.
  /think         Use normal thinking mode.
  /think-max     Use Think Max only when context is at least 393216 tokens.
  /nothink       Disable thinking mode.
  /ctx N         Set context size for following prompts.
  /power N       Set GPU duty cycle percentage, 1..100.
  /read FILE     Read a prompt from FILE and run it.
  /quit, /exit   Leave the prompt.
  Ctrl+C         Stop generation and return to the prompt.
ds4> /nothink
Thinking mode: none.
ds4> Hi!
processing 11 input tokens: 11/11 (100.0%)
Hello! How can I help you today?
ds4: prefill: 0.62 t/s, generation: 0.57 t/s
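
For readers unfamiliar with the code, the idea behind the change can be sketched roughly as follows. This is not the actual ds4 source: the function names, the shape of the layout check (rows equal to groups times rank, columns a multiple of the Q8_0 block size), and the numeric values given to DS4_N_OUT_GROUP and DS4_N_LORA_O are all assumptions made for illustration. Only the two macro names and the general approach (use the model constants instead of literals) come from this PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder values for illustration only. The real definitions live in
 * the ds4 sources and may differ per model variant. */
#ifndef DS4_N_OUT_GROUP
#define DS4_N_OUT_GROUP 8
#endif
#ifndef DS4_N_LORA_O
#define DS4_N_LORA_O 1024
#endif

#define Q8_0_BLOCK 32 /* Q8_0 quantizes values in blocks of 32 */

/* Hypothetical layout check: a grouped tensor is accepted when it has
 * n_groups * rank rows and a column count that is a whole number of
 * Q8_0 blocks. The real check in ds4 may look different. */
bool grouped_q8_0_layout_ok(size_t rows, size_t cols,
                            size_t n_groups, size_t rank) {
    if (n_groups == 0 || rank == 0) return false;
    if (cols == 0 || cols % Q8_0_BLOCK != 0) return false;
    return rows == n_groups * rank;
}

/* Before (assumed): literals baked into the call, so a model with a
 * different group count or rank is rejected. */
bool check_hardcoded(size_t rows, size_t cols) {
    return grouped_q8_0_layout_ok(rows, cols, 8, 1024);
}

/* After (assumed): the check follows the model constants, so it adapts
 * when those constants are defined differently for another variant. */
bool check_from_constants(size_t rows, size_t cols) {
    assert(DS4_N_OUT_GROUP > 0 && DS4_N_LORA_O > 0);
    return grouped_q8_0_layout_ok(rows, cols,
                                  DS4_N_OUT_GROUP, DS4_N_LORA_O);
}
```

In this sketch, a tensor with 16 * 2048 rows is rejected by check_hardcoded but would be accepted by check_from_constants in a build where the two macros are defined as 16 and 2048.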
