Eval bug: Mistral Small 4 (mistral4 arch): repetitive/empty output on Metal

### Name and Version

 llama.cpp: b8390-b6c83aad5


### Operating systems

Mac

### GGML backends

Metal

### Hardware


**Environment:**
- llama.cpp: b8390-b6c83aad5
- Device: Apple M3 Ultra 512GB
- GGUF: unsloth/Mistral-Small-4-119B-2603-GGUF Q4_K_XL
- FA on/off: no difference



cc @ngxson

### Models

Mistral-Small-4

### Problem description & steps to reproduce

**Symptoms:**
1. Chat mode (server with `--jinja`): the model repeats the same text endlessly
   ("汉书后汉书后汉书后...")
2. Think/reasoning mode: no output, hangs
3. Pure completion (`--no-conversation`): intermittent; sometimes the output is
   correct, sometimes it falls into a repetition loop
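
Because the completion-mode failure is intermittent, it may help to classify the output of repeated runs automatically rather than by eye. Below is a minimal sketch of such a check, not part of llama.cpp: the function names `has_repetition_loop` and `classify_output` and the thresholds `max_unit` and `min_repeats` are hypothetical choices made for illustration. It only inspects captured text and assumes the generated output has already been saved as a string.

```python
def has_repetition_loop(text: str, max_unit: int = 32, min_repeats: int = 8) -> bool:
    """Return True if the text ends with a short unit repeated many times.

    max_unit and min_repeats are illustrative thresholds, not tuned values.
    """
    text = text.rstrip()
    for unit_len in range(1, max_unit + 1):
        # The tail must be long enough to hold min_repeats copies of the unit.
        if len(text) < unit_len * min_repeats:
            break
        unit = text[-unit_len:]
        if text.endswith(unit * min_repeats):
            return True
    return False


def classify_output(text: str) -> str:
    """Map one captured generation to 'empty', 'loop', or 'ok'."""
    if not text.strip():
        return "empty"
    if has_repetition_loop(text):
        return "loop"
    return "ok"


if __name__ == "__main__":
    samples = [
        "汉书后" * 20,                       # repetition as in symptom 1
        " \n",                               # blank reply as in the log
        "The capital of France is Paris.",   # a healthy answer
    ]
    for sample in samples:
        print(classify_output(sample))
```

Running the same prompt several times and tallying the three labels would give a rough failure rate to compare across builds or backends.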


### First Bad Commit

_No response_

### Relevant log output

<details>
<summary>Logs</summary>


```console
 ./build/bin/llama-cli \
  -m unsloth/Mistral-Small-4-119B-2603-GGUF/Mistral-Small-4-119B-2603-UD-Q4_K_XL-00001-of-00003.gguf \
  -ngl 99 \
  -c 4096 \
  -n 512 \
  --no-conversation \
  -p '[SYSTEM_PROMPT]You are a helpful assistant[/SYSTEM_PROMPT][MODEL_SETTINGS]{"reasoning_effort": "high"}[/MODEL_SETTINGS][INST]What is the capital of France?[/INST]'
ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.006 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name:   MTL0
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9  (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4  (5002)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: has tensor            = false
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 498216.21 MB
--no-conversation is not supported by llama-cli
please use llama-completion instead

Loading model...  


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b8390-b6c83aad5
model      : Mistral-Small-4-119B-2603-UD-Q4_K_XL-00001-of-00003.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read               add a text file


> [SYSTEM_PROMPT]You are a helpful assistant[/SYSTEM_PROMPT][MODEL_SETTINGS]{"reasoning_effort": "high"}[/MODEL_SETTINGS][INST]What is the capital of France?[/INST]

 

[ Prompt: 206.2 t/s | Generation: 71.3 t/s ]

> who are you

 

[ Prompt: 128.1 t/s | Generation: 71.3 t/s ]

> 

Exiting...
llama_memory_breakdown_print: | memory breakdown [MiB]    |  total     free     self   model   context   compute    unaccounted |
llama_memory_breakdown_print: |   - MTL0 (Apple M3 Ultra) | 475136 = 404407 + (70728 = 70374 +      90 +     264) +           0 |
llama_memory_breakdown_print: |   - Host                  |                      568 =   544 +       0 +      24                |
ggml_metal_free: deallocating

```
</details>
