Eval bug: ggml_cuda_compute_forward: SOFT_MAX failed 0.11.348.725 E CUDA error: invalid argument

### Name and Version

`build-cuda-debug/bin/llama-cli --version`
version: 9621 (597b6672e)
built with GNU 13.3.0 for Linux x86_64

[llama.log](https://github.com/user-attachments/files/29434701/llama.log)

### Operating systems

Linux

### GGML backends

CUDA

### Hardware

AMD Ryzen 5900X + 2x NVIDIA RTX 3060

### Models

unsloth/Qwen3.5-35B-A3B/Qwen3.5-35B-A3B-UD-IQ4_NL.gguf

### Problem description & steps to reproduce

See the attached log (`llama.log`).

### First Bad Commit

_No response_

### Relevant log output

<details>
<summary>Logs</summary>


```console
0.11.338.063 D ggml_cuda_graph_check_compability: disabling CUDA graphs due to unsupported node type
0.11.348.030 E ggml_cuda_compute_forward: SOFT_MAX failed
0.11.348.725 E CUDA error: invalid argument
0.11.348.728 E   current device: 0, in function ggml_cuda_compute_forward at /home/lf/codes/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:3163
0.11.348.729 E   err

```
</details>
