
Eval bug: ggml_cuda_compute_forward: SOFT_MAX failed 0.11.348.725 E CUDA error: invalid argument #25095

Description

@freemanliu

Name and Version

build-cuda-debug/bin/llama-cli --version
version: 9621 (597b667)
built with GNU 13.3.0 for Linux x86_64

llama.log

Operating systems

Linux

GGML backends

CUDA

Hardware

AMD Ryzen 9 5900X + 2x NVIDIA RTX 3060.

Models

unsloth/Qwen3.5-35B-A3B/Qwen3.5-35B-A3B-UD-IQ4_NL.gguf

Problem description & steps to reproduce

See the attached log (llama.log).

First Bad Commit

No response

Relevant log output

Logs
0.11.338.063 D ggml_cuda_graph_check_compability: disabling CUDA graphs due to unsupported node type
0.11.348.030 E ggml_cuda_compute_forward: SOFT_MAX failed
0.11.348.725 E CUDA error: invalid argument
0.11.348.728 E   current device: 0, in function ggml_cuda_compute_forward at /home/lf/codes/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:3163
0.11.348.729 E   err
