Eval bug: CUDA error: unsupported value or parameter in cublasSgemm_v2 during large context processing #25061

### Name and Version

version: 9775 (be4a6a63e)
built with GNU 13.3.0 for Linux x86_64


### Operating systems

Linux

### GGML backends

CUDA

### Hardware

2 x RTX 3060

### Models

Qwen3.6-35B-A3B-MTP-GGUF/Qwen3.6-35B-A3B-UD-IQ4_XS.gguf

### Problem description & steps to reproduce

## Bug Description

The llama.cpp server (`llama-server`) crashes with a CUDA error while processing a very large prompt (about 105,000 tokens in a 131,072-token context slot). The error is raised in `cublasSgemm_v2` during matrix multiplication.

## Error Message

### First Bad Commit


## Stack Trace

#0  __GI___wait4
#1  ggml_print_backtrace
#2  ggml_abort
#3  ggml_cuda_error
#4  ggml_cuda_op_mul_mat_cublas
#5  ggml_cuda_op_mul_mat
#6  ggml_backend_cuda_graph_compute
#7  ggml_backend_sched_graph_compute_async
#8  llama_context::graph_compute
#9  llama_context::process_ubatch
#10 llama_context::decode
#11 llama_decode
#12 server_context_impl::decode
#13 server_context_impl::update_slots
#14 server_queue::start_loop
#15 llama_server
#16 __libc_start_call_main
#17 __libc_start_main_impl
#18 _start


### Relevant log output

<details>
## Context Information from Logs

- **Slot ID**: 1
- **Task ID**: 19910
- **Context slot size (n_ctx_slot)**: 131,072 tokens
- **Prompt tokens (task.n_tokens)**: 105,375 tokens
- **Cached tokens**: 105,371 (after incremental caching)
- **Context checkpoint created**: 32 of 32 checkpoints
- **Checkpoint size**: 269.744 MiB
- **Position**: 104,923 tokens

## Environment

- **OS**: Linux (likely Ubuntu or Debian, inferred from paths)
- **llama.cpp path**: `/opt/llama.cpp-beta/`
- **CUDA device**: 0
- **Build type**: Beta branch (`llama.cpp-beta`)

## Steps to Reproduce

1. Start llama.cpp server with a model supporting large context (n_ctx = 131072)
2. Send a prompt with approximately 105,375 tokens
3. The server processes the prompt, creates context checkpoints, caches tokens
4. During the decode phase, the CUDA error occurs in `cublasSgemm_v2`

## Additional Notes

- The error happens after successfully caching most of the prompt (105,371 out of 105,375 tokens)
- Context checkpoints were being created successfully (32 checkpoints)
- The error specifically mentions "an unsupported value or parameter was passed to the function" in cuBLAS
- This suggests a potential integer overflow or dimension mismatch when passing matrix dimensions to `cublasSgemm_v2`
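
The cuBLAS documentation describes the conditions under which `cublasSgemm` rejects its size arguments: a negative `m`, `n`, or `k`, or a leading dimension smaller than the row count of the corresponding operand. As a minimal host-side sketch (no CUDA needed), the check below mirrors those documented conditions and also tests whether 64-bit sizes still fit the 32-bit `int` parameters of the classic API. `GemmArgs` and `check_sgemm_args` are hypothetical names, not part of llama.cpp or cuBLAS, and the actual arguments passed at the time of the crash are not known from this report.

```cpp
#include <algorithm>
#include <cstdint>
#include <initializer_list>
#include <limits>
#include <string>

// Hypothetical container for the size arguments of one single-precision GEMM.
struct GemmArgs {
    bool    trans_a, trans_b;   // true if the operand is transposed
    int64_t m, n, k;            // C is m x n, inner dimension is k
    int64_t lda, ldb, ldc;      // leading dimensions (column-major storage)
};

// Returns an empty string if the arguments pass, otherwise a short reason.
std::string check_sgemm_args(const GemmArgs & a) {
    // The classic cuBLAS GEMM entry points take these sizes as 32-bit ints.
    const int64_t int_max = std::numeric_limits<int>::max();
    for (int64_t v : {a.m, a.n, a.k, a.lda, a.ldb, a.ldc}) {
        if (v > int_max) return "value does not fit in a 32-bit int";
    }
    if (a.m < 0 || a.n < 0 || a.k < 0) return "negative m, n, or k";

    // Rows of each stored operand depend on whether it is transposed.
    const int64_t rows_a = a.trans_a ? a.k : a.m;
    const int64_t rows_b = a.trans_b ? a.n : a.k;
    if (a.lda < std::max<int64_t>(1, rows_a)) return "lda too small";
    if (a.ldb < std::max<int64_t>(1, rows_b)) return "ldb too small";
    if (a.ldc < std::max<int64_t>(1, a.m))    return "ldc too small";
    return "";
}
```

Logging the result of a check like this just before the GEMM call would show whether the failure comes from the size arguments or from something else, such as a null pointer.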

## Possible Related Issues

- Large matrix dimensions causing integer overflow in cuBLAS parameters
- Context size exceeding certain CUDA/cuBLAS limits
- Memory alignment issues with very large tensors
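
Whether any size in this setup actually leaves the 32-bit range depends on tensor shapes that the report does not give. As a rough sketch of the first hypothesis, the helper below tests whether an element count `rows * cols` would exceed the largest 32-bit signed value. The function name is hypothetical, and the row width used in the note after the code is an illustrative assumption, not a value taken from the model.

```cpp
#include <cstdint>
#include <limits>

// True if rows * cols cannot be represented as a 32-bit signed int.
// Assumes non-negative inputs. Uses a division instead of forming the
// product, so the test itself cannot overflow.
bool element_count_overflows_int32(int64_t rows, int64_t cols) {
    if (rows == 0 || cols == 0) return false;
    return rows > std::numeric_limits<int32_t>::max() / cols;
}
```

With the report's context length of 131,072 and a hypothetical row width of 16,384, the product is 2^31, one past the largest 32-bit signed value, so the function returns `true`. A width of 2,048 gives 2^28 and returns `false`.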
<summary>Logs</summary>


```console

```
</details>



Server launch command:

/opt/llama.cpp-beta/build/bin/llama-server
-m /mnt/disco2/models/Qwen3.6-35B-A3B-MTP-GGUF/Qwen3.6-35B-A3B-UD-IQ4_NL.gguf
-mm /mnt/disco2/models/Qwen3.6-35B-A3B-GGUF/mmproj-Qwen3.6-35B-A3B-BF16.gguf
--image-min-tokens 1024
-mg 1
-ngl 999
-sm layer
-ts 10,12
-t 8
-tb 6
--temp 0.6
--top-p 0.95
--top-k 20
--min-p 0.0
--presence-penalty 0.0
--repeat-penalty 1.0
--reasoning on
--spec-type draft-mtp
--spec-draft-n-max 3
--jinja
-b 256
-ub 256
-np 3
--cache-idle-slots
-c 131072
-fa on
-kvu
--cache-type-k q8_0
--cache-type-v q8_0
--host 0.0.0.0
--port 11434
