Local inference via llama.cpp is slow.
For example, with Qwen3.5 2b on my 8 GB Snapdragon 8 Gen 3 device,
asking a simple question like "what's the capital of ....",
Operit takes at least 1 minute, while Google's AI Edge Gallery app with the same model answers in under 10 seconds.
My guess is that the llama.cpp build you ship runs on the CPU only, while Google's app can use the GPU or NPU.
I suggest updating llama.cpp (or enabling a GPU backend) and making it clear which hardware the model will run on before setup, because CPU-only inference on these low-powered devices is essentially not worth the trouble.
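For reference, upstream llama.cpp already ships GPU backends that work on mobile SoCs (Vulkan, and an OpenCL backend targeting Adreno GPUs like the one in the Snapdragon 8 Gen 3). A rough sketch of what enabling this could look like, assuming a standard upstream build (the model filename here is just a placeholder):

```shell
# Build llama.cpp with the Vulkan backend enabled
# (the OpenCL backend, -DGGML_OPENCL=ON, is the Adreno-focused alternative)
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# At run time, offload model layers to the GPU with -ngl;
# a large value offloads as many layers as fit
./build/bin/llama-cli -m qwen-2b.gguf -ngl 99 -p "What's the capital of France?"
```

If the app exposed something like the `-ngl` (GPU layer offload) setting and reported which backend was actually selected, users could see up front whether they're getting CPU-only inference.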