can't get local llama.cpp to work #458

Description

@fu8765

Local inference through llama.cpp is slow.
As an example, running Qwen3.5 2B on my 8 GB Snapdragon 8 Gen 3 device and asking a sample question of "what's the capital of ....":

Operit takes at least 1 minute, while Google's AI Edge Gallery app with the same model answers in under 10 seconds.

My guess is that the llama.cpp build Operit ships runs CPU-only, while Google's app can use the GPU or NPU.
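
For reference, a minimal sketch of what GPU offload looks like through the llama.cpp C API, assuming a recent llama.cpp build; the model path is just a placeholder, and `n_gpu_layers` only has an effect if the library was compiled with a GPU backend (e.g. Vulkan or OpenCL on Android). Older llama.cpp versions spell the loader `llama_load_model_from_file`.

```c
#include <stdio.h>
#include "llama.h"

int main(void) {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    // 0 keeps every layer on the CPU (the behaviour I suspect Operit
    // has today); a large value offloads all layers, but only does
    // anything if llama.cpp was built with a GPU backend enabled.
    mparams.n_gpu_layers = 99;

    // Placeholder path; any GGUF model works here.
    struct llama_model *model =
        llama_model_load_from_file("/sdcard/models/model.gguf", mparams);
    if (model == NULL) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```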

I suggest updating llama.cpp, and making it clear before setup which hardware backend it will actually run on, since CPU-only inference on these low-powered devices is essentially not worth the trouble.
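
On the "make it clear" part, llama.cpp already reports what it was compiled with, so the app could surface this before the user even downloads a model. A minimal sketch, again assuming a recent llama.cpp:

```c
#include <stdio.h>
#include "llama.h"

int main(void) {
    // Returns a static string listing the SIMD features and backends
    // the library was compiled with (NEON, Vulkan, etc.); showing
    // this in the UI would tell users whether they are CPU-only.
    printf("%s\n", llama_print_system_info());
    return 0;
}
```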
