Description
Integrate a turbo-quant implementation for llama.cpp.
Use Case
There is a turbo-quant implementation at https://github.com/unixsysdev/llama-turboquant that may be worth integrating alongside the other amd-strix-halo optimizations.
The more we can squeeze out of consumer hardware, the better.
Proposed Solution
No response
Alternatives Considered
No response