feat: turboquant optimizations for llama.cpp

### Description

Add a TurboQuant implementation for llama.cpp.

### Use Case

There is a TurboQuant implementation at https://github.com/unixsysdev/llama-turboquant that might be worth integrating alongside the other amd-strix-halo optimizations.

The more we can squeeze out of consumer hardware, the better.

### Proposed Solution

_No response_

### Alternatives Considered

_No response_