feat: turboquant optimizations for llama.cpp #863

@prelegalwonder

Description

turbo-quant implementation for llama.cpp

Use Case

There is a turbo-quant implementation at https://github.com/unixsysdev/llama-turboquant that might be worth integrating alongside the other amd-strix-halo optimizations.

The more we can squeeze out of consumer hardware, the better.

Proposed Solution

No response

Alternatives Considered

No response

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)
