feat(inference): Implement Q3_K dequantization in aprender-serve #1892

@noahgift

Description

The aprender-serve inference engine does not support Q3_K (GGML type 11) dequantization. Loading a Q3_K model (e.g., qwen2.5-7b-instruct-q3_k_m.gguf) causes inference to fail with:

Inference failed: Operation 'get_tensor_f32' not supported: Unsupported quantization type: 11

Acceptance Criteria

  • Implement dequantize_q3_k in crates/aprender-serve/src/quantize/ (both standard and SIMD/parallel variants if applicable).
  • Hook the implementation into the get_tensor_f32 match arm in crates/aprender-serve/src/gguf/metadata.rs.
  • Add coverage tests for Q3_K tensors.
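
For reference, here is a minimal scalar sketch of what the first item might look like. It assumes the standard ggml Q3_K block layout for 256-weight super-blocks (32 bytes of high-bit mask, 64 bytes of packed 2-bit low values, 12 bytes of packed 6-bit scales, then a little-endian f16 super-scale, 110 bytes in total). The function signature, the `String` error type, and the local `f16_to_f32` helper are hypothetical and chosen only to keep the example self-contained; the real crate may already have its own conventions for these.

```rust
const QK_K: usize = 256; // weights per super-block
const BLOCK_BYTES: usize = 110; // 32 (hmask) + 64 (qs) + 12 (scales) + 2 (f16 d)

/// Hypothetical helper: convert IEEE 754 half-precision bits to f32.
fn f16_to_f32(bits: u16) -> f32 {
    let sign = u32::from(bits >> 15) << 31;
    let exp = u32::from((bits >> 10) & 0x1f);
    let frac = u32::from(bits & 0x3ff);
    match exp {
        0 => {
            // Zero or subnormal: value = frac * 2^-24.
            let v = frac as f32 / 16_777_216.0;
            if sign != 0 { -v } else { v }
        }
        0x1f => f32::from_bits(sign | 0x7f80_0000 | (frac << 13)), // inf / NaN
        _ => f32::from_bits(sign | ((exp + 112) << 23) | (frac << 13)),
    }
}

/// Sketch of scalar Q3_K dequantization over raw tensor bytes.
pub fn dequantize_q3_k(data: &[u8]) -> Result<Vec<f32>, String> {
    if data.len() % BLOCK_BYTES != 0 {
        return Err(format!(
            "Q3_K data length {} is not a multiple of {}",
            data.len(),
            BLOCK_BYTES
        ));
    }
    let mut out = Vec::with_capacity(data.len() / BLOCK_BYTES * QK_K);
    for block in data.chunks_exact(BLOCK_BYTES) {
        let hmask = &block[0..32];
        let qs = &block[32..96];
        let packed = &block[96..108];
        let d = f16_to_f32(u16::from_le_bytes([block[108], block[109]]));

        // Unpack sixteen 6-bit scales: low 4 bits live in bytes 0..8,
        // high 2 bits live in bytes 8..12. Stored with an offset of 32.
        let mut scales = [0i32; 16];
        for j in 0..16 {
            let low = if j < 8 { packed[j] & 0x0f } else { packed[j - 8] >> 4 };
            let high = (packed[8 + j % 4] >> (2 * (j / 4))) & 0x03;
            scales[j] = i32::from(low | (high << 4)) - 32;
        }

        // Two halves of 128 weights; each half reads 32 bytes of qs at
        // four 2-bit shifts, with one hmask bit per (half, shift) pair.
        for half in 0..2 {
            let q = &qs[half * 32..half * 32 + 32];
            for j in 0..4 {
                let shift = 2 * j;
                let bit = 1u8 << (half * 4 + j);
                for sub in 0..2 {
                    let dl = d * scales[half * 8 + j * 2 + sub] as f32;
                    for l in 0..16 {
                        let idx = sub * 16 + l;
                        let lo = i32::from((q[idx] >> shift) & 3);
                        // A cleared mask bit means "subtract 4".
                        let hi = if hmask[idx] & bit != 0 { 0 } else { 4 };
                        out.push(dl * (lo - hi) as f32);
                    }
                }
            }
        }
    }
    Ok(out)
}
```

A byte-wise scale unpack is used here instead of the `uint32_t` mask trick in the ggml reference code so that the sketch does not depend on host endianness. A coverage test could start from hand-built blocks (all-zero payload, all-ones payload) whose expected outputs are easy to derive, then compare against reference output from a known-good implementation.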
