Description
The aprender-serve inference engine does not currently support dequantizing Q3_K (GGML type 11) tensors. Loading a Q3_K model (e.g., `qwen2.5-7b-instruct-q3_k_m.gguf`) therefore makes inference fail with:
Inference failed: Operation 'get_tensor_f32' not supported: Unsupported quantization type: 11
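For context, here is a minimal Rust sketch of what a type-11 tensor contains. It follows the upstream GGML `block_q3_K` layout with QK_K = 256. Each 256-weight super-block stores the low 2 bits of every 3-bit quant (`qs`), the high bit (`hmask`), sixteen 6-bit sub-block scales packed into 12 bytes, and an f16 super-scale `d`. That comes to 110 bytes, or about 3.44 bits per weight. The names `BlockQ3K`, `parse_block` and `q3_k_tensor_bytes` are hypothetical illustrations, not existing aprender-serve APIs.

```rust
/// GGML type id for Q3_K, as reported in the error message.
const GGML_TYPE_Q3_K: u32 = 11;
/// Elements per K-quant super-block (QK_K in GGML).
const QK_K: usize = 256;
/// 32 (hmask) + 64 (qs) + 12 (scales) + 2 (f16 d) = 110 bytes.
const BLOCK_Q3_K_BYTES: usize = QK_K / 8 + QK_K / 4 + 12 + 2;

/// Hypothetical borrowed view of one Q3_K super-block.
struct BlockQ3K<'a> {
    hmask: &'a [u8],  // 32 bytes: high bit of each 3-bit quant
    qs: &'a [u8],     // 64 bytes: low 2 bits of each quant
    scales: &'a [u8], // 12 bytes: 16 packed 6-bit sub-block scales
    d_bits: u16,      // raw f16 bits of the super-block scale (little-endian)
}

/// Split a 110-byte slice into its fields; returns None on a size mismatch.
fn parse_block(raw: &[u8]) -> Option<BlockQ3K<'_>> {
    if raw.len() != BLOCK_Q3_K_BYTES {
        return None;
    }
    Some(BlockQ3K {
        hmask: &raw[0..32],
        qs: &raw[32..96],
        scales: &raw[96..108],
        d_bits: u16::from_le_bytes([raw[108], raw[109]]),
    })
}

/// Bytes occupied by a Q3_K tensor of `n_elements` weights (must be a multiple of 256).
fn q3_k_tensor_bytes(n_elements: usize) -> Option<usize> {
    (n_elements % QK_K == 0).then(|| n_elements / QK_K * BLOCK_Q3_K_BYTES)
}
```

A quick size check like `q3_k_tensor_bytes` is a cheap guard against reading past the end of a tensor's data before dequantizing it.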
Acceptance Criteria
- Implement `dequantize_q3_k` in `crates/aprender-serve/src/quantize/` (both standard and SIMD/parallel variants, if applicable).
- Hook the implementation into the `get_tensor_f32` match arm in `crates/aprender-serve/src/gguf/metadata.rs`.
- Add coverage tests for `Q3_K` tensors.
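A possible starting point for the scalar path is sketched below. It is a port of the reference `dequantize_row_q3_K` from upstream ggml, under these assumptions:

- The function name comes from this issue, but its signature `(data: &[u8], n_elements: usize) -> Result<Vec<f32>, String>` is hypothetical.
- The helpers `f16_to_f32` and `unpack_q3_k_scales` are hypothetical.
- `dequantize_by_type` is a hypothetical stand-in for the `get_tensor_f32` match arm. The real crate's error type and match structure will differ.

```rust
const QK_K: usize = 256;
const BLOCK_Q3_K_BYTES: usize = 110; // 32 hmask + 64 qs + 12 scales + 2 d

/// IEEE 754 half -> f32 (stable Rust has no f16 primitive, so decode by hand).
fn f16_to_f32(bits: u16) -> f32 {
    let sign = if bits & 0x8000 != 0 { -1.0f32 } else { 1.0 };
    let exp = ((bits >> 10) & 0x1f) as i32;
    let mant = (bits & 0x03ff) as f32;
    match exp {
        0 => sign * mant * 2f32.powi(-24), // zero or subnormal
        31 => if mant == 0.0 { sign * f32::INFINITY } else { f32::NAN },
        _ => sign * (1.0 + mant / 1024.0) * 2f32.powi(exp - 15),
    }
}

/// Unpack sixteen 6-bit scales from 12 bytes and re-center 0..=63 to -32..=31.
fn unpack_q3_k_scales(raw: &[u8]) -> [i8; 16] {
    const KMASK1: u32 = 0x0303_0303;
    const KMASK2: u32 = 0x0f0f_0f0f;
    let word = |i: usize| u32::from_le_bytes([raw[4 * i], raw[4 * i + 1], raw[4 * i + 2], raw[4 * i + 3]]);
    let (a0, a1, hi) = (word(0), word(1), word(2));
    // Low 4 bits come from bytes 0..8, the top 2 bits from bytes 8..12.
    let aux = [
        (a0 & KMASK2) | ((hi & KMASK1) << 4),
        (a1 & KMASK2) | (((hi >> 2) & KMASK1) << 4),
        ((a0 >> 4) & KMASK2) | (((hi >> 4) & KMASK1) << 4),
        ((a1 >> 4) & KMASK2) | (((hi >> 6) & KMASK1) << 4),
    ];
    let mut out = [0i8; 16];
    for (i, w) in aux.iter().enumerate() {
        for (j, b) in w.to_le_bytes().iter().enumerate() {
            out[4 * i + j] = *b as i8 - 32;
        }
    }
    out
}

/// Dequantize `n_elements` Q3_K weights into f32 (scalar reference path).
pub fn dequantize_q3_k(data: &[u8], n_elements: usize) -> Result<Vec<f32>, String> {
    if n_elements % QK_K != 0 {
        return Err(format!("Q3_K element count {n_elements} is not a multiple of {QK_K}"));
    }
    let n_blocks = n_elements / QK_K;
    if data.len() < n_blocks * BLOCK_Q3_K_BYTES {
        return Err(format!("Q3_K data too short: {} bytes for {n_blocks} blocks", data.len()));
    }
    let mut out = Vec::with_capacity(n_elements);
    for block in data.chunks_exact(BLOCK_Q3_K_BYTES).take(n_blocks) {
        let (hmask, qs) = (&block[0..32], &block[32..96]);
        let scales = unpack_q3_k_scales(&block[96..108]);
        let d = f16_to_f32(u16::from_le_bytes([block[108], block[109]]));
        let mut is = 0; // sub-block (scale) index, 0..16
        let mut m: u8 = 1; // hmask bit selecting the current 32-weight row
        for half in 0..2 {
            let q = &qs[half * 32..half * 32 + 32]; // 128 weights per half
            for shift in (0..8).step_by(2) {
                for l0 in [0usize, 16] {
                    let dl = d * f32::from(scales[is]);
                    is += 1;
                    for l in l0..l0 + 16 {
                        let low = ((q[l] >> shift) & 3) as i8;
                        // A cleared high bit means "subtract 4", giving -4..=3.
                        let high: i8 = if hmask[l] & m != 0 { 0 } else { 4 };
                        out.push(dl * f32::from(low - high));
                    }
                }
                m = m.wrapping_shl(1); // the final shift past bit 7 is never read
            }
        }
    }
    Ok(out)
}

/// Hypothetical stand-in for the `get_tensor_f32` match arm.
fn dequantize_by_type(ggml_type: u32, data: &[u8], n: usize) -> Result<Vec<f32>, String> {
    match ggml_type {
        11 => dequantize_q3_k(data, n), // Q3_K
        other => Err(format!("Unsupported quantization type: {other}")),
    }
}
```

Each 110-byte super-block decodes independently of its neighbours. A SIMD or parallel variant could therefore split `data` on block boundaries and fill disjoint output ranges, and use the scalar version above as the reference in the coverage tests.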