Bug
The GCC pragmas in the SIMD source files enable AVX-512 instruction generation:
#pragma GCC target("avx2,fma,f16c,avx512f,avx512bw,avx512dq,avx512vl,avx512vnni")
This appears in three files:
quant/simd_dot.c:11
quant/simd_qq_dot.c:11
quant/cpool.c:7
Even though the code doesn't use explicit _mm512 intrinsics, GCC's auto-vectorizer emits AVX-512 instructions (e.g. EVEX-prefixed ops like vmovdqu64) when compiling with -O3 and these target flags. This causes a SIGILL crash on CPUs that only support AVX2.
Reproduction
On an Intel i7-8700K (AVX2+FMA, no AVX-512):
$ dlgo run tinyllama-1.1b-chat-v1.0.Q4_0.gguf
SIGILL: illegal instruction
PC=0x724334 m=4 sigcode=2
instruction bytes: 0x62 0x71 0xfd 0x28 0x6f ...
goroutine 1 [syscall]:
runtime.cgocall(...)
github.com/computerex/dlgo/quant._Cfunc_batch_quantize_for_type(...)
The 0x62 prefix byte is the EVEX prefix used by AVX-512 instructions.
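To double-check that the faulting bytes really are an EVEX encoding, the prefix fields can be decoded by hand. The following is a minimal sketch, not part of dlgo: `evex_info` and `decode_evex` are hypothetical names, it assumes 64-bit mode (where a leading 0x62 byte is always EVEX), and it ignores the case where the vector-length bits are reused for embedded rounding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: the EVEX fields needed to identify the opcode. */
typedef struct {
    int map;       /* opcode map: 1 = 0F, 2 = 0F38, 3 = 0F3A */
    int w;         /* EVEX.W */
    int pp;        /* implied legacy prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2 */
    int vec_bits;  /* vector length from L'L: 128, 256 or 512 */
    uint8_t opcode;
} evex_info;

/* Returns 1 and fills *out if p[0..len) starts with an EVEX prefix
 * (64-bit mode assumed), otherwise returns 0. */
static int decode_evex(const uint8_t *p, size_t len, evex_info *out) {
    if (len < 5 || p[0] != 0x62) return 0;
    if ((p[2] & 0x04) == 0) return 0;  /* bit 2 of the second payload byte is fixed to 1 */
    out->map = p[1] & 0x03;
    out->w = (p[2] >> 7) & 1;
    out->pp = p[2] & 0x03;
    out->vec_bits = 128 << ((p[3] >> 5) & 0x03);
    out->opcode = p[4];
    return 1;
}
```

For the bytes in the crash log (0x62 0x71 0xfd 0x28 0x6f) this gives map 0F, W=1, a 66 prefix, a 256-bit vector length, and opcode 0x6F, which matches the encoding of vmovdqa64 on ymm registers. That form needs AVX512F plus AVX512VL, both of which the pragma enables.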
Suggested Fix
Option A (simple): Remove AVX-512 from the pragma since no explicit AVX-512 intrinsics are used:
#pragma GCC target("avx2,fma,f16c")
Option B (better performance on capable CPUs): Add runtime CPUID detection and dispatch to AVX-512 vs AVX2 code paths. This preserves the AVX-512 auto-vectorization benefit on machines that support it.
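A rough sketch of what Option B could look like, assuming GCC or Clang on x86-64. The function names (`dot_i8_scalar`, `dot_i8_avx2`, `dot_i8_avx512`, `select_dot_i8`) are hypothetical and do not come from the dlgo sources; the kernel is a stand-in for the real ones. It replaces the file-level pragma with per-function target attributes and picks an implementation once via `__builtin_cpu_supports`. f16c and avx512vnni are left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t (*dot_i8_fn)(const int8_t *a, const int8_t *b, size_t n);

/* Fallback, compiled with the translation unit's baseline flags. */
static int32_t dot_i8_scalar(const int8_t *a, const int8_t *b, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) acc += (int32_t)a[i] * b[i];
    return acc;
}

#if defined(__x86_64__) && defined(__GNUC__)
/* Same loop; the auto-vectorizer may use at most AVX2/FMA here. */
__attribute__((target("avx2,fma")))
static int32_t dot_i8_avx2(const int8_t *a, const int8_t *b, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) acc += (int32_t)a[i] * b[i];
    return acc;
}

/* Same loop; the auto-vectorizer may additionally use AVX-512 here. */
__attribute__((target("avx2,fma,avx512f,avx512bw,avx512dq,avx512vl")))
static int32_t dot_i8_avx512(const int8_t *a, const int8_t *b, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) acc += (int32_t)a[i] * b[i];
    return acc;
}
#endif

/* Pick the widest implementation the running CPU (and OS) supports. */
static dot_i8_fn select_dot_i8(void) {
#if defined(__x86_64__) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512bw") &&
        __builtin_cpu_supports("avx512dq") && __builtin_cpu_supports("avx512vl"))
        return dot_i8_avx512;
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma"))
        return dot_i8_avx2;
#endif
    return dot_i8_scalar;
}
```

The caller would store the result of `select_dot_i8()` in a function pointer at startup and call through it. For this to help, the file-level `#pragma GCC target` has to go; otherwise the selector and the fallback are themselves compiled with AVX-512 enabled.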
Impact
This affects a large portion of consumer hardware. AVX-512 is absent from:
- Intel consumer desktop CPUs before Rocket Lake (11th gen, 2021), and again from Alder Lake (12th gen) onward
- All AMD CPUs before Zen 4 (Ryzen 7000, 2022)
- Many laptop/mobile processors
The README advertises "AVX2/FMA/VNNI SIMD" support, implying AVX2 should be sufficient.
Environment
- Intel i7-8700K (AVX2+FMA, no AVX-512)
- Linux 6.6.87 (WSL2)
- GCC 11.4.0
- Go 1.26.0