SIGILL crash on CPUs without AVX-512 (e.g. i7-8700K) #4

@GavinPalmer1984


Bug

The GCC pragmas in the SIMD source files enable AVX-512 instruction generation:

#pragma GCC target("avx2,fma,f16c,avx512f,avx512bw,avx512dq,avx512vl,avx512vnni")

This appears in three files:

  • quant/simd_dot.c:11
  • quant/simd_qq_dot.c:11
  • quant/cpool.c:7

Even though the code doesn't use explicit _mm512 intrinsics, these target flags allow GCC to use AVX-512 in every function that follows the pragma. At -O3 the auto-vectorizer does so and emits EVEX-encoded instructions such as vmovdqu64. The resulting code crashes with SIGILL on CPUs that support AVX2 but not AVX-512.
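To check whether a given machine is affected, you can query each AVX-512 feature the pragma enables. The following is a minimal sketch that assumes GCC on x86-64. The helper names `missing_avx512_features` and `print_missing_avx512_features` are hypothetical and are not part of dlgo. `__builtin_cpu_supports` only accepts string literals, so the code tests each feature explicitly.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of dlgo: one bit per AVX-512 feature that
   the pragma enables but the running CPU does not report as supported. */
enum {
    MISSING_AVX512F    = 1u << 0,
    MISSING_AVX512BW   = 1u << 1,
    MISSING_AVX512DQ   = 1u << 2,
    MISSING_AVX512VL   = 1u << 3,
    MISSING_AVX512VNNI = 1u << 4,
};

static unsigned missing_avx512_features(void)
{
    unsigned missing = 0;
    __builtin_cpu_init(); /* harmless here; required if called before constructors run */
    if (!__builtin_cpu_supports("avx512f"))    missing |= MISSING_AVX512F;
    if (!__builtin_cpu_supports("avx512bw"))   missing |= MISSING_AVX512BW;
    if (!__builtin_cpu_supports("avx512dq"))   missing |= MISSING_AVX512DQ;
    if (!__builtin_cpu_supports("avx512vl"))   missing |= MISSING_AVX512VL;
    if (!__builtin_cpu_supports("avx512vnni")) missing |= MISSING_AVX512VNNI;
    return missing;
}

/* Prints one line per missing feature, e.g. for a startup diagnostic. */
static void print_missing_avx512_features(FILE *out)
{
    static const char *const names[] = {
        "avx512f", "avx512bw", "avx512dq", "avx512vl", "avx512vnni",
    };
    assert(out != NULL);
    unsigned missing = missing_avx512_features();
    for (unsigned i = 0; i < sizeof names / sizeof names[0]; i++)
        if (missing & (1u << i))
            fprintf(out, "missing: %s\n", names[i]);
}
```

The i7-8700K supports none of these extensions, so on the machine from this report you would expect all five bits to be set.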

Reproduction

On an Intel i7-8700K (AVX2+FMA, no AVX-512):

$ dlgo run tinyllama-1.1b-chat-v1.0.Q4_0.gguf
SIGILL: illegal instruction
PC=0x724334 m=4 sigcode=2
instruction bytes: 0x62 0x71 0xfd 0x28 0x6f ...

goroutine 1 [syscall]:
runtime.cgocall(...)
github.com/computerex/dlgo/quant._Cfunc_batch_quantize_for_type(...)

In 64-bit mode, the leading 0x62 byte is the EVEX prefix used by AVX-512 instructions, so the faulting instruction is an AVX-512 instruction.
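The faulting bytes can also be decoded mechanically. This is a minimal sketch that follows the EVEX prefix layout in the Intel SDM:

- P0 holds the opcode-map bits.
- P1 holds W and the implied legacy prefix.
- P2 holds the vector length L'L.

The struct and function names are hypothetical. For the bytes in the crash dump, the decoder reports map 0F, an implied 66 prefix, W1, a 256-bit vector length, and opcode 6F. That is EVEX.256.66.0F.W1 6F, which is VMOVDQA64 on a ymm register, an AVX-512F + AVX-512VL instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of an EVEX prefix (64-bit mode only). */
struct evex_info {
    unsigned map;      /* mm: 1 = 0F, 2 = 0F38, 3 = 0F3A */
    unsigned pp;       /* implied legacy prefix: 0 none, 1 = 66, 2 = F3, 3 = F2 */
    unsigned w;        /* EVEX.W */
    unsigned vec_bits; /* 128, 256 or 512, from L'L */
    uint8_t opcode;    /* byte following the 4-byte prefix */
};

/* Returns 1 and fills *out if b starts with a plausible EVEX prefix, else 0. */
static int decode_evex(const uint8_t *b, size_t n, struct evex_info *out)
{
    assert(b != NULL && out != NULL);
    if (n < 5 || b[0] != 0x62)
        return 0;
    uint8_t p0 = b[1], p1 = b[2], p2 = b[3];
    if ((p1 & 0x04) == 0)           /* bit 2 of P1 is always 1 in EVEX */
        return 0;
    unsigned ll = (p2 >> 5) & 0x03; /* L'L: 0 = 128, 1 = 256, 2 = 512 bits */
    if (ll == 3)                    /* reserved encoding */
        return 0;
    out->map = p0 & 0x03;
    out->pp = p1 & 0x03;
    out->w = (p1 >> 7) & 0x01;
    out->vec_bits = 128u << ll;
    out->opcode = b[4];
    return 1;
}
```

The decoder ignores the register-extension and masking fields because they do not affect which ISA extension the instruction needs.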

Suggested Fix

Option A (simple): Remove AVX-512 from the pragma since no explicit AVX-512 intrinsics are used:

#pragma GCC target("avx2,fma,f16c")
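Option A can also be scoped so that the target applies only to the SIMD kernels. A startup check then turns a missing AVX2/FMA into a clean error instead of a crash. This is a minimal sketch that assumes GCC. `dot_f32_avx2` and `simd_avx2_usable` are hypothetical stand-ins, not functions from dlgo. The runtime check covers AVX2 and FMA only; add F16C if your compiler's `__builtin_cpu_supports` accepts it.

```c
#include <assert.h>
#include <stddef.h>

/* Limit the target flags to the kernels between push and pop. */
#pragma GCC push_options
#pragma GCC target("avx2,fma,f16c")

/* Hypothetical kernel: code in this region may use AVX2/FMA/F16C
   instructions, but never AVX-512. */
static float dot_f32_avx2(const float *a, const float *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

#pragma GCC pop_options

/* Compiled for baseline x86-64, so it is safe to call on any CPU. */
static int simd_avx2_usable(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma");
}
```

Call `simd_avx2_usable()` once at startup, before the first cgo call into the kernels. On a CPU without AVX2, the program can then report an unsupported CPU instead of dying with SIGILL.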

Option B (better performance on capable CPUs): Add runtime CPUID detection and dispatch to AVX-512 vs AVX2 code paths. This preserves the AVX-512 auto-vectorization benefit on machines that support it.
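Option B can be sketched with per-function `target` attributes and a function pointer resolved once at load time. This is a minimal sketch that assumes GCC on x86-64. The kernel names, `select_dot`, and `dot` are hypothetical stand-ins for dlgo's real kernels. The same scalar loop is compiled three times, once per ISA, so the auto-vectorizer can specialize each copy.

```c
#include <assert.h>
#include <stddef.h>

/* Shared loop body; each function below compiles it for its own target. */
#define DOT_BODY                   \
    float sum = 0.0f;              \
    for (size_t i = 0; i < n; i++) \
        sum += a[i] * b[i];        \
    return sum;

__attribute__((target("avx2,fma,f16c,avx512f,avx512bw,avx512dq,avx512vl,avx512vnni")))
static float dot_avx512(const float *a, const float *b, size_t n) { DOT_BODY }

__attribute__((target("avx2,fma,f16c")))
static float dot_avx2(const float *a, const float *b, size_t n) { DOT_BODY }

/* Baseline fallback so even pre-AVX2 CPUs never hit an illegal instruction. */
static float dot_scalar(const float *a, const float *b, size_t n) { DOT_BODY }

typedef float (*dot_fn)(const float *, const float *, size_t);

static dot_fn select_dot(void)
{
    __builtin_cpu_init();
    /* F16C is not queried; every AVX2+FMA CPU I know of also has it. */
    int avx2 = __builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma");
    int avx512 = avx2 &&
                 __builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512bw") &&
                 __builtin_cpu_supports("avx512dq") && __builtin_cpu_supports("avx512vl") &&
                 __builtin_cpu_supports("avx512vnni");
    if (avx512)
        return dot_avx512;
    if (avx2)
        return dot_avx2;
    return dot_scalar;
}

/* Resolved once, before main(), by a constructor. */
static dot_fn dot_impl;

__attribute__((constructor))
static void simd_dispatch_init(void) { dot_impl = select_dot(); }

static float dot(const float *a, const float *b, size_t n)
{
    assert(dot_impl != NULL);
    return dot_impl(a, b, n);
}
```

GCC's `target_clones` attribute, for example `__attribute__((target_clones("avx512f", "avx2", "default")))`, generates equivalent clones and a resolver automatically. It relies on ifunc support in the toolchain and C library, such as glibc on Linux, whereas the explicit function pointer above works without it.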

Impact

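This affects a large portion of consumer hardware. AVX-512 is absent from:

  • Intel mainstream desktop CPUs before Rocket Lake (11th gen, 2021); earlier support was limited to HEDT parts such as Skylake-X and a few mobile lines (Ice Lake, Tiger Lake)
  • Intel hybrid-core consumer CPUs from Alder Lake (12th gen, 2021) onward, where AVX-512 is officially unsupported
  • All AMD CPUs before Zen 4 (Ryzen 7000, 2022)
  • Many laptop/mobile processors, including Intel's low-power N-series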

The README advertises "AVX2/FMA/VNNI SIMD" support, implying AVX2 should be sufficient.

Environment

  • Intel i7-8700K (AVX2+FMA, no AVX-512)
  • Linux 6.6.87 (WSL2)
  • GCC 11.4.0
  • Go 1.26.0
