
Add 2 bit SSE GEMM microkernels #9752

Open
copybara-service[bot] wants to merge 1 commit into master from test_887841831


@copybara-service
Contributor

Add 2 bit SSE GEMM microkernels

These updates add 2-bit weight quantization support for both the QS8-QC2W and QD8-F32-QC2W GEMM microkernels, using SSSE3 instructions with the MADD optimization.

  1. src/qs8-gemm/MRx4c8-ssevnni.c.in:
    • Added support for QS8_QC2, QC2_F32, and QC2_F16 datatypes.
    • Introduced the _MM_SET1_EPI8 macro for consistent constant generation.
    • Updated the ISA and instruction selection logic to support MADD variants (specifically _mm_dpbusd_epi32_madd_kzp2 for 2-bit variants).
    • Updated the function signature to include the row_sum parameter for QD8 variants.
  2. scripts/generate-qs8-gemm.sh:
    • Added generation rules for SSSE3 with MADD=1 for both QS8_QC2 and QC2_F32 variants.
  3. src/xnnpack/gemm.h:
    • Added microkernel declarations for xnn_qd8_f32_qc2w_gemm_minmax_ukernel_*x4c8__ssse3_madd.
    • Added microkernel declarations for xnn_qs8_qc2w_gemm_minmax_fp32_ukernel_*x4c8__ssse3_madd.
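
The MADD variant and the row_sum parameter mentioned above can be pictured with a scalar model. The sketch below is not the kernel from this change: it assumes a hypothetical packing (four unsigned 2-bit fields per byte, lowest bits first), assumes a kernel zero point of 2, and assumes row_sum is the sum of one row of activations. All function names are made up for illustration. It models a 4-element dot-product step as a multiply-add of adjacent pairs with int16 saturation (the behaviour of `_mm_maddubs_epi16`) followed by a widening pair-add (what `_mm_madd_epi16` against a vector of ones does), then removes the zero point afterwards using row_sum.

```c
#include <stddef.h>
#include <stdint.h>

// Assumption for illustration: weights are stored as unsigned 2-bit fields
// (0..3) and the real weight is field - KERNEL_ZERO_POINT.
#define KERNEL_ZERO_POINT 2

// Hypothetical packing: four 2-bit fields per byte, lowest bits first.
static inline int8_t unpack_2bit(const uint8_t* packed, size_t i) {
  return (int8_t) ((packed[i / 4] >> (2 * (i % 4))) & 0x3);
}

// Scalar model of one _mm_maddubs_epi16 lane: u8*s8 + u8*s8, saturated to int16.
static inline int16_t maddubs_pair(uint8_t a0, int8_t b0, uint8_t a1, int8_t b1) {
  int32_t sum = (int32_t) a0 * b0 + (int32_t) a1 * b1;
  if (sum > INT16_MAX) sum = INT16_MAX;
  if (sum < INT16_MIN) sum = INT16_MIN;
  return (int16_t) sum;
}

// Scalar model of a 4-element dot-product step built from two pair sums.
// Adding the two int16 results in int32 mirrors _mm_madd_epi16(x, ones).
static inline int32_t dot4_madd(const uint8_t* a, const int8_t* w) {
  const int16_t lo = maddubs_pair(a[0], w[0], a[1], w[1]);
  const int16_t hi = maddubs_pair(a[2], w[2], a[3], w[3]);
  return (int32_t) lo + (int32_t) hi;
}

// Sum of one row of unsigned activations (the assumed meaning of row_sum).
int32_t row_sum_u8(const uint8_t* a, size_t k) {
  int32_t sum = 0;
  for (size_t i = 0; i < k; i++) sum += a[i];
  return sum;
}

// Dot product of k activations with k packed 2-bit weights (k a multiple of 4).
// Uses sum(a * (field - zp)) == sum(a * field) - zp * sum(a).
int32_t dot_qc2w(const uint8_t* a, const uint8_t* packed_w, size_t k,
                 int32_t row_sum) {
  int32_t acc = 0;
  for (size_t i = 0; i < k; i += 4) {
    int8_t w[4];
    for (size_t j = 0; j < 4; j++) w[j] = unpack_2bit(packed_w, i + j);
    acc += dot4_madd(&a[i], w);
  }
  return acc - KERNEL_ZERO_POINT * row_sum;
}
```

Under these assumptions each pair sum is at most 255 * 3 * 2 = 1530, so the int16 saturation step never triggers for 2-bit fields. For 8-bit weights the same two-instruction sequence can saturate, which is where it differs from a true `dpbusd`.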

PiperOrigin-RevId: 887841831