Remove HipW4A16SkinnyLinearKernel and no-group wvSplitK_int4 by mgehre-amd · Pull Request #869 · ROCm/vllm

mgehre-amd · 2026-04-13T11:50:24Z

Remove the no-groups wvSplitK_int4 kernel variant and HipW4A16SkinnyLinearKernel. Both are unused (all int4 models use groups), and HybridW4A16 supersedes HipW4A16SkinnyLinearKernel.

eble-amd

This is my favorite kind of change. Please merge it as soon as you receive all appropriate approval.

No real INT4 quantized model uses per-channel (group_size=-1) quantization. All AWQ, GPTQ, and compressed-tensors models use per-group scales with group_size=32 or 128, handled by wvSplitK_int4_g. Removing the unused no-groups variant reduces compile time and code complexity.

Changes:
- Removes ~260 lines of C++ kernel entry point and sweep code
- Eliminates group_size=-1 dispatch path in Python kernel wrapper
- Keeps shared kernel templates (used by grouped variant via template param)

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
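To make the per-group versus per-channel distinction concrete, here is a minimal sketch of dequantizing one weight row. It is not the wvSplitK_int4_g kernel: the function name `dequantize_row`, the unpacked list layout, and the symmetric (no zero-point) scheme are assumptions chosen for illustration only.

```python
def dequantize_row(q, scales, group_size):
    """Dequantize one row of symmetric int4 weights (illustrative only).

    q          : list of ints in [-8, 7], one per input feature
    scales     : one float per group along the row
    group_size : number of consecutive weights that share a scale;
                 -1 means one scale for the whole row (per-channel)
    """
    k = len(q)
    if group_size == -1:
        # Per-channel: the whole row is a single group.
        group_size = k
    if k % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    if len(scales) != k // group_size:
        raise ValueError("expected one scale per group")
    # Weight i belongs to group i // group_size.
    return [q[i] * scales[i // group_size] for i in range(k)]


# Per-group: two groups of two weights, each with its own scale.
grouped = dequantize_row([1, 2, 3, 4], [0.5, 2.0], group_size=2)

# Per-channel (group_size=-1): one scale for the entire row.
per_channel = dequantize_row([1, 2, 3, 4], [0.5], group_size=-1)
```

In this sketch the per-channel case is just the grouped case with a single group spanning the row, which is why a dedicated no-groups path adds nothing once every real model ships with group_size=32 or 128.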

HybridW4A16LinearKernel fully supersedes HipW4A16SkinnyLinearKernel: it handles both the skinny decode path (wvSplitK_int4_g) and large-batch prefill (Triton), including asymmetric/zero-point models. The skinny-only kernel class added no unique capability.

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>

The test_triton_w4a16_skinny_fmt_gemm_matches_reference test was failing with 1/16384 elements marginally exceeding atol=1e-2 (actual diff: 0.0106). Relax to atol=5e-2, matching the asymmetric test's tolerance on the same kernel.

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
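The effect of the tolerance change can be sketched with a plain absolute/relative check of the common form |actual - expected| <= atol + rtol * |expected|. The helper `within_tolerance` is hypothetical, and rtol=0 is an assumption, since the commit message does not state the test's relative tolerance.

```python
def within_tolerance(actual, expected, atol, rtol=0.0):
    """Return True if actual is close to expected.

    Uses the usual combined bound: atol + rtol * |expected|.
    rtol defaults to 0.0 here (an assumption, not the test's setting).
    """
    return abs(actual - expected) <= atol + rtol * abs(expected)


# An element that differs from the reference by 0.0106, as reported.
expected = 0.0
actual = 0.0106

old_ok = within_tolerance(actual, expected, atol=1e-2)  # 0.0106 > 0.01
new_ok = within_tolerance(actual, expected, atol=5e-2)  # 0.0106 <= 0.05
```

Under these assumptions the single outlier fails the old bound by a small margin and passes the relaxed one with room to spare.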

mgehre-amd requested a review from eble-amd April 13, 2026 11:50

mgehre-amd requested a review from gshtras as a code owner April 13, 2026 11:50

mgehre-amd changed the title ~~Matthias.remove wvsplitk int4 nogrouped~~ Remove HipW4A16SkinnyLinearKernel and no-group wvSplitK_int4 Apr 13, 2026

eble-amd approved these changes Apr 13, 2026


mgehre-amd added 2 commits April 14, 2026 15:33

mgehre-amd force-pushed the matthias.remove-wvsplitk-int4-nogrouped branch from 055ceeb to 7ba0080 Compare April 14, 2026 21:34

mgehre-amd merged commit 68d9e72 into gfx11 Apr 17, 2026
4 of 5 checks passed

mgehre-amd merged 3 commits into gfx11 from matthias.remove-wvsplitk-int4-nogrouped
