cpu: fix ARM NEON nvfp4 vec dot by aidaiprivate-source · Pull Request #1 · aidaiprivate-source/llama.cpp

aidaiprivate-source · 2026-06-04T21:34:58Z

Overview

Additional information

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure:

Summary by CodeRabbit

Refactor
- Optimized the CPU quantized vector dot product (NVFP4 × Q8_0) on ARM NEON by processing operands in 8-lane chunks through a new helper function.

coderabbitai · 2026-06-04T21:35:19Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b99754bf-5bb5-4587-b8d4-2a45ae1cf6cf

📥 Commits

Reviewing files that changed from the base of the PR and between 94a220c and a30369d.

📒 Files selected for processing (2)

ggml/src/ggml-cpu/arch/arm/quants.c
ggml/src/ggml-cpu/ggml-cpu-impl.h

📝 Walkthrough


This PR refactors the ARM NEON implementation of the NVFP4 (4-bit) quantized dot product. It introduces a new `ggml_nvfp4_dot8` helper function and modifies `ggml_vec_dot_nvfp4_q8_0` to process data in four 8-lane chunks instead of two 16-lane vectors. The four partial results are horizontally summed and fed directly into the final fused multiply-add with the scales.
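The chunked accumulation described above can be illustrated with a scalar model. This is only a sketch of the arithmetic, not the code from the PR: the names `chunk_dot8` and `chunked_dot32`, the 32-element block size, and the one-scale-per-chunk layout are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical scalar model: integer sum of products over one 8-lane chunk. */
static int32_t chunk_dot8(const int8_t *a, const int8_t *b) {
    int32_t sum = 0;
    for (int i = 0; i < 8; ++i) {
        sum += (int32_t)a[i] * (int32_t)b[i];
    }
    return sum;
}

/*
 * Hypothetical scalar model of the chunked dot product: 32 int8 values are
 * split into four 8-lane chunks, each chunk yields one integer partial sum,
 * and each partial is multiplied by its own float scale and accumulated.
 * Assumption: one scale per chunk; the real block layout may differ.
 */
static float chunked_dot32(const int8_t *a, const int8_t *b,
                           const float scales[4]) {
    float acc = 0.0f;
    for (int c = 0; c < 4; ++c) {
        const float partial = (float)chunk_dot8(a + 8 * c, b + 8 * c);
        acc += partial * scales[c]; /* scalar stand-in for the vector FMA */
    }
    return acc;
}
```

With all four scales set to 1.0, `chunked_dot32` reduces to a plain 32-element integer dot product, which gives a convenient reference when checking a vectorized version.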

Changes

NVFP4 Quantized Dot Product Optimization

| Layer / File(s) | Summary |
| --- | --- |
| NVFP4 dot8 helper function `ggml/src/ggml-cpu/ggml-cpu-impl.h` | New `ggml_nvfp4_dot8` NEON helper multiplies paired `int8x8_t` lanes using `vmull_s8`, widens pairwise sums via `vpaddlq_s16`, and combines low/high results via `vaddq_s32` into a final `int32x4_t`. |
| vec_dot_nvfp4_q8_0 implementation refactor `ggml/src/ggml-cpu/arch/arm/quants.c` | Operand loading splits `q8` and `q4` into four 8-lane chunks each; `ggml_nvfp4_dot8` computes four partial results (`p0`–`p3`); accumulation builds a `float32x4_t` `sums` vector from horizontal sums of all four partials and performs fused multiply-add with scales, removing prior `int32` to `float` conversion. |
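To make the intrinsic sequence in the helper summary concrete, here is a portable scalar emulation of `vmull_s8`, `vpaddlq_s16`, and `vaddq_s32`. It is a sketch under assumptions: the struct types stand in for the NEON vector types, and the four-argument signature of `nvfp4_dot8_model` (a low pair and a high pair of `int8x8_t` operands) is a guess at the helper's shape, not the signature from the PR.

```c
#include <stdint.h>

/* Plain structs standing in for NEON vector types (illustration only). */
typedef struct { int8_t  v[8]; } i8x8;   /* models int8x8_t  */
typedef struct { int16_t v[8]; } i16x8;  /* models int16x8_t */
typedef struct { int32_t v[4]; } i32x4;  /* models int32x4_t */

/* Models vmull_s8: lane-wise widening multiply, int8 x int8 -> int16. */
static i16x8 mull_s8(i8x8 a, i8x8 b) {
    i16x8 r;
    for (int i = 0; i < 8; ++i) {
        r.v[i] = (int16_t)((int16_t)a.v[i] * (int16_t)b.v[i]);
    }
    return r;
}

/* Models vpaddlq_s16: add adjacent int16 pairs, widening to int32. */
static i32x4 paddlq_s16(i16x8 a) {
    i32x4 r;
    for (int i = 0; i < 4; ++i) {
        r.v[i] = (int32_t)a.v[2 * i] + (int32_t)a.v[2 * i + 1];
    }
    return r;
}

/* Models vaddq_s32: lane-wise int32 addition. */
static i32x4 addq_s32(i32x4 a, i32x4 b) {
    i32x4 r;
    for (int i = 0; i < 4; ++i) {
        r.v[i] = a.v[i] + b.v[i];
    }
    return r;
}

/* Assumed shape of the helper: multiply the low and high operand pairs,
 * pairwise-widen each product vector, then add the two results. */
static i32x4 nvfp4_dot8_model(i8x8 a_lo, i8x8 b_lo, i8x8 a_hi, i8x8 b_hi) {
    return addq_s32(paddlq_s16(mull_s8(a_lo, b_lo)),
                    paddlq_s16(mull_s8(a_hi, b_hi)));
}

/* Horizontal sum of the four int32 lanes of one partial result. */
static int32_t hsum_i32x4(i32x4 a) {
    return a.v[0] + a.v[1] + a.v[2] + a.v[3];
}
```

Because the largest `int8` product magnitude is 128 × 128 = 16384, the widened `int16` products cannot overflow, and the pairwise step moves to `int32` before any further accumulation. The horizontal sum of the returned vector equals the ordinary dot product over all 16 input elements.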

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Four lanes split wide, the chunks align,
Where dot products dance in parallel line,
NEON helpers bloom with widening grace,
Quantized math speeds up this ARM race!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description is a blank template with all sections empty; no actual description of changes, rationale, or AI usage disclosure is provided. | Fill in the Overview section with a clear description of what the fix addresses and why it was needed. Complete the AI usage disclosure field and provide any relevant context in Additional information. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately and specifically describes the main change: a fix to the ARM NEON nvfp4 vector dot implementation. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Infer (1.2.0)

ggml/src/ggml-cpu/arch/arm/quants.c

ggml/src/ggml-cpu/arch/arm/quants.c:2:10: fatal error: 'ggml-common.h' file not found
2 | #include "ggml-common.h"
| ^~~~~~~~~~~~~~~
1 error generated.
Error: the following clang command did not run successfully:
/opt/infer-linux-x86_64-v1.2.0/lib/infer/facebook-clang-plugins/clang/install/bin/clang-18
@/tmp/coderabbit-infer/a30369d51585883a578bf7272075e9f0412ca86d-55390c3f7c50f038/tmp/clang_command_.tmp.d0bc3a.txt
++Contents of '/tmp/coderabbit-infer/a30369d51585883a578bf7272075e9f0412ca86d-55390c3f7c50f038/tmp/clang_command_.tmp.d0bc3a.txt':
"-cc1" "-load"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../../facebook-clang-plugins/libtooling/build/FacebookClangPlugin.dylib"
"-add-plugin" "BiniouASTExporter" "-plugin-arg-BiniouASTExporter" "-"
"-plugin-arg-BiniouASTExporter" "PREPEND_CURRENT_DIR=1"
"-plugin-arg-BiniouASTExporter" "MAX_STRING_SIZE=65535" "-cc1" "-triple"
"x86_64-unknown-linux-gnu" "-emit-obj" "-mrelax-all" "-dis

... [truncated 718 characters] ...

x86_64-v1.2.0/lib/infer/facebook-clang-plugins/clang/install/lib/clang/18/include"
"-internal-isystem" "/usr/local/include" "-internal-isystem"
"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include"
"-internal-externc-isystem" "/usr/include/x86_64-linux-gnu"
"-internal-externc-isystem" "/include" "-internal-externc-isystem"
"/usr/include" "-Wno-ignored-optimization-argument" "-Wno-everything"
"-ferror-limit" "19" "-fgnuc-version=4.2.1" "-fskip-odr-check-in-gmf"
"-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-o"
"/tmp/coderabbit-infer/55390c3f7c50f038/file.o" "-x" "c"
"ggml/src/ggml-cpu/arch/arm/quants.c" "-O0" "-fno-builtin" "-include"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../lib/clang_wrappers/global_defines.h"
"-Wno-everything"

Comment `@coderabbitai help` to get the list of available commands and usage tips.

cpu: fix ARM NEON nvfp4 vec dot

a30369d

aidaiprivate-source wants to merge 1 commit into master from 0cc4m/cpu-arm-nvfp4-fix

2 participants

Conversation

aidaiprivate-source commented Jun 4, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Additional information

Requirements

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated Code Review Effort

Poem

❌ Failed checks (1 warning)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

aidaiprivate-source commented Jun 4, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 4, 2026 •

edited

Loading