[codex] fix Metal custom V-cache set rows #34

Draft
lalalune wants to merge 1 commit into feat/gemma4-assistant-arch from fix/metal-v-cache-set-rows-9258

@lalalune

Member

Summary

Part of elizaOS/eliza#9258.

This adds Metal support for writing and reading the custom V-cache formats used by local inference:

  • F32 -> TBQ3_0, TBQ4_0, and Q4_POLAR CPY kernels.
  • TBQ3_0, TBQ4_0, and Q4_POLAR -> F32/F16 CPY kernels.
  • SET_ROWS kernels for TBQ3_0, TBQ4_0, and Q4_POLAR with I64/I32 row indices.
  • A flash-attention fallback that dequantizes custom V-cache tensors before calling stock ggml_flash_attn_ext.
  • Backend coverage for Metal SET_ROWS and CPY on these custom types.
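
To illustrate the SET_ROWS semantics listed above, here is a minimal CPU-side sketch in C++: each source row of F32 values is quantized and written to the destination row named by an I64 index, and a matching dequantize step stands in for the quantized -> F32 CPY direction. The BlockQ4 layout, the block size of 32, and all helper names are hypothetical and chosen only for illustration; they are not the TBQ3_0, TBQ4_0, or Q4_POLAR formats, and this is not the Metal kernel code from this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 4-bit block format, for illustration only.
// It is NOT the TBQ3_0 / TBQ4_0 / Q4_POLAR layout.
constexpr int kBlock = 32;

struct BlockQ4 {
    float   scale;          // per-block scale
    uint8_t q[kBlock / 2];  // two 4-bit codes per byte
};

// Map one float to a 4-bit code in [0, 15]; the decoded value is (code - 8) * scale.
static uint8_t encode(float v, float inv_scale) {
    const int c = (int) std::lround(v * inv_scale) + 8;
    return (uint8_t) std::min(15, std::max(0, c));
}

// Quantize kBlock consecutive floats into one block.
BlockQ4 quantize_block(const float * x) {
    float amax = 0.0f;
    for (int i = 0; i < kBlock; ++i) amax = std::max(amax, std::fabs(x[i]));
    BlockQ4 b{};
    b.scale = amax / 7.0f;
    const float inv = b.scale > 0.0f ? 1.0f / b.scale : 0.0f;
    for (int i = 0; i < kBlock; i += 2) {
        b.q[i / 2] = (uint8_t) (encode(x[i], inv) | (encode(x[i + 1], inv) << 4));
    }
    return b;
}

// Inverse of quantize_block: expand one block back to kBlock floats.
void dequantize_block(const BlockQ4 & b, float * y) {
    for (int i = 0; i < kBlock; i += 2) {
        y[i]     = ((int) (b.q[i / 2] & 0x0F) - 8) * b.scale;
        y[i + 1] = ((int) (b.q[i / 2] >> 4)   - 8) * b.scale;
    }
}

// SET_ROWS sketch: source row i (F32, n_cols wide) is quantized and stored
// into destination row idx[i]. Rows not named in idx are left unchanged.
void set_rows(std::vector<BlockQ4> & dst, const std::vector<float> & src,
              const std::vector<int64_t> & idx, int n_cols) {
    assert(n_cols > 0 && n_cols % kBlock == 0);
    const size_t blocks_per_row = (size_t) n_cols / kBlock;
    assert(src.size() == idx.size() * (size_t) n_cols);
    for (size_t i = 0; i < idx.size(); ++i) {
        assert(idx[i] >= 0 && (size_t) (idx[i] + 1) * blocks_per_row <= dst.size());
        const float * row = src.data() + i * (size_t) n_cols;
        BlockQ4 *     out = dst.data() + (size_t) idx[i] * blocks_per_row;
        for (size_t b = 0; b < blocks_per_row; ++b) {
            out[b] = quantize_block(row + b * kBlock);
        }
    }
}
```

For example, calling set_rows with idx = {3, 1} on a four-row cache writes source row 0 into cache row 3 and source row 1 into cache row 1, leaving rows 0 and 2 untouched. A fallback of the kind described in the fourth bullet would run the dequantize step over the whole V-cache before handing F32 data to the attention op.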

Validation

  • xcrun -sdk macosx metal -I ggml/src -I ggml/include -I ggml/src/ggml-metal -c ggml/src/ggml-metal/ggml-metal.metal -o /tmp/ggml-metal-9258.air
  • cmake -S . -B build-metal-9258 -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DGGML_METAL=ON -DGGML_METAL_EMBED_LIBRARY=ON -DLLAMA_BUILD_TESTS=ON -DLLAMA_BUILD_EXAMPLES=ON -DLLAMA_CURL=OFF
  • cmake --build build-metal-9258 --target test-backend-ops llama-cli -j 12
  • build-metal-9258/bin/test-backend-ops test -b MTL0 -o SET_ROWS -p "(tbq3_0|tbq4_0|q4_polar)" -> 12/12 passed
  • build-metal-9258/bin/test-backend-ops test -b MTL0 -o CPY -p "(tbq3_0|tbq4_0|q4_polar)" -> 6/6 passed
  • llama-cli smoke runs on a real GGUF model with -fa on and each of -ctv tbq3_0, -ctv tbq4_0, and -ctv q4_polar all exited with status 0.

Full evidence is included in the parent eliza branch at .github/issue-evidence/9258-metal-v-cache-set-rows.md.
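
For context on what a passing test-backend-ops case means: the harness runs the same op on the backend under test and on a reference backend, then scores how far the two outputs differ. The C++ sketch below shows one way to score such a comparison using normalized mean squared error. The function names and the threshold parameter are hypothetical, and the sketch assumes both outputs have already been converted to F32; it is not the harness code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Normalized mean squared error between a reference output and a candidate:
// sum((ref - out)^2) / sum(ref^2). Returns 0 when both are identically zero.
double nmse(const std::vector<float> & ref, const std::vector<float> & out) {
    assert(ref.size() == out.size());
    double err = 0.0;
    double norm = 0.0;
    for (size_t i = 0; i < ref.size(); ++i) {
        const double d = (double) ref[i] - (double) out[i];
        err  += d * d;
        norm += (double) ref[i] * (double) ref[i];
    }
    if (norm == 0.0) {
        // Avoid dividing by zero: fall back to the raw squared error.
        return err;
    }
    return err / norm;
}

// Hypothetical pass/fail rule: the candidate passes when its NMSE against
// the reference does not exceed max_err.
bool outputs_match(const std::vector<float> & ref, const std::vector<float> & out,
                   double max_err) {
    return nmse(ref, out) <= max_err;
}
```

Quantized destination types generally need a looser threshold than F32 -> F32 copies, because the quantization step itself introduces error on both backends.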

@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 95ce213e-fb68-46dc-b5cd-c8224c549f2d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.
