Key findings:
- V-norm: Gemma 4 applies a weight-free RMS norm to V after the
  projection (llama.cpp, line 92: ggml_rms_norm(Vcur, eps)). Implemented.
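For reference, a weight-free RMS norm (no learned gain, unlike a regular RMSNorm layer) can be sketched as follows. This is an illustration of the operation, not the actual ggml kernel:

```python
import numpy as np

def rms_norm_weightless(v, eps=1e-6):
    # Weight-free RMS norm: divide by the root-mean-square over the last
    # axis, with no learned scale applied afterwards.
    rms = np.sqrt(np.mean(v * v, axis=-1, keepdims=True) + eps)
    return v / rms

# After normalization the per-row RMS of the output is ~1.
v = np.array([[3.0, 4.0]])
out = rms_norm_weightless(v)
```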
- NeoX RoPE: our auto-detection wrongly excluded Gemma (model_type==1).
  Gemma uses standard interleaved RoPE, not NeoX, even though
  head*dim != hidden.
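The two rotation conventions differ only in how dimensions are paired: interleaved RoPE rotates adjacent pairs (x[2i], x[2i+1]), while NeoX-style RoPE rotates half-split pairs (x[i], x[i + d/2]). A minimal sketch of both, following the common definitions rather than the actual llama.cpp kernels:

```python
import numpy as np

def rope_angles(pos, d, base=10000.0):
    # One angle per 2-D pair: theta_i = pos * base^(-2i/d).
    i = np.arange(d // 2)
    return pos * base ** (-2.0 * i / d)

def rope_interleaved(x, pos):
    # Interleaved ("standard") RoPE: rotate adjacent pairs (x[2i], x[2i+1]).
    th = rope_angles(pos, x.shape[-1])
    out = x.copy()
    out[0::2] = x[0::2] * np.cos(th) - x[1::2] * np.sin(th)
    out[1::2] = x[0::2] * np.sin(th) + x[1::2] * np.cos(th)
    return out

def rope_neox(x, pos):
    # NeoX-style RoPE: rotate half-split pairs (x[i], x[i + d/2]).
    h = x.shape[-1] // 2
    th = rope_angles(pos, x.shape[-1])
    out = x.copy()
    out[:h] = x[:h] * np.cos(th) - x[h:] * np.sin(th)
    out[h:] = x[:h] * np.sin(th) + x[h:] * np.cos(th)
    return out
```

Mixing up the two conventions preserves vector norms, so it produces plausible-looking but wrong attention scores, which is why it is easy to miss.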
- Debug: the logit for token 100 (<|channel>) is -12.17 in our code
  but should be near 0 (it is the top-1 token in llama.cpp), a gap of
  roughly 12.
- layer_output_scale: confirmed to be a simple multiply, per the
  llama.cpp reference.
Status: garbage output persists after investigating 15 hypotheses.
The root cause has been narrowed to forward-pass numerics: every
individual component verifies against the reference, but the combined
output is wrong.
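When components pass in isolation but the full pass diverges, a standard next step is to dump per-layer hidden states from both implementations and locate the first layer where they disagree. A sketch of that check (first_divergent_layer is a hypothetical helper name, not an existing API):

```python
import numpy as np

def first_divergent_layer(ours, reference, atol=1e-3):
    # ours / reference: lists of per-layer hidden-state arrays captured
    # from the two forward passes, in layer order. Returns the index of
    # the first layer whose max absolute difference exceeds atol, or
    # None if every layer matches within tolerance.
    for idx, (a, b) in enumerate(zip(ours, reference)):
        if np.max(np.abs(a - b)) > atol:
            return idx
    return None
```

Running this on a single fixed prompt pins the divergence to one layer, after which the suspect set shrinks to that layer's ops (norm placement, RoPE pairing, scaling order) instead of the whole network.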
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>