
Commit e12fcbd

unamedkr and claude committed
fix(qwen35): early SSM probe fixes DeltaNet layer detection — Qwen3.5-4B now coherent
The fused QKV guard (delta_a_log check) ran BEFORE delta_a_log was set in the per-layer loop, so DeltaNet layers still got gguf_w_qkv assigned. Fixed by probing blk.N.ssm_a BEFORE the fused QKV check.

Result: n_attn_layers correctly reports 8 (not 32) for Qwen3.5-4B. DeltaNet layers dispatch to deltanet_forward; full attention layers dispatch to self_attn_forward with partial RoPE + NeoX rotation.

Validated:
- Qwen3.5-4B: 'The capital of France is **Paris**.'
- Qwen3-4B: coherent (4.5 tok/s)
- Phi-3.5: coherent (1.9 tok/s)
- 35/35 unit tests pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent b8a27d2 commit e12fcbd

File tree

1 file changed (+12 −12 lines)


quant.h

Lines changed: 12 additions & 12 deletions
@@ -11851,20 +11851,20 @@ tq_model_t* tq_load_gguf(const char* path) {
         }
     }
 
-    /* Phi-3 fused QKV detection.
-     *
-     * Phi-3 ships `blk.N.attn_qkv.weight` with shape [hidden, 3*hidden]
-     * instead of three separate `attn_q/k/v.weight` tensors. We store
-     * the fused pointer in `gguf_w_qkv` and the forward path dispatches
-     * one matmul + split. The layer is marked as an attention layer
-     * via the same `is_attn_layer` flag the standard path uses, so
-     * the rest of the loader and tq_forward treat it normally. */
+    /* Early DeltaNet probe: check if this layer has SSM weights BEFORE
+     * the fused QKV detection. DeltaNet layers also have attn_qkv.weight
+     * (for conv1d input), and we must NOT treat it as a Phi-3 fused QKV. */
+    int layer_is_deltanet = 0;
+    {
+        char ssm_probe[128];
+        snprintf(ssm_probe, sizeof(ssm_probe), "blk.%d.ssm_a", l);
+        if (find_gguf_tensor(gguf, ssm_probe)) layer_is_deltanet = 1;
+    }
+
+    /* Phi-3 fused QKV detection (skip for DeltaNet layers). */
     snprintf(tname, sizeof(tname), "blk.%d.attn_qkv.weight", l);
     const tq_gguf_tensor_t* wqkv_t = find_gguf_tensor(gguf, tname);
-    if (wqkv_t && !layer->delta_a_log) {
-        /* Phi-3 fused QKV (NOT DeltaNet). DeltaNet layers also have
-         * attn_qkv.weight but it's the conv1d input, not a fused
-         * attention projection. The delta_a_log check distinguishes. */
+    if (wqkv_t && !layer_is_deltanet) {
         layer->gguf_w_qkv = wqkv_t->data;
         layer->gguf_w_qkv_type = wqkv_t->type;
         c->has_fused_qkv = 1;
