Prevent the sum of the dequantized activation in q8_1 from overflowing by bartowski1182 · Pull Request #21652 · ggml-org/llama.cpp

bartowski1182 · 2026-04-09T00:51:57Z

Overview

During Mistral 4 small quantization and subsequent testing, I found that the PPL of Q4_1 ended up with NaN

While investigating the cause, I found it only happened when later FFN_DOWN layers were quantized to Q4_1, i.e.:

llama-quantize ./Mistral-Small-4-119B-2603-bf16.gguf Mistral-Small-4-119B-2603-Q4_0.gguf Q4_0

Works as expected, but:

llama-quantize --tensor-type ffn_down=q4_1 ./Mistral-Small-4-119B-2603-bf16.gguf Mistral-Small-4-119B-2603-Q4_0.gguf Q4_0

(note the --tensor-type ffn_down=q4_1) gets NaN with PPL

After digging around with Claude and debug code, found that 16 Q8_1 blocks have s = Inf because the fp16 value is overflowing

In Claude's words:

Q8_1's s field stores sum * d in fp16 (max 65504), but when activation values in a 32-element block are large and same-sign, sum * d ≈ 32 * amax can exceed 65504. The max finite |s| is only 410, so the 16 overflowing blocks are massive outliers — their activations must be ~160x larger than typical.
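To make the arithmetic above concrete, here is a minimal sketch of how a single 32-element block ends up with a sum outside the fp16 range. It is not the ggml code: `BlockSketch`, `quantize_block_sketch`, and `clamp_to_f16_range` are hypothetical names, `d` and `s` are kept as `float` so the pre-narrowing value stays visible, and the clamp only illustrates the idea of this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int   kBlockSize = 32;        // elements per Q8_1 block
constexpr float kF16Max    = 65504.0f;  // largest finite fp16 value

// Hypothetical stand-in for a Q8_1 block; the real format stores d and s as fp16.
struct BlockSketch {
    float  d;               // per-block scale
    float  s;               // d * sum(qs), before narrowing to fp16
    int8_t qs[kBlockSize];  // quantized activations
};

// Quantize one block: scale by amax/127, round, and accumulate the integer sum.
inline BlockSketch quantize_block_sketch(const float * x) {
    assert(x != nullptr);
    BlockSketch b{};
    float amax = 0.0f;
    for (int j = 0; j < kBlockSize; ++j) {
        amax = std::max(amax, std::fabs(x[j]));
    }
    b.d = amax / 127.0f;
    const float id = b.d != 0.0f ? 1.0f / b.d : 0.0f;
    int sum = 0;
    for (int j = 0; j < kBlockSize; ++j) {
        b.qs[j] = static_cast<int8_t>(std::lround(x[j] * id));
        sum += b.qs[j];
    }
    b.s = static_cast<float>(sum) * b.d;  // roughly 32 * amax when all values share a sign
    return b;
}

// Illustrative clamp: keep the sum finite once it is narrowed to fp16.
inline float clamp_to_f16_range(float v) {
    return std::min(std::max(v, -kF16Max), kF16Max);
}
```

With 32 activations of 3000.0 the unclamped sum is about 96000, well past 65504, while a block of 1.0 values gives a sum of about 32.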

Additional information

I ran the same model with the updated activation code and it yielded a PPL of 5.5535 +/- 0.1235

For completeness, also tested with ignoring the pre-computed s value and recalculating the results as f32, and got a PPL of 5.5725 +/- 0.12469

Note that in either case, the PPL without this change was NaN, so while this clamping is lossy, it does result in a model that produces literally anything at all instead of failing spectacularly

Note that this only updates the reference, AVX2, AVX1, and CUDA implementations, not familiar enough with the other archs to touch those
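The f32 recomputation mentioned above can be sketched roughly as follows. This is an assumption about the shape of that experiment, not the code that was actually run: `recompute_block_sum_f32` is a hypothetical helper that ignores the stored fp16 `s` and rebuilds it from the scale and the quantized values.

```cpp
#include <cassert>
#include <cstdint>

// Rebuild d * sum(qs) in f32 instead of reading the stored fp16 sum.
// The result is never narrowed to fp16, so it cannot overflow to Inf there.
inline float recompute_block_sum_f32(float d, const int8_t * qs, int n = 32) {
    assert(qs != nullptr && n > 0);
    int sum = 0;
    for (int j = 0; j < n; ++j) {
        sum += qs[j];
    }
    return d * static_cast<float>(sum);
}
```

The trade-off is extra work in the dot product for every block, which is presumably why the pre-computed sum exists in the first place.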

Mistral 4 small PPL before these changes

[1]nan,[2]nan,[3]nan,[4]nan,[5]nan,[6]nan,[7]nan,[8]nan,[9]nan,[10]nan,[11]nan,[12]nan,[13]nan,[14]nan,[15]nan,[16]nan,[17]nan,[18]nan,[19]nan,[20]nan,[21]nan,[22]nan,[23]nan,[24]nan,[25]nan,[26]nan,[27]nan,[28]nan,[29]nan,[30]nan,[31]nan,[32]nan,[33]nan,[34]nan,[35]nan,[36]nan,[37]nan,[38]nan,[39]nan,[40]nan,[41]nan,[42]nan,[43]nan,[44]nan,[45]nan,[46]nan,[47]nan,[48]nan,[49]nan,[50]nan,[51]nan,[52]nan,[53]nan,[54]nan,[55]nan,[56]nan,[57]nan,[58]nan,[59]nan,[60]nan,[61]nan,[62]nan,[63]nan,[64]nan,[65]nan,[66]nan,[67]nan,[68]nan,[69]nan,[70]nan,[71]nan,[72]nan,[73]nan,[74]nan,[75]nan,[76]nan,[77]nan,[78]nan,[79]nan,[80]nan,[81]nan,[82]nan,[83]nan,[84]nan,[85]nan,[86]nan,[87]nan,[88]nan,[89]nan,[90]nan,[91]nan,[92]nan,[93]nan,[94]nan,[95]nan,[96]nan,[97]nan,[98]nan,[99]nan,[100]nan,
Unexpected negative standard deviation of log(prob)

Mistral 4 small PPL after these changes

[1]3.4955,[2]5.1043,[3]4.3632,[4]4.0977,[5]4.2305,[6]4.4037,[7]4.5087,[8]4.5073,[9]4.4639,[10]4.5297,[11]4.5263,[12]4.5587,[13]4.7890,[14]4.8800,[15]4.9211,[16]5.1019,[17]4.9447,[18]5.0835,[19]5.3203,[20]5.2572,[21]5.2755,[22]5.2618,[23]5.2392,[24]5.0943,[25]4.9508,[26]4.8844,[27]4.7809,[28]4.7516,[29]4.6887,[30]4.6612,[31]4.7435,[32]4.7837,[33]4.9112,[34]4.9299,[35]4.9606,[36]5.0317,[37]5.1810,[38]5.2880,[39]5.2647,[40]5.3024,[41]5.3430,[42]5.3550,[43]5.3820,[44]5.4165,[45]5.3988,[46]5.3975,[47]5.4033,[48]5.4926,[49]5.5758,[50]5.5696,[51]5.5633,[52]5.5685,[53]5.5910,[54]5.6138,[55]5.6804,[56]5.6624,[57]5.7382,[58]5.7408,[59]5.7636,[60]5.8286,[61]5.8487,[62]5.8500,[63]5.8483,[64]5.8840,[65]5.9183,[66]5.9979,[67]6.0463,[68]6.0607,[69]6.0864,[70]6.0978,[71]6.1055,[72]6.0793,[73]6.1318,[74]6.1260,[75]6.1347,[76]6.1301,[77]6.1563,[78]6.1131,[79]6.1374,[80]6.0724,[81]6.0041,[82]5.9799,[83]5.9689,[84]5.9874,[85]5.9820,[86]5.9677,[87]5.9715,[88]6.0430,[89]6.0806,[90]6.0899,[91]6.0997,[92]6.0917,[93]6.1328,[94]6.1264,[95]6.1512,[96]6.1638,[97]6.1756,[98]6.1676,[99]6.1591,[100]6.1809,
Final estimate: PPL = 6.1809 +/- 0.09843

Also tested on a Q4_1 quant of Qwen 3.5 9B and got identical PPL results both with and without this change

Qwen 3.5 9B before these changes

[1]5.4693,[2]7.8183,[3]7.9967,[4]7.6863,[5]7.6045,[6]7.8830,[7]8.1620,[8]8.6953,[9]9.0948,[10]9.4159,[11]9.2208,[12]9.2591,[13]9.7531,[14]9.2597,[15]9.1784,[16]9.2925,[17]8.7051,[18]8.7208,[19]8.6739,[20]8.6143,[21]8.3104,[22]8.2161,[23]7.9049,[24]7.5473,[25]7.4064,[26]7.2133,[27]7.0963,[28]7.0035,[29]6.9969,[30]6.9612,[31]6.9099,[32]6.9075,[33]6.8637,[34]6.9363,[35]7.0285,[36]7.1741,[37]7.2542,[38]7.2405,[39]7.2368,[40]7.2920,[41]7.3035,[42]7.3447,[43]7.3416,[44]7.3447,[45]7.4416,[46]7.4029,[47]7.5285,[48]7.5930,[49]7.5287,[50]7.5751,[51]7.5716,[52]7.6133,[53]7.6466,[54]7.6818,[55]7.6809,[56]7.6989,[57]7.7229,[58]7.7238,[59]7.7321,[60]7.7508,[61]7.7775,[62]7.8264,[63]7.8687,[64]7.9271,[65]7.9943,[66]8.0362,[67]8.1292,[68]8.1672,[69]8.1757,[70]8.1486,[71]8.2084,[72]8.2052,[73]8.2490,[74]8.2449,[75]8.2189,[76]8.2017,[77]8.2362,[78]8.2535,[79]8.1724,[80]8.1116,[81]8.0884,[82]8.1005,[83]8.1097,[84]8.1072,[85]8.1208,[86]8.1595,[87]8.1614,[88]8.1653,[89]8.1234,[90]8.0978,[91]8.0926,[92]8.0734,[93]8.0991,[94]8.1069,[95]8.1173,[96]8.1096,[97]8.0955,[98]8.0777,[99]8.0775,[100]8.0963,
Final estimate: PPL = 8.0963 +/- 0.12933

Qwen 3.5 9B after these changes

[1]5.4693,[2]7.8183,[3]7.9967,[4]7.6863,[5]7.6045,[6]7.8830,[7]8.1620,[8]8.6953,[9]9.0948,[10]9.4159,[11]9.2208,[12]9.2591,[13]9.7531,[14]9.2597,[15]9.1784,[16]9.2925,[17]8.7051,[18]8.7208,[19]8.6739,[20]8.6143,[21]8.3104,[22]8.2161,[23]7.9049,[24]7.5473,[25]7.4064,[26]7.2133,[27]7.0963,[28]7.0035,[29]6.9969,[30]6.9612,[31]6.9099,[32]6.9075,[33]6.8637,[34]6.9363,[35]7.0285,[36]7.1741,[37]7.2542,[38]7.2405,[39]7.2368,[40]7.2920,[41]7.3035,[42]7.3447,[43]7.3416,[44]7.3447,[45]7.4416,[46]7.4029,[47]7.5285,[48]7.5930,[49]7.5287,[50]7.5751,[51]7.5716,[52]7.6133,[53]7.6466,[54]7.6818,[55]7.6809,[56]7.6989,[57]7.7229,[58]7.7238,[59]7.7321,[60]7.7508,[61]7.7775,[62]7.8264,[63]7.8687,[64]7.9271,[65]7.9943,[66]8.0362,[67]8.1292,[68]8.1672,[69]8.1757,[70]8.1486,[71]8.2084,[72]8.2052,[73]8.2490,[74]8.2449,[75]8.2189,[76]8.2017,[77]8.2362,[78]8.2535,[79]8.1724,[80]8.1116,[81]8.0884,[82]8.1005,[83]8.1097,[84]8.1072,[85]8.1208,[86]8.1595,[87]8.1614,[88]8.1653,[89]8.1234,[90]8.0978,[91]8.0926,[92]8.0734,[93]8.0991,[94]8.1069,[95]8.1173,[96]8.1096,[97]8.0955,[98]8.0777,[99]8.0775,[100]8.0963,
Final estimate: PPL = 8.0963 +/- 0.12933

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES, Claude was used extensively for discovering the issue through trial/error and debugging code

ggerganov · 2026-04-09T08:30:18Z

Can you dump the BF16 values of the problematic tensor?

I also noticed some irregularities in this specific model in #20668 (comment)

To me it looks like the model data is not sound, so I don't think patching the code is warranted.

bartowski1182 · 2026-04-09T17:54:27Z

@ggerganov Yeah sure, and it's the same tensor that you noted in that eval bug.

I added the debugging code back so you can see this:

Q8_1 FP16 OVERFLOW: dst=ffn_moe_down-32 src0=blk.32.ffn_down_exps.weight src1=ffn_moe_swiglu-32 total_blocks=131072 s_inf=16 s_nan=0 max_finite_|s|=410

For the BF16 weights, ran a similar command to what you ran in the linked report:

./build/bin/llama-debug -m Mistral-Small-4-bf16.gguf -p "[SYSTEM_PROMPT] You are a helpful assistant[/SYSTEM_PROMPT][INST] Hello[/INST]" -n 1 --tensor-filter "ffn_moe_weighted-32"

common_debug_cb_eval:      ffn_moe_weighted-32 = (f32)        MUL(ffn_moe_down-32{4096, 128, 2, 1}, ffn_moe_weights_norm-32 (reshaped){1, 128, 2, 1}}) = {4096, 128, 2, 1}
    [
        [
            [     -0.0511,      -0.0220,       0.0349,    ...,      -0.0577,      -0.2088,      -0.0310  ],
            [      0.0186,       0.0214,       0.0072,    ...,       0.0049,      -0.0257,      -0.0022  ],
            [     -0.0089,      -0.0244,      -0.0031,    ...,       0.0025,       0.0092,      -0.0025  ],
            ..., 
            [     -0.0008,       0.0003,       0.0007,    ...,       0.0005,       0.0009,       0.0015  ],
            [      0.0001,       0.0002,      -0.0002,    ...,      -0.0004,       0.0017,      -0.0007  ],
            [     -0.0001,      -0.0002,       0.0000,    ...,       0.0001,       0.0006,      -0.0001  ],
        ],
        [
            [      0.0032,      -0.0039,       0.0005,    ...,      -0.0004,       0.0037,      -0.0035  ],
            [      0.0004,       0.0005,       0.0008,    ...,      -0.0012,      -0.0022,       0.0004  ],
            [      0.0009,      -0.0013,      -0.0016,    ...,      -0.0004,      -0.0001,       0.0038  ],
            ..., 
            [      0.0004,       0.0002,      -0.0001,    ...,       0.0002,      -0.0016,      -0.0001  ],
            [      0.0001,       0.0006,      -0.0000,    ...,       0.0001,       0.0000,      -0.0003  ],
            [      0.0004,      -0.0001,      -0.0000,    ...,       0.0000,       0.0001,       0.0005  ],
        ],
    ]
    sum = -21.966097
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 128, 2, 1}, }) = {4096, 2, 1, 1}
    [
        [
            [     -0.0511,      -0.0220,       0.0349,    ...,      -0.0577,      -0.2088,      -0.0310  ],
            [      0.0032,      -0.0039,       0.0005,    ...,      -0.0004,       0.0037,      -0.0035  ],
        ],
    ]
    sum = -25.767612
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 128, 2, 1}, }) = {4096, 2, 1, 1}
    [
        [
            [      0.0186,       0.0214,       0.0072,    ...,       0.0049,      -0.0257,      -0.0022  ],
            [      0.0004,       0.0005,       0.0008,    ...,      -0.0012,      -0.0022,       0.0004  ],
        ],
    ]
    sum = 0.872716
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 128, 2, 1}, }) = {4096, 2, 1, 1}
    [
        [
            [     -0.0089,      -0.0244,      -0.0031,    ...,       0.0025,       0.0092,      -0.0025  ],
            [      0.0009,      -0.0013,      -0.0016,    ...,      -0.0004,      -0.0001,       0.0038  ],
        ],
    ]
    sum = 1.773644
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 128, 2, 1}, }) = {4096, 2, 1, 1}
    [
        [
            [     -0.0008,      -0.0067,       0.0098,    ...,      -0.0082,       0.0038,       0.0027  ],
            [      0.0002,      -0.0025,      -0.0003,    ...,      -0.0008,      -0.0006,      -0.0033  ],
        ],
    ]
    sum = 0.260006

system_info: n_threads = 96 (n_threads_batch = 96) / 192 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 

common_debug_cb_eval:      ffn_moe_weighted-32 = (f32)        MUL(ffn_moe_down-32{4096, 4, 30, 1}, ffn_moe_weights_norm-32 (reshaped){1, 4, 30, 1}}) = {4096, 4, 30, 1}
    [
        [
            [  -2526.8816,    -468.2716,    5885.8682,    ...,    -205.6809,    1444.4355,   -1961.9440  ],
            [     -0.0054,       0.0031,      -0.0012,    ...,       0.0069,       0.0003,      -0.0012  ],
            [     -0.0001,      -0.0018,       0.0027,    ...,       0.0008,       0.0006,      -0.0027  ],
            [     -0.0023,      -0.0003,      -0.0008,    ...,      -0.0013,      -0.0000,      -0.0046  ],
        ],
        [
            [      0.5057,       0.3572,       0.7750,    ...,      -0.1811,       0.2851,       0.1452  ],
            [      0.0019,       0.2234,      -0.1807,    ...,       0.1392,       0.2369,       0.1664  ],
            [     -0.1560,      -0.0971,       0.1389,    ...,      -0.0336,       0.0982,      -0.0383  ],
            [     -0.0777,      -0.2010,       0.1259,    ...,       0.0096,       0.0090,       0.0095  ],
        ],
        [
            [     -0.1498,       0.1264,      -0.3107,    ...,       0.4183,      -0.3672,      -0.2537  ],
            [     -0.2451,       0.1512,       0.0539,    ...,      -0.0710,       0.0478,      -0.0811  ],
            [     -0.0553,       0.0492,       0.0706,    ...,       0.0006,       0.0383,      -0.0575  ],
            [      0.0330,       0.0586,      -0.0472,    ...,       0.0649,      -0.0267,       0.0149  ],
        ],
        ..., 
        [
            [      0.3370,      -0.4041,      -1.5760,    ...,       0.2102,       0.1811,       0.6098  ],
            [     -0.0259,       0.0100,      -0.0058,    ...,      -0.0275,      -0.0011,      -0.0559  ],
            [     -0.0052,       0.0170,      -0.0072,    ...,      -0.0358,       0.0118,      -0.0224  ],
            [     -0.0130,       0.0112,       0.0027,    ...,       0.0276,      -0.0050,       0.0055  ],
        ],
        [
            [     -0.0828,      -0.3576,      -0.2163,    ...,      -0.4464,       0.1573,       0.7532  ],
            [     -0.0241,       0.0534,       0.0650,    ...,      -0.0410,      -0.0222,      -0.0236  ],
            [      0.0251,       0.0047,      -0.0601,    ...,      -0.0091,       0.0226,      -0.0236  ],
            [     -0.0489,       0.0651,       0.1208,    ...,       0.0190,       0.0436,      -0.0181  ],
        ],
        [
            [      0.0869,      -0.0826,      -0.1747,    ...,       0.0032,      -0.4236,       0.2473  ],
            [      0.0479,      -0.1143,       0.0173,    ...,      -0.2089,       0.3555,      -0.0883  ],
            [      0.0113,       0.0037,      -0.0545,    ...,      -0.0280,      -0.0623,       0.0449  ],
            [     -0.0140,       0.0095,       0.0075,    ...,      -0.0045,      -0.0471,       0.0529  ],
        ],
    ]
    sum = -577249.375000
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 4, 30, 1}, }) = {4096, 30, 1, 1}
    [
        [
            [  -2526.8816,    -468.2716,    5885.8682,    ...,    -205.6809,    1444.4355,   -1961.9440  ],
            [      0.5057,       0.3572,       0.7750,    ...,      -0.1811,       0.2851,       0.1452  ],
            [     -0.1498,       0.1264,      -0.3107,    ...,       0.4183,      -0.3672,      -0.2537  ],
            ..., 
            [      0.3370,      -0.4041,      -1.5760,    ...,       0.2102,       0.1811,       0.6098  ],
            [     -0.0828,      -0.3576,      -0.2163,    ...,      -0.4464,       0.1573,       0.7532  ],
            [      0.0869,      -0.0826,      -0.1747,    ...,       0.0032,      -0.4236,       0.2473  ],
        ],
    ]
    sum = -577287.187500
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 4, 30, 1}, }) = {4096, 30, 1, 1}
    [
        [
            [     -0.0054,       0.0031,      -0.0012,    ...,       0.0069,       0.0003,      -0.0012  ],
            [      0.0019,       0.2234,      -0.1807,    ...,       0.1392,       0.2369,       0.1664  ],
            [     -0.2451,       0.1512,       0.0539,    ...,      -0.0710,       0.0478,      -0.0811  ],
            ..., 
            [     -0.0259,       0.0100,      -0.0058,    ...,      -0.0275,      -0.0011,      -0.0559  ],
            [     -0.0241,       0.0534,       0.0650,    ...,      -0.0410,      -0.0222,      -0.0236  ],
            [      0.0479,      -0.1143,       0.0173,    ...,      -0.2089,       0.3555,      -0.0883  ],
        ],
    ]
    sum = 36.512115
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 4, 30, 1}, }) = {4096, 30, 1, 1}
    [
        [
            [     -0.0001,      -0.0018,       0.0027,    ...,       0.0008,       0.0006,      -0.0027  ],
            [     -0.1560,      -0.0971,       0.1389,    ...,      -0.0336,       0.0982,      -0.0383  ],
            [     -0.0553,       0.0492,       0.0706,    ...,       0.0006,       0.0383,      -0.0575  ],
            ..., 
            [     -0.0052,       0.0170,      -0.0072,    ...,      -0.0358,       0.0118,      -0.0224  ],
            [      0.0251,       0.0047,      -0.0601,    ...,      -0.0091,       0.0226,      -0.0236  ],
            [      0.0113,       0.0037,      -0.0545,    ...,      -0.0280,      -0.0623,       0.0449  ],
        ],
    ]
    sum = 8.844553
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 4, 30, 1}, }) = {4096, 30, 1, 1}
    [
        [
            [     -0.0023,      -0.0003,      -0.0008,    ...,      -0.0013,      -0.0000,      -0.0046  ],
            [     -0.0777,      -0.2010,       0.1259,    ...,       0.0096,       0.0090,       0.0095  ],
            [      0.0330,       0.0586,      -0.0472,    ...,       0.0649,      -0.0267,       0.0149  ],
            ..., 
            [     -0.0130,       0.0112,       0.0027,    ...,       0.0276,      -0.0050,       0.0055  ],
            [     -0.0489,       0.0651,       0.1208,    ...,       0.0190,       0.0436,      -0.0181  ],
            [     -0.0140,       0.0095,       0.0075,    ...,      -0.0045,      -0.0471,       0.0529  ],
        ],
    ]
    sum = -14.182524
Model add_bos: true
Input prompt: "[SYSTEM_PROMPT] You are a helpful assistant[/SYSTEM_PROMPT][INST] Hello[/INST]"
Token ids (30):
<s>(1) [(1091) SY(101289) STEM(58343) _PRO(25396) MP(7690) T(1084) ](1093)  You(3213)  are(1584)  a(1261)  helpful(20351)  assistant(27089) [(1091) /(1047) SY(101289) STEM(58343) _PRO(25396) MP(7690) T(1084) ][(5371) IN(3174) ST(3074) ](1093)  Hello(45383) [(1091) /(1047) IN(3174) ST(3074) ](1093)

And with Q4_0 (with ffn_down set to Q4_1) in case it's relevant:

./build/bin/llama-debug -m Mistral-Small-4-Q4_0.gguf -p "[SYSTEM_PROMPT] You are a helpful assistant[/SYSTEM_PROMPT][INST] Hello[/INST]" -n 1 --tensor-filter "ffn_moe_weighted-32"

common_debug_cb_eval:      ffn_moe_weighted-32 = (f32)        MUL(ffn_moe_down-32{4096, 128, 2, 1}, ffn_moe_weights_norm-32 (reshaped){1, 128, 2, 1}}) = {4096, 128, 2, 1}
    [
        [
            [     -0.0377,      -0.0534,       0.0481,    ...,      -0.0858,      -0.2257,       0.0617  ],
            [      0.0159,       0.0224,       0.0115,    ...,       0.0055,      -0.0229,      -0.0037  ],
            [     -0.0093,      -0.0266,      -0.0029,    ...,       0.0008,       0.0078,       0.0002  ],
            ..., 
            [      0.0001,       0.0001,      -0.0001,    ...,      -0.0003,       0.0014,      -0.0006  ],
            [     -0.0007,       0.0004,       0.0007,    ...,       0.0005,       0.0010,       0.0014  ],
            [      0.0001,      -0.0003,       0.0001,    ...,       0.0001,       0.0006,      -0.0001  ],
        ],
        [
            [      0.0030,      -0.0027,       0.0018,    ...,      -0.0004,       0.0024,      -0.0029  ],
            [      0.0002,       0.0004,       0.0005,    ...,      -0.0010,      -0.0020,       0.0001  ],
            [      0.0008,      -0.0008,      -0.0016,    ...,      -0.0013,      -0.0007,       0.0037  ],
            ..., 
            [      0.0005,       0.0003,      -0.0000,    ...,       0.0002,      -0.0017,      -0.0001  ],
            [      0.0001,       0.0008,       0.0000,    ...,       0.0002,      -0.0003,      -0.0005  ],
            [      0.0005,      -0.0002,       0.0000,    ...,      -0.0001,      -0.0000,       0.0003  ],
        ],
    ]
    sum = -21.766863
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 128, 2, 1}, }) = {4096, 2, 1, 1}
    [
        [
            [     -0.0377,      -0.0534,       0.0481,    ...,      -0.0858,      -0.2257,       0.0617  ],
            [      0.0030,      -0.0027,       0.0018,    ...,      -0.0004,       0.0024,      -0.0029  ],
        ],
    ]
    sum = -25.451143
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 128, 2, 1}, }) = {4096, 2, 1, 1}
    [
        [
            [      0.0159,       0.0224,       0.0115,    ...,       0.0055,      -0.0229,      -0.0037  ],
            [      0.0002,       0.0004,       0.0005,    ...,      -0.0010,      -0.0020,       0.0001  ],
        ],
    ]
    sum = 0.668126
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 128, 2, 1}, }) = {4096, 2, 1, 1}
    [
        [
            [     -0.0093,      -0.0266,      -0.0029,    ...,       0.0008,       0.0078,       0.0002  ],
            [      0.0008,      -0.0008,      -0.0016,    ...,      -0.0013,      -0.0007,       0.0037  ],
        ],
    ]
    sum = 1.542167
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 128, 2, 1}, }) = {4096, 2, 1, 1}
    [
        [
            [     -0.0005,      -0.0085,       0.0111,    ...,      -0.0100,       0.0063,       0.0026  ],
            [      0.0006,      -0.0000,       0.0004,    ...,      -0.0011,       0.0013,      -0.0015  ],
        ],
    ]
    sum = 0.344204

system_info: n_threads = 96 (n_threads_batch = 96) / 192 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 

common_debug_cb_eval:      ffn_moe_weighted-32 = (f32)        MUL(ffn_moe_down-32{4096, 4, 30, 1}, ffn_moe_weights_norm-32 (reshaped){1, 4, 30, 1}}) = {4096, 4, 30, 1}
    [
        [
            [        -nan,         -nan,         -nan,    ...,         -nan,         -nan,         -nan  ],
            [     -0.0046,       0.0024,      -0.0007,    ...,       0.0076,      -0.0001,      -0.0018  ],
            [     -0.0003,      -0.0017,       0.0028,    ...,       0.0011,       0.0001,      -0.0032  ],
            [     -0.0024,      -0.0001,      -0.0010,    ...,      -0.0012,      -0.0003,      -0.0046  ],
        ],
        [
            [      0.4818,       0.3821,       0.7114,    ...,      -0.1320,       0.2678,       0.1817  ],
            [     -0.0309,       0.1859,      -0.1440,    ...,       0.1226,       0.2021,       0.1455  ],
            [     -0.1730,      -0.1076,       0.1479,    ...,      -0.0188,       0.1073,      -0.0484  ],
            [     -0.0807,      -0.1962,       0.1267,    ...,      -0.0168,       0.0028,       0.0135  ],
        ],
        [
            [     -0.1117,       0.1570,      -0.2545,    ...,       0.3830,      -0.3810,      -0.1876  ],
            [     -0.2143,       0.1105,       0.0423,    ...,      -0.0512,       0.0245,      -0.0892  ],
            [     -0.0464,       0.0308,       0.0665,    ...,      -0.0129,       0.0517,      -0.0598  ],
            [      0.0488,       0.0612,      -0.0372,    ...,       0.0743,      -0.0307,       0.0143  ],
        ],
        ..., 
        [
            [     -0.0786,      -0.2572,      -0.2437,    ...,      -0.1087,       0.0904,       0.1486  ],
            [      0.0056,       0.0713,       0.0265,    ...,      -0.0325,      -0.0137,      -0.1162  ],
            [      0.0425,       0.1282,       0.0186,    ...,       0.0156,      -0.0127,       0.0121  ],
            [      0.0365,      -0.0189,      -0.0376,    ...,       0.0461,       0.0720,       0.0437  ],
        ],
        [
            [      0.1010,      -0.5950,      -0.6300,    ...,      -0.5743,       0.3738,       0.5940  ],
            [     -0.0817,       0.0937,       0.0383,    ...,       0.1123,      -0.0569,       0.0118  ],
            [      0.0167,      -0.0310,      -0.0787,    ...,       0.0418,      -0.0279,      -0.0389  ],
            [     -0.0318,      -0.0012,      -0.0139,    ...,      -0.0261,      -0.0425,      -0.0318  ],
        ],
        [
            [      0.0268,      -0.1215,       0.0420,    ...,       0.1526,       0.3784,      -0.2065  ],
            [      0.1985,      -0.0054,      -0.2593,    ...,      -0.0466,      -0.3999,       0.2104  ],
            [      0.0128,      -0.0004,       0.0352,    ...,       0.0763,      -0.0343,      -0.0307  ],
            [      0.0292,       0.0563,      -0.0295,    ...,      -0.0066,      -0.0192,       0.0690  ],
        ],
    ]
    sum = -nan
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 4, 30, 1}, }) = {4096, 30, 1, 1}
    [
        [
            [        -nan,         -nan,         -nan,    ...,         -nan,         -nan,         -nan  ],
            [      0.4818,       0.3821,       0.7114,    ...,      -0.1320,       0.2678,       0.1817  ],
            [     -0.1117,       0.1570,      -0.2545,    ...,       0.3830,      -0.3810,      -0.1876  ],
            ..., 
            [     -0.0786,      -0.2572,      -0.2437,    ...,      -0.1087,       0.0904,       0.1486  ],
            [      0.1010,      -0.5950,      -0.6300,    ...,      -0.5743,       0.3738,       0.5940  ],
            [      0.0268,      -0.1215,       0.0420,    ...,       0.1526,       0.3784,      -0.2065  ],
        ],
    ]
    sum = -nan
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 4, 30, 1}, }) = {4096, 30, 1, 1}
    [
        [
            [     -0.0046,       0.0024,      -0.0007,    ...,       0.0076,      -0.0001,      -0.0018  ],
            [     -0.0309,       0.1859,      -0.1440,    ...,       0.1226,       0.2021,       0.1455  ],
            [     -0.2143,       0.1105,       0.0423,    ...,      -0.0512,       0.0245,      -0.0892  ],
            ..., 
            [      0.0056,       0.0713,       0.0265,    ...,      -0.0325,      -0.0137,      -0.1162  ],
            [     -0.0817,       0.0937,       0.0383,    ...,       0.1123,      -0.0569,       0.0118  ],
            [      0.1985,      -0.0054,      -0.2593,    ...,      -0.0466,      -0.3999,       0.2104  ],
        ],
    ]
    sum = 26.722431
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 4, 30, 1}, }) = {4096, 30, 1, 1}
    [
        [
            [     -0.0003,      -0.0017,       0.0028,    ...,       0.0011,       0.0001,      -0.0032  ],
            [     -0.1730,      -0.1076,       0.1479,    ...,      -0.0188,       0.1073,      -0.0484  ],
            [     -0.0464,       0.0308,       0.0665,    ...,      -0.0129,       0.0517,      -0.0598  ],
            ..., 
            [      0.0425,       0.1282,       0.0186,    ...,       0.0156,      -0.0127,       0.0121  ],
            [      0.0167,      -0.0310,      -0.0787,    ...,       0.0418,      -0.0279,      -0.0389  ],
            [      0.0128,      -0.0004,       0.0352,    ...,       0.0763,      -0.0343,      -0.0307  ],
        ],
    ]
    sum = 43.818310
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 4, 30, 1}, }) = {4096, 30, 1, 1}
    [
        [
            [     -0.0024,      -0.0001,      -0.0010,    ...,      -0.0012,      -0.0003,      -0.0046  ],
            [     -0.0807,      -0.1962,       0.1267,    ...,      -0.0168,       0.0028,       0.0135  ],
            [      0.0488,       0.0612,      -0.0372,    ...,       0.0743,      -0.0307,       0.0143  ],
            ..., 
            [      0.0365,      -0.0189,      -0.0376,    ...,       0.0461,       0.0720,       0.0437  ],
            [     -0.0318,      -0.0012,      -0.0139,    ...,      -0.0261,      -0.0425,      -0.0318  ],
            [      0.0292,       0.0563,      -0.0295,    ...,      -0.0066,      -0.0192,       0.0690  ],
        ],
    ]
    sum = -40.639267
llama-debug: /home/colin/git_repos/forks/mistral4-q4_0/ggml/src/ggml-cpu/ops.cpp:3756: void ggml_compute_forward_rms_norm_f32(const ggml_compute_params*, ggml_tensor*): Assertion `scale > 0.0f' failed.

(this was without my changes so it asserted)

If the model data is not sound, I'm not sure where to go from here, though this clamping does make it run and doesn't affect any sound model. But I totally understand not wanting to put in arbitrary code that masks bugs in the model itself, so I'm more than happy to hear your personal judgement

bartowski1182 · 2026-04-13T15:05:09Z

@ggerganov just curious if I should close this and we call Mistral broken or if I should continue investigating

IMbackK · 2026-04-13T15:30:44Z

Any such clamping should probably be performed at convert-time

bartowski1182 · 2026-04-13T15:33:33Z

@IMbackK this overflow happens during activation calculations, so it can't be done at convert-time

IMbackK · 2026-04-13T15:36:01Z

right, yeah. I don't see a good solution then.

JohannesGaessler · 2026-04-14T09:10:03Z

One fix that could be done is to scale down the FP32 activations prior to the matrix multiplication and to then scale up the FP32 results afterwards again. You would lose some information on activations with very small absolute values that may now get flushed to zero but you would become more robust against overflow.
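A rough sketch of that idea, under the assumption that a single power-of-two factor is applied to the whole activation vector. None of these names exist in ggml: `scaled_block_sum`, `fits_in_f16`, and `dot_with_prescale` are hypothetical, and the plain float dot product stands in for the quantized matrix multiplication.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr float kF16Max = 65504.0f;  // largest finite fp16 value

// Sum of one block of activations after dividing every element by `scale`.
inline float scaled_block_sum(const float * x, int n, float scale) {
    assert(x != nullptr && scale > 0.0f);
    float sum = 0.0f;
    for (int j = 0; j < n; ++j) {
        sum += x[j] / scale;
    }
    return sum;
}

// True when a value could be stored as a finite fp16 number.
inline bool fits_in_f16(float v) {
    return std::fabs(v) <= kF16Max;
}

// y = scale * dot(w, x / scale): scale down before the product, scale up after.
inline float dot_with_prescale(const std::vector<float> & w,
                               const std::vector<float> & x,
                               float scale) {
    assert(w.size() == x.size() && scale > 0.0f);
    float acc = 0.0f;
    for (std::size_t i = 0; i < w.size(); ++i) {
        acc += w[i] * (x[i] / scale);
    }
    return acc * scale;
}
```

For a block of 32 activations at 3000.0, a scale of 4 brings the block sum from 96000 down to 24000, which fits in fp16, and the final result is unchanged. The cost is the one already noted: activations with very small magnitudes move closer to being flushed to zero.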

ggerganov · 2026-04-14T11:16:55Z

IMO it's not worth patching this without understanding better what exactly causes one of the activations to explode.

Maybe something goes wrong in the normalization logic:

llama.cpp/src/llama-graph.cpp

Lines 1398 to 1412 in 6a6780a

    
    if (norm_w) {
        weights = ggml_reshape_2d(ctx0, weights, n_expert_used, n_tokens);

        ggml_tensor * weights_sum = ggml_sum_rows(ctx0, weights); // [1, n_tokens]
        cb(weights_sum, "ffn_moe_weights_sum", il);

        // Avoid division by zero, clamp to smallest number representable by F16
        weights_sum = ggml_clamp(ctx0, weights_sum, 6.103515625e-5, INFINITY);
        cb(weights_sum, "ffn_moe_weights_sum_clamped", il);

        weights = ggml_div(ctx0, weights, weights_sum); // [n_expert_used, n_tokens]
        cb(weights, "ffn_moe_weights_norm", il);

        weights = ggml_reshape_3d(ctx0, weights, 1, n_expert_used, n_tokens);
    }

bartowski1182 · 2026-04-15T01:34:08Z

well you won't like this development...

MiniMax M2.7 is showing a similar issue but this time with Q4_K and Q5_K: when the last FFN_DOWN_EXPS is Q4_K, the perplexity gets NaN values

However this time it only happens with CUDA, CPU gets no such NaNs

Compiling with -DGGML_CUDA_FORCE_CUBLAS=ON fixes the NaNs, as does performing an identical clamp to the F16 range inside the quantize_q8_1 DS4 layout:

diff --git a/ggml/src/ggml-cuda/quantize.cu b/ggml/src/ggml-cuda/quantize.cu
index 4300ffc14..ea23fed7f 100644
--- a/ggml/src/ggml-cuda/quantize.cu
+++ b/ggml/src/ggml-cuda/quantize.cu
@@ -44,7 +44,9 @@ static __global__ void quantize_q8_1(
         return;
     }

-    y[ib].ds = make_half2(d, sum);
+    // DEBUG: clamp sum to F16 range to test if this is the NaN source
+    const float sum_clamped = fminf(fmaxf(sum, -65504.0f), 65504.0f);
+    y[ib].ds = make_half2(d, sum_clamped);
 }

 __device__ __forceinline__ uint8_t compute_e8m0_scale(float amax) {
@@ -264,7 +266,9 @@ static __global__ void quantize_mmq_q8_1(
     const float d = 1.0f / d_inv;

     if (ds_layout == MMQ_Q8_1_DS_LAYOUT_DS4) {
-        y[ib].ds4[iqs/32] = make_half2(d, sum);
+        // DEBUG: clamp sum to F16 range to test if this is the NaN source
+        const float sum_clamped = fminf(fmaxf(sum, -65504.0f), 65504.0f);
+        y[ib].ds4[iqs/32] = make_half2(d, sum_clamped);
     } else {
         y[ib].d4[iqs/32]  = d;
     }

unfortunately this is debugging done with Claude, would need someone more well versed with CUDA to suggest a proper fix (though Claude suggested swapping DS4 for D2S6, however I have no clue what the implications are), the clamping is done purely as a show of where the issue seems to be :')

but it does lead me to believe we have some strange numerical issues on our hands and it's more widespread than initially thought

CUDA Version 12.2.2 btw

Edit: never mind, the compile flag only delays the NaN values; they eventually happen.

am17an · 2026-04-15T12:34:29Z

@bartowski1182 recently GGML_CUDA_FORCE_CUBLAS_COMPUTE_32F was added, does that still cause the NaNs? Also do you have a minimal repro for this?

bartowski1182 · 2026-04-15T14:52:21Z

I'll give that a shot

Minimal repro:

./build/bin/llama-quantize --pure ./MiniMax-M2.7-bf16.gguf ./MiniMax-M2.7-Q4_K.gguf Q4_K

If you have an imatrix, you can make a small version like this:

./build/bin/llama-quantize --imatrix ./MiniMaxAI_MiniMax-M2.7-imatrix.gguf --tensor-type 61.ffn_down_exps=q4_k ./MiniMax-M2.7-bf16.gguf ./MiniMax-M2.7-IQ1_S.gguf IQ1_S

Can use mine from here: https://huggingface.co/bartowski/MiniMaxAI_MiniMax-M2.7-GGUF/blob/main/MiniMaxAI_MiniMax-M2.7-imatrix.gguf

Should I consider opening a new issue for this or does this discussion feel related?

am17an · 2026-04-15T17:19:09Z

I didn't realize that would involve downloading 500GB of weights. I'm guessing you don't have a smaller model to work with?

bartowski1182 · 2026-04-15T17:20:43Z

Oh right, meant to link at the end

@ubergarm made a copy of that exact setup here:

https://huggingface.co/ubergarm/MiniMax-M2.7-GGUF/blob/ed148d9b6be8a935caf693efcc92863d117fc8e8/BROKEN-TEST-ONLY-DONT-DOWNLOAD-MiniMax-M2.7-iq1_s_q4_K.gguf

ORippler · 2026-04-16T09:12:09Z

y[ib].ds4[iqs/32] = make_half2(d, sum);

If sum exceeds the value range for F16, CUDA will follow IEEE 754 and sum will be INF. Unsure if/how the CPU backend handles this case

https://docs.nvidia.com/cuda/parallel-thread-execution/#scalar-conversions
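To illustrate the conversion behaviour described above, here is a minimal host-side C++ sketch. It is not the CUDA code: `f16_round_trip_overflow_only` is a hypothetical stand-in that models only the overflow-to-infinity part of a float to F16 conversion (precision loss is ignored), and `min_term` is a hypothetical helper showing how a stored block sum might enter a dot product as a product with a per-block minimum. The input value 100000 is made up for the example.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Largest finite F16 value.
constexpr float F16_MAX = 65504.0f;

// ASSUMPTION: simplified model of float -> F16 -> float. With round-to-nearest,
// magnitudes of 65520 and above become infinity; everything else is passed
// through unchanged (real F16 would also lose precision).
inline float f16_round_trip_overflow_only(float x) {
    if (std::isnan(x)) return x;
    if (std::fabs(x) >= 65520.0f) {
        return std::copysign(std::numeric_limits<float>::infinity(), x);
    }
    return x;
}

// Saturate to the finite F16 range before converting, as the debug patch does.
inline float clamp_to_f16(float x) {
    const float r = std::fmin(std::fmax(x, -F16_MAX), F16_MAX);
    assert(std::fabs(r) <= F16_MAX);  // postcondition: always finite
    return r;
}

// Hypothetical helper: a stored sum s multiplied by a per-block minimum m.
// If s is infinite and m is zero, the product is NaN.
inline float min_term(float m, float s_stored) {
    return m * s_stored;
}
```

Without the clamp, `min_term(0.0f, f16_round_trip_overflow_only(100000.0f))` is `0 * inf`, which is NaN. With the clamp the stored value saturates at 65504, so the result stays finite but no longer matches the true sum.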

ORippler · 2026-04-16T09:13:32Z

If we need more dynamic value ranges, it makes sense to go towards BF16 datatypes (we have been confronted with numerical stability issues related to F16 multiple times already) imo

bartowski1182 · 2026-04-21T17:40:32Z

Wanted to provide an update on the investigation

Here's what I know:

With mainline, running pure Q4_K and pure Q6_K with the last FFN_DOWN_EXPS set to Q4_K results in NaN on CUDA

With the clamp fix proposed above, those two quants are fixed.

HOWEVER, the Q3_K_M recipe (mine at least; I can check mainline's) is NOT fixed by the clamp above. The NaN values still appear and only go away when flash attention is disabled.

To fix that, an additional clamp on d is ALSO required:

-        // DEBUG: clamp sum to F16 range to test if this is the NaN source
+        // clamp d and sum to f16 range to avoid inf from large activations
+        const float d_clamped   = fminf(d, 65504.0f);
         const float sum_clamped = fminf(fmaxf(sum, -65504.0f), 65504.0f);
-        y[ib].ds4[iqs/32] = make_half2(d, sum_clamped);
+        y[ib].ds4[iqs/32] = make_half2(d_clamped, sum_clamped);

this was discovered when dumping the op that produced the issue:

NAN DEBUG: op #57489 'ffn_moe_down-61' (op=30) produced NaN/inf at index 9732096 (of 12582912): -inf
  src0: 'CUDA0#blk.61.ffn_down_exps.weight#0' type=12
  src1: 'ffn_moe_swiglu-61' type=0

Something about Q3_K for earlier tensors causes d to also overflow to -inf and result in NaN

CPU is immune to all of this because it uses block_q8_K with f32 scale and int16 partial sums
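A rough C++ sketch of why a Q8_K-style block does not hit this overflow. The struct follows the description above (f32 scale, int16 partial sums over groups of 16 quants); the quantization routine is a simplified assumption written for illustration, not the ggml reference code, and the names `BlockQ8K` and `quantize_block_q8k` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int QK_K = 256;

// Simplified Q8_K-style block: f32 scale, int8 quants, integer group sums.
struct BlockQ8K {
    float   d;                 // scale, stored as f32
    int8_t  qs[QK_K];          // quantized values
    int16_t bsums[QK_K / 16];  // sum of quants per group of 16
};

// Worst case |bsum| is 16 * 127 = 2032, far inside the int16 range.
static_assert(16 * 127 <= INT16_MAX, "group sums always fit in int16");

// ASSUMPTION: plain absmax quantization, used only to fill the struct.
inline BlockQ8K quantize_block_q8k(const float * x) {
    assert(x != nullptr);
    BlockQ8K b{};
    float amax = 0.0f;
    for (int i = 0; i < QK_K; ++i) amax = std::fmax(amax, std::fabs(x[i]));
    b.d = amax / 127.0f;
    const float id = b.d > 0.0f ? 1.0f / b.d : 0.0f;
    for (int i = 0; i < QK_K; ++i) {
        b.qs[i] = static_cast<int8_t>(std::lround(x[i] * id));
    }
    for (int g = 0; g < QK_K / 16; ++g) {
        int sum = 0;
        for (int j = 0; j < 16; ++j) sum += b.qs[16 * g + j];
        b.bsums[g] = static_cast<int16_t>(sum);  // integer, so no float overflow
    }
    return b;
}
```

The sums are kept as integers of the quants rather than as a pre-multiplied `sum * d`, so the large product only ever exists in f32 at dot-product time.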

With the two clamps to quantize.cu, we get proper values for the blocks that used to give NaN

I think we should introduce these clamps, since the model's PPL seems acceptable with them, and investigate other solutions such as switching to F32 scales and/or sums later. For now I think it is best to get a working solution into mainline so we can at least alleviate the outstanding issues with existing models.

Other things I tried while experimenting:

Bumping FATTN_KQ_MAX_OFFSET to 10.0f

Using -ctk f32 -ctv f32

GGML_CUDA_FORCE_CUBLAS_COMPUTE_32F

GGML_CUDA_DISABLE_FUSION=1

I've pushed the latest fixes for both Mistral and MiniMax to this branch so they can be seen together. As I said, I think it is best to merge these fixes for now. If further investigation is requested I can continue digging (given some direction), but this is a short-term, easy fix that doesn't break anything existing.

bartowski1182 · 2026-04-24T14:56:07Z

@ggerganov I will continue investigating to see if I can find a root cause, but I think we should consider merging this in the meantime to fix any models that exist with this issue (mistral and minimax)

ggerganov · 2026-04-24T15:02:38Z

Does Minimax need just the CUDA clamp in Q8_1?

bartowski1182 · 2026-04-24T15:18:46Z

Correct yes

am17an · 2026-04-24T15:33:54Z

This is a real problem, even deepseek v4 uses the OAI style clamp post gate + swiglu. These can reasonably overflow in later layers of larger models as you train with more tokens.

ggerganov · 2026-04-24T15:40:39Z

Clamping the weights is not a good idea. The basic assumption for using F16 for most quantization scales is that weights are standard normal distributed, which I think is still a valid assumption. Mistral 4 small deviates from this - I suspect something went wrong during training. But in any case, clamping the weights during quantization is very hacky.

For the activations - I think we have to prepare a separate fix where we switch to a wider-range type for the sum in Q8_1. Either BF16 or F32 - depending on which one is feasible and performant. Since this type is "internal", we can afford to make such a change.
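As a quick illustration of the range argument for BF16, the sketch below round-trips a float through BF16 using the usual round-to-nearest-even bit trick. The function names are hypothetical and NaN inputs are deliberately not handled; the value 100000 is made up for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// float -> BF16 bits with round-to-nearest-even.
// ASSUMPTION: the caller never passes NaN (NaN handling is omitted here).
inline uint16_t f32_to_bf16_bits(float f) {
    assert(!std::isnan(f));
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    u += 0x7fffu + ((u >> 16) & 1u);  // round to nearest, ties to even
    return static_cast<uint16_t>(u >> 16);
}

// BF16 bits -> float: the 16 bits become the upper half of the f32 pattern.
inline float bf16_bits_to_f32(uint16_t h) {
    const uint32_t u = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

inline float bf16_round_trip(float f) {
    return bf16_bits_to_f32(f32_to_bf16_bits(f));
}
```

With this, `bf16_round_trip(100000.0f)` gives 99840, finite where F16 would overflow. The trade-off is precision: BF16 keeps 8 significant bits against 11 for F16, so a BF16 sum is coarser for the typical small values.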

bartowski1182 · 2026-04-24T15:49:08Z

I agree 100%, but clamping weights is a quick easy fix in the interim, unless you think the swap to f32/bf16 will be quick and painless

The "depending on which is feasible and performant" is the only part that concerns me for getting the swap in quickly :)

Personally I'd rather merge this, investigate the better options, then implement the real fix and revert this

mrexodia · 2026-04-30T14:01:39Z

Yesterday night I did an experiment where I had gpt-5.5 integrate a patch for Talkie 13B support (https://github.com/solwyc/talkie-1930-13b-it-q5) into llama.cpp, perform quantizations and test if the full CUDA offload works. Initially the workaround was to keep the offending layer in Q8_0, but deep research pointed to this PR so I had it try that.

The clamp approach from this PR removed the NaN values but changed the first token from My to <|end|>, making the output unusable. I have experience with low-level development and some familiarity with CUDA, so I was able to guide the model. However, I cannot claim to understand the solution fully, so I do not feel comfortable submitting a patch. If it is helpful, here is my commit: mrexodia@8c8ebce. With that commit, the quantization levels that previously failed no longer produced NaNs under full CUDA offload.

Happy to share more details/notes/pi session if that is in any way helpful.

Relevant technical details Codex really wants to mention:

The problematic Talkie activation block had a finite 32-value sum around 145894, above FP16 max 65504.
The experiment stores sum(qs) as int16 in the upper half of ds and reconstructs d * sum_q in FP32.
ffn_swiglu-14 was finite, max_abs_sum32 ~= 145894; first unexpected Inf was ffn_out-14, followed by NaNs in attn_norm-15.
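The idea in the notes above (store the integer sum of the quants and rebuild `d * sum_q` in FP32) can be sketched as follows. This is a host-side illustration only: the struct and function names are hypothetical, the layout is not the one used in ggml or in the linked commit, and `d` is kept as f32 here to isolate the effect of the sum.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int QK8_1 = 32;

// Hypothetical block layout for illustration only.
struct BlockQ8_1IntSum {
    float   d;          // scale (ASSUMPTION: f32 here; the real block packs d with the sum)
    int16_t sum_q;      // sum of the int8 quants, |sum_q| <= 32 * 127 = 4064
    int8_t  qs[QK8_1];  // quantized values
};

// Absmax quantization of one 32-value block, keeping the integer quant sum.
inline BlockQ8_1IntSum quantize_block(const float * x) {
    assert(x != nullptr);
    BlockQ8_1IntSum b{};
    float amax = 0.0f;
    for (int i = 0; i < QK8_1; ++i) amax = std::fmax(amax, std::fabs(x[i]));
    b.d = amax / 127.0f;
    const float id = b.d > 0.0f ? 1.0f / b.d : 0.0f;
    int sum = 0;
    for (int i = 0; i < QK8_1; ++i) {
        b.qs[i] = static_cast<int8_t>(std::lround(x[i] * id));
        sum += b.qs[i];
    }
    b.sum_q = static_cast<int16_t>(sum);
    return b;
}

// Rebuild the dequantized block sum in f32; nothing passes through F16.
inline float block_sum_f32(const BlockQ8_1IntSum & b) {
    return b.d * static_cast<float>(b.sum_q);
}
```

For a block of 32 equal values that add up to about 145894, `sum_q` is 4064 and `block_sum_f32` recovers roughly 145894, well above the F16 maximum of 65504.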

bssrdf · 2026-05-01T16:01:09Z

Just to point out that this FP16 overflow has also been bugging several SD.cpp models using the ds4 layout. See leejet/stable-diffusion.cpp#851 (comment). The SD author has to scale down the activation to get around the problem. #22571 is a nice and clean fix.

thomasgauthier · 2026-05-01T20:26:44Z

@mrexodia I also have an implementation of the Talkie architecture. You can find it on my fork (GGUF).

I've had no problems running the model on CUDA, and the logits seem to match the original PyTorch model (or almost match, some slight difference when using flash-attn). I'm planning on opening a PR once I have everything properly reviewed. Hope that helps!

mrexodia · 2026-05-01T20:51:36Z

Yeah I indeed used your patch @thomasgauthier, thanks for publishing it! The issues didn't show up for all quantizations and I used https://huggingface.co/lewtun/talkie-1930-13b-it-hf as a base instead of their custom checkpoints. Not exactly sure why it happened, but there is definitely a real issue in the CUDA implementation (since everything works fine when running on the CPU).

thomasgauthier · 2026-05-01T21:15:23Z

@mrexodia yeah ok I have only tested Q8_0, I'll try the other quants and investigate the issue. Thanks for flagging this.

mrexodia · 2026-05-01T21:44:03Z

FYI @thomasgauthier the workaround is to specify --tensor-type "blocks.14.mlp.mlp_resid.weight=q8_0" because that was the layer with large values that triggered the CUDA problem.

bartowski1182 added 3 commits April 8, 2026 18:34

Prevent the sum of the dequantized activation in q8_1 from overflowing

835acb7

Merge branch 'ggml-org:master' into mistral4-q4_0

d071411

Add CUDA overflow protection

48f1d71

bartowski1182 marked this pull request as ready for review April 9, 2026 02:35

bartowski1182 requested review from a team and ggerganov as code owners April 9, 2026 02:35

adhusch mentioned this pull request Apr 17, 2026

Eval bug: Mistral Small 4 (mistral4 arch): repetitive/empty output on Metal #20668

Closed

bartowski1182 added 2 commits April 21, 2026 12:56

Merge branch 'ggml-org:master' into mistral4-q4_0

b92ea4a

Clamp d and sum in CUDA q8_1 path to valid f16 range

a5ca8d6

Standardize clamp code format for ggml-quants.c

d307b8e

github-actions Bot added the "Nvidia GPU" and "ggml" labels Apr 21, 2026

Missing semicolon

49fcb4e

bartowski1182 mentioned this pull request May 1, 2026

Swap out F16 for BF16 in Q8_1 activations to avoid overflowing values #22571

Draft
