R7 Falsification Witness: fake_quant.rs broken for ≤8-bit formats
Anchor: φ² + φ⁻² = 3 · PASS-19 · 2026-05-15
Evidence (R5-verified, fresh SQL probe on phd-postgres-ssot @ 2026-05-15 09:58 UTC)
11 of 12 sub-16-bit format/algo combinations produce bpb ≈ log₂(128) = 7.0, i.e., the model does not learn at all. Vocab size is 128, so log₂(128) = 7.0 is the uniform-distribution baseline.
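The baseline itself is just arithmetic: a model that assigns probability 1/V to every token has cross-entropy −log₂(1/V) = log₂(V) bits per token. A minimal sketch (not project code):

```rust
// Uniform-baseline bits-per-byte for a vocabulary of `vocab` symbols:
// assigning p = 1/vocab to every token costs log2(vocab) bits per token.
fn uniform_bpb(vocab: usize) -> f64 {
    (vocab as f64).log2()
}

fn main() {
    println!("{}", uniform_bpb(128)); // 7 — matches the dead lanes' bpb
}
```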
| Format | Algo | Best BPB | Status |
| --- | --- | --- | --- |
| posit8 | adamw | 6.9903 | 💀 R7 witness (IGLA-TIER1-posit8-h128-LR0.0001-rng1597-adamw) |
| posit8 | muon | 4.8115 | 🔴 partial learn |
| int8 | adamw | 6.9926 | 💀 |
| int8 | muon | 5.4015 | 🔴 partial |
| int4 | adamw | 6.9703 | 💀 |
| int4 | muon | 2.8440 | 🟡 (muon learns!) |
| uint8 | adamw | 7.0000 | 💀 exact log₂(vocab) |
| gf32 | adamw | 6.9610 | 💀 |
| gf8 | adamw | 7.0000 | 💀 |
| gf4 | muon | 7.0124 | 💀 |
| mxfp8 | muon | 6.9780 | 💀 |
| nf4 | adamw/muon | 6.9860 / 7.0552 | 💀 |
Meanwhile, the 16-bit formats train cleanly (gf256 adamw → 2.5719, posit16 muon → 2.6251, fp16 adamw → 2.6691).
Hypothesis (R12 Lee/GVSU style)
In crates/trios-train/src/fake_quant.rs:449-493, the dequantization path for bits ≤ 8 either:
- doesn't dequantize before the residual stream (the output stays in the integer domain), or
- initializes scale/zero-point with a value that collapses gradients to zero, or
- has a wrong STE (straight-through estimator) → muon happens to bypass it (works for int4), but adamw cannot.
The adamw=dead / muon=partial asymmetry on int4/int8/posit8 is the smoking gun: gradient flow differs between optimizers only when the STE is broken.
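The first failure mode can be illustrated with a toy symmetric affine quantizer (a sketch with made-up names, not the actual fake_quant.rs code, which also handles posit/gf/nf4/mxfp formats): if the forward path returns the raw integer code instead of dequantizing, every activation is off by a factor of 1/scale and the residual stream lives in the wrong domain.

```rust
// Toy symmetric affine quantizer (illustrative only).
fn quant(x: f32, scale: f32, bits: u32) -> f32 {
    let qmax = ((1i32 << (bits - 1)) - 1) as f32; // e.g. 127 for int8
    (x / scale).round().clamp(-qmax - 1.0, qmax)
}

// Correct path: dequantize back into the float domain.
fn fake_quant_ok(x: f32, scale: f32, bits: u32) -> f32 {
    quant(x, scale, bits) * scale
}

// Hypothesized bug: the integer code leaks into the residual stream.
fn fake_quant_buggy(x: f32, scale: f32, bits: u32) -> f32 {
    quant(x, scale, bits)
}

fn main() {
    let (x, scale) = (0.42f32, 0.01f32);
    // Correct path stays within half a quantization step of x;
    // buggy path is off by ~1/scale = 100x.
    assert!((fake_quant_ok(x, scale, 8) - x).abs() <= scale / 2.0);
    assert!((fake_quant_buggy(x, scale, 8) - x).abs() > 10.0);
}
```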
Falsification protocol (R7)
Cluster reference (the 14 working lanes from the same fleet, identical hyperparams except format):
- IGLA-DEEP-CHAMPION-gf256-h384-LR0.0001-rng1597-adamw → 2.5719 (962 rows)
- IGLA-MULTI-SEED-posit16-h128-LR0.0001-rng2584-muon → 2.6251 (58 rows)
All 16-bit lanes learn → the infrastructure is fine. Only the ≤8-bit lanes fail → fake_quant.rs is broken on the low-bit path.
Repro

```sh
gh workflow run tier1-explore-acc13.yml --repo gHashTag/trios-railway -f confirm=PHI
# wait ~20 min, then:
psql "$DATABASE_URL" -c "SELECT canon_name, MIN(bpb) FROM ssot.bpb_samples WHERE canon_name LIKE 'IGLA-TIER1-%' GROUP BY canon_name;"
```
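The probe's per-lane minimum bpb maps onto the status column above roughly as follows (an illustrative sketch; the 6.9 and 3.0 cutoffs are the ones this note uses, the function name is made up):

```rust
// Classify a lane by its best (minimum) bpb, using this note's cutoffs:
// ~log2(128) = 7.0 means "dead at the uniform baseline", < 3.0 means
// the model genuinely learns, anything between is a partial learn.
fn classify(min_bpb: f64) -> &'static str {
    if min_bpb >= 6.9 {
        "dead (uniform baseline)"
    } else if min_bpb < 3.0 {
        "learns"
    } else {
        "partial"
    }
}

fn main() {
    assert_eq!(classify(6.9903), "dead (uniform baseline)"); // posit8/adamw
    assert_eq!(classify(4.8115), "partial");                 // posit8/muon
    assert_eq!(classify(2.8440), "learns");                  // int4/muon
}
```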
Suggested fix (action items)
- Add a round-trip test crates/trios-train/tests/fake_quant_test.rs that asserts dequant(quant(x)) round-trips within eps for all formats: int4, int8, uint8, posit8, gf4, gf8, gf32, nf4, mxfp8.
- Add a CI smoke train: bpb after 100 steps on tiny shakespeare with vocab=128 must be < 6.5 for all formats. Fail CI if any format ≥ 6.9.
- Audit the bits ≤ 8 branch of fake_quant.rs:449-493 (likely a missing .mul(scale).add(zero_point) after the int cast).
- Re-run the witness lane and confirm bpb < 3.0.
Refs
- IGLA-TIER1-posit8-h128-LR0.0001-rng1597-adamw @ 64 rows @ bpb=6.9903
φ² + φ⁻² = 3 · TRINITY · PASS-19 · DEFENSE 2026-06-15
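The round-trip test from the action items could start as the sketch below. It assumes a hypothetical affine quant/dequant pair (quant_dequant is a stand-in, not the real crate API) and checks the |dequant(quant(x)) − x| ≤ eps property on a grid of in-range inputs; eps = scale/2 is the tightest bound a rounding quantizer can satisfy.

```rust
// Sketch of crates/trios-train/tests/fake_quant_test.rs for the integer
// formats (illustrative API; posit/gf/nf4/mxfp need their own codecs).
fn quant_dequant(x: f32, scale: f32, bits: u32) -> f32 {
    let qmax = ((1i32 << (bits - 1)) - 1) as f32;
    (x / scale).round().clamp(-qmax - 1.0, qmax) * scale
}

fn round_trip_ok(bits: u32, scale: f32, eps: f32) -> bool {
    let qmax = (1i32 << (bits - 1)) - 1;
    (-qmax..=qmax)
        .map(|q| q as f32 * scale * 0.8) // sample inside the representable range
        .all(|x| (quant_dequant(x, scale, bits) - x).abs() <= eps)
}

fn main() {
    for bits in [4u32, 8] {
        let scale = 0.01f32;
        assert!(round_trip_ok(bits, scale, scale / 2.0));
    }
    println!("round-trip ok for int4/int8");
}
```

A broken low-bit path (e.g. the missing dequant hypothesized above) fails this property immediately, which is the point of gating CI on it.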