
πŸ”΄ fake_quant.rs broken for ≀8-bit formats (R7 falsified by TIER1-posit8)Β #827

@gHashTag

Description


R7 Falsification Witness β€” fake_quant.rs broken for ≀8-bit formats

Anchor: φ² + φ⁻² = 3 Β· PASS-19 Β· 2026-05-15

Evidence (R5-verified, fresh SQL probe on phd-postgres-ssot @ 2026-05-15 09:58 UTC)

11 of 12 sub-16-bit format/algo combinations produce best bpb at or near logβ‚‚(128) = 7.0, the uniform-distribution baseline for the vocab size of 128 β€” i.e., the model does not learn at all.

| Format | Algo | Best BPB | Status |
|---|---|---|---|
| posit8 | adamw | 6.9903 | πŸ’€ R7 witness (IGLA-TIER1-posit8-h128-LR0.0001-rng1597-adamw) |
| posit8 | muon | 4.8115 | πŸ”΄ partial learn |
| int8 | adamw | 6.9926 | πŸ’€ |
| int8 | muon | 5.4015 | πŸ”΄ partial |
| int4 | adamw | 6.9703 | πŸ’€ |
| int4 | muon | 2.8440 | 🟑 (muon learns!) |
| uint8 | adamw | 7.0000 | πŸ’€ exact logβ‚‚(vocab) |
| gf32 | adamw | 6.9610 | πŸ’€ |
| gf8 | adamw | 7.0000 | πŸ’€ |
| gf4 | muon | 7.0124 | πŸ’€ |
| mxfp8 | muon | 6.9780 | πŸ’€ |
| nf4 | adamw / muon | 6.9860 / 7.0552 | πŸ’€ |

Meanwhile 16-bit formats train cleanly (gf256 adamw β†’ 2.5719, posit16 muon β†’ 2.6251, fp16 adamw β†’ 2.6691).

Hypothesis (R12 Lee/GVSU style)

crates/trios-train/src/fake_quant.rs:449-493 β€” the dequantization path for bits ≀ 8 either:

  1. Doesn't dequant before the residual stream (output stays in integer domain), OR
  2. Initializes scale/zero-point with a value that collapses gradients to zero, OR
  3. Has a wrong STE (straight-through estimator) β€” muon happens to bypass it (works for int4), but adamw cannot.

The asymmetry (adamw = dead, muon = partial) for int4/int8/posit8 is the smoking gun: gradient flow differs between optimizers only when the STE is broken.

Falsification protocol (R7)

Cluster reference (the working 14 lanes from same fleet, identical hyperparams except format):

  • IGLA-DEEP-CHAMPION-gf256-h384-LR0.0001-rng1597-adamw β†’ 2.5719 (962 rows)
  • IGLA-MULTI-SEED-posit16-h128-LR0.0001-rng2584-muon β†’ 2.6251 (58 rows)

All 16-bit lanes learn β†’ infrastructure is fine. Only ≀8-bit lanes fail β†’ fake_quant.rs is broken for low-bit path.

Repro

```shell
gh workflow run tier1-explore-acc13.yml --repo gHashTag/trios-railway -f confirm=PHI
# wait 20 min, then:
psql "$DATABASE_URL" -c "SELECT canon_name, MIN(bpb) FROM ssot.bpb_samples WHERE canon_name LIKE 'IGLA-TIER1-%' GROUP BY canon_name;"
```

Suggested fix (action items)

  • Add unit test in crates/trios-train/tests/fake_quant_test.rs that asserts dequant(quant(x)) round-trips within eps for all formats: int4, int8, uint8, posit8, gf4, gf8, gf32, nf4, mxfp8.
  • Add property test: bpb after 100 steps on tiny shakespeare with vocab=128 must be < 6.5 for all formats. Fail CI if any format β‰₯ 6.9.
  • Audit bits ≀ 8 branch of fake_quant.rs:449-493 β€” likely a missing .mul(scale).add(zero_point) after the int cast.
  • After fix, dispatch new TIER1-LOWBIT wave on Railway ACC1+ACC3 to verify all ≀8-bit formats reach bpb < 3.0.


φ² + φ⁻² = 3 Β· TRINITY Β· PASS-19 Β· DEFENSE 2026-06-15
