Non-record: 11L mixed int5/int6 + working QAT + TTT (val_bpb=1.1466)#421

Open
vytautas-bunevicius wants to merge 3 commits into openai:main from vytautas-bunevicius:submission/sota-attempt

Conversation

@vytautas-bunevicius

Summary

Non-record submission stacking 8 techniques on PR #315 (1.1248):

  • Working QAT fix (PR #315's QAT was dead code due to torch.compile)
  • Mixed int5 (MLP) / int6 (attention) quantization + 3% magnitude pruning
  • Test-time training (3 epochs SGD post-quant, 83s on 8xH100)
  • BigramHash 10240 (up from 2048)
  • 64 learnable memory tokens
  • Backout connection (1 scalar param)
  • Per-head temperature (88 params)
  • Eval stride 32
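
The mixed-precision quantization and magnitude-pruning bullets above can be sketched roughly as follows. This is a minimal NumPy sketch, not the submission's actual code: the function names, per-tensor symmetric scaling, and global (rather than per-layer) pruning threshold are all assumptions.

```python
import numpy as np

def magnitude_prune(w: np.ndarray, frac: float = 0.03) -> np.ndarray:
    """Zero out roughly the smallest-magnitude `frac` of weights."""
    k = int(frac * w.size)
    if k == 0:
        return w.copy()
    # k-th smallest absolute value becomes the pruning threshold
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    out = w.copy()
    out[np.abs(out) <= thresh] = 0.0
    return out

def fake_quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric per-tensor fake quantization to a signed `bits`-wide grid,
    e.g. bits=5 for MLP weights and bits=6 for attention weights."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    if scale == 0:
        return w.copy()
    # Round to the integer grid, then map back to float ("fake" quant)
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale
```

In a QAT setup the fake-quantize step would run inside the forward pass (with a straight-through gradient) rather than once post-hoc.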

val_bpb = 1.1466 on 8xH100 SXM. Ran with PyTorch SDPA instead of FA3 (110ms/step, 5129 steps instead of ~7000). Artifact: 14.7MB.
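
The test-time-training step (a few SGD epochs after quantization) can be illustrated with a minimal sketch. The linear least-squares model below is a hypothetical stand-in: the real run fine-tunes the quantized language model, not a linear probe, and the learning rate and objective here are assumptions.

```python
import numpy as np

def ttt_sgd(w0: np.ndarray, X: np.ndarray, y: np.ndarray,
            lr: float = 0.1, epochs: int = 3) -> np.ndarray:
    """Run a few full-batch SGD epochs on a least-squares objective,
    mimicking the post-quantization test-time-training pass."""
    w = w0.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)  # d/dw mean((Xw - y)^2)
        w -= lr * grad
    return w
```

The design point the bullet makes is that the adaptation budget is tiny (3 epochs, 83s on 8xH100), so a plain SGD loop suffices and the quantized weights stay close to their starting point.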

Test plan

  • Smoke test on 1xH100 (completed)
  • Full run on 8xH100 SXM (completed, 605s training + 340s eval)
  • Rerun with FlashAttention 3 for improved score
  • 3-seed reproducibility (single seed so far)

