
Record: 10L Int5-MLP + Mixed Quant + GradClip + Warmdown3k (mean val_bpb=1.20262)#426

Open
aniketio-ctrl wants to merge 1 commit into openai:main from aniketio-ctrl:submission/10L-int5mlp-20260322

Conversation

@aniketio-ctrl

Summary

Improves on the naive baseline (1.2244 bpb) by using mixed-precision quantization to fund a 10th layer within the 16MB budget.

mean val_bpb: 1.20262 (3 seeds, post int5/int6+zlib quantization roundtrip)
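The int5/int6+zlib roundtrip referenced above can be sketched as follows. This is illustrative only: the PR's actual scheme in train_gpt.py (per-channel scales, bit-packing, etc.) may differ, and `quantize_roundtrip` is a hypothetical helper name.

```python
import zlib
import numpy as np

def quantize_roundtrip(w: np.ndarray, bits: int):
    """Symmetric quantization to `bits` bits, dequantize, and report
    the zlib-compressed size of the integer codes (a sketch of how
    the artifact-size accounting might work)."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 15 for int5
    scale = max(float(np.abs(w).max()) / qmax, 1e-12)
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    compressed = zlib.compress(q.tobytes())          # artifact is zlib-packed
    return q.astype(np.float32) * scale, len(compressed)

np.random.seed(0)
w = np.random.randn(1536, 512).astype(np.float32)    # an MLP-sized weight
w_hat, nbytes = quantize_roundtrip(w, bits=5)
```

Evaluating val_bpb on `w_hat` rather than `w` is what makes the reported numbers "post roundtrip": the score reflects the weights the judge actually decompresses, not the FP32 training weights.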

3-Seed Results

Seed val_bpb
1337 1.20375402
42 1.20134663
123 1.20275459
Mean 1.20262
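As a sanity check, the reported mean is reproducible from the per-seed numbers in the table:

```python
# Per-seed results copied from the table above.
val_bpb = {1337: 1.20375402, 42: 1.20134663, 123: 1.20275459}
mean = sum(val_bpb.values()) / len(val_bpb)
print(round(mean, 5))  # 1.20262
```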

All artifacts under 16MB (~15.7MB).
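A minimal sketch of the budget check implied by the checklist's "< 16,000,000 bytes" rule (`check_artifact` is a hypothetical helper, not code from the PR):

```python
import os

BUDGET_BYTES = 16_000_000  # hard cap per the submission checklist

def check_artifact(path: str) -> int:
    """Return the artifact size in bytes, failing loudly if it is
    at or over the 16,000,000-byte budget."""
    size = os.path.getsize(path)
    assert size < BUDGET_BYTES, f"{path}: {size} bytes exceeds budget"
    return size
```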

Key Changes vs Baseline

  • NUM_LAYERS 9 → 10: extra layer funded by Int5 MLP quantization
  • MLP_MULT 2 → 3: wider MLP (hidden 1024 → 1536)
  • Mixed quantization: Int5 for MLP weights, Int6 for attention, FP16 for embeddings
  • WARMDOWN_ITERS 1200 → 3000: better convergence
  • GRAD_CLIP_NORM 0.0 → 0.3: stable training
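The changes above might land in train_gpt.py roughly as sketched below. The helper name `bits_for`, the parameter-name patterns, and the d_model of 512 (implied by hidden 1024 → 1536 at MLP_MULT 2 → 3) are assumptions; the PR's actual code may differ.

```python
# Hyperparameters changed vs. the 9-layer baseline.
NUM_LAYERS = 10        # 9 -> 10, funded by Int5 MLP weights
MLP_MULT = 3           # 2 -> 3: hidden width 512 * 3 = 1536 (was 1024)
WARMDOWN_ITERS = 3000  # 1200 -> 3000: longer LR warmdown for convergence
GRAD_CLIP_NORM = 0.3   # 0.0 (disabled) -> 0.3 global-norm clipping

def bits_for(param_name: str):
    """Bit-width per parameter group (hypothetical name matching)."""
    if ".mlp." in param_name:
        return 5       # Int5: MLP weights tolerate coarser steps
    if ".attn." in param_name:
        return 6       # Int6: attention weights are more sensitive
    return None        # embeddings (and norms) stay FP16
```

The asymmetry is the point of "mixed quantization": spending fewer bits where the model is least sensitive (MLP) frees enough of the 16MB budget to pay for the tenth layer.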

Checklist

  • Beats baseline by ≥ 0.005 bpb
  • 3 seeds provided
  • All artifacts < 16,000,000 bytes
  • train_gpt.py included
  • Logs included

