
Record: 11L EMA + GPTQ-lite + warmdown3500 + QAT@0.15 (val_bpb=1.1233)#414

Open
signalrush wants to merge 1 commit into openai:main from signalrush:submission/ema-gptqlite-1.1233

Conversation

@signalrush

Record: 11L EMA + GPTQ-lite + warmdown3500 + QAT@0.15

val_bpb: 1.1233 (sliding window stride=64, 3-seed mean) | 15.55 MB (mean) | 8xH100 SXM, 600s

Key Innovations Over PR #374

| Change | PR #374 | This PR | Impact |
| --- | --- | --- | --- |
| GPTQ-lite | Fixed clip (row max) | 5 clip percentiles per row, pick min MSE | -0.0006 BPB |
| EMA (decay=0.997) | None (Tight SWA only) | EMA every step | -0.0006 BPB |
| Warmdown | 3000 | 3500 | -0.0002 BPB |
| Late QAT threshold | 0.1 | 0.15 | -0.0001 BPB |
| **Total** | 1.1246 | 1.1233 | **-0.0013 BPB** |
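The EMA row in the table applies an exponential moving average of the weights at every optimizer step (decay=0.997), and the EMA copy is what gets evaluated and quantized. A minimal sketch, assuming plain per-tensor tracking in a dict (how this hooks into the training loop in `train_gpt.py` is an assumption):

```python
def ema_update(ema, params, decay=0.997):
    """One EMA step, applied after every optimizer step.

    `ema` and `params` are assumed to be dicts mapping parameter names to
    float values/tensors. Evaluation and checkpointing use `ema`, not
    the raw `params`.
    """
    for name, value in params.items():
        ema[name] = decay * ema[name] + (1.0 - decay) * value
    return ema
```

With decay=0.997, the average has an effective horizon of roughly 1/(1-0.997) ≈ 333 steps, which smooths over late-training noise similarly to SWA but with recency weighting.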

GPTQ-lite: Per-Layer Optimal Clip Percentile

Instead of using the row maximum to set the int6 scale, try 5 clip percentiles (0.999, 0.9995, 0.9999, 0.99999, 1.0) per weight-matrix row and pick the one that minimizes reconstruction MSE. This is a post-training search, so it adds zero training cost.
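A minimal sketch of that per-row search, assuming symmetric int6 quantization to [-31, 31] (the exact quantizer layout in the submission is an assumption):

```python
import numpy as np

def quantize_int6_best_clip(w_row, percentiles=(0.999, 0.9995, 0.9999, 0.99999, 1.0)):
    """Try each clip percentile for one weight row; keep the one with
    minimum int6 reconstruction MSE.

    Returns (mse, percentile, int6 codes, scale). Percentile 1.0
    reproduces the row-max baseline, so the result is never worse.
    """
    best = None
    for p in percentiles:
        clip = np.quantile(np.abs(w_row), p)
        if clip == 0.0:
            continue  # degenerate all-zero row
        scale = clip / 31.0
        q = np.clip(np.round(w_row / scale), -31, 31)
        mse = float(np.mean((w_row - q * scale) ** 2))
        if best is None or mse < best[0]:
            best = (mse, p, q.astype(np.int8), scale)
    return best
```

Clipping a few extreme outliers shrinks the scale, which tightens the quantization grid for the bulk of the row; the MSE check guards against clipping too aggressively.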

Results (3 seeds, 8xH100 SXM)

| Seed | Steps | val_loss | Sliding BPB (s64) | Artifact |
| --- | --- | --- | --- | --- |
| 1337 | 7101 | 1.8958 | 1.1228 | 15.56 MB |
| 42 | ~7100 | 1.8972 | 1.1236 | 15.54 MB |
| 2024 | ~7100 | 1.8971 | 1.1236 | 15.59 MB |

Mean: 1.1233 | Std: 0.0005
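The sliding BPB numbers above use stride=64, i.e. each window only scores its last 64 tokens so every token is predicted with (near-)full left context. A sketch of that evaluation loop, under assumptions: `token_nll_fn` is a hypothetical interface returning per-token NLL in nats for a context, and `token_bytes` holds each token's UTF-8 byte length (bits per byte normalizes by bytes, not tokens):

```python
import math

def sliding_window_bpb(token_nll_fn, tokens, token_bytes, window=1024, stride=64):
    """Chunked sliding-window eval: score only the last `stride` tokens
    of each window, then convert total nats to bits per byte."""
    nats, nbytes = 0.0, 0
    pos = window
    while pos <= len(tokens):
        per_tok = token_nll_fn(tokens[pos - window:pos])
        nats += sum(per_tok[-stride:])
        nbytes += sum(token_bytes[pos - stride:pos])
        pos += stride
    return nats / (nbytes * math.log(2))
```

A smaller stride costs more forward passes per token but removes the short-context penalty that a non-overlapping eval would impose at the start of each window.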

Architecture

11L, 512d, 8H/4KV, MLP 3x (relu²), U-Net skips, XSA4, Partial RoPE 16/64, LN Scale, VE128, SmearGate, BigramHash(2048), FA3, Muon WD=0.04, EMA(0.997), Tight SWA, Late QAT@0.15, int6+zstd-22.

Run Command

```shell
SEED=1337 bash eval/eval.sh
```

Test plan

- All 3 seeds under 16MB
- All 3 seeds train in 600s on 8xH100
- Post-quant roundtrip verified
- Sliding window eval (stride=64) consistent across seeds (std=0.0005)
- train_gpt.py under 1500 lines (1402)
- No TTT on validation data

🤖 Generated with Claude Code

abaybektursun added a commit to abaybektursun/parameter-golf that referenced this pull request Mar 22, 2026
Seed 1337: 81.86ms, 1.1241 bpb, 15.83MB
Seed 42: 81.88ms, 1.1253 bpb, 15.82MB
Seed 2025: 81.86ms, 1.1247 bpb, 15.80MB
Mean: 81.87ms, 1.1247 bpb

Also adds GPTQ-lite (PR openai#414's per-row optimal clip percentile search)
for improved int6 quantization quality.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
