Record: 11L GradQuant + EMA + Sliding Eval (val_bpb=1.1416)#422

Open
albertorkive wants to merge 4 commits into openai:main from albertorkive:submission-clean

Conversation

@albertorkive

Summary

  • val_bpb: 1.1416 (post int8+zstd quantization roundtrip, sliding window eval stride=64, full validation coverage)
  • Artifact: 15,059,186 bytes (code: 59,158 bytes + model: 15,000,028 bytes)
  • 11 layers, 512 dim, MLP 3x, Muon optimizer, EMA (alpha=0.997, from init)
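The "EMA from init" above can be sketched as a plain moving-average update whose shadow copy is seeded with the initial weights rather than a later snapshot. `ema_update` below is an illustrative helper operating on plain floats, not the training script's actual implementation:

```python
def ema_update(shadow, params, alpha=0.997):
    """One EMA step: shadow <- alpha * shadow + (1 - alpha) * params.

    "EMA from init" means `shadow` starts as a copy of the *initial*
    parameters, so the average tracks the model from step 0 onward.
    """
    return {k: alpha * shadow[k] + (1.0 - alpha) * params[k] for k in shadow}


# Usage sketch: seed the shadow from the init weights, then update every step.
shadow = {"w": 1.0}                 # copy of init weights
shadow = ema_update(shadow, {"w": 0.0})  # after one training step
```

With alpha=0.997 the shadow has an effective averaging horizon of roughly 1/(1-alpha) ≈ 333 steps.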

Key Techniques

  • Gradient-guided adaptive quantization: per-tensor int5/int6/int7 bit assignment based on gradient sensitivity (top 30% → int7, middle 40% → int6, bottom 30% → int5)
  • EMA from model init (alpha=0.997)
  • SmearGate residual mixing + NTK-aware RoPE + XSA (last 4 layers)
  • Orthogonal initialization + tied embeddings
  • zstd level 22 compression
  • Sliding window eval with stride=64, full validation set coverage (~121K windows/GPU)
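The gradient-guided bit assignment in the first bullet can be sketched as a percentile split over per-tensor sensitivity scores. This assumes the scores (e.g. accumulated gradient magnitudes) are already computed; `assign_bits` is a hypothetical helper, not the script's actual API:

```python
def assign_bits(sensitivity, cuts=(0.30, 0.70), bits=(5, 6, 7)):
    """Map each tensor to a bit width by its gradient-sensitivity rank:
    bottom 30% -> int5, middle 40% -> int6, top 30% -> int7.

    `sensitivity` is a dict of tensor name -> scalar score (higher =
    more sensitive to quantization error).
    """
    ranked = sorted(sensitivity, key=sensitivity.get)  # ascending sensitivity
    n = len(ranked)
    out = {}
    for i, name in enumerate(ranked):
        frac = i / n  # rank as a fraction of all tensors
        if frac < cuts[0]:
            out[name] = bits[0]
        elif frac < cuts[1]:
            out[name] = bits[1]
        else:
            out[name] = bits[2]
    return out
```

The split spends the fixed artifact budget where quantization error hurts most, while the least sensitive tensors absorb the coarser int5 rounding.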

Run Command

pip install sentencepiece zstandard
python3 data/cached_challenge_fineweb.py
torchrun --nproc_per_node=8 train_gpt.py

All hyperparameters are baked into train_gpt.py as defaults. No env vars needed.
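For context, the sliding-window evaluation listed above (stride=64, full validation coverage) amounts to enumerating overlapping windows over the token stream. A minimal sketch follows; the window length of 512 is an illustrative assumption, not a value taken from the script:

```python
def sliding_windows(n_tokens, window=512, stride=64):
    """Enumerate (start, end) evaluation windows over a token stream.

    Windows overlap by (window - stride) tokens; this sketch ignores any
    special handling of the final partial tail.
    """
    starts = range(0, max(1, n_tokens - window + 1), stride)
    return [(s, s + window) for s in starts]
```

At stride=64 each token is seen by up to window/stride windows, which is where counts like ~121K windows per GPU come from on a large validation set.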

Included Files

  • README.md — Architecture and training details
  • submission.json — Metadata and metrics
  • train_gpt.py — Complete training script (1,309 lines)
  • train.log — Full training + evaluation output

albertorkive and others added 4 commits March 22, 2026 14:25
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Shift more tensors to int7 (45% vs 30% previously) to use more of the
16MB budget for gradient-sensitive weights. Reduces quantization
degradation while staying under artifact size limit.
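The bits-versus-error tradeoff behind this change can be illustrated with a toy symmetric fake-quantization roundtrip (a pure-Python sketch, unrelated to the script's actual quantizer): each extra bit roughly halves the roundtrip error, which is why the budget is steered toward gradient-sensitive tensors.

```python
def quant_roundtrip(x, bits):
    """Symmetric per-tensor quantization of a float list to `bits` bits,
    then dequantization back to floats. Returns the reconstructed values."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 63 for int7, 15 for int5
    scale = max(abs(v) for v in x) / qmax
    if scale == 0.0:
        return list(x)                   # all-zero tensor: nothing to quantize
    q = [max(-qmax, min(qmax, round(v / scale))) for v in x]
    return [qi * scale for qi in q]
```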

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
At step 100, estimate total steps from step timing and cap warmdown to
55% of total. Prevents warmdown from consuming too many steps on slower
hardware (e.g., 3000 warmdown at 4200 total steps = only 29% productive
training). On fast hardware, cap is not reached and behavior is unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove DATA_PATH, TOKENIZER_PATH, TOKENIZER_TYPE, TRAIN_ON_VAL env var
overrides. These were lab scaffolding — the competition uses fixed data
paths relative to repo root. Evaluators run from repo root after
downloading data with the standard script.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
