
Record: 10L Int5-MLP + Mixed Quant + GradClip + Warmdown3k (mean val_bpb=1.20262)#426

Open
aniketio-ctrl wants to merge 1 commit into openai:main from aniketio-ctrl:submission/10L-int5mlp-20260322

Conversation

@aniketio-ctrl

Summary

Improves on the naive baseline (1.2244 bpb) by using mixed-precision quantization to fund a 10th layer within the 16MB budget.

mean val_bpb: 1.20262 (3 seeds, post int5/int6+zlib quantization roundtrip)
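The int5/int6+zlib roundtrip referenced above can be sketched as follows. This is illustrative only: the PR's actual scheme in train_gpt.py (per-channel scales, bit-packing, etc.) may differ, and `quantize_roundtrip` is a hypothetical helper name.

```python
import zlib
import numpy as np

def quantize_roundtrip(w: np.ndarray, bits: int):
    """Symmetric quantization to `bits` bits, dequantize, and report
    the zlib-compressed size of the integer codes (a sketch of how
    the artifact-size accounting might work)."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 15 for int5
    scale = max(float(np.abs(w).max()) / qmax, 1e-12)
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    compressed = zlib.compress(q.tobytes())          # artifact is zlib-packed
    return q.astype(np.float32) * scale, len(compressed)

np.random.seed(0)
w = np.random.randn(1536, 512).astype(np.float32)    # an MLP-sized weight
w_hat, nbytes = quantize_roundtrip(w, bits=5)
```

Evaluating val_bpb on `w_hat` rather than `w` is what makes the reported numbers "post roundtrip": the score reflects the weights the judge actually decompresses, not the FP32 training weights.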

3-Seed Results

Seed val_bpb
1337 1.20375402
42 1.20134663
123 1.20275459
Mean 1.20262
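As a sanity check, the reported mean is reproducible from the per-seed numbers in the table:

```python
# Per-seed results copied from the table above.
val_bpb = {1337: 1.20375402, 42: 1.20134663, 123: 1.20275459}
mean = sum(val_bpb.values()) / len(val_bpb)
print(round(mean, 5))  # 1.20262
```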

All artifacts under 16MB (~15.7MB).
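A minimal sketch of the budget check implied by the checklist's "< 16,000,000 bytes" rule (`check_artifact` is a hypothetical helper, not code from the PR):

```python
import os

BUDGET_BYTES = 16_000_000  # hard cap per the submission checklist

def check_artifact(path: str) -> int:
    """Return the artifact size in bytes, failing loudly if it is
    at or over the 16,000,000-byte budget."""
    size = os.path.getsize(path)
    assert size < BUDGET_BYTES, f"{path}: {size} bytes exceeds budget"
    return size
```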

Key Changes vs Baseline

  • NUM_LAYERS 9 → 10: extra layer funded by Int5 MLP quantization
  • MLP_MULT 2 → 3: wider MLP (hidden 1024 → 1536)
  • Mixed quantization: Int5 for MLP weights, Int6 for attention, FP16 for embeddings
  • WARMDOWN_ITERS 1200 → 3000: better convergence
  • GRAD_CLIP_NORM 0.0 → 0.3: stable training
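The changes above might land in train_gpt.py roughly as sketched below. The helper name `bits_for`, the parameter-name patterns, and the d_model of 512 (implied by hidden 1024 → 1536 at MLP_MULT 2 → 3) are assumptions; the PR's actual code may differ.

```python
# Hyperparameters changed vs. the 9-layer baseline.
NUM_LAYERS = 10        # 9 -> 10, funded by Int5 MLP weights
MLP_MULT = 3           # 2 -> 3: hidden width 512 * 3 = 1536 (was 1024)
WARMDOWN_ITERS = 3000  # 1200 -> 3000: longer LR warmdown for convergence
GRAD_CLIP_NORM = 0.3   # 0.0 (disabled) -> 0.3 global-norm clipping

def bits_for(param_name: str):
    """Bit-width per parameter group (hypothetical name matching)."""
    if ".mlp." in param_name:
        return 5       # Int5: MLP weights tolerate coarser steps
    if ".attn." in param_name:
        return 6       # Int6: attention weights are more sensitive
    return None        # embeddings (and norms) stay FP16
```

The asymmetry is the point of "mixed quantization": spending fewer bits where the model is least sensitive (MLP) frees enough of the 16MB budget to pay for the tenth layer.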

Checklist

  • Beats baseline by ≥ 0.005 bpb
  • 3 seeds provided
  • All artifacts < 16,000,000 bytes
  • train_gpt.py included
  • Logs included

