Non-record: 11L XSA4 + EMA + SDTTT (3-seed mean val_bpb=1.1287) #406

Open
dentity007 wants to merge 1 commit into openai:main from NathanMaine:submission/11L-SDTTT-XSA4-EMA-NathanMaine

Summary

Mean val_bpb = 1.1287 (3-seed verified, sliding window stride=64)
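As a rough illustration of what a sliding-window evaluation with a fixed stride can look like, the sketch below lays out overlapping windows so that every token is scored exactly once, then converts a summed loss into bits per byte. Only `stride=64` comes from this PR; the function names, the window length, and the convention of scoring only the newly covered tokens in each window are assumptions for illustration and may differ from what `train_gpt.py` does.

```python
import math


def sliding_windows(n_tokens, seq_len, stride):
    """Return (start, end, score_from) triples covering n_tokens.

    Hypothetical helper: each window spans tokens [start, end), and only
    tokens in [score_from, end) contribute to the loss, so every token is
    scored exactly once with as much left context as the window allows.
    Assumes 0 < stride <= seq_len.
    """
    windows = []
    scored_upto = 0
    start = 0
    while scored_upto < n_tokens:
        end = min(start + seq_len, n_tokens)
        windows.append((start, end, scored_upto))
        scored_upto = end
        start += stride
    return windows


def bits_per_byte(total_nll_nats, total_bytes):
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    return total_nll_nats / (math.log(2) * total_bytes)


# Toy sizes so the layout is easy to read; the PR uses stride=64.
for start, end, score_from in sliding_windows(n_tokens=10, seq_len=4, stride=2):
    print(f"window [{start}, {end}) scores tokens [{score_from}, {end})")
```

A smaller stride gives each scored token more context at the cost of more forward passes, which is the usual trade-off behind this style of evaluation.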

Third submission in this progression. Uses the architecture from PR #379 with Self-Distillation TTT (SDTTT).

| Seed | val_bpb (sliding) | Artifact |
|------|-------------------|----------|
| 1337 | 1.1280            | 15.7MB   |
| 42   | 1.1287            | 15.7MB   |
| 7    | 1.1294            | 15.7MB   |
| Mean | 1.1287            |          |

Std: 0.0007 | All under 16MB
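The reported mean and standard deviation can be checked directly from the three per-seed values. The short sketch below assumes the figure quoted is the sample standard deviation (n - 1 in the denominator), which matches 0.0007 after rounding; the variable names are illustrative.

```python
import statistics

# Per-seed sliding-window val_bpb values from the results table.
val_bpb = {1337: 1.1280, 42: 1.1287, 7: 1.1294}

mean_bpb = statistics.mean(val_bpb.values())
# Sample standard deviation (divides by n - 1); assumed to be the one reported.
std_bpb = statistics.stdev(val_bpb.values())

print(f"mean={mean_bpb:.4f} std={std_bpb:.4f}")
```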

Progression (4 days, $150 total compute)

| PR   | BPB    | What changed        |
|------|--------|---------------------|
| #273 | 1.1575 | Baseline, 10L       |
| #385 | 1.1488 | WD+SWA tuning, 11L  |
| This | 1.1287 | XSA4 + EMA + SDTTT  |

Runs on stock PyTorch SDPA (no FA3, no custom kernels): 99 ms/step versus SOTA's 55 ms/step.
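To put the step-time gap in context, the arithmetic below estimates how many optimizer steps fit in the 600 s wallclock limit at each speed. It assumes, for illustration only, that the whole budget goes to training steps at a constant per-step time; the actual runs also spend time on warmup, evaluation, and serialization, so real step counts will be lower.

```python
WALLCLOCK_S = 600  # wallclock limit from the submission checklist


def steps_in_budget(ms_per_step, budget_s=WALLCLOCK_S):
    """Upper bound on whole steps that fit in the budget at a fixed step time."""
    return int(budget_s * 1000 // ms_per_step)


this_run = steps_in_budget(99)  # stock PyTorch SDPA, this PR
sota_run = steps_in_budget(55)  # step time quoted for SOTA

print(this_run, sota_run, f"{99 / 55:.1f}x slower per step")
```

Under these assumptions the run gets a little over half as many steps as a 55 ms/step run would within the same limit.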

Submission checklist

  • 3-seed verification (mean=1.1287, std=0.0007)
  • All artifacts < 16MB
  • Wallclock < 600s on 8×H100
  • Train logs included (3 seeds)
  • Reproducible train_gpt.py included
