Non-record: 11L EMA + GPTQ-lite + warmdown3500 + QAT@0.15 control (val_bpb=1.1231, 8xH100 verified) #429

Open
AbhisekBasu1 wants to merge 2 commits into openai:main from AbhisekBasu1:codex/record-1.1231-control

Conversation

@AbhisekBasu1

Summary

This is a non-record submission for a validated 8xH100 SXM control run of the EMA + GPTQ-lite + warmdown3500 + QAT@0.15 stack.

The goal of this submission is to provide a stronger clean non-TTT control, not to claim current SOTA. This run improves on our earlier validated #414-class control result while staying under the 16,000,000-byte cap.

Key Techniques

  • EMA every step (decay=0.997)
  • GPTQ-lite post-training quantization with per-row clip-percentile search
  • warmdown extended to 3500
  • late QAT threshold set to 0.15
  • XSA-last-4 + VE128 + LN Scale + SmearGate + BigramHash
  • int6 + zstd-22 export
  • sliding-window eval with stride 64
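The "per-row clip-percentile search" in the GPTQ-lite step can be sketched roughly as follows. This is a minimal illustration, not the submission's actual code: the function name, the candidate percentile set, and the symmetric int6 range of [-31, 31] are all assumptions for the sketch.

```python
import numpy as np

def quantize_int6_per_row(w, percentiles=(99.0, 99.5, 99.9, 100.0)):
    """Quantize each row of `w` to symmetric int6 levels, searching a small
    set of clip percentiles per row and keeping the candidate with the
    lowest squared reconstruction error (illustrative sketch)."""
    q_levels = 31  # assumed symmetric int6 range: [-31, 31]
    out = np.empty_like(w)
    for i, row in enumerate(w):
        best_err, best_rec = np.inf, None
        for p in percentiles:
            clip = np.percentile(np.abs(row), p)
            if clip == 0.0:
                clip = 1.0  # degenerate all-zero row; any scale works
            scale = clip / q_levels
            q = np.clip(np.round(row / scale), -q_levels, q_levels)
            rec = q * scale
            err = float(np.sum((row - rec) ** 2))
            if err < best_err:
                best_err, best_rec = err, rec
        out[i] = best_rec
    return out
```

Searching the clip point per row trades a little extra compute for noticeably lower quantization error on rows with heavy-tailed weights, since one outlier no longer dictates the scale of the whole row.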

Result

  • val_bpb = 1.12311898
  • total bytes: 15,683,276
  • model bytes (int6+zstd): 15,610,171
  • hardware: 8xH100 SXM
  • training wallclock: 600.065s
  • seed: 1337
  • steps reached: 7142

Why Submit This

  • improves on our earlier validated #414-class control run (val_bpb = 1.12946402)
  • gives a stronger fully validated non-TTT reference point against the current TTT-heavy frontier
  • includes synced logs, exact script, and artifact metadata
  • not a current SOTA claim

Included Files

  • README.md
  • submission.json
  • train.log
  • train_seed1337.log
  • train_gpt.py

Test Plan

  • full 8xH100 SXM training run completed under the 600s limit
  • final sliding-window eval completed with stride 64
  • post-quant roundtrip verified
  • final artifact verified under 16,000,000 total bytes
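The stride-64 sliding-window eval above amortizes context across overlapping windows: each window scores only its last `stride` positions, with the preceding tokens serving as context. A minimal sketch, assuming a byte-level vocabulary (so per-token nats convert directly to bits per byte) and a hypothetical `nll_fn(ctx, targets)` that returns total nats for `targets` given `ctx`:

```python
import math

def sliding_window_bpb(nll_fn, tokens, window=1024, stride=64):
    """Average bits-per-byte under a sliding-window eval: score the first
    window in full, then advance by `stride`, scoring only the newly
    exposed tokens each time (illustrative sketch)."""
    total_nats = 0.0
    total_tokens = 0
    # first window: everything after the initial token is scored
    first = min(window, len(tokens))
    total_nats += nll_fn(tokens[:1], tokens[1:first])
    total_tokens += first - 1
    pos = first
    while pos < len(tokens):
        end = min(pos + stride, len(tokens))
        ctx = tokens[end - window:pos]  # trailing context for this window
        total_nats += nll_fn(ctx, tokens[pos:end])
        total_tokens += end - pos
        pos = end
    # byte-level tokens: bits per byte = nats per token / ln(2)
    return total_nats / total_tokens / math.log(2)
```

A smaller stride gives each scored token more context (tightening val_bpb) at the cost of proportionally more forward passes, which is why the stride is part of the reported eval configuration.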

Submission Folder

  • records/track_10min_16mb/2026-03-22_11L_EMA_GPTQ-lite_warmdown3500_QAT015_1.1231
