
Add non-record 1xH200 fp16-embed baseline sweep submission#407

Open
itu-itis24-buyukhelvacigilm24 wants to merge 1 commit into openai:main from itu-itis24-buyukhelvacigilm24:codex/non-record-h200-fp16-embed
Conversation

@itu-itis24-buyukhelvacigilm24

Summary

This PR adds a non-record submission that captures the best completed result from a controlled 1xH200, 10-minute screening sweep on the Parameter Golf baseline family.

The submitted run keeps the published SP-1024 baseline layout and changes only the export behavior, preserving tok_emb.weight in fp16 during int8 export. In our H200 sweep, this was the strongest completed branch under the 16,000,000-byte cap.
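The exporter-side change amounts to skipping quantization for a single tensor. A minimal sketch of such an int8 exporter with an fp16 embedding passthrough is below; the tensor name `tok_emb.weight` comes from the PR, but the symmetric per-tensor quantization scheme and the dict-based layout are assumptions for illustration, not the actual exporter in train_gpt.py.

```python
import numpy as np

def export_int8_fp16_embed(state_dict):
    """Quantize every weight to int8 per-tensor, but keep the token
    embedding in fp16 (illustrative sketch; the real exporter's
    quantization scheme and serialization are assumptions)."""
    out = {}
    for name, w in state_dict.items():
        if name == "tok_emb.weight":
            # fp16 passthrough: no quantization for the embedding table
            out[name] = w.astype(np.float16)
            continue
        # symmetric per-tensor int8 quantization (assumed scheme)
        scale = float(np.abs(w).max()) / 127.0 or 1.0
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        out[name] = (q, scale)
    return out
```

Keeping the embedding in fp16 roughly doubles its on-disk footprint versus int8, so this trade only fits because the total artifact stays under the byte cap.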

What is included

  • exact train_gpt.py snapshot used for the run
  • exact train.log
  • submission.json
  • README.md with configuration, metrics, and sweep context

Why non-record

  • this run was screened on 1xH200, not yet verified on 8xH100
  • it does not claim a new leaderboard record
  • it is intended as a reproducible, baseline-adjacent result and an honest report of ablations that did and did not help

Key result

  • final_int8_zlib_roundtrip_exact val_bpb: 1.32078403
  • total artifact size: 14,327,135 bytes
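The metric name suggests the artifact is scored after an exact zlib compress/decompress roundtrip and checked against the size cap. A hedged sketch of that check is below; the serialization and the harness's actual procedure are assumptions, with only the 16,000,000-byte cap taken from the PR.

```python
import zlib
import numpy as np

CAP_BYTES = 16_000_000  # submission size cap cited in the sweep

def zlib_roundtrip_size(arrays):
    """Serialize tensors, zlib-compress, and require an exact
    roundtrip before reporting the compressed size (illustrative
    only; the real harness's serialization format is assumed)."""
    raw = b"".join(np.ascontiguousarray(a).tobytes() for a in arrays)
    comp = zlib.compress(raw, level=9)
    # "roundtrip_exact": decompression must reproduce the bytes exactly
    assert zlib.decompress(comp) == raw
    return len(comp)
```

Under this kind of check, a run only counts if the compressed artifact both decompresses exactly and fits under `CAP_BYTES`.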

Sweep context

| Variant | Exact final val_bpb |
| --- | --- |
| Clean baseline | 1.32171904 |
| FP16 embed passthrough | 1.32078403 |
| Sink4 | 1.32107101 |
| MTP1 fixed | 1.33126842 |
| MTP2 fixed | 1.32792538 |
| FP16 embed + MTP2 fixed | 1.32718519 |

Takeaway

Within this budget, a small exporter-side precision change beat the more ambitious objective-side changes we tested. Our next step is to validate this baseline-adjacent fp16 embedding path on official 8xH100 hardware rather than continuing to add training complexity.
