
Add non-record 1xH100 auto precision budget experiment #431

Open
spatnala18 wants to merge 3 commits into openai:main from spatnala18:codex/nonrecord-autoprecision-1xh100-clean
Conversation

@spatnala18

Summary

This PR adds a non-record exploratory submission under records/track_non_record_16mb based on the 2026-03-20_10L_Int5MLP_MuonWD04_SWA50 recipe.

My idea is to replace fixed, hand-picked mixed-precision export exceptions with a calibration-driven precision allocator. After training, SWA, and pruning, the script evaluates a small set of candidate tensor promotions on calibration windows and greedily spends bytes where quantization appears most harmful, while staying under the 16,000,000-byte cap.
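The greedy allocation step described above can be sketched roughly as follows. This is an illustrative reconstruction, not the PR's actual code: the function name `allocate_precision`, the tuple layout, and the toy numbers are all assumptions. Each candidate promotion carries an extra byte cost (storing the tensor at higher precision) and an estimated loss improvement measured on the calibration windows; candidates are ranked by improvement per byte and accepted while the total stays under the cap.

```python
BYTE_CAP = 16_000_000  # submission byte cap from the track rules

def allocate_precision(base_bytes, candidates, byte_cap=BYTE_CAP):
    """Greedily select tensor promotions by loss improvement per extra byte.

    candidates: list of (name, extra_bytes, loss_improvement) tuples,
    where loss_improvement is estimated from calibration windows.
    Returns (selected tensor names, resulting total byte count).
    """
    # Rank candidates by improvement density: loss gained per byte spent.
    ranked = sorted(candidates, key=lambda c: c[2] / c[1], reverse=True)
    selected, total = [], base_bytes
    for name, extra_bytes, improvement in ranked:
        if improvement <= 0:
            continue  # promotion did not help on the calibration windows
        if total + extra_bytes <= byte_cap:
            selected.append(name)
            total += extra_bytes
    return selected, total

if __name__ == "__main__":
    # Toy example with made-up costs and sensitivities (not measured values).
    candidates = [
        ("blocks.9.attn.c_k.weight", 400_000, 0.012),
        ("blocks.9.attn.c_v.weight", 400_000, 0.010),
        ("blocks.0.mlp.c_fc.weight", 900_000, 0.001),
    ]
    chosen, size = allocate_precision(15_200_000, candidates)
    print(chosen, size)
```

With these toy inputs, the two attention promotions fit under the cap and the larger, low-sensitivity MLP promotion is skipped. A real implementation would also need to decide how to estimate `loss_improvement` cheaply (e.g. one promotion at a time against a frozen quantized baseline), which is where most of the cost of this approach lives.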

What’s included

  • records/track_non_record_16mb/2026-03-22_AutoPrecisionBudget_10L_1xH100/train_gpt.py
  • records/track_non_record_16mb/2026-03-22_AutoPrecisionBudget_10L_1xH100/train.log
  • records/track_non_record_16mb/2026-03-22_AutoPrecisionBudget_10L_1xH100/submission.json
  • records/track_non_record_16mb/2026-03-22_AutoPrecisionBudget_10L_1xH100/README.md

Run details

This is a cheap 1xH100 run using free credits on the Modal platform, not a leaderboard attempt.

  • GPU: 1xH100
  • Train shards: 1
  • MAX_WALLCLOCK_SECONDS=60
  • ITERATIONS=150
  • AUTO_CALIBRATION_WINDOWS=16
  • FINAL_EVAL_MAX_WINDOWS=16

Final exact metric from train.log:

  • val_loss: 5.53668879
  • val_bpb: 3.08435975

Artifact size:

  • model bytes: 15,771,560
  • total submission bytes: 15,836,818

Selected promotions:

  • blocks.9.attn.c_k.weight
  • blocks.9.attn.c_v.weight

Why submit this

This is an in-progress non-record submission meant to document a concrete compression-aware direction rather than claim a strong score. The motivation is that current strong recipes already rely on hand-tuned mixed precision, and a sensitivity-driven allocator is a natural next step that may transfer better across future architecture changes.

