
Add SP4096 qk45 budget reproduction candidate #2161

Open
adiprathapa wants to merge 1 commit into openai:main from adiprathapa:codex/sp4096-qk45-budget

@adiprathapa

Summary

Adds a non-record SP4096 budget reproduction/iteration based on Kevin Clark's 2026-04-01 SP4096 record. The run uses a 1xH100 RunPod budget setup with 86 SP4096 train shards and a 3600-second cap.

Best valid result:

  • val_bpb: 1.10743376 (sliding-window eval)
  • total artifact bytes: 15,987,195
  • seed: 42
  • QK_GAIN_INIT=4.5
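
As a rough sanity check on the artifact size above, the headroom against the track's size cap can be computed directly. This is only a sketch: it assumes the "16mb" in the track folder name means a decimal limit of 16,000,000 bytes, which the PR text does not state, and `ASSUMED_LIMIT_BYTES` and `headroom` are illustrative names rather than anything defined in the repository.

```python
# Total artifact size reported in this PR.
ARTIFACT_BYTES = 15_987_195

# ASSUMPTION: the 16 MB track limit is decimal (16,000,000 bytes).
# If the limit is defined differently, change this constant.
ASSUMED_LIMIT_BYTES = 16_000_000


def headroom(total_bytes: int, limit_bytes: int = ASSUMED_LIMIT_BYTES) -> int:
    """Return the bytes remaining under the limit (negative if over)."""
    return limit_bytes - total_bytes


if __name__ == "__main__":
    remaining = headroom(ARTIFACT_BYTES)
    status = "within" if remaining >= 0 else "over"
    print(f"{remaining} bytes of headroom ({status} the assumed limit)")
```

Under that assumption the submission sits 12,805 bytes below the cap.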

Notes

This is submitted as non-record because it does not meet the official 10-minute / 8xH100 compute constraint. The folder includes run logs and notes, but not the ignored .ptz artifacts.

Validation

  • python3 -m py_compile records/track_non_record_16mb/2026-05-07_sp4096_budget_repro/train_gpt.py
  • python3 -m json.tool records/track_non_record_16mb/2026-05-07_sp4096_budget_repro/submission.json
  • RunPod 1xH100 log included under runpod_results/
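
The two command-line checks above could also be run from a single script. The following is a minimal sketch using only the standard library; the helper names (`compiles`, `parses_as_json`, `run_checks`) are hypothetical and not part of this submission, and the folder path is copied from the commands above.

```python
import json
import py_compile
from pathlib import Path

# Folder taken from the validation commands in this PR.
RECORD_DIR = Path("records/track_non_record_16mb/2026-05-07_sp4096_budget_repro")


def compiles(path: Path) -> bool:
    """Return True if the file byte-compiles, similar to `python3 -m py_compile`."""
    try:
        py_compile.compile(str(path), doraise=True)
        return True
    except (py_compile.PyCompileError, OSError):
        return False


def parses_as_json(path: Path) -> bool:
    """Return True if the file is well-formed JSON, similar to `python3 -m json.tool`."""
    try:
        with path.open(encoding="utf-8") as fh:
            json.load(fh)
        return True
    except (ValueError, OSError):
        # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
        return False


def run_checks(record_dir: Path) -> dict:
    """Run both checks against the files named in the PR description."""
    return {
        "train_gpt.py": compiles(record_dir / "train_gpt.py"),
        "submission.json": parses_as_json(record_dir / "submission.json"),
    }


if __name__ == "__main__":
    for name, ok in run_checks(RECORD_DIR).items():
        print(f"{name}: {'ok' if ok else 'FAILED'}")
```

Like the original commands, this only confirms that the script is syntactically valid Python and that the JSON is well-formed; it does not check the submission's fields or rerun training.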
