add pr2069 best 8xh100 submission package #2125
Open
tenet-diver wants to merge 1 commit into
Post-deadline negative rerun of PR #2069 on 8xH100
This PR preserves an 8xH100 target-hardware rerun of the same candidate I submitted in PR #2069:
#2069
PR #2069 packaged the best completed deadline-search result from a 4xH100 machine as a non-record / unlimited-compute 16MB submission because I did not get access to an 8xH100 box before the deadline. I am opening this follow-up after the deadline because I finally got access to an 8xH100 machine and wanted to preserve the direct reproduction evidence for the same candidate, including the fact that the 8xH100 rerun did not beat the leaderboard baseline.
Candidate
Name: best4x_ttt_disabled_qk525
Hardware: 8x NVIDIA H100 80GB HBM3
Promotion: 4xH100 to 8xH100
Background
The original deadline search promoted the strongest completed 1xH100 candidate into a 4xH100 batch. The best completed promoted candidate was:
best4x_ttt_disabled_qk525: 1.26066159 bpb (4xH100)
This PR reruns that same candidate on the intended 8xH100 hardware.
Configuration
The rerun uses the same candidate configuration:
CANDIDATE_IMPL=autoregressive_gpt
QK_GAIN_INIT=5.25
TTT_ENABLED=0
TRAIN_BATCH_TOKENS=2097152
VOCAB_SIZE=1024
MAX_WALLCLOCK_SECONDS=600
Results
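val_bpb: 1.23485583
val_loss: 2.08500235
Total artifact bytes: 15843310
Model bytes: 15738419
Code bytes: 104891
Train wallclock: 702.656 s
Seed: 1337
Record folder: records/track_10min_16mb/2026-05-01_pr2069-best-8xh100_20260501T153233Z
Leaderboard Context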
This is a negative result relative to the current 10min_16mb leaderboard. The upstream README lists the naive baseline at 1.2244 bpb, so this rerun (1.23485583 bpb) is worse by 0.01045583 bpb. It is included only to document the target-hardware reproduction attempt for PR #2069.
Validation
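The two validation commands for this package only check that submission.json is well-formed JSON and that train_gpt.py byte-compiles; they do not run training. For scripting the same checks, a minimal Python sketch using the standard-library json and py_compile modules (the helper name validate_package is hypothetical and not part of the package):

```python
import json
import py_compile
from pathlib import Path


def validate_package(record_dir: str) -> None:
    """Hypothetical helper mirroring the two shell validation commands.

    Raises json.JSONDecodeError if submission.json is malformed and
    py_compile.PyCompileError if train_gpt.py has a syntax error.
    """
    root = Path(record_dir)

    # Equivalent of: python3 -m json.tool submission.json >/dev/null
    with open(root / "submission.json", encoding="utf-8") as f:
        json.load(f)

    # Equivalent of: python3 -m py_compile train_gpt.py
    # doraise=True turns compile errors into exceptions instead of printed messages.
    py_compile.compile(str(root / "train_gpt.py"), doraise=True)


# Example call (path taken from this PR):
# validate_package("records/track_10min_16mb/2026-05-01_pr2069-best-8xh100_20260501T153233Z")
```

Like the command-line form, py_compile.compile writes a cached .pyc under a __pycache__ directory next to the script.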
python3 -m json.tool records/track_10min_16mb/2026-05-01_pr2069-best-8xh100_20260501T153233Z/submission.json >/dev/null
python3 -m py_compile records/track_10min_16mb/2026-05-01_pr2069-best-8xh100_20260501T153233Z/train_gpt.py
Compliance Notes
This is not meant to retroactively change the deadline status of PR #2069.
The package is under the 16MB artifact limit, but the single rerun took 702.656 s, over the 600 s training limit, and includes only one seed. The generated submission.json therefore records:
artifact_under_16mb: true
train_under_600s: false
three_seeds: false
I am submitting this as post-deadline reproduction evidence for the same candidate, not as a compliant 10-minute record claim or leaderboard improvement.
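The summary numbers in this PR can be cross-checked with a few lines of Python. This sketch hard-codes the reported results (1.23485583 bpb, 15843310 total bytes interpreted here as 15738419 model bytes plus 104891 code bytes, 702.656 s, seed 1337) and recomputes the gap to the 1.2244 baseline and the three compliance flags. The thresholds are assumptions: 16MB is taken as 16,000,000 bytes, the time limit as 600 s, and three_seeds as requiring at least three seeds. compliance_flags is a hypothetical helper, not the code that generated submission.json.

```python
# Figures copied from this PR's Results and Leaderboard Context sections.
VAL_BPB = 1.23485583
BASELINE_BPB = 1.2244  # naive baseline listed in the upstream README
MODEL_BYTES = 15_738_419  # assumed meaning of this field
CODE_BYTES = 104_891      # assumed meaning of this field
TOTAL_BYTES = 15_843_310
TRAIN_SECONDS = 702.656
SEEDS = [1337]

# Assumed thresholds (see lead-in): decimal 16MB, 600 s, three seeds.
ARTIFACT_LIMIT_BYTES = 16_000_000
TIME_LIMIT_SECONDS = 600
REQUIRED_SEEDS = 3


def compliance_flags(total_bytes: int, train_seconds: float, seeds: list) -> dict:
    """Hypothetical recomputation of the three submission.json flags."""
    return {
        "artifact_under_16mb": total_bytes < ARTIFACT_LIMIT_BYTES,
        "train_under_600s": train_seconds < TIME_LIMIT_SECONDS,
        "three_seeds": len(seeds) >= REQUIRED_SEEDS,
    }


# Model bytes plus code bytes should account for the whole artifact.
assert MODEL_BYTES + CODE_BYTES == TOTAL_BYTES

# Positive gap means the rerun is worse than the baseline (lower bpb is better).
gap = round(VAL_BPB - BASELINE_BPB, 8)
print(gap)  # 0.01045583
print(compliance_flags(TOTAL_BYTES, TRAIN_SECONDS, SEEDS))
# {'artifact_under_16mb': True, 'train_under_600s': False, 'three_seeds': False}
```

Rounding to 8 decimals matches the precision of the reported figures and hides floating-point noise in the subtraction.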