Add non-record 1x5090 autoresearch submission with two-campaign analysis #432
Open

jadechip wants to merge 2 commits into openai:main from
# Non-Record Submission: Two 5090 Autoresearch Campaigns
This submission documents a git-native autoresearch loop on a single RTX 5090. The repo runs one fixed-budget experiment, commits the code change, keeps real wins, reverts losers, and uses the git history itself as search memory.
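The loop described above (run one fixed-budget experiment, commit wins, revert losers, reuse the log as memory) can be sketched roughly as below. `apply_edit` and `train_and_score` are hypothetical hooks standing in for whatever the actual harness does, not the repo's real interface:

```python
import subprocess

def sh(*args):
    """Run a command, raising on failure, and return its stdout."""
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout

def try_experiment(apply_edit, train_and_score, best_bpb):
    """One iteration of a git-native search loop: mutate the working tree,
    train under the fixed budget, then commit the change if val_bpb improved
    or revert it. The commit log becomes the search memory."""
    apply_edit()                       # e.g. patch train_gpt.py
    bpb = train_and_score()            # fixed-budget run returning val_bpb
    if bpb < best_bpb:
        sh("git", "commit", "-am", f"accept: val_bpb {bpb:.9f}")
        return bpb                     # keep the win
    sh("git", "checkout", "--", ".")   # discard the loser
    return best_bpb
```

Accepted commits stay on the branch and reverted ones leave only a note, which is why the history alone can record both the winners and the negative results discussed below.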
Across two sweep tables and 188 measured runs scored on `val_bpb`, the search moved from 1.733958794 on the first baseline to 1.535119154 at the end of the second campaign, with a best numeric excursion to 1.528664372.

The packaged `train_gpt.py` snapshot is the stable accepted-best branch point from commit `905cc4d` / run `ar5090-20260321-231130`: `val_bpb: 1.529478563`.

The README focuses on the larger search story because that is the real contribution here: many committed ideas, many reverted ideas, and a clear model-design trajectory rather than a single isolated run.
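The headline deltas quoted in this README fall straight out of those endpoint scores; a quick arithmetic check:

```python
baseline = 1.733958794        # first-campaign baseline val_bpb
accepted_end = 1.535119154    # accepted end of campaign 2
best_excursion = 1.528664372  # best numeric excursion

print(f"{accepted_end - baseline:+.9f}")    # -0.198839640
print(f"{best_excursion - baseline:+.9f}")  # -0.205294422
```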
## Campaign At A Glance
| Campaign | Start `val_bpb` | Best `val_bpb` | End `val_bpb` |
| --- | --- | --- | --- |
| 1 | 1.733958794 | 1.563742695 | 1.572203327 |
| 2 | 1.570137665 | 1.528664372 | 1.535119154 |

Campaign 2 key levers: low-rank `q_proj`, short-to-full context warmup, attention-QAT-off, smaller update batches, targeted precision spends.

Overall headline:

- Accepted end state: 1.733958794 -> 1.535119154 (-0.198839640)
- Best numeric excursion: 1.733958794 -> 1.528664372 (-0.205294422)
- Size: 8.485920M params / 5.679745 MB artifact -> 17.467168M params / 12.007446 MB artifact
- The search also visited up to 22.351912M params and 17.560314 MB artifacts, which was useful to explore but not the final answer.

## Visual Summary
Full score trajectory across both campaigns:

![campaign_val_bpb](campaign_val_bpb.svg)

Where the search wandered in size/score space:

![size_vs_score](size_vs_score.svg)
## What The Git History Actually Tried
The interesting part of this repo is not “make the model bigger.” The committed search history shows a more specific progression.
### 1. Stop Repeating The Carrier

The early wins came from reducing repeated/shared compute and spending that budget on unique late blocks. The search moved from the starting recurrent layout toward a compact carrier plus a deeper unique tail, and that alone pulled the score from 1.733958794 down into the low 1.67x range.

### 2. Spend Bytes On Unique Tail Capacity
The next wave of wins came from turning the model into a stemless or nearly stemless compact line and using MLP-only int6 export to reclaim artifact budget. That reclaimed room was repeatedly spent on stronger tail MLPs. The winning direction was not more depth forever. It was fewer repeated blocks and fatter useful tail blocks.
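The trade behind both of these steps (fewer repeated blocks, fatter unique tail blocks) shows up directly in a back-of-the-envelope parameter count: a weight-tied carrier block applied several times costs artifact bytes only once. The dimensions below are toy values for illustration, not the repo's actual shapes:

```python
def block_params(d_model, mlp_mult):
    """Rough per-block parameter count: 4*d^2 for the q/k/v/o
    projections plus 2*mlp_mult*d^2 for the two MLP matrices."""
    return 4 * d_model**2 + 2 * mlp_mult * d_model**2

# Six block applications of compute either way:
deep_unique = 6 * block_params(256, 4)         # six distinct blocks
carrier_tail = (1 + 3) * block_params(256, 4)  # one shared carrier (applied 3x) + 3 unique tail blocks

# The carrier layout frees a third of the parameter/artifact budget,
# which can then be spent on stronger tail MLPs instead.
```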
### 3. Use Low-Rank Q To Buy Compute

Campaign 2 is where the repo started acting less like architecture roulette and more like a disciplined compute allocator. The biggest improvements came from:

- low-rank `q_proj` on most blocks
- smaller update batches, stepping from `4 x 30720` to `3 x 30720` and then `2 x 30720`

That sequence is what drove the second campaign from 1.570137665 down through the 1.53x band.

### 4. Spend Precision Like It Hurts
The late-stage precision lesson was very consistent: broad float bundles usually lost, but narrow targeted precision spends could help.
What worked:

- a targeted precision spend on `q_proj` only on the final tail block

What usually lost:

- broad float bundles spread across many blocks
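A targeted spend like this amounts to a per-tensor export policy rather than a global precision switch. Here is a toy version of such a rule; the parameter names (`blocks.N.mlp...`, `q_proj.weight`) and bit widths are illustrative assumptions, not the repo's actual export code:

```python
def export_bits(name, n_blocks, default_bits=8):
    """Toy per-tensor export policy: MLP weights drop to int6 to reclaim
    artifact bytes, and only q_proj on the final tail block gets a
    higher-precision spend. Parameter names are illustrative."""
    block = int(name.split(".")[1]) if name.startswith("blocks.") else None
    if ".mlp." in name:
        return 6                    # MLP-only int6 export
    if name.endswith("q_proj.weight") and block == n_blocks - 1:
        return 16                   # narrow, targeted precision spend
    return default_bits             # everything else stays at the default
```

With six blocks, `export_bits("blocks.5.attn.q_proj.weight", 6)` returns 16 while every earlier `q_proj` stays at the default, which is the narrow-versus-broad distinction the sweep kept rewarding.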
## Things That Mostly Lost Anyway

The git history is full of committed negative results, which is part of why this repo is useful to read. The recurring losers were:

- `d_model` increases
- `mlp_mult=3` conversions
- `seq_len=960` or `1024` on this fixed 5090 budget
- `num_kv_heads=8`

In other words, the search kept rediscovering the same lesson: under a tight wallclock budget, compute allocation mattered more than making every component richer.
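The allocation lesson is concrete in the low-rank `q_proj` trade from campaign 2: factoring the full `d x d` query projection into two thin matrices halves its parameter cost at the same interface, freeing budget that the losers above would instead have consumed. Toy numbers, not the repo's shapes:

```python
def q_proj_params(d_model, rank=None):
    """Full q_proj costs d*d parameters; a rank-r factorization
    (d x r followed by r x d) costs 2*d*r."""
    return d_model * d_model if rank is None else 2 * d_model * rank

d = 256
full = q_proj_params(d)               # cost of a full projection
low_rank = q_proj_params(d, rank=64)  # half the cost, same in/out shape
freed = full - low_rank               # budget to reallocate, e.g. to tail MLPs
```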
## What Is Included

- `train_gpt.py`: the packaged accepted-best training script from commit `905cc4d`, adjusted only so it runs from this records folder and counts `train_gpt.py` in artifact bytes
- `submission.json`: metadata for that packaged accepted-best snapshot
- `results_stage1.tsv`: the first sweep table, including the baseline at 1.733958794
- `results_stage2.tsv`: the second sweep table, ending at 1.535119154
- `campaign_val_bpb.svg`: score-trajectory chart generated from both sweep tables
- `size_vs_score.svg`: size/score chart generated from both sweep tables
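Since the TSVs are the durable search record, the headline numbers can be recovered from them directly. The sketch below assumes each file has a header row with a `val_bpb` column; the actual column layout should be checked against the files themselves:

```python
import csv

def best_val_bpb(path):
    """Minimum val_bpb across a sweep table (lower is better).
    Assumes a tab-separated file with a 'val_bpb' header column."""
    with open(path, newline="") as f:
        return min(float(row["val_bpb"]) for row in csv.DictReader(f, delimiter="\t"))
```

Run against `results_stage2.tsv`, this should land at the campaign-2 best rather than the accepted end state, since sweep tables also record the reverted excursions.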
Raw per-run stdout logs were not preserved in this clone. The reliable search record here is the pair of structured sweep tables plus the commit history and notes in the source repo.
Source repo: