Add non-record EMA and adaptive export exploration #424

Open
someone114514 wants to merge 1 commit into openai:main from someone114514:nonrecord-ema-adaptive-export
Conversation


Summary

This PR adds a single non-record exploration branch:

  • records/track_non_record_16mb/2026-03-22_Baseline_EMA_AdaptiveExport

The explored idea is that some remaining progress under the 16MB artifact constraint may come from late-stage weight smoothing and budget-aware export selection, not only from changing the backbone.

Main Result

This run reaches:

  • final post-quant sliding-window roundtrip: val_bpb = 1.17251579
  • final post-quant sliding-window roundtrip: val_loss = 1.97973856
  • training wallclock: 1200.112s
  • final eval wallclock: 613.676s
  • total artifact size: 16,399,881 bytes

So this branch is non-record only: it produces a strong final score shape, but the exported artifact is still 449,881 bytes over the 15,950,000-byte export target (TARGET_ARTIFACT_BYTES), and therefore over the 16MB limit.

What Changed

This branch builds on the strong Int6 MLP3x + SmearGate + BigramHash + Muon baseline and adds:

  1. Late-stage EMA

    • EMA_ENABLED=1
    • EMA_BETA=0.9998
    • EMA_START_FRAC=0.8
  2. Adaptive export-time pruning search

    • PRUNE_CANDIDATES=0.00,0.01,0.02,0.03,0.04,0.05
    • TARGET_ARTIFACT_BYTES=15950000
    • choose the smallest pruning ratio whose exported artifact meets the target size, or fall back to the candidate with the smallest artifact if none do
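The two pieces above can be sketched in plain Python. This is a minimal illustration, not the PR's actual train_gpt.py code: the function names (`ema_update`, `select_pruning_ratio`) and the flat-list weight representation are hypothetical, and only the constants (EMA_BETA=0.9998, TARGET_ARTIFACT_BYTES=15950000) come from the PR.

```python
def ema_update(ema, weights, beta=0.9998):
    """In-place exponential moving average over flat weight lists:
    ema <- beta * ema + (1 - beta) * w.

    Typically called once per optimizer step, but only after training
    passes EMA_START_FRAC (0.8) of the total steps.
    """
    for i, w in enumerate(weights):
        ema[i] = beta * ema[i] + (1.0 - beta) * w


def select_pruning_ratio(artifact_bytes, target_bytes=15_950_000):
    """Pick an export candidate from the pruning sweep.

    `artifact_bytes` maps pruning ratio -> exported artifact size in
    bytes (one entry per value in PRUNE_CANDIDATES). Returns the
    smallest ratio whose artifact fits the byte target; if no candidate
    fits, returns the ratio that yields the smallest artifact.
    """
    fitting = sorted(r for r, s in artifact_bytes.items() if s <= target_bytes)
    if fitting:
        return fitting[0]
    return min(artifact_bytes, key=artifact_bytes.get)
```

With this selection rule, a sweep where no candidate fits (as in this run, where the best export landed at 16,399,881 bytes) still returns a well-defined answer: the most aggressive pruning ratio is chosen only if it actually produces the smallest artifact.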

Why This Is Worth Looking At

Even though the run is not leaderboard-valid yet, the failure mode is narrow and actionable:

  • the bottleneck is artifact size, not post-quant quality collapse
  • the final sliding-window metric is already competitive for a non-record run
  • the next iteration path is clear: broaden export search and move toward module-aware budget allocation

Submission Checklist

  • training completes under wallclock cap: yes
  • final post-quant roundtrip eval runs successfully: yes
  • sliding-window final eval runs successfully: yes
  • self-contained train_gpt.py: yes
  • artifact under 16MB: no
  • multi-seed verification: no

Compute Limitation

This result was produced under constrained compute:

  • 2xH100, not 8xH100
  • single seed only
  • no remaining compute budget for follow-up tuning passes after the first full validation run

So this PR should be read as a validated directional non-record result, not as a claim of a fully tuned record-capable submission.
