WIP: Shared-transformer + warmdown-aligned training (not final submission) #420
Open
leofeasby wants to merge 1 commit into openai:main
Shared-transformer architecture with MLP×5 expansion, bigram hash embedding (4096), weight decay tuning, SWA, and a warmdown-aligned training schedule.
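
For context, a minimal sketch (assumptions throughout, not the PR's actual code) of the shared-transformer idea: one block of weights reused at every layer position, an MLP expansion factor of 5, and a 4096-bucket bigram hash embedding added to the token embedding. Class names and the hash function are illustrative; weight decay tuning and SWA are not shown.

```python
import torch
import torch.nn as nn


class SharedBlock(nn.Module):
    """One transformer block; its weights are reused at every depth."""

    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        # MLP x5 expansion, as mentioned in the description
        self.mlp = nn.Sequential(nn.Linear(dim, 5 * dim), nn.GELU(), nn.Linear(5 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.ln1(x)
        # causal mask: True marks positions that may not be attended to
        mask = torch.triu(torch.ones(x.size(1), x.size(1), dtype=torch.bool, device=x.device), diagonal=1)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a
        return x + self.mlp(self.ln2(x))


class SharedTransformer(nn.Module):
    def __init__(self, vocab: int, dim: int = 768, n_heads: int = 12,
                 depth: int = 12, n_bigram_buckets: int = 4096):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab, dim)
        self.bigram_emb = nn.Embedding(n_bigram_buckets, dim)
        nn.init.zeros_(self.bigram_emb.weight)  # bigram term starts as a no-op
        self.block = SharedBlock(dim, n_heads)  # single set of block weights...
        self.depth = depth                      # ...applied `depth` times
        self.head = nn.Linear(dim, vocab, bias=False)

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        # hash each (previous token, current token) pair into a small table
        prev = torch.roll(idx, 1, dims=1)
        prev[:, 0] = 0  # no bigram context at the first position
        bucket = (prev * 1000003 + idx) % self.bigram_emb.num_embeddings
        x = self.tok_emb(idx) + self.bigram_emb(bucket)
        for _ in range(self.depth):  # weight sharing across all layers
            x = self.block(x)
        return self.head(x)
```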
Key finding: performance is dominated by a late-stage phase transition driven by LR warmdown. Aligning WARMDOWN_ITERS with the wallclock budget moves this transition earlier and converts wasted steps into usable convergence.
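
To illustrate the alignment idea, a minimal sketch of a constant-then-linear-warmdown LR schedule where WARMDOWN_ITERS is derived from an estimated step time so the warmdown finishes exactly when the wallclock budget runs out. The step-time figure, the warmdown fraction, and the linear shape are assumptions, not values from the PR.

```python
def lr_schedule(step: int, base_lr: float, total_iters: int, warmdown_iters: int) -> float:
    """Constant LR, then a linear warmdown over the final warmdown_iters steps.

    The point from the description above: choose warmdown_iters so the warmdown
    completes inside the wallclock budget, so the late-stage phase transition
    happens within the run instead of being cut off.
    """
    steps_remaining = total_iters - step
    if steps_remaining >= warmdown_iters:
        return base_lr
    return base_lr * max(steps_remaining, 0) / warmdown_iters


# Align WARMDOWN_ITERS with the budget (illustrative numbers, not measured).
SECONDS_PER_ITER = 0.35                      # hypothetical step time on 8xH100
BUDGET_SECONDS = 10 * 60                     # the 10-minute budget
TOTAL_ITERS = int(BUDGET_SECONDS / SECONDS_PER_ITER)
WARMDOWN_ITERS = int(0.4 * TOTAL_ITERS)      # hypothetical warmdown fraction

for step in (0, TOTAL_ITERS - WARMDOWN_ITERS, TOTAL_ITERS - 1):
    print(step, lr_schedule(step, base_lr=3e-4, total_iters=TOTAL_ITERS,
                            warmdown_iters=WARMDOWN_ITERS))
```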
Results on 8×H100 (within 10-min budget):
Work in progress — not a final submission.