WIP: Shared-transformer + warmdown-aligned training (not final submis…) #420

Open

leofeasby wants to merge 1 commit into openai:main from leofeasby:leo-shared-mlp5-warmdown

Conversation

@leofeasby

Shared-transformer architecture with MLP×5 expansion, bigram hash embedding (4096), weight decay tuning, SWA, and warmdown-aligned training schedule.
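
For readers unfamiliar with the bigram hash embedding: the idea is to hash each (previous token, current token) pair into a small bucket table and add the looked-up vector to the ordinary token embedding. Below is a minimal PyTorch sketch under that reading; the 4096 bucket count comes from the description above, while the module name, hashing constant, and layout are assumptions rather than the code in this commit.

```python
import torch
import torch.nn as nn

class BigramHashEmbedding(nn.Module):
    """Token embedding plus a hashed (previous, current) bigram embedding.
    4096 buckets matches the PR description; the names and hashing constant
    here are illustrative assumptions."""

    def __init__(self, vocab_size: int, d_model: int, n_buckets: int = 4096):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.bigram_emb = nn.Embedding(n_buckets, d_model)
        self.n_buckets = n_buckets

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        # idx: (batch, seq) of token ids
        prev = torch.roll(idx, shifts=1, dims=1)
        prev[:, 0] = 0  # first position has no real predecessor
        # cheap multiplicative hash of the bigram into the bucket table
        h = (prev * 1000003 + idx) % self.n_buckets
        return self.tok_emb(idx) + self.bigram_emb(h)
```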

Key finding: performance is dominated by a late-stage phase transition driven by LR warmdown. Aligning WARMDOWN_ITERS with the wallclock budget moves this transition earlier and converts wasted steps into usable convergence.
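
To make the alignment concrete, here is a minimal sketch of how WARMDOWN_ITERS could be derived from the wallclock budget: estimate how many steps fit in the budget and schedule a linear warmdown that ends exactly at that step. The constant-then-linear shape and all constants are assumptions, not the actual training script.

```python
# Derive the step count from the wallclock budget so the LR warmdown
# finishes exactly when time runs out. All values are illustrative.
BUDGET_SECONDS = 10 * 60           # 10-minute budget from the PR description
SECONDS_PER_STEP = 0.35            # assumed measured throughput on 8xH100
NUM_ITERS = int(BUDGET_SECONDS / SECONDS_PER_STEP)
WARMDOWN_ITERS = int(0.4 * NUM_ITERS)   # assumed warmdown fraction

def get_lr(step: int, base_lr: float = 0.02) -> float:
    """Constant LR, then linear decay to zero over the final warmdown steps."""
    warmdown_start = NUM_ITERS - WARMDOWN_ITERS
    if step < warmdown_start:
        return base_lr
    return base_lr * max((NUM_ITERS - step) / WARMDOWN_ITERS, 0.0)
```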

Results on 8×H100 (within the 10-minute budget):

  • ~1.26 bpb int8 roundtrip (see the sketch below)
  • ~1.30 bpb TTT LoRA
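
For context, "int8 roundtrip" is read here as quantizing the weights to int8 and dequantizing them before the bpb eval. The sketch below shows that round trip with a per-row absmax scale; the function name and quantization scheme are assumptions, not the eval code behind the numbers above.

```python
import torch

@torch.no_grad()
def int8_roundtrip_(model: torch.nn.Module) -> None:
    """Quantize each 2D+ weight to int8 (per-row absmax) and dequantize it
    in place, so a subsequent eval sees the round-tripped weights."""
    for p in model.parameters():
        if p.ndim < 2:
            continue  # keep biases / norm scales in full precision
        scale = p.abs().amax(dim=-1, keepdim=True).clamp_(min=1e-8) / 127.0
        q = torch.round(p / scale).clamp_(-127, 127)
        p.copy_(q * scale)
```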

Work in progress — not a final submission.
