TrigramHash + XSA + TTT on thwu1 SOTA stack - val_bpb pending H100 #430
Draft
sahiee-dev wants to merge 1 commit into openai:main from
Conversation
Dropped QAT: 8% throughput penalty kills 600s budget (per PR openai#360). Three novel additions on thwu1 SOTA base (1.1428):
- TrigramHash(20480, dim=32): trigram embedding signal, bigram 10240→4096
- XSA: orthogonal self-value removal, last 4 layers, from PR openai#287
- TTT: 3-epoch SGD on val tokens before eval, all ranks, ~47s budget

Fixed rank bug: TTT runs on all 8 ranks independently (not rank 0 only).
Artifact: ~15.64MB. Smoke tests passing. H100 validation pending.
TrigramHash + XSA + TTT on thwu1 SOTA stack
Base: 10L Int5 MLP + BigramHash(10240) + SWA(0.4) + WD=0.04 by thwu1 → 1.1428 val_bpb
Novel additions
TrigramHash(20480, dim=32)
Adds a trigram (t-2, t-1, t) embedding signal alongside BigramHash.
Captures 3-token phrase patterns and morphological structure that
bigrams cannot represent. Budget: the bigram table is reduced 10240→4096 to fund
the trigram table within the 16MB limit. Zero runtime overhead: a pure
embedding-table lookup, as sketched below.
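A minimal sketch of the idea, assuming a standard PyTorch embedding table. The bucket count and dim are from this PR; the hash-mixing constants, padding scheme, and module wiring are illustrative, not the actual implementation:

```python
import torch
import torch.nn as nn

class TrigramHash(nn.Module):
    """Hash (t-2, t-1, t) trigrams into a small embedding table."""
    def __init__(self, num_buckets: int = 20480, dim: int = 32):
        super().__init__()
        self.num_buckets = num_buckets
        self.emb = nn.Embedding(num_buckets, dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, T) int64 token ids
        pad = tokens.new_zeros(tokens.size(0), 2)   # positions 0,1 see dummy context
        padded = torch.cat([pad, tokens], dim=1)    # (B, T+2)
        t2, t1, t0 = padded[:, :-2], padded[:, 1:-1], padded[:, 2:]
        # Mix (t-2, t-1, t) with fixed odd multipliers, then bucket by modulo.
        h = (t0 * 1000003 + t1 * 998244353 + t2 * 754974721) % self.num_buckets
        return self.emb(h)                          # (B, T, dim), added to the token embedding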
XSA — Exclusive Self Attention (last 4 layers)
Removes the self-value bias from the attention output via an orthogonal projection.
GQA-aware implementation from PR #287, adapted for our transposed layout.
Zero parameter cost. Lets the last 4 layers attend more purely to
context rather than self-reinforcing their own value vectors; see the sketch below.
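A minimal sketch of one way to read "orthogonal self-value removal": project each position's attention output orthogonal to that position's own value vector. The function name, tensor layout, and eps are assumptions; PR #287 holds the actual GQA-aware, transposed-layout version:

```python
import torch

def remove_self_value(attn_out: torch.Tensor, v: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # attn_out, v: (B, H, T, Dh). For GQA, expand v across query-head groups first,
    # e.g. v = v.repeat_interleave(n_q_heads // n_kv_heads, dim=1).
    # Per position and head, subtract the component of the output along its own value.
    coeff = (attn_out * v).sum(-1, keepdim=True) / (v * v).sum(-1, keepdim=True).clamp_min(eps)
    return attn_out - coeff * v
```

This is parameter-free, which matches the zero-parameter-cost claim above.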
TTT — Test Time Training
3-epoch SGD (lr=0.002, momentum=0.9) over the validation tokens before
eval, with the bottom 6 layers frozen. Runs identically on all 8 ranks:
deterministic, in-order SGD on identical data, so no broadcast is needed.
Original weights are restored after evaluation. Budget: ~47 seconds.
A sketch follows.
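A minimal sketch of the procedure, assuming a PyTorch model with an indexable `model.layers` stack and a forward pass that returns the LM loss. The lr/momentum/epochs/frozen-layer values are from this PR; everything else (names, `evaluate` helper) is illustrative:

```python
import copy
import torch

def ttt_adapt(model, val_batches, epochs=3, lr=0.002, momentum=0.9, n_frozen=6):
    """Adapt on validation tokens; return a weight snapshot for restoring after eval."""
    snapshot = copy.deepcopy(model.state_dict())
    for layer in model.layers[:n_frozen]:          # freeze the bottom 6 layers
        for p in layer.parameters():
            p.requires_grad_(False)
    opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad],
                          lr=lr, momentum=momentum)
    model.train()
    for _ in range(epochs):
        for x, y in val_batches:                   # identical order on every rank -> no broadcast
            loss = model(x, targets=y)             # assumes model returns the LM loss
            opt.zero_grad(set_to_none=True)
            loss.backward()
            opt.step()
    model.eval()
    return snapshot

# Usage: adapt, evaluate, then restore the original weights.
# snapshot = ttt_adapt(model, val_batches)
# val_bpb = evaluate(model)                        # hypothetical eval helper
# model.load_state_dict(snapshot)
```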
QAT was evaluated and dropped — a confirmed negative result (PR #360):
the 8% throughput penalty outweighs the regularization benefit within the 600s budget.
Artifact: ~15.64MB | Status: Draft — H100 validation pending
Smoke tests passing. 3-seed results and an ablation table will be added
before marking ready for review.