Non-record: PrismLM v3 — DiffTransformer V2 + NorMuon + TrigramHash (val_bpb=1.1715)#418
Open
yashverms wants to merge 1 commit into openai:main
…igramHash

Three novel techniques on top of PR openai#315's stack:
1. DiffTransformer V2 attention (last 2 layers) for noise-cancelled attention
2. NorMuon optimizer with per-neuron row normalization
3. TrigramHash + context-aware n-gram gating

11L/512d, XSA6, Partial RoPE, int6+zstd-22. Post-quant val_bpb=1.1715 (without sliding window eval). 8xH100, 600s, 15.59MB artifact.

Made-with: Cursor
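To make the TrigramHash idea concrete, here is a minimal sketch of how three consecutive token ids could be hashed into a fixed-size embedding table. It is an illustration only, not the code in train_gpt.py: the function names, the mixing constants, the padding id, and the table size of 2048 are all assumptions chosen for the example.

```python
# Hypothetical sketch of hashed trigram bucket ids (not taken from train_gpt.py).

def trigram_bucket(t0, t1, t2, num_buckets):
    """Map three consecutive token ids to one bucket index.

    The multipliers are arbitrary odd constants picked for illustration.
    """
    h = (t0 * 1_000_003) ^ (t1 * 998_244_353) ^ (t2 * 2_654_435_761)
    return h % num_buckets


def trigram_bucket_ids(tokens, num_buckets, pad_id=0):
    """Return one bucket id per position, using only the current token
    and the two tokens before it (so the feature stays causal)."""
    ids = []
    for i, tok in enumerate(tokens):
        prev1 = tokens[i - 1] if i >= 1 else pad_id
        prev2 = tokens[i - 2] if i >= 2 else pad_id
        ids.append(trigram_bucket(prev2, prev1, tok, num_buckets))
    return ids


# Example: bucket ids for a short token sequence with an assumed table size.
example_tokens = [17, 4, 250, 4, 250, 9]
example_ids = trigram_bucket_ids(example_tokens, num_buckets=2048)
```

Each bucket id would index a learned embedding row that is added to (or gated into) the token embedding. Distinct trigrams can collide in the same bucket, which is the usual trade-off of a hashed table against a full trigram vocabulary.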
Summary
Non-record submission exploring 3 novel techniques not yet attempted in any merged or open PR, built on the proven PR #315 technique stack.
Novel Contributions
Architecture
Results
Gap Analysis
Score is ~0.029 bpb behind the merged SOTA (1.1428). Key factors: no sliding window eval (~0.03 bpb), a small BigramHash (2048 vs 10240), NorMuon momentum of 0.95 vs the proven 0.99, and an SDPA fallback instead of Flash Attention 3. The submitted code has these issues fixed (sliding window eval re-enabled, correct 16MB decimal limit).
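Since sliding window eval is named as the largest factor, a small sketch of the general technique may help. It assumes the common formulation in which overlapping windows are fed to the model and only the tokens not scored by an earlier window count toward the loss. The function name and parameters are hypothetical and this is not the evaluation loop in train_gpt.py.

```python
# Hypothetical sketch of sliding-window evaluation spans (not from train_gpt.py).

def sliding_windows(num_tokens, window, stride):
    """Return a list of (start, end, score_from) spans.

    The model would see tokens[start:end]; only positions
    score_from..end-1 contribute to the loss, so every token is
    scored exactly once.
    """
    if not 0 < stride <= window:
        raise ValueError("stride must satisfy 0 < stride <= window")
    spans = []
    scored_upto = 0
    while scored_upto < num_tokens:
        if scored_upto == 0:
            end = min(window, num_tokens)                # first window scores all its tokens
        else:
            end = min(scored_upto + stride, num_tokens)  # later windows score `stride` new tokens
        start = max(0, end - window)
        spans.append((start, end, scored_upto))
        scored_upto = end
    return spans


# Example: 10 tokens, context window of 4, stride of 2.
example_spans = sliding_windows(num_tokens=10, window=4, stride=2)
```

After the first window, every scored token has at least `window - stride` tokens of left context. A smaller stride therefore tends to lower the measured bpb at the cost of more forward passes.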
Why This Is Interesting
Test plan
train_gpt.py
Made with Cursor