EBLS Learned Sharing (10min/16MB) by Robby955 · Pull Request #433 · openai/parameter-golf

Robby955 · 2026-03-22T15:59:58Z

Summary

Val BPB: 1.3441 (post-quant) / 1.2105 (pre-quant)
Artifact: 16,224,826 bytes (int6+zstd-22)
Compute: 8×H100 SXM, 4572 steps, 10 min wallclock

Empirical Bayes Layer Sharing (EBLS): 3 shared transformer blocks × 3 virtual layers = 9 effective layers, with per-virtual-layer rank-8 LoRA deviations gated by learned shrinkage factors γ_i = σ(logit_i).
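
The gating described above can be sketched in plain Python. This is a minimal illustration, not the submission's code. It assumes each virtual layer's effective weight is the shared weight plus a LoRA product scaled by its gate, W_i = W_shared + σ(logit_i) · B_i A_i, and that consecutive virtual layers reuse the same shared block. The helper names (effective_weight, shared_block_index) and the tiny rank-1 matrices are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function: maps a gate logit to a shrinkage factor in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def effective_weight(w_shared, lora_b, lora_a, gate_logit):
    """Return W_shared + gamma * (B @ A), with gamma = sigmoid(gate_logit).

    gamma near 0 means the virtual layer uses the shared weight unchanged;
    gamma near 1 means the full low-rank deviation is applied.
    """
    gamma = sigmoid(gate_logit)
    delta = matmul(lora_b, lora_a)
    return [
        [w + gamma * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w_shared, delta)
    ]


def shared_block_index(virtual_layer, virtual_per_block=3):
    """Map a virtual layer (0..8) to a shared block (0..2).

    ASSUMPTION: consecutive virtual layers share a block. The PR text does
    not say whether blocks are repeated consecutively or cycled.
    """
    return virtual_layer // virtual_per_block


# Toy 2x2 shared weight with a rank-1 deviation (the PR uses rank 8).
w_shared = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]   # shape (2, 1)
lora_a = [[3.0, 4.0]]     # shape (1, 2)

half_open = effective_weight(w_shared, lora_b, lora_a, gate_logit=0.0)
nearly_shared = effective_weight(w_shared, lora_b, lora_a, gate_logit=-20.0)
```

With a strongly negative logit the gate is close to 0 and the effective weight is numerically indistinguishable from the shared weight, which is the regime the reported gammas fall into.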

Key finding

The model learns its sharing pattern from data: MLP gammas converge to approximately 0 (effectively fully shared) across all virtual layers, while attention shows minimal specialization only in the first three virtual layers (γ ≤ 0.0035). This provides empirical evidence for architectural choices that other submissions make by intuition.

Virtual Layer	Attn γ	MLP γ
0	0.0035	0.0012
1	0.0013	0.0000
2	0.0012	0.0000
3–8	0.0000	0.0000
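
To get a feel for how small these gates are, the reported gammas can be inverted back to gate logits with the inverse sigmoid. This is only a reading aid and rests on one assumption taken from the summary: γ = σ(logit). The function name gate_logit is hypothetical, and rows reported as 0.0000 are skipped because a rounded zero cannot be inverted.

```python
import math


def gate_logit(gamma):
    """Inverse sigmoid: the logit that produces a given gamma in (0, 1)."""
    return math.log(gamma / (1.0 - gamma))


# Nonzero attention gammas from the table above (virtual layers 0-2).
reported_attn_gammas = {0: 0.0035, 1: 0.0013, 2: 0.0012}

for layer, gamma in reported_attn_gammas.items():
    print(
        f"virtual layer {layer}: gamma={gamma:.4f}, "
        f"logit={gate_logit(gamma):.2f}, "
        f"deviation scaled to {100 * gamma:.2f}% of its ungated size"
    )
```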

Architecture

1024-dim, 16Q/4KV heads (GQA), 3× MLP with ReLU²
SmearGate, BigramHash(10240), U-Net skip connections
Int6 STE QAT + zstd-22, Muon+Adam, SWA
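
The int6 step in the list above can be illustrated with a symmetric per-tensor quantizer. This is a sketch under assumptions, not the submission's scheme: the PR does not state the scale granularity, rounding mode, or clipping range, so the per-tensor scale and the range [-31, 31] are illustrative choices. The straight-through estimator (using quantized weights in the forward pass while passing gradients through unchanged) and the zstd compression stage are not shown.

```python
def quantize_int6(values, qmax=31):
    """Symmetric quantization of a list of floats to integer codes in [-qmax, qmax].

    ASSUMPTION: a single per-tensor scale; the real submission may use
    per-row or per-group scales.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


weights = [0.62, -0.31, 0.05, -0.004, 0.0]
codes, scale = quantize_int6(weights)
restored = dequantize(codes, scale)

# Rounding to the nearest code bounds the error by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Small weights such as -0.004 collapse to code 0 here. Rounding error of this kind is one plausible contributor to a gap between pre-quant and post-quant validation BPB.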

Technical writeup

Full method description with James-Stein statistical foundations: https://github.com/Robby955/parameter-golf-ebls

🤖 Generated with Claude Code

Empirical Bayes Layer Sharing: 3 shared blocks × 3 virtual layers with per-virtual-layer LoRA deviations gated by learned shrinkage gammas.
Val BPB: 1.3441 (post-quant) / 1.2105 (pre-quant)
Artifact: 16,224,826 bytes | 8×H100 SXM, 4572 steps, 10 min
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Robby955 wants to merge 1 commit into openai:main from Robby955:submission/ebls-learned-sharing
