records/track_10min_16mb/2026-03-22_EBLS_LearnedSharing/README.md
# EBLS Learned Sharing

**Track**: 10 min / 16 MB
**Val BPB (post-quant)**: 1.3441
**Val BPB (pre-quant)**: 1.2105
**Artifact size**: 16,224,826 bytes
**Date**: 2026-03-22

## Approach

Empirical Bayes Layer Sharing (EBLS): 3 shared transformer blocks, each applied 3×, giving 9 effective layers. Each virtual layer adds a rank-8 LoRA deviation gated by a learned shrinkage factor γ_i = σ(logit_i). Shrinkage regularization pushes toward full weight sharing unless the deviation earns its keep.
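The gated-deviation idea can be sketched in a few lines. This is a hypothetical NumPy illustration (names like `ebls_forward` are ours, not the submission's code): each virtual layer applies the shared weight plus γ_i times a rank-8 LoRA update.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ebls_forward(x, W_shared, lora_a, lora_b, gamma_logit):
    """One virtual layer: shared linear map plus a shrinkage-gated
    low-rank deviation. Sketch of the EBLS idea, not the exact code."""
    gamma = sigmoid(gamma_logit)              # learned shrinkage in (0, 1)
    delta = (x @ lora_a.T) @ lora_b.T         # rank-8 LoRA path
    return x @ W_shared.T + gamma * delta

dim, rank = 1024, 8
rng = np.random.default_rng(0)
x = rng.standard_normal((2, dim))
W = rng.standard_normal((dim, dim)) / np.sqrt(dim)
A = rng.standard_normal((rank, dim)) * 0.01
B = np.zeros((dim, rank))                     # B init to zero: deviation starts off
y = ebls_forward(x, W, A, B, gamma_logit=-4.0)
```

With `B` initialized to zero the deviation vanishes, so training starts from exact sharing and only departs where the gated LoRA path reduces loss.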

## Architecture

- **Dimension**: 1024, **Heads**: 16Q / 4KV (GQA)
- **Layers**: 3 shared blocks × 3 = 9 virtual layers
- **LoRA rank**: 8 (attention + MLP)
- **MLP**: 3× expansion with ReLU²
- **Features**: SmearGate, BigramHash(10240), U-Net skips
- **Optimizer**: Muon (WD=0.04) + Adam (LoRA, embeddings, scalars)
- **Quantization**: Int6 STE QAT + zstd-22
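The int6 QAT step above can be sketched as a fake-quantize: round weights onto the 64 signed int6 levels in the forward pass, while the straight-through estimator would pass gradients through unchanged. This is an assumed illustration; the submission's exact quantizer (scale selection, per-channel grouping) may differ.

```python
import numpy as np

def int6_ste_quantize(w, scale):
    """Fake-quantize weights to signed int6 levels in [-32, 31].
    Forward uses the rounded value; backward (STE) treats it as identity.
    Hypothetical sketch, not the submission's code."""
    q = np.clip(np.round(w / scale), -32, 31)
    return q * scale

w = np.array([-0.10, 0.012, 0.07])
w_q = int6_ste_quantize(w, scale=0.01)   # snap to multiples of 0.01
```

The quantized weights are then serialized and compressed with zstd at level 22 to fit the 16 MB artifact budget.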

## Key Finding

The model learns the degree of sharing automatically:
- MLP gammas → 0.0000 across all virtual layers (fully shared)
- Attention gammas → 0.0035 for layer 0, ~0 otherwise (minimal specialization)
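The finding can be read off the effective per-layer weight, W_eff = W_shared + γ·(B A): as γ → 0 the virtual layer collapses to the shared block exactly. A small NumPy check (shapes and γ values taken from the report; everything else is illustrative):

```python
import numpy as np

dim, rank = 1024, 8
rng = np.random.default_rng(1)
W_shared = rng.standard_normal((dim, dim)) / np.sqrt(dim)
A = rng.standard_normal((rank, dim))
B = rng.standard_normal((dim, rank))

def effective_weight(gamma):
    # W_eff = W_shared + gamma * (B @ A); gamma -> 0 recovers exact sharing
    return W_shared + gamma * (B @ A)

drift_mlp  = np.abs(effective_weight(0.0)    - W_shared).max()  # reported MLP gamma
drift_attn = np.abs(effective_weight(0.0035) - W_shared).max()  # reported attention gamma
```

With γ = 0 the MLP layers are bit-identical to the shared block, while γ = 0.0035 leaves only a tiny attention-side deviation at layer 0.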

## Reproduce

```bash
bash eval/eval.sh
```
{
"author": "Robby Sneiderman",
"github_id": "Robby955",
"name": "EBLS Learned Sharing",
"blurb": "Empirical Bayes Layer Sharing: 3 shared blocks × 3 virtual layers with per-virtual-layer LoRA deviations gated by learned shrinkage gammas. Model discovers MLP weights should be fully shared (gamma→0) while attention needs minimal specialization. 1024-dim, int6+zstd-22, Muon+Adam.",
"date": "2026-03-22T00:00:00Z",
"val_loss": 2.2694,
"val_bpb": 1.3441,
"bytes_total": 16224826,
"bytes_code": 62684
}