Commit 4b9260a
OMC 6-seed L0-vs-L1: result is inconclusive at this scale
Wins: 2/6 for L1; means essentially tied (L0=2.536, L1=2.546, +0.4%).
Per-seed delta range: -0.70 to +0.40 — very high variance.
Honest interpretation: with a 186-char corpus, 300 steps, and
d_model=16, OMC's slower pure-OMC tape arithmetic produces noisier
training endpoints than PyTorch's BLAS-backed loop. The L1-vs-L0
architectural difference doesn't dominate at this signal-to-noise ratio.
What this DOESN'T undermine: PyTorch results at TinyShakespeare scale
(L1 wins -8.0% with proper train/val split, 3/3 seeds, 1.1MB corpus,
1500 steps) — that's the load-bearing finding.
What this honestly tells us: pure-OMC training at this scale is below
the threshold where small architectural deltas show up. Detecting them
would need either (a) more OMC training compute or (b) the JIT, to make
OMC training fast enough to run in the regime where signals emerge.
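
For reference, a minimal sketch of how a paired per-seed summary like the one above (wins, means, relative change, delta range) could be computed. It assumes lower loss is better and that each seed yields one final loss per variant; the function names and the sample lists are hypothetical and are not the experiment's data. Only the two reported means (2.536 and 2.546) come from this commit.

```python
from statistics import mean, stdev

def rel_change_pct(base, new):
    """Relative change of `new` versus `base`, in percent."""
    return 100.0 * (new - base) / base

def summarize_paired(l0_losses, l1_losses):
    """Summarize paired per-seed final losses (assumes lower is better)."""
    deltas = [b - a for a, b in zip(l0_losses, l1_losses)]  # L1 minus L0
    m0, m1 = mean(l0_losses), mean(l1_losses)
    return {
        "l1_wins": sum(d < 0 for d in deltas),  # seeds where L1 beat L0
        "n_seeds": len(deltas),
        "mean_l0": m0,
        "mean_l1": m1,
        "rel_change_pct": rel_change_pct(m0, m1),
        "delta_min": min(deltas),
        "delta_max": max(deltas),
        "delta_sd": stdev(deltas),  # spread of the per-seed deltas
    }

# Check of the reported means: L0=2.536, L1=2.546
print(round(rel_change_pct(2.536, 2.546), 1))  # 0.4

# Made-up losses, purely to show the call shape (NOT measured values)
demo = summarize_paired([1.0, 2.0, 3.0], [0.5, 2.5, 3.0])
print(demo["l1_wins"], demo["n_seeds"])
```

Comparing `delta_sd` with the difference between the two means is one rough way to see whether a tie is real or just seed noise.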
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 908a57e commit 4b9260a
1 file changed
Lines changed: 16 additions & 0 deletions