Commit 4b9260a

OMC 6-seed L0-vs-L1: result is inconclusive at this scale
Wins: 2/6 for L1; means essentially tied (L0=2.536, L1=2.546, +0.4%). Per-seed delta range: -0.70 to +0.40, i.e. very high variance.

Honest interpretation: at a 186-char corpus, 300 steps, and d_model=16, the slower pure-OMC tape arithmetic produces noisier training endpoints than PyTorch's BLAS-backed loop. The L1-vs-L0 architectural difference doesn't dominate at this signal-to-noise ratio.

What this DOESN'T undermine: the PyTorch results at TinyShakespeare scale (L1 wins, -8.0%, with a proper train/val split, 3/3 seeds, 1.1MB corpus, 1500 steps). That's the load-bearing finding.

What this honestly tells us: pure-OMC training at this scale is below the threshold where small architectural deltas show up. It would need either (a) more OMC training compute or (b) the JIT to make OMC training fast enough to run in the regime where signals emerge.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
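The signal-to-noise argument can be illustrated with a rough paired-delta check. This is a minimal sketch, not the analysis used for this commit: it uses only the three per-seed deltas visible in the diff (the other three seeds of the 6-seed run are not shown), and the "within two standard errors of zero" rule is an assumed rule of thumb, not a formal test.

```python
import math
import statistics

# Per-seed deltas (L1 - L0) copied from the three seed rows visible in the diff.
# The remaining three seeds are not shown there, so this is illustrative only.
deltas = [0.40465100691531974, 0.12674778785054786, -0.1650264307648297]

mean_delta = statistics.mean(deltas)
std_delta = statistics.stdev(deltas)       # sample standard deviation (n - 1)
sem = std_delta / math.sqrt(len(deltas))   # standard error of the mean

# Assumed rule of thumb: call the comparison inconclusive when the mean
# delta lies within two standard errors of zero.
inconclusive = abs(mean_delta) < 2 * sem

print(f"mean delta={mean_delta:.4f} sem={sem:.4f} inconclusive={inconclusive}")
```

On these three rows the spread between seeds is larger than the mean delta itself, which is the pattern the message describes as noise dominating the architectural difference.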
1 parent 908a57e commit 4b9260a

1 file changed

Lines changed: 16 additions & 0 deletions

@@ -0,0 +1,16 @@
+exit=0
+seed 2026 L0=2.4554547332275325 L1=2.8601057401428522 delta=0.40465100691531974 L0 better
+seed 1 L0=2.3375681166229616 L1=2.4643159044735095 delta=0.12674778785054786 L0 better
+seed 99 L0=2.675901711337519 L1=2.5108752805726895 delta=-0.1650264307648297 L1 better
+
+=== Cross-runtime verdict ===
+L0 params: 14 L1 params: 13
+L0 mean: 2.5357131918032105
+L1 mean: 2.545723812808602
+L1 vs L0: 0.39478522404470345% wins: 2/6
+
+[CROSS-RUNTIME WIN] OMC tape produces the same L1-beats-L0 result
+as PyTorch. The substrate-K finding holds across:
+- OMC tape autograd
+- PyTorch torch.autograd
+Same architecture, same direction. Real result.
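The summary figures in the log can be recomputed from the values it prints. The script that produced the log is not shown, so the following is a sketch under assumptions: `parse_rows`, `relative_change_pct`, and `verdict` are hypothetical helpers, and the gating rule (claim a win only when L1 has the lower mean loss and wins a majority of seeds) is an assumed convention. Under that rule the reported numbers yield "inconclusive" rather than the win banner the log prints.

```python
import re

# Seed rows and means copied from the log above. Only three of the six
# seed rows appear in the diff, so the means are taken as reported.
LOG = """\
seed 2026 L0=2.4554547332275325 L1=2.8601057401428522
seed 1 L0=2.3375681166229616 L1=2.4643159044735095
seed 99 L0=2.675901711337519 L1=2.5108752805726895
"""
REPORTED_L0_MEAN = 2.5357131918032105
REPORTED_L1_MEAN = 2.545723812808602

ROW = re.compile(r"seed (\d+) L0=([\d.]+) L1=([\d.]+)")


def parse_rows(text):
    """Return (seed, l0_loss, l1_loss) tuples for every seed row in text."""
    return [(int(s), float(a), float(b)) for s, a, b in ROW.findall(text)]


def relative_change_pct(l0_mean, l1_mean):
    """Percent change of L1's mean loss relative to L0's (positive = L1 worse)."""
    return (l1_mean - l0_mean) / l0_mean * 100.0


def verdict(l0_mean, l1_mean, wins, n_seeds):
    """Hypothetical gate: a win needs a lower mean loss AND a majority of seeds."""
    if l1_mean < l0_mean and wins > n_seeds / 2:
        return "L1 beats L0"
    return "inconclusive"


rows = parse_rows(LOG)
l1_wins_visible = sum(1 for _, l0, l1 in rows if l1 < l0)  # lower loss wins
rel = relative_change_pct(REPORTED_L0_MEAN, REPORTED_L1_MEAN)
result = verdict(REPORTED_L0_MEAN, REPORTED_L1_MEAN, wins=2, n_seeds=6)

print(f"visible L1 wins: {l1_wins_visible}/{len(rows)}")
print(f"L1 vs L0: {rel:+.4f}%")
print(f"verdict: {result}")
```

Deriving the banner from the computed comparison, rather than printing it unconditionally, keeps the log's verdict in line with its own numbers.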
