OMC 6-seed L0-vs-L1: result is inconclusive at this scale

RandomCoder-lab · claude · RandomCoder-lab · commit 4b9260a0ecbc · 2026-05-17T01:02:54.000-05:00
Wins: 2/6 for L1; mean losses essentially tied (L0=2.536, L1=2.546; L1 is 0.4% higher, lower is better).
Per-seed delta (L1 - L0) ranges from -0.70 to +0.40, so seed-to-seed variance swamps the mean difference.
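
The committed log still prints a [CROSS-RUNTIME WIN] banner, which the
summary numbers above do not support. A minimal sketch of a verdict gate
that would be consistent with those numbers follows. The function name,
the 1% threshold, and the majority rule are assumptions for illustration,
not the logic of the actual script.

```python
def verdict(l0_mean, l1_mean, l1_wins, n_seeds, min_rel_pct=1.0):
    """Classify an L0-vs-L1 comparison; lower loss is better.

    min_rel_pct is a hypothetical threshold: the relative difference in
    mean loss (in percent) required before declaring a winner.
    """
    rel_pct = 100.0 * (l1_mean - l0_mean) / l0_mean
    if rel_pct <= -min_rel_pct and l1_wins > n_seeds / 2:
        return "L1 better"
    if rel_pct >= min_rel_pct and l1_wins < n_seeds / 2:
        return "L0 better"
    return "inconclusive"


# Summary values copied from the log in this commit.
print(verdict(2.5357131918032105, 2.545723812808602, l1_wins=2, n_seeds=6))
```

Requiring both a minimum relative difference and a seed majority keeps a
near-tie like this one from being reported as a win in either direction.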

Honest interpretation: with a 186-char corpus, 300 steps, and d_model=16,
OMC's slower tape arithmetic yields noisier training endpoints than
PyTorch's BLAS-backed loop, and the L1-vs-L0 architectural difference
is not large enough to show through at this signal-to-noise ratio.
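
One way to make the signal-to-noise point concrete is to compare the mean
per-seed delta with its standard error. The sketch below assumes paired
runs (same seed for L0 and L1) and uses only the three seeds visible in
the log excerpt, so its numbers are illustrative and will not match the
6-seed means. The helper name paired_summary is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_summary(pairs):
    """Summarize paired (l0_loss, l1_loss) results; lower loss is better."""
    deltas = [l1 - l0 for l0, l1 in pairs]
    return {
        "n": len(deltas),
        "l1_wins": sum(d < 0 for d in deltas),
        "mean_delta": mean(deltas),
        # Standard error of the mean delta (sample stdev / sqrt(n)).
        "sem_delta": stdev(deltas) / math.sqrt(len(deltas)),
    }


# Only the three seeds shown in the log excerpt (2026, 1, 99).
pairs = [
    (2.4554547332275325, 2.8601057401428522),
    (2.3375681166229616, 2.4643159044735095),
    (2.675901711337519, 2.5108752805726895),
]
summary = paired_summary(pairs)
print(summary)
# A mean delta smaller than its standard error is indistinguishable from noise.
print("within noise:", abs(summary["mean_delta"]) < summary["sem_delta"])
```

With so few seeds the standard error is only a rough guide, but it shows
why a mean difference this small cannot be read as a direction.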

What this DOESN'T undermine: the PyTorch results at TinyShakespeare scale
(L1 wins, -8.0%, with proper train/val split, 3/3 seeds, 1.1MB corpus,
1500 steps). That remains the load-bearing finding.

What this honestly tells us: pure-OMC training at this scale is below
the threshold where small architectural deltas show up. Detecting them
would need either (a) more OMC training compute or (b) the JIT, to make
OMC training fast enough to run in the regime where signals emerge.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
diff --git a/experiments/prometheus_parity/omc_6seed_L0L1.log b/experiments/prometheus_parity/omc_6seed_L0L1.log
@@ -0,0 +1,16 @@
+exit=0
+seed 2026  L0=2.4554547332275325  L1=2.8601057401428522  delta=0.40465100691531974  L0 better
+seed 1  L0=2.3375681166229616  L1=2.4643159044735095  delta=0.12674778785054786  L0 better
+seed 99  L0=2.675901711337519  L1=2.5108752805726895  delta=-0.1650264307648297  L1 better
+
+=== Cross-runtime verdict ===
+L0 params: 14  L1 params: 13
+L0 mean: 2.5357131918032105
+L1 mean: 2.545723812808602
+L1 vs L0: 0.39478522404470345%   wins: 2/6
+
+[CROSS-RUNTIME WIN] OMC tape produces the same L1-beats-L0 result
+                    as PyTorch. The substrate-K finding holds across:
+                      - OMC tape autograd
+                      - PyTorch torch.autograd
+                    Same architecture, same direction. Real result.