Commit 5e38eaa
transformerless_lm: lazy_K_active subsampling — real compute savings, scale-bound
Added lazy_K_active parameter to FibGenLinear and SubsimLM. At each
training step, samples K_active < K Fibonacci frequencies per axis
(always including the substrate-tier-1 anchor at index 0) and uses
ONLY those for the compressed-forward computation. Inner mixing
matmul shrinks from K^2 to K_active^2; projections shrink from K to
K_active. At eval, all K frequencies are used.
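The per-step sampling described above could be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the function name `sample_active_indices`, its signature, and the use of Python's `random` module are assumptions, and it covers a single axis only. Treating `K_active=0` as "disabled" follows the baseline row in the results below.

```python
import random


def sample_active_indices(K, K_active, rng, training=True):
    """Pick the frequency indices to use for one forward pass on one axis.

    Hypothetical helper for illustration; not taken from the repository.
    """
    # At eval, or when subsampling is disabled (K_active <= 0) or would not
    # shrink anything (K_active >= K), use all K frequencies.
    if not training or K_active <= 0 or K_active >= K:
        return list(range(K))
    # Index 0 (the anchor) is always kept; the remaining K_active - 1
    # indices are drawn without replacement from 1..K-1.
    rest = rng.sample(range(1, K), K_active - 1)
    return sorted([0] + rest)


rng = random.Random(0)
active = sample_active_indices(K=32, K_active=8, rng=rng)
# `active` holds 8 distinct indices in [0, 32), always including 0;
# a real layer would index its frequency-dependent tensors with them.
```

Drawing a fresh subset on every call is what makes each training step see a different set of frequencies.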
This is REAL compute savings (not just gradient masking like the
tier_lr_scale variant), but the per-step Python overhead from the
indexing + random-subset sampling currently masks the savings at
small d_model:
d=128, K=32, 200 training steps:
  baseline (K_active=0):  val=2.98   wall=7.01s
  K_active=16:            val=9.33   wall=7.66s  (9% slower)
  K_active=8:             val=18.97  wall=7.32s  (4% slower)
  K_active=4:             val=22.34  wall=6.75s  (4% faster)
Two issues at this scale:
(a) Wall-clock savings are absorbed by PyTorch indexing overhead.
By the K^2 -> K_active^2 scaling, K_active=8 should make the inner
mixing matmul 16x cheaper (and the projections 4x cheaper), but
matmul is a small fraction of total cost at d=128.
(b) Quality breaks because each step picks a different random
subset -- the model is effectively training a different small
subnetwork every step, preventing any one component from
accumulating signal.
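A quick back-of-the-envelope check of point (a), taking the K^2 -> K_active^2 (mixing) and K -> K_active (projection) scalings stated earlier at face value. The helper names and the matmul fraction `f` are hypothetical; the commit does not report how much of a step is spent in matmul, so the value used here is illustrative only.

```python
def subsample_cost_ratios(K, K_active):
    """Theoretical cost reduction factors from using K_active of K frequencies."""
    mixing = (K * K) / (K_active * K_active)  # inner mixing matmul: K^2 -> K_active^2
    projection = K / K_active                 # projections: K -> K_active
    return mixing, projection


def overall_speedup(f, s):
    """Amdahl-style bound: a fraction f of step time is sped up by factor s."""
    return 1.0 / ((1.0 - f) + f / s)


mixing, projection = subsample_cost_ratios(K=32, K_active=8)
# mixing == 16.0, projection == 4.0

# Assumed, not measured: if only 10% of a step were mixing matmul,
# a 16x reduction there would give roughly a 1.10x end-to-end speedup,
# before any added indexing/sampling overhead is counted.
bound = overall_speedup(f=0.10, s=mixing)
```

When the accelerated fraction is small, the end-to-end gain stays close to 1x however large the per-op reduction is, which is consistent with the flat wall-clock numbers at d=128.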
The user expected "significantly faster". Honest assessment: at
d=128 we cannot see this; matmul cost is dominated by overhead.
At d=1024+ (LLM scale) the K^2 -> K_active^2 savings would manifest
as real wall-clock wins because matmul FLOPs dominate.
To deliver significant speed AT this scale we should compose
Stochastic Fibonacci Depth (block-skipping) with lazy-loading data.
K-subsampling itself should be validated at larger d_model.
1 parent 966f4a6 commit 5e38eaa
2 files changed
Lines changed: 57 additions & 11 deletions