fix: support quantized MLX safetensors in extract-index #27

Draft
awnion wants to merge 1 commit into chrishayuk:main from awnion:fix/mlx-affine-extract-index

awnion commented Apr 21, 2026

Unsloth Gemma 4 MLX checkpoints store packed U32 weights with separate scales and biases, so extraction skipped embeddings and failed on Apple Silicon. Dequantize those tensors during loading and streaming extraction so quantized MLX models can build vindexes.

E.g., I tried this:

RUST_BACKTRACE=full cargo run -r --bin larql -- extract-index unsloth/gemma-4-E4B-it-UD-MLX-4bit -o gemma4-e4b.vindex --level inference --f16

And it didn't work :)
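
For reviewers unfamiliar with the layout, here is a minimal sketch of the affine dequantization described above. It is not the code in this PR. It assumes the MLX convention that each U32 word packs `32 / bits` values starting from the least significant bits, and that every `group_size` consecutive values in a row share one scale and one bias, so each weight is recovered as `scale * q + bias`. The name `dequantize_row` and its parameters are hypothetical; the bit width and group size should be read from the checkpoint's config rather than hard-coded.

```rust
/// Hypothetical helper: dequantize one row of affine-quantized weights.
///
/// Assumptions (not taken from the PR):
/// - `packed` holds `32 / bits` values per u32, lowest bits first;
/// - `scales[g]` and `biases[g]` apply to the g-th run of `group_size` values;
/// - scales and biases were already converted to f32 by the caller.
fn dequantize_row(
    packed: &[u32],
    scales: &[f32],
    biases: &[f32],
    bits: u32,
    group_size: usize,
) -> Vec<f32> {
    // Only widths that divide 32 evenly are handled in this sketch.
    assert!(matches!(bits, 2 | 4 | 8), "unsupported bit width: {bits}");
    assert!(group_size > 0, "group_size must be non-zero");

    let per_word = (32 / bits) as usize;
    let n = packed.len() * per_word;
    assert_eq!(n % group_size, 0, "row length must be a multiple of group_size");
    assert_eq!(scales.len(), n / group_size, "one scale per group expected");
    assert_eq!(biases.len(), scales.len(), "one bias per group expected");

    let mask = (1u32 << bits) - 1;
    let mut out = Vec::with_capacity(n);
    for (w, &word) in packed.iter().enumerate() {
        for j in 0..per_word {
            // Extract the j-th quantized value from this word.
            let q = (word >> (j as u32 * bits)) & mask;
            // Index of the group this output element belongs to.
            let g = (w * per_word + j) / group_size;
            out.push(scales[g] * q as f32 + biases[g]);
        }
    }
    out
}
```

For example, with `bits = 4` and a single group of 8, the word `0x76543210` with scale 0.5 and bias -1.0 decodes to -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5. Bit widths that do not divide 32 straddle word boundaries and are not handled by this sketch.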

@awnion awnion marked this pull request as draft April 21, 2026 11:53

awnion commented Apr 21, 2026

For now I have something like this. I guess something is off :) Any suggestions?


❯ cargo run -r --bin larql -- repl
    Finished `release` profile [optimized] target(s) in 0.14s


   ╦   ╔═╗ ╦═╗ ╔═╗ ╦
   ║   ╠═╣ ╠╦╝ ║═╬╗║
   ╩═╝ ╩ ╩ ╩╚═ ╚═╝╚╩═╝
   Lazarus Query Language v0.1

larql> USE "gemma4-e4b.vindex";
Using: gemma4-e4b.vindex (42 layers, 430.1K features, model: unsloth/gemma-4-E4B-it-UD-MLX-4
larql> DESCRIBE "France";
France
  signal: clean (243 edges, max gate 28.4)
  Syntax (L0-15):
                 → adero                    9.7  L8
                 → printed                  8.5  L5
                 → worn                     7.9  L9
                 → Treat                    7.9  L13
                 → ayin                     7.7  L10
                 → iint                     7.7  L8
                 → exercise                 7.5  L10
                 → wirelessly               7.5  L5
                 → themed                   7.5  L13
                 → Satt                     7.2  L1
  Edges (L16-32):
                 → INSEE                   22.9  L30
                 → Philippines             22.2  L31
                 → itiva                   19.9  L32
                 → Olson                   19.7  L31
                 → glo                     11.8  L25
                 → manifiesta              11.7  L30
                 → demais                  10.2  L29
                 → Germany                  9.3  L32
                 → viên                     9.0  L30
                 → working                  8.8  L21
  Output (L33-41):
                 → French                  28.4  L34
                 → lichem                  25.7  L35
                 → German                  20.4  L33
                 → ophageal                19.9  L41
                 → India                   13.4  L33
larql> INFER "The capital of France is" TOP 3;
Predictions (walk FFN):
   1. mutable              (64.18%)
   2. ceral                (16.51%)
   3. ッケ                   (9.41%)
  16439ms

Inference trace (features that fired with attention):
  L 0:  F423   gate=+8.7  → res             [res, Peng, BN]
  L 0:  F2512  gate=-8.5  → coup            [coup, oga, gem]
  L 0:  F7229  gate=-8.3  → belly           [belly, 荒, hostel]
  L 1:  F5660  gate=+75.5  → odz             [odz, lés, activer]
  L 1:  F7661  gate=+65.2  → 싶은              [싶은, منى, 站]
  L 1:  F210   gate=+57.2  → are             [are, estavam, estén]
  L 2:  F3237  gate=+64.4  → ist             [ist, beträgt, promos]
  L 2:  F1408  gate=+62.3  → acl             [acl, wer, clothe]
  L 2:  F10207 gate=+61.7  → improv          [improv, 마찬가지, başlı]
  L 3:  F2297  gate=+41.2  → 今は              [今は, firing, accounts]
  L 3:  F7498  gate=+37.3  → swering         [swering, xiety, sefer]
  L 3:  F9979  gate=+35.9  → ЕНИЕ            [ЕНИЕ, নি, ];]
  L 4:  F2406  gate=+52.5  → scala           [scala, ス, ა]
  L 4:  F3838  gate=+42.0  → MgCl            [MgCl, Cola, 肥]
  L 4:  F5309  gate=+39.6  → vs              [vs, ]))., hus]
  L 5:  F5837  gate=+280.1  → と思いました          [と思いました, redir, bang]
  L 5:  F839   gate=+196.1  → ビッグ             [ビッグ, sizes, oids]
  L 5:  F3742  gate=+192.2  → hame            [hame, Readers, least]
  L 6:  F678   gate=+177.6  → NavigationView  [NavigationView, Rum, rê]
  L 6:  F778   gate=+109.2  → ご紹介             [ご紹介, そこ, ATIONAL]
  L 6:  F6145  gate=+98.1  → imidazol        [imidazol, dile, estimations]
  L 7:  F933   gate=+127.0  → ría             [ría, 너지, ometric]
  L 7:  F4284  gate=+88.3  → Tracks          [Tracks, tracks, Mirrors]
  L 7:  F8094  gate=+85.9  → temper          [temper, toa, مرات]
  L 8:  F5297  gate=+64.0  → profiles        [profiles, Playing, cái]
  L 8:  F7276  gate=+62.7  → Eventually      [Eventually, 씨, await]
  L 8:  F10153 gate=+54.8  → itte            [itte, abges, 춤]
  L 9:  F7000  gate=+116.4  → อน             [อน, ைகளையும, なのです]
  L 9:  F9579  gate=+88.2  → したもの            [したもの, 样的, API]
  L 9:  F4925  gate=+85.9  → nên             [nên, зра, ]
  L10:  F6072  gate=+175.1  → ট               [ট, वन, ofo]
  L10:  F4558  gate=+163.9  → ลอง             [ลอง, литься, rant]
  L10:  F2541  gate=+163.5  → rees            [rees, oriente, อะ]
  L11:  F832   gate=-107.3  → kaha            [kaha, portant, молодо]
  L11:  F6064  gate=+93.0  → up              [up, etan, 半分]
  L11:  F187   gate=+90.5  → واكب            [واكب, oban, 与其他]
  L12:  F9588  gate=-89.0  → shp             [shp, Material, misa]
  L12:  F7143  gate=-85.3  → 踊               [踊, electric, parat]
  L12:  F725   gate=-84.9  → optimistic      [optimistic, Weise, व]
  L13:  F8448  gate=+38.8  → alike           [alike, elseif, amazon]
  L13:  F6169  gate=+38.8  → MouseButton     [MouseButton, iml, gn]
  L13:  F834   gate=-38.4  → socket          [socket, disp, 까요]
  L14:  F5588  gate=-13.4  → アレンジ            [アレンジ, ملک, 博客]
  L14:  F7561  gate=+13.2  → しまい             [しまい, ammad, romb]
  L14:  F9834  gate=+13.0  → ###             [###, backs, Sears]
  L15:  F2385  gate=+10.2  → 표               [표, ثار, èse]
  L15:  F1427  gate=+7.8  → 的文章             [的文章, 잘, の記事]
  L15:  F9592  gate=+7.2  → Monster         [Monster, profile, Profile]
  L16:  F5313  gate=-5.5  → Spur            [Spur, Speaking, 凵]
  L16:  F3901  gate=-4.5  → repeal          [repeal, knowing, گرفته]
  L16:  F9584  gate=-3.9  → pellets         [pellets, , ComponentName]
  L17:  F2092  gate=-5.0  → dumps           [dumps, lr, kcal]
  L17:  F5005  gate=-4.7  → etur            [etur, exclusiva, プローチ]
  L17:  F3082  gate=-4.6  → 일까지             [일까지, ટર, icom]
  L18:  F7891  gate=+10.8  → oscope          [oscope, arle, khẩu]
  L18:  F5903  gate=-10.6  → dul             [dul, ዱ, fandom]
  L18:  F7971  gate=-8.8  → 습               [습, уве, ხ]
  L19:  F3126  gate=-15.5  → igkeits         [igkeits, indef, imper]
  L19:  F6237  gate=+15.0  → निल             [निल, 純正, пон]
  L19:  F731   gate=-14.4  → 애               [애, libre, DONE]
  L20:  F4333  gate=+28.7  → Crown           [Crown, crown, Crown]
  L20:  F2972  gate=+26.9  → CCCCCCCC        [CCCCCCCC, sm, VISA]
  L20:  F3970  gate=+24.3  → leau            [leau, большую, interi]
  L21:  F7695  gate=+60.8  → ikka            [ikka, ydia, aniline]
  L21:  F8553  gate=+58.3  → перево          [перево, 용, capitalize]
  L21:  F3934  gate=+56.4  → hark            [hark, ádz, """)]
  L22:  F4849  gate=+39.4  → ẫ               [ẫ, рей, phal]
  L22:  F1723  gate=-39.1  → éta             [éta, ージ, टो]
  L22:  F5544  gate=-37.8  → 관한              [관한, जिसन, 日まで]
  L23:  F2559  gate=+28.6  → 积               [积, Clo, ชม]
  L23:  F3201  gate=-28.3  → бры             [бры, cstdlib, breeding]
  L23:  F3171  gate=-27.9  → ogne            [ogne, ब, Quan]
  L24:  F5890  gate=-10.0  → ह              [ह, ós, 홉]
  L24:  F3744  gate=+8.8  → ophil           [ophil, hatched, bang]
  L24:  F5329  gate=-6.5  → lun             [lun, vet, ofe]
  L25:  F7044  gate=+4.0  → 计数              [计数, เร, matters]
  L25:  F403   gate=+3.0  → 忍               [忍, واء, फरम]
  L25:  F1844  gate=+3.0  → Lastly          [Lastly, Finally, Lastly]
  L26:  F3237  gate=-4.1  → éc              [éc, app, alt]
  L26:  F1970  gate=+3.3  → told            [told, Telling, vg]
  L26:  F9685  gate=-3.1  → fors            [fors, forever, forever]
  L27:  F813   gate=-3.7  → từng            [từng, dom, Zul]
  L27:  F9578  gate=+3.4  → गढ             [गढ, حد, 有效]
  L27:  F7877  gate=-3.3  → 那就是             [那就是, नह, とともに]
  L28:  F173   gate=-5.7  → upon            [upon, бя, upon]
  L28:  F135   gate=-4.5  → Flo             [Flo, Adel, flashes]
  L28:  F2876  gate=+4.3  → 〝               [〝, 数百, 时间内]
  L29:  F9969  gate=+4.1  → chacune         [chacune, justru, briefings]
  L29:  F553   gate=+4.0  → उप              [उप, ymes, ênt]
  L29:  F1798  gate=-3.3  → 이고              [이고, しかも, Artinya]
  L30:  F3088  gate=+5.9  → viên            [viên, 者, 者を]
  L30:  F2025  gate=-5.4  → your            [your, our, their]
  L30:  F7414  gate=+5.3  → وں              [وں, रील, тні]
  L31:  F9380  gate=-4.8  → wearing         [wearing, Wearing, wears]
  L31:  F9746  gate=+4.8  → the             [the, the, these]
  L31:  F2381  gate=-4.2  → eben            [eben, ไง, ไง]
  L32:  F3384  gate=+6.8  → antera          [antera, ariously, uster]
  L32:  F5255  gate=+6.3  → ings            [ings, ing, ies]
  L32:  F399   gate=+6.1  → Against         [Against, iyet, itto]
  L33:  F6097  gate=+5.7  → packaging       [packaging, Packaging, Packaging]
  L33:  F5647  gate=+5.5  → Continuing      [Continuing, continuing, Continuing]
  L33:  F2780  gate=+5.4  → iben            [iben, benzyl, )-\]
  L34:  F5590  gate=+6.2  → と思った            [と思った, }))$, ")).]
  L34:  F2229  gate=+6.1  → infl            [infl, repeats, variable]
  L34:  F7     gate=+6.1  → agnet           [agnet, coco, trit]
  L35:  F2025  gate=+4.4  → Existe          [Existe, DICT, டடும]
  L35:  F7871  gate=+4.1  → 했어요             [했어요, የሆነ, соотно]
  L35:  F5319  gate=-4.1  → де              [де, ディ, デ]
  L36:  F3177  gate=-5.3  → mission         [mission, missions, mission]
  L36:  F1557  gate=-3.9  → 最有              [最有, 中最, 物的]
  L36:  F446   gate=-3.5  → on              [on, ऑन, på]
  L37:  F7872  gate=-3.0  → yellow          [yellow, black, blue]
  L37:  F3113  gate=+2.8  → erz             [erz, 등장, ยว]
  L37:  F9715  gate=+2.8  → other           [other, other, Sag]
  L38:  F8386  gate=-3.1  → 드               [드, dr, डर]
  L38:  F1755  gate=-2.9  → きち              [きち, 아요, obarb]
  L38:  F3827  gate=-2.9  → show            [show, Show, show]
  L39:  F9739  gate=+3.6  → 토               [토, 토, то]
  L39:  F7189  gate=+3.2  → CC              [CC, CC, cc]
  L39:  F3860  gate=-2.9  → para            [para, para, Para]
  L40:  F10223 gate=-4.3  → rceil           [rceil, 一定的, unittest]
  L40:  F2801  gate=-3.4  → ,               [,, ،, ፣]
  L40:  F7710  gate=+3.3  → การ             [การ, LA, U]
  L41:  F1855  gate=-4.5  → F               [F, F, f]
  L41:  F4211  gate=-4.0  → hand            [hand, Hand, Hand]
  L41:  F4917  gate=-3.6  → af              [af, Aff, aff]
