extract-index and convert safetensors-to-vindex panic with "could not broadcast array from shape: [2560] to: [2048]" when there is no GGUF.
~/git/larql-fix-gguf-shape
❯ larql extract-index google/ -o gemma3-4b.vindex --f16
Extracting: google/ → gemma3-4b.vindex (level=browse, dtype=f16)
── loading ──
Streaming mode: 2 safetensors shards (mmap'd, not loaded)
loading: 0.0s
── gate_vectors ──
gate L 0: 0.2s
gate L 1: 0.1s
gate L 2: 0.1s
gate L 3: 0.1s
gate L 4: 0.1s
gate L 5: 0.1s
gate L 6: 0.1s
gate L 7: 0.1s
gate L 8: 0.1s
gate L 9: 0.2s
gate L10: 0.1s
gate L11: 0.2s
gate L12: 0.1s
gate L13: 0.1s
gate L14: 0.1s
gate L15: 0.1s
gate L16: 0.1s
gate L17: 0.1s
gate L18: 0.1s
gate L19: 0.1s
gate L20: 0.1s
gate L21: 0.1s
gate L22: 0.1s
gate L23: 0.1s
gate L24: 0.1s
gate L25: 0.1s
gate L26: 0.1s
gate L27: 0.1s
gate L28: 0.1s
gate L29: 0.1s
gate L30: 0.1s
gate L31: 0.1s
gate_vectors: 4.6s
── embeddings ──
embeddings: 3.8s
── down_meta ──
thread 'main' (767580) panicked at /home/deric/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/ndarray-0.16.1/src/lib.rs:1551:13:
ndarray: could not broadcast array from shape: [2560] to: [2048]
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
OS: CachyOS x86_64
Host: 82JQ (Legion 5 Pro 16ACH6H)
Kernel: Linux 6.19.9-1-cachyos
Shell: fish 4.5.0
CPU: AMD Ryzen 7 5800H (16) @ 3.20 GHz
GPU: NVIDIA GeForce RTX 3070 Mobile / Max-Q [Discrete]
Memory: 5.67 GiB / 31.19 GiB (18%)
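
I have not traced the call site, so this is only a sketch of the kind of guard that would make the failure easier to diagnose. The panic message comes from ndarray's broadcasting path, which operations such as `assign` go through when the source and destination shapes differ. The helper below uses plain std slices as a stand-in for the ndarray rows, and `assign_row` and its `tensor` label are hypothetical names, not anything that exists in larql. It reports both lengths in an error instead of panicking:

```rust
/// Hypothetical helper (not part of larql): copy one vector into a
/// destination row, returning an error on a length mismatch instead of
/// panicking the way an unchecked assignment would.
fn assign_row(dst: &mut [f32], src: &[f32], tensor: &str) -> Result<(), String> {
    // Validate the shapes up front so the caller learns which side is wrong.
    if dst.len() != src.len() {
        return Err(format!(
            "{}: source has {} elements but destination row has {}",
            tensor,
            src.len(),
            dst.len()
        ));
    }
    // Lengths match, so this copy cannot panic.
    dst.copy_from_slice(src);
    Ok(())
}
```

With a 2560-element source and a 2048-element destination, this returns an `Err` naming both sizes and leaves the destination untouched, which would point at whichever dimension was derived incorrectly.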