`--ssd-streaming` broken for Flash q4-imatrix: forward pass hits model ranges not covered by mapped views (q2-imatrix works) #341

--ssd-streaming works on Flash q2-imatrix but fails on Flash q4-imatrix (M5 Max, latest main).

Same machine and flags; only the model differs.

q2-imatrix: runs fine (coherent output), prefill 3.54 t/s / gen 13.19 t/s.
q4-imatrix: fails immediately with:
  non-routed weights: 8.20 GiB, routed expert 13.50 MiB, cached 5902 (77.81 GiB)
  initial model map restricted to token embedding (0.99 GiB)
  Metal model range 0.01..3.39 GiB is not covered by mapped model views
  prompt processing failed: metal prefill failed

The same failure occurs with --ssd-streaming-cold and with --ssd-streaming-cache-experts 16GB.
The streaming path itself works (q2 runs), so this looks specific to the q4
tensor layout (Q4_K experts + F16 indexer/compressor/HC): for q4, the mapped views
never cover the non-routed F16 tensors at low offsets (0..3.4 GiB).
Is Flash q4 intended to be supported by --ssd-streaming, or is it PRO/q4-layout WIP?
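
To make the suspected failure mode concrete, here is a minimal sketch of the kind of coverage check the error message implies: given a byte range the forward pass wants to read and the list of mapped views, report which parts no view covers. This is not the project's code. The function name `uncovered_gaps`, the half-open range convention, and the offsets in the example are all assumptions for illustration; in particular, the log does not say where the token embedding view starts, so placing it at offset 0 is a guess.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open [start, end) byte offsets into the model file


def uncovered_gaps(request: Range, views: List[Range]) -> List[Range]:
    """Return the parts of `request` that no range in `views` covers."""
    start, end = request
    gaps: List[Range] = []
    cursor = start  # everything in [start, cursor) is known to be covered or reported
    for v_start, v_end in sorted(views):
        if v_end <= cursor:
            continue  # view ends before the first unchecked byte
        if v_start >= end:
            break  # views are sorted by start, so the rest are past the request
        if v_start > cursor:
            gaps.append((cursor, v_start))  # hole between cursor and this view
        cursor = max(cursor, v_end)
        if cursor >= end:
            break
    if cursor < end:
        gaps.append((cursor, end))  # tail of the request is unmapped
    return gaps


MiB = 1 << 20

# Hypothetical layout: a single mapped view of roughly 0.99 GiB, assumed to
# start at offset 0 (the real offset is not shown in the log).
views = [(0, 1014 * MiB)]

# Requested range of roughly 0.01..3.39 GiB, as in the error message.
request = (10 * MiB, 3471 * MiB)

for gap_start, gap_end in uncovered_gaps(request, views):
    print(f"unmapped: {gap_start / (1 << 30):.2f}..{gap_end / (1 << 30):.2f} GiB")
```

Under these assumed offsets, most of the requested range falls outside the only mapped view, which would be consistent with the non-routed F16 tensors never being added to the map for the q4 layout.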
