Summary
Gemma 4 models (e.g., gemma-4-e2b-it) cannot be served because vllm-mlx 0.2.6's bundled mlx_lm (v0.31.1) does not include a gemma4 model architecture module.
Error
ModuleNotFoundError: No module named 'mlx_lm.models.gemma4'
Full traceback from mlx-stack logs:
File ".../mlx_lm/utils.py", line 176, in _get_classes
arch = importlib.import_module(f"mlx_lm.models.{model_type}")
ModuleNotFoundError: No module named 'mlx_lm.models.gemma4'
ValueError: Model type gemma4 not supported.
ERROR: Application startup failed. Exiting.
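The traceback shows how mlx_lm resolves architectures: it imports `mlx_lm.models.<model_type>` by name, so an unknown type surfaces as a ModuleNotFoundError. A minimal probe for this (a sketch, not part of the project; the helper name is mine) looks like:

```python
import importlib.util

def module_exists(name: str) -> bool:
    """Return True if `name` resolves to an importable module."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. mlx_lm itself) is not installed.
        return False

# In the failing environment, module_exists("mlx_lm.models.gemma4") is False,
# which is exactly what _get_classes trips over.
```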
Catch-22 with #17
This creates a dependency deadlock:
- vllm-mlx is pinned to 0.2.6 because of #17 (vllm-mlx continuous batching disabled due to upstream bug, waybarrios/vllm-mlx#211).
- 0.2.6 bundles mlx_lm 0.31.1, which predates the gemma4 architecture, so load_model_with_fallback() fails with "Model type gemma4 not supported."
- Gemma 4 support is blocked until vllm-mlx ships a version that both fixes the 0.2.7 regression AND bundles a newer mlx_lm with gemma4 support.
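The bind can be stated mechanically. The capability flags below are assumptions for illustration (whether 0.2.7 actually bundles a gemma4-aware mlx_lm is not confirmed here); the point is that no current release satisfies both constraints at once:

```python
# Assumed capability matrix for the two known vllm-mlx releases.
releases = {
    "0.2.6": {"works": True,  "has_gemma4_mlx_lm": False},  # pinned today
    "0.2.7": {"works": False, "has_gemma4_mlx_lm": True},   # regression, see #17
}

def viable_releases(matrix):
    # A release unblocks Gemma 4 only if it is both usable and gemma4-aware.
    return [v for v, f in matrix.items() if f["works"] and f["has_gemma4_mlx_lm"]]

print(viable_releases(releases))  # [] (empty: the deadlock)
```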
Workaround
None currently; Gemma 4 models cannot be served. Substitute Qwen 3 8B or a similar model for the fast tier.
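Until a fixed release ships, the substitution can be wired as a simple guard. The model ids and helper below are illustrative assumptions, not the project's actual config:

```python
PREFERRED_FAST = "gemma-4-e2b-it"  # blocked: no mlx_lm.models.gemma4 module
FALLBACK_FAST = "Qwen/Qwen3-8B"    # assumed substitute id for the fast tier

def pick_fast_tier_model(supported_types: set) -> str:
    # Route the fast tier to the fallback until "gemma4" appears in the
    # set of model types the bundled mlx_lm can load.
    return PREFERRED_FAST if "gemma4" in supported_types else FALLBACK_FAST
```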
Environment