
Add Gemma 4, LFM2, and Qwen3-Coder models to catalog #47

@weklund-agent


Summary

The catalog is missing several high-performing models that are available on mlx-community and score well on agentic coding benchmarks (per mlx_transformers_benchmark):

Model                  Params                 Quality (M4 Pro 64GB)  Gen tok/s  RAM (int4)
Qwen3-Coder-30B-A3B    30B MoE (3B active)    65.7%                  80         17.8 GB
Gemma 4 E2B-it         2.3B dense             65.3%                  121        3.5 GB
LFM2-24B-A2B           24B MoE (2.3B active)  67.3%                  117        14.2 GB

On M5 Max 128GB, Qwen3-Coder-30B-A3B is the top model at 75.5% quality / 129 tok/s.

HuggingFace sources

Qwen3-Coder-30B-A3B:

  • int4: mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit
  • int8: mlx-community/Qwen3-Coder-30B-A3B-Instruct-8bit
  • bf16: Qwen/Qwen3-Coder-30B-A3B-Instruct
  • Capabilities: tool_calling (hermes), thinking (qwen3)

Gemma 4 E2B-it:

  • int4: mlx-community/gemma-4-e2b-it-4bit
  • int8: mlx-community/gemma-4-e2b-it-8bit
  • bf16: google/gemma-4-E2B-it
  • Note: blocked by mlx_lm gemma4 arch support — see related issue

LFM2-24B-A2B:

  • int4: mlx-community/LFM2-24B-A2B-4bit
  • int8: LiquidAI/LFM2-24B-A2B-MLX-8bit (published by LiquidAI, not mlx-community)
  • bf16: LiquidAI/LFM2-24B-A2B-MLX-bf16
  • Architecture: mamba2-hybrid

Notes

  • None of these models are gated on HuggingFace
  • Gemma 4 is currently blocked by vllm-mlx's bundled mlx_lm not supporting the gemma4 architecture (separate issue)
  • All three were manually added to catalog YAML during testing and work correctly (except Gemma 4 due to the mlx_lm issue)
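For reference, the manual entry used during testing could look something like the sketch below. This is a hypothetical illustration: the field names (`sources`, `capabilities`, etc.) are assumptions, not the catalog's actual schema — adjust to whatever the real YAML format expects.

```yaml
# Hypothetical catalog entry (illustrative field names, not the real schema).
# Repo IDs and capability values are taken from the HuggingFace sources above.
- name: qwen3-coder-30b-a3b
  sources:
    int4: mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit
    int8: mlx-community/Qwen3-Coder-30B-A3B-Instruct-8bit
    bf16: Qwen/Qwen3-Coder-30B-A3B-Instruct
  capabilities:
    tool_calling: hermes
    thinking: qwen3
```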
