Lightweight CLI to run and benchmark MLX-compatible Vision-Language Models (VLMs) on Apple Silicon. Produces HTML/Markdown/gallery Markdown/TSV/JSONL reports and captures performance metrics (tokens/sec, memory, timings).
Note
This tool runs MLX-format Vision-Language Models hosted on the Hugging Face Hub. By default it runs all models found in your local HF cache (use --models to specify explicit model IDs).
# Create the recommended conda environment and install runtime dependencies
bash src/tools/setup_conda_env.sh
conda activate mlx-vlm
make install
# Run all models against a folder (auto-selects the most recent image) using the default built-in prompt
python -m check_models --folder ~/Pictures/Processed
# Run them on a single image
python -m check_models --image /path/to/photo.jpg
python -m check_models --image ~/Pictures/sample.jpg

Expected outputs (default location: src/output/):
- results.html
- results.md
- model_gallery.md
- results.tsv
- results.jsonl
- results.history.jsonl
- diagnostics.md (only when failures/harness issues are detected)
- check_models.log
- environment.log
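Since results.jsonl holds one JSON object per line, it is easy to post-process with a few lines of Python. A minimal sketch follows; the field names (`model`, `tokens_per_sec`, `peak_memory_gb`) and values are illustrative assumptions, not the tool's actual schema, and the sample file is written locally so the snippet is self-contained:

```python
import json
from pathlib import Path

# Hypothetical records; the real results.jsonl schema may differ.
sample = [
    {"model": "mlx-community/example-vlm-4bit", "tokens_per_sec": 41.2, "peak_memory_gb": 6.1},
    {"model": "mlx-community/another-vlm-8bit", "tokens_per_sec": 27.5, "peak_memory_gb": 11.8},
]
path = Path("results.jsonl")
path.write_text("\n".join(json.dumps(r) for r in sample))

# JSONL: parse each non-empty line as its own JSON object, then rank by throughput.
records = [json.loads(line) for line in path.read_text().splitlines() if line.strip()]
for r in sorted(records, key=lambda r: r["tokens_per_sec"], reverse=True):
    print(f"{r['model']}: {r['tokens_per_sec']:.1f} tok/s, {r['peak_memory_gb']:.1f} GB")
```

The same pattern applies to results.history.jsonl if you want to track a model's throughput across runs.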
- Batch run multiple models against an image.
- Standardized metrics + rich reports for easy comparison and qualitative review.
- Robust error handling and metadata-aware prompts.
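The headline throughput metric is conceptually simple: generated tokens divided by wall-clock generation time. A small sketch of that calculation, using illustrative names rather than the tool's internals:

```python
import time

def tokens_per_sec(n_tokens: int, start: float, end: float) -> float:
    """Throughput = generated tokens / wall-clock generation time."""
    elapsed = end - start
    if elapsed <= 0:
        raise ValueError("end must be after start")
    return n_tokens / elapsed

start = time.perf_counter()
# ... generation would run here; pretend it emitted 100 tokens in 2 seconds ...
end = start + 2.0
print(f"{tokens_per_sec(100, start, end):.1f} tok/s")  # → 50.0 tok/s
```

Comparing this number across models only makes sense when the prompt, image, and generation settings are held constant, which is what the batch runner does for you.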
- User Guide & CLI Reference: Full parameter reference, advanced usage, and troubleshooting.
- Contributor Guide: Setup, workflow, and quality standards.
make install # install runtime dependencies
make dev # install dev dependencies (dev + extras + torch)
make test # run test suite
make quality # run full gate (ruff + mypy + ty + pyrefly + pytest + shellcheck + markdownlint)

Tip
Platform: macOS with Apple Silicon is required. Python: 3.13+ is recommended and tested.
- MLX: Array framework for Apple Silicon.
- MLX VLM: Underlying VLM runtime.
- Hugging Face Hub: Model source (look for mlx-community or mlx tags).
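The default run discovers models from your local Hugging Face cache, where each cached repo lives in a directory named `models--<org>--<name>`. The sketch below shows one way such a scan could work; the matching rule ("mlx" in the repo id) and the synthetic cache layout are assumptions for illustration, not the tool's actual discovery logic:

```python
import tempfile
from pathlib import Path

def find_mlx_models(cache_dir: Path) -> list[str]:
    """Scan an HF hub cache for repos whose org/name mentions 'mlx'.

    Cached repos use the directory naming scheme 'models--<org>--<name>'.
    """
    models = []
    for entry in sorted(cache_dir.glob("models--*")):
        _, org, name = entry.name.split("--", 2)
        repo_id = f"{org}/{name}"
        if "mlx" in repo_id.lower():
            models.append(repo_id)
    return models

# Demo against a synthetic cache (the real default is ~/.cache/huggingface/hub).
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    for repo in ("models--mlx-community--Qwen2-VL-2B-Instruct-4bit",
                 "models--openai--clip-vit-base-patch32"):
        (cache / repo).mkdir()
    print(find_mlx_models(cache))  # → ['mlx-community/Qwen2-VL-2B-Instruct-4bit']
```

Passing --models with explicit model IDs bypasses this cache scan entirely.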
License: See the LICENSE file.