All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
0.3.8 (2026-04-04)
0.3.7 (2026-04-04)
- bump pygments from 2.19.2 to 2.20.0 — fixes catastrophic backtracking in archetype, devicetree, and Lua lexers (#36) (15859f1)
0.3.6 (2026-04-04)
- add explicit `permissions: contents: read` to CI, nightly, and pre-release workflows to enforce least-privilege on GITHUB_TOKEN (#34) (0f8bfb0)
- replace sleep-based sync with polling in flaky follow test (#34)
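The move from sleep-based synchronization to polling can be illustrated with a small helper. This is only a sketch of the general technique, not the project's actual test code; the name `wait_until` and its parameters are hypothetical.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout: float = 5.0,
    interval: float = 0.05,
) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Hypothetical helper for illustration. Returns True as soon as the
    condition holds, False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Compared with a fixed `time.sleep(...)`, a test built on a helper like this finishes as soon as the condition is met and only fails after the full timeout, which tends to reduce both flakiness and run time.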
0.3.5 (2026-04-04)
- tests: replace brittle mock-heavy tests with behavioral tests and shared factories (#32) (9af6078)
  - `FakeServiceLayer` replaces 10-deep `@patch` stacks in `TestRunUp`
  - Consolidate ~50 duplicate helpers into `tests/factories.py`
  - AAA comments (`# Arrange`, `# Act`, `# Assert`) across 17 test files
  - `make lint` now includes pyright for shift-left type checking
  - Net: -577 lines, 1,481 tests pass, 73% reduction in `@patch` usage
0.3.4 (2026-04-03)
0.3.3 (2026-04-02)
0.3.2 (2026-04-02)
- correct inverted sign on benchmark delta percentage display (ae7d209)
- pass explicit token to release-please action (96c9c68)
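For the delta percentage fix, a signed percentage display might look like the sketch below. The function name `format_delta_pct` and the convention that a positive value means the measured number is higher than the baseline are assumptions made for illustration; the changelog does not state the project's convention.

```python
def format_delta_pct(measured: float, baseline: float) -> str:
    """Return the signed percentage change of `measured` relative to `baseline`.

    Assumed convention: positive means `measured` is higher than `baseline`.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    delta = (measured - baseline) / baseline * 100.0
    # The "+" format flag forces an explicit sign for non-negative values.
    return f"{delta:+.1f}%"
```

An inverted sign typically comes from swapping the operands in the subtraction, so keeping the calculation and the formatting in one small function makes the direction easy to test.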
0.1.0 - 2025-03-31
Initial release of mlx-stack — a CLI control plane for local LLM inference infrastructure on Apple Silicon.
- Hardware detection (`mlx-stack profile`) — Detects Apple Silicon chip model (M1–M5, including Pro/Max/Ultra variants), GPU core count, unified memory, and memory bandwidth from a lookup table of 17 known variants. Writes profile to `~/.mlx-stack/profile.json`. Handles unknown chips with bandwidth estimation and rejects non-Apple-Silicon hardware gracefully.
- Model catalog — 15 curated model entries across 5 families (Qwen 3.5, Gemma 3, DeepSeek R1, Nemotron, Qwen 3 / Llama 3.3) with per-hardware benchmark data, quality scores, tool-calling metadata, and disk size information.
- Configuration management (`mlx-stack config`) — Persistent key-value configuration with `set`, `get`, `list`, and `reset` subcommands. Supports 8 config keys with type validation, default values, and API key masking. Stored in `~/.mlx-stack/config.yaml`.
- Dependency management — Auto-detection and installation of vllm-mlx and litellm as pinned uv tools. Version mismatch warnings. Triggered only by commands that need these tools.
- Data directory auto-creation — `~/.mlx-stack/` created automatically on first use with appropriate subdirectories.
- Scoring engine — Weighted composite scoring across speed, quality, tool-calling, and memory efficiency dimensions. Two intents: `balanced` and `agent-fleet`. Log-scaled gen_tps normalization. Memory budget filtering (default 40% of unified memory).
- Recommendation command (`mlx-stack recommend`) — Recommends an optimal model stack based on hardware profile and intent. Supports `--budget`, `--intent`, and `--show-all` flags. Cloud fallback tier shown when OpenRouter key is configured. Display-only — no files written.
- Config generation (`mlx-stack init`) — Generates stack definition (`~/.mlx-stack/stacks/default.yaml`) and LiteLLM proxy config (`~/.mlx-stack/litellm.yaml`). Supports `--accept-defaults`, `--intent`, `--add`/`--remove`, and `--force`. Includes port collision detection, overwrite protection, and missing model warnings.
- Model listing (`mlx-stack models`) — Lists locally downloaded models with disk size, quantization, and active stack indicator. `--catalog` shows full catalog with hardware-specific benchmark data. Supports `--family`, `--tag`, and `--tool-calling` filters.
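The scoring engine entry above describes a weighted composite with log-scaled `gen_tps` normalization and a memory budget filter. A rough sketch of that idea follows. The `Candidate` type, the weight values, the 0..1 quality scale, and the exact normalization formula are all assumptions for illustration and are not taken from the mlx-stack source.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    gen_tps: float       # generation tokens per second
    quality: float       # assumed to be on a 0..1 scale
    tool_calling: bool
    memory_gb: float


# Hypothetical weights; the real per-intent values are not given in the changelog.
WEIGHTS = {"speed": 0.3, "quality": 0.4, "tool_calling": 0.1, "memory": 0.2}


def log_norm(value: float, max_value: float) -> float:
    """Map `value` into 0..1 on a log scale so very fast models do not dominate."""
    if value <= 0 or max_value <= 0:
        return 0.0
    return min(1.0, math.log1p(value) / math.log1p(max_value))


def score(c: Candidate, max_tps: float, budget_gb: float) -> float:
    """Weighted sum of the four dimensions, each normalized to 0..1."""
    speed = log_norm(c.gen_tps, max_tps)
    memory = 1.0 - c.memory_gb / budget_gb  # using less of the budget scores higher
    tool = 1.0 if c.tool_calling else 0.0
    return (
        WEIGHTS["speed"] * speed
        + WEIGHTS["quality"] * c.quality
        + WEIGHTS["tool_calling"] * tool
        + WEIGHTS["memory"] * memory
    )


def rank(candidates, unified_memory_gb: float, budget_fraction: float = 0.40):
    """Drop models that exceed the memory budget, then sort by score, best first."""
    budget_gb = unified_memory_gb * budget_fraction
    if budget_gb <= 0:
        return []
    fitting = [c for c in candidates if c.memory_gb <= budget_gb]
    if not fitting:
        return []
    max_tps = max(c.gen_tps for c in fitting)
    return sorted(fitting, key=lambda c: score(c, max_tps, budget_gb), reverse=True)
```

In this sketch a different intent would simply swap in a different `WEIGHTS` table, for example one that weights tool-calling more heavily.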
- Process management — PID file tracking, fcntl-based lockfile for concurrent invocation prevention, HTTP health checks with exponential backoff, and SIGTERM/SIGKILL graceful shutdown.
- Start services (`mlx-stack up`) — Starts vllm-mlx instances and LiteLLM proxy from stack definition. Sequential startup (largest model first). Supports `--dry-run` and `--tier`. Handles port conflicts, missing models, partial failures, stale PID cleanup, and auto-installs dependencies.
- Stop services (`mlx-stack down`) — Stops all managed processes (LiteLLM first, then model servers in reverse order). SIGTERM with 10s grace period, SIGKILL escalation. Supports `--tier` for selective stop.
- Health monitoring (`mlx-stack status`) — 5-state health reporting: healthy, degraded, down, crashed, stopped. Formatted table with uptime. `--json` for machine-parseable output. Read-only — does not modify files.
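The SIGTERM-then-SIGKILL shutdown described for `mlx-stack down` follows a common pattern, sketched below. This version works on a `subprocess.Popen` handle for simplicity; the actual tool tracks PID files, so its implementation presumably differs. The function name `stop_process` is hypothetical.

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace_seconds: float = 10.0) -> int:
    """Ask a child process to exit, escalating to a hard kill after a grace period.

    Illustrative only: sends SIGTERM (on POSIX), waits up to `grace_seconds`,
    then sends SIGKILL if the process is still running. Returns the exit code.
    """
    if proc.poll() is not None:
        return proc.returncode  # already exited, nothing to do
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()


if __name__ == "__main__":
    # Usage example: start a long-running child, then stop it gracefully.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit code:", stop_process(child, grace_seconds=10.0))
```

The grace period gives a server time to finish in-flight work before it is forcibly killed.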
- Model download (`mlx-stack pull`) — Downloads models from HuggingFace, preferring mlx-community pre-converted weights with fallback to mlx_lm conversion. Disk space checking, progress display, duplicate detection, inventory tracking. Supports `--quant`, `--bench`, and `--force`.
- Benchmarking (`mlx-stack bench`) — Runs 3-iteration benchmarks (1024-token prompt, 100-token generation). Compares against catalog thresholds (PASS/WARN/FAIL). Supports running tiers and local models (temporary vllm-mlx instance). `--save` persists results for scoring. Tool-calling benchmark for capable models.
- Log rotation — Copytruncate-based rotation for service logs. Configurable size threshold (`log-max-size-mb`, default 50MB) and retention count (`log-max-files`, default 5). Gzip compression with sequential numbering.
- Logs command (`mlx-stack logs`) — View service logs with `--follow`, `--tail`, `--service` filtering. `--rotate` for on-demand rotation. `--all` for current and archived log viewing.
- Watchdog (`mlx-stack watch`) — Health monitor with auto-restart of crashed services. Configurable polling interval, flap detection, exponential backoff on restart delay. `--daemon` for background operation. Triggers log rotation during polling.
- launchd integration (`mlx-stack install` / `mlx-stack uninstall`) — Generates macOS LaunchAgent plist for the watchdog. RunAtLoad + KeepAlive. `--status` flag for agent state reporting.
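Copytruncate rotation with gzip compression and sequential numbering, as described in the log rotation entry above, could be sketched roughly as follows. The archive naming scheme (`<name>.1.gz`, `<name>.2.gz`, ...) and the function name `rotate_copytruncate` are assumptions for illustration, not the project's documented behavior.

```python
import gzip
import shutil
from pathlib import Path


def rotate_copytruncate(log_path: Path, max_bytes: int, max_files: int = 5) -> bool:
    """Rotate `log_path` in place if it is larger than `max_bytes`.

    Illustrative sketch. Returns True if a rotation happened.
    """
    if not log_path.exists() or log_path.stat().st_size <= max_bytes:
        return False

    # Shift existing archives up by one (.1.gz -> .2.gz), dropping the oldest.
    for index in range(max_files, 0, -1):
        archive = log_path.with_name(f"{log_path.name}.{index}.gz")
        if not archive.exists():
            continue
        if index == max_files:
            archive.unlink()
        else:
            archive.rename(log_path.with_name(f"{log_path.name}.{index + 1}.gz"))

    # Copy the live log into a compressed archive, then truncate the original
    # so the writing process can keep using its open file handle.
    newest = log_path.with_name(f"{log_path.name}.1.gz")
    with log_path.open("rb") as src, gzip.open(newest, "wb") as dst:
        shutil.copyfileobj(src, dst)
    with log_path.open("r+b") as live:
        live.truncate(0)
    return True
```

The copy-then-truncate approach avoids having to signal the service to reopen its log file, at the cost that lines written between the copy and the truncate can be lost.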
- Rich-formatted output — All commands use Rich tables and styled output.
- Typo suggestions — Near-miss command names show "Did you mean ...?" suggestions.
- Grouped help — `mlx-stack --help` organizes commands by category (Setup, Model Management, Lifecycle, Diagnostics, Ops).
- Version flag — `mlx-stack --version` prints the current version.
- GitHub Actions CI — macOS runner with lint (ruff), typecheck (pyright), and test (pytest) on push and PR.
- Comprehensive test suite — 1300+ unit tests covering core modules and CLI commands with mocked external calls.
N/A — initial release.
N/A — initial release.