feat: CLI rework — ungate catalog, eliminate redundancies, add stack modification (#40) by weklund · Pull Request #41 · weklund/mlx-stack

weklund · 2026-04-04T22:20:11Z

Summary

Implements phases 1-4 of #40. Reworks the CLI to eliminate redundant commands, remove the catalog-as-whitelist pattern, and add the missing stack modification path.

Closes #40
Supersedes #27

Changes

Milestone 1: Ungate Pull

pull accepts any HuggingFace repo string (e.g., mlx-community/Phi-5-Mini-4bit) in addition to catalog IDs
Benchmark target resolution supports HF repos with sanitized service names
26 new tests

Milestone 2: Absorb Profile into Status

profile command deleted
status now shows hardware info (chip, GPU cores, memory, bandwidth) with estimate indicators
Hardware data included in status --json output
16 new tests, 28 removed (test_cli_profile.py deleted)

Milestone 3: Absorb Recommend into Models, Remove Init

recommend command deleted
init command deleted
models gains --recommend flag (scored tier recommendations with --budget, --intent, --show-all)
models gains --available flag (live HuggingFace discovery with scoring overlay)
--recommend, --available, --catalog are mutually exclusive
All stale command references across codebase updated
58 new tests, 115 removed (test_cli_recommend.py + test_cli_init.py deleted)

Milestone 4: Setup Modification Flags

setup --add MODEL — add a model to existing stack (HF repo or catalog ID, repeatable)
setup --as TIER — custom tier name for --add
setup --remove TIER — remove a tier (repeatable, prevents empty stack)
setup --model MODEL — single-model quick setup (skips wizard)
setup --no-pull — generate config without downloading
setup --no-start — generate config without starting services
32+ new tests
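
The add and remove behaviour listed above can be sketched as two small list operations. This is a minimal illustration, not the project's code: the tier shape (`name` and `model` keys), the function names, and the duplicate-name check are assumptions. Only the "prevents empty stack" rule comes from the description.

```python
from typing import Dict, List

# Assumed tier shape for illustration: {"name": ..., "model": ...}
Tier = Dict[str, str]


def add_tier(tiers: List[Tier], name: str, model: str) -> List[Tier]:
    """Return a new tier list with one extra tier (hypothetical helper)."""
    # Assumption: tier names are unique within a stack.
    if any(t["name"] == name for t in tiers):
        raise ValueError(f"tier {name!r} already exists")
    return [*tiers, {"name": name, "model": model}]


def remove_tiers(tiers: List[Tier], names: List[str]) -> List[Tier]:
    """Return a new tier list without the named tiers (hypothetical helper)."""
    unknown = [n for n in names if all(t["name"] != n for t in tiers)]
    if unknown:
        raise ValueError(f"unknown tier(s): {', '.join(unknown)}")
    remaining = [t for t in tiers if t["name"] not in names]
    # Mirrors the "prevents empty stack" rule from the description.
    if not remaining:
        raise ValueError("cannot remove every tier: the stack would be empty")
    return remaining
```

Returning new lists instead of mutating in place keeps the existing stack untouched if validation fails partway through a repeated `--remove`.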

Impact

Metric	Before	After
Top-level commands	14	11
Catalog-gated commands	3	0
Stack modification paths	0	2

Validation

96/96 contract assertions passed
All 4 milestones passed scrutiny (code review) + user testing validation
pytest, pyright, ruff all clean

The workflow permissions fix resolved 4 CodeQL code-scanning alerts (actions/missing-workflow-permissions) and should be documented under a Security heading rather than just Bug Fixes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Dependabot PR #36 (pygments 2.19.2 → 2.20.0) fixes catastrophic backtracking CVEs but was missed by release-please because build(deps) is not a tracked conventional commit type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

Allow `mlx-stack pull` to accept arbitrary HuggingFace repo strings (containing '/') in addition to catalog IDs. HF repos bypass catalog lookup and download directly. Catalog ID behavior is unchanged.

- Add hf_repo_override param to pull_model() in core/pull.py
- Route MODEL arg in cli/pull.py based on '/' detection
- Update help text documenting both input types
- Add 26 new tests covering HF repo acceptance, error handling, flag combinations, disk space checks, and inventory tracking

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
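
The '/' detection described in this commit could look roughly like the sketch below. The `PullTarget` type, the `route_model_arg` name, and the in-memory `CATALOG_IDS` set are hypothetical stand-ins. The real code routes through `pull_model()` and its `hf_repo_override` parameter, whose signature is not shown in this PR.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the real catalog; contents are illustrative only.
CATALOG_IDS = {"qwen3.5-8b"}


@dataclass(frozen=True)
class PullTarget:
    kind: str   # "hf_repo" or "catalog"
    value: str  # the repo string or catalog ID as given


def route_model_arg(model: str) -> PullTarget:
    """Classify the MODEL argument the way the commit message describes."""
    # Any string containing '/' is treated as a HuggingFace repo and
    # bypasses catalog lookup entirely.
    if "/" in model:
        return PullTarget("hf_repo", model)
    # Everything else must be a known catalog ID (behaviour unchanged).
    if model not in CATALOG_IDS:
        raise ValueError(f"unknown catalog ID: {model!r}")
    return PullTarget("catalog", model)
```

For example, `route_model_arg("mlx-community/Model-4bit")` yields an `hf_repo` target, while `route_model_arg("qwen3.5-8b")` still goes through the catalog.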

Add a third resolution path in resolve_target() that detects HF repo strings (containing '/') and handles them: checks local models dir for already-downloaded copy, creates a minimal synthetic CatalogEntry for benchmarking, finds a free port, and starts a temp vllm-mlx instance.

This enables both 'mlx-stack bench mlx-community/Model-4bit' as a standalone command and 'mlx-stack pull mlx-community/Model-4bit --bench' to resolve the target correctly.

Also updates bench CLI help text to document HF repo support and fixes stale references to removed 'recommend' and 'init' commands in bench and pull CLI output.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
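
The "finds a free port" step is not shown in this PR. One common way to do it, sketched here as an assumption about the approach and not a copy of the project's helper, is to bind to port 0 and let the operating system choose:

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port on `host` (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 tells the kernel to pick any currently free port.
        sock.bind((host, 0))
        return sock.getsockname()[1]
```

The port is released when the socket closes, so another process could claim it before the temporary instance starts. A caller that needs to be robust would retry on a bind failure.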

…h '--'

HF repo IDs (e.g. mlx-community/Model-4bit) were used directly as benchmark service names, creating invalid PID/log file paths since process.py uses service_name for filesystem operations. Now replaces '/' with '--' in _start_temp_instance() to produce path-safe names like bench-temp-mlx-community--Model-4bit.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
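
The sanitization described here reduces to a single string replacement. In this sketch the function name is hypothetical; the `bench-temp-` prefix and the '/' to '--' mapping are taken from the example in the commit message.

```python
def temp_service_name(target: str) -> str:
    """Build a path-safe service name for a temporary benchmark instance."""
    # '/' would create nested directories in PID/log paths, so replace it.
    return "bench-temp-" + target.replace("/", "--")
```

With this, `temp_service_name("mlx-community/Model-4bit")` gives `bench-temp-mlx-community--Model-4bit`, which is safe to use as a file name component.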

Remove the profile CLI command entirely. Add hardware info section (chip, GPU cores, memory, bandwidth) to status output reading from profile.json via core/hardware.py load_profile(). Add hardware data to status --json under 'hardware' key. Handle missing/corrupt profile gracefully. Update no-stack messages to reference 'setup' instead of 'init'. Delete test_cli_profile.py and add 16 new hardware tests.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
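
"Handle missing/corrupt profile gracefully" might be implemented along the lines below. This is only a sketch: `load_profile_or_none` is a hypothetical name, and the real `load_profile()` in core/hardware.py may return a typed object or handle errors differently.

```python
import json
from pathlib import Path
from typing import Optional


def load_profile_or_none(path: Path) -> Optional[dict]:
    """Return the parsed profile, or None if it is missing or unreadable."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # OSError covers a missing or unreadable file; ValueError covers
        # json.JSONDecodeError and bad text encodings.
        return None
    # A valid JSON document that is not an object is also treated as corrupt.
    return data if isinstance(data, dict) else None
```

A status command built on this can omit the hardware section when the result is `None` and carry on without raising.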

Add is_estimate to HardwareProfile.to_dict() so it persists in profile.json. Update load_profile() to read is_estimate from saved data (defaulting to False for legacy profiles). Include is_estimate in status --json hardware output. Add 9 new tests covering round-trip serialization, table display, and JSON output for estimated bandwidth.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
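
The round trip with a legacy default can be illustrated as follows. Only `HardwareProfile`, `to_dict()`, and `is_estimate` are named in the commit. The other field names and the `from_dict` classmethod are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class HardwareProfile:
    # Field names other than is_estimate are assumed for illustration.
    chip: str
    gpu_cores: int
    memory_gb: int
    bandwidth_gbps: float
    is_estimate: bool = False

    def to_dict(self) -> Dict[str, Any]:
        """Serialize every field, including is_estimate, for profile.json."""
        return {
            "chip": self.chip,
            "gpu_cores": self.gpu_cores,
            "memory_gb": self.memory_gb,
            "bandwidth_gbps": self.bandwidth_gbps,
            "is_estimate": self.is_estimate,
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "HardwareProfile":
        """Rebuild a profile; legacy files without is_estimate get False."""
        return cls(
            chip=data["chip"],
            gpu_cores=data["gpu_cores"],
            memory_gb=data["memory_gb"],
            bandwidth_gbps=data["bandwidth_gbps"],
            is_estimate=bool(data.get("is_estimate", False)),
        )
```

Using `data.get(..., False)` on the read side is what keeps profiles written before this change loadable without a migration step.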

Remove the standalone 'recommend' CLI command and absorb its functionality into 'models --recommend'. Add --budget, --intent, --show-all flags that work with --recommend, and --available flag that queries HF API. Make --recommend, --available, and --catalog mutually exclusive. Ensure --budget/--intent/--show-all require --recommend. Update display-only notice to reference 'setup' not 'init'. Delete test_cli_recommend.py and add comprehensive new tests in test_cli_models.py. Update all user-facing strings that referenced 'recommend' or 'init' as commands.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
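
The flag rules in this commit can be expressed with the standard library's argparse, used here purely for illustration. The PR does not show which CLI framework mlx-stack uses, and the value types of `--budget` and `--intent` are assumed to be plain strings.

```python
import argparse
from typing import List


def build_models_parser() -> argparse.ArgumentParser:
    """Parser sketch for the `models` flags named in the commit message."""
    parser = argparse.ArgumentParser(prog="models")
    # --recommend, --available and --catalog are mutually exclusive.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--recommend", action="store_true")
    mode.add_argument("--available", action="store_true")
    mode.add_argument("--catalog", action="store_true")
    # Modifier flags; value types are an assumption.
    parser.add_argument("--budget")
    parser.add_argument("--intent")
    parser.add_argument("--show-all", action="store_true")
    return parser


def parse_models_args(argv: List[str]) -> argparse.Namespace:
    parser = build_models_parser()
    args = parser.parse_args(argv)
    uses_modifier = (
        args.budget is not None or args.intent is not None or args.show_all
    )
    # The modifier flags only make sense together with --recommend.
    if uses_modifier and not args.recommend:
        parser.error("--budget, --intent and --show-all require --recommend")
    return args
```

Both kinds of violation end in `parser.error`, which prints a usage message and exits with status 2.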

Delete cli/init.py and test_cli_init.py. Remove init import and registration from cli/main.py. Update _COMMAND_CATEGORIES to remove init from 'Setup & Configuration'. Update test_cli.py, test_cross_area.py, test_cli_up.py, and test_cli_watch.py to reflect init removal. Update launchd.py and recommend.py docstrings. core/stack_init.py preserved for internal use by setup.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

…indings

- Remove src/mlx_stack/cli/recommend.py (deregistered from main.py but file remained)
- Update cli/setup.py module docstring to remove old init flow reference
- Update test_cross_area.py comments to reference setup instead of init command

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

… reference

Replace the post-benchmark message in cli/pull.py that referenced 'models --recommend' with a generic message: 'Results saved. These will be used for model scoring.' Add VAL-CROSS-008 test to verify the output no longer references removed commands.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

…ack modification

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

--model MODEL creates a single-tier 'standard' stack without the wizard. --no-pull skips model download in wizard, --model, and --add flows. --no-start skips stack startup. --no-pull implies --no-start. --model is mutually exclusive with --add/--remove.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
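
The flag interactions stated in this commit message can be modelled as a small pure function. The `SetupPlan` type and `resolve_setup_flags` name are hypothetical. The sketch covers only the rules listed here, not any start-up behaviour changed by later commits.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SetupPlan:
    pull: bool   # download model weights
    start: bool  # start services afterwards


def resolve_setup_flags(
    *,
    model: Optional[str] = None,
    add: Sequence[str] = (),
    remove: Sequence[str] = (),
    no_pull: bool = False,
    no_start: bool = False,
) -> SetupPlan:
    """Combine the setup flags into a plan (hypothetical helper)."""
    # --model is mutually exclusive with --add/--remove.
    if model is not None and (add or remove):
        raise ValueError("--model cannot be combined with --add or --remove")
    pull = not no_pull
    # --no-pull implies --no-start: nothing to start without the weights.
    start = pull and not no_start
    return SetupPlan(pull=pull, start=start)
```

Resolving the flags once, up front, means the wizard, `--model`, and `--add` flows can share the same plan without re-checking each flag.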

…--model

Two scrutiny fixes:

1. _resolve_model_source() now returns entry.id instead of entry.name for catalog ID resolution, so tiers[].model stores the canonical ID (e.g., 'qwen3.5-8b') rather than the display name.
2. _single_model_setup() no longer auto-starts services — it always prints 'mlx-stack up' guidance per the no-auto-restart convention.

Tests updated to explicitly verify both model and source fields, and to assert start_stack is never called from --model path.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

🤖 I have created a release *beep* *boop*

---

## [0.3.8](v0.3.7...v0.3.8) (2026-04-04)

### Features

* CLI rework — ungate catalog, eliminate redundancies, add stack modification ([#40](#40)) ([#41](#41)) ([3bee7d9](3bee7d9))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

github-actions bot and others added 30 commits April 4, 2026 15:12

chore(main): release 0.3.7

cd60ddb

chore: add mission infrastructure for CLI rework (#40)

c6511f3

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

chore(validation): add scrutiny report for ungate-pull

00f4552

chore(validation): rerun ungate-pull scrutiny synthesis

f78f5a6

chore(validation): rerun ungate-pull scrutiny synthesis

c59486a

chore(validation): add ungate-pull user-testing synthesis

6c1e930

chore(validation): synthesize absorb-profile scrutiny findings

41ebf2a

chore(validation): synthesize absorb-profile scrutiny findings

2a79918

chore(validation): add absorb-profile user-testing synthesis

7bc342c

chore(validation): synthesize absorb-recommend-remove-init scrutiny findings

bd15ee5

chore(validation): rerun absorb-recommend-remove-init scrutiny synthesis

9addfe6

chore(validation): rerun absorb-recommend-remove-init user testing

cd8e680

feat: add --add, --as, --remove flags to setup for non-interactive stack modification

5289f16

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

chore(validation): synthesize scrutiny for setup-modification

708b554

chore(validation): rerun scrutiny for setup-modification

1f1a3b7

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

chore(validation): add setup-modification user-testing synthesis

eee5995

docs: update README for CLI rework (#40)

e378eb0

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

weklund merged commit 3bee7d9 into main Apr 4, 2026
5 checks passed

github-actions bot mentioned this pull request Apr 4, 2026

chore(main): release 0.3.8 #42

Merged

weklund merged 30 commits into main from feat/cli-rework-40
