
feat: CLI rework — ungate catalog, eliminate redundancies, add stack modification (#40) #41

Merged
weklund merged 30 commits into main from feat/cli-rework-40
Apr 4, 2026

Conversation

@weklund
Owner

@weklund weklund commented Apr 4, 2026

Summary

Implements phases 1-4 of #40. Reworks the CLI to eliminate redundant commands, remove the catalog-as-whitelist pattern, and add the missing stack modification path.

Closes #40
Supersedes #27

Changes

Milestone 1: Ungate Pull

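  • pull accepts any HuggingFace repo string (any argument containing '/', e.g., mlx-community/Phi-5-Mini-4bit) in addition to catalog IDs; HF repos bypass the catalog lookup
  • Benchmark target resolution also accepts HF repos, using sanitized service names ('/' replaced with '--') so PID and log file paths stay valid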
  • 26 new tests
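A minimal sketch of the routing rule in the first bullet, assuming the check is simply whether the argument contains '/'. `resolve_pull_source`, `PullSource`, and the `CATALOG` dict are hypothetical names for illustration, not mlx-stack's actual API, and the repo mapped to `qwen3.5-8b` is a placeholder.

```python
from dataclasses import dataclass

# Hypothetical catalog stand-in: the real catalog ships with mlx-stack.
# 'qwen3.5-8b' is a catalog ID mentioned in this PR; the repo it maps to is a placeholder.
CATALOG: dict[str, str] = {
    "qwen3.5-8b": "example-org/qwen3.5-8b-placeholder",
}


@dataclass(frozen=True)
class PullSource:
    hf_repo: str        # repo to download from
    from_catalog: bool  # True if MODEL was a catalog ID


def resolve_pull_source(model: str) -> PullSource:
    """Route the MODEL argument: anything containing '/' is treated as an HF repo."""
    if "/" in model:
        # HF repo strings bypass the catalog lookup and are downloaded as given.
        return PullSource(hf_repo=model, from_catalog=False)
    if model not in CATALOG:
        raise ValueError(f"Unknown catalog ID: {model!r}")
    return PullSource(hf_repo=CATALOG[model], from_catalog=True)
```

Keying on '/' presumably works because catalog IDs (e.g. `qwen3.5-8b`) contain no slash, while HuggingFace repo IDs always take the `owner/name` form.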

Milestone 2: Absorb Profile into Status

  • profile command deleted
  • status now shows hardware info (chip, GPU cores, memory, bandwidth) with estimate indicators
  • Hardware data included in status --json output
  • 16 new tests, 28 removed (test_cli_profile.py deleted)

Milestone 3: Absorb Recommend into Models, Remove Init

  • recommend command deleted
  • init command deleted
  • models gains --recommend flag (scored tier recommendations with --budget, --intent, --show-all)
  • models gains --available flag (live HuggingFace discovery with scoring overlay)
  • --recommend, --available, --catalog are mutually exclusive
  • All stale references to the removed commands updated across the codebase
  • 58 new tests, 115 removed (test_cli_recommend.py + test_cli_init.py deleted)
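The mode and dependency rules for `models` can be illustrated with the standard library's `argparse`. This is a sketch only: the PR does not say which CLI framework mlx-stack uses, the function names are hypothetical, and the `--budget` value format in the example is an assumption.

```python
import argparse


def build_models_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="models")
    # --recommend, --available and --catalog select mutually exclusive modes.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--recommend", action="store_true")
    mode.add_argument("--available", action="store_true")
    mode.add_argument("--catalog", action="store_true")
    # Tuning flags that only make sense together with --recommend.
    parser.add_argument("--budget")
    parser.add_argument("--intent")
    parser.add_argument("--show-all", action="store_true")
    return parser


def parse_models_args(argv: list[str]) -> argparse.Namespace:
    parser = build_models_parser()
    args = parser.parse_args(argv)
    uses_tuning = args.budget is not None or args.intent is not None or args.show_all
    if uses_tuning and not args.recommend:
        # parser.error() prints usage and exits with status 2.
        parser.error("--budget, --intent and --show-all require --recommend")
    return args
```

A mutually exclusive group covers the three-way mode rule directly; "requires `--recommend`" has no argparse primitive, so it is checked after parsing.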

Milestone 4: Setup Modification Flags

  • setup --add MODEL — add a model to existing stack (HF repo or catalog ID, repeatable)
  • setup --as TIER — custom tier name for --add
  • setup --remove TIER — remove a tier (repeatable, prevents empty stack)
  • setup --model MODEL — single-model quick setup that skips the wizard (cannot be combined with --add/--remove)
  • setup --no-pull — generate config without downloading models (implies --no-start)
  • setup --no-start — generate config without starting services
  • 32+ new tests
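The `--add`/`--as`/`--remove` behavior can be sketched as pure functions over a stack config. The stack shape (a `tiers` list of `name`/`model` dicts), the function names, and the default tier name used when `--as` is omitted are all hypothetical; only the "no empty stack" rule comes from the PR.

```python
from __future__ import annotations

from copy import deepcopy
from typing import Any

# Hypothetical shape: {"tiers": [{"name": ..., "model": ...}, ...]}
Stack = dict[str, Any]


def add_tier(stack: Stack, model: str, tier_name: str | None = None) -> Stack:
    """Return a copy of *stack* with one more tier (the --add / --as path)."""
    # Hypothetical default when --as is omitted: the last path segment, lowercased.
    name = tier_name or model.rsplit("/", 1)[-1].lower()
    updated = deepcopy(stack)
    updated["tiers"].append({"name": name, "model": model})
    return updated


def remove_tiers(stack: Stack, names: list[str]) -> Stack:
    """Return a copy of *stack* without the named tiers (the repeatable --remove path)."""
    known = {tier["name"] for tier in stack["tiers"]}
    unknown = [n for n in names if n not in known]
    if unknown:
        raise ValueError(f"Unknown tier(s): {', '.join(unknown)}")
    remaining = [tier for tier in stack["tiers"] if tier["name"] not in names]
    if not remaining:
        # The PR's rule: --remove must not leave the stack empty.
        raise ValueError("Cannot remove every tier; the stack would be empty")
    updated = deepcopy(stack)
    updated["tiers"] = deepcopy(remaining)
    return updated
```

Returning copies keeps the original config untouched until every requested change has been validated.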

Impact

| Metric | Before | After |
|---|---|---|
| Top-level commands | 14 | 11 |
| Catalog-gated commands | 3 | 0 |
| Stack modification paths | 0 | 2 |

Validation

  • 96/96 contract assertions passed
  • All 4 milestones passed scrutiny (code review) and user-testing validation
  • pytest, pyright, and ruff all pass cleanly

github-actions bot and others added 30 commits April 4, 2026 15:12
The workflow permissions fix resolved 4 CodeQL code-scanning alerts
(actions/missing-workflow-permissions) and should be documented under
a Security heading rather than just Bug Fixes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dependabot PR #36 (pygments 2.19.2 → 2.20.0) fixes catastrophic
backtracking CVEs but was missed by release-please because build(deps)
is not a tracked conventional commit type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Allow `mlx-stack pull` to accept arbitrary HuggingFace repo strings
(containing '/') in addition to catalog IDs. HF repos bypass catalog
lookup and download directly. Catalog ID behavior is unchanged.

- Add hf_repo_override param to pull_model() in core/pull.py
- Route MODEL arg in cli/pull.py based on '/' detection
- Update help text documenting both input types
- Add 26 new tests covering HF repo acceptance, error handling,
  flag combinations, disk space checks, and inventory tracking

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Add a third resolution path in resolve_target() that detects HF repo
strings (containing '/') and handles them: checks local models dir for
already-downloaded copy, creates a minimal synthetic CatalogEntry for
benchmarking, finds a free port, and starts a temp vllm-mlx instance.

This enables both 'mlx-stack bench mlx-community/Model-4bit' as a
standalone command and 'mlx-stack pull mlx-community/Model-4bit --bench'
to resolve the target correctly.

Also updates bench CLI help text to document HF repo support and fixes
stale references to removed 'recommend' and 'init' commands in bench
and pull CLI output.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
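Two pieces of the new HF-repo path in `resolve_target()` are easy to sketch: reusing an already-downloaded copy and finding a free port for the temporary vllm-mlx instance. The directory layout and function names below are hypothetical; binding to port 0 is the standard OS mechanism for obtaining an unused port, not necessarily what mlx-stack does.

```python
from __future__ import annotations

import socket
from pathlib import Path


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def find_local_copy(models_dir: Path, hf_repo: str) -> Path | None:
    """Return an already-downloaded copy of *hf_repo*, or None if it must be pulled."""
    # Hypothetical layout: one directory per model, named after the repo's last segment.
    candidate = models_dir / hf_repo.rsplit("/", 1)[-1]
    return candidate if candidate.is_dir() else None
```

The port is only known to be free at the moment of the check; another process could claim it before the temporary server binds it, so a caller would typically treat a bind failure at startup as retryable.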
…h '--'

HF repo IDs (e.g. mlx-community/Model-4bit) were used directly as
benchmark service names, creating invalid PID/log file paths since
process.py uses service_name for filesystem operations. Now replaces
'/' with '--' in _start_temp_instance() to produce path-safe names
like bench-temp-mlx-community--Model-4bit.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
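The sanitization described in this commit amounts to a single string replacement. This sketch assumes the service name is just the `bench-temp-` prefix followed by the sanitized target, matching the example name in the commit message; `bench_service_name` is a hypothetical helper.

```python
def bench_service_name(target: str) -> str:
    """Build a path-safe temporary service name for a benchmark target.

    The service name ends up in PID/log file paths, so the '/' in an HF repo ID
    is replaced with '--' to keep the name a single path component.
    """
    return "bench-temp-" + target.replace("/", "--")
```

For example, `bench_service_name("mlx-community/Model-4bit")` returns `"bench-temp-mlx-community--Model-4bit"`, while catalog IDs such as `qwen3.5-8b` pass through unchanged apart from the prefix.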
Remove the profile CLI command entirely. Add hardware info section
(chip, GPU cores, memory, bandwidth) to status output reading from
profile.json via core/hardware.py load_profile(). Add hardware data
to status --json under 'hardware' key. Handle missing/corrupt profile
gracefully. Update no-stack messages to reference 'setup' instead of
'init'. Delete test_cli_profile.py and add 16 new hardware tests.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
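A sketch of the graceful handling for a missing or corrupt profile.json, assuming the status output simply carries `null` for `hardware` in that case (the PR does not say). `load_profile_data` and `status_json` are hypothetical names; the real loader is `load_profile()` in core/hardware.py, whose signature is not shown here.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def load_profile_data(path: Path) -> dict[str, Any] | None:
    """Read profile.json, returning None when it is missing or unreadable."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # OSError covers a missing file; ValueError covers bad JSON and bad UTF-8.
        return None
    return data if isinstance(data, dict) else None


def status_json(services: list[dict[str, Any]], profile_path: Path) -> str:
    """Serialize a status payload with hardware info under the 'hardware' key."""
    # Hypothetical payload shape: only the 'hardware' key is documented in the PR.
    payload = {"services": services, "hardware": load_profile_data(profile_path)}
    return json.dumps(payload, indent=2)
```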
Add is_estimate to HardwareProfile.to_dict() so it persists in
profile.json. Update load_profile() to read is_estimate from saved
data (defaulting to False for legacy profiles). Include is_estimate
in status --json hardware output. Add 9 new tests covering round-trip
serialization, table display, and JSON output for estimated bandwidth.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
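The `is_estimate` round-trip can be sketched with a dataclass. `HardwareProfile`, `to_dict()`, and `is_estimate` are named in the commit; the other field names and the `from_dict()` classmethod are assumptions standing in for whatever `load_profile()` actually does.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class HardwareProfile:
    # Only is_estimate is named in this PR; the other field names are assumptions.
    chip: str
    gpu_cores: int
    memory_gb: int
    bandwidth_gbps: float
    is_estimate: bool = False

    def to_dict(self) -> dict[str, Any]:
        # asdict() includes is_estimate, so the flag is persisted to profile.json.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> HardwareProfile:
        return cls(
            chip=str(data["chip"]),
            gpu_cores=int(data["gpu_cores"]),
            memory_gb=int(data["memory_gb"]),
            bandwidth_gbps=float(data["bandwidth_gbps"]),
            # Legacy profiles were written before the field existed: default to False.
            is_estimate=bool(data.get("is_estimate", False)),
        )
```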
Remove the standalone 'recommend' CLI command and absorb its functionality
into 'models --recommend'. Add --budget, --intent, --show-all flags that
work with --recommend, and --available flag that queries HF API. Make
--recommend, --available, and --catalog mutually exclusive. Ensure
--budget/--intent/--show-all require --recommend. Update display-only
notice to reference 'setup' not 'init'. Delete test_cli_recommend.py
and add comprehensive new tests in test_cli_models.py. Update all
user-facing strings that referenced 'recommend' or 'init' as commands.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Delete cli/init.py and test_cli_init.py. Remove init import and
registration from cli/main.py. Update _COMMAND_CATEGORIES to remove
init from 'Setup & Configuration'. Update test_cli.py, test_cross_area.py,
test_cli_up.py, and test_cli_watch.py to reflect init removal. Update
launchd.py and recommend.py docstrings. core/stack_init.py preserved
for internal use by setup.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- Remove src/mlx_stack/cli/recommend.py (deregistered from main.py but file remained)
- Update cli/setup.py module docstring to remove old init flow reference
- Update test_cross_area.py comments to reference setup instead of init command

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
… reference

Replace the post-benchmark message in cli/pull.py that referenced
'models --recommend' with a generic message: 'Results saved. These
will be used for model scoring.' Add VAL-CROSS-008 test to verify
the output no longer references removed commands.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
…ack modification

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
--model MODEL creates a single-tier 'standard' stack without the wizard.
--no-pull skips model download in wizard, --model, and --add flows.
--no-start skips stack startup. --no-pull implies --no-start.
--model is mutually exclusive with --add/--remove.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
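The flag rules in this commit can be expressed as a small validation step. `SetupOptions`, `normalize_setup_options`, and `single_model_stack` are hypothetical, as is the stack dict shape; the rules themselves (`--model` excludes `--add`/`--remove`, `--no-pull` implies `--no-start`, `--model` yields one `standard` tier) come from the commit message.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Any


@dataclass(frozen=True)
class SetupOptions:
    # Hypothetical container mirroring the setup flags named in this PR.
    model: str | None = None
    add: tuple[str, ...] = ()
    remove: tuple[str, ...] = ()
    no_pull: bool = False
    no_start: bool = False


def normalize_setup_options(opts: SetupOptions) -> SetupOptions:
    """Apply the commit's flag rules, raising ValueError on a conflicting combination."""
    if opts.model is not None and (opts.add or opts.remove):
        raise ValueError("--model cannot be combined with --add or --remove")
    if opts.no_pull and not opts.no_start:
        # --no-pull implies --no-start.
        return replace(opts, no_start=True)
    return opts


def single_model_stack(model: str) -> dict[str, Any]:
    """--model creates a single-tier stack whose only tier is named 'standard'."""
    return {"tiers": [{"name": "standard", "model": model}]}
```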
…--model

Two scrutiny fixes:
1. _resolve_model_source() now returns entry.id instead of entry.name
   for catalog ID resolution, so tiers[].model stores the canonical ID
   (e.g., 'qwen3.5-8b') rather than the display name.
2. _single_model_setup() no longer auto-starts services — it always
   prints 'mlx-stack up' guidance per the no-auto-restart convention.

Tests updated to explicitly verify both model and source fields, and to
assert start_stack is never called from --model path.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
@weklund weklund merged commit 3bee7d9 into main Apr 4, 2026
5 checks passed
weklund pushed a commit that referenced this pull request Apr 5, 2026
🤖 I have created a release *beep* *boop*
---


## [0.3.8](v0.3.7...v0.3.8) (2026-04-04)


### Features

* CLI rework — ungate catalog, eliminate redundancies, add stack modification ([#40](#40)) ([#41](#41)) ([3bee7d9](3bee7d9))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

CLI rework: eliminate redundancies, ungate catalog, add stack modification
