Merged
30 commits
cd60ddb
chore(main): release 0.3.7
github-actions[bot] Apr 4, 2026
3f61dab
chore: add Security section to 0.3.6 changelog entry
weklund Apr 4, 2026
84650ab
chore: add pygments security bump to 0.3.7 changelog
weklund Apr 4, 2026
c6511f3
chore: add mission infrastructure for CLI rework (#40)
weklund Apr 4, 2026
4243472
feat: ungate pull command to accept HuggingFace repo strings
weklund Apr 4, 2026
00f4552
chore(validation): add scrutiny report for ungate-pull
weklund Apr 4, 2026
5c08089
feat: add HF repo string resolution to benchmark target resolver
weklund Apr 4, 2026
f78f5a6
chore(validation): rerun ungate-pull scrutiny synthesis
weklund Apr 4, 2026
ff593f2
fix: sanitize HF repo service names in benchmark by replacing '/' wit…
weklund Apr 4, 2026
c59486a
chore(validation): rerun ungate-pull scrutiny synthesis
weklund Apr 4, 2026
6c1e930
chore(validation): add ungate-pull user-testing synthesis
weklund Apr 4, 2026
f684034
feat: absorb profile command into status with hardware info section
weklund Apr 4, 2026
41ebf2a
chore(validation): synthesize absorb-profile scrutiny findings
weklund Apr 4, 2026
a477711
fix: preserve is_estimate field in hardware profile serialization
weklund Apr 4, 2026
2a79918
chore(validation): synthesize absorb-profile scrutiny findings
weklund Apr 4, 2026
7bc342c
chore(validation): add absorb-profile user-testing synthesis
weklund Apr 4, 2026
728d756
feat: absorb recommend command into models --recommend
weklund Apr 4, 2026
0838579
feat: remove init CLI command
weklund Apr 4, 2026
bd15ee5
chore(validation): synthesize absorb-recommend-remove-init scrutiny f…
weklund Apr 4, 2026
b3ce728
fix: delete unused cli/recommend.py and update stale init references
weklund Apr 4, 2026
9addfe6
chore(validation): rerun absorb-recommend-remove-init scrutiny synthesis
weklund Apr 4, 2026
9b6776d
fix: update pull --bench message to remove stale 'models --recommend'…
weklund Apr 4, 2026
cd8e680
chore(validation): rerun absorb-recommend-remove-init user testing
weklund Apr 4, 2026
5289f16
feat: add --add, --as, --remove flags to setup for non-interactive st…
weklund Apr 4, 2026
df7a553
feat: add --model, --no-pull, and --no-start flags to setup command
weklund Apr 4, 2026
708b554
chore(validation): synthesize scrutiny for setup-modification
weklund Apr 4, 2026
daac9f0
fix: use entry.id for tiers[].model and remove auto-start from setup …
weklund Apr 4, 2026
1f1a3b7
chore(validation): rerun scrutiny for setup-modification
weklund Apr 4, 2026
eee5995
chore(validation): add setup-modification user-testing synthesis
weklund Apr 4, 2026
e378eb0
docs: update README for CLI rework (#40)
weklund Apr 4, 2026
18 changes: 5 additions & 13 deletions .factory/init.sh
@@ -1,15 +1,7 @@
-#!/usr/bin/env bash
-set -euo pipefail
+#!/bin/bash
+set -e

-# Verify Python version
-python_version=$(python3 -c "import sys; print(f'{sys.version_info.major}.{sys.version_info.minor}')")
-required="3.13"
-if [ "$(printf '%s\n' "$required" "$python_version" | sort -V | head -n1)" != "$required" ]; then
-    echo "ERROR: Python >= 3.13 required (found $python_version)"
-    exit 1
-fi
+cd /Users/weae1504/Projects/mlx-stack

-# Install dependencies if pyproject.toml exists
-if [ -f pyproject.toml ]; then
-    uv sync
-fi
+# Install dev dependencies (idempotent)
+uv sync --dev
168 changes: 78 additions & 90 deletions .factory/library/architecture.md
@@ -1,92 +1,80 @@
# Architecture

Architectural decisions, patterns discovered, and conventions.

**What belongs here:** Architecture decisions, module patterns, code conventions.

---

## Project Structure
- `src/mlx_stack/` — main package (src layout)
- `src/mlx_stack/cli/` — Click CLI package
- `cli/__init__.py` — package init
- `cli/main.py` — CLI entry point with Click command group
- `cli/profile.py` — `mlx-stack profile` command
- `cli/config.py` — `mlx-stack config` commands
- `cli/init.py` — `mlx-stack init` command (stack + LiteLLM config generation)
- `cli/recommend.py` — `mlx-stack recommend` command
- `cli/models.py` — `mlx-stack models` command (local model listing + catalog browsing)
- `src/mlx_stack/core/` — shared business logic modules
- `core/hardware.py` — hardware detection (Apple Silicon profiling)
- `core/config.py` — configuration management (YAML-based)
- `core/catalog.py` — model catalog system (query API over YAML entries)
- `core/deps.py` — dependency management (auto-installing uv tools)
- `core/paths.py` — path utilities (`~/.mlx-stack/` and friends)
- `core/scoring.py` — recommendation scoring engine (intent-weighted composite scoring)
- `core/litellm_gen.py` — LiteLLM proxy config generation (model_list, router_settings, fallbacks)
- `core/stack_init.py` — stack initialization logic (port allocation, vllm_flags, overwrite protection)
- `core/models.py` — local model scanning, catalog listing, size formatting
- `src/mlx_stack/data/` — static data files
- `data/catalog/` — shipped YAML catalog files (15 models)
- `src/mlx_stack/utils/` — utility modules
- `tests/` — pytest tests
- `tests/fixtures/` — mock data (profiles, catalogs, etc.)

## Conventions
- Click for CLI, Rich for terminal output
- PyYAML for all YAML operations
- httpx for HTTP requests (async not needed — use sync client)
- psutil for process management
- All state lives in `~/.mlx-stack/` (configurable via `model-dir` for models)
- Tests use `tmp_path` pytest fixture — NEVER touch real `~/.mlx-stack/`
- External commands (sysctl, system_profiler, subprocess) are always mocked in unit tests
- Click eager options (`--help`, `--version`) may exit before the group callback runs, so callback-based setup hooks should not be relied on for those code paths
- Note: The config module currently sends success output to stderr. Future features should use stdout for successful output and stderr only for errors/warnings.

## Key Design Decisions
- One vllm-mlx process per model (ADR-003)
- vllm-mlx and litellm managed as pinned uv tools, auto-installed on first use
- Catalog schema: no int6, disk_size_gb per quant source, min_mlx_lm_version top-level, verified_on in separate data/verification.yaml
- 2 intents for MVP: balanced, agent-fleet (architecture supports more)
- 40% default memory budget of total unified memory
- Recommendation/init budget behavior: budget filtering is per-model eligibility (`model.memory_gb <= budget`); the combined memory of selected tiers can exceed the budget
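
The budget rule above is easy to misread, so here is a minimal sketch of it. Only the `memory_gb <= budget` check and the 40% default come from the notes; the `Model` class, the function names, and the model sizes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """Hypothetical stand-in for a catalog entry."""
    name: str
    memory_gb: float


def memory_budget_gb(total_memory_gb: float, fraction: float = 0.40) -> float:
    """Budget as a fraction of total unified memory (40% by default)."""
    return total_memory_gb * fraction


def eligible(models: list[Model], budget_gb: float) -> list[Model]:
    """Per-model eligibility: each model is compared to the budget on its own."""
    return [m for m in models if m.memory_gb <= budget_gb]


# Illustrative sizes only; these are not catalog values.
models = [Model("a", 20.0), Model("b", 30.0), Model("c", 45.0), Model("d", 60.0)]
budget = memory_budget_gb(128.0)  # 40% of a 128 GB machine
picked = eligible(models, budget)
combined = sum(m.memory_gb for m in picked)
```

Here `d` is filtered out, but `a`, `b`, and `c` each pass individually even though their combined footprint is larger than the budget, which is the behavior the note describes.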

## Ops Layer (Milestone 5)

### New Modules
- `core/log_rotation.py` — Copytruncate-based log rotation (copy → gzip → truncate)
- `core/log_viewer.py` — Log viewing/following/listing logic
- `core/watchdog.py` — Health polling loop, auto-restart, flap detection, daemon mode
- `core/launchd.py` — Plist generation/loading/unloading via plistlib + launchctl
- `cli/logs.py` — `mlx-stack logs` command
- `cli/watch.py` — `mlx-stack watch` command
- `cli/install.py` — `mlx-stack install` / `mlx-stack uninstall` commands

### Key Integration Points
- `process.py:start_service` — Log file open mode changed from "w" to "a" for rotation compatibility
- `core/config.py` — 2 new keys: log-max-size-mb (int, default 50), log-max-files (int, default 5)
- `process.py:acquire_lock` — Watchdog uses per-restart lock, not held during polling
- `paths.py` — Watchdog PID at get_pids_dir()/watchdog.pid
- `stack_status.py:run_status` — Used by watchdog for health polling
- `process.py:start_service` / `stop_service` — Used by watchdog for restart
- `cli/main.py` — 3 new commands registered: logs (Diagnostics), watch (Lifecycle), install/uninstall (Lifecycle)

### Log Rotation Strategy
- Copytruncate: copy log to archive, gzip compress, truncate original in-place
- Service FDs remain valid: they still point to the same inode, and because logs are opened in append mode, the next write after truncation lands at offset 0
- Naming: service.log.1.gz (most recent) → service.log.N.gz (oldest)
- Archives shifted up before new rotation
- No cooperation needed from child processes (vllm-mlx, litellm)
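
A minimal sketch of the copytruncate steps listed above, not the contents of `core/log_rotation.py`. The function name `rotate` and its parameters are assumptions; only the archive naming and the copy, gzip, truncate order come from the notes.

```python
import gzip
import os
import shutil
from pathlib import Path


def rotate(log_path: Path, max_files: int) -> None:
    """Copytruncate rotation: shift archives, gzip a copy, truncate in place."""
    # Drop the oldest archive, then shift N-1 -> N, ..., 1 -> 2.
    oldest = Path(f"{log_path}.{max_files}.gz")
    if oldest.exists():
        oldest.unlink()
    for i in range(max_files - 1, 0, -1):
        src = Path(f"{log_path}.{i}.gz")
        if src.exists():
            src.rename(Path(f"{log_path}.{i + 1}.gz"))

    # Copy the live log into <name>.1.gz (copy and compress in one pass).
    with open(log_path, "rb") as live, gzip.open(f"{log_path}.1.gz", "wb") as archive:
        shutil.copyfileobj(live, archive)

    # Truncate the original in place so writers keep the same inode.
    os.truncate(log_path, 0)
```

As with any copytruncate scheme, bytes written between the copy and the truncate can be lost; that is the trade-off for not needing the child process to reopen its log.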

### Log Follow Caveat
- `core/log_viewer.py:follow_log` detects truncation when `current_size < position`.
- Edge case: truncate + immediate rewrite back to exactly the previous byte length may not trigger truncation detection (`current_size == position`), so the stream can miss lines until new writes advance file size.
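
The caveat can be reproduced with a small polling helper. This is a sketch under the assumption that following works by comparing file size to the last read position; `read_new` is a hypothetical name, not the signature of `follow_log`.

```python
import os
from pathlib import Path


def read_new(path: Path, position: int) -> tuple[bytes, int]:
    """Return bytes appended since `position`, restarting at 0 after a truncation."""
    current_size = os.path.getsize(path)
    if current_size < position:
        # File shrank: assume it was truncated (for example by rotation).
        position = 0
    with open(path, "rb") as f:
        f.seek(position)
        data = f.read()
    return data, position + len(data)
```

If the file is truncated and rewritten to exactly the previous length, `current_size == position`, so the helper returns no data even though the content changed.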

### Watchdog Architecture
- Single foreground loop (or daemonized with --daemon)
- Polls get_service_status for all services each interval
- Restart trigger: crashed state only (PID file exists, process dead)
- NOT restarted: stopped (no PID file), healthy, degraded
- Flap detection: rolling window of restart timestamps per service
- Lock: acquire_lock only during actual restart, released immediately
- Log rotation: triggered as side-effect of each poll cycle
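
Flap detection with a rolling window of restart timestamps per service could look roughly like the sketch below. The class name, the threshold, and the window length are assumptions; the notes only state that a rolling window is kept per service.

```python
from collections import deque


class FlapDetector:
    """Tracks recent restart timestamps per service (hypothetical helper)."""

    def __init__(self, max_restarts: int, window_seconds: float) -> None:
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._restarts: dict[str, deque[float]] = {}

    def record_restart(self, service: str, now: float) -> None:
        self._restarts.setdefault(service, deque()).append(now)

    def is_flapping(self, service: str, now: float) -> bool:
        window = self._restarts.get(service, deque())
        # Drop timestamps that have fallen out of the rolling window.
        while window and now - window[0] > self.window_seconds:
            window.popleft()
        return len(window) >= self.max_restarts
```

Passing `now` in explicitly keeps the helper deterministic in tests; a watchdog loop would typically pass `time.monotonic()`.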
How the mlx-stack system works at a high level.

## Overview

mlx-stack is a CLI tool that manages local LLM infrastructure on Apple Silicon. It orchestrates vllm-mlx model servers behind a LiteLLM proxy, providing a unified OpenAI-compatible API endpoint.

## Layers

```
CLI Layer (src/mlx_stack/cli/)
├── Commands: setup, up, down, status, models, pull, bench, logs, config, watch, install, uninstall
└── Each command is a Click command registered in main.py

Core Layer (src/mlx_stack/core/)
├── hardware.py — Apple Silicon detection (chip, GPU cores, memory, bandwidth)
├── catalog.py — YAML catalog loading, validation, querying (15 curated models)
├── discovery.py — Live HuggingFace API query for mlx-community models
├── scoring.py — Hardware-aware model recommendation engine
├── onboarding.py — Setup wizard orchestration (scoring variant for DiscoveredModel)
├── stack_init.py — Stack definition generation (stack.yaml + litellm.yaml)
├── litellm_gen.py — LiteLLM proxy config generation
├── stack_up.py — Process management (start/stop vllm-mlx + LiteLLM)
├── pull.py — Model download (HuggingFace snapshot_download)
├── benchmark.py — Performance benchmarking
├── watchdog.py — Health monitoring + auto-restart
├── launchd.py — macOS LaunchAgent management
├── config.py — User config (~/.mlx-stack/config.yaml)
├── paths.py — Path resolution for data/config/stacks
└── process.py — Low-level process management

Data Layer (src/mlx_stack/data/)
├── catalog/*.yaml — Curated model entries (15 files)
└── benchmark_data.json — Static performance overlay from mlx_transformers_benchmark
```

## Data Flow

1. **Hardware detection** → `HardwareProfile` (chip, memory, bandwidth, GPU cores)
2. **Model discovery** → `CatalogEntry` (from YAML catalog) or `DiscoveredModel` (from HF API)
3. **Scoring** → `ScoredModel` / `ScoredDiscoveredModel` with composite scores
4. **Tier assignment** → `TierAssignment` (model → tier name mapping)
5. **Config generation** → `stack.yaml` (tier definitions) + `litellm.yaml` (proxy config)
6. **Process management** → vllm-mlx subprocesses + LiteLLM proxy process

## Stack Tier Field Semantics

- `stack.yaml` tier objects use:
  - `name`: tier identifier (e.g., `standard`, `fast`, `reasoning`)
  - `model`: canonical model identifier used by mlx-stack logic
  - `source`: concrete model source for runtime/download
- For catalog-backed tiers, keep `model` as the catalog model ID (for example `qwen3.5-8b`) rather than a display label, and keep the resolved Hugging Face repo in `source`.
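
As an illustration of the three fields, here is a small sketch of a tier object. The `Tier` dataclass and the `source` value are hypothetical; only the field names and the `qwen3.5-8b` model ID come from the notes above.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Tier:
    name: str    # tier identifier, e.g. "standard"
    model: str   # canonical model ID used by mlx-stack logic
    source: str  # concrete source used for runtime/download


# The source repo below is a placeholder, not a real catalog mapping.
tier = Tier(
    name="standard",
    model="qwen3.5-8b",
    source="example-org/example-model-4bit",
)
tier_dict = asdict(tier)
```

Keeping `model` as the catalog ID means lookups by ID keep working even when the resolved repo in `source` changes.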

## Key Files for This Mission

- `cli/main.py` — Command registration, `_COMMAND_CATEGORIES`, welcome screen, help formatting
- `cli/pull.py` — Pull command (being ungated to accept HF repos)
- `cli/status.py` — Status command (absorbing hardware display from profile)
- `cli/models.py` — Models command (absorbing recommend functionality)
- `cli/setup.py` — Setup command (gaining modification flags)
- `cli/profile.py` — Being DELETED
- `cli/recommend.py` — Being DELETED
- `cli/init.py` — Being DELETED
- `core/pull.py` — Download infrastructure (already accepts arbitrary HF repos)
- `core/stack_init.py` — Config generation (preserved for internal use by setup)
- `core/onboarding.py` — Setup wizard orchestration

## Testing Patterns

- All CLI tests use Click's `CliRunner().invoke(cli, ["command", ...])`
- Core functions mocked via `@patch("mlx_stack.core.module.function")` or `monkeypatch.setattr`
- `FakeServiceLayer` test double for stack_up/watchdog tests
- Test factories in `tests/factories.py` for creating test data
- No real HF downloads, no real hardware detection in unit tests
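
The mocking pattern can be shown with the standard library alone. In this sketch `detect_chip` is a hypothetical helper, not a function from `core/hardware.py`, and Click's `CliRunner` is left out to keep the example dependency-free.

```python
import subprocess
from unittest.mock import patch


def detect_chip() -> str:
    """Hypothetical helper: ask sysctl for the CPU brand string."""
    result = subprocess.run(
        ["sysctl", "-n", "machdep.cpu.brand_string"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def test_detect_chip_is_mocked() -> None:
    fake = subprocess.CompletedProcess(
        args=[], returncode=0, stdout="Apple M5 Max\n", stderr=""
    )
    # Patch subprocess.run so no real hardware detection happens.
    with patch("subprocess.run", return_value=fake) as run:
        assert detect_chip() == "Apple M5 Max"
    run.assert_called_once()
```

The same shape applies when patching a core function by its dotted path, as in `@patch("mlx_stack.core.module.function")`.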

## Operational Constraint: Service Name Safety

- Service names are reused as PID/log filename stems by `core/process.py` (`pid_file` and log path construction).
- Any dynamically generated `service_name` must be filesystem-safe (no path separators such as `/`); otherwise, startup of the temporary process can fail before health checks run.
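
One way to satisfy this constraint is to replace path separators before the name is used as a filename stem. The function name and the `--` replacement below are assumptions for illustration; the replacement actually used in the fix is not shown in this diff.

```python
from pathlib import Path


def safe_service_name(name: str, replacement: str = "--") -> str:
    """Replace path separators so the name works as a filename stem."""
    return name.replace("/", replacement).replace("\\", replacement)


# Hypothetical HF-style repo string and PID directory.
raw = "mlx-community/some-model"
safe = safe_service_name(raw)
pid_file = Path("pids") / f"{safe}.pid"
```

Without sanitizing, `Path("pids") / "mlx-community/some-model.pid"` would point into a subdirectory that may not exist, which is the failure mode described above.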
43 changes: 29 additions & 14 deletions .factory/library/environment.md
@@ -7,17 +7,32 @@ Environment variables, external dependencies, and setup notes.

---

## Machine
- Apple MacBook Pro M5 Max, 128 GB unified memory, 18 CPU cores, 40 GPU cores
- macOS 26.x
- Python 3.14.3 (targeting 3.13+ compatibility)

## Tools
- uv 0.10.12 (package manager)
- vllm-mlx v0.2.6 (installed as uv tool at ~/.local/bin/vllm-mlx)
- litellm (installed as uv tool at ~/.local/bin/litellm)
- For robust `uv tool list` parsing, set `NO_COLOR=1` when invoking uv to avoid ANSI escape sequences in output

## External Dependencies
- HuggingFace Hub (for model downloads — optional HF_TOKEN for rate limiting)
- OpenRouter API (optional, for cloud fallback — key stored in ~/.mlx-stack/config.yaml)
## Python Environment

- Python 3.14+ via `uv`
- All dependencies managed by `uv sync --dev`
- Virtual environment at `.venv/` (created by uv)

## Key Dependencies

- `click` — CLI framework
- `rich` — Terminal UI (tables, colors, progress)
- `pyyaml` — YAML parsing
- `huggingface_hub` — HF API + model downloads
- `pytest` + `pytest-cov` — Testing
- `ruff` — Linting
- `pyright` — Type checking

## Environment Variables

- `MLX_STACK_HOME` — Override data directory (default: `~/.mlx-stack/`). Used extensively in tests via `mlx_stack_home` fixture.

## Data Directories

- `~/.mlx-stack/` — User data home
- `~/.mlx-stack/stacks/default.yaml` — Stack definition
- `~/.mlx-stack/litellm.yaml` — LiteLLM proxy config
- `~/.mlx-stack/profile.json` — Hardware profile
- `~/.mlx-stack/config.yaml` — User configuration
- `~/.mlx-stack/models/` — Downloaded model files
- `~/.mlx-stack/benchmarks/` — Saved benchmark results
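
A minimal sketch of how the `MLX_STACK_HOME` override might feed path resolution. The function names are hypothetical, and it is an assumption that every path listed above moves under the override directory.

```python
import os
from pathlib import Path


def get_home() -> Path:
    """Return the data home, honoring MLX_STACK_HOME when it is set."""
    override = os.environ.get("MLX_STACK_HOME")
    return Path(override) if override else Path.home() / ".mlx-stack"


def stack_path() -> Path:
    """Location of the default stack definition under the data home."""
    return get_home() / "stacks" / "default.yaml"
```

Reading the variable on every call, rather than caching it at import time, lets a test fixture point each test at its own temporary directory.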