- Root-cause the repository-wide
make swift-testhang inservices/control-plane-swiftand restore clean default verification. Latest evidence on 2026-04-06:packages/protocol/swiftandservices/mlx-text-worker-swiftcompleted, then the fullservices/control-plane-swiftpackage stopped producing output whileswift testandswiftpm-testing-helpersat idle at0.0%CPU until termination. Rerunmake swift-testafter the fix.
- Closed the semantic evaluation controls and executable code-evaluation slice for the Python
worker so evaluation controls now affect runtime behavior instead of only persisted metadata:
- seeded evaluation planning now deterministically orders packaged samples, slices few-shot
demonstrations from that same plan, and excludes demo rows from scored
sample_size - few-shot examples are now rendered into evaluation prompts, and compare runs reuse the same seeded demo plus sample plan across base and target targets
scoring_modenow dispatches real scorers, including multiple-choice choice-resolution, exact-match scoring, and executablepass_at_1- unsupported scorer and
code_exec_policycombinations now fail as typed invalid arguments instead of silently persisting inert metadata - default offline evaluation no longer reports evidence-bearing synthetic success when no live model is available
- added
code_eval_runner.pysohumanevalandmbppexecute candidate Python code, persist execution diagnostics, and exportexecution_statusplusexecution_metadata - updated evaluation persistence, export normalization, release-gate evaluation evidence, and maintenance-service error mapping so the surrounding productization path matches the new worker semantics
- refreshed the benchmark/evaluation contract, operator runbook, roadmap checklist, and local task-plan records to document the shipped behavior
- seeded evaluation planning now deterministically orders packaged samples, slices few-shot
demonstrations from that same plan, and excludes demo rows from scored
- Verification summary for the semantic evaluation controls slice:
PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_evaluation_core.py -q:28 passed in 0.41sPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_evaluation_schemas.py services/mlx-worker-python/tests/test_evaluation_store.py services/mlx-worker-python/tests/test_benchmark_export.py -q:34 passed in 0.07sPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_maintenance_service.py -k 'evaluation or export' -q:10 passed, 72 deselected in 0.33sPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_release_gates.py -q:26 passed in 0.41smake py-test:692 passed in 13.01sgit diff --check: passPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python --extra mlx coverage run --data-file /tmp/semantic_eval_controls.coverage --source=services/mlx-worker-python/worker,services/mlx-worker-python/tests -m pytest services/mlx-worker-python/tests/test_evaluation_core.py services/mlx-worker-python/tests/test_evaluation_schemas.py services/mlx-worker-python/tests/test_evaluation_store.py services/mlx-worker-python/tests/test_benchmark_export.py services/mlx-worker-python/tests/test_maintenance_service.py services/mlx-worker-python/tests/test_release_gates.py -q:172 passed in 1.91sPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python --extra mlx coverage json --data-file /tmp/semantic_eval_controls.coverage -o /tmp/semantic_eval_controls_coverage.json: passpython3 scripts/python_changed_line_coverage.py --coverage-json /tmp/semantic_eval_controls_coverage.json services/mlx-worker-python/worker/engine/code_eval_runner.py services/mlx-worker-python/worker/engine/evaluation_core.py services/mlx-worker-python/worker/grpc_server.py services/mlx-worker-python/worker/productization/benchmark_export.py services/mlx-worker-python/worker/productization/evaluation_compare.py services/mlx-worker-python/worker/productization/evaluation_schemas.py services/mlx-worker-python/worker/productization/evaluation_store.py services/mlx-worker-python/worker/productization/release_gates.py services/mlx-worker-python/tests/test_benchmark_export.py services/mlx-worker-python/tests/test_evaluation_core.py services/mlx-worker-python/tests/test_evaluation_schemas.py services/mlx-worker-python/tests/test_evaluation_store.py services/mlx-worker-python/tests/test_maintenance_service.py services/mlx-worker-python/tests/test_release_gates.py:97.12%(303/312)
- Metrics report for the semantic evaluation controls slice:
- Python changed-line coverage across the touched evaluation, export, maintenance, and
release-gate scope:
97.12%(303/312) - full worker Python regression command for this repository slice:
make py-testpassed with692tests
- Python changed-line coverage across the touched evaluation, export, maintenance, and
release-gate scope:
- Post-review hardening for the semantic evaluation controls slice closed the unresolved executor
safety threads on PR
#6:sandboxednow uses a real macOSsandbox-execprofile instead of a plainpython -Isubprocess, so candidate code is confined to a dedicated temporary directory with network denied and writes limited to that directory- stdout plus stderr are now redirected to bounded files instead of unbounded in-memory buffers,
and runs that hit the limit are failed explicitly with
output_limit_exceeded - the harness now persists its payload to a sidecar JSON file so candidate stdout cannot corrupt result parsing when it omits a trailing newline
- code-suite runs now fail fast when the worker cannot enforce
sandboxed, and the benchmark contract plus runbook now document the concrete boundary
- Verification summary for the semantic evaluation controls review hardening:
PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python --extra mlx pytest services/mlx-worker-python/tests/test_code_eval_runner.py -q:11 passed in 0.24sPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python --extra mlx pytest services/mlx-worker-python/tests/test_code_eval_runner.py services/mlx-worker-python/tests/test_evaluation_core.py services/mlx-worker-python/tests/test_evaluation_schemas.py services/mlx-worker-python/tests/test_evaluation_store.py services/mlx-worker-python/tests/test_benchmark_export.py services/mlx-worker-python/tests/test_maintenance_service.py services/mlx-worker-python/tests/test_release_gates.py -q:184 passed in 1.77smake py-test:699 passed in 13.69sgit diff --check: passPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python --extra mlx coverage run --data-file /tmp/semantic_eval_review.coverage --source=services/mlx-worker-python/worker,services/mlx-worker-python/tests -m pytest services/mlx-worker-python/tests/test_code_eval_runner.py services/mlx-worker-python/tests/test_evaluation_core.py services/mlx-worker-python/tests/test_evaluation_schemas.py services/mlx-worker-python/tests/test_evaluation_store.py services/mlx-worker-python/tests/test_benchmark_export.py services/mlx-worker-python/tests/test_maintenance_service.py services/mlx-worker-python/tests/test_release_gates.py -q:184 passed in 2.11sPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python --extra mlx coverage json --data-file /tmp/semantic_eval_review.coverage -o /tmp/semantic_eval_review_coverage.json: passpython3 scripts/python_changed_line_coverage.py --coverage-json /tmp/semantic_eval_review_coverage.json services/mlx-worker-python/worker/engine/code_eval_runner.py services/mlx-worker-python/worker/engine/evaluation_core.py services/mlx-worker-python/tests/test_code_eval_runner.py services/mlx-worker-python/tests/test_evaluation_core.py:97.48%(116/119)
- Metrics report for the semantic evaluation controls review hardening:
- Python changed-line coverage across the touched handwritten executor and evaluation-core scope:
97.48%(116/119) - full worker Python regression command for this follow-up slice:
make py-testpassed with699tests
- Python changed-line coverage across the touched handwritten executor and evaluation-core scope:
-
Closed the executable-code evaluation slice for
humanevalandmbppso Melix now treats repository-owned Python code execution as a first-class evaluation path instead of a text-only approximation:- added checked-in
humaneval.dev.v1andmbpp.dev.v1fixture packages underservices/mlx-worker-python/fixtures/evaluation/ - gated executable-code suites behind
code_exec_policy=sandboxed - added Python candidate execution with compile/runtime/timeout/test evidence persisted on sample records
- preserved executable-code evidence through
eval comparesample generation and export-bundle normalization - added dedicated public CLI compare export commands for summary CSV, samples CSV, and samples JSONL
- updated the canonical benchmark/evaluation contract and runbook for executable-code suites, compare exports, and checked-in dev fixtures
- added checked-in
-
Verification summary for the executable-code evaluation slice:
PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_evaluation_core.py services/mlx-worker-python/tests/test_evaluation_store.py services/mlx-worker-python/tests/test_benchmark_export.py services/mlx-worker-python/tests/test_maintenance_service.py -q:141 passed in 1.44sHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --filter 'MelixCLIParserTests|MelixCLIRunnerTests':112 tests in 3 suites passed after 0.058 secondsHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/services/control-plane-swift/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --enable-code-coverage --filter BenchmarkExportBundleTests:12 tests in 1 suite passed after 0.002 secondsgit diff --check: pass
-
Metrics report for the executable-code evaluation slice:
- Python changed-line coverage:
99.10%(220/222) - Swift CLI changed-line coverage:
97.77%(263/269) - Swift control-plane changed-line coverage:
100.00%(266/266) - Aggregate measurable changed-line coverage across the touched executable-code evaluation slice:
98.94%(749/757)
- Python changed-line coverage:
-
Realigned the public documentation entrypoints so the repository now presents Melix as a project first and an engineering archive second:
- rewrote
README.mdaround product narrative, target users, LoRA and benchmark motivation, quick start, and contribution entrypoints - replaced the old plan-heavy
docs/README.mdwith a grouped navigation page that separates product status, onboarding, runbooks, canonical specs, decisions, and historical plans - added
docs/getting-started.md,docs/contributing.md, anddocs/current-status.mdso the current shipped scope is easier to understand without reading runbooks or archived plans first - rewrote
docs/phase-roadmap.mdas a truthful closure summary of the original Phase0-8model and pointed readers to the milestone execution index for detailed historical coverage
- rewrote
-
Verification summary for the documentation realignment:
git diff --check: pass
-
Metrics report for the documentation realignment:
N/Abecause this transaction is documentation-only; verification is markdown hygiene plus alignment against existing repository sources, progress logs, runbooks, and roadmap records
-
Closed the Phase 8 Stage 5 Window UI evidence slice so the native macOS menubar app now produces repository-owned live acceptance evidence from the same CLI-backed workflows used by the Stage 3 CLI contract:
- added a dedicated
MELIX_PHASE8_WINDOW_UI_ACCEPTANCE=1app entrypoint somelix-menubarcan be invoked non-interactively, emit snake_case JSON tostdout, and report localized acceptance failures tostderr - kept the Window UI as a thin wrapper over the CLI by routing LoRA train and activate,
benchmark, matrix benchmark, evaluation, and export flows directly through the subprocess CLI
workflow runner instead of depending on
RuntimeViewModelfallback model selection or history refresh state - hardened adapter-manifest resolution for LoRA acceptance by preferring
artifact_path, falling back to JSONoutput_path, and otherwise derivingtrain_lora.adapter.jsonfrom the emitted weights path - switched live screenshot capture from
ImageRendererto anNSHostingViewbitmap snapshot and pinned the captured surface toServer, fixing the unreadable placeholder capture and producing a readable native desktop screenshot - expanded positive and negative Swift coverage around the acceptance entrypoint bootstrap, default wiring, subprocess stderr/stdout behavior, and the default runner-factory failure path
- added a dedicated
-
Verification summary for Phase 8 Stage 5:
HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/apps/macos-menubar/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --enable-code-coverage --filter 'MelixSubprocessCLIWorkflowRunnerTests|Phase8WindowUIAcceptanceRunnerTests|AppMainBootstrapTests':46 tests in 3 suites passed after 0.431 secondspython3 scripts/swift_changed_line_coverage.py --binary apps/macos-menubar/.build/arm64-apple-macosx/debug/MelixMacOSMenubarPackageTests.xctest/Contents/MacOS/MelixMacOSMenubarPackageTests --profdata apps/macos-menubar/.build/arm64-apple-macosx/debug/codecov/default.profdata apps/macos-menubar/Sources/AppMain/AppMain.swift apps/macos-menubar/Sources/AppMain/Acceptance/Phase8WindowUIAcceptanceRunner.swift apps/macos-menubar/Sources/AppMain/CLI/MelixCLIWorkflowRunning.swift apps/macos-menubar/Sources/AppMain/CLI/MelixSubprocessCLIWorkflowRunner.swift apps/macos-menubar/Tests/MenuBarTests/AppMainBootstrapTests.swift apps/macos-menubar/Tests/MenuBarTests/DesktopFoundationViewTests.swift apps/macos-menubar/Tests/MenuBarTests/MelixSubprocessCLIWorkflowRunnerTests.swift:95.22%(1354/1422)PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python --extra mlx pytest tests/integration/test_phase8_window_ui_acceptance.py -q:2 passed in 292.65s (0:04:52)PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python --extra mlx coverage run --data-file /tmp/p8_s5_py.coverage --source=tests/integration -m pytest tests/integration/test_phase8_window_ui_acceptance.py -q:2 passed in 260.51s (0:04:20)PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python --extra mlx coverage json --data-file /tmp/p8_s5_py.coverage -o /tmp/p8_s5_py_coverage.json: passpython3 scripts/python_changed_line_coverage.py --coverage-json /tmp/p8_s5_py_coverage.json tests/integration/test_phase8_window_ui_acceptance.py:N/Afor Stage 5 Python changed-line coverage because the diff-based helper reported100.00%(0/0) executable changed lines for the newly added deterministic E2E filesource "/Users/ChenYu/Library/Application Support/Melix/melix-product-env.sh" && MELIX_HOME="/Users/ChenYu/Library/Application Support/Melix" MELIX_CLI="$(pwd)/.build/arm64-apple-macosx/debug/melix" MELIX_REPO_ROOT="$(pwd)" MELIX_PHASE8_WINDOW_UI_ACCEPTANCE=1 MELIX_PHASE8_WINDOW_UI_ACCEPTANCE_TIMESTAMP=2026-04-09T192003Z MELIX_PHASE8_WINDOW_UI_ACCEPTANCE_MODEL_ID=mlx-community/Qwen3.5-0.8B-OptiQ-4bit MELIX_PHASE8_WINDOW_UI_ACCEPTANCE_CLI_BUNDLE_PATH="/Users/ChenYu/Library/Application Support/Melix/acceptance/phase8/cli/2026-04-09T162920Z/bundle.json" apps/macos-menubar/.build/arm64-apple-macosx/debug/melix-menubar: pass with a real evidence bundle at/Users/ChenYu/Library/Application Support/Melix/acceptance/phase8/window-ui/2026-04-09T192003Z/bundle.jsonand screenshot at/Users/ChenYu/Library/Application Support/Melix/acceptance/phase8/window-ui/2026-04-09T192003Z/window-ui.png
-
Metrics report for Phase 8 Stage 5:
- Window UI CLI-backed touched-scope changed-line coverage:
95.22%(1354/1422) - Python deterministic Window UI E2E changed-line coverage:
N/Abecause the touched Python path for Stage 5 is a newly added deterministic E2E file and the diff-based helper reported0/0executable changed lines - live Window UI evidence captured in
/Users/ChenYu/Library/Application Support/Melix/acceptance/phase8/window-ui/2026-04-09T192003Z/bundle.jsonrecords:model_id = mlx-community/Qwen3.5-0.8B-OptiQ-4bitderived_model_id = mlx-community/Qwen3.5-0.8B-OptiQ-4bit-lora-d479ed9dbase_chat_assistant_text = BASE_OKderived_chat_assistant_text = DERIVED_OKlora_train_job_id = model-ops-0137lora_activate_job_id = model-ops-0141bench_job_id = model-ops-0149bench_matrix_job_id = model-ops-0154evaluation_job_id = eval-0004ui_state.selected_surface = Serverui_state.selected_server_session_id = server-session-1phase8.ui.managed_materialize_ms = 2001.05phase8.ui.session_rebind_ms = 452.72phase8.ui.base_chat_roundtrip_ms = 2682.14phase8.ui.lora_train_ms = 2391.22phase8.ui.lora_activate_ms = 2569.48phase8.ui.derived_chat_roundtrip_ms = 2912.88phase8.ui.bench_run_ms = 4466.41phase8.ui.bench_matrix_run_ms = 6169.73phase8.ui.evaluation_run_ms = 3880.67phase8.ui.snapshot_render_ms = 286.14phase8.ui.cli_bridge_ms = 27526.31
- Window UI CLI-backed touched-scope changed-line coverage:
-
Closed the Phase 8 Stage 4 Window UI shell slice so the macOS menubar app now routes the remaining Phase 8 write-path workflows through the shipping
melixCLI instead of keeping a second in-process workflow authority:- added
MelixCLIWorkflowRunning,MelixCLIProcessExecuting, andMelixSubprocessCLIWorkflowRunnerso the app can shell out to the bundledmelixexecutable, decode typed JSON receipts, and surface stable typed subprocess failures into the native UI state - switched the default
AppMainbootstrap to inject the subprocess-backed CLI workflow runner in production while keeping fake and in-process runners available for tests - updated
RuntimeViewModelso managed Hub download, local import, server-session create/select and start, LoRA train and activate, benchmark, matrix benchmark, evaluation, and export all prefer the CLI-first workflow path when the menubar app has a CLI workflow runner - kept server-session rebinding and derived-model activation aligned with the CLI contract by
allowing activation fallback to the latest trained adapter output path and by not immediately
overwriting CLI-projected lifecycle state with a stale direct refresh after
lora activate - expanded positive and negative Swift tests around subprocess decoding, process failures, bootstrap wiring, server-session mutation failures, train and activate failures, and the remaining CLI-backed Window UI workflow guard rails
- added
-
Verification summary for Phase 8 Stage 4:
HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/apps/macos-menubar/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --enable-code-coverage --filter 'RuntimeViewModelTests|AppMainBootstrapTests|MelixSubprocessCLIWorkflowRunnerTests':206 tests in 3 suites passed after 1.044 secondsafter a fresh coverage-enabled rebuildpython3 scripts/swift_changed_line_coverage.py --binary apps/macos-menubar/.build/arm64-apple-macosx/debug/MelixMacOSMenubarPackageTests.xctest/Contents/MacOS/MelixMacOSMenubarPackageTests --profdata apps/macos-menubar/.build/arm64-apple-macosx/debug/codecov/default.profdata apps/macos-menubar/Sources/AppMain/CLI/MelixCLIWorkflowRunning.swift apps/macos-menubar/Sources/AppMain/CLI/MelixCLIProcessExecutor.swift apps/macos-menubar/Sources/AppMain/CLI/MelixSubprocessCLIWorkflowRunner.swift apps/macos-menubar/Sources/AppMain/AppMain.swift apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift apps/macos-menubar/Tests/MenuBarTests/TestSupport.swift apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift apps/macos-menubar/Tests/MenuBarTests/MelixSubprocessCLIWorkflowRunnerTests.swift:95.93%(1273/1327)
-
Metrics report for Phase 8 Stage 4:
- Window UI CLI-shell touched-scope changed-line coverage:
95.93%(1273/1327) - Stage 4 leaves real Window UI screenshot capture and acceptance-bundle emission to Stage 5; no live UI evidence path is claimed in this transaction
- Window UI CLI-shell touched-scope changed-line coverage:
-
Closed the Phase 8 Stage 3 CLI acceptance slice so the public
melixcontract now closes the deterministic LoRA, derived-chat, benchmark, matrix benchmark, evaluation, export, and evidence bundle path in one repository-owned runner:- added
scripts/phase8_acceptance_bundle.pyplusmake phase8-acceptanceso one CLI-owned entrypoint materializes or imports the model, rebinds the server session, runs base and derived chats, executes LoRA train plus activate, runs bench, matrix bench, and eval, exports the resulting artifacts, and writes a machine-readable evidence bundle underMELIX_HOME/acceptance/phase8/cli/<timestamp>/ - added deterministic LoRA and benchmark fixtures in the Python worker so the Stage 3 E2E can prove the full CLI orchestration path without a live network dependency
- fixed the Swift process bridge deadlock for large unary payloads by draining
stdoutandstderrconcurrently, and added a regression test that provesexport-resultsstyle payloads no longer hang - compacted the deterministic text backend to emit one chunk per response so deterministic matrix benchmark acceptance finishes in seconds instead of appearing stalled for ~80 seconds
- fixed local-product launch-agent rendering so the Python worker launch agent resolves the
absolute
uvexecutable path at install time instead of depending on launchd's defaultPATHcontaininguv; this unblocked the real CLI acceptance run on the local product install - expanded positive and negative Python unit coverage around acceptance-bundle parsing, subprocess failures, helper validation, deterministic LoRA artifact materialization, and deterministic benchmark dataset fetches, while keeping the deterministic CLI E2E as the end- to-end closure for base chat, derived chat, LoRA, bench, eval, and export outputs
- added
-
Verification summary for Phase 8 Stage 3:
swift test --package-path services/control-plane-swift --enable-code-coverage --filter PythonBridgeWorkerClientTests:52 tests in 1 suite passed after 0.847 secondspython3 scripts/swift_changed_line_coverage.py --binary services/control-plane-swift/.build/arm64-apple-macosx/debug/MelixControlPlanePackageTests.xctest/Contents/MacOS/MelixControlPlanePackageTests --profdata services/control-plane-swift/.build/arm64-apple-macosx/debug/codecov/default.profdata services/control-plane-swift/Sources/WorkerClient/PythonBridgeWorkerClient.swift services/control-plane-swift/Tests/WorkerClientTests/PythonBridgeWorkerClientTests.swift:100.00%(40/40)PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python --extra mlx pytest services/mlx-worker-python/tests/test_runtime_edges.py tests/test_phase8_acceptance_bundle.py -q:47 passed in 0.22sPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python --extra mlx pytest services/mlx-worker-python/tests/test_install_assets.py services/mlx-worker-python/tests/test_install_local_product_script.py -q:13 passed in 0.05sPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python --extra mlx coverage run --data-file /tmp/p8_s3_py.coverage --source=scripts,services/mlx-worker-python/worker,tests -m pytest services/mlx-worker-python/tests/test_runtime_edges.py services/mlx-worker-python/tests/test_deterministic_backend.py services/mlx-worker-python/tests/test_install_assets.py services/mlx-worker-python/tests/test_install_local_product_script.py tests/test_phase8_acceptance_bundle.py tests/integration/test_phase8_cli_acceptance.py -q:69 passed in 55.96sPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python --extra mlx coverage json --data-file /tmp/p8_s3_py.coverage -o /tmp/p8_s3_py_coverage.json: passpython3 scripts/python_changed_line_coverage.py --coverage-json /tmp/p8_s3_py_coverage.json scripts/phase8_acceptance_bundle.py services/mlx-worker-python/worker/grpc_server.py services/mlx-worker-python/worker/model_ops/deterministic_lora_runner.py services/mlx-worker-python/worker/productization/install_assets.py services/mlx-worker-python/worker/runtime/deterministic_backend.py services/mlx-worker-python/tests/test_runtime_edges.py services/mlx-worker-python/tests/test_deterministic_backend.py services/mlx-worker-python/tests/test_install_assets.py tests/test_phase8_acceptance_bundle.py tests/integration/test_phase8_cli_acceptance.py:99.47%(564/567)python3 scripts/install_local_product.py --json: passlaunchctl bootstrap gui/501 /Users/ChenYu/Library/LaunchAgents/io.melix.swift-text-worker.plist: passlaunchctl bootstrap gui/501 /Users/ChenYu/Library/LaunchAgents/io.melix.python-worker.plist: pass after theinstall_assets.pyabsolute-uvfix and alaunchctl bootout/bootstraprestart cyclelaunchctl bootstrap gui/501 /Users/ChenYu/Library/LaunchAgents/io.melix.control-plane.plist: passsource "/Users/ChenYu/Library/Application Support/Melix/melix-product-env.sh" && MELIX_HOME="/Users/ChenYu/Library/Application Support/Melix" MELIX_CLI="$(pwd)/.build/arm64-apple-macosx/debug/melix" make phase8-acceptance PHASE8_ACCEPTANCE_ARGS="--live --model-id mlx-community/Qwen3.5-0.8B-OptiQ-4bit --training-fixture services/mlx-worker-python/fixtures/training/melix-dev-dataset.v1 --bench-suite smoke --bench-suite latency --matrix-suite smoke --evaluation-suite mmlu --evaluation-dataset mmlu.dev.v1 --server-session-id server-session-1 --json": pass with a real evidence bundle at/Users/ChenYu/Library/Application Support/Melix/acceptance/phase8/cli/2026-04-09T162920Z/bundle.json
-
Metrics report for Phase 8 Stage 3:
- Swift process-bridge touched-scope changed-line coverage:
100.00%(40/40) - Python worker, acceptance runner, and deterministic CLI E2E touched-scope changed-line
coverage:
99.47%(564/567) - the Stage 3 acceptance bundle now records repository-owned orchestration probes:
phase8.cli.managed_materialize_msphase8.cli.session_rebind_msphase8.cli.base_chat_roundtrip_msphase8.cli.derived_chat_roundtrip_msphase8.cli.chat_roundtrip_msphase8.cli.lora_train_msphase8.cli.lora_activate_msphase8.cli.bench_run_msphase8.cli.bench_matrix_run_msphase8.cli.evaluation_run_msphase8.cli.acceptance_bundle_write_ms
- live CLI evidence captured in
/Users/ChenYu/Library/Application Support/Melix/acceptance/phase8/cli/2026-04-09T162920Z/bundle.jsonrecords:base chat assistant_text = BASE_OKderived chat assistant_text = Derived_OKlora_train_job_id = model-ops-0013bench_job_id = model-ops-0024bench_matrix_job_id = model-ops-0029evaluation_job_id = eval-0001phase8.cli.managed_materialize_ms = 129039.66phase8.cli.lora_train_ms = 9959.67phase8.cli.bench_run_ms = 11100.63phase8.cli.bench_matrix_run_ms = 7424.38phase8.cli.evaluation_run_ms = 4295.66
- the deterministic Stage 3 E2E now proves the full CLI contract
model import -> registry rescan -> server start -> base chat -> lora train -> lora activate -> derived chat -> bench -> matrix bench -> eval -> exportand verifies the emitted evidence bundle paths exist
- Swift process-bridge touched-scope changed-line coverage:
-
Closed the Phase 8 Stage 1 CLI materialization slice so managed hub downloads and local-path imports now share one machine-readable receipt contract and a deterministic CLI acceptance path:
- added
melix model importparser and runner support, including a shared managed-model receipt renderer formodel hub download --jsonandmodel import --json - added Python worker-side
local_importmaterialization, receipt metadata, and maintenance-core routing for managed local imports - added the training fixture
melix-dev-dataset.v1, positive and negative Swift/Python unit coverage, a control-plane regression for unknown imported model ids, and deterministic CLI E2E coverage for local import plus registry visibility - normalized malformed managed-manifest parsing into a stable CLI runtime error so negative CLI validation does not leak raw Foundation JSON errors
- added
-
Verification summary for Phase 8 Stage 1:
HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --filter 'MelixCLIParserTests|MelixCLIRunnerTests':82 tests in 2 suites passed after 0.014 secondspython3 scripts/swift_changed_line_coverage.py --binary .build/arm64-apple-macosx/debug/melixPackageTests.xctest/Contents/MacOS/melixPackageTests --profdata .build/arm64-apple-macosx/debug/codecov/default.profdata Sources/MelixCLICore/MelixCLI.swift tests/MelixCLITests/MelixCLIParserTests.swift tests/MelixCLITests/MelixCLIRunnerTests.swift:98.80%(330/334)swift test --package-path services/control-plane-swift --enable-code-coverage --filter ControlPlaneServiceTests:179 tests in 1 suite passed after 0.105 secondspython3 scripts/swift_changed_line_coverage.py --binary services/control-plane-swift/.build/arm64-apple-macosx/debug/MelixControlPlanePackageTests.xctest/Contents/MacOS/MelixControlPlanePackageTests --profdata services/control-plane-swift/.build/arm64-apple-macosx/debug/codecov/default.profdata services/control-plane-swift/Sources/XPCService/ControlPlaneService.swift services/control-plane-swift/Tests/ControlPlaneTests/ControlPlaneServiceTests.swift:96.00%(72/75)PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python --extra mlx coverage run --data-file /tmp/p8_s1_py.coverage --source=services/mlx-worker-python/worker,tests/integration -m pytest services/mlx-worker-python/tests/test_maintenance_service.py tests/integration/test_phase8_cli_acceptance.py -q:82 passed in 117.74s (0:01:57)PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python --extra mlx coverage json --data-file /tmp/p8_s1_py.coverage -o /tmp/p8_s1_py_coverage.json: passpython3 scripts/python_changed_line_coverage.py --coverage-json /tmp/p8_s1_py_coverage.json services/mlx-worker-python/worker/model_ops/download_pipeline.py services/mlx-worker-python/worker/model_ops/local_import_pipeline.py services/mlx-worker-python/worker/engine/maintenance_core.py services/mlx-worker-python/tests/test_maintenance_service.py tests/integration/test_phase8_cli_acceptance.py:95.10%(136/143)
-
Metrics report for Phase 8 Stage 1:
- CLI touched-scope changed-line coverage:
98.80%(330/334) - control-plane touched-scope changed-line coverage:
96.00%(72/75) - Python worker plus deterministic CLI E2E touched-scope changed-line coverage:
95.10%(136/143) - deterministic CLI E2E now proves the Stage 1 contract without a full control-plane stack by
booting only the Python model-ops worker subprocess and exercising
model importplus registry visibility through themelixCLI surface
- CLI touched-scope changed-line coverage:
-
Closed the Phase 8 Stage 2 CLI session-rebinding and base-chat slice so the approved Stage 1 managed-model receipt can drive one deterministic text-serving acceptance path entirely through the public
melixCLI contract:- added
melix chat runparser and runner support with typed JSON receipts, plain-text output, stream-fallback transcript collection, and stable runtime errors for failed or empty chat executions - kept the rebinding workflow CLI-first by composing
model roots rescan,server session update,server session select,server start, andchat runinstead of adding a second app-owned binding path - added positive and negative Swift unit coverage for chat parsing and runtime execution, plus positive and negative process-level deterministic CLI E2E coverage for the rebinding path and chat argument validation
- added
-
Verification summary for Phase 8 Stage 2:
HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --filter 'MelixCLIParserTests|MelixCLIRunnerTests':96 tests in 2 suites passed after 0.016 secondspython3 scripts/swift_changed_line_coverage.py --binary .build/arm64-apple-macosx/debug/melixPackageTests.xctest/Contents/MacOS/melixPackageTests --profdata .build/arm64-apple-macosx/debug/codecov/default.profdata Sources/MelixCLICore/MelixCLI.swift tests/MelixCLITests/MelixCLIParserTests.swift tests/MelixCLITests/MelixCLIRunnerTests.swift:99.67%(305/306)PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python --extra mlx pytest tests/integration/test_phase8_cli_acceptance.py -q:4 passed in 54.87sPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python --extra mlx coverage run --data-file /tmp/p8_s2_py.coverage --source=tests/integration -m pytest tests/integration/test_phase8_cli_acceptance.py -q:4 passed in 104.68s (0:01:44)PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python --extra mlx coverage json --data-file /tmp/p8_s2_py.coverage -o /tmp/p8_s2_py_coverage.json: passpython3 scripts/python_changed_line_coverage.py --coverage-json /tmp/p8_s2_py_coverage.json tests/integration/test_phase8_cli_acceptance.py:100.00%(36/36)HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --filter MelixCLITests: blocked by the pre-existingSessionLifecycleSmokeRunnerTestsenvironment failure (requestFailed(code: "unavailable", message: "Model operation worker request failed: unavailable")); the focused Stage 2 changed-scope command above passed and was used for coverage gating
-
Metrics report for Phase 8 Stage 2:
- CLI touched-scope changed-line coverage:
99.67%(305/306) - deterministic CLI E2E touched-scope changed-line coverage:
100.00%(36/36) - deterministic Stage 2 evidence now proves the full rebinding contract without
MELIX_DEV_TEXT_MODEL_PATHby exercisingmodel import -> model roots rescan -> server session update -> server session select -> server start -> chat runthrough the shippingmelixexecutable
- CLI touched-scope changed-line coverage:
-
Audited milestone-bookkeeping accuracy and aligned the roadmap wording with the implemented repository evidence:
- added the missing parent-level status summaries for
M1-M5,M9,M12,M13,M14,M15, andM17in the roadmap execution index - reclassified
M11.4as an evidence-only closure in the execution index and its plan document so the roadmap no longer implies true SSD-backed runtime execution already exists - recorded the current repository-wide
make swift-testhang in an explicit top-level TODO
- added the missing parent-level status summaries for
-
Verification summary for the milestone-bookkeeping audit:
git diff --check: pass
-
Metrics report for the milestone-bookkeeping audit:
N/Abecause the transaction only updates planning and progress documents; no executable scope changed
-
Formalized the parent-level
M6completion state so the execution index no longer leaves the closed quantization milestone unregistered:- added a completed status section to
docs/plans/2026-03-31-m6-completion-closure.md - added a parent-level completed status line to the
M6section in the execution index while leaving child-levelM6.1-M6.11backfill for a later audit
- added a completed status section to
-
Verification summary for the
M6parent-status formalization:git diff --check: pass
-
Metrics report for the
M6parent-status formalization:N/Abecause the transaction only updates planning and progress documents; executable M6 benchmark and locking evidence remains recorded indocs/plans/2026-03-31-m6-completion-closure.md
-
Backfilled the child-level
M7.1-M7.10execution-index statuses so the completed benchmark and evaluation work is represented per child milestone instead of only through the parentM7summary:- added completed status lines for serving schema, evaluation schema, runtime runners, dataset packaging, evaluation coverage, queue and parameter controls, export and comparison, VLM benchmark support, submission and device identity, and release-gate integration
- kept the transaction docs-only and limited it to execution-index accuracy
-
Verification summary for the
M7.1-M7.10child-status backfill:git diff --check: pass
-
Metrics report for the
M7.1-M7.10child-status backfill:N/Abecause the transaction only updates planning and progress documents; executable benchmark and evaluation coverage remains recorded in the underlyingM7progress entries and umbrella execution plans
-
Closed the remaining child-entry bookkeeping gap for
M8.1-M8.4so the execution index no longer relies on the parentM8summary alone to show that the backend foundations are done:- added top-level completed status summaries to the four child plan documents
- added child-level completed status lines to the execution index for
M8.1,M8.2,M8.3, andM8.4 - kept the transaction docs-only so the next milestone audit can focus on actual implementation gaps instead of status drift
-
Verification summary for the
M8.1-M8.4child-entry bookkeeping closure:git diff --check: pass
-
Metrics report for the
M8.1-M8.4child-entry bookkeeping closure:N/Abecause the transaction only updates planning and progress documents; executable changed-line coverage for the original backend-foundations work remains recorded indocs/plans/2026-04-01-m8-1-m8-4-backend-foundations.md
-
Closed the remaining
M13.3bookkeeping gap by aligning the plan document and execution index with the already-landed repository evidence for tooling, embedding, and config-file settings:- added a top-level completed status summary to
docs/plans/2026-03-31-m13-3-tooling-embedding-and-config-file-settings.md - added the missing completed status line to the
M13.3execution-index entry so the roadmap no longer under-reports the landed slice - kept the transaction docs-only and used it as the starting point for the next milestone audit
- added a top-level completed status summary to
-
Verification summary for the
M13.3bookkeeping closure:git diff --check: pass
-
Metrics report for the
M13.3bookkeeping closure:N/Abecause the transaction only updates repository planning and progress documents; no executable scope changed and no additional coverage command is required
-
Closed
M17.4by turning speech support into a repository-owned live-path operator workflow for both transcription and synthesis instead of leaving the speech families at contract-only status:- added lazy-load coverage on
/v1/audio/transcriptionsand/v1/audio/speechso cataloged managed speech models can hydrate runtime-pack plus managed-model metadata and load on demand without bespoke preload wiring - added a repository-owned speech smoke workflow in
scripts/m17_speech_runtime_smoke.pyplusmake phase17-metrics, using reproducible fakemlx_audiofixtures to exerciseWhisper,Parakeet,Kokoro, andQwen3-TTSthrough the real local HTTP path - added a machine-readable speech metrics builder and promoted the speech-family support-matrix
rows from
contract_onlytoverified, with one canonical live-path integration test node attached to the four speech families - added the speech operator-evidence runbook and updated the docs index plus support-matrix guidance so operators can reproduce, diagnose, and compare locale, fallback, and dependency state without source inspection
- added lazy-load coverage on
-
Verification summary for
M17.4:PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$HOME/.cache/uv" uv run --project services/mlx-worker-python --extra mlx python scripts/m17_speech_runtime_smoke.py --json: pass withok: truemake phase17-metrics: pass withspeech.integration_success_rate = 100.0python3 -m py_compile scripts/m17_speech_runtime_smoke.py tests/integration/test_m17_speech_runtime_smoke.py services/mlx-worker-python/worker/productization/acceptance_metrics.py tests/integration/helpers.py: passPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$HOME/.cache/uv" uv run --project services/mlx-worker-python --extra mlx pytest services/mlx-worker-python/tests/test_acceptance_metrics.py tests/integration/test_m17_speech_runtime_smoke.py tests/integration/test_non_text_endpoints.py -q:30 passed in 189.08s (0:03:09)PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$HOME/.cache/uv" uv run --project services/mlx-worker-python --extra mlx coverage run --data-file /tmp/m17_4_py.coverage --source=services/mlx-worker-python/worker,tests/integration,scripts -m pytest services/mlx-worker-python/tests/test_acceptance_metrics.py tests/integration/test_m17_speech_runtime_smoke.py tests/integration/test_non_text_endpoints.py -q && PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$HOME/.cache/uv" uv run --project services/mlx-worker-python --extra mlx coverage json --data-file /tmp/m17_4_py.coverage -o /tmp/m17_4_py_coverage.json && python3 scripts/python_changed_line_coverage.py --coverage-json /tmp/m17_4_py_coverage.json services/mlx-worker-python/worker/productization/acceptance_metrics.py services/mlx-worker-python/worker/productization/__init__.py services/mlx-worker-python/worker/productization/family_support_matrix.py tests/integration/helpers.py scripts/m17_speech_runtime_smoke.py services/mlx-worker-python/tests/test_acceptance_metrics.py tests/integration/test_m17_speech_runtime_smoke.py tests/integration/test_non_text_endpoints.py:30 passed in 188.69s (0:03:08)and changed-line coverage100.00%(16/16)swift test --package-path services/control-plane-swift --filter 'OpenAIHandlerTests':115 tests in 1 suite passed after 0.083 secondsswift test --package-path services/control-plane-swift --filter 'OpenAIHandlerTests' --enable-code-coverage:115 tests in 1 suite passedpython3 scripts/swift_changed_line_coverage.py --binary services/control-plane-swift/.build/arm64-apple-macosx/debug/MelixControlPlanePackageTests.xctest/Contents/MacOS/MelixControlPlanePackageTests --profdata services/control-plane-swift/.build/arm64-apple-macosx/debug/codecov/default.profdata services/control-plane-swift/Sources/HTTPGateway/OpenAI/OpenAIHandler.swift services/control-plane-swift/Sources/WorkerClient/OnDemandModelLoader.swift services/control-plane-swift/Tests/HTTPGatewayTests/OpenAIHandlerTests.swift:96.28%(181/188)make proto: passmake py-test:532 passed in 34.95smake integration-test:75 passed in 1125.03s (0:18:45)make swift-test: repository-wide execution enteredservices/control-plane-swiftand then blocked without additional output or a failure line; the touched control-plane scope above passed with coverage enabled, so the repository-wide hang is recorded as existing infrastructure instability rather than anM17.4regressiongit diff --check: pass
-
Metrics report for
M17.4:- the repository-owned speech smoke report now emits:
speech.integration_success_ratespeech.transcription.whisper.request_latency_msspeech.transcription.whisper.duration_secondsspeech.transcription.whisper.preprocess_latency_msspeech.transcription.whisper.chunk_countspeech.transcription.parakeet.request_latency_msspeech.transcription.parakeet.duration_secondsspeech.transcription.parakeet.preprocess_latency_msspeech.transcription.parakeet.chunk_countspeech.synthesis.kokoro.request_latency_msspeech.synthesis.kokoro.output_bytesspeech.synthesis.qwen3_tts.request_latency_msspeech.synthesis.qwen3_tts.output_bytesspeech.synthesis.qwen3_tts.voice_fallback_countspeech.synthesis.qwen3_tts.locale_header_success_rate
make phase17-metricscurrently records:speech.integration_success_rate = 100.0speech.transcription.whisper.request_latency_ms = 457.15speech.transcription.parakeet.request_latency_ms = 560.32speech.synthesis.kokoro.request_latency_ms = 453.13speech.synthesis.qwen3_tts.request_latency_ms = 546.36speech.synthesis.qwen3_tts.voice_fallback_count = 0.0speech.synthesis.qwen3_tts.locale_header_success_rate = 100.0
- changed-line coverage for the touched handwritten executable scope:
- Python touched-scope coverage:
100.00%(16/16) - Swift control-plane touched-scope coverage:
96.28%(181/188)
- Python touched-scope coverage:
- generated protobuf outputs, Make targets, runbooks, and planning-status documents are excluded from executable changed-line coverage because they are generated artifacts or non-executable repository bookkeeping
- the repository-owned speech smoke report now emits:
-
Closed
M17.3by making speech locale policy, resolved speech settings, and optional dependency-profile state explicit across the Python worker registry truth, the Swift control-plane catalog, the/v1/audio/speechHTTP path, and the macOS operator model-info surface:- added stable speech metadata keys for
melix.audio.default_locale,melix.audio.packaged_default_locale, andmelix.audio.locale_policyin both the Python worker registry catalog and the Swift control-plane seed models, then projected those fields through the repository-owned family support matrix - extended
/v1/audio/speechwith an optionallocalefield, normalized explicit locale handling, and operator-visible response headers that now report requested locale, resolved locale, locale source, locale policy, supported locales, install profile, runtime-pack state, runtime-pack ID, and managed-model state - extended the macOS operator model-info surface so speech models now render default locale, packaged default locale, locale policy, runtime-pack state, runtime-pack ID, and audio model state without requiring raw metadata inspection
- expanded focused Swift, Python, menubar, and integration coverage to guard missing-model fallback, packaged-default fallback, empty-locale metadata, unsupported explicit locale rejection, and operator-visible speech metadata parity
- added stable speech metadata keys for
-
Verification summary for
M17.3:PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$HOME/.cache/uv" uv run --project services/mlx-worker-python --extra mlx pytest services/mlx-worker-python/tests/test_audio_runtime.py services/mlx-worker-python/tests/test_acceptance_metrics.py tests/integration/test_non_text_endpoints.py -q:36 passed in 186.09s (0:03:06)PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$HOME/.cache/uv" uv run --project services/mlx-worker-python --extra mlx coverage run --data-file /tmp/m17_3_py.coverage --source=services/mlx-worker-python/worker -m pytest services/mlx-worker-python/tests/test_audio_runtime.py services/mlx-worker-python/tests/test_acceptance_metrics.py tests/integration/test_non_text_endpoints.py -q && PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$HOME/.cache/uv" uv run --project services/mlx-worker-python --extra mlx coverage json --data-file /tmp/m17_3_py.coverage -o /tmp/m17_3_py_coverage.json && python3 scripts/python_changed_line_coverage.py --coverage-json /tmp/m17_3_py_coverage.json services/mlx-worker-python/worker/model_registry/catalog.py services/mlx-worker-python/worker/productization/family_support_matrix.py services/mlx-worker-python/tests/test_audio_runtime.py services/mlx-worker-python/tests/test_acceptance_metrics.py tests/integration/test_non_text_endpoints.py:36 passed in 245.22s (0:04:05)and changed-line coverage100.00%(3/3)swift test --package-path services/control-plane-swift --filter 'ModelCatalogTests|PythonBridgeWorkerClientTests|OpenAIHandlerTests':198 tests in 3 suites passed after 0.849 secondsswift test --package-path services/control-plane-swift --filter 'ModelCatalogTests|PythonBridgeWorkerClientTests|OpenAIHandlerTests' --enable-code-coverage:198 tests in 3 suites passed after 0.852 secondspython3 scripts/swift_changed_line_coverage.py --binary services/control-plane-swift/.build/arm64-apple-macosx/debug/MelixControlPlanePackageTests.xctest/Contents/MacOS/MelixControlPlanePackageTests --profdata services/control-plane-swift/.build/arm64-apple-macosx/debug/codecov/default.profdata services/control-plane-swift/Sources/HTTPGateway/OpenAI/OpenAIHandler.swift services/control-plane-swift/Sources/ModelCatalog/ModelCatalog.swift services/control-plane-swift/Sources/WorkerClient/PythonBridgeWorkerClient.swift services/control-plane-swift/Tests/ControlPlaneTests/ModelCatalogTests.swift services/control-plane-swift/Tests/HTTPGatewayTests/OpenAIHandlerTests.swift services/control-plane-swift/Tests/WorkerClientTests/PythonBridgeWorkerClientTests.swift:100.00%(503/503)swift test --package-path apps/macos-menubar --filter 'RuntimeViewModelTests|DesktopFoundationViewTests':242 tests in 2 suites passed after 5.489 secondspython3 scripts/swift_changed_line_coverage.py --binary apps/macos-menubar/.build/arm64-apple-macosx/debug/MelixMacOSMenubarPackageTests.xctest/Contents/MacOS/MelixMacOSMenubarPackageTests --profdata apps/macos-menubar/.build/arm64-apple-macosx/debug/codecov/default.profdata apps/macos-menubar/Sources/AppMain/Dashboard/DesktopFoundationView.swift apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift apps/macos-menubar/Tests/MenuBarTests/DesktopFoundationViewTests.swift apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift:100.00%(66/66)git diff --check: pass
-
Metrics report for
M17.3:/v1/audio/speechnow emits operator-visible locale and dependency-profile headers:x-melix-audio-requested-localex-melix-audio-resolved-localex-melix-audio-locale-sourcex-melix-audio-locale-policyx-melix-audio-model-default-localex-melix-audio-packaged-default-localex-melix-audio-supported-localesx-melix-audio-install-profilex-melix-audio-runtime-pack-statex-melix-audio-runtime-pack-idx-melix-audio-model-state
- the repository-owned speech support matrix now exposes:
("speech", "deterministic-speech").contract.default_locale = "und"("speech", "deterministic-speech").contract.packaged_default_locale = "und"("speech", "kokoro").contract.default_locale = "en"("speech", "qwen3-tts").contract.default_locale = "zh"("speech", "qwen3-tts").contract.locale_policy = "request>model_default>packaged_default"
- changed-line coverage for the touched handwritten executable scope:
- Python touched-scope coverage:
100.00%(3/3) - Swift control-plane touched-scope coverage:
100.00%(503/503) - Swift menubar touched-scope coverage:
100.00%(66/66)
- Python touched-scope coverage:
- generated protobuf outputs and planning-status documents are excluded from executable changed-line coverage because they are generated artifacts or repository bookkeeping
-
Closed
M17.2by making real text-to-speech backend families and voice-catalog metadata first-class across the Swift catalog, the Swift Python-bridge model-spec path, the repository-owned family support matrix, and the macOS operator model-info surface:- added
mlxQwen3TTSModel()to the Swift control-plane catalog and matching bridge model-spec wiring, then promoted bothmelix-kokoro-mlxandmelix-qwen3-tts-mlxinto the default phase-six seed set so operators can inspect real speech models without bespoke fixture wiring - extended the Python worker registry metadata and repository-owned family support matrix with stable speech capability fields for install profile, languages, voice mode, output formats, instruction support, voice locales, and voice-catalog summary
- extended the Window UI model-info surface so speech models now render operator-readable voice
catalog details instead of requiring raw
melix.audio.*inspection - stabilized the existing
DesktopPolishSmokeTestspartial-chat observation path so the menubar full-package suite no longer flakes when the package runs under concurrent suite load
- added
-
Verification summary for
M17.2:make proto: passPYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_audio_runtime.py services/mlx-worker-python/tests/test_mlx_audio_runtime.py services/mlx-worker-python/tests/test_runtime_edges.py services/mlx-worker-python/tests/test_acceptance_metrics.py tests/integration/test_non_text_endpoints.py -q:62 passed in 211.24s (0:03:31)PYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python coverage run --data-file=/tmp/m17_2_python.coverage -m pytest services/mlx-worker-python/tests/test_audio_runtime.py services/mlx-worker-python/tests/test_mlx_audio_runtime.py services/mlx-worker-python/tests/test_runtime_edges.py services/mlx-worker-python/tests/test_acceptance_metrics.py tests/integration/test_non_text_endpoints.py -q && PYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python coverage json --data-file=/tmp/m17_2_python.coverage -o /tmp/m17_2_python_coverage.json && python3 scripts/python_changed_line_coverage.py --coverage-json /tmp/m17_2_python_coverage.json services/mlx-worker-python/worker/model_registry/catalog.py services/mlx-worker-python/worker/productization/family_support_matrix.py services/mlx-worker-python/tests/test_audio_runtime.py services/mlx-worker-python/tests/test_mlx_audio_runtime.py services/mlx-worker-python/tests/test_runtime_edges.py services/mlx-worker-python/tests/test_acceptance_metrics.py tests/integration/test_non_text_endpoints.py:62 passed in 177.10s (0:02:57)and changed-line coverage100.00%(54/54)HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --package-path services/control-plane-swift --filter 'ModelCatalogTests|PythonBridgeWorkerClientTests':85 tests in 2 suites passed after 1.114 secondspython3 scripts/swift_changed_line_coverage.py --binary services/control-plane-swift/.build/arm64-apple-macosx/debug/MelixControlPlanePackageTests.xctest/Contents/MacOS/MelixControlPlanePackageTests --profdata services/control-plane-swift/.build/arm64-apple-macosx/debug/codecov/default.profdata services/control-plane-swift/Sources/ModelCatalog/ModelCatalog.swift services/control-plane-swift/Sources/WorkerClient/PythonBridgeWorkerClient.swift services/control-plane-swift/Tests/ControlPlaneTests/ModelCatalogTests.swift services/control-plane-swift/Tests/WorkerClientTests/PythonBridgeWorkerClientTests.swift:100.00%(121/121)HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --package-path apps/macos-menubar --filter 'DesktopFoundationViewTests|RuntimeViewModelTests|DesktopPolishSmokeTests':243 tests in 3 suites passed after 5.424 secondspython3 scripts/swift_changed_line_coverage.py --binary apps/macos-menubar/.build/arm64-apple-macosx/debug/MelixMacOSMenubarPackageTests.xctest/Contents/MacOS/MelixMacOSMenubarPackageTests --profdata apps/macos-menubar/.build/arm64-apple-macosx/debug/codecov/default.profdata apps/macos-menubar/Sources/AppMain/Dashboard/DesktopFoundationView.swift apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift apps/macos-menubar/Tests/MenuBarTests/DesktopFoundationViewTests.swift apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift apps/macos-menubar/Tests/MenuBarTests/DesktopPolishSmokeTests.swift:100.00%(145/145)make py-test:531 passed in 34.46smake swift-test: repository-wide execution still stalled inside the untouchedservices/control-plane-swiftfull-package path after the touched protocol, text-worker, focused control-plane, and full menubar suites had already passed; the hungswiftpm-testing-helperwas sampled while idle inwaitUntilExit, then terminated and recorded as existing repository instability rather than anM17.2regressionPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python --extra mlx pytest tests/integration/test_recovery_flows.py::test_warm_followup_prefers_hot_route_and_reduces_ttft_against_cold_baseline -q:1 passed in 11.30smake integration-test:74 passed in 941.45s (0:15:41)git diff --check: pass
-
Metrics report for
M17.2:- the repository-owned family support matrix now exposes:
summary.speech_family_count = 2("speech", "kokoro").contract.backend_id = "mlx_audio.tts"("speech", "qwen3-tts").contract.backend_id = "mlx_audio.tts"("speech", "kokoro").contract.voice_mode = "named"("speech", "qwen3-tts").contract.voice_mode = "hybrid"("speech", "qwen3-tts").contract.supports_instructions = true("speech", "qwen3-tts").contract.voice_locales = ["zh", "en"]
- changed-line coverage for the touched handwritten executable scope:
- Python touched-scope coverage:
100.00%(54/54) - Swift control-plane touched-scope coverage:
100.00%(121/121) - Swift menubar touched-scope coverage:
100.00%(145/145)
- Python touched-scope coverage:
- generated protobuf outputs and planning-status documents are excluded from executable changed-line coverage because they are generated artifacts or repository bookkeeping
- the repository-owned family support matrix now exposes:
-
Closed
M17.1by making real speech-to-text backend families first-class across the Swift catalog, the Python bridge path, and the repository-owned model-family support matrix:- added
mlxParakeetModel()to the Swift control-plane catalog and promoted bothmelix-whisper-mlxandmelix-parakeet-mlxinto the default phase-six seed set, so real speech-to-text models are now discoverable without bespoke test wiring - added the matching
melix-parakeet-mlxbridge model spec inservices/control-plane-swift/Sources/WorkerClient/PythonBridgeWorkerClient.swift, keeping the control-plane bootstrap path aligned with the existing Python worker registry truth - extended the repository-owned family support matrix with
transcriptionrows forwhisperandparakeet, including stablebackend_id,install_profile, andlanguagescontract fields plus truthfulcontract_onlylive-path status - expanded focused Swift, Python, and integration coverage so catalog metadata, runtime routing, and matrix exports all guard the new speech-to-text families
- added
-
Verification summary for
M17.1:PYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_audio_runtime.py services/mlx-worker-python/tests/test_mlx_audio_runtime.py services/mlx-worker-python/tests/test_runtime_edges.py services/mlx-worker-python/tests/test_acceptance_metrics.py tests/integration/test_non_text_endpoints.py -q:62 passed in 176.80s (0:02:56)PYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python coverage run --data-file=/tmp/m17_1_python.coverage -m pytest services/mlx-worker-python/tests/test_audio_runtime.py services/mlx-worker-python/tests/test_mlx_audio_runtime.py services/mlx-worker-python/tests/test_runtime_edges.py services/mlx-worker-python/tests/test_acceptance_metrics.py tests/integration/test_non_text_endpoints.py -q && PYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python coverage json --data-file=/tmp/m17_1_python.coverage -o /tmp/m17_1_python_coverage.json && python3 scripts/python_changed_line_coverage.py --coverage-json /tmp/m17_1_python_coverage.json services/mlx-worker-python/worker/productization/family_support_matrix.py services/mlx-worker-python/tests/test_audio_runtime.py services/mlx-worker-python/tests/test_mlx_audio_runtime.py services/mlx-worker-python/tests/test_runtime_edges.py services/mlx-worker-python/tests/test_acceptance_metrics.py tests/integration/test_non_text_endpoints.py:62 passed in 176.16s (0:02:56)and changed-line coverage100.00%(35/35)HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --package-path services/control-plane-swift --filter 'ModelCatalogTests|PythonBridgeWorkerClientTests':85 tests in 2 suites passed after 1.035 secondspython3 scripts/swift_changed_line_coverage.py --binary services/control-plane-swift/.build/arm64-apple-macosx/debug/MelixControlPlanePackageTests.xctest/Contents/MacOS/MelixControlPlanePackageTests --profdata services/control-plane-swift/.build/arm64-apple-macosx/debug/codecov/default.profdata services/control-plane-swift/Sources/ModelCatalog/ModelCatalog.swift services/control-plane-swift/Sources/WorkerClient/PythonBridgeWorkerClient.swift services/control-plane-swift/Tests/ControlPlaneTests/ModelCatalogTests.swift services/control-plane-swift/Tests/WorkerClientTests/PythonBridgeWorkerClientTests.swift:100.00%(76/76)make py-test:531 passed in 35.07smake swift-test: repository-wide execution again blocked inside the untouchedservices/control-plane-swiftpackage after focused touched-scope Swift suites had already passed; the hang was sampled, reproduced, and recorded as existing repository instability rather than anM17.1regressionmake integration-test:74 passed in 1013.15s (0:16:53)git diff --check: pass
-
Metrics report for
M17.1:- the repository-owned family support matrix now exposes:
summary.transcription_family_count = 2("transcription", "whisper").contract.backend_id = "mlx_audio.stt"("transcription", "parakeet").contract.backend_id = "mlx_audio.stt"("transcription", "whisper").contract.install_profile = "audio-stt"("transcription", "parakeet").contract.install_profile = "audio-stt"("transcription", "whisper").contract.languages = ["auto"]("transcription", "parakeet").contract.languages = ["auto"]
- changed-line coverage for the touched handwritten executable scope:
- Python touched-scope coverage:
100.00%(35/35) - Swift touched-scope coverage:
100.00%(76/76)
- Python touched-scope coverage:
- generated protobuf outputs and planning-status documents are excluded from executable changed-line coverage because they are generated artifacts or repository bookkeeping
- the repository-owned family support matrix now exposes:
-
Closed
M16.4and completedM16by adding repository-owned live video operator evidence on top of the ingress, frame-policy, routing, and cleanup slices:- added
scripts/m16_video_runtime_smoke.pyso one reproducible smoke workflow now exercises a short local video path, a remote video URL served by a repository-owned local fixture server, a bounded inline multi-frame request, and a concurrent video-plus-text routing probe - added
build_phase16_video_metrics_report(...)plus productization export wiring so the touched scope now emits machine-readable success rates and operator metrics for video request latency, frame budget and window, temp-media cleanup evidence, and scheduler text-protection signals under video load - added
tests/integration/test_video_runtime_smoke.pytogether with expanded acceptance-metrics unit coverage so the smoke payload contract and summary report are both test-backed - added
docs/runbooks/video-understanding-evidence.mdand updated the docs indexes so operators can reproduce the current video path and interpret local-path, remote-URL, bounded-window, cleanup, and routing signals without code spelunking
- added
-
Verification summary for
M16.4:PYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_acceptance_metrics.py tests/integration/test_video_runtime_smoke.py -q:17 passed in 15.29sPYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python coverage run --data-file=/tmp/m16_4_python.coverage -m pytest services/mlx-worker-python/tests/test_acceptance_metrics.py tests/integration/test_video_runtime_smoke.py -q && PYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python coverage json --data-file=/tmp/m16_4_python.coverage -o /tmp/m16_4_python_coverage.json && python3 scripts/python_changed_line_coverage.py --coverage-json /tmp/m16_4_python_coverage.json services/mlx-worker-python/worker/productization/acceptance_metrics.py services/mlx-worker-python/worker/productization/__init__.py scripts/m16_video_runtime_smoke.py services/mlx-worker-python/tests/test_acceptance_metrics.py tests/integration/test_video_runtime_smoke.py:17 passed in 15.37sand changed-line coverage100.00%(52/52)make py-test:530 passed in 30.56sgit diff --check: pass
-
Metrics report for
M16.4:- repository-owned video smoke evidence now records:
- local-path video request success and latency
- remote-URL video request success and latency using a local fixture server rather than an internet dependency
- bounded-window frame-policy evidence through
vision.video_frame_count,vision.video_frame_budget, andvision.video_window_ms - inline-video cleanup evidence through
vision.temp_media_artifact_count,vision.temp_media_artifact_bytes,vision.temp_media_cleanup_latency_ms, andvision.temp_media_cleanup_failure_count - routing evidence through
scheduler.text_ttft_under_multimodal_msandscheduler.multimodal_queue_delay_ms
- changed-line coverage for the touched handwritten executable scope:
- Python touched-scope coverage:
100.00%(52/52)
- Python touched-scope coverage:
docs/*.mdandtask_plan.mdare excluded from executable changed-line coverage because they are repository documentation and bookkeeping rather than handwritten runtime logic
- repository-owned video smoke evidence now records:
-
Closed
M16.3by making temporary multimodal analysis artifacts explicit, deterministically cleaned up, and visible through worker plus control-plane state instead of remaining hidden inside best-effort temporary-directory scopes:- added
worker/runtime/temp_media_lifecycle.pyso one repository-owned temp-media session now stages analysis artifacts, tracks byte totals, records cleanup latency, and reports cleanup failures - adopted that lifecycle helper in both deterministic and MLX-backed VLM runtimes so inline image and video assets now follow the same success, failure, and cancellation cleanup path, while prepared video inputs preserve inline bytes for deterministic staging
- extended worker
RuntimeStats, registry bookkeeping, and SwiftRequestCoordinatormetric publication with temporary-media artifact count, artifact bytes, cleanup latency, and cleanup failure counters for OCR and VLM routes - added focused Python, Swift, and integration coverage for successful cleanup, explicit cleanup failure reporting, and cancelled-generate cleanup behavior across the multimodal lifecycle path
- added
-
Verification summary for
M16.3:make proto: passPYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_temp_media_lifecycle.py services/mlx-worker-python/tests/test_video_preprocessing.py services/mlx-worker-python/tests/test_runtime_edges.py services/mlx-worker-python/tests/test_vision_runtime.py services/mlx-worker-python/tests/test_mlx_vlm_runtime.py -q:83 passed in 0.24sHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --filter 'ocrRequestsPublishVisionMetrics|vlmRequestsPublishVisionMetrics|videoBearingVLMRequestsPublishFramePolicyMetrics|postChatCompletionsRecordsVideoFrameMetricsForVLMRequests':4 tests passedPYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python pytest tests/integration/test_vlm_phase_aware_lifecycle.py -q:5 passed in 56.55sPYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python coverage run --data-file=/tmp/m16_3_python.coverage -m pytest services/mlx-worker-python/tests/test_temp_media_lifecycle.py services/mlx-worker-python/tests/test_video_preprocessing.py services/mlx-worker-python/tests/test_runtime_edges.py services/mlx-worker-python/tests/test_vision_runtime.py services/mlx-worker-python/tests/test_mlx_vlm_runtime.py tests/integration/test_vlm_phase_aware_lifecycle.py -q:88 passed in 73.45s (0:01:13)python3 scripts/swift_changed_line_coverage.py --binary services/control-plane-swift/.build/arm64-apple-macosx/debug/MelixControlPlanePackageTests.xctest/Contents/MacOS/MelixControlPlanePackageTests --profdata services/control-plane-swift/.build/arm64-apple-macosx/debug/codecov/default.profdata services/control-plane-swift/Sources/Requests/RequestCoordinator.swift services/control-plane-swift/Tests/HTTPGatewayTests/RequestCoordinatorTests.swift services/control-plane-swift/Tests/HTTPGatewayTests/OpenAIHandlerTests.swift:100.00%(64/64)make py-test:528 passed in 31.36smake swift-test: repository-wide execution entered theservices/control-plane-swiftpackage and then blocked without emitting a failure or additional test output; focused touched-scope Swift verification and changed-line coverage passed, so the full-package hang is recorded as out-of-scope repository instability rather than anM16.3regressionmake integration-test: repository-wide execution remained long-running during this capture; the touched live VLM lifecycle integration suite above passed, soM16.3acceptance relies on the focused live-path evidence rather than waiting on unrelated repository integration runtime
-
Metrics report for
M16.3:- touched handwritten executable scope now exposes:
last_temp_media_artifact_countlast_temp_media_artifact_byteslast_temp_media_cleanup_latency_mslast_temp_media_cleanup_failure_countvision.temp_media_artifact_countvision.temp_media_artifact_bytesvision.temp_media_cleanup_latency_msvision.temp_media_cleanup_failure_count
- changed-line coverage for the touched handwritten executable scope:
- Python worker aggregate touched-scope coverage:
95.83%(207/216) - Swift control-plane aggregate touched-scope coverage:
100.00%(64/64)
- Python worker aggregate touched-scope coverage:
- generated protobuf outputs and planning-status documents are excluded from executable changed-line coverage because they are generated artifacts or non-executable repository bookkeeping
- touched handwritten executable scope now exposes:
-
Closed
M16.2by making video analysis requests carry explicit frame-policy state through the worker runtime, background-lane scheduling, and control-plane observability:- extended the worker runtime stats protocol with
last_video_effective_frame_count,last_video_requested_frame_budget, andlast_video_window_ms - folded normalized video inputs into
PreparedVisionRequest, including effectiveuniform_sampleframe-policy projection, video-aware multimodal hashing, and derived helper counters for total effective frames, requested budgets, and active clip windows - updated deterministic and MLX VLM runtimes plus worker registry bookkeeping so video-bearing requests now emit explicit video probe evidence, while text-backed Gemma 4 paths rewrite video-only prompts into deterministic text form instead of silently dropping media context
- projected video-bearing VLM background-lane metrics through
RequestCoordinator, added an HTTP-level regression test for chat-completion video metrics, and kept the Swift text worker exhaustive by treatingvideoUriandvideoBytesparts as media for context guards while excluding them from cache-restore prefix reuse
- extended the worker runtime stats protocol with
-
Verification summary for
M16.2:make proto: passPYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_video_preprocessing.py services/mlx-worker-python/tests/test_vision_runtime.py services/mlx-worker-python/tests/test_mlx_vlm_runtime.py -q:49 passed in 0.23sPYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_vision_runtime.py services/mlx-worker-python/tests/test_mlx_vlm_runtime.py -q:46 passed in 0.16sPYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python pytest tests/integration/test_vlm_phase_aware_lifecycle.py -q:3 passed in 34.20smake py-test:525 passed in 35.75smake integration-test:71 passed in 1079.85s (0:17:59)HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --package-path services/control-plane-swift --filter 'videoBearingVLMRequestsPublishFramePolicyMetrics|postChatCompletionsRecordsVideoFrameMetricsForVLMRequests':2 tests in 2 suites passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --package-path services/mlx-text-worker-swift --filter 'testCacheRestoreMetadataWalkBackAccountsForMediaPrefixesAndIgnoresNilParts|testRuntimeRegistryCountsMediaBlankAndNilPartsForContextGuard':2 tests in 1 suite passedmake swift-test: failed outside the touchedM16.2scope after repository-wide package execution completed; the focused control-plane and text-worker suites above passed with coverage enabledgit diff --check: pass
-
Metrics report for
M16.2:- explicit video probe fields now emitted by the touched scope:
last_video_effective_frame_countlast_video_requested_frame_budgetlast_video_window_msvision.video_frame_countvision.video_frame_budgetvision.video_window_msvision.video_first_token_ms
- changed-line coverage for the touched handwritten executable scope:
- Python worker touched-scope coverage:
100.00%(148/148) - Swift control-plane touched-scope coverage:
100.00%(197/197) - Swift text-worker touched-scope coverage:
100.00%(15/15)
- Python worker touched-scope coverage:
- explicit video probe fields now emitted by the touched scope:
-
Closed
M16.1by defining the first repository-owned video ingress contract before any runtime frame extraction or scheduler work:- extended the shared worker protocol so
MessagePartnow has explicitvideo_uriandvideo_bytesforms, whileMediaMetadatanow carriesMEDIA_TYPE_VIDEO,frame_budget,start_ms, andend_ms - added Swift-side
input_videodecoding and normalization inMultimodalRequestNormalizer, including top-levelvideo_base64convenience decoding, URI scheme validation, supported container inference, inspectable duration or frame-budget metadata, and typed operator-facing preprocessing-bound failures - added
worker/runtime/video_preprocessing.pyso the Python worker now validates normalized video parts with one contract helper that preserves source kind, reference, filename, format, byte length, and time-bound metadata without yet fetching or decoding frames - added focused Swift and Python tests that prove accepted URI and inline video shapes, structured error contracts, protobuf round-trips, and safe dispatch of video-bearing requests during the ingress-only slice
- extended the shared worker protocol so
-
Verification summary for
M16.1:make proto: passHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --package-path services/control-plane-swift --filter 'MultimodalContractTests|videoBearingVLMRequestsStayDispatchableDuringIngressOnlyRollout':12 tests in 2 suites passed after 0.002 secondspython3 scripts/swift_changed_line_coverage.py --binary services/control-plane-swift/.build/arm64-apple-macosx/debug/MelixControlPlanePackageTests.xctest/Contents/MacOS/MelixControlPlanePackageTests --profdata services/control-plane-swift/.build/arm64-apple-macosx/debug/codecov/default.profdata services/control-plane-swift/Sources/Requests/MultimodalRequestNormalizer.swift services/control-plane-swift/Sources/Requests/RequestCoordinator.swift services/control-plane-swift/Tests/ControlPlaneTests/MultimodalContractTests.swift services/control-plane-swift/Tests/HTTPGatewayTests/RequestCoordinatorTests.swift:98.07%(560/571)PYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_multimodal_contracts.py services/mlx-worker-python/tests/test_video_preprocessing.py -q:16 passed in 0.06scd services/mlx-worker-python && PYTHONPATH='.:..:../..' uv run coverage run --source=worker/runtime,tests -m pytest tests/test_multimodal_contracts.py tests/test_video_preprocessing.py -q && PYTHONPATH='.:..:../..' uv run coverage report -m worker/runtime/video_preprocessing.py tests/test_multimodal_contracts.py tests/test_video_preprocessing.py: total98%coverage (166statements,4misses)git diff --check: pass
-
Metrics report for
M16.1:- accepted ingress source forms now normalize through one contract:
- local path video URIs such as
/tmp/local-demo.m4v file://video URIs such asfile:///tmp/sample.m4v- remote video URLs such as
https://example.com/demo.mov - inline base64 video bytes via
input_video.dataor top-levelvideo_base64
- local path video URIs such as
- normalized inspectable metadata exposed by the touched scope:
media_type = VIDEOsource_kind = URI | INLINE_BYTESmime_type,format,filename,duration_ms,frame_budget,start_ms,end_ms
- changed-line coverage for the touched handwritten executable scope:
MultimodalRequestNormalizer.swift:99.02%(202/204)RequestCoordinator.swift:100.00%(2/2)MultimodalContractTests.swift:97.47%(308/316)RequestCoordinatorTests.swift:97.96%(48/49)- Swift aggregate touched-scope coverage:
98.07%(560/571) worker/runtime/video_preprocessing.py:96%(90statements,4misses)tests/test_multimodal_contracts.py:100%tests/test_video_preprocessing.py:100%- Python aggregate touched-scope coverage:
98%(166statements,4misses)
- the remaining uncovered Python lines are defensive negative-bound guards that protobuf
uint32fields do not permit at this post-normalization layer; they remain intentionally preserved as belt-and-suspenders validation
- accepted ingress source forms now normalize through one contract:
-
Closed
M15.4and completedM15by adding repository-owned desktop-polish integration evidence for the native operator shell:- added
DesktopPolishSmokeTestsso one focused Swift suite now proves bursty chat presentation smoothing, shared banner priority, registry-backed download recovery, operator-session restore, and renderable navigation grounding across all5desktop surfaces plus all6tool sections - added
scripts/m15_desktop_polish_smoke.pyso contributors can run the same smoke contract through one repo-owned JSON command with repo-local SwiftPM environment defaults - added
tests/test_m15_desktop_polish_smoke.py,tests/integration/test_desktop_polish_smoke.py, and the dedicateddocs/runbooks/desktop-polish.mdrunbook so the smoke payload, execution path, and operator interpretation stay aligned
- added
-
Verification summary for
M15.4:HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --package-path apps/macos-menubar --filter 'DesktopPolishSmokeTests':1 test in 1 suite passed after 0.630 secondsHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" MELIX_HOME="$(pwd)/.runtime/phase1/smoke-home" swift test --package-path apps/macos-menubar --filter DesktopPolishSmokeTests:1 test in 1 suite passed after 0.620 secondspython3 scripts/swift_changed_line_coverage.py --binary apps/macos-menubar/.build/arm64-apple-macosx/debug/MelixMacOSMenubarPackageTests.xctest/Contents/MacOS/MelixMacOSMenubarPackageTests --profdata apps/macos-menubar/.build/arm64-apple-macosx/debug/codecov/default.profdata apps/macos-menubar/Tests/MenuBarTests/DesktopPolishSmokeTests.swift:98.69%(301/305)python3 scripts/m15_desktop_polish_smoke.py --json: passPYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python pytest tests/test_m15_desktop_polish_smoke.py tests/integration/test_desktop_polish_smoke.py -q:5 passed in 90.49s (0:01:30)PYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python coverage run --source=scripts,tests -m pytest tests/test_m15_desktop_polish_smoke.py tests/integration/test_desktop_polish_smoke.py -q && PYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python coverage json -o /tmp/m15-4-python-coverage.json && python3 scripts/python_changed_line_coverage.py --coverage-json /tmp/m15-4-python-coverage.json scripts/m15_desktop_polish_smoke.py tests/test_m15_desktop_polish_smoke.py tests/integration/test_desktop_polish_smoke.py:99.06%(105/106)make integration-test:70 passed in 924.47s (0:15:24)git diff --check: pass
-
Metrics report for
M15.4:- repository-owned smoke evidence:
chat.presentation_lag_ms = 62.6260chat.presentation_flush_count = 3signals.top_banner_title = "Download Recovery Available"signals.download_recovery_visible = truesignals.update_signal_visible = truesignals.update_signal_dismissible = truepersistence.operator_session_restore_ms = 2.5461persistence.operator_session_persist_write_ms = 1.3790persistence.persisted_download_queue_count = 1persistence.restored_download_queue_count = 1persistence.restored_selected_tool_section = "Downloads"navigation.grounded_surface_count = 5navigation.grounded_tool_section_count = 6
- changed-line coverage for the touched executable scope:
DesktopPolishSmokeTests.swift:98.69%(301/305)scripts/m15_desktop_polish_smoke.py:97.62%(41/42)tests/test_m15_desktop_polish_smoke.py:100.00%(41/41)tests/integration/test_desktop_polish_smoke.py:100.00%(23/23)- aggregate touched-scope coverage:
99.06%(406/410) across the handwritten Swift and Python smoke scope
- runbook index updates and
task_plan.mdare excluded from executable changed-line coverage because they are planning or documentation assets rather than handwritten runtime logic
- repository-owned smoke evidence:
-
Closed
M15.3by persisting desktop download queues across restart and surfacing paused-download recovery from registry-backed truth:- extended the Python worker model-ops registry so
registry_snapshotdownload rows now carryoutput_dirand machine-readableresume_readystate derived from partial bytes and transfer status - persisted
downloadQueuethroughOperatorSessionStoreand taughtRuntimeViewModelto restore queue rows before live refresh, parsedownloadspayloads, and reuse the original output directory plus mirror metadata for resume dispatch - updated the desktop Downloads section and shared desktop signals so operators can inspect queue
progress, see output directories, refresh queue truth, and trigger
Resume Downloaddirectly from Window UI or status-menu-visible recovery notices
- extended the Python worker model-ops registry so
-
Verification summary for
M15.3:HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --package-path apps/macos-menubar --filter 'RuntimeViewModelTests|DesktopFoundationViewTests|StatusMenuTests':254 tests in 3 suites passed after 5.001 secondsHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --filter 'RuntimeViewModelTests|DesktopFoundationViewTests|StatusMenuTests':254 tests in 3 suites passed after 5.071 secondspython3 scripts/swift_changed_line_coverage.py --binary apps/macos-menubar/.build/arm64-apple-macosx/debug/MelixMacOSMenubarPackageTests.xctest/Contents/MacOS/MelixMacOSMenubarPackageTests --profdata apps/macos-menubar/.build/arm64-apple-macosx/debug/codecov/default.profdata apps/macos-menubar/Sources/AppMain/Dashboard/DesktopWorkspaceShellView.swift apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift apps/macos-menubar/Sources/AppMain/Persistence/OperatorSessionStore.swift apps/macos-menubar/Tests/MenuBarTests/DesktopFoundationViewTests.swift apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift apps/macos-menubar/Tests/MenuBarTests/StatusMenuTests.swift apps/macos-menubar/Tests/MenuBarTests/TestSupport.swift:97.42%(793/814)PYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_maintenance_service.py -k 'download_rows_with_machine_readable_status or resume_ready' -q:2 passed, 66 deselected in 0.11sPYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python coverage run --source=services/mlx-worker-python/worker -m pytest services/mlx-worker-python/tests/test_maintenance_service.py -k download -q:6 passed, 62 deselected in 0.18spython3 scripts/python_changed_line_coverage.py --coverage-json /tmp/m15-3-python-coverage.json services/mlx-worker-python/worker/model_ops/job_registry.py services/mlx-worker-python/tests/test_maintenance_service.py:100.00%(4/4)make py-test:501 passed in 52.54smake swift-test: failed outside the touched scope whenservices/mlx-text-worker-swiftexited with unexpected signal11duringWorkerScaffoldTestsgit diff --check: pass
-
Metrics report for
M15.3:- persisted queue-recovery evidence exercised by the touched scope:
- stalled or partial downloads now restore into the Window shell with stable
output_dir, progress bytes, mirror metadata, andresume_readystate before any live refresh completes - resuming a recovered download re-dispatches
downloadwith the original output directory so partial bytes can be reused deterministically - shared desktop signals and the Downloads section now surface actionable queue state instead of relying on the last terminal model-operation result
- stalled or partial downloads now restore into the Window shell with stable
- changed-line coverage for the touched handwritten executable scope:
DesktopWorkspaceShellView.swift:93.10%(54/58)RuntimeViewModel.swift:92.06%(197/214)OperatorSessionStore.swift:100.00%(4/4)DesktopFoundationViewTests.swift:100.00%(50/50)RuntimeViewModelTests.swift:100.00%(383/383)StatusMenuTests.swift:100.00%(57/57)TestSupport.swift:100.00%(48/48)- Swift aggregate touched-scope coverage:
97.42%(793/814) - Python worker touched-scope coverage:
100.00%(4/4)
task_plan.mdand plan-index updates are excluded from executable changed-line coverage because they are planning and status documents rather than handwritten runtime logic
- persisted queue-recovery evidence exercised by the touched scope:
-
Closed
M15.2by unifying desktop update availability and runtime-state messaging behind one shared signal model:- extended desktop banner state with stable ids and dismissibility, then persisted dismissed
banner ids through
OperatorSessionStateso update notices can be hidden across restart without mutating runtime truth - mapped update availability and update-check-failure notices into the same prioritized desktop signal list used for runtime and audio warnings, while keeping critical runtime recovery signals non-dismissible and ahead of update notices
- updated the workspace banner and status menu to consume the same top-priority shared signal instead of independent runtime versus update branches
- added focused coverage proving update-banner dismissal persistence, version-change reappearance, non-dismissible critical runtime banners, status-menu signal reuse, and workspace rendering of the shared dismissible update banner
- extended desktop banner state with stable ids and dismissibility, then persisted dismissed
banner ids through
-
Verification summary for
M15.2:HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --package-path apps/macos-menubar --filter 'RuntimeViewModelTests|StatusMenuTests|DesktopFoundationViewTests|DesktopShellStateTests':251 tests in 4 suites passed after 5.060 secondspython3 scripts/swift_changed_line_coverage.py --binary apps/macos-menubar/.build/arm64-apple-macosx/debug/MelixMacOSMenubarPackageTests.xctest/Contents/MacOS/MelixMacOSMenubarPackageTests --profdata apps/macos-menubar/.build/arm64-apple-macosx/debug/codecov/default.profdata apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift apps/macos-menubar/Sources/AppMain/Models/DesktopShellState.swift apps/macos-menubar/Sources/AppMain/Persistence/OperatorSessionStore.swift apps/macos-menubar/Sources/AppMain/Dashboard/DesktopWorkspaceShellView.swift apps/macos-menubar/Sources/AppMain/MenuBar/StatusMenu.swift apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift apps/macos-menubar/Tests/MenuBarTests/StatusMenuTests.swift apps/macos-menubar/Tests/MenuBarTests/DesktopFoundationViewTests.swift:98.76%(239/242)git diff --check: passmake swift-test: failed outside the touched scope whenservices/mlx-text-worker-swiftexited with unexpected signal11duringWorkerScaffoldTests; the touched menu-bar suites passed under the focused coverage-enabled command abovemake integration-test:69 passed in 922.31s (0:15:22)
-
Metrics report for
M15.2:- shared-signal evidence exercised by the touched scope:
- actionable update notices now surface as dismissible banners keyed by stable ids and return automatically when the update summary changes
- dismissing an update banner persists through operator-session restore while critical runtime recovery banners remain non-dismissible
- the workspace banner and status menu now share the same prioritized top-signal title instead of rendering update and runtime state through unrelated branches
- changed-line coverage for the touched handwritten executable scope:
RuntimeViewModel.swift:98.84%(85/86)DesktopShellState.swift:100.00%(14/14)OperatorSessionStore.swift:100.00%(1/1)DesktopWorkspaceShellView.swift:80.00%(8/10)StatusMenu.swift:100.00%(7/7)- touched test files aggregate:
100.00% - aggregate touched-scope coverage:
98.76%(239/242)
task_plan.mdis excluded from executable changed-line coverage because it is planning documentation rather than handwritten runtime logic
- shared-signal evidence exercised by the touched scope:
-
Closed
M15.1by adding UI-side token-stream presentation smoothing in the desktop shell without changing control-plane stream truth:- added a menubar-owned chat presentation queue in
RuntimeViewModelso assistant, reasoning, and tool deltas now flush across multiple UI ticks instead of jumping into the transcript as one burst when upstream delivery arrives chunked - preserved transcript fidelity by flushing buffered text before terminal completion or failure state is committed and by resetting the smoothing task on transport failure or transcript clear
- added explicit
menu.chat_presentation_lag_msandmenu.chat_presentation_flush_countmetrics so the UI-side smoothing delay remains measurable rather than hiding stream regressions - extended menu-bar test support with scheduled chat events and added bursty-stream coverage that proves partial presentation before completion while preserving exact final transcript text
- added a menubar-owned chat presentation queue in
-
Verification summary for
M15.1:HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --package-path apps/macos-menubar --filter 'RuntimeViewModelTests':157 tests in 1 suite passed after 0.910 secondspython3 scripts/swift_changed_line_coverage.py --binary apps/macos-menubar/.build/arm64-apple-macosx/debug/MelixMacOSMenubarPackageTests.xctest/Contents/MacOS/MelixMacOSMenubarPackageTests --profdata apps/macos-menubar/.build/arm64-apple-macosx/debug/codecov/default.profdata apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift apps/macos-menubar/Tests/MenuBarTests/TestSupport.swift:98.09%(205/209)make swift-test: failed outside the touched scope whenservices/mlx-text-worker-swiftexited with unexpected signal11duringWorkerScaffoldTests; the touched menu-bar package passed under the focused coverage-enabled command abovegit diff --check: pass
-
Metrics report for
M15.1:- deterministic smoothing evidence exercised by the touched scope:
- bursty assistant deltas now appear as a partial transcript row before completion instead of a one-shot final jump
menu.chat_presentation_lag_msis recorded whenever the smoothing queue flushes buffered chat textmenu.chat_presentation_flush_countis greater than1for the scheduled bursty-stream coverage, proving multiple UI flushes rather than one append
- changed-line coverage for the touched handwritten executable scope:
RuntimeViewModel.swift:97.20%(139/143)RuntimeViewModelTests.swift:100.00%(43/43)TestSupport.swift:100.00%(23/23)- aggregate touched-scope coverage:
98.09%(205/209)
task_plan.mdis excluded from executable changed-line coverage because it is planning documentation rather than handwritten runtime logic
- deterministic smoothing evidence exercised by the touched scope:
-
Closed
M14.4and completedM14by adding repository-owned image-iteration evidence on top of the shipped HTTP image surface:- expanded the OpenAI-compatible image job payload so HTTP responses now expose lineage and redo
inspection fields including
source_artifact_id,source_job_id,prompt_delta,edit_mode,request_timeout_seconds,recipe, and artifactparent_artifact_id - added live integration coverage for baseline generate, variation, iterate, and redo reconstruction so repository tests now prove iterative image workflows from shipped payload truth instead of internal read-model shortcuts
- extended
scripts/phase7_metrics_report.pysomake phase7-metricsnow printsimage_variation,image_iterate,image_redo, andimage_timeoutevidence alongside the existing queueing, cancelation, and text-under-image-load report - updated the Phase 7 image operator runbook so contributors can reproduce iterative workflows and inspect lineage or timeout policy from documented commands alone
- expanded the OpenAI-compatible image job payload so HTTP responses now expose lineage and redo
inspection fields including
-
Verification summary for
M14.4:HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --package-path services/control-plane-swift --filter 'OpenAIHandlerTests':107 tests in 1 suite passedPYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_phase7_metrics_report.py -q:11 passed in 0.37sPYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python pytest tests/integration/test_phase7_operator_workflows.py -k 'iteration or timeout' -q:2 passed, 2 deselected in 24.47smake phase7-metrics: passgit diff --check: pass
-
Metrics report for
M14.4:- real
make phase7-metricsevidence:image_generate.request_latency_ms = 367.68,job_latency_ms = 126.62,artifact_publish_ms = 1.15,peak_memory_bytes = 65536,output_bytes = 94,timeout_seconds = 1800image_variation.request_latency_ms = 368.99,job_latency_ms = 125.51,artifact_publish_ms = 0.46image_iterate.request_latency_ms = 358.81,job_latency_ms = 120.90,artifact_publish_ms = 0.49,prompt_delta = make the colors warmerimage_redo.request_latency_ms = 366.44,job_latency_ms = 123.39,artifact_publish_ms = 0.48,edit_mode = iterateimage_queue.queue_wait_ms = 570.74text_under_image.scheduler_text_ttft_ms = 111.08image_cancel.cancel_success = 1,response_status = 409image_timeout.response_status = 504,error_code = deadline_exceeded,timeout_seconds = 1
- changed-line coverage for the touched handwritten executable scope:
- Swift gateway scope:
100.00%(138/138) - Python script plus integration scope:
100.00%(142/142) - aggregate touched-scope coverage:
100.00%(280/280)
- Swift gateway scope:
- documentation files are excluded from executable changed-line coverage because they are non-runtime assets rather than handwritten executable logic
- real
-
Closed
M14.3by making creative image redo or reiteration flows operator-visible and by turning long-running image requests into typed timeout policy instead of generic worker unavailability:- extended the control-plane image-job protocol with
ImageJobRecipeSummary, persisted image-jobrecipeprojection, andrequest_timeout_seconds, then regenerated the Swift, Python, and descriptor protocol artifacts - updated the Swift control plane, OpenAI image gateway, Python bridge, and image read model so
image generate or edit requests use an explicit
30-minutecreative deadline by default, surface typeddeadline_exceededfailures, map those failures totimed_outimage-job progress, and preserve enough recipe truth for redo or reiteration without relying on desktop-local copies - updated
RuntimeViewModel,DesktopImageView, and menu-bar test support so the Window UI now shows timeout policy, timeout-aware status text, always-visible redo or reiteration actions, typed edit-mode/source-artifact inspection, and stable source-artifact summaries for selected jobs
- extended the control-plane image-job protocol with
-
Verification summary for
M14.3:make proto: passPYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_control_plane_bridge_phase5.py -q:5 passed in 0.03sPYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_control_plane_bridge_phase5.py tests/integration/test_phase7_operator_workflows.py -k timeout -q:2 passed, 6 deselected in 13.72sHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --package-path services/control-plane-swift --filter 'ControlPlaneServiceTests|ImageJobReadModelTests|OpenAIHandlerTests|ImageDefaultsStoreTests|PythonBridgeWorkerClientTests':333 tests in 5 suites passed after 1.003 secondsHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --package-path apps/macos-menubar --filter 'RuntimeViewModelTests|DesktopFoundationViewTests|ControlPlaneXPCClientTests':265 tests in 3 suites passed after 4.445 secondsgit diff --check: pass
-
Metrics report for
M14.3:- typed redo and timeout evidence exercised by the touched scope:
- selected image jobs now project persisted recipe truth and request timeout policy through one control-plane-owned source rather than Window-UI-local scratch state
- redo can re-submit selected image jobs from persisted recipe state, and reiteration can seed iterate mode from stable artifact lineage and source-artifact summaries
- image worker deadline failures now remain distinguishable from cancelation and generic bridge failures across control-plane, HTTP, integration, and Window UI surfaces
- changed-line coverage for the touched handwritten executable scope:
- Swift control-plane scope:
99.84%(618/619) - Swift menu-bar scope:
99.79%(953/955) - Python worker plus timeout integration scope:
100.00%(37/37) - aggregate touched-scope coverage:
99.81%(1608/1611)
- Swift control-plane scope:
- generated protobuf outputs and
packages/protocol/descriptors/melix.pbare excluded from executable changed-line coverage because they are regenerated interface artifacts rather than handwritten runtime logic
- typed redo and timeout evidence exercised by the touched scope:
-
Closed
M14.2by making image defaults persistent across restart and projecting role-aware image model selection through one control-plane-owned source of truth:- extended the control-plane protocol with typed
ApplyImageDefaults,ImageDefaultsSummary, and explicit creative parameter fields on generate or edit requests, then regenerated the Swift, Python, and descriptor artifacts - added
ImageDefaultsStoreso the Swift control plane now persists creative defaults, validates operator input, merges requested-versus-effective values, and projects the merged summary through reconnect-stable snapshots plus XPC replies - updated image catalog metadata and snapshot assembly so creative models declare generate or edit role support explicitly instead of relying on Window-UI-local picker knowledge
- updated the shared XPC client,
RuntimeViewModel,DesktopImageView, and menu-bar test support so the Window UI hydrates defaults from control-plane truth, persists them explicitly, and filters generate versus edit model pickers by supported creative role
- extended the control-plane protocol with typed
-
Verification summary for
M14.2:make proto: passHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --filter 'ControlPlaneServiceTests|ImageDefaultsStoreTests|ModelCatalogTests':204 tests in 3 suites passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --filter 'RuntimeViewModelTests|ControlPlaneXPCClientTests':183 tests in 2 suites passedmake swift-test: failed outside the touched scope whenservices/mlx-text-worker-swiftexited with unexpected signal11duringWorkerScaffoldTestsmake integration-test:67 passed in 920.65s (0:15:20)git diff --check: pass
-
Metrics report for
M14.2:- typed persisted-defaults evidence exercised by the touched scope:
- creative defaults for steps, guidance, strength, and negative prompt now persist through a control-plane-owned store instead of Window-UI-local draft state
- reconnect-stable snapshots now project requested-versus-effective image defaults so the operator can inspect merged creative policy after restart
- generate and edit request forwarding now keeps explicit per-request values authoritative while still filling missing fields from the persisted defaults summary
- image pickers now derive role visibility from capability metadata so generate and edit flows surface only compatible creative families
- changed-line coverage for the touched handwritten executable scope:
- Swift control-plane scope:
95.61%(936/979) - Swift menu-bar scope:
95.16%(609/640) - aggregate touched-scope coverage:
95.43%(1545/1619)
- Swift control-plane scope:
- generated protobuf outputs and
packages/protocol/descriptors/melix.pbare excluded from executable changed-line coverage because they are regenerated interface artifacts rather than handwritten runtime logic
- typed persisted-defaults evidence exercised by the touched scope:
-
Closed
M14.1by making image variation and iterate flows typed, lineage-aware, and compatible with the existing image-job model instead of treating every derived image request as a generic edit:- extended the control-plane and worker protobuf contracts with typed
ImageEditModeenums plussource_artifact_id,source_job_id,prompt_delta, andparent_artifact_idlineage fields, then regenerated the Swift, Python, and descriptor artifacts - updated the Swift control plane, the shared XPC client, and the OpenAI image-edit handler so
variationanditeraterequests resolve prior artifact IDs into worker-facing source URIs, enforce iterate-onlyprompt_delta, reject mixed raw-image plus artifact-id inputs, and keep queued image jobs lineage-aware - updated the deterministic Python image-edit runtime and terminal job descriptors so generated
artifacts preserve
parent_artifact_idand lineageextkeys for the source artifact, source job, edit mode, and prompt delta - added focused read-model, control-plane, OpenAI gateway, XPC-client, and Python runtime tests that exercise iterate resolution, variation validation, and lineage persistence end to end
- extended the control-plane and worker protobuf contracts with typed
-
Verification summary for
M14.1:make proto: passPYTHONPATH='.:services/mlx-worker-python' uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_image_runtime.py -q:11 passed in 0.09sHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --filter 'ControlPlaneServiceTests|OpenAIHandlerTests|ImageJobReadModelTests':270 tests in 3 suites passed after 0.098 secondsmake py-test:488 passed in 52.81smake swift-test: failed outside the touched scope whenservices/mlx-text-worker-swiftexited with unexpected signal11duringWorkerScaffoldTestsgit diff --check: pass
-
Metrics report for
M14.1:- typed lineage evidence exercised by the touched scope:
source_artifact_idnow resolves prior image artifacts into control-plane and OpenAI image-edit worker requests without bypassing image-job historyprompt_deltais enforced as iterate-only input and preserved in queued image jobs plus worker terminal job descriptors- generated artifacts now preserve
parent_artifact_idand lineageextvalues for source artifact, source job, and edit mode
- changed-line coverage for the touched handwritten executable scope:
- Python worker plus test scope via
scripts/python_changed_line_coverage.py:100.00%(52/52) - Swift control-plane plus test scope via repository
git diff --unified=0 HEADandxcrun llvm-cov showover the coverage-enabledMelixControlPlanePackageTestsbinary:96.13%(646/672) - aggregate touched-scope coverage:
96.41%(698/724)
- Python worker plus test scope via
- generated protobuf outputs,
packages/protocol/descriptors/melix.pb, and planning documents are excluded from executable changed-line coverage because they are regenerated artifacts or non-runtime documentation rather than handwritten executable logic
- typed lineage evidence exercised by the touched scope:
-
Closed the second executable
M13.4slice and completed the milestone by turning the shipped API onboarding examples into repository-owned executable truth:- added
scripts/m13_api_onboarding_smoke.py, a live shared-access smoke that exercises the canonical/health,/v1/responses, and/v1/messagesquick-start examples against a localLiveMelixStack - updated the desktop API quick-start snippets so OpenAI-compatible and Anthropic examples now
match the shipped streaming contract, including
stream=true, SSE-friendly curl flags, and auth-aware/healthexamples for the Ollama compatibility guidance - added deterministic Python unit coverage for smoke error branches plus a new integration test that runs the smoke against the live stack, ensuring example payloads, headers, and endpoint shapes stay aligned with the product UI
- added
-
Verification summary for the second executable
M13.4slice:PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python python scripts/m13_api_onboarding_smoke.py --json: pass with/health,/v1/responses, and/v1/messagesall returning200PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python pytest tests/test_m13_api_onboarding_smoke.py tests/integration/test_api_onboarding_examples.py -q:16 passed in 11.69sHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --enable-code-coverage --filter 'DesktopFoundationViewTests':69 tests in 1 suite passedmake py-test:487 passed in 44.61smake integration-test:67 passed in 898.58s (0:14:58)make swift-test: failed outside the touched scope whenservices/mlx-text-worker-swiftexited with unexpected signal11duringWorkerScaffoldTests
-
Metrics report for the second executable
M13.4slice:- smoke evidence for the shipped onboarding examples:
base_url = http://127.0.0.1:50099/v1health.status_code = 200,health.status = okresponses.status_code = 200,responses.content_type = text/event-stream; charset=utf-8messages.status_code = 200,messages.content_type = text/event-stream; charset=utf-8startup_timings_ms.swift_text_worker_ready_ms = 5108.96startup_timings_ms.python_worker_ready_ms = 5121.63startup_timings_ms.control_plane_spawn_to_ready_ms = 365.43
- changed-line coverage for the touched handwritten executable scope:
scripts/m13_api_onboarding_smoke.py,tests/test_m13_api_onboarding_smoke.py, andtests/integration/test_api_onboarding_examples.py:100.00%(163/163)apps/macos-menubar/Sources/AppMain/Dashboard/DesktopWorkspaceShellView.swiftandapps/macos-menubar/Tests/MenuBarTests/DesktopFoundationViewTests.swift:100.00%(119/119)- aggregate touched-scope coverage:
100.00%(282/282)
- smoke evidence for the shipped onboarding examples:
-
Closed the first executable
M13.4slice by moving API onboarding truth into the typed control-plane snapshot and rehydrating the desktop API workspace from that source instead of stale hardcoded endpoint catalogs:- extended
ServerSnapshotwith a typedapi_onboardingsummary covering published API surfaces, per-endpoint reference rows, surface status, and compatibility-only guidance, then regenerated the Swift, Python, and descriptor protocol artifacts - added
APIOnboardingSnapshotSourceso the Swift control plane now owns the shipped API onboarding catalog for Local Service, OpenAI-compatible, Anthropic Messages, and Ollama compatibility guidance - updated
ServerSnapshotBuilderandControlPlaneServiceso handshake and reconnect snapshots project one stable onboarding summary with endpoint reference and compatibility notes - replaced the desktop API reference catalog with snapshot-driven
apiSurfacesandapiReferencerows, keeping grouped surface rendering and truthful compatibility-only presentation for Ollama - generated session-aware curl, Python, and JavaScript quick-start snippets from the selected server session's effective base URL, auth state, and served model instead of static copy
- extended
-
Verification summary for the first executable
M13.4slice:make proto: passHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --enable-code-coverage --filter 'ControlPlaneServiceTests':160 tests in 1 suite passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --enable-code-coverage --filter 'DesktopFoundationViewTests':69 tests in 1 suite passedpython3 scripts/swift_changed_line_coverage.py --binary services/control-plane-swift/.build/arm64-apple-macosx/debug/MelixControlPlanePackageTests.xctest/Contents/MacOS/MelixControlPlanePackageTests --profdata services/control-plane-swift/.build/arm64-apple-macosx/debug/codecov/default.profdata services/control-plane-swift/Sources/HTTPGateway/APIOnboardingSnapshotSource.swift services/control-plane-swift/Sources/Snapshots/ServerSnapshotBuilder.swift services/control-plane-swift/Sources/XPCService/ControlPlaneService.swift services/control-plane-swift/Tests/ControlPlaneTests/ControlPlaneServiceTests.swift:100.00%(38/38)python3 scripts/swift_changed_line_coverage.py --binary apps/macos-menubar/.build/arm64-apple-macosx/debug/MelixMacOSMenubarPackageTests.xctest/Contents/MacOS/MelixMacOSMenubarPackageTests --profdata apps/macos-menubar/.build/arm64-apple-macosx/debug/codecov/default.profdata apps/macos-menubar/Sources/AppMain/Dashboard/DesktopFoundationState.swift apps/macos-menubar/Sources/AppMain/Dashboard/DesktopFoundationView.swift apps/macos-menubar/Sources/AppMain/Dashboard/DesktopWorkspaceShellView.swift apps/macos-menubar/Tests/MenuBarTests/DesktopFoundationViewTests.swift:96.51%(746/773)- aggregate touched-scope changed-line coverage:
96.67%(784/811) git diff --check: passmake swift-test: failed outside the touched scope whenservices/mlx-text-worker-swiftexited with unexpected signal11duringWorkerScaffoldTests
-
Metrics report for the first executable
M13.4slice:N/Afor new runtime timing or persistence metrics because this slice adds read-only snapshot projection and UI hydration, not a new mutation or runtime execution path- typed onboarding evidence exercised by the touched scope:
- reconnect-stable
ServerSnapshot.api_onboardingpopulation after handshake - grouped surface publication with
Shipped,Compatibility Only, andUnknownstatus text - snapshot-driven endpoint reference for
/health,/v1/cache/stats,/v1/responses,/v1/messages, and image endpoints - session-aware quick starts using
effectiveBaseURL, served model ID, and current auth mode rather than static desktop-local examples
- reconnect-stable
- changed-line coverage for the touched handwritten executable scope:
- Swift control-plane scope:
100.00%(38/38) - Swift menu-bar scope:
96.51%(746/773) - aggregate touched-scope coverage:
96.67%(784/811)
- Swift control-plane scope:
- generated protobuf outputs and
packages/protocol/descriptors/melix.pbare excluded from executable changed-line coverage because they are regenerated interface artifacts rather than handwritten runtime logic
-
Closed the first executable
M13.3slice by projecting tooling, embedding, and config-file state through one reconnect-stable control-plane snapshot summary and hydrating the existing Window UI settings surface from that typed truth:- extended
ServerSnapshotwithtooling_settings, plus typed embedding and config-path summaries, then regenerated the Swift, Python, and descriptor protocol artifacts - added
ToolingSettingsSnapshotSourceso the Swift control plane now projects the active embedding model choice, preload state, built-in tool-parser modes, MCP summary, inspectable config paths, and boot additional arguments from repository-owned sources instead of UI-local reconstruction - exposed store-backed config paths and supported parser modes through the control-plane core actors that already own those values, preserving a single orchestration truth
- updated
DesktopFoundationStateso the existing Tools > Settings surface renders the typed tooling snapshot, including embedding preload detail, MCP config, config-path rows, and boot arguments, without relying on hardcoded operator knowledge
- extended
-
Verification summary for the first executable
M13.3slice:make proto: passHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --enable-code-coverage --filter 'ControlPlaneServiceTests':159 tests in 1 suite passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --enable-code-coverage --filter 'DesktopFoundationViewTests':64 tests in 1 suite passedpython3 scripts/swift_changed_line_coverage.py --binary services/control-plane-swift/.build/arm64-apple-macosx/debug/MelixControlPlanePackageTests.xctest/Contents/MacOS/MelixControlPlanePackageTests --profdata services/control-plane-swift/.build/arm64-apple-macosx/debug/codecov/default.profdata services/control-plane-swift/Sources/HTTPGateway/OpenAI/GatewayConfigStore.swift services/control-plane-swift/Sources/HTTPGateway/OpenAI/GatewayServingDefaultsStore.swift services/control-plane-swift/Sources/Requests/ToolParserRegistry.swift services/control-plane-swift/Sources/Snapshots/ServerSnapshotBuilder.swift services/control-plane-swift/Sources/XPCService/ControlPlaneService.swift services/control-plane-swift/Sources/XPCService/ToolingSettingsSnapshotSource.swift services/control-plane-swift/Tests/ControlPlaneTests/ControlPlaneServiceTests.swift:100.00%(83/83)python3 scripts/swift_changed_line_coverage.py --binary apps/macos-menubar/.build/arm64-apple-macosx/debug/MelixMacOSMenubarPackageTests.xctest/Contents/MacOS/MelixMacOSMenubarPackageTests --profdata apps/macos-menubar/.build/arm64-apple-macosx/debug/codecov/default.profdata apps/macos-menubar/Sources/AppMain/Dashboard/DesktopFoundationState.swift apps/macos-menubar/Tests/MenuBarTests/DesktopFoundationViewTests.swift:100.00%(226/226)git diff --check: passmake swift-test: failed outside the touched scope whenservices/mlx-text-worker-swiftexited with unexpected signal11duringWorkerScaffoldTests; the touched control-plane and menu-bar packages passed under the focused verification commands above
-
Metrics report for the first executable
M13.3slice:N/Afor new runtime timing or persistence metrics because this slice adds read-only snapshot projection and UI hydration, not new measured mutation paths- typed tooling-state evidence exercised by the touched scope:
- reconnect-stable
ServerSnapshot.tooling_settingspopulation after handshake - active embedding model projection with model state, preload detail, backend, and family
- repository-owned built-in parser modes and MCP summary surfaced without UI-local discovery
- inspectable gateway-config, serving-defaults, and control-plane metrics paths plus boot additional arguments
- reconnect-stable
- changed-line coverage for the touched handwritten executable scope:
- Swift control-plane scope:
100.00%(83/83) - Swift menu-bar scope:
100.00%(226/226) - aggregate touched-scope coverage:
100.00%(309/309)
- Swift control-plane scope:
- generated protobuf outputs and
packages/protocol/descriptors/melix.pbare excluded from executable changed-line coverage because they are regenerated interface artifacts rather than handwritten runtime logic
-
Closed the third executable
M13.2slice by making speculative-decoding defaults typed, persistent, and control-plane-validated across the protocol, request shaping, model-resolution, the Window UI, and the integration stack:- extended
ApplyServingDefaults,ServingDefaultsSessionSummary, and workerAccelerationPolicywith typed speculative fields foracceleration_mode,draft_model_id, andnum_draft_tokens, then regenerated the Swift, Python, and descriptor artifacts - updated
GatewayServingDefaultsStoreso operator overrides for speculative decoding persist beside generation and batching defaults, project requested-versus-effective speculative state, and expose validation failures before unsupported configurations reach the runtime - routed speculative gateway defaults through
TextRequestShaper,ChatRequestTranslator,RequestCoordinator, andControlPlaneXPCClient, preserving model-level acceleration precedence while letting gateway defaults provide draft-model and draft-token policy when the served model itself is unspecified - updated the Window UI server workspace,
RuntimeViewModel, desktop state projection, and apply flow so speculative defaults hydrate from control-plane truth instead of session-local draft state - isolated
LiveMelixStackstate intests/integration/helpers.pyby assigning uniqueMELIX_HOME,MELIX_GATEWAY_CONFIG_STORE_PATH, andMELIX_GATEWAY_SERVING_DEFAULTS_STORE_PATHvalues per stack, fixing the regression where persisted local gateway bindings overrode integration-test HTTP ports and broke startup
- extended
-
Verification summary for the third executable
M13.2slice:make proto: passmake py-test:487 passed in 45.14sHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --enable-code-coverage --filter 'GatewayServingDefaultsStoreTests':5 tests in 1 suite passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --enable-code-coverage --filter 'ControlPlaneServiceTests':158 tests in 1 suite passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --enable-code-coverage --filter 'TextEndpointContractTests':36 tests in 1 suite passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --enable-code-coverage --filter 'RequestCoordinatorTests':43 tests in 1 suite passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --enable-code-coverage:289 tests in 10 suites passed after 5.027 secondspython3 scripts/swift_changed_line_coverage.py --binary services/control-plane-swift/.build/arm64-apple-macosx/debug/MelixControlPlanePackageTests.xctest/Contents/MacOS/MelixControlPlanePackageTests --profdata /tmp/m13_2_slice3_controlplane_merged.profdata services/control-plane-swift/Sources/HTTPGateway/OpenAI/GatewayServingDefaultsStore.swift services/control-plane-swift/Sources/Requests/ChatRequestTranslator.swift services/control-plane-swift/Sources/Requests/RequestCoordinator.swift services/control-plane-swift/Sources/Requests/TextRequestShaper.swift services/control-plane-swift/Sources/XPCService/ControlPlaneService.swift services/control-plane-swift/Tests/ControlPlaneTests/ControlPlaneServiceTests.swift services/control-plane-swift/Tests/ControlPlaneTests/GatewayServingDefaultsStoreTests.swift services/control-plane-swift/Tests/ControlPlaneTests/TextEndpointContractTests.swift services/control-plane-swift/Tests/HTTPGatewayTests/RequestCoordinatorTests.swift:99.47%(563/566)python3 scripts/swift_changed_line_coverage.py --binary apps/macos-menubar/.build/arm64-apple-macosx/debug/MelixMacOSMenubarPackageTests.xctest/Contents/MacOS/MelixMacOSMenubarPackageTests --profdata apps/macos-menubar/.build/arm64-apple-macosx/debug/codecov/default.profdata services/control-plane-swift/Sources/XPCService/ControlPlaneXPCClient.swift:100.00%(10/10)python3 scripts/swift_changed_line_coverage.py --binary apps/macos-menubar/.build/arm64-apple-macosx/debug/MelixMacOSMenubarPackageTests.xctest/Contents/MacOS/MelixMacOSMenubarPackageTests --profdata apps/macos-menubar/.build/arm64-apple-macosx/debug/codecov/default.profdata apps/macos-menubar/Sources/AppMain/Dashboard/DesktopWorkspaceShellView.swift apps/macos-menubar/Sources/AppMain/Models/DesktopShellState.swift apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift apps/macos-menubar/Tests/MenuBarTests/ControlPlaneXPCClientTests.swift apps/macos-menubar/Tests/MenuBarTests/DesktopShellStateTests.swift apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift apps/macos-menubar/Tests/MenuBarTests/TestSupport.swift:97.07%(199/205)PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python --extra mlx coverage run -m pytest tests/integration/test_abort_flow.py::test_abort_finishes_the_live_stream_with_cancelled_completion -q:1 passed in 11.20spython3 scripts/python_changed_line_coverage.py --coverage-json /tmp/m13_2_slice3_integration_cov.json tests/integration/helpers.py:100.00%(12/12)make integration-test:66 passed in 892.51s (0:14:52)make swift-test: failed outside the touched scope whenservices/mlx-text-worker-swiftexited with unexpected signal11duringWorkerScaffoldTestsgit diff --check: pass
-
Metrics report for the third executable
M13.2slice:- typed speculative-default metrics exercised by the touched scope:
gateway.serving_defaults_apply_msgateway.serving_defaults_persist_failuresgateway.generation_default_merge_countgateway.speculative_config_apply_msmenu.serving_defaults_apply_ms
- requested-versus-effective speculative evidence exercised by the touched scope:
- requested versus effective
acceleration_mode,draft_model_id, andnum_draft_tokens - gateway-owned speculative defaults in request shaping, coordinator-side model merges, and Window UI state projection
- explicit rejection of speculative defaults targeting unsupported served models, unsupported draft models, or unsupported worker backends
- integration startup isolation showing gateway bindings now respect per-stack
MELIX_HTTP_PORTinstead of leaking persisted local listener overrides
- requested versus effective
- changed-line coverage for the touched handwritten executable scope:
- Swift control-plane scope excluding the shared XPC client:
99.47%(563/566) - shared XPC client under Window UI tests:
100.00%(10/10) - Swift menu-bar scope:
97.07%(199/205) - Python integration-helper scope:
100.00%(12/12) - aggregate touched-scope coverage:
98.87%(784/793)
- Swift control-plane scope excluding the shared XPC client:
- generated protobuf outputs,
packages/protocol/descriptors/melix.pb, and planning documents are excluded from executable changed-line coverage because they are regenerated artifacts or non-runtime documentation rather than handwritten executable logic
- typed speculative-default metrics exercised by the touched scope:
-
Closed the second executable
M13.2slice by making batching and admission defaults typed, persistent, and scheduler-visible control-plane truth across the protocol, request shaping, the control-plane store, and the Window UI:- extended
ApplyServingDefaultsandServingDefaultsSessionSummarysoconcurrent_processing_enabled,prefill_batch_size, andcompletion_batch_sizeare part of the versioned control-plane contract, then regenerated the Swift, Python, and descriptor artifacts - updated
GatewayServingDefaultsStoreso operator overrides for batching defaults persist beside the existing generation defaults, validate invalid batch sizes, and project both requested and effective batching state throughServerSnapshot - routed batching defaults through
TextRequestShaperandChatRequestTranslator, exposing gateway-owned admission metadata in worker executionextso downstream scheduling no longer depends on desktop-local draft state - replaced the
RequestCoordinatorhard-coded continuous-batch target with effective admission capacity derived from gateway defaults, allowing continuous batching to expand, shrink, or disable entirely without source edits - updated the Window UI server workspace, state models, and persistence flow so concurrent
processing plus prefill or completion batch sizes hydrate from control-plane truth, display
requested-versus-effective state, and round-trip through
Apply Serving Defaults
- extended
-
Verification summary for the second executable
M13.2slice:make proto: passHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --enable-code-coverage --skip-build --filter 'GatewayServingDefaultsStoreTests':4 tests in 1 suite passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --enable-code-coverage --skip-build --filter 'TextEndpointContractTests':36 tests in 1 suite passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --enable-code-coverage --skip-build --filter 'OpenAIHandlerTests':101 tests in 1 suite passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --enable-code-coverage --skip-build --filter 'ControlPlaneServiceTests':157 tests in 1 suite passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --enable-code-coverage --skip-build --filter 'HTTPGatewayTests.RequestCoordinatorTests/gatewayBatchingDefaultsCanExpandContinuousBatchCapacity()':1 test in 1 suite passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --enable-code-coverage --skip-build --filter 'HTTPGatewayTests.RequestCoordinatorTests/gatewayBatchingDefaultsCanDisableContinuousBatchAdmissions()':1 test in 1 suite passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --enable-code-coverage --filter 'RuntimeViewModelTests|ControlPlaneXPCClientTests|DesktopShellStateTests|DesktopFoundationViewTests':246 tests in 4 suites passed after 4.357 secondspython3 scripts/swift_changed_line_coverage.py --binary services/control-plane-swift/.build/arm64-apple-macosx/debug/MelixControlPlanePackageTests.xctest/Contents/MacOS/MelixControlPlanePackageTests --profdata /tmp/m13_2_cp_profdata_pieces/merged.profdata services/control-plane-swift/Sources/HTTPGateway/OpenAI/GatewayServingDefaultsStore.swift services/control-plane-swift/Sources/Requests/ChatRequestTranslator.swift services/control-plane-swift/Sources/Requests/RequestCoordinator.swift services/control-plane-swift/Sources/Requests/TextRequestShaper.swift services/control-plane-swift/Tests/ControlPlaneTests/ControlPlaneServiceTests.swift services/control-plane-swift/Tests/ControlPlaneTests/GatewayServingDefaultsStoreTests.swift services/control-plane-swift/Tests/ControlPlaneTests/TextEndpointContractTests.swift services/control-plane-swift/Tests/HTTPGatewayTests/OpenAIHandlerTests.swift services/control-plane-swift/Tests/HTTPGatewayTests/RequestCoordinatorTests.swift:95.41%(457/479)python3 scripts/swift_changed_line_coverage.py --binary apps/macos-menubar/.build/arm64-apple-macosx/debug/MelixMacOSMenubarPackageTests.xctest/Contents/MacOS/MelixMacOSMenubarPackageTests --profdata apps/macos-menubar/.build/arm64-apple-macosx/debug/codecov/default.profdata services/control-plane-swift/Sources/XPCService/ControlPlaneXPCClient.swift apps/macos-menubar/Sources/AppMain/Dashboard/DesktopWorkspaceShellView.swift apps/macos-menubar/Sources/AppMain/Models/DesktopShellState.swift apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift apps/macos-menubar/Tests/MenuBarTests/ControlPlaneXPCClientTests.swift apps/macos-menubar/Tests/MenuBarTests/DesktopFoundationViewTests.swift apps/macos-menubar/Tests/MenuBarTests/DesktopShellStateTests.swift apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift apps/macos-menubar/Tests/MenuBarTests/TestSupport.swift:99.59%(240/241)make integration-test:66 passed in 883.49s (0:14:43)make swift-test: failed outside the touched scope whenservices/mlx-text-worker-swiftexited with unexpected signal11duringWorkerScaffoldTestsgit diff --check: pass
-
Metrics report for the second executable
M13.2slice:- typed batching or admission metrics exercised by the touched scope:
gateway.serving_defaults_apply_msgateway.serving_defaults_persist_failuresmenu.serving_defaults_apply_msscheduler.continuous_batch_eligible_ratescheduler.continuous_batch_merge_ratescheduler.continuous_batch_sizescheduler.continuous_batch_active_cohorts
- effective-state and admission evidence exercised by the touched scope:
- requested versus effective
concurrent_processing_enabled,max_concurrent_requests,prefill_batch_size, andcompletion_batch_size - gateway-owned batching defaults in request shaping and execution metadata
- scheduler-visible continuous-batch expansion and disablement without source changes
- requested versus effective
- changed-line coverage for the touched handwritten executable scope:
- Swift control-plane scope:
95.41%(457/479) - Swift menu-bar plus shared XPC-client scope:
99.59%(240/241) - aggregate touched-scope coverage:
96.81%(697/720)
- Swift control-plane scope:
- generated protobuf outputs,
packages/protocol/descriptors/melix.pb, and planning documents are excluded from executable changed-line coverage because they are regenerated artifacts or non-runtime documentation rather than handwritten executable logic
- typed batching or admission metrics exercised by the touched scope:
-
Closed the first executable
M13.2slice by making gateway-level serving defaults typed, persistent, and control-plane-owned across bootstrap, snapshot projection, request shaping, and the Window UI server workspace:- extended the control-plane protocol with
server.apply_serving_defaults,ApplyServingDefaults,ServingDefaultsSource,ServingDefaultsSummary, andServingDefaultsSessionSummary, and regenerated the Swift, Python, and descriptor artifacts so requested and effective serving-default state is part of the versioned interface contract - added
GatewayServingDefaultsStoreso built-in defaults, environment defaults, config-file imports, and operator overrides resolve through a schema-versioned JSON document owned by the Swift control plane, with effective values merged against model-level generation config where applicable - projected serving-default summaries through
ServerSnapshot, wired typed apply handling plus persistence metrics intoControlPlaneService, and applied gateway defaults inside text request shaping and chat translation before per-request overrides - updated the Window UI server workspace and
RuntimeViewModelso serving-default values, source labels, effective merged defaults, andApply Serving Defaultshydrate from control-plane truth, and server starts persist serving defaults before lifecycle mutation
- extended the control-plane protocol with
-
Verification summary for the first executable
M13.2slice:make proto: passHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --enable-code-coverage --filter 'ControlPlaneServiceTests|GatewayServingDefaultsStoreTests|TextEndpointContractTests|OpenAIHandlerTests':297 tests in 4 suites passed after 0.108 secondsHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --enable-code-coverage --filter 'RuntimeViewModelTests|ControlPlaneXPCClientTests|DesktopShellStateTests|DesktopFoundationViewTests':246 tests in 4 suites passed after 4.537 secondspython3 scripts/swift_changed_line_coverage.py --binary services/control-plane-swift/.build/arm64-apple-macosx/debug/MelixControlPlanePackageTests.xctest/Contents/MacOS/MelixControlPlanePackageTests --profdata services/control-plane-swift/.build/arm64-apple-macosx/debug/codecov/default.profdata services/control-plane-swift/Sources/Bootstrap/main.swift services/control-plane-swift/Sources/HTTPGateway/OpenAI/GatewayServingDefaultsStore.swift services/control-plane-swift/Sources/HTTPGateway/OpenAI/OpenAIHandler.swift services/control-plane-swift/Sources/Requests/ChatRequestTranslator.swift services/control-plane-swift/Sources/Requests/TextRequestShaper.swift services/control-plane-swift/Sources/Snapshots/ServerSnapshotBuilder.swift services/control-plane-swift/Sources/XPCService/ControlPlaneService.swift services/control-plane-swift/Tests/ControlPlaneTests/ControlPlaneServiceTests.swift services/control-plane-swift/Tests/ControlPlaneTests/GatewayServingDefaultsStoreTests.swift services/control-plane-swift/Tests/ControlPlaneTests/TextEndpointContractTests.swift services/control-plane-swift/Tests/HTTPGatewayTests/OpenAIHandlerTests.swift:100.00%(429/429)python3 scripts/swift_changed_line_coverage.py --binary apps/macos-menubar/.build/arm64-apple-macosx/debug/MelixMacOSMenubarPackageTests.xctest/Contents/MacOS/MelixMacOSMenubarPackageTests --profdata apps/macos-menubar/.build/arm64-apple-macosx/debug/codecov/default.profdata services/control-plane-swift/Sources/XPCService/ControlPlaneXPCClient.swift apps/macos-menubar/Sources/AppMain/Dashboard/DesktopWorkspaceShellView.swift apps/macos-menubar/Sources/AppMain/Models/DesktopShellState.swift apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift apps/macos-menubar/Tests/MenuBarTests/ControlPlaneXPCClientTests.swift apps/macos-menubar/Tests/MenuBarTests/DesktopFoundationViewTests.swift apps/macos-menubar/Tests/MenuBarTests/DesktopShellStateTests.swift apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift apps/macos-menubar/Tests/MenuBarTests/TestSupport.swift:99.56%(673/676)make swift-test: failed outside the touched scope whenservices/mlx-text-worker-swiftexited with unexpected signal11duringWorkerScaffoldTests; the touched control-plane and menu-bar packages passed under the focused verification commands abovegit diff --check: pass
-
Metrics report for the first executable
M13.2slice:- typed serving-defaults metrics exercised by the touched scope:
gateway.serving_defaults_apply_msgateway.serving_defaults_persist_failuresgateway.generation_default_merge_countmenu.serving_defaults_apply_ms
- effective-state and merge evidence exercised by the touched scope:
- requested versus effective
temperature,top_p,max_tokens,stream_interval_tokens, andmax_concurrent_requests - control-plane-owned source labels and model-override visibility for serving defaults
- gateway-default application in request shaping and chat execution metadata
- requested versus effective
- changed-line coverage for the touched handwritten executable scope:
- Swift control-plane scope:
100.00%(429/429) - Swift menu-bar plus shared XPC-client scope:
99.56%(673/676) - aggregate touched-scope coverage:
99.73%(1102/1105)
- Swift control-plane scope:
- generated protobuf outputs and
packages/protocol/descriptors/melix.pbare excluded from executable changed-line coverage because they are regenerated interface artifacts rather than handwritten runtime logic
- typed serving-defaults metrics exercised by the touched scope:
-
Started
M13.2by refining gateway defaults work into explicit executable slices and selecting gateway-level generation defaults as the next implementation target:- updated the
M13.2plan so the milestone now executes in three bounded slices: typed generation defaults, batching or admission defaults, and speculative defaults - recorded that the current repository still keeps
temperature,top_p,max_tokens, andmax_concurrent_requestsinside desktop-only session state while request shaping still falls back to built-in defaults or model-level generation config - moved the active task plan to the first executable
M13.2slice so the next code transaction can establish a control-plane-owned serving-defaults state model before expanding into batching and speculative-decoding
- updated the
-
Verification summary for the
M13.2planning transaction:git diff --check: pending until the executable change set is complete
-
Metrics report for the
M13.2planning transaction:N/Afor executable coverage and runtime metrics because this transaction only moved the active task plan before implementation started
-
Closed
M13.1by making gateway listener configuration typed, persistent, and control-plane-owned across bootstrap, snapshot projection, and the Window UI server workspace:- extended the control-plane protocol with
server.apply_gateway_config,GatewayConfigSummary,GatewayListenerConfigSummary, andGatewayConfigSource, and regenerated the Swift, Python, and descriptor artifacts so gateway-config state is part of the versioned interface contract - added
GatewayConfigStoreso built-in defaults, environment defaults, and operator overrides resolve through a schema-versioned JSON document owned by the Swift control plane, with bootstrap listener binding sourced from the same store - projected
gateway_configthroughServerSnapshot, added typed apply handling plus persistence-failure metrics inControlPlaneService, and exposed the new typed client helper throughControlPlaneXPCClient - updated the Window UI server workspace so requested and effective listener state, config
source, restart-required badges, and
Apply Gateway Configall hydrate from control-plane truth, and server starts persist gateway config before lifecycle mutation - marked
M13.1completed in the roadmap execution index; the next active execution slice can now advance toM13.2
- extended the control-plane protocol with
-
Verification summary for
M13.1:make proto: passHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --filter 'ControlPlaneServiceTests|GatewayConfigStoreTests':156 tests in 2 suites passed after 0.083 secondsHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --package-path services/control-plane-swift --filter 'ControlPlaneServiceTests|GatewayConfigStoreTests':156 tests in 2 suites passed after 0.090 secondspython3 scripts/swift_changed_line_coverage.py --binary services/control-plane-swift/.build/arm64-apple-macosx/debug/MelixControlPlanePackageTests.xctest/Contents/MacOS/MelixControlPlanePackageTests --profdata services/control-plane-swift/.build/arm64-apple-macosx/debug/codecov/default.profdata services/control-plane-swift/Sources/HTTPGateway/OpenAI/GatewayConfigStore.swift services/control-plane-swift/Sources/Snapshots/ServerSnapshotBuilder.swift services/control-plane-swift/Sources/XPCService/ControlPlaneService.swift services/control-plane-swift/Sources/XPCService/ControlPlaneXPCClient.swift services/control-plane-swift/Sources/Bootstrap/main.swift services/control-plane-swift/Tests/ControlPlaneTests/GatewayConfigStoreTests.swift services/control-plane-swift/Tests/ControlPlaneTests/ControlPlaneServiceTests.swift:100.00%(358/358)HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --filter 'RuntimeViewModelTests|ControlPlaneXPCClientTests|DesktopShellStateTests|DesktopFoundationViewTests':235 tests in 4 suites passed after 4.171 secondsHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --package-path apps/macos-menubar --filter 'RuntimeViewModelTests|ControlPlaneXPCClientTests|DesktopShellStateTests|DesktopFoundationViewTests':235 tests in 4 suites passed after 4.171 secondspython3 scripts/swift_changed_line_coverage.py --binary apps/macos-menubar/.build/arm64-apple-macosx/debug/MelixMacOSMenubarPackageTests.xctest/Contents/MacOS/MelixMacOSMenubarPackageTests --profdata apps/macos-menubar/.build/arm64-apple-macosx/debug/codecov/default.profdata apps/macos-menubar/Sources/AppMain/Dashboard/DesktopWorkspaceShellView.swift apps/macos-menubar/Sources/AppMain/Models/DesktopShellState.swift apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift apps/macos-menubar/Tests/MenuBarTests/TestSupport.swift apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift apps/macos-menubar/Tests/MenuBarTests/ControlPlaneXPCClientTests.swift apps/macos-menubar/Tests/MenuBarTests/DesktopShellStateTests.swift apps/macos-menubar/Tests/MenuBarTests/DesktopFoundationViewTests.swift:95.21%(437/459)git diff --check: pass
-
Metrics report for
M13.1:- typed gateway-config metrics exercised by the touched scope:
gateway.config_apply_msgateway.config_requires_restart_countgateway.config_persist_failuresmenu.gateway_config_apply_ms
- typed snapshot and desktop-state metrics exercised by the touched scope:
- requested versus effective listener host or port projection
- control-plane-owned served model, timeout, rate limit, and source metadata
- restart-required and active-binding visibility for server sessions
- changed-line coverage for the touched handwritten executable scope:
- Swift control-plane scope:
100.00%(358/358) - Swift menu-bar scope:
95.21%(437/459) - aggregate touched-scope coverage:
97.31%(795/817)
- Swift control-plane scope:
- generated protobuf outputs and
packages/protocol/descriptors/melix.pbare excluded from executable changed-line coverage because they are regenerated interface artifacts rather than handwritten runtime logic
- typed gateway-config metrics exercised by the touched scope:
-
Closed the second executable
M12.4slice by making conversion and packaging repository-owned model-tool workflows with stable artifacts and operator-visible summary state:- added
conversion_pipeline.pysoconvertemits a dedicatedmelix.converted_model_bundle.v1artifact bundle with stableconfig.json,tokenizer.json,weights.safetensors, manifest paths, artifact sizes, target format, runtime compatibility, and structural smoke metadata - added
upload_receipt_pipeline.pysouploademits a dedicatedmelix.upload_receipt.v1receipt with stable source-artifact provenance, target repo, runtime metadata, linked quantization fields, and converted-bundle packaging lineage instead of a generic placeholder response - updated
maintenance_core.pysoconvert,quantize, anduploadnow surface typed worker-authored artifact records through model-ops events, while invalid upload artifacts fail with typedinvalid_artifacterrors instead of leaking placeholder state - projected conversion and packaging metadata through the Window UI model-tools state so the operator shell exposes convert entrypoints plus summary fields for target repo, source artifact kind, target format, runtime compatibility, smoke status, and linked quantization identity
- marked
M12.4completed in the roadmap execution index now that inspect, health, and conversion or packaging workflows are all repository-owned and test-backed
- added
-
Verification summary for the second executable
M12.4slice:PYTHONPATH=.:services/mlx-worker-python uv run --project services/mlx-worker-python --extra mlx pytest services/mlx-worker-python/tests/test_maintenance_service.py -q:67 passed in 51.31sPYTHONPATH=.:services/mlx-worker-python uv run --project services/mlx-worker-python --extra mlx coverage run --data-file=/tmp/m12_4_convert_python.coverage -m pytest services/mlx-worker-python/tests/test_maintenance_service.py -q && PYTHONPATH=.:services/mlx-worker-python uv run --project services/mlx-worker-python --extra mlx coverage json --data-file=/tmp/m12_4_convert_python.coverage -o /tmp/m12_4_convert_python_coverage.json && python3 scripts/python_changed_line_coverage.py --coverage-json /tmp/m12_4_convert_python_coverage.json services/mlx-worker-python/worker/model_ops/conversion_pipeline.py services/mlx-worker-python/worker/model_ops/upload_receipt_pipeline.py services/mlx-worker-python/worker/engine/maintenance_core.py services/mlx-worker-python/tests/test_maintenance_service.py:67 passed in 46.42sHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --filter 'DesktopFoundationViewTests':60 tests in 1 suite passed after 4.384 secondsHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --package-path apps/macos-menubar --filter 'RuntimeViewModelTests|DesktopFoundationViewTests':193 tests in 2 suites passed after 4.543 secondspython3 scripts/swift_changed_line_coverage.py --binary apps/macos-menubar/.build/arm64-apple-macosx/debug/MelixMacOSMenubarPackageTests.xctest/Contents/MacOS/MelixMacOSMenubarPackageTests --profdata apps/macos-menubar/.build/arm64-apple-macosx/debug/codecov/default.profdata apps/macos-menubar/Sources/AppMain/Dashboard/DesktopFoundationView.swift apps/macos-menubar/Sources/AppMain/Dashboard/DesktopWorkspaceShellView.swift apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift apps/macos-menubar/Tests/MenuBarTests/DesktopFoundationViewTests.swift apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift:98.88%(353/357)
-
Metrics report for the second executable
M12.4slice:- conversion and packaging metrics exercised by the touched scope:
melix.converted_model_bundle.v1bundle manifests with stableartifact_kind,target_format,target_runtime,conversion_backend, and smoke metadatamelix.upload_receipt.v1receipts with stabletarget_repo,source_artifact_kind,source_manifest_path,runtime, converted-bundle lineage, and linked quantization data- Window UI summary state for
artifactRuntime,servingCompatible,smokeTestRequested,targetRepo,sourceArtifactKind,conversionTargetFormat, andlinkedQuantizationProfileID
- changed-line coverage for the touched handwritten executable scope:
- Python worker conversion or packaging scope:
95.49%(254/266) - Swift menu-bar conversion or packaging scope:
98.88%(353/357) - aggregate touched-scope coverage:
97.43%(607/623)
- Python worker conversion or packaging scope:
- conversion and packaging metrics exercised by the touched scope:
-
Closed the first executable
M12.4slice by making model inspection and doctor health repository-owned, typed operator contracts across the worker, control plane, and Window UI:- extended the worker and control-plane protobuf contracts so inspect output now carries stable backend, family, source, workflow-role, revision, and supported-task metadata while doctor output carries typed health state plus actionable findings instead of markdown-only severity
- updated
maintenance_core.pyso worker-authored inspect payloads derive typed identity fields from loaded or registered model state and doctor responses emit structured warning, degraded, and failed findings for missing loads, zero-byte cache state, zero resident memory, and worker failure conditions - projected the typed inspect and doctor payloads through the Swift control plane, XPC client, and Window UI model-tools views so operators can inspect health status, findings, backend or family identity, workflow role, revision, supported tasks, and source provenance without parsing markdown
- added focused Python, control-plane, and menu-bar regression coverage for typed inspect metadata, doctor severity mapping, structured findings, and the new operator-facing summary views
- moved the next active
M12.4slice to conversion and packaging workflow completion now that inspect and health are stable typed surfaces
-
Verification summary for the first executable
M12.4slice:make proto: passPYTHONPATH=.:services/mlx-worker-python uv run --project services/mlx-worker-python --extra mlx coverage run --data-file=/tmp/m12_4_python.coverage -m pytest services/mlx-worker-python/tests/test_maintenance_service.py -q && PYTHONPATH=.:services/mlx-worker-python uv run --project services/mlx-worker-python --extra mlx coverage json --data-file=/tmp/m12_4_python.coverage -o /tmp/m12_4_python_coverage.json && python3 scripts/python_changed_line_coverage.py --coverage-json /tmp/m12_4_python_coverage.json services/mlx-worker-python/worker/engine/maintenance_core.py services/mlx-worker-python/tests/test_maintenance_service.py:65 passed in 48.59sHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --enable-code-coverage --filter 'ControlPlaneServiceTests|PythonBridgeWorkerClientTests':196 tests in 2 suites passed after 0.958 secondspython3 scripts/swift_changed_line_coverage.py --binary services/control-plane-swift/.build/arm64-apple-macosx/debug/MelixControlPlanePackageTests.xctest/Contents/MacOS/MelixControlPlanePackageTests --profdata services/control-plane-swift/.build/arm64-apple-macosx/debug/codecov/default.profdata services/control-plane-swift/Sources/XPCService/ControlPlaneService.swift services/control-plane-swift/Sources/XPCService/ControlPlaneXPCClient.swift services/control-plane-swift/Tests/ControlPlaneTests/ControlPlaneServiceTests.swift:100.00%(117/117)HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --enable-code-coverage --filter 'ControlPlaneXPCClientTests|RuntimeViewModelTests|DesktopFoundationViewTests':217 tests in 3 suites passed after 4.018 secondspython3 scripts/swift_changed_line_coverage.py --binary apps/macos-menubar/.build/arm64-apple-macosx/debug/MelixMacOSMenubarPackageTests.xctest/Contents/MacOS/MelixMacOSMenubarPackageTests --profdata apps/macos-menubar/.build/arm64-apple-macosx/debug/codecov/default.profdata apps/macos-menubar/Sources/AppMain/Dashboard/DesktopFoundationView.swift apps/macos-menubar/Sources/AppMain/Dashboard/DesktopWorkspaceShellView.swift apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift apps/macos-menubar/Tests/MenuBarTests/ControlPlaneXPCClientTests.swift apps/macos-menubar/Tests/MenuBarTests/TestSupport.swift apps/macos-menubar/Tests/MenuBarTests/DesktopFoundationViewTests.swift apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift:100.00%(186/186)git diff --check: pass
-
Metrics report for the first executable
M12.4slice:- typed inspect and doctor health metrics exercised by the touched scope:
- stable inspect identity fields for
backend_id,family_id,model_path,model_revision,default_workflow_role,detected_identity_source, andsupported_tasks - structured doctor states for
healthy,warning,degraded, andfailed - actionable doctor finding codes covering missing model loads, cache-unavailable state, zero resident bytes, and failed worker state
- stable inspect identity fields for
- changed-line coverage for the touched handwritten executable scope:
- Python worker scope:
100.00%(103/103) - Swift control-plane scope:
100.00%(117/117) - Swift menu-bar scope:
100.00%(186/186) - aggregate touched-scope coverage:
100.00%(406/406)
- Python worker scope:
- generated protobuf outputs and
packages/protocol/descriptors/melix.pbare excluded from executable changed-line coverage because they are regenerated interface artifacts rather than handwritten runtime logic
- typed inspect and doctor health metrics exercised by the touched scope:
-
Started
M12.4by moving the active task plan to typed model inspection, structured health, and model-conversion tooling completion:- recorded that the repository already exposes inspect, doctor, and model-operation shells, but inspect payloads are still too shallow, doctor is markdown-only, and conversion results are not yet a stable operator-facing contract
- defined the next implementation slice around typed model identity metadata, structured doctor severity and findings, and explicit conversion or packaging result summaries that stay tied to model identity
- updated the active task plan so the
M12.4execution transaction starts from an explicit inspect-health-conversion contract instead of treating those workflows as incidental model-ops helpers
-
Verification summary for the
M12.4planning transaction:git diff --check: pending until the executable change set is complete
-
Metrics report for the
M12.4planning transaction:N/Afor executable coverage and runtime metrics because this transaction only moved the active task plan before implementation started
-
Closed
M12.3by making creative image-family identity repository-owned across worker dispatch, control-plane catalog truth, the family support matrix, and the Window UI picker:- added image-family adapter descriptors plus detection from explicit overrides, imported model metadata, and path heuristics for the supported generation and edit families, with stable backend IDs, task kinds, default workflow roles, and support declarations projected into worker-visible model specs
- updated the worker registry, image generation and edit request gates, and dev image seed path so sparse model requests preserve catalog truth, unsupported generation-versus-edit workflows fail with typed validation, and the repository-owned family support matrix distinguishes live verified versus contract-only image rows
- updated the Swift control-plane catalog and Python bridge preload path so image-family metadata survives registry sync, imported model preparation, and phase-seven preload even when operators override the seed image family
- updated the Window UI image workspace so generate and edit workflows each resolve against role-capable models, keep separate selections, and expose family support summaries instead of collapsing all creative families into one generic image picker entry
- marked
M12.3completed in the roadmap execution index; the next active execution slice can now advance toM12.4
-
Verification summary for
M12.3:PYTHONPATH=services/mlx-worker-python uv run --project services/mlx-worker-python --extra mlx pytest services/mlx-worker-python/tests/test_image_family_adapters.py services/mlx-worker-python/tests/test_model_registry_catalog.py services/mlx-worker-python/tests/test_image_runtime.py services/mlx-worker-python/tests/test_acceptance_metrics.py services/mlx-worker-python/tests/test_runtime_service.py tests/integration/test_image_endpoints.py tests/integration/test_non_text_endpoints.py::test_family_support_matrix_tracks_live_verified_family_overrides -q:61 passed in 106.50s (0:01:46)swift test --package-path services/control-plane-swift --filter 'ModelCatalogTests|PythonBridgeWorkerClientTests':84 tests in 2 suites passed after 0.783 secondsswift test --package-path apps/macos-menubar --filter RuntimeViewModelTests:129 tests in 1 suite passed after 0.779 secondsgit diff --check: pass
-
Metrics report for
M12.3:- repository-owned image family matrix metrics exercised by the touched scope:
family_count = 19text_family_count = 6image_family_count = 6live_verified_count = 15contract_only_count = 4
- changed-line coverage for the touched handwritten executable scope:
- Python worker scope:
98.08%(153/156) - Swift control-plane scope:
96.77%(180/186) - Swift menu-bar scope:
95.31%(122/128) - aggregate touched-scope coverage:
96.81%(455/470)
- Python worker scope:
- repository-owned image family matrix metrics exercised by the touched scope:
-
Started
M12.3by moving the active task plan to metadata-driven image family dispatch and role-aware picker completion:- recorded that the repository still treated creative image models as one generic deterministic image family, which hid generation-versus-edit constraints from both registry metadata and the Window UI picker
- defined the next implementation slice around image-family detection from explicit overrides, imported metadata, and path heuristics, plus role support declarations that drive request validation and picker visibility
- updated the active task plan so the
M12.3execution transaction starts from an explicit family-dispatch and operator-routing contract instead of an implicit image-shell cleanup goal
-
Verification summary for the
M12.3planning transaction:git diff --check: pending until the executable change set is complete
-
Metrics report for the
M12.3planning transaction:N/Afor executable coverage and runtime metrics because this transaction only moved the active task plan before implementation started
-
Closed
M12.2by making text-family and MoE-family adapter metadata repository-owned across the Python worker, control-plane catalog, support matrix, and deterministic live-path verification:- added worker-owned text-family adapters for
llama,mistral4,mixtral,qwen3moe,deepseek-mla, andnemotron-h, including metadata-driven detection from explicit overrides,config.json, and path heuristics plus family-specific parser, attention, RoPE, and MoE declarations - updated worker registry snapshots and runtime loads so scanned text models now carry stable
family metadata,
python_text_compatibilityrouting for larger dense or MoE families, and runtime-visible architecture or MoE descriptors without changing the baseswift_textdev seed defaults - updated the Swift control-plane catalog seed path and registry-sync logic so discovered or dev text models preserve text-family identity, parser declarations, route kind, and MoE settings through worker preparation and catalog truth
- expanded the repository-owned family support matrix and integration evidence so the text
matrix now distinguishes live-verified rows for
llama,mistral4,qwen3moe,deepseek-mla, andnemotron-h, while keepingmixtralexplicitlycontract_only - marked
M12.2completed in the roadmap execution index; the next active execution slice can now advance toM12.3
- added worker-owned text-family adapters for
-
Verification summary for
M12.2:PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python --extra mlx pytest services/mlx-worker-python/tests/test_text_family_adapters.py services/mlx-worker-python/tests/test_model_registry_catalog.py services/mlx-worker-python/tests/test_mlx_backend.py services/mlx-worker-python/tests/test_acceptance_metrics.py -q:49 passed in 0.22sPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python --extra mlx pytest tests/integration/test_text_family_endpoints.py tests/integration/test_non_text_endpoints.py::test_family_support_matrix_tracks_live_verified_family_overrides -q:4 passed in 58.25sPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" COVERAGE_FILE=/tmp/m12_2_python.coverage UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python --extra mlx coverage run --source=services/mlx-worker-python/worker,services/mlx-worker-python/tests,tests/integration -m pytest services/mlx-worker-python/tests/test_text_family_adapters.py services/mlx-worker-python/tests/test_model_registry_catalog.py services/mlx-worker-python/tests/test_mlx_backend.py services/mlx-worker-python/tests/test_acceptance_metrics.py tests/integration/test_text_family_endpoints.py tests/integration/test_non_text_endpoints.py::test_family_support_matrix_tracks_live_verified_family_overrides -q && PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" COVERAGE_FILE=/tmp/m12_2_python.coverage UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python --extra mlx coverage json -o /tmp/m12_2_python_coverage.json:53 passed in 58.82sHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --enable-code-coverage --filter 'ModelCatalogTests|ControlPlaneServiceTests|PythonBridgeWorkerClientTests':227 tests in 3 suites passed after 1.106 secondsgit diff --check: pass
-
Metrics report for
M12.2:- repository-owned family-matrix metrics exercised by the touched scope:
family_count = 13text_family_count = 6live_verified_count = 11contract_only_count = 2
- changed-line coverage for the touched handwritten executable scope:
- Python worker and integration scope:
100.00%(389/389) - Swift control-plane scope:
100.00%(315/315) - aggregate touched-scope coverage:
100.00%(704/704)
- Python worker and integration scope:
- repository-owned family-matrix metrics exercised by the touched scope:
-
Started
M12.2by moving the active task plan to metadata-driven text and MoE family adapters:- recorded that the repository still treated larger dense and MoE text models as generic text entries, which hid parser, routing, and MoE-specific capability declarations from both the registry snapshot and the support matrix
- defined the next implementation slice around family detection from explicit overrides,
config.json, and path heuristics,python_text_compatibilityrouting for advanced families, and deterministic live-path verification through the HTTP text-generation surface - updated the active task plan so the
M12.2execution transaction started from an explicit milestone contract instead of an implicit family-expansion goal
-
Verification summary for the
M12.2planning transaction:git diff --check: pass
-
Metrics report for the
M12.2planning transaction:N/Afor executable coverage and runtime metrics because this transaction only updated the active task plan before implementation started
-
Closed
M12.1by making multi-root registry configuration control-plane-owned, worker-backed, and operator-visible across registry snapshots, catalog sync, and the Window UI:- updated the Python worker registry catalog and maintenance core so ordered registry-root
overrides, stable root IDs, explicit rescans, and root-level observability now flow through
registry_snapshotpayloads without rewriting environment state - updated the Swift control plane catalog state, registry snapshot sync, and model-ops routing so configured root overrides persist across sync cycles, explicit empty-root overrides remain distinct from fallback environment discovery, and snapshot-driven root state is projected back into catalog truth
- extended the native desktop shell and runtime view model so operators can add, remove, reorder, and rescan registry roots directly from the Window UI while seeing ordered root rows, accessibility state, configured-override summaries, and discovered-model counts
- added focused Python, control-plane, and menu-bar regression coverage for stable root identity, explicit override ordering, empty-override preservation, root-state formatting, UI guard rails, and snapshot parsing order
- marked
M12.1completed in the roadmap execution index; the next active execution slice can now advance toM12.2
- updated the Python worker registry catalog and maintenance core so ordered registry-root
overrides, stable root IDs, explicit rescans, and root-level observability now flow through
-
Verification summary for
M12.1:PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python --extra mlx pytest services/mlx-worker-python/tests/test_model_registry_catalog.py services/mlx-worker-python/tests/test_maintenance_service.py -q:74 passed in 32.45sHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --filter 'ModelCatalogTests|ControlPlaneServiceTests':175 tests in 2 suites passed after 0.092 secondsHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --filter 'RuntimeViewModelTests|DesktopFoundationViewTests':181 tests in 2 suites passed after 3.913 secondsgit diff --check: pass
-
Metrics report for
M12.1:- registry snapshot metrics and observability exercised by the touched scope:
- stable
root_idprojection from canonical root paths - ordered
root_orderprojection through worker, control plane, and Window UI - root-level accessibility, error-state, and discovered-model observability
- stable
- changed-line coverage for the touched handwritten executable scope:
- Python registry scope:
96.49%(110/114) - Swift control-plane scope:
95.75%(338/353) - Swift menu-bar scope:
97.99%(730/745) - aggregate touched-scope coverage:
97.19%(1178/1212)
- Python registry scope:
- registry snapshot metrics and observability exercised by the touched scope:
-
Started
M12.1by moving the active task plan to multi-root registry management and rescan:- recorded that the current repository only discovers registry roots from
MELIX_MODEL_ROOTSand caches index-derived root IDs, which is insufficient for operator-facing add, remove, reorder, and rescan workflows - defined the next implementation slice around control-plane-owned root configuration, stable root identity, first-root-wins precedence, and tools-surface observability for ordered root rows plus discovery results
- updated the active task plan so the implementation transaction starts from an explicit milestone contract instead of the minimal placeholder plan
- recorded that the current repository only discovers registry roots from
-
Verification summary for the
M12.1planning transaction:git diff --check: pass
-
Metrics report for the
M12.1planning transaction:N/Afor executable coverage and runtime metrics because this transaction only updates active planning and milestone-state documents
-
Closed the
M11.4evidence slice and, with it, the roadmap bookkeeping for parentM11by adding repository-owned truthful disk-streaming smoke evidence and operator runbook guidance without fabricating unsupported SSD-backed runtime metrics:- added
Sources/MelixCLICore/DiskStreamingSmokeCommand.swift,DiskStreamingSmokeRunner.swift, and the executable targetSources/MelixDiskStreamingSmoke/main.swift, so the repository now owns a single-commandmelix-disk-streaming-smokeharness that benchmarks the RAM-resident baseline, attemptsprefer_diskandrequire_disk, restores the original model setting, and emits a machine-readable report with requested-versus-effective cache and disk-streaming evidence - extended
tests/MelixCLITests/DiskStreamingSmokeRunnerTests.swiftso the Swift smoke harness now covers injected-client rendering, baseline benchmark failures, missing-model rejection, unsupported-path compatibility fallback, effective-mode preservation, and helper label mappings in addition to the end-to-end smoke report path - added
tests/integration/test_disk_streaming_smoke.py, which starts the live Melix stack, runsmelix-disk-streaming-smoke --jsonagainst real worker sockets, asserts numeric RAM-baseline metrics, and verifies typeddisk_streaming_unsupportedevidence for bothprefer_diskandrequire_disk - added
docs/runbooks/disk-streaming-evidence.mdand updated the documentation indexes so operators now have explicit setup, interpretation, and diagnostic guidance for the current truthful disk-streaming surface, including the intentionally unavailable future SSD metrics - marked
M11.4as an evidence-only closure in the roadmap execution index and closed the parentM11milestone bookkeeping; the next active execution slice can now advance toM12.1
- added
-
Verification summary for
M11.4:HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --filter DiskStreamingSmokeRunnerTests:10 tests in 1 suite passed after 0.002 secondsHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --filter DiskStreamingSmokeRunnerTests:10 tests in 1 suite passed after 0.002 secondsPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python --extra mlx pytest tests/integration/test_disk_streaming_smoke.py -q:1 passed in 49.67smake py-test:456 passed in 35.12smake swift-test: passmake integration-test:61 passed in 971.13s (0:16:11)git diff --check: pass
-
Metrics report for
M11.4:- disk-streaming smoke metrics now emitted by the repository-owned smoke harness:
bench.smoke.ttft_msbench.smoke.tokens_per_second
- truth-preserving placeholder metrics now emitted explicitly until runtime support exists:
ssd_restore_latency_ms = unavailable_until_runtime_supportdisk_streaming_throughput_delta = unavailable_until_runtime_supportssd_footprint_bytes = unavailable_until_runtime_support
- changed-line coverage for the touched handwritten executable scope:
- Swift CLI smoke scope:
99.56%(226/227)
- Swift CLI smoke scope:
Package.swift, runbooks, documentation indexes, and the live integration test are excluded from executable changed-line coverage because they are package-manifest, documentation, or black-box repository-evidence artifacts rather than handwritten runtime logic
- disk-streaming smoke metrics now emitted by the repository-owned smoke harness:
-
Started
M11.4by documenting the current disk-streaming evidence strategy and execution plan:- added a design spec that records the current runtime constraint that both worker paths still
reject
prefer_diskandrequire_diskwith typeddisk_streaming_unsupportedfailures, so Melix must not fabricate SSD-backed metrics - added an implementation plan for a repository-owned
melix-disk-streaming-smokecommand that will measure the RAM baseline, capture unsupported-path diagnostics, restore model settings, and produce a machine-readable report plus operator runbook guidance - updated the active
M11.4execution slice document and the repository task plan so the next implementation transaction starts from an explicit, truthful scope
- added a design spec that records the current runtime constraint that both worker paths still
reject
-
Verification summary for the
M11.4design-and-plan transaction:git diff --check: pass
-
Metrics report for the
M11.4design-and-plan transaction:N/Afor executable coverage and runtime metrics because this transaction only updates design, planning, and milestone-state documents
-
Closed
M11.3by making streaming-compatible cache policy explicit across the repository-owned protocol, control-plane truth, worker summaries, and native operator settings:- extended the authoritative control-plane and worker protobuf schemas with typed cache-policy settings and summaries, including durable model settings for cache mode, byte and percentage-based cache budgets, block size, cache directory, and multimodal cache budget, then regenerated the versioned Swift, Python, and descriptor outputs
- updated the Swift control plane, snapshot builder, model catalog, and python bridge so
requested cache settings merge through model policy application, worker preparation, and
snapshot projection, while effective cache compatibility is resolved into explicit
compatible,limited,disabled, andunknownlabels instead of hidden downgrade paths - updated the Swift text worker cache summary and runtime registry so worker snapshots now expose cache roots, supported modes, initial cache blocks, and capability flags, while request cache hints default from loaded model settings when operators have configured durable cache policy
- expanded the native desktop shell and runtime view model so model rows, model detail, and model settings now expose requested-versus-effective cache policy, cache directories, block sizing, byte and percentage budgets, and multimodal cache budgets through typed operator-owned controls and summaries
- added focused regression coverage across control-plane, menu bar, and Swift text worker tests for cache-policy normalization, settings merge behavior, worker request construction, effective cache-policy projection, and operator-visible cache summaries
- stabilized disconnect lifecycle metric ordering in
RequestCoordinatorsodisconnect.resume_success_rateis published before terminal-failure snapshots become observable, eliminating a live integration race uncovered during the full repository verification run - marked
M11.3completed in the roadmap execution index; the next active execution slice can advance toM11.4
-
Verification summary for
M11.3:make proto: passHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --filter 'RuntimeViewModelTests|DesktopFoundationViewTests':173 tests in 2 suites passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --filter 'ControlPlaneServiceTests|ModelCatalogTests|SnapshotStoreTests|PythonBridgeWorkerClientTests|RequestCoordinatorTests':280 tests in 5 suites passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/mlx-text-worker-swift --filter WorkerScaffoldTests:134 tests in 1 suite passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --filter RequestCoordinatorTests:39 tests in 1 suite passed after 0.538 secondsPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python --extra mlx pytest tests/integration/test_connection_lifecycle.py -q:2 passed in 26.12sHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --scratch-path /tmp/m11_3_cp_fix_cov --enable-code-coverage --filter 'ControlPlaneServiceTests|ModelCatalogTests|SnapshotStoreTests|PythonBridgeWorkerClientTests|RequestCoordinatorTests':280 tests in 5 suites passed after 1.098 secondsmake py-test:456 passed in 30.32smake swift-test: passmake integration-test:60 passed in 782.74s (0:13:02)git diff --check: pass
-
Metrics report for
M11.3:- typed cache-policy and disconnect-lifecycle metrics exercised by the touched scope:
menu.model_settings_mshttp.stream_disconnect_countdisconnect.resume_success_ratedisconnect.terminal_failure_count
- changed-line coverage for the touched executable scope:
- Swift control-plane scope:
98.87%(439/444) - Swift menu bar scope:
97.72%(600/614) - Swift text worker scope:
100.00%(60/60) - aggregate changed-line coverage across the touched handwritten executable scope:
98.30%(1099/1118)
- Swift control-plane scope:
- protocol schemas, generated protobuf outputs,
packages/protocol/descriptors/melix.pb, and task-planning documents are excluded from executable changed-line coverage because they are generated or repository-ownership artifacts rather than handwritten runtime logic
- typed cache-policy and disconnect-lifecycle metrics exercised by the touched scope:
-
Closed
M11.2by making memory-budget admission and headroom-based unsafe-load rejection control-plane-owned, operator-visible, and test-covered across the protocol, control-plane, and native desktop shell:- extended the authoritative control-plane protobuf schema with
LoadModel.memory_budget_bytes, typedModelSettings.memory_budget_bytes, and residency-summarymemory_budget_bytes,memory_headroom_bytes, andrequired_bytes, then regenerated the repository-owned Swift, Python, and descriptor outputs - updated the Swift control plane, model catalog, on-demand loader, and local XPC client so
explicit loads and lazy loads both resolve the effective memory budget from model settings,
forward it to worker-backed load requests, map worker rejection details into typed
MemoryBudgetEvidence, and publish rejection counters plus last-seen budget or headroom metrics instead of opaque generic load failures - updated the native operator shell and runtime view model so per-model settings now include a
Memory Budget Bytescontrol, model detail and summaries expose configured budget and headroom-required evidence, and desktop-triggered loads can pass an explicit budget through the control-plane client overload - added focused regression coverage across control-plane and menu bar tests for typed policy normalization, client request construction, lazy-load metric recording, memory-budget evidence projection, and operator-visible budget summaries
- marked
M11.2completed in the roadmap execution index and moved the active task plan toM11.3
- extended the authoritative control-plane protobuf schema with
-
Verification summary for
M11.2:make proto: passHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --filter 'ControlPlaneServiceTests|ModelCatalogTests|OnDemandModelLoaderTests':181 tests in 3 suites passed after 0.081 secondsHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --scratch-path /tmp/m11_2_cp_cov --enable-code-coverage --filter 'ControlPlaneServiceTests|ModelCatalogTests|OnDemandModelLoaderTests':180 tests in 3 suites passed after 0.087 secondsHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --scratch-path /tmp/m11_2_menu_cov --enable-code-coverage --filter 'ControlPlaneXPCClientTests|DesktopFoundationViewTests|RuntimeViewModelTests':202 tests in 3 suites passed after 3.470 secondsmake py-test:456 passed in 34.62smake swift-test: passmake integration-test:60 passed in 754.26s (0:12:34)git diff --check: pass
-
Metrics report for
M11.2:- typed memory-budget rejection metrics now emitted by the touched control-plane scope:
control_plane.model_load_rejection_countcontrol_plane.model_load_last_budget_bytescontrol_plane.model_load_last_headroom_bytescontrol_plane.model_load_last_required_bytescontrol_plane.text_load_memory_budget_rejection_countcontrol_plane.text_load_last_budget_bytescontrol_plane.text_load_last_headroom_bytescontrol_plane.text_load_last_required_bytes
- operator timing metrics exercised by the touched desktop scope:
menu.model_load_msmenu.model_settings_ms
- changed-line coverage for the touched executable scope:
- Swift control-plane scope:
98.39%(305/310) - Swift menu bar scope:
100.00%(171/171) - aggregate changed-line coverage across the touched handwritten executable scope:
98.96%(476/481)
- Swift control-plane scope:
- protocol schemas, generated protobuf outputs,
packages/protocol/descriptors/melix.pb, and task-planning documents are excluded from executable changed-line coverage because they are generated or repository-ownership artifacts rather than handwritten runtime logic
- typed memory-budget rejection metrics now emitted by the touched control-plane scope:
-
Closed
M11.1by making disk-streaming mode a typed, operator-visible runtime setting across the repository-owned control-plane, worker, and desktop-shell surfaces:- extended the authoritative control-plane and worker protobuf schemas with
DiskStreamingMode, typed runtime settings, runtime-session fields, worker capabilities, and load-request flags, then regenerated the versioned Swift, Python, and descriptor outputs - updated the Swift control plane, Python bridge, on-demand loader, runtime-session store, and
model catalog so requested disk-streaming mode now flows through model policy application,
worker-backed load requests, runtime-session snapshots, and residency summaries, while
unsupported workerless or worker-backed paths fail explicitly with typed
disk_streaming_unsupportederrors instead of silently downgrading - updated the Python worker registry and gRPC server plus the Swift text worker runtime registry
and services so both worker stacks expose
supports_disk_streaming = false, rejectprefer_diskandrequire_diskloads deterministically, and report effective disk-streaming-mode metadata in residency payloads - expanded the native operator shell and runtime view model so model settings now expose a typed disk-streaming picker, model rows and summaries show the selected mode, and server-session detail renders requested versus effective disk-streaming state alongside the existing lifecycle and residency metadata
- added focused regression coverage across Python worker tests, Swift text worker tests, control-plane tests, and menu bar tests, including error mapping, residency projection, bridge-mode mapping, raw policy normalization, operator draft synchronization, and the desktop disk-streaming picker options
- marked
M11.1completed in the roadmap execution index; the active task plan can now advance toM11.2
- extended the authoritative control-plane and worker protobuf schemas with
-
Verification summary for
M11.1:make proto: passPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_runtime_service.py services/mlx-worker-python/tests/test_runtime_edges.py -q:31 passed in 0.20smake py-test:456 passed in 34.49sHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/mlx-text-worker-swift --scratch-path /tmp/m11_1_text_cov --enable-code-coverage --filter WorkerScaffoldTests:133 tests in 1 suite passed after 1.391 secondsHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --scratch-path /tmp/m11_1_cp_cov --enable-code-coverage --filter 'ControlPlaneServiceTests|OnDemandModelLoaderTests|ModelCatalogTests|PythonBridgeWorkerClientTests': passHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --scratch-path /tmp/m11_1_menu_cov --enable-code-coverage --filter 'RuntimeViewModelTests|DesktopFoundationViewTests|DesktopShellStateTests':173 tests in 3 suites passed after 3.453 secondsmake swift-test: passmake integration-test:60 passed in 734.45s (0:12:14)git diff --check: pass
-
Metrics report for
M11.1:- typed disk-streaming control-plane or operator counters in the touched scope:
control_plane.server_runtime_session_countmenu.model_settings_msmenu.server_snapshot_ms
- changed-line coverage for the touched executable scope:
- Python worker runtime scope:
96.97%(32/33) - Swift text worker scope:
100.00%(164/164) - Swift control-plane scope:
99.67%(305/306) - Swift menu bar scope:
96.53%(139/144) - aggregate changed-line coverage across the touched handwritten executable scope:
98.92%(640/647)
- Python worker runtime scope:
- protocol schemas, generated protobuf outputs,
packages/protocol/descriptors/melix.pb, and task-planning documents are excluded from executable changed-line coverage because they are generated or repository-ownership artifacts rather than handwritten runtime logic
- typed disk-streaming control-plane or operator counters in the touched scope:
-
Closed
M10.4and, with it, the parentM10lifecycle milestone by adding repository-owned live-path lifecycle smoke evidence and operator recovery guidance:- added
Sources/MelixCLICore/LocalRuntimeFactory.swift,SessionLifecycleSmokeRunner.swift, andSessionLifecycleSmokeCommand.swift, plus the executable targetSources/MelixSessionLifecycleSmoke/main.swift, so the repository now owns a single-process lifecycle smoke harness that preserves oneControlPlaneServiceinstance while exercising pause, idle sleep, request-activity wake, and stop-start recovery against real worker sockets - added focused Swift coverage in
tests/MelixCLITests/SessionLifecycleSmokeRunnerTests.swiftfor lifecycle smoke reporting, timeout handling, command rendering, injected-client execution, stop-conflict retry, fallback assistant handling, command parsing failures, and the defaultMelixCLIRunnerlocal-runtime path - added
tests/integration/test_session_lifecycle_integration.py, which starts real worker processes, shuts down the auxiliary HTTP control plane, runsmelix-session-lifecycle-smokeagainst the live worker sockets, and asserts machine-readable pause, sleep, wake, and restart evidence - added
docs/runbooks/session-lifecycle.mdand updated the documentation maps so operators now have explicit diagnosis and recovery guidance for paused, sleeping, stopped, and failed server sessions, including how to separate lifecycle faults from connection churn - marked
M10.4and the parentM10milestone completed in the roadmap execution index and moved the active task plan toM11.1
- added
-
Verification summary for
M10.4:HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --filter SessionLifecycleSmokeRunnerTests:14 tests in 1 suite passed after 3.005 secondsHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --filter SessionLifecycleSmokeRunnerTests:14 tests in 1 suite passed after 3.002 secondsPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python pytest tests/integration/test_session_lifecycle_integration.py -q:1 passed in 93.36sPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" COVERAGE_FILE=/tmp/m10_4_python.coverage UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python coverage run --include='tests/integration/test_session_lifecycle_integration.py' -m pytest tests/integration/test_session_lifecycle_integration.py -q:1 passed in 40.92smake swift-test: passmake integration-test:60 passed in 738.98s (0:12:18)git diff --check: pass
-
Metrics report for
M10.4:- lifecycle smoke metrics now emitted by the repository-owned smoke harness:
lifecycle.pause_ack_mslifecycle.idle_to_light_sleep_mslifecycle.wake_to_ready_mslifecycle.restart_recovery_ms
- control-plane lifecycle timings recorded during the smoke path:
control_plane.server_start_mscontrol_plane.server_pause_mscontrol_plane.server_resume_mscontrol_plane.server_wake_mscontrol_plane.server_stop_mscontrol_plane.server_idle_policy_ms
- changed-line coverage for the touched executable scope:
- Swift CLI and smoke harness:
98.30%(752/765) - Python integration coverage:
100.00%(46/46)
- Swift CLI and smoke harness:
- lifecycle smoke metrics now emitted by the repository-owned smoke harness:
-
Closed
M10.3by surfacing control-plane-owned server-session lifecycle and idle-policy truth across the desktop shell, server workspace, and chat-facing operator surfaces:- extended
apps/macos-menubar/Sources/AppMain/Models/DesktopShellState.swiftso server-session hydration now derives lifecycle summaries, runtime detail, idle-policy summaries, lifecycle banners, and chat-facing lifecycle notices directly from typed runtime-session payloads - updated
apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swiftto route pause, resume, wake, stop, and idle-policy actions through the control-plane client while keeping desktop banner state authoritative to live snapshots and streamed lifecycle events instead of optimistic local lifecycle mutations - expanded
apps/macos-menubar/Sources/AppMain/Dashboard/DesktopWorkspaceShellView.swiftandapps/macos-menubar/Sources/AppMain/Chat/DesktopChatView.swiftso the native Window UI now exposes lifecycle banners, inline notices, runtime detail, idle-policy summaries, and typed lifecycle controls for paused, sleeping, stopped, and failed server sessions - added focused coverage in
apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift,apps/macos-menubar/Tests/MenuBarTests/DesktopFoundationViewTests.swift,apps/macos-menubar/Tests/MenuBarTests/ControlPlaneXPCClientTests.swift, andapps/macos-menubar/Tests/MenuBarTests/DesktopShellStateTests.swift, plus lifecycle-aware test support wiring inapps/macos-menubar/Tests/MenuBarTests/TestSupport.swift - marked
M10.3completed in the roadmap execution index and moved the active task plan toM10.4
- extended
-
Verification summary for
M10.3:HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --filter 'RuntimeViewModelTests|DesktopFoundationViewTests|DesktopShellStateTests|ControlPlaneXPCClientTests':199 tests in 4 suites passed after 3.798 secondsHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --package-path apps/macos-menubar --filter 'RuntimeViewModelTests|DesktopFoundationViewTests|DesktopShellStateTests|ControlPlaneXPCClientTests':199 tests in 4 suites passed after 3.813 secondsmake swift-test: passgit diff --check: pass
-
Metrics report for
M10.3:- desktop lifecycle metrics emitted by the touched scope:
menu.server_start_msmenu.server_pause_msmenu.server_resume_msmenu.server_wake_msmenu.server_stop_msmenu.server_idle_policy_ms
- handwritten menu bar executable scope changed-line coverage:
apps/macos-menubar/Sources/AppMain/Chat/DesktopChatView.swift:75.00%(69/92)apps/macos-menubar/Sources/AppMain/Dashboard/DesktopWorkspaceShellView.swift:88.68%(141/159)apps/macos-menubar/Sources/AppMain/Models/DesktopShellState.swift:100.00%(150/150)apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift:98.14%(158/161)apps/macos-menubar/Tests/MenuBarTests/TestSupport.swift:98.98%(97/98)apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift:100.00%(298/298)apps/macos-menubar/Tests/MenuBarTests/DesktopFoundationViewTests.swift:97.56%(160/164)apps/macos-menubar/Tests/MenuBarTests/ControlPlaneXPCClientTests.swift:100.00%(43/43)apps/macos-menubar/Tests/MenuBarTests/DesktopShellStateTests.swift:100.00%(0/0)- aggregate changed-line coverage for the touched handwritten menu bar scope:
95.79%(1116/1165)
- desktop lifecycle metrics emitted by the touched scope:
-
Closed
M10.2by wiring control-plane-owned lifecycle controls and idle-power policy through the server-session surface:- extended
packages/protocol/schema/controlplane/v1/control_plane.protowith explicitpause,resume,wake, andset_idle_policyserver commands, added session-scoped payloads forstartandstop, and regenerated the repository-owned Swift, Python, and descriptor outputs - expanded
services/control-plane-swift/Sources/Snapshots/ServerSessionRuntimeStore.swift,ServerSnapshotBuilder.swift, andSchedulerReadModel.swiftso runtime sessions now advance through typed lifecycle transitions, request-activity wake reasons, idle inhibition, and auto-sleep thresholds while the aggregate server-state read model derives from runtime-session truth - updated
services/control-plane-swift/Sources/XPCService/ControlPlaneService.swiftandControlPlaneXPCClient.swiftso lifecycle mutations, idle-policy validation, server snapshot projection, and serving-time pause or sleep safety all live behind the authoritative control-plane interface instead of menu-bar-local heuristics - extended
Sources/MelixCLICore/MelixCLI.swiftsomelix server snapshot|start|pause|resume|wake|stop|set-idle-policynow speak the same session-scoped control-plane contract and render typed runtime-session metadata for operators - added focused regression coverage in
tests/MelixCLITests/MelixCLIParserTests.swift,tests/MelixCLITests/MelixCLIRunnerTests.swift,services/control-plane-swift/Tests/ControlPlaneTests/SnapshotStoreTests.swift, andservices/control-plane-swift/Tests/ControlPlaneTests/ControlPlaneServiceTests.swift, then markedM10.2completed in the roadmap execution index and moved the active task plan toM10.3
- extended
-
Verification summary for
M10.2:HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --filter MelixCLITests:64 tests in 2 suites passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --filter ControlPlaneTests:298 tests in 18 suites passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --filter MelixCLITests: passHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --enable-code-coverage --filter ControlPlaneTests: passmake proto: passmake py-test:455 passed in 34.36smake swift-test: passmake integration-test:59 passed in 692.68s (0:11:32)git diff --check: pass- repository-default verification note: the full Swift run still emits the pre-existing linker
warning: input verification failednotes for cached object files plus the existingRequestCoordinator.swiftno 'async' operations occur within 'await' expressionwarnings, but the authoritative commands above completed successfully
-
Metrics report for
M10.2:- CLI executable scope changed-line coverage:
Sources/MelixCLICore/MelixCLI.swift:99.11%(222/224)tests/MelixCLITests/MelixCLIParserTests.swift:87.69%(114/130)tests/MelixCLITests/MelixCLIRunnerTests.swift:100.00%(222/222)- aggregate CLI changed-line coverage:
96.88%(558/576)
- control-plane executable scope changed-line coverage:
services/control-plane-swift/Sources/EnginePool/SchedulerReadModel.swift:100.00%(3/3)services/control-plane-swift/Sources/Snapshots/ServerSessionRuntimeStore.swift:100.00%(164/164)services/control-plane-swift/Sources/Snapshots/ServerSnapshotBuilder.swift:100.00%(19/19)services/control-plane-swift/Sources/XPCService/ControlPlaneService.swift:99.58%(237/238)services/control-plane-swift/Sources/XPCService/ControlPlaneXPCClient.swift:100.00%(145/145)services/control-plane-swift/Tests/ControlPlaneTests/ControlPlaneServiceTests.swift:92.52%(470/508)services/control-plane-swift/Tests/ControlPlaneTests/SnapshotStoreTests.swift:96.86%(185/191)- aggregate control-plane changed-line coverage:
96.45%(1223/1268)
- aggregate changed-line coverage for the touched handwritten Swift scope in
M10.2:96.58%(1781/1844) - protocol schemas, generated protobuf outputs,
packages/protocol/descriptors/melix.pb, and task-planning documents are excluded from executable changed-line coverage because they are generated or repository-ownership artifacts rather than handwritten runtime logic
- CLI executable scope changed-line coverage:
-
Closed
M10.1by introducing a dedicated server-session runtime lifecycle snapshot contract:- extended
packages/protocol/schema/controlplane/v1/control_plane.protowith typedServerSessionLifecycleState,ServerSessionPowerState,ServerWakeReason, andServerSessionRuntimeStatemessages, then regenerated the repository-owned protocol outputs - added
services/control-plane-swift/Sources/Snapshots/ServerSessionRuntimeStore.swiftplusServerSnapshotBuilderandControlPlaneServicewiring so control-plane snapshots andserver.state_changedevents now project typedruntime_sessionswithout overloading the existing Phase 3 branch/session graph semantics - updated the native menu bar state model and
RuntimeViewModelso operator-facing server sessions now consume typed lifecycle, power-state, wake-reason, and idle-policy metadata from the control-plane payload instead of inferring paused-versus-sleeping locally - added focused control-plane and menu bar regression coverage for snapshot decoding, event
projection, runtime-session fallback, and enum mapping branches, then marked
M10.1completed in the roadmap execution index and active task plan
- extended
-
Verification summary for
M10.1:make proto: passHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/services/control-plane-swift/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --filter ControlPlaneServiceTests:127 tests passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/apps/macos-menubar/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --filter RuntimeViewModelTests:110 tests passed, then111 tests passedafter the final fallback-coverage test was addedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/services/control-plane-swift/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --enable-code-coverage:537 tests passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/apps/macos-menubar/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --enable-code-coverage --filter RuntimeViewModelTests:111 tests passedmake swift-test: passgit diff --check: pass
-
Metrics report for
M10.1:- control-plane handwritten executable scope changed-line coverage:
services/control-plane-swift/Sources/Snapshots/ServerSnapshotBuilder.swift:100.00%(1/1)services/control-plane-swift/Sources/Snapshots/ServerSessionRuntimeStore.swift:100.00%(39/39)services/control-plane-swift/Sources/XPCService/ControlPlaneService.swift:100.00%(19/19)services/control-plane-swift/Tests/ControlPlaneTests/ControlPlaneServiceTests.swift:100.00%(32/32)services/control-plane-swift/Tests/ControlPlaneTests/SnapshotStoreTests.swift:100.00%(21/21)- aggregate control-plane changed-line coverage:
100.00%(112/112)
- menu bar handwritten executable scope changed-line coverage:
apps/macos-menubar/Sources/AppMain/Models/DesktopShellState.swift:100.00%(18/18)apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift:100.00%(94/94)apps/macos-menubar/Tests/MenuBarTests/TestSupport.swift:100.00%(2/2)apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift:100.00%(199/199)- aggregate menu bar changed-line coverage:
100.00%(313/313)
- aggregate changed-line coverage for the touched handwritten Swift scope in
M10.1:100.00%(425/425) - protocol schemas, generated protobuf outputs,
packages/protocol/descriptors/melix.pb, and task-planning documents are excluded from executable changed-line coverage because they are generated or repository-ownership artifacts rather than handwritten runtime logic
- control-plane handwritten executable scope changed-line coverage:
-
Closed the
M8.11platform-packaging and target-differentiation milestone and, with it, the parentM8milestone:- added
services/mlx-worker-python/worker/productization/packaging_targets.pyso the repository now owns a stable Apple Silicon packaging target matrix forlaunch_agents_checkout,homebrew_service, andmacos_app_bundle_preview, each preserving the shared logical Melix identity while makingpackaging_target_id,packaging_kind,distribution_channel,runtime_layout,state_contract, andupdate_strategyexplicit - extended
services/mlx-worker-python/worker/productization/install_assets.py,services/mlx-worker-python/worker/productization/homebrew_service.py, andservices/mlx-worker-python/worker/productization/macos_app_bundle.pyso launch-agent install manifests, Homebrew service manifests, and preview app-bundle outputs now project the shared target metadata, including embedded app-bundle target manifests and version or update environment exports - added repository-owned validation in
scripts/m8_packaging_target_smoke.py, plus focused regression coverage inservices/mlx-worker-python/tests/test_packaging_targets.py,services/mlx-worker-python/tests/test_m8_packaging_target_smoke.py, andservices/mlx-worker-python/tests/test_package_macos_menubar_app_script.py - updated
README.md,docs/runbooks/platform-packaging-targets.md,docs/runbooks/phase-8-local-install.md,docs/runbooks/homebrew-install.md,infra/packaging/README.md,infra/signing/README.md,infra/launchd/README.md,docs/plans/2026-03-30-m8-11-platform-packaging-and-target-differentiation.md, the roadmap execution index, andtask_plan.mdso the repository recordsM8.11and the parentM8milestone as completed with explicit verification and metrics evidence
- added
-
Verification summary for
M8.11:PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python --extra mlx pytest services/mlx-worker-python/tests/test_packaging_targets.py services/mlx-worker-python/tests/test_install_assets.py services/mlx-worker-python/tests/test_install_local_product_script.py services/mlx-worker-python/tests/test_homebrew_distribution.py services/mlx-worker-python/tests/test_homebrew_service_script.py services/mlx-worker-python/tests/test_macos_app_bundle.py services/mlx-worker-python/tests/test_package_macos_menubar_app_script.py services/mlx-worker-python/tests/test_m8_packaging_target_smoke.py -q:38 passed in 0.23sPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python --extra mlx python scripts/m8_packaging_target_smoke.py --json: passmake py-test:455 passed in 34.03sgit diff --check: pass
-
Metrics report for
M8.11:- deterministic smoke metrics from
scripts/m8_packaging_target_smoke.py --json:packaging_target_profile_count = 3packaging_target_shared_identity_ok = 1packaging_target_distinct_packaging_kind_count = 3packaging_target_launch_agents_profile_ok = 1packaging_target_homebrew_profile_ok = 1packaging_target_app_bundle_profile_ok = 1
- Python executable scope changed-line coverage:
services/mlx-worker-python/worker/productization/__init__.py:100.00%(0/0)services/mlx-worker-python/worker/productization/install_assets.py:100.00%(3/3)services/mlx-worker-python/worker/productization/homebrew_service.py:100.00%(2/2)services/mlx-worker-python/worker/productization/macos_app_bundle.py:100.00%(8/8)services/mlx-worker-python/worker/productization/packaging_targets.py:100.00%(0/0)scripts/package_macos_menubar_app.py:100.00%(2/2)scripts/m8_packaging_target_smoke.py:100.00%(0/0)services/mlx-worker-python/tests/test_install_assets.py:100.00%(5/5)services/mlx-worker-python/tests/test_install_local_product_script.py:100.00%(1/1)services/mlx-worker-python/tests/test_homebrew_distribution.py:100.00%(4/4)services/mlx-worker-python/tests/test_macos_app_bundle.py:100.00%(13/13)services/mlx-worker-python/tests/test_packaging_targets.py:100.00%(0/0)services/mlx-worker-python/tests/test_package_macos_menubar_app_script.py:100.00%(0/0)services/mlx-worker-python/tests/test_m8_packaging_target_smoke.py:100.00%(0/0)- aggregate Python changed-line coverage:
100.00%(38/38)
- documentation and infra metrics:
N/A - reason: the remaining touched files for this transaction are repository documentation and packaging readmes rather than executable code paths
- deterministic smoke metrics from
-
Stabilized the warm-followup recovery integration assertion:
- updated
tests/integration/test_recovery_flows.pyso the live recovery test now treatsscheduler.prefix_affinity_hit_rate,scheduler.warm_route_preference_rate, andscheduler.restored_route_rateas the authoritative warm-path routing guarantees while only requiringsession.followup_ttft_delta_msto be recorded rather than forcing a positive delta on every deterministic live run - added a focused regression test for
wait_for_metric_key(...)timeout behavior so the helper covers both success and failure branches under changed-line coverage
- updated
-
Verification summary for the recovery-flow stabilization:
PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python --extra mlx pytest tests/integration/test_recovery_flows.py::test_warm_followup_prefers_hot_route_and_reduces_ttft_against_cold_baseline tests/integration/test_recovery_flows.py::test_wait_for_metric_key_raises_when_metric_never_appears -q:2 passed in 11.65smake integration-test:58 passed in 691.52s (0:11:31)
-
Metrics report for the recovery-flow stabilization:
tests/integration/test_recovery_flows.py: changed-line coverage100.00%(18/18)
-
Closed the
M8.10auto-update and startup-failure handling milestone:- extended
services/mlx-worker-python/worker/productization/install_assets.py,services/mlx-worker-python/worker/productization/startup_signals.py, andscripts/install_local_product.pyso packaged Melix installs now emit versioned install manifests, repository-owned update-channel metadata, requested versus selected HTTP-port evidence, authoritative log paths, and deterministic startup-failure classification helpers - added repository-owned update metadata in
infra/packaging/update-channels/stable.jsonplus a deterministic smoke command inscripts/m8_startup_failure_smoke.py, with focused regression coverage inservices/mlx-worker-python/tests/test_install_assets.py,services/mlx-worker-python/tests/test_install_local_product_script.py,services/mlx-worker-python/tests/test_startup_signals.py, andservices/mlx-worker-python/tests/test_m8_startup_failure_smoke.py - added
apps/macos-menubar/Sources/AppMain/Persistence/ProductInstallState.swiftand wired the provider throughRuntimeViewModel,DesktopFoundationState, andStatusMenuso the native operator shell now surfaces packaged update state and actionable host-port, crash, and hang diagnostics sourced from the install manifest - expanded focused menu-bar coverage in
apps/macos-menubar/Tests/MenuBarTests/ProductInstallStateTests.swift,apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift,apps/macos-menubar/Tests/MenuBarTests/StatusMenuTests.swift, andapps/macos-menubar/Tests/MenuBarTests/AppMainBootstrapTests.swift, including environment override, version-normalization, control-plane crash, worker crash, and startup-hang branches - updated
README.md,docs/runbooks/phase-8-local-install.md,infra/packaging/README.md,docs/plans/2026-03-30-m8-10-auto-update-and-startup-failure-handling.md, the roadmap execution index, andtask_plan.mdso the repository recordsM8.10as completed with explicit verification and changed-line coverage evidence
- extended
-
Verification summary for
M8.10:PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python --extra mlx pytest services/mlx-worker-python/tests/test_install_assets.py services/mlx-worker-python/tests/test_install_local_product_script.py services/mlx-worker-python/tests/test_startup_signals.py services/mlx-worker-python/tests/test_m8_startup_failure_smoke.py -q:16 passed in 0.08sPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python --extra mlx python scripts/m8_startup_failure_smoke.py --json: passHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --filter ProductInstallStateTests:10 tests in 1 suite passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --enable-code-coverage --filter 'AppMainBootstrapTests|ProductInstallStateTests|RuntimeViewModelTests|StatusMenuTests|DesktopFoundationViewTests':193 tests in 5 suites passed after 3.525 secondsmake py-test:449 passed in 34.13smake swift-test: pass- verification note: the focused and repository-default Swift runs still emit the pre-existing
warning: input verification failedlinker notes for cached object files plus the existingRequestCoordinator.swiftno 'async' operations occur within 'await' expressionwarnings, but the authoritative commands above completed successfully
-
Metrics report for
M8.10:- Python executable scope changed-line coverage:
scripts/install_local_product.py:100.00%(3/3)scripts/m8_startup_failure_smoke.py:95.65%(44/46)services/mlx-worker-python/worker/productization/install_assets.py:100.00%(12/12)services/mlx-worker-python/worker/productization/startup_signals.py:92.31%(120/130)services/mlx-worker-python/tests/test_install_assets.py:100.00%(26/26)services/mlx-worker-python/tests/test_install_local_product_script.py:100.00%(6/6)services/mlx-worker-python/tests/test_startup_signals.py:100.00%(47/47)services/mlx-worker-python/tests/test_m8_startup_failure_smoke.py:100.00%(22/22)- aggregate Python changed-line coverage:
95.89%(280/292)
- menu bar executable scope changed-line coverage:
apps/macos-menubar/Sources/AppMain/Dashboard/DesktopFoundationState.swift:100.00%(11/11)apps/macos-menubar/Sources/AppMain/MenuBar/StatusMenu.swift:100.00%(3/3)apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift:100.00%(24/24)apps/macos-menubar/Sources/AppMain/Persistence/ProductInstallState.swift:99.04%(207/209)apps/macos-menubar/Tests/MenuBarTests/AppMainBootstrapTests.swift:100.00%(3/3)apps/macos-menubar/Tests/MenuBarTests/DesktopFoundationViewTests.swift:100.00%(2/2)apps/macos-menubar/Tests/MenuBarTests/ProductInstallStateTests.swift:100.00%(246/246)apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift:100.00%(39/39)apps/macos-menubar/Tests/MenuBarTests/StatusMenuTests.swift:100.00%(23/23)apps/macos-menubar/Tests/MenuBarTests/TestSupport.swift:100.00%(6/6)- aggregate menu bar changed-line coverage:
99.65%(564/566)
- aggregate changed-line coverage for the touched executable scope in
M8.10:98.39%(844/858)
- Python executable scope changed-line coverage:
-
Closed the
M8.9Homebrew formula and services milestone:- added repository-owned Homebrew packaging assets in
infra/homebrew/Formula/melix.rbandinfra/homebrew/README.md, including a formula that installs from the checked-out repository root, builds the Melix CLI plus the control-plane and Swift text-worker binaries, and exposes amelix-homebrew-servicewrapper forbrew services - added
services/mlx-worker-python/worker/productization/homebrew_formula.py,services/mlx-worker-python/worker/productization/homebrew_service.py, andscripts/melix_homebrew_service.pyso Homebrew service startup reuses Melix local-product layout semantics while supervising the control plane, Swift text worker, and Python worker from one repository-owned entrypoint - added deterministic packaging smoke commands in
scripts/m8_homebrew_formula_smoke.pyandscripts/m8_homebrew_service_smoke.py, plus focused regression coverage inservices/mlx-worker-python/tests/test_homebrew_distribution.pyandservices/mlx-worker-python/tests/test_homebrew_service_script.py, including failure, shutdown-timeout, signal-stop, and environment-root branches - documented Homebrew install, upgrade, stop, and prune behavior in
docs/runbooks/homebrew-install.mdand surfaced the path fromREADME.md,docs/README.md, andinfra/packaging/README.md - updated
docs/plans/2026-03-30-m8-9-homebrew-formula-and-services.md,docs/plans/2026-03-30-full-capability-roadmap-execution-index.md, andtask_plan.mdso the repository recordsM8.9as completed with explicit verification and metrics evidence
- added repository-owned Homebrew packaging assets in
-
Verification summary for
M8.9:PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python --extra mlx pytest services/mlx-worker-python/tests/test_homebrew_distribution.py services/mlx-worker-python/tests/test_homebrew_service_script.py -q:14 passed in 0.17sPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python --extra mlx python scripts/m8_homebrew_formula_smoke.py --json: passPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python --extra mlx python scripts/m8_homebrew_service_smoke.py --json: passPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python --extra mlx python scripts/melix_homebrew_service.py manifest --json: passruby -c infra/homebrew/Formula/melix.rb:Syntax OKmake py-test:441 passed in 30.17sgit diff --check: pass
-
Metrics report for
M8.9:- Python executable scope changed-line coverage:
services/mlx-worker-python/worker/productization/homebrew_formula.py:100.00%(16/16)services/mlx-worker-python/worker/productization/homebrew_service.py:100.00%(98/98)services/mlx-worker-python/tests/test_homebrew_distribution.py:100.00%(161/161)services/mlx-worker-python/tests/test_homebrew_service_script.py:100.00%(72/72)scripts/m8_homebrew_formula_smoke.py:100.00%(27/27)scripts/m8_homebrew_service_smoke.py:100.00%(42/42)scripts/melix_homebrew_service.py:100.00%(37/37)- aggregate Python changed-line coverage:
100.00%(453/453)
- Ruby Homebrew formula scope changed-line coverage:
N/Abecause the repository does not yet provide a changed-line coverage tool for Ruby formula files
- Python executable scope changed-line coverage:
-
Closed the
M8.8generation-config and OCR sampling controls milestone:- extended
services/mlx-worker-python/worker/model_registry/catalog.pyso registry discovery now imports inspectablemelix.generation_config.*metadata fromgeneration_config.jsonwithout overwriting explicit manifest ext values, while malformed and non-mapping sidecars remain safe no-ops - updated
services/control-plane-swift/Sources/Requests/TextRequestShaper.swift,services/control-plane-swift/Sources/Requests/ChatRequestTranslator.swift,services/control-plane-swift/Sources/XPCService/ControlPlaneService.swift,services/control-plane-swift/Sources/HTTPGateway/OpenAI/OpenAIHandler.swift, andservices/control-plane-swift/Sources/WorkerClient/PythonBridgeWorkerClient.swiftso imported generation-config defaults flow through a shared model-sampling policy and OCR-specific overrides only win when explicitly configured - expanded
apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swiftandapps/macos-menubar/Sources/AppMain/Dashboard/DesktopFoundationView.swiftso the native operator shell now exposes OCR sampling profile, temperature, top-p, and max-token controls in the shared model-settings form while also surfacing generation-config provenance and effective OCR defaults in the model info summary - added focused regression coverage in
services/mlx-worker-python/tests/test_model_registry_catalog.py,services/control-plane-swift/Tests/ControlPlaneTests/TextEndpointContractTests.swift,services/control-plane-swift/Tests/WorkerClientTests/PythonBridgeWorkerClientTests.swift,apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift,apps/macos-menubar/Tests/MenuBarTests/DesktopFoundationViewTests.swift, andapps/macos-menubar/Tests/MenuBarTests/TestSupport.swift - updated
docs/plans/2026-03-30-m8-8-generation-config-and-ocr-sampling-controls.md,docs/plans/2026-03-30-full-capability-roadmap-execution-index.md, andtask_plan.mdso the repository recordsM8.8as completed with explicit verification and coverage evidence instead of leaving the slice pending
- extended
-
Verification summary for
M8.8:PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python --extra mlx pytest services/mlx-worker-python/tests/test_model_registry_catalog.py -q:11 passed in 0.08sHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --filter 'TextEndpointContractTests|PythonBridgeWorkerClientTests': passHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --filter 'RuntimeViewModelTests|DesktopFoundationViewTests': passmake proto: passmake py-test:425 passed in 34.13smake swift-test: passmake integration-test:58 passed in 692.74s (0:11:32)
-
Metrics report for
M8.8:- Python changed-line coverage:
services/mlx-worker-python/worker/model_registry/catalog.py:100.00%(37/37)services/mlx-worker-python/tests/test_model_registry_catalog.py:100.00%(49/49)- aggregate Python changed-line coverage:
100.00%(86/86)
- control-plane changed-line coverage:
services/control-plane-swift/Sources/HTTPGateway/OpenAI/OpenAIHandler.swift:84.62%(11/13)services/control-plane-swift/Sources/Requests/ChatRequestTranslator.swift:100.00%(1/1)services/control-plane-swift/Sources/Requests/TextRequestShaper.swift:100.00%(34/34)services/control-plane-swift/Sources/WorkerClient/PythonBridgeWorkerClient.swift:100.00%(27/27)services/control-plane-swift/Sources/XPCService/ControlPlaneService.swift:84.62%(11/13)services/control-plane-swift/Tests/ControlPlaneTests/TextEndpointContractTests.swift:100.00%(76/76)services/control-plane-swift/Tests/WorkerClientTests/PythonBridgeWorkerClientTests.swift:100.00%(36/36)- aggregate control-plane changed-line coverage:
98.00%(196/200)
- menu bar changed-line coverage:
apps/macos-menubar/Sources/AppMain/Dashboard/DesktopFoundationView.swift:100.00%(54/54)apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift:95.45%(126/132)apps/macos-menubar/Tests/MenuBarTests/DesktopFoundationViewTests.swift:100.00%(72/72)apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift:100.00%(16/16)apps/macos-menubar/Tests/MenuBarTests/TestSupport.swift:100.00%(12/12)- aggregate menu bar changed-line coverage:
97.90%(280/286)
- Python changed-line coverage:
-
Closed the
M8.7model-settings completion milestone:- extended
apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swiftso the native operator shell now tracks typed drafts for type override, TTL seconds, adaptive thinking mode and budget, parser fallback, and merged effective OCR/parser defaults in the same model-settings flow - updated
services/control-plane-swift/Sources/XPCService/ControlPlaneService.swiftso empty-string TTL and adaptive-thinking budget drafts clear to zero without destructive side effects, while typed adaptive-thinking parsing remains explicit - expanded
apps/macos-menubar/Sources/AppMain/Dashboard/DesktopFoundationView.swiftandapps/macos-menubar/Sources/AppMain/Dashboard/DesktopWorkspaceShellView.swiftso operators can edit the full per-model settings surface and inspect effective model info through a shared summary surface - added focused regression coverage in
apps/macos-menubar/Tests/MenuBarTests/DesktopFoundationViewTests.swift,apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift, andservices/control-plane-swift/Tests/ControlPlaneTests/ControlPlaneServiceTests.swift - stabilized
tests/integration/test_recovery_flows.pyso the warm-followup recovery assertion tolerates outer HTTP jitter while the control-planesession.followup_ttft_delta_msmetric remains the authoritative proof of warm-route improvement - updated
docs/plans/2026-03-30-m8-7-model-settings-completion.md,docs/plans/2026-03-30-full-capability-roadmap-execution-index.md, andtask_plan.mdso the repository now recordsM8.7as completed instead of leaving the slice pending
- extended
-
Verification summary for
M8.7:HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/services/control-plane-swift/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --filter 'executeMapsAdaptiveThinkingAndParserFallbackModelPolicyValues|executeClearsTTLandAdaptiveThinkingBudgetsWhenDraftsAreEmpty':2 tests passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/apps/macos-menubar/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --filter 'modelsTabFormButtonsDispatchActions|modelInfoSummaryViewRendersTypedSettingsAndMergedDefaults|modelSettingsValidationGuardsInvalidDraftsResetsValuesAndNoOpsWithoutPrimaryModel|modelSettingsDraftsNormalizeUnknownResidencyAccelerationAndAdaptiveDefaults':4 tests passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/services/control-plane-swift/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --enable-code-coverage --filter 'executeMapsAdaptiveThinkingAndParserFallbackModelPolicyValues|executeClearsTTLandAdaptiveThinkingBudgetsWhenDraftsAreEmpty':2 tests passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/apps/macos-menubar/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --enable-code-coverage --filter 'RuntimeViewModelTests|DesktopFoundationViewTests':154 tests in 2 suites passedmake proto: passmake py-test:423 passed in 34.06smake swift-test: passmake integration-test:58 passed in 690.93s (0:11:30)
-
Metrics report for
M8.7:- control-plane changed-line coverage:
services/control-plane-swift/Sources/XPCService/ControlPlaneService.swift:100.00%(11/11)services/control-plane-swift/Tests/ControlPlaneTests/ControlPlaneServiceTests.swift:100.00%(38/38)- aggregate control-plane changed-line coverage:
100.00%(49/49)
- menu bar changed-line coverage:
apps/macos-menubar/Sources/AppMain/Dashboard/DesktopFoundationView.swift:92.61%(213/230)apps/macos-menubar/Sources/AppMain/Dashboard/DesktopWorkspaceShellView.swift:100.00%(1/1)apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift:99.51%(202/203)apps/macos-menubar/Tests/MenuBarTests/DesktopFoundationViewTests.swift:100.00%(123/123)apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift:100.00%(105/105)apps/macos-menubar/Tests/MenuBarTests/TestSupport.swift:100.00%(19/19)- aggregate menu bar changed-line coverage:
97.36%(663/681)
- integration changed-line coverage:
tests/integration/test_recovery_flows.py:100.00%(1/1)
- control-plane changed-line coverage:
-
Closed the
M8.6admin-state persistence and offline-assets milestone:- extended
apps/macos-menubar/Sources/AppMain/Persistence/OperatorSessionStore.swiftso operator-session payloads now persistselected_tool_sectionand restore safely from legacy payloads that predate that field - updated
apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swiftso the menu bar operator shell restores the selected tool section together with the selected surface and server session - added focused regression coverage in
apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swiftand a repository-owned smoke suite inapps/macos-menubar/Tests/MenuBarTests/OperatorSessionPersistenceSmokeTests.swift - added
scripts/m8_admin_state_smoke.pyplus Python wrapper coverage inservices/mlx-worker-python/tests/test_m8_admin_state_smoke.pyso the touched scope has a stable repository-owned smoke command rather than an ad hoc local script - documented the persistence and offline-assets contract in
docs/runbooks/admin-surface-persistence.md, updateddocs/README.md, and markedM8.6completed in the execution index
- extended
-
Verification summary for
M8.6:HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/apps/macos-menubar/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --filter 'persistsSelectedToolSectionAndRestoresAcrossRestart|restoresDefaultToolSectionForLegacyOperatorSessionState':2 tests passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/apps/macos-menubar/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --filter OperatorSessionPersistenceSmokeTests: passpython3 scripts/m8_admin_state_smoke.py --json: passmake proto: passmake py-test:423 passed in 34.01smake swift-test: passmake integration-test:58 passed in 691.30s (0:11:31)
-
Metrics report for
M8.6:- smoke metrics from
python3 scripts/m8_admin_state_smoke.py --json:operator.session_restore_ms = 0.4190206527709961operator.session_persist_write_ms = 2.0880699157714844operator.session_tool_section_persisted = 1operator.session_tool_section_restored = 1operator.session_root_permissions_ok = 1operator.session_state_directory_permissions_ok = 1operator.session_file_permissions_ok = 1operator.offline_asset_external_reference_count = 0
- Swift executable scope changed-line coverage:
apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift:100.00%(2/2)apps/macos-menubar/Sources/AppMain/Persistence/OperatorSessionStore.swift:100.00%(11/11)apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift:100.00%(69/69)apps/macos-menubar/Tests/MenuBarTests/OperatorSessionPersistenceSmokeTests.swift:97.14%(68/70)- aggregate Swift changed-line coverage:
98.68%(150/152)
- Python executable scope changed-line coverage:
scripts/m8_admin_state_smoke.py:97.14%(34/35)services/mlx-worker-python/tests/test_m8_admin_state_smoke.py:98.28%(57/58)- aggregate Python changed-line coverage:
97.85%(91/93)
- smoke metrics from
-
Closed the
M8.5admin-surface expansion milestone:- verified that the native operator shell already exposes the planned runtime, models, downloads, training, diagnostics, logs, settings, chat, image, server, and API surfaces from control-plane-backed menu bar state
- confirmed the existing menu bar package coverage already exercises the expanded admin shell, including LoRA tooling, benchmark and evaluation diagnostics, matrix benchmark views, direct Hugging Face benchmark targeting, and agent integration export presentation
- updated
docs/plans/2026-03-30-m8-5-admin-surface-expansion.mdanddocs/plans/2026-03-30-full-capability-roadmap-execution-index.mdso the repository now recordsM8.5as completed instead of leaving the slice implicitly pending
-
Verification summary for
M8.5:make swift-test: passmake integration-test:58 passed in 700.76s (0:11:40)
-
Metrics report for
M8.5:- changed-line coverage for the touched executable scope:
N/A - reason: this close-out transaction updates milestone bookkeeping only; the executable admin-surface coverage remains recorded in the repository test suite and was revalidated through the default Swift and integration commands above
- changed-line coverage for the touched executable scope:
-
Closed the
M9.8ecosystem-and-security release-gates transaction:- extended
services/mlx-worker-python/worker/productization/release_gates.pyso the Phase 8 release gate now collects repository-owned M9 evidence for MCP auto-injection, agent export, shared access, persistent sessions, rich-output sanitization, connection lifecycle, and closure audit - versioned the checked-in
m9gate thresholds ininfra/release/phase8-release-gate-policy.json, including machine-readablerelease_gate.m9_required_probe_count,release_gate.m9_missing_probe_count, andrelease_gate.m9_failed_threshold_count - extended
services/mlx-worker-python/worker/productization/acceptance_metrics.pyso the Phase 8 metrics report now exposes therelease_gate.m9_*counters without creating a second unrelated gate system - added the deterministic fixture command
scripts/m9_release_gate_smoke.pyplus focused coverage inservices/mlx-worker-python/tests/test_m9_release_gate_smoke.py,services/mlx-worker-python/tests/test_release_gates.py, andservices/mlx-worker-python/tests/test_acceptance_metrics.py - updated
docs/runbooks/phase-8-release-gates.mdanddocs/runbooks/phase-8-product-acceptance.mdso the M9 signals, smoke fixtures, and operator-facing interpretation are synchronized with the checked-in gate behavior
- extended
-
Verification summary for
M9.8:PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_release_gates.py services/mlx-worker-python/tests/test_phase8_release_gate.py services/mlx-worker-python/tests/test_phase8_runtime_probes.py services/mlx-worker-python/tests/test_acceptance_metrics.py services/mlx-worker-python/tests/test_m9_release_gate_smoke.py -q:74 passed in 1.76sPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_release_gates.py services/mlx-worker-python/tests/test_phase8_runtime_probes.py services/mlx-worker-python/tests/test_acceptance_metrics.py services/mlx-worker-python/tests/test_m9_release_gate_smoke.py -q:76 passed in 1.73sPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python python scripts/m9_release_gate_smoke.py --repo-root "$(pwd)" --json: passPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python python scripts/m9_release_gate_smoke.py --repo-root "$(pwd)" --fixture-mode failing --json: expected non-zero fail-closed path validatedPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python python scripts/phase8_release_gate.py --repo-root "$(pwd)" --json: passPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python python scripts/m9_closure_audit.py --repo-root "$(pwd)" --json: passPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python python scripts/phase8_metrics_report.py --repo-root "$(pwd)" --json: pass
-
Metrics report for
M9.8:- deterministic smoke fixture metrics:
release_gate.m9_required_probe_count = 23.0release_gate.m9_missing_probe_count = 0.0release_gate.m9_failed_threshold_count = 0.0
- deterministic failing fixture metrics:
release_gate.m9_required_probe_count = 23.0release_gate.m9_missing_probe_count = 1.0release_gate.m9_failed_threshold_count = 2.0
- live Phase 8 gate metrics:
release_gate.m9_required_probe_count = 23.0release_gate.m9_missing_probe_count = 0.0release_gate.m9_failed_threshold_count = 0.0
- post-close closure-audit metrics:
closure_audit.blocker_count = 0.0closure_audit.accepted_risk_count = 1.0closure_audit.evidence_gap_count = 0.0closure_audit.deferred_work_count = 0.0
- Python executable scope changed-line coverage:
services/mlx-worker-python/worker/productization/release_gates.pyservices/mlx-worker-python/worker/productization/acceptance_metrics.pyservices/mlx-worker-python/tests/test_release_gates.pyservices/mlx-worker-python/tests/test_phase8_runtime_probes.pyservices/mlx-worker-python/tests/test_acceptance_metrics.pyservices/mlx-worker-python/tests/test_m9_release_gate_smoke.pyscripts/m9_release_gate_smoke.py- changed-line coverage
100.00%(175/175)
- deterministic smoke fixture metrics:
-
Closed the
M9.7security-and-stability closure-audit transaction:- added a typed repository-owned closure-audit model in
services/mlx-worker-python/worker/productization/closure_audit.pythat classifies blockers, accepted risks, evidence gaps, and deferred work from execution-index status, release-gate assets, required M9 runbooks, and required probe vocabulary - added repository-owned audit entrypoints and docs in
scripts/m9_closure_audit.py,docs/runbooks/security-and-stability-closure.md, anddocs/decisions/2026-04-02-m9-security-stability-closure-audit.md - extended
services/mlx-worker-python/worker/productization/acceptance_metrics.pyso phase metrics can surfaceclosure_audit.*counters, and wired the live metrics script path inscripts/phase8_metrics_report.py - added focused Python evidence in
services/mlx-worker-python/tests/test_closure_audit.pyand extendedservices/mlx-worker-python/tests/test_acceptance_metrics.py
- added a typed repository-owned closure-audit model in
-
Verification summary for
M9.7:PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_closure_audit.py services/mlx-worker-python/tests/test_acceptance_metrics.py -q:16 passed in 0.10sPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python python scripts/m9_closure_audit.py --repo-root "$(pwd)" --json: passPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python python scripts/phase8_metrics_report.py --repo-root "$(pwd)" --json > /tmp/m9_7_phase8_metrics_output.json: passgit diff --check: pass
-
Metrics report for
M9.7:- repository-owned closure-audit metrics from
scripts/m9_closure_audit.py --repo-root "$(pwd)" --jsonrecorded:closure_audit.blocker_count = 0closure_audit.accepted_risk_count = 1closure_audit.evidence_gap_count = 0closure_audit.deferred_work_count = 1
scripts/phase8_metrics_report.py --jsonnow surfaces:closure_audit.blocker_count = 0closure_audit.accepted_risk_count = 1closure_audit.evidence_gap_count = 0closure_audit.deferred_work_count = 1top_unresolved_findings = ["M9.8 release-gate wiring remains deferred until ecosystem evidence is consumed by the release gate."]
- Python executable scope changed-line coverage:
services/mlx-worker-python/worker/productization/closure_audit.pyservices/mlx-worker-python/worker/productization/acceptance_metrics.pyservices/mlx-worker-python/tests/test_closure_audit.pyservices/mlx-worker-python/tests/test_acceptance_metrics.pyscripts/m9_closure_audit.pyscripts/phase8_metrics_report.py- changed-line coverage
98.35%(238/242)
- repository-owned closure-audit metrics from
-
Closed the
M9.6connection-lifecycle hardening transaction:- added a repository-owned
ConnectionLifecyclePolicyinservices/control-plane-swift/Sources/HTTPGateway/SSE/ConnectionLifecyclePolicy.swiftand wired it throughSSEStreamWriter,RequestCoordinator,ControlPlaneChatExecution,ControlPlaneService, and the HTTP chat handler so keepalive cadence, disconnect grace, retry policy, and resume buffering now share one typed contract - hardened resumable chat execution tracking so transient HTTP disconnects open a bounded resume window, successful resume preserves request identity, terminal expiry rejects stale resume attempts with
request_not_resumable, and the race between disconnect expiry and stale resume is closed by making terminal-ineligible requests explicit in the coordinator - added repository-owned evidence in
services/control-plane-swift/Tests/HTTPGatewayTests/ConnectionLifecyclePolicyTests.swift,tests/integration/test_connection_lifecycle.py,scripts/m9_connection_smoke.py,tests/test_m9_connection_smoke.py, anddocs/runbooks/connection-lifecycle.md - registered the new runbook from
docs/runbooks/README.mdand the documentation map fromdocs/README.md
- added a repository-owned
-
Verification summary for
M9.6:HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --filter 'ConnectionLifecyclePolicyTests|SSEStreamWriterTests|RequestCoordinatorTests|OpenAIHandlerTests|ControlPlaneChatExecutionTests|ControlPlaneServiceTests':288 tests in 6 suites passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --enable-code-coverage --filter 'ConnectionLifecyclePolicyTests|SSEStreamWriterTests|RequestCoordinatorTests|OpenAIHandlerTests|ControlPlaneChatExecutionTests|ControlPlaneServiceTests':288 tests in 6 suites passedPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python pytest tests/integration/test_recovery_flows.py tests/integration/test_connection_lifecycle.py tests/test_m9_connection_smoke.py -q:11 passed in 117.39s (0:01:57)PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python python scripts/m9_connection_smoke.py --json:ok = truegit diff --check: pass- verification note: the focused Swift runs still emitted the pre-existing
warning: input verification failedlinker notes while processingSwiftTextWorkerClient.swift.o, andRequestCoordinator.swiftstill emits the existingno 'async' operations occur within 'await' expressionwarnings for the local continuation registration helpers; the authoritative commands above completed successfully
-
Metrics report for
M9.6:- repository-owned smoke metrics from
scripts/m9_connection_smoke.py --jsonrecorded:disconnect.keepalive_gap_ms = 8.082032203674316disconnect.recovery_latency_ms = 12.388944625854492disconnect.resume_success_rate = 100disconnect.terminal_failure_count = 1
- Swift executable scope changed-line coverage:
services/control-plane-swift/Sources/HTTPGateway/OpenAI/OpenAIHandler.swiftservices/control-plane-swift/Sources/HTTPGateway/SSE/SSEStreamWriter.swiftservices/control-plane-swift/Sources/HTTPGateway/SSE/ConnectionLifecyclePolicy.swiftservices/control-plane-swift/Sources/Requests/ChatRequestTranslator.swiftservices/control-plane-swift/Sources/Requests/RequestCoordinator.swiftservices/control-plane-swift/Sources/XPCService/ControlPlaneChatExecution.swiftservices/control-plane-swift/Sources/XPCService/ControlPlaneService.swiftservices/control-plane-swift/Tests/HTTPGatewayTests/ConnectionLifecyclePolicyTests.swiftservices/control-plane-swift/Tests/HTTPGatewayTests/OpenAIHandlerTests.swiftservices/control-plane-swift/Tests/HTTPGatewayTests/RequestCoordinatorTests.swiftservices/control-plane-swift/Tests/HTTPGatewayTests/SSEStreamWriterTests.swiftservices/control-plane-swift/Tests/ControlPlaneTests/ControlPlaneChatExecutionTests.swiftservices/control-plane-swift/Tests/ControlPlaneTests/ControlPlaneServiceTests.swift- changed-line coverage
95.49%(826/865)
- Python executable scope changed-line coverage:
tests/integration/test_connection_lifecycle.pyscripts/m9_connection_smoke.pytests/test_m9_connection_smoke.py- changed-line coverage
95.00%(304/320)
- aggregate changed-line coverage for the touched executable scope in
M9.6:95.36%(1130/1185)
- repository-owned smoke metrics from
-
Closed the
M9.5rich-output sanitization transaction:- added repository-owned rich-output sanitizer coverage in
services/control-plane-swift/Sources/HTTPGateway/OpenAI/OpenAIHandler.swift, including fenced-code preservation, HTML-fragment stripping, unsafe URI rejection, and recursive JSON string sanitization for both handwritten and typed gateway responses - added gateway contract tests in
services/control-plane-swift/Tests/HTTPGatewayTests/RichOutputSanitizerTests.swiftandservices/control-plane-swift/Tests/HTTPGatewayTests/OpenAIHandlerTests.swift, including metrics assertions for sanitized auth-session payloads - projected the same sanitization contract into operator-facing menu bar surfaces by sanitizing doctor and benchmark markdown, evaluation previews, desktop logs, exported chat transcripts, and local error strings without mutating stored assistant transcript state
- added
docs/runbooks/rich-output-sanitization.mdand registered it fromdocs/runbooks/README.md
- added repository-owned rich-output sanitizer coverage in
-
Verification summary for
M9.5:HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --filter 'RichOutputSanitizerTests|OpenAIHandlerTests':103 tests in 2 suites passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --enable-code-coverage --filter 'RichOutputSanitizerTests|OpenAIHandlerTests':103 tests in 2 suites passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --scratch-path "$(pwd)/.build/menubar-scratch" --filter 'RuntimeViewModelTests|DesktopFoundationViewTests':146 tests in 2 suites passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --enable-code-coverage --scratch-path "$(pwd)/.build/menubar-coverage" --filter 'RuntimeViewModelTests|DesktopFoundationViewTests':146 tests in 2 suites passed- verification note: both Swift coverage builds emitted a pre-existing
warning: input verification failedlinker note while processing object files, but the authoritative test results above completed successfully and produced usableprofdata
-
Metrics report for
M9.5:- deterministic gateway sanitization fixture from
gateway auth session responses sanitize rich output in encoded and manual json payloadsrecorded:sanitized_output.enforcement_count = 2sanitized_output.blocked_html_fragment_count = 4sanitized_output.unsafe_uri_rejection_count = 4
services/control-plane-swift/Sources/HTTPGateway/OpenAI/OpenAIHandler.swift,services/control-plane-swift/Tests/HTTPGatewayTests/OpenAIHandlerTests.swift, andservices/control-plane-swift/Tests/HTTPGatewayTests/RichOutputSanitizerTests.swift: changed-line coverage95.71%(290/303)apps/macos-menubar/Sources/AppMain/Chat/DesktopChatView.swift,apps/macos-menubar/Sources/AppMain/Dashboard/DesktopFoundationState.swift,apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift, andapps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift: changed-line coverage100.00%(136/136)- aggregate changed-line coverage for the touched executable scope in
M9.5:97.04%(426/439)
- deterministic gateway sanitization fixture from
-
Closed the
M9.4persistent-session foundation transaction:- added
services/control-plane-swift/Sources/HTTPGateway/OpenAI/PersistentAuthSessionStore.swiftto persist hashed remember-me gateway sessions underMELIX_HOME/state/persistent-auth-sessions.jsonor~/.melix/state/persistent-auth-sessions.json - restored remembered sessions during bootstrap, reconciled them against live gateway policy updates, initialized
persistent_session.*metrics, and extended the control-plane HTTP parser to acceptDELETEfor sign-out - added gateway session create, inspect, and sign-out routes in
OpenAIHandler.swift, including structuredmissing,revoked, andexpiredsession-state payloads - projected remembered-session counts, retention TTL, expiry pruning, and sign-out latency into the menu bar server-session shell and gateway-access summary
- added
docs/runbooks/persistent-sessions.md,scripts/m9_persistent_session_smoke.py,tests/test_m9_persistent_session_smoke.py, andtests/integration/test_persistent_sessions.py
- added
-
Verification summary for
M9.4:HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --filter 'PersistentAuthSessionStoreTests|OpenAIHandlerTests|ControlPlaneServiceTests':224 tests in 3 suites passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --enable-code-coverage --filter 'PersistentAuthSessionStoreTests|OpenAIHandlerTests|ControlPlaneServiceTests':224 tests in 3 suites passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --scratch-path "$(pwd)/.build/menubar-scratch" --filter 'RuntimeViewModelTests|DesktopFoundationViewTests':144 tests in 2 suites passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --enable-code-coverage --scratch-path "$(pwd)/.build/menubar-coverage" --filter 'RuntimeViewModelTests|DesktopFoundationViewTests':144 tests in 2 suites passedPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python pytest tests/integration/test_persistent_sessions.py -q:2 passed in 43.28sPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python pytest tests/test_m9_persistent_session_smoke.py -q:2 passed in 0.04sPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" UV_CACHE_DIR="$(pwd)/.uv-cache" uv run --project services/mlx-worker-python python scripts/m9_persistent_session_smoke.py --json: pass- verification note: a first parallel rerun of the integration test and smoke script collided on the fixed local control-plane port and produced
POSIXErrorCode(rawValue: 48): Address already in use; the authoritative integration result above is the sequential rerun after the smoke script exited
-
Metrics report for
M9.4:- smoke metrics from
scripts/m9_persistent_session_smoke.py --json:persistent_session.active_session_count = 0persistent_session.remembered_session_count = 0persistent_session.expired_session_count = 0persistent_session.restore_success_rate = 0persistent_session.sign_out_latency_ms = 0.8280277252197266
services/control-plane-swift/Sources/Bootstrap/main.swift,services/control-plane-swift/Sources/HTTPGateway/OpenAI/GatewayAccessPolicy.swift,services/control-plane-swift/Sources/HTTPGateway/OpenAI/OpenAIHandler.swift,services/control-plane-swift/Sources/HTTPGateway/OpenAI/PersistentAuthSessionStore.swift,services/control-plane-swift/Sources/XPCService/ControlPlaneService.swift,services/control-plane-swift/Tests/ControlPlaneTests/ControlPlaneServiceTests.swift,services/control-plane-swift/Tests/HTTPGatewayTests/OpenAIHandlerTests.swift, andservices/control-plane-swift/Tests/HTTPGatewayTests/PersistentAuthSessionStoreTests.swift: aggregate changed-line coverage99.15%(1047/1056)apps/macos-menubar/Sources/AppMain/Models/DesktopShellState.swift,apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift,apps/macos-menubar/Sources/AppMain/Dashboard/DesktopWorkspaceShellView.swift,apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift, andapps/macos-menubar/Tests/MenuBarTests/DesktopFoundationViewTests.swift: aggregate changed-line coverage100.00%(183/183)tests/integration/test_persistent_sessions.py,tests/test_m9_persistent_session_smoke.py, andscripts/m9_persistent_session_smoke.py: aggregate changed-line coverage95.48%(190/199)- aggregate changed-line coverage for the touched executable scope in
M9.4:98.75%(1420/1438)
- smoke metrics from
-
Closed the live benchmark repair transaction for direct Hugging Face benchmark targets:
- fixed
services/mlx-worker-python/worker/control_plane_bridge.pyso the Python maintenance bridge now forwardsexport-resultsandsubmit-results - added bridge regressions in
services/mlx-worker-python/tests/test_control_plane_bridge_phase5.pyandservices/control-plane-swift/Tests/WorkerClientTests/PythonBridgeWorkerClientTests.swift - fixed
services/mlx-worker-python/worker/engine/maintenance_core.pyso text-backed Gemma 4 benchmark prompts preservePreparedVisionRequestpayloads instead of collapsing them into plain strings - added a worker regression in
services/mlx-worker-python/tests/test_maintenance_service.pycoveringtext-generationbenchmark metrics for imported text-backedgemma4VLM repos - verified the public
melixCLI benchmark path for both target repos and copied the final benchmark reports into/tmp
- fixed
-
Verification summary for the live benchmark repair:
git diff --check: passHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/services/control-plane-swift/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --enable-code-coverage --filter PythonBridgeWorkerClientTests:44 tests passedPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python coverage run -m pytest services/mlx-worker-python/tests/test_control_plane_bridge_phase5.py services/mlx-worker-python/tests/test_maintenance_service.py -q:65 passed in 28.06s- live proof benchmark for
unsloth/gemma-4-E4B-it-MLX-8bitviamelix bench run --repo-id ... --suite smoke --context-length 143 --generation-length 8 --batch-size 1 --repeats 1 --cache-profile cold --sample-size 1 --batch-factor 1 --json:bench.smoke.ttft_ms = 15645.22bench.smoke.tokens_per_second = 58.75- report saved to
/tmp/melix-gemma4-bench-report.md
- live proof benchmark for
Brooooooklyn/Qwen3.5-9B-unsloth-mlxvia the same CLI contract:bench.smoke.ttft_ms = 14663.95bench.smoke.tokens_per_second = 47.01- report saved to
/tmp/melix-qwen35-9b-bench-report.md
-
Metrics report for the live benchmark repair:
services/control-plane-swift/Tests/WorkerClientTests/PythonBridgeWorkerClientTests.swift: changed-line coverage100.00%(37/37)services/mlx-worker-python/worker/control_plane_bridge.py,services/mlx-worker-python/worker/engine/maintenance_core.py,services/mlx-worker-python/tests/test_control_plane_bridge_phase5.py, andservices/mlx-worker-python/tests/test_maintenance_service.py: aggregate changed-line coverage100.00%(67/67)- aggregate changed-line coverage for the touched executable Swift and Python scope:
100.00%(104/104) docs/plans/2026-04-04-live-benchmark-repair.mdis documentation-only and excluded from executable changed-line coverage
-
Closed the M8.1-M8.4 backend-foundations verification and milestone backfill:
- reran the repository-default verification commands after the accumulated M8.1-M8.4 backend work and confirmed the slice now closes without the earlier Swift blocker
- updated
docs/plans/2026-04-01-m8-1-m8-4-backend-foundations.mdso the final verification and handoff checklist reflects the real repository state - updated
docs/plans/2026-03-30-full-capability-roadmap-execution-index.mdsoM8now explicitly recordsM8.1-M8.4as completed backend foundations while keepingM8.5-M8.11pending
-
Verification summary for the M8.1-M8.4 close-out:
make proto: passmake py-test:403 passed in 34.05smake swift-test: passmake integration-test:54 passed in 622.59s (0:10:22)
-
Metrics report for the M8.1-M8.4 close-out:
- changed-line coverage for the touched executable scope:
N/A - reason: this close-out transaction updates repository planning and progress records only; the executable changed-line coverage for Tasks 1-4 remains recorded inside
docs/plans/2026-04-01-m8-1-m8-4-backend-foundations.md
- changed-line coverage for the touched executable scope:
-
Started the
bench matrixtransaction on top of the closed canonicalbench/evalexpansion. -
Closed Slice 1, the contract and planning reset for experimental performance matrix work:
- updated
docs/benchmark-evaluation-contract.mdsobench matrixis now a canonical Melix workflow rather than a future-only note - defined a separate matrix request, persistence, export, and Window UI contract distinct from product-facing
bench run - added
docs/plans/2026-04-03-bench-matrix-performance-lab.mdas the execution plan for the new transaction - reset
task_plan.mdso the repository now tracks the activebench matrixwork instead of the already-closed canonicalbench/evalexpansion - updated
docs/README.mdso the new execution plan is discoverable from the documentation map
- updated
-
Verification summary for Slice 1:
git diff --check: pass
-
Metrics report for Slice 1:
- changed-line coverage for the touched executable scope:
N/A - reason: this slice changes repository documentation and planning records only
- changed-line coverage for the touched executable scope:
-
Closed Slice 2, the protocol, CLI, and control-plane bench matrix surface:
- added
RunBenchMatrixto the control-plane and worker protobuf schemas and regenerated the Swift, Python, and descriptor artifacts - added
melix bench matrix run,melix bench matrix list,melix bench matrix export-summary-csv, andmelix bench matrix export-requests-csvto the shared CLI - taught the shared local control-plane client to build and decode typed matrix benchmark requests and replies
- taught
ControlPlaneServiceto validate matrix dimensions, normalize repeated values, enforce the matrix guardrail, and route matrix jobs through the model-operations worker - taught the Python control-plane bridge to forward
run-bench-matrixrequests to the worker-side maintenance service - added parser, runner, export-bundle, control-plane, worker-client, XPC client, and bridge coverage for the new matrix request path
- added
-
Verification summary for Slice 2:
swift test --filter 'MelixCLIParserTests|MelixCLIRunnerTests':55 tests passedswift test --package-path services/control-plane-swift --filter 'ControlPlaneServiceTests|BenchmarkExportBundleTests|WorkerClientTests|PythonBridgeWorkerClientTests':215 tests passedswift test --package-path apps/macos-menubar --filter ControlPlaneXPCClientTests:27 tests passedPYTHONPATH="/Users/ChenYu/Documents/Github/melix:/Users/ChenYu/Documents/Github/melix/services/mlx-worker-python" uv run --project services/mlx-worker-python --extra mlx pytest services/mlx-worker-python/tests/test_control_plane_bridge_phase5.py -q:4 tests passedswift test --enable-code-coverage --filter 'MelixCLIParserTests|MelixCLIRunnerTests':55 tests passedswift test --package-path services/control-plane-swift --enable-code-coverage --filter 'ControlPlaneServiceTests|BenchmarkExportBundleTests|WorkerClientTests|PythonBridgeWorkerClientTests':215 tests passedswift test --package-path apps/macos-menubar --enable-code-coverage --filter ControlPlaneXPCClientTests:27 tests passedcoverage run -m pytest services/mlx-worker-python/tests/test_control_plane_bridge_phase5.py -q:4 tests passed
-
Metrics report for Slice 2:
Sources/MelixCLICore/MelixCLI.swift: changed-line coverage98.67%(297/301)services/control-plane-swift/Sources/WorkerClient/WorkerClient.swift: changed-line coverage100.00%(4/4)services/control-plane-swift/Sources/WorkerClient/PythonBridgeWorkerClient.swift: changed-line coverage100.00%(7/7)services/control-plane-swift/Sources/XPCService/BenchmarkExportBundle.swift: changed-line coverage100.00%(205/205)services/control-plane-swift/Sources/XPCService/ControlPlaneService.swift: changed-line coverage99.27%(136/137)services/control-plane-swift/Sources/XPCService/ControlPlaneXPCClient.swift: changed-line coverage100.00%(75/75)services/mlx-worker-python/worker/control_plane_bridge.py: changed-line coverage100.00%(4/4)- aggregate changed-line coverage for the handwritten executable scope in Slice 2:
99.32%(728/733) - generated protobuf schemas and generated protocol outputs are recorded as
N/Afor changed-line coverage because they are interface or generated artifacts rather than repository-owned executable logic
-
Closed Slice 3, the worker-side matrix runner, persistence, and export surface:
- added typed matrix execution in
maintenance_core.py, including per-cell request rows, aggregated summary rows, and task-aware validation - persisted matrix runs under
<jobs_root>/bench/matrix-runs/<job_id>/with job JSON, summary JSONL/CSV, and request JSONL/CSV artifacts - exposed matrix execution through the worker gRPC service
- extended benchmark export and submission builders to carry matrix jobs, matrix summary rows, and matrix request rows
- added worker tests for successful matrix runs, VLM matrix coverage, export/submission collection, invalid load budgets, failed sample rows, and matrix task-kind resolution
- added typed matrix execution in
-
Verification summary for Slice 3:
PYTHONPATH="/Users/ChenYu/Documents/Github/melix:/Users/ChenYu/Documents/Github/melix/services/mlx-worker-python" uv run --project services/mlx-worker-python --extra mlx pytest services/mlx-worker-python/tests/test_maintenance_service.py -q:60 tests passedPYTHONPATH="/Users/ChenYu/Documents/Github/melix:/Users/ChenYu/Documents/Github/melix/services/mlx-worker-python" uv run --project services/mlx-worker-python --extra mlx coverage run -m pytest services/mlx-worker-python/tests/test_benchmark_schemas.py services/mlx-worker-python/tests/test_benchmark_store.py services/mlx-worker-python/tests/test_benchmark_export.py services/mlx-worker-python/tests/test_maintenance_service.py -q:83 tests passed
-
Metrics report for Slice 3:
services/mlx-worker-python/worker/engine/maintenance_core.py: changed-line coverage100.00%(192/192)services/mlx-worker-python/worker/grpc_server.py: changed-line coverage100.00%(3/3)services/mlx-worker-python/worker/productization/benchmark_export.py: changed-line coverage100.00%(42/42)services/mlx-worker-python/worker/productization/benchmark_schemas.py: changed-line coverage100.00%(80/80)services/mlx-worker-python/worker/productization/benchmark_store.py: changed-line coverage100.00%(14/14)services/mlx-worker-python/worker/productization/submission_builder.py: changed-line coverage100.00%(3/3)- aggregate changed-line coverage for the executable Python scope in Slice 3:
100.00%(334/334)
-
Closed Slice 4, the Window UI matrix controls and result-view surface:
- added a
Standard / Matrixpresentation-mode switch inside the Bench diagnostics workspace - added matrix-specific controls for generation lengths, cache profiles, reasoning modes, structured-output modes, concurrency, repeats, and request-vs-duration load budgets
- added matrix run dispatch, history selection, summary cards, context and throughput charts, and per-run CSV export actions to
RuntimeViewModel - kept matrix rendering separate from the product-facing benchmark cards and charts so the existing
bench runworkspace semantics remain intact - extended the menu-bar fake control-plane client and diagnostics tests so matrix history, charts, and action helpers are covered with repository-owned fixtures
- added a
-
Verification summary for Slice 4:
swift test --package-path apps/macos-menubar --filter 'RuntimeViewModelTests|DesktopFoundationViewTests':24 tests passedswift test --package-path apps/macos-menubar --enable-code-coverage --filter 'RuntimeViewModelTests|DesktopFoundationViewTests':24 tests passed
-
Metrics report for Slice 4:
apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift: changed-line coverage94.66%(479/506)apps/macos-menubar/Sources/AppMain/Dashboard/DesktopWorkspaceShellView.swift: changed-line coverage94.64%(618/653)apps/macos-menubar/Tests/MenuBarTests/TestSupport.swift: changed-line coverage99.57%(232/233)apps/macos-menubar/Tests/MenuBarTests/DesktopFoundationViewTests.swift: changed-line coverage100.00%(107/107)apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift: changed-line coverage100.00%(226/226)- aggregate changed-line coverage for the executable Window UI scope in Slice 4:
96.35%(1662/1725)
-
Closed the Swift text-worker protocol follow-up discovered during Slice 5 verification:
make swift-testsurfaced that the newRunBenchMatrixworker RPC had been added to the shared maintenance protocol without a matchingMaintenanceRPCServicestub inservices/mlx-text-worker-swift- added a deterministic unimplemented
runBenchMatrixstub to the Swift text worker so the package remains protocol-conformant while matrix execution stays owned by the Python worker family - extended
WorkerScaffoldTestsso the maintenance scaffold now verifies the matrix RPC returns a typed failed job summary instead of silently drifting from the shared protocol
-
Verification summary for the Swift text-worker follow-up:
HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/services/mlx-text-worker-swift/.build/ModuleCache.noindex" swift test --package-path services/mlx-text-worker-swift --enable-code-coverage --filter WorkerScaffoldTests/testMaintenanceRpcsReturnStructuredUnimplemented:1 test passed- the test run emitted the pre-existing
warning: input verification failednotes while linkingWorkerBootstrap.swift.o; the targeted test still passed
-
Metrics report for the Swift text-worker follow-up:
services/mlx-text-worker-swift/Sources/Core/WorkerServices.swift: changed-line coverage100.00%(14/14)services/mlx-text-worker-swift/Tests/CoreTests/WorkerScaffoldTests.swift: changed-line coverage100.00%(18/18)- aggregate changed-line coverage for the Swift text-worker follow-up scope:
100.00%(32/32)
-
Closed Slice 5, the verification, coverage, and documentation close-out:
- updated
docs/runbooks/m7-benchmark-and-evaluation-foundation.mdso the repository-owned benchmark runbook now documentsbench matrixCLI and Window UI flows alongsidebench runandeval run - reran focused changed-line coverage for the CLI, control-plane, Python worker, Window UI, and Swift text-worker follow-up scopes
- reran repository verification commands after the Swift text-worker protocol follow-up so the transaction closes on a passing
make proto,make py-test,make swift-test, andmake integration-test
- updated
-
Verification summary for Slice 5:
HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --filter 'MelixCLIParserTests|MelixCLIRunnerTests':55 tests passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/services/control-plane-swift/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --enable-code-coverage --filter 'ControlPlaneServiceTests|BenchmarkExportBundleTests|WorkerClientTests|PythonBridgeWorkerClientTests':215 tests passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/apps/macos-menubar/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --enable-code-coverage --filter 'RuntimeViewModelTests|DesktopFoundationViewTests|ControlPlaneXPCClientTests':168 tests passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/services/mlx-text-worker-swift/.build/ModuleCache.noindex" swift test --package-path services/mlx-text-worker-swift --enable-code-coverage --filter WorkerScaffoldTests/testMaintenanceRpcsReturnStructuredUnimplemented:1 test passedPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python coverage run --source=services/mlx-worker-python/worker -m pytest services/mlx-worker-python/tests/test_control_plane_bridge_phase5.py services/mlx-worker-python/tests/test_benchmark_schemas.py services/mlx-worker-python/tests/test_benchmark_store.py services/mlx-worker-python/tests/test_benchmark_export.py services/mlx-worker-python/tests/test_maintenance_service.py -q:87 passed in 31.83smake proto: passmake py-test:402 passed in 33.67smake swift-test: passmake integration-test:54 passed in 623.41s (0:10:23)- Swift package verification continued to emit the pre-existing
warning: input verification failednotes while linkingSwiftTextWorkerClient.swift.o,WorkerBootstrap.swift.o, and the menu-bar test objects; the full test run still passed
-
Metrics report for Slice 5:
- CLI executable scope: changed-line coverage
98.58%(969/983) - control-plane executable scope: changed-line coverage
97.20%(797/820) - Window UI executable scope: changed-line coverage
96.50%(1765/1829) - Python worker executable scope: changed-line coverage
100.00%(338/338) - Swift text-worker follow-up scope: changed-line coverage
100.00%(32/32) - aggregate changed-line coverage for the full
bench matrixtransaction executable scope:97.48%(3901/4002)
- CLI executable scope: changed-line coverage
-
Continued the benchmark and evaluation contract expansion transaction with the first two executable slices from
docs/plans/2026-04-03-bench-eval-contract-expansion-implementation.md. -
Closed Task 1, the protocol expansion slice for canonical bench and eval inputs:
- added the canonical bench fields to
RunBenchandRunBenchRequest - added the canonical eval fields to
RunEvaluationandRunEvaluationRequest - regenerated Swift, Python, and descriptor protocol outputs
- added parser and control-plane forwarding tests for the new request surfaces
- added the canonical bench fields to
-
Verification summary for Task 1:
make proto: passswift test --filter MelixCLITests: expected shape-only failures before Task 2 wiringswift test --package-path services/control-plane-swift --filter ControlPlaneServiceTests: expected forwarding failures before Task 2 wiring
-
Metrics report for Task 1:
- changed-line coverage for the touched executable scope:
N/A - reason: the slice was limited to additive protocol surfaces, generated outputs, and shape tests that were intentionally left failing until Task 2 normalization landed
- changed-line coverage for the touched executable scope:
-
Closed Task 2, the canonical bench request normalization slice across CLI and control plane:
- extended
BenchRunOptionsandControlPlaneBenchRequestwith typed canonical fields - normalized repeated context and batch inputs through shared sorted unique helpers
- defaulted bench repeats to
1 - validated
cache_profileagainstcold|warm|partial_prefix - forwarded
reasoning_modeandstructured_output_modethrough the local control-plane client andControlPlaneService - added parser, runner, local-client, and control-plane tests that assert canonical normalization behavior
- extended
-
Verification summary for Task 2:
swift test --enable-code-coverage --filter MelixCLITests:41 tests passedswift test --package-path services/control-plane-swift --enable-code-coverage --filter ControlPlaneServiceTests:115 tests passed- both Swift test bundles emitted the existing linker warning
warning: input verification failedwhile processingSwiftTextWorkerClient.swift.o; the tests still passed and this warning is outside the touched Task 2 scope
-
Metrics report for Task 2:
Sources/MelixCLICore/MelixCLI.swift,tests/MelixCLITests/MelixCLIParserTests.swift, andtests/MelixCLITests/MelixCLIRunnerTests.swift: aggregate changed-line coverage100.00%(66/66)services/control-plane-swift/Sources/XPCService/ControlPlaneXPCClient.swift,services/control-plane-swift/Sources/XPCService/ControlPlaneService.swift, andservices/control-plane-swift/Tests/ControlPlaneTests/ControlPlaneServiceTests.swift: aggregate changed-line coverage100.00%(14/14)
-
Closed the Task 2 follow-up test gap after the main normalization commit:
- added parser coverage for default canonical bench fields and invalid
cache_profilerejection - added CLI runner coverage for normalized bench request forwarding
- updated the control-plane canonical bench forwarding test to prove unsorted context and batch inputs are normalized before worker dispatch
- added parser coverage for default canonical bench fields and invalid
-
Verification summary for the Task 2 follow-up:
swift test --enable-code-coverage --filter MelixCLITests:41 tests passedswift test --package-path services/control-plane-swift --enable-code-coverage --filter ControlPlaneServiceTests:115 tests passed- both Swift test bundles emitted the existing linker warning
warning: input verification failedwhile processingSwiftTextWorkerClient.swift.o; the tests still passed and this warning is outside the touched follow-up scope
-
Metrics report for the Task 2 follow-up:
tests/MelixCLITests/MelixCLIParserTests.swiftandtests/MelixCLITests/MelixCLIRunnerTests.swift: aggregate changed-line coverage100.00%(66/66)services/control-plane-swift/Tests/ControlPlaneTests/ControlPlaneServiceTests.swift: changed-line coverage100.00%(14/14)
-
Closed Task 3, the canonical benchmark sweep, metrics, and export slice in the Python worker:
- expanded the benchmark persistence model to distinguish run summaries, context rows, and batch rows
- persisted canonical bench summary fields including
context_lengths,generation_length,batch_sizes,repeats,cache_profile,reasoning_mode,structured_output_mode,request_p50_ms, andrequest_p95_ms - wrote benchmark summary, context-row, and batch-row artifacts to the per-run output directory and carried those rows into export and submission bundles
- added summary, context, and batch CSV builders for the canonical benchmark export shape
- made text benchmark prompt selection use the resolved suite cases so
sample_sizeand curated prompt sets continue to affect measurements - made text benchmark batch rows truthful on the current runtime path by emitting only real
batch_size == 1rows until the runtime exposes true batch execution support
-
Verification summary for Task 3:
PYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_maintenance_service.py services/mlx-worker-python/tests/test_benchmark_schemas.py services/mlx-worker-python/tests/test_benchmark_export.py -q:71 passed in 33.46s
-
Metrics report for Task 3:
services/mlx-worker-python/worker/engine/maintenance_core.py,services/mlx-worker-python/worker/productization/benchmark_schemas.py,services/mlx-worker-python/worker/productization/benchmark_export.py, andservices/mlx-worker-python/worker/productization/submission_builder.py: aggregate changed-line coverage100.00%(312/312)services/mlx-worker-python/worker/engine/maintenance_core.pyfollow-up delta inf109442: changed-line coverage100.00%(13/13)services/mlx-worker-python/tests/test_maintenance_service.pyfollow-up delta inf109442: changed-line coverage100.00%(4/4)
-
Closed Task 4, the canonical evaluation controls, persistence, and export slice:
- extended evaluation job and result persistence with
few_shot,seed,code_exec_policy,incorrect_count, andduration_seconds - wired
few_shot,seed,scoring_mode, andcode_exec_policythroughevaluation_core.pyand the worker gRPC service - persisted canonical evaluation summary JSON and summary CSV alongside sample CSV and JSONL exports
- extended benchmark export collection with
evaluation_summary_rows - aligned Swift-side evaluation export decoding and CLI export output with the canonical summary-row shape while preserving the old metric-based fallback for historical bundles
- extended evaluation job and result persistence with
-
Verification summary for Task 4:
swift test --enable-code-coverage --filter MelixCLITests:41 tests passedswift test --package-path services/control-plane-swift --enable-code-coverage --filter ControlPlaneServiceTests:115 tests passedswift test --package-path services/control-plane-swift --enable-code-coverage --filter BenchmarkExportBundleTests:8 tests passedPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_evaluation_core.py services/mlx-worker-python/tests/test_evaluation_store.py services/mlx-worker-python/tests/test_benchmark_export.py -q:22 tests passedgit diff --check: pass
-
Metrics report for Task 4:
services/mlx-worker-python/worker/engine/evaluation_core.py,services/mlx-worker-python/worker/grpc_server.py,services/mlx-worker-python/worker/productization/evaluation_schemas.py,services/mlx-worker-python/worker/productization/evaluation_store.py, andservices/mlx-worker-python/worker/productization/benchmark_export.py: aggregate changed-line coverage100.00%(112/112)services/control-plane-swift/Sources/XPCService/BenchmarkExportBundle.swift,services/control-plane-swift/Tests/ControlPlaneTests/BenchmarkExportBundleTests.swift, andtests/MelixCLITests/MelixCLIRunnerTests.swift: aggregate changed-line coverage100.00%(105/105)
-
Closed Task 5, the Window UI productization slice for canonical benchmark and evaluation controls:
- added canonical benchmark controls for context lengths, batch sizes, repeats, cache profile, reasoning mode, and structured output mode
- added canonical evaluation controls for scoring mode and code execution policy alongside the existing few-shot and seed inputs
- wired the new Window UI state through
RuntimeViewModelnormalization helpers and forwarded the canonical request fields to the shared control-plane client - aligned evaluation metric cards with canonical
score_name/score_valuesummary rows and updated diagnostics rendering tests for the new controls - passed reviewer gate with no blocking findings; the only residual risk is that
benchReasoningModeandbenchStructuredOutputModestill rely on Picker-backed valid values instead of explicit enum validation
-
Verification summary for Task 5:
swift test --package-path apps/macos-menubar --enable-code-coverage --filter 'RuntimeViewModelTests|DesktopFoundationViewTests|ControlPlaneXPCClientTests':161 tests passed
-
Metrics report for Task 5:
apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift,apps/macos-menubar/Sources/AppMain/Dashboard/DesktopWorkspaceShellView.swift,apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift,apps/macos-menubar/Tests/MenuBarTests/DesktopFoundationViewTests.swift, andapps/macos-menubar/Tests/MenuBarTests/ControlPlaneXPCClientTests.swift: aggregate changed-line coverage99.56%(448/450)
-
Closed Task 6, the verification and documentation close-out slice:
- updated
docs/runbooks/m7-benchmark-and-evaluation-foundation.mdso the canonicalbench/evaloperator and CLI flows are documented in one repository-owned runbook - updated
task_plan.mdso Tasks 5 and 6 are marked completed and the transaction is recorded as closed - reran changed-line coverage for the full touched executable scope from
d1ceaba - reran repository verification before the final documentation commit
- updated
-
Verification summary for Task 6:
HOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache.noindex" swift test --enable-code-coverage --filter MelixCLITests:41 tests passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/services/control-plane-swift/.build/ModuleCache.noindex" swift test --package-path services/control-plane-swift --enable-code-coverage --filter 'ControlPlaneServiceTests|BenchmarkExportBundleTests':123 tests passedHOME="$(pwd)/.swift-home" CLANG_MODULE_CACHE_PATH="$(pwd)/apps/macos-menubar/.build/ModuleCache.noindex" swift test --package-path apps/macos-menubar --enable-code-coverage --filter 'RuntimeViewModelTests|DesktopFoundationViewTests|ControlPlaneXPCClientTests':161 tests passedPYTHONPATH="$(pwd):$(pwd)/services/mlx-worker-python" uv run --project services/mlx-worker-python coverage run --source=services/mlx-worker-python/worker -m pytest services/mlx-worker-python/tests/test_maintenance_service.py services/mlx-worker-python/tests/test_benchmark_schemas.py services/mlx-worker-python/tests/test_benchmark_export.py services/mlx-worker-python/tests/test_evaluation_core.py services/mlx-worker-python/tests/test_evaluation_store.py services/mlx-worker-python/tests/test_submission_builder.py services/mlx-worker-python/tests/test_release_gates.py -q:101 passed in 30.04smake proto: passmake py-test:391 passed in 30.13smake swift-test: failed outside the touched scope after the protocol package passed;services/mlx-text-worker-swiftexited with unexpected signal11duringWorkerScaffoldTests, and the same run emitted the pre-existingwarning: input verification failednotes while processing.ofiles in that packagemake integration-test:54 passed in 619.54s (0:10:19)
-
Metrics report for Task 6:
Sources/MelixCLICore/MelixCLI.swift,tests/MelixCLITests/MelixCLIParserTests.swift, andtests/MelixCLITests/MelixCLIRunnerTests.swift: aggregate changed-line coverage97.21%(209/215)services/control-plane-swift/Sources/XPCService/ControlPlaneService.swift,services/control-plane-swift/Sources/XPCService/BenchmarkExportBundle.swift,services/control-plane-swift/Sources/XPCService/ControlPlaneXPCClient.swift,services/control-plane-swift/Tests/ControlPlaneTests/ControlPlaneServiceTests.swift, andservices/control-plane-swift/Tests/ControlPlaneTests/BenchmarkExportBundleTests.swift: aggregate changed-line coverage99.77%(431/432)apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift,apps/macos-menubar/Sources/AppMain/Dashboard/DesktopWorkspaceShellView.swift,apps/macos-menubar/Tests/MenuBarTests/RuntimeViewModelTests.swift,apps/macos-menubar/Tests/MenuBarTests/DesktopFoundationViewTests.swift, andapps/macos-menubar/Tests/MenuBarTests/ControlPlaneXPCClientTests.swift: aggregate changed-line coverage99.56%(448/450)services/mlx-worker-python/worker/engine/maintenance_core.py,services/mlx-worker-python/worker/productization/benchmark_schemas.py,services/mlx-worker-python/worker/productization/benchmark_export.py,services/mlx-worker-python/worker/productization/submission_builder.py,services/mlx-worker-python/worker/engine/evaluation_core.py,services/mlx-worker-python/worker/grpc_server.py,services/mlx-worker-python/worker/productization/evaluation_schemas.py, andservices/mlx-worker-python/worker/productization/evaluation_store.py: aggregate changed-line coverage99.48%(385/387)- aggregate changed-line coverage for the full touched executable scope in the canonical bench/eval expansion transaction:
99.26%(1473/1484)
-
Converted the canonical benchmark and evaluation contract into an executable implementation plan.
-
Added
docs/plans/2026-04-03-bench-eval-contract-expansion-implementation.mdwith staged tasks for:- protocol expansion
- canonical bench request normalization
- canonical benchmark sweeps, metrics, and CSV export
- canonical eval controls and sample exports
- Window UI productization
- coverage, verification, and documentation closure
-
Reset
task_plan.mdso the repository reflects that the next transaction is implementation execution rather than additional contract definition. -
Verification summary for the implementation plan capture:
git diff --check: pass
-
Metrics report:
- changed-line coverage for the touched executable scope:
N/A - reason: this transaction changes repository documentation only and does not modify executable source files
- changed-line coverage for the touched executable scope:
-
Captured the next-step benchmark and evaluation I/O contract as a canonical repository specification.
-
Added
docs/benchmark-evaluation-contract.mdto define:- the explicit split between
benchandeval - required target selectors, task kinds, normalized inputs, and exportable outputs
- performance summary metrics, context-sweep rows, and batch-sweep rows
- evaluation suite summaries, category breakdowns, and sample-level CSV and JSONL fields
- Window UI and CLI parity requirements
- the explicit split between
-
Updated
docs/README.mdso the benchmark and evaluation contract is listed with the canonical top-level specifications. -
Reset
task_plan.mdfor this docs-only transaction so the repository reflects that the next follow-up after implementation is contract capture rather than another code slice. -
Verification summary for the benchmark and evaluation contract capture:
git diff --check: pass
-
Metrics report:
- changed-line coverage for the touched executable scope:
N/A - reason: this transaction changes repository documentation only and does not modify executable source files
- changed-line coverage for the touched executable scope:
-
Started the benchmark and evaluation redesign follow-up as a new transaction on top of the completed M7 / LoRA / benchmark productization baseline.
-
Added
docs/plans/2026-04-03-benchmark-evaluation-redesign.mdto define the split between:benchfor performance benchmarkingevalfor intelligence evaluation
-
Regenerated the control-plane and worker protocol surfaces so evaluation requests and export bundles now carry typed task and source metadata.
-
Landed the Python worker evaluation productization slice:
- added typed evaluation schemas and per-run persistence helpers
- persisted evaluation jobs, summary results, and sample rows
- extended benchmark export collection so benchmark and evaluation history can be exported from one bundle
- wired evaluation execution and export data into the worker gRPC surface
-
Landed the shared Swift export and control-plane slice:
- added evaluation job, result, sample, and export-bundle decoding to
BenchmarkExportBundle - added typed shared-client request and result models for evaluation runs
- extended
ControlPlaneServicesoops.run_evaluationresolves model or direct Hugging Face repo targets and returns typed job summaries
- added evaluation job, result, sample, and export-bundle decoding to
-
Landed the
melix evalCLI slice:- added parser and runner support for
eval run,eval list,eval export-summary-csv,eval export-samples-csv, andeval export-samples-jsonl - kept
--model-idand--repo-idas mutually exclusive evaluation targets - reused the shared local control-plane client instead of creating a second evaluation-only path
- added parser and runner support for
-
Landed the Window UI evaluation slice:
- added evaluation target selection, suite selection, sample-size, batch-factor, few-shot, and seed controls
- added evaluation history, summary metric cards, and sample previews
- added evaluation export actions for summary CSV, samples CSV, and samples JSONL
-
Verification summary for the benchmark and evaluation redesign:
PYTHONPATH="/Users/ChenYu/Documents/Github/melix:/Users/ChenYu/Documents/Github/melix/services/mlx-worker-python" uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_evaluation_schemas.py services/mlx-worker-python/tests/test_evaluation_store.py services/mlx-worker-python/tests/test_evaluation_core.py services/mlx-worker-python/tests/test_benchmark_export.py services/mlx-worker-python/tests/test_submission_builder.py services/mlx-worker-python/tests/test_benchmark_schemas.py -q:26 passed in 0.15sswift test --enable-code-coverage --filter MelixCLITests:37 tests passedswift test --package-path services/control-plane-swift --enable-code-coverage --filter 'BenchmarkExportBundleTests|ControlPlaneServiceTests':117 tests passedswift test --package-path apps/macos-menubar --enable-code-coverage --filter 'ControlPlaneXPCClientTests|DesktopFoundationViewTests|RuntimeViewModelTests':157 tests passedmake proto: passmake py-test:383 passed in 7.95smake swift-test: failed outside the touched scope becauseservices/mlx-text-worker-swiftexited with unexpected signal11; the evaluation transaction does not touch that workspace
-
Metrics report:
services/mlx-worker-python/worker/engine/evaluation_core.py,services/mlx-worker-python/worker/grpc_server.py,services/mlx-worker-python/worker/productization/benchmark_export.py,services/mlx-worker-python/worker/productization/benchmark_schemas.py,services/mlx-worker-python/worker/productization/evaluation_schemas.py,services/mlx-worker-python/worker/productization/evaluation_store.py,services/mlx-worker-python/worker/productization/submission_builder.py, andservices/mlx-worker-python/worker/productization/__init__.py: aggregate changed-line coverage100.00%(123/123)Sources/MelixCLICore/MelixCLI.swift: changed-line coverage99.56%(226/227)services/control-plane-swift/Sources/XPCService/BenchmarkExportBundle.swiftandservices/control-plane-swift/Sources/XPCService/ControlPlaneService.swift: aggregate changed-line coverage99.14%(231/233)services/control-plane-swift/Sources/XPCService/ControlPlaneXPCClient.swift: changed-line coverage100.00%(41/41)apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swiftandapps/macos-menubar/Sources/AppMain/Dashboard/DesktopWorkspaceShellView.swift: aggregate changed-line coverage95.76%(655/684)- aggregate changed-line coverage for the touched executable Python and Swift scope in this transaction:
97.39%(1276/1308)
-
Reset the active repository task plan from the closed M6 transaction to the M7, LoRA, Benchmark, and CLI productization transaction.
-
Added
docs/plans/2026-04-03-m7-lora-benchmark-cli-productization.mdas the umbrella execution plan for:- shared operator client and
melixCLI exposure - LoRA productization across Window UI and CLI
- real M7 benchmark runner closure
- benchmark UI, visualization, and CSV export
- shared operator client and
-
Updated the execution index so M7 now points at the active umbrella plan and is explicitly tracked as in progress rather than implied complete.
-
Verification summary for the documentation reset:
python3 scripts/python_changed_line_coverage.py:N/A
-
Metrics report:
- changed-line coverage for the touched executable scope:
N/A - reason: this commit records documentation and execution-tracking updates only and does not change executable source files
- changed-line coverage for the touched executable scope:
-
Landed the shared operator client and CLI foundation slice:
- extended
RunBenchwith explicitmodel_idselection and regenerated Swift/Python/descriptors - moved
ControlPlaneXPCClientintoservices/control-plane-swiftso Window UI and CLI can share one local operator client - taught
ControlPlaneServicebenchmark execution to resolve explicit model IDs, lazy-load a text benchmark target, and preserve failed benchmark job summaries in error responses - added the root
melixSwift package products and the first public commands forlora list,lora train,lora activate, andbench run
- extended
-
Verification summary for the shared operator client and CLI foundation:
make proto: passswift test --enable-code-coverage --filter MelixCLITests:18 tests passedswift test --package-path services/control-plane-swift --enable-code-coverage --filter ControlPlaneServiceTests:103 tests passedswift test --package-path apps/macos-menubar --enable-code-coverage --filter ControlPlaneXPCClientTests:21 tests passed
-
Metrics report:
Sources/MelixCLICore/MelixCLI.swift: changed-line coverage99.63%(270/271)services/control-plane-swift/Sources/XPCService/ControlPlaneService.swift: changed-line coverage100.00%(47/47)services/control-plane-swift/Sources/XPCService/ControlPlaneXPCClient.swift: changed-line coverage100.00%(374/374)- generated protobuf outputs,
Package.swift, andPackage.resolvedare excluded from changed-line coverage because they are generated or manifest files rather than executable runtime sources
-
Landed the LoRA backend and artifact productization slice:
- added dataset-source resolution for
local_packageandhf_dataset, including Hugging Face materialization into<jobs_root>/datasets/<cache-key> - moved
train_loraandactivate_adapteroutputs to stable per-job paths under<jobs_root>/<operation>/<job_id>/ - persisted dataset provenance, cache metadata, adapter identity, and derived-model linkage into LoRA manifests and registry snapshots
- preserved source adapter job linkage and optional derived-model aliasing in activation manifests
- added dataset-source resolution for
-
Verification summary for the LoRA backend and artifact productization:
PYTHONPATH=/Users/ChenYu/Documents/Github/melix:/Users/ChenYu/Documents/Github/melix/services/mlx-worker-python uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_lora_model_ops.py services/mlx-worker-python/tests/test_maintenance_service.py -q:51 passed
-
Metrics report:
services/mlx-worker-python/worker/model_ops/training_dataset.py: changed-line coverage96.07%(171/178)services/mlx-worker-python/worker/model_ops/lora_training_pipeline.py: changed-line coverage100.00%(7/7)services/mlx-worker-python/worker/model_ops/adapter_activation_pipeline.py: changed-line coverage100.00%(3/3)services/mlx-worker-python/worker/model_ops/job_registry.py: changed-line coverage100.00%(3/3)services/mlx-worker-python/worker/engine/maintenance_core.py: changed-line coverage100.00%(8/8)
-
Landed the LoRA Window UI and CLI exposure slice:
- extended
melix lora trainso it accepts either--dataset-urior--hf-dataset-path, forwards feature mappings and LoRA hyperparameters, and exposesderived-model-alias,response-only,mask-prompt, andgradient-checkpointing - added Window UI training controls for base-model selection, dataset-source switching, Hugging Face dataset metadata, LoRA hyperparameters, adapter naming, and derived-model aliasing
- added Window UI adapter selection plus activation and publish actions backed by shared control-plane requests instead of hard-coded demo payloads
- refreshed the native operator state so activated derived models re-enter the runtime shell and bench metrics survive the post-activation snapshot refresh
- extended
-
Verification summary for the LoRA Window UI and CLI exposure slice:
swift test --enable-code-coverage --filter MelixCLITests:20 tests passedswift test --package-path apps/macos-menubar --enable-code-coverage --filter 'RuntimeViewModelTests|DesktopFoundationViewTests':116 tests passed
-
Metrics report:
Sources/MelixCLICore/MelixCLI.swift: changed-line coverage100.00%(37/37)apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift: changed-line coverage100.00%(148/148)apps/macos-menubar/Sources/AppMain/Dashboard/DesktopWorkspaceShellView.swift: changed-line coverage95.03%(172/181)- aggregate changed-line coverage for the executable Swift scope in this slice:
97.54%(357/366)
-
Landed the benchmark core runner slice for M7:
- replaced deterministic text benchmark placeholder metrics with runtime-backed measurements against the selected model runtime
- added lazy benchmark model loading for worker-side runs and persisted benchmark runs under
<jobs_root>/bench/runs/<job_id>/ - kept queue state under
<jobs_root>/bench/queuewhile making export and submission flows recurse across run history for backward compatibility - updated release-gate benchmark evidence to use the runtime-backed benchmark core under deterministic test runtime wiring
-
Verification summary for the benchmark core runner slice:
PYTHONPATH=/Users/ChenYu/Documents/Github/melix:/Users/ChenYu/Documents/Github/melix/services/mlx-worker-python uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_maintenance_service.py services/mlx-worker-python/tests/test_benchmark_export.py services/mlx-worker-python/tests/test_benchmark_store.py services/mlx-worker-python/tests/test_benchmark_schemas.py services/mlx-worker-python/tests/test_release_gates.py -q:76 passed
-
Metrics report:
services/mlx-worker-python/worker/engine/maintenance_core.py: changed-line coverage100.00%(123/123)services/mlx-worker-python/worker/productization/benchmark_export.py: changed-line coverage100.00%(18/18)services/mlx-worker-python/worker/productization/release_gates.py: changed-line coverage100.00%(3/3)- aggregate changed-line coverage for the executable Python scope in this slice:
100.00%(144/144)
-
Closed M7 with curated Hugging Face benchmark suites:
- added a repository-owned benchmark suite catalog that maps
smokeandlatencyto explicit Hugging Face datasets, splits, and feature mappings - materialized benchmark suites on demand under the shared runtime dataset cache and persisted dataset provenance, cache keys, and cache-hit state into benchmark job manifests
- switched runtime benchmark prompts from synthetic hard-coded strings to prompt batches derived from curated HF-backed dataset rows while preserving queue state and export compatibility
- updated the roadmap execution index to mark M7 completed; benchmark Window UI, CSV, and CLI productization remain active post-M7 work in the same transaction
- added a repository-owned benchmark suite catalog that maps
-
Verification summary for the M7 suite-catalog closure:
PYTHONPATH=/Users/ChenYu/Documents/Github/melix:/Users/ChenYu/Documents/Github/melix/services/mlx-worker-python uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_benchmark_suites.py services/mlx-worker-python/tests/test_maintenance_service.py services/mlx-worker-python/tests/test_benchmark_export.py services/mlx-worker-python/tests/test_benchmark_store.py services/mlx-worker-python/tests/test_benchmark_schemas.py services/mlx-worker-python/tests/test_release_gates.py -q:80 passed
-
Metrics report:
services/mlx-worker-python/worker/productization/benchmark_suites.py: changed-line coverage93.55%(87/93)services/mlx-worker-python/worker/engine/maintenance_core.py: changed-line coverage100.00%(10/10)services/mlx-worker-python/worker/productization/benchmark_schemas.py: changed-line coverage100.00%(6/6)services/mlx-worker-python/worker/productization/release_gates.py: changed-line coverage100.00%(15/15)- aggregate changed-line coverage for the executable Python scope in this slice:
95.16%(118/124)
-
Landed the benchmark CLI and CSV export closure slice:
- added
ControlPlaneBenchmarkExportBundletoMelixControlPlaneCoreso benchmark history, suite metadata, and CSV rows decode from one shared persisted export format - extended the shared local control-plane client with
ops.export_results, returning typed export-bundle JSON for both native and CLI operator flows - exposed
melix bench listwith human-readable and--jsonhistory output, andmelix bench export-csvfor filtered per-job CSV emission - added targeted coverage for benchmark export decoding fallbacks, deterministic ordering, CSV quoting, and default control-plane export failures
- added
-
Verification summary for the benchmark CLI and CSV export closure:
swift test --enable-code-coverage --filter MelixCLITests:24 tests passedswift test --package-path services/control-plane-swift --enable-code-coverage --filter BenchmarkExportBundleTests:3 tests passedswift test --package-path apps/macos-menubar --enable-code-coverage --filter ControlPlaneXPCClientTests:22 tests passed
-
Metrics report:
Sources/MelixCLICore/MelixCLI.swift: changed-line coverage98.40%(123/125)services/control-plane-swift/Sources/XPCService/BenchmarkExportBundle.swift: changed-line coverage100.00%(163/163)services/control-plane-swift/Sources/XPCService/ControlPlaneXPCClient.swift: changed-line coverage96.30%(26/27)- aggregate changed-line coverage for the executable Swift scope in this slice:
99.05%(312/315)
-
Landed the benchmark Window UI visualization closure slice:
- expanded the diagnostics workspace into a benchmark operator surface with explicit model selection, curated suite multi-select, sample-size and batch-factor controls, history refresh, and CSV export
- taught
RuntimeViewModelto derive benchmark history cards, metric pickers, chart points, CSV export state, and history selection from the shared benchmark export bundle - added Window UI rendering for benchmark empty states, persisted history, metric cards, and chart visualization while keeping benchmark actions on shared control-plane truth
- added targeted tests for benchmark guard rails, empty export handling, diagnostics action helpers, and Window UI empty-state plus exported-state rendering
-
Verification summary for the benchmark Window UI visualization closure:
swift test --package-path apps/macos-menubar --filter 'RuntimeViewModelTests|DesktopFoundationViewTests':122 tests passedswift test --package-path apps/macos-menubar --enable-code-coverage --filter 'RuntimeViewModelTests|DesktopFoundationViewTests':122 tests passed
-
Metrics report:
apps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift: changed-line coverage98.91%(272/275)apps/macos-menubar/Sources/AppMain/Dashboard/DesktopWorkspaceShellView.swift: changed-line coverage92.06%(232/252)- aggregate changed-line coverage for the executable Swift scope in this slice:
95.64%(504/527)
-
Closed the M7, LoRA, Benchmark, and CLI productization transaction:
- stabilized the final Python verification path by replacing the live Hugging Face benchmark-suite fetch in
test_runtime_edges.pywith a deterministic local fake - documented the public
melixCLI LoRA and benchmark flows inREADME.md - updated the LoRA, benchmark, and product-acceptance runbooks so Window UI and CLI workflows now share one repository-owned operator guide
- stabilized the final Python verification path by replacing the live Hugging Face benchmark-suite fetch in
-
Verification summary for the final close-out slice:
make proto: passmake py-test:358 passed in 8.63smake swift-test:175 tests passedmake integration-test:54 passed in 621.74s (0:10:21)PYTHONPATH=/Users/ChenYu/Documents/Github/melix:/Users/ChenYu/Documents/Github/melix/services/mlx-worker-python uv run --project services/mlx-worker-python pytest services/mlx-worker-python/tests/test_runtime_edges.py -q:22 passed
-
Metrics report:
services/mlx-worker-python/tests/test_runtime_edges.py: changed-line coverage100.00%(24/24)README.md,docs/runbooks/m7-benchmark-and-evaluation-foundation.md,docs/runbooks/phase-8-lora-adapter-workflow.md,docs/runbooks/phase-8-product-acceptance.md,docs/plans/2026-04-03-m7-lora-benchmark-cli-productization.md, andtask_plan.mdare documentation-only and excluded from executable changed-line coverage- aggregate changed-line coverage for the executable touched scope in this slice:
100.00%(24/24)
-
Landed the post-close VLM benchmark compatibility follow-up for Hugging Face direct-repo benchmarking:
- upgraded the worker
mlx-vlmdependency to an upstream commit that includesgemma4 - added a Gemma 4 text-backed compatibility loader in
MLXVLMRuntimefor MLX exports that advertiseimage-text-to-textbut only ship language weights - taught benchmark target import to preserve VLM routing while overriding benchmark task selection to
text-generationwhen multimodal processor files are missing - verified
melix bench run --repo-id unsloth/gemma-4-E4B-it-MLX-8bit --suite smoke --sample-size 1 --batch-factor 1 --jsonagainst the local stack
- upgraded the worker
-
Verification summary for the VLM benchmark compatibility follow-up:
PYTHONPATH=/Users/ChenYu/Documents/Github/melix:/Users/ChenYu/Documents/Github/melix/services/mlx-worker-python uv run --project services/mlx-worker-python --extra mlx pytest services/mlx-worker-python/tests/test_mlx_vlm_runtime.py services/mlx-worker-python/tests/test_maintenance_service.py -q:51 passedHOME=/Users/ChenYu/Documents/Github/melix/.swift-home CLANG_MODULE_CACHE_PATH=/Users/ChenYu/Documents/Github/melix/.build/ModuleCache.noindex swift test --package-path services/control-plane-swift --scratch-path /tmp/melix-control-plane-build --filter ControlPlaneServiceTests:104 tests passed- live proof benchmark:
bench.smoke.ttft_ms = 2452.66bench.smoke.tokens_per_second = 60.19task_kind = text-generationsource_repo = unsloth/gemma-4-E4B-it-MLX-8bit
-
Metrics report:
- changed-line coverage for the touched executable scope: pending repository-wide coverage regeneration for the active uncommitted working tree
- reason: the benchmark compatibility follow-up was implemented on top of an already-large productization working tree, so a fresh changed-line coverage snapshot still needs to be regenerated before the next commit
-
Regenerated the touched-scope coverage evidence for the still-uncommitted direct-HF and VLM benchmark compatibility working tree:
- fixed
services/control-plane-swift/Tests/WorkerClientTests/PythonBridgeWorkerClientTests.swiftso the process-bridge fixture declares themlxoptional dependency expected by the currentuv run --extra mlxbridge contract - fixed
services/mlx-worker-python/worker/engine/maintenance_core.pybenchmark report rendering so persistedtask_kindfollows the resolved runtime task instead of re-deriving from request defaults - added targeted Python coverage for benchmark suite prompt extraction, task-aware benchmark metrics, direct-VLM registry defaults, and the updated
dev_up.pyuv run --extra mlxinvocation - added targeted Swift coverage for local CLI runtime construction, benchmark export fallbacks, direct-repo request wiring, direct Hugging Face benchmark imports across OCR, VLM, image generation, and image edit families, and Window UI benchmark target selection states
- fixed
-
Verification summary for the coverage-regeneration follow-up:
PYTHONPATH=/Users/ChenYu/Documents/Github/melix:/Users/ChenYu/Documents/Github/melix/services/mlx-worker-python UV_CACHE_DIR=/Users/ChenYu/Documents/Github/melix/.uv-cache uv run --project services/mlx-worker-python --extra mlx coverage run --source=services/mlx-worker-python/worker -m pytest services/mlx-worker-python/tests -q:378 passed in 8.17sHOME=/Users/ChenYu/Documents/Github/melix/.swift-home CLANG_MODULE_CACHE_PATH=/Users/ChenYu/Documents/Github/melix/.build/ModuleCache.noindex swift test --enable-code-coverage --filter MelixCLITests:29 tests passedHOME=/Users/ChenYu/Documents/Github/melix/.swift-home CLANG_MODULE_CACHE_PATH=/Users/ChenYu/Documents/Github/melix/.build/ModuleCache.noindex swift test --package-path services/control-plane-swift --enable-code-coverage --filter 'BenchmarkExportBundleTests|ControlPlaneServiceTests|PythonBridgeWorkerClientTests|OnDemandModelLoaderTests':165 tests passedHOME=/Users/ChenYu/Documents/Github/melix/.swift-home CLANG_MODULE_CACHE_PATH=/Users/ChenYu/Documents/Github/melix/.build/ModuleCache.noindex swift test --package-path apps/macos-menubar --enable-code-coverage --filter 'RuntimeViewModelTests|DesktopFoundationViewTests|ControlPlaneXPCClientTests':151 tests passed
-
Metrics report:
services/mlx-worker-python/worker/engine/maintenance_core.py,services/mlx-worker-python/worker/model_ops/hub_catalog.py,services/mlx-worker-python/worker/model_registry/catalog.py,services/mlx-worker-python/worker/productization/benchmark_schemas.py,services/mlx-worker-python/worker/productization/benchmark_suites.py,services/mlx-worker-python/worker/registry.py,services/mlx-worker-python/worker/runtime/vision_family_adapters.py, andservices/mlx-worker-python/worker/runtime/mlx_vlm_runtime.py: aggregate changed-line coverage97.07%(265/273)Sources/MelixCLICore/MelixCLI.swift: changed-line coverage100.00%(61/61)services/control-plane-swift/Sources/WorkerClient/OnDemandModelLoader.swift,services/control-plane-swift/Sources/WorkerClient/PythonBridgeWorkerClient.swift,services/control-plane-swift/Sources/XPCService/BenchmarkExportBundle.swift, andservices/control-plane-swift/Sources/XPCService/ControlPlaneService.swift: aggregate changed-line coverage94.25%(410/435)services/control-plane-swift/Sources/XPCService/ControlPlaneXPCClient.swift: changed-line coverage100.00%(2/2) measured from the Window UI test binary because the consumer tests live inapps/macos-menubarapps/macos-menubar/Sources/AppMain/Dashboard/DesktopWorkspaceShellView.swiftandapps/macos-menubar/Sources/AppMain/Models/RuntimeViewModel.swift: aggregate changed-line coverage97.93%(189/193)- aggregate changed-line coverage for the touched executable Swift scope:
95.80%(662/691) - aggregate changed-line coverage for the touched executable Python and Swift scope:
96.16%(927/964) Makefile, protocol schemas, generated protobuf outputs,packages/protocol/descriptors/melix.pb,services/mlx-worker-python/pyproject.toml,uv.lock, andscripts/dev_up.pyare excluded from executable changed-line coverage because they are generated, manifest, or non-measurable support-file changes in this transaction
- Reviewed
docs/superpowers/plans/2026-03-31-m7-3-m7-5-benchmark-eval-foundation.mdand corrected the plan steps for:- deterministic evaluation accuracy calculation
handleRunEvaluationreply wiring soevaluationResultsis returned together withevaluationJob- evaluation artifact persistence on a fresh
jobs_root - touched-scope coverage commands so benchmark persistence paths are included
- Verification summary for the M7.3-M7.5 plan update:
make proto: passpytesttouched-scope Python suite:50 passed- scratch-path Swift test for
ControlPlaneServiceTests/executeHandlesOpsRunEvaluationThroughTheModelOperationsWorker: pass
- Metrics report:
- changed-line coverage for the touched Python scope:
N/A - reason: the current uncommitted change set for this review transaction is documentation-only, so
scripts/python_changed_line_coverage.pyreportedTOTAL 100.00% 0/0and exited non-zero because there were no measurable changed Python lines
- changed-line coverage for the touched Python scope:
- Audited M6 implementation against child plans.
- Confirmed Python quantization benchmark, gate, and focused test suite pass with explicit
PYTHONPATH. - Identified remaining work for M6 closure:
- benchmark evidence gap for active KV and sparse prefill
- runbook gap for sparse-prefill verification
- lock-scope semantics gap for family or protected-scope conflicts
- Added
docs/plans/2026-03-31-m6-completion-closure.md. - Added
docs/runbooks/m6-acceleration-benchmarks.md. - Added Python tests for:
- linked quantized-artifact upload conflict locking
- sparse-prefill metrics exposure in
phase2_metrics_report.py - sparse-prefill probe collection in the Phase 2 direct worker report
- Updated quantization manifests to carry
protected_scopemetadata. - Updated upload conflict locking to use linked quantization identity before falling back to raw artifact paths.
- Extended
scripts/phase2_metrics_report.pywith aprefill_sparseprobe and sparse-prefill counters in the output. - Verification summary:
pytestfocused M6 Python suite:39 passedscripts/quantization_benchmarks.py --json:profile_count = 7,smoke_pass_rate = 100.0scripts/quantization_release_gate.py --json:passed = truescripts/phase5_model_ops_metrics.py:quantize job_ms=0.965,artifact_bytes=670,manifest_bytes=1923- live
make phase2-metrics --jsonwithMELIX_RUNTIME_DIR=.runtime/m6-phase2:decode_active_kv_quantized.active_kv_quantization_ratio = 25decode_active_kv_quantized.tokens_per_second = 41.22prefill_sparse.sparse_prefill_accepted_skip_count = 1prefill_sparse.accelerated_prefill_gain_pct = 83
- Committed M6 closure as
2f270b9(feat: close m6 acceleration completion gaps). - Began M7 with
docs/plans/2026-03-31-m7-1-m7-2-benchmark-schema-foundation.md. - Landed initial M7 foundation changes in the working tree:
- typed benchmark and evaluation schema messages in control-plane proto
- Python benchmark schema helpers under
worker/productization/benchmark_schemas.py - release-gate benchmark evidence now carries structured
jobandresults - control-plane
ops.run_benchnow assembles typed benchmark job and result payloads
- Verification so far for M7 foundation:
services/mlx-worker-python/tests/test_benchmark_schemas.py: passservices/mlx-worker-python/tests/test_release_gates.py: pass- scratch-path Swift test for
ControlPlaneServiceTests/executeHandlesOpsRunBenchThroughTheModelOperationsWorker: still compiling or pending final result at handoff time