
fix: disable continuous_batching and harden smoke tests #25

Open
weklund wants to merge 2 commits into main from fix/disable-continuous-batching

Conversation


@weklund weklund commented Apr 3, 2026

Summary

Cherry-picked from the now-closed #23 (stale branch). Contains the only useful changes that weren't already on main.

  • Disable the continuous_batching vLLM flag in stack_init.py and onboarding.py, with a TODO(#17) to re-enable — workaround for waybarrios/vllm-mlx#211 (missing return in load_model_with_fallback)
  • Add an inference warmup request after the health check in the test harness's ServiceManager.start_vllm() — it absorbs model weight loading and JIT compilation latency, so test inference timeouts measure actual generation time
  • Add a time.sleep(1) and orphan-process cleanup in ServiceManager.stop_all() to reduce port-still-bound flakiness
  • Skip the current process's own PID in kill_processes_on_port() to avoid killing pytest itself
  • Add openai>=2.30.0 to the dev dependencies (needed by the harness tests)
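The flag change above can be sketched roughly as follows. This is a minimal illustration, not the actual stack_init.py diff: the `build_vllm_flags` helper name and the shape of the flags dict are assumptions for the example.

```python
def build_vllm_flags(base: dict | None = None) -> dict:
    """Build vLLM engine flags with continuous batching disabled.

    Hypothetical sketch of the stack_init.py / onboarding.py change;
    the real code may pass flags differently.
    """
    flags = dict(base or {})
    # TODO(#17): re-enable once waybarrios/vllm-mlx#211 (missing return
    # in load_model_with_fallback) is fixed upstream.
    flags["continuous_batching"] = False
    return flags
```

Keeping the workaround behind a single helper makes the eventual revert a one-line change when the upstream fix lands.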
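The warmup step could look like the sketch below, assuming an OpenAI-compatible `/v1/completions` endpoint; the `warmup_inference` name, retry loop, and payload are illustrative, not the harness's actual code.

```python
import json
import time
import urllib.request


def warmup_inference(base_url: str, model: str, timeout: float = 120.0) -> bool:
    """Send one tiny completion request after the health check passes.

    The first request pays the model-weight-loading / JIT-compilation cost,
    so later timed test requests measure only generation latency.
    """
    payload = json.dumps({"model": model, "prompt": "hi", "max_tokens": 1}).encode()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            req = urllib.request.Request(
                f"{base_url}/v1/completions",
                data=payload,
                headers={"Content-Type": "application/json"},
            )
            with urllib.request.urlopen(req, timeout=10) as resp:
                return resp.status == 200
        except OSError:
            time.sleep(1)  # server up but model still loading; retry
    return False
```

Returning a bool lets the caller fail the service start explicitly instead of letting the first real test absorb the cold-start cost.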
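The PID-skip hardening can be sketched like this. The `lsof`-based lookup and the `filter_own_pid` helper are assumptions for illustration; the point is that the kill list is filtered against `os.getpid()` before any signal is sent.

```python
import os
import signal
import subprocess


def filter_own_pid(pids: list[int]) -> list[int]:
    """Drop the current process's PID from a kill candidate list,
    so port cleanup can never terminate the pytest process itself."""
    own = os.getpid()
    return [pid for pid in pids if pid != own]


def kill_processes_on_port(port: int) -> list[int]:
    """Terminate orphan processes still bound to `port`.

    Hypothetical sketch: lists listeners via `lsof -ti tcp:<port>`.
    """
    proc = subprocess.run(
        ["lsof", "-ti", f"tcp:{port}"],
        capture_output=True, text=True, check=False,
    )
    killed: list[int] = []
    for pid in filter_own_pid([int(tok) for tok in proc.stdout.split()]):
        try:
            os.kill(pid, signal.SIGTERM)
            killed.append(pid)
        except ProcessLookupError:
            pass  # already exited between listing and signaling
    return killed
```

In `stop_all()`, a `time.sleep(1)` before calling this gives well-behaved children a moment to release their sockets, so the orphan sweep only catches processes that genuinely leaked.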

Test plan

  • make lint — all checks passed
  • make test — 1480 passed, 89% coverage
  • make test-catalog — 121 passed

🤖 Generated with Claude Code

weklund and others added 2 commits April 3, 2026 10:33
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
