
fix: disable continuous_batching and harden smoke tests #25

Open
weklund wants to merge 2 commits into main from fix/disable-continuous-batching

Conversation


@weklund weklund commented Apr 3, 2026

Summary

Cherry-picked from the now-closed #23 (stale branch). Contains the only useful changes that weren't already on main.

  • Disable the continuous_batching vLLM flag in stack_init.py and onboarding.py, with a TODO(#17) to re-enable — workaround for waybarrios/vllm-mlx#211 (missing return in load_model_with_fallback)
  • Add an inference warmup request after the health check in the test harness's ServiceManager.start_vllm() — it absorbs model weight loading and JIT compilation latency, so test inference timeouts measure actual generation time
  • Add a time.sleep(1) and orphan-process cleanup in ServiceManager.stop_all() to reduce port-still-bound flakiness
  • Skip the current process's own PID in kill_processes_on_port() to avoid killing pytest itself
  • Add openai>=2.30.0 to the dev dependencies (needed by the harness tests)
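The flag change above can be sketched roughly as follows. This is a minimal illustration, not the actual stack_init.py diff: the `build_vllm_flags` helper name and the shape of the flags dict are assumptions for the example.

```python
def build_vllm_flags(base: dict | None = None) -> dict:
    """Build vLLM engine flags with continuous batching disabled.

    Hypothetical sketch of the stack_init.py / onboarding.py change;
    the real code may pass flags differently.
    """
    flags = dict(base or {})
    # TODO(#17): re-enable once waybarrios/vllm-mlx#211 (missing return
    # in load_model_with_fallback) is fixed upstream.
    flags["continuous_batching"] = False
    return flags
```

Keeping the workaround behind a single helper makes the eventual revert a one-line change when the upstream fix lands.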
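The warmup step could look like the sketch below, assuming an OpenAI-compatible `/v1/completions` endpoint; the `warmup_inference` name, retry loop, and payload are illustrative, not the harness's actual code.

```python
import json
import time
import urllib.request


def warmup_inference(base_url: str, model: str, timeout: float = 120.0) -> bool:
    """Send one tiny completion request after the health check passes.

    The first request pays the model-weight-loading / JIT-compilation cost,
    so later timed test requests measure only generation latency.
    """
    payload = json.dumps({"model": model, "prompt": "hi", "max_tokens": 1}).encode()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            req = urllib.request.Request(
                f"{base_url}/v1/completions",
                data=payload,
                headers={"Content-Type": "application/json"},
            )
            with urllib.request.urlopen(req, timeout=10) as resp:
                return resp.status == 200
        except OSError:
            time.sleep(1)  # server up but model still loading; retry
    return False
```

Returning a bool lets the caller fail the service start explicitly instead of letting the first real test absorb the cold-start cost.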
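The PID-skip hardening can be sketched like this. The `lsof`-based lookup and the `filter_own_pid` helper are assumptions for illustration; the point is that the kill list is filtered against `os.getpid()` before any signal is sent.

```python
import os
import signal
import subprocess


def filter_own_pid(pids: list[int]) -> list[int]:
    """Drop the current process's PID from a kill candidate list,
    so port cleanup can never terminate the pytest process itself."""
    own = os.getpid()
    return [pid for pid in pids if pid != own]


def kill_processes_on_port(port: int) -> list[int]:
    """Terminate orphan processes still bound to `port`.

    Hypothetical sketch: lists listeners via `lsof -ti tcp:<port>`.
    """
    proc = subprocess.run(
        ["lsof", "-ti", f"tcp:{port}"],
        capture_output=True, text=True, check=False,
    )
    killed: list[int] = []
    for pid in filter_own_pid([int(tok) for tok in proc.stdout.split()]):
        try:
            os.kill(pid, signal.SIGTERM)
            killed.append(pid)
        except ProcessLookupError:
            pass  # already exited between listing and signaling
    return killed
```

In `stop_all()`, a `time.sleep(1)` before calling this gives well-behaved children a moment to release their sockets, so the orphan sweep only catches processes that genuinely leaked.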

Test plan

  • make lint — all checks passed
  • make test — 1480 passed, 89% coverage
  • make test-catalog — 121 passed

🤖 Generated with Claude Code

weklund and others added 2 commits April 3, 2026 10:33
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
