…rottling

Add per-request stream_interval (token-count) and stream_interval_ms (time-based, in milliseconds) to the OpenAI API's StreamOptions. This allows clients to control streaming granularity per request without changing server config.

- Add stream_interval and stream_interval_ms fields to StreamOptions in the OpenAI-compatible API protocol
- Add stream_interval_ms global config to SchedulerConfig with --stream-interval-ms CLI argument (default 0, disabled)
- Add optional stream_interval and stream_interval_ms fields to SamplingParams for per-request overrides
- Inject stream_options values into SamplingParams in both chat completion and completion serving layers
- Implement "whichever first" emission logic in OutputProcessor: when both token-count and time-based intervals are configured, emit on whichever threshold is reached first
- Per-request values override global config; if not specified, fall back to global defaults (stream every 1 token, no time limit)

Signed-off-by: Quan Truong <quan@deepinfra.com>
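The "whichever first" rule and the per-request fallback described in this commit can be sketched roughly as follows. This is an illustrative sketch only, not the PR's OutputProcessor code: the names `StreamThrottle`, `resolve`, and `should_emit` are hypothetical, and the sketch assumes the caller passes in a monotonic timestamp in milliseconds once per generated token.

```python
from dataclasses import dataclass
from typing import Optional


def resolve(per_request: Optional[float], global_default: float) -> float:
    """Per-request value wins; fall back to the global default if unset."""
    return global_default if per_request is None else per_request


@dataclass
class StreamThrottle:
    """Hypothetical helper, not the actual vLLM implementation."""
    stream_interval: int = 1        # emit every N tokens (1 = every token)
    stream_interval_ms: float = 0   # emit every T ms (0 = time limit disabled)
    _pending_tokens: int = 0        # tokens buffered since the last emission
    _last_emit_ms: float = 0.0      # timestamp of the last emission

    def should_emit(self, now_ms: float) -> bool:
        """Call once per generated token; True means flush buffered output."""
        self._pending_tokens += 1
        token_due = self._pending_tokens >= self.stream_interval
        time_due = (
            self.stream_interval_ms > 0
            and now_ms - self._last_emit_ms >= self.stream_interval_ms
        )
        # "Whichever first": either threshold alone triggers an emission.
        if token_due or time_due:
            self._pending_tokens = 0
            self._last_emit_ms = now_ms
            return True
        return False


# Assumed example values: global default of 1 token, request asks for 4.
throttle = StreamThrottle(
    stream_interval=int(resolve(4, 1)),
    stream_interval_ms=resolve(None, 50),
)
decisions = [throttle.should_emit(t) for t in (10, 20, 30, 40, 60, 95)]
```

In this run the fourth token triggers the token-count threshold, while the sixth is emitted because 50 ms have passed since the previous emission even though only two tokens are buffered. A real implementation would presumably also flush any buffered tokens when the request finishes, which the sketch leaves out.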
…hrottling

Add three tests for the stream interval feature:

- test_stream_interval_ms: validates time-based throttling at global level
- test_per_request_stream_interval_override: validates per-request token-count override via SamplingParams
- test_both_intervals_whichever_first: validates combined token+time "whichever first" emission logic

Signed-off-by: Quan Truong <quan@deepinfra.com>
Purpose
Test Plan
Test Result