Pull request overview
This PR changes ensemble scheduling so max_inflight_requests is enforced as a global per-step cap across all concurrent ensemble requests (instead of being enforced per ensemble request), helping bound memory growth when upstream steps outpace downstream consumption.
Changes:
- Move per-step in-flight limiters to be owned globally by the ensemble model (`EnsembleInfo`) and shared across all `EnsembleContext`s.
- Update the limiter API from “wait + increment/decrement” to an acquire/release slot model, and wire it into scheduling and completion paths.
- Update configuration/comment semantics to reflect the new global behavior.
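The ownership change in the list above can be illustrated with a minimal sketch. It is not the PR's code: `SlotCounter`, `ModelInfo`, and `RequestContext` are hypothetical stand-ins for the limiter, `EnsembleInfo`, and `EnsembleContext`, and treating a cap of 0 as "no limit" is an assumption made for the example. The point it shows is that limiters live on the model-level object, so every request context contends for the same per-step slots.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a per-step limiter; only the counting is modeled.
class SlotCounter {
 public:
  explicit SlotCounter(std::size_t capacity) : capacity_(capacity) {}

  // Takes a slot if one is free; returns false when the step is at its cap.
  bool TryAcquire() {
    std::lock_guard<std::mutex> lock(mu_);
    if (used_ >= capacity_) return false;
    ++used_;
    return true;
  }

  // Returns a slot; a call with no slot held is ignored.
  void Release() {
    std::lock_guard<std::mutex> lock(mu_);
    if (used_ > 0) --used_;
  }

 private:
  std::mutex mu_;
  const std::size_t capacity_;
  std::size_t used_ = 0;
};

// Hypothetical stand-in for the model-level object: one limiter per step,
// created once and outliving individual requests.
struct ModelInfo {
  ModelInfo(std::size_t step_count, std::size_t max_inflight) {
    step_limiters.resize(step_count);
    // Assumption: a cap of 0 means "no limit", so the entries stay null.
    if (max_inflight > 0) {
      for (auto& limiter : step_limiters) {
        limiter = std::make_unique<SlotCounter>(max_inflight);
      }
    }
  }
  std::vector<std::unique_ptr<SlotCounter>> step_limiters;
};

// Hypothetical stand-in for a per-request context: it borrows the shared
// limiters from the model-level object and never owns one itself.
struct RequestContext {
  explicit RequestContext(ModelInfo* info) : info_(info) {}

  bool TrySchedule(std::size_t step) {
    SlotCounter* limiter = info_->step_limiters[step].get();
    return limiter == nullptr || limiter->TryAcquire();
  }

  void Complete(std::size_t step) {
    if (SlotCounter* limiter = info_->step_limiters[step].get()) {
      limiter->Release();
    }
  }

  ModelInfo* info_;
};
```

With a cap of 2, two contexts that each schedule step 0 once exhaust the step's slots, and a third attempt from either context is refused until one of them completes. Under the previous per-request scheme each context would have had its own budget of 2.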
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/ensemble_scheduler/ensemble_scheduler.h` | Updates docs for `max_inflight_requests` semantics and adds storage for per-step global limiters. |
| `src/ensemble_scheduler/ensemble_scheduler.cc` | Implements global limiter allocation and integrates acquire/release into step scheduling and completion. |
Pull request overview
This PR changes ensemble scheduling so max_inflight_requests is enforced as a global per-step limit shared across all concurrent ensemble requests for a given ensemble model, instead of being enforced per-ensemble-request.
Changes:
- Introduces a per-step global limiter (`StepInflightRequestLimiter`) stored on `EnsembleInfo`, allocated once per step when `max_inflight_requests > 0`.
- Updates step dispatch to `Acquire()` a limiter slot before `InferAsync()` and `Release()` it on final response completion (or on scheduling failure).
- Removes the per-`EnsembleContext` limiter instances and adjusts counter decrement logic to avoid underflow when a step is not actually scheduled.
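The acquire/release flow described above could look roughly like the following sketch. It is an illustration under stated assumptions, not the PR's implementation: `InflightLimiter`, `DispatchStep`, and `OnFinalResponse` are hypothetical names, `submit` stands in for the asynchronous inference call, a blocking `Acquire()` is assumed, and the limiter is only constructed with a cap greater than 0.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>

// Hypothetical limiter with a fixed number of slots (assumed to be > 0).
class InflightLimiter {
 public:
  explicit InflightLimiter(std::size_t max_inflight)
      : max_inflight_(max_inflight) {}

  // Blocks until a slot is free, then takes it.
  void Acquire() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return inflight_ < max_inflight_; });
    ++inflight_;
  }

  // Returns a slot and wakes one waiter. A release with no slot held is
  // ignored so the counter cannot underflow.
  void Release() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (inflight_ == 0) return;
      --inflight_;
    }
    cv_.notify_one();
  }

  std::size_t Inflight() {
    std::lock_guard<std::mutex> lock(mu_);
    return inflight_;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  const std::size_t max_inflight_;
  std::size_t inflight_ = 0;
};

// Hypothetical dispatch helper. `submit` returns false when the step could
// not be scheduled; in that case the slot is handed back immediately.
bool DispatchStep(
    InflightLimiter& limiter, const std::function<bool()>& submit) {
  limiter.Acquire();
  if (!submit()) {
    limiter.Release();
    return false;
  }
  return true;  // the slot stays held until the final response arrives
}

// Hypothetical completion hook, called once per step on its final response.
void OnFinalResponse(InflightLimiter& limiter) {
  limiter.Release();
}
```

Releasing on the failure path matters because a slot that is acquired but never paired with a completion callback would permanently shrink the step's capacity for every other ensemble request sharing the limiter.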
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/ensemble_scheduler/ensemble_scheduler.h` | Adds the limiter type and stores one limiter per ensemble step on `EnsembleInfo` to make the limit global across ensemble requests. |
| `src/ensemble_scheduler/ensemble_scheduler.cc` | Moves limiter implementation, wires `Acquire`/`Release` into scheduling and completion paths, and removes per-context limiter initialization/usage. |
This PR changes ensemble scheduling so `max_inflight_requests` is enforced as a global per-step cap across all concurrent ensemble requests (instead of being enforced per ensemble request), helping bound memory growth when upstream steps outpace downstream consumption.

Changes:
- Move per-step in-flight limiters to be owned globally by the ensemble model (`EnsembleInfo`) and shared across all `EnsembleContext`s.
- `Acquire()` a limiter slot before `InferAsync()` and `Release()` it on final response completion (or on scheduling failure).

CI and Doc: triton-inference-server/server#8707
Doc: triton-inference-server/common#152