feat: Enforce max_inflight_requests as a global per-step limit across ensemble requests #482

Open
pskiran1 wants to merge 4 commits into main from spolisetty/tri-732-maximum-inflight-requests-from-request-context-ensemble
@pskiran1 pskiran1 commented Mar 18, 2026

This PR changes ensemble scheduling so max_inflight_requests is enforced as a global per-step cap across all concurrent ensemble requests (instead of being enforced per ensemble request), helping bound memory growth when upstream steps outpace downstream consumption.

Changes:

  • Move per-step in-flight limiters to be owned globally by the ensemble model (EnsembleInfo) and shared across all EnsembleContexts.
  • Update the limiter API from “wait + increment/decrement” to an acquire/release slot model, and wire it into scheduling and completion paths.
  • Update step dispatch to Acquire() a limiter slot before InferAsync() and Release() it on final response completion (or on scheduling failure).
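
The acquire/release slot model described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SlotLimiter`, `DispatchStep`, and the `infer_async` callback are hypothetical names, and the callback merely stands in for the real `InferAsync()` call by returning whether scheduling succeeded. The sketch assumes the limiter is only constructed with a limit greater than zero.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>

// Hypothetical stand-in for a per-step limiter; not the PR's actual class.
// Assumes max_inflight > 0 (a limit of 0 would make Acquire() block forever).
class SlotLimiter {
 public:
  explicit SlotLimiter(std::size_t max_inflight) : max_inflight_(max_inflight) {}

  // Block until a slot is free, then take it.
  void Acquire() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return inflight_ < max_inflight_; });
    ++inflight_;
  }

  // Return a slot and wake one waiter.
  void Release() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      assert(inflight_ > 0 && "Release() without a matching Acquire()");
      --inflight_;
    }
    cv_.notify_one();
  }

  // Number of slots currently held.
  std::size_t Inflight() const {
    std::lock_guard<std::mutex> lock(mu_);
    return inflight_;
  }

 private:
  const std::size_t max_inflight_;
  mutable std::mutex mu_;
  std::condition_variable cv_;
  std::size_t inflight_ = 0;
};

// Acquire a slot before dispatching a step. If scheduling fails, the slot is
// released immediately; if it succeeds, the caller is expected to call
// Release() once the final response for the step has completed.
inline bool DispatchStep(SlotLimiter& limiter,
                         const std::function<bool()>& infer_async) {
  limiter.Acquire();
  if (!infer_async()) {
    limiter.Release();
    return false;
  }
  return true;
}
```

Because every concurrent ensemble request would call `DispatchStep` on the same limiter for a given step, the number of held slots bounds the step's in-flight requests across the whole model rather than per request.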

CI and Doc: triton-inference-server/server#8707
Doc: triton-inference-server/common#152

Copilot AI left a comment

Pull request overview

This PR changes ensemble scheduling so max_inflight_requests is enforced as a global per-step cap across all concurrent ensemble requests (instead of being enforced per ensemble request), helping bound memory growth when upstream steps outpace downstream consumption.

Changes:

  • Move per-step in-flight limiters to be owned globally by the ensemble model (EnsembleInfo) and shared across all EnsembleContexts.
  • Update the limiter API from “wait + increment/decrement” to an acquire/release slot model, and wire it into scheduling and completion paths.
  • Update configuration/comment semantics to reflect the new global behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/ensemble_scheduler/ensemble_scheduler.h | Updates docs for max_inflight_requests semantics and adds storage for per-step global limiters. |
| src/ensemble_scheduler/ensemble_scheduler.cc | Implements global limiter allocation and integrates acquire/release into step scheduling and completion. |

Copilot AI left a comment

Pull request overview

This PR changes ensemble scheduling so max_inflight_requests is enforced as a global per-step limit shared across all concurrent ensemble requests for a given ensemble model, instead of being enforced per-ensemble-request.

Changes:

  • Introduces a per-step global limiter (StepInflightRequestLimiter) stored on EnsembleInfo, allocated once per step when max_inflight_requests > 0.
  • Updates step dispatch to Acquire() a limiter slot before InferAsync() and Release() it on final response completion (or on scheduling failure).
  • Removes the per-EnsembleContext limiter instances and adjusts counter decrement logic to avoid underflow when a step is not actually scheduled.
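
The ownership change in the bullets above might look something like the sketch below. All type names ending in `Sketch` are hypothetical placeholders rather than the classes in `ensemble_scheduler.h`; the only behavior taken from the description is that one limiter per step lives on the model-level object, is allocated only when `max_inflight_requests > 0`, and is shared by every per-request context.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical placeholder for the limiter type; only its identity and
// configured limit matter for this illustration.
struct StepLimiterSketch {
  explicit StepLimiterSketch(std::uint32_t max) : max_inflight(max) {}
  std::uint32_t max_inflight;
};

// Hypothetical model-level state, loosely mirroring the role of EnsembleInfo.
struct EnsembleInfoSketch {
  EnsembleInfoSketch(std::size_t step_count,
                     std::uint32_t max_inflight_requests)
      : step_limiters(step_count) {
    // Allocate one limiter per step, once, and only when a limit is set.
    if (max_inflight_requests > 0) {
      for (auto& limiter : step_limiters) {
        limiter = std::make_unique<StepLimiterSketch>(max_inflight_requests);
      }
    }
  }

  // One entry per step; nullptr means the step is unlimited.
  std::vector<std::unique_ptr<StepLimiterSketch>> step_limiters;
};

// Hypothetical per-request state: it borrows the model-level limiters
// instead of owning its own, so the limit is global across requests.
struct EnsembleContextSketch {
  explicit EnsembleContextSketch(const EnsembleInfoSketch* ensemble_info)
      : info(ensemble_info) {}

  // Returns the shared limiter for a step, or nullptr when unlimited.
  StepLimiterSketch* LimiterFor(std::size_t step) const {
    return info->step_limiters[step].get();
  }

  const EnsembleInfoSketch* info;
};
```

Under this layout a context that never scheduled a step has nothing of its own to decrement, which is one plausible way to avoid the underflow mentioned in the last bullet; the PR itself may handle that case differently.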

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/ensemble_scheduler/ensemble_scheduler.h | Adds the limiter type and stores one limiter per ensemble step on EnsembleInfo to make the limit global across ensemble requests. |
| src/ensemble_scheduler/ensemble_scheduler.cc | Moves limiter implementation, wires Acquire/Release into scheduling and completion paths, and removes per-context limiter initialization/usage. |

Labels

PR: feat A new feature
