feat: Enforce `max_inflight_requests` as a global per-step limit across ensemble requests by pskiran1 · Pull Request #482 · triton-inference-server/core

pskiran1 · 2026-03-18T12:31:53Z

This PR changes ensemble scheduling so max_inflight_requests is enforced as a global per-step cap across all concurrent ensemble requests (instead of being enforced per ensemble request), helping bound memory growth when upstream steps outpace downstream consumption.

Changes:

Move per-step in-flight limiters to be owned globally by the ensemble model (EnsembleInfo) and shared across all EnsembleContexts.
Update the limiter API from “wait + increment/decrement” to an acquire/release slot model, and wire it into scheduling and completion paths.
Update step dispatch to Acquire() a limiter slot before InferAsync() and Release() it on final response completion (or on scheduling failure).
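For readers unfamiliar with the acquire/release slot model, here is a minimal sketch of such a limiter built on a mutex and a condition variable. The class name `SlotLimiter` and its members are hypothetical and chosen for illustration; this is not the limiter implemented in this PR, and `TryAcquire()` and `Held()` are added here only to make the behavior easy to observe.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for a per-step in-flight limiter (not the PR's class).
class SlotLimiter {
 public:
  explicit SlotLimiter(std::size_t max_slots) : max_slots_(max_slots) {}

  // Blocks until fewer than max_slots_ slots are held, then takes one.
  void Acquire() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return held_ < max_slots_; });
    ++held_;
  }

  // Non-blocking variant: returns false instead of waiting when at the cap.
  bool TryAcquire() {
    std::lock_guard<std::mutex> lock(mu_);
    if (held_ >= max_slots_) return false;
    ++held_;
    return true;
  }

  // Returns one slot and wakes a single waiter. A Release() with no slot
  // held is ignored so the counter cannot underflow.
  void Release() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (held_ == 0) return;
      --held_;
    }
    cv_.notify_one();
  }

  // Number of slots currently held.
  std::size_t Held() const {
    std::lock_guard<std::mutex> lock(mu_);
    return held_;
  }

 private:
  const std::size_t max_slots_;
  std::size_t held_ = 0;
  mutable std::mutex mu_;
  std::condition_variable cv_;
};
```

With a cap of 2, two acquisitions succeed and a third waits (or fails, for `TryAcquire()`) until some holder calls `Release()`.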

CI and Doc: triton-inference-server/server#8707
Doc: triton-inference-server/common#152

Copilot

Pull request overview

This PR changes ensemble scheduling so max_inflight_requests is enforced as a global per-step cap across all concurrent ensemble requests (instead of being enforced per ensemble request), helping bound memory growth when upstream steps outpace downstream consumption.

Changes:

Move per-step in-flight limiters to be owned globally by the ensemble model (EnsembleInfo) and shared across all EnsembleContexts.
Update the limiter API from “wait + increment/decrement” to an acquire/release slot model, and wire it into scheduling and completion paths.
Update configuration/comment semantics to reflect the new global behavior.
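The ownership change can be illustrated with a small sketch: model-level state owns one limiter per step, and each per-request context only borrows them, so a slot taken by one request is visible to every other request. `ModelInfo`, `RequestContext`, and `StepSlots` are hypothetical names standing in for the roles of EnsembleInfo, EnsembleContext, and the limiter; the non-blocking `TryAcquire()` and the "no limiter when the cap is 0" convention are assumptions made to keep the example short.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical minimal limiter: a guarded counter with a fixed cap.
struct StepSlots {
  explicit StepSlots(std::size_t cap) : cap(cap) {}
  bool TryAcquire() {
    std::lock_guard<std::mutex> lock(mu);
    if (used >= cap) return false;
    ++used;
    return true;
  }
  void Release() {
    std::lock_guard<std::mutex> lock(mu);
    if (used > 0) --used;
  }
  const std::size_t cap;
  std::size_t used = 0;
  std::mutex mu;
};

// Model-level state: owns one limiter per step, created once.
// Assumption: a cap of 0 means "unlimited", represented by null entries.
struct ModelInfo {
  ModelInfo(std::size_t step_count, std::size_t max_inflight) {
    step_limiters.resize(step_count);
    if (max_inflight > 0) {
      for (auto& limiter : step_limiters) {
        limiter = std::make_unique<StepSlots>(max_inflight);
      }
    }
  }
  std::vector<std::unique_ptr<StepSlots>> step_limiters;
};

// Per-request state: borrows the model's limiters instead of owning its own,
// so the cap applies across all concurrent requests.
struct RequestContext {
  explicit RequestContext(ModelInfo* info) : info(info) {}
  bool TryScheduleStep(std::size_t step) {
    StepSlots* limiter = info->step_limiters[step].get();
    return limiter == nullptr || limiter->TryAcquire();
  }
  void FinishStep(std::size_t step) {
    if (StepSlots* limiter = info->step_limiters[step].get()) {
      limiter->Release();
    }
  }
  ModelInfo* info;
};
```

With a cap of 1, a second context cannot schedule step 0 while the first context holds its slot, but it can still schedule step 1, since each step has its own limiter.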

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `src/ensemble_scheduler/ensemble_scheduler.h` | Updates docs for `max_inflight_requests` semantics and adds storage for per-step global limiters. |
| `src/ensemble_scheduler/ensemble_scheduler.cc` | Implements global limiter allocation and integrates acquire/release into step scheduling and completion. |



Copilot

Pull request overview

This PR changes ensemble scheduling so max_inflight_requests is enforced as a global per-step limit shared across all concurrent ensemble requests for a given ensemble model, instead of being enforced per-ensemble-request.

Changes:

Introduces a per-step global limiter (StepInflightRequestLimiter) stored on EnsembleInfo, allocated once per step when max_inflight_requests > 0.
Updates step dispatch to Acquire() a limiter slot before InferAsync() and Release() it on final response completion (or on scheduling failure).
Removes the per-EnsembleContext limiter instances and adjusts counter decrement logic to avoid underflow when a step is not actually scheduled.
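The release rules described above (release on the final response, or immediately if scheduling fails) can be sketched as follows. Everything here is a hypothetical stand-in: `Counter` replaces the real limiter and does not block at a cap, and `FakeBackend::InferAsync` only mimics an asynchronous submit that may reject a request. The point is the control flow, which releases each acquired slot exactly once.

```cpp
#include <atomic>
#include <functional>
#include <utility>

// Hypothetical counter standing in for a limiter. A real limiter would
// block in Acquire() when the cap is reached; that part is omitted here.
struct Counter {
  void Acquire() { ++held; }
  void Release() { --held; }
  std::atomic<int> held{0};
};

// Invoked once per response; is_final marks the last response of a step.
using ResponseFn = std::function<void(bool is_final)>;

// Hypothetical async submit. On success the callback is stored so the
// caller can deliver responses later; on failure it is never invoked.
struct FakeBackend {
  bool accept = true;
  ResponseFn pending;
  bool InferAsync(ResponseFn on_response) {
    if (!accept) return false;
    pending = std::move(on_response);
    return true;
  }
};

// Takes a slot before submitting and guarantees exactly one matching release.
bool DispatchStep(Counter& limiter, FakeBackend& backend) {
  limiter.Acquire();
  const bool scheduled = backend.InferAsync([&limiter](bool is_final) {
    // Intermediate responses keep the slot; only the final one frees it.
    if (is_final) limiter.Release();
  });
  // Scheduling failed: the callback will never run, so release here.
  if (!scheduled) limiter.Release();
  return scheduled;
}
```

Because the failure path releases only what `DispatchStep` itself acquired, a step that was never scheduled does not decrement the counter a second time.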

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `src/ensemble_scheduler/ensemble_scheduler.h` | Adds the limiter type and stores one limiter per ensemble step on `EnsembleInfo` to make the limit global across ensemble requests. |
| `src/ensemble_scheduler/ensemble_scheduler.cc` | Moves limiter implementation, wires `Acquire/Release` into scheduling and completion paths, and removes per-context limiter initialization/usage. |



Update

5ba5964

pskiran1 requested a review from Copilot March 18, 2026 12:31

Copilot started reviewing on behalf of pskiran1 March 18, 2026 12:32

Copilot AI reviewed Mar 18, 2026

src/ensemble_scheduler/ensemble_scheduler.h (resolved)

src/ensemble_scheduler/ensemble_scheduler.cc (outdated, resolved)

src/ensemble_scheduler/ensemble_scheduler.cc (outdated, resolved)

Update

32023d7

pskiran1 requested a review from Copilot March 18, 2026 14:12

Copilot started reviewing on behalf of pskiran1 March 18, 2026 14:12

Copilot AI reviewed Mar 18, 2026

src/ensemble_scheduler/ensemble_scheduler.cc (resolved)

src/ensemble_scheduler/ensemble_scheduler.cc (resolved)

pskiran1 added 2 commits March 18, 2026 15:44

Update

0153fd6

Fix pre-commit

ca3733a

pskiran1 added the `PR: feat` (A new feature) label Mar 18, 2026

pskiran1 requested review from mattwittwer, mudit-eng, whoisj and yinggeh March 18, 2026 15:52

pskiran1 marked this pull request as draft March 18, 2026 15:54

pskiran1 mentioned this pull request Mar 19, 2026

max_inflight_requests functionality for whole ensemble (rather than per-request) triton-inference-server/server#8597

Open

pskiran1 marked this pull request as ready for review March 19, 2026 13:21

This was referenced Mar 21, 2026

doc: Enforce max_inflight_requests as a global per-step limit across ensemble requests triton-inference-server/common#152

Open

test: Enforce max_inflight_requests as a global per-step limit across ensemble requests triton-inference-server/server#8707

Open

pskiran1 added and then removed the `documentation` (Improvements or additions to documentation) label Mar 21, 2026

pskiran1 wants to merge 4 commits into main from spolisetty/tri-732-maximum-inflight-requests-from-request-context-ensemble


2 participants
