Merged
14 changes: 8 additions & 6 deletions protobuf/model_config.proto
@@ -1,4 +1,4 @@
-// Copyright 2018-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// Copyright 2018-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
@@ -1662,11 +1662,13 @@ message ModelEnsembling

//@@ .. cpp:var:: uint32 max_inflight_requests
//@@
-//@@ The maximum number of concurrent inflight requests allowed at each
-//@@ ensemble step per inference request. This limit prevents unbounded
-//@@ memory growth when ensemble steps produce responses faster than
-//@@ downstream steps can consume, e.g. decoupled models.
-//@@ Default value is 0, which indicates that no limit is enforced.
+//@@ BETA (Subject to change)
+//@@ The maximum number of concurrent in-flight requests allowed at each
+//@@ ensemble step across all ongoing ensemble requests for this model
+//@@ instance. This per-step limit prevents unbounded memory growth when
+//@@ ensemble steps produce responses faster than downstream steps can
+//@@ consume them (for example, in decoupled models).
+//@@ The default value is 0, which indicates that no limit is enforced.
//@@
//@@ Note: Applying this limit may block upstream steps while they wait
//@@ for downstream capacity. This blocking does not cancel or internally
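
The blocking behavior described in the comment can be illustrated with a small, self-contained Python sketch. This is not Triton's implementation: `StepLimiter` and `run_pipeline` are hypothetical names made up for illustration, and a counting semaphore is only one plausible way to enforce such a limit. The sketch assumes a fast upstream producer and a slower downstream consumer, and treats a limit of 0 as "no limit", matching the documented default.

```python
import queue
import threading
import time


class StepLimiter:
    """Hypothetical per-step limiter; a limit of 0 means no limit is enforced."""

    def __init__(self, max_inflight_requests=0):
        self._sem = (threading.Semaphore(max_inflight_requests)
                     if max_inflight_requests > 0 else None)
        self._lock = threading.Lock()
        self.inflight = 0  # requests currently waiting on the downstream step
        self.peak = 0      # highest value of `inflight` observed so far

    def acquire(self):
        # Blocks the upstream step while the downstream step is at capacity.
        if self._sem is not None:
            self._sem.acquire()
        with self._lock:
            self.inflight += 1
            self.peak = max(self.peak, self.inflight)

    def release(self):
        with self._lock:
            self.inflight -= 1
        if self._sem is not None:
            self._sem.release()


def run_pipeline(limit, num_responses=20):
    """Push responses from a fast producer to a slow consumer.

    Returns (peak in-flight count, in-flight count after completion).
    """
    limiter = StepLimiter(limit)
    pending = queue.Queue()

    def downstream():
        while True:
            item = pending.get()
            if item is None:       # sentinel: upstream is finished
                return
            time.sleep(0.001)      # simulate a slow downstream step
            limiter.release()

    worker = threading.Thread(target=downstream)
    worker.start()
    for i in range(num_responses):
        limiter.acquire()          # may block here; nothing is cancelled
        pending.put(i)
    pending.put(None)
    worker.join()
    return limiter.peak, limiter.inflight


if __name__ == "__main__":
    print("limit=3:", run_pipeline(3))
    print("limit=0:", run_pipeline(0))
```

With a nonzero limit the peak in-flight count never exceeds the limit, at the cost of stalling the producer. With a limit of 0 the queue is free to grow as far as the producer can push it, which is the unbounded-growth case the setting is meant to prevent.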