Commit 65529b0

feat: Support user-defined is_ready() in Python backend readiness checks (#430)
1 parent bfc93fa commit 65529b0

9 files changed

Lines changed: 501 additions & 10 deletions

README.md

Lines changed: 69 additions & 1 deletion
@@ -52,6 +52,7 @@ any C++ code.
 - [Async Execute](#async-execute)
 - [Request Rescheduling](#request-rescheduling)
 - [`finalize`](#finalize)
+- [`is_ready`](#is_ready)
 - [Model Config File](#model-config-file)
 - [Inference Request Parameters](#inference-request-parameters)
 - [Inference Response Parameters](#inference-response-parameters)
@@ -367,9 +368,25 @@ class TritonPythonModel:
         """
         print('Cleaning up...')
 
+    def is_ready(self):
+        """`is_ready` is called whenever the model readiness is checked
+        via the health endpoint (v2/models/<model>/ready). Implementing
+        `is_ready` is optional. If not implemented, the model is
+        considered ready as long as the stub process is healthy. This
+        function must return a boolean value. Both sync and async
+        implementations are supported.
+
+        Returns
+        -------
+        bool
+            True if the model is ready to serve inference requests,
+            False otherwise.
+        """
+        return True
+
 ```
 
-Every Python backend can implement four main functions:
+Every Python backend can implement the following main functions:
 
 ### `auto_complete_config`
 
@@ -748,6 +765,57 @@ class TritonPythonModel:
 Implementing `finalize` is optional. This function allows you to do any clean
 ups necessary before the model is unloaded from Triton server.
 
+### `is_ready`
+
+Implementing `is_ready` is optional. When defined, this function is invoked
+whenever the model's readiness is verified through the
+`v2/models/<model>/ready` health endpoint. It must return a **boolean** value
+(`True` or `False`). Both synchronous and asynchronous (`async def`)
+implementations are supported.
+
+Common use cases include:
+
+- **External dependency checks**: Verify that required databases, remote APIs, feature stores, or downstream services are reachable before accepting requests.
+- **Lazy resource loading**: Return `False` until model weights or other large artifacts that are still being downloaded or initialized in the background become fully available.
+- **Graceful drain**: Use an external signal (such as a file flag, environment variable, or admin endpoint) to mark the model as not ready, allowing orchestrators like Kubernetes to stop routing traffic before shutdown or maintenance.
+- **Internal state validation**: Confirm that caches, connection pools, and other runtime state required for inference are healthy.
+
+If `is_ready` is not implemented, the model is considered ready as long as the stub process remains healthy (the default behavior). In this case, no IPC overhead is incurred.
+
+When `is_ready` is implemented, a readiness check timeout of five seconds is enforced. If the function fails to return within this period, the model is reported as not ready for that check. Only one internal readiness IPC call is executed per model instance at a time. Concurrent readiness requests wait for the ongoing call to complete and reuse its result.
+
+**Note:** The `is_ready` function should be kept as lightweight and efficient as possible. It shares an internal message queue with BLS decoupled response delivery. Although a slow readiness check does not directly affect standard (non-decoupled) inference, it can delay the delivery of BLS decoupled streaming responses while both requests are processed. Avoid blocking operations such as long-running network calls or heavy computations inside this function.
+
+```python
+import triton_python_backend_utils as pb_utils
+
+
+class TritonPythonModel:
+    def initialize(self, args):
+        # Load model resources, establish connections, etc.
+        self.resource = connect_to_resource()
+
+    def is_ready(self):
+        # Perform custom readiness checks such as verifying
+        # that dependent resources are available.
+        return self.resource.is_available()
+
+    def execute(self, requests):
+        ...
+
+    def finalize(self):
+        self.resource.close()
+```
+
+An asynchronous implementation is also supported:
+
+```python
+class TritonPythonModel:
+    async def is_ready(self):
+        status = await self.check_dependency_health()
+        return status.ok
+    ...
+```
+
 You can look at the [add_sub example](examples/add_sub/model.py) which contains
 a complete example of implementing all these functions for a Python model
 that adds and subtracts the inputs given to it. After implementing all the
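The graceful-drain use case from the list above can be sketched with a file flag. This is an illustrative sketch only: the flag path, the `loaded` attribute, and the probing pattern are assumptions, not part of the backend API.

```python
import os

# Hypothetical flag path that an operator (or a preStop hook) creates to
# signal that this model should stop advertising readiness.
DRAIN_FLAG = "/tmp/model_drain.flag"


class TritonPythonModel:
    def initialize(self, args):
        # In a real model this would load weights, open connections, etc.
        self.loaded = True

    def is_ready(self):
        # Report "not ready" while the drain flag exists, so orchestrators
        # (e.g. Kubernetes probes hitting v2/models/<model>/ready) stop
        # routing new traffic before shutdown or maintenance.
        if os.path.exists(DRAIN_FLAG):
            return False
        return self.loaded

    def execute(self, requests):
        ...
```

Because the check is a single `os.path.exists` call, it stays well within the five-second readiness timeout and keeps the shared message queue free for BLS decoupled responses.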

src/ipc_message.h

Lines changed: 3 additions & 2 deletions
@@ -1,4 +1,4 @@
-// Copyright 2021-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// Copyright 2021-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 //
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions
@@ -68,7 +68,8 @@ typedef enum PYTHONSTUB_commandtype_enum {
   PYTHONSTUB_UnloadModelRequest,
   PYTHONSTUB_ModelReadinessRequest,
   PYTHONSTUB_IsRequestCancelled,
-  PYTHONSTUB_CancelBLSInferRequest
+  PYTHONSTUB_CancelBLSInferRequest,
+  PYTHONSTUB_UserModelReadinessRequest
 } PYTHONSTUB_CommandType;
 
 ///
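These command values form the stub-to-parent IPC protocol, and the new `PYTHONSTUB_UserModelReadinessRequest` is appended after the existing entries, which leaves earlier command numbers unchanged (standard C enum behavior). A hypothetical Python mirror of the tail of the enum, assuming sequential values as in the C header:

```python
from enum import IntEnum, auto


class CommandType(IntEnum):
    # Illustrative mirror of the last PYTHONSTUB_CommandType entries; the
    # real numeric values depend on the entries that precede them in C.
    UnloadModelRequest = auto()
    ModelReadinessRequest = auto()
    IsRequestCancelled = auto()
    CancelBLSInferRequest = auto()
    UserModelReadinessRequest = auto()  # new command, appended at the end
```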

src/pb_stub.cc

Lines changed: 103 additions & 1 deletion
@@ -1,4 +1,4 @@
-// Copyright 2021-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// Copyright 2021-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 //
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions
@@ -595,6 +595,10 @@ Stub::Initialize(bi::managed_external_buffer::handle_t map_handle)
     model_instance_.attr("initialize")(model_config_params);
   }
 
+  // Cache whether an is_ready() function is defined in the Python model.
+  ipc_control_->stub_has_user_model_readiness_fn =
+      py::hasattr(model_instance_, "is_ready");
+
   initialized_ = true;
 }
 
@@ -1350,6 +1354,9 @@ Stub::ParentToStubMQMonitor()
     case PYTHONSTUB_CommandType::PYTHONSTUB_InferStreamExecResponse: {
       ProcessBLSResponseDecoupled(ipc_message);
    } break;
+    case PYTHONSTUB_CommandType::PYTHONSTUB_UserModelReadinessRequest: {
+      ProcessUserModelReadinessRequest(ipc_message);
+    } break;
    default:
      break;
  }
@@ -1573,6 +1580,101 @@ Stub::ProcessBLSResponseDecoupled(std::unique_ptr<IPCMessage>& ipc_message)
   }
 }
 
+void
+Stub::ProcessUserModelReadinessRequest(std::unique_ptr<IPCMessage>& ipc_message)
+{
+  AllocatedSharedMemory<UserModelReadinessMessage> readiness_message;
+  UserModelReadinessMessage* readiness_payload = nullptr;
+  try {
+    readiness_message =
+        shm_pool_->Load<UserModelReadinessMessage>(ipc_message->Args());
+    readiness_payload = readiness_message.data_.get();
+  }
+  catch (const PythonBackendException& pb_exception) {
+    LOG_ERROR << "Failed to process model readiness request: "
+              << pb_exception.what();
+    return;
+  }
+
+  if (ipc_message->ResponseMutex() == nullptr) {
+    LOG_ERROR << "Failed to process model readiness request";
+    return;
+  }
+
+  bool is_ready = true;
+  bool function_exists = false;
+  bool has_exception = false;
+  std::string error_string;
+
+  try {
+    py::gil_scoped_acquire acquire;
+
+    function_exists = py::hasattr(model_instance_, "is_ready");
+    if (!function_exists) {
+      is_ready = true;
+    } else {
+      py::object result = model_instance_.attr("is_ready")();
+
+      bool is_coroutine = py::module::import("asyncio")
+                              .attr("iscoroutine")(result)
+                              .cast<bool>();
+      if (is_coroutine) {
+        result = RunCoroutine(result, false /* in_background */);
+      }
+
+      if (!py::isinstance<py::bool_>(result)) {
+        throw PythonBackendException("is_ready() must return a boolean value");
+      }
+
+      is_ready = result.cast<bool>();
+    }
+  }
+  catch (const PythonBackendException& pb_exception) {
+    has_exception = true;
+    error_string = pb_exception.what();
+  }
+  catch (const py::error_already_set& error) {
+    has_exception = true;
+    error_string = error.what();
+  }
+
+  // Populate response payload
+  readiness_payload->function_exists = function_exists;
+  readiness_payload->is_ready = has_exception ? false : is_ready;
+  readiness_payload->has_error = has_exception;
+  readiness_payload->is_error_set = false;
+  readiness_payload->error = 0;
+
+  if (has_exception) {
+    std::unique_ptr<PbString> error_string_shm;
+    LOG_IF_EXCEPTION(
+        error_string_shm = PbString::Create(shm_pool_, error_string));
+    if (error_string_shm != nullptr) {
+      readiness_payload->is_error_set = true;
+      readiness_payload->error = error_string_shm->ShmHandle();
+    }
+  }
+
+  // Signal parent process that the response is ready
+  {
+    bi::scoped_lock<bi::interprocess_mutex> lock{
+        *(ipc_message->ResponseMutex())};
+    readiness_payload->waiting_on_stub = true;
+    ipc_message->ResponseCondition()->notify_all();
+
+    // Wait for parent ack with timeout to avoid deadlock
+    boost::posix_time::ptime timeout =
+        boost::get_system_time() +
+        boost::posix_time::milliseconds(kUserModelReadinessTimeoutMs);
+    while (readiness_payload->waiting_on_stub) {
+      if (!ipc_message->ResponseCondition()->timed_wait(lock, timeout)) {
+        readiness_payload->waiting_on_stub = false;
+        break;
+      }
+    }
+  }
+}
+
 PYBIND11_EMBEDDED_MODULE(c_python_backend_utils, module)
 {
   py::class_<PbError, std::shared_ptr<PbError>> triton_error(

src/pb_stub.h

Lines changed: 6 additions & 1 deletion
@@ -1,4 +1,4 @@
-// Copyright 2021-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// Copyright 2021-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 //
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions
@@ -265,6 +265,11 @@ class Stub {
   /// Get the CUDA memory pool address from the parent process.
   void GetCUDAMemoryPoolAddress(std::unique_ptr<IPCMessage>& ipc_message);
 
+  /// Calls the user's is_ready() Python method and returns its response
+  /// when handling model readiness check requests.
+  void ProcessUserModelReadinessRequest(
+      std::unique_ptr<IPCMessage>& ipc_message);
+
  private:
   bi::interprocess_mutex* stub_mutex_;
   bi::interprocess_condition* stub_cond_;

src/pb_utils.h

Lines changed: 14 additions & 1 deletion
@@ -1,4 +1,4 @@
-// Copyright 2021-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// Copyright 2021-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 //
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions
@@ -53,6 +53,10 @@ namespace triton { namespace backend { namespace python {
 
 namespace bi = boost::interprocess;
 
+// Timeout for user-defined model readiness requests
+// and mutex locks (in milliseconds)
+constexpr uint64_t kUserModelReadinessTimeoutMs = 5000;
+
 #define STUB_SET_RESPONSE_ERROR_IF_ERROR(SHM_POOL, RESPONSE, R, X) \
   do {                                                             \
     try {                                                          \
@@ -141,6 +145,7 @@ struct IPCControlShm {
   bool parent_health;
   bool uses_env;
   bool decoupled;
+  bool stub_has_user_model_readiness_fn;
   bi::interprocess_mutex parent_health_mutex;
   bi::interprocess_mutex stub_health_mutex;
   bi::managed_external_buffer::handle_t stub_message_queue;
@@ -226,6 +231,14 @@ struct ModelLoaderMessage : SendMessageBase {
   bool is_model_ready;
 };
 
+struct UserModelReadinessMessage : SendMessageBase {
+  bool is_ready;
+  bool function_exists;
+  bool has_error;
+  bool is_error_set;
+  bi::managed_external_buffer::handle_t error;
+};
+
 struct ResponseSenderBase {
   bi::interprocess_mutex mu;
   bi::interprocess_condition cv;
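The effect of the 5000 ms `kUserModelReadinessTimeoutMs` bound is that a hung `is_ready()` simply yields "not ready" for that particular check rather than blocking the health endpoint indefinitely. That behavior can be approximated with this illustrative sketch; `check_readiness` and its parameters are assumed names, not backend API:

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

READINESS_TIMEOUT_S = 5.0  # mirrors kUserModelReadinessTimeoutMs


def check_readiness(user_is_ready, timeout_s=READINESS_TIMEOUT_S):
    """Run a user readiness callable under a deadline; timing out => not ready."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(user_is_ready)
        try:
            return bool(future.result(timeout=timeout_s))
        except FutureTimeout:
            # Matches the documented behavior: a slow check is reported
            # as not ready for this particular request.
            return False
        except Exception:
            # Exceptions raised by is_ready() also map to "not ready".
            return False
```

Unlike this sketch, the real implementation uses a condition-variable `timed_wait` over shared memory and single-flights concurrent readiness checks per model instance, but the observable outcome for a slow or failing `is_ready()` is the same.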
