Commit 65529b0

feat: Support user-defined is_ready() in Python backend readiness checks (#430)
1 parent bfc93fa commit 65529b0

9 files changed

Lines changed: 501 additions & 10 deletions

README.md

Lines changed: 69 additions & 1 deletion
@@ -52,6 +52,7 @@ any C++ code.
 - [Async Execute](#async-execute)
 - [Request Rescheduling](#request-rescheduling)
 - [`finalize`](#finalize)
+- [`is_ready`](#is_ready)
 - [Model Config File](#model-config-file)
 - [Inference Request Parameters](#inference-request-parameters)
 - [Inference Response Parameters](#inference-response-parameters)
@@ -367,9 +368,25 @@ class TritonPythonModel:
         """
         print('Cleaning up...')
 
+    def is_ready(self):
+        """`is_ready` is called whenever the model readiness is checked
+        via the health endpoint (v2/models/<model>/ready). Implementing
+        `is_ready` is optional. If not implemented, the model is
+        considered ready as long as the stub process is healthy. This
+        function must return a boolean value. Both sync and async
+        implementations are supported.
+
+        Returns
+        -------
+        bool
+            True if the model is ready to serve inference requests,
+            False otherwise.
+        """
+        return True
+
 ```
 
-Every Python backend can implement four main functions:
+Every Python backend can implement the following main functions:
 
 ### `auto_complete_config`
 
@@ -748,6 +765,57 @@ class TritonPythonModel:
 Implementing `finalize` is optional. This function allows you to do any clean
 ups necessary before the model is unloaded from Triton server.
 
+### `is_ready`
+
+Implementing `is_ready` is optional. When defined, this function is invoked
+whenever the model's readiness is verified through the
+`v2/models/<model>/ready` health endpoint. It must return a **boolean** value
+(`True` or `False`). Both synchronous and asynchronous (`async def`)
+implementations are supported.
+
+Common use cases include:
+
+- **External dependency checks**: Verify that required databases, remote APIs, feature stores, or downstream services are reachable before accepting requests.
+- **Lazy resource loading**: Return `False` until model weights or other large artifacts that are still being downloaded or initialized in the background become fully available.
+- **Graceful drain**: Use an external signal (such as a file flag, environment variable, or admin endpoint) to mark the model as not ready, allowing orchestrators like Kubernetes to stop routing traffic before shutdown or maintenance.
+- **Internal state validation**: Confirm that caches, connection pools, and other runtime state required for inference are healthy.
+
+If `is_ready` is not implemented, the model is considered ready as long as the stub process remains healthy (the default behavior). In this case, no IPC overhead is incurred.
+
+When `is_ready` is implemented, a readiness check timeout of five seconds is enforced. If the function fails to return within this period, the model is reported as not ready for that check. Only one internal readiness IPC call is executed per model instance at a time. Concurrent readiness requests wait for the ongoing call to complete and reuse its result.
+
+**Note:** The `is_ready` function should be kept as lightweight and efficient as possible. It shares an internal message queue with BLS decoupled response delivery. Although a slow readiness check does not directly affect standard (non-decoupled) inference, it can delay the delivery of BLS decoupled streaming responses while both requests are processed. Avoid blocking operations such as long-running network calls or heavy computations inside this function.
+
+```python
+import triton_python_backend_utils as pb_utils
+
+
+class TritonPythonModel:
+    def initialize(self, args):
+        # Load model resources, establish connections, etc.
+        self.resource = connect_to_resource()
+
+    def is_ready(self):
+        # Perform custom readiness checks such as verifying
+        # that dependent resources are available.
+        return self.resource.is_available()
+
+    def execute(self, requests):
+        ...
+
+    def finalize(self):
+        self.resource.close()
+```
+
+An asynchronous implementation is also supported:
+
+```python
+class TritonPythonModel:
+    async def is_ready(self):
+        status = await self.check_dependency_health()
+        return status.ok
+    ...
+```
+
 You can look at the [add_sub example](examples/add_sub/model.py) which contains
 a complete example of implementing all these functions for a Python model
 that adds and subtracts the inputs given to it. After implementing all the
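The graceful-drain use case from the list above can be sketched with a file flag. This is an illustrative sketch only: the flag path, the `loaded` attribute, and the probing pattern are assumptions, not part of the backend API.

```python
import os

# Hypothetical flag path that an operator (or a preStop hook) creates to
# signal that this model should stop advertising readiness.
DRAIN_FLAG = "/tmp/model_drain.flag"


class TritonPythonModel:
    def initialize(self, args):
        # In a real model this would load weights, open connections, etc.
        self.loaded = True

    def is_ready(self):
        # Report "not ready" while the drain flag exists, so orchestrators
        # (e.g. Kubernetes probes hitting v2/models/<model>/ready) stop
        # routing new traffic before shutdown or maintenance.
        if os.path.exists(DRAIN_FLAG):
            return False
        return self.loaded

    def execute(self, requests):
        ...
```

Because the check is a single `os.path.exists` call, it stays well within the five-second readiness timeout and keeps the shared message queue free for BLS decoupled responses.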

src/ipc_message.h

Lines changed: 3 additions & 2 deletions
@@ -1,4 +1,4 @@
-// Copyright 2021-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// Copyright 2021-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 //
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions
@@ -68,7 +68,8 @@ typedef enum PYTHONSTUB_commandtype_enum {
   PYTHONSTUB_UnloadModelRequest,
   PYTHONSTUB_ModelReadinessRequest,
   PYTHONSTUB_IsRequestCancelled,
-  PYTHONSTUB_CancelBLSInferRequest
+  PYTHONSTUB_CancelBLSInferRequest,
+  PYTHONSTUB_UserModelReadinessRequest
 } PYTHONSTUB_CommandType;
 
 ///
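These command values form the stub-to-parent IPC protocol, and the new `PYTHONSTUB_UserModelReadinessRequest` is appended after the existing entries, which leaves earlier command numbers unchanged (standard C enum behavior). A hypothetical Python mirror of the tail of the enum, assuming sequential values as in the C header:

```python
from enum import IntEnum, auto


class CommandType(IntEnum):
    # Illustrative mirror of the last PYTHONSTUB_CommandType entries; the
    # real numeric values depend on the entries that precede them in C.
    UnloadModelRequest = auto()
    ModelReadinessRequest = auto()
    IsRequestCancelled = auto()
    CancelBLSInferRequest = auto()
    UserModelReadinessRequest = auto()  # new command, appended at the end
```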

src/pb_stub.cc

Lines changed: 103 additions & 1 deletion
@@ -1,4 +1,4 @@
-// Copyright 2021-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// Copyright 2021-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 //
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions
@@ -595,6 +595,10 @@ Stub::Initialize(bi::managed_external_buffer::handle_t map_handle)
     model_instance_.attr("initialize")(model_config_params);
   }
 
+  // Cache whether an is_ready() function is defined in the Python model.
+  ipc_control_->stub_has_user_model_readiness_fn =
+      py::hasattr(model_instance_, "is_ready");
+
   initialized_ = true;
 }
 
@@ -1350,6 +1354,9 @@ Stub::ParentToStubMQMonitor()
     case PYTHONSTUB_CommandType::PYTHONSTUB_InferStreamExecResponse: {
       ProcessBLSResponseDecoupled(ipc_message);
    } break;
+    case PYTHONSTUB_CommandType::PYTHONSTUB_UserModelReadinessRequest: {
+      ProcessUserModelReadinessRequest(ipc_message);
+    } break;
    default:
      break;
  }
@@ -1573,6 +1580,101 @@ Stub::ProcessBLSResponseDecoupled(std::unique_ptr<IPCMessage>& ipc_message)
   }
 }
 
+void
+Stub::ProcessUserModelReadinessRequest(std::unique_ptr<IPCMessage>& ipc_message)
+{
+  AllocatedSharedMemory<UserModelReadinessMessage> readiness_message;
+  UserModelReadinessMessage* readiness_payload = nullptr;
+  try {
+    readiness_message =
+        shm_pool_->Load<UserModelReadinessMessage>(ipc_message->Args());
+    readiness_payload = readiness_message.data_.get();
+  }
+  catch (const PythonBackendException& pb_exception) {
+    LOG_ERROR << "Failed to process model readiness request: "
+              << pb_exception.what();
+    return;
+  }
+
+  if (ipc_message->ResponseMutex() == nullptr) {
+    LOG_ERROR << "Failed to process model readiness request";
+    return;
+  }
+
+  bool is_ready = true;
+  bool function_exists = false;
+  bool has_exception = false;
+  std::string error_string;
+
+  try {
+    py::gil_scoped_acquire acquire;
+
+    function_exists = py::hasattr(model_instance_, "is_ready");
+    if (!function_exists) {
+      is_ready = true;
+    } else {
+      py::object result = model_instance_.attr("is_ready")();
+
+      bool is_coroutine = py::module::import("asyncio")
+                              .attr("iscoroutine")(result)
+                              .cast<bool>();
+      if (is_coroutine) {
+        result = RunCoroutine(result, false /* in_background */);
+      }
+
+      if (!py::isinstance<py::bool_>(result)) {
+        throw PythonBackendException("is_ready() must return a boolean value");
+      }
+
+      is_ready = result.cast<bool>();
+    }
+  }
+  catch (const PythonBackendException& pb_exception) {
+    has_exception = true;
+    error_string = pb_exception.what();
+  }
+  catch (const py::error_already_set& error) {
+    has_exception = true;
+    error_string = error.what();
+  }
+
+  // Populate response payload
+  readiness_payload->function_exists = function_exists;
+  readiness_payload->is_ready = has_exception ? false : is_ready;
+  readiness_payload->has_error = has_exception;
+  readiness_payload->is_error_set = false;
+  readiness_payload->error = 0;
+
+  if (has_exception) {
+    std::unique_ptr<PbString> error_string_shm;
+    LOG_IF_EXCEPTION(
+        error_string_shm = PbString::Create(shm_pool_, error_string));
+    if (error_string_shm != nullptr) {
+      readiness_payload->is_error_set = true;
+      readiness_payload->error = error_string_shm->ShmHandle();
+    }
+  }
+
+  // Signal parent process that the response is ready
+  {
+    bi::scoped_lock<bi::interprocess_mutex> lock{
+        *(ipc_message->ResponseMutex())};
+    readiness_payload->waiting_on_stub = true;
+    ipc_message->ResponseCondition()->notify_all();
+
+    // Wait for parent ack with timeout to avoid deadlock
+    boost::posix_time::ptime timeout =
+        boost::get_system_time() +
+        boost::posix_time::milliseconds(kUserModelReadinessTimeoutMs);
+    while (readiness_payload->waiting_on_stub) {
+      if (!ipc_message->ResponseCondition()->timed_wait(lock, timeout)) {
+        readiness_payload->waiting_on_stub = false;
+        break;
+      }
+    }
+  }
+}
+
 PYBIND11_EMBEDDED_MODULE(c_python_backend_utils, module)
 {
   py::class_<PbError, std::shared_ptr<PbError>> triton_error(

src/pb_stub.h

Lines changed: 6 additions & 1 deletion
@@ -1,4 +1,4 @@
-// Copyright 2021-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// Copyright 2021-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 //
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions
@@ -265,6 +265,11 @@ class Stub {
   /// Get the CUDA memory pool address from the parent process.
   void GetCUDAMemoryPoolAddress(std::unique_ptr<IPCMessage>& ipc_message);
 
+  /// Calls the user's is_ready() Python method and returns its response
+  /// when handling model readiness check requests.
+  void ProcessUserModelReadinessRequest(
+      std::unique_ptr<IPCMessage>& ipc_message);
+
  private:
   bi::interprocess_mutex* stub_mutex_;
   bi::interprocess_condition* stub_cond_;

src/pb_utils.h

Lines changed: 14 additions & 1 deletion
@@ -1,4 +1,4 @@
-// Copyright 2021-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// Copyright 2021-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 //
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions
@@ -53,6 +53,10 @@ namespace triton { namespace backend { namespace python {
 
 namespace bi = boost::interprocess;
 
+// Timeout for user-defined model readiness requests
+// and mutex locks (in milliseconds)
+constexpr uint64_t kUserModelReadinessTimeoutMs = 5000;
+
 #define STUB_SET_RESPONSE_ERROR_IF_ERROR(SHM_POOL, RESPONSE, R, X) \
   do {                                                             \
     try {                                                          \
@@ -141,6 +145,7 @@ struct IPCControlShm {
   bool parent_health;
   bool uses_env;
   bool decoupled;
+  bool stub_has_user_model_readiness_fn;
   bi::interprocess_mutex parent_health_mutex;
   bi::interprocess_mutex stub_health_mutex;
   bi::managed_external_buffer::handle_t stub_message_queue;
@@ -226,6 +231,14 @@ struct ModelLoaderMessage : SendMessageBase {
   bool is_model_ready;
 };
 
+struct UserModelReadinessMessage : SendMessageBase {
+  bool is_ready;
+  bool function_exists;
+  bool has_error;
+  bool is_error_set;
+  bi::managed_external_buffer::handle_t error;
+};
+
 struct ResponseSenderBase {
   bi::interprocess_mutex mu;
   bi::interprocess_condition cv;
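The effect of the 5000 ms `kUserModelReadinessTimeoutMs` bound is that a hung `is_ready()` simply yields "not ready" for that particular check rather than blocking the health endpoint indefinitely. That behavior can be approximated with this illustrative sketch; `check_readiness` and its parameters are assumed names, not backend API:

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

READINESS_TIMEOUT_S = 5.0  # mirrors kUserModelReadinessTimeoutMs


def check_readiness(user_is_ready, timeout_s=READINESS_TIMEOUT_S):
    """Run a user readiness callable under a deadline; timing out => not ready."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(user_is_ready)
        try:
            return bool(future.result(timeout=timeout_s))
        except FutureTimeout:
            # Matches the documented behavior: a slow check is reported
            # as not ready for this particular request.
            return False
        except Exception:
            # Exceptions raised by is_ready() also map to "not ready".
            return False
```

Unlike this sketch, the real implementation uses a condition-variable `timed_wait` over shared memory and single-flights concurrent readiness checks per model instance, but the observable outcome for a slow or failing `is_ready()` is the same.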
