fix: /v2/health/ready returns 200 when Python backend stub is dead (#8604) #473
Under `strict_readiness_=true`, `IsReady()` only checked lifecycle state (`ModelStates()`), not runtime backend readiness. As a result, `/v2/health/ready` returned 200 even when a model's backend had crashed (e.g., the Python backend stub process died), because the lifecycle state still showed READY.

This change adds a runtime readiness check via `ModelIsReady()` for models in the READY lifecycle state. `ModelIsReady()` calls `model->IsReady()`, which invokes `TRITONBACKEND_ModelInstanceReady`; in the Python backend this calls `StubActive()` -> `waitpid(WNOHANG)`, a non-blocking, microsecond-level check.

With this, `/v2/health/ready` correctly reflects backend health, so orchestrators (K8s, etc.) can detect and restart unhealthy pods.

Fixes: triton-inference-server/server#8604
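The readiness logic described above can be sketched as follows. This is a minimal, self-contained C++17 illustration under stated assumptions, not the patch itself: `VersionStatus`, `ModelStateTable`, and `StrictIsReady()` are hypothetical, the enum only borrows the `ModelReadyState` name and is not Triton's definition, and the `runtime_ready` callback stands in for `ModelIsReady()`. It keeps the existing rule that versions marked `"unloaded"` are ignored, and only probes the backend for versions whose lifecycle state is already READY.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <iostream>
#include <map>
#include <string>

// Hypothetical simplified lifecycle enum (not Triton's real definition).
enum class ModelReadyState { UNKNOWN, READY, UNAVAILABLE, LOADING, UNLOADING };

struct VersionStatus {
  ModelReadyState state = ModelReadyState::UNKNOWN;
  std::string reason;                   // e.g. "unloaded"
  std::function<bool()> runtime_ready;  // stand-in for ModelIsReady(name, version)
};

// model name -> (version -> status), loosely mirroring what ModelStates() reports.
using ModelStateTable =
    std::map<std::string, std::map<std::int64_t, VersionStatus>>;

// Strict readiness sketch: lifecycle check first, then the runtime check.
bool StrictIsReady(const ModelStateTable& table) {
  for (const auto& [name, versions] : table) {
    for (const auto& [version, vs] : versions) {
      if (vs.state != ModelReadyState::READY) {
        // Existing behavior: explicitly unloaded versions do not block readiness.
        if (vs.reason == "unloaded") continue;
        return false;
      }
      // New behavior: lifecycle says READY, so also ask the backend.
      assert(vs.runtime_ready && "READY versions need a runtime probe");
      if (!vs.runtime_ready()) {
        std::cerr << "model '" << name << "' version " << version
                  << " is READY in lifecycle but failed runtime readiness\n";
        return false;
      }
    }
  }
  return true;
}
```

Checking lifecycle state first leaves the existing path unchanged for models that are loading or failed to load; the runtime probe is reached only when the lifecycle claims READY, which is the case this PR targets.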
Pull request overview
This pull request enhances the strict readiness check for Triton Inference Server by adding runtime backend health verification. Under strict_readiness_=true, the IsReady() function previously only checked the lifecycle state (ModelStates()) but not runtime backend readiness. This meant /v2/health/ready would return 200 even when a model's backend had crashed (e.g., Python backend stub process died) because the lifecycle still showed READY.
Changes:
- Added a runtime backend readiness check via `ModelIsReady()` for models in the READY lifecycle state within the strict readiness evaluation
- Enhanced logging to report when models fail runtime readiness checks despite being in the READY lifecycle state
LGTM, @yinggeh please take a look at this as well. Thanks.
I thought we already fixed this bug in triton-inference-server/python_backend#423. @pskiran1 Can you clarify?
Please fill out the pull_request_template_external_contrib.md template.
Co-authored-by: J Wyman <jeremy.wyman@outlook.com>
OK, I updated my PR description based on the pull_request_template_external_contrib.md template.
@itsnothuy, I've not seen your signed Contributor License Agreement yet. Have you submitted it or are you covered as part of your employer or university?
My bad, I forgot. I just signed it and submitted it via email.
@yinggeh, this change targets the server-level health endpoint to report accurate status on
yinggeh left a comment
LGTM. Thanks for the contribution
Merged. Thank you very much for your contribution. 🎉
Gladly!!
Checklist
Agreement
- `<commit_type>: <Title>`
- … (`pre-commit install`, `pre-commit run --all`)

Commit Type:
Check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
Where should the reviewer start?
`src/server.cc`, function `InferenceServer::IsReady()` (the only changed file). Look for the new block after the existing `ModelReadyState::READY` / `"unloaded"` check.

Test plan:
1. Start the server with `--strict-readiness` and a Python backend model.
2. Confirm `/v2/health/ready` returns 200.
3. Kill the stub process (`kill -9 <stub_pid>`).
4. Confirm `/v2/health/ready` now returns 503.

Caveats:
- Only affects `strict_readiness_` mode (the default). Non-strict mode is unchanged.
- Adds `ModelIsReady()` calls during health checks, but `TRITONBACKEND_ModelInstanceReady` is designed to be lightweight (the Python backend checks `StubActive()` via `waitpid(WNOHANG)`, a non-blocking syscall taking ~1–2µs). Health probes typically run every 5–30s.
- Backends that do not implement `TRITONBACKEND_ModelInstanceReady` are unaffected: the function pointer is `nullptr` and `IsReady()` returns `Status::Success` by default.

Background
Edit (needs maintainers' clarification):
The per-model endpoint `/v2/models/{model}/ready` already correctly returns 503 when a stub dies, because it calls through `ModelIsReady()` → `Model::IsReady()` → `TritonModelInstance::IsReady()` → `TRITONBACKEND_ModelInstanceReady()` → `StubActive()`.

The server-level endpoint `/v2/health/ready` does not, because `InferenceServer::IsReady()` only checks `ModelStates()`, a lifecycle enum set at load time that is never updated when a backend fails at runtime.
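The last hop in that chain, `StubActive()`, is described in this PR as a non-blocking `waitpid(WNOHANG)` check. A minimal POSIX sketch of that kind of liveness probe follows. `ChildStillRunning()` and `DemoStubCrashDetection()` are hypothetical names for illustration, not the Python backend's actual code, and the probe only works when the caller is the parent of the process being checked.

```cpp
#include <cassert>
#include <chrono>
#include <signal.h>     // kill(), SIGKILL: used in the demo to simulate a crashed stub
#include <sys/types.h>
#include <sys/wait.h>
#include <thread>
#include <unistd.h>

// Hypothetical helper modeled on the idea behind StubActive(): ask the kernel,
// without blocking, whether a child has exited. waitpid(..., WNOHANG) returns 0
// while the child is still running, the pid once it has terminated (reaping it),
// and -1 on error (e.g. already reaped or not our child). Only 0 means "alive".
bool ChildStillRunning(pid_t pid) {
  // pid must name one specific child; waitpid treats 0 and negative values
  // as "any child" / process-group selectors.
  assert(pid > 0);
  return waitpid(pid, nullptr, WNOHANG) == 0;
}

// Demo: fork a stand-in "stub", check it, SIGKILL it, then poll until the
// non-blocking check reports it gone. Returns true if both checks behave.
bool DemoStubCrashDetection() {
  pid_t stub = fork();
  if (stub < 0) return false;  // fork failed
  if (stub == 0) {             // child: idle until killed
    for (;;) pause();
  }
  bool alive_before = ChildStillRunning(stub);
  kill(stub, SIGKILL);
  // Signal delivery is asynchronous, so poll briefly instead of checking once.
  bool dead_after = false;
  for (int i = 0; i < 200 && !dead_after; ++i) {
    dead_after = !ChildStillRunning(stub);
    if (!dead_after) std::this_thread::sleep_for(std::chrono::milliseconds(5));
  }
  return alive_before && dead_after;
}
```

Because `WNOHANG` returns immediately, a health probe built on this check never waits on the stub, which is consistent with the PR's note that the per-probe cost is negligible.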
Call chain comparison:
The fix adds the missing `ModelIsReady()` call inside `IsReady()` so the server-level health endpoint also checks runtime backend health.
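The per-model path that the fix now reuses can be sketched as below, including the fallback noted in the caveats: a backend that does not provide `TRITONBACKEND_ModelInstanceReady` leaves the function pointer null and the instance reports success. `Status`, `InstanceReadyFn`, `ModelInstance`, and `Model` are hypothetical simplifications, not Triton's real classes, and treating a model as ready only when every instance is ready is an assumption of this sketch.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal status type (not Triton's Status class).
struct Status {
  bool ok = true;
  std::string message;
  static Status Success() { return {}; }
  static Status Error(std::string msg) {
    assert(!msg.empty() && "errors should carry a message");
    return {false, std::move(msg)};
  }
};

// Stand-in for the optional TRITONBACKEND_ModelInstanceReady entry point.
// A backend that does not export it leaves the pointer null.
using InstanceReadyFn = Status (*)(int instance_id);

struct ModelInstance {
  int id = 0;
  InstanceReadyFn ready_fn = nullptr;

  Status IsReady() const {
    // No entry point: treat the instance as ready (the default described above).
    if (ready_fn == nullptr) return Status::Success();
    return ready_fn(id);
  }
};

struct Model {
  std::vector<ModelInstance> instances;

  // Assumption: the model is ready only if every instance reports ready.
  Status IsReady() const {
    for (const auto& inst : instances) {
      Status s = inst.IsReady();
      if (!s.ok) return s;
    }
    return Status::Success();
  }
};
```

In this sketch a Python-style backend would supply a `ready_fn` that performs the stub liveness check, while older backends keep returning success, so adding the call to the server-level check cannot make them report unready.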
Call chain table columns: `/v2/models/{model}/ready`, `/v2/health/ready`.

Related
… `v2/health` still return 200 OK (server#7230)