fix: /v2/health/ready returns 200 when Python backend stub is dead (#8604) #473

Merged
whoisj merged 5 commits into triton-inference-server:main from itsnothuy:fix-8604-server-readiness
Mar 17, 2026

Conversation

@itsnothuy
Contributor

@itsnothuy itsnothuy commented Feb 24, 2026

Checklist

  • I have read the Contribution guidelines and signed the Contributor License
    Agreement
  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • I ran pre-commit locally (pre-commit install, pre-commit run --all)
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type box here and add the label to the GitHub PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

Where should the reviewer start?

src/server.cc, function InferenceServer::IsReady() — the only changed file. Look for the new block after the existing ModelReadyState::READY / "unloaded" check.

Test plan:

  1. Start Triton with --strict-readiness and a Python backend model
  2. Verify /v2/health/ready returns 200
  3. Kill the Python backend stub process (kill -9 <stub_pid>)
  4. Verify /v2/health/ready now returns 503
  5. Existing CI tests pass (pre-commit hooks: clang-format, codespell, copyright)

Caveats:

  • Only affects strict_readiness_ mode (the default). Non-strict mode is unchanged.
  • Adds per-model ModelIsReady() calls during health checks, but TRITONBACKEND_ModelInstanceReady is designed to be lightweight (Python backend checks StubActive() via waitpid(WNOHANG), a non-blocking syscall ~1–2µs). Health probes typically run every 5–30s.
  • Backends that don't implement TRITONBACKEND_ModelInstanceReady are unaffected — the function pointer is nullptr and IsReady() returns Status::Success by default.
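
The non-blocking waitpid(WNOHANG) check mentioned in the caveats can be illustrated with a small POSIX sketch. This is not the Python backend's StubActive() code; ChildIsAlive, SpawnSleepingChild, and KillChildAndAwaitExit are hypothetical helper names, and the sleeping child only stands in for a stub process.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: returns true while the child `pid` is still running.
// With WNOHANG, waitpid() returns immediately:
//   0    -> the child has not exited yet
//   pid  -> the child has exited (this call also reaps it)
//   -1   -> error, e.g. ECHILD once the child was already reaped
bool ChildIsAlive(pid_t pid)
{
  int status = 0;
  return waitpid(pid, &status, WNOHANG) == 0;
}

// Hypothetical helper: forks a child that sleeps until it is signalled.
// Stands in for a backend stub process in this sketch.
pid_t SpawnSleepingChild()
{
  const pid_t pid = fork();
  if (pid == 0) {
    for (;;) {
      pause();
    }
  }
  return pid;
}

// Hypothetical helper: sends SIGKILL and blocks until the child has exited.
// WNOWAIT leaves the child waitable, so a later ChildIsAlive() call
// still observes the exit itself.
void KillChildAndAwaitExit(pid_t pid)
{
  assert(pid > 0);
  kill(pid, SIGKILL);
  siginfo_t info;
  waitid(P_PID, static_cast<id_t>(pid), &info, WEXITED | WNOWAIT);
}
```

Spawning a child, calling ChildIsAlive(), then killing the child and calling it again mirrors steps 2 to 4 of the test plan at the process level. The first call after the exit reaps the child, so later calls fail with ECHILD; the helper reports both cases as not alive.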

Background

edit (need maintainers' clarification):

The per-model endpoint /v2/models/{model}/ready already correctly returns 503 when
a stub dies. It calls through ModelIsReady() → Model::IsReady() →
TritonModelInstance::IsReady() → TRITONBACKEND_ModelInstanceReady() → StubActive().

The server-level endpoint /v2/health/ready does not, because
InferenceServer::IsReady() only checks ModelStates() — a lifecycle enum set at
load time that is never updated when a backend fails at runtime.

Call chain comparison:

# /v2/health/ready (BUG — before this fix)
HTTP GET /v2/health/ready
  → TRITONSERVER_ServerIsReady()
    → InferenceServer::IsReady()        ← checks ModelStates() only, never calls ModelIsReady()

# /v2/models/{name}/ready (already correct)
HTTP GET /v2/models/{name}/ready
  → TRITONSERVER_ServerModelIsReady()
    → InferenceServer::ModelIsReady()
      → Model::IsReady()
        → TritonModelInstance::IsReady()
          → TRITONBACKEND_ModelInstanceReady()  ← checks stub health via waitpid()

The fix adds the missing ModelIsReady() call inside IsReady() so the server-level
health endpoint also checks runtime backend health.

| Endpoint | Before | After |
|---|---|---|
| /v2/models/{model}/ready | 503 (correct) | 503 (unchanged) |
| /v2/health/ready | 200 (bug) | 503 (fixed) |
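
The difference between the old and new server-level check can be sketched with a simplified model. This is not the code in src/server.cc: LifecycleState, ModelEntry, ModelRuntimeReady, LifecycleOnlyReady, and StrictReady are hypothetical names, and the real IsReady() also handles explicitly unloaded models, which this sketch leaves out.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for the lifecycle enum that is set at load time.
enum class LifecycleState { UNAVAILABLE, LOADING, READY, UNLOADING };

// Hypothetical per-model record. `runtime_ready` plays the role of the
// optional backend readiness callback; it is empty when the backend does
// not implement one.
struct ModelEntry {
  LifecycleState lifecycle;
  std::function<bool()> runtime_ready;
};

// A backend without a readiness callback is treated as ready by default.
bool ModelRuntimeReady(const ModelEntry& model)
{
  if (!model.runtime_ready) {
    return true;
  }
  return model.runtime_ready();
}

// Old behaviour (sketch): only the lifecycle state is consulted, so a
// backend that crashed after loading still counts as ready.
bool LifecycleOnlyReady(const std::map<std::string, ModelEntry>& models)
{
  for (const auto& entry : models) {
    if (entry.second.lifecycle != LifecycleState::READY) {
      return false;
    }
  }
  return true;
}

// New behaviour (sketch): a model in READY lifecycle state must also pass
// its runtime readiness check.
bool StrictReady(const std::map<std::string, ModelEntry>& models)
{
  for (const auto& entry : models) {
    if (entry.second.lifecycle != LifecycleState::READY) {
      return false;
    }
    if (!ModelRuntimeReady(entry.second)) {
      return false;
    }
  }
  return true;
}
```

With one model whose callback returns false (a dead stub) and another with no callback at all, LifecycleOnlyReady() returns true while StrictReady() returns false, which corresponds to the 200 versus 503 difference for /v2/health/ready.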

Related

Under strict_readiness_=true, IsReady() only checked lifecycle state
(ModelStates()) but not runtime backend readiness. This meant that
/v2/health/ready returned 200 even when a model's backend had crashed
(e.g., Python backend stub process died), because the lifecycle still
showed READY.

Add a runtime readiness check via ModelIsReady() for models in READY
lifecycle state. ModelIsReady() calls model->IsReady() which invokes
TRITONBACKEND_ModelInstanceReady — in the Python backend this calls
StubActive() -> waitpid(WNOHANG), a non-blocking microsecond-level
check.

This ensures /v2/health/ready correctly reflects backend health,
enabling orchestrators (K8s, etc.) to detect and restart unhealthy
pods.

Fixes: triton-inference-server/server#8604
Copilot AI review requested due to automatic review settings February 24, 2026 06:34

Copilot AI left a comment

Pull request overview

This pull request enhances the strict readiness check for Triton Inference Server by adding runtime backend health verification. Under strict_readiness_=true, the IsReady() function previously only checked the lifecycle state (ModelStates()) but not runtime backend readiness. This meant /v2/health/ready would return 200 even when a model's backend had crashed (e.g., Python backend stub process died) because the lifecycle still showed READY.

Changes:

  • Added runtime backend readiness check via ModelIsReady() for models in READY lifecycle state within the strict readiness evaluation
  • Enhanced logging to report when models fail runtime readiness checks despite being in READY lifecycle state

@itsnothuy itsnothuy changed the title from "Check runtime backend readiness in IsReady() for strict mode" to "fix: /v2/health/ready returns 200 when Python backend stub is dead (#8604)" Feb 24, 2026
@whoisj whoisj requested a review from yinggeh February 24, 2026 23:29
@whoisj
Contributor

whoisj commented Feb 24, 2026

LGTM, @yinggeh please take a look at this as well. Thanks.

@whoisj whoisj added the PR: fix A bug fix label Feb 24, 2026
@yinggeh
Contributor

yinggeh commented Feb 24, 2026

I thought we already fixed this bug in triton-inference-server/python_backend#423. @pskiran1 Can you clarify?

@yinggeh
Contributor

yinggeh commented Feb 24, 2026

Please fill out the template pull_request_template_external_contrib.md

Co-authored-by: J Wyman <jeremy.wyman@outlook.com>
@itsnothuy
Contributor Author

itsnothuy commented Feb 25, 2026

Please fill out the template pull_request_template_external_contrib.md

OK, I updated my PR description based on the pull_request_template_external_contrib.md template.

@whoisj
Contributor

whoisj commented Feb 25, 2026

@itsnothuy, I've not seen your signed Contributor License Agreement yet. Have you submitted it or are you covered as part of your employer or university?

@itsnothuy
Contributor Author

itsnothuy commented Feb 26, 2026

@itsnothuy, I've not seen your signed Contributor License Agreement yet. Have you submitted it or are you covered as part of your employer or university?

Mb, I forgot. Just signed it and submitted it via email.

@yinggeh yinggeh requested a review from pskiran1 March 2, 2026 09:41
@whoisj
Contributor

whoisj commented Mar 3, 2026

Mb, I forgot. Just signed it and submitted it via email.

You were approved today. Just waiting on @pskiran1 to review the change per @yinggeh's request.

@itsnothuy
Contributor Author

Mb, I forgot. Just signed it and submitted it via email.

You were approved today. Just waiting on @pskiran1 to review the change per @yinggeh's request.

Noted. Please keep me updated.

@pskiran1
Member

pskiran1 commented Mar 4, 2026

I thought we already fixed this bug in triton-inference-server/python_backend#423. @pskiran1 Can you clarify?

@yinggeh, this change targets the server-level health endpoint to report accurate status on /v2/health/ready, whereas my earlier changes only applied to /v2/models/{model}/ready.

Contributor

@yinggeh yinggeh left a comment

LGTM. Thanks for the contribution

@whoisj whoisj merged commit 53fc26e into triton-inference-server:main Mar 17, 2026
1 check passed
@whoisj
Contributor

whoisj commented Mar 17, 2026

Merged. Thank you very much for your contribution. 🎉

@itsnothuy
Contributor Author

itsnothuy commented Mar 18, 2026

Merged. Thank you very much for your contribution. 🎉

Gladly!!

Labels

PR: fix A bug fix

Development

Successfully merging this pull request may close these issues.

Python Backend fails to restart on unhealthy state but health API remains 200

5 participants