fix(trainer): validate polling_interval is strictly less than timeout#402
fix(trainer): validate polling_interval is strictly less than timeout#402prabindersinghh wants to merge 1 commit intokubeflow:mainfrom
Conversation
In wait_for_job_status(), the guard used strict greater-than (>), allowing polling_interval == timeout to pass validation. This results in round(timeout/polling_interval) == 1, meaning the job is polled exactly once with no retry window — silently wrong behavior that contradicts the documented constraint 'must be less than timeout'. Fix: change > to >= in the validation guard. Fixes kubeflow#400 Signed-off-by: Prabinder Singh <prabindersinghh@gmail.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Pull request overview
This PR fixes a validation edge case in the Kubernetes runtime backend so wait_for_job_status() enforces the documented constraint that polling_interval must be strictly less than timeout, and adds a regression test to cover the equality case.
Changes:
- Update
KubernetesBackend.wait_for_job_status()validation to rejectpolling_interval >= timeout. - Add a new unit test asserting
polling_interval == timeoutraisesValueError.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
kubeflow/trainer/backends/kubernetes/backend.py |
Tightens polling_interval vs timeout validation to match the documented contract. |
kubeflow/trainer/backends/kubernetes/backend_test.py |
Adds a regression test for the polling_interval == timeout invalid-input case. |
Summary
Fixes a silent validation bug in
KubernetesBackend.wait_for_job_status()where
polling_interval == timeoutpassed the guard but produced only1 poll iteration with no retry window.
The Bug
The error message says "must be less than timeout" but the code
allowed equal values — a direct contradiction.
The Fix
One character change. No logic impact for valid inputs.
Validation
make verify → PASSED
backend_test.py → 48/48 PASSED (1 new test case added)
New test case:
polling_interval equal to timeout raises ValueErrorRelated