Skip to content

fix(trainer): validate polling_interval is strictly less than timeout#402

Open
prabindersinghh wants to merge 1 commit intokubeflow:mainfrom
prabindersinghh:fix/polling-interval-validation
Open

fix(trainer): validate polling_interval is strictly less than timeout#402
prabindersinghh wants to merge 1 commit intokubeflow:mainfrom
prabindersinghh:fix/polling-interval-validation

Conversation

@prabindersinghh
Copy link
Copy Markdown

Summary

Fixes a silent validation bug in KubernetesBackend.wait_for_job_status()
where polling_interval == timeout passed the guard but produced only
1 poll iteration with no retry window.

The Bug

# Before — allows equal values silently
if polling_interval > timeout:
    raise ValueError(...)

# When equal: round(10/10) == 1 → only 1 poll, then TimeoutError

The error message says "must be less than timeout" but the code
allowed equal values — a direct contradiction.

The Fix

# After — matches documented constraint exactly
if polling_interval >= timeout:
    raise ValueError(...)

One character change. No logic impact for valid inputs.

Validation

make verify → PASSED
backend_test.py → 48/48 PASSED (1 new test case added)

New test case: polling_interval equal to timeout raises ValueError

Related

In wait_for_job_status(), the guard used strict greater-than (>),
allowing polling_interval == timeout to pass validation. This results
in round(timeout/polling_interval) == 1, meaning the job is polled
exactly once with no retry window — silently wrong behavior that
contradicts the documented constraint 'must be less than timeout'.

Fix: change > to >= in the validation guard.

Fixes kubeflow#400

Signed-off-by: Prabinder Singh <prabindersinghh@gmail.com>
Copilot AI review requested due to automatic review settings March 19, 2026 09:41
@google-oss-prow
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign andreyvelich for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a validation edge case in the Kubernetes runtime backend so wait_for_job_status() enforces the documented constraint that polling_interval must be strictly less than timeout, and adds a regression test to cover the equality case.

Changes:

  • Update KubernetesBackend.wait_for_job_status() validation to reject polling_interval >= timeout.
  • Add a new unit test asserting polling_interval == timeout raises ValueError.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
kubeflow/trainer/backends/kubernetes/backend.py Tightens polling_interval vs timeout validation to match the documented contract.
kubeflow/trainer/backends/kubernetes/backend_test.py Adds a regression test for the polling_interval == timeout invalid-input case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug(trainer): wait_for_job_status silently allows polling_interval == timeout

2 participants