fix(trainer): validate polling_interval strictly less than timeout (#400) #417
debojit11070 wants to merge 2 commits into kubeflow:main
Conversation
…ubeflow#400)
- Changed `polling_interval > timeout` to `>=` across all three backends
- Kubernetes: fixed validation to reject equal values
- Container: added the previously missing `polling_interval` validation
- LocalProcess: fixed validation to reject equal values
- Container tests: added a test case for the `polling_interval >= timeout` scenario

Fixes: kubeflow#400
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
🎉 Welcome to the Kubeflow SDK! 🎉 Thanks for opening your first PR! We're happy to have you as part of our community 🚀 Here's what happens next:
Join the community:
Feel free to ask questions in the comments if you need any help or clarification!
Pull request overview
This PR fixes an inconsistency in wait_for_job_status() validation across Trainer backends by enforcing that polling_interval must be strictly less than timeout, preventing “single poll then exit” behavior when the values are equal.
Changes:
- Tightened validation from `polling_interval > timeout` to `polling_interval >= timeout` in the Kubernetes and LocalProcess backends.
- Added the missing `polling_interval >= timeout` validation in the Container backend.
- Added a Container backend test case for the `polling_interval >= timeout` error path.
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| kubeflow/trainer/backends/localprocess/backend.py | Enforces polling_interval < timeout and updates the error message. |
| kubeflow/trainer/backends/kubernetes/backend.py | Enforces polling_interval < timeout and updates the error message. |
| kubeflow/trainer/backends/container/backend.py | Adds missing polling_interval < timeout validation to container polling loop. |
| kubeflow/trainer/backends/container/backend_test.py | Adds a test case covering polling_interval >= timeout. |
| .gitignore | Ignores a local development notes file. |
```diff
 if polling_interval >= timeout:
     raise ValueError(
-        f"Polling interval {polling_interval} must be less than timeout: {timeout}"
+        f"polling_interval ({polling_interval}) must be strictly less than "
+        f"timeout ({timeout})"
     )

 for _ in range(round(timeout / polling_interval)):
```
The new polling_interval >= timeout guard still allows polling_interval <= 0 (and/or timeout <= 0), which will lead to a ZeroDivisionError in round(timeout / polling_interval) or nonsensical waiting behavior; validate both values are positive before doing the division and sleeping.
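A minimal sketch of the guard this comment suggests; the helper name and message wording are illustrative, not taken from the SDK:

```python
def validate_polling_args(polling_interval: float, timeout: float) -> None:
    """Reject values that would break round(timeout / polling_interval)
    or produce nonsensical waiting behavior."""
    if polling_interval <= 0:
        raise ValueError(f"polling_interval ({polling_interval}) must be positive")
    if timeout <= 0:
        raise ValueError(f"timeout ({timeout}) must be positive")
    if polling_interval >= timeout:
        raise ValueError(
            f"polling_interval ({polling_interval}) must be strictly less than "
            f"timeout ({timeout})"
        )
```

Checking positivity before the relational comparison also gives the caller a more specific error message for each failure mode.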
```diff
 if polling_interval >= timeout:
     raise ValueError(
-        f"Polling interval {polling_interval} must be less than timeout: {timeout}"
+        f"polling_interval ({polling_interval}) must be strictly less than "
+        f"timeout ({timeout})"
     )

 for _ in range(round(timeout / polling_interval)):
```
This validation should also reject polling_interval <= 0 (and likely timeout <= 0) because the subsequent round(timeout / polling_interval) will raise ZeroDivisionError or behave incorrectly; add an explicit positive-value check before the division/sleep loop.
```diff
+if polling_interval >= timeout:
+    raise ValueError(
+        f"polling_interval ({polling_interval}) must be strictly less than "
+        f"timeout ({timeout})"
+    )

 end = time.time() + timeout
```
Consider validating polling_interval > 0 (and timeout > 0) here as well: polling_interval=0 currently causes a tight loop (no sleep) until timeout, which can hammer the API and CPU; raising ValueError for non-positive values avoids this.
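One way the time-based Container loop could incorporate this suggestion; the function and `check` callback are illustrative stand-ins for the backend's status probe, not the SDK's actual API:

```python
import time


def wait_until(check, timeout: float, polling_interval: float) -> bool:
    """Poll check() until it returns True or timeout elapses.

    Validating both values as positive up front guarantees the loop
    sleeps between probes instead of spinning on the CPU when
    polling_interval is 0 (or looping forever for negative values).
    """
    if polling_interval <= 0 or timeout <= 0:
        raise ValueError("polling_interval and timeout must be positive")
    if polling_interval >= timeout:
        raise ValueError("polling_interval must be strictly less than timeout")
    end = time.time() + timeout
    while time.time() < end:
        if check():
            return True
        time.sleep(polling_interval)
    return False
```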
```python
elif test_case.name == "polling interval >= timeout error":
    container_backend.wait_for_job_status(
        job_name,
        timeout=test_case.config["timeout"],
        polling_interval=test_case.config["polling_interval"],
    )
```
In this test (and the other error branches), if wait_for_job_status unexpectedly does not raise, the test will still pass; use pytest.raises(...) (or an explicit pytest.fail after the call) to ensure the expected exception is actually raised.
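A sketch of the pattern this comment asks for; the backend method is replaced here by a trivial stand-in, since only the assertion shape matters:

```python
import pytest


def wait_for_job_status(job_name, timeout=600, polling_interval=2):
    """Stand-in for the real backend method, used only to show the pattern."""
    if polling_interval >= timeout:
        raise ValueError("polling_interval must be strictly less than timeout")


def test_polling_interval_ge_timeout_raises():
    # pytest.raises fails the test if no ValueError is raised,
    # unlike a bare call that silently passes when nothing is thrown.
    with pytest.raises(ValueError, match="strictly less"):
        wait_for_job_status("job", timeout=30, polling_interval=30)
```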
…timeout
- Validates polling_interval > 0 and timeout > 0 in the LocalProcess backend
- Validates polling_interval > 0 and timeout > 0 in the Kubernetes backend
- Validates polling_interval > 0 and timeout > 0 in the Container backend
- Uses pytest.raises() in container backend tests to properly assert exceptions

This prevents ZeroDivisionError and tight loops when non-positive values are provided.
Fixes #400
Changes
Testing