
Fix Running condition being re-emitted when pod and job informers are out-of-sync #787

Open
GonzaloSaez wants to merge 1 commit into kubeflow:master from GonzaloSaez:gonzalo/fix_running-condition-on-completion


@GonzaloSaez
Contributor

When the pod and job informers are out-of-sync, it's possible for the launcher job to be finished while its pod is still seen as running. In that case, the MPIJob may already be considered completed (when using runLauncherAsWorker and the workers have finished). In this scenario, the Running condition may be re-emitted with a last transition time later than the time at which the MPIJob was deemed completed. As a result, other controllers watching the MPIJob cannot derive its start and end times from the last transition time of the Running condition.

To fix this, we can avoid re-emitting the Running condition. We can also ensure that the Running condition is always emitted and that its last transition time is <= the completion time.

Another solution would be to re-queue when we see that the job and pod informers are out-of-sync, but I'm not sure whether that would be harder to implement.
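For comparison, the re-queue alternative could look roughly like the sketch below. It is only an illustration under assumptions: `launcherView`, `informersOutOfSync` and `reconcile` are hypothetical names, and a real controller would read these values from its informer caches and re-add the key to its work queue rather than return a boolean.

```go
package main

import "fmt"

// launcherView is a hypothetical snapshot of what the two informer caches
// currently report about the launcher.
type launcherView struct {
	JobFinished bool   // launcher Job is finished according to the job informer
	PodPhase    string // launcher pod phase according to the pod informer
}

// informersOutOfSync reports whether the caches disagree: the Job is
// finished, but the pod is still seen as running.
func informersOutOfSync(v launcherView) bool {
	return v.JobFinished && v.PodPhase == "Running"
}

// reconcile returns true when the key should be re-queued instead of
// updating the MPIJob status from inconsistent data.
func reconcile(v launcherView) (requeue bool) {
	if informersOutOfSync(v) {
		return true
	}
	// ... status updates would happen here once both caches agree.
	return false
}

func main() {
	stale := launcherView{JobFinished: true, PodPhase: "Running"}
	fmt.Println("requeue:", reconcile(stale))
}
```

The trade-off is that status updates are delayed until both caches agree, whereas guarding the Running condition directly handles the stale read in a single pass.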

… out-of-sync

Signed-off-by: Gonzalo Saez <11050889+GonzaloSaez@users.noreply.github.com>
@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign rongou for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@GonzaloSaez
Contributor Author

cc: @tenzen-y
