Add re-submission of tasks during spot interruption disconnects#516
blanked wants to merge 8 commits into jenkinsci:master
|
Can someone help review this PR to see if it's OK? It's actually identical to #485, but I opened a new PR so that it's eligible for Hacktoberfest 😅 |
res0nance
left a comment
The feature looks very interesting. AFAICT this does what it says, but it is a hard-to-test feature.
|
Yeah, I'll have a think about how to mock a spot interruption event and see if it's possible using the AWS SDK. If anyone has any idea how to do so, that'll be super helpful! |
final boolean isUnexpectedDisconnection = computer.isOffline() && computer.getOfflineCause()
        instanceof OfflineCause.ChannelTermination;
Some customers complained that OfflineCause.ChannelTermination was not always triggered for spot interruptions. You may be able to dig into this further here: jenkinsci/ec2-fleet-plugin#121
|
This seems to have been approved in October 2020. Is this going to be merged soon? This would be really helpful for us |
|
Yes, this would be really awesome to add - any plans? |
|
Hello, we're also looking forward to this feature. |
It should be possible to test it now with this new-ish* AWS feature: |
|
We are looking forward to using this feature. When is it expected to be released? |
This PR adds a new feature: re-submission of tasks for agents that are disconnected due to a spot interruption event in AWS.
Whenever an agent is disconnected, two checks are made: whether the disconnect was unexpected, and whether it was caused by a spot interruption event. If both are true, the tasks that were running on the agent are re-submitted to the queue.
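The decision described above can be sketched as a small, self-contained model. This is only an illustration of the two checks and the re-queue step, not the code in this PR: `Agent`, `OfflineCause`, `shouldResubmit` and `afterDisconnect` are hypothetical stand-ins rather than Jenkins or plugin classes, the `spotInterrupted` flag is assumed to come from some AWS lookup that is not shown, and a plain `Queue<String>` stands in for the Jenkins build queue.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

public class SpotResubmitSketch {

    /** Hypothetical stand-in for the reason an agent went offline. */
    enum OfflineCause { NONE, USER_REQUESTED, CHANNEL_TERMINATION }

    /** Hypothetical stand-in for an agent; not the plugin's real computer class. */
    static final class Agent {
        final boolean offline;
        final OfflineCause cause;
        final boolean spotInterrupted;   // assumption: result of an AWS lookup, not shown here
        final List<String> runningTasks; // task names standing in for queue tasks

        Agent(boolean offline, OfflineCause cause, boolean spotInterrupted, List<String> runningTasks) {
            this.offline = offline;
            this.cause = cause;
            this.spotInterrupted = spotInterrupted;
            this.runningTasks = runningTasks;
        }
    }

    /** True only when the disconnect was unexpected AND caused by a spot interruption. */
    static boolean shouldResubmit(Agent agent) {
        boolean unexpectedDisconnect =
                agent.offline && agent.cause == OfflineCause.CHANNEL_TERMINATION;
        return unexpectedDisconnect && agent.spotInterrupted;
    }

    /** Re-queues the agent's running tasks when both checks pass; returns how many were re-queued. */
    static int afterDisconnect(Agent agent, Queue<String> buildQueue) {
        if (!shouldResubmit(agent)) {
            return 0;
        }
        buildQueue.addAll(agent.runningTasks);
        return agent.runningTasks.size();
    }

    public static void main(String[] args) {
        Queue<String> buildQueue = new ArrayDeque<>();

        Agent interrupted = new Agent(true, OfflineCause.CHANNEL_TERMINATION, true,
                Arrays.asList("build-a", "build-b"));
        Agent takenOfflineByUser = new Agent(true, OfflineCause.USER_REQUESTED, false,
                Arrays.asList("build-c"));

        System.out.println("re-queued: " + afterDisconnect(interrupted, buildQueue));
        System.out.println("re-queued: " + afterDisconnect(takenOfflineByUser, buildQueue));
        System.out.println("queue: " + buildQueue);
    }
}
```

In this model, an agent that a user deliberately takes offline never has its tasks re-queued, because both conditions must hold before anything is put back on the queue.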
Motivation
Builds may fail when spot instances are terminated. This PR can help reduce the number of build failures caused by spot interruption events.
Notes
This may or may not prevent build failures. There doesn't seem to be any documentation on how tasks can be resubmitted. This PR is inspired by another Jenkins plugin that has the suggested behaviour implemented - https://github.com/jenkinsci/ec2-fleet-plugin/blob/master/src/main/java/com/amazon/jenkins/ec2fleet/EC2FleetAutoResubmitComputerLauncher.java