
Add re-submission of tasks during spot interruption disconnects #516

Open
blanked wants to merge 8 commits into jenkinsci:master from blanked:master

@blanked blanked commented Oct 16, 2020

This PR adds a new feature: re-submission of tasks for agents that are disconnected due to a spot interruption event in AWS.
Whenever an agent is disconnected, the plugin checks whether the disconnect was unexpected and whether it was caused by a spot interruption event. If both are true, the tasks that were running on the agent are re-submitted to the queue.
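
As a rough illustration of the flow described above, here is a minimal, self-contained sketch. It does not use the Jenkins or AWS APIs: `Agent`, `OfflineReason`, `onDisconnect`, and the string-based queue are hypothetical stand-ins invented for this example, and the actual change in this PR may be structured differently.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

public class SpotResubmitSketch {

    /** Hypothetical stand-in for the reason an agent went offline. */
    enum OfflineReason { NONE, CHANNEL_TERMINATION, USER_REQUESTED }

    /** Hypothetical stand-in for an agent and the tasks it was running. */
    static final class Agent {
        final boolean offline;
        final OfflineReason offlineReason;
        final boolean spotInterrupted;   // assumed to be looked up elsewhere (e.g. from AWS)
        final List<String> runningTasks;

        Agent(boolean offline, OfflineReason offlineReason,
              boolean spotInterrupted, List<String> runningTasks) {
            this.offline = offline;
            this.offlineReason = offlineReason;
            this.spotInterrupted = spotInterrupted;
            this.runningTasks = runningTasks;
        }
    }

    /** Check 1: the agent is offline and the channel was terminated unexpectedly. */
    static boolean isUnexpectedDisconnection(Agent agent) {
        return agent.offline && agent.offlineReason == OfflineReason.CHANNEL_TERMINATION;
    }

    /**
     * Re-submits the agent's running tasks to the queue only when both checks pass.
     * Returns the tasks that were re-submitted (empty if none).
     */
    static List<String> onDisconnect(Agent agent, Queue<String> buildQueue) {
        List<String> resubmitted = new ArrayList<>();
        // Check 2: the disconnect must also be attributed to a spot interruption.
        if (isUnexpectedDisconnection(agent) && agent.spotInterrupted) {
            for (String task : agent.runningTasks) {
                buildQueue.add(task);
                resubmitted.add(task);
            }
        }
        return resubmitted;
    }

    public static void main(String[] args) {
        Queue<String> queue = new ArrayDeque<>();
        Agent spotAgent = new Agent(true, OfflineReason.CHANNEL_TERMINATION, true,
                Arrays.asList("build-a", "build-b"));
        Agent manualAgent = new Agent(true, OfflineReason.USER_REQUESTED, false,
                Arrays.asList("build-c"));

        onDisconnect(spotAgent, queue);   // both checks pass: tasks are re-queued
        onDisconnect(manualAgent, queue); // expected disconnect: nothing is re-queued

        System.out.println(queue); // prints [build-a, build-b]
    }
}
```

Requiring both conditions keeps ordinary disconnects (for example, an agent taken offline by a user) from re-queuing work; only the combination of an unexpected channel termination and a spot interruption triggers re-submission.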

Motivation

Builds may fail when spot instances are terminated. This PR can help reduce the number of build failures caused by spot interruption events.

Notes

This may or may not prevent build failures. There doesn't seem to be any documentation on how tasks can be resubmitted. This PR is inspired by another Jenkins plugin that already implements the suggested behaviour: https://github.com/jenkinsci/ec2-fleet-plugin/blob/master/src/main/java/com/amazon/jenkins/ec2fleet/EC2FleetAutoResubmitComputerLauncher.java

Author

blanked commented Oct 21, 2020

Can someone help review this PR to see if it's OK? It's actually identical to #485, but I opened a new PR so that it's eligible for Hacktoberfest 😅

Contributor

@res0nance res0nance left a comment

The feature looks very interesting. AFAICT this seems to do what it says, but it is a hard-to-test feature.

Author

blanked commented Oct 22, 2020

Yeah, I'll have a think about how to mock a spot interruption event and see if it's possible using the AWS SDK. If anyone has any idea how to do so, that'll be super helpful!

Comment on lines +109 to +110
final boolean isUnexpectedDisconnection = computer.isOffline() && computer.getOfflineCause()
        instanceof OfflineCause.ChannelTermination;

Some customers complained that OfflineCause.ChannelTermination was not always triggered for spot interruptions. You may be able to dig into this further here: jenkinsci/ec2-fleet-plugin#121

@dgiffordaudio

This seems to have been approved in October 2020. Is this going to be merged soon? This would be really helpful for us


opajonk commented May 15, 2023

Yes, this would be really awesome to add - any plans?

@minhnnhat-urbanise

Hello, we're also looking forward to this feature.

Contributor

schottsfired commented Sep 29, 2023

> AFAICT this seems to do what it says but it is a hard to test feature.

> i'll have a think on how to mock a spot interruption event and see if its possible using the aws sdk. if anyone has any idea on how to do so, that'll be super helpful!

It should be possible to test it now with this new-ish* AWS feature:
AWS Fault Injection Simulator now injects Spot Instance Interruptions


DhruvJ225 commented Jul 19, 2024

We are looking forward to using this feature. When is it expected to be released?

9 participants